Journal of Personality and Social Psychology, 2006, Vol. 91-03



Volume 91 Number 3 Published monthly by the American Psychological Association

September 2006

ISSN 0022-3514

Journal of Personality and Social Psychology

www.apa.org/journals/psp.html

ATTITUDES AND SOCIAL COGNITION

Charles M. Judd, Editor
Dacher Keltner, Associate Editor
Anne Maass, Associate Editor
Bernd Wittenbrink, Associate Editor
Vincent Yzerbyt, Associate Editor

INTERPERSONAL RELATIONS AND GROUP PROCESSES

John F. Dovidio, Editor
Daphne Blunt Bugental, Associate Editor
Beverley Fehr, Associate Editor
Jacques-Philippe Leyens, Associate Editor
Antony Manstead, Associate Editor
Jeffry A. Simpson, Associate Editor
Scott Tindale, Associate Editor
Jacquie D. Vorauer, Associate Editor

PERSONALITY PROCESSES AND INDIVIDUAL DIFFERENCES

Charles S. Carver, Editor
Tim Kasser, Associate Editor
Mario Mikulincer, Associate Editor
Eva M. Pomerantz, Associate Editor
Richard W. Robins, Associate Editor
Gerard Saucier, Associate Editor
Thomas A. Widiger, Associate Editor

The Journal of Personality and Social Psychology publishes original papers in all areas of personality and social psychology. It emphasizes empirical reports but may include specialized theoretical, methodological, and review papers. The journal is divided into three independently edited sections:

• ATTITUDES AND SOCIAL COGNITION addresses those domains of social behavior in which cognition plays a major role, including the interface of cognition with overt behavior, affect, and motivation. Among topics covered are the formation, change, and utilization of attitudes, attributions, and stereotypes, person memory, self-regulation, and the origins and consequences of moods and emotions insofar as these interact with cognition. Of interest also is the influence of cognition and its various interfaces on significant social phenomena such as persuasion, communication, prejudice, social development, and cultural trends.

• INTERPERSONAL RELATIONS AND GROUP PROCESSES focuses on psychological and structural features of interaction in dyads and groups. Appropriate to this section are papers on the nature and dynamics of interactions and social relationships, including interpersonal attraction, communication, emotion, and relationship development, and on group and organizational processes such as social influence, group decision making and task performance, intergroup relations, and aggression, prosocial behavior and other types of social behavior.

• PERSONALITY PROCESSES AND INDIVIDUAL DIFFERENCES publishes research on all aspects of personality psychology. It includes studies of individual differences and basic processes in behavior, emotions, coping, health, motivation, and other phenomena that reflect personality. Articles in areas such as personality structure, personality development, and personality assessment are also appropriate to this section of the journal, as are studies of the interplay of culture and personality and manifestations of personality in everyday behavior.

Manuscripts: Submit manuscripts to the appropriate section editor according to the above definitions and according to the Instructions to Authors. Section editors reserve the right to redirect papers among themselves as appropriate unless an author specifically requests otherwise. Rejection by one section editor is considered rejection by all; therefore a manuscript rejected by one section editor should not be submitted to another. The opinions and statements published are the responsibility of the authors, and such opinions and statements do not necessarily represent the policies of APA or the views of the editors. Section editors’ addresses appear below:

ATTITUDES AND SOCIAL COGNITION Charles M. Judd, Editor c/o Laurie Hawkins Department of Psychology University of Colorado UCB 345 Boulder, CO 80309

INTERPERSONAL RELATIONS AND GROUP PROCESSES John F. Dovidio, Editor Department of Psychology University of Connecticut 406 Babbidge Road Storrs, CT 06269-1020

PERSONALITY PROCESSES AND INDIVIDUAL DIFFERENCES Charles S. Carver, Editor ATTN: JPSP: PPID Department of Psychology University of Miami P.O. Box 248185 Coral Gables, FL 33124-0751

Change of Address: Send change of address notice and a recent mailing label to the attention of the Subscriptions Department, American Psychological Association, 30 days prior to the actual change of address. APA will not replace undelivered copies resulting from address changes; journals will be forwarded only if subscribers notify the local post office in writing that they will guarantee periodicals forwarding postage.

Electronic access: APA members who subscribe to this journal have automatic access to a 3-year file of the journal in the PsycARTICLES® full-text database. See http://members.apa.org/access.

Reprints: Authors may order reprints of their articles from the printer when they receive proofs.

Single Issues, Back Issues, and Back Volumes: For information regarding back issues or back volumes write to Order Department, American Psychological Association, 750 First Street, NE, Washington, DC 20002-4242.

Microform Editions: For information regarding microform editions, write to University Microfilms, Ann Arbor, MI 48106.

Copyright and Permission: Those who wish to reuse APA-copyrighted material in a non-APA publication must secure from APA and the author of reproduced material written permission to reproduce a journal article in full or journal text of more than 500 words. APA normally grants permission contingent upon like permission of the author, inclusion of the APA copyright notice on the first page of reproduced material, and payment of a fee of $20 per page. Permission from APA and fees are waived for those who wish to reproduce a single table or figure from a journal for use in a print product, provided the author’s permission is obtained and full credit is given to APA as copyright holder and to the author through a complete citation. (Requesters requiring written permission for commercial use of a single table or figure will be assessed a $25 service fee.) Permission and fees are waived for authors who wish to reproduce their own material for personal use; fees only are waived for authors who wish to use more than a single table or figure of their own material commercially (but for use in edited books, fees are waived for the author only if serving as the book editor). Permission and fees are waived for the photocopying of isolated journal articles for nonprofit classroom or library reserve use by instructors and educational institutions. A permission fee may be charged to the requester if students are charged for the material, multiple articles are copied, or large-scale copying is involved (e.g., for course packs). Access services may use unedited abstracts without the permission of APA or the author. Libraries are permitted to photocopy beyond the limits of U.S. copyright law: (1) post-1977 articles, provided the per-copy fee in the code for this journal (0022-3514/06/$12.00) is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923; (2) pre-1978 articles, provided that the per-copy fee stated in the Publishers’ Fee List is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923. Address requests for reprint permission to the Permissions Office, American Psychological Association, 750 First Street, NE, Washington, DC 20002-4242.

APA Journal Staff: Susan J. A. Harris, Senior Director, Journals Program; Skip Maier, Director, Journal Services; Paige W. Jackson, Director, Editorial Services; Greg Long, Production Account Manager; Jodi Ashcraft, Advertising Sales Manager.

Journal of Personality and Social Psychology (ISSN 0022-3514) is published monthly in two volumes per year by the American Psychological Association, 750 First Street, NE, Washington, DC 20002-4242. Subscriptions are available on a calendar year basis only (January through December). The 2006 rates follow: Nonmember Individual: $421 Domestic, $464 Foreign, $491 Air Mail. Institutional: $1,249 Domestic, $1,340 Foreign, $1,367 Air Mail. APA Member: $202. Write to Subscriptions Department, American Psychological Association, 750 First Street, NE, Washington, DC 20002-4242. Printed in the U.S.A. Periodicals postage paid at Washington, DC, and at additional mailing offices. POSTMASTER: Send address changes to Journal of Personality and Social Psychology, 750 First Street, NE, Washington, DC 20002-4242.

The paper in this journal meets or exceeds EPA guidelines for recycled paper. Since 1986, this journal has been printed on acid-free paper.




Copyright © 2006 by the American Psychological Association

Attitudes and Social Cognition

369 Voluntary Settlement and the Spirit of Independence: Evidence from Japan’s “Northern Frontier”
Shinobu Kitayama, Keiko Ishii, Toshie Imada, Kosuke Takemura, and Jenny Ramaswamy

385 At the Boundaries of Automaticity: Negation as Reflective Operation
Roland Deutsch, Bertram Gawronski, and Fritz Strack

406 A Novel View of Between-Categories Contrast and Within-Category Assimilation
Sarah Queller, Terry Schell, and Winter Mason

423 Resisting Persuasion by the Skin of One’s Teeth: The Hidden Success of Resisted Persuasive Messages
Zakary L. Tormala, Joshua J. Clarkson, and Richard E. Petty

Interpersonal Relations and Group Processes

436 Mere Effort as the Mediator of the Evaluation–Performance Relationship
Stephen G. Harkins

456 High-Maintenance Interaction: Inefficient Social Coordination Impairs Self-Regulation
Eli J. Finkel, W. Keith Campbell, Amy B. Brunell, Amy N. Dalton, Sarah J. Scarbeck, and Tanya L. Chartrand

476 The Time Course of Grief Reactions to Spousal Loss: Evidence From a National Probability Sample
Katherine B. Carnelley, Camille B. Wortman, Niall Bolger, and Christopher T. Burke

493 What Do People Value When They Negotiate? Mapping the Domain of Subjective Value in Negotiation
Jared R. Curhan, Hillary Anger Elfenbein, and Heng Xu

513 Can Manipulations of Cognitive Load Be Used to Test Evolutionary Hypotheses?
H. Clark Barrett, David A. Frederick, Martie G. Haselton, and Robert Kurzban

519 Constraining Accommodative Homunculi in Evolutionary Explorations of Jealousy: A Reply to Barrett et al. (2006)
David DeSteno, Monica Y. Bartlett, and Peter Salovey

Personality Processes and Individual Differences

524 Conserving Self-Control Strength
Mark Muraven, Dikla Shmueli, and Edward Burkley

538 Five Types of Personality Continuity in Childhood and Adolescence
Filip De Fruyt, Meike Bartels, Karla G. Van Leeuwen, Barbara De Clercq, Mieke Decuyper, and Ivan Mervielde

553 Terror Management and Religion: Evidence That Intrinsic Religiousness Mitigates Worldview Defense Following Mortality Salience
Eva Jonas and Peter Fischer

568 The Thrill of Victory and the Agony of Defeat: Spontaneous Expressions of Medal Winners of the 2004 Athens Olympic Games
David Matsumoto and Bob Willingham

Other

475 American Psychological Association Subscription Claims Information

435 E-Mail Notification of Your Latest Issue Online!

384 Instructions to Authors


ATTITUDES AND SOCIAL COGNITION CHARLES M. JUDD, Editor University of Colorado at Boulder ASSOCIATE EDITORS DACHER KELTNER University of California, Berkeley ANNE MAASS Università di Padova, Padova, Italy BERND WITTENBRINK University of Chicago VINCENT YZERBYT Catholic University of Louvain, Louvain-la-Neuve, Belgium CONSULTING EDITORS ICEK AJZEN University of Massachusetts

ALICE H. EAGLY Northwestern University

NIRA LIBERMAN Tel Aviv University, Tel Aviv, Israel

LINDA SKITKA University of Illinois at Chicago

NICHOLAS EPLEY University of Chicago

DIANE M. MACKIE University of California, Santa Barbara

JOHN SKOWRONSKI Northern Illinois University

RUSSELL H. FAZIO Ohio State University

NEIL MACRAE Dartmouth College

ELIOT R. SMITH Indiana University Bloomington

LISA FELDMAN BARRETT Boston College

TONY MANSTEAD Cardiff University, Cardiff, Wales

SUSAN T. FISKE Princeton University

THOMAS MUSSWEILER Universität Köln, Cologne, Germany

DIEDERIK STAPEL University of Groningen, Groningen, the Netherlands

BARBARA L. FREDRICKSON University of Michigan

JAMES M. OLSON University of Western Ontario, London, Ontario, Canada

WENDI GARDNER Northwestern University

MAHZARIN BANAJI Harvard University

BERNADETTE M. PARK University of Colorado at Boulder

DANIEL GILBERT Harvard University

MONICA BIERNAT University of Kansas

RICHARD E. PETTY Ohio State University

THOMAS GILOVICH Cornell University

IRENE V. BLAIR University of Colorado at Boulder

NEAL J. ROESE University of Illinois at Urbana–Champaign

ANTHONY G. GREENWALD University of Washington

GALEN V. BODENHAUSEN Northwestern University

DAVID L. HAMILTON University of California, Santa Barbara

MARKUS BRAUER LAPSCO, Université Blaise Pascal, Clermont-Ferrand, France

EDWARD R. HIRT Indiana University Bloomington

MARILYNN B. BREWER Ohio State University

TIFFANY ITO University of Colorado at Boulder

JOHN T. CACIOPPO University of Chicago

YOSHIHISA KASHIMA University of Melbourne, Victoria, Australia

OLIVIER CORNEILLE Catholic University of Louvain, Louvain-la-Neuve, Belgium

KARLE CHRISTOPHE KLAUER Albrecht-Ludwigs-Universität Freiburg, Freiburg, Germany

PATRICIA DEVINE University of Wisconsin—Madison AP DIJKSTERHUIS University of Amsterdam, Amsterdam, the Netherlands DAVID DUNNING Cornell University

MYRON ROTHBART University of Oregon LAURIE RUDMAN Rutgers, The State University of New Jersey MARK SCHALLER University of British Columbia, Vancouver, British Columbia, Canada TONI SCHMADER University of Arizona NORBERT SCHWARZ University of Michigan

ARIE W. KRUGLANSKI University of Maryland

GÜN R. SEMIN Free University, Amsterdam, the Netherlands

ALAN LAMBERT Washington University in St. Louis

JEFFREY W. SHERMAN University of California, Davis

JENNIFER LERNER Carnegie Mellon University

STEVEN J. SHERMAN Indiana University Bloomington

FRITZ STRACK Universität Würzburg, Würzburg, Germany ABRAHAM TESSER University of Georgia YAACOV TROPE New York University THERESA K. VESCIO Pennsylvania State University WILLIAM VON HIPPEL University of New South Wales, Sydney, Australia DUANE T. WEGENER Purdue University DANIEL M. WEGNER Harvard University DIRK WENTURA Saarland University, Saarbrücken, Germany DANIEL WIGBOLDUS Radboud University Nijmegen, Nijmegen, the Netherlands TIMOTHY D. WILSON University of Virginia PIOTR WINKIELMAN University of California, San Diego MARK P. ZANNA University of Waterloo, Waterloo, Ontario, Canada

ASSISTANT TO THE EDITOR—LAURIE HAWKINS

INTERPERSONAL RELATIONS AND GROUP PROCESSES JOHN F. DOVIDIO, Editor University of Connecticut ASSOCIATE EDITORS DAPHNE BLUNT BUGENTAL University of California, Santa Barbara BEVERLEY FEHR University of Winnipeg, Winnipeg, Manitoba, Canada JACQUES-PHILIPPE LEYENS Catholic University of Louvain, Louvain-la-Neuve, Belgium ANTONY MANSTEAD Cardiff University, Cardiff, United Kingdom JEFFRY A. SIMPSON University of Minnesota, Twin Cities Campus

ARTHUR ARON State University of New York at Stony Brook

RUPERT BROWN The University of Kent at Canterbury, Canterbury, England

XIMENA ARRIAGA Purdue University

LORNE CAMPBELL University of Western Ontario, London, Ontario, Canada

WINTON W. T. AU The Chinese University of Hong Kong, Shatin, Hong Kong MARK BALDWIN McGill University, Montreal, Quebec, Canada KIM BARTHOLOMEW Simon Fraser University, Burnaby, British Columbia, Canada C. DANIEL BATSON University of Kansas

SCOTT TINDALE Loyola University Chicago

B. ANNE BETTENCOURT University of Missouri—Columbia

JACQUIE D. VORAUER University of Manitoba, Winnipeg, Manitoba, Canada

GERD BOHNER Universität Bielefeld, Bielefeld, Germany

CONSULTING EDITORS DOMINIC ABRAMS University of Kent at Canterbury, Canterbury, England

NIALL BOLGER Columbia University

CHRIS AGNEW Purdue University

JONATHON D. BROWN University of Washington

NYLA R. BRANSCOMBE University of Kansas

SERENA CHEN University of California, Berkeley MARGARET CLARK Yale University CARSTEN DE DREU University of Amsterdam, Amsterdam, the Netherlands STÉPHANIE DEMOULIN Catholic University of Louvain, Louvain-la-Neuve, Belgium, and Belgian National Fund for Scientific Research, Brussels, Belgium

KLAUS FIEDLER University of Heidelberg, Heidelberg, Germany GARTH FLETCHER University of Canterbury, Christchurch, New Zealand SHELLY GABLE University of California, Los Angeles LOWELL GAERTNER University of Tennessee, Knoxville SAMUEL L. GAERTNER University of Delaware ADAM GALINSKY Northwestern University PETER GLICK Lawrence University STEPHANIE A. GOODWIN Purdue University

DAVID DESTENO Northeastern University

MARTIE G. HASELTON University of California, Los Angeles

STEVE DRIGOTAS Johns Hopkins University

S. ALEXANDER HASLAM University of Exeter, Exeter, United Kingdom

ELISSA S. EPEL University of California, San Francisco VICTORIA ESSES University of Western Ontario, London, Ontario, Canada


VERLIN HINSZ North Dakota State University GORDON HODSON Brock University, St. Catherine’s, Ontario, Canada

MICHAEL A. HOGG University of Queensland, Brisbane, Australia

LAURA J. KRAY University of California, Berkeley

ANDREA B. HOLLINGSHEAD University of Southern California JOHN G. HOLMES University of Waterloo, Waterloo, Ontario, Canada RICK H. HOYLE University of Kentucky

JAMES R. LARSON JR. University of Illinois at Chicago COLIN WAYNE LEACH University of Sussex, Sussex, United Kingdom JOHN LEVINE University of Pittsburgh JOHN E. LYDON McGill University, Montreal, Quebec, Canada

JOLANDA JETTEN University of Exeter, Exeter, United Kingdom

JON K. MANER Florida State University

JAMES D. JOHNSON University of North Carolina at Wilmington TATSUYA KAMEDA Hokkaido University, Sapporo, Japan BENJAMIN R. KARNEY RAND Corporation, Santa Monica, California YOSHI KASHIMA University of Melbourne, Victoria, Australia

BRENDA MAJOR University of California, Santa Barbara CRAIG MCGARTY Australian National University, Canberra, Australia WENDY BERRY MENDES Harvard University RICHARD MORELAND University of Pittsburgh

DEBORAH A. KASHY Michigan State University

SABINE OTTEN University of Groningen, Groningen, the Netherlands CRAIG D. PARKS Washington State University LOUIS A. PENNER Wayne State University PAULA PIETROMONACO University of Massachusetts at Amherst

CHRISTINE SMITH Grand Valley State University HEATHER J. SMITH Sonoma State University RUSSELL SPEARS Cardiff University, Cardiff, Wales CHARLES STANGOR University of Maryland GARY L. STASSER Miami University—Ohio

TOM POSTMES University of Exeter, Exeter, United Kingdom

WALTER STEPHAN New Mexico State University

FELICIA PRATTO University of Connecticut

WILLIAM B. SWANN JR. University of Texas at Austin

HARRY T. REIS University of Rochester

JANET SWIM Pennsylvania State University

W. STEVEN RHOLES Texas A&M University

LEIGH L. THOMPSON Northwestern University

JENNIFER A. RICHESON Northwestern University

TOM TYLER New York University

MARK SCHALLER University of British Columbia, Vancouver, British Columbia, Canada

JEROEN VAES University of Padova, Padova, Italy

KERRY KAWAKAMI York University, Toronto, Ontario, Canada

JANICE R. KELLY Purdue University

DACHER KELTNER University of California, Berkeley

DAVID A. KENNY University of Connecticut

BRIAN MULLEN University of Kent at Canterbury, Canterbury, England

AMÉLIE MUMMENDEY Friedrich-Schiller-Universität, Jena, Jena, Germany

MARK MURAVEN University at Albany, State University of New York

DAVID A. SCHROEDER University of Arkansas

KEES VAN DEN BOS University of Utrecht, Utrecht, the Netherlands

CONSTANTINE SEDIKIDES University of Southampton, Southampton, England

PAUL A. M. VAN LANGE Free University, Amsterdam, Amsterdam, the Netherlands

PHILLIP R. SHAVER University of California, Davis

LAURIE R. WEINGART Carnegie Mellon University

J. NICOLE SHELTON Princeton University

GWEN M. WITTENBAUM Michigan State University

DOUGLAS T. KENRICK Arizona State University

SANDRA L. MURRAY State University of New York at Buffalo

MARGARET SHIH University of Michigan

NORBERT L. KERR Michigan State University

STACEY SINCLAIR University of Virginia

LISA A. NEFF University of Toledo

ASSISTANT TO THE EDITOR—CHRISTINE KELLY

WENDY L. WOOD Texas A&M University MICHAEL ZÁRATE University of Texas at El Paso

PERSONALITY PROCESSES AND INDIVIDUAL DIFFERENCES CHARLES S. CARVER, Editor University of Miami ASSOCIATE EDITORS TIM KASSER Knox College

GEORGE A. BONANNO Teachers College, Columbia University

MARIO MIKULINCER Bar-Ilan University, Ramat-Gan, Israel

EVA M. POMERANTZ University of Illinois at Urbana–Champaign

RICHARD W. ROBINS University of California, Davis

GERARD SAUCIER University of Oregon

THOMAS A. WIDIGER University of Kentucky

AVSHALOM CASPI King’s College, London

EDWARD C. CHANG University of Michigan

SERENA CHEN University of California, Berkeley A. TIMOTHY CHURCH Washington State University JAMES COAN University of Wisconsin—Madison M. LYNNE COOPER University of Missouri—Columbia

EDDIE HARMON-JONES Texas A&M University

DANIEL W. RUSSELL Iowa State University

TODD HEATHERTON Dartmouth College

OLIVER C. SCHULTHEISS University of Michigan

JUTTA HECKHAUSEN University of California, Irvine

SUZANNE C. SEGERSTROM University of Kentucky

STEVEN J. HEINE University of British Columbia, Vancouver, British Columbia, Canada

KENNON M. SHELDON University of Missouri—Columbia

RICHARD KOESTNER McGill University Montreal, Quebec, Canada

C. R. SNYDER University of Kansas SANJAY SRIVASTAVA University of Oregon

DAVID LUBINSKI Vanderbilt University

TIMOTHY STRAUMAN Duke University

MICHAEL EID University of Geneva, Geneva, Switzerland

RICHARD E. LUCAS Michigan State University

MICHAEL J. STRUBE Washington University

ROBERT R. MCCRAE National Institute on Aging, Baltimore

JERRY SULS University of Iowa

ANDREW J. ELLIOT University of Rochester

WENDY BERRY MENDES Harvard University

WILLIAM B. SWANN JR. University of Texas at Austin

LISA FELDMAN BARRETT Boston College

RODOLFO MENDOZA-DENTON University of California, Berkeley

HOWARD TENNEN University of Connecticut Health Center

WILLIAM FLEESON Wake Forest University

DANIEL K. MROCZEK Fordham University

MICHAEL C. ASHTON Brock University, St. Catherines, Ontario, Canada

SUZANNE THOMPSON Pomona College

R. CHRIS FRALEY University of Illinois at Chicago

STEPHEN A. PETRILL Pennsylvania State University

OZLEM AYDUK University of California, Berkeley

ANTONIO L. FREITAS State University of New York at Stony Brook

RALPH L. PIEDMONT Loyola College in Maryland

ROBERT J. VALLERAND Université du Québec à Montréal, Montreal, Quebec, Canada

CONSULTING EDITORS STEPHAN A. AHADI American Institutes for Research, Washington, DC JAMIE ARNDT University of Missouri—Columbia JENS B. ASENDORPF Humboldt-Universität Berlin, Berlin, Germany

E. ASHBY PLANT Florida State University

ROY F. BAUMEISTER Florida State University VERÓNICA BENET-MARTÍNEZ University of California, Riverside

DAVID C. FUNDER University of California, Riverside STEVEN W. GANGESTAD University of New Mexico

BRENT ROBERTS University of Illinois at Urbana–Champaign

APRIL L. BLESKE-RECHEK University of Wisconsin—Eau Claire

CAROL L. GOHM University of Mississippi

MICHAEL D. ROBINSON North Dakota State University

ASSISTANT TO THE EDITOR—JESSICA LILLESAND

KATHLEEN D. VOHS University of Minnesota DAVID WATSON University of Iowa BARBARA WOIKE Columbia University REX A. WRIGHT University of Alabama at Birmingham

ATTITUDES AND SOCIAL COGNITION

Voluntary Settlement and the Spirit of Independence: Evidence from Japan’s “Northern Frontier”

Shinobu Kitayama, University of Michigan

Keiko Ishii, Hokkaido University

Toshie Imada, University of Michigan

Kosuke Takemura, Hokkaido University

Jenny Ramaswamy, Golden Gate University

The authors hypothesized that economically motivated voluntary settlement in the frontier fosters independent agency. While illuminating the historical origin of American individualism, this hypothesis can be most powerfully tested in a region that is embedded in a broader culture of interdependence and yet has undergone a recent history of such settlement. The authors therefore examined residents of Japan’s northern island (Hokkaido). Hokkaido was extensively settled by ethnic Japanese beginning in the 1870s and for several decades thereafter. Many of the current residents of Hokkaido are the descendants of the original settlers from this period. As predicted, Japanese socialized and/or immersed in Hokkaido were nearly as likely as European Americans in North America to associate happiness with personal achievement (Study 1), to show a personal dissonance effect wherein self-justification is motivated by a threat to personal self-images (Study 2), and to commit a dispositional bias in causal attribution (Study 3). In contrast, these marker effects of independent agency were largely absent for non-Hokkaido residents in Japan. Implications for theories of cultural change and persistence are discussed.

Keywords: culture and self, attribution, subjective well-being, dissonance, individualism

Shinobu Kitayama and Toshie Imada, Department of Psychology, University of Michigan; Keiko Ishii and Kosuke Takemura, Department of Behavioral Science, Hokkaido University, Sapporo, Hokkaido, Japan; Jenny Ramaswamy, International Admissions and Advising, Golden Gate University. We thank members of the Hokkaido University Center of Excellence Program on Cultural and Ecological Foundations of the Mind for their support in carrying out this work. We also thank Don Munro for his helpful comments on an earlier version of the article. Correspondence concerning this article should be addressed to Shinobu Kitayama, Department of Psychology, University of Michigan, 503 Church Street, Ann Arbor, MI 48109. E-mail: [email protected]

Journal of Personality and Social Psychology, 2006, Vol. 91, No. 3, 369–384
Copyright 2006 by the American Psychological Association 0022-3514/06/$12.00 DOI: 10.1037/0022-3514.91.3.369

In the last 400 years, the United States has been a major magnet for immigrants from all over the world (Hong, Wan, No, & Chiu, in press; Suárez-Orozco, 2003). With the important exception of African Americans, who were forced to work as slaves, the vast majority of immigrants voluntarily settled in North America. Moreover, from its very beginning, the history of the United States was that of relentless expansion to the west. This westbound expansion was justified in terms of a mythology of manifest destiny, which proposes that it is a sacred mission of all Americans to extend the “boundaries of freedom” (J. A. Garfield, inaugural address, March 4, 1881; Schlesinger, 1986). Over nearly 3 centuries, until the end of the 19th century, new lands of the West were continuously acquired, opened, exploited, and settled by Americans of mostly European descent, and the frontier rapidly moved westbound. What had once been the west of the territory was soon to become its midwest, for example.

The mentality, or cultural ethos, that was fostered by this collective social movement of immigration and subsequent settlement in the frontier is called the frontier spirit (Turner, 1920). This cultural ethos is composed of collective beliefs and practices of independent agency (Kitayama & Uchida, 2005; Markus & Kitayama, 2004). Importantly anchored in the idea of the “American dream” (Hochschild, 1995), independent agency is composed of strong orientations toward personal goal pursuit and personal choice.

Many observers of American culture (e.g., Bellah, Madsen, Sullivan, Swidler, & Tipton, 1985; Dewey, 1930; Schlesinger, 1986; de Tocqueville, 1862/1969; Turner, 1920) have pointed out that the history of voluntary settlement in the frontier significantly contributed to American individualism as it is known today. The purpose of the present work is to examine this possibility. We argue that if voluntary settlement in the frontier is causally linked to American individualism, a similar cultural ethos of independence should be found in other regions of the world insofar as the regions share a similar history of voluntary settlement.

Voluntary Settlement Hypothesis

In analyzing cultural consequences of voluntary settlement, we must take three distinct processes into account. First, voluntary settlement in a frontier is motivated by desires for personal wealth and freedom, and, furthermore, it requires a major investment and personal sacrifice for anyone who engages in it. Accordingly, we hypothesize that voluntary settlers are likely to have a highly autonomous, independent, goal-oriented mental set. This goal-oriented mental set predisposes the individuals to seek novelty and to take risks. As Frederick Jackson Turner (1920)—the first proponent of the frontier thesis for American individualism—noted, referring to early settlers who set out for westbound journeys from the East Coast cities and colonies of America,

Whenever social conditions tended to crystallize in the East, whenever capital tended to press upon labor or political restraints to impede the freedom of the mass, there was this gate of escape to the free conditions of the frontier. These free lands promoted individualism, economic equality, freedom to rise, [and] democracy. (p. 259)

Second, frontier life is often harsh, and every endeavor entails substantial risks—both economic and corporeal—and, thus, more often than not, mere survival is at stake. Westbound journeys were filled with immediate dangers of many sorts. Day after day, there was a dire need for self-protection and self-promotion.1 Without the mental qualities of independent goal pursuit, self-directedness, and self-reliance (and some luck), a sure death was waiting for settlers along the way (Stewart, 1963). These life conditions of the frontier are likely to reinforce the goal-oriented mental set of independent agency (Schooler & Mulatu, 2004).

Third, a region that is composed of a large number of voluntary settlers with goal-oriented mental characteristics will soon develop a culturally shared lay theory of behavior as internally motivated and controlled. This dispositional lay theory of independence is ingrained into social practices, daily routines, modes of child rearing, daily discourses, and even explicit education. We suggest that, in the frontier, the lay theory of independence is often appropriated to foster social relations and social organizations.2 In this way, the cultural environment is gradually structured to sustain the ethos of independence. In the process, the dispositional lay theory of independence becomes fully legitimized and normative; as a consequence, it is likely to be transmitted over generations, even when the frontier ceases to be a reality.3 Today, for example, the frontier and an associated image of the American dream might seem a myth from the past. In fact, however, many aspects of daily life and events, from personal narratives (McAdams, 2005) to American foreign policies and beyond (e.g., Faludi, 2003), are powerfully organized and given shape and force by various images of the frontier and the American dream (Hochschild, 1995).

When taken in combination, the three processes—self-selection for settlement, reinforcement of independence during settlement, and institutionalization of tacit beliefs and practices of independence—induce and maintain the ethos of independent agency.4 This culture of frontier is likely to be resilient and rapidly consolidated because it is created anew, “from scratch,” in a place that is relatively separated both from the rest of the world and from its own past.

Triangulation: Identifying Causally Active Elements of Culture

It goes without saying that many aspects of the contemporary American culture have their origins in the modern period in Western Europe. Furthermore, many of the ideas of the modern West can be traced back to Greek civilization (Nisbett, 2003). Reformation of the Catholic church and the resulting Calvinist varieties of Protestantism had a major influence (Sanchez-Burks, 2005; Weber, 1904/1930). So did numerous philosophers of the Enlightenment, including Rousseau, Locke, and Voltaire (B. Morris, 1991; Taylor, 1979). Yet the voluntary settlement hypothesis suggests that, in addition to the modern Western European influences and heritage, the initial emigration to the land of opportunity and the subsequent social movement of expanding the nation’s territory to the western frontier substantially fostered and reinforced the ethos of independence in North America. In the process, religious and many other ideational elements of the modern West must have been inseparably interwoven into the spirit of independent agency.

Although cursory, this historical analysis reveals a serious problem in testing the voluntary settlement hypothesis: Many sociohistorical factors are confounded with the westbound expansion during the 17th through the 19th centuries; moreover, these factors initially seem impossible to disentangle. One powerful tool for disentangling the history and identifying causally active elements of culture is the method of triangulation (Medin, Unsworth, & Hirschfeld, in press). Imagine Culture A and Culture B. Culture A is more independent than Culture B, and the task is to test the hypothesis that one significant cause for the independence of Culture A is its relatively recent history of voluntary settlement. One can critically test this hypothesis if one can identify a subculture in Culture B, Culture B′, that is like Culture B in all conceivable dimensions and aspects except for the hypothesized causal element—that is, voluntary settlement: Unlike Culture B, Culture B′ has undergone a recent history of voluntary settlement. If one could show that Culture B′ is more similar to Culture A than to Culture B in respect to independent agency, this would support the hypothesized role of voluntary settlement in producing the independent orientation.

1 Although the westbound expansion was often subsidized by the federal government (Stegner, 1953), the social organizations of the living environments of the settlers were far more primitive than those in the East Coast cities.

2 Hence, it is misleading to equate the frontier spirit with the popular images of “rugged individualism,” wherein humans are defined as “solitary island dwellers rather than as gregarious collaborators,” and governmental regulation is seen as an “unwarranted interference with the individual’s right to pursue self-interest” (Coontz, 1992, p. 52).

3 In other words, the dispositional lay theory will eventually attain the status of the most unquestionable and, thus, typically unspoken tacit assumption, or what Daryl Bem (1972) aptly called the “zero-order belief” (p. 6) of the society.

4 The notion of independent agency is close to what Kashima et al. (2004) called agency. Yet agency can also be interdependent (Kitayama & Uchida, 2005; Markus & Kitayama, 2004). Moreover, it is our working assumption that agency, in the sense of systems of behavioral regulation, is best assessed not by self-reflective judgments about it but rather by online measures of behaviors, both overt and covert (e.g., cognition).

VOLUNTARY SETTLEMENT AND INDEPENDENCE

Hokkaido: Japan’s “Northern Frontier”

In an attempt to use the logic of triangulation in testing the voluntary settlement hypothesis, we identified one valuable “laboratory of nature” in Hokkaido, a northern island of Japan. Historically, Japan has received substantial influence from Buddhism, Confucianism, and other ideas of Asian origin. Hence, Japan (Culture B) is separated, not only geographically and historically but also culturally, from North America (Culture A). It is crucial that Japan has maintained its highly interdependent cultural ethos (Doi, 1973; Kondo, 1982; Lebra, 1976). Thus, people are motivated more toward interpersonal adjustment and social harmony than toward personal goals and personal choice. Moreover, people tend to share a nondispositional, more holistic lay theory of behavior as socially afforded and constrained (Kitayama & Uchida, 2005). For our present purposes, however, Hokkaido (Culture B′) and North America share one incidental yet theoretically critical element, namely, the recent history of systematic immigration and settlement in the frontier. If we apply the logic of triangulation, the causal role of voluntary settlement would be strongly suggested if Hokkaido (Culture B′) were more similar to the United States (Culture A) than to mainland Japan (Culture B) in respect to independent agency. To make a convincing case, however, it is crucial that we examine the recent history of Hokkaido in some detail. Until the mid-19th century, the island of Hokkaido was largely a wilderness inhabited by indigenous people, the Ainu, hunters and gatherers who engaged in commerce with the Japanese (Fitzhugh & Dubreuil, 1999; Watanabe, 1972). The situation changed dramatically during the Meiji Restoration (in 1867), when the central feudal government collapsed and political power was “returned” from the shogun (the most powerful of the feudal lords) to the emperor. 
During the Meiji Restoration, Japan ended its national seclusion policy (which had lasted more than 200 years), opened its ports to the outside world, and started its path of intensive Westernization. This rapid societal change resulted in, among other things, a large number of samurais (feudal warriors) who lost their means of living. Around the same time, Russia had become a major threat to Japan’s northern territories. In response, the Meiji government recruited the jobless samurais to create settlements in Hokkaido. These settlers opened new lands, expelled the Ainu or assimilated them to Japanese culture, built new roads and bridges, developed a network of railroad tracks, opened ports, and established industrial and commercial centers. Gradually, numerous others, especially farmers from all over Japan, followed suit and settled in Hokkaido in large numbers. Over the next several decades, Hokkaido was transformed from a sparsely populated wilderness to an important territory of Japan. During this period, the total population of Japanese in Hokkaido increased dramatically, from 120,000 in 1872 (Bureau of Statistics, Imperial Cabinet, Japan, 1993) to 2,360,000 in 1920 (Bureau of Statistics, Imperial Cabinet, Japan, 1924). In the early days of settlement, the population was scattered and sparse. Moreover, the climate was harsh, especially in the winter, for those from the much warmer southern parts of Japan, and although the land was plentiful, it was not always fertile (Fitzhugh & Dubreuil, 1999; Watanabe, 1972). These and other conditions presented major challenges for settlers. A major industry was coal mining, although today all coal resources have already been exploited, and the mines have long been closed. To a substantial


degree, then, there is a close yet entirely incidental historical resemblance between Hokkaido and North America. Both have a history of voluntary settlement in the frontier. Today, Hokkaido is largely inhabited by ethnic Japanese, with a total population of nearly 6,000,000. Indigenous Ainu people survive, but they have been largely assimilated, and thus they are virtually invisible in the mainstream society, accounting only for 0.5% of the entire population in Hokkaido. Hokkaido Japanese speak Japanese and watch national TV programs. School curriculums are centrally controlled and therefore largely equivalent in Hokkaido and elsewhere in Japan. There are numerous daily flights from Tokyo, Osaka, and other major cities of the main island of Japan to Sapporo, the primary gateway to Hokkaido. On the surface level at least, Hokkaido is fully integrated into the national culture of Japan. Below the surface, however, the voluntary settlement hypothesis suggests that there should be a spirit of independence in this northern island of Japan: Its relatively recent history of settlement in the frontier must have fostered tacit beliefs and practices of independent agency, and, moreover, these beliefs and practices must have been passed over generations among its residents. If true, such a finding would go far beyond Hokkaido per se. In the present work, we test Hokkaido residents in respect to two central marker features of independent agency: (a) personal goal pursuit and choice (as revealed in correlates of happiness and cognitive dissonance) and (b) lay dispositionism (as revealed in causal attribution; Kitayama & Uchida, 2005; Markus & Kitayama, 1991, 2004; Nisbett, 2003). We predict that although Japan traditionally has been governed by a contrasting cultural ethos of interdependence, there will be a strong cultural ethos and spirit of independence in Hokkaido. In this particular respect, Hokkaido Japanese should be more similar to Americans than to mainland Japanese. 
With the logic of triangulation, such a finding could establish the causal role of voluntary settlement in fostering independent agency. By testing these predictions, we set out to examine regional variations in a single national culture of Japan. Although similar inquiries into regional differences recently have been attempted in the United States (Nisbett & Cohen, 1996; Plaut, Markus, & Lachman, 2002; Vandello & Cohen, 1999), Europe (Knight, Varnum, & Nisbett, 2005), and other areas, such as Asia and Australia (Kashima et al., 2004), the present work is the first of its kind that is specifically designed to test the voluntary settlement hypothesis.

Mechanisms for Cultural Change and Cultural Persistence

In the present work, we also explore three specific mechanisms by which voluntary settlement fosters independent agency. For this purpose we examine Hokkaido residents who differ in where they were born and raised, either in Hokkaido or in the Japan mainland. The first hypothesis emphasizes the role of initial enculturation in mainland Japan. By initial enculturation, we mean the process by which individuals acquire their heritage culture. Through enculturation, most individuals acquire culturally sanctioned psychological features, and, once they make this acquisition, these psychological features may be quite stable. This process, if operating among settlers in Hokkaido, would work against cultural change in


the land of the frontier.5 In the present work, this hypothesis implies that non-Hokkaido-born residents of Hokkaido should be largely no different from mainland Japanese. Second, in all cultures, there are deviants, for a variety of reasons. Thus, some small minority of people raised in mainland Japan may acquire certain psychological tendencies associated with independent agency. For example, some might be more inclined than the rest to pursue their personal goals or to engage in personal choice. These individuals may, in turn, be attracted to images of frontier associated with Hokkaido, and, in fact, they may actively and personally choose to move and settle in the Northern Frontier of the country. This possibility is called the self-selection hypothesis. The process of self-selection, if operating among settlers in Hokkaido, would facilitate cultural change in the direction of independence. In the present work, this hypothesis implies that non-Hokkaido-born residents of Hokkaido should be more similar to Hokkaido-born residents of Hokkaido than to mainland Japanese. A third hypothesis we propose acknowledges the role of resocialization, or acculturation, that takes place in the frontier. That is, even though those mainland Japanese who decide to move to Hokkaido are initially no different than the rest of mainland Japanese, once settled in Hokkaido, they may be acculturated into the independently oriented culture of the frontier. This process, if operating, would also facilitate cultural change in the direction of independence insofar as the ethos of independence is increasingly common in the frontier over time. In the present work, this hypothesis implies that non-Hokkaido-born residents of Hokkaido should be similar to Hokkaido-born residents of Hokkaido, especially after some substantial period of acculturation. 
Although both the self-selection hypothesis and the acculturation hypothesis would be suggested by similarities between Hokkaido-born Hokkaido residents and their non-Hokkaido-born counterparts, the two hypotheses differ in respect to what we might expect as a function of the length of stay in Hokkaido for the non-Hokkaido-born group. Whereas the acculturation hypothesis predicts that this group of people should be more similar to the Hokkaido-born group as they stay longer in Hokkaido, the self-selection hypothesis predicts that the similarities should be observed from the beginning, regardless of how long the non-Hokkaido-born individuals had stayed in Hokkaido.

Groups Compared

To test the voluntary settlement hypothesis and to explore the three mechanisms of cultural change and persistence, we included in our research design four groups of participants who were quite comparable except for the pertinent cultural background: American college students, Japanese college students in a university in the Japanese mainland, and both Hokkaido-born students and non-Hokkaido-born students in a university in Hokkaido. We recruited Hokkaido participants from a paid participant pool of Hokkaido University. This university is located in Sapporo (the largest metropolitan center of Hokkaido). According to its enrollment statistics, Hokkaido University attracts its students from both within and outside of Hokkaido, with the out-of-Hokkaido students accounting for nearly 55% of the entire student body. This enabled us to divide the Hokkaido group into two subgroups depending on where the students were from (either Hokkaido born or non-Hokkaido born).

Mainland Japanese students were recruited from a paid participant pool of Kyoto University, in Kyoto (a large city in the western center of the Japan mainland). Both Kyoto University and Hokkaido University are among the top universities in Japan. The intellectual achievement and prowess of the student body are roughly equivalent. Moreover, both universities attract students largely from middle class socioeconomic strata. From the enrollment statistics of Kyoto University, we anticipated that a vast majority of students in Kyoto would be from southwestern regions of Japan. In fact, none of our participants were from Hokkaido. American participants were recruited from the Universities of Michigan and Chicago. As two of the nation’s top universities, they attract students primarily from middle and upper middle class socioeconomic strata. The two universities in Japan and the two universities in the United States are comparable in national reputation and the dominant socioeconomic status of their student body.

Study 1: Personal Goal Pursuit—Predictors of Happiness

One component of independent agency that we predicted to be associated with voluntary settlement was a strong tendency toward personal goal pursuit. We predicted a greater propensity toward personal goal pursuit in Hokkaido than in mainland Japan. We contrasted this tendency with a tendency toward more communal goals of social harmony and mutual help and support in a relationship. When accomplished, both of these goals, personal and communal, give rise to an experience of happiness (Kitayama, Markus, & Kurokawa, 2000; Oishi & Diener, 2001). Yet the relative significance of the two could vary across cultural groups. Thus, for people with independent agency, happiness should depend primarily on personal achievement, whereas for those with more interdependent orientations, happiness should depend more on social harmony. To test this analysis, Kitayama et al. (2000) asked (non-Hokkaido) Japanese and American respondents to report how frequently they experienced a variety of emotions. For Americans, the reported experience of general positive emotions (e.g., happiness) was more strongly associated with the experience of disengaging positive emotions (e.g., pride)—a correlate of personal achievement—than with the experience of engaging positive emotions (e.g., friendly feelings)—a correlate of social harmony. In contrast, for Japanese, the reported experience of general positive emotions was more closely associated with the experience of engaging positive emotions (social harmony) than with the experience of disengaging positive emotions (personal achievement). This general pattern was successfully replicated by Kitayama, Mesquita, and Karasawa (in press). It is also consistent with emerging evidence that predictors of subjective well-being and life satisfaction judgment are systematically different across cultures (Kitayama & Markus, 2000; Uchida, Norasakkunkit, & Kitayama, 2004). 
Diener and Diener (1995) found that self-esteem (analogous to socially disengaging positive emotions) was a more potent predictor of life satisfaction in individualist cultures, such as North America, than in collectivist cultures, such as Asian societies. In a study by Kwan, Bond, and Singelis (1997), life satisfaction judgment was predicted equally by both self-esteem and relationship harmony for Hong Kong Chinese, but it was predicted only by self-esteem for Americans.

Study 1 was designed to examine whether Hokkaido-born Hokkaido Japanese might hold a relatively strong orientation toward personal goals rather than orienting themselves to social harmony or interpersonal duties and obligations. Like Americans, Hokkaido-born Hokkaido Japanese should feel happiest when their personal goals are achieved. The pattern should be very different for mainland Japanese, who should feel happiest when social harmony and interdependence with others are established (Kitayama et al., 2000). Finally, we used data from non-Hokkaido-born residents of Hokkaido to test the three alternative hypotheses regarding cultural change and persistence. Although the initial enculturation hypothesis predicts this group to be no different from mainland Japanese, both the acculturation hypothesis and the self-selection hypothesis predict this group to be more similar to Hokkaido-born Hokkaido Japanese and Americans than to mainland Japanese.

5 Moreover, if this were the only process that operated, there would never have been established a culture of frontier in Hokkaido. Nevertheless, this process might still be strongly operative among those who move to Hokkaido today.

Method

Participants. We recruited 68 Hokkaido participants (46 men and 22 women) from a paid participant pool of Hokkaido University. Thirty of the Hokkaido participants were born and brought up in Hokkaido. The remaining 38 were from the rest of Japan. Thirty-one Japanese (25 men and 6 women) were recruited from a paid participant pool of Kyoto University in Kyoto (a large city in the western center of the Japan mainland). All Japanese participants received 500 yen on completion of the study. Finally, 84 Americans (49 men and 35 women) were recruited from a participant pool of the University of Michigan. They received course credit for their participation.

Procedure. Participants were tested in groups of a few individuals. On arrival, participants were handed a questionnaire and told that the study was concerned with emotional experience. In the questionnaire, the participants were first asked to briefly describe the most emotional episode they had recently experienced. They then indicated how strongly they had experienced each of 20 emotions on a 6-point rating scale (1 = not at all, 6 = very strongly). Drawing on our prior work (Kitayama et al., 2000, in press), we divided the 20 emotions into six categories that were defined by both pleasantness and social orientation. By social orientation, we mean the degree to which each emotion is associated with either independence or interdependence. In the positive domain, some emotions result from success in independence, such as personal achievement (e.g., pride, self-esteem, and feelings of superiority), whereas some others result from success in interdependence, such as social harmony and connectedness (e.g., friendly feelings, close feelings, and respect). Still others are general in that they have no unequivocal association with either independence or interdependence (e.g., happiness, calmness, elation, and a feeling of being relaxed). 
These three classes of emotions are called socially disengaging, engaging, and general, respectively. They had reasonable reliabilities in each of the four groups tested (all αs > .65; alphas did not vary systematically across the groups tested). Likewise, in the negative domain, some emotions result from failure in independence and motivate the person to restore independence (e.g., anger, sulky feelings, and frustration), whereas others result from failure in interdependence and motivate the person to restore interdependence (e.g., feelings of indebtedness, guilt, and shame). Moreover, still others have no clear associations with either independence or interdependence (e.g., unhappiness, depression, disgust, and boredom). In the present work, the three classes of negative emotions had reasonable reliabilities in all four groups tested (all αs > .72; alphas did not vary systematically across the groups tested).
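The reliabilities reported above are presumably Cronbach's alphas, computed from the item-level ratings within each emotion category. As an illustration only (this is not the authors' analysis code), the following Python sketch computes alpha for one category. The function name, the data layout, and the example ratings are all hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for one emotion category.

    items: one list of ratings per emotion item; every list holds one
    rating per participant, in the same participant order.
    (This layout is an assumption made for the sketch.)
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    if any(len(col) != n for col in items):
        raise ValueError("every item needs one rating per participant")
    # Sum score per participant across the k items.
    totals = [sum(col[i] for col in items) for i in range(n)]
    # alpha = k/(k-1) * (1 - sum of item variances / variance of sum score)
    item_variance_sum = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical 6-point ratings from six participants for the three
# disengaging positive emotions named in the text.
pride = [5, 2, 4, 6, 3, 1]
self_esteem = [4, 2, 5, 6, 3, 2]
superiority = [5, 1, 4, 5, 2, 1]

alpha = cronbach_alpha([pride, self_esteem, superiority])
print(f"alpha = {alpha:.2f}")
```

A category would meet the thresholds reported above when the returned value exceeds .65 (positive emotions) or .72 (negative emotions).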

Results

For each participant, the ratings for the emotions in each emotion type were averaged. For each cultural group, the mean intensity of experiencing general positive emotions (e.g., happiness) was regressed on the mean intensities for the two other types of positive emotions, namely, engaging positive emotions (e.g., friendly feelings) and disengaging positive emotions (e.g., pride). The results are summarized in Figure 1. For Americans, happiness was significantly predicted by both disengaging positive emotion (personal achievement) and engaging positive emotion (social harmony), ts(81) = 7.71 and 4.63, respectively, both ps < .0001. Replicating previous findings (Kitayama et al., 2000, 2005), however, the regression coefficient was greater for the disengaging positive emotion than for the engaging positive emotion, although the difference was only marginally significant in the present work. In contrast, for mainland Japanese, happiness was strongly predicted by engaging positive emotion but not by disengaging positive emotion. The difference between the two regression coefficients was significant. This finding also replicates the previous evidence obtained by Kitayama et al. (2000, 2005). In the backdrop of this cross-cultural difference, we now evaluate the pattern shown by Japanese in Hokkaido. Overall, data for these Japanese fell between those of the other two groups. At first glance, the pattern appears to vary as a function of where the Hokkaido Japanese were born and raised. Whereas happiness was predicted equally by both the engaging positive emotion and the disengaging positive emotion for the Hokkaido-born group, it was predicted more reliably by the engaging emotion than by the disengaging emotion for the non-Hokkaido-born group. Thus, as compared with non-Hokkaido-born Hokkaido residents, Hokkaido-born Hokkaido residents showed a pattern that was more akin to the American pattern. 
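The within-group regression described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it takes per-participant category means as given and uses the closed-form solution for standardized coefficients with exactly two predictors. The function names and the example scores are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def standardized_betas(general, engaging, disengaging):
    """Standardized coefficients for regressing general positive emotion
    on engaging and disengaging positive emotion within one group.

    With two predictors the betas follow directly from the three
    pairwise correlations.
    """
    r_ge = pearson(general, engaging)
    r_gd = pearson(general, disengaging)
    r_ed = pearson(engaging, disengaging)
    denom = 1 - r_ed ** 2  # zero only if the predictors are collinear
    beta_engaging = (r_ge - r_gd * r_ed) / denom
    beta_disengaging = (r_gd - r_ge * r_ed) / denom
    return beta_engaging, beta_disengaging


# Hypothetical per-participant means (1-6 scale) for a single group.
general = [4.0, 3.5, 5.0, 2.5, 4.5, 3.0]
engaging = [3.5, 3.0, 4.5, 3.0, 4.0, 2.5]
disengaging = [4.5, 3.0, 5.0, 2.0, 4.0, 3.5]

beta_e, beta_d = standardized_betas(general, engaging, disengaging)
print(f"engaging: {beta_e:.2f}, disengaging: {beta_d:.2f}")
```

Running the function once per cultural group would yield one pair of coefficients per group, which is the kind of summary plotted in Figure 1.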
However, in a regression analysis with a dummy variable that distinguished the Hokkaido-born Hokkaido residents from the non-Hokkaido-born Hokkaido residents (Aiken & West, 1991), neither the Culture × Disengaging Emotion interaction nor the Culture × Engaging Emotion interaction was statistically significant, t(62) < 1 and t(62) = 1.63, p > .10, respectively. Hence, there is no strong evidence that the pattern was reliably different among Hokkaido residents depending on their place of birth and initial enculturation. Moreover, when we used a dummy variable distinguishing Hokkaido-born Hokkaido Japanese from Americans and carried out another regression analysis, neither of the interaction terms was significant (ts < 1). Finally, in a regression analysis with a dummy variable that distinguished the non-Hokkaido-born Hokkaido Japanese from the mainland Japanese, the Culture × Disengaging Emotion interaction proved significant, t(63) = 2.02, p < .05. The association between the disengaging positive emotion and happiness was significantly greater for the non-Hokkaido-born Hokkaido Japanese than for the mainland Japanese. The corresponding effect was absent for engaging emotions, with the Culture × Engaging Emotion interaction failing to reach statistical significance (p > .15).

The emerging pattern, then, is one in which the American group and the two Hokkaido groups were mutually similar to one another. Moreover, these groups, in turn, were reliably different from the mainland Japan group. Accordingly, the pattern is most consistent with the acculturation hypothesis, the self-selection hypothesis, or both. Nevertheless, caution is due, because, unlike their Hokkaido-born counterparts, the non-Hokkaido-born Hokkaido group was reliably different from the American group. In particular, in a regression with a dummy variable that distinguished between the Americans and the non-Hokkaido-born Hokkaido Japanese, the Culture × Disengaging Emotion interaction was negligible, t(116) = 1.00, ns, but the Culture × Engaging Emotion interaction did prove significant, t(116) = 8.16, p < .005. Thus, it is difficult to completely discount the initial enculturation hypothesis.6

Figure 1. Standardized regression coefficients for engaging and disengaging positive emotions in predicting happiness (Study 1): Engaging positive emotions (e.g., friendly feelings) were more important for Japanese, and disengaging positive emotions (e.g., pride) were more important for Americans. Hokkaido residents from Hokkaido fell in between the two extremes.
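The pairwise group comparisons above rest on a regression that includes a group dummy and its products with the two predictors. The following Python sketch shows one way such a model can be fit and tested. It is an illustration under stated assumptions, not the authors' procedure: the helper names, the uncentered predictors, and the example data are all hypothetical, and whether the original analysis centered the predictors is not stated here.

```python
from math import sqrt


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(rows, y):
    """Ordinary least squares via the normal equations.

    Returns (coefficients, standard errors, residual df).
    """
    n, k = len(rows), len(rows[0])
    if n <= k:
        raise ValueError("need more observations than coefficients")
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    inv = invert(xtx)
    coefs = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(c * v for c, v in zip(coefs, r)) for r, yi in zip(rows, y)]
    df = n - k
    mse = sum(e * e for e in resid) / df
    ses = [sqrt(mse * inv[i][i]) for i in range(k)]
    return coefs, ses, df


def interaction_rows(engaging, disengaging, dummy):
    """Columns: intercept, engaging, disengaging, dummy,
    dummy x engaging, dummy x disengaging."""
    return [[1.0, e, d, g, g * e, g * d]
            for e, d, g in zip(engaging, disengaging, dummy)]


# Hypothetical scores; dummy = 0 for one group and 1 for the other.
engaging = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
disengaging = [2, 1, 4, 3, 5, 2, 1, 4, 3, 5]
dummy = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
happiness = [2.1, 2.4, 3.9, 4.2, 5.6, 2.0, 1.7, 3.8, 3.1, 4.9]

coefs, ses, df = ols(interaction_rows(engaging, disengaging, dummy), happiness)
# The interaction t values test whether each slope differs between groups.
t_engaging = coefs[4] / ses[4]
t_disengaging = coefs[5] / ses[5]
print(f"df = {df}, t(engaging x group) = {t_engaging:.2f}, "
      f"t(disengaging x group) = {t_disengaging:.2f}")
```

In this specification the residual degrees of freedom equal the number of participants minus six. That is consistent with the values reported above, for example t(62) for the 68 Hokkaido participants and t(116) for the 84 Americans combined with the 38 non-Hokkaido-born Hokkaido residents.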

Discussion

Study 1 examined predictors of happiness and demonstrated a regional difference in Japan that is consistent with the voluntary settlement hypothesis. Like European Americans in the United States, Hokkaido Japanese embody the spirit of independent agency. These individuals put relatively equal weights on both personal achievement and social harmony. This is in sharp contrast with Japanese in mainland Japan, who appear to value social harmony substantially more than they value personal achievement. Unlike Hokkaido-born Hokkaido Japanese and Americans, the mainland Japanese are oriented more toward social concerns and goals than toward personal ones.

It is interesting that non-Hokkaido-born residents of Hokkaido were statistically no different from their Hokkaido-born counterparts. This pattern of data is consistent with the acculturation hypothesis, the self-selection hypothesis, or both. Nevertheless, the data are equivocal on the role of initial enculturation. Moreover, the relatively small number of non-Hokkaido-born students in Study 1 prevented us from computing such correlations separately for subgroups that varied in the length of stay in Hokkaido. Hence, it was not possible to assess the relative merit of the acculturation hypothesis and the self-selection hypothesis.

6 For exploratory purposes, we carried out analogous analyses on the negative domain. As in recent work by Kitayama et al. (2005), there was no difference across cultures. In all groups, general negative emotion (unhappiness) was more reliably predicted by disengaging negative emotion (anger) than by engaging negative emotion (shame; βs = .73 and .19, respectively), t(180) = 7.22, p < .0001. One conjecture is that, regardless of whether dominant cultural orientations are independent or interdependent, people may experience both greater frustration (a disengaging negative emotion) and general unhappiness when their dominant orientations are blocked.

Study 2: Personal Choice—Two Forms of Dissonance

Individuals in the frontier are likely to engage in personal choice. One consequence of choice that has been extensively studied by social psychologists during the last half century concerns cognitive dissonance (Brehm, 1956; Festinger, 1957; Harmon-Jones & Mills, 1999; Steele, 1988). Study 2 addresses the question of what forms dissonance might take in the Northern Frontier of Japan. Our analysis is based on a recent theoretical development designed to link culture to dissonance (Kitayama, Snibbe, Markus, & Suzuki, 2004). We anticipated that because a strong emphasis is given to personal choice in the frontier, cognitive dissonance in Hokkaido would take a unique form—the form we call personal (as opposed to interpersonal).

Drawing on theories of dissonance process that emphasize the role of the self (Aronson, 1968; Steele, Spencer, & Lynch, 1993), Kitayama et al. (2004) suggested that dissonance can take cross-culturally divergent forms because cultures emphasize different aspects of the self. In some cultural contexts, such as North American cultures, in which independence of the self in general and personal goal orientation in particular are emphasized, private self-images, such as the self’s competence and moral integrity, are highlighted, and, as a consequence, individuals are hypothesized to experience dissonance when their behavioral choice poses a threat to a certain private self-image they wish to sustain. For example, a choice between two equally attractive cars can raise a threat to one’s competence as a wise decision maker and consumer, because the chosen car might have negative features or the rejected car might have positive features. Kitayama et al. called this dissonance the personal dissonance. In contrast, in other cultural contexts, such as many Asian cultures, interdependence of the self in general and orientation toward social others in particular receive a far greater emphasis.
In these cultures, public self-images, such as the self’s reputation and social acceptance, are highlighted, and, as a consequence, individuals experience dissonance when their behavioral choice poses a threat to a certain public self-image they hope to maintain. For example, choosing to buy a luxurious German car might raise a concern about what one’s colleagues and neighbors might think about one. This dissonance has been called the interpersonal dissonance. Whether personal or interpersonal, dissonance is an aversive emotional state that motivates the person to justify the original choice (Festinger, 1957). Thus, once induced to experience dissonance, individuals are motivated to justify their choice by increasing their liking for the chosen item, decreasing their liking for the rejected item, or both. Nevertheless, depending on the cultural contexts of the person at issue, the dissonance should be aroused under quite different circumstances (Hoshino-Browne et al., 2005; Imada & Kitayama, 2005; Kitayama et al., 2004). One critical variable is an awareness of “eyes of others” watching and closely monitoring the self (Imada & Kitayama, 2005; Kitayama et al., 2004). Because personal dissonance hinges on what one’s choice means to one’s private self-image, it should happen in total privacy, in the absence of any eyes of others. In contrast, interpersonal dissonance hinges on what one’s choice might mean to one’s public self-image. For this dissonance to arise, the choice has to be public— or at least perceived to be so. In fact, if the choice is made in complete privacy, it entails no ramification to one’s public self-image, resulting in no dissonance.

To test these ideas, Kitayama et al. (2004; Imada & Kitayama, 2005) had both American (mostly White, middle class individuals) and Japanese participants (those living in Kyoto) make a choice between two equally attractive CDs and examined the degree to which the liking for the chosen CD was increased and the liking for the rejected CD was decreased. A key manipulation involved a poster that seemingly had been prepared for a conference presentation. The poster contained several schematic faces that were “watching” whoever was seated right in front of it. In an eyes-of-others condition, this poster was surreptitiously placed in front of the participant. No one raised any suspicions about the poster. In a control condition, no such poster was placed. In support of the foregoing analysis, Japanese did not show any dissonance effect in the control condition, but they did show a reliable dissonance effect in the eyes-of-others condition. This suggests that, in making a choice, Japanese worry mostly about what the choice might mean to their public self-image. A dissonance effect for them therefore happens only when these public self-image concerns are experienced. In contrast, Americans showed a strong dissonance effect in the control condition. However, this effect was somewhat reduced in the eyes-of-others condition. Imada and Kitayama (2005) replicated this reduction of dissonance in the eyes-of-others condition and suggested that Americans assume that others are trying to influence them (Morling, Kitayama, & Miyamoto, 2002) and, as a consequence, perceive watching eyes of others to be constraining their choice. Choice under these conditions is therefore less free, entailing a lesser threat to one’s private self-image and, thus, a lesser need for self-justification. This pattern of findings suggests that, in making a choice, Americans worry mostly about what the choice might mean to their private self-image.
Only to the extent that they are anxious about their private self does a dissonance effect accrue for Americans. Drawing on the foregoing research by Kitayama and colleagues (Imada & Kitayama, 2005; Kitayama et al., 2004), we conducted Study 2 to examine a dissonance effect in Hokkaido. We predicted that residents of Hokkaido would show a reliable dissonance effect in the absence of any eyes of others. Like North Americans, when exposed to eyes of others, they would show a weaker dissonance effect. In Study 2 we also examined non-Hokkaido-born Hokkaido Japanese to determine the relative merit of the three alternative hypotheses on cultural change and persistence (i.e., initial enculturation, acculturation, and self-selection).

Method

Participants. Eighty-one college students (57 men and 24 women) were recruited from a psychology paid participant pool of Hokkaido University. Forty-one (25 men and 16 women) were born and brought up in Hokkaido, and the remaining 40 (32 men and 8 women) were born and brought up in the Japanese mainland.

Procedure. The participants individually took part in the study. They were randomly assigned to either a control condition or a poster condition. These two conditions were identical, except that in the poster condition a poster that seemingly had been prepared for a conference presentation was surreptitiously placed in front of the participant. The poster depicted several schematic faces so that when the poster was placed right in front of the participant, these faces were “looking at” him or her (see Kitayama et al., 2004, for the poster that we used). No participant reported any suspicions about this manipulation. No poster was placed in front of the participants in the control condition.

The participants were told that the main part of the study concerned music preferences and were given a bogus music survey. As part of this survey, participants reported their birthplace and the length of their stay in Hokkaido. After a while, in keeping with studies by Heine and Lehman (1997) and by Kitayama et al. (2004), the experimenter presented the participants with 30 CDs and asked them to pick 10 that they wanted (and did not have). They rank ordered the 10 CDs according to their preferences and then rated their preferences on a 5-point rating scale ranging from “I don’t want it at all (1)” to “I want it very much (5).” At this point, they were told that 2 CDs were available for them to take home after the experiment. They were given a choice between the 2. These 2 CDs were the ones that the participant had ranked 5th and 6th in the ranking task. The participants were given the CD that they chose and continued to work on the music survey for another 10 min, at which point the experimenter told the participants that he wanted them to report their preferences of the CDs again because “he wanted to see what people would feel toward the CDs when they were not looking at them.” The participants subsequently rank ordered the 10 CDs and also rated their likings for them. The participants were asked to indicate the preferences they felt right at that moment. After this, the participants were debriefed and dismissed.

Results

A dissonance reduction effect was indexed by both an upward rank change of the chosen CD and a downward rank change of the rejected CD. We added these two rank change scores to yield a measure of spread of alternatives (SA). A comparable measure based on rating measures of preference showed an identical pattern, so we report only the ranking data.

We examined the SA measure. In support of the prediction that Hokkaido residents are concerned with private self-images and, thus, experience dissonance even when they make a choice in private, there was a reliable dissonance effect in the control condition for Hokkaido residents from Hokkaido (M = 0.90), t(20) = 2.19, p < .05. Moreover, consistent with the hypothesis that, once exposed to the “eyes” of social others, individuals with independent agency would feel constrained or even influenced by these others, the dissonance effect was no longer statistically reliable for the Hokkaido residents from Hokkaido (M = 0.60), t(19) = 1.06, p > .20.

Next we examined non-Hokkaido-born residents of Hokkaido. The result was unequivocal. The pattern for these participants was no different from the pattern for the Hokkaido-born residents of Hokkaido. Thus, the SA was reliably positive in the control condition (M = 1.26), t(18) = 2.29, p < .05, and, moreover, it was reduced to be no different from zero in the eyes-of-others condition (M = 0.57; t < 1). If anything, the pattern was slightly more pronounced for the non-Hokkaido-born group than for the Hokkaido-born group, although the interaction between condition and birthplace was negligible (F < 1). This evidence lends support to the acculturation hypothesis, the self-selection hypothesis, or both. We can test these two hypotheses by examining whether the pattern we found might become more pronounced as a function of the length of stay of the participants in Hokkaido.
That is, if the dissonance effect in the control condition became larger and the dissonance effect in the poster condition became smaller as a function of the length of stay, the acculturation hypothesis would be strongly supported. The absence of such correlations would be more consistent with the self-selection hypothesis. The length of stay had some reasonable variability (between 10 and 84 months, with a standard deviation of 14.77), yet the correlations were essentially zero (rs = −.36 and −.05, ns, for the control and the poster conditions, respectively).

One alternative possibility is that acculturation requires a minimal amount of time. That is, mere exposure (Zajonc, 1968) to an independent culture might be sufficient to produce a personal dissonance. In fact, many cultural priming effects (e.g., Gardner, Gabriel, & Lee, 1999; Hong, Morris, Chiu, & Benet-Martinez, 2000) might seem consistent with such a conjecture. For dissonance in particular, however, this conjecture faces a difficulty. Heine and Lehman (1997) examined a relatively representative group of mainland Japanese students temporarily attending a Canadian university and found an interpersonal pattern of dissonance among them. In particular, these researchers tested a group of mainland Japanese students who participated in an overseas exchange program. The nature of the program was such that there is no strong reason to believe that those who participated in the program were systematically different from those who did not. Hence, Heine and Lehman’s data suggest that there is little or no acculturation effect in dissonance. Also consistent with this is a recent finding by Hoshino-Browne et al. (2005) that the interpersonal pattern of dissonance was just as strong among Asian Canadians as among Japanese in Japan. Clearly, a mere exposure to North American culture is not sufficient to produce a personal pattern of dissonance among those with Asian heritage.7

The procedure of the current study is identical to the procedure of Study 4 of Kitayama et al. (2004), wherein the researchers collected Japanese data in Kyoto and compared them with data collected from Caucasian Americans in the United States. For comparison purposes, we plotted the pertinent SA means in Figure 2. As can be seen, Japanese in Kyoto showed a substantially larger dissonance effect in the poster condition than in the control condition. In contrast, all the remaining three groups (Hokkaido residents from Hokkaido, non-Hokkaido-born Hokkaido residents, and North Americans) showed a larger dissonance effect in the control condition.
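The spread of alternatives (SA) index defined at the start of these Results, and the test of whether its mean differs from zero, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the original analysis: the function names are hypothetical, ranks are assumed to run from 1 (most preferred) to 10, the test is assumed to be a standard one-sample t test against zero, and the example ranks are invented.

```python
import math
import statistics


def spread_of_alternatives(chosen_pre, chosen_post, rejected_pre, rejected_post):
    """Return SA from pre- and post-choice ranks (rank 1 = most preferred)."""
    chosen_gain = chosen_pre - chosen_post        # positive when the chosen CD moves up
    rejected_drop = rejected_post - rejected_pre  # positive when the rejected CD moves down
    return chosen_gain + rejected_drop


def one_sample_t(scores):
    """Return (t, df) for a test of whether the mean of `scores` differs from zero."""
    n = len(scores)
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample standard deviation (n - 1 denominator)
    return mean / (sd / math.sqrt(n)), n - 1


# Invented ranks for four participants:
# (chosen_pre, chosen_post, rejected_pre, rejected_post)
ranks = [(5, 4, 6, 7), (6, 5, 5, 5), (5, 5, 6, 6), (5, 3, 6, 7)]
sa_scores = [spread_of_alternatives(*r) for r in ranks]
t_value, df = one_sample_t(sa_scores)
print(sa_scores, round(t_value, 2), df)
```

A positive SA indicates that the chosen CD rose in rank, the rejected CD fell, or both, which is the direction expected if participants justified their choice.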

Discussion

Study 2 included a standard free-choice condition that was modeled closely after the procedure of the earlier studies (e.g., Heine & Lehman, 1997; Kitayama et al., 2004; Steele et al., 1993); we found that, unlike mainland Japanese in the previous work, Hokkaido residents showed a reliable dissonance effect when no watching eyes of others were present. When exposed to eyes of others, these individuals no longer showed a reliable dissonance effect. These findings are consistent with the hypothesis that our Hokkaido participants are concerned with their private self-image and thus experience a personal dissonance. This conclusion is bolstered by the fact that the pattern observed in Hokkaido is very similar to the one observed for North Americans in a study with the identical design and procedure (Study 4 in Kitayama et al., 2004).

One significant finding comes from the non-Hokkaido-born group in Hokkaido. The pattern for these individuals was no different from the pattern for either Hokkaido-born Hokkaido Japanese or Americans. Moreover, this was the case regardless of the length of stay of these individuals in Hokkaido. Null findings such as these are hard to interpret. Thus, we cannot entirely exclude the involvement of acculturation. However, taken as a whole, the current evidence pertaining to personal goal pursuit and personal choice is most consistent with the self-selection hypothesis, which states that some small proportion of mainland Japanese acquire a strong tendency toward personal choice and personal goal pursuit and, furthermore, that these mainland Japanese are especially strongly attracted to and, thus, likely to move to Hokkaido—Japan’s Northern Frontier.

Figure 2. Dissonance effect in the two experimental conditions (Study 2): Hokkaido residents showed a dissonance effect in both of the conditions. This pattern is analogous to the pattern observed for Americans but not for non-Hokkaido Japanese in a previous study. The North American data and the mainland Japanese data in the figure (marked with an asterisk) are from Kitayama et al. (2004, Study 4). Error bars indicate standard errors of the mean.

7 This finding also suggests that immigrants from Asian cultures in North America might not have strong personal goal or choice orientations. Although these observations must be further examined in a more systematic study, they may be related to the fact that a vast majority of immigrants from Asia and South America arrived fairly recently (Suárez-Orozco, 2003). These “late immigrants” did not settle in new lands or frontiers. Instead, they moved into cultural and social institutions that had already been set up and held in place and were often similar to those available in the home country (e.g., Chinatown and Little Tokyo). One could presume that there was substantially less need for independence and self-sufficiency for these late Asian immigrants than for earlier immigrants in both North America and Hokkaido, who settled in literally new, unknown, and unexploited lands of opportunity wherein virtually no infrastructures or cultural systems had been laid out. Another possibility is that Asian immigrants are motivated to maintain their cultural identity by contrasting it against the individualistic ethos of the mainstream society in North America. No such identity concerns were supposedly involved for the original settlers in Hokkaido or for those in the United States.

Study 3: Dispositional Lay Theory—The Fundamental Attribution Error

So far, we have found that Hokkaido Japanese are inclined toward personal goal pursuit (as revealed in the happiness measure) and personal choice (as revealed in the dissonance measure). Moreover, the data pattern suggests that the self-selection process plays a significant part in accounting for this observation. In Study 3, we examine whether the same pattern might be observed for dispositional lay theory. Theoretically, a large group of people with behavioral proclivities toward personal goal pursuit and personal choice may be expected to develop a shared belief that action is internally motivated and controlled. Accordingly, Hokkaido Japanese may be expected to share a highly dispositional lay theory of the person. It is not clear, however, whether Hokkaido residents who were born and brought up in mainland Japan might also share such a belief. While growing up in mainland Japan, these individuals are surrounded by others who do not have any psychological proclivities toward personal choice or personal goals. It is therefore possible that these individuals acquire a nondispositional lay theory while in mainland Japan. Moreover, they may even keep this theory when they move to Hokkaido.8 For this measure, therefore, there is a greater likelihood that the initial enculturation hypothesis will be supported.

To address these questions, we examine a cognitive bias toward dispositional attribution, called the fundamental attribution error (Ross, 1977). According to this cognitive bias, when asked to explain another’s behavior, people refer primarily to the person’s internal attributes, such as traits and attitudes, in lieu of external factors that surround him or her. This bias is caused by a mental model of a person as independent and as organizing his or her own actions in terms of his or her own internal attributes, such as attitudes and goals. Although this bias is extremely common and robust in North American cultural contexts (Jones, 1979; Ross, 1977), it is likely to be less so in other cultural contexts. Especially in East Asia, a collective belief that people are interdependent is dominant, which highlights the role of social contextual factors in guiding one’s own behaviors. In these cultures, then, the dispositional bias, such as the fundamental attribution error, may be attenuated or even absent (Markus & Kitayama, 1991; Nisbett, Peng, Choi, & Norenzayan, 2001).

Since a pioneering study by Miller (1984), this cross-cultural prediction has received substantial support (Choi, Nisbett, & Norenzayan, 1999). M. W. Morris and Peng (1994), for example, presented both American and Chinese respondents various pictures of a number of fish swimming in different formations. When asked to account for the reasons for the movement of a target fish, Americans were more likely to refer to internal factors of the fish (e.g., its psychological dispositions) than to factors that were external to it (e.g., movements of other fish that were present) as causal factors underlying the movements.
In contrast, Chinese were more likely to refer to external factors than to internal factors in the same task (see also Chiu, Hong, & Dweck, 1997). The same cross-cultural difference has been observed in content analyses of media materials (M. W. Morris & Peng, 1994), commentaries on professional sports events (Lee, Hallahan, & Herzog, 1996), and TV coverage of Olympics games (Markus, Uchida, Omoregie, Townsend, & Kitayama, 2006). It has also been documented in an attitude-inference paradigm (e.g., Masuda & Kitayama, 2004; Miyamoto & Kitayama, 2002). In Study 3, we use a causal attribution task. We compare four groups of participants, namely, Hokkaido residents who were born in Hokkaido, those who were not born in Hokkaido, mainland (i.e., Kyoto) Japanese, and Americans in the United States (in Chicago). We predicted, first, that we would observe a strong emphasis on dispositional attribution among both Americans and Hokkaido Japanese socialized in Hokkaido. Second, we expected that this pattern would be either attenuated or even entirely nonexistent among non-Hokkaido Japanese. Third, we used the responses from non-Hokkaido-born Hokkaido Japanese to test the relative merit of the three hypotheses on cultural change and persistence. The initial enculturation hypothesis predicts that non-Hokkaido-born residents of Hokkaido will be similar to non-Hokkaido Japanese in the Japanese mainland. However, both the self-selection hypothesis and the acculturation hypothesis predict that the responses of non-Hokkaido-born residents of Hokkaido will be more similar to those of Americans and Hokkaido residents raised in Hokkaido than to those of mainland Japanese.

Method

Participants. Thirty-eight Japanese (19 men and 19 women) were recruited from a participant pool of Kyoto University. All received course credit for their participation. None of these participants were from Hokkaido. Forty-two more Japanese (28 men and 14 women) were recruited from a paid participant pool of Hokkaido University. All received 500 yen on completion of the study. Seventeen of these Hokkaido participants were born and brought up in Hokkaido, and the rest were originally from the rest of Japan and came to Hokkaido to attend the university. Thirty Americans (15 men and 15 women) were recruited from the University of Chicago. These American participants responded to fliers we distributed across the campus. On completion of the study, they received $5 for their participation.

Procedure. Participants were tested in groups of a few individuals. On arrival, participants were told that the study was concerned with social judgment and were handed a questionnaire booklet. The booklet had six stories. Each story featured a protagonist who committed an action that was either desirable or undesirable. Following Miller (1984), we prepared both a desirable action version and an undesirable action version for each story. For example, the following are two versions of a story about a professional pitcher, Tom Lyons.

Desirable action version: Professional pitchers, like Tom Lyons, are very busy almost every day during the regular season. The pitchers work hard practicing and playing in games. In the off-season, therefore, many professional pitchers take vacations. However, Tom Lyons holds several free baseball camps for kids living in poor neighborhoods instead of taking a vacation.

Undesirable action version: A pitcher for a professional baseball team, Tom Lyons, lost several games in the beginning of the season. Instead of spending extra time practicing, he used performance-enhancing drugs for the rest of the regular season. Tom Lyons continued to use the drugs, even though the use of performance-enhancing drugs is illegal and considered to be cheating.

Participants were then asked to indicate their agreement or disagreement with four statements on 7-point rating scales (1 = strongly disagree, 7 = strongly agree). In keeping with some earlier work (M. W. Morris & Peng, 1994), we included counterfactual judgments in addition to straightforward attribution judgments. Attribution judgments require the participant to report the extent to which each of the two sets of causes (internal and external) influenced the behavior at issue. Counterfactual judgments require the participant to report the extent to which he or she thinks the behavior would have changed if one or the other set of causes had been different. If participants perceived an internal (or external) factor as an important cause for the behavior, they should also report that the behavior would have been different if the internal (or external) factor had been different. We thus prepared the following four questions about Tom Lyons:

1. “Features of Tom Lyons (such as his character, attitude, or temperament) influenced his behavior” (internal attribution).

2. “Features of the environment that surrounds Tom Lyons (such as the social atmosphere, social norms, or other contextual factors) influenced his behavior” (external attribution).

3. “Tom Lyons would have acted differently if his features (such as his character, attitude, or temperament) had been different” (internal counterfactual judgment).

4. “Tom Lyons would have acted differently if features of the environment that surround him (such as the social atmosphere, social norms, or other contextual factors) had been different” (external counterfactual judgment).

8 This analysis implies that voluntary settlers in Hokkaido are likely to be oriented toward personal goals and choice in their actions but believe in a nondispositional, more holistic lay theory. Such dissociation between belief and action is not uncommon (Nisbett & Wilson, 1977).

All materials were developed by a team of two Japanese–English bilinguals and one native English speaker. They were translated and backtranslated between the two languages. This process was repeated several times to ensure the semantic equivalence between the two languages. On completing the sixth story, participants were debriefed and dismissed. For half of the participants, three of the six stories were presented in the desirable action version, whereas the remaining three stories were presented in the undesirable action version. These stories were presented in a single random order. For the other half of the participants, the desirability of each story was switched. For all six stories, there was no systematic difference between the desirable version and the undesirable version, so we dropped this variable from consideration. Finally, the Hokkaido participants reported how long they had lived in Hokkaido.

Results

Attribution judgment. Responses to the internal and external causal attribution questions were analyzed with an analysis of variance with two between-subjects variables (culture and gender groups) and one within-subject variable (causal locus). The main effect of causal locus was significant. Overall, there was a greater tendency to attribute behaviors to internal factors than to external factors, F(1, 102) = 30.34, p < .0001. As predicted, however, the Culture × Causal Locus interaction proved significant, F(3, 102) = 4.04, p < .01. For the relevant means, see the left panel of Figure 3. Replicating numerous American studies that have demonstrated the fundamental attribution error (Ross, 1977), American participants reported the internal factors to be more important than the external factors in producing the behaviors. Moreover, replicating more recent studies that showed a significant cross-cultural variation in causal attribution, this effect entirely vanished in our Kyoto sample, t(102) = 1.63, ns. Within this general cross-cultural difference, data from our Hokkaido-born Hokkaido sample lend support to the voluntary settlement hypothesis. These participants were no different from Americans, showing a much stronger emphasis on the internal factors than on the external factors, t(102) = 4.51, p < .01. The difference between the internal score and the external score was the same for the two groups (t < 1). It is interesting that the pattern for non-Hokkaido-born Hokkaido Japanese was consistent with the fundamental attribution error yet was quite weak, t(102) = 1.93, p < .10. In fact, this pattern was more similar to the pattern observed for the mainland (Kyoto) Japanese. The difference between the internal score and the external score was no different between these two Japanese groups (t < 1). In contrast, the difference between the internal score and the external score for the Hokkaido-born Hokkaido Japanese group was significantly greater than the corresponding difference for the non-Hokkaido-born Hokkaido Japanese group, t(102) = 2.25, p < .05. This pattern clearly favors the initial enculturation hypothesis over the self-selection hypothesis or the acculturation hypothesis.

Figure 3. Attribution judgment and counterfactual judgment in Study 3: There was a reliable tendency to weigh internal factors more than external factors among North Americans and Hokkaido residents from Hokkaido. For non-Hokkaido Japanese, there was no such tendency. Error bars indicate standard errors of the mean.

Counterfactual judgment. Results for the counterfactual measures, shown in the right panel of Figure 3, are quite similar to those for the causal attribution measures. As predicted, an analysis of variance performed on these data showed a significant causal locus main effect and its interaction with culture, F(1, 102) = 25.94, p < .0001, and F(3, 102) = 6.77, p < .005, respectively. Both Americans and Hokkaido-born Hokkaido Japanese reported that the behavior at issue was more likely to have changed with a change in internal factors than with a change in external factors, t(102) = 4.41, p < .01, and t(102) = 4.94, p < .01, respectively. There was no significant difference between the two groups, t(102) = 1.29, ns. In contrast, this bias—the fundamental attribution error—vanished entirely in both the non-Hokkaido-born Hokkaido Japanese and the Kyoto Japanese (ts < 1). Again, there was no significant difference between the two groups (t < 1). Finally, the difference between the internal score and the external score for the Hokkaido-born Hokkaido Japanese group was significantly greater than the corresponding difference for the non-Hokkaido-born Hokkaido Japanese group, t(102) = 3.48, p < .01.

Correlation with the length of stay in Hokkaido. The data we have reported for the non-Hokkaido-born Hokkaido Japanese are most consistent with the initial enculturation hypothesis. According to this hypothesis, once socialized in mainland Japan, these individuals acquire a nondispositional lay theory because they are surrounded by others who are quite interdependent and socially oriented. This may be the case even though they are behaviorally more oriented toward personal goal pursuit and personal choice (as indicated by Studies 1 and 2). Moreover, these individuals seem to retain the nondispositional lay theory even when they move to and are immersed in the culture of the frontier, where a contrasting dispositional lay theory holds sway. Accordingly, we expected that there should be no systematic correlation between the tendency for dispositional attribution (endorsement of the internal causes minus endorsement of the external causes, with the two attribution measures combined) and the length of time these individuals had spent in Hokkaido when they participated in the study. The length of stay had some reasonable variability (between 3 and 36 months, with a standard deviation of 7.53), yet, as predicted, the correlation was essentially zero (r = −.01, ns).
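The dispositional attribution tendency and its correlation with length of stay can be sketched as follows. The text specifies only that internal endorsement minus external endorsement was used, "with the two attribution measures combined"; averaging the two difference scores is an assumption made here for illustration, as are the function names and the invented example values.

```python
import math


def dispositional_index(internal_attr, external_attr, internal_cf, external_cf):
    """Internal minus external endorsement, averaged over the two judgment types.

    Averaging the attribution and counterfactual differences is an assumed
    way of combining them; the source does not state the exact rule.
    """
    attribution_diff = internal_attr - external_attr
    counterfactual_diff = internal_cf - external_cf
    return (attribution_diff + counterfactual_diff) / 2


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented mean ratings on the 7-point scales for four hypothetical participants:
# (internal attribution, external attribution, internal counterfactual, external counterfactual)
ratings = [(5.5, 5.0, 5.0, 5.2), (6.0, 4.5, 5.5, 4.8), (4.8, 5.1, 5.0, 5.0), (5.2, 4.9, 5.6, 5.1)]
months_in_hokkaido = [6, 12, 24, 30]  # hypothetical lengths of stay
index_scores = [dispositional_index(*r) for r in ratings]
print(round(pearson_r(months_in_hokkaido, index_scores), 2))
```

Under the acculturation hypothesis, this correlation would be expected to be positive; a value near zero is what the initial enculturation hypothesis predicts for this measure.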

Discussion

In Study 3, we have examined dispositional lay theory and have obtained evidence that Hokkaido-born Hokkaido Japanese are very similar to Americans. Both of these groups showed a strong bias toward dispositional (as opposed to situational) factors in causal inference. Given these findings and the evidence from the first two studies, we may conclude that there is a strong ethos of independent agency in Hokkaido. In stark contrast with the results in the first two studies (pertaining to personal goals and choice), however, according to this cognitive measure of independent agency (lay dispositionism), non-Hokkaido-born Hokkaido Japanese were more similar to mainland Japanese than to either Hokkaido-born residents of Hokkaido or Americans. This latter data pattern is most consistent with the initial enculturation hypothesis.

General Discussion

The Voluntary Settlement Hypothesis

Where does culture come from? How is it produced, maintained, and, in many cases, changed over time? Some theorists have emphasized ideational resources that are long stored and preserved in distinct geographical regions (e.g., Western civilization and Eastern civilization; Markus & Kitayama, 1991; Nisbett, 2003) or distinct groups, religious or otherwise (e.g., Protestants and Catholics; Sanchez-Burks, 2005). These and other theorists have also emphasized ecological conditions of living (Berry, 1976; Triandis, 1995). Of course, since Marx and Weber, modernization theorists have long argued that the economy plays a critical role in the process (Inglehart & Baker, 2000; Schooler, in press). The present hypothesis can be located squarely at the intersection of these proposals and analyses. We have hypothesized that American individualism owes much to voluntary settlement by Europeans in the formative years of the United States. This settlement was motivated initially by a quest for religious freedom and later more by a pursuit of personal wealth. Moreover, the subsequent westward expansion of the territory also must have fostered the ethos of independence. Although this hypothesis is difficult to test in North America because of myriad confounding sociohistorical factors, it does suggest that there should be elements of independent agency even outside of North America as long as there is a relatively recent history of voluntary settlement. We thus examined Hokkaido, a northern island of Japan, as a natural experiment for testing the voluntary settlement hypothesis. This island was settled intensively over several decades beginning in the 1870s, primarily by jobless samurai and subsequently by numerous farmers from all over Japan. The motivations for the settlement were personal success and achievement in tangible terms.
The dictum “Boys, be ambitious!” (broadly attributed to an American educator, William S. Clark, who served as the first vice president of Hokkaido University between 1876 and 1877) eloquently expresses the regional ethos of Hokkaido around that time, which we believe has since been deeply ingrained into the regional culture. As predicted, on three different measures of independent agency, Hokkaido Japanese who were born and raised in Hokkaido were more similar to Americans than to mainland Japanese. Unlike happiness for the mainland Japanese, happiness for the Hokkaido Japanese was distinctly more personal. Moreover, the Hokkaido Japanese also showed evidence of personal dissonance. This finding on dissonance is in sharp contrast with typical findings for mainland Japanese (Hoshino-Browne et al., 2005; Imada & Kitayama, 2005; Kitayama et al., 2004), in which the dissonance effect is obtained only in the presence of certain social cues or contexts that evoke public self-image concerns. Finally, unlike the mainland Japanese, the Hokkaido Japanese exhibited clear evidence of the fundamental attribution error. Given these findings and the logic of triangulation, we may conclude that voluntary settlement is a powerful causal factor in producing independent agency.

Enculturation, Acculturation, and Self-Selection

In an attempt to obtain new insights into the mechanisms underlying cultural change and persistence, we tested non-Hokkaido-born residents of Hokkaido. If these individuals were no different from the mainland Japanese, the initial enculturation hypothesis would be supported. This hypothesis holds that once they are socialized in mainland Japan, individuals acquire an interdependent ethos; moreover, once acquired, this ethos is hard to change, even when the individuals are relocated and immersed in Hokkaido culture. If this group were more similar to the Hokkaido-born residents of Hokkaido, then the acculturation hypothesis or the self-selection hypothesis would be supported. Whereas the acculturation hypothesis proposes that once they are immersed in Hokkaido culture, mainland Japanese will change in the direction of independence, the self-selection hypothesis holds that a relatively small number of mainland Japanese who have acquired independent agency are especially likely to choose to move to Hokkaido.

The results are quite suggestive. Whereas the initial enculturation hypothesis was supported for lay theory (Study 3), the self-selection hypothesis was supported for personal goal pursuit (Study 1) and personal choice (Study 2). These findings suggest that self-selection operates primarily on psychological propensities toward personal goals and personal choice. That is, a relatively small group of high school graduates in mainland Japan who have strong propensities toward personal goal pursuit and personal choice may be attracted to Hokkaido.9 In fact, this group of people might be quite similar in that particular respect to the original settlers who created the foundations of present-day Hokkaido several generations ago. This being the case, the same cultural dynamic that created the frontier might still be at work today.
Nevertheless, consistent with the idea that acquisition of lay theory depends primarily on behavioral patterns of others who surround the self, these individuals still showed evidence of a more interdependent, nondispositional lay theory. One may speculate that the dispositional lay theory evident in contemporary Hokkaido was gradually developed by the original settlers and their descendants, whose behavioral propensities were quite suggestive of this lay theory.

The Role of Voluntary Settlement in Western Individualism

Future work should compare North Americans with Europeans in a similar set of social psychological tasks. Are Europeans equally prone to the fundamental attribution error? Do they also hold personal forms of happiness or dissonance? Given the evidently critical role of voluntary settlement in fostering these phenomena, we might predict that these effects are somewhat attenuated in Europe, especially in traditionally agrarian regions where people have undergone no history of settlement and resettlement. At present, critical data are missing because of the paucity of systematic cross-cultural work that compares North Americans with Europeans. Moreover, with a few notable exceptions (e.g., Knight et al., 2005; Markova et al., 1998; Semin & Rubini, 1990), within-Europe regional variations have largely been neglected. Yet explorations into these variations are another important avenue of research for testing the voluntary settlement hypothesis.

Equally important, there may still remain regional variations within North America. For example, researchers could compare people who have lived in New England over many generations, since the days of the Pilgrims, with those in Montana whose great-grandparents settled there 150 years ago. There is ample reason to suspect that the Montanans might show a stronger ethos of independence than the New Englanders would. This, in fact, seems to be the case. In an attempt to map the distribution of individualistic traits across states in the United States, Vandello and Cohen (1999) created an amalgam index of individualism that was composed of several demographic indicators. These indicators included percentage of people who lived alone, ratio of divorce rate to marriage rate, percentage of older adults who lived alone, and percentage of self-employed people. Overall, there was an overarching tendency for the western states to be substantially more individualistic than the eastern states, with the Mountain West and the Great Plains (including Montana, Wyoming, and Colorado, among others) being especially high in individualism. Moreover, in a recent study on regional variation in various facets of well-being, Plaut et al. (2002) found that both the sense of mastery over the environment and perceived freedom from constraints were especially high in the Mountain West as compared with the rest of the country. These patterns would have been predicted by the voluntary settlement hypothesis. Indeed, Vandello and Cohen (1999) attributed the finding to the fact that “the West was the last remaining frontier of the United States, and even today some parts of the Mountain West remain largely uncultivated and wild” (p.
281). Notice that the foregoing quote from Vandello and Cohen (1999) would apply equally well to Hokkaido if Japan were substituted for the United States. In fact, the latest statistics from the National Census in Japan (Statistics Bureau, Ministry of Internal Affairs and Communications, 2000) suggest that the divorce rate is nearly two standard deviations higher in Hokkaido (2.50 per 1,000 persons) than the national mean computed from average rates for the 47 prefectures (M = 1.96, SD = .28). Only Okinawa and Osaka exceed Hokkaido in divorce rate. Moreover, extended families are relatively less common in Hokkaido than elsewhere in Japan. Thus, in terms of the proportion of households with at least one grandchild, Hokkaido is the fifth lowest (10%) of the 47 prefectures (the national average is 21%, SD = 8.9%). Among the 13 largest metropolitan areas, Sapporo (5%) is the lowest of all (M = 8%, SD = 2.0%). Further support for the same conclusion comes from the proportion of older people (65 or older) living alone. Among the 47 prefectures, Hokkaido is the 10th highest. Although this statistic is not compelling by itself, its meaning changes once one recognizes that 8 of the 9 prefectures that show higher averages on it are concentrated in the southern island (Kyusyu) of Japan, where a much milder climate throughout the year makes it relatively easy for older adults to live alone. The only exception is Tokyo, the largest metropolitan center of Japan. Among the prefectures that suffer a severe winter climate, Hokkaido rises to the top on this chart. Although preliminary, this review of the Japanese census data provides further evidence for the voluntary settlement hypothesis.

9 As Japan’s foremost “frontier,” Hokkaido still is not popular as a place to choose for college education. A vast majority of graduates from high schools in mainland Japan attend universities in the mainland, not in Hokkaido. Moreover, most who dare to come all the way to Hokkaido do attend Hokkaido University, the region’s premier institution. Accordingly, although the proportion of non-Hokkaido-born residents at Hokkaido University is substantial (approximately 55%), the proportion of such students in each high school is quite negligible. Self-selection may therefore be strongly suspected. That is, non-Hokkaido-born students at Hokkaido University might be systematically different from their majority counterparts who choose to attend universities in mainland Japan. Indeed, these students are among those most strongly attracted to the ideas of independence, frontier, and personal dream. In support of this conjecture, a biannual survey conducted by Hokkaido University on its new class of students (Hokkaido University official Web site, http://www.hokudai.ac.jp/bureau/nyu/tottemol) shows that “the cultural and natural climate of Sapporo and Hokkaido” is the most important reason for the out-of-Hokkaido students to have decided to come to Hokkaido.
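The standardized comparisons in the census discussion (for example, the divorce rate being "nearly two standard deviations higher") can be checked with a short calculation. The sketch below uses only the values reported in the main text. The helper name `z_score` is introduced here for illustration, and it is assumed that the reported means and standard deviations are the ones the authors used for the comparison.

```python
def z_score(value, mean, sd):
    """Distance of a value from the mean, expressed in standard deviation units."""
    return (value - mean) / sd


# Divorce rate per 1,000 persons: Hokkaido vs. the 47-prefecture mean.
divorce_z = z_score(2.50, mean=1.96, sd=0.28)

# Percentage of households with at least one grandchild: Hokkaido vs. prefectures.
household_z = z_score(10, mean=21, sd=8.9)

# Same household measure: Sapporo vs. the 13 largest metropolitan areas.
sapporo_z = z_score(5, mean=8, sd=2.0)

print(f"divorce rate:        z = {divorce_z:+.2f}")
print(f"households:          z = {household_z:+.2f}")
print(f"Sapporo households:  z = {sapporo_z:+.2f}")
```

The divorce rate comes out at roughly 1.9 standard deviations above the prefectural mean, in line with the wording of the text. The two household figures fall about 1.2 and 1.5 standard deviations below their respective means.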

Concluding Remarks

The present work is one of the first attempts to specify the mechanisms and processes that have created American individualism (see Inglehart & Baker, 2000; Sanchez-Burks, 2005, for other notable attempts). The finding that voluntary settlement has the substantial effect of fostering an independent mental set even in an interdependent context suggests a causal impact of this variable in breeding and fueling an individualistic ethos throughout the history of North America. Another important contribution of the present work is that it establishes the existence of significant regional variations within Japan, consistent with the voluntary settlement hypothesis. Although many anthropologists and sociologists have argued, quite reasonably, that cultural groups do not always correspond to any given nation state, the present work is one of the first clear demonstrations of regional variation. This work therefore makes the important methodological point that a careful, theoretically motivated examination of regional variations is just as informative and as important as more traditional approaches of examining macroscopic differences and similarities across broadly defined cultural regions, such as North America and Asia (see Cohen, 2001; Nisbett & Cohen, 1996, for important predecessors). Indeed, the logic of triangulation used in the present work should be used more often as an essential tool for identifying active causal factors of culture in producing a variety of psychological phenomena (Medin et al., in press).

We acknowledge some limitations of the present work. First, we examined college students in both the United States and Japan. There is reason to believe that a majority of participants in our studies were from middle or upper middle social classes.
Because middle class social status seems to breed an independent ethos, especially in the United States (Schooler, in press; Snibbe & Markus, 2005), it is important to determine whether the same might also be the case in Hokkaido and in Japan or Asia more generally. Second, although our measures of independent agency were quite inclusive and relatively representative of those that have been identified in the last 2 decades of cultural psychological work, future researchers should examine some others, such as attitude–behavior consistency (Triandis, 1989), the motivation-enhancing effect of choice (Iyengar & Lepper, 1999), and attentional sensitivity to relational cues, such as vocal tone (Ishii, Reyes, & Kitayama, 2003). Moreover, it remains to be seen whether analogous regional variations could be found in nonsocial cognitions, such as analytic and holistic modes of thought (Nisbett, 2003). Yet another limitation of the current work stems from the fact that we did not address questions about cultural practices and institutions that presumably mediate the psychological effects we observed. Careful ethnographic work on both the “Wild West” of North America and the “wild north” of Japan would be informative. In particular, ecological conditions of living in Hokkaido may deserve careful scrutiny. Moreover, a systematic analysis of cultural products, such as ads and public discourses (e.g., Markus et al., 2006), would also be indispensable. Finally, all these questions must be couched in an overarching developmental framework (Greenfield, Keller, Fuligni, & Marnard, 2003).

One further research agenda for the future is to explore the relative importance of the three components of the voluntary settlement hypothesis, that is, self-selection (personality dispositions for emigration), reinforcement (ecological conditions for independence), and institutionalization (practices and meanings of independence). For example, are voluntary settlers systematically different from nonsettlers in certain personality dispositions? If so, is self-selection an integral part of cultures of independence? Although the current work provides some preliminary evidence for such a possibility (see also Chen, Burton, Greenberger, & Dmitrieva, 1999, for an intriguing correlation between migration and a genetic predisposition toward risk taking and Allik & McCrae, 2004, for a systematic geographic distribution of personality traits on the globe), the issue is far from settled. Likewise, does voluntary settlement encourage independent agency only when it is met with harsh ecological conditions?
Might it even be safe to assume, as some evolutionary psychological reasoning might (Cosmides & Tooby, 1987), that such ecological conditions automatically “evoke” a culture of independence? In addition, precisely what role does institutionalization play in independent agency? Is culture possible without institutionalization? There is no easy answer to any of these questions. Yet, by bringing them to the fore, the voluntary settlement hypothesis will help us shed some further light on antecedent conditions of the human mind, such as ecology, society, and culture, insofar as these conditions are tightly intertwined in the frontier.

References

Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Newbury Park, CA: Sage.
Allik, J., & McCrae, R. R. (2004). Toward a geography of personality traits: Patterns of profiles across 36 cultures. Journal of Cross-Cultural Psychology, 35, 13–28.
Aronson, E. (1968). Dissonance theory: Progress and problems. In R. P. Abelson, E. Aronson, W. J. McGuire, T. M. Newcomb, M. J. Rosenberg, & P. H. Tannenbaum (Eds.), Theories of cognitive consistency: A sourcebook (pp. 5–27). Chicago: Rand McNally.
Bellah, R., Madsen, R., Sullivan, W., Swindler, A., & Tipton, S. (1985). Habits of the heart: Individualism and commitment in American life. Berkeley: University of California Press.
Bem, D. J. (1972). Beliefs, attitudes, and human affairs. Belmont, CA: Brooks/Cole.
Berry, J. W. (1976). Human ecology and cognitive style: Comparative studies in cultural and psychological adaptation. New York: Sage/Halsted.
Brehm, J. W. (1956). Postdecision changes in the desirability of alternatives. Journal of Abnormal and Social Psychology, 52, 384–389.

Bureau of Statistics, Imperial Cabinet, Japan. (1924). Résumé statistique de l'empire du Japon (in Japanese) [Demographic statistics in Japan before census]. Tokyo: Author.
Bureau of Statistics, Imperial Cabinet, Japan. (1993). Kokuseichosa izenno Nihon jinko toukei shusei (in Japanese) [Demographic statistics in Japan before census]. Tokyo: Touyoushorin. (Original work published 1872)
Chen, C., Burton, M., Greenberger, E., & Dmitrieva, J. (1999). Population migration and the variation of dopamine D4 receptor (DRD4) allele frequencies around the globe. Evolution and Human Behavior, 20, 309–324.
Chiu, C. Y., Hong, Y. Y., & Dweck, C. S. (1977). Lay dispositionism and implicit theories of personality. Journal of Personality and Social Psychology, 73, 19–30.
Choi, I., Nisbett, R. E., & Norenzayan, A. (1999). Causal attribution across cultures: Variation and universality. Psychological Bulletin, 125, 47–63.
Cohen, D. (2001). Cultural variation: Considerations and implications. Psychological Bulletin, 127, 451–471.
Coontz, S. (1992). The way we never were. New York: Harper Collins.
Cosmides, L., & Tooby, J. (1987). From evolution to behavior: Evolutionary psychology as the missing link. In J. Dupre (Ed.), The latest on the best: Essays on evolution and optimality (pp. 277–306). Cambridge, England: Cambridge University Press.
Dewey, J. (1930). Individualism: Old and new. New York: Minton Balch.
de Tocqueville, A. (1969). Democracy in America (13th ed.; J. P. Mayer, Ed., & G. Lawrence, Trans.). New York: Doubleday. (Original work published 1862)
Diener, E., & Diener, M. (1995). Cross cultural correlates of life satisfaction and self-esteem. Journal of Personality and Social Psychology, 68, 653–663.
Doi, T. (1973). The anatomy of dependence. Tokyo: Kodansha.
Faludi, S. (2003, March 30). An American myth rides into the sunset. New York Times, p. L13.
Festinger, L. (1957). A theory of cognitive dissonance. Stanford, CA: Stanford University Press.
Fitzhugh, W. W., & Dubreuil, C. O. (1999). Ainu: Spirit of a northern people. Washington, DC: Arctic Studies Center, National Museum of Natural History, Smithsonian Institution in association with University of Washington Press.
Gardner, W. L., Gabriel, S., & Lee, A. Y. (1999). “I” value freedom, but “we” value relationships: Self-construal priming mirrors cultural differences in judgment. Psychological Science, 10, 321–326.
Greenfield, P. M., Keller, H., Fuligni, A., & Marnard, A. (2003). Cultural pathways through universal development. Annual Review of Psychology, 54, 461–490.
Harmon-Jones, E., & Mills, J. (1999). Cognitive dissonance: Progress on a pivotal theory in social psychology. Washington, DC: American Psychological Association.
Heine, S. J., & Lehman, D. R. (1997). Culture, dissonance, and self-affirmation. Personality and Social Psychology Bulletin, 23, 389–400.
Hochschild, J. L. (1995). Facing up to the American dream. Princeton, NJ: Princeton University Press.
Hong, Y. Y., Morris, M. W., Chiu, C. Y., & Benet-Martinez, V. (2000). Multicultural minds: A dynamic constructivist approach to culture and cognition. American Psychologist, 55, 709–720.
Hong, Y. Y., Wan, C., No, S., & Chiu, C. Y. (in press). Multicultural identities. In S. Kitayama & D. Cohen (Eds.), Handbook of cultural psychology. New York: Guilford Press.
Hoshino-Browne, E., Zanna, A. S., Spencer, S. J., Zanna, M. P., Kitayama, S., & Lackenbauer, S. (2005). On the cultural guises of cognitive dissonance: The case of Easterners and Westerners. Journal of Personality and Social Psychology, 89, 294–310.
Imada, T., & Kitayama, S. (2005). Dissonance, self, and eyes of others in Japan and the US. Unpublished manuscript, University of Michigan, Ann Arbor.
Inglehart, R., & Baker, W. E. (2000). Modernization, cultural change, and the persistence of traditional values. American Sociological Review, 65, 19–51.
Ishii, K., Reyes, J. A., & Kitayama, S. (2003). Spontaneous attention to word content versus emotional tone: Differences among three cultures. Psychological Science, 14, 39–46.
Iyengar, S. S., & Lepper, M. R. (1999). Rethinking the value of choice: A cultural perspective on intrinsic motivation. Journal of Personality and Social Psychology, 76, 349–366.
Jones, E. E. (1979). The rocky road from acts to disposition. American Psychologist, 34, 107–117.
Kashima, Y., Kokubo, T., Kashima, E. S., Boxall, D., Yamaguchi, S., & Macrae, K. (2004). Culture and self: Are there within-culture differences in self between metropolitan areas and regional cities? Personality and Social Psychology Bulletin, 30, 816–823.
Kitayama, S., & Markus, H. R. (2000). The pursuit of happiness and the realization of sympathy: Cultural patterns of self, social relations, and well-being. In E. Diener & E. Suh (Eds.), Subjective well-being across cultures (pp. 113–161). Cambridge, MA: MIT Press.
Kitayama, S., Markus, H. R., & Kurokawa, M. (2000). Culture, emotion, and well-being: Good feelings in Japan and the United States. Cognition and Emotion, 14, 93–124.
Kitayama, S., Mesquita, B., & Karasawa, M. (in press). Cultural affordances and emotional experience: Socially engaging and disengaging emotions in Japan and the United States. Journal of Personality and Social Psychology.
Kitayama, S., Snibbe, A. C., Markus, H. R., & Suzuki, T. (2004). Is there any “free” choice?: Self and dissonance in two cultures. Psychological Science, 15, 527–533.
Kitayama, S., & Uchida, Y. (2005). Interdependent agency: An alternative system for action. In R. Sorrentino, D. Cohen, J. M. Olson, & M. P. Zanna (Eds.), Culture and social behavior: The Ontario Symposium (Vol. 10, pp. 165–198). Mahwah, NJ: Erlbaum.
Knight, N., Varnum, M. E. W., & Nisbett, R. E. (2005). Culture, class, and categorization. Unpublished manuscript, University of Michigan, Ann Arbor.
Kondo, D. (1982). Work, family and the self: A cultural analysis of Japanese family enterprise. Unpublished doctoral dissertation, Harvard University.
Kwan, V. S. Y., Bond, M. H., & Singelis, T. M. (1997). Pancultural explanations for life satisfaction: Adding relationship harmony to self-esteem. Journal of Personality and Social Psychology, 73, 1038–1051.
Lebra, T. S. (1976). Japanese patterns of behavior. Honolulu: University of Hawaii Press.
Lee, F., Hallahan, M., & Herzog, T. (1996). Explaining real-time events: How culture and domain shape attributions. Personality and Social Psychology Bulletin, 22, 732–741.
Markova, I., Moodie, E., Farr, R., Drozda-Senkowaska, E., Eros, F., Plichtova, J., et al. (1998). Social representations of the individual: A post-Communist perspective. European Journal of Social Psychology, 28, 797–829.
Markus, H. R., & Kitayama, S. (1991). Culture and the self: Implications for cognition, emotion, and motivation. Psychological Review, 98, 224–253.
Markus, H. R., & Kitayama, S. (2004). Models of agency: Sociocultural diversity in the construction of action. In G. Berman & J. Berman (Eds.), The 49th annual Nebraska Symposium on Motivation: Cross-cultural differences in perspectives on self (pp. 18–74). Lincoln: University of Nebraska Press.
Markus, H. R., Uchida, Y., Omoregie, H., Townsend, S., & Kitayama, S. (2006). Going for the gold: American and Japanese models of Olympic agency. Psychological Science, 17, 103–112.
Masuda, T., & Kitayama, S. (2004). Perceiver-induced constraint and attitude attribution in Japan and the US: A case for the cultural dependence of the correspondence bias. Journal of Experimental Social Psychology, 40, 409–416.


KITAYAMA, ISHII, IMADA, TAKEMURA, AND RAMASWAMY

McAdams, D. P. (2005). The redemptive self: Stories Americans live by. New York: Oxford University Press.
Medin, D. L., Unsworth, S. J., & Hirschfeld, L. (in press). Culture, categorization and reasoning. In S. Kitayama & D. Cohen (Eds.), Handbook of cultural psychology. New York: Guilford Press.
Miller, J. G. (1984). Culture and the development of everyday social explanation. Journal of Personality and Social Psychology, 46, 961–978.
Miyamoto, Y., & Kitayama, S. (2002). Cultural variation in correspondence bias: The critical role of attitude diagnosticity of socially constrained behavior. Journal of Personality and Social Psychology, 83, 1239–1248.
Morling, B., Kitayama, S., & Miyamoto, Y. (2002). Cultural practices emphasize influence in the US and adjustment in Japan. Personality and Social Psychology Bulletin, 28, 311–323.
Morris, B. (1991). Western conceptions of the individual. Oxford, England: Berg.
Morris, M. W., & Peng, K. (1994). Culture and cause: American and Chinese attributions for social and physical events. Journal of Personality and Social Psychology, 67, 949–971.
Nisbett, R. E. (2003). The geography of thought: How Asians and Westerners think differently and why. New York: Free Press.
Nisbett, R. E., & Cohen, D. (1996). Culture of honor. Boulder, CO: Westview Press.
Nisbett, R. E., Peng, K., Choi, I., & Norenzayan, A. (2001). Culture and systems of thought: Holistic vs. analytic cognition. Psychological Review, 108, 291–310.
Nisbett, R. E., & Wilson, D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84, 231–253.
Oishi, S., & Diener, E. (2001). Goals, culture, and subjective well-being. Personality and Social Psychology Bulletin, 27, 1674–1682.
Plaut, V. C., Markus, H. R., & Lachman, M. E. (2002). Place matters: Consensual features and regional variation in American well-being and self. Journal of Personality and Social Psychology, 83, 160–184.
Ross, L. (1977). The intuitive psychologist and his shortcomings: Distortions in the attribution process. In L. Berkowitz (Ed.), Advances in experimental social psychology (Vol. 10, pp. 174–220). New York: Academic Press.
Sanchez-Burks, J. (2005). Protestant relational ideology: The cognitive underpinnings and organizational implications of an American anomaly. In B. Staw & R. Kramer (Eds.), Research in organizational behavior (Vol. 26, pp. 267–308). New York: Elsevier.
Schlesinger, A. M., Jr. (1986). The cycles of American history. Boston: Houghton Mifflin.
Schooler, C. (in press). Culture and social structure: The relevance of social structure to cultural psychology. In S. Kitayama & D. Cohen (Eds.), Handbook of cultural psychology. New York: Guilford Press.
Schooler, C., & Mulatu, M. S. (2004). Occupational self-direction, intellectual functioning, and self-directed orientation in older workers: Findings and implications for individuals and societies. American Journal of Sociology, 110, 161–197.
Semin, G. R., & Rubini, M. (1990). Unfolding the concept of person by verbal abuse. European Journal of Social Psychology, 20, 463–474.
Snibbe, A. C., & Markus, H. R. (2005). You can’t always get what you want. Journal of Personality and Social Psychology, 88, 703–720.
Statistics Bureau, Ministry of Internal Affairs and Communications. (2000). Outline of the 2000 population census of Japan. Retrieved July 28, 2006, http://www.stat.go.jp/English/data/kokusei/2000/outline.htm
Steele, C. M. (1988). The psychology of self-affirmation: Sustaining the integrity of the self. In L. Berkowitz (Ed.), Advances in experimental social psychology (Vol. 21, pp. 261–302). San Diego, CA: Academic Press.
Steele, C. M., Spencer, S. J., & Lynch, M. (1993). Self-image resilience and dissonance: The role of affirmational resources. Journal of Personality and Social Psychology, 64, 885–896.
Stegner, W. (1953). Beyond the hundredth meridian: John Wesley Powell and the second opening of the West. New York: Penguin Books.
Stewart, G. R. (1963). Ordeal by hunger: The story of the Donner Party. Boston: Houghton Mifflin.
Suárez-Orozco, M. (2003). Everything you ever wanted to know about assimilation but were afraid to ask. In R. Shweder, M. Minow, & H. R. Markus (Eds.), Engaging cultural differences: The multicultural challenge in liberal democracies (pp. 19–42). New York: Russell Sage Foundation.
Taylor, C. (1979). Sources of the self. Cambridge, MA: Harvard University Press.
Triandis, H. C. (1989). The self and social behavior in differing cultural contexts. Psychological Review, 96, 506–520.
Triandis, H. C. (1995). Individualism and collectivism. Boulder, CO: Westview Press.
Turner, F. J. (1920). The frontier in American history. New York: Henry Holt.
Uchida, Y., Norasakkunkit, V., & Kitayama, S. (2004). Cultural constructions of happiness: Theory and empirical evidence. Journal of Happiness Studies, 5, 223–239.
Vandello, J. A., & Cohen, D. (1999). Patterns of individualism and collectivism across the United States. Journal of Personality and Social Psychology, 77, 279–292.
Watanabe, H. (1972). The Ainu ecosystem: Environment and group structure. Seattle: University of Washington Press.
Weber, M. (1930). Protestant ethic and the spirit of capitalism. Winchester, MA: Allen & Unwin. (Original work published 1904)
Zajonc, R. B. (1968). Attitudinal effect of mere exposure. Journal of Personality and Social Psychology, 9, 1–27.

Received April 12, 2005
Revision received November 25, 2005
Accepted November 28, 2005


Journal of Personality and Social Psychology 2006, Vol. 91, No. 3, 385–405

Copyright 2006 by the American Psychological Association 0022-3514/06/$12.00 DOI: 10.1037/0022-3514.91.3.385

At the Boundaries of Automaticity: Negation as Reflective Operation

Roland Deutsch, Bertram Gawronski, and Fritz Strack
University of Würzburg

The present research investigated whether automatic social–cognitive skills are based on the same representations and processes as their controlled counterparts. Using the cognitive task of negating valence, the authors demonstrate that enhanced practice in negating the valence of a stimulus can lead to changes in the underlying associative representation. However, procedural, rule-based components of negations were generally unaffected by practice (Experiments 1–3). Moreover, negations of evaluative stimuli did not influence automatic evaluative responses to these stimuli, unless the negation was included in the associative representation of a stimulus (Experiments 4–6). These results suggest that some practice-related skill improvements are limited to conditions in which a general procedure can be substituted by the retrieval of results of previous applications from associative memory. Implications for research on automaticity and social cognition are discussed.

Keywords: automaticity, practice, skill learning, evaluation, priming

Many facets of social cognition and behavior are influenced to a large degree by automatic processes (Bargh, 1997). Apparently, various affective and cognitive responses can occur with little awareness, intention, control, and cognitive effort if they are highly practiced (see E. R. Smith, 1989; E. R. Smith, Branscombe, & Borman, 1988; E. R. Smith & Lerner, 1986). Initially, research on automaticity in the social domain addressed rather simple processes such as stereotype or attitude activation (e.g., Devine, 1989; Fazio, Sanbonmatsu, Powell, & Kardes, 1986; Higgins, Rholes, & Jones, 1977). More recently, however, social psychologists have also studied automaticity in the domain of more complex phenomena, such as motivated behavior (Bargh & Barndollar, 1996); problem solving (Dijksterhuis, 2004); trait, causal, and goal inferences (e.g., Hassin, Aarts, & Ferguson, 2005; Hassin, Bargh, & Uleman, 2002; Uleman, 1999); or social comparisons (e.g., Stapel & Blanton, 2004). The omnipresence of automatic phenomena in social psychology has led social cognition researchers to conclude that presumably “any skill, be it perceptual, motor, or cognitive, requires less and less conscious attention the more frequently and consistently it is engaged” (Bargh, 1997, p. 28) and thus may ultimately become automatic.

A question that has received relatively little attention in the social–cognitive literature, however, is what happens to the underlying computations and representations when a complex social–cognitive skill becomes automatic. This is an important issue because some theories of automatization (e.g., Logan, 1988) suggest that rule-based, algorithmic processes may be substituted by one-step retrieval from associative memory (see Moors & De Houwer, 2006). In such cases, the outcome of a specific rule application is directly retrieved from memory, thus making a deliberate application of the rule obsolete. Such shifts possibly go hand in hand with changes in the processing capabilities of the skill. In many cases, however, association-based retrieval processes may be indistinguishable from controlled rule application. This may erroneously be interpreted as evidence that the controlled processes underlying the skill have themselves become automatic.

The present research focuses on representational and computational shifts for a particular mental operation: the negation of valence. Specifically, we demonstrate that the associative outcomes of repeatedly negated evaluations show typical features of automaticity, whereas rule-based components do not. Negations (i.e., the reversal of the truth value of a proposition) have been shown to play a crucial role in many social–cognitive phenomena, such as attitude change (e.g., Jung Grant, Malaviya, & Sternthal, 2004; Petty, Tormala, Briñol, & Jarvis, 2006), stereotype control (e.g., Kawakami, Dovidio, Moll, Hermsen, & Russin, 2000), and person perception (e.g., Mayo, Schul, & Burnstein, 2004). Thus, not only can the study of automaticity in the domain of negations be expected to improve our understanding of automatization processes per se, but it may also provide deeper insights into the underlying processes of many social–cognitive phenomena, including the ones mentioned above.

Roland Deutsch, Bertram Gawronski, and Fritz Strack, Department of Psychology, University of Würzburg, Würzburg, Germany. Bertram Gawronski is now at the Department of Psychology, University of Western Ontario, London, Ontario, Canada.
This research was supported by Grant Str. 264/21-1 from the German Science Foundation. Portions of this article were presented at the Annual Meeting of the Person Memory Interest Group 2003, West Greenwich, Rhode Island. Experiments 4–6 were part of a doctoral thesis, submitted by Roland Deutsch to the University of Würzburg. We would like to thank the Würzburg Social Cognition Group, the Group for Attitudes and Persuasion at the Ohio State University, Barbara Kaup, and Eliot Smith for helpful suggestions on this research.
Correspondence concerning this article should be addressed to Roland Deutsch, Lehrstuhl für Psychologie II, Universität Würzburg, Röntgenring 10, 97070 Würzburg, Germany. E-mail: deutsch@psychologie.uni-wuerzburg.de

Automaticity and Unique Roles of Control

Traditionally, automaticity is defined by a set of features such as independence of awareness, independence of intention, high efficiency, and little opportunity to inhibit the automatic process voluntarily (see Bargh, 1994). There is great consensus that practice of a skill is a precondition for use of the skill to become more efficient and finally automatic (e.g., Gupta & Cohen, 2002; Logan, 1988; Moors & De Houwer, 2006). Many theories of automatization assume that shifts toward automaticity occur because practice makes time-consuming control processes obsolete. According to Schneider and Shiffrin (1977), a resource-limited process is responsible for coordinated response selection in nonautomatic responding. With consistent practice, however, the respective stimulus–response associations are permanently stored in long-term memory. As a result, merely perceiving a relevant stimulus immediately activates the response (Schneider & Chein, 2003). In a similar vein, Logan (1988) assumed that without practice, general but slow algorithms solve cognitive tasks. When this happens frequently for a specific task, the solution to that task becomes stored in memory and is quickly activated upon the perception of task-relevant stimuli. Consider, for example, the case of mental arithmetic. If a child learns to multiply one-digit numbers, he or she will start out with applying a general rule for multiplication (e.g., repeated additions). With extended practice of a task (e.g., mentally multiplying 6 × 6), the solution to this task (e.g., 36) becomes stored in memory and associated with the representation of the original task. Thus, the inefficient algorithm becomes obsolete, because the solution can be retrieved directly from memory (see also Zeelenberg, Wagenmakers, & Shiffrin, 2004).

What kind of processes may be responsible for inefficiency with unpracticed skills? Theories of control suggest that there may be a limited number of basic control functions, which are inherently inefficient.
Among the most important functions are the assembly of new, unlearned sequences of behavior and planning (e.g., Bargh, 2004; Miller & Cohen, 2001); abstract, relational reasoning and the active maintenance of multiple representations (Hummel & Holyoak, 2003; O’Reilly, Braver, & Cohen, 1999); the regulation of response conflicts (e.g., Amodio et al., 2004; Botvinick, Braver, Barch, Carter, & Cohen, 2001); and the inhibition of goal-inappropriate habits (E. E. Smith & Jonides, 1999). There is evidence that these control functions are implemented in a limited number of interconnected neural systems (e.g., Heyder, Suchan, & Daum, 2004; Ridderinkhof, van den Wildenberg, Segalowitz, & Carter, 2004), the most important being the prefrontal cortex (Miller & Cohen, 2001; E. E. Smith & Jonides, 1999). Presumably, these control functions are strongly capacity limited and profit only little from practice. Thus, to the degree a skill contains such elements, it seems plausible that memory retrieval processes rather than automatization of the control functions themselves are responsible for performance enhancements.

In addition to memory-based automaticity, some researchers have proposed that abstract procedures or rules can become more efficient with practice (J. R. Anderson, 1993; Gupta & Cohen, 2002). Take again the case of mental arithmetic. Extended practice of multiplying one-digit numbers may also make the general rule or algorithm of repeated additions more efficient. In this case, the algorithm may remain the same, but it may become more efficient because of enhanced accessibility or attunement of the sequence of subprocesses (J. R. Anderson, 1993). Both rule strengthening and instance learning have been well documented in the literature (J. R. Anderson, Fincham, & Douglass, 1997; Gupta & Cohen, 2002; Logan, 1988; Schneider & Chein, 2003; Schneider & Shiffrin, 1977; Shiffrin & Dumais, 1981; Strayer & Kramer, 1990). However, it is less clear whether rule strengthening generates true automaticity. Rule strengthening was primarily documented with skills operating far from automatic performance. For instance, J. R. Anderson et al. (1997) had participants extensively practice different semantic rules over 4–5 days; response latencies were well above 1,000 ms in all experiments at the end of training. E. R. Smith et al. (1988) found evidence for an increase in the efficiency of a general procedure to infer traits from behaviors. However, the degree of training was rather limited (60–250 trials), and response latencies remained above 1,000 ms after training. Other studies reported comparable (or even lower) degrees of practice and speed of responding (Rüter & Mussweiler, 2004; E. R. Smith & Lerner, 1986). Thus, even though practice may indeed speed up the original computations, we are not aware of studies showing that extended practice results in very fast response latencies that would suggest independence from intentional control.
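The contrast drawn in this section between applying a general rule and retrieving a stored instance can be made concrete with a small sketch of the mental arithmetic example. The code below is only an illustration under simplifying assumptions: the class name `InstanceMemory`, the step counts, and the idea that a single prior solution suffices for retrieval are hypothetical and are not taken from Logan (1988) or from the present studies.

```python
from typing import Dict, Tuple


def multiply_by_rule(a: int, b: int) -> Tuple[int, int]:
    """General but slow algorithm: repeated addition.

    Returns the product and the number of addition steps it took.
    """
    total = 0
    steps = 0
    for _ in range(b):
        total += a
        steps += 1
    return total, steps


class InstanceMemory:
    """Hypothetical store of solutions to problems solved before."""

    def __init__(self) -> None:
        self.instances: Dict[Tuple[int, int], int] = {}

    def solve(self, a: int, b: int) -> Tuple[int, int, str]:
        """Return (answer, steps, route), preferring one-step retrieval."""
        key = (a, b)
        if key in self.instances:
            # Practiced problem: the answer is retrieved in one step.
            return self.instances[key], 1, "retrieval"
        # Unpracticed problem: fall back on the general rule and store the result.
        product, steps = multiply_by_rule(a, b)
        self.instances[key] = product
        return product, steps, "rule"


memory = InstanceMemory()
first = memory.solve(6, 6)   # solved by the rule
second = memory.solve(6, 6)  # retrieved from memory
novel = memory.solve(6, 7)   # new problem, rule needed again
```

In this sketch the rule itself never becomes faster. Only problems that were solved before benefit from practice, which is the content-specific pattern that instance-based accounts predict.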

Automatic Social Cognition

The above analysis is of great importance for automaticity in the social domain. Phenomena such as stereotype and attitude activation can be readily reconstructed as instance-based automaticity. For example, perceiving a person of a stereotyped group or an attitude object may be sufficient to activate well-practiced stereotypic or evaluative associations in memory. The application of stereotypes, however, may require controlled processing to establish a relation between the stereotype and a concrete person. Even more so, complex social–cognitive skills probably represent a mixture of automatic activation in associative memory on the one hand, and core control processes on the other hand. For instance, pursuing an unpracticed goal involves the activation of the goal construct in memory, the directed search of possible means to achieve that goal, and their combination to new sequences of action (e.g., Bargh, 2004). To the degree that these genuine control elements cannot be fully automatized, one can expect that the computations underlying complex social–cognitive skills change during automatization.

This assumption is in line with current dual-system models of social cognition (e.g., Lieberman, Gaunt, Gilbert, & Trope, 2002; E. R. Smith & DeCoster, 2000; Strack & Deutsch, 2004). These theories distinguish between two processing systems, which differ in the way and degree to which they support automatic processing. In these theories, the system responsible for cognitive control is assumed to generate and manipulate symbolic, propositional representations on the basis of abstract rules of reasoning. The automatic system, in contrast, is assumed to generate responses on the basis of simple associative structures and spread of activation through associated contents. These associative processes lack abstract thinking capabilities like negation or explicit representations of time, are content-specific, and are less flexible than reflection.
Resembling Logan’s (1988) instance theory, these models explicitly assume that frequently generating a response in a rule-based, controlled manner creates “the conditions for associative learning, so eventually the same answer can be retrieved by pattern-completion from the associative system, rendering the step-by-step procedure superfluous” (E. R. Smith & DeCoster, 2000, pp. 115–116). Hence, in these models, automatization is a consequence of responding being transferred from the rule-based, reflective system to the association-based impulsive system (Lieberman et al., 2002; Strack & Deutsch, 2004).

So far, very few studies have addressed the question of possible changes in the representations and computations underlying complex social–cognitive skills (e.g., Rüter & Mussweiler, 2004; E. R. Smith, 1989; E. R. Smith et al., 1988; E. R. Smith & Lerner, 1986). From a general perspective, these studies provide evidence for both rule strengthening and instance learning. Yet, as described above, performance in these studies was far from automatic. Another ambiguity results from the fact that studies in the social domain usually estimate the degree of content-independent practice through the generalization of practice to new instances. However, observed generalization may have occurred because of memory activation instead of rule strengthening. For instance, E. R. Smith and Lerner (1986) repeatedly asked participants to indicate whether a given list of four traits was typical for a waitress (or a librarian). After participants practiced this task with typical and nontypical traits, the target stereotype was switched. Then, participants had to perform the same task with the librarian (or waitress) stereotype. E. R. Smith and Lerner observed a considerable transfer from one task to the other. As E. R. Smith and Lerner concluded, this may be regarded as evidence for a genuine speed up of the cognitive procedure independent of the content. However, one could object that the employed traits and stereotypes share semantic overlap. For instance, some traits that are stereotypically attributed to waitresses (e.g., being extraverted) are the exact opposite of what is stereotypically attributed to librarians (e.g., being introverted).
Because activating one pole of a dimension in memory often increases the accessibility of the whole dimension (Park, Yoon, Kim, & Wyer, 2001), practicing to respond to the waitress stereotype may also enhance the accessibility of semantic contents relevant to the librarian stereotype, thus leading to transfer effects. Similar content-based effects may have also been prevalent in other studies using generalization of practice as a criterion (e.g., E. R. Smith, 1989; E. R. Smith et al., 1988). Generally speaking, as long as there is semantic overlap between the materials used for practice and transfer, results of generalization paradigms are still open to alternative interpretations.

In a similar vein, demonstrating that a preexisting complex social–cognitive skill can be executed with little intention, consciousness, control, and cognitive effort does not provide clear evidence that the skill is still based on the same computations as the controlled counterpart. Such demonstrations require experimental situations in which instance-based responding and rule-based responding yield diverging results. Relying on already existing skills makes this endeavor difficult, because it is not known what exactly is stored in memory by virtue of previous practice.

The Present Research

The above analysis implies that the representations and computations underlying cognitive skills may change over the course of automatization. Changes in underlying representations are of greatest relevance for those social–cognitive skills that originally include primary control functions, such as planning, regulation of unwanted habits, and abstract reasoning. In these cases, a shift from rule-based to association-based processing will go hand in hand with a loss of distinct properties of the original skill. Many previous studies demonstrating practice-related generalization effects in the social domain contain a semantic overlap in the employed stimulus material (e.g., E. R. Smith, 1989; E. R. Smith et al., 1988; E. R. Smith & Lerner, 1986). As such, these studies are limited in the conclusions they allow about the observed generalization to new exemplars. In the present research, we tried to overcome these limitations by using a task that allows measurement of the unique impact of rule-based and content-based elements independent of generalization. Specifically, we designed an evaluation task comprising affirmations and negations attached to positive and negative words. The skill of negating the evaluative meaning of a proposition is particularly suited to overcome the problems associated with previous practice studies based on generalization. More precisely, we argue that comparing responses to affirmed and negated words allows for direct estimation of the speed of the procedure to negate.

A second reason for the study of negations is their significance for many social–cognitive phenomena. Negations have gained considerable interest in social–cognition research during the past decade. Generally, negations were shown to put a particular strain on cognition. With little motivation and resources, people were demonstrated to fail to extract the meaning of negations and to respond in a way opposite to what was implied by logic. For instance, in research on persuasion, several studies have demonstrated that persuasive attempts containing negated terms (e.g., Drinking is not sexy) can lead to attitude changes in the opposite direction of what was intended (e.g., making drinking more attractive; Christie et al., 2001; Jung Grant et al., 2004; Skurnik, Yoon, Park, & Schwarz, 2005). In the realm of behavior-to-trait inferences, Mayo et al.
(2004) showed that perceivers need more cognitive resources to infer the absence of traits from behaviors than to infer their presence, unless there is a schema for transforming the negation into an affirmative concept (see also Hasson, Simmons, & Todorov, 2005). Studying the role of negations in stereotype control, Kawakami et al. (2000) found that highly extensive training in the negation of stereotypic associations can reduce their automatic activation in memory. In addition to these findings, numerous studies have suggested that attitudes and beliefs tend to persist in memory at a residual level, even when their original basis was invalidated by negation (e.g., C. A. Anderson, 1982; Petty et al., 2006; Walster, Berscheid, Abrahams, & Aronson, 1967; Wyer & Unverzagt, 1985). A similar perseverance is involved in the innuendo effect, in which a negated negative statement about a specific person (e.g., This politician was not bribed) leads to more negative attitudes toward this person (Wegner, Wenzlaff, Kerker, & Beattie, 1981). Given the significance of negations for these phenomena, we expect the present research to provide deeper insights into the cognitive processes that may be responsible for the abovementioned findings.

Finally, negation can be seen as a prototype of an abstract, rule-based reasoning process. Particularly, explicit negations require a propositional representation, in which the meaning of the negated construct (e.g., This is not a friend) is activated and maintained in working memory while the meaning of the negated proposition (e.g., This is an enemy) is construed (e.g., Kaup, Zwaan, & Lüdtke, in press). Such maintenance and construal processes are a core function of cognitive control (Miller & Cohen, 2001) and may play an important role for a number of social–cognitive processes. For instance, generating explicit inferences about relations between people presumably requires symbolic, abstract reasoning (Hummel & Holyoak, 2003). Likewise, generating and correcting causal attributions may specifically rely on the same type of reasoning (e.g., Lieberman et al., 2002; Satpute et al., 2005). In their dual-system model, E. R. Smith and DeCoster (2000) described a number of social–cognitive processes that may be based on symbolic, rule-based processing, among them counterfactual thinking, social transmittal of knowledge, the justification of attitudes and behaviors, and the correction of socially undesirable stereotypes or attitudes (see also Strack & Deutsch, 2004). In a similar vein, Miller and Cohen (2001) argued that cognitive control involves the “active maintenance of patterns of activity that represent goals and the means to achieve them” (p. 171). Particularly, control is seen as responsible for storing and flexibly switching between abstract rules of responding. Such processes are especially important for the negation of valence. Usually, words appear in affirmed versions, and their perception is often sufficient to activate their valence in memory. If a negation is attached to a word, a correct task solution requires one to override expressing the automatically activated evaluation, and to substitute it with the inferred valence. Thus, findings regarding the automatization of negations may help to determine how explicit social–cognitive processes involving flexible rule-based, propositional reasoning respond to enhanced practice.

To investigate the quality of automatization processes in the context of valence negation, we conducted a total of six experiments. In Experiments 1–3, participants practiced evaluating affirmed and negated positive and negative words. We expected that training would speed up responses in general through content-based mechanisms. However, the overall speed to negate the valence of a word should be unaffected by practice.
In Experiments 4–6, we studied automatic evaluations of affirmed and negated positive and negative words in a sequential priming task. We expected that the stored evaluative meaning of a given word is activated automatically. However, negating its evaluative meaning should require higher order rule-based processes, unless the compound meaning of the negated word is stored as a separate instance in associative memory.

Experiment 1

The aim of Experiment 1 was to study how practice affects associative and rule-based aspects of evaluation. To disentangle these elements, we used the subtraction method proposed by Donders (1969). Specifically, we constructed a task in which participants had to evaluate affirmed (e.g., a party) and negated (e.g., no party) versions of positive and negative words by pressing appropriate keys. Over six blocks, participants practiced this task in a total of 600 trials. In each of these blocks, a given word appeared equally often in an affirmed and a negated version. The rationale for using this setup becomes apparent in Figure 1. In response to both affirmed and negated target words, participants must determine the valence of the word to identify the correct key. As long as the words have a clear positive or negative connotation, this process is presumably based on memory activation. With affirmed targets (see Figure 1A), this memory activation process is sufficient to determine the correct response. With negated targets (see Figure 1B), however, the retrieved valence must be reversed to determine the correct response (see Clark & Chase, 1974; Gilbert, 1991; Gough, 1965; Wason, 1959). Hence, the difference in response latencies toward affirmed and negated targets can be used as an estimate of the time needed to reverse the word valence (Donders, 1969).

Figure 1. Response-latency model for Experiments 1–3. In response to affirmed (A) and negated (B) targets, the word valence must be determined in order to determine the correct response (left vs. right key). Negated targets, however, additionally require participants to reverse the word valence.

Based on the considerations outlined above, we expected that the activation of word valence and the reversal of word valence are differentially affected by practice. Particularly, we expected that when participants retrieve the valence of a given word over and over again, its valence should become highly accessible in memory (Fazio, 1995). Moreover, the mapping of valence and response keys should be stored in memory, such that pressing the correct key associated with a given valence should become more efficient with practice. Therefore, we expected an overall speed up of response latencies. The reversal of the word valence, on the other hand, constitutes a general procedure that has its roots in higher order rule-based processes. As such, the speed of valence reversal should be unaffected by extended practice. In other words, the difference in response latencies for affirmed and negated words can be interpreted as a proxy for the speed of reversal, and this difference should remain constant over various levels of practice.
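The subtraction logic described above can be sketched in a few lines: the cost of reversing valence in a block is estimated as the mean latency for negated targets minus the mean latency for affirmed targets. The latencies below are invented for illustration only, and the function name `negation_cost` is a hypothetical label rather than anything used in the original analyses.

```python
from statistics import mean
from typing import Dict, List


def negation_cost(rt_affirmed: List[float], rt_negated: List[float]) -> float:
    """Subtraction estimate of valence reversal time in ms.

    Both conditions require retrieving the word valence and mapping it to a key;
    only negated targets additionally require reversing the valence.
    """
    return mean(rt_negated) - mean(rt_affirmed)


# Hypothetical latencies (ms) for an early and a late practice block.
blocks: Dict[int, Dict[str, List[float]]] = {
    1: {"affirmed": [900.0, 940.0, 920.0], "negated": [1010.0, 1030.0, 1020.0]},
    6: {"affirmed": [800.0, 820.0, 810.0], "negated": [900.0, 920.0, 910.0]},
}

costs = {
    block: negation_cost(data["affirmed"], data["negated"])
    for block, data in blocks.items()
}
```

In this made-up example both conditions speed up by the same amount, so the estimated reversal cost is the same in the early and the late block. That is the pattern the hypothesis predicts if practice affects retrieval and response mapping but not the reversal procedure.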

Method

Participants and Design

A total of 42 students of the University of Würzburg (28 women, 14 men) took part in a study purportedly concerned with attention and performance. Participants received €6 (approximately U.S. $5 at that time) as compensation. The experiment consisted of a 2 (word valence: positive vs. negative) × 2 (qualifier: affirmation vs. negation) × 6 (practice block: 1–6) within-subject design.

Procedure

The experiment was part of a larger set of unrelated studies and took about 40 min. The whole battery of studies took about 1 hr. Under the guise of studying the ability to concentrate while working with a computer, participants repeatedly evaluated affirmed and negated words. Participants worked on six blocks of practice, each consisting of 100 trials. The six blocks were separated by breaks of 20 s. In the course of each block, each of 20 stimuli (5 each affirmed positive, negated positive, affirmed negative, and negated negative) was presented 5 times. Consequently, participants evaluated each qualifier–word combination 30 times during this experiment. Each trial started with the presentation of a warning signal (XXX) in the center of the screen for 500 ms followed by a blank screen for 200 ms. Then the stimulus was presented in bold 30-point Arial font letters in bright yellow color on a black background. Participants were asked to press a left-hand key (A key) for positive stimuli and a right-hand key (5 number pad key) for negative stimuli. After correct responses, the next trial started immediately, resulting in a response–stimulus interval of 700 ms. For incorrect responses, participants received error feedback (Error! Positive – left, negative – right), which remained on the screen for 1,500 ms. If participants did not respond within 2,000 ms, the trial was aborted, and a warning message (Try to respond faster!) was displayed for 1,500 ms. Immediately after feedback for errors and slow responses, the next trial started, resulting in a feedback–stimulus interval of 700 ms.
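The trial counts in this procedure follow from simple bookkeeping, which the sketch below reproduces: 20 qualifier–word stimuli, each shown 5 times per block, over six blocks. The word labels are placeholders (the actual words are listed in Appendix A), and the random ordering within a block is an assumption, because the text does not state how trials were ordered.

```python
import random
from collections import Counter
from typing import List, Tuple

# Timing values stated in the Procedure (ms).
WARNING_MS = 500
BLANK_MS = 200
TIMEOUT_MS = 2000
FEEDBACK_MS = 1500
RESPONSE_STIMULUS_INTERVAL_MS = WARNING_MS + BLANK_MS

# Placeholder labels standing in for the 5 positive and 5 negative words.
POSITIVE = [f"positive_{i}" for i in range(1, 6)]
NEGATIVE = [f"negative_{i}" for i in range(1, 6)]

Stimulus = Tuple[str, str, str]  # (qualifier, word, word valence)


def all_stimuli() -> List[Stimulus]:
    """Cross both qualifiers with the 10 words to obtain 20 stimuli."""
    words = [(w, "positive") for w in POSITIVE] + [(w, "negative") for w in NEGATIVE]
    return [(q, w, v) for q in ("affirmation", "negation") for (w, v) in words]


def correct_key(qualifier: str, valence: str) -> str:
    """'A' if the stimulus is positive overall, '5' (number pad) if negative."""
    overall_positive = (valence == "positive") == (qualifier == "affirmation")
    return "A" if overall_positive else "5"


def build_block(rng: random.Random, repetitions: int = 5) -> List[Stimulus]:
    """One practice block: every stimulus repeated, then shuffled (assumed order)."""
    trials = all_stimuli() * repetitions
    rng.shuffle(trials)
    return trials


rng = random.Random(0)
blocks = [build_block(rng) for _ in range(6)]
counts = Counter(stimulus for block in blocks for stimulus in block)
```

Each block contains 100 trials, the six blocks sum to 600 trials, and every qualifier–word combination occurs 30 times, matching the figures given in the text. The warning signal and blank screen together account for the stated response–stimulus interval of 700 ms.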

Materials

Each participant practiced with 5 positive and 5 negative words, each of which repeatedly appeared in both an affirmed and negated form. We conducted a pretest to identify negations of low frequency in everyday language. We reasoned that the use of frequently negated words would prevent participants from actually practicing negations because their valence could be directly retrieved from memory (see Experiment 6). For this purpose, negations of 53 positive and 53 negative words were judged by 71 psychology students with regard to their frequency and their valence. From these stimuli, we selected 10 positive and 10 negative words, which revealed low frequency estimates in their negated form, but still exhibited unambiguous valence (see Appendix A for the words and Appendix B for pretest data). Four random subsets, each consisting of 5 positive and 5 negative words, were chosen and combined with either an affirming or negating qualifier. Each participant was randomly assigned to one of the four subsets.

Results

Trials on which participants classified the target incorrectly (7.7%), as well as the first reaction in each block were excluded from analyses. No anticipations (reaction time [RT] < 300 ms) occurred. RTs decreased as a negatively accelerated function of practice, reaching an asymptote of learning after Block 4 (see Figure 2). This conclusion is supported by the results of a 2 (word valence) × 2 (qualifier) × 6 (practice block) analysis of variance (ANOVA) for repeated measures,¹ which yielded a main effect of practice block, F(5, 205) = 44.16, p < .001, η² = .51. The respective contrasts were significant up to Block 4 (all Fs > 6.80, all ps < .05), whereas no significant increase occurred in the last two blocks (all Fs < 0.50, all ps ≥ .5).

Most important to our hypotheses, participants responded slower to negated targets (M = 952 ms, SD = 119 ms) as compared with affirmed targets (M = 849 ms, SD = 104 ms), and this processing advantage of affirmed words was unaffected by practice. This conclusion is supported by a significant main effect of qualifier, F(1, 41) = 358.60, p < .001, η² = .90, and a nonsignificant interaction of Block × Qualifier, F(5, 205) = 0.14, p = .96, η² < .01 (see Table 1). To further specify this result, we calculated the cost of reversing word valence by subtracting the latencies of affirmed trials from the latencies of negated trials as a function of the six blocks. Independent of the degree of practice, responding to a negated word took about 100 ms longer than responding to an affirmed word.

In addition to these predicted effects, the specific valence of a word influenced response times in several ways. First, negative words (M = 940 ms, SD = 123 ms) were evaluated more slowly than positive words (M = 860 ms, SD = 102 ms), F(1, 41) = 121.87, p < .001, η² = .75. In addition, responses to negative words profited more strongly from practice than positive words, F(5, 205) = 5.88, p < .001, η² = .13. Finally, although affirmed words were always evaluated faster than negated words, this effect was somewhat smaller for negative words (M_affirmed = 906 ms, SD_affirmed = 124 ms vs. M_negated = 975 ms, SD_negated = 126 ms) as compared with positive words (M_affirmed = 792 ms, SD_affirmed = 91 ms vs. M_negated = 929 ms, SD_negated = 116 ms), F(1, 41) = 75.42, p < .001, η² = .65.
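The exclusion rules stated at the start of these results (incorrect responses, the first response in each block, and anticipations below 300 ms) can be written as a simple trial filter. This is a minimal sketch under assumed names: the `Trial` record and its fields are hypothetical and do not describe the software actually used to process the data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trial:
    block: int        # practice block, 1 to 6
    position: int     # 1-based position of the trial within its block
    rt_ms: float      # response latency in ms
    correct: bool     # whether the target was classified correctly


def keep_for_analysis(trial: Trial, anticipation_ms: float = 300.0) -> bool:
    """Apply the three exclusion rules described in the text."""
    if not trial.correct:
        return False  # incorrect classification
    if trial.position == 1:
        return False  # first response in a block
    if trial.rt_ms < anticipation_ms:
        return False  # anticipation
    return True


def filter_trials(trials: List[Trial]) -> List[Trial]:
    """Return only the trials that enter the latency analyses."""
    return [t for t in trials if keep_for_analysis(t)]


# Hypothetical example data.
example = [
    Trial(block=1, position=1, rt_ms=950.0, correct=True),   # first in block
    Trial(block=1, position=2, rt_ms=880.0, correct=False),  # error
    Trial(block=1, position=3, rt_ms=250.0, correct=True),   # anticipation
    Trial(block=1, position=4, rt_ms=910.0, correct=True),   # kept
]
kept = filter_trials(example)
```

Trials aborted after 2,000 ms without a response are not handled separately here. How they were coded in the original data is not stated, so the sketch leaves that open.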

Discussion

The present results suggest that participants’ performance in the evaluation task strongly profited from the training. In Block 6, participants needed 13% less time than in Block 1 to respond to affirmed words and 11% less time for negated words. Thus, our training procedure was indeed effective in speeding up responses in the evaluation task. In addition, contrast analyses indicate that latencies did not further decrease between Blocks 4 and 6, suggesting an asymptotic change in performance. Most important, however, the time required to negate the valence of a word was generally unaffected by practice. Overall, responses to negated words required about 100 ms more than responses to affirmed versions of the same words, and this difference was unaffected by the degree of practice. In other words, it seems that responses became quicker because either (a) extracting the valence of the target words or (b) mapping of valence and motor responses (or both) became more efficient. However, reversing the valence of a word did not become more efficient through practice.
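The two quantities discussed here, the percentage speed-up from Block 1 to Block 6 and the roughly constant negation cost, can be recomputed from the block means reported in Table 1. The short sketch below does only this arithmetic; the variable and function names are illustrative choices, not part of the original analysis.

```python
from typing import List

# Mean response latencies (ms) per block, copied from Table 1.
affirmed: List[float] = [934.29, 862.41, 841.23, 822.66, 817.60, 813.94]
negated: List[float] = [1034.42, 963.11, 947.20, 924.74, 924.81, 917.46]


def percent_reduction(first: float, last: float) -> float:
    """Relative speed-up from the first to the last block, in percent."""
    return 100.0 * (first - last) / first


# Negation cost per block: negated minus affirmed mean latency.
negation_cost = [round(n - a, 2) for a, n in zip(affirmed, negated)]

speedup_affirmed = percent_reduction(affirmed[0], affirmed[-1])
speedup_negated = percent_reduction(negated[0], negated[-1])
```

The per-block costs stay between about 100 and 107 ms, and the speed-ups round to 13% for affirmed words and 11% for negated words, in line with the values given in the text.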

Figure 2. Response latencies to affirmed and negated words as a function of practice block (Experiment 1). Error bars indicate the standard errors of the means. RT = response time. [Line plot: x-axis, Block; y-axis, RT (ms); separate lines for Affirmation and Negation.]

Table 1
Mean Response Latencies, Standard Errors, and Percentages of Error for Responses to Affirmed and Negated Words as a Function of Practice Block, Experiment 1

Qualifier       Block 1    Block 2    Block 3    Block 4    Block 5    Block 6
Affirmation
  M              934.29     862.41     841.23     822.66     817.60     813.94
  SEM             18.69      17.40      16.47      16.61      16.95      17.11
  Error %          9.64       6.82       6.45       4.04       4.43       3.60
Negation
  M            1,034.42     963.11     947.20     924.74     924.81     917.46
  SEM             20.49      18.97      18.28      18.22      22.71      21.02
  Error %         14.51      11.47       9.37       8.50       8.32       6.88

¹ Degrees of freedom were adjusted according to Greenhouse-Geisser where appropriate.

Experiment 2

Even though results from Experiment 1 are consistent with our predictions, the present conclusions are contingent upon the assumption that the difference in response latencies for affirmed and negated targets truly reflects the speed of negation. Thus, Experiment 2 was designed to provide additional evidence that is independent from this measure. In this study, we investigated whether practice effects generalize to new, unpracticed instances. If practice effects leave responses to new, unpracticed items unaffected, this would provide additional support for our assumption that effects of practice are primarily driven by associative mechanisms of memory activation. In the present context, studying generalization seems important because memory-based automaticity is bound to the exemplars which were practiced and stored in memory, whereas general procedures are not (e.g., Logan, 1988; E. R. Smith et al., 1988). In Experiment 2, the setup of Experiment 1 was used up to the fifth block of learning. Block 6, however, consisted of new, unpracticed affirmed and negated words. On the basis of the response-latency model outlined for Experiment 1 (see Figure 1), we predicted that the retrieval of word valence, as well as the mapping of valence and response keys, should become more efficient with practice. Hence, we expected response latencies to drop as a function of practice for both affirmed and negated words up to Block 5. However, the speed of valence reversal, and hence the difference between affirmed and negated trials, should remain constant. For Block 6, we expected that practice in valence-response mapping might to some degree transfer to the new items. However, participants’ performance level in Block 6 should not reach the performance level in Block 5, because of the lack of prior valence activation for the new, unpracticed items. Most important, reversing the valence of words should not profit from training at all and, hence, should require the same amount of time in Block 6 as in all of the previous blocks. As such, the difference in response latencies for affirmed and negated items in Block 6 should be equal to those obtained in Blocks 1–5.
If, contrary to our reasoning, the obtained speedup in responding to affirmed and negated words was due to the strengthening of general procedures, this general skill should be transferable to the new items. In this case, responses to the new negated words should profit from the previous practice, whereas responses to the new affirmed words should not profit at all. As such, latencies for new affirmed words should be much closer to the performance in the early blocks than the latencies for new negated words.

Method

Participants and Design

Thirty-three students of the University of Würzburg (25 women, 8 men) took part in a study purportedly concerned with attention and performance. Participants received €6 (approximately U.S. $5 at that time) as compensation. The experiment consisted of a 2 (word valence: positive vs. negative) × 2 (qualifier: affirmation vs. negation) × 5 (practice block: 1–5) within-subject design. In addition, we included a sixth block in which a new set of words was used.

Procedure

The present experiment lasted about 40 min and was part of a larger set of unrelated studies. The training phase was identical to that of Experiment 1 with one exception. Instead of practicing with the same set of affirmed and negated words in all six blocks, participants practiced with one set of words in Blocks 1–5, and then they were tested with a new set of words in Block 6.

Materials

The same 10 positive and 10 negative words as in Experiment 1 were randomly divided into two sets (Set A and Set B), each containing 5 positive and 5 negative words, which were then used either in Blocks 1–5 or in Block 6. In each block, every word was presented five times with an affirmation and five times with a negation. In one condition, participants practiced with Set A in Blocks 1–5 and received Set B in Block 6; in another condition, participants practiced with Set B in Blocks 1–5, and received Set A in Block 6.
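The block structure and set counterbalancing described in the Materials can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the word labels are placeholders (the actual items are not reproduced here), the function names `assign_sets` and `build_block` are hypothetical, and the random ordering of trials within a block is an assumption rather than something the text specifies.

```python
import random
from collections import Counter

QUALIFIERS = ("affirmation", "negation")


def assign_sets(set_a, set_b, condition):
    """Return (practice_words, transfer_words) for one counterbalancing condition.

    Condition 1 practices Set A in Blocks 1-5 and receives Set B in Block 6;
    condition 2 reverses the assignment.
    """
    if condition == 1:
        return list(set_a), list(set_b)
    if condition == 2:
        return list(set_b), list(set_a)
    raise ValueError("condition must be 1 or 2")


def build_block(words, repetitions=5, rng=None):
    """Build one block: every word appears `repetitions` times per qualifier."""
    rng = rng or random.Random()
    trials = [(q, w) for w in words for q in QUALIFIERS for _ in range(repetitions)]
    rng.shuffle(trials)  # assumption: random trial order within a block
    return trials


# Placeholder labels (hypothetical): 5 positive and 5 negative words per set.
set_a = [f"posA{i}" for i in range(1, 6)] + [f"negA{i}" for i in range(1, 6)]
set_b = [f"posB{i}" for i in range(1, 6)] + [f"negB{i}" for i in range(1, 6)]

practice, transfer = assign_sets(set_a, set_b, condition=1)
blocks = [build_block(practice, rng=random.Random(b)) for b in range(1, 6)]
blocks.append(build_block(transfer, rng=random.Random(6)))  # Block 6: new words

qualifier_counts = Counter(q for q, _ in blocks[0])
```

Under this reading, 10 words presented five times with each of two qualifiers give 100 trials per block, half affirmed and half negated.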

[Figure 3: line graph of RT (ms) by Block, with separate lines for Affirmation, Negation, Affirmation New, and Negation New.]

Figure 3. Response latencies to affirmed and negated words as a function of practice (Experiment 2). Error bars indicate the standard errors of the means. RT = response time.

Results

Incorrect responses (9.3%), anticipations (RT < 300 ms, 0.1%), and the first response in each block were excluded from analyses. Replicating the findings obtained in Experiment 1, response latencies decreased as a negatively accelerated function of practice, reaching an asymptote of learning after Block 4 (see Figure 3). This conclusion is supported by the results of a 2 (word valence) ×

2 (qualifier) × 5 (practice block) ANOVA for repeated measures,² which yielded a significant main effect for block, F(4, 128) = 33.91, p < .001, η² = .51. In addition, a significant main effect of qualifier indicated that response to negated words was generally slower than response to affirmed words, F(1, 32) = 340.67, p < .001, η² = .91. Most important, this effect was independent of practice, as reflected by the nonsignificant interaction of Block × Qualifier, F(4, 128) = 1.61, p = .19, η² = .05.

How did responses to new, unpracticed items profit from previous practice? Inspection of mean values indicates that performance for both affirmed and negated new items was somewhere in between the performances in Blocks 1 and 2 (see Table 2). This observation is supported by contrast analyses showing that latencies in Block 6 differed from all other blocks for both affirmed (all Fs > 4.00, all ps < .06) and negated words (all Fs > 5.40, all ps < .03). Most important, the increase in response latencies from Block 5 to Block 6 was identical for affirmed (M = 6.54%, SD = 7.49%) and negated (M = 6.78%, SD = 6.61%) new words, F(1, 32) = 0.03, p = .86, η² < .01. Likewise, the decrease in response latencies from Block 1 to Block 6 did not differ for affirmed (M = 3.81%, SD = 7.90%) and negated (M = 4.10%, SD = 8.46%) new words, F(1, 32) = 0.04, p = .84, η² < .01. An ANOVA on the cost of negation using block (1–6) as a within-subject factor yielded no significant main effect of block, F(5, 160) = 1.47, p = .20, η² = .04, suggesting that negation speed was unaffected by practice for both trained and untrained items.
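The exclusion rule stated at the start of these Results (incorrect responses, anticipations faster than 300 ms, and the first response in each block) can be written as a simple trial filter. The following Python sketch is illustrative only: the `Trial` record and its field names are hypothetical, because the text does not describe how trial data were stored.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    block: int       # practice block number
    position: int    # 1-based position of the trial within its block
    rt_ms: float     # response latency in milliseconds
    correct: bool    # whether the response was correct


def keep_trial(trial, min_rt_ms=300.0):
    """Return True if a trial survives the three exclusion criteria."""
    if not trial.correct:
        return False          # incorrect response
    if trial.rt_ms < min_rt_ms:
        return False          # anticipation (RT < 300 ms)
    if trial.position == 1:
        return False          # first response in the block
    return True


# Made-up example trials, for illustration only.
raw = [
    Trial(block=1, position=1, rt_ms=950.0, correct=True),   # first in block
    Trial(block=1, position=2, rt_ms=250.0, correct=True),   # anticipation
    Trial(block=1, position=3, rt_ms=1020.0, correct=False), # error
    Trial(block=1, position=4, rt_ms=980.0, correct=True),   # kept
]
clean = [t for t in raw if keep_trial(t)]
```

Only the last of the four example trials passes all three criteria.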

Table 2
Mean Response Latencies, Standard Errors, and Percentages of Error for Responses to Affirmed and Negated Words as a Function of Practice Block, Experiment 2

Qualifier       Block 1    Block 2    Block 3    Block 4    Block 5    Block 6
Affirmation
  M              978.28     909.60     897.51     885.72     872.81     937.25
  SEM             20.45      18.94      18.20      16.96      16.19      18.48
  Error %          8.37       7.09       5.31       4.91       5.93       6.74
Negation
  M            1,120.71   1,032.55   1,014.89     993.00     997.94   1,070.81
  SEM             19.14      20.21      22.86      21.12      20.90      18.48
  Error %         17.91      13.89      12.24       8.80      10.04      13.59

Discussion

The results of Experiment 2 provide further evidence for the memory-based nature of practice effects on negating valence. Replicating the basic findings of Experiment 1, we observed a general speedup in responses to both affirmed and negated words from Block 1 to Block 5. At the same time, however, the difference in response latencies remained constant, indicating that the time required to reverse word valence was unaffected by practice. This pattern corroborates our assumption that the increased accessibility of word valence and stored valence–response associations were responsible for practice effects, whereas the general procedure to negate did not become more efficient by practice. The introduction of a sixth block, consisting of new, unpracticed targets, provided further evidence for this notion. Consistent with our response-latency model (see Figure 1), responses in Block 6 were slightly faster than in Block 1. This transfer effect, however, was identical for affirmed and negated targets. If the general procedure of negating valence had become more efficient through practice, transfer effects should have been stronger for negated than for affirmed targets. The symmetrical nature of the transfer effect, however, indicates that transfer was solely due to stored valence–response associations in memory.

In other words, Experiments 1 and 2 both demonstrated that responding to negated items becomes quicker with practice, which in itself might have been interpreted as evidence for an increase in the efficiency to negate. However, our data on generalization and the constant difference between responses toward affirmed and negated words indicate that the general procedure to negate did not become more efficient with extended practice. Rather, improvements in the skill to evaluate were driven by memory-based, content-specific mechanisms. Experiment 3 further demonstrates how content-based mechanisms can lead to performance levels that strongly resemble the speedup of an abstract procedure.

Experiment 3

As indicated by Experiments 1 and 2, evaluating negated expressions requires the application of the general procedure to negate, and practice did not enhance the efficiency of this procedure. In Experiment 3, we tried to implement conditions that facilitate memory-based automaticity, thus making the general procedure obsolete. Such memory-based automaticity can be expected if practice creates associations between the representation of the stimuli and the correct solution in memory. For instance, the compound term no way is a frequently used expression in everyday English. As such, the meaning of this term may be activated in memory without applying the operation of negation. That is, the compound term may have acquired independent meaning in associative memory, which does not require controlled construal processes upon the perception of the two words. An infrequent negation, however, such as no hay, would still require the application of the negation (see Mayo et al., 2004).

The training conditions in our first two experiments presumably prevented the compounds from acquiring a new meaning. Research by Schneider and Shiffrin (1977) suggested that automatic responding is most likely to occur if a stimulus is consistently paired with the same response. In the previous experiments, however, each word was processed equally often in an affirmed and a negated version, thus requiring directly opposing responses. To facilitate the emergence of memory-based automatic processing of negations, we used a modified practice task in Experiment 3. Different from the previous experiments, each word appeared either in an affirmed or a negated manner during training, but never in both versions. For instance, one group of participants was presented the word party always with a negating qualifier (i.e., no party), but never with an affirming qualifier (i.e., a party). In a second condition, the same word always appeared in an affirmed manner, but never in a negated manner.
This procedural change eradicates the inconsistent pairing of words and valence in memory, which was prevalent in the first two experiments. Consequently, we expected that participants would store the respective compound meanings together with their valence in associative memory. Thus, with increased practice, the compounds should be stored as new concepts in associative memory, and their evaluative meaning should be activated as easily as for affirmed compounds. Drawing on these considerations, we expected the difference between affirmed and negated trials to decrease as a function of practice under the conditions implemented in Experiment 3.

A problem of the proposed setup is, however, that participants may associate a given compound stimulus with the left-hand or the right-hand key without activating the specific valence. For instance, participants may learn that the compound no party always requires pressing the right-hand key. Hence, seeing no party may become associated with the right-hand key instead of negativity. To prevent such stimulus–key associations, the mapping of valence and key was changed from trial to trial. At the beginning of each trial, participants were informed whether they had to press the left (right) key if a positive (negative) expression appeared on the screen. Thereafter, the affirmed or negated word appeared, and participants had to press the appropriate key. This way, a given compound term was always associated with the same valence, but not with the same key.

² Degrees of freedom were adjusted according to Greenhouse-Geisser where appropriate.
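The trial-wise change of the valence–key mapping described above can be sketched in a few lines of Python. This is a minimal illustration, not the software used in the experiment: the function name `make_trial` and the dictionary fields are hypothetical, and drawing the mapping at random on every trial is an assumption (the text only says that the mapping changed from trial to trial).

```python
import random

KEYS = ("left", "right")


def make_trial(compound, valence, rng):
    """Create one trial with its own valence-key mapping.

    The mapping is announced before the compound appears, so the compound
    stays tied to its valence but not to a fixed response key.
    """
    positive_key = rng.choice(KEYS)  # assumption: mapping drawn at random
    negative_key = "right" if positive_key == "left" else "left"
    correct_key = positive_key if valence == "positive" else negative_key
    return {
        "compound": compound,
        "valence": valence,
        "positive_key": positive_key,
        "negative_key": negative_key,
        "correct_key": correct_key,
    }


rng = random.Random(0)
trials = [make_trial("no party", "negative", rng) for _ in range(200)]
keys_used = {t["correct_key"] for t in trials}
```

Across repeated presentations, the same compound keeps the same valence while its correct key varies, which is the property the design relies on to rule out stimulus–key learning.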

Method

Participants and Design

Twenty-one psychology students (18 women, 3 men) of the University of Würzburg took part in the present study, purportedly on concentration. Participants received course credit for their participation. The experiment consisted of a 2 (word valence: positive vs. negative) × 2 (qualifier: affirmation vs. negation) × 6 (practice block: 1–6) within-subject design.

Procedure

The experiment took about 30 min and was conducted in group sessions with up to 3 participants. The procedure was identical to that of Experiment 1 with the following exceptions. To familiarize participants with the alternating valence–key mapping, we included a practice phase of 40 trials, in which participants repeatedly evaluated 10 positive and 10 negative nouns without qualifiers. Participants were instructed to evaluate the words as quickly as possible by pressing one of two keys, and they were informed about the alternating key assignment. Each trial started with the presentation of a warning signal (XXX) in the center of the screen for 200 ms. Immediately afterward, the words positive and negative were presented on the left and the right side of the letter string, indicating the key assignment for the upcoming trial. After 1,000 ms, a positive or negative target word was presented in the center of the screen. The key assignment and the target remained on the screen until participants responded. If participants responded correctly, the next trial started immediately, resulting in a response–stimulus interval of 1,200 ms. For incorrect responses, participants received error feedback (Error!), which remained on the screen for 1,500 ms. If participants did not respond within 2,000 ms, the trial was aborted and a warning message (Try to respond faster!) was displayed for 1,500 ms. Immediately after feedback for errors and slow responses, the next trial started, resulting in a feedback–stimulus interval of 1,200 ms.

The actual training blocks were identical to the practice phase with two exceptions. Specifically, participants were presented compounds of new affirmed and negated words (instead of single words), and they were instructed to respond to the overall valence of the compound (instead of the valence of the single words). The critical learning phase consisted of six blocks. In each of the six blocks, each of the five exemplars of the four types of stimuli (i.e., affirmed positive, affirmed negative, negated positive, negated negative) was presented five times, resulting in a total of 100 trials per block.

Materials

For the practice phase, 10 positive and 10 negative words (see Appendix C) were selected from a standardized list of positive and negative words published by Klauer and Musch (1999). For the critical training blocks, the same 10 positive and 10 negative words as in Experiment 1 were randomly divided into two sets (Set A and Set B), each consisting of 5 positive and 5 negative words. In one experimental condition, all words from Set A were presented in a negated form, and all words from Set B were presented in an affirmed manner. In a second condition, the set assignment was reversed.

[Figure 4: line graph of RT (ms) by Block, with separate lines for Affirmation and Negation.]

Figure 4. Response latencies to affirmed and negated words as a function of practice (Experiment 3). Error bars indicate the standard errors of the means. Note that because of longer response latencies, the scaling differs from Figures 2 and 3. RT = response time.

Table 3
Mean Response Latencies, Standard Errors, and Percentages of Error for Responses to Affirmed and Negated Words as a Function of Practice Block, Experiment 3

Qualifier       Block 1    Block 2    Block 3    Block 4    Block 5    Block 6
Affirmation
  M            1,319.12   1,171.45   1,145.72   1,075.87   1,051.31   1,036.33
  SEM             33.18      34.26      35.60      33.24      36.11      31.15
  Error %          6.82       5.73       5.14       5.06       5.60       5.62
Negation
  M            1,560.96   1,381.27   1,342.38   1,303.24   1,254.20   1,202.34
  SEM             44.28      43.60      44.91      44.53      42.40      36.18
  Error %         10.54       8.97       6.94       8.23       8.07       8.89

Results

Incorrect responses (7.0%), anticipations (RT < 300 ms, 0.04%), and the first reaction in each block were excluded from analyses. Even though the overall response latencies were much longer than in the previous two studies (most likely because of the newly implemented key switching), RTs decreased as a negatively accelerated function of practice (see Figure 4). Different from the previous studies, however, only responses to affirmed stimuli reached asymptotic learning, whereas the responses to negated stimuli continued to become quicker up to the last block. Although responses to negated stimuli again took considerably longer than responses to affirmed stimuli, this difference was reduced by practice. These interpretations are supported by the results of a 2 (word valence) × 2 (qualifier) × 6 (practice block) ANOVA for

repeated measures,³ which yielded a significant main effect for block, F(5, 100) = 51.01, p < .001, η² = .72; a significant main effect of qualifier, F(1, 20) = 113.91, p < .001, η² = .85; and most important, a significant interaction of Qualifier × Block, F(5, 100) = 2.55, p = .032, η² = .11 (see Table 3). Simple contrasts revealed significant learning effects for affirmed stimuli up to Block 4 (except for the contrast between Blocks 2 and 3; all Fs > 11.00, all ps < .03), but not for the last two blocks (all Fs < 2.70, all ps ≥ .1). For negated stimuli, on the other hand, all contrasts (except for the one between Blocks 4 and 5) were significant (all Fs > 4.00, all ps < .051). To further specify the significant interaction, we computed the cost of reversing word valence by subtracting the latencies of affirmed trials from the latencies of negated trials as a function of the six blocks. Contrast analyses revealed a significant decrease in the cost of negation from Blocks 1–6, F(1, 20) = 13.23, p = .002, η² = .40; Blocks 4–6, F(1, 20) = 7.80, p = .01, η² = .28; and Blocks 5–6, F(1, 20) = 4.57, p = .04, η² = .19, and the overall linear contrast was significant too, F(1, 20) = 9.10, p = .007, η² = .31.

As with Experiment 1, word valence influenced response times in several ways. First, negative words (M = 1,259, SD = 150) were evaluated more slowly than positive words (M = 1,215, SD = 159), F(1, 20) = 29.88, p < .001, η² = .60. This main effect of word valence was qualified by an interaction with the qualifier, indicating that negative words were processed slower only when they were affirmed (negative: M = 1,185, SD = 149 vs. positive: M = 1,081, SD = 129), but not when they were negated (negative: M = 1,333, SD = 162 vs. positive: M = 1,349, SD = 208), F(1, 20) = 16.46, p = .001, η² = .45. Finally, the interaction of Valence × Block reached significance, F(5, 100) = 2.81, p = .020, η² = .12, indicating that responses to negative stimuli were slower in Blocks 1, 3, and 5, whereas no difference occurred in the remaining blocks.
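The cost of negation defined above (negated latency minus affirmed latency per block) can be recomputed descriptively from the block means reported in Tables 2 and 3. The Python sketch below does only that: it works on the published group means, not on the participant-level data that entered the ANOVAs and contrasts, so it illustrates the pattern and does not reproduce any test statistic. The names `EXP2`, `EXP3`, and `negation_cost` are introduced here for illustration.

```python
# Block means (ms) copied from Table 2 (Experiment 2) and Table 3 (Experiment 3).
EXP2 = {
    "affirmation": [978.28, 909.60, 897.51, 885.72, 872.81, 937.25],
    "negation": [1120.71, 1032.55, 1014.89, 993.00, 997.94, 1070.81],
}
EXP3 = {
    "affirmation": [1319.12, 1171.45, 1145.72, 1075.87, 1051.31, 1036.33],
    "negation": [1560.96, 1381.27, 1342.38, 1303.24, 1254.20, 1202.34],
}


def negation_cost(means):
    """Per-block cost of negation: negated mean minus affirmed mean (ms)."""
    return [
        round(neg - aff, 2)
        for aff, neg in zip(means["affirmation"], means["negation"])
    ]


cost_exp2 = negation_cost(EXP2)  # Block 6 of Experiment 2 used new words
cost_exp3 = negation_cost(EXP3)

for block, (c2, c3) in enumerate(zip(cost_exp2, cost_exp3), start=1):
    print(f"Block {block}: Exp. 2 cost = {c2:7.2f} ms, Exp. 3 cost = {c3:7.2f} ms")
```

On these means, the cost in Experiment 2 stays within roughly 107 to 142 ms without a clear trend, whereas in Experiment 3 it falls from about 242 ms in Block 1 to about 166 ms in Block 6, in line with the Qualifier × Block interaction reported above.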

Discussion

The results of Experiment 3 indicate that evaluating negated expressions can be driven by memory retrieval instead of the application of the procedure to negate. Different from Experiments 1 and 2, word stimuli in Experiment 3 appeared either in an affirmed or in a negated manner in the training blocks. This way, the storing of the compound stimulus along with its overall valence in associative memory was assumed to be facilitated. The data indeed support this notion. As in the previous experiments, training generally reduced response latencies toward affirmed and negated stimuli. Different from the previous experiments, however, the reduction was not symmetrical for affirmed and negated words. In particular, negated words profited more from the training than affirmed words. This asymmetry was indicated by asymptotic learning curves for affirmed words but not for negated words. Moreover, unlike in the previous experiments, the difference between response latencies to affirmed and negated words decreased as a function of practice. This result suggests that, with extended practice, participants stored the overall valence of the negated expressions in memory and were thus able to retrieve them directly with greater efficiency. As such, the cost of negating a given compound declined as a function of practice.

Even though Experiments 1–3 are consistent with our prediction, one might object that the mental operation to reverse the valence of a word is frequently used in everyday language and thinking. Therefore, the level of efficiency may have already reached a maximum, which cannot be altered by further training. According to this reasoning, the failure to observe a decrease in the time of negating valence may simply be a floor effect, because negating might already be an automatic skill. If this assumption is correct, negation should operate independent from intention (see Shiffrin & Dumais, 1981) and cognitive capacity. In contrast to this interpretation, however, the processing of negations has been shown to put substantial stress on the cognitive system (Lea & Mulligan, 2002; see also Gilbert, 1991). For instance, logical reasoning becomes slower and more prone to error if negations are part of premises or conclusions (e.g., Evans, Newstead, & Byrne, 1993; Wason, 1959).
In a similar vein, psycholinguistic research has indicated that the meaning of sentences containing negations requires the construction of mental models that describe the situation implied by the negation (e.g., Kaup, 2001; Lea & Mulligan, 2002). Finally, a recent study by Mayo et al. (2004) indicated that it is easier to determine whether a given fact (e.g., Tom's clothes are folded neatly in his closet) indicates the presence of a personality trait (e.g., Tom is a tidy person) as opposed to the absence of this trait (e.g., Tom is not a tidy person). However, even though these studies suggested that processing negations is relatively inefficient, they were not conclusive regarding the question of whether the processing of negations can take place independent from intentions. Experiments 4–6 were designed to answer this question more directly.

³ Degrees of freedom were adjusted according to Greenhouse-Geisser where appropriate.

Experiment 4

In Experiments 1 and 2, extensive training did not increase the efficiency of the procedure to negate the valence of a word. If this lack of increase was due to a floor effect (caused by the extensive practice of negations in everyday language processing and reasoning), processing negations should be relatively efficient and independent from intentions. According to the present hypotheses, however, processing negations should be both relatively inefficient and dependent on intentions. Although existing evidence is incompatible with the first assumption (see Gilbert, 1991), there is little evidence addressing the second assumption. In Experiment 4, we tested our prediction by comparing evaluative priming effects of negated or affirmed positive and negative stimuli to deliberate evaluative judgments of the same stimuli.

In evaluative priming paradigms (Fazio et al., 1986), a prime stimulus is presented briefly (usually for less than 300 ms) before the presentation of a target word. The participants' task is to indicate the valence of the target word. The evaluation of the target word is usually facilitated if the valence of the prime and the target are congruent. However, the evaluation of the target word is usually inhibited if the valence of the prime and the target are incongruent. Most important, such evaluative priming effects emerge even though participants are not required to process the valence of the prime stimulus. Thus, if the processing of negations is indeed highly efficient and independent from intentions, negated prime stimuli should not only activate the word valence in memory, but also lead to an immediate reversal of the activated valence. Accordingly, priming effects of positive and negative prime words should differ as a function of whether they are affirmed or negated. That is, affirmed positive and negated negative prime words should lead to positive evaluative priming effects, whereas negated positive and affirmed negative prime words should lead to negative evaluative priming effects. However, if our assumption is correct that the processing of negations requires intention, priming effects of positive and negative prime words should not differ as a function of whether they are affirmed or negated. That is, both affirmed and negated positive prime words should lead to positive evaluative priming effects, whereas both affirmed and negated negative prime words should lead to negative evaluative priming effects.

Whereas most priming paradigms (Fazio et al., 1986; Neely, 1977) use only one prime stimulus, the research question addressed in the present experiment requires the presentation of two primes (i.e., qualifier and concept). Thus, a paradigm capable of capturing the preconstructive effects of two stimuli was required. Balota and Paul (1996) used a sequential priming paradigm to explore the joint operation of multiple primes in a semantic priming task. They expected that if two primes are semantically related to a target (e.g., stripes and cage as primes and tiger as target), a stronger priming effect should occur compared with a situation in which only one or none of the two primes is related to the target (e.g., beans and dance as primes and tiger as target). To test this prediction, Balota and Paul sequentially presented two primes, 133 ms each with a 33-ms interval between the last prime and target onset, resulting in a stimulus onset asynchrony (SOA) of 299 ms. Using this paradigm, Balota and Paul found that responses to target words are fastest if both primes are related to the target and slowest if both primes are unrelated to the target, with conditions in which one prime is related to the target falling in between.

This paradigm also seems suitable for the present purpose of assessing the joint effects of different qualifiers and different target concepts. More precisely, we used either affirming or negating qualifiers as the first of two sequentially presented primes and positive or negative words as the second primes in an affective priming paradigm adapted from Fazio et al. (1986). In addition, we assessed participants' reflective evaluations of affirmed and negated prime stimuli. Although in this evaluative judgment task the presentation of the affirmed and negated words was exactly the same as in the evaluative priming task, it additionally involved the intention to process the valence of the compound term, thus warranting a successful processing of the negation. There were four types of qualifier–word pairings: affirmed positive (e.g., a party), negated positive (e.g., no party), affirmed negative (e.g., a disease), and negated negative (e.g., no disease).
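The timing of the sequential priming paradigm described above (two primes of 133 ms each, followed by a 33-ms interval before target onset) can be checked with a small worked example in Python. The event names and the helper `event_onsets` are hypothetical labels introduced for illustration; the target's duration is set to 0 as a placeholder because only its onset matters for the SOA.

```python
def event_onsets(sequence):
    """Map ordered (event, duration_ms) pairs to onset times, starting at 0 ms."""
    onsets = {}
    elapsed = 0
    for name, duration_ms in sequence:
        onsets[name] = elapsed
        elapsed += duration_ms
    return onsets


# Durations taken from the description of the sequential priming paradigm.
SEQUENCE = [
    ("prime_1_qualifier", 133),
    ("prime_2_word", 133),
    ("blank", 33),
    ("target", 0),  # placeholder duration; only the onset is needed here
]

onsets = event_onsets(SEQUENCE)
soa_ms = onsets["target"] - onsets["prime_1_qualifier"]
print(onsets, soa_ms)
```

The sum 133 + 133 + 33 gives the stated SOA of 299 ms between the onset of the first prime and the onset of the target.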

Method

Participants and Design

Thirty-seven students (25 women, 12 men) of the University of Würzburg took part in a study purportedly concerned with concentration. Participants received €6 as compensation (approximately U.S. $5 at that time). The experiment consisted of a 2 (word valence: positive vs. negative) × 2 (qualifier: affirmation vs. negation) × 2 (measure: evaluative judgment vs. evaluative priming) within-subject design.

Procedure

Practice trials. Participants first practiced the evaluation of the target words without primes. Half of the participants were instructed to press the left key as fast as possible if the word was positive and to press the right key if the word was negative. For the remaining half of participants, the key assignment was reversed. Each target word was presented once, resulting in a total of 20 practice trials. Each trial started with a warning signal (* * *) in the center of the screen for 500 ms, followed by a blank screen for 500 ms. The target word was then presented in the center of the screen in uppercase letters and bright yellow color. As soon as participants pressed the correct key, the reaction was recorded and the next trial started, resulting in a response–stimulus interval of 1,000 ms. If participants pressed the wrong key, appropriate error feedback (e.g., Error! Positive left, negative right) appeared on the screen for 1,000 ms. Then the next trial started, resulting in a feedback–stimulus interval of 1,000 ms.

Evaluative priming task. After the practice trials, participants learned that the following task would be similar to the previous one, with the exception that they would see two additional words in white letters for a brief time before the yellow target words appear on the screen. Participants were told to focus particularly on the yellow words and to ignore the white words. As in the practice trials, the key assignment for categorization responses was varied between participants. Primes and targets were matched randomly for each participant and trial. Each prime combination was presented once with a positive and once with a negative target, resulting in a total of 80 priming trials, representing a 2 (first prime qualifier: affirmation vs. negation) × 2 (second prime valence: positive vs. negative) × 2 (target valence: positive vs. negative) within-subject subdesign.

Priming trials were identical to the practice trials with the following exceptions. After the warning signal, either an affirmation (i.e., a) or negation (i.e., no) term was presented for 133 ms in the center of the screen in white uppercase letters, which was immediately followed by either a positive or negative word for 133 ms, also in white letters. A blank screen then replaced the second prime. After 33 ms the target word appeared on the screen, which was presented in yellow uppercase letters.

Evaluative judgment task. After the priming task, participants were told that they would again see the white words that were used as primes in the previous block and that their task was to judge the valence of these pairs of words on a 5-point rating scale ranging from 1 (very bad) to 5 (very good). They were explicitly asked to take as much time as they wanted to make their judgment. The same 40 qualifier–word pairings as in the priming task were used, resulting in a total of 40 trials for the judgment task. The order of stimulus presentation was randomized for each participant. The procedure for each trial was the same as the priming trials, except that no target words were presented. Instead, the rating scale followed the presentation of each prime combination. Also, because of the usage of a judgment scale instead of a positive–negative decision, error feedback was omitted.
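The trial counts given in the Procedure (40 qualifier–word pairings, 80 priming trials, 40 judgment trials) follow from crossing the design factors, which the following Python sketch makes explicit. The prime word labels are placeholders and all variable names are hypothetical; the random matching of concrete target words to primes is left out, so each priming trial records only the valence of its target.

```python
from collections import Counter
from itertools import product

QUALIFIERS = ("affirmation", "negation")
VALENCES = ("positive", "negative")

# Placeholder prime words (hypothetical labels): 10 positive and 10 negative nouns.
prime_words = [(f"{valence}_prime_{i}", valence) for valence in VALENCES for i in range(1, 11)]

# Every prime word is paired with both qualifiers: 2 x 20 = 40 prime combinations.
prime_pairs = [(q, word, valence) for q, (word, valence) in product(QUALIFIERS, prime_words)]

# Each combination appears once with a positive and once with a negative target.
priming_trials = [
    {"qualifier": q, "prime_word": word, "prime_valence": pv, "target_valence": tv}
    for (q, word, pv) in prime_pairs
    for tv in VALENCES
]

# The judgment task reuses the 40 prime combinations without targets.
judgment_trials = list(prime_pairs)

cell_counts = Counter(
    (t["qualifier"], t["prime_valence"], t["target_valence"]) for t in priming_trials
)
```

Under this crossing, each of the eight cells of the 2 × 2 × 2 subdesign contains 10 priming trials.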


Materials

For this and the following experiment, words were selected from a standardized list of positive and negative words published by Klauer and Musch (1999). To generate prime stimuli, 10 positive and 10 negative nouns were selected on the basis of their evaluative extremity. These 20 nouns were presented together with qualifiers indicating an affirmation or negation (i.e., a, no), resulting in a total of 40 different qualifier (Prime 1) and word (Prime 2) combinations (see Appendix D). In addition to the prime words, we selected 10 positive and 10 negative nouns from Klauer and Musch's (1999) list to be chosen as target words for the evaluative priming task (see Appendix E).

Table 4
Mean Response Latencies, Standard Errors, and Percentages of Error for Responses to Positive and Negative Target Words as a Function of Prime Valence and Qualifier Attached to Prime, Experiment 4

                              Prime valence
Target valence             Positive    Negative
Prime qualifier: Affirmation
Positive
  M                          616.84      639.97
  SEM                         12.86       11.15
  Error %                      1.35        4.80
Negative
  M                          620.73      604.96
  SEM                         12.08       12.96
  Error %                      2.85        1.42
Prime qualifier: Negation
Positive
  M                          605.87      630.11
  SEM                         13.16       12.95
  Error %                      2.15        1.95
Negative
  M                          626.14      607.43
  SEM                         12.20       12.27
  Error %                      2.16        1.89

Results

For the analyses of the evaluative priming data, latencies of incorrect responses (2.3%) and all response latencies higher than 1,000 ms (5.4%) were excluded.⁴ To simplify the comparison between evaluative priming and evaluative judgment data, we calculated positivity indices for each of the four prime combinations (i.e., affirmed positive, affirmed negative, negated positive, negated negative) by subtracting the latencies for positive targets from the latencies for negative targets, given a specific prime combination (for absolute response latencies, see Table 4). The resulting positivity indices of the evaluative priming task, as well as positivity ratings of the evaluative judgment task, were then z transformed, based on the distribution of each measure. These scores were then submitted to a 2 (word valence) × 2 (qualifier) × 2 (measure) ANOVA for repeated measures.

As expected, negations had a differential impact on evaluative judgments as compared with evaluative priming. Whereas negations reversed the valence of words in the evaluative judgment task (see Figure 5, right panel), the positivity index for the evaluative priming task was unaffected by the negations (see Figure 5, left panel). This result is reflected in a highly significant three-way interaction of Word Valence × Qualifier × Measure, F(1, 36) = 182.29, p < .001, η² = .84. To further specify the nature of this interaction, we conducted separate analyses for each measure.

A 2 (word valence) × 2 (qualifier) ANOVA on evaluative judgments revealed a significant main effect of word valence, F(1, 36) = 10.62, p = .002, η² = .29, and more important, a highly significant interaction between valence and qualifier, F(1, 36) = 172.80, p < .001, η² = .83. Simple contrasts indicated that affirmed positive words (M = 4.30, SD = 0.59) were evaluated more positively than affirmed negative words (M = 1.89, SD = 0.50), F(1, 36) = 219.82, p < .001, η² = .86, and that negated positive words (M = 2.21, SD = 0.69) were evaluated more negatively than negated negative words (M = 3.89, SD = 0.71), F(1, 36) = 59.21, p < .001, η² = .62. Moreover, affirmed positive words were evaluated as more positive than negated negative words, F(1, 36) = 7.85, p = .004, η² = .21, and negated positive words were evaluated as less negative than affirmed negative words, F(1, 36) = 7.85, p = .004, η² = .18.

The same ANOVA on positivity indices of the evaluative priming task revealed a significant main effect for word valence, F(1, 36) = 29.62, p < .001, η² = .45, indicating that positive prime words (M = 12.08, SD = 39.45) showed a more positive valence than negative prime words (M = −28.84, SD = 48.15). Most important, this effect was independent of the qualifier, as indicated by a nonsignificant interaction between qualifier and valence (F < 1). Also, the main effect of the qualifier was not significant, F(1, 36) = 2.12, p = .154, η² = .06. Simple contrasts further indicated that affirmed positive words (M = 3.89, SD = 47.49) had a more positive valence than affirmed negative words (M = −35.00, SD = 53.49), F(1, 36) = 14.37, p = .001, η² = .26, and that negated positive words (M = 20.27, SD = 50.96) had a more

⁴ As proposed by Ratcliff (1993), the results of the main analyses were validated with a second analysis, in which the data were trimmed by an inverse transformation of the raw response latencies instead of a cut-off procedure. Analyses with both data sets revealed corresponding results.

DEUTSCH, GAWRONSKI, AND STRACK

396

Positive

40

Evaluative Judgment

30

Priming-Index

20 10 0 -10 -20 -30

Positive

5

Negative

Negative

4 3 2 1

-40 -50

0

Affirmation

Negation

Affirmation

Negation

Figure 5. Mean evaluative priming (left) and evaluative judgment effects (right) as a function of word valence and qualifier (Experiment 4). Higher values indicate more positive valence. Error bars indicate standard errors of the mean.

positive valence than negated negative words (M ⫽ ⫺22.68, SD ⫽ 74.06), F(1, 36) ⫽ 17.12, p ⬍ .001, ␩2 ⫽ .32. The valence of affirmed negative and negated negative words did not differ from each other, F(1, 36) ⫽ 0.76, p ⫽ .39, ␩2 ⫽ .02. The same was true for the valence of affirmed positive and negated positive words F(1, 36) ⫽ 2.85, p ⫽ .10, ␩2 ⫽ .07.
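The screening rule and the positivity index described in these Results can be illustrated with a minimal Python sketch. The trial records, their field names, and the latencies below are invented for illustration only; they are not data from the experiment, and the authors' actual analysis scripts are not known.

```python
from statistics import mean

CUTOFF_MS = 1000  # latencies above this value are dropped, as described in the text


def positivity_index(trials):
    """Mean latency to negative targets minus mean latency to positive targets.

    `trials` is a list of dicts with the hypothetical keys
    'target' ('positive' or 'negative'), 'rt' (ms), and 'correct' (bool).
    """
    # Keep only correct responses at or below the cutoff.
    kept = [t for t in trials if t["correct"] and t["rt"] <= CUTOFF_MS]
    pos = [t["rt"] for t in kept if t["target"] == "positive"]
    neg = [t["rt"] for t in kept if t["target"] == "negative"]
    return mean(neg) - mean(pos)


# Made-up trials for a single prime combination (not study data).
trials = [
    {"target": "positive", "rt": 600, "correct": True},
    {"target": "positive", "rt": 620, "correct": True},
    {"target": "positive", "rt": 1250, "correct": True},  # too slow, dropped
    {"target": "negative", "rt": 640, "correct": True},
    {"target": "negative", "rt": 660, "correct": True},
    {"target": "negative", "rt": 500, "correct": False},  # error, dropped
]

index = positivity_index(trials)
print(index)
```

A positive index means that, after a given prime combination, positive targets were classified faster than negative targets, which is the sense in which higher values indicate more positive valence.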

Discussion The results of Experiment 4 support our assumption that the findings obtained in Experiments 1–3 are due to genuine differences in the processing of affirmations and negations, rather than to a high efficiency level in the processing of negations. Specifically, one could argue that the mental operation to reverse the valence of a word is very frequent in everyday language and thinking, thus leading to floor effects in the time required for processing negations. This assumption is clearly inconsistent with the present findings. In the present study, negations influenced only reflective evaluations in an evaluative judgment task. However, unintentional evaluations obtained in an evaluative priming task (Fazio et al., 1986) were generally unaffected by the respective qualifiers. That is, positive prime words showed a more positive valence than negative prime words irrespective of whether these words were affirmed or negated. This pattern is in contrast to the notion that negations may already be trained to a degree such that further training could not increase their efficiency. If this was the case, negations should not only alter evaluative judgments but also evaluative priming effects. The evaluative judgment task additionally demonstrated that the presentation times of the primes were sufficient to process the two primes. In this task, qualifier and prime valence showed a highly significant interaction effect. Therefore, it can be ruled out that negations were ineffective because they were presented too briefly. There are, however, two possible objections which may question the conclusions drawn from Experiment 4. First, one might argue that the qualifiers (Prime 1) did not affect automatic processing because they were presented much earlier than the prime words (Prime 2). As such, activated representations in memory may have faded away before the target presentation. 
Research by Hermans, De Houwer, and Eelen (2001), for example, indicated that evaluative priming effects strongly depend on the SOA. According to their experiments, priming effects reach their maximum with a prime presentation time of 200 ms and an SOA well below 300 ms. Thus, although Balota and Paul (1996) were successful in showing joint effects of two primes in this paradigm, the timing may be insufficient to show priming effects with abstract qualifiers. Second, the degree of automatization of a particular procedure may depend on the degree of practice (Bargh, 1997). Thus, even though negations are extensively practiced in everyday language, the usual way of processing negations in written language of participants’ mother tongue is to read them from left to right. As such, it is possible that negations can be processed automatically if the respective stimuli are presented in a more common format. Experiment 5 was designed to rule out these two objections by presenting qualifiers and words in a parallel rather than in a sequential manner.

Experiment 5 To prevent qualifiers from being more distant from the targets than the prime words, qualifier–word combinations were presented simultaneously instead of sequentially. The use of compound primes also ensured that the primes were perceived much as they are in everyday reading. Additionally, the SOA and presentation times were chosen so that a maximum priming effect could be expected (see Hermans et al., 2001). As in Experiment 4, we added an evaluative judgment condition to study reflective effects.

Method Participants and Design Thirty-one students (17 women, 14 men) of the University of Würzburg took part in a study purportedly dealing with concentration and attention. Participants received either €6 (approximately U.S. $5 at that time) or course credit as compensation. The experiment consisted of a 2 (word valence: positive vs. negative) × 2 (qualifier: affirmation vs. negation) × 2 (measure: evaluative judgment vs. evaluative priming) within-subject design.

Procedure The stimulus material and procedure were identical to those in Experiment 4 with the following exceptions. Instead of the particular key assignment being varied between participants, the key assignment was now manipulated on a within-subject basis. The order of key assignment was counterbalanced. Because the key assignment was varied within participants, the 40 qualifier–word combinations were used twice as primes with positive and negative targets, resulting in a total of 160 priming trials. More important, qualifiers and words were presented in parallel (rather than sequentially). Each trial started with a warning signal (* * *) in the center of the screen for 500 ms. After a blank screen for 200 ms, the prime words were displayed for 200 ms. Immediately afterward, the target words appeared on the screen, resulting in an SOA of 200 ms. Because of the slightly reduced interval between the warning and prime presentation, the feedback-stimulus and response-stimulus intervals were only 700 ms. Instructions for the task were adapted accordingly. The evaluative judgment task was identical to Experiment 4, except that the presentation of stimuli was adapted to the parallel priming procedure.
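The trial count and timing given in this Procedure can be checked with a short Python sketch. It assumes that the 160 trials arise from crossing the 40 qualifier–word primes with two target valences and two key assignments; this reading, the placeholder word labels, and the mapping names are assumptions made for illustration, not details confirmed by the text.

```python
from itertools import product

# Placeholder labels; the actual stimuli are listed in the article's appendices.
qualifiers = ["a", "no"]
words = [f"word_{i:02d}" for i in range(20)]   # stands in for 10 positive + 10 negative nouns
target_valences = ["positive", "negative"]
key_assignments = ["mapping_A", "mapping_B"]   # varied within participants (names assumed)

# Full crossing of primes, target valences, and key assignments.
trials = [
    {"prime": f"{q} {w}", "target_valence": tv, "keys": k}
    for q, w, tv, k in product(qualifiers, words, target_valences, key_assignments)
]

# Durations in ms as stated in the Procedure.
WARNING_MS, BLANK_MS, PRIME_MS = 500, 200, 200
# The target follows the prime immediately, so the SOA equals the prime duration.
soa_ms = PRIME_MS

print(len(trials), soa_ms)
```

Under these assumptions the crossing yields 2 × 20 × 2 × 2 = 160 trials, matching the total reported above.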

Results For the analyses of the evaluative priming data, latencies of trials in which participants incorrectly classified the target (3.9%) and all response latencies greater than 1,000 ms (8.0%) were excluded.⁵ Indices of positivity were calculated according to the procedure described in Experiment 4 (for absolute response latencies, see Table 5). The resulting positivity indices of the evaluative priming task as well as positivity ratings of the evaluative judgment task were then z transformed, based on the distribution of each measure. These scores were then submitted to a 2 (word valence) × 2 (qualifier) × 2 (measure) ANOVA for repeated measures. Consistent with our predictions, evaluative judgments and evaluative priming effects were differentially affected by the qualifiers. This result is reflected in a highly significant three-way interaction of Word Valence × Qualifier × Type of Measure, F(1, 30) = 751.48, p < .001, η² = .96 (see Figure 6). To further specify the nature of this interaction, we conducted separate analyses for each measure.

Table 5
Mean Response Latencies, Standard Errors, and Percentages of Error for Responses to Positive and Negative Target Words as a Function of Prime Valence and Qualifier Attached to Prime, Experiment 5

                                Prime valence
Target valence                  Positive    Negative
Prime qualifier: Affirmation
  Positive
    M                           616.11      626.70
    SEM                          10.91       10.02
    Error %                       2.90        3.75
  Negative
    M                           637.45      621.02
    SEM                          12.49       10.50
    Error %                       4.45        4.75
Prime qualifier: Negation
  Positive
    M                           615.40      633.41
    SEM                          10.37       11.44
    Error %                       2.97        4.64
  Negative
    M                           637.29      636.94
    SEM                          12.12       11.17
    Error %                       4.02        3.32

Replicating the results of Experiment 4, a 2 (word valence) × 2 (qualifier) ANOVA on evaluative judgments revealed a significant main effect of word valence, F(1, 30) = 22.84, p < .001, η² = .43; a significant main effect of the qualifier, F(1, 30) = 33.93, p < .001, η² = .53; and more important, a highly significant interaction of Word Valence × Qualifier, F(1, 30) = 917.33, p < .001, η² = .97. Simple contrasts indicate that participants evaluated affirmed positive words (M = 4.59, SD = 0.29) more positively than affirmed negative words (M = 1.85, SD = 0.29), F(1, 30) = 1171.65, p < .001, η² = .97. Conversely, negated negative words (M = 4.00, SD = 0.45) were evaluated more positively than negated positive words (M = 1.76, SD = 0.30), F(1, 30) = 404.21, p < .001, η² = .93. Even though negated negative words were seen as less positive than affirmed positive words, F(1, 30) = 48.02, p < .001, η² = .66, negated positive words were rated equally negative as affirmed negative words, F(1, 30) = 1.65, p = .21, η² = .05. The same ANOVA on positivity indices of the evaluative priming task revealed a significant main effect only for word valence, F(1, 30) = 12.66, p = .001, η² = .30, indicating that positive words (M = 21.61, SD = 29.58) showed a more positive valence than negative words (M = −1.07, SD = 32.99). Most important, this effect was again independent of the qualifier, as indicated by a nonsignificant interaction of Qualifier × Valence (F < 1). Simple contrasts further revealed that negated positive primes (M = 21.88, SD = 38.89) tended to have a more positive valence than negated negative primes (M = 3.53, SD = 46.82), F(1, 30) = 3.12, p = .09, η² = .09. Correspondingly, affirmed positive primes (M = 21.34, SD = 31.80) had a more positive valence than affirmed negative primes (M = −5.68, SD = 35.11), F(1, 30) = 12.70, p = .001, η² = .30. In addition, affirmed positive words showed a more positive valence than negated negative words, F(1, 30) = 4.19, p < .05, η² = .12, and affirmed negative words showed a less positive valence than negated positive words, F(1, 30) = 14.64, p = .001, η² = .33.
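As a worked illustration of how the positivity indices relate to the absolute latencies, the following Python sketch recomputes the four indices from the cell means printed in Table 5. The assignment of each mean to a prime-by-target cell is an assumption inferred from the extracted table layout; it is supported by the fact that the resulting differences agree with the index means reported above (21.34, −5.68, 21.88, and 3.53) to within rounding.

```python
# Mean latencies (ms) from Table 5, keyed by
# (prime qualifier, prime valence, target valence). Cell assignment is assumed.
cell_means = {
    ("affirmation", "positive", "positive"): 616.11,
    ("affirmation", "negative", "positive"): 626.70,
    ("affirmation", "positive", "negative"): 637.45,
    ("affirmation", "negative", "negative"): 621.02,
    ("negation", "positive", "positive"): 615.40,
    ("negation", "negative", "positive"): 633.41,
    ("negation", "positive", "negative"): 637.29,
    ("negation", "negative", "negative"): 636.94,
}


def positivity_index(qualifier, prime_valence):
    """Latency to negative targets minus latency to positive targets for one prime type."""
    neg = cell_means[(qualifier, prime_valence, "negative")]
    pos = cell_means[(qualifier, prime_valence, "positive")]
    return round(neg - pos, 2)


indices = {
    (q, v): positivity_index(q, v)
    for q in ("affirmation", "negation")
    for v in ("positive", "negative")
}

for condition, value in indices.items():
    print(condition, value)
```

In both qualifier conditions the index is higher for positive than for negative prime words, which is the pattern summarized by the main effect of word valence and the absent Qualifier × Valence interaction.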

⁵ As with Experiment 4, the results of the main analyses were validated with a second analysis, in which the data were trimmed by an inverse transformation of the raw response latencies instead of a cut-off procedure (cf. Ratcliff, 1993). Analyses with both data sets revealed corresponding results.

[Figure 6: two panels, Priming-Index (left) and Evaluative Judgment (right); x-axis: Affirmation vs. Negation; separate series for Positive and Negative words.]

Figure 6. Mean evaluative priming (left) and evaluative judgment effects (right) as a function of word valence and qualifier (Experiment 5). Higher values indicate more positive valence. Error bars indicate standard errors of the mean.

Discussion Experiment 5 confirms our assumption that the results of Experiment 4 are due to genuine effects related to the processing of negations, rather than to contingent features of the stimulus presentation. Specifically, Experiment 5 aimed to rule out the objection that the ineffectiveness of negations in the priming task of Experiment 4 was due to the unfamiliar presentation of negations and their temporal distance to the target. In the present study, qualifiers and positive and negative prime words were presented simultaneously at presentation times and SOAs that are optimal for automatic evaluative priming effects. Replicating the results of Experiment 4, negations only influenced responses in the evaluative judgment task. However, unintentional evaluations in the evaluative priming task (Fazio et al., 1986) were generally unaffected by relevant qualifiers. In this task, positive prime words showed a more positive valence than negative prime words, irrespective of whether these words were affirmed or negated. As such, it is quite unlikely that the ineffectiveness of negations in the two priming studies was caused by a lack of familiarity with the particular kind of presentation (i.e., parallel vs. sequential) or by the temporal distance between qualifier and target in sequential presentations. The next experiment was devised to explore how the familiarity of specific negations influences evaluative priming effects.

Experiment 6 Experiments 4 and 5 suggest that, in line with our main hypothesis, processing time and intention are necessary prerequisites for the general procedure to negate valence. Experiment 6 was designed to further illustrate the role of instance learning in the negation of valence. Paralleling Experiment 3, we sought to establish conditions under which memory-based mechanisms would strongly resemble the output of the original procedure to negate. Theories of memory-based automaticity predict that the meaning of specific negations can be stored in memory through frequent practice. Consequently, if a highly practiced negation is perceived later, the compound meaning implied by the negation will be activated automatically and will thereby influence further processing. If this reasoning is correct, the immediate effects of highly trained negations should differ considerably from those of untrained, novel negations. Experiment 6 tested this assumption by comparing the evaluative priming effects of negations that are frequent in everyday language (e.g., no luck) with those elicited by rare negations (e.g., no cockroach). As in the previous experiments, participants also judged the valence of the stimuli used as primes.

Method Participants and Design Fifty-five students (38 women, 17 men) of the University of Würzburg took part in an experiment purportedly concerned with concentration. Participants received €6 as compensation (approximately U.S. $5 at that time). The experiment consisted of a 2 (word valence: positive vs. negative) × 2 (frequency: frequent vs. rare) × 2 (measure: evaluative judgment vs. evaluative priming) within-subject design. In contrast to the previous experiments, only negated stimuli were used for the analyses in Experiment 6 (see below).

Procedure The procedure and instructions were the same as in Experiment 5 except for the following deviations. First, the key assignment was manipulated between rather than within participants. Second, participants were primed twice with each of the frequent and rare negations, resulting in a total of 80 priming trials. In addition, affirmed versions of the stimuli were entered as filler stimuli to keep the overall structure of the materials comparable with Experiments 4 and 5. This added another 80 trials. However, because the frequency and valence data were obtained for the negated forms only, affirmations were generally excluded from analyses. Third, some new target words were used, because the previous set contained words identical to the selected rare and frequent negations. The procedure for the evaluative judgment task was identical to that of Experiment 5, except for the variations in the stimuli.

Materials To identify frequent and rare negations of positive and negative words, we selected 53 positive and 53 negative words on rational grounds (see Experiment 1). Seventy-one psychology students evaluated these negations with respect to their frequency and their valence. On the basis of these data, 40 negations were chosen, 10 for each of the following categories: frequent negations of positive words, frequent negations of negative words, rare negations of positive words, and rare negations of negative words (see Appendix F for the words and Appendix H for pretest data). In addition to the prime words, we selected 10 positive and 10 negative nouns from Klauer and Musch’s (1999) list to be chosen as target words for the evaluative priming task (see Appendix G).

Results For the analyses of the evaluative priming data, latencies of trials in which participants incorrectly classified the target (3.8%) and all response latencies greater than 1,000 ms (5.9%) were excluded from analyses.⁶ Indices of positivity were calculated according to the procedure described in Experiment 4 (for absolute response latencies, see Table 6).

⁶ As with Experiments 4 and 5, the results of the main analyses were validated with a second analysis, in which the data were trimmed by an inverse transformation of the raw response latencies instead of a cut-off procedure (cf. Ratcliff, 1993). Analyses with both data sets revealed corresponding results.

Table 6
Mean Response Latencies, Standard Errors, and Percentages of Error for Responses to Positive and Negative Target Words as a Function of Prime Valence and Qualifier Attached to Prime, Experiment 6

                                Prime valence
Target valence                  Positive    Negative
Frequent negations
  Positive
    M                           605.84      597.94
    SEM                           8.81        8.92
    Error %                       5.32        3.64
  Negative
    M                           596.07      607.94
    SEM                           8.16        8.59
    Error %                       3.97        4.41
Rare negations
  Positive
    M                           599.40      615.23
    SEM                           8.96        8.65
    Error %                       2.61        4.95
  Negative
    M                           607.52      607.21
    SEM                           8.65        9.81
    Error %                       3.72        2.91

Consistent with our predictions, a 2 (word valence) × 2 (frequency) × 2 (measure) ANOVA revealed a highly significant three-way interaction, F(1, 54) = 21.07, p < .001, η² = .28 (see Figure 7). To further specify the nature of this interaction, we conducted separate analyses for each measure.

A 2 (frequency) × 2 (word valence) ANOVA on evaluative judgments revealed a significant main effect of valence, indicating that participants evaluated negated negative words more positively than negated positive words, F(1, 54) = 1,364.67, p < .001, η² = .96. In addition, frequent negations were evaluated less positively than rare negations, F(1, 54) = 8.47, p = .005, η² = .14. Moreover, frequency and word valence revealed a significant interaction, F(1, 54) = 73.59, p < .001, η² = .58, showing that frequent negations of positive words (M = 1.50, SD = 0.36) were evaluated more negatively than frequent negations of negative words (M = 4.46, SD = 0.31), F(1, 54) = 1,574.85, p < .001, η² = .97. Similarly, rare negations of positive words (M = 1.97, SD = 0.36) were evaluated more negatively than rare negations of negative words (M = 4.21, SD = 0.42), F(1, 54) = 629.74, p < .001, η² = .92. Moreover, negated negative words were evaluated more positively when they were frequent rather than rare negations, F(1, 54) = 14.28, p < .001, η² = .21, whereas negated positive words were evaluated more negatively when the negations were frequent rather than rare negations, F(1, 54) = 108.36, p < .001, η² = .67.

The same ANOVA on positivity indices of the evaluative priming task revealed a significant interaction between frequency and valence, F(1, 54) = 9.80, p = .003, η² = .15. As expected, the valence of the prime words was generally unaffected by rare negations, whereas it was reversed for frequent negations. Neither the main effect of word valence nor the main effect of frequency was significant (both Fs < 1). Further inspection revealed that for negations low in frequency, negated positive words (M = 8.12, SD = 49.14) tended to show a more positive valence than negated negative words (M = −8.02, SD = 48.91), F(1, 54) = 3.47, p = .07, η² = .15. For negations high in frequency, in contrast, negated negative words (M = 9.90, SD = 43.48) showed a more positive valence than negated positive words (M = −9.68, SD = 57.23), F(1, 54) = 5.29, p = .03, η² = .09. In addition, frequently negated negative words showed a more positive valence than rarely negated negative words, F(1, 54) = 4.96, p = .03, η² = .08, whereas frequently negated positive words tended to show a more negative valence than rarely negated positive words, F(1, 54) = 3.21, p = .08, η² = .06. No other contrast was statistically significant (F < 1).
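For readers who wish to relate the reported effect sizes to the F ratios, the following Python sketch assumes that η² here denotes partial eta squared, which for a single effect can be recovered as F × df_effect / (F × df_effect + df_error). The text does not state which variant was computed, so this is an assumption; it is used only because it reproduces the three values checked below.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df_effect, df_error, reported eta squared) taken from the Experiment 6 Results.
reported = [
    (21.07, 1, 54, 0.28),  # three-way interaction
    (73.59, 1, 54, 0.58),  # Frequency x Word Valence, evaluative judgments
    (9.80, 1, 54, 0.15),   # Frequency x Valence, evaluative priming
]

recomputed = [round(partial_eta_squared(f, d1, d2), 2) for f, d1, d2, _ in reported]

for (f, d1, d2, eta), value in zip(reported, recomputed):
    print(f"F({d1}, {d2}) = {f}: reported {eta}, recomputed {value}")
```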

[Figure 7: two panels, Priming-Index (left) and Evaluative Judgment (right); x-axis: Rare vs. Frequent; separate series for Positive and Negative words.]

Figure 7. Mean evaluative priming (left) and evaluative judgment effects (right) as a function of word valence and frequency of the negation (Experiment 6). Higher values indicate more positive valence. Note that all stimuli were presented in negated form. Error bars indicate standard errors of the mean.

Discussion Results from Experiment 6 further corroborate our assumption that automatization of negations is due to instance learning, rather than to an automatization of the general procedure to negate. In the present study, evaluative judgments were qualitatively unaffected by the frequency of negations. For both frequent and rare negations, negated negative words were evaluated more positively than negated positive words. However, this pattern of results was different for evaluative priming effects. For rare negations, the qualifier did not alter the word valence, such that negated positive words showed a more positive valence than negated negative words. For frequent negations, however, the qualifier did indeed alter word valence, such that negated positive words showed a less positive valence than negated negative words. These findings, together with the results of Experiments 4 and 5, indicate that with extended practice, the cognitive skill of evaluating negated expressions can become very efficient and independent of intentions. This automatic skill, however, is not due to enhanced efficiency of the procedure of negating valence. Rather, our data suggest that the automatic skill is based on the retrieval of highly practiced instances from memory. Notwithstanding these findings, however, it is important to note that we did not manipulate the frequency of negations experimentally; rather, it was based on a pretest. Thus, participants in the pretest could have based their judgments of frequency on the perceived ease with which the meaning of the negation can be extracted. Importantly, ease of extracting the meaning of the negation could be influenced by factors other than frequency. As such, the conclusions drawn from Experiment 6 should be treated as preliminary. Future research should further establish their validity.

General Discussion

The goal of the present research was to investigate the cognitive mechanisms underlying automatic and controlled social–cognitive skills. On the basis of theories of automatization (e.g., Logan, 1988) and dual-systems models in social psychology (e.g., Lieberman et al., 2002; E. R. Smith & DeCoster, 2000; Strack & Deutsch, 2004), we argued that, if a social–cognitive skill becomes automatic through practice, the increased efficiency is caused by a shift from rule-based to association-based processing during automatization. At the same time, however, genuine control processes remain unaffected by practice. More specifically, we claimed that the skill to evaluate affirmed and negated expressions consists of a memory-based, associative component (i.e., the activation of word valence) and a reflective, rule-based component (i.e., the reversal of the retrieved valence in the case of a negated word). We predicted that practicing this skill would increase only the efficiency of the retrieval, not the efficiency of reversing the word’s valence. This prediction is supported by the findings of Experiments 1 and 2. Practicing to evaluate affirmed and negated words resulted in a speedup of responses in general. At the same time, however, the difference in response latencies to affirmed and negated words remained constant. Given that responses to affirmed and negated words did not differ in any aspect but the negation, the difference in response latencies can be interpreted as an estimate of the time required to apply the negation (Donders, 1969). Taken together, these results suggest that practice effects in the context of negations are primarily based on the enhanced accessibility of correct responses in memory, whereas genuine control processes remain unaffected by practice.

We further hypothesized that the associative, content-based component of the skill to evaluate affirmed and negated words is executed unintentionally, whereas the reflective, rule-based part depends on intention. Experiments 4 and 5 found that negations did not alter evaluative priming effects of positive and negative words. However, the same negations had a strong impact on evaluative judgments. In particular, positive prime words showed greater automatic positivity than negative prime words, irrespective of whether they were affirmed or negated. This finding also proved to be robust against variations in the priming paradigm. Finally, we argued that associative mechanisms can substitute for reflective mechanisms that underlie a social–cognitive skill. This conclusion is supported by the results of Experiments 3 and 6. In Experiment 3, enhanced practice reduced the difference between response latencies to affirmed and negated trials under conditions that facilitate instance learning. We interpret this finding as evidence that participants stored the correct response to a negated expression in memory. This interpretation is confirmed by the results of Experiment 6, which investigated evaluative priming effects for frequent and rare negations. In this study, frequently negated negative words exhibited a more positive automatic valence than frequently negated positive words. However, evaluative priming effects of rarely negated words depended exclusively on their valence, irrespective of whether these words were affirmed or negated. This result corroborates our assumption that the cognitive skill to negate valence can be performed automatically only under specific conditions, namely when specific instances are stored in associative memory.

Implications for Research on Training Effects

The present results qualify the conclusions commonly drawn from previous research on how social–cognitive skills are affected by practice (e.g., E. R. Smith, 1989; E. R. Smith et al., 1988; E. R. Smith & Lerner, 1986; Rüter & Mussweiler, 2004). Particularly, some researchers argued that practice not only establishes memory-based automaticity, but also makes general rules or procedures more efficient. Our findings suggest, however, that at least some general procedures are not or only very little affected by practice. One potential reason for the diverging results lies in the different methods used to estimate procedure-based and memory-based components of the cognitive skill. Whereas previous research relied on the degree of generalization to new instances to estimate rule-based components, we additionally estimated these components from differences between response latencies to affirmed and negated trials. As outlined above, the degree of generalization can be an ambiguous indicator if there is semantic overlap between training and transfer materials. Our method of estimating the procedural component of negating valence excluded such semantic overlaps. In our studies, the abstract procedure of negating valence was repeatedly applied to a limited set of words, and performance on this negation task was compared with participants’ performance on a conceptually corresponding affirmation task. Because responses to affirmed and negated trials should be equally affected by the accessibility of the respective concepts as well as by potential semantic overlaps, these two factors can be ruled out as alternative explanations in the present studies. It is important to note, however, that we found some evidence for generalization in Experiment 2. In this study, participants’ performance with new words was slightly improved as compared with their performance without training.
On the surface, these results seem to suggest that the procedure to negate has become more efficient independent of the specific content. Our analysis of the difference in response latencies, however, indicates that this was not the case. In fact, the time necessary to reverse the word valence was the same as in the beginning of the training. If the procedure of negating valence had indeed become quicker, generalization effects should have been larger for negated than for affirmed trials. This, however, was not the case. We therefore conclude that the generalization effects obtained in our study were due to the training of valence-response key mapping, rather than to enhanced efficiency in the procedure of negating valence. There is, however, another possible cause of the fact that we found no indication of rule strengthening while other researchers did. Particularly, it is conceivable that some general procedures are less susceptible to training effects than other procedures. For instance, we consider it possible that the procedure of generating trait-to-behavior inferences (e.g., E. R. Smith et al., 1988) can be automatized, whereas negating cannot. Our theoretical analysis suggests that little or no improvement through training can be expected for those parts of a skill which require genuine control functions, such as action planning, overriding of unwanted habits, and the flexible maintenance and integration of multimodal information (Miller & Cohen, 2001; Hummel & Holyoak, 2003). These control functions are presumably part of a number of social–cognitive processes, such as stereotype control, social comparisons, and complex attributions, but also motivated behavior and problem solving. To the degree that cognitive negations are representative of genuine control functions, one can expect the present results to be informative about how other social–cognitive skills respond to practice. We assume that negations are a good model for flexible, symbol-based processing but that the inference to other control functions like impulse control or planning is less certain. Clearly, future research will be needed to further bolster such inferences.

Implications for Research on Automaticity

The present results also have important implications for research on automaticity in social psychology. This is particularly the case for studies on complex social–cognitive skills, such as motivated behavior (Bargh & Barndollar, 1996) or problem solving (Dijksterhuis, 2004). In these studies, the respective skills most likely involve a conglomerate of both controlled components (e.g., symbolic representations, flexible response selection) and memory-based components (e.g., retrieval of semantic contents from memory). Thus, it is possible that automatic variants of complex social–cognitive skills are partially based on different representations and computations from their controlled variants. Take, for instance, the case in which the same goal is pursued repeatedly in the same situation. In the beginning, genuine control processes may have governed the behavior, compiling new sequences of behavior using abstract symbolic representations. With extended practice, new associative structures in memory may emerge, linking perceptual and motor representations. However, these new associative structures will not be able to circumvent obstacles, which may unexpectedly inhibit successful goal pursuit. In such cases, controlled processes would have to be set in motion to fulfill this function (see Lieberman et al., 2002). Drawing on these considerations, it seems desirable to directly investigate the underlying representations and processes when studying automatic processes in social cognition. The main challenge in this endeavor is to find reliable methods that can distinguish between abstract procedures and associative look-alikes (e.g., Conrey, Sherman, Gawronski, Hugenberg, & Groom, 2005). So far, there have been only a few attempts to make similar distinctions in the realm of goal pursuit or other complex skills (see Chartrand & Bargh, 2002). For instance, Bargh, Gollwitzer, Lee-Chai, Barndollar, and Troetschel (2001) specified features specific to controlled goal pursuit, such as an increase of goal strength over time, persistence in the face of obstacles, and the resumption of goal pursuit after interruption. In a series of studies, they demonstrated that these features were also observable if goals were primed instead of conveyed by instructions. Whereas Bargh et al.’s (2001) study suggested that automatic and controlled goal pursuit are mediated by the same mechanisms, a recent study by Dijksterhuis (2004) suggested that unconscious thinking has very different qualities from conscious thought. Particularly, when confronted with complex decision problems, participants made better decisions if they were distracted from engaging in conscious thought than if they were not distracted. The author concluded that unconscious thought leads to clearer, more polarized, and more integrated representations in memory. What might be the reasons for such diverging evidence? We argue that studies on preexisting skills will often be inconclusive regarding the representations and computations underlying these skills. Associative, content-based simulations of control processes can be very powerful, but they may nevertheless lack important qualities of the controlled process, such as flexibility and generality. Moreover, even if typical features of control are observed with preexisting skills, it could well be the case that the participants are responding based on a differentiated associative structure. As such, it seems desirable to develop training paradigms that can supplement experiments based on preexisting skills.

Implications for Social–Cognitive Phenomena

The present findings also provide a new perspective on previous research on automatic stereotype activation. Kawakami et al. (2000), for example, demonstrated that long-term training in the negation of social stereotypes can reduce the subsequent activation of these stereotypes. From a general perspective, these findings could be due to either (a) an improvement of the general procedure to inhibit automatic stereotypes or (b) a storage of negated instances in associative memory. Even though Kawakami et al.’s (2000) data are ambiguous with regard to these explanations, our findings clearly support the latter account. However, they are inconsistent with the former explanation. Specifically, the present results suggest that negation training should lead to a reduction in automatic stereotype activation only if the trained instances are stored in associative memory. Most important, this mechanism implies that negation training for a specific stereotype should not generalize to other stereotypes (unless these stereotypes are semantically related). For instance, enhanced practice in the negation of gender stereotypes may lead to a reduction in the automatic activation of gender stereotypical associations. However, the same training should leave the automatic activation of stereotypes about Black people unaffected. Similar considerations apply to several other social–cognitive phenomena that involve an important role of negations. With regard to persuasion, for example, one could argue that persuasive attempts containing negated terms may lead to unintended attitude changes in the opposite direction (e.g., Christie et al., 2001; Jung Grant et al., 2004; Skurnik et al., 2005), unless the negated proposition is stored as an independent instance in memory. The same argument could be made for behavior-to-trait inferences, such that perceivers may readily infer the absence of traits from behaviors when the absence of a given trait (e.g., not friendly) is stored as an independent unit in memory (e.g., Hasson et al., 2005; Mayo et al., 2004). Similar conclusions can be drawn for many other social–cognitive phenomena that involve negations, such as innuendo effects (Wegner et al., 1981), attitude change (Petty et al., 2006), perseverance effects (C. A. Anderson, 1982; Walster et al., 1967; Wyer & Unverzagt, 1985), or counterfactual thinking (Roese, 1994). The crucial aspect in all these applications is that negations may lead to ironic or unintended effects, unless the meaning of a negated proposition is stored independently in associative memory.

Conclusion

The main goal of the present research was to investigate whether automatic social–cognitive skills are based on the same representations and processes as their controlled counterparts. Specifically, our experiments were designed to estimate the relative contributions of associative, content-based, and procedural, rule-based components in the processing of negations. Our findings suggest that the procedural, rule-based component of negations is unaffected by increased practice, whereas the associative, content-based component is strongly influenced by training. Generally, these results suggest that practice-related skill improvements are limited to conditions in which a general procedure can be substituted by storing the results of previous applications in associative memory. With extended practice, associative substitutes can be very powerful, and only few experimental paradigms may be able to distinguish them from their controlled counterparts. Although such an analysis is highly feasible within the negation paradigm, it might be harder to do for other social–cognitive skills, such as person perception, goal pursuit, or social comparison. Yet, we conceive this endeavor as the next important step in research on automatic social cognition.

References

Amodio, D. M., Harmon-Jones, E., Devine, P. G., Curtin, J. J., Hartley, S. L., & Covert, A. E. (2004). Neural signals for the detection of unintentional race bias. Psychological Science, 15, 88–93.

Anderson, C. A. (1982). Inoculation and counterexplanation: Debiasing techniques in the perseverance of social theories. Social Cognition, 1, 126–139.

Anderson, J. R. (1993). Rules of the mind. Hillsdale, NJ: Erlbaum.

Anderson, J. R., Fincham, J. M., & Douglass, S. (1997). The role of examples and rules in the acquisition of a cognitive skill. Journal of Experimental Psychology: Learning, Memory, and Cognition, 23, 932–945.

Balota, D. A., & Paul, S. T. (1996). Summation of activation: Evidence from multiple primes that converge and diverge within semantic memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 827–845.

Bargh, J. A. (1994). The four horsemen of automaticity: Awareness, intention, efficiency, and control in social cognition. In R. S. Wyer & T. K. Srull (Eds.), Handbook of social cognition (Vol. 1, pp. 1–40). Hillsdale, NJ: Erlbaum.

Bargh, J. A. (1997). The automaticity of everyday life. In R. S. Wyer, Jr. (Ed.), Advances in social cognition (Vol. 10, pp. 1–61). Mahwah, NJ: Erlbaum.

Bargh, J. A. (2004). Bypassing the will: Towards demystifying nonconscious control of social behavior. In R. R. Hassin, J. S. Uleman, & J. A. Bargh (Eds.), The new unconscious (pp. 37–58). Oxford, England: Oxford University Press.

Bargh, J. A., & Barndollar, K. (1996). Automaticity in action: The unconscious as repository of chronic goals and motives. In P. Gollwitzer & J. A. Bargh (Eds.), The psychology of action (pp. 457–481). New York: Guilford.

Bargh, J. A., Gollwitzer, P. M., Lee-Chai, A. Y., Barndollar, K., & Troetschel, R. (2001). The automated will: Nonconscious activation and pursuit of behavioral goals. Journal of Personality and Social Psychology, 81, 1014–1027.

Botvinick, M. M., Braver, T. S., Barch, D. M., Carter, C. S., & Cohen, J. D. (2001). Conflict monitoring and cognitive control. Psychological Review, 108, 624–652.

Chartrand, T. L., & Bargh, J. A. (2002). Nonconscious motivations: Their activation, operation, and consequences. In A. Tesser, D. Stapel, & J. Wood (Eds.), Self and motivation: Emerging psychological perspectives (pp. 13–41). Washington, DC: American Psychological Association.

Christie, J., Kozup, J. C., Smith, S., Fisher, D., Burton, S., & Creyer, E. (2001). The effects of bar sponsored alcohol beverage promotions across binge and non-binge drinkers. Journal of Public Policy and Marketing, 20, 240–253.

Clark, H. H., & Chase, W. G. (1974). Perceptual coding strategies in the formation and verification of descriptions. Memory and Cognition, 2, 101–111.

Conrey, F. R., Sherman, J. W., Gawronski, B., Hugenberg, K., & Groom, C. (2005). Separating multiple processes in implicit social cognition: The quad model of implicit task performance. Journal of Personality and Social Psychology, 89, 469–487.

Devine, P. G. (1989). Stereotypes and prejudice: Their automatic and controlled components. Journal of Personality and Social Psychology, 56, 5–18.

Dijksterhuis, A. (2004). Think different: The merits of unconscious thought in preference development and decision making. Journal of Personality and Social Psychology, 87, 586–598.

Donders, F. C. (1969). On the speed of mental processes. Acta Psychologica, 30, 412–431.

Evans, J. St.B. T., Newstead, S. E., & Byrne, R. M. J. (1993). Human reasoning: The psychology of deduction. London: Erlbaum.

Fazio, R. H. (1995). Attitudes as object-evaluation associations: Determinants, consequences, and correlates of attitude accessibility. In R. E. Petty & J. A. Krosnick (Eds.), Attitude strength (pp. 247–282). Mahwah, NJ: Erlbaum.

Fazio, R. H., Sanbonmatsu, D. M., Powell, M. C., & Kardes, F. R. (1986). On the automatic activation of attitudes. Journal of Personality and Social Psychology, 50, 229–238.

Gilbert, D. T. (1991). How mental systems believe. American Psychologist, 46, 107–119.

Gough, P. B. (1965). Grammatical transformations and speed of understanding. Journal of Verbal Learning and Verbal Behavior, 4, 107–111.

Gupta, P., & Cohen, N. J. (2002). Theoretical and computational analysis of skill learning, repetition priming, and procedural memory. Psychological Review, 109, 401–448.

Hassin, R. R., Aarts, H., & Ferguson, M. J. (2005). Automatic goal inferences. Journal of Experimental Social Psychology, 41, 129–140.

Hassin, R. R., Bargh, J. A., & Uleman, J. S. (2002). Spontaneous causal inferences. Journal of Experimental Social Psychology, 38, 515–522.

Hasson, U., Simmons, J. P., & Todorov, A. (2005). Believe it or not: On the possibility of suspending belief. Psychological Science, 16, 566–571.

Hermans, D., De Houwer, J., & Eelen, P. (2001). A time course analysis of the affective priming effect. Cognition and Emotion, 15, 143–165.

Heyder, K., Suchan, B., & Daum, I. (2004). Cortico-subcortical contributions to executive control. Acta Psychologica, 115, 271–289.

Higgins, E. T., Rholes, W. S., & Jones, C. R. (1977). Category accessibility and impression formation. Journal of Experimental Social Psychology, 13, 141–154.

Hummel, J. E., & Holyoak, K. J. (2003). A symbolic-connectionist theory of relational inference and generalization. Psychological Review, 110, 220–264.

Jung Grant, S., Malaviya, P., & Sternthal, B. (2004). The influence of negation on product evaluations. Journal of Consumer Research, 31, 583–591.

Kaup, B. (2001). Negation and its impact on the accessibility of text information. Memory and Cognition, 29, 960–967.

Kaup, B., Zwaan, R. A., & Lüdtke, J. (in press). The experiential view of language comprehension: How is negated text information represented? In F. Schmalhofer & C. A. Perfetti (Eds.), Higher level language processes in the brain: Inference and comprehension processes. Mahwah, NJ: Erlbaum.

Kawakami, K., Dovidio, J. F., Moll, J., Hermsen, S., & Russin, A. (2000). Just say no (to stereotyping): Effects of training in the negation of stereotypic associations on stereotype activation. Journal of Personality and Social Psychology, 78, 871–888.

Klauer, K. C., & Musch, J. (1999). Eine Normierung unterschiedlicher Aspekte der evaluativen Bewertung von 92 Substantiven [A normative study on different aspects of the evaluation of 92 nouns]. Zeitschrift für Sozialpsychologie, 30, 1–11.

Lea, R. B., & Mulligan, E. J. (2002). The effect of negation on deductive inferences. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28, 303–317.

Lieberman, M. D., Gaunt, R., Gilbert, D. T., & Trope, Y. (2002). Reflection and reflexion: A social cognitive neuroscience approach to attributional inference. In M. P. Zanna (Ed.), Advances in experimental social psychology (Vol. 34, pp. 199–249). New York: Academic Press.

Logan, G. D. (1988). Toward an instance theory of automatization. Psychological Review, 95, 492–527.

Mayo, R., Schul, Y., & Burnstein, E. (2004). “I am not guilty” vs. “I am innocent”: Successful negation may depend on the schema used for its encoding. Journal of Experimental Social Psychology, 40, 433–449.

Miller, E. K., & Cohen, J. D. (2001). An integrative theory of prefrontal cortex function. Annual Review of Neuroscience, 24, 167–202.

Moors, A., & De Houwer, J. (2006). Automaticity: A conceptual and theoretical analysis. Psychological Bulletin, 132, 297–326.

Neely, J. H. (1977). Semantic priming and retrieval from lexical memory: Roles of inhibitionless spreading activation and limited-capacity attention. Journal of Experimental Psychology: General, 106, 226–254.

O’Reilly, R. C., Braver, T. S., & Cohen, J. D. (1999). A biologically-based neural network model of working memory. In P. Shah & A. Miyake (Eds.), Models of working memory (pp. 375–411). Cambridge, England: Cambridge University Press.

Park, J.-W., Yoon, S.-O., Kim, K.-H., & Wyer, R. S. (2001). Effects of priming a bipolar attribute concept on dimension versus concept-specific accessibility of semantic memory. Journal of Personality and Social Psychology, 81, 405–420.

Petty, R. E., Tormala, Z. L., Briñol, P., & Jarvis, W. B. G. (2006). Implicit ambivalence from attitude change: An exploration of the PAST Model. Journal of Personality and Social Psychology, 90, 21–41.

Ratcliff, R. (1993). Methods for dealing with reaction time outliers. Psychological Bulletin, 114, 510–532.

Ridderinkhof, K. R., van den Wildenberg, W. P. M., Segalowitz, S. J., & Carter, C. S. (2004). Neurocognitive mechanisms of cognitive control: The role of prefrontal cortex in action selection, response inhibition, performance monitoring, and reward-based learning. Brain and Cognition, 56, 129–140.

Roese, N. J. (1994). The functional basis of counterfactual thinking. Journal of Personality and Social Psychology, 66, 805–818.

Rüter, K., & Mussweiler, T. (2004). It’s all about speed! Practice effects in social comparisons with routine standards. Unpublished manuscript, University of Würzburg.

Satpute, A. B., Fenker, D. B., Waldmann, M. R., Tabibnia, G., Holyoak, K. J., & Lieberman, M. D. (2005). An fMRI study of causal judgments. European Journal of Neuroscience, 22, 1233–1238.

Schneider, W., & Chein, J. (2003). Controlled and automatic processing: Behavior, theory, and biological mechanisms. Cognitive Science, 27, 525–559.

Schneider, W., & Shiffrin, R. M. (1977). Controlled and automatic human information processing: I. Detection, search, and attention. Psychological Review, 84, 1–66.

Shiffrin, R. M., & Dumais, S. T. (1981). The development of automatism. In J. R. Anderson (Ed.), Cognitive skills and their acquisition. Hillsdale, NJ: Erlbaum.

Skurnik, I. W., Yoon, C., Park, D., & Schwarz, N. (2005). How warnings about false claims become recommendations: Paradoxical effects of warnings on beliefs of older consumers. Journal of Consumer Research, 31, 713–724.

Smith, E. E., & Jonides, J. (1999, March 12). Storage and executive processes in the frontal lobes. Science, 283, 1657–1661.

Smith, E. R. (1989). Procedural efficiency: General and specific components and effects on social judgment. Journal of Experimental Social Psychology, 25, 500–523.

Smith, E. R., Branscombe, N. R., & Borman, C. (1988). Generality of the effects of practice on social judgment tasks. Journal of Personality and Social Psychology, 54, 385–395.

Smith, E. R., & DeCoster, J. (2000). Dual process models in social and cognitive psychology: Conceptual integration and links to underlying memory systems. Personality and Social Psychology Review, 4, 108–131.

Smith, E. R., & Lerner, M. (1986). Development of automatism of social judgments. Journal of Personality and Social Psychology, 50, 246–259.

Stapel, D. A., & Blanton, H. (2004). From seeing to believing: Subliminal social comparisons affect implicit and explicit self-evaluations. Journal of Personality and Social Psychology, 87, 468–481.

Strack, F., & Deutsch, R. (2004). Reflective and impulsive determinants of human behavior. Personality and Social Psychology Review, 8, 220–247.

Strayer, D. L., & Kramer, A. F. (1990). An analysis of memory-based theories of automaticity. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 291–304.

Uleman, J. S. (1999). Spontaneous versus intentional inferences in impression formation. In S. Chaiken & Y. Trope (Eds.), Dual-process theories in social psychology (pp. 141–160). New York: Guilford Press.

Walster, E., Berscheid, E., Abrahams, D., & Aronson, V. (1967). Effectiveness of debriefing following deception experiments. Journal of Personality and Social Psychology, 6, 371–380.

Wason, P. C. (1959). The processing of positive and negative information. Quarterly Journal of Experimental Psychology, 11, 92–107.

Wegner, D. M., Wenzlaff, R., Kerker, R. M., & Beattie, A. E. (1981). Incrimination through innuendo: Can media questions become public answers? Journal of Personality and Social Psychology, 40, 822–832.

Wyer, R. S., & Unverzagt, W. H. (1985). Effects of instructions to disregard information on its subsequent recall and use in making judgments. Journal of Personality and Social Psychology, 48, 533–549.

Zeelenberg, R., Wagenmakers, E. M., & Shiffrin, R. M. (2004). Nonword repetition priming in lexical decision reverses as a function of study task and speed stress. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30, 270–277.

(Appendixes follow)

Appendix A

Stimuli Used in Experiments 1–3

Stimuli were selected on the basis of subjective ratings of valence and frequency of the negated compounds.

Affirmed Positive

EIN TRIUMPH (a triumph), EIN KINO (a cinema), EIN PARADIES (a paradise), EINE KARRIERE (a career), EIN TANZ (a dance), EIN VORBILD (a role model), EIN SIEG (a victory), EIN WACHSTUM (a growth), EIN GENUSS (a pleasure), EIN KUCHEN (a cake)

Negated Positive

KEIN TRIUMPH (no triumph), KEIN KINO (no cinema), KEIN PARADIES (no paradise), KEINE KARRIERE (no career), KEIN TANZ (no dance), KEIN VORBILD (no role model), KEIN SIEG (no victory), KEIN WACHSTUM (no growth), KEIN GENUSS (no pleasure), KEIN KUCHEN (no cake)

Affirmed Negative

EIN EITER (a pus), EINE GESCHWULST (a tumor), EIN DIEB (a thief), EIN SCHLEIM (a phlegm), EINE KAKERLAKE (a cockroach), EIN DIKTATOR (a dictator), EINE FOLTER (a torture), EINE NEUROSE (a neurosis), EINE KÜNDIGUNG (a layoff), EINE SCHLANGE (a snake)

Negated Negative

KEIN EITER (no pus), KEINE GESCHWULST (no tumor), KEIN DIEB (no thief), KEIN SCHLEIM (no phlegm), KEINE KAKERLAKE (no cockroach), KEIN DIKTATOR (no dictator), KEINE FOLTER (no torture), KEINE NEUROSE (no neurosis), KEINE KÜNDIGUNG (no layoff), KEINE SCHLANGE (no snake)

Appendix B

Pretest Data for Negations Used in Experiments 1–3

             Negated positive         Negated negative
Statistic    Valence    Frequency     Valence    Frequency
M            2.64       3.63          5.31       2.93
SD           0.25       0.57          1.08       1.36

Note. Data represent subjective ratings of valence and frequency in everyday language (N = 71). Scales ranged from 1 (very negative, rare) to 7 (very positive, frequent).

Appendix C

Positive and Negative Words Used in the Practice Trials of Experiment 3

Positive

FREUND (friend), URLAUB (vacation), SOMMER (summer), MUSIK (music), PARTY (party), BLUMEN (flowers), GESCHENK (present), KINO (cinema), ERDBEERE (strawberry), PIZZA (pizza)

Negative

KRIEG (war), BOMBEN (bombs), HASS (hate), VIRUS (virus), HÖLLE (hell), TOD (death), KREBS (cancer), GEWEHRE (rifles), ABFALL (waste), MOSKITO (mosquito)

Appendix D

Prime Stimuli Presented in Experiments 4–5

Affirmed Positive

EIN VERGNÜGEN (an amusement), EIN FREUND (a friend), EIN URLAUB (a vacation), EIN SOMMER (a summer), EINE PARTY (a party), EINE BLUME (a flower), EIN GESCHENK (a present), EIN GENUSS (a pleasure), EINE SCHOKOLADE (a chocolate), EIN KUCHEN (a cake)

Negated Positive

KEIN VERGNÜGEN (no amusement), KEIN FREUND (no friend), KEIN URLAUB (no vacation), KEIN SOMMER (no summer), KEINE PARTY (no party), KEINE BLUME (no flower), KEIN GESCHENK (no present), KEIN GENUSS (no pleasure), KEINE SCHOKOLADE (no chocolate), KEIN KUCHEN (no cake)

Affirmed Negative

EINE BOMBE (a bomb), EINE KRANKHEIT (a disease), EINE BEERDIGUNG (a funeral), EIN VIRUS (a virus), EIN VERBRECHEN (a crime), EINE REZESSION (a recession), EINE KAKERLAKE (a cockroach), EIN MOSKITO (a mosquito), EINE RATTE (a rat), EIN WURM (a worm)

Negated Negative

KEINE BOMBE (no bomb), KEINE KRANKHEIT (no disease), KEINE BEERDIGUNG (no funeral), KEIN VIRUS (no virus), KEIN VERBRECHEN (no crime), KEINE REZESSION (no recession), KEINE KAKERLAKE (no cockroach), KEIN MOSKITO (no mosquito), KEINE RATTE (no rat), KEIN WURM (no worm)

Appendix E

Target Stimuli Presented in Experiments 4–5

Positive Targets

SONNENSCHEIN (sunshine), MUSIK (music), KINO (cinema), ERDBEERE (strawberry), HAWAII (Hawaii), BABY (baby), EISCREME (ice cream), SCHWIMMEN (to swim), KÄTZCHEN (kitten), TANZ (dance)

Negative Targets

KRIEG (war), ALKOHOLISMUS (alcoholism), ZAHNSCHMERZ (tooth pain), HASS (hate), HITLER (Hitler), HÖLLE (hell), SCHEIDUNG (divorce), KREBS (cancer), MÜLL (garbage), ABFALL (waste)

Appendix F

Prime Stimuli Presented in Experiment 6

Frequent Negated Positive

KEINE LUST (no lust), KEIN GELD (no money), KEINE CHANCE (no chance), KEINE SONNE (no sun), KEIN GLÜCK (no luck), KEINE AUSDAUER (no endurance), KEIN SPASS (no fun), KEIN VERTRAUEN (no trust), KEIN FRIEDEN (no peace), KEIN ERFOLG (no success)

Rare Negated Positive

KEIN TRIUMPH (no triumph), KEIN KINO (no cinema), KEIN PARADIES (no paradise), KEINE KARRIERE (no career), KEIN TANZ (no dance), KEIN VORBILD (no role model), KEIN SIEG (no victory), KEIN WACHSTUM (no growth), KEIN GENUSS (no pleasure), KEIN KUCHEN (no cake)

Frequent Negated Negative

KEIN PROBLEM (no problem), KEINE ANGST (no fear), KEINE PANIK (no panic), KEINE SORGE (no sorrow), KEIN STRESS (no stress), KEINE EILE (no rush), KEIN KRIEG (no war), KEINE GEWALT (no violence), KEINE GEBÜHR (no fee), KEIN PICKEL (no pimple)

Rare Negated Negative

KEIN EITER (no pus), KEINE GESCHWULST (no tumor), KEIN DIEB (no thief), KEIN SCHLEIM (no phlegm), KEINE KAKERLAKE (no cockroach), KEIN DIKTATOR (no dictator), KEINE FOLTER (no torture), KEINE NEUROSE (no neurosis), KEINE KÜNDIGUNG (no layoff), KEINE SCHLANGE (no snake)

Appendix G

Target Stimuli Presented in Experiment 6

Positive Targets

GESCHENK (gift), MUSIK (music), PARTY (party), ERDBEERE (strawberry), HAWAII (Hawaii), BABY (baby), EISCREME (ice cream), URLAUB (vacation), KÄTZCHEN (kitten), BLUMEN (flowers)

Negative Targets

HITLER (Hitler), BOMBEN (bombs), ALKOHOLISMUS (alcoholism), ZAHNSCHMERZ (tooth pain), HASS (hate), HÖLLE (hell), SCHEIDUNG (divorce), KREBS (cancer), MÜLL (garbage), ABFALL (waste)

Appendix H

Pretest Data for Frequent and Rare Negations Used in Experiment 6

                      Negated positive         Negated negative
Frequency category    Valence    Frequency     Valence    Frequency
Frequent
  M                   1.94       5.86          6.19       5.77
  SD                  0.32       0.44          0.38       0.40
Rare
  M                   2.64       3.63          5.31       2.93
  SD                  0.25       0.57          1.08       1.36

Note. Data represent subjective ratings of valence and frequency in everyday language (N = 71). Scales ranged from 1 (very negative, rare) to 7 (very positive, frequent).

Received May 1, 2005
Revision received November 3, 2005
Accepted November 29, 2005

Journal of Personality and Social Psychology 2006, Vol. 91, No. 3, 406–422

Copyright 2006 by the American Psychological Association 0022-3514/06/$12.00 DOI: 10.1037/0022-3514.91.3.406

A Novel View of Between-Categories Contrast and Within-Category Assimilation

Sarah Queller, Indiana University
Terry Schell, Rand Corporation
Winter Mason, Indiana University

This research manipulated the portion of a category distribution that is misclassified by the optimal classifier and investigated the impact on assessments of category attributes. Three separate studies manipulated the direction of overlap, the extent of overlap, and the relative base rate of the comparison category. All 3 studies produced large between-categories contrast and within-category assimilation. As expected, these effects were enhanced in conditions in which the optimal classifier misclassified a larger portion of the target category. Study 4 demonstrated that intercategory overlap in the absence of overt classification does not produce contrast and assimilation. Ironically, optimizing categorization accuracy can produce highly inaccurate beliefs about category attributes.

Keywords: stereotype/stereotyping, categorization, stereotype accuracy, contrast, assimilation
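The abstract's central claim can be made concrete with a small simulation. The sketch below is a minimal illustration, not the procedure used in the studies. It assumes two normally distributed categories with equal variance and equal base rates, so that the accuracy-maximizing boundary is the midpoint between the two means, and it further assumes that beliefs about a category are built from the items the classifier assigns to it rather than from true membership. All names and parameter values (`simulate`, `mu_a`, `mu_b`, `sigma`, `n`, `seed`) are hypothetical.

```python
import random
import statistics


def simulate(mu_a=40.0, mu_b=60.0, sigma=15.0, n=20000, seed=0):
    """Classify items from two overlapping categories at the optimal boundary.

    Assumptions of this sketch: both categories are normal with the same
    standard deviation and the same base rate, so the boundary that
    maximizes classification accuracy is the midpoint of the two means.
    """
    rng = random.Random(seed)
    boundary = (mu_a + mu_b) / 2.0

    # True category members (category A has the lower mean).
    true_a = [rng.gauss(mu_a, sigma) for _ in range(n)]
    true_b = [rng.gauss(mu_b, sigma) for _ in range(n)]

    # Items are labeled by the side of the boundary they fall on,
    # regardless of the category they actually came from.
    everything = true_a + true_b
    labeled_a = [x for x in everything if x < boundary]
    labeled_b = [x for x in everything if x >= boundary]

    # Portion of category A that even the optimal classifier gets wrong.
    misclassified_a = sum(x >= boundary for x in true_a) / n

    return {
        "misclassified_a": misclassified_a,
        "true_gap": statistics.mean(true_b) - statistics.mean(true_a),
        "labeled_gap": statistics.mean(labeled_b) - statistics.mean(labeled_a),
        "true_sd_a": statistics.stdev(true_a),
        "labeled_sd_a": statistics.stdev(labeled_a),
    }


result = simulate()
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions, roughly a quarter of category A falls on the wrong side of the boundary. The gap between the means of the labeled groups is larger than the true gap (contrast), and the spread within a labeled group is smaller than the true spread (assimilation).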

A large body of research has focused on how perceivers relegate items that vary along one or more dimensions into discrete classes, or categories. Consequently, the field has made great strides in describing human categorization, and the modeling of the underlying psychological processes continues to stir considerable debate. In the process of focusing closely on how people put things into bins, cognitive categorization researchers have largely set aside questions related to the importance of categorization. Why do we care about these bins anyway? Presumably, it is because such classifications can simplify our interactions with the myriad items we encounter as we go about our daily lives. For example, when gathering wild mushrooms, the gatherer’s ability to tell an amanita from a puffball can be quite important. Once a mushroom is classified, the gatherer knows whether to gather and eat it (puffball) or to steer clear (amanita). Thus, the category membership carries information that guides behavior. This is easily recognized with social categories as well. A person classified as a subordinate in the social hierarchy might inspire a jovial slap on the back, whereas a person classified as a superior might be approached with more deference. By assigning instances to categories, perceivers are able to infer accurate and useful information about the attributes of that class of stimuli that can guide behavior. Thus, the importance of categories is derived from how attributes are inferred from category membership as much as it is from how category membership is inferred from attributes. For some time, psychologists have claimed that the act of categorization itself might bias perceivers’ beliefs about what a category is like (Allport, 1954). Tajfel and Wilkes’s (1963) early experimental work reinforced these ideas. These researchers labeled the shortest half in a series of lines of graded length with one category label and the longest half in the series with another category label. They demonstrated that the discrepancy between perceivers’ estimates of the lengths of the two stimuli nearest the category boundary was exaggerated compared with an unlabeled control condition and a randomly labeled control condition. They expected to find within-category assimilation as well, so that stimuli within the same systematically labeled category would seem more similar to one another than would the same stimuli in either the no label or the randomly labeled control condition. Their data, however, failed to support this latter expectation.

Between-Categories Contrast and Within-Category Assimilation

Such influences had particular import for researchers of social categorization who argued that the act of classifying people into different groups might lead to inaccurate stereotypes. Accordingly, Tajfel and Wilkes’s (1963) work has since become a cornerstone of researchers’ claims that differences between social groups might be exaggerated. Indeed, the article was cited 46 times between January 2000 and October 2004, mostly in social psychology journals. More recent research has produced results that are consistent with Tajfel and Wilkes’s original hypotheses regarding between-categories contrast (Corneille, Klein, Lambert, & Judd, 2002; Goldstone, 1994) and within-category assimilation (Livingston, Andrews, & Harnad, 1998). Evidence of both effects occurring simultaneously, however, is rare. This is true even though a wide variety of measures have been used to detect changes in perceptions of the category. Stimulus similarity judgments (Livingston et al., 1998), same–different judgments (Goldstone, 1994), and estimates of the attribute values of specific stimuli (Eiser, 1971; Krueger & Clement, 1994; McGarty & Penny, 1988; Tajfel & Wilkes, 1963) have been relied on to assess how perceivers enhance or reduce differences between stimuli. Other work has focused on biases in judgments about the prototypical category member as a function of intercategory comparisons (Goldstone, 1996; Krueger, Rothbart, & Sriram, 1989). Still other work has considered judgments of stimulus typicality as evidence of contrastive and assimilative biases (Corneille & Judd, 1999). In some cases, both between-categories contrast and within-category assimilation are present in the same study, but each is found using a different dependent measure (Goldstone, Lippa, & Shiffrin, 2001; Livingston & Andrews, 2005). In these cases, one dependent measure may slant the judgment toward contrastive processing and the other may slant the judgment toward assimilative processing. Although these studies are informative about factors that lead to contrast and assimilation, they have provided little support for the kind of simultaneous between-categories contrast and within-category assimilation to which Tajfel and Wilkes (1963) alluded. Like many of the researchers who followed them, they clearly expected to obtain both effects with the same measure. Thus, despite widespread claims that assessments of category attributes are subject to both between-categories contrast and within-category assimilation, both effects have not been consistently evident in experimental studies.

Sarah Queller and Winter Mason, Department of Psychology, Indiana University; Terry Schell, Rand Corporation, Santa Monica, California. Correspondence regarding this article should be addressed to Sarah Queller, Department of Psychology, Indiana University, 1101 East 10th Street, Bloomington, IN 47405. E-mail: [email protected]

An Effect in Search of a Mechanism Although much of the research in this area has been directed toward demonstrating the existence of categorization-driven contrast and assimilation effects, research has also been directed at explicating the mechanisms that drive these effects. Social psychologists have focused on motivational forces. Describing one set of motivations, Pickett and Brewer (2001) argued that people have competing needs to be both similar to others (assimilative needs) and distinct from others (distinctiveness needs). Biased perceptions of within-category homogeneity and between-categories distinctiveness can serve these goals. People are also prone to exaggerate differences between groups if such exaggeration casts the in-group in a more favorable light than the out-group (Mullen, Brown, & Smith, 1992). Although these motivational mechanisms likely play a role in learning about social groups, they cannot explain contrast and assimilation with stimuli such as line lengths, fish, and chick cloacae (Corneille & Judd, 1999; Corneille et al., 2002; Livingston et al., 1998; Tajfel & Wilkes, 1963). Conversely, however, the mechanisms that explain contrast and assimilation in these nonsocial categories are likely to contribute to learning about social categories. Why, then, do contrast and assimilation occur in these less motivated situations? One possibility is that categorical perception effects drive between-categories contrast and within-category assimilation. Goldstone’s (1994) work is informative in this domain. Using same– different judgments, he showed that perceptual discrimination between similar stimuli is increased along a categorizationrelevant dimension and is particularly pronounced at the category boundary. Presumably, such enhanced perceptual discrimination could lead to assessments of group characteristics that exaggerate differences between groups. 
Replicating Tajfel and Wilkes (1963), Goldstone did not find diminished perceptual discrimination of


stimuli within the same category that, if present, would inflate assessments of within-group homogeneity. These types of perceptual effects have been the basis for applying the original Tajfel and Wilkes (1963) research to the issue of biased stereotypes. Yet it seems unlikely that small perceptual effects would lead to large changes in assessments of a category’s central tendency or dispersion. To be relevant to the claim that stereotypes of social groups are biased from reality in a way that might matter for behavioral interaction, these perceptual effects would have to substantially impact assessments of group attributes.

The Present Research

The present work suggests a rather different approach to understanding between-categories contrast and within-category assimilation. The approach was based on the idea that optimizing categorization can lead to systematically biased beliefs about group-level attributes. Note that this is a rather alarming contention: It suggests that optimizing the accuracy of categorization can lead to quite inaccurate knowledge about category attributes. In connecting categorization optimization with biased perceptions of group attributes, we made underlying assumptions in the present work that distinguish it from many prior studies on between-categories contrast and within-category assimilation (Corneille & Judd, 1999; Corneille et al., 2002; Eiser, 1971; Goldstone, 1994; Krueger & Clement, 1994; Livingston et al., 1998; Tajfel & Wilkes, 1963). These assumptions were that categories often contain large numbers of members and that their distributions often overlap along continuous dimensions. Note that these two assumptions are certainly true of the categories that are of interest to stereotyping researchers. African Americans and European Americans, Democrats and Republicans, blue-collar and white-collar workers, and women and men are just a few of many examples of large categories that overlap in terms of physical attributes (e.g., skin color, height), attitudes (e.g., handguns, welfare), and trait attributes (e.g., social sensitivity, intelligence). These assumptions are also true of many nonsocial categories, such as classes of cars (e.g., gas mileage, cargo capacity) and diseases (e.g., blood pressure, temperature, discharge color).

A consequence of these assumptions is that even the optimal classifier cannot achieve 100% categorization accuracy. To illustrate, suppose you know a man who is rather effeminate, and you want to decide whether that man is gay or straight.
You know he is either a gay man or an unusually effeminate straight man because past experience has taught you that men this effeminate are more likely to be gay than straight. In the absence of other cues to inform your judgment, you will maximize your chance of being right (i.e., maximize your classification accuracy), if you classify this man as gay. Of course, some straight men actually are this effeminate, so you will be wrong some of the time. Because you do not know which of the men with this level of effeminacy are gay and which are straight, however, you can achieve higher classification accuracy if you always respond “gay” for men with this level of effeminacy. More generally, there is a level of effeminacy at which gay and straight men are equally likely. Given effeminacy as your only cue, you can maximize your classification accuracy if you consistently respond “gay” for effeminacies greater than the point of equal likelihood and consistently respond “straight” for effeminacies less than that point. For a target that has an effemi-


QUELLER, SCHELL, AND MASON

nacy that is equally likely for gay and straight men, your best bet is to guess at sexual orientation. The optimal classifier will systematically misclassify those stimuli whose attributes have a higher likelihood of being from the opposing category. Because human perceivers become nearly optimal in their classification after training with feedback (Ashby, 1992; Ashby & Gott, 1988), human perceivers also misclassify those stimuli whose characteristics make them more likely to have come from the opposing category. These misclassified stimuli are precisely those stimuli that are most similar to the comparison category. A simple assumption of the present theorizing is that systematic misclassification of a specific portion of the category distribution might lead to a biased representation of what the category is like. In the previous example, misclassifying the most effeminate straight men as gay men could lead you to underestimate the average effeminacy of straight men. It could also lead you to underestimate how effeminate the most effeminate straight man is. By a similar argument, you would overestimate the average effeminacy of gay men and the effeminacy of the least effeminate gay man. Established theory suggests why such bias might occur. For example, exemplar theory (Nosofsky, 1986) assumes that the correct category label that is provided during feedback is stored with each exemplar. Consistently responding “Group B” to the Group A stimuli whose attributes make them more likely to come from Group B could lead to storage of those Group A stimuli with the incorrect Group B label. That is, it may be the response label rather than the feedback label that is stored with each exemplar (Nosofsky & Johansen, 2000). In effect, this is tantamount to storage of a distribution of exemplars for each group that is truncated at the intercategory boundary. 
Storage of truncated distributions would produce both exaggerated differences between category means and exaggerated within-category homogeneity. In contrast to exemplar theory, Ashby and colleagues (e.g., Ashby & Gott, 1988) have argued that categorization is optimized not by making judgments based on stored exemplars but by determining the decision bound that optimizes accuracy in categorization. Although these researchers have not been explicit about the representational underpinnings of such boundary-driven responding, one possibility is that the response regions of the stimulus space are stored (Ashby & Casale, 2003). In this case, portions of the stimulus space that correspond to the real category distributions will be ignored in the representation of what the category is like. Assuming that the stored response region is used as the basis of making category-central tendency and dispersion estimates, both between-categories contrast and within-category assimilation would increase as the portions of the categories that are misclassified increase. There are, undoubtedly, other mechanisms that would weaken the associations between optimally misclassified stimuli and their correct category label and/or strengthen the associations between these stimuli and the incorrect category label. The goal of our research was not to pit these mechanisms against each other. Rather, we simply investigated the claim that optimizing classification accuracy can lead to large and predictable shifts in beliefs about category attributes. In doing so, the present research stands to support Allport’s (1954) claim that the simple act of categorization can affect beliefs about social groups.
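The truncation argument above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' procedure: it assumes a single dimension, two equal-variance normal groups with equal base rates, and a responder who stores only those exemplars of each group that fall on that group's side of the equal-likelihood point. The function name `simulate_truncation` and all parameter values are hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_truncation(mu_a=180.0, mu_b=235.0, sigma=30.0, n=20000, seed=0):
    """Compare true group statistics with statistics of truncated stored exemplars."""
    rng = random.Random(seed)
    # Equal-likelihood point for two equal-variance, equal-base-rate normals.
    bound = (mu_a + mu_b) / 2.0
    group_a = [rng.gauss(mu_a, sigma) for _ in range(n)]
    group_b = [rng.gauss(mu_b, sigma) for _ in range(n)]
    # Assumption: only exemplars on their own group's side of the bound are
    # stored with that group's label (the truncation described in the text).
    stored_a = [x for x in group_a if x < bound]
    stored_b = [x for x in group_b if x >= bound]
    return {
        "true_gap": statistics.mean(group_b) - statistics.mean(group_a),
        "stored_gap": statistics.mean(stored_b) - statistics.mean(stored_a),
        "true_sd_a": statistics.stdev(group_a),
        "stored_sd_a": statistics.stdev(stored_a),
    }


result = simulate_truncation()
for name, value in result.items():
    print(f"{name}: {value:.1f}")
```

Under these assumptions the stored exemplars show a larger gap between the group means (between-categories contrast) and a smaller standard deviation within Group A (within-category assimilation) than the full distributions do.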

This work contributes to the extant literature by suggesting a previously uninvestigated basis for between-categories contrast and within-category assimilation. Specifically, between-categories contrast and within-category assimilation are both predicted to occur when a prior goal to optimize categorization accuracy has induced systematic misclassification of portions of the category distributions. Consistent with this argument, four studies manipulated factors that altered the optimally misclassified portions of the category distributions and, as expected, produced predictable shifts in between-categories contrast and within-category assimilation.

Study 1

The optimal responder will misclassify those stimuli whose attributes make them more likely to have come from the opposing category. For a nonideal responder, misclassifications may not be restricted to the optimally misclassified stimuli but will nonetheless occur more frequently for those stimuli whose attributes are similar to those of the contrast category. Thus, if systematic misclassification leads to biased assessments of category attributes, human classifiers will shift their estimates of a fixed target category’s central tendency as a function of the direction of overlap with the comparison category.

Using a between-subjects design, Study 1 compared a fixed target Group A with a contrast category that was either more intelligent and less friendly (Group Bi) or one that was more friendly and less intelligent (Group Bf; see Figure 1). To maximize A versus Bi classification accuracy, perceivers will misclassify the most intelligent and least friendly Group A members as belonging to Group B. As a result, Group A was expected to be seen as relatively less intelligent and more friendly than it really was. Similarly, to maximize A versus Bf classification accuracy, we

Figure 1. Bivariate normal stimulus distributions for Groups A and B of Study 1. Participants learned about either Group A and a contrast category that was more friendly and less intelligent (Bf) or about Group A and a contrast category that was more intelligent and less friendly (Bi). Studies 2 and 3 use the same Group A stimuli and variations of the Group Bf stimuli as described in the text.

CLASSIFICATION, CONTRAST, AND ASSIMILATION

misclassified the most friendly and least intelligent Group A members as belonging to Group B. Consequently, Group A was expected to be seen as less friendly and more intelligent than it really was. Thus, assessments of the target category’s attributes were predicted to be systematically contrasted away from the relevant comparison category. Note, however, that comparison of perceivers’ estimates of Group A’s attributes to Group A’s true distribution parameters does not provide the best test of the hypothesis. Any systematic misjudgment, for example, a tendency to underestimate, would bias the estimates away from the true population parameters. In addition, a spurious main effect, such as a bias to report that people are more intelligent than friendly, would affect comparisons of estimates to actual population parameters. These potential confounds were dealt with by testing for the presence of an interaction between direction of overlap (Bf vs. Bi) and trait (intelligence vs. friendliness). A significant interaction such that a fixed Group A was judged to be relatively more intelligent and less friendly in the Bf condition than in the Bi condition could be taken as evidence of between-categories contrast. On each trial, participants viewed two bars whose heights depicted the degree of friendliness and intelligence of an individual group member. The participant learned about the groups by deciding whether each person was a member of Group A or Group B, receiving immediate corrective feedback, and continuing on to the next trial. After participants demonstrated the ability to distinguish between the categories at near optimal levels, they made judgments about the fixed target group, Group A.

Method

Design and participants. Study 1 used a 2 (Bf vs. Bi direction of overlap) × 2 (bar/exemplar based vs. exemplar based/bar counterbalance) × 2 (intelligence vs. friendliness) design with direction of category overlap as the key variable that was manipulated as a between-subjects factor (Bf vs. Bi). A counterbalance in the order of the dependent measures constituted a second between-subjects factor (bar–exemplar based vs. exemplar based–bar; see the Procedure section for details). Information about both intelligence and friendliness was provided for each stimulus, so trait (intelligence vs. friendliness) constituted a third, within-subject factor. Twenty-six undergraduates enrolled in introductory psychology at the University of California, Santa Barbara, participated in this study and received partial course credit for their participation. Participants were randomly assigned to either the Bf or the Bi condition.

Stimulus materials. The stimuli presented to each participant were displayed on a conventional VGA computer monitor. Each stimulus consisted of two red bars against a black background. The bar on the left was labeled intelligence and the bar on the right was labeled friendliness. The heights of the bars indicated each stimulus person’s level of intelligence and friendliness. For the training phase of the experiment, 250 stimuli were generated for Group A and 250 stimuli were generated for Group B. For the Group A stimuli, the mean bar height on the two dimensions was equal (μi = μf = 180 pixels; see Figure 1). For participants in the Bi condition, Group B was more intelligent (μi = 235) and less friendly (μf = 125) than Group A. For participants in the Bf condition, Group B was more friendly (μf = 235) and less intelligent (μi = 125) than Group A. For each group, stimuli were sampled from bivariate normal populations that varied on the dimensions of intelligence and friendliness.
The standard deviation on each dimension was equal to 30 pixels, and the covariation between the two dimensions was zero. The sampling of stimuli from these distributions was random with three constraints: (a) no stimuli could be more than 3 standard


deviations from the mean, (b) constants were added to the stimuli from each group so that the sample means were equal to the desired population mean, and (c) a linear decision bound existed that would produce 90 (±1)% correct categorizations. For the test phase of the experiment, 121 stimuli were generated. These stimuli were distributed across the stimulus space at equal intervals. Specifically, intelligence and friendliness bar heights ranged from 30 to 330 pixels in 30-pixel increments, and a stimulus was generated for each possible pairing of these intelligence and friendliness bar heights. The same set of 500 training and 121 test stimuli was shown to all participants. The order of presentation of the stimuli within the training and test phases was randomly determined. However, to reduce between-subjects variability, we presented the stimuli to all participants in the same random order.

Procedure. Participants were placed in individual cubicles and read the following instructions on a computer:

In this part of the experiment, your task will be to learn about the characteristics of people from one of two groups, labeled Group A and Group B. Each person from these groups has been given a personality test that measures both friendliness and intelligence. A graph of these two scores will be displayed for each individual. Your task is to guess the group membership of that person based on these graphs.

In each training trial, the bar graph depicting the intelligence and friendliness of one group member was displayed on the monitor. The participant’s categorization judgment was recorded by a keyboard response. Immediately after this response, written feedback was presented above the bar graph. This feedback indicated whether the participant’s response was correct or incorrect and provided the correct group label for that individual (e.g., “[in]correct, that was a member of Group A”).
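The stimulus construction described under Stimulus materials can be sketched in code. This is an illustrative sketch only: the function names `make_test_grid` and `sample_group` are hypothetical, the 3-standard-deviation limit is assumed to apply to each dimension separately (the text does not say whether it is per dimension or a joint distance), and constraint (c), the existence of a linear bound giving 90 (±1)% accuracy, is not enforced here.

```python
import random


def make_test_grid(lo=30, hi=330, step=30):
    """All pairings of intelligence and friendliness bar heights for the test phase."""
    levels = list(range(lo, hi + 1, step))
    return [(intel, friend) for intel in levels for friend in levels]


def sample_group(mu_i, mu_f, sigma=30.0, n=250, seed=0):
    """Sample n uncorrelated bivariate-normal training stimuli for one group."""
    rng = random.Random(seed)
    points = []
    while len(points) < n:
        intel = rng.gauss(mu_i, sigma)
        friend = rng.gauss(mu_f, sigma)
        # Constraint (a), assumed per dimension: reject draws beyond 3 SD.
        if abs(intel - mu_i) <= 3 * sigma and abs(friend - mu_f) <= 3 * sigma:
            points.append((intel, friend))
    # Constraint (b): add constants so the sample means equal the population means.
    # Note: this shift can move a borderline point marginally past the 3 SD limit.
    mean_i = sum(p[0] for p in points) / n
    mean_f = sum(p[1] for p in points) / n
    return [(i - mean_i + mu_i, f - mean_f + mu_f) for i, f in points]


grid = make_test_grid()
group_a = sample_group(180, 180)
print(len(grid), len(group_a))
```

The grid contains 11 levels per dimension, which gives the 121 test stimuli reported in the text.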
The feedback and stimuli remained on the screen for 2 s before the beginning of the next trial. As soon as 131 of the previous 150 responses were correct (87.3% accuracy), the participant moved on to the test phase. Note that this learning criterion is ~3% short of optimal responding and is well above the optimal accuracy that can be achieved using a single dimension (82.2%).

The test phase included both a bar height adjustment measure of group central tendency and an exemplar-based measure of group central tendency. The order of these two tasks was counterbalanced. For the bar height adjustment measure of central tendency, participants directly estimated the average levels of intelligence and friendliness for Group A. They were presented with a stimulus in which both the intelligence and friendliness bars were initially set to the minimum values. They were told to “adjust the bars so that they represent the scores of the average member of Group A.” Participants adjusted the height of a particular bar using the 1 and 2 keys; they selected whether they were adjusting the intelligence or the friendliness bar by using the 4 or 3 keys. When they were satisfied with their estimation, they pressed the ↵ key, and the computer recorded the estimates. It is important to note that the starting point for the bar provides a low anchor for participants’ assessments of the category average. Thus, it would not be surprising if this anchor produced a bias such that both intelligence and friendliness were underestimated (Mussweiler, 2003; Tversky & Kahneman, 1974). Such an anchoring effect would not, however, jeopardize the power of the study to detect relative differences between the Bi and Bf conditions in estimates of intelligence and friendliness.

For the exemplar-based measure of central tendency, participants viewed each test phase stimulus and pressed one of two keys to indicate whether that stimulus was or was not typical of Group A.
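The relation between the learning criterion and the optimal accuracies can be checked with a short calculation. This is a sketch under idealized assumptions (untruncated, equal-variance, uncorrelated normal populations with equal base rates), so it only approximates the values reported for the actual constrained stimulus sample; the helper names `phi` and `optimal_accuracy` are hypothetical.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def optimal_accuracy(mean_a, mean_b, sigma):
    """Accuracy of the midpoint bound between two equal-variance normal groups."""
    distance = math.dist(mean_a, mean_b)
    return phi(distance / (2.0 * sigma))


# Study 1, Bi condition: Group A at (180, 180), Group B at (235, 125), SD = 30.
two_dim = optimal_accuracy((180, 180), (235, 125), 30)  # uses both dimensions
one_dim = optimal_accuracy((180,), (235,), 30)          # uses intelligence only
criterion = 131 / 150                                   # learning criterion

print(f"two dimensions: {two_dim:.3f}")
print(f"one dimension:  {one_dim:.3f}")
print(f"criterion:      {criterion:.3f}")
```

Under these assumptions the two-dimensional optimum is roughly 0.90 and the one-dimensional optimum roughly 0.82, so the 87.3% criterion lies between them, consistent with the statement that participants had to use both dimensions to reach criterion.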
By the end of the test phase, we had a set of intelligence–friendliness combinations that participants considered typical of Group A. We then derived an exemplar-based measure of central tendency by computing the average friendliness and intelligence for this set of exemplars. This exemplar-based measure is inherently multidimensional in that participants judge a particular combination of friendliness and intelligence on each trial, rather than making one judgment on intelligence and a separate judgment on friendliness. We hoped that the exemplar-based measure would get at perceptions of Group A that were affected as little as possible by comparisons to Group B at the time of judgment. Thus, participants were informed that they would decide whether each test individual was typical of Group A or not, but they were also informed that none of the individuals presented in the test phase would be members of Group B.
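The exemplar-based measure of central tendency amounts to averaging the bar heights of the test stimuli that a participant marked as typical. The sketch below shows that computation on made-up data; the function name `exemplar_central_tendency` and the rectangular "typical" region of the hypothetical participant are illustrative assumptions, not values from the study.

```python
def exemplar_central_tendency(judgments):
    """Mean (intelligence, friendliness) of the stimuli judged typical.

    judgments: iterable of ((intelligence, friendliness), is_typical) pairs.
    """
    typical = [stimulus for stimulus, is_typical in judgments if is_typical]
    if not typical:
        raise ValueError("no stimuli were judged typical")
    n = len(typical)
    mean_intelligence = sum(i for i, _ in typical) / n
    mean_friendliness = sum(f for _, f in typical) / n
    return mean_intelligence, mean_friendliness


# Test grid from the Method section: 30 to 330 pixels in 30-pixel steps.
levels = range(30, 331, 30)
grid = [(i, f) for i in levels for f in levels]

# Hypothetical participant: calls a stimulus typical of Group A when it is
# fairly intelligent and only moderately friendly.
judgments = [((i, f), 150 <= i <= 270 and 90 <= f <= 210) for i, f in grid]

print(exemplar_central_tendency(judgments))
```

Because each judgment is made on a joint combination of the two traits, the resulting pair of means reflects the multidimensional character of the measure noted in the text.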

Results

Recall that the true Group A mean has equal bar heights for intelligence and friendliness (μi = μf = 180). In the Bi condition, in which Group B was more intelligent and less friendly than Group A, maximizing categorization accuracy should lead participants to misclassify the most intelligent and least friendly Group A members. Consequently, these participants’ assessments of the Group A average should be less intelligent than friendly. In the Bf condition, in which Group B was more friendly and less intelligent than Group A, maximizing categorization accuracy should lead participants to misclassify the most friendly and least intelligent Group A members. Consequently, these participants’ assessments of the Group A average should be less friendly than intelligent. These hypotheses were tested using a 2 (Bf vs. Bi direction of overlap) × 2 (bar/exemplar based vs. exemplar based/bar counterbalance) × 2 (intelligence vs. friendliness) analysis of variance (ANOVA). The counterbalance factor had no main effects and was involved in no interactions, and it is not discussed further.

Bar height adjustment task. Five participants were excluded from this analysis because they failed to adjust both bars before continuing with the experiment. The group average judgments for the remaining 21 participants showed the predicted biases (Table 1); the predicted Direction of Overlap × Trait interaction was significant, F(1, 17) = 24.52, p < .001. A simple-effects analysis was conducted to determine whether the predicted bias occurred reliably within each condition. In the Bi condition, participants’ central tendency estimates of Group A had intelligence bars that were significantly lower than the corresponding friendliness bars, F(1, 17) = 5.90, p < .05, d = 0.54. In the Bf condition, participants’ central tendency estimates of Group A had intelligence bars that were significantly higher than the corresponding friendliness bars, F(1, 17) = 18.59, p < .001, d = 2.64.
There was also a nonpredicted main effect for the trait that was being rated. Averaged across all conditions, participants’ central

tendency estimates had intelligence bars that were significantly higher than the corresponding friendliness bars, Ms = 176.9 and 150.1, respectively, F(1, 17) = 12.79, p < .01, d = 0.74.

Exemplar-based measure of central tendency. The average intelligence and friendliness bar heights of the exemplars that participants judged to be typical of Group A also showed the predicted biases (Table 1), and the Direction of Overlap × Trait interaction was significant, F(1, 22) = 155.2, p < .001. A simple-effects analysis was conducted to determine whether the bias occurred reliably within each condition. In the Bi condition, the average intelligence bars of exemplars judged typical of Group A were significantly lower than the corresponding friendliness bars, F(1, 22) = 75.25, p < .001, d = 3.17. In the Bf condition, the average intelligence bars of exemplars judged typical of Group A were significantly higher than the corresponding friendliness bars, F(1, 22) = 79.14, p < .001, d = 6.93. This analysis was replicated with single-subject analyses. For those stimuli that a particular participant judged as typical of Group A, the heights of the intelligence bars were compared with the heights of the friendliness bars. In the Bf condition, 12 of 13 participants showed the predicted pattern (i.e., intelligence > friendliness), Fs > 5.17, ps < .05. In the Bi condition, 11 of 13 participants showed the predicted pattern (i.e., friendliness > intelligence), Fs > 6.58, ps < .05. None of the remaining participants showed significant results opposite the predicted pattern.

Discussion

In Study 1, participants’ estimates of Group A’s central tendency were contrasted from Group B. As the portion of Group A that overlapped with Group B was systematically altered, the participants’ assessments of Group A also shifted, even though the Group A stimuli remained constant. This pattern of bias was evident in participants’ estimates of the Group A average as well as in the stimuli they judged to be typical of Group A. These biases were demonstrated for participants who distinguished Group A members from Group B members at near optimal levels in the training phase. Thus, as expected, very good categorization performance was accompanied by rather gross errors in judgments about category attributes. The main effect for trait type on the group average judgments was unanticipated. Averaged over both stimulus types, participants’ central tendency estimates had intelligence bars that were

Table 1
Perceived Group A Characteristics by Condition for Different Measures of Central Tendency (Study 1)

                                Intelligence       Friendliness
Variable                         M       SD         M       SD      Difference   Cohen’s d
Group A average estimates
   Bf                          201.2    32.1      122.7    27.1        78.5        2.64
   Bi                          152.6    48.4      175.3    35.0       −22.7        0.54
Stimuli judged typical of A
   Bf                          221.9     8.4      145.2    13.2        76.7        6.93
   Bi                          155.4    20.1      214.5    17.1       −59.1        3.17

Note. All trait measures are in screen pixels. Bf = B more friendly and less intelligent condition; Bi = B more intelligent and less friendly condition.


higher than the corresponding friendliness bars. Because participants’ stereotypes were reliably shifted for both conditions, this main effect does not limit the conclusions of the current study. It does, however, suggest that experiments should be designed so that the crucial model predictions are tested as interactions between traits and other factors.

Note that assessments of a given trait across conditions differ by up to 69 pixels in the predicted direction. Compare this with a true difference across conditions of 0 pixels (i.e., the same Group A stimuli were used in both conditions). In absolute terms, these results indicate that participants’ estimates differed by up to 2.4 cm on the computer screen (corresponding to a visual angle for the difference of 2.2°). In addition, the effect sizes were moderate to very large. It is difficult to compare the extent of contrast with a study like Goldstone’s (1994), which measured categorical perception using signal detection analyses of same–different judgments. However, the magnitude of the effect in the present study seems larger than one might expect on the basis of perceptual disambiguation processes alone.

Study 2

Study 2 also kept the information about the target Group A constant across conditions but manipulated the degree of overlap rather than the direction of overlap. In both conditions, the Group B stimuli were, on average, more friendly and less intelligent than the Group A members. The Bfar condition was similar to the Bf condition of Study 1. However, the Bnear condition shifted the Group B stimuli closer to Group A. The portion of the Group A distribution that the optimal responder would misclassify as belonging to Group B was larger in the Bnear condition than in the Bfar condition. Therefore, we expected that assessments of Group A’s attributes would have a greater bias away from Group B in the Bnear condition than in the Bfar condition.

The extent of overlap between Groups A and B also has implications for the perceived variability within a category. Because a larger portion of Group A is misclassified by the optimal responder in the Bnear condition than in the Bfar condition, we predicted Group A would be perceived as more homogeneous in the Bnear condition than in the Bfar condition. This prediction is all the more interesting because it directly opposes a prediction based on psychophysical arguments. According to Weber’s law (e.g., Whitaker, Bradley, Barrett, & McGraw, 2002), as the groups move farther apart, the proportion of the space occupied by Group A gets smaller and Group A should be seen as more homogeneous. Thus, our prediction of more homogeneous beliefs about Group A in the Bnear condition than in the Bfar condition would only hold if our predicted effects overpower a well-supported psychophysical law.

Many different measures of perceived homogeneity have been used in past research. However, these measures are often incapable of gauging the perceived distribution of multidimensional stimuli.
For instance, a standard method of measuring dispersion is to find out which stimuli are perceived to be the highest and lowest on the desired dimension, thus obtaining the perceived range of the target group. For stimuli such as ours, the range of interest involves neither the most intelligent nor the friendliest members, but rather the instances that simultaneously maximize one dimension while minimizing another (the upper right and lower left portions of the distribution; see Figure 1). Thus, our predictions demanded the use


of an inherently multivariate measure of dispersion. We realized that we could gauge the perceived dispersion by observing the variability in the exemplars that participants judged to be typical of Group A in the exemplar-based measure of central tendency. Each stimulus presented in this task describes a specific combination of intelligence and friendliness, so this measure is inherently multivariate and can get at stimuli that participants see as “less A-like” because they maximize one dimension and minimize another. Furthermore, because the exemplars presented for typicality judgments were evenly distributed throughout the stimulus space, each represents an equal area (900 pixels²). The number of exemplars categorized as typical of Group A thus provides a multidimensional measure of the perceived dispersion within Group A and is ideal for our study because it integrates the perceived dispersion across both dimensions.

In Study 2, then, the exemplars judged as typical of Group A were used to compute both an exemplar-based measure of central tendency and an exemplar-based measure of dispersion. As before, the bar adjustment measure was also included as an additional test of beliefs about Group A central tendency. We predicted the two measures of central tendency would reveal more contrast away from Group B in the Bnear condition than in the Bfar condition. In addition, we expected the exemplar-based measure of dispersion to reveal more within-group homogeneity in the Bnear condition than in the Bfar condition.
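The exemplar-based dispersion measure reduces to counting the test stimuli judged typical, with each stimulus standing for one 30 × 30 pixel cell of the stimulus space. The following sketch illustrates the count and the corresponding area; the function name `exemplar_dispersion` and the two hypothetical response regions are assumptions made for the example, not data from the study.

```python
CELL_AREA = 30 * 30  # each test stimulus represents 900 square pixels


def exemplar_dispersion(judgments, cell_area=CELL_AREA):
    """Return (count, area) of the stimuli judged typical of the group.

    judgments: iterable of ((intelligence, friendliness), is_typical) pairs.
    """
    count = sum(1 for _, is_typical in judgments if is_typical)
    return count, count * cell_area


levels = range(30, 331, 30)
grid = [(i, f) for i in levels for f in levels]

# Two hypothetical participants: one accepts a wide region as typical of
# Group A, the other a narrow region.
wide = [((i, f), 150 <= i <= 270 and 90 <= f <= 210) for i, f in grid]
narrow = [((i, f), 180 <= i <= 240 and 120 <= f <= 180) for i, f in grid]

print(exemplar_dispersion(wide))
print(exemplar_dispersion(narrow))
```

A smaller count, as for the second hypothetical participant, corresponds to a more homogeneous perceived Group A, which is the direction predicted for the Bnear condition.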

Method

Design and participants. The design was similar to that of Study 1, except extent of overlap was manipulated rather than direction of overlap. Thus, Study 2 was a 2 (Bnear vs. Bfar) × 2 (bar/exemplar based vs. exemplar based/bar counterbalance) × 2 (intelligence vs. friendliness) design, with the latter factor being within-subject and the other two factors being between-subjects. Forty undergraduates enrolled in introductory psychology at the University of Illinois, Urbana–Champaign participated in this study. They received partial course credit for their participation.

Stimulus materials. In the Bfar condition, the Group A and Group B means were the same as those in the Bf condition of Study 1 (A: μIntelligence = μFriendliness = 180; Bfar: μIntelligence = 125, μFriendliness = 235). To create the Bnear stimuli, a constant was subtracted from the friendliness scores and added to the intelligence scores of the Bfar stimuli (Bnear: μIntelligence = 150, μFriendliness = 210). This transformation shifted the B stimuli so that they overlapped more with the A stimuli in the Bnear condition than in the Bfar condition, resulting in more stimuli being misclassified by the optimal responder in the Bnear condition. That is, even though the same Group A stimuli are presented in the two conditions, maximal categorization accuracy is achieved by moving the classification boundary 17.7 pixels closer to the Group A mean in the Bnear condition as compared with the Bfar condition (i.e., the y-intercept shifts by 25 pixels). In Study 2, the standard deviation was slightly smaller (σ = 25) than in Study 1, producing optimal accuracies of 95.7% in the Bfar condition and 80.2% in the Bnear condition.¹
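The boundary shift reported above follows from the group means alone. The sketch below reproduces the arithmetic under two assumptions that are standard for equal-variance, uncorrelated normal groups with equal base rates but are not spelled out in the text: the optimal bound is the perpendicular bisector of the segment joining the two means, and friendliness is plotted on the y-axis. The helper `midpoint` and the variable names are hypothetical.

```python
import math


def midpoint(a, b):
    """Midpoint of two points given as (intelligence, friendliness) tuples."""
    return tuple((x + y) / 2 for x, y in zip(a, b))


group_a = (180, 180)
b_far = (125, 235)
b_near = (150, 210)

# Under the stated assumptions the optimal bound passes through the midpoint
# of the two group means, perpendicular to the line that joins them.
mid_far = midpoint(group_a, b_far)    # (152.5, 207.5)
mid_near = midpoint(group_a, b_near)  # (165.0, 195.0)

# How much closer the bound lies to the Group A mean in the Bnear condition.
shift = math.dist(group_a, mid_far) - math.dist(group_a, mid_near)

# The bound has slope 1: friendliness = intelligence + intercept.
intercept_far = mid_far[1] - mid_far[0]
intercept_near = mid_near[1] - mid_near[0]

print(f"boundary shift:  {shift:.1f} pixels")
print(f"intercept shift: {intercept_far - intercept_near:.1f} pixels")
```

Under these assumptions the bound moves about 17.7 pixels toward the Group A mean and its intercept changes by 25 pixels, matching the figures given in the text.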

1 The optimal classifier also misclassifies the least friendly and most intelligent Group B members. Assessing beliefs about Group A attributes, however, reveals the effect of the near–far manipulation in the absence of any change in the group’s actual attributes. It is possible that the increase in the actual similarity of Group B to Group A in the Bnear condition would be offset by increased misclassification-driven contrast. We did not test this possibility here.


Procedure. The procedure was the same as that for Study 1 with two exceptions. First, because the two conditions differed in the amount of overlap between categories, the training criteria were altered to reflect the relative difficulty of the two stimulus sets. In the Bfar condition (5% overlap), participants had to get 91% correct within the previous 150 trials before moving on to the test phase. In the Bnear condition (20% overlap), participants had to get 76% correct within the previous 150 trials before moving on to the test phase. These criteria were chosen so that participants in both conditions would learn to an accuracy that is within ~4% of an optimal classifier. Note that in both conditions, perceiver accuracy had to exceed the accuracy of the optimal unidimensional classifier (Bnear optimal unidimensional accuracy = 72.7%; Bfar optimal unidimensional accuracy = 89.7%). Second, the Study 2 procedure explicitly informed participants that although none of the test stimuli (in the exemplar-based measure) were from Group B, some of them would, nonetheless, be rather atypical of Group A. These instructions emphasized that this task was completely different from the training task.

Results

Replicating Study 1, participants’ beliefs about the central tendency of Group A were expected to be biased in a manner that would increase contrast between the two groups. As in the Bf condition of Study 1 (Figure 1), contrast away from Group B would lead to estimates of intelligence that are higher than estimates of friendliness for Group A. More important, it was predicted that the size of this bias would be greater in the Bnear condition than in the Bfar condition because of the higher degree of overlap in the Bnear condition. In addition, we predicted that the change in category overlap should affect the range of exemplars that participants judged as typical of Group A. Specifically, participants should consider a smaller number of exemplars as typical of Group A in the Bnear condition than in the Bfar condition.

Bar height adjustment task. Three participants were excluded from this analysis because they failed to adjust both bars before continuing with the experiment. The Group A average estimates for the remaining 37 participants replicated Study 1 and showed the predicted biases away from Group B (Table 2). Across both experimental conditions, participants’ central tendency estimates had intelligence bars (M = 174.2) that were significantly higher than their friendliness bars (M = 141.9), F(1, 33) = 66.4, p < .001, d = 0.99. A simple-effects analysis was conducted to determine whether the predicted bias occurred reliably within each of the stimulus conditions. The average of the intelligence bars was reliably higher than the average of the friendliness bars in both the Bnear and Bfar conditions, F(1, 33) = 50.8, p < .001, d = 1.28, and F(1, 33) = 15.6, p < .001, d = 0.66, respectively. Most important, the Extent of Overlap × Trait interaction was also significant, F(1, 33) = 6.71, p < .05. As predicted, the size of the bias increased in the Bnear condition relative to the Bfar condition.

Exemplar-based measure of central tendency. Typicality judgments for the 40 participants also revealed the predicted biases (see Table 2). Across both the Bnear and Bfar conditions, the exemplars that participants judged typical of Group A had higher average intelligence bar heights (M = 212.4 pixels) than friendliness bar heights (M = 149.7 pixels), F(1, 36) = 442.8, p < .001, d = 6.27. In addition, a simple-effects analysis was conducted to determine whether the predicted bias occurred reliably within each of the overlap conditions. The average of the intelligence bars was higher than the average of the friendliness bars in both the Bnear and Bfar conditions, F(1, 36) = 297.1, p < .001, d = 7.29, and F(1, 36) = 145.8, p < .001, d = 5.21, respectively. Single-subject analyses also showed the predicted effects. In the Bnear condition, 20 of 21 participants showed the predicted pattern, ps < .05. In the Bfar condition, 15 of 19 participants showed the predicted pattern, ps < .05. One participant did show significant results opposite the predicted pattern, p < .05. The main prediction of Study 2 was confirmed in the exemplar-based measure of central tendency as well, as shown by the significant Extent of Overlap × Trait interaction, F(1, 36) = 12.91, p < .05. As expected, the size of the bias was larger in the condition in which Group B was nearer to Group A.

Exemplar-based measure of dispersion. It was predicted that participants would judge a larger number (i.e., a wider range) of exemplars as typical of Group A when the groups were farther apart. As predicted, participants in the Bfar condition judged more exemplars as typical of Group A (M = 50.47) than did participants in the Bnear condition (M = 46.86), F(1, 38) = 4.49, p < .05, d = 0.66. Thus, increased overlap between categories led to a decreased multidimensional range of stimuli that were judged to be typical of Group A.

Stimuli viewed in training. Because the training criteria were different for the two stimulus conditions, it is possible that participants in the two conditions viewed different numbers of training stimuli. There was a marginal trend indicating that participants in the Bnear condition viewed more stimuli before completing training (M = 184.6) than did participants in the Bfar condition (M = 140.4), F(1, 38) = 3.30, p < .10, d = 0.56.

Table 2
Perceived Group A Characteristics by Condition for Different Measures of Central Tendency (Study 2)

                                Intelligence      Friendliness
Variable                         M       SD        M       SD     Difference   Cohen’s d
Group A average estimates
  Bfar                         175.6    33.2     154.4    30.6       21.2        0.66
  Bnear                        172.9    33.0     130.0    33.9       42.9        1.28
Stimuli judged typical of A
  Bfar                         207.6    10.2     155.7     9.7       51.9        5.21
  Bnear                        217.2    10.3     143.9     9.8       73.3        7.29

Note. All trait measures are in screen pixels. Bfar = similar to the B more friendly and less intelligent condition; Bnear = stimuli shifted closer to Group A.

Discussion

As in the previous studies, the perceived central tendency of Group A was contrasted away from Group B. More important, the magnitude of this bias was affected by the relative closeness of the two groups such that the bias was largest when the groups were more similar. This difference in bias cannot be easily explained as a function of the different learning criteria used in the two conditions because the criteria for the Bnear condition required participants to view slightly more training stimuli. If anything, this increased training should have made the estimates in the Bnear condition more accurate than those in the Bfar condition, opposite the obtained pattern of results.

Also as predicted, perceivers’ beliefs about Group A attributes were more homogeneous as the true intergroup similarity increased and more Group A stimuli were optimally misclassified as belonging to Group B. If anything, this finding is counterindicated by well-established laws of psychophysics. Psychophysical effects drive perceivers to see the same two stimuli as less similar to one another when the total range of the stimuli is small and more similar to one another when the total range of stimuli is large. Perceptual effects would thus drive perceivers to see the Group A members as less homogeneous in the Bnear condition. Study 2 produces the opposite effect. Thus, categorization effects are presumably overpowering any psychophysical scaling effects in this study.

The findings of Study 2 have both positive and negative implications for accuracy of knowledge about category attributes. On the plus side, the biases are not pervasive because they have minimal effects for highly distinct groups. On the minus side, the biases that do exist are resistant to elimination because a reduction in actual group differences is partially compensated by increased contrast. In short, perceived central tendency and variability are warped in a way that maintains group distinctions. The resulting biases may not always be large, but are likely to be persistent.

Study 2 demonstrated that both the between-categories contrast and the within-category assimilation are dependent on the objective extent of overlap in the distributions of the groups. The more similar the groups are, the larger both biases become. This is consistent with the idea that biases in beliefs about category attributes increase as the portion of the category distribution that is misclassified by the optimal observer increases. In summary, taking the perspective of the optimal classifier suggests an interplay between perceived central tendency and perceived variability that is substantiated by the experimental data.

Study 3

In Study 3, the misclassifications made by the optimal responder were manipulated by changing the relative base rates for the two groups. Thus, Group A was either a majority compared with Group B or a minority compared with Group B. Although the relative sizes of the groups varied, the means and variances of the groups remained constant across conditions.

When base rates are equal, classification accuracy is maximized by a classification criterion that is defined by the points at which stimuli are equally likely to belong to either category. However, if the number of members in one category is increased, stimuli that had a 50/50 chance of belonging to either category are now more likely to belong to the more numerous category. Therefore, the optimal decision criterion shifts to include these stimuli in the more numerous category (Ashby, 1992; Ashby & Gott, 1988; Maddox, 1995). Although this shift produces more correct classifications overall, it increases the misclassifications of stimuli in the overlapping portion of the minority category. This should result in a larger contrast effect for a minority group than for a majority group.

In addition, factors that move the optimal classification boundary toward a category’s mean narrow the range of stimuli that people classify as belonging to that category. As mentioned above, when the base rates of two categories are made unequal, the decision criterion shifts toward the less numerous category. Therefore, perceivers should see a minority group as more homogeneous than a majority group.

Method

Design and participants. The design of Study 3 was a 2 (1:2 vs. 2:1 base rate) × 2 (bar/exemplar based vs. exemplar based/bar counterbalance) × 2 (intelligence vs. friendliness) factorial, with the latter factor manipulated within subject and the other two factors manipulated between subjects. Twenty-seven undergraduates enrolled in introductory psychology at the University of California, Santa Barbara participated in this study. They received partial course credit for their participation.

Stimulus materials. The values of the training stimuli in this experiment were taken from the Bf condition of Study 1 (10% overlap) and were presented in the same manner. Unlike Study 1, however, the number of stimuli shown from the two stimulus groups was not equal. This was accomplished by creating two sets of 750 training stimuli. In one set, each of the 250 Group A stimuli was included twice, whereas the 250 Bf stimuli were included only once (2:1 condition). In the other set, each of the Group A stimuli was included once, whereas the Bf stimuli were included twice (1:2 condition). Even though the stimuli for each group have the same means, variances, and covariance across the base-rate conditions, the optimal bound shifts ~16 pixels toward the Group A mean in the 1:2 condition as compared with the 2:1 condition. (Put a different way, the y-intercept is ~22 pixels higher in the 2:1 condition than in the 1:2 condition.)

Procedure. The procedure of the current study was identical to that of Study 2 with the exception that the two conditions differed in the base rates between groups rather than in the means of Group B stimuli. In addition, consistent with the use of the Bf stimuli from Study 1, the Study 1 learning criterion was used in the present study (87.3%). This learning criterion was below optimal accuracy (90%) but exceeded the maximum accuracy that could be attained using a unidimensional rule (84.8%).
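The effect of base rates on the optimal bound, described under Stimulus materials, can be sketched with the standard result for two normal categories with equal, uncorrelated variances: the boundary moves away from the midpoint between the means by σ² ln(P_A / P_B) divided by the distance between the means. The sketch below is a hedged illustration, not the authors' computation. The function name is hypothetical, and the value σ = 30 is an assumed placeholder, because the Study 1 standard deviation is not restated in this section and any covariance between the traits is ignored.

```python
import math


def base_rate_offset(distance, sigma, prior_a, prior_b):
    """Signed offset of the optimal boundary from the midpoint between means.

    Positive values move the boundary toward Group B (enlarging Group A's
    region). Assumes equal, uncorrelated variances sigma**2 in both groups.
    """
    return sigma ** 2 * math.log(prior_a / prior_b) / distance


# Means of Group A (180, 180) and the far Group B (125, 235), as quoted in the Study 2 method.
distance = math.dist((180.0, 180.0), (125.0, 235.0))

# ASSUMPTION: sigma = 30 is a placeholder; the Study 1 value is not given in this section.
ASSUMED_SIGMA = 30.0

offset_majority = base_rate_offset(distance, ASSUMED_SIGMA, 2.0, 1.0)  # 2:1 condition
offset_minority = base_rate_offset(distance, ASSUMED_SIGMA, 1.0, 2.0)  # 1:2 condition

# Total movement of the boundary toward the Group A mean between the two conditions.
shift_toward_a = offset_majority - offset_minority
# For a boundary of slope 1, a perpendicular shift t changes the intercept by t * sqrt(2).
intercept_change = shift_toward_a * math.sqrt(2.0)
print(f"shift: {shift_toward_a:.1f} px, intercept change: {intercept_change:.1f} px")
```

With the placeholder σ, the sketch yields a shift of roughly 16 pixels, in the same range as the reported value. The intercept change comes out slightly above the reported ~22 pixels, which may reflect the assumed σ, the ignored covariance, or rounding.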

Results

With these stimuli, the central tendency predictions would be confirmed if perceivers’ judgments about Group A’s intelligence were higher than their judgments about friendliness and if this effect were particularly pronounced in the 1:2 base rate condition. The variability predictions would be confirmed if Group A was seen as more homogeneous in the 1:2 base rate condition than in the 2:1 base rate condition.

Bar height adjustment task. Four participants were excluded from this analysis because they failed to adjust both bars. The group average judgments for the remaining 23 participants showed the predicted biases (Table 3). Across both base rate conditions, participants’ central tendency estimates had intelligence bars (M = 177.8) that were significantly higher than their friendliness bars (M = 146.2), F(1, 19) = 65.8, p < .001, d = 1.29. A simple-effects analysis was conducted to determine whether the predicted bias occurred reliably within each of the base rate conditions. The average of the intelligence bars was reliably higher than the average of the friendliness bars in both the 1:2 and 2:1 conditions, F(1, 19) = 35.1, p < .001, d = 1.48, and F(1, 19) = 30.2, p < .001, d = 1.01, respectively. There was a trend in the predicted direction for the Trait × Base Rate interaction, F(1, 19) = 3.516, p < .10. Thus, there is tentative evidence that the size of the bias is increased in the 1:2 base rate condition relative to the 2:1 condition.

Exemplar-based measure of central tendency. Three participants were excluded from this analysis: 2 because they held down a single response key for a significant number of consecutive trials, and 1 because he failed to follow the task instructions. Typicality judgments for the remaining 24 participants showed the predicted biases (Table 3). Across both base rate conditions, the exemplars that participants judged typical of Group A had higher average intelligence bar heights (M = 224.6 pixels) than friendliness bar heights (M = 139.5 pixels), F(1, 20) = 1343.00, p < .001, d = 14.47. In addition, a simple-effects analysis was conducted to determine if the predicted bias occurred reliably within each of the base rate conditions. The average of the intelligence bars was higher than the average of the friendliness bars in both the 1:2 and 2:1 conditions, F(1, 20) = 33.78, p < .001, d = 13.35, and F(1, 20) = 30.4, p < .001, d = 15.42, respectively. This analysis was replicated by comparing mean intelligence and friendliness bar heights in single-subject analyses. In the 1:2 condition, 15 of 15 participants showed the predicted pattern, ps < .05. In the 2:1 condition, 8 of 9 participants showed the predicted pattern, ps < .05. The remaining participant did not produce results significantly opposing the predicted effects. The main prediction of Study 2 was confirmed, and the Base Rate × Trait interaction was also significant, F(1, 20) = 5.57, p < .05. The size of the bias was larger in the 1:2 base rate condition than in the 2:1 condition.

Exemplar-based measure of dispersion. It was predicted that participants would judge a larger number (i.e., a wider range) of exemplars as typical of Group A when Group A was in the majority than they would when Group A was in the minority. As predicted, participants in the 2:1 condition judged more exemplars as typical of Group A (M = 76.83) than did participants in the 1:2 condition (M = 67.54), F(1, 20) = 13.14, p < .01, d = 1.34. These data are consistent with the prediction that minority status leads to increased perceptions of homogeneity because minority status is accompanied by optimal misclassification of a larger portion of the category.

However, these results can also be explained via one of at least three other mechanisms. First, a perceiver who saw more stimuli in a group might feel compelled to report more members as being typical of the group. With our uniformly distributed test stimuli, this would produce a wider measure of variability for the majority compared with the minority group. Second, a perceiver might use a straight measure of numerosity to determine typicality (e.g., “If I saw more than 3 of them, it’s typical”). Third, a perceiver’s judgments of homogeneity might be affected by group size because even if two groups have identical variability in reality, the smaller group provides relatively less exposure to members at the extremes of the category distribution (Linville, Fischer, & Salovey, 1989). It is true that any of these alternative explanations would produce enhanced homogeneity for the minority group relative to the majority group in the present study. It is also true that none of these alternative explanations rely on misclassification of the optimally misclassified stimuli. However, these explanations can account neither for the central tendency results nor for the dispersion results of Study 2. Only the misclassification argument can parsimoniously explain the combined results of Studies 1 through 3.

Table 3
Perceived Group A Characteristics by Condition for Different Measures of Central Tendency (Study 3)

                                Intelligence      Friendliness
Variable                         M       SD        M       SD     Difference   Cohen’s d
Group A average estimates
  2:1 A:B                      167.9    24.1     144.4    22.3       23.5        1.01
  1:2 A:B                      186.3    25.5     148.0    26.2       38.3        1.48
Stimuli judged typical of A
  2:1 A:B                      220.8     5.9     141.2     4.3       79.6       15.42
  1:2 A:B                      228.4     5.3     137.8     8.0       90.6       13.35

Note. All trait measures are in screen pixels.

Discussion

In Study 3, higher intelligence than friendliness ratings indicated that perceivers contrasted beliefs about Group A’s attributes away from those of Group B. More important, the size of the bias was predicted to be larger in the condition in which Group A was less numerous than Group B. This effect was found using the exemplar-based measure of central tendency. A trend in this direction was also found in participants’ bar height adjustment estimates of the Group A average. Finally, participants judged a narrower range of exemplars as being typical of Group A when members of that group were in the minority in the training phase of the experiment. Overall, this pattern of effects is consistent with predictions: Increasing the number of optimally misclassified Group A stimuli led to increases in both between-categories contrast and within-category assimilation.

The results of Study 3 also have practical implications for understanding the perception of minority and majority groups. In spite of the fact that participants were trained to a high accuracy criterion in the categorization task, their assessments of the minority group demonstrated greater bias and greater perceived homogeneity relative to the majority group. This effect occurred in the absence of any explicit group affiliation for the participants, and it cannot be explained as a general positivity bias for majority group members because the bias was positive on one dimension and negative on the other. This evidence for a nonmotivated bias in one’s perception of minority groups has implications for a wide range of social problems, for example, unequal allocation of resources and the maintenance of intergroup conflict. It also suggests that even the most well-informed and well-intended people may find it difficult to accurately perceive minority groups.


Study 4

Studies 1 through 3 are consistent with the idea that altering the comparison category so that a larger portion of the target category is optimally misclassified increases between-categories contrast and within-category assimilation. To this point, however, any other mechanism that involves similarity between two categories might also explain our results. The goal of Study 4 was to strengthen the argument that optimized misclassification is driving our effects.

Study 4 investigated assessments of a fixed-target group in two diagnostic situations. Both situations involved the same degree of intercategory overlap, but one situation included a goal to maximize classification accuracy, whereas the other situation did not. All participants in Study 4 were presented with identical information about the scholastic abilities of members of four groups, Groups A, B, C, and D (Figure 2). In the ABCD condition, participants decided to which of the four groups each stimulus belonged with the benefit of corrective feedback on each trial. In the other two conditions, they classified Groups A and B with feedback and, separately, classified Groups C and D with feedback. In one of these two latter conditions, participants learned about Groups A and B first (the AB_CD condition). In the other, participants learned about Groups C and D first (the CD_AB condition). Thus, the goal of optimizing Group B versus Group C classification accuracy is present in the ABCD condition but is absent in the AB_CD and CD_AB conditions. After classifying all of the stimuli, each participant assessed the attributes of Group B.

An optimal classifier in the ABCD condition would misclassify both the lowest ability Group B members (as Group A members) and the highest ability Group B members (as Group C members). In contrast, the optimal classifier in the AB_CD and CD_AB conditions would misclassify only the lowest ability Group B members (as Group A members). If optimized misclassification contributes to contrast and assimilation, then the ABCD condition should produce lower estimates of perceived central tendency compared with the AB_CD and CD_AB conditions. In addition, Group B should be perceived as more homogeneous in the ABCD condition than in the AB_CD and CD_AB conditions. This difference in perceived homogeneity should be driven by the high end of the Group B distribution. An alternate possibility is that overlap with similar categories impacts assessments of a group’s attributes regardless of whether systematic misclassification of stimuli has occurred. If so, assessments of Group B’s attributes would not vary as a function of learning condition.

Figure 2. Univariate normal stimulus distributions for Groups A, B, C, and D of Study 4.

Method

Design and participants. The design of Study 4 is a 3 (ABCD vs. AB_CD vs. CD_AB learning condition) × 2 (bar/exemplar based vs. exemplar based/bar counterbalance) × 2 (low/high bar vs. high/low bar counterbalance) between-subjects factorial. The learning condition factor manipulated the variable of interest, whereas the other two factors served only as counterbalances. Two hundred thirty-nine undergraduates enrolled in introductory psychology at Indiana University participated in this study. They received partial course credit for their participation.

Stimulus materials. Each stimulus consisted of a single vertical white bar displayed against a black background. The height of the bar indicated the scholastic ability score for a single individual. For the training phase of the experiment, 90 stimuli were sampled from each of four univariate normal distributions (Figure 2). The means of the four groups were ordered in 70-pixel increments: μ_A = 115 pixels; μ_B = 185 pixels; μ_C = 255 pixels; and μ_D = 325 pixels. For all groups, the standard deviation was equal to 30 pixels. Random samples of 90 stimuli were drawn from each of these four populations. The sampled stimuli for each group were then mathematically transformed to match the population mean and population standard deviation. The amount of overlap per tail was approximately 9.1%. The same set of 90 Group A, 90 Group B, 90 Group C, and 90 Group D training stimuli was shown to participants in all conditions. The order of presentation of the stimuli within the training phase was randomly determined. The stimuli were presented to all participants in the ABCD condition in the same random order. The Group A versus Group B training set was generated by removing the Group C and Group D stimuli from this ordered ABCD training set. The Group C versus Group D training set was generated by removing the Group A and Group B stimuli from this ordered ABCD stimulus set. For the test phase of the experiment, 88 stimuli were sampled without replacement from a uniform distribution ranging from a bar height of 56 to a bar height of 400. All participants judged the typicality of each test stimulus. The order of presentation of the test stimuli was randomized but held constant for all participants.

Procedure. Participants learned that they would view scholastic ability scores for a series of individuals and that their task would be to select the group to which each individual belonged. In the ABCD condition, participants were instructed that they would see members of four different groups and that they should press one of four response keys labeled A, B, C, or D to indicate the group membership corresponding to the presented scholastic ability score. After each classification choice, participants received corrective feedback and continued on to the next trial. In the AB_CD and CD_AB conditions, participants received similar instructions but were told they would learn about two groups. In the AB_CD condition they first learned about Groups A and B, and in the CD_AB condition they first learned about Groups C and D. They were presented with the corresponding stimuli. Participants pressed one of two labeled keys to indicate their classification choice for each stimulus. Feedback was provided after each classification attempt before the participant continued on to the next trial. After completing training on two groups, participants in the AB_CD and CD_AB conditions repeated the same procedure with the two groups they had not yet classified. All participants trained once on each of the 360 training stimuli and then proceeded to the test phase.

In the test phase, participants completed an exemplar-based measure and a group average bar adjustment task similar to those in Studies 1 through 3 (except they included only the one dimension of scholastic ability). The order of the exemplar-based measure and the bar height adjustment task was counterbalanced. In Study 4, participants also adjusted bar heights to indicate their estimates of the lowest Group B member and of the highest Group B member. The low and high estimates always followed the group average estimate, and the order of the low and high estimates was counterbalanced.
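The construction of the Study 4 training stimuli (draw 90 values per group, then transform them so the sample matches the population mean and standard deviation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the function name, the fixed seed, and the use of the divide-by-n standard deviation are all assumptions, since the text does not specify the transformation.

```python
import random
import statistics


def sample_with_exact_moments(mean, sd, n, rng):
    """Draw n normal values, then rescale them to the target mean and SD."""
    raw = [rng.gauss(0.0, 1.0) for _ in range(n)]
    raw_mean = statistics.fmean(raw)
    raw_sd = statistics.pstdev(raw)  # ASSUMPTION: population (divide-by-n) SD
    return [mean + sd * (x - raw_mean) / raw_sd for x in raw]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
GROUP_MEANS = {"A": 115.0, "B": 185.0, "C": 255.0, "D": 325.0}  # pixels, from the text
stimuli = {name: sample_with_exact_moments(mu, 30.0, 90, rng)
           for name, mu in GROUP_MEANS.items()}

for name, values in stimuli.items():
    print(name, round(statistics.fmean(values), 6), round(statistics.pstdev(values), 6))
```

Standardizing the raw draws before rescaling removes sampling error in the first two moments, so every group's sample mean and standard deviation equal the intended values up to floating-point precision.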

Results

If misclassification plays a role in assessments of group attributes, assessments of Group B’s central tendency and dispersion should be lower in the ABCD condition than in the AB_CD and the CD_AB conditions. Results are presented in Table 4. As in previous studies, the counterbalancing produced no significant effects.

Bar height adjustment task. Group average judgments for Group B showed the predicted biases. The ANOVA comparing all three conditions was significant, F(2, 236) = 3.93, p < .021. A contrast indicated that the estimate of group average in the ABCD condition (n = 72, M = 135.11, SD = 57.36) was lower than the estimate of group average in the AB_CD and CD_AB conditions (n = 167, M = 160.51, SD = 69.22), F(1, 236) = 7.59, p < .006, d = 0.384. A contrast comparing the AB_CD and CD_AB conditions failed to reach significance, p > .5.

High and low assessments in the bar height adjustment task. Participants also adjusted a bar to the highest Group B scholastic ability. As expected, these estimates also significantly differed as a function of condition, F(2, 236) = 19.63, p < .0001. A contrast indicated that the high estimate in the ABCD condition (n = 72, M = 179.00, SD = 43.01) was lower than the high estimate in the AB_CD and CD_AB conditions (n = 167, M = 249.02, SD = 92.73), F(1, 236) = 36.86, p < .0001, d = 0.861. The AB_CD and CD_AB conditions did not significantly differ, p > .19. Unexpectedly, participants’ estimates of the lowest Group B scholastic ability also significantly differed by condition, F(2, 236) = 3.73, p < .03. A contrast indicated that the low estimate in the ABCD condition (n = 72, M = 94.78, SD = 27.66) was lower than the low estimate in the AB_CD and CD_AB conditions (n = 167, M = 112.11, SD = 50.61), F(1, 236) = 7.46, p < .007, d = 0.384. The low estimates in the AB_CD and CD_AB conditions did not significantly differ, p > .87.

Exemplar-based measure of central tendency. One participant was excluded from the analyses for judging all of the exemplars as typical of Group B. Typicality judgments for the remaining 238 participants showed that the learning conditions significantly affected the average of the scholastic abilities judged typical of Group B, F(2, 235) = 25.49, p < .0001. A contrast indicated that the mean of the abilities rated as typical of Group B in the ABCD condition (n = 72, M = 173.04, SD = 26.05) was lower than the corresponding mean of the AB_CD and CD_AB conditions (n = 166, M = 229.11, SD = 64.18), F(1, 235) = 50.89, p < .0001, d = 1.006. The mean in the AB_CD and CD_AB conditions did not significantly differ, p > .97.

Exemplar-based measure of dispersion. Because the exemplars in the typicality task were randomly sampled from a uniform distribution, the stimuli in Study 4 were not evenly distributed across the stimulus space. Thus, the number of exemplars classified as typical of Group B was not a direct measure of dispersion as it was in Studies 1 through 3. Instead, the standard deviation of the scholastic abilities that each participant judged as typical of Group B was analyzed. The standard deviation of those abilities rated as typical of Group B was significantly affected by the learning condition, F(2, 235) = 35.28, p < .0001. A contrast indicated that the standard deviation of the abilities rated as typical of Group B in the ABCD condition (n = 72, M = 41.76, SD = 14.16) was lower than the corresponding standard deviation for the AB_CD and CD_AB conditions (n = 166, M = 60.91, SD = 14.16), F(1, 235) = 68.94, p < .0001, d = 1.174. The standard deviation of the abilities rated as typical in the AB_CD and CD_AB conditions did not significantly differ, p > .34.

High and low assessments in the exemplar-based measure. The high and low assessments were also analyzed because the expectation was that assessments of the high end of the Group B distributions would be affected by the learning condition but that assessments of the low end would not. Partially confirming predictions, the highest ability rated as typical of Group B varied with learning condition, F(2, 235) = 42.63, p < .0001. A contrast indicated that the highest ability rated as typical of Group B in the ABCD condition (n = 72, M = 267.11, SD = 62.97) was lower than the highest ability rated as typical in the AB_CD and CD_AB conditions (n = 166, M = 350.73, SD = 62.97), F(1, 235) = 85.02, p < .0001, d = 1.302. The highest abilities rated as typical in the AB_CD and CD_AB conditions did not significantly differ, p > .89. The lowest ability rated as typical of Group B was marginally affected by the learning condition, F(2, 235) = 2.68, p < .07. Unexpectedly, however, a contrast indicated that the lowest ability rated as typical of Group B in the ABCD condition (n = 72, M = 96.64, SD = 29.70) was lower than the lowest ability rated as typical in the AB_CD and CD_AB conditions (n = 166, M = 109.58, SD = 52.08), F(1, 235) = 4.08, p < .04, d = 0.277. The lowest abilities rated as typical in the AB_CD and CD_AB conditions did not significantly differ, p > .22.

Table 4
Perceived Group B Characteristics by Condition for Different Measures of Central Tendency (Study 4)

                                 ABCD              AB_CD             CD_AB
Variable                       M       SD        M       SD        M       SD     Contrast d
Group average estimates     135.11   57.36    164.03   71.52    157.51   67.44       0.38
Group high estimates        179.00   43.01    240.29   93.32    256.49   92.02       0.86
Group low estimates          94.78   27.65    112.70   56.37    111.60   45.42       0.38
M judged typical            173.04   26.05    228.99   68.58    229.21   60.50       1.01
SD judged typical            41.76   14.16     59.64   16.88     62.02   17.27       1.17
Maximum judged typical      267.11   62.98    350.05   63.16    351.33   65.99       1.30
Minimum judged typical       96.64   29.70    114.26   60.54    105.53   43.41       0.28

Note. All trait measures are in screen pixels. Contrast d is the effect size for the ABCD versus AB_CD and CD_AB contrast, all significant at p < .01.
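The contrast effect sizes reported in these Results can be approximated from the quoted means, standard deviations, and group sizes. The sketch below uses the conventional pooled-standard-deviation form of Cohen's d for two independent groups. This is an assumption: the text does not say which formula was used, and the function name is hypothetical.

```python
import math


def cohens_d_pooled(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d for two independent groups using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m2 - m1) / math.sqrt(pooled_var)


# Group average estimates: ABCD vs. the combined AB_CD and CD_AB conditions (from the text).
d_average = cohens_d_pooled(135.11, 57.36, 72, 160.51, 69.22, 167)
# High estimates for the same contrast (from the text).
d_high = cohens_d_pooled(179.00, 43.01, 72, 249.02, 92.73, 167)
print(f"d (average) = {d_average:.3f}, d (high) = {d_high:.3f}")
```

Both values land within about 0.01 of the reported effect sizes (0.384 and 0.861). The small remaining gap would be expected if the reported values were based on the error term from the three-condition ANOVA rather than on a two-group pooled standard deviation.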

Discussion

In Study 4, not only did participants learn about the same stimuli from the target group (Group B), but they also learned about the same stimuli from comparison Groups A, C, and D. Despite this equivalence of presented stimuli, participants’ assessments of Group B attributes differed greatly depending on whether they categorized Group B in direct comparison with Group C or not. An optimal responder comparing Groups B and C would misclassify the most able Group B members as belonging to Group C. Consequently, optimizing categorization of Groups B versus C (ABCD condition) was expected to lead to assessments of Group B as relatively less scholastically able as compared with conditions in which no such misclassification of the most able Group B members occurred (AB_CD and CD_AB). As predicted, when Group B was directly categorized in comparison to Group C, Group B was seen as both less scholastically able and less variable in scholastic ability. Also as expected, the difference in variability was largely due to differences in assessments of the most able Group B member (Cohen’s d = 1.30) rather than to differences in assessments of the least able Group B member (Cohen’s d = 0.28).

Study 4 clearly shows that it is not just the relative likelihood that affects assessments of category attributes, but also the relative likelihood of those categories with which the target category is compared when optimizing classification accuracy. This is consistent with the view that misclassification during categorization learning contributes to between-categories contrast and within-category assimilation.

General Discussion

The claim of the present research is that optimizing classification accuracy can produce systematic misclassifications that contribute to both between-categories contrast and within-category assimilation. Consistent with this hypothesis, four manipulations that altered the behavior of an optimal classifier produced predicted shifts in perceivers’ assessments of category attributes. In Study 1, the direction of overlap with the comparison category systematically altered whether (a) the most friendly and least intelligent members or (b) the least friendly and most intelligent members of the target category were optimally misclassified. As predicted, this manipulation altered perceivers’ assessments of the target category’s attributes in a manner that underutilized optimally misclassified stimuli and exaggerated between-groups differences. In Study 2, increased overlap with the comparison category increased the portion of the target category that was optimally misclassified. The result was a corresponding increase in both between-categories contrast and within-category assimilation. Study 3 demonstrated the predicted effect of relative base rate. Optimal categorization is achieved by shifting the categorization boundary closer to the minority group mean and further from the majority group mean, producing increased misclassification of those minority group members that are most similar to the majority group. As predicted, this shift was accompanied by an increased bias in beliefs about minority group central tendency and increased assessments of minority group homogeneity. In Study 4, between-categories overlap that involved optimized misclassification produced different assessments of group attributes than did overlap that did not involve optimized misclassification. As expected, the direction of this effect supported the contention that optimally misclassified stimuli have attenuated impact on assessments of group attributes.
It is interesting that Studies 2 through 4 consistently and simultaneously produced both between-categories contrast and within-category assimilation on categorization-relevant dimensions with effect sizes that were moderate to large. These findings stand in contrast to prior work that has suggested the simultaneous occurrence of between-categories contrast and within-category assimilation is rare. Unlike the few studies that have reported simultaneous contrast and assimilation (Goldstone et al., 2001; Livingston & Andrews, 2005), our studies show both effects using a single measure. Presumably, obtaining both effects on a single measure provides somewhat stronger evidence that a single mechanism contributes to both effects than would be the case if each effect were obtained with a different measure. In addition, the present results clearly confirm that categorization can, indeed, bias assessments of category attributes. In fact, highly accurate categorization learning can lead to very misguided views about what a category is like.
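The base-rate argument summarized for Study 3 can be illustrated with a small numerical sketch. The code below assumes two equal-variance normal categories on a single dimension, the standard accuracy-maximizing criterion for that case, and the simplification that beliefs about the target category rest only on its correctly classified members. That last assumption is just one of several possible mechanisms and is not a model taken from the article. All means, standard deviations, and base rates are hypothetical.

```python
import math
from statistics import NormalDist


def optimal_boundary(mu_a, mu_b, sigma, p_a, p_b):
    """Accuracy-maximizing criterion for two equal-variance normal categories.

    Assumes mu_a < mu_b; stimuli above the returned value are labeled B.
    """
    return (mu_a + mu_b) / 2 + sigma ** 2 / (mu_b - mu_a) * math.log(p_a / p_b)


def upper_truncated_moments(mu, sigma, cut):
    """Mean and SD of a normal(mu, sigma) restricted to values above `cut`."""
    std = NormalDist()
    alpha = (cut - mu) / sigma
    lam = std.pdf(alpha) / (1 - std.cdf(alpha))  # inverse Mills ratio
    mean = mu + sigma * lam
    sd = sigma * math.sqrt(1 + alpha * lam - lam ** 2)
    return mean, sd


# Hypothetical structure: Group A is lower than Group B on the dimension.
MU_A, MU_B, SIGMA = 0.0, 2.0, 1.0

for p_b in (0.5, 0.25):  # equal base rates, then Group B as the minority
    cut = optimal_boundary(MU_A, MU_B, SIGMA, 1 - p_b, p_b)
    mean_b, sd_b = upper_truncated_moments(MU_B, SIGMA, cut)
    missed = NormalDist(MU_B, SIGMA).cdf(cut)  # share of B labeled A
    print(
        f"P(B) = {p_b:.2f}: boundary = {cut:.2f}, "
        f"B misclassified = {missed:.1%}, "
        f"mean of correctly classified B = {mean_b:.2f} (true {MU_B:.2f}), "
        f"SD = {sd_b:.2f} (true {SIGMA:.2f})"
    )
```

In this sketch, making Group B the minority moves the boundary toward the Group B mean, so a larger share of Group B is misclassified. The correctly classified remainder then has a mean further from Group A and a smaller spread, which mirrors the contrast and assimilation pattern described in the text.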

Comparisons to Prior Work

There are three notable differences between the present work and prior work on between-categories contrast and within-category assimilation. First, in terms of theory, we suggest that systematic misclassification that boosts overall categorization accuracy can lead to predictable distortions in beliefs about what a category is like. Various mechanisms could account for why this might occur. Some possibilities include (a) storing optimally misclassified exemplars with the wrong category label, (b) storing a truncated categorization response region as the representation of the category, (c) storing only the decision bound that optimizes categorization accuracy, (d) weakening (or strengthening) connections between the optimally misclassified stimuli and the correct (or incorrect) category label, or even (e) weighting the optimally misclassified stimuli less at the time of judgment. Clearly, it is beyond the scope of this article to differentiate between these representational assumptions. The key point here is that a variety of theoretical instantiations could all lead to the same previously uninvestigated claim that, ironically, accurate categorization can lead to an outcome of strong and systematic biases in beliefs about categories.

Second, because the premise was that biased beliefs ensue to the extent that systematic misclassification occurs, the category structures in the present studies differed markedly from those used in much of the prior work on between-categories contrast and within-category assimilation. Specifically, the current studies used large, normally distributed categories that overlapped on continuously valued attribute dimensions. Although this is a decided departure from prior work (Corneille & Judd, 1999; Corneille et al., 2002; Eiser, 1971; Goldstone, 1994, 1996; Krueger & Clement, 1994; Krueger et al., 1989; Livingston et al., 1998; McGarty & Penny, 1988; Medin, Wattenmaker, & Hampson, 1987; Rosch & Mervis, 1975; Tajfel & Wilkes, 1963), it is a realistic assumption about what categories are often like in the real world.
Third, in contrast to much social psychological work, motivated biases in group perception are minimized in the present work. This was done by choosing groups to which the perceiver did not belong and about which the perceiver had no preexisting expectations. Consequently, the perceiver could not satisfy competing assimilative and distinctiveness needs (Brewer, Dull, & Lui, 1981), could not satisfy self-enhancing needs (Mullen et al., 1992), and had no need to be consistent with prior expectations (Krueger et al., 1989).

Obviously, other researchers have demonstrated between-categories contrast and within-category assimilation in the absence of any overlap between categories. A variety of mechanisms have been proposed in the literature that might account for contrast and assimilation in these cases. As mentioned in the previous paragraph, motivated responding might produce between-categories contrast and within-category assimilation, even in the absence of intercategory overlap. Perceptual effects also contribute to between-categories contrast and within-category assimilation, even in the absence of category overlap (Corneille & Judd, 1999; Goldstone, 1994, 1995; Livingston et al., 1998). There seem to be three main explanations for how perception might be altered as a function of category membership. One is that stimuli that have a common category label seem more similar to one another than the same stimuli with different category labels or with no category labels (Tversky, 1977). This type of perceptual effect explains within-category assimilation. Another argument is that increased attention to a categorization-relevant dimension produces “expansion” along that dimension so that stimuli seem relatively more different from one another along that dimension (Goldstone, 1994; Nosofsky, 1986). This could lead to intercategory contrast, but it should also lead to the opposite of within-category assimilation. A third mechanism is expansion of the categorization-relevant dimension that is localized at the intercategory boundary. By this account, stimuli near the boundary are easier to discriminate from close neighbors than are stimuli far from the boundary.

Note that the label effect and a global expansion of the relevant dimension compete with regard to within-category homogeneity. Although attention to the categorization-relevant dimension produces enhanced discriminability across the dimension, the effect of a common label decreases discriminability. Together, these two effects could produce a result that looks like localized expansion of a dimension at the intercategory boundary (as in Goldstone, 1994). These competing mechanisms might explain some of the difficulty in consistently obtaining both between-categories contrast and within-category assimilation. Variables that might affect the trade-off between these mechanisms include the strength of similarity conveyed by the category labels (A or B might convey less similarity than extrovert or introvert) and the difficulty of the categorization (more attention to the categorization-relevant dimension is required for more difficult categorization tasks). Our misclassification explanation for why intercategory contrast and within-category assimilation occur does not negate the importance of these prior contributions but, instead, adds to them.
The misclassification argument (a) suggests an additional mechanism that could lead to assimilation and contrast, (b) seems reasonable given the prevalence of overlapping categories in the world and the relative dearth of overlapping categories in the literature on contrast and assimilation, and (c) can produce very large and coincident between-categories contrast and within-category assimilation effects. In fact, the effects of optimized misclassification are potentially very large and may, in many instances, override any perceptually driven effects.

Turning to the realm of social psychology, an interesting comparison can be made between our work and the inclusion/exclusion work of Bless and Schwarz (1998). They argued that assessments of a social group can be altered by manipulating whether a member of that group is temporarily classed as belonging to the group or not. In one condition, perceivers rated the Social Democrat party after thinking about the president in his party-unaffiliated status as president. In the other condition, perceivers rated the Social Democrat party after thinking about the president in his prior role as a long-standing member of the Social Democrat party. They found that perceivers who thought about the target in his party-unaffiliated, presidential role did not include the president’s attributes in their assessment of the party’s attributes. Thus, classifying the target as president rather than party member led to the exclusion of the target’s attributes when judging what the party was like. This sounds quite similar to our proposal that misclassified members’ attributes underinfluence beliefs about their group. However, two differences between these approaches are noteworthy. First, our optimal misclassification claims are more quantitatively specified and are integrated within established classification learning and modeling frameworks.
Second, and more important, our optimal misclassification claim is driven by biased encoding rather than by context at the time of judgment. If, for example, calling a Group A member a “B” leads to a weakened association in memory with the “A” label and a strengthened association with the “B” label, these associations will persist beyond the context of a particular task. Thus, optimal misclassification may create biases that are relatively stable and long lasting. As mentioned earlier, additional learning will not alleviate these biases if the relative likelihoods remain unchanged. The contribution of the systematic misclassification mechanism to biased assessments of category attributes will, however, fluctuate as a function of any factor that alters the optimal classification boundary. Study 4 demonstrated that its contributions also fluctuated as a function of the observer’s motivation to accurately classify the stimuli.

Implications for Stereotyping Research

Certainly the results of these studies have implications for how biased stereotypes might develop. Even in the absence of any motivation to distance beliefs about the attributes of one’s own group from those of another group, biased beliefs can result as a simple function of category learning (Allport, 1954; Krueger et al., 1989). Our knowledge about how this might happen is expanded by the present research. If the attributes of two groups overlap and perceivers rely on that attribute information to assign individuals to groups, then perceivers will think the groups are more different than they really are and that each group is more homogeneous than it really is. More alarming, the present research suggests that accurate category learning might lead to perceptions of numerical minority groups that are highly biased. Any small differences between groups will be exaggerated, and this will largely come at the expense of accuracy in perceptions of the minority group. As mentioned earlier, these biases are likely quite difficult to eliminate: Additional learning does not change the relative likelihoods of the two groups, and consequently, bias will persist. It is important to note that the overlapping characteristics do not have to be evaluative; they could be a simple physical trait such as nose length or lip size. Even if the characteristics are valenced, a minority group could be misperceived as either better than they actually are (e.g., Asians and math) or worse than they actually are (e.g., African Americans and scholastic ability).

Speculatively, one could also consider threat as a negative payoff and make predictions about biased stereotypes within the present framework.
If one group is actually more threatening than another, the cost of misclassifying a threatening group member as belonging to the unthreatening group is higher than the cost of misclassifying an unthreatening group member as belonging to the threatening group. This discrepancy in negative payoffs should shift the classification boundary toward the mean of the less threatening group (Maddox & Bohil, 2003). The result would be relatively accurate beliefs about the threatening group but quite biased beliefs about the less threatening group. To use a very loaded example, if African Americans actually were more violent than European Americans, perceptions of African Americans’ levels of violence might be relatively accurate, but beliefs about European Americans’ levels of violence would be underestimated. Possibly more interesting is the idea that other attributes that differ between threatening and unthreatening groups might also be biased by the shift in the classification boundary that occurs as a function of threat. For example, if African Americans actually were more violent than European Americans, beliefs about African American skin tone might be relatively accurate, but beliefs about European American skin tone might be lily white.

The extent to which optimizing classification accuracy influences beliefs about social groups in the real world remains an open question. It might be argued that one does not learn social categories by classifying a stimulus and then receiving corrective feedback. Although people often do not have immediate trial-by-trial classification feedback, we do manage to develop expectations about the relative likelihoods of different groups, given certain attribute dimensions. Suppose I know, for example, that men who are as effeminate as my friend, Ray, are usually gay. If I encounter a new man who is as effeminate as Ray, I will be more likely to be accurate if I class him as gay than if I class him as straight. I am not likely to ask him outright about his sexual orientation, so I will not get feedback. Instead, I will continue to think of him as gay, regardless of whether this is an accurate or an inaccurate classification. If my classification happens to be correct, I will reaffirm any differences between gay and straight men on the dimension of effeminacy. If it happens to be wrong, I will erroneously add to my repository of beliefs that gay men are effeminate and, simultaneously, erroneously add to my beliefs that straight men are not effeminate. Through this type of optimization of classification on the basis of prior beliefs about likelihoods, I will develop biased impressions that gays are more effeminate than they really are and that straights are less effeminate than they really are. This resonates with prior work on expectancy-based illusory correlation (Hamilton & Rose, 1980) but adds a framework of categorization involving optimized misclassification that reinforces those expectancies.
As mentioned earlier, the misclassification framework provides hints about what representational or decisional processes might be involved in generating this bias.

The present studies suggest that attempts to maximize classification accuracy for large, overlapping groups can lead to between-categories contrast and within-category assimilation. We argue that these effects are not driven by altered abilities to perceptually discriminate (e.g., compression and expansion effects) but, rather, are a byproduct of misclassification when attempting to maximize classification accuracy. Even presuming perceivers have accurate beliefs about the likelihoods of different groups, biased beliefs about attributes should persist to the extent that the perceivers engage in optimally classifying individuals to different groups. These biases will only be exacerbated in the absence of corrective feedback following an incorrect classification.

Categorization and Representation

Categorization training involves learning to maximize classification accuracy by assigning stimuli to the correct categories as often as possible. Ashby and colleagues (Ashby, Alfonso-Reese, Turken, & Waldron, 1998; Ashby & Casale, 2003) have suggested that categorization can be optimized in some cases by ascertaining the explicit, verbalizable rule that best separates the categories (see also Nosofsky & Palmeri, 1998). They suggested that such rules are learned via the anterior cingulate gyrus and prefrontal cortex and that people can describe these rules as the basis of their categorization behavior. Based on the hypothesized brain areas involved, they proposed and demonstrated that categorization tasks that can be optimized through explicit rule learning are compromised by including a coincident task that taps the prefrontal cortex, are not affected by the amount of feedback delay, do not suffer when response keys are switched, and can continue with some success in the absence of feedback. Obviously, such verbal rules can produce optimal categorization for those categories that can be best distinguished by setting a criterion along a single dimension. It is important to note that only knowledge of this optimal rule is required to maximize classification accuracy. If the rule were all that was stored about the categories, perceivers would certainly ignore the optimally misclassified portion of each category. However, such a rule-based representation would provide perceivers with little basis for estimating the group central tendency and dispersion. Perceivers in Study 4’s AB_CD and CD_AB conditions seemed to have exactly this kind of inaccuracy in their typicality ratings. In fact, these perceivers rated stimuli as being typical of Group B that were up to 5.5 standard deviations away from the Group B mean!

In Studies 1 through 3, in contrast, optimal classification was achieved if the information from both attribute dimensions was integrated in a manner that could not be easily verbalized. Ashby and colleagues (Ashby et al., 1998; Ashby & Casale, 2003) claimed that such information-integration category structures lead to implicit classification learning in which perceivers learn via a reward-mediated pathway in the striatum (and, more specifically for visual stimuli, in the tail of the caudate nucleus). This type of learning does not allow perceivers to report the basis of their categorization judgments, is not interfered with by a coincident task involving prefrontal cortex, is degraded by feedback delay, suffers if the response keys are switched after training, and is horribly degraded by lack of supervision during training. Information-integration category learning might be represented in exemplar form (Nosofsky, 1988).
On the basis of neural arguments, however, Ashby and colleagues (e.g., Ashby et al., 1998) have suggested two representational possibilities that could account for implicit learning of information-integration category structures. One possibility is that the decision rule itself is represented. A striatal network determines on which side of the decision bound a particular stimulus falls and suggests the appropriate categorical response. Another possibility is that repeated exposure produces learning that maps category labels to sets of neurons that correspond to regions of the perceptual space. In this case, each category label becomes associated with different subregions of the perceptual space.

These possibilities have different implications for how perceivers assess category attributes. First, although exemplars might account for underutilization of stimuli in overlapping portions of the category distribution that are similar to the contrast category, exemplar storage provides no mechanism that would produce estimates that are more extreme than the stimulus values presented for a category. Yet perceivers in Studies 1 through 3 claimed that stimuli that were well outside the region of space that corresponded to the target category (over 4.5 standard deviations from the mean) were actually typical of the category! A parallel argument holds for the idea that the implicit system maps category responses onto subregions of perceptual space. Ashby and colleagues (Ashby et al., 1998; Ashby & Casale, 2003) argued that as a stimulus is presented and a reward follows, dopamine release increases long-term potentiation in the medium spiny cells of the caudate nucleus (Ashby & Casale, 2003). The reward occurs only when an accurate response is made to a presented stimulus. Therefore, there should be no mapping between the category label and regions of the stimulus space where no stimuli were ever seen. Hence, assessment of the category as having attributes more extreme than those presented for the category seems at odds with this type of representation.

The present results suggest that perceivers describe stimuli as typical of the category if the stimuli fall on the correct side of the decision bound. This holds true even for stimuli that are quite unlike any of the target stimuli presented during training. The representation that seems to best fit with this result is that a decision bound is stored rather than exemplars or regions of the stimulus space. Thus, the present results support the idea that the implicit system stores a categorization rule, albeit a nonverbal one (i.e., a mathematical discriminant function). As with rule-based learning in the prefrontal cortex, the rule provides little detail regarding the specifics of the attributes of the category members.
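The difference between storing a decision bound and storing exemplars can be sketched in a few lines. The example is only illustrative: the exemplar values, discriminant weights, and sensitivity parameter are hypothetical, and the summed exponential similarity is a simplified stand-in for exemplar models of the kind cited above, not the authors' analysis.

```python
import math

# Hypothetical training exemplars for a target category B,
# given as (friendliness, intelligence) bar heights.
EXEMPLARS_B = [(40.0, 100.0), (50.0, 110.0), (45.0, 120.0), (55.0, 105.0)]


def discriminant(stimulus, weights=(-1.0, 1.0), bias=-50.0):
    """Linear discriminant h(x) = w . x + b; h(x) > 0 means 'respond B'.

    The weights and bias are assumed values chosen for illustration.
    """
    return sum(w * x for w, x in zip(weights, stimulus)) + bias


def rule_says_typical(stimulus):
    """Bound-only representation: any stimulus on the B side counts as typical."""
    return discriminant(stimulus) > 0


def exemplar_similarity(stimulus, exemplars=EXEMPLARS_B, sensitivity=0.05):
    """Summed exponential-decay similarity to the stored B exemplars."""
    return sum(math.exp(-sensitivity * math.dist(stimulus, e)) for e in exemplars)


near = (48.0, 108.0)  # resembles the training exemplars
far = (10.0, 400.0)   # far more extreme than anything presented

for name, stimulus in (("near", near), ("far", far)):
    print(
        f"{name}: rule says typical = {rule_says_typical(stimulus)}, "
        f"exemplar similarity = {exemplar_similarity(stimulus):.6f}"
    )
```

Under these assumptions the bound-only representation endorses the extreme stimulus as typical because it falls on the B side of the bound, whereas the exemplar-based measure assigns it almost no similarity to the category. This is the pattern the text uses to argue that a decision bound, rather than exemplars, is stored.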

Limitations

Studies 1 through 3 had two properties that limited their conclusiveness regarding representation. Consequently, the present data are only suggestive of a rule-based representation for the implicit neural system. First, although the category structures of Studies 1 through 3 mimic those reviewed in Ashby and Casale (2003) as requiring information integration, the fact that both dimensions were depicted as bar heights in these studies leaves open the possibility that perceivers used a verbal rule to optimize classification accuracy (e.g., “If the intelligence bar is more than 55 pixels taller than the friendliness bar, respond B”). Second, the optimal conjoint responder was able to achieve accuracies that exceeded the learning criterion. It is important to note, however, that the main conclusions regarding between-categories contrast and within-category assimilation are not challenged by these limitations. Like the optimal classifier, the optimal verbal rule and the optimal conjoint classifier would systematically misclassify those stimuli most similar to the comparison category. Thus, biased assessments of category attributes would still be expected if perceivers used these alternative categorization strategies. Future research should use (a) stimuli that vary on undeniably separable dimensions and (b) stimuli for which a conjoint rule cannot provide a feasible alternative explanation. Such studies could more accurately determine whether perceivers assess highly unusual stimuli as belonging to the category as long as they fit the rule defined by the optimal discriminant function. Such studies would help to clarify the representational underpinnings of categories that require information integration to optimize classification accuracy.

In addition, it could be argued that in the real world, perceivers trying to accurately classify every individual could search for additional attributes to delineate their group membership.
It is true that if all of the targets are correctly classified, there would be no misclassification-induced bias. That is, if you gave an optimal classifier information that uniquely identified each individual, that person would be able to accurately classify them all. Certainly idiosyncratic, distinctive information abounds in human targets. So, for example, Seth may be a straight guy who is similar to gay men in many ways, but he has a unique mole on his left lower chin that could serve my ability to see him as an exception within my classification scheme. However, even if such information were available, real constraints on memory and attention could limit our ability as perceivers to effectively maintain and utilize that information. It is interesting to note that studies on rule-plus-exception classification are often conducted using small numbers of stimuli and three to six discrete attribute dimensions (e.g., RULEX; Nosofsky, Palmeri, & McKinley, 1994). Whether perceivers can keep in mind many different types of exceptions to a classification scheme when the attributes are numerous and continuous and the number of stimuli is high is a question worthy of further research.

Summary

The present work makes several contributions to our understanding of category learning. We argued that stimuli that are systematically misclassified play a lesser role in assessments of group characteristics (see also Rothbart & John, 1985). Four studies altered the portion of the target category that was misclassified by the optimal classifier. As predicted, changes in the optimally misclassified stimuli led to corresponding changes in assessments of category attributes. Across studies, as the portion of the target category that was optimally misclassified increased, the between-categories contrast and within-category assimilation increased. The fact that these effects were both simultaneous and large is noteworthy because a number of researchers have failed to produce both effects simultaneously, despite Tajfel and Wilkes’s (1963) original expectations that these effects would co-occur (Goldstone, 1994; Livingston et al., 1998; Tajfel & Wilkes, 1963). It is important to note that perceivers’ assessments of category attributes were quite inaccurate despite, or maybe more precisely because of, highly accurate categorization responses. Learning that optimizes classification accuracy does not necessarily provide accurate knowledge about the categories more generally. Thus, what people learn by optimizing the ability to assign items to classes may be less than optimal in guiding interactions with members of those classes.

References

Allport, G. W. (1954). The nature of prejudice. Oxford, England: Addison-Wesley.
Ashby, F. G. (1992). Multidimensional models of perception and cognition. Hillsdale, NJ: Erlbaum.
Ashby, F. G., Alfonso-Reese, L. A., Turken, A. U., & Waldron, E. M. (1998). A neuropsychological theory of multiple systems in category learning. Psychological Review, 105, 442–481.
Ashby, F. G., & Casale, M. B. (2003). The cognitive neuroscience of implicit category learning. In L. Jimenez (Ed.), Attention and implicit learning. Advances in consciousness research (Vol. 48, pp. 109–141). Amsterdam, the Netherlands: Benjamins.
Ashby, F. G., & Gott, R. E. (1988). Decision rules in the perception and categorization of multidimensional stimuli. Journal of Experimental Psychology: Learning, Memory, & Cognition, 14, 33–53.
Bless, H., & Schwarz, N. (1998). Context effects in political judgment: Assimilation and contrast as a function of categorization processes. European Journal of Social Psychology, 28, 159–172.
Brewer, M. B., Dull, V., & Lui, L. (1981). Perceptions of the elderly: Stereotypes as prototypes. Journal of Personality and Social Psychology, 41, 656–670.
Corneille, O., & Judd, C. M. (1999). Accentuation and sensitization effects in the categorization of multifaceted stimuli. Journal of Personality and Social Psychology, 77, 927–941.
Corneille, O., Klein, O., Lambert, S., & Judd, C. M. (2002). On the role of familiarity with units of measurement in categorical accentuation: Tajfel and Wilkes (1963) revisited and replicated. Psychological Science, 13, 380–383.
Eiser, J. R. (1971). Enhancement of contrast in the absolute judgment of attitude statements. Journal of Personality and Social Psychology, 17, 1–10.
Goldstone, R. L. (1994). Influences of categorization on perceptual discrimination. Journal of Experimental Psychology: General, 123, 178–200.
Goldstone, R. L. (1995). Effects of categorization on color perception. Psychological Science, 6, 298–304.
Goldstone, R. L. (1996). Isolated and interrelated concepts. Memory & Cognition, 24, 608–628.
Goldstone, R. L., Lippa, Y., & Shiffrin, R. M. (2001). Altering object representations through category learning. Cognition, 78, 27–43.
Hamilton, D. L., & Rose, T. L. (1980). Illusory correlation and the maintenance of stereotypic beliefs. Journal of Personality and Social Psychology, 39, 832–845.
Krueger, J., & Clement, R. W. (1994). Memory-based judgments about multiple categories: A revision and extension of Tajfel’s accentuation theory. Journal of Personality and Social Psychology, 67, 35–47.
Krueger, J., Rothbart, M., & Sriram, N. (1989). Category learning and change: Differences in sensitivity to information that enhances or reduces intercategory distinctions. Journal of Personality and Social Psychology, 56, 866–875.
Linville, P. W., Fischer, G. W., & Salovey, P. (1989). Perceived distributions of the characteristics of in-group and out-group members: Empirical evidence and a computer simulation. Journal of Personality and Social Psychology, 57, 165–188.
Livingston, K. R., & Andrews, J. K. (2005). Evidence for an age-independent process in category learning. Developmental Science, 8, 319–325.
Livingston, K. R., Andrews, J. K., & Harnad, S. (1998). Categorical perception effects induced by category learning. Journal of Experimental Psychology: Learning, Memory, & Cognition, 24, 732–753.
Maddox, W. T. (1995). Base-rate effects in multidimensional perceptual categorization. Journal of Experimental Psychology: Learning, Memory, & Cognition, 21, 288–301.
Maddox, W. T., & Bohil, C. J. (2003). A theoretical framework for understanding the simultaneous base-rate and payoff manipulations on decision criterion learning in perceptual categorization. Journal of Experimental Psychology: Learning, Memory, & Cognition, 29, 307–320.
McGarty, C., & Penny, R. E. (1988). Categorization, accentuation and social judgement. British Journal of Social Psychology, 27, 147–157.
Medin, D. L., Wattenmaker, W. D., & Hampson, S. E. (1987). Family resemblance, conceptual cohesiveness, and category construction. Cognitive Psychology, 19, 242–279.
Mullen, B., Brown, R., & Smith, C. (1992). Ingroup bias as a function of salience, relevance, and status: An integration. European Journal of Social Psychology, 22, 103–122.
Mussweiler, T. (2003). Comparison processes in social judgment: Mechanisms and consequences. Psychological Review, 110, 472–489.
Nosofsky, R. M. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology: General, 115, 39–57.
Nosofsky, R. M. (1988). Exemplar-based accounts of relations between classification, recognition, and typicality. Journal of Experimental Psychology: Learning, Memory, & Cognition, 14, 700–708.
Nosofsky, R. M., & Johansen, M. K. (2000). Exemplar-based accounts of “multiple-system” phenomena in perceptual categorization. Psychonomic Bulletin & Review, 7, 375–402.
Nosofsky, R. M., & Palmeri, T. J. (1998). A rule-plus-exception model for classifying objects in continuous-dimension spaces. Psychonomic Bulletin & Review, 5, 345–369.
Nosofsky, R. M., Palmeri, T. J., & McKinley, S. C. (1994). Rule-plus-exception model of classification learning. Psychological Review, 101, 53–79.
Pickett, C. L., & Brewer, M. B. (2001). Assimilation and differentiation needs as motivational determinants of perceived in-group and out-group homogeneity. Journal of Experimental Social Psychology, 37, 341–348.
Rosch, E., & Mervis, C. B. (1975). Family resemblances: Studies in the internal structure of categories. Cognitive Psychology, 7, 573–605.
Rothbart, M., & John, O. P. (1985). Social categorization and behavioral episodes: A cognitive analysis of the effects of intergroup contact. Journal of Social Issues, 41(3), 81–104.
Tajfel, H., & Wilkes, A. L. (1963). Classification and quantitative judgement. British Journal of Psychology, 54, 101–114.
Tversky, A. (1977). Features of similarity. Psychological Review, 84, 327–352.
Tversky, A., & Kahneman, D. (1974, September). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124–1130.
Whitaker, D., Bradley, A., Barrett, B. T., & McGraw, P. V. (2002). Isolation of stimulus characteristics contribution to Weber’s law for position. Vision Research, 42, 1137–1148.

Received May 11, 2005
Revision received November 24, 2005
Accepted December 5, 2005

Journal of Personality and Social Psychology
2006, Vol. 91, No. 3, 423–435

Copyright 2006 by the American Psychological Association 0022-3514/06/$12.00 DOI: 10.1037/0022-3514.91.3.423

Resisting Persuasion by the Skin of One’s Teeth: The Hidden Success of Resisted Persuasive Messages

Zakary L. Tormala and Joshua J. Clarkson, Indiana University
Richard E. Petty, Ohio State University

Recent research has suggested that when people resist persuasion they can perceive this resistance and, under specifiable conditions, become more certain of their initial attitudes (e.g., Z. L. Tormala & R. E. Petty, 2002). Within the same metacognitive framework, the present research provides evidence for the opposite phenomenon: when people resist persuasion, they sometimes become less certain of their initial attitudes. Four experiments demonstrate that when people perceive that they have done a poor job resisting persuasion (e.g., they believe they generated weak arguments against a persuasive message), they lose attitude certainty, show reduced attitude–behavioral intention correspondence, and become more vulnerable to subsequent persuasive attacks. These findings suggest that resisted persuasive attacks can sometimes have a hidden yet important success by reducing the strength of the target attitude.

Keywords: attitudes, persuasion, attitude strength, metacognition, certainty

Individuals are often resistant to persuasion. Whether a teenager continues smoking despite (or because of) her parents’ attempts to curb her habit or a clinically depressed patient is unconvinced by his therapist’s effort to change his view of himself, it is well established that people can be remarkably resistant in the face of persuasive messages. Perhaps because of the pervasiveness of this phenomenon in everyday life, some attitude change researchers over the years have shifted from the traditional focus on successful persuasion to the explicit study of resistance to persuasion, that is, the act or process of defending one’s attitude against persuasive attack (see Knowles & Linn, 2004). In research conducted in this domain, much has been learned. For example, it is now known that people tend to resist persuasion when they are forewarned of someone’s persuasive intent (e.g., Hass & Grady, 1975; Papageorgis, 1968), when they feel that a persuasive message threatens their personal freedom (Brehm, 1966), and when their attitudes are particularly strong (Petty & Krosnick, 1995). We have also learned that there are a number of distinct mechanisms through which resistance can occur. For example, people can counterargue persuasive messages (e.g., Brock, 1967; Petty & Cacioppo, 1979a), bolster their initial attitudes (e.g., Lewan & Stotland, 1961; Lydon, Zanna, & Ross, 1988), or derogate the source of a persuasive message (e.g., Tannenbaum, Macauley, & Norris, 1966). Of importance, though, virtually all of the research in this area has been guided by a fundamental assumption that when people resist persuasion (meaning their attitude has not moved in valence or extremity following a persuasive attack), there has been literally no change. In other words, it has been assumed that when a persuasive attack is resisted, that attack has made no impact on the target attitude.

Recent research has contested this notion, suggesting that even when resisted persuasive attacks do not change the valence or extremity of target attitudes, they sometimes change the certainty with which those attitudes are held (Tormala & Petty, 2002). Attitude certainty refers to the sense of conviction someone has about an attitude (Abelson, 1988) or to the extent to which someone views an attitude as correct, or valid (see Gross, Holtz, & Miller, 1995). The reason researchers have been interested in attitude certainty is that, like other dimensions of attitude strength, it has been associated with a variety of important outcomes. The less certain people are of their attitudes, the less likely those attitudes are to predict behavior (e.g., Fazio & Zanna, 1978), resist persuasive messages (e.g., Babad, Ariav, Rosen, & Salomon, 1987; Bassili, 1996; Wu & Shaffer, 1987), or simply persist over time (e.g., Bassili, 1996). Thus, if resisting persuasive attacks affects attitude certainty, target attitudes might sometimes change in their tendency to predict behavior and/or last over time. In the present research, we explore the impact of resistance to persuasion on attitude certainty from a metacognitive perspective based on people’s perceptions of their own resistance and the conditions under which it occurs.

Metacognition and Resistance to Persuasion

Zakary L. Tormala and Joshua J. Clarkson, Department of Psychological and Brain Science, Indiana University; Richard E. Petty, Department of Psychology, Ohio State University. We thank members of the 2004–2005 Social Seminar Series at Indiana University and the 2002–2003 Group on Attitudes and Persuasion at Ohio State University for helpful feedback on this research at earlier stages. Correspondence concerning this article should be addressed to Zakary L. Tormala, Department of Psychological and Brain Science, Indiana University, 1101 East 10th Street, Bloomington, IN 47405. E-mail: [email protected]

Metacognition essentially refers to people’s thoughts about and perceptions of their own cognitive states and processes (for reviews, see Bless & Forgas, 2000; Jost, Kruglanski, & Nelson, 1998; Petty, Briñol, Tormala, & Wegener, in press; Yzerbyt, Lories, & Dardenne, 1998). In recent work exploring the role of metacognition in resistance, we (Tormala & Petty, 2002) proposed that when people resist persuasion they can perceive this resistance, reflect on it, and form specifiable attribution-like inferences
about their own attitudes. These inferences, in turn, affect attitude certainty. In an initial series of experiments, we gave participants a counterattitudinal persuasive message, which we instructed them to counterargue. Under some conditions, when participants resisted the message they became more certain of their attitudes than they were to begin with, and their attitudes became more predictive of behavioral intentions and more resistant to a subsequent persuasive attack. It is important to note, though, that these effects were obtained only when participants perceived that they had resisted and perceived that the message they resisted was strong. When participants perceived that they had not resisted or perceived that they had resisted a weak message, attitude certainty was unchanged. Follow-up research indicated that source credibility moderates this certainty increase in much the same way. Just as people become more certain of their attitudes when they believe they resisted a strong message, so too do they become more certain when they believe they resisted a highly credible source (Tormala & Petty, 2004b). Furthermore, these effects are generally confined to high-elaboration, or high-thought, situations (Tormala & Petty, 2004a). In essence, our metacognitive point of view suggests that resisting persuasion only increases attitude certainty when people view their resistance as diagnostic of valid attitudes. When a strong message (or high-credibility source) is resisted, resistance presumably is considered diagnostic of validity. When a weak message (or low-credibility source) is resisted, resistance is less diagnostic of validity because ambiguity remains as to what would have happened in the face of a stronger message (or more credible source). 
Consistent with our metacognitive framework, we (Tormala & Petty, 2002, 2004a, 2004b) actually presented all participants with the same persuasive message but simply labeled it as strong or weak (or as coming from a high- or low-credibility source). Moreover, participants’ counterarguments were analyzed in terms of number, quality, and general qualitative focus, and there were no differences along any of these dimensions in any of the experiments. Thus, people resisted the same message in the same way and to the same degree but reached very different conclusions about their attitudes (which led to different levels of attitude certainty), depending on their perceptions of their resistance and the situation in which it occurred. People became more certain of their initial attitudes when they were more impressed by their own resistance.

The Present Research

In contrast to the assumptions of prior resistance research, then, our earlier (Tormala & Petty, 2002, 2004a, 2004b) studies revealed that resisted persuasive messages can have an important, though previously hidden, impact on people’s attitudes. Yet all of this research has focused on identifying the conditions under which attitude certainty is increased by resistance (see also McGuire, 1964). In the present research we seek to provide evidence for the opposite phenomenon. Consistent with our earlier framework, we take a metacognitive perspective and argue that when people resist persuasion, they can perceive this resistance, reflect on it, and form specifiable inferences about their attitudes that have implications for attitude certainty. We expand this framework, however, by exploring the conditions under which, and the mechanism through which, people can lose attitude certainty after resisting persuasion.

Why would certainty ever decrease when people resist persuasion? As a starting point, we suggest that after people receive and resist a persuasive message, they can think about and assess their own resistance performance. This assessment might yield a favorable appraisal when people think they did a good job resisting—for example, when they based their resistance on valid, or cogent, counterarguments. Alternatively, this assessment might yield an unfavorable appraisal when people think they did a bad job resisting—for example, when they based their resistance on invalid or specious counterarguments. Depending on people’s appraisals, or evaluations, of their own resistance performance, we would expect to observe different levels of attitude certainty following initial resistance. Of particular relevance to the present research, unfavorable appraisals of one’s own resistance might lead to doubts about an attitude. Furthermore, and especially important to the metacognitive perspective, we posit that this effect can occur in the absence of any differences in one’s actual resistance experience or performance. In other words, we submit that people can resist the same message in the same way and with the same objective success but be less certain of their attitudes when their postmessage assessment of their resistance performance leads them to think they did a bad job resisting. On the basis of what we know about attitude certainty, this effect would suggest that after initial resistance people’s attitudes can become less predictive of behavior and less likely to fend off future persuasive attacks (see Gross et al., 1995, for a review). Thus, initial resistance might sometimes mask a hidden success with respect to target attitudes.

Experiment 1

The goal of Experiment 1 was to provide an initial test of our basic hypothesis that having doubts about one’s resistance performance can undermine attitude certainty. In this study, we manipulated whether participants were able to fully communicate or articulate their arguments against a message after resisting that message. To induce the motivation to resist, we forewarned all participants at the outset of the study that they would receive a persuasive message that was personally relevant and counterattitudinal (Papageorgis, 1968; Petty & Cacioppo, 1979a, 1979b). To control the mechanism of resistance, we explicitly instructed all participants to generate counterarguments (Killeya & Johnson, 1998; Tormala & Petty, 2002). Our primary hypothesis was that all participants would resist persuasion but hold their postresistance attitudes with varying levels of certainty depending on condition. When participants were able to fully articulate their counterarguments after the message, we expected them to have relatively high attitude certainty. When participants were unable to fully articulate their counterarguments, we expected them to have lower levels of attitude certainty. Such results would suggest that postmessage assessments of one’s own resistance, including whether one was able to express the basis for that resistance, affect attitude certainty.

A secondary goal in this study was to examine one of the well-known consequences of attitude certainty. Considerable research has demonstrated that high-certainty attitudes are more predictive of behavior than low-certainty attitudes (e.g., Fazio & Zanna, 1978; Tormala & Petty, 2002). In Experiment 1, then, we assessed the implications of certainty effects for attitude–behavior correspondence. For experimental efficiency, we measured behavioral intentions, which are the single best predictor of actual behavior (e.g., Fishbein & Ajzen, 1975).

It is important to emphasize that although we expected certainty to decrease when people could not fully articulate their counterarguments, we did not expect attitude certainty to actually increase when participants fully articulated their counterarguments. Although past research has revealed that resistance sometimes increases attitude certainty, this effect has been confined to situations in which people resisted high-credibility sources (Tormala & Petty, 2004b) or messages labeled as strong (Tormala & Petty, 2002). In the present experiment, we made no reference to message strength or source credibility. Moreover, the message we used was moderate in strength, so participants’ spontaneous assessments of message strength or source credibility would presumably be too low to foster increases in certainty.

Method

Participants and Procedure

Participants were 159 undergraduates from Indiana University who received partial credit for an introductory psychology course requirement. Each participant was randomly assigned to one of three experimental conditions. All sessions were conducted on computers using MediaLab (Jarvis, 2004) research software. Participants were seated in a room containing six partitioned computer work stations. The experimenter asked participants to read the instructions on the monitors and begin the experiment.

At the outset of the experiment, participants were led to believe their university had recently begun to consider the implementation of senior comprehensive exams as a graduation requirement (see Petty & Cacioppo, 1986). Participants were told that all students currently enrolled would have to pass these exams in order to graduate and that failure to pass the exams would mean taking remedial coursework before a degree could be conferred. This policy and the proposal in favor of it were intended to be counterattitudinal for most students. As justification for the experiment, participants were told that we were helping the university’s Board of Trustees assess students’ reactions to this policy. Along these lines, participants were told they would be presented with a summary of the proposal that had been written in favor of comprehensive exams, after which they would be asked to report their attitudes and any counterarguments they could generate against the exam policy. To induce counterarguing, we gave participants the following instructions:

“The University’s Board of Trustees would like to gather all possible arguments that students can raise against the issue. Thus, we would like you to generate negative or unfavorable arguments against the exam policy after you read a summary of it. As you read the information, try to think of your counterarguments against it.”

Following this introduction, participants were presented with a persuasive message in favor of comprehensive exams. This message contained more detailed versions of the following arguments (adapted from Petty & Cacioppo, 1986): Grades would improve if the exam policy were adopted, implementing the exams would allow the university to take part in a national trend, the average starting salary of graduates would increase, and the exam policy would allow students to compare their scores with students at other universities. A mixture of strong and weak arguments was included so the message would be moderately compelling overall, yet remain open to counterargument. After the message, participants completed the counterargument-listing task described next and responded to the dependent measures.


Counterargument Manipulation

Participants were randomly assigned to one of three counterargument conditions: the 10-s condition, the 60-s condition, or the control condition. This manipulation, adapted from time-pressure manipulations used in past research (e.g., Kruglanski & Freund, 1983), was designed to affect participants’ ability to fully articulate their arguments against the comprehensive exam policy. In the 10-s and 60-s conditions, instructions were as follows:

“As you were informed at the outset of this session, the Board of Trustees is interested in collecting the arguments students might raise against the comprehensive exam policy. We would now like to receive your thoughts. On the next screen appears the first of 4 boxes you can use to list your arguments against the senior comprehensive exam policy. Please list 4 different arguments against the exams, but enter only one argument per box.”

Following these instructions, participants were told that the computer program running the experiment would automatically move them to the next screen after a preset time period for each counterargument. The amount of time allotted for each counterargument was ostensibly based on past studies conducted in our laboratory. Participants were led to believe most students could easily finish in the time provided. The purpose of this information was to minimize external attributions for not completing the counterargument task. Following this information, participants received either 60 s or 10 s to list each counterargument. Pretesting indicated that this would be sufficient and insufficient for most participants, respectively.

A third group of participants was randomly assigned to a control condition, in which they learned about comprehensive exams at the outset of the experiment but did not receive a persuasive message or any instructions to generate counterarguments. Instead, they read an irrelevant article that was similar in appearance and length to the exam message, after which they proceeded directly to the dependent measures. This condition provided a baseline for determining the direction of any attitude certainty effects as well as a test of whether resistance occurred in the first place. Resistance was indicated by attitudes in the message and counterargument conditions that did not differ from attitudes in the control condition.

Dependent Measures

Attitudes. Immediately following the persuasive message and counterarguing procedure (or immediately following the irrelevant article in the control condition), participants reported their attitudes toward the comprehensive exam policy on a series of semantic differential scales ranging from 1 to 9 with the following anchors: bad–good, negative–positive, unfavorable–favorable, against–in favor, harmful–beneficial, and foolish–wise. Higher numbers reflected more favorable attitudes toward comprehensive exams. Internal consistency was high (α = .93), so responses were averaged to form a composite attitude index.

Attitude certainty. After reporting attitudes, participants completed the attitude certainty measure. One global item (adapted from past research; Fazio & Zanna, 1978) asked participants how certain they were of their attitudes toward comprehensive exams. Responses were provided on a scale ranging from 1 to 9 and anchored at not certain at all and extremely certain.

Behavioral intentions. After the certainty measure, we assessed behavioral intentions. We told participants that in the future we would be recruiting people to write letters to students to inform them of the benefits of the exam policy. Participants were then asked to indicate how many letters they would be willing to write to assist in this endeavor (this measure was adapted from Tormala & Petty, 2004b). Responses were provided on a scale ranging from 1 to 9, with 1 labeled 0 letters, 2 labeled 1–5 letters, and so on, up to 9, which was labeled 36–40 letters.

Self-reported elaboration. By manipulating counterargument time after exposure to the message, we intended to control for elaboration during the message. Nevertheless, it was possible that participants would (retrospectively) feel that their processing of the message was diminished in the 10-s condition. If true, this effect might account for differences in attitude certainty without requiring any metacognitive assessment of one’s resistance performance. To assess perceived elaboration, we asked participants in the 10-s and 60-s conditions to report how deeply they thought about the proposal, how much effort they put into reading the proposal, and how personally involved they felt with the exam issue. Participants responded on scales ranging from 1 to 9, with higher numbers indicating more elaboration. Responses were highly reliable (α = .82), so we averaged them to form a composite index.
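The composite indices described above rest on an internal-consistency check followed by averaging. The sketch below illustrates one standard way to compute Cronbach's alpha and a per-participant composite; it is not the authors' analysis script, and the function names (`cronbach_alpha`, `composite_index`) and the example ratings are hypothetical.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of per-participant item-score lists.

    Every inner list holds one participant's ratings on the same k items
    (e.g., six 1-9 semantic differential scales). Hypothetical helper.
    """
    k = len(rows[0])
    item_columns = list(zip(*rows))                  # one tuple per item
    item_variances = [variance(col) for col in item_columns]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def composite_index(rows):
    """Average each participant's item ratings into a single score."""
    return [mean(row) for row in rows]


# Made-up ratings for three participants on three items.
ratings = [[1, 1, 1], [5, 5, 5], [9, 9, 9]]
alpha = cronbach_alpha(ratings)
scores = composite_index(ratings)
```

With real data, a high alpha (such as the reported .93 and .82) is what justifies collapsing the items into one index.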

Results

Attitudes

We began by submitting the attitude data to analysis to determine whether participants resisted persuasion equivalently across conditions. The attitude data were submitted to a one-way analysis of variance (ANOVA) with counterargument condition as the independent variable. As illustrated in Table 1, there were no differences in attitudes across conditions (F < 1).

Attitude Certainty

We next submitted the attitude certainty data to the same one-way ANOVA. In contrast to the attitude data, there was a significant effect of counterargument condition on attitude certainty, F(2, 156) = 6.68, p < .002. As displayed in Table 1, participants were less certain of their attitudes in the 10-s condition than in the 60-s or control conditions, F(1, 156) = 13.24, p < .001, which did not differ from each other (F < 1).
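For readers who want to see where an F ratio and its degrees of freedom come from, the following is a minimal sketch of a one-way between-subjects ANOVA. The function name `one_way_anova` and the example data are hypothetical, and the equal cell size of 53 is an assumption (159 participants split over three conditions); the article does not report cell sizes.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of score lists.

    Each inner list holds the scores of one condition. Hypothetical helper.
    """
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variability of condition means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variability of individual scores around their own condition mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Three conditions with made-up scores.
f_ratio, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

With three conditions and 159 participants, the degrees of freedom are 2 and 156, matching the F(2, 156) reported for attitude certainty.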

Behavioral Intentions

We then examined the behavioral intention data. To begin with, there were no differences in letter-writing intentions across the control (M = 2.09, SD = 1.77), 60-s condition (M = 1.92, SD = 1.69), and 10-s condition (M = 1.70, SD = 0.97), F < 1. As shown in Table 1, however, there were differences in attitude–behavioral intention correspondence across conditions. Attitudes significantly predicted letter-writing intentions in the control condition (r = .35, p < .02) and the 60-s condition (r = .37, p < .01), in which certainty was relatively high, but not in the 10-s condition (r = .05, p = .72), in which certainty was relatively low. Following the certainty pattern, the correlation was lower in the 10-s condition than in the 60-s and control conditions (z = 1.94, p = .05), which did not differ from each other (z = 0.12, p > .90).

Table 1
Attitudes, Attitude Certainty, and Attitude–Behavioral Intention Correspondence as a Function of Counterargument Condition in Experiment 1

                                                    Counterargument condition
Dependent measure                                   Control    10 s     60 s
Attitudes
  M                                                 4.87a      4.81a    4.59a
  SD                                                1.84       1.39     1.61
Attitude certainty
  M                                                 6.08a      5.09b    6.19a
  SD                                                1.75       1.71     1.62
Attitude–behavioral intention correspondence
  r                                                 .35a       .05b     .37a

Note. All scales ranged from 1 to 9. Subscripts should be interpreted within rows only; means with the same subscript do not differ from each other.
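The article compares correlations across conditions with z tests but does not say how they were computed. A common approach for independent samples is Fisher's r-to-z transformation, sketched below. The function names are hypothetical, and the cell size of 53 per condition is an assumption rather than a reported value, so the sketch is only expected to land near the published statistics.

```python
import math


def compare_correlations(r1, n1, r2, n2):
    """z statistic for the difference between two independent correlations.

    Uses Fisher's transformation atanh(r), whose sampling variance is
    approximately 1 / (n - 3). Hypothetical helper, not the authors' code.
    """
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (math.atanh(r1) - math.atanh(r2)) / standard_error


def two_sided_p(z):
    """Two-tailed p value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


# 60-s versus control condition, assuming 53 participants in each cell.
z_60_vs_control = compare_correlations(0.37, 53, 0.35, 53)
p_60_vs_control = two_sided_p(z_60_vs_control)
```

Under the equal-cell assumption, the 60-s versus control comparison comes out close to the reported z = 0.12 with p above .90. The contrast of the 10-s condition against the other two pools two correlations and depends more heavily on the actual cell sizes.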

Self-Reported Elaboration

Finally, we submitted the perceived elaboration index to analysis. Perceived elaboration was equivalent in the 10-s (M = 5.87, SD = 1.92) and 60-s (M = 6.01, SD = 1.58) conditions (F < 1). As intended, though, the overall level of elaboration (M = 5.94, SD = 1.75) was significantly higher than the midpoint (5) of the elaboration index, t(105) = 5.55, p < .001.
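As a rough check on the midpoint comparison, the one-sample t statistic can be recomputed from the reported summary values. The sample size n = 106 is inferred from the reported 105 degrees of freedom and is an assumption, as is the use of the rounded mean and standard deviation.

```latex
\[
t = \frac{M - \mu_0}{SD / \sqrt{n}}
  = \frac{5.94 - 5}{1.75 / \sqrt{106}}
  \approx \frac{0.94}{0.170}
  \approx 5.53,
\qquad df = n - 1 = 105 .
\]
```

The result is close to the reported t(105) = 5.55. The small gap is the size one would expect from rounding M and SD to two decimals.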

Discussion

Experiment 1 provided initial evidence for the certainty reduction hypothesis. To begin with, participants resisted persuasion. In neither of the message conditions were attitudes any more favorable than in the control condition, in which no persuasive message was presented. This finding is telling given that both the persuasive message and the attitude measure in this experiment have been used to show evidence of successful persuasion in past research (e.g., Petty, Briñol, & Tormala, 2002). Therefore, null effects on attitudes do not likely reflect a message or measure incapable of showing attitude change.

Most germane to the present concerns, people became less certain of their attitudes after resisting persuasion if they were unable to fully articulate their counterarguments (that is, their reasons for resisting). When people were able to fully articulate their counterarguments, they maintained a relatively high degree of attitude certainty. As in past research on attitude certainty, this effect had implications for attitude–behavioral intention correspondence. When certainty was lowered (10-s condition), attitudes became less predictive of behavioral intentions. When certainty was maintained at a higher level (60-s condition), however, attitudes were as good in predicting behavioral intentions as they were in the control condition. Overall, this pattern of results suggests that if people have doubts about their resistance performance after processing a message, they may doubt the attitude they formed in response to that message and, thus, be less reliant on that attitude in determining future behavior.

It is interesting to note that past research has shown that curtailing counterarguments during message processing can foster persuasion (Petty, Wells, & Brock, 1976). We did not expect to find differences in actual persuasion in the present study, because our participants were able to process and think of counterarguments freely during receipt of the message. That is, participants were instructed to generate counterarguments as they processed the message and, presumably, had little trouble doing so, resulting in resistance across conditions. We manipulated participants’ ability to express their counterarguments after the message, which presumably affected postresistance assessments of their performance, leaving attitudes intact but affecting the certainty with which those attitudes were held.


Finally, it is worth noting a caveat to the findings of Experiment 1 involving participants’ counterarguments. We argue that even when people generate the same profile of counterarguments, they can interpret their resistance very differently depending on other situational factors. Because of a programming error, counterarguments were not saved by the computer in the 10-s condition of the present experiment. Consequently, we were unable to analyze the counterargument data. Nevertheless, given the nature of the 10- versus 60-s manipulation, we assume there were substantial differences in the actual content of counterarguments listed. In particular, it seems unlikely that participants listed counterarguments of the same quality across conditions. They may have generated the same quality of counterarguments during the message, but likely did not list equivalent arguments in the thought-listing task. To provide stronger evidence for the “pure” metacognitive perspective, which suggests that the certainty effects we observed can stem from subjective perceptions of counterarguments in the absence of any objective differences, we conducted a second experiment that did not constrain the counterargument listing procedure in any way.

Experiment 2

In Experiment 2, we sought to manipulate people’s perceptions of their counterarguments without varying the actual nature of the counterarguments listed. In short, we allowed participants to list as many counterarguments as they wanted and gave all participants unlimited time to do so. After the counterargument procedure, we gave participants false feedback that their counterarguments were either strong or weak. Otherwise, this experiment was essentially the same as the first, with a few minor exceptions. We expected that when participants resisted using what they were led to believe were weak counterarguments, they would show evidence of decreased attitude certainty. When participants resisted using what they were led to believe were strong counterarguments, we expected them to maintain a relatively high degree of certainty.

Method

Participants and Procedure

Thirty-five undergraduates from Ohio State University participated in partial fulfillment of a course requirement. This experiment was essentially the same as Experiment 1, conducted on computers and involving the comprehensive exam policy, but there were a few exceptions. First, there were no constraints on participants’ counterarguments in this experiment. All participants were instructed to list as many counterarguments as they could, and they were given unlimited time to do so. Thus, because of random assignment, we expected participants to have equivalent counterarguments across conditions. Second, rather than manipulating the amount of time participants had to list counterarguments, we gave participants false feedback with respect to the quality of their counterarguments. Participants received this feedback immediately after listing their counterarguments but before completing any dependent measures. Third, we removed the control condition from the experiment and instead used a repeated measures design. Early on in the session, after participants had been introduced to the comprehensive exam policy but before they received a message, they reported their attitudes and attitude certainty. Later, after reading the message, listing counterarguments, and receiving false feedback, participants completed the same measures again, along with behavioral intentions.


False Feedback Manipulation

Immediately after listing counterarguments, participants were randomly assigned to receive feedback that their counterarguments were either strong and convincing or weak and unconvincing. Preceding this feedback was an instruction screen explaining that we had recently collected counterarguments in response to the exam policy from a representative sample of approximately 900 other students. Participants then read that the computers running the current experiment were programmed to analyze counterarguments by comparing them with other counterarguments collected in our lab and that these computers could determine a number of things about participants’ counterarguments as a result of this analysis. Participants were instructed that when they clicked continue the computer would analyze the counterarguments they had entered and provide a summary of the results of this analysis. When participants clicked continue, a message reading “Please wait . . . The computer is processing your counterarguments” appeared on the screen for 10 s. Then, the following passage appeared at the top of the screen:

Below, you are presented with your counterargument index. This index reflects the computer’s analysis of the counterarguments you have generated against comprehensive exams. This index can range from 1–10. If your index is greater than 5, that indicates that your counterarguments were relatively strong. If your index is 5 or less, that indicates that your counterarguments were not strong. You will only see this number once.

At the bottom of the same screen, participants received their counterargument index. In the strong counterargument condition, participants were told that their counterargument index was 9, which indicated that they had generated strong and convincing counterarguments. In the weak counterargument condition, participants were told that their counterargument index was 2, which indicated that their counterarguments were weak and unconvincing.
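As an illustration only, the interpretation rule that participants read on the feedback screen can be written as a small Python function. The function name and the returned strings are hypothetical; the thresholds (a 1–10 range, with values greater than 5 described as relatively strong) and the two assigned index values (9 and 2) come from the description above. The index was false feedback and was not computed from participants' responses.

```python
def interpret_counterargument_index(index: int) -> str:
    """Return the interpretation participants were given for an index value.

    Hypothetical helper: mirrors the on-screen rule, not any real scoring.
    """
    if not 1 <= index <= 10:
        raise ValueError("the index was described as ranging from 1 to 10")
    # Greater than 5 was described as relatively strong; 5 or less as not strong.
    return "relatively strong" if index > 5 else "not strong"


# Index values assigned in the two feedback conditions.
STRONG_CONDITION_INDEX = 9
WEAK_CONDITION_INDEX = 2
```

Under this rule the boundary value of 5 falls on the "not strong" side, which is why the two assigned values sit well clear of the cutoff.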

Dependent Measures

Attitudes. As described above, participants reported their attitude toward comprehensive exams twice: once before the message and once after reading the message and listing counterarguments against it. Because of the repeated assessment of attitudes, we streamlined these measures by having participants complete just a single semantic differential scale at each time point. This scale ranged from 1 to 9, anchored at unfavorable and favorable, respectively.

Attitude certainty. After reporting attitudes each time, participants completed the measure of attitude certainty from Experiment 1. Responses were provided on a scale ranging from 1 to 9, with not certain at all and extremely certain as the anchors.

Behavioral intentions. At the end of the experiment, we included a measure of behavioral intentions similar to the measure used in Experiment 1. We told participants that in the future we would be seeking students to make phone calls to other undergraduates telling them about the benefits of the exam policy. Participants were asked how much time they would be willing to devote to this task. Responses were given on a scale ranging from 1 to 9, with 1 labeled 0 time, 2 labeled 1–5 minutes, and so on, up to 9, which was labeled 36–40 minutes.
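The behavioral intentions scale can be made concrete with a short Python sketch. Only the labels for points 1, 2, and 9 are stated in the text; the sketch assumes that the intermediate points advance in equal 5-minute steps (3 = 6–10 minutes, and so on), which is consistent with the stated endpoints but is an assumption. The function name is hypothetical.

```python
def intention_label(point: int) -> str:
    """Map a 1-9 scale point to its time label.

    Assumption: points 3-8 follow the same 5-minute steps implied by
    the stated labels for points 2 (1–5 minutes) and 9 (36–40 minutes).
    """
    if not 1 <= point <= 9:
        raise ValueError("scale points range from 1 to 9")
    if point == 1:
        return "0 time"
    low = 5 * (point - 2) + 1   # first minute covered by this point
    high = 5 * (point - 1)      # last minute covered by this point
    return f"{low}–{high} minutes"
```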

Results

Counterarguments

One objective in Experiment 2 was to establish the equivalence of counterarguments. We assessed both the number and quality of counterarguments listed. As expected, given that the counterargument task preceded the manipulation, participants generated the same number of counterarguments in the strong (M = 2.65, SD = 1.50) and weak (M = 2.50, SD = 1.62) feedback conditions, t(33) = −0.28, p > .78. To assess counterargument quality, two judges, blind to condition and hypothesis, rated each counterargument on a 1 to 9 scale, anchored at very weak and very strong. We averaged the ratings for each participant to form two counterargument indices: one for the first judge and one for the second judge. The judges’ ratings were highly correlated (r = .76, p < .001), so we averaged them to form an overall quality index. Counterarguments were rated equally in the strong (M = 3.86, SD = 1.45) and weak (M = 4.28, SD = 1.70) feedback conditions, t(30) = 0.74, p > .46.¹
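For readers who want to see how a t statistic of this kind follows from the reported means and standard deviations, here is a minimal Python sketch of an independent-samples t test with pooled variance. The cell sizes are not reported in the text: the split of the 35 participants into groups of 17 (strong) and 18 (weak) is an assumption made only for illustration, and the function name is hypothetical.

```python
import math


def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t (pooled variance) from summary statistics.

    Returns (t, df), where t is computed for group 1 minus group 2.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


# Number of counterarguments: weak minus strong feedback condition.
# ASSUMPTION: n = 18 (weak) and n = 17 (strong); only the total N = 35 is reported.
t_number, df_number = pooled_t(2.50, 1.62, 18, 2.65, 1.50, 17)
```

With these assumed cell sizes the sketch gives 33 degrees of freedom and a t value that rounds to −0.28, in line with the reported statistic. A different split of the 35 participants would change the result slightly.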

Attitudes

Attitudes were submitted to a 2 × 2 mixed ANOVA, with time of measurement (Time 1 or Time 2) and counterargument feedback (strong or weak) as the within- and between-participants variables, respectively. As revealed in the top panel of Figure 1, this analysis failed to produce any effects (all Fs < 1).

Attitude Certainty

We submitted the certainty data to the same mixed ANOVA. Although the main effect for counterargument feedback was not significant (F < 1), there was a significant main effect for time of measurement, F(1, 33) = 11.90, p < .01. Participants were less certain of their attitudes after (M = 5.91, SD = 2.11) rather than before (M = 6.89, SD = 2.10) the message. However, this main effect was qualified by a significant interaction between counterargument feedback and time of measurement, F(1, 33) = 4.72, p < .04. As illustrated in the bottom panel of Figure 1, the decrease in attitude certainty was confined to participants who were led to believe their counterarguments were weak, F(1, 33) = 16.27, p < .001. When participants were led to believe their counterarguments were strong, there was no change in attitude certainty (F < 1).

Behavioral Intentions

We submitted behavioral intentions to a hierarchical regression analysis with Time 2 attitudes and counterargument feedback as predictors in the first step and their interaction in a second step. Overall, there was a positive correlation between attitudes and behavioral intentions, β = .32, t(32) = 1.96, p < .06, but no effect of counterargument feedback, β = .24, t(32) = 1.48, p > .14. Most important, the Attitude × Feedback interaction was marginally significant, β = .66, t(31) = 1.81, p < .08. As predicted, participants’ attitudes were significant predictors of their willingness to help promote the exam policy in the strong counterargument condition (r = .51, p < .04) but not in the weak counterargument condition (r = .07, p > .78).
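The follow-up to the interaction, in which the attitude–intention correlation is computed separately within each feedback condition, can be sketched in Python as follows. The data rows are invented placeholders, not the study's data, and the function names are hypothetical. The sketch covers only the within-condition correlations, not the hierarchical regression itself.

```python
import math
from collections import defaultdict


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_by_condition(rows):
    """Correlate attitudes with intentions separately in each condition.

    rows: iterable of (condition, attitude, intention) tuples.
    """
    groups = defaultdict(lambda: ([], []))
    for condition, attitude, intention in rows:
        groups[condition][0].append(attitude)
        groups[condition][1].append(intention)
    return {c: pearson_r(a, i) for c, (a, i) in groups.items()}


# HYPOTHETICAL data: (feedback condition, Time 2 attitude, behavioral intention).
example_rows = [
    ("strong", 2, 1), ("strong", 4, 3), ("strong", 6, 4), ("strong", 8, 7),
    ("weak", 2, 4), ("weak", 4, 2), ("weak", 6, 5), ("weak", 8, 3),
]
correlations = r_by_condition(example_rows)
```

In the placeholder data, attitudes track intentions closely in the "strong" rows and not at all in the "weak" rows, which mimics the direction of the reported pattern without reproducing its values.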

Discussion

Experiment 2 revealed that people can become less certain of their attitudes following resistance to persuasion when they are told they have resisted using specious counterarguments. Moreover, this effect has implications for the correspondence between attitudes and behavioral intentions; the less certain people become of their attitudes, the less these attitudes predict behavioral intentions. These results extend the findings of the first experiment in important ways. First, they highlight another metacognitive perception that can affect attitude certainty when people think about their own resistance. Apparently, being led to believe that one based one’s resistance on weak counterarguments casts doubt on the attitude that has just been defended. The results of Experiment 2 are also important in demonstrating that these effects can occur in the absence of any differences in people’s actual resistance. People resisted the same message in the same way and to the same degree but reached different conclusions about their attitudes depending on manipulated appraisals of their resistance performance.

One question that might be raised with respect to Experiment 2 is whether participants actually resisted persuasion or simply felt pressure to report the same attitude from Time 1 to Time 2 to avoid appearing inconsistent. This pressure might have been especially acute given that only one (and the same) attitude item was used each time. Although it is impossible to know for sure why participants did not change, we believe this lack of change reflected true attitudinal resistance. After all, the message was personally relevant and counterattitudinal, all participants were forewarned of it, and all participants were instructed to generate counterarguments. As reviewed earlier, these conditions are well documented as fostering resistance to persuasion. Furthermore, we probed for suspicion at the end of the experiment, and not a single participant expressed any doubts about our cover story. In other words, no one indicated that we might be studying resistance or that we wanted them to resist. Finally, if strong consistency pressures were operating, participants should have reported the same certainty at Time 1 and Time 2, which they did not. In any case, to remove concern about potential consistency pressures, we returned to a between-participants design (using a control condition) in the next experiment.

Figure 1. Attitudes (top panel) and attitude certainty (bottom panel) as a function of counterargument feedback and time of measurement in Experiment 2.

¹ There were fewer degrees of freedom in the analysis of counterargument quality because 3 participants listed no counterarguments. These participants were coded as listing zero arguments in the number analysis but could not be included in the quality analysis.

Experiment 3

Whereas past research has focused on increased attitude certainty following resistance to persuasion (Tormala & Petty, 2002, 2004a, 2004b), Experiments 1 and 2 revealed that resisted persuasive attacks can sometimes have a hidden success when attitude certainty is undermined. In Experiment 3 we attempted to reconcile the present findings with our own past research by producing both increases and decreases in attitude certainty following resistance to persuasion. To do so, we manipulated perceived counterargument strength and the perceived expertise of the source of a message. Consistent with Tormala and Petty (2004b), we expected that attitude certainty would be particularly likely to increase when people were led to believe they generated strong counterarguments against an expert (rather than nonexpert). Extending this finding, we predicted that attitude certainty would be particularly likely to decrease when people were led to believe they generated weak counterarguments against a nonexpert (rather than expert). In short, we predicted main effects for both perceived counterargument strength and source credibility on attitude certainty, indicating the highest level of certainty when people were given the perception that they generated strong arguments against an expert and the lowest level of certainty when people were given the perception that they generated weak arguments against a nonexpert.

Method

Participants and Design

Eighty-nine Indiana University undergraduates participated in partial fulfillment of a course requirement. Participants were randomly assigned to conditions in a 2 (counterargument feedback: strong or weak) × 2 (source credibility: high or low) + 1 (external control condition) between-participants design.

Procedure

This experiment was very similar to Experiment 2, with two important modifications. First, we manipulated source credibility as in Tormala and Petty (2004b). Second, we removed Time 1 measures and reinserted a control condition for directional comparisons. Otherwise, this experiment was essentially the same. All sessions were conducted on computers, and participants read the same proposal in favor of comprehensive exams, after which they generated as many counterarguments as they could. After listing counterarguments, participants received false feedback about the strength of their arguments and completed dependent measures.

Independent Variables

Counterargument feedback. Participants were randomly assigned to receive false feedback that the counterarguments they generated against the comprehensive exam message were strong or weak. This manipulation was identical to that used in Experiment 2.

Source credibility. The source credibility manipulation was presented on a screen that immediately preceded the persuasive message, and it appeared again at the top of the screen containing the message. In the high-credibility condition, participants were led to believe the proposal was written by “The Faculty Committee on Academic Affairs at Indiana University, which is made up of six highly regarded professors from Educational Science and other related fields.” In the low-credibility condition, participants were led to believe the proposal was written by “Cindy Ross, a part-time Instructor at Southern Appalachian State Community Technical College.” Past research using the same manipulation has shown these sources to be high and low in perceived expertise, respectively (Tormala & Petty, 2004b).

Control condition. A subset of participants (n = 19) was randomly assigned to a control condition, included to provide a baseline for the attitude and certainty data. In this condition, participants were given the same basic introduction to the experiment and the same initial information about comprehensive exams. Following this information, control participants read a neutral article that was similar in appearance and length to the comprehensive exam message but completely unrelated to the exam topic, after which they proceeded to the dependent measures. Control participants were not asked to list any counterarguments.

Dependent Measures

Attitudes. Following the persuasive message and counterargument procedure (or directly following the irrelevant article in the control condition), participants rated comprehensive exams on a series of semantic differential scales ranging from 1 to 9 with the following anchors: bad–good, negative–positive, unfavorable–favorable, unpleasant–pleasant, harmful–beneficial, and foolish–wise. Higher numbers indicated more favorable evaluations of the exam policy. Responses were highly consistent (α = .94), so we averaged them to form a composite attitude index.

Attitude certainty. Attitude certainty was assessed using the same global item as in the first two experiments.

Results

Counterarguments

Because control participants did not list counterarguments, the counterargument data were submitted to 2 (counterargument feedback) × 2 (source credibility) ANOVAs. We began with the number of counterarguments. There was a tendency for participants to generate more counterarguments against the high-credibility (M = 3.20, SD = 2.11) rather than low-credibility (M = 2.63, SD = 1.09) source (see also Bohner, Ruder, & Erb, 2002; Hass, 1981), but this difference was not significant, F(1, 66) = 2.18, p > .14. No other effects even approached significance (Fs < 1.19, ps > .28). Two judges rated the quality of participants’ counterarguments using the same approach as in Experiment 2. The judges’ ratings were significantly correlated (r = .86, p < .001), so we averaged them. This index revealed no significant effects (all Fs < 1).

Attitudes

Given the design of this experiment (i.e., 2 × 2 + 1), and the fact that we predicted no differences in attitudes across conditions, we submitted the attitude data to a one-way ANOVA, treating all five experimental conditions as different levels of the same factor. There were no differences in attitudes across conditions, F(4, 84) = 0.52, p > .72. Furthermore, individual post hoc comparisons revealed that in none of the message conditions were attitudes any more favorable (4.10 < Ms < 4.67) than they were in the control condition (M = 3.98), ps > .78.

Attitude Certainty

We analyzed the certainty data (Figure 2) in a two-pronged fashion. First, we conducted a 2 × 2 ANOVA with counterargument feedback (strong or weak) and source credibility (high or low) as the independent variables. This analysis revealed a significant main effect for counterargument feedback, F(1, 66) = 4.98, p < .03, such that participants were more certain of their attitudes when they were told they resisted using strong (M = 6.50, SD = 1.78) rather than weak (M = 5.58, SD = 1.61) counterarguments. There was also a main effect for source credibility, F(1, 66) = 13.56, p < .001. Participants were more certain of their attitudes after resisting a source who was high (M = 6.74, SD = 1.42) rather than low (M = 5.31, SD = 1.76) in expertise. There was no interaction between these variables (F < 1).

To determine the direction of these effects, we reinserted the control condition and analyzed the data separately for the perceived strong and weak counterargument participants. Selecting for the strong feedback condition (plus control), there was a significant effect of source credibility on attitude certainty, F(2, 50) = 3.45, p < .05. Certainty was greater in the high-credibility condition than in the low-credibility or control conditions, F(1, 50) = 6.80, p < .02, which did not differ from each other (F < 1). Selecting for the weak feedback condition (plus control), there was also a significant effect of source credibility on attitude certainty, F(2, 52) = 3.62, p < .04. Certainty was lower in the low-credibility condition than in the high-credibility or control conditions, F(1, 52) = 7.02, p < .02, which did not differ from each other (F < 1).

Discussion

The results of Experiment 3 extend the findings of the first two experiments by identifying conditions under which resistance can be followed by increases and decreases in attitude certainty. Consistent with our earlier findings (Tormala & Petty, 2002, 2004b), resistance appears to affect attitude certainty primarily when resistance is diagnostic of attitude validity. When people are told they have done a good job resisting (i.e., they have made strong counterarguments), they only become more certain of their attitudes when they perceive that they have resisted an expert. Indeed, if one handily resists an attack from an expert, one can assume that his or her attitude was already correct, or valid. This assumption cannot be made as confidently about an attitude that resists an attack from a nonexpert, because it is possible that an expert might have been more persuasive or presented better arguments. When people are told they have done a bad job resisting (i.e., they have made weak counterarguments), they only become less certain of their attitudes when they perceive that they have resisted a nonexpert. In this case, performing poorly against a nonexpert is particularly diagnostic regarding the attitude’s invalidity. Indeed, one who could think only of weak arguments against a nonexpert might have been persuaded by an expert. If a person performs poorly against an expert, on the other hand, he or she does not stand to lose certainty because a better performance could be assumed against any other source. In essence, we suspect that the certainty effect stems from postmessage perceptions of how successful one’s resistance has been. These perceptions, in turn, appear to be affected by counterargument appraisals and source information.

Experiment 4

Experiment 4 was designed to extend the findings of Experiment 3 in two ways. The primary objective was to provide mediational evidence for the metacognitive processes we have postulated to be responsible for the certainty effects. We predicted that when people were led to believe they had generated strong or weak arguments against an expert versus a nonexpert, they would perceive that their counterarguments had been differentially successful, which would then determine attitude certainty. A secondary objective of Experiment 4 was to explore the implications of initial resistance for attitude change in response to a second message. Along with attitude–behavior consistency, differential resistance to change is a well-documented feature of high versus low certainty attitudes (e.g., Bassili, 1996; Tormala & Petty, 2002; Wu & Shaffer, 1987). Applying past research to the current framework, we expected people to be most susceptible to a follow-up persuasive attack when they believed they had generated weak counterarguments against a nonexpert. Alongside reduced attitude certainty, such an effect would provide convergent evidence for the notion that attitudes can be weakened when people have doubts about their resistance performance.

Figure 2. Attitude certainty as a function of counterargument feedback and source credibility in Experiment 3.

Method

Participants and Procedure

Sixty-four Indiana University undergraduates participated in partial fulfillment of a course requirement. This experiment was a replication of Experiment 3, with a few exceptions. First, because this experiment focused on mediation of attitude certainty by participants’ perceptions of their counterarguments, we did not include a control condition in the design. Second, to assess participants’ perceptions of their own counterarguments, we included several new measures. Finally, to assess subsequent persuasion following initial resistance, we presented participants with a second persuasive message at the end of the experiment.

Participants received the initial persuasive message, generated counterarguments, and completed dependent measures (e.g., attitudes and attitude certainty) as in Experiment 3. Following these measures, participants engaged in a filler task. This task involved a word association procedure, in which 15 words were presented one at a time on the computer screen. Participants were instructed to type the first word that came to mind for each word displayed. The specific words presented were completely unrelated to the experiment and were neutral in valence (e.g., gravity, lamp). Following this task, participants read that we would now present them with additional information about comprehensive exams from a recent report by the Educational Testing Service. We indicated a new source for this information to dispel any inkling participants had that the second message came from the same source as the first message. Participants then read three new, strong arguments in favor of comprehensive exams (adapted from Petty & Cacioppo, 1986; e.g., implementing comprehensive exams would increase the quality of teaching), after which they again reported their attitudes toward the comprehensive exam policy. There were no counterargument instructions or thought-listing measures associated with the second message.

Design and Independent Variables

Participants were randomly assigned to conditions in a 2 (counterargument feedback: strong or weak) × 2 (source credibility: high or low) between-participants design. The manipulations were identical to the manipulations used in Experiment 3.

Dependent Measures

Time 1 attitudes: Initial resistance. Following the initial persuasive message and counterarguing procedure, we assessed attitudes using the same six items as in Experiment 3. Responses were highly consistent (α = .95), so we averaged them to form a composite index.

Attitude certainty. Following the attitude measure, we assessed attitude certainty using three items: How certain are you of your attitude toward senior comprehensive exams? How convinced are you of your opinion on senior comprehensive exams? How much confidence do you have in your attitude toward senior comprehensive exams? Responses were given on scales ranging from 1 to 9 with the following anchors: not certain at all–extremely certain, not convinced at all–extremely convinced, and no confidence at all–very high confidence. Multiple scales were used in this experiment to create a more reliable index for the mediational analysis. Responses were averaged to form a composite index (α = .93).

Perceived strength of counterarguments. To assess counterargument perceptions, participants were asked to report how strong or weak they felt their counterarguments were, how effective or ineffective they felt their counterarguments were, how successful or unsuccessful they felt they were in counterarguing the message, and how satisfied or unsatisfied they were with their counterarguments (items adapted from Tormala & Petty, 2002). Participants responded on scales ranging from 1 to 9, with the following anchors: very weak–very strong, very ineffective–very effective, very unsuccessful–very successful, and very unsatisfied–very satisfied. Each item was scored such that higher numbers reflected more favorable assessments. Responses were averaged to form a composite index (α = .93).

Attitude change. Following the second persuasive message, participants again reported their attitudes toward comprehensive exams, this time on a single scale ranging from 1 to 9 and anchored at bad and good, respectively. To create an index of attitude change in response to the second message, we subtracted Time 1 attitudes from Time 2 attitudes using the single shared item between these assessments (i.e., the bad–good semantic differential). Higher attitude change scores reflected more persuasion.
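The index construction described above (checking internal consistency with Cronbach's alpha, averaging items into a composite, and subtracting Time 1 from Time 2 ratings) can be sketched in Python as follows. This is a generic illustration using the standard alpha formula and sample variances, not the authors' analysis code, and all function names are hypothetical.

```python
from statistics import mean, variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-participant item-response rows."""
    k = len(responses[0])  # number of items
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def composite(responses):
    """Average each participant's item responses into a single index."""
    return [mean(row) for row in responses]


def attitude_change(time1, time2):
    """Time 2 minus Time 1 on the shared item; higher means more persuasion."""
    return [t2 - t1 for t1, t2 in zip(time1, time2)]
```

As a usage note, `cronbach_alpha` expects one row per participant and one column per item (for example, three columns for the three certainty items), with every item already scored in the same direction.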

Results

Initial Attitudes and Attitude Certainty

We began by submitting attitudes following the first message to a 2 × 2 ANOVA with counterargument feedback (strong or weak) and source credibility (high or low) as the independent variables. As in the previous studies, there were no differences in attitudes across conditions (Fs < 1). On the attitude certainty index, however, a different pattern emerged. As illustrated in Table 2, there was a main effect for counterargument feedback, F(1, 60) = 7.02, p = .01, such that attitude certainty was higher in the strong (M = 7.00, SD = 1.46) than in the weak (M = 5.84, SD = 1.66) feedback condition. There was also a main effect for source credibility, F(1, 60) = 4.99, p < .03; attitude certainty was greater in the high- (M = 6.89, SD = 1.47) than in the low- (M = 5.85, SD = 1.70) credibility condition. There was no interaction between these variables (F < 1).

Perceived Strength of Counterarguments

We submitted the perceived counterargument strength index to the same 2 × 2 ANOVA and found the predicted main effects for both source credibility, F(1, 60) = 5.34, p < .03, and counterargument feedback, F(1, 60) = 46.93, p < .001. As illustrated in Table 2, participants rated their own counterarguments as stronger in the strong (M = 7.42, SD = 1.22) than in the weak (M = 4.62, SD = 1.86) feedback condition and stronger in the high- (M = 6.58, SD = 1.82) than in the low- (M = 5.23, SD = 2.21) credibility condition. There was no interaction (F < 1).

To test whether counterargument perceptions mediated the attitude certainty effects, we conducted a 2 × 2 analysis of covariance (ANCOVA) on attitude certainty, treating perceived counterargument strength as a covariate. Controlling for perceived counterargument strength, neither source credibility, F(1, 59) = 1.96, p > .16, nor counterargument feedback (F < 1) had a significant effect on attitude certainty. The interaction was also nonsignificant (F < 1). Consistent with the mediation hypothesis, however, perceived counterargument strength was a significant predictor in this analysis, F(1, 59) = 9.75, p < .01.²

Table 2
Attitudes, Attitude Certainty, Perceived Counterargument Strength, and Attitude Change as a Function of Source Credibility and Weak or Strong Counterargument Feedback in Experiment 4

                           Low source credibility    High source credibility
Dependent measure          Weak        Strong        Weak        Strong
Attitudes
  M                        4.25        4.38          4.69        4.03
  SD                       1.29        2.05          2.34        2.13
Attitude certainty
  M                        5.39        6.58          6.42        7.28
  SD                       1.64        1.60          1.54        1.33
Perceived CA strength
  M                        4.05        7.10          5.33        7.63
  SD                       1.77        1.42          1.77        1.55
Attitude change
  M                        1.58        0.67          0.60        0.06
  SD                       2.17        0.78          1.06        1.39

Note. All scales ranged from 1 to 9. Attitudes refer to initial attitudes measured after the initial message and counterargument procedure. CA = counterargument.

Time 2 Attitudes: Subsequent Resistance

Finally, we submitted the attitude change index to analysis. To begin with, there was a significant main effect for source credibility, F(1, 60) = 4.20, p < .05; attitudes changed more in response to the second message when people initially resisted a low- (M = 1.23, SD = 1.80) rather than a high- (M = 0.30, SD = 1.26) credibility source. There was also a marginal main effect for counterargument feedback, F(1, 60) = 3.53, p < .07; attitudes changed more when people thought they generated weak (M = 1.15, SD = 1.81) rather than strong (M = 0.30, SD = 1.21) counterarguments to the first message. There was no interaction between these variables (F < 1). As displayed in Table 2, attitude change was highest in the weak counterargument and low-credibility condition and lowest in the strong counterargument and high-credibility condition. In fact, the weak counterargument and low-credibility condition was the only one that showed significant change from Time 1 to Time 2, F(1, 60) = 20.32, p < .001. In none of the other conditions did attitudes change (ps > .13).

We also analyzed subsequent resistance by submitting Time 2 attitudes to an ANCOVA with source credibility and counterargument feedback as independent variables and Time 1 attitudes as a covariate. This analysis replicated the outcome with attitude change scores. First, Time 1 attitudes predicted Time 2 attitudes, F(1, 59) = 99.27, p < .001. More germane to the present concerns, both source credibility, F(1, 59) = 3.56, p = .06, and counterargument feedback, F(1, 59) = 3.63, p = .06, had marginally significant main effects on Time 2 attitudes, and there was no interaction (F < 1). In short, despite no differences in attitudes following the first message, attitudes following the second message were more favorable in the low (M = 5.37) than in the high (M = 4.65) source credibility condition, and they were more favorable in the weak (M = 5.37) than in the strong (M = 4.65) counterargument feedback condition. Including attitude certainty as an additional covariate in this analysis, neither source credibility nor counterargument feedback had a significant effect on Time 2 attitudes (ps > .16).

Discussion

The results of Experiment 4 were consistent with the notion that the certainty with which people held their attitudes after resisting a persuasive attack was determined by their postmessage appraisals of their counterargument performance. The more successful people thought their counterarguing was, the more certain they felt of their attitudes. This assessment of success, in turn, was affected by false feedback and the credibility of the source of the counterargued attack. This experiment also suggested that when people had low levels of attitude certainty following an initial attack, they were more susceptible to persuasion in response to a second attack from a different source. When people had high levels of attitude certainty following the initial attack, they were more resistant to the second attack. Thus, this experiment produced convergent evidence for the notion that people’s initial attitudes can be weakened when they have doubts about their resistance performance.

As a caveat to our mediational analysis of the certainty effect (through perceived counterargument strength) in this experiment, it is worth noting that the perceived counterargument strength index directly followed the attitude certainty index. It is possible that the data supported our mediational hypotheses because participants rated the strength of their counterarguments in a way that would be consistent with the level of attitude certainty they had just reported. As noted already, these measures were highly correlated. We tested the mediation through counterargument perceptions to attitude certainty because this is the mediation that seemed most logical or plausible given our framework and the specific manipulations we used. That is, we assumed perceived counterargument strength would be the mediator because we directly manipulated it with false feedback. Thus, we felt that our approach was logically warranted. To empirically validate this assumption, we tested the reverse mediational pathway. That is, we conducted a 2 × 2 ANCOVA on perceived counterargument strength, treating attitude certainty as the covariate. This analysis revealed that the reverse mediational pathway performed more poorly than did the pathway already tested. In particular, although the effect of source credibility on perceived counterargument strength became nonsignificant, F(1, 59) = 2.28, p < .14, the effect of the false feedback manipulation remained highly significant, F(1, 59) = 35.15, p < .001. In other words, whereas controlling for perceived counterargument strength makes the effect of false feedback on attitude certainty drop out (F < 1), controlling for attitude certainty leaves the effect of false feedback on perceived counterargument strength intact. Overall, then, the data tended to support the mediation of the certainty effect by perceived counterargument strength rather than the opposite pattern.
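The Sobel tests reported in Footnote 2 can be approximated from the coefficients given there. The following Python sketch uses the standard Sobel formula and recovers each standard error as the coefficient divided by its t value; both steps are assumptions about how the reported z values were obtained. The function names are hypothetical. Because the inputs are rounded, standardized coefficients, the results should land near, not exactly on, the reported values (z = 2.20 and z = 3.33).

```python
import math


def se_from_t(coef, t):
    """Recover a standard error from a coefficient and its t statistic."""
    return coef / t


def sobel_z(a, se_a, b, se_b):
    """Sobel z for the indirect effect a*b.

    a: effect of the manipulation on the mediator.
    b: effect of the mediator on the outcome, controlling for the manipulation.
    """
    return (a * b) / math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)


# Source credibility -> perceived counterargument strength -> certainty.
z_source = sobel_z(0.32, se_from_t(0.32, 2.67), 0.48, se_from_t(0.48, 4.30))

# Counterargument feedback -> perceived counterargument strength -> certainty.
z_feedback = sobel_z(0.67, se_from_t(0.67, 7.02), 0.54, se_from_t(0.54, 3.73))
```

Both approximations exceed the conventional 1.96 cutoff, which agrees with the conclusion that each indirect pathway was significant.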

² We also tested mediation using the Baron and Kenny (1986) technique. First, we considered source credibility. Source credibility had significant effects on attitude certainty, β = .32, t(62) = 2.62, p = .01, and perceived counterargument strength, β = .32, t(62) = 2.67, p < .01. Moreover, perceived counterargument strength predicted attitude certainty, β = .54, t(62) = 4.98, p < .001. When both source credibility and perceived counterargument strength were entered as predictors of certainty, perceived counterargument strength was significant, β = .48, t(61) = 4.30, p < .001, but source credibility was not, β = .16, t(61) = 1.43, p > .15. A Sobel test revealed a significant mediational pathway (z = 2.20, p < .03). We next examined counterargument feedback. Counterargument feedback had significant effects on certainty, β = .35, t(62) = 2.95, p < .01, and perceived counterargument strength, β = .67, t(62) = 7.02, p < .001. When both counterargument feedback and perceived counterargument strength were entered as predictors of certainty, perceived counterargument strength was significant, β = .54, t(61) = 3.73, p < .001, but counterargument feedback was not, β = −.01, t(61) = −.06, p > .94. Again, the mediational pathway was significant (z = 3.33, p < .001).

General Discussion

The data from four experiments provided support for the idea that resisted persuasive attacks can sometimes have hidden success with respect to target attitudes. Specifically, when people resist persuasion but think they did a bad job resisting (e.g., because they are unable to articulate their counterarguments or they have the perception that their counterarguments are weak), they actually become less certain of their attitudes than they were initially. Furthermore, under the same conditions, people’s attitudes become less predictive of behavioral intentions and less likely to withstand future persuasive attacks. People’s appraisals of their own resistance, then, can actually weaken attitudes, reducing their predictive utility and durability.

The present experiments are the first to explore the possibility that attitude certainty, or attitude strength more generally, can be reduced through initial resistance. As described earlier, most past resistance research has been guided by an underlying assumption that when a persuasive message fails to change the valence or extremity of the target attitude, it exerts no impact on that attitude. Our recent studies (Tormala & Petty, 2002, 2004a, 2004b) undermined this assumption by demonstrating that resisted messages can sometimes backfire by making people more certain of the target attitudes than they already were.³ Nevertheless, our research before this article had focused exclusively on the notion that attitude certainty can increase following resistance to persuasion. The current research expands our understanding of these effects by exploring the opposite phenomenon from the same metacognitive perspective.

As the current studies reveal, the direction of the certainty effect depends not only on people’s perception and assessment of their resistance, but also on what, or who, people resist. When people perceive that they have done a good job resisting, for instance, they gain attitude certainty, but only when the message they resisted comes from a high-credibility source. Again, it is under these conditions that strong resistance is most diagnostic with respect to the validity of the attitude. As predicted, though, a very different pattern emerges when people perceive that they have done a bad job resisting.
When people evaluate their own resistance performance as poor, they are particularly likely to lose certainty if the message they resisted comes from a low-credibility source. As discussed earlier, we assume that under these conditions weak resistance is especially diagnostic with respect to an attitude’s invalidity.

New Questions

Although the current findings clearly fit with and extend our metacognitive framework for understanding resistance to persuasion, there are several important questions that remain to be answered. Ultimately, we see these questions as opening the door to new research that will expand our understanding of the current findings and the conditions under which they are most likely to emerge.

Mechanism of Resistance

One important task for future research will be uncovering additional factors that undermine certainty following resistance to persuasion. A particularly strong candidate in this regard may be the perceived legitimacy of the mechanism one uses to resist persuasion. As noted earlier, there are a variety of resistance strategies. Recent research by Jacks and Cameron (2003) has suggested that people have some awareness or perception of the strategies they use. We focused on counterarguing in the present research because this is an effective and well-established means of resistance (see Petty, Ostrom, & Brock, 1981). Counterarguing may be very different from other mechanisms, however, in that it is active and thoughtful, and it involves attention to core message arguments. Though such processing can be biased (Lord, Ross, & Lepper, 1979), it is likely to be perceived as a legitimate resistance mechanism.

Other resistance mechanisms might be perceived as less legitimate. When one thinks one has ignored a message or derogated its source, for instance, one may feel that he or she has been biased or has basically sidestepped message content (see Jacks & Cameron, 2003). This perception might provoke uncertainty as to whether one could have resisted if one had more thoughtfully processed message arguments. This feeling of uncertainty, in turn, could lead to doubts about the target attitude. Past research is generally consistent with this possibility, suggesting that people can assess the validity of their processing mechanism and that this assessment can affect subsequent processing and feelings of confidence (e.g., Mazursky & Schul, 2000; Yzerbyt, Schadron, Leyens, & Rocher, 1994; see also Chaiken, Liberman, & Eagly, 1989).

Resistance Versus Persuasion

Another important question, first asked after Experiment 1, is why people’s perceptions of their resistance did not affect the degree of resistance versus persuasion. Intuitively, one might expect that the less favorable people’s assessment of their own resistance is, the more persuaded they should be. In fact, there is substantial support for this type of effect. As described earlier, Petty et al. (1976) found that when participants’ counterarguments were curtailed by a distraction manipulation, they were more persuaded by a message. More recently, research on the self-validation hypothesis (e.g., Petty et al., 2002; Tormala, Petty, & Briñol, 2002) has shown that inducing doubt about people’s counterarguments to a persuasive message can produce more persuasion (relative to inducing confidence in those counterarguments).

The present research does not contest the notion that having doubt about one’s resistant thoughts sometimes facilitates persuasion. Our position is that given that resistance already occurred, assessments of that resistance can affect attitude certainty. In other words, after resistance has occurred, people can reflect upon their resistance, assess their performance, and feel more or less certain of their attitudes. When this post hoc assessment of resistance leads to questioning the basis of an attitude (e.g., because the person now thinks the arguments supporting the attitude are weak), attitude certainty declines.

In the present experiments, people (on average) did resist persuasion, as indicated by the attitude data. We expected participants to resist, because the message was counterattitudinal, they were forewarned of it, and they were directed to counterargue. We assume that people reflected upon their resistance after this resistance had occurred. Had participants considered their resistance during message processing, that is, while attitudes were still being formed, we suspect that we would have obtained different results. More specifically, if people had been made to doubt their resistance before consolidating their attitudes (e.g., by giving online feedback that counterarguments were weak), they might have been persuaded, as predicted by the self-validation hypothesis. This would be akin to findings in the ease of retrieval literature, in which it has been demonstrated that struggling to think of counterarguments before forming attitudes leads people to form more favorable attitudes (e.g., Tormala et al., 2002). We intend to explore this timing issue in future research.

Also relevant is the issue of whether people’s postmessage assessments of their performance are restricted to situations in which they resist persuasion. That is, can postmessage appraisal processes also apply to situations in which people are persuaded by a message? We suggest that they can. For instance, Rucker and Petty (2004) found that when people try but fail to resist persuasion, they can reflect on this outcome and become more certain of their newly changed attitudes than they would be if they had not tried to resist in the first place. On the basis of findings such as these, we argue that postmessage attitude appraisal processes, as explored in the present research, are not unique to resistance but apply to resistance and persuasion scenarios more generally. The present research focused on a subset of these situations in which people resist persuasion and then become less certain of their attitudes. In conjunction with past studies (e.g., Rucker & Petty, 2004; Tormala & Petty, 2002), we view the present research as fitting into a larger metacognitive framework for understanding people’s perceptions of their own persuasion versus resistance and the implications of these perceptions for attitude certainty (see also Petty, Tormala, & Rucker, 2004).

³ McGuire (1964) also proposed that initial resistance could boost subsequent resistance, but the mechanism and explanation for these effects were very different from the current formulation (see Tormala & Petty, 2002, 2004a, for further discussions).

Conclusion

Past research on resistance has largely been conducted under the assumption that when a persuasive message fails to change the valence or extremity of a target attitude, it simply has been unsuccessful. As a result of this assumption, very little is known about the effects of resisted persuasive messages on people’s attitudes. What is known suggests that resisting persuasion can make attitudes stronger (Tormala & Petty, 2002, 2004a, 2004b; see also McGuire, 1964). The present research demonstrates for the first time the opposite phenomenon: when people have doubts about their resistance, they can become less certain of their attitudes. This effect is important as it suggests that in some situations “failed persuasion” can mask a hidden success that ultimately worsens an attitude’s predictive utility and opens the attitude up to future change. Our hope is that this finding will spark new and innovative approaches to attitude change research that focus on the role of metacognitive factors and previously hidden, yet potentially important, traces of success for resisted messages.

References

Abelson, R. P. (1988). Conviction. American Psychologist, 43, 267–275.
Babad, E. Y., Ariav, A., Rosen, I., & Salomon, G. (1987). Perseverance of bias as a function of debriefing conditions and subjects’ confidence. Social Behavior, 2, 185–193.
Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173–1182.
Bassili, J. N. (1996). Meta-judgmental versus operative indexes of psychological attributes: The case of measures of attitude strength. Journal of Personality and Social Psychology, 71, 637–653.
Bless, H., & Forgas, J. P. (Eds.). (2000). The message within: The role of subjective experience in social cognition and behavior. Philadelphia: Psychology Press.
Bohner, G., Ruder, M., & Erb, H. (2002). When expertise backfires: Contrast and assimilation effects in persuasion. British Journal of Social Psychology, 41, 495–519.
Brehm, J. W. (1966). A theory of psychological reactance. San Diego, CA: Academic Press.
Brock, T. C. (1967). Communication discrepancy and intent to persuade as determinants of counterargument production. Journal of Experimental Social Psychology, 3, 296–309.
Chaiken, S., Liberman, A., & Eagly, A. H. (1989). Heuristic and systematic processing within and beyond the persuasion context. In J. S. Uleman & J. A. Bargh (Eds.), Unintended thought (pp. 212–252). New York: Guilford Press.
Fazio, R. H., & Zanna, M. P. (1978). Attitudinal qualities relating to the strength of the attitude-behavior relationship. Journal of Experimental Social Psychology, 14, 398–408.
Fishbein, M., & Ajzen, I. (1975). Belief, attitude, intention, and behavior. Reading, MA: Addison-Wesley.
Gross, S., Holtz, R., & Miller, N. (1995). Attitude certainty. In R. E. Petty & J. A. Krosnick (Eds.), Attitude strength: Antecedents and consequences (pp. 215–245). Mahwah, NJ: Erlbaum.
Hass, R. G. (1981). Effects of source characteristics on cognitive responses and persuasion. In R. E. Petty, T. M. Ostrom, & T. C. Brock (Eds.), Cognitive responses in persuasion (pp. 141–172). Hillsdale, NJ: Erlbaum.
Hass, R. G., & Grady, K. (1975). Temporal delay, type of forewarning, and resistance to influence. Journal of Experimental Social Psychology, 11, 459–469.
Jacks, J. Z., & Cameron, K. A. (2003). Strategies for resisting persuasion. Basic and Applied Social Psychology, 25, 145–161.
Jarvis, W. B. G. (2004). MediaLab [Computer software]. Columbus, OH: Empirisoft.
Jost, J. T., Kruglanski, A. W., & Nelson, T. O. (1998). Social metacognition: An expansionist review. Personality and Social Psychology Review, 2, 137–154.
Killeya, L. A., & Johnson, B. T. (1998). Experimental induction of biased systematic processing: The direct-thought technique. Personality and Social Psychology Bulletin, 24, 17–33.
Knowles, E. S., & Linn, J. A. (Eds.). (2004). Resistance and persuasion. Mahwah, NJ: Erlbaum.
Kruglanski, A. W., & Freund, T. (1983). The freezing and un-freezing of lay-inferences: Effects on impressional primacy, ethnic stereotyping and numerical anchoring. Journal of Experimental Social Psychology, 19, 448–468.
Lewan, P. C., & Stotland, E. (1961). The effects of prior information on susceptibility to an emotional appeal. Journal of Abnormal and Social Psychology, 62, 450–453.
Lord, C. G., Ross, L., & Lepper, M. R. (1979). Biased assimilation and attitude polarization: The effects of prior theories on subsequently considered evidence. Journal of Personality and Social Psychology, 37, 2098–2109.
Lydon, J., Zanna, M. P., & Ross, M. (1988). Bolstering attitudes by autobiographical recall: Attitude persistence and selective memory. Personality and Social Psychology Bulletin, 14, 78–86.
Mazursky, D., & Schul, Y. (2000). In the aftermath of invalidation: Shaping judgment rules on learning that previous information was invalid. Journal of Consumer Psychology, 9, 213–222.
McGuire, W. J. (1964). Inducing resistance to persuasion: Some contemporary approaches. In L. Berkowitz (Ed.), Advances in experimental social psychology (Vol. 1, pp. 191–229). New York: Academic Press.
Papageorgis, D. (1968). Warning and persuasion. Psychological Bulletin, 70, 271–282.
Petty, R. E., Briñol, P., & Tormala, Z. L. (2002). Thought confidence as a determinant of persuasion: The self-validation hypothesis. Journal of Personality and Social Psychology, 82, 722–741.
Petty, R. E., Briñol, P., Tormala, Z. L., & Wegener, D. (in press). The role of metacognition in social judgment. In E. T. Higgins & A. Kruglanski (Eds.), Social psychology: A handbook of basic principles (2nd ed.). New York: Guilford Press.
Petty, R. E., & Cacioppo, J. T. (1979a). Effects of forewarning of persuasive intent and involvement on cognitive responses and persuasion. Personality and Social Psychology Bulletin, 5, 173–176.
Petty, R. E., & Cacioppo, J. T. (1979b). Issue-involvement can increase or decrease persuasion by enhancing message-relevant cognitive responses. Journal of Personality and Social Psychology, 37, 1915–1926.
Petty, R. E., & Cacioppo, J. T. (1986). The elaboration likelihood model of persuasion. In L. Berkowitz (Ed.), Advances in experimental social psychology (Vol. 19, pp. 123–205). New York: Academic Press.
Petty, R. E., & Krosnick, J. A. (Eds.). (1995). Attitude strength: Antecedents and consequences. Mahwah, NJ: Erlbaum.
Petty, R. E., Ostrom, T. M., & Brock, T. C. (Eds.). (1981). Cognitive responses in persuasion. Hillsdale, NJ: Erlbaum.
Petty, R. E., Tormala, Z. L., & Rucker, D. D. (2004). Resisting persuasion by counterarguing: An attitude strength perspective. In J. T. Jost, M. R. Banaji, & D. A. Prentice (Eds.), Perspectivism in social psychology: The yin and yang of scientific progress (pp. 37–51). Washington, DC: American Psychological Association.
Petty, R. E., Wells, G. L., & Brock, T. C. (1976). Distraction can enhance or reduce yielding to propaganda: Thought disruption versus effort justification. Journal of Personality and Social Psychology, 34, 874–884.
Rucker, D. D., & Petty, R. E. (2004). When resistance is futile: Consequences of failed counterarguing for attitude certainty. Journal of Personality and Social Psychology, 86, 219–235.
Tannenbaum, P. H., Macauley, J. R., & Norris, E. L. (1966). Principle of congruity and reduction of persuasion. Journal of Personality and Social Psychology, 3, 233–238.
Tormala, Z. L., & Petty, R. E. (2002). What doesn’t kill me makes me stronger: The effects of resisting persuasion on attitude certainty. Journal of Personality and Social Psychology, 83, 1298–1313.
Tormala, Z. L., & Petty, R. E. (2004a). Resistance to persuasion and attitude certainty: The moderating role of elaboration. Personality and Social Psychology Bulletin, 30, 1446–1457.
Tormala, Z. L., & Petty, R. E. (2004b). Source credibility and attitude certainty: A metacognitive analysis of resistance to persuasion. Journal of Consumer Psychology, 14, 427–442.
Tormala, Z. L., Petty, R. E., & Briñol, P. (2002). Ease of retrieval effects in persuasion: A self-validation analysis. Personality and Social Psychology Bulletin, 28, 1700–1712.
Wu, C., & Shaffer, D. R. (1987). Susceptibility to persuasive appeals as a function of source credibility and prior experience with the attitude object. Journal of Personality and Social Psychology, 52, 677–688.
Yzerbyt, V. Y., Lories, G., & Dardenne, B. (1998). Metacognition: Cognitive and social dimensions. Thousand Oaks, CA: Sage.
Yzerbyt, V. Y., Schadron, G., Leyens, J., & Rocher, S. (1994). Social judgeability: The impact of meta-informational cues on the use of stereotypes. Journal of Personality and Social Psychology, 66, 48–55.

Received May 31, 2005
Revision received December 12, 2005
Accepted December 14, 2005

INTERPERSONAL RELATIONS AND GROUP PROCESSES

Mere Effort as the Mediator of the Evaluation–Performance Relationship

Stephen G. Harkins
Northeastern University

The research traditions that have examined the evaluation–performance relationship do not agree on the mediating process(es), nor is there any compelling evidence that favors one account over the others. In the current research, a molecular analysis of performance on the Remote Associates Test was undertaken in an effort to identify the mediating process(es). This analysis suggests that the potential for evaluation leads participants to put greater effort into the prepotent response and that this mere effort alone can account for the typical finding that evaluation improves performance on simple items and debilitates performance on complex ones. Subsequent research will be aimed at testing the generalizability of this account.

Keywords: evaluation, task performance, activation model

Journal of Personality and Social Psychology, 2006, Vol. 91, No. 3, 436–455
Copyright 2006 by the American Psychological Association 0022-3514/06/$12.00 DOI: 10.1037/0022-3514.91.3.436

Special thanks go to my colleague, Neal J. Pearlmutter, for all of his help and advice in the conduct of this research. Thanks also go to Sean Allen for writing the computer programs used in this work. Correspondence concerning this article should be addressed to Stephen G. Harkins, Department of Psychology, Northeastern University, 125 NI, Boston, MA 02115. E-mail: [email protected]

The effect of the potential for evaluation on task performance has been a topic of interest in psychology for more than a century (Triplett, 1898). By evaluation, we mean the judgment that can be made by some potential agent of evaluation (i.e., self or other) when he or she has access to a measure of the target’s output and a criterion against which this output can be compared. Five different research traditions in psychology have found that the potential for evaluation affects task performance: social loafing, goal setting, creativity, achievement goals, and social facilitation. These research traditions have developed independently and, for the most part, do not consider the findings obtained within the other traditions.

Social loafing refers to the finding that people put out less effort when working together than when working alone (Latané, Williams, & Harkins, 1979). Harkins (1987) suggested that this reduction in effort stems from the fact that when participants “work together,” their outputs are pooled and the participants can receive neither credit nor blame for their performances. Consistent with this analysis, in their meta-analysis, Karau and Williams (1993) found that when the potential for external evaluation was held constant, the social loafing effect was eliminated. Most social loafing research has used simple tasks on which the potential for evaluation enhances performance. However, several experiments have also incorporated complex versions of the tasks on which it was found that the potential for evaluation debilitated performance (e.g., Jackson & Williams, 1985; Sanna, 1992).

In nearly 400 studies involving 40,000 participants in eight countries, 88 different tasks, and time spans ranging from 1 min to 3 years, Locke and Latham (1990) reported that participants urged to strive to attain a specific, difficult level of performance do even better than participants asked to do their best. Harkins and his colleagues (Harkins & Lowe, 2000; Harkins, White, & Utman, 2000; White, Kjelgaard, & Harkins, 1995) have shown that participants produce goal-setting effects when difficult goals are set by a legitimate authority (e.g., the experimenter) that can evaluate their performances. In virtually all goal-setting research, simple tasks have been used on which stringent goals lead to improved performance. However, when complex tasks have been used, the effect of goals on performance has been relatively weak and sometimes detrimental (Tubbs, 2001).

Deci and Ryan (1985) argued that when individuals are intrinsically motivated, they attempt to stretch their abilities and derive enjoyment from the challenge associated with the task itself. In contrast, when individuals are extrinsically motivated, they are focused on factors such as reward, deadlines, surveillance, and evaluation. Amabile and her colleagues have shown that extrinsic motivation not only undermines subsequent interest in the tasks but also disrupts performance on tasks that require creativity. For example, Amabile (1979) found that the collages of participants subject to external evaluation were judged to be less creative than those of participants who were not subject to this evaluation. Harkins (2001a, 2001b) has argued that creative tasks (e.g., Amabile, 1979) are simply a type of complex task. Certainly, the potential for evaluation has the same debilitating effect on the performance of creative tasks as it has on complex ones. For example, Bartis, Szymanski, and Harkins (1988) found that when asked to produce uses that were as creative as possible without regard to number, participants who were subject to evaluation produced uses that were rated as less creative than participants who were not subject to evaluation. However, when simply asked to generate as many uses as possible for the object without regard to creativity, participants subject to evaluation generated more uses than participants who were not. Thus, these results replicate the findings of Jackson and Williams (1985) and Sanna (1992), who also used simple and complex tasks.

Initial work in the achievement goal tradition (e.g., Elliott & Dweck, 1988) argued that individuals in achievement situations pursued one of two types of major goals: (a) performance goals, which focused on the demonstration of competence, and (b) learning or mastery goals, which focused on the development of competence. More recent work (e.g., Elliot & Church, 1997) has used a trichotomous framework that incorporates not only the distinction between performance and learning–mastery goals but also a distinction between approach (striving toward a positive outcome) and avoidance tendencies (avoiding a negative outcome). More specifically, this approach incorporates three types of goals: mastery goals, which are approach goals based on task-based or intrapersonal competence; performance-approach goals, which are approach goals based on attaining normative competence; and performance-avoidance goals, which are avoidance goals based on avoiding normative incompetence. In this framework, mastery goals are seen as opportunities for individuals to increase their abilities or master new tasks. On the other hand, performance-approach goals use a positive normative standard in competence evaluation which is presumed to evoke many of the same positive processes as mastery goals, but is also thought to foster a more external focus on the evaluative environment and on what is needed to attain competence. (Elliot, Shell, Henry, & Maier, 2005, p. 637)

As a result, performance–approach goals can evoke the same effort expenditure and persistence as produced by mastery goals, but the external focus facilitates performance over a wider range of situations and tasks than is the case for mastery goals. Specifically, Elliot et al. argued that the advantage of performance–approach goals over mastery goals is most likely to be observed when (a) competence is evaluated using a normative standard, (b) competence is evaluated publicly, (c) shallow processing is needed for the attainment of competence, (d) the task is boring, (e) competence feedback is acquired from an external source, (f) short-term outcomes are considered, and (g) instrumentalities are present. (p. 637)

Finally, performance–avoidance goals are focused on avoiding negative outcomes, and “evoke a host of negative processes (distraction, anxiety, self-protective divestment) that undermine performance in most achievement settings” (Elliot et al., 2005, p. 631). In sum, this approach proposes that the potential for evaluation can facilitate or debilitate performance depending on the orientation taken to the task (approach vs. avoidance), the standard of comparison (self vs. other), and the type of task (e.g., requiring shallow vs. deep processing). Elliot et al. (2005) argued that performance on boring tasks that require shallow processing should be facilitated by a performance–approach goal (public evaluation, normative standard), whereas performance on interesting tasks that require deep processing should be facilitated by a mastery goal (private evaluation, intrapersonal standard). Of course, performance on this latter type of task should be debilitated by performance–avoidance goals. However, research has shown that performance on these types of tasks can also be debilitated by performance–approach goals. For example, Grolnick and Ryan (1987) found that participants given instructions consistent with a performance–approach goal orientation performed better than a control group on a rote task (shallow processing) but performed more poorly on a conceptual one (deep processing). We argue that interesting tasks that require deep processing represent at least a subset of tasks that can be classified as complex, whereas boring tasks that require shallow processing can be classified as simple.

Research on social facilitation has shown that the presence of coactors leads to enhanced performance on simple tasks, and debilitated performance on complex tasks, but there is little agreement on what it is about the presence of these others that produces these effects (Geen, 1989). However, all of the theories that attempt to explain the phenomenon focus on the effects of one or both of the same two features of the facilitation paradigm: the mere presence of others and/or the potential for evaluation that these others represent. Harkins (1987) provided evidence that each of these factors contributes to facilitation effects. For the purposes of the present analysis, we focus on the fact that the potential for evaluation represented by the presence of coactors leads to improved performance on simple tasks and to debilitated performance on complex tasks.

Thus, across these five traditions, we find that the potential for external evaluation tends to facilitate performance on simple tasks but to debilitate it on complex ones. These traditions have proposed process models to account for this pattern of findings, but a review reveals no agreement across, or even within, these traditions.
For example, at least four different explanations have been proposed to account for the fact that the potential for evaluation can debilitate performance on complex tasks: (a) concern about failure leads to withdrawal of effort (social loafing: Shepperd, 2001; achievement goal theory: Elliot et al., 2005; creativity: Hennessey, 2001; social facilitation: Carver & Scheier, 1981); (b) concern about failure diminishes processing capacity (achievement goal theory: Elliot et al., 2005; social facilitation: Bond, 1982; cf. Sarason, Pierce, & Sarason, 1996); (c) attentional overload restricts focus of attention leading to poor performance on complex tasks, which often require utilization of a wider range of cues than simple tasks (social facilitation: Baron, 1986; creativity: Hennessey, 2001); and (d) drive enhances the probability of the emission of the dominant response, which is likely to be wrong on a complex task (social facilitation: Zajonc, 1965). Finally, Tubbs (2001) argued that, although various possibilities have been suggested (e.g., diminished processing capacity), there is no compelling explanation for the effect of goal setting on the performance of complex tasks.

Thus, more than 100 years after the first published experiment on the effects of evaluation on performance (Triplett, 1898), there is still no agreement about what process(es) mediate(s) these effects. Specifying the mediating process is key to the theoretical development of each of these research traditions as well as to any effort to integrate them. In addition, when it comes to application, it is impossible to suggest interventions if we do not understand the mediating process. For example, the intervention that would be designed if people were withdrawing effort and failing as a result would be completely different from the intervention that would be proposed if people were trying hard but their efforts were misdirected, leading to failure.

Harkins (2001b) argued that the field’s failure to resolve this issue may be a result of the fact that efforts have focused broadly on theory construction rather than on the tedious analysis that would be required to learn exactly how performance unfolds on a given task. However, it is through just such an analysis that the mediating process may be identified. The current article describes a molecular analysis of the effects of evaluation on the performance of a specific task, the Remote Associates Test¹ (RAT; Kihlstrom, 2006; Mednick & Mednick, 1967), which was undertaken in an effort to determine whether this approach can help to identify the mediating process. The RAT requires participants to examine sets of three words (e.g., “elephant,” “lapse,” and “vivid”) and to generate a fourth word that is somehow related to each word of the triad (“memory”). Harkins (2001b) demonstrated that a manipulation of the potential for evaluation produces the typical pattern of results on simple and complex versions of this task.

In Harkins’s (2001b) research, all of the participants were asked to solve as many of the triads as they possibly could in the 12 min that they were allotted. Half of the participants were told that the experimenter would determine how well they had performed at the end of the experimental session (experimenter evaluation), whereas the other half were told that their responses would be averaged with those of previous participants (no experimenter evaluation). Crossed with the manipulation of the potential for experimenter evaluation, one half of the participants were given 20 triads that pretesting had shown to be relatively simple, whereas the other half of the participants were given 20 triads that pretesting had shown to be difficult.
Consistent with past research, at the molar level of analysis, Harkins (2001b) found that participants subject to evaluation by the experimenter solved more simple triads than participants who were not. However, when faced with complex triads, participants who were not subject to experimenter evaluation solved more triads than participants who anticipated evaluation by the experimenter. We began the molecular analysis by testing potential explanations for what makes these “simple” and “complex” RAT triads simple and complex, respectively. For example, it is possible that simple triads are more easily solved than complex ones because their solutions simply occur more frequently in the lexicon than the solutions for complex triads. However, a comparison of the frequencies of the solutions for simple and complex triads revealed no differences (p > .65). There could be greater consistency in the meaning of the solutions for simple triads than for complex triads (e.g., electric “chair,” high “chair,” vs. tennis “match,” light “match”). However, an analysis of the number of different meanings for the solutions (one vs. two vs. three) for simple versus complex triads revealed no reliable differences (p > .70). It could be that solutions to simple triads come consistently before or after the triad member (e.g., gravy “boat,” tug “boat”), but the solution for a given complex triad comes both before and after (e.g., “tape” worm, scotch “tape”). However, our analysis revealed that simple and complex triads did not differ in the extent to which the solution came before or after the triad members (p > .30). We next considered the possibility that triad members for simple items are more closely associated with each other than is the case

for the triad members of complex triads, and this association simplifies the search for the solution. For example, perhaps it is easier to think of “ice” as a possible solution when given the triad members “skate” and “water” than to think of “bank” when given the triad members “river” and “note,” because the simple triad members, “skate” and “water,” are more closely associated with each other than the complex triad members, “river” and “note.”

Experiment 1: Triad Member Association As a test of the triad member association hypothesis, participants were asked to rate the association of each pair of triad members in each RAT item. To the extent that the difference in the difficulty in simple and complex items is the result of differences in the extent of association of the triad members, we should find that, on average, triad members for simple items are rated as more associated with each other than is the case for complex items.

Method Participants In this experiment and in all of the research that follows, participation was limited to native English speakers. Thirty participants (53% female, 47% male) took part in the triad member association study as a means of satisfying an Introductory Psychology course requirement.

Procedure Participants were run in groups ranging in size from 6 to 12. Participants were given a booklet with the following instructions on the first page: On the following pages you will find pairs of words. Your task is to rate the extent to which these words are associated. By “associated,” we mean the extent to which one word makes you think of the word, or the extent to which the words seem to go together. If there is an association, it can take one of several forms. For example, one word could precede the other, one word could follow the other, or there could just be some relationship between the two words. So, “jump” and “joy” go together in that one can “jump for joy.” “Kill” and “joy” are associated in that one can be a “killjoy.” However, seeing the word “joy” does not make one think of the word “car.” So, these two words are not associated. For each pair of words that you see on the following pages, please rate their degree of association on a 9-point scale where 1 means that you see no association between the words and 9 means that you see a very high degree of association between them. The remaining pages of the booklet consisted of a list of the 120 pairs of words that represented the three pairwise combinations of the triad members for each of the 20 simple and 20 complex triads. Thus, for example, for the simple triad “surprise line birthday,” the participants rated the degree of association of “surprise” and “line,” “surprise” and “birthday,” and “line” and “birthday.” The order of the pairs was randomized once and this order was used for all of the participants. After reviewing the instructions, the experimenter asked the participants to turn the page and begin. They were given as much time as they required to complete the ratings.
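The figure of 120 pairs follows from taking every unordered pair of words within each three-word triad: 3 pairs per triad across 20 simple and 20 complex triads. A minimal Python sketch of that enumeration, using the one triad quoted in the Procedure, is given here; the function name `triad_pairs` is illustrative and not part of the original materials.

```python
from itertools import combinations

def triad_pairs(triad):
    """Return the three unordered word pairs contained in a three-word triad."""
    words = triad.split()
    if len(words) != 3:
        raise ValueError("a triad must contain exactly three words")
    return list(combinations(words, 2))

pairs = triad_pairs("surprise line birthday")

# Each triad contributes 3 pairs, so 20 simple + 20 complex triads give 120.
total_pairs = len(pairs) * (20 + 20)
```

The pairs come out in the same order as the example in the text: “surprise” and “line,” “surprise” and “birthday,” “line” and “birthday.”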

1 I wish to thank Sarnoff A. Mednick for permission to reprint items from Examiner’s Manual: Remote Associates Test, by S. A. Mednick and M. T. Mednick, 1967, Boston: Houghton Mifflin. Copyright 1967 by S. A. Mednick.

THE MERE EFFORT HYPOTHESIS

Results The mean association for each of the three ratings for each triad was calculated, and these three means were then averaged as a measure of the mean association of the triad members for each triad. The analysis of these data revealed no significant difference between the mean association of the triad members for simple (M = 3.68) and complex (M = 3.60) triads (F < 1, p > .80).

Discussion The findings for the ratings of triad member association show that triad members of simple and complex items do not differ in the extent to which they are seen as associated. Overall, the triad members were rated as exhibiting a relatively low level of association with each other (overall M = 3.64 on a 9-point scale). We next considered the possibility that simple triads are more easily solved than complex ones because the triad members for simple items are more strongly associated with their solutions than is the case for triad members for complex items. This explanation proposes that the triad members and their solutions are part of an associative network (McClelland & Rumelhart, 1981). When a particular word (i.e., triad member) is considered, its node is activated, and this activation spreads to other associated words. The strength of this activation corresponds to the strength of the association between the words. This interactive activation model incorporates inhibitory as well as excitatory activation. Thus, as a potential answer gains activation, it also inhibits the activation of other words that could be candidates for the correct answer. This activation model would suggest that the simple items are solved more easily than the complex ones because the activation of the simple triad members’ nodes also strongly activates the nodes of highly associated words, one of which is more likely to be the answer than is the case for complex triads. Put another way, on the simple triads, it is very likely that at least one of the triad members will lead the participant to think of an associate that, when tested against the other two, will turn out to be correct. On the other hand, on the complex items, the associations between the triad members and the correct answer are much weaker (i.e., the associates are more remote), and it is much less likely that the solution will emerge as an associate of any one of the triad members. In addition, the strong activation of closely associated words will inhibit the activation of the remote associates. Thus, the correct answer will take longer to be produced if it is produced at all.

Experiment 2: Testing Close Associates To test this activation hypothesis, we gave participants up to 1 min to generate up to 10 associates for each of the triad members of both simple and complex triads (e.g., “family,” “apple,” “house,” etc.). To support the activation hypothesis, we should find that the “correct” answer is more likely to emerge as one of the associates for the simple triad members than for the complex triad members. For example, one is very likely to think of the word “ice” when presented with the triad member “skate.” This potential answer can then be tested against the other two triad members, “water” and “pick.” In this case, the triad would be solved. However, if the answer were incorrect, generating another close associate would be a simple matter because these words will be highly activated. On the other hand, on complex triads, the correct answer is much less likely to be produced as an associate of one of the triad members. For example, although the word “memory” is related to “elephant” (i.e., memory like an elephant), “elephant” is quite unlikely to generate “memory” as an associate.

Method Participants The 128 participants (52% female, 48% male) took part as a means of satisfying an Introductory Psychology course requirement.

Procedure Participants were run in eight groups ranging in size from 14 to 18. Participants were given a booklet with a cover sheet that informed them that on each of the following pages they would find a single word. Their task was to simply write down words that were associated with the word that was on the page. The word could just come to mind when they thought of the word on the page. They could also generate associates by thinking of words that could come before or after the word on the page. They read that they would be told when to turn the page and begin. They would be given 1 min to generate up to 10 associates for each word. After 1 min, they would be asked to go on to the next word and generate associates for it, and so on. If they could not generate 10 associates, they were to generate as many as they could and then wait for the signal to go on to the next word. They were asked not to go back in their booklets to an item that had been completed nor to go forward to a word that had not yet been reached. After the experimenter reviewed the instructions, he began the task. Each group had been randomly assigned one of eight booklets. Each booklet consisted of the instruction page plus 15 pages, each of which had a word printed at the top, and numbers from 1 to 10 down the left-hand side of the page. Because participants in Harkins’s (2001b) research saw only simple or complex triads, these participants also saw only words from simple or complex triads. However, as a means of reducing fatigue effects, the participants only generated associates for 15 words. The 20 simple and 20 complex triads were each randomly divided into four sets of five triads (four sets of five simple triads and four sets of five complex triads). The order of the triads within each booklet was randomized, but words within a triad were kept together and presented individually, one per page, in the same order that they came in the triad. 
For example, “skate,” the first word of the triad “skate water pick,” was always presented first, followed by “water” and “pick.”

Results For each member of each triad, we calculated the proportion of “correct answers” (i.e., the proportion of respondents that generated the associate that was the “correct answer” for that triad). For example, “tree” is the solution for the triad “family apple house.” We calculated the proportion of participants who generated the word “tree” as an associate for the triad member “family,” for “apple,” and for “house.” Then for each triad, the probability that the “correct answer” would be generated in response to at least one of the triad members was calculated. Once again, the hypothesis we were testing is that simple triads are simple because their solutions are more closely associated with the triad members than is the case for complex items. If this is the case, we should find that the participants are more likely to produce at least one associate that is the “correct answer” for simple triad members than for


complex triad members. Consistent with this hypothesis, we found that the “correct answer” was more than twice as likely to be produced as an associate for at least one of the triad members for the simple triads (M = .76) than for the complex triads (M = .37), F(1, 38) = 42.48, p < .0001, d = 2.11.
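The Results do not spell out how the per-triad probability was computed. One reading, assumed here and not confirmed by the text, treats the three triad members as independent and combines their proportions as 1 minus the product of (1 − p) across members; the proportions in the example are invented for illustration. The second function applies a standard conversion from a two-group F with 1 numerator degree of freedom to Cohen's d, d = 2·sqrt(F/df_error), which assumes equal group sizes. Applied to the reported F(1, 38) = 42.48, it gives a value that rounds to the reported d = 2.11.

```python
import math

def p_at_least_one(member_proportions):
    """Probability that at least one triad member elicits the solution.

    ASSUMPTION: the members are treated as independent. The article does not
    state its exact computation, so this is only one plausible reading.
    """
    p_none = 1.0
    for p in member_proportions:
        p_none *= (1.0 - p)
    return 1.0 - p_none

def cohens_d_from_f(f_value, df_error):
    """Cohen's d for two equal-sized groups, from F(1, df_error)."""
    return 2.0 * math.sqrt(f_value / df_error)

example = p_at_least_one([0.5, 0.2, 0.1])  # hypothetical proportions
d = cohens_d_from_f(42.48, 38)             # values reported in the Results
```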

Discussion These findings are consistent with the activation hypothesis: Simple triads are easier to solve than complex ones because the correct answers are more strongly associated with the triad members of the simple triads than of the complex ones. This finding suggests that to solve the simple triads, participants need only produce close associates of the triad members and then test these associates against the other two triad members, and this appears to be exactly the approach taken by participants subject to experimenter evaluation in Harkins’s (2001b) research. In the experimenter evaluation–simple triad condition, the probability that the correct associate would be generated in response to at least one of the simple triad members was significantly correlated with the proportion of participants who correctly solved the problem, r(18) = .53, p < .02, and these participants solved 77% of the simple RAT triads. On the other hand, no-experimenter-evaluation participants solved only 57% of the simple triads, and the probability that the correct associate would be generated in response to at least one of the triad members was not reliably related to their performance, r(18) = .26, p > .28. We argue that this correlation is attenuated because the participants are loafing. There is no reason that the no-experimenter-evaluation participants could not generate the correct associate and solve the problem, but instead they generated an associate that is related to one or two of the triad members and stopped there. Consistent with this argument, Harkins (2001b) found that no-experimenter-evaluation participants produced more incorrect answers than participants who were subject to experimenter evaluation but did not differ from these participants in the number of times they left an answer blank.
If no-experimenter-evaluation participants were willing to provide incorrect answers rather than continuing to test associates, on average, they should take significantly less time to solve the simple items that they did solve than participants subject to experimenter evaluation, and Harkins (2001b) found that they did. These findings suggest that participants who were not subject to experimenter evaluation performed more poorly than those who were because they simply did not make the requisite effort on a subset of the simple items. Instead of trying to come up with an answer that related to all three of the triad members, they were satisfied to enter an answer that was related to one or two of the words in the triad.² These findings are consistent with previous research on social loafing (e.g., Latané et al., 1979), which suggests that participants who are not subject to experimenter evaluation put out less effort than those who are. On the simple triads, testing close associates of the triad members works well as a means of solving the problems, but, of course, on these problems, the solutions are not really remote associates of the triad members. Instead, our findings suggest that the solutions for the simple items are strongly associated with their triad members. In contrast, the solutions for the complex items are much

more weakly associated with their triad members, and testing close associates is unlikely to produce the solution. This finding accounts for the fact that performance is so much worse on complex triads than on simple ones, but it does not account for the typical performance reversal found on complex tasks. For example, Harkins (2001b) found that participants subject to evaluation solved only 12% of the complex triads as compared to the 28% solved by participants who were not subject to this evaluation. However, the model of spreading activation does suggest an explanation for this reversal. If the solution for a given RAT item were remotely associated with the triad members, participants would be extremely unlikely to produce the solution by generating associates for the individual triad members. Thus, for example, if presented with the triad member “note,” a participant would be extremely unlikely to produce the associate “bank.” Nonetheless, the activation model suggests that when the participant considers the word “note,” the solution, “bank,” will be weakly activated. Likewise, the solution, “bank,” will also be weakly activated when the participant considers the other two triad members, “river” and “blood.” If this were the only operative process, over time, this weak activation should accumulate, leading to the emergence of the correct answer. However, the interactive activation model incorporates inhibitory as well as excitatory activation. If participants actively test close associates as solutions for the triads, these associates will be highly activated and will strongly inhibit the activation of remote (weak) associates. And, of course, the results for the simple triads suggest that participants subject to experimenter evaluation are testing close associates to a greater degree than are participants who are not subject to evaluation. 
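The accumulation-with-inhibition argument can be illustrated with a toy accumulator. This is a deliberately simplified sketch, not the McClelland and Rumelhart (1981) model and not an analysis reported in this article; every parameter value (`weak_input`, `inhibition`, `threshold`, `max_steps`) is an arbitrary assumption chosen only to show the qualitative pattern. On each step, the remote solution gains a small amount of activation from each triad member and loses activation in proportion to how strongly close associates are currently activated.

```python
def steps_to_solution(competitor_activation, weak_input=0.06, n_members=3,
                      inhibition=0.1, threshold=1.0, max_steps=30):
    """Return the step at which the remote solution reaches threshold, or None.

    All numeric defaults are illustrative assumptions, not estimates.
    """
    activation = 0.0
    for step in range(1, max_steps + 1):
        excitation = n_members * weak_input               # summed weak input
        suppression = inhibition * competitor_activation  # from close associates
        activation = max(0.0, activation + excitation - suppression)
        if activation >= threshold:
            return step
    return None  # the solution never emerged within the allotted steps

relaxed = steps_to_solution(competitor_activation=0.0)    # no active competitors
effortful = steps_to_solution(competitor_activation=1.0)  # close associates active
blocked = steps_to_solution(competitor_activation=2.0)    # inhibition exceeds input
```

With these made-up settings, the solution emerges more slowly as competing close associates become more active, and it does not emerge at all once inhibition outweighs the summed weak input.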
To the extent that participants subject to experimenter evaluation exhibit this same behavior on complex triads, the summative process necessary for the solution of complex triads will be less likely to culminate in a solution for these participants than for participants who are not subject to evaluation. Thus, the interactive activation model can account for both the solution process and the performance reversal on complex triads.

2 Of course, one could argue that from the participants’ points of view, these responses were not errors at all. The incorrect answers were responses that tended to be related to one or two of the triad members, but not to all three. For example, in response to the triad family apple house, a participant responded green. In this case, green could be associated with apple and house, but not with family. Nonetheless, the participant may have assumed that this answer was correct, even though she did not know the nature of the relationship between green and family. Harkins (2001b) tested the hypothesis that participants viewed these “incorrect” answers as correct by examining their estimates for the number of RAT items they solved correctly. If they considered these “incorrect” answers to be correct, they should have included them in their estimates of the number of RAT items that they correctly solved. However, there was no reliable difference between the number of triads correctly solved and the participants’ estimates of the number of items they solved (p > .50). In addition, the absolute difference between the number of items that participants actually solved correctly and their estimates of the number of items that they correctly solved was reliably smaller (M = 2.61) than the absolute difference between the number of items correctly solved plus the “incorrect” answers and their estimates of items correctly solved (M = 5.28), F(1, 32) = 6.23, p < .02, d = 0.88. Thus, these data suggest that participants did not include these “incorrect” answers in the number of triads that they estimated that they correctly solved.

The lexical decision paradigm offers a means of testing this hypothesis. In this paradigm, participants are shown a stimulus and asked to indicate as quickly as possible whether the stimulus is a word or a nonword. Research using this paradigm (e.g., Meyer & Schvaneveldt, 1971) has shown that participants recognize a stimulus as a word more quickly when that word has been activated by the participant’s previous exposure to a related word. For example, when participants are presented the word “doctor,” they subsequently recognize the stimulus “nurse” as a word more quickly than a control word matched in length and frequency that is unrelated to “doctor.” And the strength of this activation is reflected in the speed of recognition time. Thus, if the activation of a solution for a complex RAT item is building up across time, we should be able to track this process by interpolating the lexical decision task at successive points in the solution attempt. Thus, for example, if participants are asked to make lexical decisions early on in the process, we should find no evidence of activation (i.e., correct answers are not identified as words any faster than their matched control words), whereas later in the process, we should find evidence of such activation. This paradigm also allows us to examine the effect of evaluation on this activation process.
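In this paradigm, activation is indexed by how much faster the correct answer is recognized as a word than its matched control word. A minimal sketch of that difference score follows; the reaction times are invented placeholders rather than data from any experiment, and the function name is illustrative.

```python
from statistics import mean

def activation_effect_ms(control_rts_ms, answer_rts_ms):
    """Mean control-word RT minus mean correct-answer RT, in milliseconds.

    Positive values mean the correct answer was recognized faster than its
    matched control, which is taken as evidence of activation.
    """
    return mean(control_rts_ms) - mean(answer_rts_ms)

# Hypothetical reaction times (ms), for illustration only.
effect = activation_effect_ms([700, 720, 740], [650, 660, 670])
```

A difference near zero early in the solution period and a positive difference later would correspond to the build-up of activation that the hypothesis predicts.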

Experiment 3: Lexical Decisions for Correct Answers In our previous research (Harkins, 2001b), participants were given 12 min to spread among the RAT triads. To use the lexical decision paradigm, we must present the triads one at a time for a fixed interval so that we can control the timing of the lexical decision task. Pilot testing showed that the typical pattern of results was produced when we provided participants with 30 s to solve each triad. This 30-s interval is quite close to the average amount of time provided for each triad in the “total time” paradigm (720 s/20 triads = 36 s/item). Of course, we had no a priori means of knowing the time course of activation of the correct answers. As a first approximation, we placed the lexical decision task 5 s into the 30-s period provided for each RAT item. That is, a triad was presented, and 4.5 s later the participant heard a tone and saw a box on the screen that indicated that a lexical decision was upcoming 0.5 s later. At that point, a word (correct answer or matched control word) appeared in the box. To the extent that the correct answer was identified more quickly as a word than a control word matched in length and frequency, we took this as evidence of activation.³ After completing the 5-s version, we then replicated the experiment but placed the lexical decision 3 s into the 30-s solution period. Finally, we replicated the experiment once again but placed the lexical decision 7 s into the solution period. In our analysis, we focused on the performance of the participants who were not subject to evaluation to find evidence of activation because it is these participants who are better able to solve the complex triads. To the extent that we find support for the activation hypothesis in the performance of these participants, we can then compare their pattern of activation to that of the participants who are subject to experimenter evaluation.

Method

The 3-, 5-, and 7-s versions were run as three separate experiments, but because they were exact replications aside from the placement of the lexical decision, we describe them together.

Participants Sixty undergraduate students participated in each version of the experiment (3 s: 52% female, 48% male; 5 s: 51% female, 49% male; 7 s: 52% female, 48% male) as a means of satisfying an Introductory Psychology course requirement.

Procedure Participants were scheduled individually. Upon entering a lab room, they were told that the experiment required them to perform two tasks. The experiment, which was run on a computer, consisted of 20 trials. On each trial, the computer would present them with three words arrayed from left to right on one line. Their first task was to try to think of a fourth word that was related to each of the three words as quickly as they could. For example, if they saw the triad “elephant vivid lapse,” they were to think of the word related to each of these three words (in this case, “memory”), and to press the “answer” button on the computer and type this answer in the box that appeared as quickly as they could. Once they pressed the button, they had 6 s to enter their response. If they did not press the button within the 30-s trial, they were timed out, and after a 2-s intertrial interval (ITI), the next triad was presented. They were told that their second task consisted of making word–nonword judgments. Several seconds after the triad was presented, they would hear a tone and a box would appear on the screen. One-half s later, a stimulus would appear in the box. They were to press the word button if the stimulus was a word, and the nonword button if the stimulus was a nonword. They were to make this judgment as quickly and as accurately as they could. In the 5-s version of the experiment, the tone sounded and the box appeared 4.5 s after the triad was presented, followed by the word–nonword judgment 0.5 s later. In the 3-s version, the tone sounded and the box appeared 2.5 s into the interval, followed by the word–nonword judgment 0.5 s later. Finally, in the 7-s version, the tone sounded and the box appeared 6.5 s into the interval, followed by the word–nonword judgment 0.5 s later. As quickly as they could, the participants were to identify the stimulus as a word or a nonword. The participants had 2 s in which to make this response.
After this period, the box disappeared and the triad reappeared on the screen. The participants were told that these two tasks (solving triads and making word–nonword judgments) were equally important and that when the tone sounded and the box appeared, they were to focus on the lexical judgment. In fact, when they were making this judgment, they would be unable to enter an answer for the triad. Half of the participants were randomly assigned to an experimenter-evaluation condition. They were told that their responses to the triads and their lexical decisions would be examined by the experimenter at the end of the experiment (experimenter evaluation). In addition, the experimenter-evaluation participants were told that they would not be given feedback on their performance, because this information, if disseminated, could affect the performance of later participants. The other half of the participants were told that we were interested in average performance. These no-experimenter-evaluation participants were told that after they performed, they would be asked to press a button that would average their performances on both the RAT and the lexical decision task with the performances of previous participants.
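The trial timing given in the Procedure can be summarized as offsets from triad onset. The sketch below is only a restatement of those stated intervals (30-s trial, cue 0.5 s before the stimulus, 2-s response window); the constant and function names are illustrative, and the text does not say whether the 30-s clock pauses during the lexical decision, so that detail is left out.

```python
TRIAL_LENGTH_S = 30.0     # time allowed per triad
CUE_LEAD_S = 0.5          # tone and box precede the stimulus by 0.5 s
RESPONSE_WINDOW_S = 2.0   # time allowed for the word-nonword response

def trial_schedule(probe_s):
    """Event times (s after triad onset) for a lexical decision at probe_s."""
    if not CUE_LEAD_S <= probe_s <= TRIAL_LENGTH_S - RESPONSE_WINDOW_S:
        raise ValueError("probe does not fit inside the trial")
    return {
        "triad_onset": 0.0,
        "tone_and_box": probe_s - CUE_LEAD_S,
        "stimulus": probe_s,
        "response_deadline": probe_s + RESPONSE_WINDOW_S,
    }

# The three versions of the experiment placed the probe at 3, 5, and 7 s.
schedules = {probe: trial_schedule(probe) for probe in (3.0, 5.0, 7.0)}
```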

3 This method represents the modification of a procedure used by Shames (1994; reported in Kihlstrom, Shames, & Dorfman, 1996) to study implicit problem solving on the RAT.


After going through four practice trials (two in which the stimulus in the lexical decision task was a word, and two in which it was not), all of the participants saw the 20 complex triads used by Harkins (2001b) presented in randomized order for 30 s each. On 10 of the trials, the stimulus in the word–nonword task was a word; on the other 10 trials, the stimulus was a nonword. The activation hypothesis argues that the weak activation produced by the remote associations between the triad members and the solution accumulates, leading to the emergence of the correct answer. Missing the contribution of even one of the triad members could undermine this process. To ensure that the triad members were seen as related to their respective solutions, 36 students (50% male, 50% female), drawn from the same pool as the research participants, were asked to rate the extent to which the triad members were related to the solutions on 9-point scales. This measure allows us to determine whether a given triad member and the correct answer are related, even though it is unlikely that the triad member would elicit the correct answer as an associate. That is, the triad member and the answer are related but the association is remote. For example, none of our participants in Experiment 2 gave the word “chair” as an associate for the word “electric,” but the participants who made the relatedness ratings judged “electric” and “chair” to be highly related (M = 8.91 on the 9-point scale). In those cases in which participants in this relatedness study rated triad members as related to the correct answer, we take this as evidence that the correct answer will be activated when the triad member is viewed, even if the answer does not emerge quickly (or at all) when participants are asked to generate associates for the triad member.
Although all of these triads have been in common use in the literature, these ratings showed that for some of the triads some of the triad members are not seen as related to the “correct” answer, at least in our population of research participants. For example, our research participants do not rate the word “lick” in the triad “lick mine sprinkle” as related to the answer “salt.” Nor do they see “Saturday” in the triad “room Saturday salts” as related to the answer “bath.” On the basis of these ratings, we selected the 10 “best” triads of the 20 complex triads for the word trials. For these 10 triads, the lowest relatedness rating of a triad member to its correct answer was 7.2 (on a 9-point scale), and the average relatedness of the triad members to the correct answer was 7.81.⁴ On 5 of the 10 word trials, the stimulus in the lexical decision task was the correct answer for that triad; on the other 5 word trials, the stimulus was a word matched in length and frequency with the word that was the correct answer for that triad (Kucera & Francis, 1967). However, this word was not related to any of the triad members, nor was it related to the correct answer (M relatedness ratings < 4 on a 9-point scale). Because any given participant could see a given triad only once, the 10 word triads were randomly divided into two sets, only one of which any participant saw. In one set (Set A), for Triads 1 through 5, the correct answer for that triad was the stimulus in the lexical decision task, whereas for Triads 6 through 10, the stimulus was the matched control word for each of those triads. In the second set (Set B), for Triads 1 through 5, the stimulus was the matched control word for that particular triad, whereas for Triads 6 through 10, the correct answer was the stimulus in the lexical decision task.
As a result, when reaction times are averaged across the entire set of participants, the means reflect the average reaction time for words that were the correct answer and their matched control words. The 10 triads for which one or more of the triad members was seen as unrelated to the triad “solution” were used on the nonword trials. For each of these 10 triads, to create the nonword, we selected a word that matched the word that would have been the “correct” answer for that triad in length and frequency (Kucera & Francis, 1967). However, the initial letter was changed to a different consonant. Thus, for example, the “correct” answer for the triad “lick sprinkle mines” is “salt.” The word “path” matches “salt” in length and frequency. To make the nonword, the initial “p” was changed to a “g” (“gath”).
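The Set A and Set B assignment and the nonword construction can be sketched as follows. This is an illustrative restatement of the design, not the software used in the experiment; the function names and the "answer"/"control" labels are assumptions introduced here. The letter-swap helper only performs the substitution and does not check that the result is absent from the lexicon.

```python
def counterbalanced_sets(n_word_triads=10):
    """Map triad number -> lexical stimulus type for Set A and Set B."""
    half = n_word_triads // 2
    set_a, set_b = {}, {}
    for triad in range(1, n_word_triads + 1):
        in_first_half = triad <= half
        # Set A probes Triads 1-5 with the answer; Set B reverses the roles.
        set_a[triad] = "answer" if in_first_half else "control"
        set_b[triad] = "control" if in_first_half else "answer"
    return set_a, set_b

def swap_initial_letter(word, new_consonant):
    """Build a nonword candidate by replacing the first letter of a word."""
    if new_consonant.lower() in "aeiou" or new_consonant == word[0]:
        raise ValueError("replacement must be a different consonant")
    return new_consonant + word[1:]

set_a, set_b = counterbalanced_sets()
nonword = swap_initial_letter("path", "g")
```

Because every triad is paired with its answer in one set and with its control word in the other, averaging over participants from both sets yields answer and control reaction times for all 10 word triads.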

The participants had 2 s in which to make their lexical decision after which the stimulus disappeared and they went back to solving the RAT. When they solved the triad or at the end of 30 s, there was a 2-s ITI, and then the next triad appeared. After completing the RAT items, the computer instructed the participants in the experimenter-evaluation condition to go and get the experimenter. In the no-experimenter-evaluation condition, the computer instructed the participants to press a button to average their performance with that of the previous participants. After doing this, they were asked to go and get the experimenter. All participants were then asked to respond to an experimenter-evaluation manipulation check and some ancillary measures on 11-point scales.

Results The three versions of the experiment were conducted in the sequence 5 s, 3 s, and 7 s. In fact, until we ran the first version (5 s), we had no way of knowing the possible time course of the activation process. As a result, it is inappropriate to conduct overall analyses using time of lexical decision as a factor. However, because the experiments are exact replications aside from the timing of the lexical decision, we present the results of each of the versions together rather than presenting the results of each experiment in turn. Where appropriate, here and throughout the analyses

4 On the complex items, the activation hypothesis argues that the associations of the triad members to the answer are weak already, and it is the summation of the activation produced by these weak associations that yields the correct answer. Given this argument, eliminating (or reducing the strength of) even one of the three weak associations should make the problems more difficult if not insoluble. To test this hypothesis, we used the relatedness ratings to assign the triads to one of three conditions: All three triad members were rated as related to the solution (each triad member was rated 7 or higher on the 9-point scale), two of the triad members were rated as related to the solution, or only one of the triad members was rated as related to the solution. Using Harkins’s (2001b) data, we then conducted a 2 (experimenter evaluation vs. no experimenter evaluation) × 3 (all triad members related to the correct answer, i.e., rated 7 and above, vs. two related vs. one related) analysis of variance with the proportion of participants solving each complex RAT item as the dependent variable. This analysis showed that no-experimenter-evaluation participants solved 44% of the problems on which each triad member had a relatedness rating of 7 or greater, but when only two triad members had a relatedness rating of more than 7, performance fell to 20%, significantly below the level found in the “all-three-related” condition (p < .05). When only one triad member had a relatedness rating of more than 7, performance was also significantly lower than in the “all-three-related” condition (M = 15%, p < .05), but no lower than when two were related (p > .20). In contrast, participants subject to experimenter evaluation performed poorly not only when only one (M = 4%) or two (M = 13%) triad members were related to the correct answer, but also even when all three triad members were related to the correct answer (M = 13%, ps > .20).
These findings are consistent with the argument that good performance on the complex items requires the summation of the weak activation produced by the remote association of each of the triad members and the correct answer. Missing the weak activation provided by even one triad member undermines performance, because without this input, the activation level of the correct answer never reaches the necessary level. Although these findings are consistent with the activation hypothesis, the interpolated lexical decision task provides a direct test of the activation hypothesis and may also suggest why participants subject to evaluation perform poorly regardless of the relatedness of the triad members to the solution.

THE MERE EFFORT HYPOTHESIS

443

Table 1 Manipulation Check for Manipulations of Experimenter Evaluation

Condition

Experimenter evaluation

No experimenter evaluation

M

M

SD

SD

F(1, 56)

p

d

2.40 2.40 3.07

98.00 50.70 45.97

.0001 .0001 .0001

2.65 1.90 1.81

3.04 2.69

37.18 34.18

.0001 .0001

1.63 1.56

2.80 2.62

40.54 66.60

.0001 .0001

1.70 2.18

Experiment 3 3s 5s 7s

9.38 8.37 9.10

1.90 2.98 2.04

3.67 3.43 4.50 Experiment 4

7s 5s

8.53 8.77

2.78 2.69

3.93 4.73 Experiment 7

7 s, correct answer 7 s, close associates

8.67 8.90

2.59 2.04

4.17 3.90

presented in this article, a priori contrasts (Kirk, 1995) were used to compare the means.
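The article does not say how the effect sizes in Table 1 were computed. As a hedged check, the reported d values are numerically consistent with the common conversion d = 2 * sqrt(F / df_error) for a single-degree-of-freedom F test. The sketch below applies that conversion to the F(1, 56) values in Table 1. The function name `d_from_f` is hypothetical, and the formula is an inference from the numbers, not a method the author states.

```python
import math


def d_from_f(f_value: float, df_error: int) -> float:
    """Convert a 1-df F statistic to Cohen's d (assumed conversion, see lead-in)."""
    return 2.0 * math.sqrt(f_value / df_error)


# (condition, F(1, 56), d as reported in Table 1)
TABLE_1 = [
    ("Experiment 3, 3 s", 98.00, 2.65),
    ("Experiment 3, 5 s", 50.70, 1.90),
    ("Experiment 3, 7 s", 45.97, 1.81),
    ("Experiment 4, 7 s", 37.18, 1.63),
    ("Experiment 4, 5 s", 34.18, 1.56),
    ("Experiment 7, 7 s, correct answer", 40.54, 1.70),
    ("Experiment 7, 7 s, close associates", 66.60, 2.18),
]

for condition, f_value, reported_d in TABLE_1:
    computed = d_from_f(f_value, df_error=56)
    print(f"{condition}: computed d = {computed:.2f}, reported d = {reported_d:.2f}")
```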

Experimenter-Evaluation Manipulation Check

The responses to the manipulation checks for experimenter evaluation were analyzed in 2 (experimenter evaluation vs. no experimenter evaluation) × 2 (Set A vs. Set B) analyses of variance (ANOVAs). As shown in Table 1, in each version of the experiment (3 s, 5 s, 7 s), participants subject to evaluation by the experimenter reported that they could be evaluated by the experimenter to a greater extent than participants who were not subject to experimenter evaluation (ps < .0001).

Lexical Decision Task

The reaction times for the words were analyzed in 2 (experimenter evaluation vs. no experimenter evaluation) × 2 (Set A vs. Set B) × 2 (correct answer vs. matched control) ANOVAs with experimenter evaluation and set as between-subjects factors and word type (correct vs. control) as a within-subjects factor.5 For ease of exposition, we present the results ordered by the time at which the lexical decision task was introduced (3 s, 5 s, 7 s) rather than in the order in which the experiments were run.

3 s. This analysis revealed no reliable effects (ps > .20). Correct answers were not identified as words any more quickly (M = 803.78 ms, SD = 204.73) than the matched control words (M = 784.25 ms, SD = 228.02; F < 1). Nor was there any sign of an interaction between word type and the potential for experimenter evaluation (F < 1). We have argued that this process requires the accumulation of the weak activation provided by each of the triad members, which suggests that if we place the lexical decision at an early point in the process, there will not have been sufficient time for the activation to accumulate. These findings are consistent with the notion that, at 3 s, there has not been sufficient time for activation to have accumulated.

5 In this experiment and in each of the following experiments in which we used the lexical decision task, we examined the reaction times only for words. Of course, nonword trials are needed to make the procedure credible, but the speed of the lexical decisions on these trials is not of interest. In each of these experiments, we also excluded trials on which the participants made errors in the word–nonword judgment or timed out. This amounted to less than 5% of the trials in each experiment and did not differ by condition (ps > .20). Finally, in each of these experiments, we excluded trials on which the participants provided an answer, whether correct or not, for the triad prior to the word–nonword judgment. The incidence of these “fast” trials ranged from 0% to 10% and did not differ as a function of experimental conditions (ps > .20).

5 s. This analysis revealed a main effect for word type, F(1, 56) = 7.46, p < .01, d = 0.73. The correct words were identified as words more quickly (M = 713.75 ms, SD = 192.95) than the matched control words (M = 768.91 ms, SD = 204.85). In addition, the interaction between experimenter evaluation and word type (correct vs. control) was not reliable (F < 1, p > .90). The correct answers were recognized an average of 55.16 ms earlier than the matched control words by both those participants subject to evaluation and those who were not, suggesting that, at 5 s, the correct answer is activated whether or not participants are subject to evaluation.

7 s. At 7 s, the main effect for word type was also significant, F(1, 56) = 4.48, p < .05, d = 0.57. Participants responded to the correct answers more quickly (M = 793.93, SD = 235.40) than their matched control words (M = 854.84, SD = 228.82). However, this word type main effect must be interpreted in terms of the Experimenter Evaluation × Word Type interaction that was also obtained, F(1, 56) = 5.32, p < .05, d = 0.62. For participants subject to experimenter evaluation, there was no difference in reaction time between correct answers (M = 831.89 ms, SD = 250.20) and matched control words (M = 826.44 ms, SD = 194.40; F < 1). For participants who were not subject to evaluation, there was a highly reliable difference, F(1, 56) = 9.78, p < .01, d = 0.84. These participants recognized the correct answers as words more quickly (M = 755.98, SD = 217.17) than the matched control words (M = 883.25, SD = 258.97). These findings suggest that for participants who were not subject to experimenter evaluation, the activation of the correct answer that was present at 5 s had continued to strengthen at 7 s (127 ms vs. 54 ms). However, for participants who were subject to experimenter evaluation, the correct answer was no longer activated (-5 ms).
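The activation figures quoted above are differences between mean reaction times: the matched-control mean minus the correct-answer mean, so that a positive value indicates faster recognition of the correct answer. The sketch below recomputes these differences from the means reported in the text. The dictionary layout and the function name `activation_ms` are illustrative assumptions; the numbers are taken from the 3-s, 5-s, and 7-s results.

```python
# Mean reaction times (ms) reported in the text, keyed by (decision point, group).
MEAN_RT_MS = {
    ("3 s", "all participants"): {"correct": 803.78, "control": 784.25},
    ("5 s", "all participants"): {"correct": 713.75, "control": 768.91},
    ("7 s", "experimenter evaluation"): {"correct": 831.89, "control": 826.44},
    ("7 s", "no experimenter evaluation"): {"correct": 755.98, "control": 883.25},
}


def activation_ms(control_rt: float, correct_rt: float) -> float:
    """Difference score: control mean minus correct-answer mean (positive = activation)."""
    return control_rt - correct_rt


for (decision_point, group), rts in MEAN_RT_MS.items():
    score = activation_ms(rts["control"], rts["correct"])
    print(f"{decision_point}, {group}: {score:+.2f} ms")
```

Positive scores at 5 s and for the no-experimenter-evaluation group at 7 s correspond to the activation described in the text; the score near zero for the experimenter-evaluation group at 7 s corresponds to the loss of activation.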

RAT Performance

The focus of the current research was on the results of the lexical decision task, rather than on performance effects, which have already been well established in this line of research. However, we do need to be sure that the insertion of the lexical decision task does not change the solution process in a way that would threaten our interpretation of the activation effects. Demonstrating that we can replicate the basic performance finding in this paradigm offers one means of providing such evidence.

Each of the participants saw 20 RAT problems. However, on 5 of the 20 problems, the correct answer was the stimulus in the lexical decision task. Although in many cases the participants did not report noticing that any of the stimuli in the lexical task were the correct answers, exposure to the correct answer eliminated the effect of evaluation potential at each of the lexical decision points (ps > .50). Overall, when shown the correct answer, the participants solved 55% of the triads.6 This outcome is to be expected; whether the participants knew it or not, the manipulation activated the correct answer.

As noted previously, on the 10 triads in which nonwords were presented as the stimuli in the lexical decision task, we found that one or more of the triad members were unrelated to the correct answer. As a result, the “correct” answer is missing the weak activation contributed by the(se) triad member(s), and performance should be poor regardless of evaluation potential. Consistent with this expectation, overall, participants who were not subject to evaluation solved as few RAT items (M = 14%) as participants who were subject to evaluation (M = 11%). This leaves the five triads in which the matched control words were presented as the stimuli in the lexical decision task to determine whether we can replicate the typical evaluation–performance finding on complex tasks.

To generate more power to detect differences despite the small number of trials, we conducted an overall analysis as though we had randomly assigned participants to the three versions of the experiment rather than conducting the experiments sequentially. In this 2 (experimenter evaluation vs. no experimenter evaluation) × 2 (Set A vs. Set B) × 3 (3 s vs. 5 s vs. 7 s) ANOVA, we analyzed the percentage of triads solved on the five trials on which matched control words were presented in the lexical decision task. Replicating the typical performance finding in this line of research, participants subject to evaluation solved fewer RAT items (M = 18%, SD = 20%) than participants who were not subject to this evaluation (M = 34%, SD = 24%), F(1, 168) = 21.94, p < .001, d = 0.72. No effects involving the timing of the lexical decision approached significance (ps > .40). Although these results should be interpreted with caution given the fact that participants were not randomly assigned to the lexical decision times (i.e., 3 vs. 5 vs. 7 s), they replicate the findings of our previous research in which we have not used the lexical decision task. When all triad members are “remotely” related to the correct answer, participants who are not subject to experimenter evaluation outperform participants who are subject to this evaluation. These findings strongly suggest that the insertion of the lexical decision task does not change the solution process in a way that would threaten our interpretation of the activation effects. Given this outcome, in the remainder of this article, in those experiments that incorporate the lexical decision task, we report only the findings for this task.
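The error degrees of freedom in these analyses are consistent with the usual rule for designs with between-subjects factors: the number of participants minus the number of between-subjects cells. The sketch below is a minimal check under stated assumptions. The function name `error_df` is hypothetical, and the figure of 180 participants for the pooled analysis assumes 60 participants in each of the three versions, which is inferred from the F(1, 56) tests and not stated in this section.

```python
from math import prod


def error_df(n_participants: int, between_levels: tuple[int, ...]) -> int:
    """Error df = N minus the number of between-subjects cells."""
    return n_participants - prod(between_levels)


# Single version, 2 (evaluation) x 2 (set) between-subjects cells: F(1, 56).
print(error_df(60, (2, 2)))
# Pooled analysis, 2 x 2 x 3 (lexical decision time), assuming 3 x 60 participants: F(1, 168).
print(error_df(180, (2, 2, 3)))
```

The within-subjects word type factor adds no between-subjects cells, so the word type tests in a single version share the same error df of 56.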

Ancillary Measures

Participants in each experiment were asked to rate on 11-point scales the extent to which they knew how well they performed on the RAT task, they knew how many RAT items they had solved, and they found the task difficult. No differences were found on any of these measures (ps > .20). Overall, the participants found the task difficult (M = 9.12 on an 11-point scale).7

6 The average solution time on triads with correct answers did not differ from the average solution times on triads with matched control words. Overall, participants took an average of 13.7 s to solve the triads. Thus, it is not the case that the participants presented with the correct answer just saw the word, recognized it as the correct answer, and immediately solved it. More processing was needed.

7 In each of the following experiments, participants also were asked to rate on 11-point scales the extent to which they knew how well they performed on the RAT task, they knew how many RAT items they had solved, and they found the task difficult. No differences were found on any of these measures in any of the experiments.

Discussion

To support the activation hypothesis, we should find evidence of the increasing activation of the correct answer across successive points in the solution period, and this is exactly what we did find for participants who were not subject to experimenter evaluation. The findings across the three lexical decision points (3 s, 5 s, 7 s) are presented in Figure 1. For ease of presentation, the results are presented as difference scores (mean control word reaction time minus mean correct word reaction time). Thus, a score above zero represents activation of the correct answer. As can be seen in Figure 1, for participants who were not subject to evaluation there was no activation of the correct answers at 3 s, but by the 5-s mark, there was significant activation, which grew even stronger by the 7-s mark. These findings are consistent with the activation hypothesis, which suggests that on complex items, the weak activation of the correct answer provided by each of the triad members summates and finally produces enough activation to push the correct answer above threshold.

Figure 1. Word activation as a function of evaluation potential and time of lexical decision presented as difference scores (mean control word reaction time minus mean correct word reaction time at each lexical decision point). [Plot not reproduced. Vertical axis: Activation. Horizontal axis: Lexical Decision Time (Three Seconds, Five Seconds, Seven Seconds). Series: Experimenter Evaluation, No Experimenter Evaluation.]

Comparing this pattern of findings to the findings for participants subject to the potential for evaluation by the experimenter would seem to provide support for the approaches that argue that the poor performance of the latter participants stems from the fact that they withdraw effort when they become convinced that they will not do well at the task (e.g., Carver & Scheier, 1981; Elliot et al., 2005; Shepperd, 2001). That is, at 3 s, the activation has yet to summate enough to affect reaction times. At 5 s, the activation process has begun, but, of course, for it to continue the participants need to remain involved in the task. If at this point they just give up and stop thinking about the words, the activation will dissipate, and this is what could be happening for participants subject to evaluation by the 7-s mark. In contrast, participants who are not subject to evaluation are less concerned about how well they are doing and remain focused on the task. For them activation is more likely to continue to summate, increasing the likelihood that they will ultimately come up with the correct answer, which will emerge, on average, some 6 s later (M = 13.7 s) in the 30-s timing period.

Alternatively, one (Bond, 1982; see also Sarason et al., 1996) could argue that the potential for evaluation motivates participants to monitor how well they are performing the task, and the results of this process determine the subsequent course of their performance. On the simple task, when they check, they see that they are doing well, and they just continue. However, on the complex task, they find that success may not be assured, and they begin to worry and/or feel anxiety. They do not quit as a result, but worry (entertaining thoughts about failure) takes up processing capacity, preventing the correct answer from emerging.

However, as we have noted, McClelland and Rumelhart’s (1981) interactive activation model suggests another possible reason for the drop in activation of the correct answer at 7 s exhibited by participants subject to experimenter evaluation. On the simple items, the greater the likelihood that at least one of the triad members elicited the correct associate, the greater the proportion of Harkins’s (2001b) experimenter-evaluation participants who were able to solve that RAT item, r(18) = .53, p < .05. This same relationship holds for complex items, r(18) = .46, p < .05.

Thus, these data suggest that on complex triads, as on simple ones, participants subject to experimenter evaluation are producing close associates in an effort to solve the problems. However, the probability that at least one of the triad members will elicit the correct answer is much lower on complex triads (.37) than on simple ones (.76). As a result, participants subject to experimenter evaluation are left with only the few correct solutions provided by close associates and solved only 12% of the problems (Harkins, 2001b). These findings are consistent with the possibility that participants subject to experimenter evaluation have not given up nor have they fallen prey to worry about failure. Instead they are working hard at generating close associates on the complex items just as they did on the simple ones. And it is the strong activation of these close associates that is inhibiting the activation of the correct answers by the 7-s mark.

Experiment 4: Close Associates

To test this hypothesis, we measured the activation of close associates rather than correct answers. We first tested for activation for the close associates at 7 s, because it was at 7 s that we saw the reduction in activation for correct answers for participants in the experimenter-evaluation condition. We then replicated the experiment at 5 s.

Method

The 7- and 5-s versions were run as two separate experiments, but because they were exact replications aside from the placement of the lexical decision, we describe them together.

Participants

Sixty undergraduate students participated in each version of the experiment (7 s: 53% female, 47% male; 5 s: 51% female, 49% male) as a means of satisfying an Introductory Psychology course requirement.

Procedure

The experiment was run exactly as the previous ones, except that instead of correct answers, the focal stimuli were close associates of the triad members. For example, when presented the triad “ball storm man,” the correct answer is “snow.” On a close associates trial, the word “golf,” a close associate of the word “ball,” was presented as the stimulus. These close associates were selected so that, on average, they matched the correct answers in length and frequency. On the 10 word trials, participants saw a close associate of the first, the second, or the third member of the triad, or they saw a control word that was matched with the close associate in length and frequency. It should be noted that, as a result of this matching, the correct answers, the control words for the correct answers, the close associates, and the control words for the close associates did not differ, on average, in frequency or length. Words were also selected to ensure that within each triad, any given close associate was related only to that triad member and not to the other two triad members, and that the close associates were not related to the correct answer. In addition, the control words for the close associates were not related to any of the close associates or to any of the triad members, nor were they related to the correct answer.

As in the previous experiments, because any participant could see a given triad only once, the 10 word triads were randomly divided into two sets. In one set (Set A), for Triads 1 through 5 a close associate of one of the words for that triad was the stimulus in the lexical decision task, whereas for Triads 6 through 10, the stimulus was the matched control for each of those words. In the second set (Set B), for Triads 1 through 5, the stimulus was the matched control for that particular word, whereas for Triads 6 through 10, the stimulus was a close associate of one of the three words in the triad.
Within each evaluation condition, 15 participants saw Set A and 15 saw Set B. Of course, the participants could be considering close associates of any of the triad members. To take this into account, we tested for activation for close associates for each of the triad members. Thus, of the 15 participants within each set, 5 participants were tested for activation of a close associate of the triad member in the first position for that given triad, 5 were tested for activation of a close associate of the triad member in the second position, and 5 were tested for activation of a close associate of the triad member in the third position. As a result, across the 15 participants who viewed a given set, the three positions for each problem were tested equally often. In the other set, the control word was matched in length and frequency to the particular close associate tested in the complementary set. As a result, when reaction times were averaged across the entire set of participants, the means reflected the average reaction time for words that were close associates of one of the triad members and their matched control words.

In the 7-s version of the experiment, a tone sounded and the response box appeared 6.5 s into the trial, followed by the stimulus at the 7-s mark. In the 5-s version, the tone sounded and the box appeared 4.5 s into the trial, followed by the stimulus at the 5-s mark. As in Experiment 3, after completion of the RAT items, the computer instructed the participants in the experimenter-evaluation condition to go and get the experimenter. In the no-experimenter-evaluation condition, the computer instructed the participants to press a button to average their performance with that of the previous participants. After doing this, they were asked to go and get the experimenter. All participants were then asked to respond to an experimenter-evaluation manipulation check and some ancillary measures on 11-point scales.
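The counterbalancing described in the procedure can be sketched as a small assignment table. The sketch below covers one evaluation condition: two stimulus sets, three tested triad positions, and five participants per cell, which gives 15 participants per set and 30 per condition. The names `stimulus_type` and `assign_cells` are hypothetical, and the order in which participants fill the cells is an assumption, since the article does not describe the assignment mechanics.

```python
from collections import Counter
from itertools import product


def stimulus_type(word_set: str, triad: int) -> str:
    """Stimulus shown on word triad 1..10: Set A tests close associates on
    Triads 1-5 and matched controls on Triads 6-10; Set B is the reverse."""
    if word_set not in ("A", "B") or not 1 <= triad <= 10:
        raise ValueError("expected set 'A' or 'B' and a triad number from 1 to 10")
    first_half = triad <= 5
    if word_set == "A":
        return "close associate" if first_half else "matched control"
    return "matched control" if first_half else "close associate"


def assign_cells(per_cell: int = 5) -> list[tuple[str, int]]:
    """One (set, tested triad position) pair per participant in one evaluation condition."""
    return [cell for cell in product("AB", (1, 2, 3)) for _ in range(per_cell)]


cells = assign_cells()
print(len(cells))                                    # participants per evaluation condition
print(Counter(word_set for word_set, _ in cells))    # participants per set
print(Counter(position for _, position in cells))    # participants per tested position
```

Each triad therefore appears with a close associate in one set and with its matched control in the complementary set, and within a set each of the three positions is tested by the same number of participants.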

Results

The two versions of the experiment were conducted in sequence (7 s followed by 5 s). However, because the experiments are exact replications aside from the timing of the lexical decision, as in Experiment 3, we present the results of each of the versions together rather than presenting the results of each experiment in turn.

Experimenter-Evaluation Manipulation Check

The responses to the manipulation checks for experimenter evaluation were analyzed in 2 (experimenter evaluation vs. no experimenter evaluation) × 2 (Set A vs. Set B) ANOVAs. As shown in Table 1, in each version of the experiment (7 s, 5 s), participants subject to evaluation by the experimenter reported that they could be evaluated by the experimenter to a greater extent than participants who were not subject to experimenter evaluation (ps < .0001).

Lexical Decision Task

The reaction times for the words were analyzed in 2 (experimenter evaluation vs. no experimenter evaluation) × 2 (Set A vs. Set B) × 2 (close associates vs. matched control) ANOVAs with experimenter evaluation and set as between-subjects factors and word type (close associate vs. control) as a within-subjects factor.

7 s. This analysis revealed a main effect for word type, F(1, 56) = 5.28, p < .05, d = 0.61. The related words were identified as words more quickly (M = 849.27 ms, SD = 202.87) than the matched control words (M = 904.86 ms, SD = 236.40). However, this main effect must be interpreted in terms of the significant Experimenter Evaluation × Word Type interaction, F(1, 56) = 5.91, p < .02, d = 0.65. Participants subject to evaluation recognized close associates as words more quickly (M = 835.85, SD = 233.89) than the matched control words (M = 950.29, SD = 261.67), F(1, 56) = 11.18, p < .001, d = 0.89, whereas for participants who were not subject to this evaluation, there was no difference between the time it took to recognize the close associates (M = 862.68, SD = 169.29) and their matched control words (M = 859.44, SD = 202.37; F < 1). These findings suggest that for participants subject to experimenter evaluation, close associates are activated. The close associates were recognized 114.44 ms more quickly than the matched controls. On the other hand, there was no evidence of such activation for participants who were not subject to evaluation. On average, the close associates were recognized 3.25 ms later than the matched controls. As a next step, we checked for activation of close associates at 5 s.

5 s. The analysis at 5 s revealed a main effect for word type, F(1, 56) = 6.99, p < .02, d = 0.71. The close associates were identified as words more quickly (M = 713.20 ms, SD = 196.93) than the matched control words (M = 764.40 ms, SD = 219.70). In addition, the interaction between experimenter evaluation and word type (close associate vs. control) was not reliable (F < 1). Thus, at 5 s, close associates were recognized an average of 51.12 ms earlier than the matched control words.

Discussion

Although the results of Experiment 3 were consistent with the argument that the participants subject to experimenter evaluation were giving up or were distracted by thoughts of failure, we proposed an alternative interpretation. We argued that, in fact, these participants were working hard, generating close associates for triad members, and it was this behavior that undermined their performance. To test this hypothesis, in Experiment 4 we measured the activation of close associates rather than correct answers. Consistent with our proposal we found significant activation for the close associates for participants subject to experimenter evaluation, whereas participants in the no-experimenter-evaluation condition showed no such activation. These findings suggest that it is not that participants subject to experimenter evaluation are giving up. Instead they were engaging in the same behavior that led their counterparts facing simple items to be successful: They were attempting to solve the problem by generating close associates of the triad members. However, this behavior just does not work on the complex items.

This interpretation certainly accounts for performance at 7 s, but it does not account for what we found when we tested for the activation of close associates at 5 s. That is, given the interactive nature of McClelland and Rumelhart’s (1981) activation model, we would expect to find the close associates already “pulling ahead” at 5 s, which would account for their high level of activation at 7 s. However, at 5 s, we found that although there was activation of close associates for participants subject to experimenter evaluation, there was no more activation for these participants (M = 43.51 ms) than for participants in the no-experimenter-evaluation condition (M = 58.89 ms, p > .20). In fact, at 5 s, both correct answers and close associates were equally activated for participants who were subject to experimenter evaluation and those who were not (ps > .20).

THE MERE EFFORT HYPOTHESIS

This outcome suggests that for the participants subject to experimenter evaluation, the correct answers should be as likely to win the activation race as the close associates, but they did not. By the 7-s mark, correct answers showed no activation, but close associates were highly activated. As was the case in Experiment 4, one possible explanation for this pattern of findings is that some other source of activation is contributing to the pattern of activation of participants subject to experimenter evaluation at the 5-s mark, and previous research suggests at least one candidate. When we tested trial durations for the RAT task, we found that evaluation led to a performance decrement when participants were given only 30 s to solve the triad, but if participants were given 1 min, participants subject to experimenter evaluation performed as well as those who were not. One might think that performance improved because the extra 30 s provided more time for the participants to solve the triad. That is, we know that experimenterevaluation participants are generating and testing close associates, but given enough time, they may run out of steam. At this point, the correct answer could emerge. Of course, this point is not reached in only 30 s, but it is in 60 s. However, our activation data do not support this interpretation. To support this interpretation, we should have found that close associates are more activated than correct answers at the 5-s mark, but they are not: Close associates and correct answers are equally activated. However, it is also possible that it is the time available after the solution attempt of the previous triad that makes the difference, rather than the extra time available for the current triad. Perhaps the close associates from the previous triad are still active, and the 60 s makes available the time necessary for this activation to subside, providing the opportunity for the current triad to be solved.

447

Experiment 5: RAT Performance With 30-s ITIs

If this is the case, the participants do not need the 60 s to solve the triad. They simply need enough time between triads for the activation from the previous triad to subside, and the 2-s ITI that we used is insufficient for this purpose. To test this possibility, in Experiment 5 we gave the participants 30 s to solve the triads, but we replaced the 2-s ITI with an ITI of 30 s. Thus, now the participants had 60 s, but the extra 30 s comes after each triad. If it is the time after the triad that makes the difference, we should find that the performance of participants subject to experimenter evaluation is not debilitated in either the 30-s ITI condition or the 60-s condition, which we also included. However, in a third condition in which participants were given 30 s with a 2-s ITI, we should find the typical debilitation for participants subject to experimenter evaluation.

Method

Participants

Seventy-two undergraduate students participated in this experiment (53% female, 47% male) as a means of satisfying an Introductory Psychology course requirement.

Procedure

The procedure for this experiment was the same as the previous ones with the following changes. As we were interested in performance, we did not include the lexical decision measure. Participants in the 30-s trial with 2-s ITI condition (30–2) and the 60-s trial condition (which also had a 2-s ITI; 60–2) were presented the four practice trials with no further instruction. In the 30-s trial with 30-s ITI condition (30–30), participants were told that the ITI was needed for the computer to produce the next trial and they were to simply wait until the next triad was presented. They then saw the four practice trials. At this point, the experimenter-evaluation manipulation was implemented as in the previous experiments. The participants then saw the 10 triads used in the word trials of the lexical decision task in the previous research presented in a randomized order. After completion of the RAT items, the computer instructed the participants in the experimenter-evaluation condition to go and get the experimenter. In the no-experimenter-evaluation condition, the computer instructed the participants to press a button to average their performance with that of the previous participants. After doing this, they were asked to go and get the experimenter. All participants were then asked to respond to an experimenter-evaluation manipulation check and some ancillary measures on 11-point scales.

Results

The results of the experiment were analyzed in 2 (experimenter evaluation vs. no experimenter evaluation) × 3 (30-s trial with 2-s ITI vs. 60-s trial with 2-s ITI vs. 30-s trial with 30-s ITI) ANOVAs.

Experimenter-Evaluation Manipulation Check

Participants subject to evaluation by the experimenter reported that they could be evaluated by the experimenter to a greater extent (M = 9.42, SD = 1.80) than participants who were not subject to experimenter evaluation (M = 4.42, SD = 2.65), F(1, 66) = 92.33, p < .0001, d = 2.37.

RAT Performance

Analysis of RAT performance revealed a main effect for trial type, F(2, 66) = 4.12, p < .05. Participants in the 60-s trial with 2-s ITI (60–2) condition solved more triads (M = 3.79, SD = 1.14) than participants in the 30-s with 2-s ITI (30–2) condition (M = 2.83, SD = 1.55, p < .05; Tukey’s honestly significant difference; Kirk, 1995). The 30-s with 30-s ITI (30–30) condition (M = 3.54, SD = 1.18) did not differ from either of the other conditions. However, this main effect must be interpreted in terms of the significant Experimenter Evaluation × Trial Type interaction, F(2, 66) = 6.49, p < .01. Replicating previous research, in the 30–2 condition, participants subject to experimenter evaluation performed more poorly (M = 1.92, SD = .90) than participants who were not subject to this evaluation (M = 3.75, SD = 1.54), F(1, 66) = 14.05, p < .001, d = 0.92. Also a replication of past research, when 60 s were provided, this difference was eliminated. In fact, it was reversed, though not significantly (p > .20; experimenter evaluation: M = 4.08, SD = 1.24; no experimenter evaluation: M = 3.50, SD = 1.00). Finally, consistent with the possibility that providing time after a solution attempt facilitates performance on the following triad, we found that in the 30–30 condition, participants subject to evaluation performed as well (M = 3.50, SD = 1.09) as participants who were not subject to this evaluation (M = 3.58, SD = 1.31; F < 1).
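The d values reported with the one-degree-of-freedom F tests above appear consistent with the common conversion d = 2√(F/df_error). The text does not state which formula was used, so this is an inference rather than a description of the analysis. A minimal Python sketch, with the hypothetical helper name `d_from_f`, recomputes d from two of the reported F values:

```python
import math


def d_from_f(f_value: float, df_error: int) -> float:
    """Convert a 1-df F statistic to Cohen's d via d = 2 * sqrt(F / df_error).

    Assumption: this conversion is inferred from the reported numbers;
    the article does not state how d was computed.
    """
    if f_value < 0 or df_error <= 0:
        raise ValueError("F must be non-negative and df_error positive")
    return 2.0 * math.sqrt(f_value / df_error)


# (F, error df, reported d) taken from the Experiment 5 results.
reported = [
    (92.33, 66, 2.37),  # experimenter-evaluation manipulation check
    (14.05, 66, 0.92),  # simple effect of evaluation in the 30-2 condition
]

for f_value, df_error, d_reported in reported:
    d = d_from_f(f_value, df_error)
    print(f"F(1, {df_error}) = {f_value}: d = {d:.2f} (reported {d_reported})")
```

The conversion applies only to contrasts with one numerator degree of freedom, so it is not used here for the F(2, 66) tests.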


Discussion

Experiment 5 showed that providing a 30-s ITI eliminated the difference between the performance of those participants who were subject to experimenter evaluation and those who were not, suggesting that instead of using the additional time provided by the 60-s trial to solve the current triad, the participants subject to evaluation were benefiting from the fact that the long ITI allows time for the activation stemming from the previous triad to subside.8 These findings suggest that the persistence of the activation from close associates generated in the previous trial contributes to the debilitation of the performance of participants subject to experimenter evaluation.

Experiment 6: Close Associates from Previous Trial

To test this possibility we looked for activation of close associates from the previous trial at the 5-s mark, because Experiment 4 showed that close associates for the present trial are highly active at 7 s, and are not any more active than the correct answer at 5 s. As a result, it would appear most likely that if there is an effect of the previous trial’s close associates, it would be at 5 s.

Method

Participants

Sixty undergraduate students participated in the experiment (50% female, 50% male) as a means of satisfying an Introductory Psychology course requirement.

Procedure

The basic procedure of this experiment was the same as that used in the previous research except that participants saw 26 triads, instead of 20. On 13 of the trials, the stimulus in the lexical decision was a nonword. On 13 of the trials, the stimulus was a word. Ten of the 13 triads were test triads, which were always presented in pairs. In this research we were testing the possibility that close associates from the previous trial were affecting performance on the current one, and so, the word in the lexical decision in the second triad of the pair was always a close associate of a triad member in the first triad. The stimulus in the lexical decision task for the first triad of the pair was a control word for the second triad. That is, participants tried to solve a triad in one trial, and then in the next trial, we tested for activation of a close associate of one of the triad members of the previous trial.

As before, two sets of triads were constructed (Sets A and B). If the triads in the above example were from Set A, in Set B the triad that came first in Set A would be moved to second in Set B, and the triad that came second would be presented first. And as in Set A, it would be the activation of a close associate of a triad member of the triad that came first in the pair that would be tested by the lexical decision that accompanied the triad that came second, and the lexical decision that accompanied the triad presented first would present the matched control word for one of the triad members of the triad that came second in the pair. Thus, for example, in Set A, the triad to be tested could be “type ghost story,” and the lexical decision presented with the immediately following triad was “blood,” a word that is a close associate of “type.” The stimulus for the lexical decision accompanying “type ghost story” would be the matched control word for one of the triad members of the triad that came second. In Set B, “type ghost story” would be the second triad of the pair, and the matched control word for “blood,” which is “treat,” would appear as the stimulus in the lexical decision task for the triad that comes first in the pair.

As in the previous experiment, across the 15 participants scheduled within each set, close associates of each of the three triad members were tested equally often. When reaction times were averaged across the participants in each of the sets, the means reflected the average reaction time for words that are close associates of one of the triad members of the triad presented in the preceding trial and their matched control words. The three other word trials were used simply so that all of the word trials did not come in pairs, and these data were not analyzed.

The order of these three “word” triads and the 13 triads with nonword lexical decisions was fixed, as was the order of trials on which the test triads appeared. That is, in the 26 trials, the pairs of test triads were always in Positions 2 and 3, 8 and 9, 13 and 14, 18 and 19, and 23 and 24. The pairs of test triads were rotated across these positions so that each pair appeared equally often in each position. The three extra word triads were always in Positions 6, 11, and 21, and the nonword triads were always in the other 13 positions. Because we were testing for activation at 5 s, the tone sounded and the box appeared 4.5 s into the trial, followed by the stimulus at the 5-s mark.

As in the previous research, after completion of the RAT items, the computer instructed the participants in the experimenter-evaluation condition to go and get the experimenter. In the no-experimenter-evaluation condition, the computer instructed the participants to press a button to average their performance with that of the previous participants. After doing this, they were asked to go and get the experimenter. All participants were then asked to respond to an experimenter-evaluation manipulation check and some ancillary measures on 11-point scales.

Results

Experimenter-Evaluation Manipulation Check

The responses to the manipulation check for experimenter evaluation were analyzed in a 2 (experimenter evaluation vs. no experimenter evaluation) × 2 (Set A vs. Set B) ANOVA. Participants subject to evaluation by the experimenter reported that they could be evaluated by the experimenter to a greater extent (M = 9.37, SD = 2.24) than participants who were not subject to experimenter evaluation (M = 4.23, SD = 3.21), F(1, 56) = 52.67, p < .0001, d = 1.94.

Lexical Decision Task

The reaction times for the words were analyzed in a 2 (experimenter evaluation vs. no experimenter evaluation) × 2 (Set A vs. Set B) × 2 (close associates from previous trial vs. matched control) ANOVA with experimenter evaluation and set as between-subjects factors and word type (close associate vs. control) as a within-subjects factor. This analysis revealed a main effect for word type, F(1, 56) = 12.74, p < .001, d = 0.95. The close associates from the previous trial were identified as words more quickly (M = 779.89 ms, SD = 237.15) than the matched control words (M = 846.91 ms, SD = 219.11). However, this main effect must be interpreted in terms of the significant Experimenter Evaluation × Word Type interaction, F(1, 56) = 5.34, p < .05, d = 0.62. Participants subject to evaluation recognized close associates from the previous trial as words more quickly (M = 748.80, SD = 200.78) than the matched control words (M = 859.22, SD = 220.87), F(1, 56) = 17.29, p < .001, d = 1.11, whereas for participants who were not subject to this evaluation, there was no difference between the time it took to recognize the close associates from the previous trial (M = 810.97, SD = 268.53) and their matched control words (M = 834.60, SD = 220.41; F < 1). These findings suggest that for participants subject to experimenter evaluation, close associates from the previous trial are activated. The close associates were recognized 110.42 ms more quickly than the matched controls. On the other hand, there was no evidence of such activation for participants who were not subject to evaluation. On average, for these participants the close associates from the previous trial were recognized only 23.63 ms earlier than the matched control words.

8 One might argue that it is not that activation from the preceding trial is subsiding but rather that the participants are actually using the 30-s ITI to solve the preceding triad and that the favorable affect produced by this success then leads them to perform better on subsequent trials. To test this possibility, we ran a replication of the 30-s ITI condition in which the computer requested participants to review the triads and their answers after they had completed the RAT task. They were shown each triad and given 3 s to type in their answers, enough time to enter a response but not enough to solve the item on the spot. If they had solved the triad any time between the end of the trial and the end of the task, they were provided the chance to provide the answer. However, there was no difference in the number of items solved at the two points (p > .90), suggesting that they were not solving the problems in the 30-s ITI following their presentation.
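The activation differences reported for Experiment 6 are the matched-control reaction time minus the close-associate reaction time within each evaluation condition. As a small illustrative sketch (the function and variable names are hypothetical, and only the cell means reported in the text are used), the two differences can be recomputed as follows:

```python
def activation_ms(control_rt_ms: float, associate_rt_ms: float) -> float:
    """Activation index in ms: control RT minus close-associate RT.

    A positive value means the close associate was recognized faster
    than its matched control word.
    """
    return control_rt_ms - associate_rt_ms


# Cell means (ms) as reported: (matched control, close associate from previous trial).
cell_means = {
    "experimenter evaluation": (859.22, 748.80),
    "no experimenter evaluation": (834.60, 810.97),
}

scores = {
    condition: round(activation_ms(control, associate), 2)
    for condition, (control, associate) in cell_means.items()
}

for condition, score in scores.items():
    print(f"{condition}: {score:.2f} ms")
```

The same sign convention (control minus associate) is the one the General Discussion uses when it treats activation as a mediator.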

Experiment 7: Lexical Decisions With a 30-s ITI

These data show that the close associates from the previous trial were highly activated 5 s into the next trial. This finding is consistent with the argument that the 30-s ITI improves performance because it provides time for this activation to dissipate. If this is the case, if we incorporated a 30-s ITI in the reaction time paradigm and placed the lexical decision at the 7-s mark, we should find activation for the correct answer for participants subject to experimenter evaluation. To account for the improved performance resulting from the 30-s ITI, we should also find that the activation of close associates from the present trial is reduced. We looked at 7 s because it was at this point that Experiment 3 showed no activation of correct answers for participants subject to experimenter evaluation, and Experiment 4 showed activation of close associates for these participants.

Method

In the first version of the experiment, correct answers were used as the words in the lexical decision task, whereas in the second, close associates were used for this purpose. Because the two versions were exact replications aside from the stimuli used in the lexical decision task, we describe them together.

Participants

Sixty undergraduate students participated in each version of the experiment (correct answers: 51% female, 49% male; close associates: 50% female, 50% male) as a means of satisfying an Introductory Psychology course requirement.

Procedure

With only one change, we used the same procedure in the correct answer version of this experiment as we used in the 7-s version of Experiment 3, and the same procedure in the close associates version as in the 7-s version of Experiment 4. The one change was that instead of the 2-s ITI between each triad, we inserted a 30-s ITI. Participants were told that this time was required for the computer to set up the new triad, and they were just to wait until the next problem appeared. As in each of the prior experiments, after completion of the RAT items, the computer instructed the participants in the experimenter-evaluation condition to go and get the experimenter. In the no-experimenter-evaluation condition, the computer instructed the participants to press a button to average their performance with that of the previous participants. After doing this, they were asked to go and get the experimenter. All participants were then asked to respond to an experimenter-evaluation manipulation check and some ancillary measures on 11-point scales.

Results

The two versions of the experiment were conducted in sequence (correct answer followed by close associate). However, because the experiments are exact replications aside from whether the word stimulus is a correct answer or close associate, we present the results of each version together.

Experimenter-Evaluation Manipulation Check

The responses to the manipulation checks for experimenter evaluation were analyzed in 2 (experimenter evaluation vs. no experimenter evaluation) × 2 (Set A vs. Set B) ANOVAs. As shown in Table 1, in each version of the experiment (correct answer, close associate), participants subject to evaluation by the experimenter reported that they could be evaluated by the experimenter to a greater extent than participants who were not subject to experimenter evaluation (ps < .0001).

Lexical Decision Task

The reaction times for the words were analyzed in two 2 (experimenter evaluation vs. no experimenter evaluation) × 2 (Set A vs. Set B) × 2 (test word [correct answer or close associate] vs. matched control) ANOVAs with experimenter evaluation and set as between-subjects factors and word type (correct answer or close associate vs. matched control word) as a within-subjects factor.

7 s, correct answers. This analysis revealed a main effect for word type, F(1, 56) = 10.02, p < .01, d = 0.85. The related words were identified as words more quickly (M = 771.85 ms, SD = 206.40) than the matched control words (M = 851.88 ms, SD = 229.28). Neither the main effect for experimenter evaluation (p > .30) nor the interaction (p > .80) approached significance. Thus, the correct answer was activated for both participants subject to evaluation and for those who were not (overall M = 80.51 ms).

7 s, close associates. This analysis revealed no reliable effects. There was no main effect for experimenter evaluation (F < 1) nor word type (F < 1), nor was there an interaction between these variables (F < 1). These findings suggest that close associates of the triad members were not activated.

Discussion

Consistent with prediction, Experiment 7 showed that, when a 30-s ITI was inserted, correct answers were activated at the 7-s mark for both those participants who were subject to evaluation and those who were not. And, as would be expected given the activation of the correct answers, Experiment 7 also showed that the close associates of the triad members were not activated. These findings are consistent with the notion that the poor performance of participants subject to experimenter evaluation stems from the fact that they were attempting to solve the problems by generating close associates of the triad members, and that the activation produced by this effort persisted from one trial to the next. The 30-s ITI improves performance because it provides time for the activation of the close associates from the previous trial to dissipate.

General Discussion

Any comprehensive account of the effects of evaluation on performance must provide evidence for the mediating process. Although the research traditions that have examined evaluation–performance effects have proposed process models, they do not agree on the mediating process(es), nor is there any compelling evidence favoring one account over the others. We argued that one means of identifying the mediating process would be to undertake a molecular analysis of the effects of evaluation on the performance of a specific task. In the current article, we describe a series of experiments in which we attempt to do exactly that, using the Remote Associates Test. As a first step in this analysis, we attempted to determine exactly what makes RAT items easy or difficult to solve. After considering and rejecting a number of possibilities (e.g., triad member association; Experiment 1), we considered the activation hypothesis. This explanation contends that the triad members and their solutions are part of an associative network. When a particular word (i.e., triad member) is considered, its node is activated and this activation spreads to other associated words, and the strength of this activation corresponds to the strength of the association between the words. This explanation proposes that the “easy” items are easily solved because the activation of the triad members’ nodes in turn activates the nodes of closely related words, one of which is the answer. Consistent with this argument, in Experiment 2 we found that “correct” answers were more than twice as likely to be produced as an associate of at least one of the triad members on the simple triads than on the complex triads. Thus, to perform well on the simple triads, all one need do is generate close associates of a triad member and then see if the candidate is also associated with the other two triad members.
And this appears to be exactly what participants subject to experimenter evaluation were doing. For these participants, we found that the probability that the correct response would be generated in response to at least one of the triad members was reliably correlated with the proportion of participants who solved the problem. This relationship was not reliable for participants who were not subject to evaluation, consistent with the notion that these participants were loafing. On complex items the association between the triad members and the correct answer is much weaker (i.e., the associates are more remote). On these items, it is unlikely that generating close associates will produce the solution. Instead, the activation hypothesis suggests that the solution requires the accumulation of the weak activation provided by each of the triad members. In a series of experiments, we attempted to test this activation account by inserting a lexical decision task at various points in the solution process of the complex RAT task, so that we could measure the activation of correct answers and close associates of the triad members.

In Experiment 3, we found that although at 5 s there was reliable activation of the correct answers for both those participants who were subject to evaluation and those who were not, by the 7-s mark, this activation had dissipated for participants subject to evaluation by the experimenter. Instead, as shown in Experiment 4, at the 7-s mark close associates of the triad members were highly activated for these participants. However, when we looked for activation of the close associates for participants subject to experimenter evaluation at 5 s, we found that it was there, but it was no greater than the activation of the correct answers. Thus, by itself, this pattern of activation could not account for the dominance of close associates at the 7-s mark. In pilot work for this line of research we found that when we used 30-s trials, we replicated the typical finding that, on complex tasks, participants who were subject to evaluation performed more poorly than those who were not. However, 60-s trials eliminated this difference. One could argue that the 60-s trials improved the performance of participants subject to evaluation because the trials provided enough time for these participants to exhaust their store of close associates while still leaving time for the correct answer to emerge. However, if this were the case, we would expect to find that close associates of the triad members of the current triad were highly activated at the 5-s mark, and, as noted earlier, at 5 s we found that they were no more activated than correct answers. This outcome suggests the possibility that it is not the extra time that is helping performance on the current triad, but rather that the extra time provides the opportunity for activation from the previous trial to dissipate. We tested this possibility in Experiment 5 by providing participants with a 30-s ITI instead of the 2-s ITI used in our previous research. 
Of course, participants in this condition did not have any more time to solve the triad than participants in the 30-s condition with a 2-s ITI; the extended interval just provided the opportunity for the activation from the previous trial to subside. Consistent with this hypothesis, we found that participants subject to experimenter evaluation who were provided with the 30-s ITI performed as well as participants who were not subject to this evaluation. This finding suggests that close associates of the triad members from the preceding trial are activated for participants who are subject to experimenter evaluation but not for those who are not, and this is exactly what we found in Experiment 6. Finally, in Experiment 7 we found that when we inserted a 30-s ITI between triads and tested for activation at the 7-s mark, correct answers were activated for both participants who were subject to experimenter evaluation and those who were not, and close associates were activated for neither.

Thus, on this task, when faced with a triad, participants subject to experimenter evaluation immediately began generating close associates for the triad members. As a result, if the correct answer was a close associate, as was the case on the simple RAT problems, these participants did extremely well. However, on the complex items, the correct answer was not a close associate of one of the triad members. Instead, it was a remote associate, and it took the weak activation resulting from each of the triad members to produce enough activation to give the correct answer some chance of emerging as a response. Of course, close associates have much stronger activation than remote associates and as long as the participants keep testing these candidates, their strong activation will inhibit the activation of the correct answer, making it extremely unlikely that the correct answer will make it anywhere in the activation race. In fact, our evidence suggests that the activation of these close associates is so strong that it persists through the 2-s ITI and 5 s into the next trial.

To test the role that the activation of these close associates plays in performance, we conducted a mediation analysis following the procedures suggested by Kenny, Kashy, and Bolger (1998) on the data from Experiment 6 (close associates from the previous trial at the 5-s mark). Zero-order correlations and regression beta weights are shown for the predicted mediational model in Figure 2. Activation in the model is represented as the difference between the reaction time for the unrelated control words and the reaction times for the close associates from the previous trial, such that a positive score indicates that the close associates from the previous trial are activated (i.e., faster reaction times for close associates than for control words). Significant zero-order correlations exist between all three variables. However, when performance was regressed on activation and condition, only activation remained a reliable predictor. These findings are consistent with the argument that 5 s into the trial, activation of the close associates from the previous trial mediates the effect of the potential for experimenter evaluation on performance (Sobel test; Z = 2.06, p < .05). Our account would contend that sometime between 5 and 7 s, the participants subject to experimenter evaluation begin generating close associates for the triad members of the current trial, as suggested by the high activation of these associates at the 7-s mark (Experiment 4). Of course, the activation levels of the close associates increase quickly and inhibit the activation of the correct answer on the current trial.
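For readers unfamiliar with the Sobel test, the statistic is z = ab / √(b²·SE_a² + a²·SE_b²), where a is the path from the predictor to the mediator and b is the path from the mediator to the outcome, controlling for the predictor. The following Python sketch is illustrative only. The path coefficients and standard errors in it are hypothetical placeholders, because the text reports only the resulting Z values; the two-tailed p values are computed from those reported Z values under a standard normal reference distribution.

```python
import math


def sobel_z(a: float, se_a: float, b: float, se_b: float) -> float:
    """Sobel test statistic for the indirect effect a * b."""
    return (a * b) / math.sqrt(b**2 * se_a**2 + a**2 * se_b**2)


def two_tailed_p(z: float) -> float:
    """Two-tailed p value for z under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical coefficients, NOT taken from the article's figures.
example_z = sobel_z(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
print(f"illustrative Sobel z = {example_z:.3f}")

# p values implied by the Z statistics reported in the text.
for reported_z in (2.06, 1.95):
    print(f"Z = {reported_z}: two-tailed p = {two_tailed_p(reported_z):.4f}")
```

The Sobel test relies on a normal approximation for the product ab, which can be conservative in small samples; the sketch reproduces the test as named in the text rather than proposing an alternative.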
Figure 2. Activation from close associates from the preceding trial at 5 s as a mediator of performance in Experiment 6. Coefficients in parentheses indicate zero-order correlations. Coefficients not in parentheses represent parameter estimates for a recursive path model including both predictors. Asterisks indicate parameter estimates or correlations that differ from zero at p < .05. Experimenter evaluation was dummy coded (experimenter evaluation = 1, no experimenter evaluation = 0).

Figure 3. Activation from close associates from the current trial at 7 s as a mediator of performance in Experiment 4. Coefficients in parentheses indicate zero-order correlations. Coefficients not in parentheses represent parameter estimates for a recursive path model including both predictors. Asterisks indicate parameter estimates or correlations that differ from zero at p < .05. Experimenter evaluation was dummy coded (experimenter evaluation = 1, no experimenter evaluation = 0).

To test this aspect of our account, we conducted a mediation analysis on the data from Experiment 4 (close associates from the current trial at the 7-s mark). Zero-order correlations and regression beta weights are shown for the predicted mediational model in Figure 3. Once again, activation in the model is represented as the difference between the reaction time for the unrelated control words and the reaction times for the close associates from the current trial, such that a positive score indicates that the close associates from the current trial are activated. Significant zero-order correlations exist between all three variables. However, when performance was regressed on activation and condition, only activation remained a reliable predictor. These findings are consistent with the argument that 7 s into the trial, activation of the close associates from the current trial mediates the effect of the potential for experimenter evaluation on performance (Sobel test; Z = 1.95, p = .05). These findings suggest that the activation of the correct answer is inhibited first by the activation of the close associates from the previous trial and then by the close associates of the present triad. The introduction of the 30-s ITI eliminates the effect of the activation from the previous trial, giving the correct answer a chance to win the activation race. In fact, in Experiment 7, we found greater activation of the correct answer than of the close associates for both those participants subject to evaluation and those who were not.

Taken together, the results of this research suggest that on both simple and complex items participants subject to evaluation simply put out more effort than those who were not. To solve simple triads, participants need only generate close associates for one of the triad members and then test them against the other triad members. The greater effort on the part of participants subject to evaluation leads to the production of more close associates and better performance. On complex items, the participants engage in exactly the same behavior, but putting more effort into generating close associates for these items ensures failure. Thus, on this task at least, mere effort appears to mediate the evaluation–performance relationship.

In our survey of the different traditions, we identified four explanations that have been offered to account for the effect of evaluation on complex task performance: withdrawal of effort, processing interference, restricted focus of attention, and drive. Our findings show that on the complex RAT items, participants subject to evaluation do not perform poorly because they withdraw effort (Carver & Scheier, 1981; Elliot et al., 2005; Hennessey, 2001; Shepperd, 2001).
Instead, it appears that it is the fact that they are putting out effort that is the source of their difficulty on complex triads. Nor is it that worry concerning failure takes up processing capacity, ensuring failure (Bond, 1982; Elliot et al., 2005; Sarason et al., 1996). Once again, our findings suggest that participants subject to experimenter evaluation are engaged in the same behavior on both simple and complex items. It is just that this behavior is effective on simple items but ineffective on complex ones. The third explanation, focus of attention, suggests that the potential for evaluation produces an attentional overload that


“leads to a restriction in cognitive focus in which the individual attends more to cues that are most central to the task (or alternatively most central geographically in the display) at the expense of more peripheral cues” (Baron, 1986, p. 27). This cognitive explanation fails to capture the role that motivation plays in producing the pattern of results. That is, it is not the fact that participants subject to evaluation are focused on answer candidates that are closely related to the triad members (i.e., central cues) that accounts for their performance. It is the effort that they put into generating these candidates and testing them that ensures that they perform better than no-experimenter-evaluation participants on simple items but more poorly on complex ones. Likewise, no-experimenter-evaluation participants do not perform more poorly on simple items and better on complex ones than participants subject to evaluation because they are better able to think of more remotely associated answer candidates (i.e., peripheral cues). It is their lack of motivation that prevents them from testing enough candidates to come up with correct answers on simple items, and it is this same lack of motivation that allows the small amount of activation produced by each triad member to accumulate to the point that the correct answer emerges on complex items.

Of the four explanations, the mere effort account has the most in common with the explanation provided by drive theory. This explanation proposes that the presence of others produces arousal, which increases drive. Increased drive enhances the probability of the emission of dominant responses, which are likely to be correct on simple tasks but incorrect on complex ones. In the case of the RAT, the dominant response is the production of close associates, which facilitates the solution of simple items but debilitates the solution of complex ones. Thus the two accounts each use the notion of a dominant, or prepotent, response.
However, we have not adopted Zajonc’s (1965) Hullian notion of nonspecific drive. Our argument is limited to the effects of the potential of evaluation on motivation. In addition, the mere effort hypothesis does not rely on the Hullian notion of a habit hierarchy. For example, we do not propose that complex RAT items are solved because a subordinate (correct) response to one of the triad members is more likely to be emitted when there is no potential for evaluation. Instead, we argue that solving the complex RAT items requires the accumulation of the weak activation provided by each of the triad members. Finally, although Zajonc would agree that the potential for evaluation would have the effect of increasing drive, he argued that even in the absence of the potential for evaluation, the mere presence of others also does so. In the present line of research, we have focused on the effects of evaluation and have scheduled participants individually. Thus, mere presence effects fall outside the scope of the mere effort hypothesis, as do the motivational effects of any variable other than the potential for evaluation.

The next step in this line of research will be to test the generality of this explanation. The mere effort hypothesis proposes that participants who are subject to evaluation simply put more effort into whatever response is prepotent on that particular task. On the RAT, our molecular analysis revealed that the prepotent response was to generate close associates. On other tasks, research may have already identified the prepotent response. For example, to solve anagrams, participants typically rearrange the letters in the problem until they find a match with a word searched for and retrieved from their lexicon (Witte & Freund, 2001). To do so, they begin with the first letter of the word. And because many more words begin with consonants than with vowels, participants have a strong tendency to try consonants in the first position (Witte & Freund, 2001). The mere effort hypothesis would predict that this response would be enhanced when participants are subject to evaluation by the experimenter. That is, just as participants subject to evaluation are highly motivated to solve the RAT items and, to this end, generate close associates of the triad members, participants subject to evaluation should be highly motivated to solve anagrams and should attempt to do so by testing consonants in the first position. As a result, participants subject to experimenter evaluation should be more likely to solve anagrams for which the words begin with consonants but less likely to solve anagrams for which the words begin with vowels than participants who are not subject to evaluation.

A number of other variables have also been found to affect the solvability of anagrams. For example, the greater the frequency of appearance of a word in the language, the easier it is to solve its anagram (e.g., Mayzner & Tresselt, 1958; Witte & Freund, 2001). Word frequency determines the resting level of activation for the word, and the greater a word’s resting level of activation, the more likely it is to be tested as a candidate when its constituent letters appear in the anagram. However, the mere effort explanation would not make the interaction prediction for this manipulation of anagram difficulty. Whether the solutions to anagrams are words of high or low frequency, solvers will still tend to try consonants in the first position, and this prepotent response should be even more likely to be made by participants subject to evaluation than by those who are not. In contrast, the other accounts would make the same interaction prediction for evaluation and each of the manipulations of task difficulty.
For example, both Bond (1982) and Carver and Scheier (1981) have suggested that participants monitor how well they are performing the task. When the monitoring indicates that success is not assured, Bond (1982; cf. Sarason et al., 1996) suggested that concern over performance takes up processing capacity, preventing the correct answer from emerging, whereas Carver and Scheier suggested that when participants believe it unlikely that they can bring their behavior in line with the standard, they just stop trying. According to these and the other accounts, what should matter is the combination of the potential for evaluation and the experience of difficulty, not the source of the difficulty. Thus, it should not matter whether the solution is difficult because the word appears infrequently in the language or begins with a vowel instead of a consonant. In each case, the solver should experience difficulty in achieving success, and performance should suffer.

In the case of anagrams, previous research has identified the prepotent response. In other cases, the literature has not identified the prepotent response, but our molecular analysis of RAT performance provides a guide for how we should go about identifying it. For example, the typical interaction between the potential for evaluation and task difficulty has been obtained in the literature on maze solution (e.g., Hunt & Hillery, 1973; Husband, 1931; Jackson & Williams, 1985; Shaver & Liebling, 1976). To identify the prepotent response, we will begin by replicating the typical interaction pattern on simple and complex mazes and then examine performance on the simple mazes to determine what approach participants subject to experimenter evaluation used to outperform participants who are not subject to this evaluation. Once we identify the prepotent response, we will test the mere effort hypothesis by seeing whether the participants subject to evaluation perform more poorly on the complex maze simply because the same “more effort” they put into the prepotent response on the simple maze undermines their performance on the complex version of the task.

Of course, demonstrating the viability of the mere effort hypothesis will require research using a variety of tasks. This research could show that although the mere effort hypothesis can account for performance on some types of tasks, other processes in addition to, or instead of, mere effort may be involved on other types of tasks. Nonetheless, the current research is promising.

Identifying the mediating process may lay the groundwork for an integration of the five research traditions and, at the least, clarifies the basis for the findings within each of the traditions. For example, Shepperd’s (2001) expectancy-value account of social loafing effects would suggest that participants subject to evaluation perform more poorly on complex tasks than participants working collectively because the former participants withdraw effort. The mere effort hypothesis would argue that it is not that these participants are withdrawing effort at all. In fact, they are putting out at least as much effort on complex tasks as on simple ones, and it is this very effort that is debilitating their performance. Thus, it is the fact that participants are loafing in the collective condition that produces better performance.

In the case of goal setting, the mere effort hypothesis would suggest that goal-setting effects are produced quite simply by participants’ putting more effort into the prepotent response in the goal-setting conditions. If the prepotent response is correct, this greater effort will produce better performance.
Consistent with this interpretation, Harkins (2001a) set a stringent goal for RAT performance and found that participants subject to experimenter evaluation produced a goal-setting effect on the “simple” RAT items but participants not subject to this evaluation did not. Of course, in this case, if participants for whom a goal is set generate many close associates, they will solve more RAT items, which is what was found. On the other hand, participants given a stringent goal for complex RAT items performed no better than participants in the do-your-best condition. In fact, we would have expected performance in this case to have been even worse than the do-your-best level as a result of the debilitating effect of generating many close associates on “complex” triads. However, it is possible that performance was already so low in the do-your-best condition (M = 3.1 of 20 items) that a floor effect prevented the goal participants from solving even fewer items. In any event, these findings provide some support for the mere effort interpretation of goal-setting effects.

Hennessey (2001) suggested that focus of attention and/or withdrawal of effort account for the fact that evaluation debilitates creativity. In contrast, the mere effort account would suggest that creative tasks are tasks on which the prepotent response is incorrect. The potential for evaluation potentiates these incorrect responses, leading to debilitated performance. Consistent with this argument, when Amabile (1979) gave students in an evaluation condition specific instructions on how to produce a creative collage (e.g., variation in the shapes used, asymmetry of design, detail in design), these participants produced collages that were rated as the most creative in the experiment. The mere effort account would argue that Amabile identified the appropriate responses for these participants, and the potential for evaluation then potentiated these responses. It is not clear why either the withdrawal of effort or focus of attention would predict this heightened level of performance. However, to show exactly how mere effort accounts for the debilitating effects of evaluation on creativity will require that, in future research, we identify the uncreative, prepotent responses that are potentiated by the potential for evaluation.

Consistent with the trichotomous achievement goal approach, Elliot et al. (2005) found that performance–approach goals lead to better performance than performance–avoidance goals. In fact, in this research, the performance of participants in the performance–approach condition was as good as that found in a mastery condition. As Elliot et al. wrote, “Performance–avoidance goals undermined performance relative to performance–approach and mastery goals, and performance–approach goals were as positive for performance as mastery goals” (p. 634). They went on to show that when a performance contingency is introduced, performance–approach goals lead to even better performance than mastery goals, whereas the performance contingency leads to even poorer performance in the performance–avoidance goal condition. The authors concluded that, contrary to the assumption of achievement goal researchers that mastery goals exert a positive influence and performance goals exert a negative influence on performance, “it is performance–avoidance goals, not performance goals in general, that have a negative influence on performance” (p. 638).

The mere effort hypothesis would suggest that it might be premature to argue that performance–approach goals do not debilitate performance. The fact that a performance contingency improved performance in the performance–approach condition of Elliot et al.’s (2005) research suggests that the Scrabble-like task used in this research is “simple” (i.e., the correct response is prepotent).
That is, participants in the performance–approach/contingency condition perform better because they are putting out more effort, which on a “simple” task leads to better performance. Our research shows that participants are highly responsive to the prescriptions of the experimenter. For example, Harkins et al. (2000) showed that participants subject to experimenter evaluation produce goal-setting effects when they are urged to strive to reach a stringent goal. When the same criterion is presented as a piece of information (no striving instructions), participants perform no better than participants in a do-your-best control group. We argue that this latter case corresponds to the performance–approach goal condition in Elliot et al.’s (2005) research. That is, these participants are told that the session will provide them with the opportunity to demonstrate that they are exceptional puzzle solvers. Despite these instructions, they perform no better than participants in the mastery condition who are told that the purpose of the study is to collect data on college students’ reactions to the game and that they are “to learn how to play this game well.” Only when the participants in the performance–approach goal condition have a reason to put out more effort do they do so. Just as the stringent goal informs the participants in our goal-setting research that greater effort is required, when the performance–approach participants realize that they can earn extra credit but that, to do so, they must “do exceptionally well,” they try harder. The criterion for the mastery participants does not require this level of effort (“learn how to play this game well”), and so they do not put out any more effort with the contingency than without it. Of course, this account would suggest that they believe that they are just as likely to get
the extra credit as the participants in the performance–approach goal condition. We also argue that participants in the performance–avoidance goal conditions are simply responding to the prescriptions of the experimenter and to the opportunities presented by their participation. There is no evidence that these participants expect to do poorly, that they have given up, that they are distracted, or that they are anxious. All these participants have to do is demonstrate that they are not poor puzzle solvers, a criterion they may well believe that they can satisfy with little effort. Once again, we argue that they have the same expectation of success as the other participants. They just believe that it will take less effort to achieve this outcome. If the prepotent responses in this task are correct, Elliot et al.’s (2005) findings simply show what has already been found in the social loafing, goal-setting, and social facilitation literatures: On “simple” tasks, greater effort leads to better performance. To show that performance–approach goals do anything more than this, it will be necessary to show that the performance that they produce on a “complex” task (prepotent response is incorrect) is better than, or at least equivalent to, that found in a mastery condition. However, the mere effort hypothesis would predict that this will not be found. Instead, if the prepotent response is incorrect, the mere effort account would predict that telling the participants that they must perform exceptionally well will ensure that they perform poorly. In the social facilitation literature, a variety of explanations have been offered to account for the debilitating effect of evaluation on the performance of complex tasks: withdrawal of effort, processing interference, focus of attention, and drive. As we noted previously, the mere effort explanation is most similar to Zajonc’s (1965) drive explanation. 
However, at this time, the only point of similarity between the mere effort account and the drive account for social facilitation effects is the focus on the dominant, or prepotent, response. We have not adopted Zajonc’s Hullian notions of nonspecific drive or habit hierarchies, nor have we considered mere presence effects. However, it is possible that in the future, other aspects of Zajonc’s formulation will be incorporated in this work. Even now, it is interesting to note that in 1965, Zajonc’s drive explanation brought renewed interest to a moribund area of research, and now 40 years later, we return to his ideas. It is to be hoped that this effort will finally bring resolution to a problem that has been around since the birth of experimental social psychology (Triplett, 1898). It is also possible that mere effort can account for the performance effects found in another domain, stereotype threat. Stereotype threat refers to the social-psychological threat that arises when one is in a situation or doing something for which a negative stereotype about one’s group applies. This predicament threatens one with being negatively stereotyped, with being judged or treated stereotypically, or with the prospect of conforming to the stereotype. (Steele, 1997, p. 614)

Although the effects of stereotype threat on performance are well established, the mediating process is not. In fact, the same processes proposed to account for the effects of evaluation on performance have also been proposed to account for the effects of stereotype threat on task performance (e.g., anxiety: Steele & Aronson, 1995; withdrawal of effort: Stone, 2002; processing
interference: Schmader & Johns, 2003; arousal: O’Brien & Crandall, 2003), and as is the case in the evaluation–performance domain, there is no compelling evidence favoring one explanation over another. Recently Schmader and Johns (2003) argued that stereotype threat reduces working memory capacity, which negatively impacts performance. That is, stereotype threat activates negative stereotypes, and cognitive resources that could be devoted to the task are spent processing this information and/or attempting to suppress it. This explanation is a variant of the processing interference explanation proposed in the evaluation–performance domain (e.g., Bond, 1982). And, as is the case in that domain, we argue that it is mere effort, not diminished processing capacity, that accounts for their findings. The plausibility of this proposal is supported by two arguments: First, both stereotype threat and the potential for evaluation could arouse concern about one’s ability to perform the task, leading to processing interference. However, our findings for the effects of evaluation on RAT performance do not support this processing interference account. It seems quite unlikely that different explanations would hold in situations that would seem equally likely to arouse performance concerns. Second, O’Brien and Crandall (2003) and Ben-Zeev, Fein, and Inzlicht (2005) have each found that stereotype threat not only debilitates performance on difficult tasks but also facilitates performance on simple tasks. This pattern of findings fits quite well with the mere effort account but cannot be explained by a reduction in working memory capacity alone. However, additional research will be required to determine whether mere effort can provide a viable account of performance effects in the stereotype threat literature. 
Finally, because evaluation is an integral and necessary part of task performance in many applied settings (e.g., industry, education), knowing the specific process(es) through which the potential for evaluation facilitates or debilitates performance will permit the design of effective intervention strategies. For example, we tested an intervention based on the results of the current research in a pilot study. All of the participants were told that their performance would be subject to evaluation. One third of these participants were told that if they wanted to succeed, they should refrain from generating close associates. Instead they were to simply register the triad members and then wait for the answer to “pop up.” Another third were told that if they wanted to succeed, they should generate as many close associates as possible. The final third, a control condition, were told nothing. We found that participants who were told not to generate close associates, but to wait for the answer to emerge, outperformed the other two groups, which did not differ from each other. Thus, in this case, simply providing an instruction about how to approach the task was enough to counter the effect of the potential for evaluation. These findings suggest that the molecular approach to examining the evaluation–performance relationship shows promise not only for theory but also for practice.

References

Amabile, T. (1979). Effects of external evaluation on artistic creativity. Journal of Personality and Social Psychology, 37, 221–233.
Baron, R. (1986). Distraction-conflict theory: Progress and problems. In L. Berkowitz (Ed.), Advances in experimental social psychology (Vol. 19, pp. 1–40). New York: Academic Press.
Bartis, S., Szymanski, K., & Harkins, S. (1988). Evaluation of performance: A two-edged knife. Personality and Social Psychology Bulletin, 14, 242–251.
Ben-Zeev, T., Fein, S., & Inzlicht, M. (2005). Arousal and stereotype threat. Journal of Experimental Social Psychology, 41, 174–181.
Bond, C. (1982). Social facilitation: A self-presentational view. Journal of Personality and Social Psychology, 42, 1042–1050.
Carver, C., & Scheier, M. (1981). The self-attention-induced feedback loop and social facilitation. Journal of Experimental Social Psychology, 17, 545–568.
Deci, E., & Ryan, R. (1985). Intrinsic motivation and self-determination in human behavior. New York: Plenum Press.
Elliot, A. J., & Church, M. A. (1997). A hierarchical model of approach and avoidance achievement motivation. Journal of Personality and Social Psychology, 72, 218–232.
Elliot, A. J., Shell, M. M., Henry, K. B., & Maier, M. A. (2005). Achievement goals, performance contingencies, and performance attainment: An experimental test. Journal of Educational Psychology, 97, 630–640.
Elliott, E., & Dweck, C. (1988). Goals: An approach to motivation and achievement. Journal of Personality and Social Psychology, 54, 5–12.
Geen, R. (1989). Alternative conceptions of social facilitation. In P. Paulus (Ed.), Psychology of group influence (pp. 15–51). Hillsdale, NJ: Erlbaum.
Grolnick, W., & Ryan, R. (1987). Autonomy in children’s learning: An experimental and individual difference investigation. Journal of Personality and Social Psychology, 52, 890–898.
Harkins, S. (1987). Social loafing and social facilitation. Journal of Experimental Social Psychology, 23, 1–18.
Harkins, S. (2001a). The role of task complexity, and sources and criteria of evaluation in motivating task performance. In S. Harkins (Ed.), Multiple perspectives on the effects of evaluation on performance: Toward an integration (pp. 99–131). Norwell, MA: Kluwer Academic.
Harkins, S. (2001b). The three-variable model: From Occam’s razor to the black box. In S. Harkins (Ed.), Multiple perspectives on the effects of evaluation on performance: Toward an integration (pp. 207–259). Norwell, MA: Kluwer Academic.
Harkins, S., & Lowe, M. (2000). The effects of self-set goals on task performance. Journal of Applied Social Psychology, 30, 1–40.
Harkins, S., White, P., & Utman, C. (2000). The role of internal and external sources of evaluation in motivating task performance. Personality and Social Psychology Bulletin, 26, 100–117.
Hennessey, B. (2001). The social psychology of creativity: Effects of evaluation on intrinsic motivation and creativity of performance. In S. Harkins (Ed.), Multiple perspectives on the effects of evaluation on performance: Toward an integration (pp. 47–75). Norwell, MA: Kluwer Academic.
Hunt, P., & Hillery, J. (1973). Social facilitation in a coaction setting: An examination of the effects over learning trials. Journal of Experimental Social Psychology, 9, 563–571.
Husband, R. (1931). Analysis of methods in human maze learning. Journal of Genetic Psychology, 39, 258–277.
Jackson, J., & Williams, K. (1985). Social loafing on difficult tasks: Working collectively can improve performance. Journal of Personality and Social Psychology, 49, 937–942.
Karau, S., & Williams, K. (1993). Social loafing: A meta-analytic review and theoretical integration. Journal of Personality and Social Psychology, 65, 681–706.
Kenny, D., Kashy, D., & Bolger, N. (1998). Data analysis in social psychology. In D. Gilbert, S. Fiske, & G. Lindzey (Eds.), The handbook of social psychology (Vol. 1, 4th ed., pp. 233–265). Boston: McGraw-Hill.
Kihlstrom, J. F. (2006). Remote Associates Test. Retrieved July 12, 2006, from http://ist-socrates.berkeley.edu/~kihlstrm/RATest.htm
Kihlstrom, J. F., Shames, V., & Dorfman, J. (1996). Intimations of memory and thought. In L. Reder (Ed.), Implicit memory and metacognition (pp. 1–23). Mahwah, NJ: Erlbaum.
Kirk, R. (1995). Experimental design. Pacific Grove, CA: Brooks/Cole.
Kucera, H., & Francis, W. (1967). Computational analysis of present-day American English. Providence, RI: Brown University Press.
Latané, B., Williams, K., & Harkins, S. (1979). Many hands make light the work: The causes and consequences of social loafing. Journal of Personality and Social Psychology, 37, 823–832.
Locke, E., & Latham, G. (1990). A theory of goal setting and task performance. Englewood Cliffs, NJ: Prentice Hall.
Mayzner, M., & Tresselt, M. (1958). Anagram solution times: A function of letter order and word frequency. Journal of Experimental Psychology, 56, 376–379.
McClelland, J., & Rumelhart, D. (1981). An interactive activation model of context effects in letter perception: I. An account of basic findings. Psychological Review, 88, 375–407.
Mednick, S. A., & Mednick, M. T. (1967). Examiner’s manual: Remote Associates Test. Boston: Houghton Mifflin.
Meyer, D., & Schvaneveldt, R. (1971). Facilitation in recognizing pairs of words: Evidence of a dependence between retrieval operations. Journal of Experimental Psychology, 90, 227–234.
O’Brien, L., & Crandall, C. (2003). Stereotype threat and arousal: Effects on women’s math performance. Personality and Social Psychology Bulletin, 29, 782–789.
Sanna, L. (1992). Self-efficacy theory: Implications for social facilitation and social loafing. Journal of Personality and Social Psychology, 62, 774–786.
Sarason, I., Pierce, G., & Sarason, B. (1996). Domains of cognitive interference. In I. Weiner (Ed.), Cognitive interference (pp. 139–152). Mahwah, NJ: Erlbaum.
Schmader, T., & Johns, M. (2003). Converging evidence that stereotype threat reduces working memory capacity. Journal of Personality and Social Psychology, 85, 440–452.
Shaver, P., & Liebling, B. (1976). Explorations in the drive theory of social facilitation. Journal of Social Psychology, 99, 259–271.
Shepperd, J. (2001). Social loafing and expectancy-value theory. In S. Harkins (Ed.), Multiple perspectives on the effects of evaluation on performance: Toward an integration (pp. 1–24). Norwell, MA: Kluwer Academic.
Steele, C. (1997). A threat in the air: How stereotypes shape intellectual identity and performance. American Psychologist, 52, 613–629.
Steele, C., & Aronson, J. (1995). Stereotype threat and the intellectual test performance of African Americans. Journal of Personality and Social Psychology, 69, 797–811.
Stone, J. (2002). Battling doubt by avoiding practice: The effects of stereotype threat on self-handicapping in White athletes. Personality and Social Psychology Bulletin, 28, 1667–1678.
Triplett, N. (1898). The dynamogenic factors in pacemaking and competition. American Journal of Psychology, 9, 507–533.
Tubbs, M. (2001). Goal setting research in industrial/organizational psychology. In S. Harkins (Ed.), Multiple perspectives on the effects of evaluation on performance: Toward an integration (pp. 25–26). Norwell, MA: Kluwer Academic.
White, P., Kjelgaard, M., & Harkins, S. (1995). Testing the contribution of self-evaluation to goal setting effects. Journal of Personality and Social Psychology, 9, 69–79.
Witte, K., & Freund, J. (2001). Single-letter retrieval cues for anagram solution. Journal of General Psychology, 128, 315–328.
Zajonc, R. (1965, July 16). Social facilitation. Science, 149, 269–274.

Received September 20, 2004
Revision received January 31, 2006
Accepted February 9, 2006

Journal of Personality and Social Psychology 2006, Vol. 91, No. 3, 456 – 475

Copyright 2006 by the American Psychological Association 0022-3514/06/$12.00 DOI: 10.1037/0022-3514.91.3.456

High-Maintenance Interaction: Inefficient Social Coordination Impairs Self-Regulation

Eli J. Finkel, Northwestern University
W. Keith Campbell and Amy B. Brunell, University of Georgia
Amy N. Dalton, Duke University
Sarah J. Scarbeck, Northwestern University
Tanya L. Chartrand, Duke University

Tasks requiring interpersonal coordination permeate all spheres of life. Although social coordination is sometimes efficient and effortless (low maintenance), at other times it is inefficient and effortful (high maintenance). Across 5 studies, participants experienced either a high- or a low-maintenance interaction with a confederate before engaging in an individual-level task requiring self-regulation. Self-regulation was operationalized with measures of (a) preferences for a challenging task with high reward potential over an easy task with low reward potential (Study 1) and (b) task performance (anagram performance in Study 1, Graduate Record Exam performance in Studies 2 and 3, physical stamina in Study 4, and fine motor control in Study 5). Results uniformly supported the hypothesis that experiencing high-maintenance interaction impairs one’s self-regulatory success on subsequent, unrelated tasks. These effects were not mediated through participants’ conscious processes and emerged even with a nonconscious manipulation of high-maintenance interaction.

Keywords: high-maintenance interaction, social coordination, self-regulation, interdependence theory, mimicry

Imagine that you are an experienced cook who enjoys hosting dinner parties and incorporating creativity into your meal preparation. Imagine further that you have recently decided to volunteer to cook in a soup kitchen for the homeless. You arrive at 4:30 p.m. on a chilly afternoon expecting a busy evening. You plan to finish by 8 p.m., allowing 2 hr to work on an overdue manuscript before bedtime. The manager of the soup kitchen informs you that a second experienced cook, Bob, will be joining you. You anticipate that working with Bob will make the cooking tasks more efficient; you might even finish earlier than 8 p.m. It soon becomes clear, however, that you and Bob have incompatible approaches to the cooking process. You keep getting in one another’s way, and you are forced to exert effort to discern what Bob is doing so the two of you can get in sync. The effort required to coordinate with Bob offsets the advantages of having two competent cooks working together; you finish at 8 p.m. When you turn your attention to manuscript writing later that evening, you are unfocused and unmotivated. You have trouble concentrating, and the quality of your writing is poor.

The goal of the present research is to demonstrate that inefficient social coordination on interpersonal tasks (e.g., cooking with others) can impair individual-level self-regulation on subsequent, unrelated tasks (e.g., concentrating effectively while working on a manuscript). Evidence supporting this link would contribute to the research literatures on both self-regulation (a central topic in social psychology) and social coordination (a largely neglected topic).

Eli J. Finkel and Sarah J. Scarbeck, Department of Psychology, Northwestern University; W. Keith Campbell and Amy B. Brunell, Department of Psychology, University of Georgia; Amy N. Dalton and Tanya L. Chartrand, Fuqua School of Business, Duke University. We gratefully acknowledge Candida Abrahamson, Galen Bodenhausen, Paul Eastwick, Sanford Finkel, Steven Graham, Madoka Kumashiro, and William Maddux for their helpful suggestions on earlier versions of this article. We also acknowledge Sarah Barber, Joseph Benitez, Kimmy Coburn, Caitlin Hogan, Courtney Holman, Julie Keller, Elizabeth Krusemark, Josh Limor, Ashley Mason, Katie McGee, Amy Mize, Jamie Saul, Luciana Silva, David Sternberg, Leslie Smith, and Eleanor Tate for their assistance with data collection. Correspondence concerning this article should be addressed to Eli J. Finkel, Department of Psychology, Northwestern University, Swift Hall Room 102, 2029 Sheridan Road, Evanston, IL 60208-2710. E-mail: [email protected]

Social Coordination and High-Maintenance Interaction

Interpersonal interaction is characterized by effective social coordination to the degree that the interacting individuals are able to align their behaviors with one another in an efficient and effortless manner. The term high-maintenance interaction refers to the degree to which social coordination on an interpersonal task requires energy exertions beyond those required to perform the
task itself. We argue that differences in the degree to which social coordination experiences are high-maintenance are consequential: These differences influence whether the interactants experience self-regulatory failure on subsequent, unrelated tasks that they perform by themselves. Research in the interdependence theory tradition examines how the structure of interpersonal situations affects the interacting individuals (Kelley et al., 2003; Kelley & Thibaut, 1978; Thibaut & Kelley, 1959), and it provides a good framework through which to investigate phenomena associated with social coordination. Interdependence is defined as “the process by which interacting persons influence one another’s experiences” (Rusbult & Van Lange, 1996, p. 564). This definition is broad enough to include diverse interdependence problems, including well-researched topics like conflicts of interest (e.g., Finkel, Rusbult, Kumashiro, & Hannon, 2002; Rusbult, Verette, Whitney, Slovik, & Lipkus, 1991; Van Lange, 1999) and trust dynamics (e.g., Holmes & Rempel, 1989; Simpson, in press). We suggest that, in addition, the issue of the self-regulatory consequences of efficient versus inefficient social coordination is a central interdependence topic that has been largely neglected heretofore. This neglect is surprising given the degree to which effective social coordination makes life easier. Many tasks are more efficiently accomplished by people working in concert than by individuals working alone, and scholars have recently noted that tasks requiring coordination are pervasive: “Most human activity involves coordinating one’s actions with the actions of others” (Reis & Collins, 2004, p. 233), and “Virtually all social activity requires coordination of some sort. . . . Two colleagues completing a research paper, two professors team teaching a course, and a committee of faculty members in a doctoral dissertation defense all depend on coordination” (Thompson & Fine, 1999, p. 282). 
Depending on the interpersonal coordination at a given time, task performance may vary dramatically in its efficiency (e.g., Kelley et al., 2003). Although there is a large literature examining the personal consequences of interpersonal conflict, very little research investigates the personal consequences of poor interpersonal coordination. Rusbult and Van Lange (2003) illustrated this distinction between interpersonal problems rooted in conflicts of interest and those rooted entirely in coordination difficulties by presenting two different scenarios for John and Mary as they decide where to spend their summer vacation. In the first scenario, John wants to go to a beach resort and Mary wants to go to Rome. In the second, John and Mary both want to go to Rome. Whereas the first scenario requires that John and Mary make dicey decisions to navigate through their different preferences, the second does not require that John and Mary take one another’s preferences into account—after all, they have the same preferences in the first place. Rusbult and Van Lange (2003) observed that interaction in the second scenario represents a coordination problem—the two must agree on a date for their vacation, and one person must arrange for travel and lodging. Thus, in comparison to situations with conflicting interests, situations with corresponding interests are relatively simple. [italics added] . . . They entail coordinating in such a manner as to enjoy the good outcomes that are readily available to the pair. (p. 352)


We suggest that such coordination is frequently simple, not because coordinating with others is a trivial task (consider, e.g., the immense complexity of programming a robot to master the subtleties of engaging in smooth social coordination with a human being), but rather because humans acquire, as an aspect of normal development, remarkable behavioral repertoires for bringing about effective social coordination. Furthermore, once these repertoires are developed, humans generally apply them effortlessly and nonconsciously to novel social situations. As a result, well-coordinated social interaction is the norm; poor coordination is the salient exception (Hatfield, Cacioppo, & Rapson, 1994).

Although efficient coordination experiences are the norm, interaction requiring effortful attention to the complexities of such coordination remains prevalent in everyday life. For example, it can be complicated, and exhausting, to decide where several friends will go for dinner (or what movie to see thereafter), even if everybody in the group would be content with any restaurant under consideration. We suggest that when individuals have compatible goals but the interpersonal execution of these goals is inefficient enough to require heightened vigilance to issues of social coordination, their self-regulatory success on subsequent, unrelated tasks may well become impaired.

Initial support for this hypothesis has emerged from a recent series of studies investigating cross-race interaction (Richeson & Shelton, 2003; Richeson & Trawalter, 2005). One study, for example, revealed that relative to nonbiased White university students, those who were racially biased had impaired performance on a Stroop color-word task after interacting with a Black confederate. A plausible explanation for these results is that the biased White students had to exert more energy to make this cross-race interaction go smoothly, and, as a result, their subsequent task performance suffered. This performance decrement is likely indicative of a more general state of impaired self-regulation experienced by biased White individuals after interacting with a Black person. This energy exertion and performance decrement interpretation fits with recent developments in the self-regulation literature, to which we now turn our attention.

Self-Regulation

Self-regulation refers to what Baumeister (1998) called the self’s executive function, which “makes decisions, initiates actions, and in other ways exerts control over both self and environment” (p. 712). Self-regulation is the psychological process activated when studying on a Friday night rather than getting ice cream with friends or when forcing oneself to concentrate on a difficult task when one’s mind begins to wander. It entails efforts by the self to alter its states or responses (Vohs & Baumeister, 2004) in a goal-directed manner. A large body of evidence supports the assertion that effective self-regulation is essential to living life well and to the existence of a well-functioning civilization (Baumeister, Heatherton, & Tice, 1994). Increasingly, researchers are gaining insight into the intrapersonal processes by which individuals engage in successful self-regulation (e.g., Baumeister et al., 1994; Carver & Scheier, 1998; Gollwitzer, 1999; Higgins, 2000; Loewenstein, 1996; Mischel, Shoda, & Rodriguez, 1989; Rothman, Baldwin, & Hertel, 2004; Shah & Kruglanski, 2003). The present research builds on this literature by investigating whether the interpersonal process of


FINKEL ET AL.

high-maintenance interaction impairs individual-level self-regulatory success on subsequent, unrelated tasks. We hope this investigation serves as one demonstration of a more general observation: A comprehensive theory of self-regulation requires enhanced insight into the processes by which individuals’ self-regulatory success is influenced by interpersonal processes.

By what mechanism might high-maintenance interaction affect self-regulation? One compelling possibility is that it impairs self-regulation by depleting psychological resources. Recent theorizing suggests that successful self-regulation requires a central psychological resource called self-regulatory strength, which refers to “the internal resources available to inhibit, override, or alter responses” (Schmeichel & Baumeister, 2004, p. 86). In the context of high-maintenance interaction, tempting responses might include losing focus, discontinuing the interaction, or being rude; striving to achieve efficient coordination in such interaction requires that one exert self-regulatory strength to overcome these counterproductive responses. Accumulating evidence demonstrates that self-regulatory strength is a limited and depletable resource that fluctuates markedly as a function of factors such as prior willpower exertion, exhaustion, and stress (for reviews, see Muraven & Baumeister, 2000; Schmeichel & Baumeister, 2004). To the degree that individuals exert self-regulatory strength in a given situation, they will have fewer self-regulatory resources available for a separate task they perform moments later (i.e., their “strength” is sapped and they are left in a state of self-regulatory strength depletion). An important implication is that “a person can become exhausted from many simultaneous demands and so will sometimes fail at self-control even regarding things at which he or she would otherwise succeed” (Baumeister & Heatherton, 1996, p. 3).

Research on self-regulatory strength depletion typically uses a two-task paradigm, in which participants are randomly assigned to perform an initial task that either requires self-regulatory exertion or does not. After completing this first task, all participants complete the same follow-up task that also requires self-regulatory exertion. Abundant evidence demonstrates that relative to participants who first performed the task requiring no self-regulatory exertion, those who first performed the task requiring self-regulatory exertion exhibit impaired performance on the second task (e.g., Baumeister, Bratslavsky, Muraven, & Tice, 1998; Muraven, Collins, & Nienhaus, 2002; Muraven, Tice, & Baumeister, 1998; Vohs & Heatherton, 2000; Vohs & Schmeichel, 2003). Recent research supports the strength model of self-regulation by demonstrating that experiencing the initial task that requires self-regulatory exertion only impairs performance on follow-up tasks that also require self-regulatory exertion. For example, depletion impairs performance on complex thinking tasks (e.g., cognitive extrapolation, thoughtful reading comprehension) but not on simpler mental activities (e.g., general knowledge, memorization and recall of nonsense syllables) (Schmeichel, Vohs, & Baumeister, 2003). The volume of evidence amassed since the late 1990s to support the strength model leaves little doubt that prior self-regulatory exertion impairs self-regulation on subsequent tasks.

As this literature has matured, scholars have become increasingly interested in identifying the psychological mechanisms through which engaging in the first task impairs performance on the second one. Several findings raise the intriguing possibility that the psychological processes linking initial self-regulatory exertion to subsequent

self-regulatory failure remain mysterious to those very individuals who so reliably fall prey to them. For example, evidence suggests that depletion effects are not caused by differences across experimental conditions in mood (e.g., Ciarocco, Sommer, & Baumeister, 2001; Schmeichel et al., 2003; Vohs & Schmeichel, 2003), in self-efficacy (Wallace & Baumeister, 2002), or even in subjectively experienced depletion (e.g., Muraven & Slessareva, 2003; Schmeichel et al., 2003). From this perspective, perhaps attempts to identify verbally the driving mechanisms underlying our self-regulatory strength depletion amount to “telling more than we can know” (Nisbett & Wilson, 1977, p. 231). It is plausible, for example, that participants in the two-task paradigm are not consciously aware of how performing the first task has influenced their psychological dynamics, a state of affairs that would leave them particularly vulnerable to the effects of depletion because they would be unable to rally resources deliberately to counteract its effects (see the discussion by Finkel & Campbell, 2001).

As emphasized earlier, the primary goal of the present research is to examine whether experiencing high-maintenance interaction impairs subsequent, individual-level self-regulatory success. A secondary goal is to investigate whether participants are consciously aware of how high-maintenance interaction is influencing them. We systematically examine whether subjectively experienced depletion (the most plausible mechanism) and mood, self-efficacy, and liking for the interaction partner (three other possible mechanisms) mediate the effect of high-maintenance interaction on impaired self-regulation. If these systematic efforts to establish the existence of a self-report mediator reveal reliable evidence for one or more of these mechanisms, such evidence would expand researchers’ knowledge of the precise pathway through which high-maintenance interaction impairs self-regulation. If these efforts reliably fail to reveal any evidence to support any of these possible mechanisms, such evidence would also expand researchers’ knowledge of these processes, specifically by providing support for the notion that such interaction impairs individuals’ subsequent self-regulation without their awareness.
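
The logic of testing a candidate self-report mediator can be illustrated with a small sketch. Assuming a standardized regression of the outcome on two predictors (experimental condition and one self-report measure), both coefficients follow directly from the three pairwise correlations. The function name and the correlation values are hypothetical and are not taken from the present studies; the sketch only shows what it looks like when a condition effect persists after a candidate mediator is controlled.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized coefficients for regressing y on two predictors.

    r_y1, r_y2: correlations of the outcome with predictors 1 and 2.
    r_12: correlation between the two predictors.
    """
    denom = 1.0 - r_12 ** 2
    if denom <= 0:
        raise ValueError("predictors are perfectly collinear")
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    return beta_1, beta_2


# Hypothetical correlations (NOT data from the article):
# predictor 1 = experimental condition, predictor 2 = candidate mediator.
b_cond, b_med = standardized_betas(r_y1=-0.40, r_y2=-0.10, r_12=0.20)
print(round(b_cond, 3), round(b_med, 3))
```

When the condition coefficient stays close to its zero-order correlation and the mediator coefficient stays near zero, as in this made-up example, the self-report measure does not account for the effect.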

Hypothesis and Research Overview

As observed previously, research investigating the effects of high-maintenance interaction on self-regulation is sparse. To fill this gap, and on the basis of the preceding theoretical analysis, we advance the hypothesis that in comparison to experiencing low-maintenance interaction, experiencing high-maintenance interaction results in impaired individual-level self-regulation on subsequent, unrelated tasks. We report the results of five studies manipulating whether participants experienced poorly or well-coordinated interaction with a confederate before performing an individual-level behavioral task requiring self-regulatory exertion. We operationalized self-regulatory success by assessing (a) preferences for a challenging task with high reward potential over an easy task with low reward potential (Study 1) and (b) task performance (anagram performance in Study 1, Graduate Record Examination [GRE] performance in Studies 2 and 3, physical stamina in Study 4, and fine motor control in Study 5). In all studies, we also examined whether participants were consciously aware of how high-maintenance interaction affected them.


Study 1: Maze Task

The primary goal of Study 1 was to present a first test of the hypothesis that experiencing high-maintenance interaction results in impaired self-regulation on subsequent tasks. We devised an experimental manipulation of social coordination (high- vs. low-maintenance interaction) with a same-sex stranger, examining whether individuals who experience poorly coordinated interpersonal interaction would perform worse on subsequent self-regulatory tasks relative to those who experience well-coordinated interaction. We included two different, theoretically derived measures of self-regulation (Baumeister et al., 1994; Gottfredson & Hirschi, 1990): task motivation, or whether participants prefer to engage in a challenging task that has the potential to be rewarding or an easy task that is unlikely to be rewarding, and task performance, or how well participants perform on a task of intermediate difficulty.

Study 1 participants interacted with a confederate of the experimenter whose behavior made social coordination either high maintenance (inefficient, difficult) or low maintenance (efficient, easy). After this interaction, we provided participants with the option of performing (by themselves) either a challenging task that had the potential to be rewarding or an easy task that was unlikely to be rewarding. On the basis of previous research suggesting that a central correlate of poor self-regulation is a preference for simple tasks (Gottfredson & Hirschi, 1990; Grasmick, Tittle, Bursik, & Arneklev, 1993; see also Flora, Finkel, & Foshee, 2003) and the observation that depleted individuals prefer to engage in simple tasks (e.g., watching television) rather than challenging tasks (e.g., doing homework), we expected that participants assigned to the high-maintenance condition would be less likely to select the challenging, potentially rewarding task than would those assigned to the low-maintenance condition.

After participants selected the easy or the challenging task, we presented all of them with the identical task of intermediate difficulty. Building on the idea that high-maintenance interaction impairs self-regulation (e.g., concentration, motivation), we predicted that participants assigned to the high-maintenance condition would perform worse on this task than would those assigned to the low-maintenance condition.

Method

Participants. Participants were 26 female undergraduates who volunteered to take part in the study in partial fulfillment of the requirements for an introductory psychology course. These women were 18.92 (SD = 1.16) years old on average, and most were Caucasian (15% African American and 85% Caucasian).

Procedure. Participants reported to the experiment and waited outside the laboratory. Waiting with them was another “participant,” who was actually a confederate of the experimenter. Participants were greeted by the experimenter, who explained to them that they would first perform a task together and, subsequently, they would perform a task independently. The experimenter added that the person who signed up for Working With Others (always the participant) would be the “Tracker” in the first task and the person who signed up for Teamwork and Communication (always the confederate) would be the “Communicator.” Modifying a procedure from prior research (Engebretson, Matthews, & Scheier, 1989), the experimenter led participants to a room partially divided by a partition. On a desk on one side of the partition was a computer joystick that was ostensibly connected to the computer on the other side of


the partition. The partition was arranged such that (a) the participant and the confederate were unable to see one another and (b) only the confederate was able to see the computer monitor. The experimenter explained to the newly formed partners that they would be performing a 3-min collaborative maze task. Specifically, she told them the following: In this task, the goal is for the two of you to coordinate your efforts to achieve optimal performance on the task. The task requires that the Tracker [participant] trace an irregular maze using the joystick. When [the Tracker] deviates from the maze, the computer will score it as time off the maze. However, as you can see, [the Tracker’s] view of the maze will be obstructed, which will force [her] to rely on the Communicator [confederate] for directions. The Communicator will only be allowed to direct the Tracker using the following terms: left, right, up, down, diagonal, slower, faster, and stop. The Tracker is not allowed to speak at all during the task—no exceptions. Performance on this task will be evaluated based on the distance traveled (or speed) and the number of errors, and it will be compared against normative scores available from previous testing. Participants were randomly assigned to one of two conditions for this social coordination task. In the high-maintenance condition, the confederate made a scripted series of errors in her directions. Typical errors were “Wait!” and “Right . . . I mean left.” The confederate made roughly one error every 10 directions and deliberately remained out of sync with the participant. In the low-maintenance condition, she followed the same script but without making errors and while staying in sync. 
To minimize the likelihood that individuals in the high-maintenance condition would feel that they had performed worse on the maze task than would those in the low-maintenance condition, the experimenter gave participants in both conditions the same feedback, stating that they had scored somewhat above average. To maximize the likelihood that participants would believe this feedback, we displayed it on the computer screen. Following the maze task, the experimenter led the participant and the confederate to separate rooms to complete individual tasks. After dismissing the confederate, she returned to inform the participant that the next task would be to solve anagrams. The participant was asked to choose between (a) easy anagrams that could be fun and not too challenging to solve or (b) difficult anagrams that take more concentration but could be rewarding to solve. The participant’s preference for the easy versus the challenging task served as our measure of task motivation. After recording the participant’s answer, the experimenter left the room, ostensibly to retrieve the selected anagram task. When she returned, however, she informed the participant that only a moderately challenging set of anagrams was available; that is, regardless of the participant’s task preference and experimental condition, the experimenter presented every participant with the same anagram task— one of intermediate difficulty (as reported by Gilhooly & Johnson, 1978). The experimenter gave participants 5 min to solve as many of the 15 anagrams as they could. The number of correctly solved anagrams served as our measure of task performance. 
Participants then completed a final questionnaire consisting of a two-item subjectively experienced depletion measure (“At the end of the task, I felt emotionally drained” and “At the end of the task, I felt tired”; α = .79), a two-item measure of liking for the interaction partner (“Overall, I liked my partner”), and a four-item manipulation check assessing the degree to which they experienced the maze interaction as a high-maintenance interaction (e.g., “We had a difficult time communicating,” “It was easy for us to coordinate our efforts”; the latter of which was reverse-scored; α = .87).
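
The reverse-scoring step and the scale reliabilities (α) described above can be computed as in the following minimal sketch. It is not the authors' analysis code. The 1-7 response range, the function names, and the example responses are all assumptions made for illustration, because the article reports neither the response scale nor item-level data.

```python
from statistics import variance


def reverse_score(value, scale_min=1, scale_max=7):
    """Flip a response so that high values mean the same thing on every item.

    The 1-7 range is an assumption; the article does not state the scale.
    """
    return scale_min + scale_max - value


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    Each item is a list of responses, one per participant, with participants
    in the same order in every item.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if any(len(item) != n for item in items):
        raise ValueError("every item needs one response per participant")
    totals = [sum(item[i] for item in items) for i in range(n)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses from six participants (NOT data from the article).
difficult_to_communicate = [1, 2, 2, 3, 4, 5]
easy_to_coordinate_raw = [7, 6, 5, 5, 4, 2]  # positively worded, so reverse it
easy_to_coordinate = [reverse_score(v) for v in easy_to_coordinate_raw]

alpha = cronbach_alpha([difficult_to_communicate, easy_to_coordinate])
print(round(alpha, 2))
```

Reverse-scoring before computing α matters: leaving the positively worded item unflipped would make the two items correlate negatively and drive the estimate down.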

Results and Discussion

Manipulation check. Before performing hypothesis tests, we wanted to discern whether participants in the high-maintenance condition experienced the interaction with the confederate as more high maintenance than did those in the low-maintenance condition. Results from an independent-samples t test revealed that participants assigned to the high-maintenance condition indeed felt that the interaction was significantly more high maintenance (M = 2.29, SD = 0.95) than did those in the low-maintenance condition (M = 1.63, SD = 0.43), t(24) = 2.27, p = .03.

Hypothesis tests. As described above, we included two dependent measures to test our hypothesis that high-maintenance interaction results in impaired self-regulation: (a) task motivation and (b) task performance. First, as depicted in Figure 1, results from a chi-square test revealed that participants who had been assigned to the high-maintenance condition were significantly and substantially less likely to choose the challenging task than were those who had been assigned to the low-maintenance condition, χ²(1) = 5.85, p = .02. These results suggest that people who have recently experienced a potentially depleting social interaction prefer to engage in simple, nonchallenging tasks rather than in challenging tasks with high reward potential.

Second, as depicted in Figure 2, results from an independent-samples t test revealed that participants who had been assigned to the low-maintenance condition solved 56% more anagrams than did those who had been assigned to the high-maintenance condition, t(24) = 2.31, p = .03. We also performed an additional regression analysis predicting the number of anagrams solved from experimental condition, controlling for the effects of task motivation. This analysis revealed a significant effect of the experimental condition on the number of anagrams solved, β = .51, t(23) = 2.43, p = .02, which suggests that high-maintenance interaction impairs self-regulation, even after we controlled for task motivation.

Auxiliary analyses. To discern whether liking for the interaction partner accounted for the effect of the social coordination manipulation on impaired self-regulation, we conducted two multiple regression analyses predicting, respectively, task motivation (selection of the challenging vs. easy anagrams) and task performance (number of anagrams solved) from the social coordination manipulation and the liking measure. Results revealed that the social coordination manipulation predicted unique variance in both task motivation, β = .47, t(23) = 2.47, p = .02, and task performance, β = –.40, t(23) = –2.11, p < .05, whereas the liking measure did not (both |ts| < 1.00).

To examine whether including subjectively experienced depletion in the model altered conclusions, we conducted two additional multiple regression analyses predicting, respectively, task motivation and task performance from the social coordination manipulation and the subjectively experienced depletion measure. Results revealed that the social coordination manipulation predicted unique variance in both task motivation, β = .47, t(23) = 2.58, p = .02, and task performance, β = –.43, t(23) = –2.31, p = .03, whereas the subjectively experienced depletion measure did not (both |ts| < 1.00). Although mediation by subjectively experienced depletion seemed plausible a priori, the nonsignificant difference on this variable as a function of experimental condition is consistent with previous findings in the depletion literature (e.g., Muraven & Slessareva, 2003; Schmeichel et al., 2003).

Summary. Taken together, the Study 1 results provide strong initial support for the notion that high-maintenance interaction causes impaired self-regulation. Participants who were randomly assigned to engage in a 3-min high-maintenance (relative to low-maintenance) interaction subsequently exhibited substantially impaired task motivation and task performance. This effect was not mediated by liking for the interaction partner or subjectively experienced depletion.
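
As a rough arithmetic check, the manipulation-check t statistic can be approximately recovered from the reported means and standard deviations. The sketch below assumes 13 participants per condition, which is consistent with the reported 24 degrees of freedom but is not stated in the article. `pooled_t` is a hypothetical helper name.

```python
import math


def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t (equal variances assumed) from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


# Reported Study 1 manipulation-check values; n = 13 per group is an ASSUMPTION.
t, df = pooled_t(m1=2.29, sd1=0.95, n1=13, m2=1.63, sd2=0.43, n2=13)
print(round(t, 2), df)  # 2.28 24
```

The result is close to the reported t(24) = 2.27. A small discrepancy is expected because the inputs are rounded to two decimals and the true group sizes may differ from the assumed split.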

Figure 1. Study 1: The percentage of participants electing to perform the challenging anagram task (rather than the simple one) as a function of whether they had previously engaged in a high-maintenance or a low-maintenance interaction with a confederate.

Figure 2. Study 1: The number of anagrams participants solved as a function of whether they had previously engaged in a high-maintenance or a low-maintenance interaction with a confederate.

Study 2: Data Entry Task

The primary goal of Study 2 was to replicate the Study 1 findings with a method that used (a) new coordination and self-regulation tasks and (b) a no-interaction control condition. Although the Study 1 results suggest that in comparison to the effects of low-maintenance interaction, high-maintenance interaction impairs self-regulatory success, they do not enable us to discern whether (a) high-maintenance interaction is destructive, (b) low-maintenance interaction is constructive, or (c) some combination of these possibilities is the case. Given that efficient social coordination is the norm (with poor coordination as the exception; see the Introduction), we predicted that high-maintenance interaction would impair self-regulation but low-maintenance interaction would not strengthen it.

In Study 2, we randomly assigned participants to perform a data entry task (a) with a confederate who made the interaction high maintenance, (b) with a confederate who made the interaction low maintenance, or (c) alone. In the two dyadic conditions, the same-sex confederate read a string of numbers to the participant, who entered them into a computer spreadsheet. After completing this task, participants spent 10 min working (alone) on analytical GRE questions. We chose analytical GRE questions as our dependent measure for two primary reasons. First, performance on standardized tests has important real-world implications. Second, this task is cognitively demanding. To perform well, individuals must focus intently on diverse pieces of information at once, which requires motivation and persistent concentration (Yang & Johnson-Laird, 2001). As mentioned earlier, previous research has demonstrated that performance on the analytical section of the GRE is exactly the type of cognitive ability that becomes impaired when individuals experience self-regulatory strength depletion (Schmeichel et al., 2003).

Method

Participants. Participants were 58 undergraduates who volunteered to take part in the study in partial fulfillment of the requirements for an introductory psychology course.1 We dropped 4 participants (1 because of suspicion regarding our experimental procedures, 2 because of the participants’ failure to follow directions during the data entry task, and 1 because the experimenter forgot to administer the GRE measure), leaving a sample of 54 participants (37 women) who were 19.70 (SD = 1.22) years old on average and predominantly Caucasian (6% African American, 91% Caucasian, and 4% other).2

1 We discarded 15 participants because their sessions were conducted by an experimenter who experienced difficulties in running the sessions. After lab members alerted us to her consistent failure to follow experimental procedures, we examined the rate at which substantial problems (experimenter error and participant suspicion) occurred in the sessions she ran relative to this rate for the other experimenters. These problems were 3 times more likely in her sessions than in the other four experimenters’ sessions combined.

2 Although approximately one third of the Study 2 participants were male, a quirk in random assignment resulted in only 2 of these male participants being assigned to each experimental condition (with 13 assigned to the control condition). As such, we are not in a position to examine sex effects in this study, and the analyses reported later in this article collapse across participant sex. We addressed this concern in Studies 3 through 5.

Procedure. The Study 2 procedures paralleled those used in Study 1. Participants reported to the experiment and waited outside the laboratory. Waiting with them was a same-sex “participant” who was actually a confederate of the experimenter. Participants were greeted by the experimenter, who explained to them that they would first perform a task together, and subsequently, they would perform a task independently. The experimenter added that the person who signed up for Working With Others (always the participant) would be the “Recorder” in the first task and the person who signed up for Teamwork and Communication (always the confederate) would be the “Communicator.” Participants were led to a room partially divided by a partition. The data to be entered were on a desk on one side of the partition, but the computer into which the data were to be entered was on the other side. The partition was arranged such that (a) the participant and the confederate were unable to see one another, (b) only the participant could see the computer monitor, and (c) only the confederate could see the data to be entered. In the two dyadic conditions, the experimenter explained to the newly formed partners that they would be performing a collaborative data entry task and that performance on this task was predictive of future career success. Specifically, she told them the following:

In many workplaces, people need to rely on each other to get a job done. In this task, the goal is for the two of you to coordinate your efforts to achieve optimal performance on a data entry task. The task requires that the Recorder [participant] enter the data being called out as accurately as possible. . . . Your goal is to enter as much data as you can, which will force you to rely on the Communicator [confederate] for the data. The Communicator will call out the data as it is listed on the sheet. Performance on this task will be evaluated based on your speed and accuracy, and it will be compared against normative scores available from previous work. Previous research has shown that this task is predictive of people’s later job success and reflects on how well you perform in various work environments.

After giving these directions, the experimenter left the room. The participant and the confederate performed this task for 5 min. In the alone condition, the experimenter instructed the participant on the data entry task without mentioning another person or coordination.

Participants were randomly assigned to one of three conditions for this data entry task. In the high-maintenance condition, the confederate made a scripted series of errors while calling out the data. Typical errors were “2—I mean 1” and “9, oops, sorry, I meant 4.” The confederate made roughly one error for every 10 number sets. To strengthen the manipulation further, the confederate remained out of sync with the participant: He or she could hear the strokes of the keyboard and strategically avoided developing a rhythm with the participant. In the low-maintenance condition, the confederate followed the same script but without making errors and while staying in sync as the participant entered the data. In the alone (control) condition, the participant entered the data by himself or herself.

After this task was completed, the experimenter gave participants in all conditions the same feedback, stating that they scored somewhat above average on the data entry task. As in Study 1, we displayed this feedback on the computer screen. Next, the experimenter directed the participants to perform an individual task answering analytical problems taken from the GRE. She gave the participants 10 min to solve as many of the nine problems as they could; the number of correctly answered GRE problems served as our dependent measure. Unlike participants in Study 1, those in Study 2 did not choose whether they preferred to perform an easy or a challenging task; we simply presented them with the GRE task without any reference to task difficulty.
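
The article states that assignment to the three conditions was random but does not describe the allocation scheme. For readers planning a similar design, one common option that keeps condition sizes balanced is block randomization, sketched below. This is an illustration under that assumption, not the procedure used in Study 2. The condition labels and the function name are hypothetical.

```python
import random

CONDITIONS = ("high_maintenance", "low_maintenance", "alone")


def block_randomize(n_participants, conditions=CONDITIONS, seed=None):
    """Assign participants in shuffled blocks so each condition appears once per block."""
    rng = random.Random(seed)  # local generator keeps the schedule reproducible
    assignments = []
    while len(assignments) < n_participants:
        block = list(conditions)
        rng.shuffle(block)
        assignments.extend(block)
    return assignments[:n_participants]


# Example: a schedule for 54 participants (the final Study 2 sample size).
schedule = block_randomize(54, seed=2006)
counts = {c: schedule.count(c) for c in CONDITIONS}
print(counts)
```

With 54 participants and three conditions, every condition receives exactly 18 participants. Stratifying the blocks by participant sex would be one way to guard against the kind of imbalance described in Footnote 2.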
Following the GRE task, participants completed a brief questionnaire (“After the data entry task, . . .”), including a two-item subjectively experienced depletion measure (“I felt drained” and “I felt mentally exhausted”), a straightforward two-item mood measure (“I was in a bad mood,” reverse-scored, and “I was in a good mood”), and a two-item self-efficacy measure (“I felt like I could accomplish my goals” and “I felt confident in my abilities”). Although previous research suggests that the effects that emerge in the two-task paradigm are not due to differences in mood (e.g., Ciarocco et al., 2001; Schmeichel et al., 2003; Vohs & Schmeichel, 2003) or self-efficacy (Wallace & Baumeister, 2002) across experimental conditions, we wanted to discern whether these constructs might account for the effect of the social coordination manipulation on impaired self-regulation in the present research. Participants then completed a three-item questionnaire assessing liking for the interaction partner (“I liked my lab partner,” “My lab partner was nice,” and “It was a pleasure working with my lab partner”).3 Finally, all participants completed the four-item manipulation check (as in Study 1) assessing the degree to which they experienced the interaction with the confederate as a high-maintenance interaction. The depletion (α = .88), mood (α = .71), self-efficacy (α = .86), liking (α = .88), and high-maintenance interaction (α = .86) measures exhibited acceptable scale reliabilities.
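The scale reliabilities reported above are Cronbach's alpha coefficients. As a minimal sketch of how such a coefficient is computed, the following function applies the standard formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the example ratings are hypothetical illustrations, not the study's data or the authors' analysis code.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items table of ratings.

    rows: list of lists, one inner list per respondent, one value per item.
    Uses the sample variance (n - 1 denominator) throughout.
    """
    k = len(rows[0])  # number of items on the scale
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Variance of each respondent's summed score
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical two-item ratings from four respondents (made-up numbers)
example = [[1, 2], [2, 1], [3, 4], [4, 3]]
alpha = cronbach_alpha(example)
```

Any reverse-scored item (such as the bad-mood item) would need to be recoded before the ratings are passed to a function like this.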

Results and Discussion

Manipulation check. As in Study 1, we wanted to discern whether participants in the high-maintenance condition experienced the interaction with the confederate as more high maintenance than did those in the low-maintenance condition. (The control participants did not complete this measure because they never interacted with a confederate.) Results from an independent-samples t test revealed that participants assigned to the high-maintenance condition indeed felt that the interaction was significantly more high maintenance (M = 2.15, SD = 0.82) than did those assigned to the low-maintenance condition (M = 1.44, SD = 0.75), t(27) = 2.41, p = .02.
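An independent-samples t statistic of this kind can be approximated from the reported means and standard deviations alone. The sketch below uses the pooled-variance formula. The group sizes (15 and 14) are an assumption chosen only to be consistent with the reported 27 degrees of freedom, since the per-condition sizes are not stated here, and the function name `pooled_t` is hypothetical.

```python
import math


def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Student's t for two independent groups, from summary statistics.

    Returns (t, df) using the pooled-variance (equal variances) formula.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


# Reported Study 2 manipulation-check means and SDs.
# ASSUMPTION: group sizes of 15 and 14 (only their sum, 29, follows from df = 27).
t_stat, df = pooled_t(2.15, 0.82, 15, 1.44, 0.75, 14)
```

Under these assumed group sizes the result lands close to the reported t(27) = 2.41. A small discrepancy is expected because the published means and standard deviations are rounded and the true group sizes may differ.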

Hypothesis tests. We predicted that participants who experienced the high-maintenance data entry interaction would correctly answer significantly fewer GRE questions relative to those who experienced the low-maintenance data entry interaction or who performed the data entry task by themselves. A one-way analysis of variance (ANOVA) predicting the number of GRE questions answered correctly from the experimental manipulation revealed a significant difference between conditions, F(2, 51) = 6.67, p < .01. To gain insight into this omnibus difference, we created two dummy variables to compare (a) the low-maintenance participants with the high-maintenance participants and (b) the alone participants with the high-maintenance participants. As depicted in Figure 3, results supported our predictions: Compared with the high-maintenance participants, the low-maintenance participants correctly answered 45% more GRE questions, F(1, 51) = 8.90, p < .01, and the alone participants correctly answered 50% more, F(1, 51) = 12.35, p < .001. Also as predicted, a separate analysis failed to reveal significant differences between the low-maintenance participants and the alone participants, F(1, 51) < 1.00.

Auxiliary analyses. To discern whether mood and/or self-efficacy accounted for the effect of the social coordination manipulation on poor GRE performance, we conducted an additional multiple regression analysis predicting GRE score from the social coordination manipulation and both possible mediators. Results revealed that the social coordination manipulation predicted unique variance in GRE performance, F(2, 49) = 6.79, p = .003, whereas mood and self-efficacy did not, Fs(1, 49) < 1.00. A follow-up analysis added liking for the interaction partner to this model and also revealed that the social coordination manipulation predicted unique variance in GRE performance, F(2, 17) = 8.12, p = .01, whereas mood, self-efficacy, and liking did not, Fs(1, 17) < 1.22, ps > .28.
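For readers who want to see the arithmetic behind an omnibus one-way ANOVA such as the one reported under Hypothesis tests, the sketch below computes F as the ratio of between-group to within-group mean squares. The function name `one_way_anova` is hypothetical and the scores are made-up numbers for illustration only; they are not the Study 2 data. With three conditions, the reported denominator degrees of freedom of 51 would correspond to 54 analyzed participants, assuming no missing data.

```python
def one_way_anova(groups):
    """One-way between-subjects ANOVA.

    groups: list of lists of scores, one inner list per condition.
    Returns (F, df_between, df_within).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-groups sum of squares: group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-groups sum of squares: scores around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Made-up scores for three hypothetical conditions (NOT the study's data)
hypothetical = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value, df_b, df_w = one_way_anova(hypothetical)
```

The planned comparisons in the text reuse the same within-group error term, which is why each carries 51 denominator degrees of freedom.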
To examine whether including subjectively experienced depletion in the model altered conclusions, we conducted a multiple regression analysis predicting GRE performance from the social coordination manipulation and the subjectively experienced depletion measure. Results revealed that the social coordination manipulation predicted unique variance in GRE performance, F(2, 50) = 6.66, p = .003, whereas the subjectively experienced depletion measure did not, F(1, 50) < 1.00. Overall, these results provide no support for the notion that mood, self-efficacy, liking for the interaction partner, or subjectively experienced depletion mediates the association of high-maintenance interaction with impaired self-regulation.

Summary. The results from Study 2 extended those from Study 1 in suggesting that high-maintenance interaction causes impaired self-regulation when compared with a low-maintenance condition or a control condition. These findings suggest that high-maintenance interaction impairs self-regulation but that low-maintenance interaction does not enhance it. This effect was not attributable to subjectively experienced depletion, mood, self-efficacy, or liking for the partner.

3 A procedural error meant that only 22 of the participants completed this three-item measure.


Figure 3. Study 2: The number of Graduate Record Exam (GRE) problems participants answered correctly as a function of whether they had previously engaged in a high-maintenance or a low-maintenance interaction with a confederate, or had previously performed the task alone.

Study 3: Maze Task (Revisited)

Studies 1 and 2 provide good support for the hypothesis that high-maintenance interaction causes impaired self-regulation. The primary goal of Study 3 was to provide a stronger test of the mediation and confound analyses by assessing the three possible mechanisms that seemed most plausible to us (subjectively experienced depletion, mood, and self-efficacy) between the high-maintenance manipulation and the self-regulatory task rather than after it (as done in Studies 1 and 2). Given that this goal involves establishing the stability of the effects demonstrated in Studies 1 and 2, Study 3 directly replicated procedures from these previous studies: Participants completed a maze task with a confederate (as in Study 1) and then performed a GRE task (as in Study 2).

Method

Participants. Participants were 46 undergraduates (24 women) who volunteered to take part in the study in partial fulfillment of the requirements for an introductory psychology course. These participants were 19.24 (SD = 1.52) years old on average, and most were Caucasian (4% African American, 7% Asian American, 87% Caucasian, and 2% other).

Procedure. The procedures for Study 3 were borrowed directly from previous studies. Participants experienced the same confederate-based maze task as used in Study 1 and the same GRE task as used in Study 2. After participating in the maze task but before participating in the GRE task, participants completed a brief questionnaire (“I feel . . .”) including an elaborated, seven-item subjectively experienced depletion measure (mentally exhausted; motivated, reverse-scored; drained; energetic, reverse-scored; worn out; lazy; and focused, reverse-scored); an elaborated, seven-item mood measure (happy, content, cheerful, angry, frustrated, annoyed, and sad; the four negative mood items were reverse-scored); and a three-item self-efficacy measure (competent, capable, confident). After completing the GRE task, participants also completed the same four-item manipulation check measure used in Studies 1 and 2 to assess the degree to which they experienced the interaction with the confederate as a high-maintenance interaction. The depletion (α = .68), mood (α = .77), self-efficacy (α = .82), and high-maintenance interaction (α = .71) measures all exhibited acceptable scale reliabilities.

Results and Discussion

Manipulation check. Unlike the manipulation check findings from the identical task in Study 1 and from the conceptually similar task in Study 2, results from an independent-samples t test did not reveal significant differences between the high-maintenance (M = 2.12, SD = 0.87) and the low-maintenance (M = 1.98, SD = 0.80) conditions in predicting subjectively experienced high-maintenance interaction, β = .08, |t(44)| < 1.00, although means were descriptively in the sensible direction. We continued with hypothesis tests despite the nonsignificant manipulation check because (a) this manipulation has been effective previously (in Study 1), (b) theory dictates that the task fits the criteria for a high-maintenance interaction, and (c) significant effects of the social coordination manipulation on GRE performance in the absence of a significant manipulation check could provide preliminary support for the intriguing idea that high-maintenance interaction can impair subsequent self-regulation even when the individual fails to recognize consciously that the interaction had been a high-maintenance one.

Hypothesis tests. To test the hypothesis that participants who experienced the high-maintenance maze interaction would correctly answer significantly fewer GRE questions relative to those who experienced the low-maintenance maze interaction, we performed an independent-samples t test predicting GRE score from the social coordination manipulation. As depicted in Figure 4, results revealed that participants who had been assigned to the low-maintenance condition solved 35% more GRE problems than


Figure 4. Study 3: The number of Graduate Record Exam (GRE) problems participants answered correctly as a function of whether they had previously engaged in a high-maintenance or a low-maintenance interaction with a confederate.

did those who had been assigned to the high-maintenance condition, t(44) = –2.68, p = .01. An exploratory multiple regression analysis examining whether the strength of this effect differed as a function of participant sex revealed a significant Social Coordination Condition × Participant Sex interaction effect, β = .31, t(42) = 2.28, p = .03. This analysis also revealed a significant main effect for the social coordination manipulation, β = –.36, t(42) = –2.64, p = .01, but not for participant sex, β = .00, t(42) = 0.03, p = .97. Follow-up analyses revealed that the high-maintenance interaction effect was in the expected direction for both sexes but stronger for females. Given that we did not predict this sex difference, that the general trends were in the expected direction for both sexes, and that the social coordination main effect remained robust in the analysis controlling for participant sex and the interaction effect, we awaited the results of Studies 4 and 5 before drawing firm conclusions about sex differences.

Auxiliary analyses. To discern whether mood and/or self-efficacy accounted for the effect of the social coordination manipulation on GRE performance, we conducted an additional multiple regression analysis predicting GRE score from social coordination and both of these possible mediators. Results revealed that the social coordination manipulation predicted unique variance in GRE performance, β = –.32, t(42) = –2.13, p = .04, whereas mood and self-efficacy did not, |βs| < .15, |ts(20)| < 1.00. To examine whether including subjectively experienced depletion in the model altered conclusions, we conducted a multiple regression analysis predicting GRE performance from the social coordination manipulation and the subjectively experienced depletion measure. Results revealed that the social coordination manipulation predicted unique variance in GRE performance, β = –.59, t(21) = –3.19, p = .004, whereas the subjectively experienced depletion measure did not, β = .07, |t(21)| < 1.00. These results once again provide no support for the notion that mood, self-efficacy, or subjectively experienced depletion mediates the association of high-maintenance interaction with impaired self-regulation.

Summary. Complementing previous findings, then, the Study 3 results suggest that high-maintenance interaction causes impaired self-regulation. This effect was not attributable to subjectively experienced depletion, mood, or self-efficacy.

Study 4: Social Problem Solving

Studies 1 through 3 provide strong support for the hypothesis that high-maintenance interaction causes impaired self-regulation. These studies all used conceptually similar procedures: Participants, who were not allowed to speak to or see their partner, engaged in a nonemotional dyadic task with a confederate who made the interaction either high maintenance or low maintenance. The primary goal of Study 4 was to examine whether the high-maintenance interaction effect would emerge when we used substantially different procedures designed to address the limitations of those used in Studies 1 through 3. Participants in Study 4 were assigned to provide guidance or comfort to an emotionally distressed stranger (who was actually a confederate). We (a) investigated high-maintenance interaction by manipulating whether or


not this distressed stranger was receptive to participants’ efforts to help and (b) hypothesized that the poor social coordination resulting from the repeated and ineffective efforts to help the stranger who was unreceptive would lead to impaired self-regulation on a subsequent, unrelated task. The Study 4 procedures differed in three important ways from those used in the previous studies. First, participants were placed in an active instead of a passive role: Rather than being dependent in the high-maintenance interaction condition on the confederate’s poor directions (as in the previous studies), participants were now in the agentic role of attempting to help the confederate with a problem, offering reasonable suggestions that simply failed to promote synchronized dialogue. Second, the experimental procedures placed no constraints on the participant’s behavior: Rather than having to stay silent throughout the task, participants were now allowed to communicate freely in any way that felt appropriate to them. Third and finally, participants in Study 4 experienced an emotionally involving and ecologically valid task: Rather than engaging in minimally involving and artificial tasks (dyadic data entry without being able to see the person reading the numbers or navigating a computerized maze without being able to see the computer monitor), participants were now placed into a deeply involving context that served as a realistic analogue for situations they were likely to experience in their own lives. Furthermore, performing these tasks well in their own lives influences the quality of their interpersonal relationships. 
Replicating the high-maintenance interaction effect with these substantially revised procedures would rule out alternative explanations for the findings of Studies 1 through 3 (e.g., that the effect is unique to being in a passive role, to situations in which one experiences externally imposed restraints on one’s communication, or to tasks that are uninvolving or artificial). In addition to ruling out alternative explanations for the high-maintenance interaction effect and exploring its boundary conditions, the procedures used in Study 4 can shed light on the psychological dynamics underlying a particularly robust and important empirical finding in clinical psychology: that experiencing depression predicts being socially rejected (Segrin & Dillard, 1992; see also Coyne, Thompson, & Palmer, 2002). Although abundant research has investigated why these associations exist, scholars have not definitively identified what particular interpersonal dynamics result in the rejection of individuals who are experiencing depression (e.g., Coyne, 1990; Segrin & Dillard, 1992). We suggest that a heretofore unexplored reason why relationship partners tend to be rejecting is that interacting with individuals who are experiencing depression frequently requires exertions of effort that can be depleting; Study 4 represents an experimental test of this hypothesis. Why might it be depleting to interact with individuals who are experiencing depression? We suggest that the depletion results from the attributional tendencies of individuals experiencing depression regarding their negative circumstances. Individuals experiencing depression tend to exhibit a distinctive and hopeless attribution style that increases the likelihood that attempts to help them will prove ineffective (cf. Seligman, Abramson, Semmel, & von Baeyer, 1979).
To be comforted by others virtually requires that one be receptive to suggestions, or at least to the possibility that there exists some course of action that could make a difference. Individuals exhibiting the depressive attribution style, however, are unlikely to be receptive; no matter what suggestion they receive, they are unlikely to see potential for improvement. As such, repeated attempts to help them may well render the interaction decidedly unsynchronized. What happens, in contrast, when one tries to comfort an individual who is experiencing distress but who does not exhibit the depressive attribution style? Such interaction, we suggest, is less depleting because such a person is likely to be receptive to suggestions, thereby allowing the conversation to progress in a more satisfying direction. Comforting such a person does not require repeated exertions to generate new ideas oriented toward being helpful and providing traction for progressing dialogue. Of particular relevance to the current article, interaction with a person who is distressed but not characterized by the depressive attribution style is likely to be better coordinated than that with a person who is distressed and also characterized by the depressive attribution style. To test the idea that interacting with an individual exhibiting both distress and the depressive attribution style is more depleting than is interacting with an individual exhibiting distress but not the depressive attribution style, we designed a problem-solving task that entailed two roles: talker and advisor. Talkers (always the confederate) generated a personal problem they were willing to share and receive help with solving, and advisors (always the participant) listened to the talker share the problem and then offered suggestions or advice. The goal of the task was to work toward possible solutions to the talker’s problem. Talkers (confederates) always discussed the identical distressing problem. In the low-maintenance (nondepressed) condition, they were receptive to the advisor’s suggestions or advice; in the high-maintenance (depressed) condition, they were not.
In addition to this new interaction task, the present study also used handgrip stamina as a new measure of self-regulatory resources. In addition to assessing physical strength, performing well on the handgrip stamina task requires self-regulatory exertion: After squeezing the handgrip for a short period of time, hand muscles become fatigued and the person feels the urge to relax the muscles. Self-regulation requires overcoming this fatigue and pushing oneself to continue, similar to other forms of stamina. (Ciarocco et al., 2001, p. 1160; see also Muraven et al., 1998)

If results revealed that experiencing a high-maintenance interaction results in impaired physical stamina, this would complement our previous findings to suggest that high-maintenance interaction results in a relatively global impairment in self-regulatory functioning.

Method

Participants. Participants were 37 undergraduates who volunteered to take part in the study in partial fulfillment of the requirements for an introductory psychology course. We dropped 5 participants (3 because of suspicion regarding our experimental procedures, 1 because of equipment failure, and 1 whose handgrip times exceeded 3 standard deviations from the mean for her sex, whereas no other participant was even 2 standard deviations from it), leaving a sample of 32 participants (17 women) who were 19.52 (SD = 1.12) years old on average and predominantly Asian American and Caucasian (31% Asian American, 47% Caucasian, 9% Hispanic, and 9% other; 1 participant did not report race information).


Procedure. As described earlier, Study 4 was designed to parallel Studies 1 through 3 in its core structural features (manipulating whether social coordination was efficient or inefficient and then assessing self-regulation with a behavioral measure) but to be dissimilar in the particular procedures in which these structural features were embedded. Participants reported to the experiment and waited outside the laboratory. Waiting with them once again was a same-sex “participant” who was actually a confederate of the experimenter. The experimenter greeted the participant and the confederate and led them to a pleasant room where they were seated on an L-shaped sofa in predetermined positions so they could easily look at each other. After they were situated, the experimenter explained that she needed to finish setting things up before they could begin the experiment. She excused herself, leaving the participant and the confederate alone in the room together for 3 min. We included this seemingly impromptu acquaintance period so they could establish a modicum of rapport to facilitate the upcoming self-disclosure task (explained later). The confederate was instructed to initiate conversation if the participant did not. At this point, nobody (not even the confederate or the experimenter) knew to which experimental condition the participant had been assigned. The confederate’s behavior was neutral enough to facilitate smooth and believable transitions to his or her subsequent role in either the experimental or the control condition. When the experimenter returned, she apologized for the delay and explained the procedures for the handgrip task to the participant and the confederate.
To minimize the likelihood that the participant would become suspicious that the handgrip and the self-disclosure tasks were linked, the experimenter explained that she was collecting pilot data for a sports psychologist at another university and that two assessments would be taken at separate times and then averaged to get the most accurate estimates. The first assessment took place at this time and provided a baseline measure of physical stamina preceding any experimental manipulation. The experimenter took the participant and the confederate, one at a time, to a separate room for this initial assessment to minimize evaluation apprehension and possible competitiveness. She instructed them to squeeze the handgrip closed around an eraser for as long as possible; when the eraser dropped from the handgrip, she stopped the timer. Once the participant and the confederate had completed the baseline handgrip assessment and were again situated on the couch, the experimenter introduced the “cooperative problem-solving task.” Roles for the task were ostensibly assigned randomly by having each person select a piece of paper out of a basket, but the procedure was rigged so the participant was always assigned to the role of advisor and the confederate was always assigned to the role of the talker. The experimenter explained that the talker would “begin the conversation by sharing a personal problem he has been dealing with lately.”4 She instructed the talker that the personal problem should be something that he feels comfortable discussing and reassured him that neither participant would be forced to talk about anything that makes him uncomfortable. 
She continued by instructing the talker to “pick a problem that has been bothering you recently and something that you could use some help solving; it can be anything from roommate trouble or relationship problems, to a conflict with parents, or something more general.” After acknowledging that it sometimes takes people a few minutes to think of a personal problem they are willing to discuss, she asked him whether anything came to mind. He responded, “Um, yeah, I think I have something I could talk about.” The experimenter then explained that the advisor’s job was to listen to the talker describe the personal problem, after which the talker and the advisor would engage in a problem-solving discussion. She explained that “the advisor should feel free to offer advice or suggestions, just as you would with a friend. Together, your goal is to come up with possible solutions to the problem at hand.” Before she left, the experimenter handed each of them a sheet of paper with an outline of the instructions to remind them of their respective roles

and the goal of the task. On the paper handed to the talker (confederate) was a number indicating the experimental condition for that particular session. To keep her blind, the experimenter did not know which number was linked to which condition. Once the experimenter had left the room and closed the door, the talker began describing the problem, which was scripted as follows for both conditions: Well, ok, this is a little weird, but I guess there is something I could talk about. The main problem that I’ve been having is adjusting to life at college— or just trying to find my place here. I don’t know—I was really looking forward to coming to college, but so far I really haven’t been that happy here. I haven’t really liked many of my classes—so I don’t have any clue what I will major in. I get along with my roommate really well, but other than that, I haven’t really met that many cool people— or anyone that I really have much in common with. It just seems like maybe this isn’t the right place for me. I mean, I knew that there would be an adjustment period in the beginning— when I first got here— but it seems like by now everyone I know is really happy here and I’m not, really. And my old friends from high school—who go to other schools—seem to be really settled in having a lot of fun. So I just feel like, basically, I’m not having that much fun and I’m not even learning that much. So, I’m starting to wonder if maybe I made the wrong decision in coming to Northwestern. Before the first session, both the male and the female confederate practiced this description dozens of times until we were satisfied that each was both convincing and consistent. Once the talker described his problem to the advisor, they proceeded to the problem-solving discussion. The talker described the same problem in both the high-maintenance and the low-maintenance conditions; these conditions were differentiated by his responses to the advisor’s suggestions. 
In the depressed (high-maintenance) condition, his responses were generally pessimistic about the likelihood of improvement regarding the problem; he deflected the advisor’s suggestions as unlikely to improve the situation. In the nondepressed (low-maintenance) condition, in contrast, he was less pessimistic and more receptive to the advisor’s suggestions. Even in the nondepressed condition, though, he never became happy or exhibited signs that he thought the problem was resolved; rather, he remained distressed about the problem but was receptive to the advisor’s attempts to help with problem solving. The talkers (confederates) were trained until they felt comfortable spontaneously generating responses while staying “in character.” Most important, we emphasized to them that “the key difference [between the two conditions] is that you are open to suggestions and the possibility of improvement” only in the nondepressed condition. After the talker and advisor had discussed the talker’s problem for 6 min, the experimenter returned and explained that because the interaction task was now complete, the talker and the advisor would be separated to do the second handgrip measure and to complete some questionnaires. The experimenter led the confederate into another room and returned immediately to assess the participant’s postinteraction handgrip score. Participants then completed a brief questionnaire (“After conversation, I felt . . .”), including a three-item subjectively experienced depletion measure (emotionally depleted, drained, and mentally tired), a seven-item mood measure (calm, content, happy, frustrated, annoyed, angry, and sad; the four negative mood items were reverse-scored), and a two-item self-efficacy measure (capable of accomplishing my goals, confident in my abilities). Finally, participants completed a four-item manipulation check

4 As mentioned earlier, the participant interacted with a same-sex confederate. In explaining the procedures, we describe male sessions because this focus allows us to use female pronouns (e.g., “she”) to refer to the experimenter (always female) and male pronouns to refer to the participant or the confederate. It also allows us to avoid tortured pronoun combinations such as “he or she.”


Figure 5. Study 4: The percentage reduction in physical stamina as a function of whether participants had previously interacted with a distressed person either exhibiting the depressive attribution style (high-maintenance interaction) or not (low-maintenance interaction).

modified for the current study to assess the degree to which they experienced the interaction as a high-maintenance interaction (e.g., “The conversation went very smoothly,” reverse-scored; “I felt comfortable giving advice,” reverse-scored). The depletion (α = .90), mood (α = .76), and self-efficacy (α = .87) measures exhibited acceptable reliabilities; the reliability of the high-maintenance interaction measure was somewhat lower (α = .59).

Results and Discussion

Manipulation check. Unlike the manipulation check findings from Studies 1 and 2 but like those from Study 3, results from an independent-samples t test did not reveal significant differences between the high-maintenance (M = 2.13, SD = 0.80) and the low-maintenance (M = 1.95, SD = 0.50) conditions in predicting subjectively experienced high-maintenance interaction, β = .14, |t(29)| < 1.00, although means were descriptively in the sensible direction. For reasons similar to those advanced in Study 3, we continued with hypothesis tests despite the nonsignificant manipulation check.

Hypothesis tests. To test the hypothesis that participants who experienced the high-maintenance problem-solving interaction would exhibit greater decrements in physical stamina from before to after this interaction relative to those who experienced the low-maintenance one, we performed a mixed-model ANOVA in which time (the first vs. the second handgrip assessment) was a within-subjects variable and the social coordination manipulation and participant sex were between-subjects variables. As predicted, results revealed a significant Time × Condition interaction effect, such that the decrement in physical stamina between the preinteraction and postinteraction assessments was larger in the high-maintenance condition (M = 27.88 s, SD = 32.67 s) than in the low-maintenance condition (M = 8.20 s, SD = 27.11 s), F(1, 28) = 5.42, p = .03. This Time × Condition interaction effect was

not significantly moderated by participant sex, F(1, 28) ⫽ 1.77, p ⫽ .19. As depicted in Figure 5, participants in the highmaintenance condition showed a 33% decrement from preinteraction to postinteraction physical stamina (from 84.09 s to 56.22 s), whereas those in the low-maintenance condition showed a 15% decrement (from 54.72 s to 46.52 s).5 Auxiliary analyses. To discern whether mood and/or selfefficacy accounted for the effect of the social coordination manipulation on the decrement in physical stamina scores, we conducted a mixed-model multiple regression analysis in which time was a within-subjects variable, the social coordination manipulation was a dichotomous between-subjects variable, and both mood and self-efficacy were continuous between-subjects variables. Results revealed that the Time ⫻ Condition interaction effect was significant, F(1, 28) ⫽ 5.23, p ⫽ .03, whereas the Time ⫻ Mood and the Time ⫻ Self-Efficacy interaction effects were not, Fs(1, 28) ⬍ 1.17, ps ⬎ .29. To examine whether including subjectively experienced depletion in the model altered conclusions, we conducted a mixed-model multiple regression analysis in which time was a within-subjects variable, the social coordination manipulation was a dichotomous between-subjects variable, and the subjectively experienced depletion measure was a between-subjects continuous variable. Results revealed that the Time ⫻ Condition interaction effect remained (marginally) significant, F(1, 29) ⫽ 3.93, p ⫽ 5

An auxiliary analysis examining whether the preinteraction (and premanipulation) stamina scores differed across the experimental conditions failed to reveal evidence that they did, t(30) ⫽ 1.41, p ⫽ .17. Even so, the pattern of means in this study leaves open the alternative explanation that our results are due to regression to the mean. Given that Studies 1, 2, 3, and 5 are not susceptible to this alternative explanation, the most parsimonious explanation for the Study 4 results is that experiencing the highmaintenance interaction impaired physical stamina.

FINKEL ET AL.

468

.057, whereas the Time ⫻ Subjectively Experienced Depletion interaction effect was not, F(1, 29) ⬍ 1.00. These results once again provide no support for the notion that mood, self-efficacy, or subjectively experienced depletion mediates the association of high-maintenance interaction with impaired self-regulation. Summary. Complementing previous findings, then, the Study 4 results suggest that high-maintenance interaction causes impaired self-regulation on a physical stamina task, an effect that was not mediated by subjectively experienced depletion, mood, or self-efficacy. Given that the social problem-solving procedures used in Study 4 are strikingly different from, and more ecologically valid than, the maze and data entry procedures used in Studies 1 through 3, we gain confidence in the generality of the adverse effects of high-maintenance interaction on subsequent self-regulation. As in Study 3, the predicted results emerged in Study 4 even though the manipulation check failed to reveal significant differences across the two conditions in the degree to which participants reported that the interaction was high maintenance. This pattern of results again raises the intriguing possibility that high-maintenance interaction can impair self-regulation even when individuals do not realize that they have experienced a high-maintenance interaction in the first place. Study 5 was designed to provide a rigorous test of this idea.
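The percentage decrements reported for Study 4 follow directly from the condition means for the two handgrip assessments. The short sketch below only redoes that arithmetic from the means quoted in the text; the dictionary layout and the function name `percent_decrement` are illustrative assumptions, not anything from the original analysis.

```python
# Condition means in seconds (preinteraction, postinteraction), as quoted in
# the Study 4 results. No new data are introduced here.
pre_post = {
    "high-maintenance": (84.09, 56.22),
    "low-maintenance": (54.72, 46.52),
}


def percent_decrement(pre, post):
    """Drop from pre to post, expressed as a percentage of the pre score."""
    return 100.0 * (pre - post) / pre


decrements = {}
for condition, (pre, post) in pre_post.items():
    decrements[condition] = percent_decrement(pre, post)
    print(f"{condition}: drop of {pre - post:.2f} s, "
          f"{decrements[condition]:.0f}% decrement")
```

Rounded to whole percentages, these values match the 33% and 15% decrements in the text. The raw drop for the high-maintenance condition comes out at 27.87 s rather than the reported mean decrement of 27.88 s, which is presumably a rounding difference in the published means.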

Study 5: Nonconscious Behavioral Mimicry

In addition to providing strong support for the hypothesis that high-maintenance interaction impairs individual-level self-regulation on subsequent, unrelated tasks, Studies 1 through 4 also revealed a striking lack of support for the possibility that this effect is mediated through plausible conscious processes (subjectively experienced depletion, mood, self-efficacy, or liking for the interaction partner). This reliable pattern of findings is consistent with the possibility that high-maintenance interaction impairs self-regulation without requiring high-level cognitive mediation.

Building on the plausible notion that humans are constantly, nonconsciously attuned to their social coordination experiences, particularly to social coordination failures, in their everyday lives, we incorporated in Study 5 a subtle manipulation of high-maintenance interaction in which participants were not consciously aware that the social coordination had been inefficient, or even that social coordination issues were relevant. This design differed from those used in Studies 1 through 4 in that the manipulations in those previous studies involved unambiguous instances of poor social coordination; participants in the high-maintenance interaction conditions, for example, surely recognized that the confederate was guiding them poorly on the maze and data entry tasks (Studies 1 through 3) and was being unreceptive to their well-intentioned suggestions (Study 4).

As in Studies 1 through 4, we manipulated social coordination by having participants engage in a dyadic task with a confederate. Unlike these previous studies, however, the Study 5 procedure manipulated social coordination without influencing performance on the dyadic task. Whereas successful performance on the maze task (Studies 1 and 3), the data entry task (Study 2), and even the problem-solving task (Study 4) obviously depended on the confederate’s behavior across the social coordination conditions, successful performance on the task used in Study 5 did not. In addition, the Study 5 procedure manipulated poor social coordination without participants’ even being aware that they were experiencing it in the first place. To accomplish this, we adapted procedures from the burgeoning literature on nonconscious behavioral mimicry (e.g., Chartrand & Bargh, 1999). In these studies, half of the participants interact with a confederate who subtly mimics their mannerisms and gestures (the low-maintenance interaction, or mimicry, condition) and the other half interact with a confederate who subtly but deliberately stays out of sync with their mannerisms and gestures (the high-maintenance interaction, or misalignment, condition).

Our decision to use behavioral mimicry and antimimicry (misalignment) procedures to manipulate social coordination nonconsciously builds on the idea (initially expressed in the Introduction) that social interaction is remarkably complex. Strategies for navigating most of this complexity are so deeply rooted in the knowledge base of a healthy adult that they are generally implemented without effort or even conscious awareness. Individuals are rarely required, for example, to concentrate effortfully on subtle but crucial aspects of social interaction such as where to stand, where to focus one’s gaze, how much distance to leave between themselves and their interaction partner, what language to speak, and so on. There are, however, instances in which these bedrock components of social coordination break down, and such breakdowns vary widely in how salient they are. At the salient extreme, it would be disconcerting to negotiate a price with a plumber who insists on addressing all communication to your forearm or to collaborate with a colleague who only communicates with you while rubbing your head.

Toward the nonsalient extreme, abundant evidence has been emerging to suggest that subtle behavioral coordination is a fundamental aspect of interpersonal interaction (see Bernieri, 1988; Chartrand & Bargh, 1999; Chartrand, Maddux, & Lakin, 2005; Lakin, Jefferis, Cheng, & Chartrand, 2003). For example, when individuals shake their foot or touch their face during a social interaction unrelated to these body movements, the person with whom they are interacting also engages in foot-shaking or face-touching behaviors, and these spontaneous mimicry behaviors are enacted without any conscious awareness that mimicry is taking place (Chartrand & Bargh, 1999). Despite its subtlety, however, we suggest that poorly synchronized behavioral mimicry can render otherwise efficient social interaction more complex, requiring, at a nonconscious level, heightened attention to social coordination processes. The increased vigilance required during interaction characterized by such social misalignment, we argue, transforms it into high-maintenance interaction and increases the likelihood of impaired self-regulation on subsequent, unrelated tasks.

In addition to using a new method to manipulate social coordination, Study 5 also introduced a new method to assess self-regulation, a task measuring fine motor control. If results revealed that experiencing a high-maintenance interaction results in impaired fine motor control, this would provide additional evidence of the generality of self-regulatory impairment resulting from high-maintenance interaction.

Method

Participants. Participants were 29 undergraduates (18 women) who participated in exchange for $20.

Procedure. As in Studies 1 through 4, Study 5 incorporated a procedure that built on the two-task paradigm: First, participants interacted with a (female) confederate; second, they performed a self-regulatory task on their own. The experimenter instructed the participant and the confederate to engage in a picture description task, in which they took turns describing to one another a series of 12 color pictures selected from magazines such as Time and National Geographic. The experiment was rigged such that the confederate described the same 6 of the 12 pictures in every session; this allowed her to memorize a prepared script for those pictures and deliver her descriptions with natural hesitations and vocal disfluencies to make her responses appear spontaneous. Given that the picture description task was intended to manipulate nonconscious behavioral mimicry but not intimacy, we strived to minimize the degree to which the interactants engaged in self-disclosure. As such, the experimenter informed the interactants that their task was to provide a factual description of the pictures rather than to discuss their emotional or intellectual responses to them. These procedures differed essentially from the emotionally involving ones used in Study 4 and provided the first task in which participants could talk with the confederate (in contrast to Studies 1 through 3) on a task that is not emotional in nature (in contrast to Study 4). In addition, Study 5 was the first in which the participant and the confederate occupied equivalent roles. After explaining the procedures of the picture description task, the experimenter gave the confederate and the participant the predetermined set of pictures. The interactants then took turns describing their pictures (without showing the other person the picture they were describing) until they had described all 12 pictures. The experimenter always casually asked the confederate to begin first. During this picture description task, the experimenter maintained a neutral body position (feet flat on the floor, hands folded in her lap, and sitting upright in her chair) to avoid influencing the mimicry manipulation. The picture description task took approximately 30 min (M = 29.07, SD = 5.01) to complete.

We used this picture description task as a medium through which to incorporate our social coordination manipulation. Participants were randomly assigned to work with a confederate who either subtly mimicked or antimimicked their physical mannerisms and gestures during the task. In the mimicry condition, the confederate unobtrusively mimicked the participant with a slight variation and a delay of 1 or 2 s. For example, when participants crossed their legs, the confederate would do the same after enough of a delay so the mimicry would not be obvious. In the misalignment condition, the confederate unobtrusively engaged in antimimicry behaviors, such that her body language was always out of sync with that of the participant. For example, if participants sat still and upright in their chair, the confederate might fidget and lean forward.

Following the picture description task, the experimenter told the interactants that they would complete the rest of the experiment in separate rooms. After dismissing the confederate, the experimenter returned to the participant and informed him or her that the next task would be to play the game Operation,⁶ which is a commercial board game for children that involves removing up to 12 fake body parts from a cartoon patient using a tweezer-like device (see Vohs et al., 2005, Study 7). Each of the 12 fake body parts rested in a shallow pit surrounded by metal edges. Whenever the participant inadvertently touched the tweezers against the metal edges, the game emitted a loud buzzing noise and the cartoon patient’s nose glowed red.
The experimenter explained that the participant’s tasks were (a) to remove each of the body parts in a smooth movement in which the tweezers did not touch the metal edges and (b) to do so as quickly as possible. Participants attempted to remove the 12 body parts in a predetermined order. If participants accidentally touched the tweezers to the metal edges, setting off the buzzing noise and the reddened nose, they were required to remove the tweezers and initiate a new attempt to remove that particular piece. Participants were allowed to give up on any particular piece and move on to the next one with the understanding that they could not go back and attempt to remove that piece again; deciding to move on without successfully removing the piece would represent a failure to perform optimally on the task.


Given that Operation is designed for children as young as 6 years of age, almost all adults are capable of removing all the pieces eventually if they have sufficient motivation and concentration to do so. As such, our central measure of impaired self-regulation was removal failures, or the number of pieces participants never removed. We also examined the effect of the mimicry manipulation on removal efficiency, or the ratio of the number of pieces successfully removed divided by the total number of removal attempts the participant made. We included this second dependent measure because it provided information relevant to the question of why high-maintenance interaction impairs self-regulation. One possibility is that individuals who have experienced a high-maintenance interaction perform poorly on subsequent tasks not because they are ineffective at performing them but rather because they do not even bother to try to perform them in the first place. If this is the case, results should reveal that participants who have experienced a high-maintenance interaction successfully remove fewer pieces even though they are just as effective at removing a piece on any given removal attempt (i.e., that they should have more removal failures, even though their removal efficiency is no worse). A second possibility is that individuals who have experienced a high-maintenance interaction perform poorly on subsequent tasks because they perform them sloppily. If this is the case, results should reveal that participants who have experienced a high-maintenance interaction make just as many (if not more) attempts to remove the body parts relative to those who have experienced the low-maintenance interaction but that each given attempt is less likely to be successful (i.e., that they should have both more removal failures and also poorer removal efficiency).

After completing the Operation task, participants completed a brief questionnaire (“When I interacted with the other participant, . . .”), including a five-item subjectively experienced depletion measure (mentally exhausted; motivated, reverse-scored; drained; energetic, reverse-scored; and worn out), a six-item mood measure (happy, content, cheerful, angry, dejected, and sad; the three negative mood items were reverse-scored), a four-item self-efficacy measure (competent, capable, confident, and self-assured), and a two-item questionnaire assessing liking for the interaction partner (“I liked the other participant,” and “The other participant strikes me as someone I could be friends with”). Finally, they completed the same four-item measure used in Studies 1 through 3 to assess the degree to which they experienced the interaction with the confederate as a high-maintenance interaction. The depletion (α = .84), mood (α = .77), self-efficacy (α = .89), and liking (α = .76) measures all exhibited acceptable scale reliabilities; the high-maintenance interaction (α = .59) measure exhibited poorer reliability than the identical measure exhibited in Studies 1 through 3.⁷ Consistent with previous research using mimicry procedures (see Chartrand et al., 2005, for a review), a thorough funnel debriefing failed to identify any participants who were aware that they had been mimicked or antimimicked, or even that the study had anything to do with behavioral coordination.
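The scale reliabilities reported above are Cronbach's alpha coefficients. As a reminder of what that statistic summarizes, the sketch below implements the standard formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores). The article reports no item-level data, so the ratings here are invented purely for illustration, and the function name `cronbach_alpha` is an assumption rather than anything used by the authors.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: list of k lists, each holding one item's scores for the same
    n respondents, in the same respondent order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Each respondent's total score across the k items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical ratings from five respondents on a two-item scale.
# These numbers are made up; they are NOT data from the study.
example_items = [
    [5, 3, 6, 2, 4],
    [4, 3, 7, 2, 5],
]
alpha = cronbach_alpha(example_items)
print(f"alpha = {alpha:.2f}")
```

Alpha rises as the items covary more strongly relative to their individual variances, which is why dropping a weakly related item (as described in the footnote on the four-item high-maintenance measure) can raise it.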

⁶ We thank Kathleen Vohs for suggesting this task.

⁷ Dropping one of the four items on this high-maintenance interaction measure improved its reliability somewhat (α = .68). When substituting in this reduced, three-item measure instead of the four-item measure, results were essentially the same. This fact, in conjunction with the fact that the four-item measure was used in Studies 1 through 3, led us to stick with the full, four-item measure.

Results and Discussion

The mimicry manipulation and subjective experiences of high-maintenance interaction. Before performing hypothesis tests, we wanted to discern whether participants in the high-maintenance (misalignment) condition subjectively experienced the interaction with the confederate as more high maintenance than did those in the low-maintenance (mimicry) condition. Consistent with the findings from Studies 3 and 4, results revealed that participants in the high-maintenance interaction (misalignment) condition (M = 2.38, SD = 0.65) did not report that the interaction was significantly more high maintenance than did those in the low-maintenance interaction (mimicry) condition (M = 2.05, SD = 0.67), t(26) = 1.29, p > .20, although means were descriptively in the sensible direction.

Figure 6. Study 5: The number of removal failures as a function of whether participants had previously engaged in a social interaction characterized by misalignment (high-maintenance interaction) or mimicry (low-maintenance interaction).

Hypothesis tests. We predicted that participants who were assigned to the high-maintenance interaction (misalignment) condition would exhibit a greater number of removal failures relative to those who were assigned to the low-maintenance interaction (mimicry) condition. As depicted in Figure 6, results from an independent-samples t test revealed strong support for this prediction, t(27) = 2.85, p < .01. Although participants in both conditions successfully removed most of the pieces, participants who had experienced the high-maintenance (misalignment) interaction exhibited 86% more removal failures relative to those who had experienced the low-maintenance (mimicry) interaction. We also examined the effect of the social coordination manipulation on removal efficiency to discern whether participants who were assigned to the high-maintenance interaction (misalignment) condition performed worse than those who were assigned to the low-maintenance interaction (mimicry) condition on the Operation task because (a) they simply failed to make attempts to remove as many pieces or (b) they were less effective at removing the pieces they attempted to remove (poor removal efficiency). As depicted in Figure 7, results from an independent-samples t test revealed that participants who were assigned to the high-maintenance interaction (misalignment) condition tended to exhibit poorer removal efficiency relative to those who were assigned to the low-maintenance interaction (mimicry) condition, t(27) = –2.00, p < .056.
This finding reveals that relative to participants who were assigned to the high-maintenance interaction (misalignment) condition, those who were assigned to the low-maintenance interaction (mimicry) condition were 39% more likely to remove a piece successfully on any given attempt (11.14 successes in 51.14 attempts vs. 10.40 successes in 66.47 attempts).⁸ Participant sex did not significantly moderate the effects of the social coordination manipulation on removal failures, t(25) = 1.17, p = .25, or on removal efficiency, t(25) = 0.23, p = .82.

Figure 7. Study 5: Removal efficiency, or the percentage of removal attempts in which the body part was successfully removed, as a function of whether participants had previously engaged in a social interaction characterized by misalignment (high-maintenance interaction) or mimicry (low-maintenance interaction).

Auxiliary analyses. To discern whether mood, self-efficacy, and/or liking for the interaction partner accounted for the effect of the social coordination (mimicry) manipulation on removal failures, we conducted an additional multiple regression analysis predicting removal failures from the mimicry manipulation and all three of these possible mediators. Results revealed that the mimicry manipulation still predicted unique variance, β = .52, t(24) = 2.79, p = .01, whereas mood, self-efficacy, and liking did not, |βs| < .25, |ts(24)| < 1.11. To examine whether including subjectively experienced depletion in the model altered conclusions, we conducted a multiple regression analysis predicting removal failures from the mimicry manipulation and the subjectively experienced depletion measure. Results revealed that the mimicry manipulation predicted unique variance in removal failures, β = .51, t(26) = 2.91, p < .01, whereas the subjectively experienced depletion measure did not, β = –.13, |t(26)| < 1.00. These results once again provide no support for the notion that mood, self-efficacy, liking for the interaction partner, or subjectively experienced depletion mediates the association of high-maintenance interaction with impaired self-regulation.

Summary. Complementing the findings from Studies 1 through 4, the Study 5 results suggest that high-maintenance interaction (as manipulated through antimimicry procedures) causes impaired self-regulation (as assessed with a measure of fine motor control). A likely reason for this impaired performance is that individuals perform subsequent tasks sloppily following high-maintenance interaction: Their likelihood of success in removing the target piece on any given trial was impaired in the high-maintenance interaction (misalignment) condition relative to the low-maintenance interaction (mimicry) condition. These effects were not attributable to subjectively experienced depletion, mood, self-efficacy, or liking for the partner, nor were they moderated by participant sex. The corpus of evidence across studies provides little reason to conclude that the high-maintenance interaction effect is systematically moderated by sex.

⁸ An auxiliary analysis revealed a trend such that participants in the antimimicry condition (M = 56.07, SD = 27.18) made a larger number of failed attempts than did those in the mimicry condition (M = 40.00, SD = 24.21), β = .31, t(27) = 1.68, p = .105.
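The 39% figure for Study 5 can be reproduced from the mean successes and mean attempts quoted in the text, and the same numbers recover the mean failed attempts given in the auxiliary footnote. The sketch below is only that arithmetic; it follows the article's parenthetical in dividing mean successes by mean attempts, which is not necessarily identical to averaging each participant's own ratio. The names `reported`, `removal_efficiency`, and `relative_advantage` are illustrative assumptions.

```python
# Mean successes and mean total attempts per condition, as quoted in the
# Study 5 results. No new data are introduced here.
reported = {
    "mimicry": {"successes": 11.14, "attempts": 51.14},
    "misalignment": {"successes": 10.40, "attempts": 66.47},
}


def removal_efficiency(successes, attempts):
    """Share of removal attempts that ended with the piece removed."""
    return successes / attempts


efficiency = {
    condition: removal_efficiency(**counts)
    for condition, counts in reported.items()
}
failed_attempts = {
    condition: counts["attempts"] - counts["successes"]
    for condition, counts in reported.items()
}
# How much more likely a given attempt was to succeed after mimicry.
relative_advantage = efficiency["mimicry"] / efficiency["misalignment"] - 1

for condition in reported:
    print(f"{condition}: efficiency {efficiency[condition]:.1%}, "
          f"failed attempts {failed_attempts[condition]:.2f}")
print(f"mimicry advantage per attempt: {relative_advantage:.0%}")
```

The implied failed-attempt means (40.00 for mimicry, 56.07 for misalignment) agree with the values in the footnote, so the reported successes, attempts, and failed attempts are mutually consistent.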

General Discussion

Across five studies, we manipulated social coordination to examine how high-maintenance interaction affects individual-level self-regulation on subsequent, unrelated tasks. We manipulated social coordination by having participants experience either a high- or a low-maintenance interaction on a maze task (Studies 1 and 3), a data entry task (Study 2), an emotional problem-solving task (Study 4), or an emotionless picture description task (Study 5). We assessed self-regulation with (a) measures of preferences for a challenging task with high reward potential over an easy task with low reward potential (Study 1) and (b) task performance (anagram performance in Study 1, GRE performance in Studies 2 and 3, physical stamina in Study 4, and fine motor control in Study 5). Results from all five studies supported the hypothesis that high-maintenance interaction impairs the interactants’ subsequent self-regulatory success on unrelated tasks. This effect remained robust beyond the effects of subjectively experienced depletion, mood, self-efficacy, and liking for the interaction partner, and it emerged even with a nonconscious manipulation of high-maintenance interaction (Study 5).

Two of our findings paint a picture of the individual who has just endured a high-maintenance interaction as somebody with diminished achievement motivation and sloppy task performance rather than as somebody striving for excellence but coming up short. The first finding comes from Study 1 (see Figure 1): High-maintenance interaction causes individuals to prefer to engage in simple tasks that are unlikely to require much effort but also are unlikely to be rewarding. The second finding comes from Study 5 (see Figure 7): High-maintenance interaction causes individuals to perform subsequent tasks without the care and attention to detail that they would otherwise apply. These findings suggest that experiencing high-maintenance interaction causes individuals to avoid challenging tasks, if possible, or to perform them without the focus and concentration required to perform them well.

Implications

The research reported herein has immediate implications for the interdependence theory and the self-regulation traditions. Regarding the former, although (a) interdependence refers to the processes through which interactants influence one another’s experiences and outcomes (Rusbult & Van Lange, 1996), and (b) interdependence scholars have long theorized about how social coordination can interfere with or facilitate effective functioning (e.g., Kelley et al., 2003; Thibaut & Kelley, 1959), the present article is the first empirical report to use an interdependence theory analysis of the personal consequences of poor social coordination.


More generally, relationships scholars have largely neglected issues of social coordination, focusing instead on topics such as conflict, attributions, trust, satisfaction, and commitment. As we suggested in the Introduction, social coordination is a central but neglected topic for relationships theories in general and for interdependence theory in particular. The present research also adds our voice to the emerging chorus of scholars emphasizing the importance of incorporating social dynamics into theories of self-regulation. The most prominent theories of self-regulation focus primarily on self-regulatory dynamics taking place within a given individual (e.g., Baumeister & Heatherton, 1996; Carver & Scheier, 1998; Higgins, 2000; Mischel et al., 1989). The past several years, however, have witnessed several compelling demonstrations of the power of interpersonal relationships to influence self-regulation (e.g., Baumeister, DeWall, Ciarocco, & Twenge, 2005; Fitzsimons & Bargh, 2003; Shah, 2003). The present report complements these recent demonstrations by emphasizing that social coordination is yet another interpersonal process with important implications for self-regulation.

The Question of Mediation

Despite rigorous efforts across all five studies to find evidence that subjectively experienced depletion, mood, self-efficacy, or liking for the interaction partner might mediate the effects of the social coordination manipulations on self-regulation, none could account for these effects. This robust failure to find evidence that high-level conscious mechanisms mediate the high-maintenance interaction effect is consistent with prior attempts to find self-report mediators of the association of initial self-regulatory exertion with subsequent self-regulatory impairment (e.g., Ciarocco et al., 2001; Muraven & Slessareva, 2003; Schmeichel et al., 2003; Vohs & Schmeichel, 2003; Wallace & Baumeister, 2002), and it supports the idea that high-maintenance interaction impairs self-regulation without individuals even being aware that the interaction has influenced them. Especially strong evidence for this possibility emerged from Study 3, Study 4, and, in particular, Study 5. These studies failed to reveal significant differences across the high- and low-maintenance interaction conditions in the degree to which participants subjectively experienced the interaction as high maintenance, but they all nonetheless revealed that participants assigned to the high-maintenance interaction condition generally exhibited impaired self-regulation relative to those assigned to the low-maintenance interaction condition. There are circumstances under which scholars must entertain the possibility that the null hypothesis is correct (see Greenwald, 1975), and the pervasiveness of the null mediational effects in all five studies suggests that it may be appropriate to draw the tentative conclusion that high-maintenance interaction impairs self-regulation in the absence of high-level cognitive mediation.
Although the present studies do not provide definitive evidence that the high-maintenance interaction effect is mediated by self-regulatory depletion, a recent series of studies by Richeson and colleagues in the prejudice literature suggests that depletion is a likely mediator of the effect. In one study, the differential activation of a brain region known to be associated with self-regulation (the dorsolateral prefrontal cortex; DLPFC) in response to Black versus White faces significantly mediated the association of White participants’ prejudice scores with their impaired Stroop performance after interacting with a Black confederate; Stroop performance was not significantly impaired after interacting with a White confederate, regardless of the participants’ prejudice scores (Richeson et al., 2003). These results are consistent with a self-regulatory depletion explanation: Interracial encounters seem to require that prejudiced White people exert self-regulation (as detected through elevated DLPFC activation), which may well deplete self-regulatory resources and ultimately impair executive control performance. Evidence from a separate series of studies suggests that increasing the self-regulatory demands of interracial interactions results in greater impairment in subsequent Stroop performance, whereas decreasing such demands reduces it (Richeson & Trawalter, 2005). The body of evidence emerging from the research reported herein and that by Richeson and her colleagues indicates that (a) the driving mechanism behind the destructive self-regulatory effects of high-maintenance interaction is self-regulatory strength depletion, but (b) individuals are not consciously aware that the interactions have affected them. These conclusions suggest that future research striving to establish the mechanisms underlying the high-maintenance interaction effect (and depletion effects more generally) could benefit from an emphasis on nonconscious mediators. The findings implicating the DLPFC as a potential mediator (Richeson et al., 2003) suggest that systematic investigations into brain activity may well reveal important discoveries not only for the high-maintenance interaction and depletion literatures but also for various cognitive neuroscience literatures.
In addition, recent research implicating blood glucose as an important predictor of effective self-regulation (e.g., Fairclough & Houston, 2004) suggests that future research could benefit from exploring whether high-maintenance interaction impairs self-regulation because it depletes blood glucose levels. Moving from biological to psychological processes, it is possible that, for example, rumination (perhaps even partly at a conscious level) about the high-maintenance interaction while performing the second task in the two-task paradigm could account for impaired self-regulation. Although the present results (e.g., the Study 1 finding that participants prefer to engage in simple rather than challenging tasks after experiencing a high-maintenance interaction) suggest that this alternative explanation is unlikely to account entirely for the high-maintenance interaction effect, the field could benefit from a systematic investigation of rumination as a mediator of (or as an alternative explanation for) this and other depletion effects.

Who Is Responsible for a High-Maintenance Interaction?

What makes certain social interactions high maintenance and others low maintenance? We suggest that interaction can be high maintenance because of characteristics of (a) the self (Johanna tends to get in people’s way when she cooks), (b) the partner (Bob’s mistakes make the cooking process inefficient), (c) their interaction (Johanna and Bob do not communicate well when they cook together), or (d) the situation (the arrangement of the kitchen makes coordination especially challenging). The research reported herein manipulated the partner’s behavior to create high-maintenance interaction, but future research could delve into any of these four categories of factors or look at the interplay between them. For example, perhaps Johanna experiences greater self-regulatory failure on subsequent tasks after trying to cook with anybody who is indecisive, especially when the kids are crying.

A Vicious Cycle? Recent research suggests that effective self-regulation (both high dispositional self-control and low self-regulatory strength depletion) may be an important factor helping people to engage in behaviors that promote relationship well-being (Finkel & Campbell, 2001; Vohs, 2004). The present article complements this work by demonstrating that self-regulatory success is affected by social coordination experiences. In combination, the pattern of results emerging from previous research and the present report suggests that the phenomena of high-maintenance interaction and impaired self-regulation may function together in a vicious cycle. For example, perhaps high-maintenance interaction with a romantic partner impairs self-regulation not only in personal domains but also in interpersonal domains. This unpleasant pattern can build on itself, resulting in poor personal and interpersonal outcomes. It may provide a partial explanation for the “when it rains, it pours” effect, in which personal problems (e.g., lack of productivity at work) are often accompanied by interpersonal problems (e.g., fights with one’s partner). Future research could use diary methods to investigate these processes as they transpire in everyday life.

Emotionally Energizing Interaction? Although the present research focuses on high-maintenance interaction that impairs self-regulation, we are confident that future research will also identify interpersonal processes that enhance self-regulation. Just as interaction partners can deplete us, they should also be able to replenish us. For example, perhaps an affectionate 10-min conversation with a loved one can replenish depleted self-regulatory resources. Recent evidence suggests that the loved one may not even have to be present to bolster the self: Thinking about a person with whom one has a close positive relationship (but not a negative or a distant positive relationship) makes one willing to learn threatening but valuable information about the self (Kumashiro & Sedikides, 2005). Future research could explore why close positive relationships can be replenishing or bolstering.

Limitations and Strengths We raise two limitations of the present work. First, all studies reported in this article relied on rigged interaction between strangers. Future research could explore how these processes play out in real relationships and real-world situations. Second, as mentioned previously, we have not determined a self-report mechanism by which high-maintenance interaction impairs self-regulation. All four of our likely suspects (subjectively experienced depletion, mood, self-efficacy, and reduced liking for the interaction partner) reliably failed to mediate the effect, and the Study 5 results provide strong support for the intriguing possibility that high-maintenance interaction can impair self-regulation directly and nonconsciously. Future research could complement the present investigation by exploring behavioral, implicit-cognitive, and biological processes that may mediate the effect of high-maintenance interaction on impaired self-regulation. We also highlight three strengths of the present work. First, strong evidence emerged to suggest that high-maintenance interaction impairs self-regulation across five studies using diverse methods of manipulating high-maintenance interaction and of assessing self-regulation. The effect is remarkably robust. Second, the experimental procedures used in these studies allow us to draw firm conclusions regarding the causal direction of this effect. And third, we have presented evidence to suggest that social coordination processes that take place outside of individuals’ conscious awareness can influence interactants’ self-regulation. Although the precise mechanisms through which high-maintenance interaction impairs self-regulation remain elusive, finding such consistent evidence that it can impair self-regulation outside of individuals’ awareness throws open fascinating directions for future research.

Conclusion Coordinating behavior with others can be challenging, even when we share the same goals for the interaction. Results from five studies using diverse methods suggest that high-maintenance interaction causes impaired individual-level self-regulation on subsequent, unrelated tasks, and that this process takes place outside of conscious awareness. This work serves as one example of the important role played by interpersonal processes in self-regulatory success.

References Baumeister, R. F. (1998). The self. In D. T. Gilbert, S. T. Fiske, & G. Lindzey (Eds.), Handbook of social psychology (4th ed., Vol. 2, pp. 680 –740). New York: McGraw-Hill. Baumeister, R. F., Bratslavsky, E., Muraven, M., & Tice, D. M. (1998). Ego depletion: Is the active self a limited resource? Journal of Personality and Social Psychology, 74, 1252–1265. Baumeister, R. F., DeWall, C. N., Ciarocco, N. J., & Twenge, J. M. (2005). Social exclusion impairs self-regulation. Journal of Personality and Social Psychology, 88, 589 – 604. Baumeister, R. F., & Heatherton, T. F. (1996). Self-regulation failure: An overview. Psychological Inquiry, 7, 1–15. Baumeister, R. F., Heatherton, T. F., & Tice, D. M. (1994). Losing control: How and why people fail at self-regulation. San Diego, CA: Academic Press. Bernieri, F. J. (1988). Coordinated movement and rapport in teacher– student interactions. Journal of Nonverbal Behavior, 12, 120 –138. Carver, C. S., & Scheier, M. F. (1998). On the self-regulation of behavior. New York: Cambridge University Press. Chartrand, T. L., & Bargh, J. A. (1999). The chameleon effect: The perception– behavior link and social interaction. Journal of Personality and Social Psychology, 76, 893–910. Chartrand, T. L., Maddux, W., & Lakin, J. (2005). Beyond the perception– behavior link: The ubiquitous utility and motivational moderators of nonconscious mimicry. In R. Hassin, J. Uleman, & J. A. Bargh (Eds.), The new unconscious (pp. 334 –361). New York: Oxford University Press. Ciarocco, N. J., Sommer, K. L., & Baumeister, R. F. (2001). Ostracism and ego depletion: The strains of silence. Personality and Social Psychology Bulletin, 27, 1156 –1163. Coyne, J. C. (1990). Interpersonal processes in depression. In G. I. Keitner (Ed.), Depression and families (pp. 31–54). Washington, DC: American Psychiatric Association.

Coyne, J. C., Thompson, R., & Palmer, S. C. (2002). Marital quality, coping with conflict, marital complaints, and affection in couples with a depressed wife. Journal of Family Psychology, 16, 26 –37. Engebretson, T. O., Matthews, K. A., & Scheier, M. F. (1989). Relations between anger expression and cardiovascular reactivity: Reconciling inconsistent findings through a matching hypothesis. Journal of Personality and Social Psychology, 57, 513–521. Fairclough, S. H., & Houston, K. (2004). A metabolic measure of mental effort. Biological Psychology, 66, 177–190. Finkel, E. J., & Campbell, W. K. (2001). Self-control and accommodation in close relationships: An interdependence analysis. Journal of Personality and Social Psychology, 81, 263–277. Finkel, E. J., Rusbult, C. E., Kumashiro, M., & Hannon, P. A. (2002). Dealing with betrayal in close relationships: Does commitment promote forgiveness of betrayal? Journal of Personality and Social Psychology, 82, 956 –974. Fitzsimons, G. M., & Bargh, J. A. (2003). Thinking of you: Nonconscious pursuit of interpersonal goals associated with relationship partners. Journal of Personality and Social Psychology, 84, 148 –163. Flora, D. B., Finkel, E. J., & Foshee, V. A. (2003). Higher order factor structure of a self-control test: Evidence from a confirmatory factor analysis with polychoric correlations. Educational and Psychological Measurement, 63, 112–127. Gilhooly, K. J., & Johnson, C. E. (1978). Effects of solution word attributes on anagram difficulty. Quarterly Journal of Experimental Psychology, 30, 57–70. Gollwitzer, P. M. (1999). Implementation intentions: Strong effects of simple plans. American Psychologist, 54, 493–503. Gottfredson, M. R., & Hirschi, T. (1990). A general theory of crime. Stanford, CA: Stanford University Press. Grasmick, J. F., Tittle, C. R., Bursik, R. J., Jr., & Arneklev, B. J. (1993). Testing the core empirical implications of Gottfredson and Hirschi’s general theory of crime. 
Journal of Research in Crime and Delinquency, 30, 5–29. Greenwald, A. G. (1975). Consequences of prejudice against the null hypothesis. Psychological Bulletin, 82, 1–20. Hatfield, E., Cacioppo, J. T., & Rapson, R. L. (1994). Emotional contagion. Cambridge, England: Cambridge University Press. Higgins, E. T. (2000). Making a good decision: Value from fit. American Psychologist, 55, 1217–1230. Holmes, J. G., & Rempel, J. K. (1989). Trust in close relationships. In C. Hendrick (Ed.), Close relationships (pp. 187–220). Thousand Oaks, CA: Sage. Kelley, H. H., Holmes, J. G., Kerr, N. L., Reis, H. T., Rusbult, C. E., & Van Lange, P. A. M. (2003). An atlas of interpersonal situations. New York: Cambridge University Press. Kelley, H. H., & Thibaut, J. W. (1978). Interpersonal relations: A theory of interdependence. New York: Wiley. Kumashiro, M., & Sedikides, C. (2005). Taking on board liability-focused information: Close positive relationships as a self-bolstering resource. Psychological Science, 16, 732–739. Lakin, J. L., Jefferis, V. E., Cheng, C. M., & Chartrand, T. L. (2003). The chameleon effect as social glue: Evidence for the evolutionary significance of nonconscious mimicry. Journal of Nonverbal Behavior, 27, 145–162. Loewenstein, G. (1996). Out of control: Visceral influences on behavior. Organizational Behavior and Human Decision Processes, 65, 272–292. Mischel, W., Shoda, Y., & Rodriquez, M. L. (1989, May 26). Delay of gratification in children. Science, 244, 933–938. Muraven, M., & Baumeister, R. F. (2000). Self-regulation and depletion of limited resources: Does self-control resemble a muscle? Psychological Bulletin, 74, 774 –789. Muraven, M., Collins, R. L., & Neinhaus, K. (2002). Self-control and
alcohol restraint: An initial application of the self-control strength model. Psychology of Addictive Behaviors, 16, 113–120. Muraven, M., & Slessareva, E. (2003). Mechanisms of self-control failure: Motivation and limited resources. Personality and Social Psychology Bulletin, 29, 894 –906. Muraven, M., Tice, D. M., & Baumeister, R. F. (1998). Self-control as limited resource: Regulatory depletion patterns. Journal of Personality and Social Psychology, 74, 774 –789. Nisbett, R. E., & Wilson, T. D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84, 231–259. Reis, H. T., & Collins, W. A. (2004). Relationships, human behavior, and psychological science. Current Directions in Psychological Science, 13, 233–237. Richeson, J. A., Baird, A. A., Gordon, H. L., Heatherton, T. F., Wyland, C. L., Trawalter, S., & Shelton, J. N. (2003). An fMRI investigation of the impact of interracial contact on executive control. Nature Neuroscience, 6, 1323–1328. Richeson, J. A., & Shelton, J. N. (2003). When prejudice does not pay: Effects of interracial contact on executive function. Psychological Science, 14, 287–290. Richeson, J. A., & Trawalter, S. (2005). Why do interracial interactions impair executive function? A resource depletion account. Journal of Personality and Social Psychology, 88, 934 –947. Rothman, A. J., Baldwin, A. S., & Hertel, A. W. (2004). Self-regulation and behavior change: Disentangling behavioral initiation and behavior maintenance. In R. F. Baumeister & K. D. Vohs (Eds.), Handbook of self-regulation: Research, theory, and applications (pp. 130 –148). New York: Guilford Press. Rusbult, C. E., & Van Lange, P. A. M. (1996). Interdependence processes. In E. T. Higgins & A. W. Kruglanski (Eds.), Social psychology: Handbook of basic principles (pp. 564 –596). New York: Guilford Press. Rusbult, C. E., & Van Lange, P. A. M. (2003). Interdependence, interaction, and relationships. Annual Review of Psychology, 54, 351–375. 
Rusbult, C. E., Verette, J., Whitney, G. A., Slovik, L. F., & Lipkus, I. (1991). Accommodation processes in close relationships: Theory and preliminary evidence. Journal of Personality and Social Psychology, 60, 53–78. Schmeichel, B. J., & Baumeister, R. F. (2004). Self-regulatory strength. In R. F. Baumeister & K. D. Vohs (Eds.), Handbook of self-regulation: Research, theory, and applications (pp. 84 –98). New York: Guilford Press. Schmeichel, B. J., Vohs, K. D., & Baumeister, R. F. (2003). Intellectual performance and ego depletion: Role of the self in logical reasoning and other information processing. Journal of Personality and Social Psychology, 85, 33– 46. Segrin, C., & Dillard, J. P. (1992). The interactional theory of depression: A meta-analysis of the research literature. Journal of Social and Clinical Psychology, 12, 43–70. Seligman, M. E., Abramson, L. Y., Semmel, A., & von Baeyer, C. (1979). Depressive attributional style. Journal of Abnormal Psychology, 88, 242–247. Shah, J. Y. (2003). Automatic for the people: How representations of others implicitly affect goal pursuit. Journal of Personality and Social Psychology, 84, 661– 681. Shah, J. Y., & Kruglanski, A. W. (2003). When opportunity knocks: Bottom-up priming of goals by means and its effects on self-regulation. Journal of Personality and Social Psychology, 84, 1109 –1122. Simpson, J. A. (in press). Foundations of interpersonal trust. In A. W. Kruglanski & E. T. Higgins (Eds.), Social psychology: A handbook of basic principles (2nd ed.). New York: Guilford Press. Thibaut, J. W., & Kelley, H. H. (1959). The social psychology of groups. New York: Wiley. Thompson, L., & Fine, G. A. (1999). Socially shared cognition, affect, and
behavior: A review and integration. Personality and Social Psychology Review, 3, 278–302. Van Lange, P. A. M. (1999). The pursuit of joint outcomes and equality in outcomes: An integrative model of social value orientation. Journal of Personality and Social Psychology, 77, 337–349. Vohs, K. D. (2004, January). The health of romantic relationships relies on self-regulation. Paper presented at the meeting of the Society for Personality and Social Psychology, Austin, TX. Vohs, K. D., & Baumeister, R. F. (2004). Understanding self-regulation: An introduction. In R. F. Baumeister & K. D. Vohs (Eds.), Handbook of self-regulation: Research, theory, and applications (pp. 1–9). New York: Guilford Press. Vohs, K. D., Baumeister, R. F., Twenge, J. M., Schmeichel, B. J., Tice, D. M., & Crocker, J. (2005). Decision fatigue exhausts self-regulatory resources—But so does accommodating to unchosen alternatives. Manuscript submitted for publication.
Vohs, K. D., & Heatherton, T. F. (2000). Self-regulatory failure: A resource-depletion approach. Psychological Science, 11, 249 –254. Vohs, K. D., & Schmeichel, B. J. (2003). Self-regulation and the extended now: Controlling the self alters the subjective experience of time. Journal of Personality and Social Psychology, 85, 217–230. Wallace, H. M., & Baumeister, R. F. (2002). The effects of success versus failure feedback on further self-control. Self and Identity, 1, 35– 42. Yang, Y., & Johnson-Laird, P. N. (2001). Mental models and logical reasoning problems in the GRE. Journal of Experimental Psychology: Applied, 7, 308 –316.

Received September 21, 2004
Revision received January 14, 2006
Accepted January 27, 2006

Journal of Personality and Social Psychology 2006, Vol. 91, No. 3, 476 – 492

Copyright 2006 by the American Psychological Association 0022-3514/06/$12.00 DOI: 10.1037/0022-3514.91.3.476

The Time Course of Grief Reactions to Spousal Loss: Evidence From a National Probability Sample

Katherine B. Carnelley, University of Southampton
Camille B. Wortman, Stony Brook University, State University of New York
Niall Bolger, Columbia University
Christopher T. Burke, New York University

Most studies of widowhood have focused on reactions during the first few years postloss. The authors investigated whether widowhood had more enduring effects using a nationally representative U.S. sample. Participants were 768 individuals who had lost their spouse (from a few months to 64 years) prior to data collection. Results indicated that the widowed continued to talk, think, and feel emotions about their lost spouse decades later. Twenty years postloss, the widowed thought about their spouse once every week or 2 and had a conversation about their spouse once a month on average. About 12.6 years postloss, the widowed reported feeling upset between sometimes and rarely when they thought about their spouse. These findings add to an understanding of the time course of grief.

Keywords: bereavement, widowhood, continuing bonds, meaning, positive growth

Katherine B. Carnelley, School of Psychology, University of Southampton, Highfield, Southampton, United Kingdom; Camille B. Wortman, Department of Psychology, Stony Brook University, State University of New York; Niall Bolger, Department of Psychology, Columbia University; Christopher T. Burke, Department of Psychology, New York University. This research was conducted at the Survey Research Center at the Institute for Social Research, University of Michigan, Ann Arbor, Michigan, and was supported by National Institute on Aging Grant R01 A610757 (Camille B. Wortman, principal investigator; Ronald C. Kessler, coprincipal investigator), by Training Grant T32 MH16806 (Ronald C. Kessler, principal investigator), and by Research Scientist Development Award 1 K02 MH99507 to Ronald C. Kessler from the National Institute of Mental Health. We thank Ronald C. Kessler for guidance with the analysis strategy. Correspondence concerning this article should be addressed to Katherine B. Carnelley, School of Psychology, University of Southampton, Highfield, Southampton SO17 1BJ, United Kingdom. E-mail: [email protected]

Loss of a spouse is one of the most serious threats to health, well-being, and productivity that most people encounter during their lives (see M. S. Stroebe, Hansson, Stroebe, & Schut, 2001b, for a review), although there is considerable variability in responses to loss (e.g., Bonanno et al., 2002). Much research has investigated the impact of widowhood during the first few years postloss (e.g., Carnelley, Wortman, & Kessler, 1999); however, very little research has examined the long-term outcomes (positive and negative) and long-term processes (resolution and continuing involvement) associated with spousal loss. Indeed, M. S. Stroebe, Hansson, Stroebe, and Schut (2001a) called for research to examine the time course of grief reactions. The present study investigated the time course of grief reactions to spousal loss in a nationally representative U.S. sample using Wave 1 of the Americans’ Changing Lives (ACL) data set (House et al., 1990).

Although many theorists have described the process through which individuals come to terms with the loss of a spouse, their theories say little about the time course of grief reactions. Some bereavement theorists have suggested that the bereaved go through a series of stages or phases (see Aiken, 1991, for a review); Bowlby (1980), for example, proposed four phases: shock, yearning and protest, despair, and recovery. Bereavement researchers do not currently view phases of grief as fixed and sequential (Archer, 1999; M. S. Stroebe et al., 2001a). As thoughts and memories of the deceased are reviewed, the individual is believed to work through the implications of the loss (Rando, 1993; Worden, 2002). The notion that grief must be worked through has been the dominant perspective for the past century (Bonanno et al., 2002). However, the assumption that emotions need to be worked through has been questioned more recently (e.g., M. S. Stroebe & Stroebe, 1991). At this point, the bereaved is expected to reach a state of acceptance (Gorer, 1967; Hardt, 1978–1979), reorganization of the mental representation of the lost person (Bowlby, 1980), or recovery (Glick, Weiss, & Parkes, 1974; Stephenson, 1985). Although many theorists have portrayed the grief process in several ways and there are debates about how it unfolds, none have described the time course of grief. They have not addressed how long it takes to go through stages or phases, work through grief, resolve what has happened, or recover. Indeed, bereavement researchers have begun to question the notion of recovery (e.g., Miller & Omarzu, 1998; M. S. Stroebe, Hansson, Stroebe, & Schut, 2001c), suggesting that “in the long-term, the bereaved do not simply ‘return to baseline’ following the loss” (M. S. Stroebe et al., 2001a, p. 746).

In what follows, we summarize relevant past research that has focused on the bereaved individual’s continuing cognitive and emotional involvement with the lost person and eventual resolution of the loss. In addition, we discuss research suggesting that sometimes personal growth is an outcome of spousal loss. Finally, we present data from our cross-sectional, large-scale, national study to examine the time course of widowhood grief reactions.

Continuing Involvement and Emotional Resolution Bereaved individuals often feel a bond with the deceased that can continue for decades (e.g., Shaver & Tancredy, 2001). This continuing sense of connection does not necessarily indicate poor adjustment to loss (Bonanno, Wortman, & Nesse, 2004; Klass, Silverman, & Nickman, 1996). Although there are various types of continuing involvement, our review primarily focuses on those investigated in the present study: memories and conversations about the deceased and anniversary reactions. Memories of a lost spouse may simultaneously bring comfort and cause distress. Shaver and Tancredy’s (2001) discussion of contemporary emotion theory (e.g., Frijda, 1986, 1988; Lazarus, 1991, 1999) and attachment theory (Bowlby, 1969, 1973, 1980; Cassidy & Shaver, 1999) helps one to understand the relationship between memories and emotions. The bereaved hold many affectively charged representations of the lost spouse and of the spouse in interaction with the self. A priming of one of these memories activates emotions that are difficult to ignore. Grieving is partly a matter of bumping up against these thoughts and feelings over a period of months or years and acknowledging both their affective charge and their inadequacy as representations of current reality. They have to be reworked . . . or weakened by habituation. (Shaver & Tancredy, 2001, p. 72)

Weiss (2001) stated that it is the persistence of these affectively charged representations of the attachment figure (spouse) that enable the bereaved to feel a continued connection with the deceased. Although many theorists have maintained that over time, the bereaved person’s memories are worked through so that they are no longer painful (Parkes & Weiss, 1983; Rando, 1993; Worden, 2002), more recently, bereavement researchers have begun to question this assumption (e.g., M. S. Stroebe & Stroebe, 1991). Hence, it is important to assess emotional resolution (Weiss, 1988), or how long the bereaved continue to experience emotional pain when they think or talk about their spouse or when they encounter reminders. Although they did not assess whether or not memories were painful, Bonanno et al. (2004) did examine how typical it was for different types of grievers to think about their lost spouse over time. Their results showed that thinking about one’s spouse from 6 to 18 months postloss is quite common (at 6 months, such thoughts ranged from daily or almost daily to several times a day, and at 18 months, they ranged from approximately 4 times a week to between daily and several times a day). Their findings also suggest that those who think about their spouse the most show worse adjustment to loss; this may be due to rumination. Research needs to examine the extent to which these thoughts are negative or positive and comforting. An important question concerns the role of positive memories of one’s spouse in adjustment to loss. Do positive memories increase over time as a person comes to terms with the loss, or do they decrease over time as the loss becomes less salient to the widowed spouse? Field, Nichols, Holen, and Horowitz (1999) examined the consequences of continuing involvement in the form of fond memories on distress levels in a monologue role-playing with
one’s deceased spouse 6 months postloss. They found that those participants who frequently felt comforted from memories of their spouse showed less grief severity and less helplessness about coping with the loss in the role-playing. Consistent with this, Bonanno et al. (2004) found that at 6 and 18 months postloss, receiving comfort from positive memories of one’s spouse was most characteristic of the better functioning bereaved. However, in contrast, Field, Gal-Oz, and Bonanno (2003) examined the effects of fond memories about the deceased on grief 5 years postloss and found that having many fond memories was correlated with more grief but uncorrelated with depression, the Symptom Checklist (Derogatis, 1983), and positive states of mind. Taken together, these results suggest that fond memories may be more beneficial shortly after the loss than several years postloss; people who have many fond memories of their spouse 5 years postloss may also be inclined to ruminate about the loss and experience mental anguish. Continuing involvement via conversation with others and its impact on grief reactions are understudied (Klass & Walter, 2001). An exception is work by Bonanno et al. (2004), who found that at 6 months postloss, the bereaved talked about their spouse from once a week to two or three times a week; at 18 months postloss, they talked about their spouse about once a week. This shows that the widowed talk about their spouse fairly often up to 18 months postloss. The present study examined the frequency of conversations, memories, and emotional resolution over time and their rate of change over time in the long-term adjustment to loss. Another goal of the study was to learn more about anniversary reactions following conjugal loss. Anniversary reactions are periods of acute grief triggered by occasions associated with the deceased (e.g., date of death or birthday). 
It has been argued that such reactions are often quite intense even several years after the loss (see Rando, 1993, for a detailed discussion). Although there are few relevant studies, they have suggested that such reactions are quite common. In a study of people who lost a loved one anywhere from one month to 22 years previously, Zisook, Devaul, and Click (1982) found that 25% of the respondents reported being upset at the anniversary of the death. A major aim of the present study has been to provide more information about the prevalence of anniversary reactions over time. We examined the frequency, intensity, and duration of anniversary reactions and their rate of change over time.

Finding Meaning in the Loss One reason why the loss of a spouse can have such a powerful impact on well-being is that it can deprive the bereaved person’s life of meaning (Marris, 1958). Janoff-Bulman (1992) argued that individuals have three core assumptions to their inner world: (a) They are worthy, (b) the world is benevolent, and (c) what happens to them makes sense. The death of a spouse can shatter these assumptions, leaving the individual to rebuild his or her assumptive world and reestablish meaning. Janoff-Bulman and Frantz (1997) and Davis, Nolen-Hoeksema, and Larson (1998) discussed two types of meaning that can be found in a loss: (a) making sense of the event and (b) finding value or benefit in the experience. Parkes and Weiss (1983) found that 2 to 4 years later, 61% of suddenly bereaved individuals and 29% of forewarned bereaved individuals were still questioning why the death had happened. Not surprisingly, it is easier to make sense of a natural death than a
sudden death. For example, Lehman, Wortman, and Williams (1987) found that of those who lost their spouse in a car accident, 68% had not found meaning in the loss 4–7 years later. In contrast, in a sample of elderly, conjugally bereaved individuals whose spouses died of various causes (Bonanno et al., 2004), most did not search for meaning (71% 6 months postloss and 72% 18 months postloss). They found that 14% of the widowed searched for but did not find meaning in the loss at 6 and 18 months postloss, and 15% and 13% searched for and found meaning at 6 months and 18 months postloss, respectively. Interestingly, Davis and Nolen-Hoeksema (2001) found that bereaved individuals who had difficulty making sense of the loss at 6 months postloss also tended to have difficulty making sense of it later (at 18 months). In addition, those who make sense of it later for the first time provide explanations that are not comforting (i.e., the world is not a just, ordered, or benevolent place). Research has suggested that being able to find positive meaning in a loss leads to better adjustment. Davis et al. (1998) found that making sense of the loss of a family member at 6 months postloss was associated with less distress at 6 and 13 months postloss but not associated with distress at 18 months postloss. In addition, finding benefit in the loss was associated with adjustment at 6, 13, and 18 months postloss. However, their results suggest that finding meaning leads to better adjustment primarily when meaning is found relatively shortly after the death. Specifically, making sense of the loss by 6 months postloss was associated with less distress; however, making sense of the loss for the first time at a later date was not associated with distress. Finally, some research has suggested that those who adjust best do not search for meaning.
Davis, Wortman, Lehman, and Silver (2000) found in a sample of adults who lost a spouse or child in a car accident that those who never searched for meaning showed better adjustment than those searching but not finding meaning; those who found meaning did not differ significantly from either group, but their scores fell between the other two groups. Similarly, Bonanno et al. (2004) found that not searching for meaning was most typical of their resilient grievers at 6 and 18 months postloss, whereas searching for and finding meaning at 18 months was most typical of their chronic grievers. Taken together, these results suggest that two groups of the bereaved show the least distress: (a) those who never search for meaning and (b) those who search for meaning, make sense of the death early, and are able to hold onto that meaning over time. The present study investigated the cognitive resolution of loss over a longer time frame, including searching for meaning, whether meaning was found, and the kinds of meanings found.

Personal Growth

Although widowhood can have profound negative effects, there is also some evidence suggesting that it may ultimately lead to psychological growth (e.g., Schaefer & Moos, 2001; Wortman & Silver, 1990). As a result of the loss, the bereaved person may learn a new set of skills that can result in positive changes, such as enhanced self-competence (e.g., Lopata, 1973). Some studies have shown that the widowed report increased personal growth and the discovery of new strengths, more independence, control, competence, resilience, self-assurance, and self-efficacy (Arbuckle & de Vries, 1995; Calhoun & Tedeschi, 1990; Fry, 1998; Lieberman, 1996; Thomas, DiGiulio, & Sheehan, 1988). These studies have shown reports of growth 1 to 15 years after the loss. In addition, Bonanno et al. (2004) showed that perceived benefits of loss increase from 6 months to 18 months postloss. In the present study, we examined levels of positive change over a much longer period of time, focusing on perceptions of increased self-confidence and personal growth.

The Present Study

The present study has several methodological strengths that improve on much of the prior research. First, it was based on a large, nationally representative sample of 768 men and women who lost their spouse. The response rate, 67%, although not ideal, is higher than that of most previous bereavement studies (e.g., Parkes & Weiss, 1983). Second, our study incorporated a wide range of measures of reactions to loss over time, including key process and outcome measures identified in prior research. Third, the study is one of the few to help clarify how people typically experience grief reactions many years after the loss. Fourth, respondents were not recruited for a study of bereavement but rather for a study on productivity, stress, and health. There is evidence to suggest that willingness to participate in studies of bereavement is affected by the respondent’s level of depression and other psychological variables (M. S. Stroebe & Stroebe, 1989), making it difficult to draw inferences about the impact of conjugal loss. Because the study was not described as a study of bereavement and because most respondents were approached a considerable period of time after the loss, such selection problems are less likely to have occurred. Finally, the study helps to clarify what form adjustment to loss takes. Is there a great deal of change during the first few months or years after the loss, followed by a period of more gradual change, or is there a steady change as time unfolds?

An important consideration in our analysis is the functional relationship between time since widowhood and measures of adjustment. Theories of self-regulation, such as that of Carver and Scheier (1982), suggest that self-regulatory (or adjustment) behavior in people can be usefully described using a negative feedback model that (for our purposes) has three key features.
First, regulatory systems can be driven out of equilibrium by external forces, just as a person would be driven out of psychological equilibrium by the passing of his or her spouse. Second, regulatory systems often involve negative feedback loops and move toward some equilibrium or end state, just as a person who has been widowed usually moves toward a state of recovery or psychological equilibrium. Third, when in a state of disequilibrium, people engage in regulatory behaviors to reduce perceived discrepancies between their current state and the end or goal state. The magnitude of the regulatory behavior should then depend on the magnitude of the perceived discrepancy, with larger discrepancies leading to stronger regulatory behaviors. If this model accurately represents the process of adjustment to widowhood, then a simple linear model cannot describe the relationship between time since widowhood and an individual’s level of adjustment. Rather, we would expect that change would be relatively rapid initially and would decelerate as it approached an equilibrium level. Estimating such a model can be accomplished by specifying the specific functional form of the relationship and using nonlinear regression methods (Rawlings, Pantula, & Dickey, 1998; Singer & Willett, 2003). A wide range of biological systems

GRIEF REACTIONS TO SPOUSAL LOSS

and processes seem to involve negative feedback systems such as this (McGuigan, 1994), and a negative exponential (or exponential decay) function has often been used to describe this movement toward equilibrium (Singer & Willett, 2003). Thus, we believe that this function better explains the pattern of adjustment as a function of time than a simple linear model.

There are, however, two complications that arose in our study with respect to this approach. First, because our data were cross-sectional, we could not directly observe within-person change but could only observe differences between persons who differed in time since widowhood. Thus, it was important to measure and adjust for influences that could produce spurious differences in adjustment and affect the shape of the adjustment function, as in the case in which those who lost spouses more recently had losses that were less unexpected or less sudden than those in previous decades. Second, there is the issue of heterogeneity in adjustment processes. As Bonanno and his colleagues (2002) have shown, not all people experience bereavement in the same way. Given that the variables measured in our study are in many cases novel assessments of grief processes, we have chosen to focus on overall patterns across a broad range of adjustment measures, on the assumption that future, more focused studies would explore heterogeneity in grief processes.

In sum, our overall goal was to assess how quickly and completely people adjust to the loss of their spouse and to provide guideposts validated by research data indicating what typical grieving is. The research was designed to address a number of specific questions. Drawing on comparisons of participants who differed in time since bereavement, what would the shape of the time-since-loss function be, and more specifically, how would the implied rate of change vary depending on the length of time since the loss?
Would the function imply that there was a great deal of adjustment in the first few months or years after the loss, followed by a period of slower or more gradual change (i.e., negative exponential), or would there be a steady change as time increased (i.e., linear)? Would it be typical for a bereaved person to think or talk about the spouse 10, 20, or 30 years after the loss, or would such thoughts occur infrequently by the end of the first decade? How long would it be typical to experience painful or fond memories? Would those who experienced conjugal loss show evidence of positive psychological change? Answers to these questions should not only clarify the nature and duration of bereavement processes for a nationally representative sample of the U.S. population but also enhance the ability to intervene effectively to promote better adjustment.
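The contrast between the negative exponential and linear forms can be made concrete with a short derivation. This is only a sketch under an added assumption that the text does not state explicitly, namely that the rate of regulatory change is directly proportional to the current discrepancy from the end state. The symbols f (asymptotic level), d (distance between the initial and final levels), and s (rate constant) follow the notation the article introduces later in Equation 3.

```latex
\begin{align*}
\frac{dy}{dt} &= s\,\bigl(f - y(t)\bigr), \qquad s > 0, \qquad y(0) = f - d \\
\Rightarrow\quad y(t) &= f - d\,e^{-s t}, \\
\frac{dy}{dt} &= s\,d\,e^{-s t} \;\longrightarrow\; 0 \quad \text{as } t \to \infty .
\end{align*}
```

Under this assumption the implied rate of change is largest immediately after the loss and shrinks as the discrepancy closes, whereas a linear model would imply the same rate of change at every point in time.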

Method

Participants

Participants were drawn from a nationally representative sample of U.S. adults who were interviewed as part of a large-scale study of productivity, stress, and health in middle and late life. This study, the ACL study mentioned above, was conducted by the Survey Research Center at the University of Michigan (for more information about the ACL study, see House et al., 1990). The study involved face-to-face interviews with a multistage, stratified probability sample of noninstitutionalized persons 25 years of age or older and living in the continental United States. To obtain enough older and Black respondents to permit analysis by subgroups and to maximize the number of widowed respondents who would be interviewed, Blacks and respondents over 60 years old were sampled at twice the rate of Whites under 60. The final sample included a total of 3,617 respondents, reflecting an overall response rate of 67%. As is usual in survey research, in analyses to be reported here, the data were weighted to adjust for variations in probabilities of selection and in response rates across sampling areas (see Kalton & Flores-Cervantes, 2003, and Lessler & Kalsbeek, 1992, for more detail on survey weighting procedures). In addition, poststratification weights were added to make the weighted sample correspond to the July 1985 Current Population Survey (United States Bureau of the Census, 1985) estimates by sex, age (25–64 years old and 65+ years old), and region (Northeast, Midwest, South, and West).

Interviews were conducted with 3,212 persons who had ever been married; 786¹ of these persons (155 men and 631 women) had experienced the death of their spouse prior to the interview, anywhere from less than 1 to 64 years previously. To give a better sense of how the number of years respondents had been widowed was distributed, the first quartile was 0 to 5 years, the second quartile was 5 to 11 years, the third quartile was 11 to 21 years, and the fourth quartile was 21 to 64 years. The data were collected between May and December of 1986, and the interview averaged 86 minutes in length. All respondents were asked questions about their health, well-being, productive activities, stressful life experiences, and coping resources. In addition, people who had lost a spouse were asked about the circumstances surrounding the loss and their current thoughts and feelings about the loss. It was felt that answering questions that focused on the loss (e.g., frequency of memories) might be too upsetting for those who were recently widowed, and for this reason, respondents widowed in the past 3 months (n = 15) were not asked these questions. An additional 3 respondents were missing data on the variable indicating whether the death had been unexpected. Thus, the effective n for all analyses (unless otherwise noted) was 768.
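The reason weighting is needed when some groups are sampled at twice the rate of others can be illustrated with a small sketch. It is not the ACL weighting procedure, which also adjusted for response rates and poststratification; the function names, the scores, and the relative sampling rates below are hypothetical and chosen only to show the basic inverse-probability idea.

```python
# Hypothetical illustration of design weights when some groups are
# selected at twice the base rate. This is NOT the ACL weighting scheme.

def design_weight(relative_sampling_rate):
    """Inverse of the relative selection rate (base rate = 1.0)."""
    if relative_sampling_rate <= 0:
        raise ValueError("sampling rate must be positive")
    return 1.0 / relative_sampling_rate


def weighted_mean(values, weights):
    """Weighted average of values."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Made-up scores: two respondents from an oversampled group (rate 2.0)
# and two respondents sampled at the base rate (rate 1.0).
scores = [40.0, 60.0, 80.0, 100.0]
rates = [2.0, 2.0, 1.0, 1.0]
weights = [design_weight(r) for r in rates]

unweighted = sum(scores) / len(scores)
weighted = weighted_mean(scores, weights)
print(unweighted, round(weighted, 2))
```

In this toy case the unweighted mean overrepresents the oversampled group, and down-weighting its members by the inverse of their selection rate moves the estimate back toward the population mix.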

Measures

Our analyses focused on measures of bereaved respondents’ current assessments of their emotional recovery, including the nature and frequency of thoughts and memories of the deceased, emotional and cognitive resolution of the loss, and perceptions of positive change. To facilitate comparisons among the measures, we followed the recommendations of Cohen, Cohen, Aiken, and West (1999) and rescaled all dependent variables such that the lowest possible score was 0 and the highest possible score was 100. Scores on each dependent measure, therefore, can be thought of as percentages of the total possible score. Because this approach makes it more difficult to interpret results in terms of the original rating-scale labels, we include these labels on all graphs of the data.

Continuing involvement with the deceased and emotional resolution. To assess continuing involvement with the deceased spouse and to determine the extent of emotional resolution, respondents were asked to indicate (a) how often, during the past 3 months, they had thoughts or memories about their late husband or wife (M = 62.6, SD = 36.5, skewness = −0.41, kurtosis = −1.33), and (b) how often, during the past 3 months, they talked about their late husband or wife (M = 42.5, SD = 34.7, skewness = 0.31, kurtosis = −1.27). For each of these questions, respondents were asked to indicate the frequency of thoughts or conversations on the following 7-point scale: 1 (never), 2 (less than once a month), 3 (about once a month), 4 (2 or 3 times a month), 5 (about once a week), 6 (2–3 times a week), and 7 (daily or almost daily). Respondents were also asked to indicate (c) how often thinking or talking about their late husband or wife made them feel happy (M = 58.7,

¹ This excludes 22 respondents who were separated before widowhood.


CARNELLEY, WORTMAN, BOLGER, AND BURKE

SD = 34.1, skewness = −0.33, kurtosis = −1.01) and (d) how often thinking or talking about him or her made them feel sad or upset (M = 43.1, SD = 32.1, skewness = 0.29, kurtosis = −0.79) on the following 5-point scale: 1 (never), 2 (rarely), 3 (sometimes), 4 (often), and 5 (almost always). Finally, respondents were asked to indicate (e) whether they had experienced particular occasions during the past year (e.g., the date of their husband’s or wife’s death or his or her birthday) when the sadness and loneliness that they experienced right after the death returned to them on the following 5-point scale: 1 (no, never), 2 (yes, but rarely), 3 (yes, some), 4 (yes, frequently), and 5 (yes, all the time). In the widowed sample, the average respondent reported a level of 37.2, with a standard deviation of 34.6, skewness of 0.37, and kurtosis of −1.15. Those respondents who had experienced responses of this sort, commonly called anniversary reactions in the bereavement literature, were also asked to rate on 5-point scales (f) how long such reactions typically lasted: 1 (a few moments), 2 (a few hours), 3 (a day or so), 4 (a few days), and 5 (a week or longer); and (g) how intense such feelings usually were: 1 (not at all), 2 (just a little), 3 (somewhat), 4 (quite), and 5 (extremely). The average length of anniversary reactions within the widowed sample was 20.1 units, with a standard deviation of 26.2, skewness of 1.17, and kurtosis of 0.53. The average intensity of anniversary reactions was 52.1 units, with a standard deviation of 28.6, skewness of 0.21, and kurtosis of −0.82. These variables showed substantial intercorrelation. Frequency of thoughts about the partner was correlated .69 with frequency of conversations about the partner and .56 with frequency of anniversary reactions. Frequency of anniversary reactions was also correlated .46 with frequency of conversations about the deceased spouse and .45 with frequency of sad or upsetting thoughts and conversations about the spouse. Finally, the frequency of anniversary reactions was correlated .41 with the intensity of such reactions.

Finding meaning in the loss. To assess cognitive resolution following the loss, or the extent to which respondents had been able to come up with a satisfactory account of what had happened, respondents were asked to rate (a) whether they were currently searching to make sense or find some meaning in their spouse’s death on a 5-point scale ranging from 1 (no, never) to 5 (yes, all the time). This measure had a mean of 29.5 units, a standard deviation of 31.7, skewness of 0.63, and kurtosis of −0.77. Respondents who reported having ever searched for meaning were asked to rate (b) whether they had made any sense or found any meaning in their husband’s or wife’s death on the following 5-point scale: 1 (no, not at all), 2 (yes, a little), 3 (yes, some), 4 (yes, quite a bit), and 5 (yes, a great deal). For this measure, the mean response was 24.6 units, with a standard deviation of 32.9, skewness of 1.05, and kurtosis of −0.22. Respondents were also asked to indicate how true they thought it was that (c) they did not question their spouse’s death because it was meant to be (M = 70.9, SD = 38.0, skewness = −0.88, kurtosis = −0.79), (d) they felt their spouse’s death was senseless and unfair (M = 28.9, SD = 38.7, skewness = 0.93, kurtosis = −0.74), (e) they did not worry about finding meaning in their spouse’s death because these things just happen (M = 76.2, SD = 34.3, skewness = −1.17, kurtosis = −0.02), and (f) they believed their spouse was better off now than if he or she had lived longer (M = 58.3, SD = 43.0, skewness = −0.33, kurtosis = −1.62). These variables were measured on a 4-point scale: 1 (not at all true), 2 (somewhat true), 3 (mostly true), and 4 (very true). The correlations among these variables were more modest than those in the previous section. Here, the extent to which respondents reported that the death was meant to be was correlated .41 with the extent to which they reported that death is something that just happens. Interestingly, the extent to which respondents reported currently searching for meaning in the death showed sizable correlations with several variables in the previous section related to continued involvement and emotional resolution. It correlated .49 with the frequency of thoughts about the spouse, .40 with the frequency of conversations about the spouse, and .52 with the duration of anniversary reactions.

Personal growth. Finally, respondents were asked two questions regarding the extent to which they had experienced personal growth as a result of the loss. Respondents were asked to indicate their agreement with the statement “I have become more self-confident as a result of having to manage without my husband/wife.” The average level of this measure across the widowed sample was 70.7 units, with a standard deviation of 36.2, skewness of −0.85, and kurtosis of −0.71. Respondents were also asked to evaluate how much they felt like a stronger person for coping with their spouse’s death. This measure had a mean of 65.5 units, standard deviation of 37.7, skewness of −0.60, and kurtosis of −1.11. For both of these questions, respondents rated their agreement on a 4-point scale ranging from 1 (not at all true) to 4 (very true). Responses to these questions were correlated .46 across the sample.
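The 0 to 100 rescaling described at the start of the Measures section can be sketched as follows. The function names are illustrative rather than taken from the article, and the sketch assumes the simple linear mapping that the text implies (lowest possible score to 0, highest possible score to 100).

```python
def rescale_to_percent(score, scale_min, scale_max):
    """Map a raw rating onto 0-100, where 0 is the lowest possible
    score and 100 the highest possible score on the original scale."""
    if scale_max <= scale_min:
        raise ValueError("scale_max must exceed scale_min")
    if not scale_min <= score <= scale_max:
        raise ValueError("score lies outside the rating scale")
    return 100.0 * (score - scale_min) / (scale_max - scale_min)


def percent_to_raw(percent, scale_min, scale_max):
    """Inverse mapping: express a 0-100 value in original scale units."""
    return scale_min + (percent / 100.0) * (scale_max - scale_min)


# 7-point frequency scale: 1 (never) ... 7 (daily or almost daily)
print(rescale_to_percent(1, 1, 7))  # 0.0
print(rescale_to_percent(4, 1, 7))  # 50.0
print(rescale_to_percent(7, 1, 7))  # 100.0
# 5-point scale: 3 (sometimes)
print(rescale_to_percent(3, 1, 5))  # 50.0
# Reported mean of 62.6 for frequency of thoughts, back in scale units
print(round(percent_to_raw(62.6, 1, 7), 2))  # 4.76
```

Read this way, the reported mean of 62.6 for frequency of thoughts and memories falls between 4 (2 or 3 times a month) and 5 (about once a week) on the original 7-point scale.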

Results

Characteristics of the Widowed Sample

Table 1 summarizes the characteristics of the 768 respondents who constituted the widowed sample. As this table illustrates, the typical respondent in the study was 70 years old and had lost his or her spouse after nearly 30 years of marriage. The average age of the spouse who died was 59.2 years, and respondents had been widowed an average of 15 years at the time of the interview. Twenty percent of the sample were men.

Overview of Results

Below, we first examine various indicators of continuing psychological involvement with the spouse, such as the frequency of thoughts and memories, and indicators of emotional resolution of the loss, such as the extent to which memories made the respondent feel happy versus upset. Then, we describe the relationship between time since widowhood and various indicators of cognitive resolution of the loss. Finally, we examine indicators of positive growth and change.

Analysis Approach

Given our focus on the long-term course of adjustment, we expected to observe a temporal process of equilibration in which levels of grief process variables moved toward some long-term equilibrium level. As noted earlier, the mathematical function for modeling equilibration with time is usually a negative exponential in which temporal changes are relatively rapid at first and slow with time until an asymptote is reached. Preliminary graphical analyses confirmed our expectations for many of the process variables. These preliminary analyses were conducted as follows. Time since widowhood was grouped into 5-year intervals, and mean levels of each process variable in each interval were calculated, adjusting for a set of control variables. These included demographic variables such as sex, race, education, and age of respondent; contextual variables relating to the loss event such as age of spouse and number of children at the time of widowhood, whether the death was expected or not, and whether the death was due to murder, accident, or suicide; and, finally, a measure of current relationship status, that is, whether the respondent was remarried at the time of the interview. Each of the control variables was


included because it could plausibly be associated with both time since widowhood and the grief response, thereby leading to a bias in the relationship between the two. These adjusted means (also known as least squares means) represent the predicted level of a given outcome for each time interval for individuals at the mean on the control variables, and they were computed using the LSMEANS option in the general linear model (GLM) procedure available in SAS software (SAS Institute, 2004). As we illustrate in the graphs presented below, although many of the process variables showed a negative exponential pattern with time, some showed what appeared to be a linear trend, and some showed no relationship with time. In our modeling of the functional form, we allowed for all three possibilities. The simplest or null model, shown in Figure 1(i), was one in which, after adjusting for the control variables ($x_i$) listed above, there was no relationship between time since widowhood (t) and a given process measure, labeled y. In formal terms,

$$y = a + 0 \times t + \sum_i c_i x_i + \varepsilon. \tag{1}$$

Each control variable ($x_i$) was mean centered, that is, the sample mean was subtracted from each person’s score (resulting in a mean of 0 for all control variables). With the controls coded in this way, the intercept a is the mean of y, and the relationship between time (t) and y net of the controls is constrained to be 0. Each $c_i$ represents the unique linear relationship between control variable i and y. Finally, $\varepsilon$ is a random variable that represents all other remaining influences on y. It is assumed that these (a) average to zero, (b) are uncorrelated with the other variables in the model, and (c) have the same variability over time and the covariates. Note that time t in this and the other two regression models discussed below is the original, continuous time scale, not the categorized version of time since widowhood used to estimate the adjusted means discussed above. The second or linear model, shown in Figure 1(ii), specified that after adjusting for the control variables, there was a linear relationship between time since widowhood and y, as follows:

$$y = a' + b \times t + \sum_i c'_i x_i + \varepsilon'. \tag{2}$$

The intercept $a'$ is the value of y for a typical person who has just been widowed. The slope b is the expected difference in each recovery measure y associated with a one-year difference in time since widowhood. Here, each $c'_i$ represents the unique linear relationship between control variable i and y, adjusting for time. As in the null model, $\varepsilon'$ is a random variable related to the unmeasured

Figure 1. Characteristic forms of the null, linear, and negative exponential models.

Table 1
Characteristics of the Widowed Sample

                                     Total (n = 768)    Men (n = 149)    Women (n = 619)
Characteristic                          M      SD          M      SD         M      SD
Average age (years)                   70.0    11.1       70.6    12.1      69.9    10.8
Average age when widowed (years)      55.2    14.8       57.1    16.6      54.7    14.3
Education (number of years)            9.9     3.7        9.7     4.1      10.0     3.7
Spouse’s age at death (years)         59.2    15.4       54.9    16.6      60.3    15.0
Total number of children               3.0     2.4        3.2     2.5       3.0     2.4
% currently married                   15.9               30.2              12.4
Race
  White (%)                           63.9               65.1              63.6
  Black (%)                           34.1               30.9              34.9
  Other (%)                            2.0                4.0               1.5
Religion
  Protestant (%)                      76.2               69.8              77.7
  Catholic (%)                        18.6               20.8              18.1
  Jewish (%)                           1.6                1.3               1.6
  Other/none (%)                       3.6                8.1               2.6
Time since death (years)              14.8    11.7       13.5    12.6      15.2    11.5
Died suddenly (%)                     47.9               43.0              49.1
Number of years married               29.0    16.0       29.5    16.4      28.9    16.0
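The statement that mean centering the controls makes the intercept of the null model (Equation 1) equal to the mean of y can be checked with a minimal sketch. It uses a single control variable and made-up numbers purely for illustration; the variable names and data are hypothetical and are not drawn from the study.

```python
def ols_simple(x, y):
    """Least squares intercept and slope for y = a + c * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up data: one control variable and one 0-100 outcome.
education = [8.0, 10.0, 12.0, 9.0, 11.0]
outcome = [70.0, 62.0, 55.0, 66.0, 57.0]

# Mean centering: subtract the sample mean from each person's score.
mean_education = sum(education) / len(education)
centered = [e - mean_education for e in education]

raw_intercept, raw_slope = ols_simple(education, outcome)
centered_intercept, centered_slope = ols_simple(centered, outcome)
mean_outcome = sum(outcome) / len(outcome)

print(raw_intercept, centered_intercept, mean_outcome)
```

Centering leaves the slope unchanged and shifts only the intercept, which then describes a respondent who is average on the control rather than one who scores 0 on it.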


Table 2
Goodness-of-Fit and Improvement-in-Fit Measures for Null, Linear, and Negative Exponential Models: Measures of Continued Involvement and Emotional Resolution

Measure                                  RMSE    R²adj    ΔR²adj      p
Frequency of thoughts and memories
  Null model                             21.4    .3696
  Linear model                           20.6    .4151     .0455    < .001
  Negative exponential model             20.5    .4173     .0477    < .001
Frequency of conversations
  Null model                             22.5    .2142
  Linear model                           21.4    .2926     .0784    < .001
  Negative exponential model             21.2    .3053     .0911    < .001
Positive affect
  Null model                             25.0    .0595
  Linear model                           25.0    .0587    −.0008      .536
  Negative exponential model             25.0    .0608     .0013      .227
Negative affect
  Null model                             22.8    .0710
  Linear model                           22.6    .0873     .0163    < .001
  Negative exponential model             22.5    .0969     .0259    < .001
Frequency of anniversary reactions
  Null model                             23.6    .1320
  Linear model                           22.1    .2404     .1085    < .001
  Negative exponential model             21.9    .2512     .1192    < .001
Length of anniversary reactions
  Null model                             18.9    .0131
  Linear model                           18.9    .0124    −.0006      .404
  Negative exponential model             19.0    .0117    −.0013      .505
Intensity of anniversary reactions
  Null model                             21.1    .0332
  Linear model                           20.9    .0479     .0147      .004
  Negative exponential model             20.5    .0875     .0543    < .001

Note. RMSE = root-mean-square error.

influences on y, conforming to the same set of assumptions laid out above. The third or negative exponential model, illustrated in Figure 1(iii), specified that after adjusting for the control variables ($x_i$), there was a negative exponential relationship between time since widowhood (t) and y such that adjustment was relatively rapid at first but gradually slowed until an asymptote was reached. The model was as follows:

$$y = f - d \times e^{-s \times t} + \sum_i c''_i x_i + \varepsilon'', \tag{3}$$

in which y is the measure of recovery, f is the final or asymptotic level of y, d is the distance between the initial level and the final level of y, e is a mathematical constant representing the base of the natural logarithm, and t is time in years since widowhood. The final parameter, s, is known as the decay constant. It is a positive number related to the rate of adjustment, with larger numbers representing more rapid adjustment. In this specification, f − d is the equivalent of a and $a'$ in the previous models, that is, $a''$. Again, each $c''_i$ represents the unique linear relationship between control variable i and y. Once again, $\varepsilon''$ corresponds to all other random, unmeasured influences on y. The same assumptions described for the null model also apply to this residual term in the negative exponential model. The estimation of all three models was accomplished using the NLIN procedure available with SAS software (SAS Institute, 2004), although the null and linear models could have equivalently been estimated with several other SAS procedures (e.g., REG or

GLM) or any software capable of least squares linear regression.² To determine which of the three models was most appropriate for each measure, we computed adjusted goodness-of-fit indices for each model for each measure (shown in Tables 2 and 3). Like many other methods for computing regression estimates, the NLIN procedure uses least squares estimation to arrive at these estimates (SAS Institute, 2004). That is, the estimates produced are those that minimize the sum of squared residuals. Note that because the linear and negative exponential models are not nested models, they cannot be directly compared in terms of their fit. They can, however, be compared in terms of how much each improves the fit compared with the null model. A simple change-in-$R^2$ statistic would not suffice for this purpose, though, because it does not take the relative degrees of freedom of the two models into account, namely, that one degree of freedom is lost when fitting the linear model, whereas two degrees of freedom are lost when fitting the negative exponential model. To account for the different degrees of freedom of the linear and negative exponential models, we computed adjusted $R^2$ and change in adjusted $R^2$ values to assess model fit (see Tables 2 and 3).³

² Unlike some other regression programs, the NLIN procedure requires the user to provide starting values for the parameter estimates. These starting values can be estimated from descriptive data and plots. An additional feature of NLIN is the ability to set bounds on the parameter estimates. In all negative exponential regression analyses presented here, f and f − d from Equation 3 were constrained to be between 0 and 100, and s was constrained to be 0 or larger.

Table 3
Goodness-of-Fit and Improvement-in-Fit Measures for Null, Linear, and Negative Exponential Models: Measures of Finding Meaning and Personal Growth

Measure                                  RMSE    R²adj    ΔR²adj      p
Finding meaning
 Death was meant to be
  Null model                             28.5    .0825
  Linear model                           28.2    .0983     .0158    < .001
  Negative exponential model             28.2    .0977     .0151    < .001
 Death just happens
  Null model                             24.6    .0613
  Linear model                           24.3    .0879     .0266    < .001
  Negative exponential model             24.3    .0864     .0250    < .001
 Currently searching for meaning
  Null model                             22.9    .1631
  Linear model                           22.0    .2246     .0615    < .001
  Negative exponential model             21.8    .2385     .0754    < .001
 Ever find meaning
  Null model                             23.5   −.0059
  Linear model                           23.6   −.0087    −.0028      .678
  Negative exponential model             23.6   −.0145    −.0086      .814
 Death was senseless
  Null model                             27.4    .1566
  Linear model                           27.3    .1673     .0107      .001
  Negative exponential model             27.3    .1677     .0111      .003
 Spouse is better off now
  Null model                             30.8    .1173
  Linear model                           30.8    .1170    −.0003      .381
  Negative exponential model             30.8    .1159    −.0014      .379
Personal growth
 Gained self-confidence
  Null model                             25.3    .1045
  Linear model                           25.2    .1102     .0058      .015
  Negative exponential model             25.2    .1112     .0067      .022
 Stronger person as a result
  Null model                             26.3    .1014
  Linear model                           26.3    .1002    −.0011      .841
  Negative exponential model             26.3    .1029     .0015      .195

Note. RMSE = root-mean-square error.

For each measure, these tables show the $R^2_{\mathrm{adj}}$ value for each model, the improvement in fit for both the linear and negative exponential models compared with the null model, and the significance test of the improvements in fit. In most cases, both the linear and negative exponential models improved the fit compared with the null model. In these cases, we directly compared the change in $R^2_{\mathrm{adj}}$ values of the two models. We used the convention that a difference in change in $R^2_{\mathrm{adj}}$ of .0100 (i.e., 1%) between the two models indicated a sizable difference in fit. Generally, the negative exponential model showed a greater improvement in fit than the linear model, confirming the descriptive evidence discussed above. In such cases, we show (in Figures 2, 3, and 4) the fitted function for the negative exponential model. In cases in which the linear model showed a greater improvement in fit, we show the linear function. For simplicity, in cases in which both the linear and negative exponential models improved the fit to an approximately equal degree, we show only the fitted results for the

negative exponential model in the figures. In cases in which neither improved the fit, we show the implied zero-slope line from the null model. On the basis of the fit statistics presented in Tables 2 and 3, Tables 4, 5, and 6 present the regression estimates and several additional pieces of information regarding the best fitting model or models for each dependent variable. First, these tables identify which of the three models best described each outcome. In cases in which the linear and negative exponential models showed equal

³ Both normal $R^2$ values and $R^2_{\mathrm{adj}}$ values are easily computed from the analysis of variance table that accompanies regression analyses in most common statistical packages. Whereas $R^2$ is simply the sum of squares accounted for by the model divided by the total sum of squares, $R^2_{\mathrm{adj}}$ is given by the following formula:

$$R^2_{\mathrm{adj}} = \frac{SS_{\mathrm{model}} / df_{\mathrm{model}}}{SS_{\mathrm{total}} / df_{\mathrm{total}}}.$$
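The model-selection convention described in the text (compare each model's improvement over the null model, and treat a .0100 gap in the change in adjusted R² as sizable) can be written out as a small decision function. This is a sketch of that convention, not the authors' code: the function name is hypothetical, the caller decides what counts as a statistically significant improvement, and the handling of the case in which only one model improves the fit is an assumption, because no such case appears in Tables 2 and 3.

```python
def choose_model(delta_linear, delta_exponential,
                 linear_improves, exponential_improves,
                 sizable_difference=0.0100):
    """Pick the model(s) to report from improvements over the null model.

    delta_* are changes in adjusted R-squared relative to the null model;
    *_improves flag whether each improvement was statistically significant.
    """
    if not linear_improves and not exponential_improves:
        return "null"
    # Assumed handling: if only one model improves the fit, report it.
    if linear_improves and not exponential_improves:
        return "linear"
    if exponential_improves and not linear_improves:
        return "negative exponential"
    # Round to the four decimals used in the tables before comparing.
    gap = round(delta_exponential - delta_linear, 4)
    if gap >= sizable_difference:
        return "negative exponential"
    if gap <= -sizable_difference:
        return "linear"
    return "approximately equal"


# Values taken from Table 2.
print(choose_model(0.0784, 0.0911, True, True))     # negative exponential
print(choose_model(0.0455, 0.0477, True, True))     # approximately equal
print(choose_model(-0.0008, 0.0013, False, False))  # null
```

The three calls correspond to frequency of conversations, frequency of thoughts and memories, and positive affect, respectively.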


Figure 2. Functional relationships between years since widowhood and measures of continuing involvement and emotional resolution, adjusted for sex, race, education, age of respondent; age of spouse and number of children at time of widowhood; whether the death was expected or not; whether the death was due to murder, accident, or suicide; and whether the respondent became remarried. Points indicate least squares means for 5-year intervals, and error bars represent standard errors of those means.

improvement in fit, we show results for both in the tables. We also indicate the rate of change of each measure as a function of time. For those variables showing a linear relationship, this corresponds to b in Equation 2 and represents the number of units on the dependent variable corresponding to a one-year difference in time since widowhood, adjusting for the control variables listed above. For example, knowing that the linear rate of change in frequency of thoughts about the deceased spouse is −.89 units per year means that we would expect respondents widowed 10 years apart to differ on this variable by 8.9 units. For those variables showing a negative exponential relationship, the value listed in the table corresponds to s in Equation 3, although no such easy calculation of adjustment level is possible here because the rate of change depends on time itself. To determine how respondents were doing


Figure 3. Functional relationships between years since widowhood and measures of meaning finding, adjusted for sex, race, education, age of respondent; age of spouse and number of children at time of widowhood; whether the death was expected or not; whether the death was due to murder, accident, or suicide; and whether the respondent became remarried. Points indicate least squares means for 5-year intervals, and error bars represent standard errors of those means.

after a specified number of years, it is necessary to calculate the predicted levels of y using Equation 3. Tables 4–6 provide several additional pieces of information about the models we estimated. First, they provide the initial value or intercept predicted for each model. For all three models, this has the same interpretation, namely, the predicted level of the dependent variable for people at the point of widowhood, adjusting for the control variables. Second, for measures in which the negative exponential model provided the best fit, these tables provide what we call the 90% asymptotic level. This value represents the level of adjustment corresponding to 90% of the distance between the initial value and the asymptotic value. Finally, for those variables


Figure 4. Functional relationship between years since widowhood and measures of personal growth, adjusted for sex, race, education, age of respondent; age of spouse and number of children at time of widowhood; whether the death was expected or not; whether the death was due to murder, accident, or suicide; and whether the respondent became remarried. Points indicate least squares means for 5-year intervals, and error bars represent standard errors of those means.

in which there was a negative exponential relationship between time since widowhood and adjustment, we used Equation 3 to estimate the number of years it would take to reach 25%, 50%, 75%, and 90% of the distance to the asymptotes. These values provide a view of adjustment as a function of time for the negative exponential models that is easier to interpret than the s parameter of Equation 3. In those cases in which the relationship between the dependent variable and time since widowhood is linear, identifying a specific long-term recovery level is not possible because the rate of recovery does not change over time. Thus, for those variables in

which the model was linear, we left the 25%, 50%, 75%, and 90% columns blank.
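The two rate summaries described above can be illustrated with a short computation. The sketch below is an illustration, not the authors' procedure. It assumes that Equation 3, which is not reproduced in this section, has the standard negative exponential form y(t) = a + (y0 − a)·exp(−s·t), and the function names are hypothetical. Under that assumption, the number of years needed to cover a fraction p of the distance to the asymptote is −ln(1 − p)/s, so the 75% time is exactly twice the 50% time whatever the value of s. The tabled values follow this pattern (e.g., 3.8 and 7.6 years for negative affect). For the linear model, the expected difference is simply b multiplied by the number of years apart.

```python
import math


def linear_difference(b, years_apart):
    """Expected difference on the outcome for respondents widowed `years_apart` years apart."""
    return b * years_apart


def years_to_fraction(s, p):
    """Years needed to cover fraction p of the distance to the asymptote.

    Assumes y(t) = a + (y0 - a) * exp(-s * t). This form is an assumption,
    because Equation 3 itself is not shown in this section.
    """
    if s <= 0:
        raise ValueError("s must be positive")
    if not 0 <= p < 1:
        raise ValueError("p must lie in [0, 1)")
    return -math.log(1.0 - p) / s


# Linear example from the text: b = -.89 units per year, 10 years apart.
print(round(linear_difference(-0.89, 10), 1))

# Negative affect in Table 4 lists s = .18, which is a rounded value, so the
# results only approximate the tabled 1.6, 3.8, 7.6, and 12.6 years.
for p in (0.25, 0.50, 0.75, 0.90):
    print(p, round(years_to_fraction(0.18, p), 1))
```

Because the tabled speed-of-change values are rounded to two decimals, times computed from them may differ from the tabled times by a few tenths of a year.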

A Note on the Figures

Figures 2–4 show the best fitting model for each measure of recovery as well as the least squares means estimated for each 5-year interval since widowhood (described earlier). Although the total sample contained respondents widowed as little as a few months to as long as 64 years prior to the survey, small cell sizes

Table 4. Linear and Negative Exponential Regressions Relating Indicators of Continuing Involvement and Emotional Resolution to the Duration of Time Since Widowhood

| Dependent variable and form of relationship | Speed of change | SE | Initial level | SE | 90% asymptotic level | SE | Years to 25% of asymptotic level | Years to 50% | Years to 75% | Years to 90% |
|---|---|---|---|---|---|---|---|---|---|---|
| Frequency of memories: Linear | −.89* | .11 | 76.3* | 2.0 | | | | | | |
| Frequency of memories: Negative exponential | .03† | .02 | 80.5* | 3.0 | 34.5* | 8.6 | 8.9 | 21.4 | 42.9 | 71.3ᵃ |
| Frequency of conversations: Negative exponential | .06* | .02 | 69.5* | 3.4 | 22.8* | 3.1 | 4.7 | 11.4 | 22.8 | 37.9 |
| Positive affectᵇ: Null | 0 | | 60.2* | 1.3 | | | | | | |
| Negative affectᵇ: Negative exponential | .18* | .08 | 61.4* | 5.1 | 39.8* | 1.6 | 1.6 | 3.8 | 7.6 | 12.6 |
| Frequency of anniversary reactions: Negative exponential | .04* | .00 | 66.0* | 3.1 | 6.6* | 1.4 | 6.7 | 16.1 | 32.2 | 53.5 |
| Duration of anniversary reactionsᶜ: Null | 0 | | 19.1* | 1.2 | | | | | | |
| Intensity of anniversary reactionsᶜ: Negative exponential | .32* | .11 | 78.5* | 5.8 | 50.4* | 1.9 | 0.9 | 2.1 | 4.2 | 7.0 |

Note. For linear results, rates are in units per year. All analyses adjust for sex, race, education, age of respondent, age of spouse, and number of children at time of widowhood; whether the death was expected or not; whether the death was due to murder, accident, or suicide; and whether the respondent became remarried.
ᵃ Projected value is beyond the range of the data. ᵇ Total n = 697. ᶜ Total n = 484.
† p < .10. * p < .05.


Table 5. Linear and Negative Exponential Regressions Relating Indicators of Meaning Finding to the Duration of Time Since Widowhood

| Dependent variable and form of relationship | Speed of change | SE | Initial level | SE | 90% asymptotic level | SE | Years to 25% of asymptotic level | Years to 50% | Years to 75% | Years to 90% |
|---|---|---|---|---|---|---|---|---|---|---|
| View death as meant to be: Linear | .60* | .16 | 57.7* | 2.7 | | | | | | |
| View death as meant to be: Negative exponential | .02* | .01 | 56.0* | 3.4 | 95.6* | 2.7 | 14.2 | 34.1 | 68.3ᵇ | 113.4ᵇ |
| View death as something that just happens: Linear | .65* | .14 | 66.0* | 2.3 | | | | | | |
| View death as something that just happens: Negative exponential | .03* | .01 | 64.0* | 3.2 | 96.4* | 1.8 | 9.6 | 23.1 | 46.1 | 76.8ᵇ |
| Currently searching for meaning?ᵃ: Negative exponential | .12* | .05 | 60.1* | 6.5 | 22.2* | 2.4 | 2.5 | 6.0 | 12.0 | 19.9 |
| If searched, found meaning?ᵃ: Null | 0 | | 25.9* | 1.7 | | | | | | |
| View death as senseless and unfair: Linear | −.50* | .15 | 41.4* | 2.6 | | | | | | |
| View death as senseless and unfair: Negative exponential | .03 | .04 | 44.4* | 4.1 | 20.3* | 8.0 | 7.3 | 17.6 | 35.2 | 58.5 |
| View of spouse as better off now: Null | 0 | | 55.3* | 1.5 | | | | | | |

Note. For linear results, rates are in units per year. All analyses adjust for sex, race, education, age of respondent, age of spouse, and number of children at time of widowhood; whether the death was expected or not; whether the death was due to murder, accident, or suicide; and whether the respondent became remarried.
ᵃ Total n = 313. ᵇ Projected value is beyond the range of the data.
* p < .05.

limited the interpretability of the least squares means analysis to widows of 35 years or less. For this reason, Figures 2–4 depict only the first 35 years since widowhood, although all analyses made use of the entire widowed sample. Also, each of these figures has two ordinate axes, one showing the 0–100 scale of the dependent measure and one showing the actual verbal labels the respondents were given.

Continuing Involvement and Emotional Resolution

We first present results for measures of continuing involvement with the lost loved one, such as the frequency of memories and conversations about one’s spouse, and measures of emotional resolution, such as positive and negative feelings from thinking and talking about the loss and the frequency, duration, and intensity of feeling upset with reminders of the loss. Table 2 summarizes the goodness-of-fit results, whereas Table 4 summarizes the fitted models. For frequency of thoughts and memories about the deceased spouse, both the linear and negative exponential models showed equal improvement in fit over the null model. For simplicity, we chose to graph only the negative exponential results (ΔR²adj = .0477, p < .001), as shown in Figure 2(i). These results show a moderate decline in thoughts and memories as a function of time since widowhood. Respondents who were recently widowed reported having thoughts and memories of their loved one approximately two to three times per week (corresponding to 80.5 units). The 90% asymptotic value (34.5 units) corresponds to thoughts or memories about once per month, and the number of years widowed corresponding to this value is 71.3 years. This value is beyond the observed range of our data (0–64 years postwidowhood).

Table 6. Linear and Negative Exponential Regressions Relating Perceptions of Personal Growth to the Duration of Time Since Widowhood

| Dependent variable and form of relationship | Speed of change | SE | Initial level | SE | 90% asymptotic level | SE | Years to 25% of asymptotic level | Years to 50% | Years to 75% | Years to 90% |
|---|---|---|---|---|---|---|---|---|---|---|
| Increased self-confidence: Linear | .34* | .14 | 65.6* | 2.4 | | | | | | |
| Increased self-confidence: Negative exponential | .16 | .12 | 59.0* | 5.4 | 73.0* | 1.5 | 1.8 | 4.3 | 8.6 | 14.2 |
| Stronger person as a result: Null | 0 | | 65.8* | 1.3 | | | | | | |

Note. Rates are in units per year. All analyses adjust for sex, race, education, age of respondent, age of spouse, and number of children at time of widowhood; whether the death was expected or not; whether the death was due to murder, accident, or suicide; and whether the respondent became remarried.
* p < .05.


The negative exponential model provided the best fit in the case of the frequency of conversations about the deceased spouse (ΔR²adj = .0911, p < .001). The relevant fitted line is displayed in Figure 2(ii). Recently widowed respondents reported having conversations about their loved one approximately once a week (69.5 units), and respondents at the 90% asymptotic value reported conversations occurring less than once a month (22.8 units). This latter value occurs for respondents at 37.9 years postwidowhood.

Turning now to indicators of emotional resolution, for positive affect, neither the linear nor the negative exponential model improved the fit over the null model. Respondents reported experiencing happy feelings when they thought or talked about their spouse between sometimes and often (60.2 units), and this level did not vary as a function of years since widowhood. Recently widowed respondents experienced negative feelings about as often as positive ones when thinking or talking about their spouse (61.4 units), but negative feelings showed a negative exponential decrease over time (ΔR²adj = .0259, p < .001). Respondents widowed for 12.6 years reported experiencing negative affect between sometimes and rarely (39.8 units), which corresponds to 90% of the distance to the asymptote. Figure 2(iii) shows the results for both positive and negative affect. Taken together, these results suggest that widowed persons’ thoughts and conversations about their spouses become more pleasant overall as the death becomes more distant.

Table 4 also summarizes the findings regarding three questions designed to probe the frequency, duration, and intensity of what are typically called anniversary reactions in the bereavement literature. Initially, respondents experienced such reactions between sometimes and frequently (66.0 units); however, these reactions showed a negative exponential decline as a function of years since widowhood (ΔR²adj = .1192, p < .001). This relationship is plotted in Figure 2(iv). Respondents widowed for 53.5 years corresponded to the 90% asymptotic value of 6.6 (almost never), suggesting that anniversary reactions may essentially disappear after several decades of widowhood. The duration of these reactions seems not to vary systematically as a function of the number of years since widowhood, as neither the linear nor the negative exponential model showed improved fit over the null model. Respondents reported that these reactions lasted a few hours or less (19.1 units), as shown in Figure 2(v). However, the intensity of these reactions did seem to vary as a function of time since widowhood. Recently widowed respondents reported quite intense anniversary reactions (78.5 units), but the intensity showed a negative exponential decline over time (ΔR²adj = .0543, p < .001). Respondents widowed for 7.0 years represent the 90% asymptotic value of 50.4 units, or somewhat intense. This relationship is shown in Figure 2(vi).
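The predicted levels quoted in this section can be related to the tabled quantities with a small calculation. The following is a sketch under stated assumptions, not the authors' code. It assumes that Equation 3 has the negative exponential form y(t) = a + (y0 − a)·exp(−s·t). It recovers the asymptote a from the initial value and the 90% asymptotic level, and it backs out s from the tabled number of years to the 90% level instead of using the rounded speed-of-change value. All function names are hypothetical.

```python
import math


def asymptote_from_90(initial, level_90):
    """Asymptote implied by the initial value and the 90% asymptotic level.

    By definition, level_90 = initial + 0.9 * (asymptote - initial).
    """
    return initial + (level_90 - initial) / 0.9


def rate_from_years_to_90(years_90):
    """Rate s implied by the years needed to reach the 90% level (assumed model)."""
    return math.log(10.0) / years_90


def predicted_level(initial, asymptote, s, years):
    """Predicted level after `years`, assuming y(t) = a + (y0 - a) * exp(-s * t)."""
    return asymptote + (initial - asymptote) * math.exp(-s * years)


# Frequency of conversations (Table 4): initial 69.5, 90% level 22.8 at 37.9 years.
a = asymptote_from_90(69.5, 22.8)
s = rate_from_years_to_90(37.9)
for t in (0, 5, 10, 20, 37.9):
    print(t, round(predicted_level(69.5, a, s, t), 1))
```

By construction, the predicted level equals the initial value at 0 years and the 90% asymptotic level at 37.9 years. The intermediate values depend entirely on the assumed functional form.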

Finding Meaning in the Loss

To determine whether there was a relationship between time since the loss and cognitive resolution of the loss, respondents were asked to indicate whether they agreed with the statement “I don’t question my spouse’s death because it was meant to be” and the statement “I don’t worry about finding meaning in my spouse’s death because these things just happen.” As Table 5 illustrates, respondents who had lost a spouse recently showed considerable agreement with both of these statements, rating them as mostly true. In both cases, as indicated by Table 3(i), the increase in agreement with these statements over time was described equally well by the linear and negative exponential models. To simplify, only the negative exponential models are shown in Figures 3(i) and 3(ii) (ΔR²adj = .0151, p < .001, and ΔR²adj = .0250, p < .001, respectively). These figures show that agreement with these statements increases quite steadily as a function of time since widowhood, with the 90% asymptotic levels corresponding to mostly true and occurring many decades later, as indicated in Table 5.

Consistent with these findings, when asked whether they had ever found themselves searching to make sense or find meaning in the loss, 59% of the widowed respondents said that they had never done this since the loss. The 41% of respondents who did report having searched for meaning in the death were further asked to indicate whether they had done so in the past 3 months. According to the negative exponential model, those respondents who had endured the loss most recently reported actively searching for meaning between sometimes and frequently (60.1 units). This variable showed a significant decline with time (ΔR²adj = .0754, p < .001), such that those who had experienced the loss many years ago reported that they had rarely searched for meaning in the past 3 months, with a 90% asymptotic level of 22.2. This value corresponds to respondents who had been widowed for 19.9 years. Figure 3(iii) shows this relationship. When asked if they had ever found meaning in their spouse’s death, respondents who had searched for meaning indicated that they had made a little sense of it (25.9 units). The extent to which respondents reported finding meaning in the death did not vary as a function of time since widowhood, as neither the linear nor the negative exponential model showed improved fit over the null model. This relationship is represented in Figure 3(iv).
Respondents were also asked to express their agreement or disagreement with the statements that they felt their spouse’s death was senseless and unfair and that their spouse was better off than if he or she had lived longer. For the question of whether the death was senseless, the linear and negative exponential models showed equivalent improvement in fit over the null model, as seen in Table 3. For simplicity, only the negative exponential relationship is shown in Figure 3(v). Recently widowed respondents indicated that this statement was between somewhat and mostly true (44.4 units), and agreement with this statement decreased as a function of time since widowhood. Respondents who had been widowed for 58.5 years represented the 90% asymptotic value of 20.3 units (between not at all and somewhat true). The extent to which respondents agreed that their spouse was better off now did not seem to vary as a function of time since widowhood. Neither the linear nor the negative exponential model showed improved fit over the null model, as shown in Table 3. On average, respondents rated this statement between somewhat and mostly true (55.3 units). The zero-slope line implied by the null model is shown in Figure 3(vi).

Personal Growth

Responses to the questions designed to assess perception of positive growth following the loss are presented in Table 6. For the question of whether the respondent felt an increase in self-confidence as a result of managing the loss, both the linear and negative exponential models showed predictive improvement over the null model (ΔR²adj = .0058, p = .015, and ΔR²adj = .0067, p = .022, respectively). Recently widowed individuals said it was mostly true (59.0 units) that they had gained self-confidence through managing alone; agreement with this statement increased as a function of time since widowhood, with those widowed 14.2 years representing the 90% asymptotic value of 73.0 units (between mostly true and very true). Finally, widowed respondents were asked to rate their agreement with the statement that they were stronger as a result of losing their spouses. As seen in Table 3, neither the linear nor the negative exponential model showed improved fit over the null model. Respondents found this statement to be mostly true, and agreement did not differ as a function of time since widowhood. The zero-slope line implied by the null model is shown in Figure 4(ii).

Discussion

In this article, we have aimed to chart the time course of grief by focusing on continued involvement with the deceased spouse, emotional resolution, meaning finding, and feelings of personal growth. We have shown evidence that the widowed continue to talk, think, and feel emotions about their lost spouse many years (sometimes decades) later.

Continuing Involvement and Emotional Resolution

Studies have rarely focused on the length of time that people continue to experience memories and have conversations about their deceased spouse beyond the first 4 years after the loss. Such behaviors are generally regarded as signs of involvement with or attachment to the lost loved one. Our data suggest that such behavior is normal. Although memories and conversations decrease with time, they take many decades to reach their lowest level. As long as 20 years after the loss, the typical respondent still thought about his or her spouse once every week or two and had a conversation about him or her, on average, once a month. Past research has shown a decrease in thoughts about the lost spouse from 18 months to 48 months (Boerner, Wortman, & Bonanno, 2005), and our findings extend this. An important direction for future research is to examine the reactions of others in the bereaved person’s social network to discussions of a lost spouse so long after the loss. Research has shown that others sometimes try to stop the bereaved from discussing feelings associated with the loss and find these types of discussions uncomfortable (Ingram, Jones, & Smith, 2001; Lehman, Ellard, & Wortman, 1986). Further information is needed regarding the functions, costs, and benefits of different types of continuing involvement for long-term well-being. The extent to which memories and conversations are beneficial may depend on whether the bereaved focuses on negative or positive aspects. Research has examined this relatively shortly after the loss and has shown that rumination, or focusing on the distressing aspects of memories, is associated with higher levels of depression 6 months after the bereavement (Nolen-Hoeksema, Parker, & Larson, 1994). Our recently bereaved respondents reported feeling negatively after thinking or talking about their spouse between sometimes and often.
This decreased with time, such that those bereaved about 12.5 years reported negative affect between sometimes and rarely. Of course, feeling sad or upset after thinking or talking about one’s spouse may not


be evidence of rumination. Memories may spontaneously be upsetting and sad. Negative thoughts may not be easily controlled and sometimes can be intrusive. Future research should examine the long-term consequences of rumination, negative intrusive thoughts, and upsetting memories for well-being. In contrast, focusing on the positive aspects of memories of the deceased can be associated with good adjustment (W. Stroebe & Schut, 2001). Our recently bereaved participants and those bereaved for decades reported that positive affect sometimes resulted from memories and conversations; the frequency did not change with time since loss. It is interesting that although the frequency of thoughts that result in feeling upset decreases with time, the frequency of thoughts that result in feeling happy does not. These positive thoughts may serve to maintain a bond with the deceased, as suggested by Weiss (2001). Future research should focus on the benefits of positive memories and the extent to which they are linked to negative thoughts and feelings. For example, does one get to a point in time at which fond memories no longer trigger both happiness and sadness, leading to a bittersweet feeling, but rather trigger only other positive thoughts and feelings? Are there individual differences in the extent to which positive memories prime negative thoughts and feelings? Further directions for future research could examine how conversations, memories, and adjustment are related in the long term, decades after the loss. On average, our bereaved participants sometimes or rarely experienced anniversary reactions (i.e., experienced painful thoughts about the loss or became upset in the face of reminders) after several decades. The frequency of anniversary reactions sharply decreased over time from frequently to rarely over about 2.5 decades. It takes about 53 years for the frequency of anniversary reactions to nearly disappear. 
The intensity of these reactions drops quickly in the first few years and then slows. However, the duration of these reactions, a few hours, does not change with time since bereavement. It is common for anniversary reactions to be experienced at least sometimes and at somewhat intense levels for a few hours or less for 7–8 years postloss. Rando (1993) has maintained that mourners who show these reactions are often inappropriately labeled as showing pathological mourning. According to Archer (1999), anniversary reactions are commonly regarded by the bereaved as a setback in their recovery and may fuel their fears that they will never be able to master the loss. It is important for clinicians to prepare clients for the possibility that such reactions may emerge and to normalize such feelings once they have occurred. Anniversaries may provide an opportunity for the bereaved to have a time-limited experience of memories and conversations about the lost spouse; this may be particularly true for cultural practices that have ritualized anniversaries, such as yahrzeit in the Jewish tradition.⁴ Our results suggest that anniversary reactions can occur decades after the loss (albeit infrequently) and should not be pathologized.

The long-term responses to the loss that were documented are all the more striking when one considers that the respondents in our study most affected by the loss of a spouse (those who had died, were too ill to be interviewed, or were institutionalized) are by definition excluded from a national probability sample of this sort. In addition, 15.9% of our ever-widowed sample were remarried at the time of interview, and we might have expected these individuals to show better adjustment. Moreover, these responses are averaged over bereaved spouses who suffered many different kinds of loss. On the basis of what is known about risk factors for recovery, we would expect that the consequences of the loss would last even longer when the death was sudden and unexpected, was untimely, or occurred because of someone else’s negligence (see W. Stroebe & Stroebe, 1987, or Wortman & Silver, 1990). We statistically controlled for many of these risk factors in the present analyses. It is a task for future research to examine the different long-term trajectories for people who differ on levels of these and other risk factors. Bonanno et al. (2002) have successfully examined some of these issues in an article that focused on the relatively short-term (6 and 18 months postloss) trajectory of grief reactions; this should be extended by looking at risk factors and long-term adjustment.

⁴ We thank an anonymous reviewer for suggesting this interpretation.

Finding Meaning in the Loss

Over time, many individuals who lose a spouse may be able to achieve a state of cognitive resolution concerning the loss. Those individuals who had experienced the loss most recently expressed considerable agreement with statements indicating that they did not question the loss because it was meant to be and because such things just happen; this agreement increased with time. Also, the majority (59%) indicated that they accepted the loss and had never searched for meaning regarding why the death had occurred; this is consistent with past research that focused on the 18 months after the loss (e.g., Bonanno et al., 2004). Among those who searched for meaning, the recently bereaved did so between sometimes and frequently, whereas those bereaved for a couple of decades did so rarely; this is consistent with findings of Boerner et al. (2005), who found that search for meaning decreased with time from 18 to 48 months postloss. Among those individuals who searched for meaning, there was no significant relationship between time since the loss and ability to find meaning; respondents found only a little meaning in the loss. Those individuals who had experienced the loss most recently showed some agreement that the loss was senseless and unfair, whereas those who had suffered the loss long ago expressed less agreement with the statement. Respondents showed moderate agreement with the statement that their spouse was better off dead, and this did not change with time. Taken together, these findings suggest that if individuals are going to resolve the loss of their spouse, they will do so relatively soon after the loss. Consistent with other researchers (Davis & Nolen-Hoeksema, 2001; Davis et al., 2000), we found that additional time does not appear to be helpful in reaching a state of resolution.

Personal Growth

Consistent with past research (e.g., Schaefer & Moos, 2001; Thomas et al., 1998), our results provide support for the notion that the loss of a spouse can be accompanied by the perception of personal growth. With time, respondents experienced an increase in self-confidence. In addition, our bereaved respondents indicated that it was mostly true that they had become a stronger person as a result of the loss; this judgment did not change with time. In subsequent research, it will be important to delineate the process through which positive changes occur. In a heterogeneous sample

of bereaved individuals, Gamino, Sewell, and Easterling (2000) found that personal growth was associated with having a chance to say goodbye, spirituality, spontaneous positive memories of the deceased, and finding something positive resulting from the loss. Positive changes may also stem from the successful assumption of challenging tasks that were formerly handled by the spouse (cf. Umberson, Wortman, & Kessler, 1992).

Strengths, Limitations, and Future Directions

Because the present study was based on results from a large, national sample of the conjugally bereaved, it afforded a unique opportunity to assess the enduring consequences of such a loss. However, because the present article has focused on a cross-sectional comparison, we cannot draw firm conclusions about changes in recovery over time. Therefore, results suggesting that it took the bereaved as long as 50–70 years to reach their lowest level on certain variables should be interpreted with caution. A potential problem with these cross-sectional data is that the length of time since widowhood may be confounded with other variables, such as whether the loss was untimely. We addressed this problem by controlling for variables that could influence whether the death was timely or not and that could therefore influence the course of adjustment to the loss and by controlling for the age of the participant and spouse, whether the death was expected or not, and whether the death was due to murder, accident, or suicide. In addition, it is possible that length of time widowed is confounded with cohort. Those who lost their spouse at one point in history (e.g., during World War II) might face different challenges than those who lost their spouse at another point in history (e.g., during the Vietnam War or during the mid-1980s). However, we tried to adjust for cohort effects statistically by controlling for age of the widowed. Another limitation of cross-sectional designs is that they do not lend themselves to the identification of preloss coping resources, such as social support or personality, that may influence the time course of recovery. The aforementioned issues can best be resolved through a prospective, longitudinal study that assesses individuals before and at several intervals following the loss of their spouse (e.g., Carr, Nesse, & Wortman, 2006), with a focus on long-term recovery.
It is important to ask whether the design used in this study may have led respondents to exaggerate the length of time that they had thoughts, conversations, and feelings about their deceased spouse. We believe it is unlikely that our respondents exaggerated their reports because, as noted above, the study focused on a wide variety of life experiences and was not presented to respondents as a study on bereavement. One might consider the age of our data a limitation (as stated earlier, data were collected in 1986). However, given the consistency of our findings with the findings of more recent research, we do not think that the age of our data makes them unrepresentative. Although we obtained an ethnically diverse sample, it was beyond the scope of the current study to examine ethnic differences in the trajectory of reactions to spousal loss. It was also beyond the scope of this study to examine potential moderating effects of other variables such as respondent’s sex and remarriage after the loss. Finally, relatively little information was gathered regarding the cause of death. Consequently, we were unable to


investigate how specific modes of death might influence the pattern of results. These are directions for future research.

Conclusion

Probably the most frequently asked questions about grief and mourning concern duration. Despite paying lip service to the notion that everyone’s mourning is individual and that a complex of factors affect duration and course, almost everyone – from the mourner herself, to students, to caregivers, to media reporters – invariably returns to the question, “How long does mourning take?” (Rando, 1993, pp. 60–61)

The present results suggest that the grieving process following the loss of a long-term spouse can continue for many years. Even after decades have passed, it is common to have memories and conversations about one’s spouse, to sometimes become sad and upset as a result, and at times to experience distress when reminders, such as the date of the spouse’s death, are encountered. Hopefully, greater awareness of these findings can lead to better interventions and a more compassionate view of those who are attempting to come to terms with the loss of their spouse.


Received September 28, 2005
Revision received January 18, 2006
Accepted January 30, 2006

Journal of Personality and Social Psychology 2006, Vol. 91, No. 3, 493–512

Copyright 2006 by the American Psychological Association 0022-3514/06/$12.00 DOI: 10.1037/0022-3514.91.3.493

What Do People Value When They Negotiate? Mapping the Domain of Subjective Value in Negotiation

Jared R. Curhan, Massachusetts Institute of Technology
Hillary Anger Elfenbein, University of California, Berkeley
Heng Xu, Massachusetts Institute of Technology

Four studies support the development and validation of a framework for understanding the range of social psychological outcomes valued subjectively as consequences of negotiations. Study 1 inductively elicited and coded elements of subjective value among students, community members, and practitioners, revealing 20 categories that theorists in Study 2 sorted into 4 underlying subconstructs: Feelings About the Instrumental Outcome, Feelings About the Self, Feelings About the Negotiation Process, and Feelings About the Relationship. Study 3 proposed a new Subjective Value Inventory (SVI) and confirmed its 4-factor structure. Study 4 presents convergent, discriminant, and predictive validity data for the SVI. Indeed, subjective value was a better predictor than economic outcomes of future negotiation decisions. Results suggest the SVI is a promising tool to systematize and encourage research on subjective outcomes of negotiation.

Keywords: negotiation, social psychological outcomes, satisfaction, trust, self, justice

Jared R. Curhan and Heng Xu, Sloan School of Management, Massachusetts Institute of Technology; Hillary Anger Elfenbein, Organizational Behavior and Industrial Relations, Haas School of Business, University of California, Berkeley. Preparation of this article was supported by the Mitsui Career Development Faculty Chair held by Jared R. Curhan and National Institute of Mental Health Behavioral Science Track Award for Rapid Transition 1R03MH071294-1 held by Hillary Anger Elfenbein. We are indebted to Corinne Bendersky, Joel Cutcher-Gershenfeld, Gordon Kaufman, Laura Kray, Robert McKersie, Nancy Peace, and Phyllis Segal for collecting data in their classrooms and workshops. For helpful comments, we thank Paul Berger, Joel Brockner, John Carroll, Rachel Croson, Martin Evans, Roberto Fernandez, Adam Galinsky, Richard Gonzalez, James Gross, Sheena Iyengar, Jerome Kagan, David Kenny, Thomas Kochan, Donald Lessard, Bertram Malle, Hazel Markus, Victoria Medvec, Steven Mestdagh, Michael Mitchell, Jennifer Mueller, Drazen Prelec, Michele Williams, and Michael Zyphur. For research assistance, we thank Edward Carstensen, Ken Coelho, Zachary Corker, Kate Dowd, Scott Edinburgh, Ray Faith, Marc Farrell, Pooja Gupta, Adnan Qadir, Shayna Schulz, and Philip Sun. Finally, we thank the members of the Program on Negotiation at Harvard University who generously volunteered their time. Correspondence concerning this article should be addressed to Jared R. Curhan, Sloan School of Management, Massachusetts Institute of Technology, 50 Memorial Drive, Room E52-554, Cambridge, MA 02142-1347. E-mail: [email protected]

Negotiation, a decision-making process in which people mutually decide how to allocate scarce resources (Pruitt, 1983), on its face appears to involve primarily the exchange of tangible goods and services, yet it also leaves an inherently psychological imprint on those involved. Recent research has incorporated subjective, social psychological factors into the study of negotiation, challenging the rationalist assumption that has tended to portray negotiation as an economically motivated or strategic interaction best practiced by rational, unemotional actors, perhaps as a result of the origins of the field in the study of choice and expected utility within economics (for reviews, see Bazerman, Curhan, & Moore, 2001; Carnevale & Pruitt, 1992; Thompson, 1990). This article presents the results of a large-scale investigation designed to add to this body of research by providing a comprehensive framework of subjective outcomes in negotiation. The goal is both to contribute to the advancement of theory and to provide a tool for researchers to study subjective value in negotiations with a similar level of precision as that with which more tangible objective value has been studied for decades.

Although objective behavioral outcomes clearly represent an important aspect of negotiation performance, researchers have long criticized the relative lack of attention paid to social psychological measures in negotiation. As early as 1975, Rubin and Brown argued that “the time has come to move such measures . . . out of the dark recess known as ‘supplementary analysis’ back into the forefront of researchers’ attention, where they belong” (p. 297). Since the 1960s and 1970s, there has been a gradual increase in the use of perceptual and attitudinal measures as dependent variables within studies of negotiation, but even in the recent 10-year period from 1993 to 2002, such measures were included in only 25% of studies (Mestdagh & Buelens, 2003). Other studies have incorporated social psychological factors as the predictors of economic outcomes rather than as consequential outcomes themselves (Bazerman, Curhan, Moore, & Valley, 2000).

The current article attempts to fill this gap with a series of studies mapping the domain of subjective value in negotiation, using a combination of methods to explore and categorize the range of psychological factors that people value as the consequences of their negotiations. We also present the development and initial validation of a survey tool to measure subjective value. The aim is to be as exhaustive as possible, not to supplant related areas of research but rather to organize and pull together topics that often have been studied separately (as diverse, for example, as procedural justice and self-efficacy) and to include them within a broad systematic framework of negotiation outcomes. In doing so, we define the concept of subjective value as the social, perceptual, and emotional consequences of a negotiation.

Social Psychological Outcomes in Negotiation

Previous conceptual frameworks of negotiation form a starting point for the current investigation of subjective value, which in turn empirically tests and validates these frameworks. In her 1990 review of research in negotiation, Thompson proposed that negotiation outcomes fall into two broad classes: economic and social psychological. Economic outcomes refer to explicit terms or products of the negotiation, such as whether an agreement has been reached, how much value or joint benefit has been created, and how resources are divided or claimed by the individual parties (see also Nash, 1953). Social psychological measures in negotiation, Thompson theorized, are grounded in social perception and consist of three important elements: perceptions of the bargaining situation, perceptions of the other party, and perceptions of oneself. Although Thompson’s framework includes measures of negotiation process as separate from outcome variables, we argue that a negotiator’s feeling about the process, rather than the process itself, is an aspect of subjective value.

Thompson’s (1990) first category concerns perceptions of the bargaining situation. This includes judgments and feelings about the negotiation process, for example, the norms, context, structure and scripts, communication and information sharing, and fairness or justice involved (e.g., Brockner & Wiesenfeld, 1996; Colquitt, Conlon, Wesson, Porter, & Ng, 2001; Lind & Tyler, 1988; Pinkley, 1990; Thibaut & Walker, 1975).

Perceptions of the other party, Thompson’s (1990) second category, involve person perception and impression formation applied to one’s negotiation counterpart. Such processes result in feelings that can be classified as either individual or dyadic, that is, what negotiators think of their counterparts and what they think of their own relationships with those counterparts, respectively. However, in practice the two are dynamically linked and can be difficult to separate. At the individual level, this factor includes the attributions that negotiators make about counterparts on the basis of their behavior (e.g., their ethics, tactics, and strategies) and trait inferences such as the expertise, cooperativeness, friendliness, and resulting reputation of the counterpart (e.g., Fortgang, Lax, & Sebenius, 2003; Morris, Larrick, & Su, 1999; Tinsley, O’Connor, & Sullivan, 2002). At the dyadic level, this factor includes the social relationship, trust, respect, liking, and concern for the other party that develops among negotiation counterparts (e.g., Lewicki, McAllister, & Bies, 1998; Naquin & Paulson, 2003; Pruitt & Rubin, 1986).

Thompson’s (1990) third category, perceptions of the self, involves turning the person perception process inward. Negotiators judge their own traits, performance, and worth on the basis of their interactions with others (Snyder & Higgins, 1997), using both their internal awareness of motivations and values as well as observations of their own behavior as if from the outside (Ross, 1977). Unique to perceptions of the self are issues of self-efficacy, self-enhancement and positive illusions, and self-esteem and maintaining face (e.g., Bandura, 1977; Brown, 1968; Pyszczynski, Greenberg, Solomon, Arndt, & Schimel, 2004; Taylor & Brown, 1994). What takes place in a negotiation can affect negotiators’ attributions about their own skill (Kwon & Weingart, 2004). Self-efficacy, in turn, can influence future negotiation performance (Stevens, Bavetta, & Gist, 1993). White, Tynan, Galinsky, and Thompson (2004) argued that negotiation is an especially sensitive experience for the self because it often involves confrontation and assigning public tangible worth to objects and efforts of personal value.

We expand on Thompson’s (1990) framework by highlighting separately an area included within the first category, perceptions of the bargaining situation: a negotiator’s feelings about the final terms of the settlement. At the nexus of objective and subjective value is the subjective feeling of satisfaction with one’s objective outcome. Oliver, Balakrishnan, and Barry (1994) argued that such outcome satisfaction is an affective comparative evaluation of a settlement, with implications for subsequent behavior such as willingness to continue the relationship with one’s counterpart. A negotiator perceives a settlement to be advantageous or disadvantageous via social comparison with respect to prior expectations and the outcomes achieved by other negotiators (e.g., Loewenstein, Thompson, & Bazerman, 1989; Messick & Sentis, 1985; Novemsky & Schweitzer, in press; Straub & Murnighan, 1995). At some level, subjective feelings of success are often the only feedback a negotiator has for his or her performance, given that outside of a classroom exercise one might know the exact dollar value of a deal but rarely the dollar value of the best possible deal that the other side would have accepted or, indeed, the dollar value of deals that would have been achieved by peers in an identical situation.

The Value of Subjective Value

Social psychological outcomes of negotiation are not necessarily the consolation prize of a poor bargaining agreement but rather represent an important area of study for at least three reasons. Subjective value can serve as a good in itself, as a negotiator’s intuition about objective outcomes, and as a predictor of future objective value.

A Good in Itself

In O. Henry’s classic Christmas story The Gift of the Magi, a young husband and wife facing hard times each sell their most prized possession to buy a gift that is rendered useless by the other’s parallel sacrifice. Yet the reader is left to believe that the couple gained more than it lost from the exchange, even if a rational analysis would conclude that economic value was left on the table. Likewise, negotiators sometimes forfeit or limit opportunities to extract value, either consciously or unconsciously, in deference to relational goals or norms, and doing so might preserve or even strengthen relationships (Curhan, Neale, Ross, & Rosencranz-Engelmann, 2006). Negotiations often take place in the context of ongoing interpersonal relationships among family members, friends, neighbors, colleagues, and long-time business associates. The quality of the relationship can be important beyond the particular issues at stake and the resources being divided (Gelfand, Smith, Raver, & Nishii, in press).

In the absence of a relationship, or even knowledge of a counterpart’s identity, participants in ultimatum bargaining games often make financial trade-offs to preserve their own subjective feelings about fairness to others (see, e.g., Bazerman & Neale, 1992; Camerer & Thaler, 1995; Güth, Schmittberger, & Schwarze, 1982). “Negotiators’ interests can go beyond the obvious and tangible,” Lax and Sebenius (1986) wrote, “Take for example, the almost universal quest for social approval or the simple pleasure one derives from being treated with respect, even in a one-time encounter” (p. 74). More recently, Tyler and Blader’s (2003) group engagement model placed respect in a central role, and indeed, Blount and Larrick (2000) showed that concerns for respect predicted negotiators’ preferences over and above instrumental concerns.¹ These findings add to a body of work demonstrating a shared and self-fulfilling myth regarding the value of self-interest as a motivator (Miller, 1999; Miller & Ratner, 1998; Mills, 1940). Although it can be less socially acceptable to discuss motives other than self-interest, they are no less important in driving our behavior.

Negotiator’s Intuition About Objective Outcomes

Parties often lack the information and ability to perform a full, accurate, rational analysis of negotiation situations, and consequently they can have perceptions that differ greatly from objective economic analyses (Thompson & Hastie, 1990). How do you ever know if you succeeded in a negotiation? It would be implausible, not to mention uncomfortable, for a real-world negotiation to conclude with a debriefing of parties’ aspirations, targets, and breaking points. In many cases, it would be challenging even to quantify one’s own outcomes and to aggregate across multiple issues. Thus, negotiators generally rely on subjective intuition to evaluate how well they did. If subjective value mirrors intuitions about performance, then it may be a more proximal predictor of future behavior than objective performance itself. Even if the link is not always direct or transparent, behavior is influenced by a person’s perceptions, thoughts, and attitudes rather than the objective reality of a situation (see, e.g., Eagly & Chaiken, 1998). Thus, understanding subjective value could shed light on the motivations and action tendencies of negotiators and the process of learning from experience.

Predictor of Future Objective Value

Finally, the subjective value resulting from a negotiation may feed back, positively or negatively, into future economic outcomes. Individuals who increase the subjective value of their counterparts may be able to develop and reap the benefits of more favorable reputations (Croson & Glick, 2001; Fortgang et al., 2003; Goates, Barry, & Friedman, 2003; Tinsley et al., 2002). Increasing one’s own subjective value could increase perseverance and motivation in future negotiations. At the relationship level, the interpersonal rapport developed in Negotiation A might foster concern for the other party, information sharing, and other behaviors critical to the success of Negotiation B (Drolet & Morris, 2000; Mannix, Tinsley, & Bazerman, 1995; O’Connor, Arnold, & Burris, 2005; Pruitt & Rubin, 1986). Indeed, Negotiation B is more likely even to take place if negotiators establish the foundation for a relationship in Negotiation A (Oliver et al., 1994). Furthermore, negotiators need sufficient good will to implement the objective terms of a contract and the so-called social contract for how they work together, communicate, and resolve disputes in the future (Fortgang et al., 2003; Walton, Cutcher-Gershenfeld, & McKersie, 1994). Thus, maintaining good relationships, which might be hindered by extracting all possible economic rewards, can be an effective strategy in maintaining the cooperation necessary for greater returns in the long run. For example, in the prisoner’s dilemma game, the tit-for-tat strategy prevails over other strategies in the long term, even though it does not outperform any given counterpart, because it maintains stable cooperation over longer periods than other strategies (e.g., Axelrod, 1984; Komorita & Parks, 1995).

Although subjective value may be a precursor to future objective value, it is important to emphasize that the two frequently diverge as well. This is particularly, but not exclusively, the case in the short term. The subjective satisfaction that one derives from an objective outcome is not a linear function, nor even in some cases is it monotonically increasing (Conlon, Lind, & Lissak, 1989; Kahneman & Tversky, 1979; Northcraft, Brodt, & Neale, 1995). Indeed, experimental manipulations such as increasing or attending to one’s aspirations can drive the two in opposite directions, increasing objective negotiation performance while simultaneously reducing subjective satisfaction (Galinsky, Mussweiler, & Medvec, 2002; Thompson, 1995). Thus, it is worth studying subjective value as a distinct factor in spite of the reciprocal relationship it can have with objective value.
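As a rough illustration of the tit-for-tat point above, the following Python sketch plays an iterated prisoner's dilemma between two simple strategies. The payoff values (3, 0, 5, 1), the 200-round match length, and all function names are assumptions chosen for illustration; they are not taken from Axelrod (1984) or from the present studies.

```python
# Assumed payoffs: (my move, their move) -> my payoff.
# "C" = cooperate, "D" = defect. Values are illustrative only.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def tit_for_tat(own_history, other_history):
    """Cooperate first, then copy the counterpart's previous move."""
    return "C" if not other_history else other_history[-1]


def always_defect(own_history, other_history):
    """Defect on every round regardless of history."""
    return "D"


def play_match(strategy_a, strategy_b, rounds=200):
    """Play a repeated game and return the total payoff of each side."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


if __name__ == "__main__":
    print("tit-for-tat vs. always-defect:", play_match(tit_for_tat, always_defect))
    print("tit-for-tat vs. tit-for-tat:  ", play_match(tit_for_tat, tit_for_tat))
    print("always-defect vs. itself:     ", play_match(always_defect, always_defect))
```

Under these assumed payoffs, tit-for-tat finishes slightly behind an unconditional defector in a head-to-head match (199 vs. 204), yet two tit-for-tat players earn 600 each, whereas two defectors earn only 200 each. How any strategy ranks overall depends on the mix of counterparts it meets.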

The Value of Measuring Subjective Value

Even if the umbrella term subjective value may be new, the concept itself is already woven into the fabric of negotiations research. The current investigation’s contribution is to develop a comprehensive framework and to validate a survey measure of subjective value. Negotiation theorists have not yet agreed on the methods and standards for measuring subjective outcomes (Kurtzberg & Medvec, 1999; Valley, Neale, & Mannix, 1995). Thompson (1990) argued that “comparative analyses of behavior are more difficult when investigators use different measures of performance. Apparently inconclusive results and even contradictory findings may often be traced to different measures of performance” (p. 517). Thus, the current research program has the potential to benefit the field by making findings from different lines of research easier and more meaningful to reconcile. Furthermore, creating a comprehensive, inductive framework has the potential to uncover possible blind spots within negotiations research, revealing fertile areas for future work and contributing to the generation of theory about the role of subjective value in negotiation.

This article presents the results of a four-study program of research designed to answer the question “What do people value when they negotiate?” Using a combination of inductive and deductive methods, we engaged participants from conventional student populations as well as community members and negotiation practitioners. We begin by attempting to map the domain of subjective value using an open-ended inductive approach to generate a wide range of elements of value based on participants’ past business and personal negotiations. We continue in the second study by asking negotiation experts to delineate connections among these resulting elements of subjective value, revealing an underlying cognitive map of the construct into four broad factors. The third study uses these elements and factors as a starting point to develop a survey instrument to assess subjective value as a multifaceted perception across a range of negotiation contexts. Finally, the fourth study presents initial evidence for the validity of this instrument by showing its strong convergence with related constructs and lesser correlation with unrelated constructs, its divergence from personality traits, and its ability to predict negotiators’ actual willingness to engage in future relationships with counterparts. These latter studies provide researchers with a systematic tool to include subjective value alongside objective value as a key consequence of negotiations.

¹ We thank an anonymous reviewer for pointing us to these articles.

Study 1: What Do People Value?

We begin the program of research with a broad-based empirical exploration of subjective value. Although existing theoretical frameworks and constructs within the umbrella of subjective value guide our understanding, Study 1 aims to provide an answer as exhaustive and inclusive as possible to the question of what people value in negotiation. Rather than limiting participants to preconceived categories, this study provided an open-ended opportunity for a wide range of participants to generate examples of their own valued outcomes in recent business and personal negotiation contexts.

The retrospective self-report of values can leave open whether participants may have additional values they are unable to access through introspection (e.g., Robinson & Clore, 2002; Silvia & Gendolla, 2001) or unwilling to report given social desirability and self-presentation concerns (e.g., DeMaio, 1984; Jones & Pittman, 1982; Schwarz & Strack, 1999). However, the values that negotiators report for their interactions are worthwhile in themselves, as the lay theories of goals (Miller, 1999) that form “vocabularies of motive” (Mills, 1940, p. 904). Even so, we used self-administered confidential questionnaires, for which social desirability concerns are the least pronounced (DeMaio, 1984), rather than face-to-face or telephone interviews. Furthermore, we considered a separate category for any concept mentioned by even 1 participant. In the absence of research that can effectively sample a variety of disputes in real time, the self-report questionnaire technique used in the current study remains a worthwhile tool for accessing the lay theories negotiators hold regarding their valued negotiation outcomes.

Method

Participants

To sample participants likely to represent a diversity of backgrounds, approaches, and experiences, we recruited a total of 103 students, community members, and negotiation practitioners. Undergraduate students at the Massachusetts Institute of Technology responded to campus flyers (n = 43 [18 women, 25 men], mean age = 19.23 years, SD = 0.77). Community members responded to posted advertisements in major transportation stations, squares, supermarkets, and stores in the Boston area (n = 32 [12 women, 20 men], mean age = 33.45 years, SD = 3.26). Union and management negotiation practitioners participated while attending a negotiation workshop (n = 28 [6 women, 22 men], mean age = 49.96 years, SD = 7.97). Students and community members were paid $10.

Procedure

Questionnaire. The survey was designed to generate specific examples of the criteria participants used to evaluate their subjective value from negotiations. To evoke a wide range of possible contexts, the survey began with a definition of negotiation as “any situation in which people are trying to accomplish a goal and have to communicate with at least one other person in order to achieve that goal.” Participants were instructed to recall two such incidents in which they had taken part during the past year, one in a personal setting and one in a business setting, counterbalanced in order. For each incident, the survey instructed participants to describe the situation briefly in writing and to generate subjective value factors: “Please list below what was important to you in the negotiation you just described. In other words, what are all the factors that mattered to you in this negotiation.” To encourage a thorough listing of possible factors, these instructions appeared alongside 16 blank spaces and invited participants to continue on the back side of the page if desired. Participants completed an average of 4.43 (SD = 2.00) subjective value factors for personal and 4.42 (SD = 2.16) subjective value factors for business negotiations. Finally, participants were instructed to rate the importance to them personally of each factor they had just listed, using a scale ranging from 1 (not very important) to 7 (extremely important). Typical business examples included negotiations with supervisors over salary and work schedules and experiences as consumers, whereas typical personal examples included splitting household chores, caring for relatives, and coordinating social plans.

Coding. Sixteen pilot surveys completed by students, professionals, and community members, not included in analyses below, provided sample subjective value factors used to create a coding system. The goal was to provide a list of comprehensive categories that accurately described the breadth of goals listed by participants. Four independent coders further refined this initial coding system by categorizing each subjective value factor from a random sample of 22 of the 103 questionnaires, which were also included in analyses.

Results

Table 1 lists the 20 coding categories that emerged, along with their frequency, average rated importance, and coding reliability. Interestingly, participants mentioned factors associated with their objective negotiation outcomes more frequently than any of the other factors, where objective outcomes are terms of the agreement that were either quantifiable (e.g., money or delivery time) or not readily quantifiable (e.g., high quality). Yet their importance ratings for these outcomes were in fact no higher than those for a range of subjective factors such as relationship quality, fairness, listening, remedy for wrongdoing, morality, and positive emotion. This was the case both for business negotiations (objective outcomes, M = 5.38, SD = 1.32; subjective outcomes, M = 5.31, SD = 1.59), t(47) = 0.91, ns, for the 48 participants reporting both types of outcomes, and for personal negotiations (objective outcomes, M = 5.37, SD = 1.60; subjective outcomes, M = 5.38, SD = 1.32), t(45) = 0.12, ns, for the 46 participants reporting both types of outcomes.

Table 1
Frequencies, Ratings, and Coding Reliability of Subjective Value Factors Reported in Business and Personal Negotiations

Coding category                            Business       Personal       Coding
                                           frequency (%)  frequency (%)  reliability
Nonquantifiable terms of the agreement     31.1           27.1           0.94
Quantifiable terms of the agreement        18.1           16.9           0.89
Legitimacy                                 8.2            10.2           0.94
Impact on an outside party                 7.6            3.9            0.80
Respect                                    6.1            6.7            0.83
Fairness/equity                            3.6            1.5            0.98
Good attitude                              2.9            1.5            0.92
Positive emotion                           2.5            5.0            0.94
Effective process                          2.3            3.0            0.85
Morality/ethics/religious                  2.1            0.7            0.98
Resolution                                 1.9            1.5            0.95
Relationship quality                       1.8            2.8            0.91
Trust                                      1.7            2.0            0.94
Listening                                  1.3            1.7            0.96
Satisfaction                               1.1            4.1            0.84
Acknowledgement of wrongdoing/remedy       1.1            0.2            0.98
Saving face                                0.8            0.4            1.00
Compromise/mutual agreement                0.6            3.9            0.82
Winning                                    0.6            0.7            0.88
Peaceful/nonconfrontational                N/L            0.8            0.67
Unclear or other                           4.8            5.2            0.89
Overall                                    100.0          100.0          0.87

Importance ratings (M and SD, business and personal), in source order; row alignment is not recoverable:
5.4 5.4 5.5 5.3 5.2 5.9 5.2 6.2 4.8 6.7 6.2 5.8 6.3 5.7 5.4 6.6 3.3 5.3 5.5
1.4 1.3 1.7 1.4 2.0 1.6 1.5 1.2 1.4 0.7 0.8 1.7 0.4 1.6 1.1 0.5 2.2 1.5 2.1 2.0
5.3 5.4 4.5 6.1 5.6 6.1 5.0 5.6 5.2 5.7 3.6 5.3 5.3 6.0 5.8 7.0 3.5 6.1 4.7 2.0 0.8
1.6 1.6 1.7 1.2 1.4 1.2 1.8 1.4 1.9 2.4 1.6 1.7 1.2 1.0 1.3
0.5
3.5 0.9 1.5 7.4 2.1

Note. N/L indicates that no participant in that condition listed a subjective value factor falling under the particular coding category. Listed values may not add to 100% due to rounding.

Discussion

Study 1 was an inductive examination of the components of subjective value. Participants provided an unconstrained reporting of the factors important to them in previous business and personal negotiations and then reported the level of importance of each factor. We attempted to capture a wide range of approaches and experiences with various negotiation contexts by sampling participants widely and providing them with a broad definition of negotiation. Accordingly, the 20 categories resulting from their concerns spanned from ethics to saving face to making more money. Metrics of objective performance, the typical focus of much negotiations research, were the most salient to participants in terms of frequency of reporting. Even so, 20% of the participants did not list any factors describing the objective terms of the agreement. Surprisingly, participants who did report such objective metrics generally rated them as no more important than many highly personal and subjective factors. These findings suggest that subjective outcomes in negotiation may be dramatically underrated in their real-world importance.

Whereas Study 1 explored the negotiation outcomes valued by a wide range of participants, Study 2 relied on the expertise of negotiation theoreticians, members of a distinguished research center. Negotiating frequently or assisting others with their negotiations may lead to a more clearly articulated and nuanced conception of negotiation outcomes. Indeed, Neale and Northcraft (1986) reported that practitioners generally held a more integrative and collaborative view of the process of negotiation, which suggests that they would likely hold a deep and comprehensive perspective on the topic of subjective value. Our aim in Study 2 was to tap into the wisdom of experts—“the embodiment of the best subjective beliefs and laws of life that have been sifted and selected through the experience of succeeding generations” (Seligman & Csikszentmihalyi, 2000, p. 11)—to examine the constructs and cognitive mapping that may emerge within the larger umbrella of subjective value.

Study 2: Mapping the Domain

Study 1 generated 20 different categories of subjective value but left open the question of how these various categories relate to each other. Thus, the goal of Study 2 was to examine the higher order groupings and constructs that emerge when mapping the domain of subjective value. To provide such a mapping, negotiation theorists took part in a sorting task designed to illustrate the emergent conceptual groupings among the factors. Such sorting techniques are well established for studying a variety of cognitive and perceptual phenomena where the purpose is to provide measures of similarity versus distance between concepts or ideas (Rosenberg, 1982).

Method

Participants

Participants were professional members of the Program on Negotiation at Harvard University, an “inter-university consortium committed to improving the theory and practice of negotiation and dispute resolution” (see http://www.pon.harvard.edu/). Jared R. Curhan sent a letter of invitation for a 1-hr interview to 116 Program on Negotiation members who had addresses on the mailing list. Of these, 24 (21%) agreed to participate, and the first 15 were included in the study. Their professions included university professors, ombudspersons, mediation trainers, negotiation consultants, and other negotiation-related professional roles.

Stimuli

A series of forty 4-in. × 6-in. index cards were prepared, with two exemplars for each of the 20 coding categories of subjective value that emerged in Study 1. The exemplars were first selected as archetypes among the samples of the coded items, in that the items represented frequent examples of the types of statements coded into that category. The examples were rephrased in order to apply to the widest range of negotiation settings, preserving participants’ own words where possible but eliminating context-specific details. For example, in the listening category, “that my dad was listening to what I had to say” was rephrased as “party feels counterpart is listening.” Figure 1 lists all 40 exemplars.

Procedure

Participants were told that the set of 40 index cards, appearing in a random order differing for each participant, listed factors mentioned by participants in an earlier study as important negotiation outcomes. Instructions requested participants to “sort the cards into conceptual categories that make sense to you, based on the similarity or dissimilarity of the items, making as many or as few piles as you wish.” Participants created an average of 7.13 categories (SD = 2.20).

Results

Analyses used the results of the sorting procedure to assess the conceptual distance between each pair of items among the 40 (Rosenberg, 1982) and subsequently the number of subconstructs necessary and sufficient to describe the various subjective outcomes generated in Study 1. A 40 × 40 dissimilarity matrix generated for each participant contained a 0 for pairs of cards that were sorted into the same pile and a 1 for pairs sorted into different piles. The 15 participants’ distance matrices were summed so that each cell in the matrix contained a number between 0 and 15, representing the count of participants for whom the pair of cards appeared in different piles. Such distance measures are the basis of input for the multivariate techniques of clustering and multidimensional scaling (Rosenberg, 1982).

Cluster analysis. Cluster analysis is a classification technique for forming homogeneous groups using variance minimization techniques to provide the greatest coherence within groups and the greatest distance between groups (Borgen & Barnett, 1987; Kuiper & Fisher, 1975). Using the CLUSTER procedure (with Ward’s minimum-variance method) in the SPSS statistical software package, a four-cluster solution emerged as the optimal grouping on the basis of the criteria outlined in Tunis, Fridhandler, and Horowitz (1990) of (a) providing clusters that were conceptually meaningful and interpretable and (b) stability, in that the clusters changed only minimally when the four-cluster solution was compared with the other possible solutions. Figure 1 presents the tree diagram, or dendrogram, that illustrates the extent to which items clustered together into categories. On the basis of the items falling into each category, we named them Feelings About the Instrumental Outcome (Instrumental), Feelings About the Self (Self), Feelings About the Relationship (Relationship), and Feelings About the Negotiation Process (Process).
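The construction of the summed dissimilarity matrix can be illustrated with a short Python sketch. It is not the authors' procedure (the study used SPSS); the function names and the toy data of three sorters and five cards are hypothetical, whereas the study summed 40 × 40 matrices from 15 sorters.

```python
from itertools import combinations


def sort_to_dissimilarity(piles, n_items):
    """Return an n_items x n_items matrix: 0 = same pile, 1 = different piles."""
    pile_of = {}
    for pile_id, pile in enumerate(piles):
        for item in pile:
            pile_of[item] = pile_id
    matrix = [[0] * n_items for _ in range(n_items)]
    for i, j in combinations(range(n_items), 2):
        d = 0 if pile_of[i] == pile_of[j] else 1
        matrix[i][j] = matrix[j][i] = d
    return matrix


def sum_matrices(matrices):
    """Add the per-sorter matrices cell by cell."""
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) for j in range(n)] for i in range(n)]


# Hypothetical toy data: each sorter's piles, with cards numbered 0 to 4.
sorts = [
    [[0, 1], [2, 3, 4]],
    [[0, 1, 2], [3, 4]],
    [[0], [1, 2], [3, 4]],
]
summed = sum_matrices([sort_to_dissimilarity(p, 5) for p in sorts])
for row in summed:
    print(row)
```

Each cell of `summed` counts the sorters who placed that pair of cards in different piles, so it ranges from 0 to the number of sorters. A matrix of this kind is the input that a hierarchical clustering or multidimensional scaling routine would take.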
The Relationship and Process clusters also appeared to be subclusters of a larger factor that we named Rapport.

Multidimensional scaling. Multidimensional scaling provided a converging technique to examine the robustness of the underlying categorical factor structure. Multidimensional scaling uses the proximity among objects to generate a graphical representation of the configuration of points to reflect the “hidden structure” in the data (Kruskal & Wish, 1978). Such a technique allows researchers to derive a representation of a cognitive structure without the participant necessarily being aware of or able to report the implicit dimensionality and without prompting by preconceived experimenter notions, thus making it particularly suitable for exploratory research and theory development (Pinkley, 1990; Rusbult & Zembrodt, 1982). To determine the appropriate number of dimensions in which to represent the data, we used the recommended criteria of (a) no significant increase in variance explained (R²) on addition of further dimensions, (b) an “elbow” or bend in the plot of stress values where lower numbers indicate goodness of fit (values = .404, .234, .151, .124, .103, and .083 for Dimensions 1 through 6, respectively), suggesting that the four-dimension solution did not appear to substantially reduce the stress beyond that of the three-dimension solution, and (c) yielding a parsimonious and conceptually interpretable solution (Kruskal & Wish, 1978). Balancing these three criteria provided the three-dimensional solution illustrated in Figure 2, with R² = .74. Conceptually, the multidimensional scaling solution also revealed the same four groupings that were identified in the cluster analysis, of which process and relationship also appeared to be subfactors of a larger rapport construct. These results provided converging evidence for the domains of subjective value identified by the sorting task.
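The elbow criterion can be made concrete by computing how much stress falls with each added dimension, using the six stress values reported above. The sketch below is illustrative only: the cutoff of .05 and the helper name are assumptions, and the authors balanced three criteria rather than applying a single numeric threshold.

```python
# Stress values reported in the text for 1 through 6 dimensions.
stress = [0.404, 0.234, 0.151, 0.124, 0.103, 0.083]

# Reduction in stress gained by adding one more dimension.
drops = [round(stress[k] - stress[k + 1], 3) for k in range(len(stress) - 1)]

# Hypothetical cutoff: stop adding dimensions once the gain falls below it.
MIN_USEFUL_DROP = 0.05


def elbow_dimension(drops, cutoff):
    """Return the smallest dimensionality whose next added dimension gains less than cutoff."""
    for dims, drop in enumerate(drops, start=1):
        if drop < cutoff:
            return dims
    return len(drops) + 1


chosen = elbow_dimension(drops, MIN_USEFUL_DROP)
print(drops)   # [0.17, 0.083, 0.027, 0.021, 0.02]
print(chosen)  # 3
```

Moving from three to four dimensions lowers stress by only .027, compared with .083 for the step from two to three, which is consistent with the three-dimensional solution the text describes.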

Discussion

The current study examined the conceptual groupings that emerged among the wide range of factors reported by earlier participants as important to them in their negotiations. The goal was to develop a comprehensive and inductively derived typology of subjective value. On the basis of the empirical results, negotiation theorists appear to group these outcomes into four broad factors representing a comprehensive yet parsimonious description of subjective value. One resulting factor was Feelings About the Instrumental Outcome, or the belief by a negotiator of having had a strong objective settlement, represented by elements such as “winning” a negotiation, receiving a significant amount of money, or obtaining a product of high quality. A second factor was positive Feelings About the Self, represented by elements such as saving face or doing the “right thing.” The third and fourth factors addressed issues with Feelings About the Negotiation Process and Feelings About the Relationship, respectively, under a larger concept of Rapport. Process issues included elements such as being listened to by the other party. Relationship issues included elements such as trust and not damaging parties’ relationship. Although these categories emerged inductively from the data generated by participants in Studies 1 and 2, deductively they bear strong resemblance to previous conceptual frameworks for classifying subjective outcomes in negotiation. Thompson’s (1990) outline of psychological measures of negotiation performance focused on perceptions of the negotiation situation (similar to our Process factor), perceptions of the other party (similar to our Relationship factor), and perceptions of the self (similar to our Self factor). Following Oliver et al. (1994), we further expand Thompson’s framework to emphasize the nexus of economic and perceptual outcomes, in the form of subjective beliefs and feelings about the tangible outcome of a bargaining encounter (similar to our Instrumental factor). Thus, our current empirical results support these models, using a data-driven approach that converged with results of theory-driven views.

Figure 1. Cluster analysis tree diagram (dendrogram) illustrating the conceptual distance among subjective value factors (Study 2).

Figure 2. Multidimensional scaling analysis illustrating the conceptual distances among subjective value factors (Study 2).

Study 3: The Subjective Value Inventory

Studies 1 and 2 identified and classified areas of subjective value that are relevant and important to negotiators but did not provide a means to incorporate these areas into further research. Study 3 takes the results of the first two studies as a starting point to develop a questionnaire, the Subjective Value Inventory (SVI). By generating a relatively large initial pool of questions representing the four factors of subjective value identified in Study 2, selecting items for inclusion on the basis of their psychometric properties, and confirming that the resulting questionnaire accurately portrays the four-factor model, our intention is to provide a relatively efficient yet broad tool for the inclusion of subjective value as a key outcome in future negotiations research.

Method

Questionnaire

The results of Studies 1 and 2 were used to generate a questionnaire intended to measure the degree of subjective value experienced in a negotiation. Inductively, the subjective value factors that were generated in Study 1 and subsequently examined in Study 2 formed the core basis for generating survey items. Study 1 generated 20 different coded categories of subjective value that Study 2 distilled into four factors. We drafted an initial pool of 14, 8, 20, and 20 survey items for the categories Feelings About the Instrumental Outcome, Feelings About the Self, Feelings About the Negotiation Process, and Feelings About the Relationship, respectively. These inductively used the subjective value factors and coding derived from Study 1 and deductively made use of the research literature on subjective outcomes in negotiation to guide the amount of coverage for each of the four factors. Wording attempted to make each item clear, vivid, and applicable to the widest range of possible negotiation contexts. To reduce the effects of fatigue and response sets, the 62 total questions appeared in one of six different random orders, counterbalanced across participants. Questionnaire instructions requested participants to consider a recently experienced negotiation and to describe it briefly before responding to the 62 questions with respect to that negotiation. As in Study 1, to evoke a wide range of possible contexts, the survey began with the same broad definition of negotiation.

Participants

Given the volume of research on negotiations taking place with student samples, for consistency in creating and testing the properties of a survey instrument we elected to work with student samples for this phase of the research program. Two distinct samples were recruited in order to conduct exploratory and confirmatory analyses on separate data sets. The exploratory sample consisted of 141 undergraduate and master’s-level business students at the University of California, Berkeley, who participated for course credit. The confirmatory sample consisted of 272 master’s-level business students at the University of California, Los Angeles, who completed the survey as part of a course on negotiations and conflict management. To sample participants drawing on real-life experiences as well as those responding in real time without the need to recall events from past memory, of these 272 participants, half were assigned at random to complete the survey on the basis of an in-class exercise just completed, simulating a salary negotiation (Schroth, Ney, Roedter, Rosin, & Tiedmann, 1997), and the other half on the basis of a real-life negotiation in which they had taken part outside of the class.

Results

An exploratory factor analysis was conducted to identify the four best items for each subfactor of subjective value, resulting in a more manageably sized 16-item SVI that could be used in subsequent confirmatory analyses. Because the goal was to examine item loadings as one heuristic for selecting survey items rather than for the purpose of exploring the factor structure of the SVI itself, our analytic strategy was to examine each factor of subjective value separately in a principal components analysis with varimax rotation containing only the items intended for that factor. The heuristic for item selection was to balance three criteria: (a) high loading on its intended factor, (b) content assessing unique aspects of the category (McCullough, Emmons, & Tsang, 2002), and (c) maximum interitem correlations. Table 2 contains the resulting items selected for each factor. Structural equation modeling examined the structure and coherence of the resulting 16 items, using Analysis of Moment Structure software (Byrne, 2001), substituting the sample’s mean value in the few cases where participants did not complete all 16 items (Ms = 3.0% and 0.9%, SDs = 2.8% and 1.1%, respectively, across items for the exploratory and confirmatory samples). We compared the fit of three models: (a) a one-factor model containing all 16 items, (b) a three-factor model (Instrumental, Self, and Rapport), and (c) the “three–two” model predicted on the basis of the results of Study 2, with three factors (Instrumental, Self, and Rapport) and two subfactors (Relationship and Process) within the larger factor of Rapport. Given the variation among researchers in norms regarding the optimal fit statistics to evaluate structural equation models, and the differing strengths and weaknesses of each individual index, we tested and present a wide range of absolute and relative fit indices (Browne & Cudeck, 1993; Diamantopoulos & Siguaw, 2000; Kelloway, 1998; Kenny, 2005; Kline, 2005; Mulaik et al., 1989).


Table 2
Sixteen-Item Subjective Value Inventory (SVI)

A. Feelings About the Instrumental Outcome

1. How satisfied are you with your own outcome, i.e., the extent to which the terms of your agreement (or lack of agreement) benefit you?
   Response options: 1 = Not at all, 4 = Moderately, and 7 = Perfectly; includes an option NA
   Factor loading: .88
2. How satisfied are you with the balance between your own outcome and your counterpart(s)’s outcome(s)?
   Response options: 1 = Not at all, 4 = Moderately, and 7 = Perfectly; includes an option NA
   Factor loading: .88
3. Did you feel like you forfeited or “lost” in this negotiation?
   Response options: 1 = Not at all, 4 = Moderately, and 7 = A great deal; includes an option NA (reverse scored)
   Factor loading: .78
4. Do you think the terms of your agreement are consistent with principles of legitimacy or objective criteria (e.g., common standards of fairness, precedent, industry practice, legality, etc.)?
   Response options: 1 = Not at all, 4 = Moderately, and 7 = Perfectly; includes an option NA
   Factor loading: .67

B. Feelings About the Self

5. Did you “lose face” (i.e., damage your sense of pride) in the negotiation?
   Response options: 1 = Not at all, 4 = Moderately, and 7 = A great deal; includes an option NA (reverse scored)
   Factor loading: .66
6. Did this negotiation make you feel more or less competent as a negotiator?
   Response options: 1 = It made me feel less competent, 4 = It did not make me feel more or less competent, and 7 = It made me feel more competent; includes an option NA
   Factor loading: .63
7. Did you behave according to your own principles and values?
   Response options: 1 = Not at all, 4 = Moderately, and 7 = Perfectly; includes an option NA
   Factor loading: .61
8. Did this negotiation positively or negatively impact your self-image or your impression of yourself?
   Response options: 1 = It negatively impacted my self-image, 4 = It did not positively or negatively impact my self-image, and 7 = It positively impacted my self-image; includes an option NA
   Factor loading: .73

C. Feelings About the Process

9. Do you feel your counterpart(s) listened to your concerns?
   Response options: 1 = Not at all, 4 = Moderately, and 7 = Perfectly; includes an option NA
   Factor loading: .83
10. Would you characterize the negotiation process as fair?
   Response options: 1 = Not at all, 4 = Moderately, and 7 = Perfectly; includes an option NA
   Factor loading: .74
11. How satisfied are you with the ease (or difficulty) of reaching an agreement?
   Response options: 1 = Not at all satisfied, 4 = Moderately satisfied, and 7 = Perfectly satisfied; includes an option NA
   Factor loading: .71
12. Did your counterpart(s) consider your wishes, opinions, or needs?
   Response options: 1 = Not at all, 4 = Moderately, and 7 = Perfectly; includes an option NA
   Factor loading: .84

D. Feelings About the Relationship

13. What kind of “overall” impression did your counterpart(s) make on you?
   Response options: 1 = Extremely negative, 4 = Neither negative nor positive, and 7 = Extremely positive; includes an option NA
   Factor loading: .85
14. How satisfied are you with your relationship with your counterpart(s) as a result of this negotiation?
   Response options: 1 = Not at all, 4 = Moderately, and 7 = Perfectly; includes an option NA
   Factor loading: .79
15. Did the negotiation make you trust your counterpart(s)?
   Response options: 1 = Not at all, 4 = Moderately, and 7 = Perfectly; includes an option NA
   Factor loading: .79
16. Did the negotiation build a good foundation for a future relationship with your counterpart(s)?
   Response options: 1 = Not at all, 4 = Moderately, and 7 = Perfectly; includes an option NA
   Factor loading: .79

Note. Copyright 2006 by J. R. Curhan and H. A. Elfenbein. Permission to use the Subjective Value Inventory is granted free of charge for noncommercial purposes only. See www.subjectivevalue.com for additional information or for permission to reproduce the Subjective Value Inventory.
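As an illustration of how responses to the 16 items in Table 2 might be turned into subscale scores, the Python sketch below reverse scores Items 3 and 5 and averages within each factor. The table specifies which items are reverse scored and which factor each belongs to; everything else here is an assumption made for illustration, including the use of the mean as the subscale score, the treatment of NA responses as skipped, and all names in the code.

```python
# Item groupings and reverse-scored items follow Table 2.
SUBSCALES = {
    "Instrumental": [1, 2, 3, 4],
    "Self": [5, 6, 7, 8],
    "Process": [9, 10, 11, 12],
    "Relationship": [13, 14, 15, 16],
}
REVERSE_SCORED = {3, 5}


def score_svi(responses):
    """Score one respondent.

    responses maps item number -> rating from 1 to 7, or None for NA.
    Assumption: a subscale score is the mean of its answered items.
    """
    scores = {}
    for name, items in SUBSCALES.items():
        values = []
        for item in items:
            raw = responses.get(item)
            if raw is None:
                continue  # NA responses are skipped (assumption)
            # On a 1-7 scale, reverse scoring maps x to 8 - x.
            values.append(8 - raw if item in REVERSE_SCORED else raw)
        scores[name] = sum(values) / len(values) if values else None
    return scores


# Hypothetical respondent; Item 9 was answered NA.
example = {1: 6, 2: 5, 3: 2, 4: 7, 5: 1, 6: 4, 7: 7, 8: 4,
           9: None, 10: 6, 11: 5, 12: 4, 13: 7, 14: 6, 15: 5, 16: 6}
print(score_svi(example))
```

For this hypothetical respondent the sketch yields 6.0 for Instrumental, 5.5 for Self, 5.0 for Process, and 6.0 for Relationship.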

Absolute indices include chi-square and chi-square/degrees of freedom, root-mean square error of approximation, and the standardized root-mean square residual. Chi-square indicates the extent to which the proposed model differs from the fit for the actual data, with a ratio of chi-square/degrees of freedom of 1 indicating perfect fit and of below 3 indicating reasonable fit (Kline, 2005). Root-mean square error of approximation is a parsimony-adjusted index that includes a correction for model complexity, which is more favorable for models with large numbers of variables but few coefficients to estimate. Values of .06 and lower represent close model fit, and a value of .08 suggests reasonable approximation (Browne & Cudeck, 1993; Kline, 2005). Standardized root-mean square residual measures the overall discrepancy between the observed and predicted correlations, with values less than .08–.10 generally considered favorable (Kline, 2005). Relative fit indices include the incremental fit index and comparative fit index. For the incremental fit index, values of .90 and higher are considered good, although lower values are expected for models with fewer parameters (Kenny, 2005). By contrast, the comparative fit index generally shows better fit for models with fewer variables (Kenny & McCoach, 2003). Finally, we present model comparison indices: the Akaike information criterion and the test for the difference in chi-square. The Akaike information criterion indicates the degree of parsimony when comparing models using the same data set, with smaller numbers indicating a better model (Kenny, 2005). The test for the difference in chi-square indicates whether a nested model is a significantly better fit to the data.

Table 3 lists each of these indices for each model for both samples. The single-factor model is a relatively poor fit compared with the three-factor model. The three–two factor model is a significantly better fit than the three-factor model in the exploratory sample and a marginally better fit in the confirmatory sample. Figure 3 illustrates this factor structure for the SVI. Table 4 lists the resulting reliability and correlations among the four factors. The Self factor appears to have the least internal cohesion among items (suggesting, perhaps, a more multifaceted nature) and the lowest level of association with other scale factors.

Figure 3. Factor structure of the Subjective Value Inventory (Study 3).

Discussion

The goal of the current study was to create a general-use questionnaire instrument to measure subjective value in negotiations. We used the psychometric properties of individual questions to select test items and found general support that the resulting survey follows the four-factor structure for subjective value that was derived in Study 2. The 16-item SVI appears to meet this goal. There are two clearly separate factors of Feelings About the Instrumental Outcome and Feelings About the Self. In addition, as in the second study, the two factors Feelings About the Negotiation Process and Feelings About the Relationship appear to be subfactors of a larger construct of Rapport. However, it is worth noting that in one of the two samples the distinction between these two subfactors reached only marginal significance. Nevertheless, the general convergence between these results with students and those with negotiation theorists in Study 2 suggests that both populations appear to use similar implicit categorizations of subjective value. This provides greater confidence in the generalizability of classifications within the SVI instrument. For theoretical reasons, we elect to retain the two Rapport subfactors as separate constructs rather than to combine them together into a single survey factor. Although the present research derived these subfactors from the bottom up, we note, iterating from the top down, that each corresponds closely to an existing concept in the research literature. Whereas negotiation process is concerned largely with “cold cognition” issues such as productive discourse, techniques for reaching appropriate settlements, and other related areas, relational concerns draw more emphasis on “hot” interpersonal and affective processes (Thompson, Medvec, Seiden, & Kopelman, 2001; Thompson, Nadler, & Kim, 1999).

Study 4: Initial Validation of the SVI

The fourth study aims to validate the new SVI as a tool for researchers interested in measuring the outcomes of negotiations, with acceptable psychometric properties and convergent, divergent, and predictive validity.

Table 3
Structural Equation Models of the Subjective Value Inventory

                     Absolute fit                                 Comparative fit    Model comparison
Model                χ²        df     χ²/df    RMSEA    SRMR      CFI      IFI       Δχ²         AIC

Exploratory sample (n = 141)
One-factor           379.927   104    3.653    .138     .983      .772     .775      –           443.93
Three-factor         196.479   101    1.945    .082     .071      .921     .922      183.45**    266.48
Three-two factor     181.312   99     1.831    .077     .070      .932     .933      15.17**     255.31

Confirmatory sample (n = 272)
One-factor           395.562   104    3.903    .102     .067      .864     .865      –           495.56
Three-factor         287.613   101    2.848    .083     .057      .913     .914      107.95**    357.61
Three-two factor     283.046   99     2.859    .083     .057      .914     .915      4.57†       357.05

Note. The one-factor model contains all 16 items; the three-factor model contains items grouped into the factors Perceived Instrumental Outcome, Self, and Rapport; and the three-two factor model groups items into three factors (Perceived Instrumental Outcome, Self, and Rapport) with two subfactors (Relationship and Process) contained within larger factor of Rapport. RMSEA = root-mean-square error of approximation; SRMR = standardized root-mean-square residual; CFI = comparative fit index; IFI = incremental fit index; AIC = Akaike information criterion. † p ≤ .10. ** p ≤ .01. (All values two-tailed)
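Several of the indices in Table 3 follow directly from each model's chi-square, degrees of freedom, and sample size. The Python sketch below recomputes χ²/df, RMSEA, and Δχ² for the exploratory sample. The RMSEA expression used, sqrt(max(χ² − df, 0) / (df (N − 1))), is a common formulation; that the authors' software used exactly this form is an assumption, and the function and variable names are illustrative.

```python
import math


def rmsea(chi2, df, n):
    """Root-mean-square error of approximation (common N - 1 formulation)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


N_EXPLORATORY = 141
# (model name, chi-square, degrees of freedom) from Table 3, exploratory sample.
models = [
    ("One-factor", 379.927, 104),
    ("Three-factor", 196.479, 101),
    ("Three-two factor", 181.312, 99),
]

summary = {}
previous_chi2 = None
for name, chi2, df in models:
    summary[name] = {
        "chi2_per_df": round(chi2 / df, 3),
        "rmsea": round(rmsea(chi2, df, N_EXPLORATORY), 3),
        # Difference from the preceding, more constrained model.
        "delta_chi2": None if previous_chi2 is None else round(previous_chi2 - chi2, 2),
    }
    previous_chi2 = chi2

for name, values in summary.items():
    print(name, values)
```

Under these assumptions the sketch gives χ²/df of 3.653, 1.945, and 1.831, RMSEA of .138, .082, and .077, and Δχ² of 183.45 and 15.17, the same as the exploratory rows of the table.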


Table 4
Reliability and Correlations Among the Four Factors of the Subjective Value Inventory

Factor             1        2        3        4        5
1. Global          (.91)
2. Instrumental    .88**    (.86)
3. Self            .73**    .59**    (.70)
4. Process         .90**    .71**    .52**    (.85)
5. Relationship    .90**    .71**    .50**    .83**    (.88)

Note. Reliabilities appear in parentheses on the diagonal. ** p ≤ .01 (two-tailed).
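The table does not state which reliability coefficient appears on the diagonal. Assuming an internal-consistency coefficient such as Cronbach's alpha, the computation can be sketched in Python as follows. The data are hypothetical and the function name is illustrative; this is not a reconstruction of the authors' analysis.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: four items that rise and fall together across respondents.
consistent = [
    [1, 2, 1, 2],
    [2, 3, 2, 3],
    [3, 4, 3, 4],
    [4, 5, 4, 5],
    [5, 6, 5, 6],
]
# Hypothetical data: two items that agree only partly.
mixed = [[1, 1], [2, 3], [3, 2]]

print(cronbach_alpha(consistent))
print(cronbach_alpha(mixed))
```

Perfectly consistent items give an alpha of 1, and items that agree only partly give a lower value (2/3 for the second data set).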

Convergent Validity

Relevant factors within the SVI should correlate positively with the tools researchers have previously used to examine related areas under the umbrella of subjective value. We assessed the specific constructs of trust, satisfaction, and justice in a mixed-motive negotiation with multiple issues and integrative potential, in which negotiators could vary meaningfully in performance as well as issues of justice, relationship building, and satisfaction. Trust has been defined as “an individual’s belief and willingness to act on the basis of the words, actions, and decisions of another” (McAllister, 1995, p. 25). Trust is a critical element of negotiators’ development of an effective working relationship (e.g., Lewicki & Stevenson, 1997). Thus, Hypothesis 1 is that trust in a negotiation counterpart converges with the Relationship subscale of the SVI. Likewise, developing an effective working relationship implies greater desire to work again together in the future, which is Hypothesis 2. Satisfaction with a negotiation is a critical element of subjective value. Oliver et al.’s (1994) subjective disconfirmation framework argues for a “‘better-than/worse-than’ heuristic” (p. 256) in which the match of settlements with negotiators’ prior expectations, known as subjective disconfirmation, drives satisfaction with an outcome. Hypothesis 3 is that outcome satisfaction and subjective disconfirmation converge with the SVI’s Instrumental factor. Justice has been the focus of an extensive research literature within negotiations and organizational behavior more widely. Colquitt (2001) found evidence for four distinct dimensions of organizational justice. Procedural justice refers to fairness in the decision-making processes that lead to decision outcomes, and thus Hypothesis 4 is that procedural justice converges with the Process factor of the SVI. Distributive justice refers to fairness in the allocation of outcomes or resources, and thus Hypothesis 5 is that distributive justice converges with the Instrumental factor of the SVI. Interpersonal justice refers to fairness in people being treated with respect and sensitivity, and thus Hypothesis 6 is that interpersonal justice converges with the Relationship factor of the SVI. The final factor of justice, informational justice, refers to justice in being provided with appropriate communication about the procedures of decision making, and thus Hypothesis 7 is that informational justice converges with the Process factor of the SVI. Finally, given that negotiators should have at least some intuition about their performance, we predict that the actual objective outcome of a negotiation will correlate with negotiators’ feelings about the objective outcome, which is Hypothesis 8.

Divergent Validity

The tools that researchers have previously used to capture specific constructs within subjective value should have lesser correlations with those factors of subjective value that are less directly relevant on the basis of theory. Thus, Hypothesis 9 is that the magnitude of correlations among the four factor scores on the SVI and the measures of trust, satisfaction, justice, and objective outcome should be largest for the specific predictions of Hypotheses 1–8 and that the other correlations, not specified in advance by theory, should be of lesser magnitude. Furthermore, divergent validity of the SVI would suggest that the instrument should be largely uncorrelated with personality traits, which is Hypothesis 10. Traits are conceptualized as stable differences at the individual level (John, Donahue, & Kentle, 1991; McCrae & John, 1992). By contrast, the SVI addresses a relational construct regarding the outcomes of an interpersonal interaction. It seems plausible that, over time and in dynamic, reciprocal, and self-selected situations, an association could develop in which personality traits could guide the types of situations and quality of interpersonal interactions that one chronically experiences in negotiations (Magnus, Diener, Fujita, & Payot, 1993). However, the current research setting is a one-time negotiation with a randomly assigned partner, in which the setting is explicitly delineated and fixed across participants. Thus, in this study, in the absence of supportive theory, strong relationships between personality traits and the SVI would be vulnerable to critique regarding common method bias (Podsakoff, MacKenzie, Lee, & Podsakoff, 2003), in which individuals may perhaps report subjective value differently based on stable temperamental traits. To sample a range of traits, we test the Big Five personality factors (McCrae & John, 1992) as well as a trait often linked with research on personality in negotiation, Machiavellianism (Christie & Geis, 1970).

Predictive Validity

Responses to the SVI at the time of a negotiation should predict important, face-valid criteria at a later point in time. Drawing on Thompson’s (1990) argument that social psychological measures of negotiation are grounded in social perception (Allport, 1955), we examine future perceptions of negotiation counterparts in a context where those perceptions have real consequences. Oliver et al. (1994) argued that the willingness to negotiate again with one’s counterpart is a key consequence of subjective outcomes. Drawing from the research literature on job satisfaction (e.g., Schneider, 1985), Oliver et al. noted that such satisfaction predicts greater retention and stated intention to retain current working relationships. Thus, Hypothesis 11 is that greater subjective value following a negotiation predicts greater subsequent willingness to engage in cooperative interactions with the same counterpart. For a first test, we used a real behavioral measure. As part of participants’ introductory course on negotiations, in which bargaining outcomes were the sole determinant of students’ grades, we told participants that their recorded preferences would determine the assignment of a future teammate in a final team-against-team negotiation exercise. Our second test used semibehavioral intentions in the form of participants’ opinions of their counterpart’s worthiness for future professional contact. To enhance realism, we used questions designed to sample from the type of networking activities common to the alumni of highly rated MBA programs. Thus, the current study aimed to document the potential value of subjective value.

Method

Participants

As part of a half-semester intensive course on negotiations and conflict management at Massachusetts Institute of Technology, 104 master’s-level business students participated in this study (77 men, 27 women).

Procedure

Personality instruments. At the beginning of the semester, the students completed self-report personality questionnaires. The Big Five Personality Inventory (15-item measure; Langford, 2003) assessed Conscientiousness, Extraversion, Neuroticism, Agreeableness, and Openness with three questions per factor (reliabilities = .69, .88, .84, .51, and .64, respectively). Christie and Geis’s (1970) scale assessed Machiavellianism (reliability = .76).

Mixed-motive negotiation exercise. Students negotiated with a randomly paired partner in a scored mixed-motive negotiation simulation called Riggs-Vericomp, in which they attempted to reach a deal for the transfer of recycling equipment (Wheeler, 2000). The exercise included distributive issues, in which gain to one partner was at the other’s equal expense; compatible issues, in which both parties received the same number of points for a given option and thus were best served by the same option (Thompson & Hrebec, 1996); and integrative issues, for which participants could logroll in order to increase the total points score available to both parties (Froman & Cohen, 1970; Pruitt, 1983). Following the exercise, participants recorded the details of their agreement to provide information from which to compute the number of points earned by each party. To make values comparable across the two different roles, points were converted to standardized Z scores using a comparison group of the other participants sharing the same role. These Z scores served as the instrumental outcome, referred to as objective value in the analyses below.

Postnegotiation questionnaires. Participants completed a series of postnegotiation questionnaires. The 16-item SVI was developed in Study 3. Instructions for the SVI appear in the Appendix. Colquitt’s (2001) justice scales assessed procedural justice, distributive justice, interpersonal justice, and informational justice (reliabilities = .87, .93, .91, and .90, respectively). Items from Lewicki, Saunders, Minton, and Barry (2002) assessed the trust between parties (reliability = .91).² Participants recorded their settlement satisfaction, willingness to negotiate again with same partner, and subjective disconfirmation using single-item measures from Oliver et al. (1994). Students completed these surveys before the classroom discussion in which they learned how their outcomes compared with others in their role, given that real-world negotiators evaluate their performance and experience in the absence of specific comparative information.³

Behavioral measures. Just before the end of the course, students completed two measures that served as behavioral and semibehavioral assessments of their negotiation counterparts from the mixed-motive exercise. First, students completed a teammate preference ranking of all three previous in-class exercise negotiation counterparts. This provided the instructor with preferences for actual use to determine the student’s teammate in a team-on-team exercise, the results of which contributed to their course grade. Thus, participants voted “with their feet” to indicate interest in working with their counterpart in a future cooperative venture, to negotiate together against another student team. At the same time, they also made behavioral intention ratings of each of their previous counterparts, recording their opinion of the counterpart’s worthiness for further professional contact using questions designed to represent networking activities typical among the alumni of top business schools:

1. Would you want to have this person as your business partner?

2. If you were considering whether or not to join a firm, and you found out that this person worked there, would that make you more or less likely to join?

3. If a friend asked your advice about whether to engage in a business transaction with this person, would you recommend doing so?

4. Years from now, if you ran into this person at a professional meeting, would you be likely to approach him or her?

5. How likely is it that you will seek to remain in contact with this person?

Responses were made on a scale ranging from 1 to 7 (α = .91).
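The role-wise standardization of points described in the Procedure can be sketched as follows. This is a minimal illustration rather than the authors’ code: the role labels and point totals are invented, the function name is hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, because the text does not say which denominator was used.

```python
from statistics import mean, stdev


def standardize_within_role(points_by_role):
    """Convert raw points to z scores, using only negotiators
    who played the same role as the comparison group."""
    z_by_role = {}
    for role, points in points_by_role.items():
        m = mean(points)
        s = stdev(points)  # sample SD; an assumption, not stated in the text
        z_by_role[role] = [(p - m) / s for p in points]
    return z_by_role


# Hypothetical point totals for two roles (not data from the study)
raw = {
    "role_a": [4200, 5100, 3900, 4800],
    "role_b": [2500, 3100, 2800, 2200],
}
z = standardize_within_role(raw)
```

After this step a score of 0 means "average for one's own role", so negotiators in roles with different point scales can be placed on a common metric. The sketch requires at least two negotiators per role and some variation in their points.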

Results

Convergent and Divergent Validity

Table 5 shows the relationship between the SVI and the exercise results in terms of objective value (i.e., the instrumental outcome) and postnegotiation questionnaires. Because participants took part in the exercise in pairs, their individual data are nested within the dyad. Thus, Table 5 lists individual-level partial correlations with significance tests that correct for interdependence within dyads (Gonzalez & Griffin, 1999). We used Meng, Rosenthal, and Rubin’s (1992) formulas for comparing correlated coefficients to test differences between these partial correlations.⁴ Relationships between the four factors of the SVI and additional postnegotiation questionnaires also suggest strong convergent and acceptable divergent validity. As predicted by Hypotheses 1 and 2, respectively, trust and willingness to negotiate again with same partner correlated most strongly with the Relationship factor of the SVI, although there was overlap with the Process factor that also comprises Rapport. In support of Hypothesis 3, both subjective disconfirmation and outcome satisfaction were most strongly related to the Instrumental factor. Hypothesis 4 was not supported, as procedural justice was not significantly more strongly related to the Process factor than to the rest of the SVI. However, as predicted by Hypotheses 5 and 6, respectively, distributive justice was most strongly related to the Instrumental factor and interpersonal justice to the Relationship factor. Addressing Hypothesis 7, informational justice was more closely related to the Process factor, although there was also overlap with the Relationship factor. Finally, addressing Hypothesis 8, objective value correlated significantly with Feelings About the Instrumental Outcome, suggesting that participants had a sense of their performance, albeit an imperfect one, but did not correlate with the Self, Process, or Relationship factors. This indicates that the SVI does not merely tap common method bias relating to global satisfaction anchored in perceived negotiation performance. In support of Hypothesis 9, the above correlations were nearly always significantly greater in magnitude for the theoretically related factor of the SVI than for the factors of the SVI not specifically predicted to converge. Taken together, these patterns suggest that the particular factors of the SVI, although correlated with each other, appear to have nonoverlapping variance that addresses distinct constructs previously represented in the research literature on negotiations.

As further evidence for the divergent validity of the SVI, in support of Hypothesis 10, Table 6 presents partial correlations between the SVI and personality traits. Because these traits are individual differences and the SVI addresses a relational construct regarding the outcomes of an interpersonal interaction with a randomly assigned partner, the small correlations in Table 6 are noteworthy and suggest that the SVI does not merely tap common method bias relating to global factors such as agreeableness or scale usage tendencies (Podsakoff et al., 2003).

To demonstrate that feelings about negotiation performance encompass more than strictly quantifiable outcomes, we conducted an additional analysis using a multilevel linear regression model with Kashy and Kenny’s (2000) actor–partner interaction model to account for interdependence among negotiators within a dyad.⁵ Feelings about the instrumental outcome appear to be a function of not only the actual instrumental outcome (β = .19, p < .01) but also of two of the other three factors of the SVI (Self β = .30, p < .01; Process β = .47, p < .01; Relationship β = .09, ns).

² In a separate pilot study, the Lewicki et al. (2002) Negotiation Trust scale also correlated highly with the subset of questions within the Organizational Trust Inventory (R. C. Mayer & Davis, 1999) that are applicable to dyadic negotiations (14 items; reliability = .93, r = .87).
³ We thank an anonymous reviewer for this point.
⁴ We thank Richard Gonzalez (personal correspondence, September 22, 2005) for his advice concerning the validation of this method.

Table 5
Partial Correlations Between the Subjective Value Inventory (SVI), Objective Value, and Postnegotiation Scales Completed for a Mixed-Motive Negotiation Exercise

                                Feelings about the     Rapport
Measure                         Instrumental  Self     Process  Relationship  Overall  Total SVI  Discriminant
                                outcome                                                           validity (Z)
Objective value                 .26a          .02b     .17a     .07b          .12      .16        2.25*
Trust                           .42b          .31b     .59a     .58a          .61      .56        2.00*
Subjective disconfirmation      .73a          .50b     .57b     .46b          .54      .66        3.98**
Outcome satisfaction            .80a          .60b     .63b     .53b          .61      .75        4.42**
Willingness to negotiate again  .54b          .41b     .69a     .74a          .75      .71        3.34**
Justice                         .63           .48      .72      .71           .75      .75
  Procedural                    .54a          .50a     .63a     .65a          .67      .68        1.06
  Distributive                  .65a          .38b     .50b     .41b          .48      .57        3.57**
  Interpersonal                 .39b          .34b     .52b     .62a          .60      .55        3.03**
  Informational                 .48b          .34b     .67a     .63a          .68      .63        2.97**

Note. Values in bold indicate predicted convergent scales. Coefficients in the table are individual-level partial correlations with significance tests that correct for interdependence within dyads (Gonzalez & Griffin, 1999). All partial correlations are significant at the .05 level unless italicized. Values for the four factors that do not share a subscript differ from each other at the .05 level, using Meng, Rosenthal, and Rubin’s (1992) formulas for comparing overlapping correlations. Discriminant validity refers to the contrast test that compares the value for the predicted convergent scale with that of all other scales, using Meng et al.’s method. N = 106 individuals in 53 dyads. * p ≤ .05. ** p ≤ .01 (all two-tailed).
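To illustrate how two correlations that share a variable can be compared, the sketch below implements the pairwise Z test from Meng, Rosenthal, and Rubin (1992). It is not the authors’ analysis: it ignores the partialling and the dyad-level corrections described for Table 5, the function name `meng_z` is an assumption, and the intercorrelation `rx` between the two SVI factors is a hypothetical value because this section does not report it.

```python
import math


def meng_z(r1, r2, rx, n):
    """Z statistic for H0: corr(Y, X1) == corr(Y, X2), where X1 and X2
    are measured on the same n cases and correlate rx with each other
    (Meng, Rosenthal, & Rubin, 1992)."""
    if not (abs(r1) < 1 and abs(r2) < 1 and abs(rx) < 1 and n > 3):
        raise ValueError("correlations must lie in (-1, 1) and n must exceed 3")
    z1, z2 = math.atanh(r1), math.atanh(r2)        # Fisher r-to-z transforms
    r2_bar = (r1 ** 2 + r2 ** 2) / 2               # mean squared correlation
    f = min((1 - rx) / (2 * (1 - r2_bar)), 1.0)    # f is capped at 1
    h = (1 - f * r2_bar) / (1 - r2_bar)
    return (z1 - z2) * math.sqrt((n - 3) / (2 * (1 - rx) * h))


# Outcome satisfaction with the Instrumental (.80) versus Self (.60) factor,
# taken from Table 5; rx = .50 is a hypothetical inter-factor correlation.
z_stat = meng_z(r1=0.80, r2=0.60, rx=0.50, n=106)
print(round(z_stat, 2))
```

The statistic is referred to the standard normal distribution. Because the two coefficients share a sample, a larger `rx` makes the same difference between `r1` and `r2` more likely to reach significance.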

⁵ We thank an anonymous reviewer for suggesting these additional analyses.

Table 6
Partial Correlations Illustrating Divergent Validity Between Personality Traits and the Subjective Value Inventory (SVI) Completed for a Mixed-Motive Negotiation Exercise

                     Feelings about the     Rapport
Personality trait    Instrumental  Self     Process  Relationship  Overall  Total SVI
                     outcome
Machiavellianism     −.05          −.13     −.04     −.12          −.09     −.10
Openness             .08           .14      .15      .22*          .20†     .17
Conscientiousness    .19           .16      .06      .03           .05      .13
Extraversion         .07           −.16     −.05     .02           −.01     −.02
Agreeableness        −.05          .09      .02      .09           .06      .04
Neuroticism          .06           .09      .11      .18†          .15      .13

Note. Coefficients in the table are individual-level partial correlations with significance tests that correct for interdependence within dyads (Gonzalez & Griffin, 1999). N = 106 individuals in 53 dyads. † p ≤ .10. * p ≤ .05 (all two-tailed).

Predictive Validity

The behavioral measures indicated actual and intended expressions of interest in working together again with negotiation counterparts. Table 7 summarizes the results of linear regression models using Kashy and Kenny’s (2000) actor–partner interaction model to account for dyadic interdependence. These models predict actual and intended relationship continuation on the basis of the subjective and objective outcomes from the participant and counterpart. Providing support for Hypothesis 11, participants reporting higher subjective value gave their counterparts significantly higher teammate preference rankings for a future cooperative task. By contrast, participants’ actual objective outcome of the negotiation had no such impact on teammate preference rankings. For ratings of behavioral intentions, similarly, participants reporting greater subjective value expressed greater intentions to maintain a positive professional connection with their counterpart. In addition, we conducted similar actor–partner interaction model regressions of partner rankings and behavioral intentions on each of the four SVI subscales separately and found the same pattern of results in predicting teammate preference rankings (β = .32, p < .01; β = .27, p < .05; β = .51, p < .01; and β = .48, p < .01, for the Instrumental, Self, Process, and Relationship factors, respectively) and behavioral intentions (β = .22, p < .05; β = .29, p < .01; β = .61, p < .01; and β = .69, p < .01, for the Instrumental, Self, Process, and Relationship factors, respectively). Objective outcomes did not show an association in any of these analyses (all βs < .17, ns). Finally, to demonstrate that the SVI has predictive validity above and beyond the justice scale with which it is highly correlated, both factors were entered together in a multilevel regression of behavioral intentions and both were significant positive predictors (βs = .39 and .29, ps < .01, for the SVI and justice, respectively).⁶
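The actor–partner models reported in this section need data arranged so that each person’s outcome sits alongside both their own and their counterpart’s predictors. The sketch below shows only that restructuring step, under stated assumptions: the field names (`sv` for subjective value, `ov` for objective value, `rating` for the later behavioral measure) and all numbers are hypothetical, and fitting the multilevel model itself (Kashy & Kenny, 2000) is not shown.

```python
def pairwise_rows(dyads):
    """Build one row per person, pairing each person's outcome with
    their own (actor) and their counterpart's (partner) predictors."""
    rows = []
    for dyad_id, (first, second) in enumerate(dyads):
        # Each member of the dyad takes a turn as the actor.
        for actor, partner in ((first, second), (second, first)):
            rows.append({
                "dyad": dyad_id,              # grouping key for the multilevel model
                "rating": actor["rating"],    # outcome reported by the actor
                "actor_sv": actor["sv"],
                "partner_sv": partner["sv"],
                "actor_ov": actor["ov"],
                "partner_ov": partner["ov"],
            })
    return rows


# Two hypothetical dyads (invented values, not data from the study)
dyads = [
    ({"sv": 5.8, "ov": 0.4, "rating": 6.0}, {"sv": 4.1, "ov": -0.4, "rating": 3.5}),
    ({"sv": 3.2, "ov": 1.1, "rating": 2.5}, {"sv": 6.0, "ov": -1.1, "rating": 6.5}),
]
rows = pairwise_rows(dyads)
```

In this layout the coefficient on `actor_sv` corresponds to the "Participant’s subjective value" term and the coefficient on `partner_sv` to the "Counterpart’s subjective value" term, while the `dyad` column lets the model account for the nonindependence of the two rows from each pair.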

Discussion

Study 4 provides preliminary evidence demonstrating that the new SVI is a worthwhile and valid tool to assess the subjective element of negotiations. The SVI’s four factors (Feelings About the Instrumental Outcome, Feelings About the Self, Feelings About the Negotiation Process, and Feelings About the Relationship) appear to converge as predicted with theoretically relevant constructs examined in prior negotiations research (e.g., Colquitt, 2001; Lewicki & Stevenson, 1997; Oliver et al., 1994). The inherently relational and situational SVI also diverges from stable individual difference measures such as Machiavellianism (Christie & Geis, 1970) and the Big Five personality traits (Langford, 2003; McCrae & John, 1992). Particularly noteworthy were the predictive validity findings demonstrating that greater subjective value following a negotiation predicts greater subsequent willingness to engage in cooperative interactions with the same negotiation counterpart. Participants responding with higher values on the SVI were more likely to choose their counterpart as a partner with whom to work together against another team when part of their actual course grade was at stake. In fact, subjective value was a better predictor of inclination toward such future interaction than was instrumental value. This finding speaks to the great value of subjective value, an element often overlooked in negotiations research that focuses strictly on bargaining agreements. The finding also speaks to the enduring nature of subjective value over time, apparently more enduring than objective outcomes. Participants completed the SVI shortly after the negotiation yet recorded their teammate preferences weeks later. Finally, this finding speaks to the validity of the SVI as a survey instrument, both in terms of participants’ ability to introspect about subjective value and their willingness to report these feelings, in that the SVI strongly predicted a later rating that had real consequences for participants.

Table 7
Subsequent Behavioral Measures of Participant’s Desire for Future Cooperation as Predicted by Subjective and Objective Value From a Mixed-Motive Negotiation Exercise

Predictor               Model 1: teammate     Model 2: behavioral
                        preference ranking    intention rating
Participant’s
  Subjective value      .48**                 .54**
  Objective value       −.05                  −.11
Counterpart’s
  Subjective value      −.04                  .03
  Objective value       .08                   .07
Model diagnostics
  Pseudo R²             .23                   .38
  −2 log likelihood     228.0                 208.2

Note. All terms except model diagnostics are individual-level standardized regression coefficients (betas) with significance tests that control for interdependence within dyads (Kashy & Kenny, 2000). Complete data available for 92 individuals in 46 dyads. ** p ≤ .01 (two-tailed).

⁶ We thank an anonymous reviewer for suggesting this analysis.

General Discussion

This research contributes to a comprehensive framework of social psychological outcomes in negotiation. Using a combination of inductive and deductive methods and involving participants ranging from students and community members to negotiation practitioners, we attempted to answer the question “What do people value when they negotiate?” Whereas the study of subjective value is not itself new to the field of negotiation, this is the first attempt to connect this range and breadth of concepts, to probe inductively for possible blind spots, and to provide future researchers with a valid and efficient tool to standardize the measure of noninstrumental consequences of negotiation.

The four-factor model of subjective value that emerged included (a) feelings about instrumental outcomes (e.g., outcome satisfaction and distributional fairness), (b) feelings about the self (e.g., saving face and living up to one’s own standards), (c) feelings about the negotiation process (e.g., fairness and voice), and (d) feelings about the relationship (e.g., good impressions and a solid foundation for the future). The relationship and process factors also appeared to be subfactors of a larger construct of rapport. This model also served to empirically validate previous conceptual frameworks used to describe social psychological measures in negotiation (Oliver et al., 1994; Thompson, 1990).

Empirical findings suggested, intriguingly, the understated value of subjective value. Participants in Study 1 reported a diverse range of negotiation goals. Although subjective value was less salient, it was no less important to negotiators than objective metrics of their performance. Although tangible terms of agreements appeared more frequently than any other single factor, in open-ended responses 1 in 5 participants did not mention any tangible outcomes at all. These findings suggest that researchers may dramatically underrate subjective outcomes in negotiation given their real-world importance. In Study 4, subjective value was a better predictor than objective value of negotiators’ future behaviors and intentions.
Participants reporting high subjective value were more likely weeks later to choose their counterpart for a future cooperative interaction that had real stakes, and they were also more likely to report plans to maintain a professional relationship. This finding also speaks to the validity of the SVI instrument, given that participants were able and willing to self-report responses that later correlated strongly with consequential choices. A third particularly noteworthy finding concerns the significant, yet low, correlation between feelings about instrumental outcomes and those outcomes themselves. This suggests the difficulty, even in the controlled setting of an in-class negotiation exercise, of gathering and processing accurate information about one’s objective performance.

Limitations

The biggest limitation of this research program is, simply put, whether negotiators value what they say they value. We relied on self-report in the open-ended generation of subjective value factors in Study 1, their mapping in Study 2, and the use of response scales in Studies 3 and 4. We address this concern in two ways. Conceptually, we argue that what people say they value in a negotiation itself is important. The accuracy of such accounts could not truly be evaluated without losing meaning (e.g., Ross, 2001; Ross & Nisbett, 1991). To obtain an immediate and direct method to ascertain a participant’s accuracy in reporting subjective value would represent a paradox: that of providing an objective criterion against which to compare inherently subjective value. Indeed, the question of how to measure and track subjective experience is a current focus of a growing volume of research on well-being and hedonic science (Diener, 1984; Kahneman, Diener, & Schwarz, 1999; Schwarz & Strack, 1999), grappling with similar issues of self-report, such as self-presentation and social desirability. Just as Diener (1984, 1994) and colleagues have argued that people are considered happy to the extent that they subjectively believe themselves to be happy, we believe that introspection is the gold standard for assessing subjective value. Thus, traditional measurement strategies such as the multitrait–multimethod approach (Campbell & Fiske, 1959) would not be applicable to subjective value. Although subjective well-being has been assessed using self-reports and peer reports from family members and friends for cross-validation (e.g., Pavot, Diener, Colvin, & Sandvik, 1991; Sandvik, Diener, & Seidlitz, 1993), it can be argued that those around us have an informed perspective on our life satisfaction because it is visible to others. By contrast, it is not clear that peers would have an informed perspective on a negotiator’s subjective value beyond hearsay from the negotiator him- or herself. Any behavioral manifestations available for peers to observe (e.g., relationship continuation) are conceptually distinct consequences rather than alternate measurements of subjective value.

That said, we bear the burden to demonstrate that participants are willing and able to report their subjective value, and we do so empirically in Study 4. To maintain that participant responses are driven by more than declarative knowledge and folk beliefs that may be valid internally but not with respect to actual future behaviors, we present initial data demonstrating the SVI is a strong predictor of future choices with real consequences for participants. The selection of a teammate for a team-against-team negotiation had genuine stakes in a class for which objective point scores in classroom exercises were the sole determinants of students’ grades.
Thus, the strongly positive findings demonstrate participants were capable and willing to report accurately about their subjective value. Self-reports, whatever their underlying attribution process, have an inherent validity or interest to researchers when they predict important consequences for individuals. A second limitation of the current research program was the use of student samples to examine the factor structure of the SVI instrument and provide initial data on its reliability and validity. Although such samples are representative of much of the body of negotiations research, students may differ in the focus and importance they place on various factors of subjective value. More research including practitioners and community members would be worthwhile before assuming that the SVI instrument generalizes unchanged for use with wider populations. We speculate that the use of student samples in Studies 3 and 4 may have contributed to the relative weakness of the distinction between the Process and Relationship components of subjective value. Indeed, the popular book Getting to Yes (Fisher, Ury, & Patton, 1991) focuses on the need to train negotiators to separate the person from the situation—suggesting that these two concepts are theoretically distinct but empirically confounded, particularly for novice negotiators. Furthermore, wording in the Relationship questions attempted to separate negotiators’ working relationship from idiosyncratic liking. This may have focused participants on the negotiation process, thus limiting the distinction participants made between these two elements of rapport.

Future Research

The results of these studies suggest a number of avenues for further research. First, the systematic approach taken by the current investigation points to the relatively less investigated areas within subjective value. Notably, feelings about the self emerged as a distinct independent factor, and its relatively lower interitem consistency suggests it is complex and multidimensional. Yet, of the four components of subjective value, Self encompasses the smallest existing research literature within negotiations. Newer work on the role of face threat as well as stereotype threat and stereotype confirmation has attempted to remedy this gap (e.g., Kray, Thompson, & Galinsky, 2001; White et al., 2004). Likewise, the field would benefit from greater understanding of feelings about instrumental outcomes. How to know whether you succeeded in a negotiation is critical. The current empirical findings suggest that such knowledge is imperfect, revealing only a modestly sized partial correlation of .26 with objective outcomes themselves. Yet such knowledge is crucial for learning: Experience can be a lousy teacher if one’s conclusions about that experience are flawed. Research on counterfactual reasoning has found that individuals engage in valuable counterfactual thinking as a result of negative affect and misfortune (e.g., Galinsky, Seiden, Kim, & Medvec, 2002; Lipe, 1991; Roese, 1997). But what if negotiators are not able to diagnose their own misfortunes accurately? If subjective feelings about success and failure trigger counterfactual reasoning, then a greater understanding of subjective value is a critical component underlying theories of feedback and negotiator learning and training. The development of the SVI also offers researchers the chance to further examine how the various elements of subjective value may interact with each other. 
For example, recent work on procedural justice has suggested that feelings about an instrumental outcome may more strongly reflect onto feelings about one’s own skills and competence as a negotiator when one believes that the process was fair (Brockner et al., 2003).⁷

More research exploring the consequences of subjective value would be worthwhile. Earlier, we speculated that one value of subjective value is that it may feed back positively into future economic outcomes. Such a speculation awaits more complete testing than the preliminary results presented in Study 4. A basic question is whether the suggestive finding, that subjective value was a stronger predictor than objective value of important future consequences, would replicate in contexts with greater personal stakes for negotiators. A more detailed question concerns the boundary conditions of such an effect: Under what circumstances should subjective value be a good predictor of future instrumental outcomes?

Furthermore, more research should explore the precursors of subjective value. What leads to greater feelings of personal reward from a negotiation? Cognitions such as norms, expectations, aspirations, and preferences are likely to play a key role. Similarly, work should examine structural issues such as the relationship among the parties, likelihood of future interaction, the subject and setting of the negotiation, the issues to be decided, and the medium of communication. Finally, individual differences such as personality factors, culture, and other demographic background characteristics may influence subjective value. For example, formative research on the role of emotional intelligence (e.g., J. D. Mayer, Salovey, & Caruso, 2000) in negotiation suggests that emotional intelligence represents an asset for negotiators, particularly insofar as negotiators with high emotional intelligence seem capable of inducing positive affect in their counterparts, even after controlling for instrumental outcomes (Curhan & Mueller, 2006).

Even for researchers who do not focus on subjective value per se, including it as an outcome measure provides the potential to observe the consequences of particular experimental manipulations on subjective experience. In examining how subjective value arises in a negotiation, it is also important to take a process orientation and to examine the behaviors that take place, for example, the strategies and tactics used, whether parties are cooperative versus competitive, how they share information, and other factors. It is worthwhile to examine not only the tactics that lead to negotiators’ own subjective value, but also the tactics that negotiators can use to increase the subjective value of their counterparts. Typologies of negotiation processes such as that of Olekalns, Brett, and Weingart (2003) would be ideal for addressing such questions. Even before the negotiation itself, negotiators may anticipate their level of subjective value and may make predictions, correct or incorrect, and consequently choices in an attempt to maximize their subjective value.⁸

Practical Implications and Interventions

Given the widespread importance of effective negotiating, how can we put to use an understanding of subjective value? Study 1 suggests that the objective terms of an agreement may be more salient than other factors, but perhaps no more important. This raises the question of what might happen if negotiators’ attention were focused on subjective value. However, we argue that more work would be necessary to validate any intervention approach. Ironically, merely focusing on one’s subjective value can have a counterproductive impact on it. Conlon and Hunt (2002) found that representing outcomes to participants in terms of smiling and frowning faces, rather than numerical payoff grids, resulted in greater emotional involvement, but this greater involvement led, in turn, to longer negotiation times and higher impasse rates. They argued that high rates of disagreement in real-world negotiations are consistent with greater emotional involvement outside of controlled research settings. We speculate that interpersonal skills such as emotional intelligence may moderate such findings: The conventional wisdom that emotional involvement is detrimental for reaching agreements may hold in the case of low emotional intelligence, whereas focusing on subjective value and increasing emotional involvement could benefit negotiators with high emotional intelligence. We hope that the promising findings of the current article will serve as a call for research that can develop and support nuanced recommendations about the methods and contexts in which negotiators should focus on their subjective value in order to improve the outcomes and experience of their interactions.

7 We thank an anonymous reviewer for raising this idea.

8 We thank an anonymous reviewer for this idea.


Conclusion

The purpose of this article has been to present a comprehensive framework of the range of inherently social psychological outcomes in negotiation, which serves as a complement to more tangible, instrumental, or economic outcomes. It is our hope that such a framework serves to encourage, systematize, and facilitate research that looks beyond economic exchange as the consequence of interpersonal negotiations. The field of negotiations has been a uniquely interdisciplinary pursuit, eagerly incorporating perspectives from economics, law, organizational behavior and industrial relations, sociology, and psychology. The current research aimed to put a social psychological stamp on the study of negotiation outcomes.

References

Allport, F. (1955). Theories of perception and the concept of structure. New York: Wiley.
Axelrod, R. (1984). The evolution of cooperation. New York: Basic Books.
Bandura, A. (1977). Self-efficacy: Toward a unifying theory of behavioral change. Psychological Review, 84, 191–215.
Bazerman, M. H., Curhan, J. R., & Moore, D. A. (2001). The death and rebirth of the social psychology of negotiation. In G. J. O. Fletcher & M. S. Clark (Eds.), Blackwell handbook of social psychology: Interpersonal processes (pp. 196–228). Oxford, England: Blackwell.
Bazerman, M. H., Curhan, J. R., Moore, D. A., & Valley, K. L. (2000). Negotiation. Annual Review of Psychology, 51, 279–314.
Bazerman, M. H., & Neale, M. A. (1992). Negotiating rationally. New York: Free Press.
Blount, S., & Larrick, R. P. (2000). Framing the game: Examining frame choice in bargaining. Organizational Behavior and Human Decision Processes, 81, 43–71.
Borgen, F., & Barnett, D. (1987). Applying cluster analysis in counseling psychology research. Journal of Counseling Psychology, 34, 456–468.
Brockner, J., Heuer, L., Magner, N., Folger, R., Umphress, E., van den Bos, K., et al. (2003). High procedural fairness heightens the effect of outcome favorability on self-evaluations: An attributional analysis. Organizational Behavior and Human Decision Processes, 91, 51–68.
Brockner, J., & Wiesenfeld, B. M. (1996). An integrative framework for explaining reactions to decisions: Interactive effects of outcomes and procedures. Psychological Bulletin, 120, 189–208.
Brown, B. (1968). The effects of need to maintain face on interpersonal bargaining. Journal of Experimental Social Psychology, 4, 107–122.
Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 136–162). Newbury Park, CA: Sage.
Byrne, B. M. (2001). Structural equation modeling with AMOS: Basic concepts, applications and programming. Mahwah, NJ: Erlbaum.
Camerer, C., & Thaler, R. H. (1995). Ultimatums, dictators and manners. Journal of Economic Perspectives, 9, 209–219.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait, multimethod matrix. Psychological Bulletin, 56, 81–105.
Carnevale, P. J., & Pruitt, D. (1992). Negotiation and mediation. Annual Review of Psychology, 43, 531–582.
Christie, R., & Geis, F. L. (1970). Studies in Machiavellianism. New York: Academic Press.
Colquitt, J. A. (2001). On the dimensionality of organizational justice: A construct validation of a measure. Journal of Applied Psychology, 86, 386–400.
Colquitt, J. A., Conlon, D. E., Wesson, M. J., Porter, C. O. L. H., & Ng, K. Y. (2001). Justice at the millennium: A meta-analytic review of 25 years of organizational justice research. Journal of Applied Psychology, 86, 425–445.
Conlon, D. E., & Hunt, S. (2002). Dealing with feeling: The influence of outcome representations on negotiation. International Journal of Conflict Management, 13, 38–58.
Conlon, D. E., Lind, E. A., & Lissak, R. I. (1989). Nonlinear and nonmonotonic effects of outcome on procedural and distributive justice fairness judgments. Journal of Applied Social Psychology, 19, 1085–1099.
Croson, R., & Glick, S. (2001). Reputations in negotiations. In S. Hoch & H. Kunreuther (Eds.), Wharton on making decisions (pp. 177–186). New York: Wiley.
Curhan, J. R., & Mueller, J. S. (2006, January). Emotional intelligence and counterpart affect induction in the context of integrative negotiations. Poster session presented at the 7th Annual Meeting of the Society for Personality and Social Psychology, Palm Springs, CA.
Curhan, J. R., Neale, M. A., Ross, L., & Rosencranz-Engelmann, J. (2006). Relational satisficing in negotiation: Divergent effects of power distance on economic outcomes and relational capital. Manuscript submitted for publication.
DeMaio, T. J. (1984). Social desirability and survey measurement: A review. In C. F. Turner & E. Martin (Eds.), Surveying subjective phenomena (pp. 257–281). New York: Russell Sage Foundation.
Diamantopoulos, A., & Siguaw, J. (2000). Introducing LISREL. Thousand Oaks, CA: Sage.
Diener, E. (1984). Subjective well-being. Psychological Bulletin, 95, 542–575.
Diener, E. (1994). Assessing subjective well-being: Progress and opportunities. Social Indicators Research, 31, 103–157.
Drolet, A. L., & Morris, M. W. (2000). Rapport in conflict resolution: Accounting for how face-to-face contact fosters mutual cooperation in mixed-motive conflicts. Journal of Experimental Social Psychology, 36, 26–50.
Eagley, A. H., & Chaiken, S. (1998). Attitude structure and function. In D. T. Gilbert & S. T. Fiske (Eds.), The handbook of social psychology (pp. 788–827). Boston: McGraw-Hill.
Fisher, R., Ury, W., & Patton, B. (1991). Getting to Yes: Negotiating agreement without giving in (2nd ed.). New York: Penguin Books.
Fortgang, R. S., Lax, D. A., & Sebenius, J. K. (2003, February). Negotiating the spirit of the deal. Harvard Business Review, 1–9.
Froman, L. A., & Cohen, M. D. (1970). Compromise and logroll: Comparing efficiency of two bargaining processes. Behavioral Science, 30, 180–183.
Galinsky, A. D., Mussweiler, T., & Medvec, V. H. (2002). Disconnecting outcomes and evaluations in negotiations: The role of negotiator focus. Journal of Personality and Social Psychology, 83, 1131–1140.
Galinsky, A. D., Seiden, V., Kim, P. H., & Medvec, V. H. (2002). The dissatisfaction of having your first offer accepted: The role of counterfactual thinking in negotiations. Personality and Social Psychology Bulletin, 28, 271–283.
Gelfand, M. J., Smith, V. M., Raver, J., & Nishii, L. (2006). Negotiating relationally: The dynamics of the relational self in negotiations. Academy of Management Review, 31, 427–451.
Goates, N., Barry, B., & Friedman, R. A. (2003, June). Good karma: How individuals construct schemas of reputation in negotiation contexts. Paper presented at the 16th annual conference of the International Association for Conflict Management, Melbourne, Victoria, Australia.
Gonzalez, R., & Griffin, D. (1999). The correlation analysis of dyad-level data in the distinguishable case. Personal Relationships, 6, 449–469.
Güth, W., Schmittberger, R., & Schwarze, B. (1982). An experimental analysis of ultimatum bargaining. Journal of Economic Behavior and Organization, 3, 367–388.
John, O. P., Donahue, E. M., & Kentle, R. L. (1991). The “Big Five” Inventory—Versions 4a and 54. Berkeley: University of California, Berkeley, Institute of Personality and Social Research.
Jones, E. E., & Pittman, T. S. (1982). Toward a general theory of strategic self-presentation. In J. Suls (Ed.), Psychological perspectives on the self (Vol. 1, pp. 231–262). Hillsdale, NJ: Erlbaum.
Kahneman, D., Diener, E., & Schwarz, N. (1999). Well-being: The foundations of hedonic psychology. New York: Russell Sage Foundation.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47, 263–291.
Kashy, D. A., & Kenny, D. A. (2000). The analysis of data from dyads and groups. In H. T. Reis & C. M. Judd (Eds.), Handbook of research methods in social and personality psychology (pp. 451–477). New York: Cambridge University Press.
Kelloway, E. K. (1998). Using LISREL for structural equation modeling. Thousand Oaks, CA: Sage.
Kenny, D. A. (2005). Measuring model fit. Retrieved September 2, 2005, from http://davidakenny.net/cm/fit.htm
Kenny, D. A., & McCoach, D. B. (2003). Effect of the number of variables on measures of fit in structural equation modeling. Structural Equation Modeling, 10, 333–351.
Kline, R. B. (2005). Principles and practice of structural equation modeling (2nd ed.). New York: Guilford Press.
Komorita, S. S., & Parks, C. D. (1995). Interpersonal relations: Mixed-motive interaction. Annual Review of Psychology, 46, 183–207.
Kray, L. J., Thompson, L., & Galinsky, A. (2001). Battle of the sexes: Gender stereotype confirmation and reactance in negotiations. Journal of Personality and Social Psychology, 80, 942–958.
Kruskal, J. B., & Wish, M. (1978). Multidimensional scaling. Beverly Hills, CA: Sage.
Kuiper, F. K., & Fisher, L. A. (1975). A Monte Carlo comparison of six clustering procedures. Biometrics, 31, 777–783.
Kurtzberg, T., & Medvec, V. H. (1999). Can we negotiate and still be friends? Negotiation Journal, 15, 355–361.
Kwon, S., & Weingart, L. R. (2004). Unilateral concessions from the other party: Concession behavior, attributions, and negotiation judgments. Journal of Applied Psychology, 89, 263–278.
Langford, P. H. (2003). A one-minute measure of the Big Five? Evaluating and abridging Shafer’s (1999) Big Five markers. Personality and Individual Differences, 35, 1127–1140.
Lax, D. A., & Sebenius, J. K. (1986). Interests: The measure of negotiation. Negotiation Journal, 2, 73–92.
Lewicki, R. J., McAllister, D. J., & Bies, R. J. (1998). Trust and distrust: New relationships and realities. Academy of Management Review, 23, 438–458.
Lewicki, R. J., Saunders, D. M., Minton, J. W., & Barry, B. (2002). Negotiation: Readings, exercises, and cases. New York: McGraw-Hill/Irwin.
Lewicki, R. J., & Stevenson, M. A. (1997, June). Trust development in negotiation: Proposed actions and a research agenda. Paper presented at the 10th Annual Conference of the International Association for Conflict Management, Bonn, Germany.
Lind, E. A., & Tyler, T. R. (1988). The social psychology of procedural justice. New York: Plenum Press.
Lipe, M. G. (1991). Counterfactual reasoning as a framework for attribution theories. Psychological Bulletin, 109, 456–471.
Loewenstein, G. F., Thompson, L., & Bazerman, M. H. (1989). Social utility and decision making in interpersonal contexts. Journal of Personality and Social Psychology, 57, 426–441.
Magnus, K., Diener, E., Fujita, F., & Payot, W. (1993). Extraversion and neuroticism as predictors of objective life events: A longitudinal analysis. Journal of Personality and Social Psychology, 65, 1046–1053.
Mannix, E. A., Tinsley, C. H., & Bazerman, M. (1995). Negotiating over time: Impediments to integrative solutions. Organizational Behavior & Human Decision Processes, 62, 241–251.

Mayer, J. D., Salovey, P., & Caruso, D. R. (2000). Models of emotional intelligence. In R. J. Sternberg (Ed.), Handbook of intelligence (pp. 396–420). Cambridge, England: Cambridge University Press.
Mayer, R. C., & Davis, J. H. (1999). The effect of the performance appraisal system on trust for management: A field quasi-experiment. Journal of Applied Psychology, 84, 123–136.
McAllister, D. J. (1995). Affect- and cognition-based trust as foundations for interpersonal cooperation in organizations. Academy of Management Journal, 38, 24–59.
McCrae, R. R., & John, O. P. (1992). An introduction to the five-factor model and its applications. Journal of Personality, 60, 175–215.
McCullough, M. E., Emmons, R. A., & Tsang, J. (2002). The grateful disposition: A conceptual and empirical topography. Journal of Personality and Social Psychology, 82, 112–127.
Meng, X., Rosenthal, R., & Rubin, D. (1992). Comparing correlated correlation coefficients. Psychological Bulletin, 111, 172–175.
Messick, D. M., & Sentis, K. P. (1985). Estimating social and nonsocial utility functions from ordinal data. European Journal of Social Psychology, 15, 389–399.
Mestdagh, S., & Buelens, M. (2003, June). Thinking back on where we’re going: A methodological assessment of five decades of research in negotiation behavior. Paper presented at the 16th annual conference of the International Association for Conflict Management, Melbourne, Victoria, Australia.
Miller, D. T. (1999). The norm of self-interest. American Psychologist, 54, 1053–1060.
Miller, D. T., & Ratner, R. K. (1998). The disparity between the actual and assumed power of self-interest. Journal of Personality and Social Psychology, 74, 53–62.
Mills, C. W. (1940). Situated actions and vocabularies of motive. American Sociological Review, 5, 904–913.
Morris, M. W., Larrick, R. P., & Su, S. K. (1999). Misperceiving negotiation counterparts: When situationally determined bargaining behaviors are attributed to personality traits. Journal of Personality and Social Psychology, 77, 52–67.
Mulaik, S. A., Janes, L. R., Van Alstine, J., Bennett, N., Lind, S., & Stilwell, C. D. (1989). Evaluation of goodness-of-fit indices for structural equation models. Psychological Bulletin, 105, 430–445.
Naquin, C. E., & Paulson, G. D. (2003). Online bargaining and interpersonal trust. Journal of Applied Psychology, 88, 113–120.
Nash, J. (1953). Two-person cooperative games. Econometrica, 21, 128–140.
Neale, M. A., & Northcraft, G. B. (1986). Experts, amateurs, and refrigerators: Comparing expert and amateur negotiators in a novel task. Organizational Behavior & Human Decision Processes, 38, 305–317.
Northcraft, G. B., Brodt, S. E., & Neale, M. A. (1995). Negotiating with nonlinear subjective utilities: Why some concessions are more equal than others. Organizational Behavior and Human Decision Processes, 63, 298–310.
Novemsky, N., & Schweitzer, M. (2004). What makes negotiators happy? The differential effects of internal and external social comparisons on negotiator satisfaction. Organizational Behavior and Human Decision Processes, 95, 186–197.
O’Connor, K. M., Arnold, J. A., & Burris, E. R. (2005). Negotiators’ bargaining histories and their effects on future negotiation performance. Journal of Applied Psychology, 90, 350–362.
Olekalns, M., Brett, J. M., & Weingart, L. (2003). Phases, transitions and interruptions: The processes that shape agreement in multi-party negotiations. International Journal of Conflict Management, 14, 191–211.
Oliver, R. L., Balakrishnan, P. V., & Barry, B. (1994). Outcome satisfaction in negotiation: A test of expectancy disconfirmation. Organizational Behavior and Human Decision Processes, 60, 252–275.
Pavot, W., Diener, E., Colvin, C. R., & Sandvik, E. (1991). Further validation of the Satisfaction With Life Scale: Evidence for cross-method convergence of well-being measures. Journal of Personality Assessment, 57, 149–161.
Pinkley, R. L. (1990). Dimensions of conflict frame: Disputant interpretations of conflict. Journal of Applied Psychology, 75, 117–126.
Podsakoff, P. M., MacKenzie, S. B., Lee, J. Y., & Podsakoff, N. P. (2003). Common method biases in behavioral research: A critical review of the literature and recommended remedies. Journal of Applied Psychology, 88, 879–903.
Pruitt, D. G. (1983). Achieving integrative agreements. In M. H. Bazerman & R. J. Lewicki (Eds.), Negotiating in organizations (pp. 35–49). Beverly Hills, CA: Sage.
Pruitt, D. G., & Rubin, J. Z. (1986). Social conflict: Escalation, stalemate, and settlement. New York: McGraw-Hill.
Pyszczynski, T., Greenberg, J., Solomon, S., Arndt, J., & Schimel, J. (2004). Why do people need self-esteem? A theoretical and empirical review. Psychological Bulletin, 130, 435–468.
Robinson, M. D., & Clore, G. L. (2002). Belief and feeling: Evidence for an accessibility model of emotional self-report. Psychological Bulletin, 128, 934–960.
Roese, N. J. (1997). Counterfactual thinking. Psychological Bulletin, 121, 133–148.
Rosenberg, S. (1982). The method of sorting in multivariate research with applications selected from cognitive psychology and person perception. In N. Hirschberg & L. G. Humphreys (Eds.), Multivariate applications in the social sciences (pp. 117–142). Hillsdale, NJ: Erlbaum.
Ross, L. (1977). The intuitive psychologist and his shortcomings. In L. Berkowitz (Ed.), Advances in experimental social psychology (pp. 174–220). New York: Academic Press.
Ross, L. D. (2001). Getting down to fundamentals: Lay dispositionism and the attributions of psychologists. Psychological Inquiry, 12, 37–40.
Ross, L., & Nisbett, R. E. (1991). The person and the situation: Perspectives of social psychology. Philadelphia: Temple University Press.
Rubin, J. Z., & Brown, B. R. (1975). The social psychology of bargaining and negotiation. New York: Academic Press.
Rusbult, C. E., & Zembrodt, I. M. (1982). Responses to dissatisfaction in romantic involvements: A multidimensional scaling analysis. Journal of Experimental Social Psychology, 19, 274–293.
Sandvik, E., Diener, E., & Seidlitz, L. (1993). Subjective well-being: The convergence and stability of self-report measures. Journal of Personality, 61, 317–342.
Schneider, B. (1985). Organizational behavior. Annual Review of Psychology, 36, 573–611.
Schroth, H. A., Ney, G., Roedter, M., Rosin, A., & Tiedmann, M. (1997). MBA salary negotiation. Evanston, IL: Dispute Resolution Research Center.
Schwarz, N., & Strack, F. (1999). Reports of subjective well-being: Judgmental processes and their methodological implications. In D. Kahneman, E. Diener, & N. Schwarz (Eds.), Well-being: The foundations of hedonic psychology (pp. 61–84). New York: Russell Sage Foundation.
Seligman, M. E. P., & Csikszentmihalyi, M. (2000). Positive psychology: An introduction. American Psychologist, 55, 5–14.
Silvia, P. J., & Gendolla, G. H. E. (2001). On introspection and self-perception: Does self-focused attention enable accurate self-knowledge? Review of General Psychology, 5, 241–269.
Snyder, C. R., & Higgins, R. L. (1997). Reality negotiation: Governing one’s self and being governed by others. Review of General Psychology, 1, 336–350.
Stevens, C. K., Bavetta, A. G., & Gist, M. E. (1993). Gender differences in the acquisition of salary negotiation skills: The roles of goals, self-efficacy, and perceived control. Journal of Applied Psychology, 78, 723–735.
Straub, P. G., & Murnighan, J. K. (1995). An experimental investigation of ultimatums: Common knowledge, fairness, expectations, and lowest acceptable offers. Journal of Economic Behavior and Organization, 27, 345–364.
Taylor, S. E., & Brown, J. D. (1994). Positive illusions and well-being revisited: Separating fact from fiction. Psychological Bulletin, 116, 21–27.
Thibaut, J., & Walker, L. (1975). Procedural justice: A psychological analysis. Hillsdale, NJ: Erlbaum.
Thompson, L. (1990). Negotiation behavior and outcomes: Empirical evidence and theoretical issues. Psychological Bulletin, 108, 515–532.
Thompson, L. (1995). The impact of minimum goals and aspirations on judgments of success in negotiations. Group Decision & Negotiation, 4, 513–524.
Thompson, L., & Hastie, R. (1990). Social perception in negotiation. Organizational Behavior and Human Decision Processes, 47, 98–123.
Thompson, L., & Hrebec, D. (1996). Lose–lose agreements in interdependent decision making. Psychological Bulletin, 120, 396–409.
Thompson, L., Medvec, V. H., Seiden, V., & Kopelman, S. (2001). Poker face, smiley face and rant ‘n’ rave: Myths and realities about emotion in negotiation. In M. Hogg & S. Tindale (Eds.), Group processes (pp. 139–163). Malden, MA: Blackwell.
Thompson, L., Nadler, J., & Kim, P. H. (1999). Some like it hot: The case for the emotional negotiator. In L. L. Thompson, J. M. Levin, & D. M. Messick (Eds.), Shared cognition in organizations: The management of knowledge (pp. 139–161). Mahwah, NJ: Erlbaum.
Tinsley, C. H., O’Connor, K. M., & Sullivan, B. A. (2002). Tough guys finish last: The perils of a distributive reputation. Organizational Behavior and Human Decision Processes, 88, 621–645.
Tunis, S., Fridhandler, B., & Horowitz, M. (1990). Identifying schematized views of self with significant others: Convergence of quantitative and clinical methods. Journal of Personality and Social Psychology, 59, 1279–1286.
Tyler, T. R., & Blader, S. L. (2003). The group engagement model: Procedural justice, social identity, and cooperative behavior. Personality and Social Psychology Review, 7, 349–361.
Valley, K. L., Neale, M. A., & Mannix, E. A. (1995). Friends, lovers, colleagues, strangers: The effects of relationships on the process and outcome of dyadic negotiations. Research on Negotiation in Organizations, 5, 65–93.
Walton, R. E., Cutcher-Gershenfeld, J. E., & McKersie, R. B. (1994). Strategic negotiations: Theory of change in labor-management relations. Cambridge, MA: Harvard Business School Press.
Wheeler, M. A. (2000). Riggs-Vericomp (Case 801–096/7). Boston: Harvard Business School.
White, J. B., Tynan, R., Galinsky, A. D., & Thompson, L. L. (2004). Face threat sensitivity in negotiation: Roadblock to agreement and joint gain. Organizational Behavior and Human Decision Processes, 94, 102–124.

(Appendix follows)


Appendix

Instructions for Use of the Subjective Value Inventory 16-Item Questionnaire

Instructions for Participants

General Instructions: For each question, please circle a number from 1 to 7 that most accurately reflects your opinion. You will notice that some of the questions are similar to one another; this is primarily to ensure the validity and reliability of the questionnaire. Please simply answer each question independently, without reference to any of the other questions.

Important: If you encounter a particular question that is not applicable, simply circle “NA.” Even if you did not reach agreement, please try to answer as many questions as possible.

Administration of the Subjective Value Inventory

Items can be presented in any order. However, the order shown in Table 2 is recommended. No headings should be used (e.g., Feelings About the Instrumental Outcome). The version presented in Table 2 is intended for negotiations involving two or more individuals. When the focal negotiation involves only two individuals, the words counterpart(s) and outcome(s) should be changed to counterpart and outcome, respectively.

Scoring of the Subjective Value Inventory

Items 3 and 5 should be reverse-scored (i.e., a response of 7 becomes 1, a response of 6 becomes 2, etc.). Next, items within each of the four subscales should be averaged (with equal weightings) to yield four subscale scores (i.e., Instrumental, Self, Process, and Relationship). If desired, a Global score can be calculated by averaging (with equal weightings) these four subscale scores. A Rapport score may also be calculated by averaging scores for Process and Relationship (with equal weightings).
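The scoring steps above can be sketched in Python. This is only an illustrative sketch, not part of the published instrument. The function name `score_svi`, the decision to leave “NA” responses out of the averages, and especially the assignment of items to subscales (items 1–4 Instrumental, 5–7 Self, 8–11 Process, 12–16 Relationship) are assumptions that should be checked against Table 2 before use.

```python
from statistics import mean

# ASSUMPTION: item-to-subscale mapping; verify against Table 2 before use.
SUBSCALES = {
    "Instrumental": [1, 2, 3, 4],
    "Self": [5, 6, 7],
    "Process": [8, 9, 10, 11],
    "Relationship": [12, 13, 14, 15, 16],
}
REVERSED_ITEMS = {3, 5}  # stated in the scoring instructions


def score_svi(responses):
    """Score one questionnaire.

    responses: dict mapping item number (1-16) to an int from 1 to 7,
    or to None when the participant circled "NA".
    """
    scored = {}
    for item, value in responses.items():
        if value is None:
            continue  # ASSUMPTION: "NA" responses are left out of averages
        if not 1 <= value <= 7:
            raise ValueError(f"item {item}: response {value} is outside 1-7")
        # Reverse-scoring on a 1-7 scale: 7 -> 1, 6 -> 2, and so on.
        scored[item] = 8 - value if item in REVERSED_ITEMS else value

    result = {}
    for name, items in SUBSCALES.items():
        values = [scored[i] for i in items if i in scored]
        # Equal-weight average of the answered items in the subscale.
        result[name] = mean(values) if values else None

    subscale_scores = [result[name] for name in SUBSCALES]
    # Global: equal-weight average of the four subscale scores.
    result["Global"] = (
        mean(subscale_scores) if None not in subscale_scores else None
    )
    # Rapport: equal-weight average of Process and Relationship.
    pair = [result["Process"], result["Relationship"]]
    result["Rapport"] = mean(pair) if None not in pair else None
    return result


if __name__ == "__main__":
    example = {item: 7 for item in range(1, 17)}
    print(score_svi(example))
```

Averaging the subscale scores, rather than all 16 items, keeps the four subscales equally weighted even though they contain different numbers of items, which is what the instructions specify for the Global score.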

Received January 14, 2005
Revision received November 3, 2005
Accepted January 30, 2006

Journal of Personality and Social Psychology 2006, Vol. 91, No. 3, 513–518

Copyright 2006 by the American Psychological Association 0022-3514/06/$12.00 DOI: 10.1037/0022-3514.91.3.513

Can Manipulations of Cognitive Load Be Used to Test Evolutionary Hypotheses?

H. Clark Barrett, David A. Frederick, and Martie G. Haselton
University of California, Los Angeles

Robert Kurzban
University of Pennsylvania

D. DeSteno, M. Y. Bartlett, J. Braverman, and P. Salovey (2002) proposed that if sex-differentiated responses to infidelity are evolved, then they should be automatic, and therefore cognitive load should not attenuate them. DeSteno et al. found smaller sex differences in response to sexual versus emotional infidelity among participants under cognitive load, an effect interpreted as evidence against the evolutionary hypothesis. This logic is faulty. Cognitive load probably affects mechanisms involved in simulating infidelity experiences, thus seriously challenging the usefulness of cognitive load manipulations in testing hypotheses involving simulation. The method also entails the assumption that evolved jealousy mechanisms are necessarily automatic, an assumption not supported by theory or evidence. Regardless of how the jealousy debate is eventually settled, cognitive load manipulations cannot rule out the operation of evolved mechanisms.

Keywords: evolutionary psychology, cognitive load, modularity, jealousy, automaticity

DeSteno, Bartlett, Braverman, and Salovey (2002) proposed a new method for evaluating hypotheses about evolved cognitive mechanisms. They assumed that evolved cognitive mechanisms are necessarily automatic in their operation, accepting no input from deliberative or effortful processes. On this basis, they suggested that adding a cognitive load manipulation to a task (remembering a long string of digits and producing a response in less than 10 s) should “enhance the influence of automatic processes on judgment and behavior through the inhibition of corrective or deliberative processes reflecting the influence of conscious analysis” (DeSteno et al., 2002, p. 1111). Because cognitive load is purported to affect only deliberative processes, they proposed that if an effect observed under no load conditions is reduced or eliminated under cognitive load, the initial effect can be inferred to have been the result of effortful, rather than evolved, cognition. They argued that by comparing performance under cognitive load and no load, they could provide a crucial test of hypotheses about evolved cognitive mechanisms that can “resolve the stalemate” (DeSteno et al., 2002, p. 1104) over the role of specialized evolved mechanisms in jealousy.

Here we suggest that this reasoning is faulty for two reasons. First, it assumes that evolved jealousy mechanisms do not process information from other cognitive mechanisms (e.g., controlled processes, working memory, etc.) that might themselves be influenced by cognitive load manipulations. We propose that this is an incorrect assumption about jealousy mechanisms. It is likely that many evolved mechanisms rely on processes affected by cognitive load, making the cognitive load method inappropriate as a way to test for the existence of these mechanisms. Second, the reasoning of DeSteno et al. (2002) embodies incorrect assumptions about the way evolution-minded researchers conceive of evolved cognitive architectures. In particular, the conceptualization of modularity and automaticity presented by DeSteno et al. (2002) builds a false dichotomy between evolutionary models, which are said to entail “automaticity,” and nonevolutionary models, which entail “effortful decisions” (see Barrett & Kurzban, in press; Pinker, 1997; Sperber, 2005).

For these reasons, we argue that the cognitive load method, in particular as applied by DeSteno et al. (2002), is not a valid test of whether a particular experimental result reflects an “automatic . . . response shaped by evolution” (DeSteno et al., 2002, p. 1003) or the presumed alternative, a “nonautomatic response not shaped by evolution.” We wish to stress that although we believe that DeSteno et al. have not falsified the hypotheses about jealousy that they claim to have falsified, the problems we have raised with their method exist independent of the debate about sex differences in response to different types of infidelity. No matter how current debates about jealousy are resolved, the cognitive load method as used by DeSteno et al. is based on faulty assumptions about evolved mechanisms and cannot be used to test for their existence.

H. Clark Barrett, Department of Anthropology, University of California, Los Angeles; David A. Frederick, Department of Psychology, University of California, Los Angeles; Martie G. Haselton, Communication Studies Program and Department of Psychology, University of California, Los Angeles; Robert Kurzban, Department of Psychology, University of Pennsylvania. Correspondence concerning this article should be addressed to H. Clark Barrett, Department of Anthropology, University of California, Los Angeles, 341 Haines Hall, Box 951553, Los Angeles, CA 90095-1553. E-mail: [email protected]

A Brief History of the Problem

DeSteno et al. (2002) used the cognitive load method to test a proposal by Buss, Larsen, Westen, and Semmelroth (1992) about evolved sex differences in jealousy. On the basis of parental investment theory (Trivers, 1972), Buss et al. (1992) predicted there would be sex differences in the degree of distress caused by imagined sexual and emotional infidelity. In brief, they reasoned that there was an asymmetry in the fitness costs of these kinds of infidelity for men and women. Because of internal fertilization, women can be certain that their child is genetically their own, whereas men are uncertain of paternity. Investment in the offspring of another man was an event with high fitness costs, leading to the evolution of anticuckoldry adaptations in men. One prediction of this account is that, relative to women, men should be more distressed by real or imagined sexual infidelity by their mates.

The fitness costs to women of infidelity by a male partner would also have been high. Unlike men, women were certain of the genetic relatedness of their offspring. However, female fitness would have been significantly influenced by male investment in offspring (Hill & Hurtado, 1996; Hurtado & Hill, 1992; Marlowe, 2003). Thus, women stood to lose less from sexual infidelity per se than did men, but more, relative to men, if their partners were to divert investment to other mates and their offspring. If becoming emotionally involved with another woman led to diverted investment by ancestral men, then one prediction that follows from this hypothesis is that emotional infidelity should induce more jealousy in women than in men.1 Thus, both women and men should be concerned with infidelity, but jealousy should be elicited more by cues to sexual infidelity in men than in women and more by cues to resource diversion in women than in men.2

Note that these predictions entail little regarding the cognitive structure of the mechanisms underlying jealousy. The predicted sex differences could be realized in many possible cognitive designs guiding decision making about infidelity. We return to this question below.

The Cognitive Load Method

DeSteno et al. (2002) gave participants 10 s to indicate whether the idea of their partner committing sexual infidelity or the idea of their partner being emotionally unfaithful was more distressing. Half of the participants were placed under cognitive load. They were asked to remember a long string of digits while attempting to simulate what it would be like to experience these types of infidelity and choosing which one was more distressing. The rationale for the use of a cognitive load manipulation was that it “should, if anything, enhance the influence of automatic processes on judgment and behavior through the inhibition of corrective or deliberative processes reflecting the influence of conscious analysis” (DeSteno et al., 2002, p. 1111). According to DeSteno et al.’s (2002) reasoning, if performance on the jealousy task is influenced by conscious analysis, then a cognitive load manipulation, which is presumed to disrupt conscious analysis, should have an effect on the outcome of the judgment task. Thus, in their view, if the sex difference in jealousy reactions is attenuated or eliminated, then the sex difference (under no load conditions) must have been due to deliberative processes disrupted by cognitive load rather than because of the operation of a module. DeSteno et al. (2002) reported precisely this result: an attenuation of sex differences in the choice of sexual versus emotional infidelity.3

Assumptions Underlying the Cognitive Load Method

The rationale for the use of a cognitive load manipulation is that it should reveal the operation of automatic processes if automatic processes are influencing judgments on a task, such as judgments about infidelity scenarios. Because DeSteno et al. (2002) believed that the evolution-predicted sex difference hypothesis implies an automatic process, they reasoned that if the cognitive load manipulation has the predicted effect, it undermines the evolutionary hypothesis.4 The reasoning behind this claim turns on two assumptions. First, it assumes that the evolution-predicted sex difference hypothesis implies the existence of jealousy modules whose operation includes automatic processes but no other (conscious, deliberative) processes. This assumption is necessary because if the evolved jealousy systems could include deliberative systems and not just automatic ones, their experiments would not rule out any hypothesis of interest to DeSteno et al. (2002). An evolved system that took input from deliberative systems would also be affected by cognitive load manipulations, so even if cognitive load effects were observed, the evolutionary hypothesis could not be ruled out. Second, the reasoning of DeSteno et al. (2002) assumes that cognitive load manipulations do not interfere with cognitive processes used by purported jealousy modules. If and only if both assumptions are correct—that the jealousy module is automatic and that it cannot be influenced by processes sensitive to cognitive load—then cognitive load manipulations should have no effect on judgment tasks relying on an evolved jealousy module.

1 Note that although these predictions focus on differences between the sexes, Buss and others (Buss, 2000; Symons, 1979) discussed both similarities and differences in jealous responses by men and women. For example, both men and women face a complete loss of their valued mate to a reproductive competitor if he or she is lured away.

2 These are only predictions about sex differences. It is often claimed that the evolutionary perspective predicts that men should be more upset by sexual than emotional infidelity, whereas women should be more upset by emotional than sexual infidelity (e.g., DeSteno et al., 2002, p. 1114; C. R. Harris, 2003, 2005, p. 77, Table 1), but these predictions do not follow from the evolutionary logic described above nor were they advanced by Buss et al. (1992; see also Buss & Haselton, 2005). For example, the finding in certain cultures that men and women both rate emotional infidelity as more upsetting than sexual infidelity, but that men find sexual infidelity more upsetting than women do, is consistent with the hypothesis (e.g., Buunk, Angleitner, Oubaid, & Buss, 1996). Although not required by the hypothesis, certain patterns of rank ordering effects within sexes would also be consistent with the existence of predicted differences between sexes. For example, Buss et al. (1992) observed greater increases in electrodermal activity from baseline in men’s response to sexual than to emotional infidelity, whereas the reverse was found for women. This pattern is consistent with differential responses by men and women to sexual infidelity (men greater than women) and differential responses by men and women to emotional infidelity (women greater than men; but see C. R. Harris, 2000).

3 DeSteno et al. (2002) claimed that the sex differences “disappeared” (p. 1103) under cognitive load (see also C. R. Harris, 2003, p. 117, for the same claim). In fact, as Sagarin (2005) documented, the sex difference is smaller but still statistically significant.

4 As we explain below, although DeSteno et al. (2002) and others treat the evolutionary hypothesis as a single hypothesis, it actually consists of multiple hypotheses.

Assumptions About Cognitive Architecture

Are these assumptions valid? The rationale for cognitive load as a test of the evolution-predicted sex difference hypothesis is based

COMMENT ON DESTENO ET AL. (2002)

on a model of cognitive processing endorsed by Fodor (1983, 2000). This model holds that to the extent that modules exist, their influence on information processing occurs early in the processing stream and is immune to top-down or horizontal influence. In this view, when higher level or central systems interact with the modules, it is only to receive the modules’ outputs. However, as we discuss in more detail below, there are good evolutionary reasons to suppose that this is not an adequate model of all or most evolved cognitive systems and certainly not of the kind underpinning jealousy judgments (Barrett, 2005; Barrett & Kurzban, in press; Pinker, 1997, 2005; Sperber, 1994, 2005). Evolutionary psychologists have been explicit about what they mean by modularity, and this does not include a commitment to automaticity (e.g., see Pinker, 1997, pp. 27–31). In fact, somewhat ironically, evolutionary psychologists have argued against automaticity as a design feature of certain cognitive mechanisms in domains in which many social psychologists have committed themselves to automaticity, such as automatic categorization of individuals by race (Kurzban, Tooby, & Cosmides, 2001). Instead, evolutionary psychologists have proposed that individual cognitive systems will have design features that reflect their function; features like automaticity, encapsulation, and speed could be features of some systems (e.g., snake detection), but only when such features are appropriate for the problems the system evolved to solve (Barrett & Kurzban, in press; Sperber, 1994, 2005; Tooby & Cosmides, 1992). These are not features that are likely to be appropriate for a system regulating jealousy reactions. Fodor (1983) claimed that modular systems should accept only narrow classes of inputs (usually, only perceptual ones) and should process these automatically. 
Evolutionary psychologists have argued, on the other hand, that many evolved inference and decision-making systems should be expected to use background knowledge and contextual information, stored in what Fodor (1983) would call “central” systems, as part of their normal operation (Barrett, 2005; Sperber, 1994, 2005; Tooby & Cosmides, 1992). We believe that this is also the case for a jealousy system, which would be close to useless if it had to rely only on direct perceptual evidence of infidelity. Instead, as we will argue in more detail below, it is likely that a specialized jealousy system, if it exists, would have evolved to rely heavily on background knowledge and contextual information—including information generated by deliberative or so-called central processes—in generating jealousy.

Consider the jealousy tasks used by Buss et al. (1992) and DeSteno et al. (2002). Is the Fodorian model of modularity likely to account for processes underlying performance on these tasks? This seems very unlikely because, in order to complete the task, even as originally designed by Buss et al. (1992), deliberatively processed information must be used: namely, the mental simulations of jealousy scenarios that subjects are instructed to perform. Although there is healthy debate surrounding imaginary and counterfactual scenarios (P. L. Harris, 2000; Sperber, 2000), there is little doubt that these mechanisms could include ones that come under the rubric of what DeSteno et al. (2002) called “deliberative processes” and that these would, because of the nature of the task, have to operate before any jealousy judgments were made.

Why does this matter for the cognitive load method? If the cognitive load manipulation interferes with deliberative conscious processes, as DeSteno et al. (2002) proposed, then it could also interfere with the capacity to generate the imagined scenarios on


which the subjects are asked to base their jealousy judgments. Failure to realistically simulate an experience hypothesized to evoke evolved mechanisms would interfere with the operation of such mechanisms. This precludes conclusions about the origins of the sex differences that are observed under no load conditions. This would be true in both Buss et al.’s (1992) account, in which a jealousy module is triggered by imagining infidelity events, or in DeSteno et al.’s (2002) account, in which no jealousy module is involved, but imagined infidelity events are still the basis for judgment. The manipulation would disrupt the results in either case, ruling it out as a means of discriminating between the two hypotheses.

Assumptions About Evolution and Modularity

Using the cognitive load method as a test of evolutionary psychological hypotheses rests entirely on the assumption that all evolved modular systems must be automatic. If this assumption is incorrect, then the cognitive load method is not a valid test of evolutionary psychological hypotheses in general.5 Here we discuss why the assumption of automaticity as a general property of evolved systems and as a specific property of jealousy mechanisms is likely to be incorrect.

As mentioned above, there is a substantial difference between the standard Fodorian view of modularity and that endorsed by evolutionary psychologists. Evolutionary psychologists are interested in evolved psychological specializations, and the central concept that they invoke is functional specialization, not the checklist of features, including automaticity, endorsed by Fodor (1983). It is true that many evolutionary psychologists use the term module to refer to evolved specializations, thereby evoking by association the Fodorian modularity concept (though, ironically, given DeSteno et al.’s [2002] heavy use of the term, “module” does not appear in Buss et al. [1992]). However, because evolutionary theories are centered on functional specialization and not Fodorian features per se, evolutionary psychologists have been explicit in rejecting features such as automaticity and encapsulation as necessary properties of evolved modules (see, e.g., Barrett, 2005; Pinker, 1997, 2005; Sperber, 1994, 2005).6

The split between evolutionary psychologists and Fodor (1983) on modularity properties is not merely a semantic quibble but is based on evolutionary logic. Assume that the argument of Buss et al. (1992) is sound and that natural selection favored men who reacted strongly to sexual infidelity and women who reacted strongly to resource diversion. What does this tell us about the design features of the cognitive mechanisms that might evolve under these conditions? Minimally, it implies only that each sex should have cognitive mechanisms that cause them to be sensitive to cues indicating these events and to adjust their behavior to minimize the probability of these events occurring. Nothing about the effortfulness or automaticity of the processes guiding behavior is mandated by the logic of the hypothesis. In fact, in the case of detecting infidelity, a system designed to be activated automatically and only by perceptual cues—that is, only by directly observing one’s mate engaged in an act of sexual intercourse—would probably be a poor one. Instead, given that infidelity would often have to be inferred via a variety of indirect contextual cues and central reasoning processes, we would expect the system to be sensitive to a wide range of cues and to be sensitive to knowledge stored in and manipulated by central systems. Imagine, for example, a married man whose wife works at an office and comes home promptly at seven every evening. One day, his wife mentions that a young male coworker has joined the firm, and 2 days later, she fails to come home. If jealousy ensues, it is not because of the direct observation of infidelity. Rather, a process of inference, integrating both circumstantial evidence and background knowledge, had to have occurred in order to trigger the jealousy response. In turn, additional contextual information, such as knowledge of a public transportation strike on the night in question, might mitigate the jealousy response. It is to be expected that jealousy reactions would accept as input the output of many reasoning and inference processes, including deliberative and effortful ones, given that direct evidence of infidelity would have been rare in ancestral environments. It is incorrect to assume that natural selection favors a single type of cognitive system, one that is strictly bottom-up, automatic, encapsulated, and perceptually driven.

We also note that just as it is fallacious to make the inference from evolved to automatic, it is also fallacious to make the converse inference from automatic to evolved. There is a large psychological literature demonstrating apparent automaticity for diverse processes, such as the activation of culturally local stereotypes about Black men (Eberhardt, Goff, Purdie, & Davies, 2004; Payne, Lambert, & Jacoby, 2002), perception of the layout of a chess board by experts (Chase & Simon, 1973), and other skills, including medical diagnosis and solving physics problems (Bédard & Chi, 1992), cases in which few would argue for evolved modules specifically dedicated to the processes in question.

5 We are not claiming that cognitive load could never be used to test an evolutionary hypothesis in a sensible way. What we are claiming is that because automaticity in the sense intended by DeSteno et al. (2002) is not mandatorily entailed by evolutionary hypotheses, the cognitive load method is not a global litmus test for evolved mechanisms. We could imagine cognitive load as a test of specific hypotheses (evolutionary or not) that posit lack of interaction between a given system and systems impaired by cognitive load. However, isolation of systems is not a general property of evolved systems nor of the putative systems being investigated by DeSteno et al.

6 Although it could be argued that DeSteno et al. (2002) intended only the Fodorian view of modularity, this is contradicted by the fact that they cite Buss and Kenrick (1998), Cosmides and Tooby (1994), and Pinker (1997) for their claim that “modules constitute automatic mental processes” (DeSteno et al., 2002, p. 1105). These authors diverge from Fodor (1983) in multiple ways (see, e.g., Pinker, 1997, pp. 30–31) and, it is important to note, never state that automaticity is a necessary feature of evolved psychological mechanisms.

Interpreting the Results of DeSteno et al. (2002)

We wish to stress, to avoid possible misinterpretations of our claims, that there is an asymmetry entailed by the use of cognitive load as a critical test of hypotheses about cognitive architecture. For cognitive load to be valid as a critical test, very specific conditions must hold: The hypothesized system in question must have no interaction with processes that might be influenced by cognitive load manipulations. On the other hand, for it to be rendered invalid as a critical test requires only that there be a potential for cognitive load to impact some other interacting system that might influence the results. In the present case, there are likely to be many processes involved in the jealousy task, which

begins with reading instructions in written English and ends with the participant indicating a response. Although there is uncertainty about the exact nature of the processes involved—as with any experimental task in psychology—this uncertainty weighs against cognitive load as a critical test. Our present conjectures about the role of mental simulation are just that— conjectures— but they show that it is not hard to imagine quite plausible ways in which cognitive load could have effects without ruling out the operation of an evolved jealousy system. Nonetheless, it is worth considering why cognitive load manipulations might produce the observed effects on the jealousy task. DeSteno et al. (2002) did not find that cognitive load produced mere random behavior. Instead, they found that in the cognitive load condition, women’s judgments became more similar to men’s. There was an increase in the proportion of women who deemed sexual infidelity to be more distressing than emotional infidelity. Suppose the conjectures we have offered above are correct, and the jealousy system takes as input the output of central systems. In particular, suppose that it can be influenced by systems that build scenarios about events that have not been directly observed, weighing background and contextual knowledge to estimate the probability of infidelity. Suppose further that the cognitive load manipulation interferes with these scenario-building systems. It could be that different kinds of imagination primes vary in how they recruit cognitive systems involved in simulating imaginary scenarios. The instruction to imagine one’s partner committing sexual infidelity might trigger potent and vivid imagery that could serve as input to a jealousy system. 
The instruction to imagine one’s partner in love or forming a deep emotional attachment with another individual might require more recruitment of resource-taxing, deliberative processes—possibly those shared with the system required to hold digit strings in memory. Under this scenario, disrupting deliberative processes might lead to exactly the results observed by DeSteno et al. (2002), whereas allowing simulation and causal reasoning processes to operate as they would under natural conditions leads to the predicted sex difference. Again, this is only one possible scenario, but a plausible one.

We would like to note, in passing, that an empirical test of this potential alternative explanation of DeSteno et al.’s (2002) results could be developed if the evolutionary logic of the information processing features of the jealousy system were fleshed out. If it is the case that jealousy reactions on this task depend on imagining different kinds of scenarios, there might be differences in the kinds of processes involved in imagining sexual infidelity versus emotional infidelity, and corresponding differences in the kinds of manipulations that could influence these processes. For example, sexual infidelity might be easily imagined through purely visual imagery, whereas emotional infidelity might require representation of the mental states of others, using systems that are not imagery based per se (i.e., intentions might matter for emotional infidelity but not sexual infidelity). One prediction would be that tasks that interfere with visual processing might have a greater impact on judgments about sexual infidelity, whereas tasks that interfere with assessment of intentions and other mental states (desire, commitment) might differentially impact emotional infidelity judgments.

Finally, an important factor to consider in any research on evolved mechanisms is that all such mechanisms evolved to operate under particular conditions.
These conditions never include the presentation of information in written form, as reading and writing are historically recent innovations. This is not to say that written information cannot be used as input to evolved mechanisms. Indeed, much evolution-based research, including that of Buss et al. (1992), is predicated on the assumption that written information is converted, via the process of reading, into an internal conceptual format that evolved systems can use. However, this implies that manipulations that influence the cognitive processes involved in reading, interpreting, and mentally representing written information could interfere with processing steps that make information available to evolved mechanisms. This is another way in which cognitive load could interfere with judgments on experimental jealousy tasks in ways that are orthogonal to questions of evolved design.

Future Research on Jealousy

Although we did not set out to write an article on jealousy per se, we make two suggestions for future work. First, we suggest that researchers treat the sexual and emotional jealousy hypotheses separately. Each is based on its own logic (prevention of cuckoldry for men and prevention of resource loss for women), and each should be evaluated on its own merits.7 Second, we suggest that it is time that the debate move beyond sex differences in self-reports, which may have reached the limit of their utility, and toward investigations of specific design features consistent with the hypothesized functions of jealousy in men and women (see Buss & Haselton, 2005). For example, if jealousy adaptations in men evolved, in part, to prevent cuckoldry, men should express jealousy more intensely when their partners display cues to higher overall levels of fertility, including youth and attractiveness, and to cues indicating that their partners are approaching the high fertility point in the menstrual cycle (for tests of these predictions, see Buss & Shackelford, 1997; Haselton & Gangestad, 2006).

Conclusion

Some might conclude from our argument that evolutionary hypotheses simply are not testable. Although we would argue emphatically that this is not true—and we have offered some brief suggestions for tests in the case of jealousy—this is not an article about how to test hypotheses about jealousy in particular nor about how to test evolutionary hypotheses in general. On that topic, there already exists a large literature (e.g., Simpson & Campbell, 2005; Tooby & Cosmides, 1992). Rather, our aim has been to point out that a particular method, the cognitive load method, does not tell us what DeSteno et al. (2002) claimed that it tells us about evolved psychological mechanisms.

As the use of evolutionary theory in psychology matures, it is particularly important that tests of evolutionary hypotheses be based on sound logic derived from evolutionary theory and from reasonable evolutionary assumptions. They should not import notions external to the hypothesis itself and not directly warranted by it. In particular, a variety of folk or informal theories have not yet been purged from psychology, including the idea that natural selection creates innate reflexes that cause humans to act automatically, like zombies. Many dichotomies, such as innate versus learned, evolved versus cultural, and instinctual versus conscious, are simply not licensed by the logic of evolutionary theory and cannot be the basis of logically sound tests of evolutionary hypotheses.


7 Note that there is an additional, untested, assumption contained in the emotional infidelity logic—namely, that external emotional involvement leads to resource diversion. Predictions about responses to emotional infidelity are therefore potentially weaker than predictions about responses to sexual infidelity.

References

Barrett, H. C. (2005). Enzymatic computation and cognitive modularity. Mind and Language, 20, 259–287.
Barrett, H. C., & Kurzban, R. (in press). Modularity in cognition: Framing the debate. Psychological Review.
Bédard, J., & Chi, M. T. H. (1992). Expertise. Current Directions in Psychological Science, 1, 135–139.
Buss, D. M. (2000). The dangerous passion: Why jealousy is as necessary as love and sex. New York: Free Press.
Buss, D. M., & Haselton, M. G. (2005). The evolution of jealousy: A response to Buller. Trends in Cognitive Science, 9, 506–507.
Buss, D. M., & Kenrick, D. T. (1998). Evolutionary social psychology. In D. T. Gilbert, S. T. Fiske, & G. Lindzey (Eds.), The handbook of social psychology (4th ed., pp. 982–1026). Boston: McGraw-Hill.
Buss, D. M., Larsen, R. J., Westen, D., & Semmelroth, J. (1992). Sex differences in jealousy: Evolution, physiology, and psychology. Psychological Science, 3, 251–255.
Buss, D. M., & Shackelford, T. K. (1997). From vigilance to violence: Mate retention tactics in married couples. Journal of Personality and Social Psychology, 72, 346–361.
Buunk, B. P., Angleitner, A., Oubaid, V., & Buss, D. M. (1996). Sex differences in jealousy in evolutionary and cultural perspective: Tests from the Netherlands, Germany, and the United States. Psychological Science, 7, 359–363.
Chase, W., & Simon, H. A. (1973). Perception in chess. Cognitive Psychology, 5, 55–81.
Cosmides, L., & Tooby, J. (1994). Beyond intuition and instinct blindness: The case for an evolutionarily rigorous cognitive science. Cognition, 50, 41–77.
DeSteno, D., Bartlett, M. Y., Braverman, J., & Salovey, P. (2002). Sex differences in jealousy: Evolutionary mechanism or artifact of measurement? Journal of Personality and Social Psychology, 83, 1103–1116.
Eberhardt, J. L., Goff, P. A., Purdie, V. J., & Davies, P. G. (2004). Seeing black: Race, crime, and visual processing. Journal of Personality and Social Psychology, 87, 876–893.
Fodor, J. A. (1983). Modularity of mind. Cambridge, MA: MIT Press.
Fodor, J. A. (2000). The mind doesn’t work that way. Cambridge, MA: MIT Press.
Harris, C. R. (2000). Psychophysiological responses to imagined infidelity: The specific innate modular view of jealousy reconsidered. Journal of Personality and Social Psychology, 78, 1082–1091.
Harris, C. R. (2003). A review of sex differences in sexual jealousy, including self-report data, psychophysiological responses, interpersonal violence, and morbid jealousy. Personality and Social Psychology Review, 7, 102–128.
Harris, C. R. (2005). Male and female jealousy, still more similar than different: Reply to Sagarin (2005). Personality and Social Psychology Review, 9, 76–86.
Harris, P. L. (2000). The work of the imagination. New York: Blackwell.
Haselton, M. G., & Gangestad, S. W. (2006). Conditional expression of women’s desires and men’s mate guarding across the ovulatory cycle. Hormones and Behavior, 49, 509–518.
Hill, K. R., & Hurtado, A. M. (1996). Aché life history: The ecology and demography of a foraging people. Hawthorne, NY: Aldine de Gruyter.
Hurtado, A. M., & Hill, K. R. (1992). Paternal effect on offspring survivorship among Ache and Hiwi hunter-gatherers: Implications for modeling pair-bond stability. In B. S. Hewlett (Ed.), Father–child relations: Cultural and biosocial contexts (pp. 31–55). Hawthorne, NY: Aldine de Gruyter.
Kurzban, R., Tooby, J., & Cosmides, L. (2001). Can race be erased? Coalitional computation and social categorization. Proceedings of the National Academy of Sciences, 98, 15387–15392.
Marlowe, F. (2003). A critical period for provisioning by Hadza men: Implications for pair bonding. Evolution and Human Behavior, 24, 217–229.
Payne, B. K., Lambert, A. J., & Jacoby, L. L. (2002). Best laid plans: Effects of goals on accessibility bias and cognitive control in race-based misperceptions of weapons. Journal of Experimental Social Psychology, 38, 384–396.
Pinker, S. (1997). How the mind works. New York: Norton.
Pinker, S. (2005). So how does the mind work? Mind and Language, 20, 1–24.
Sagarin, B. J. (2005). Reconsidering sex differences in jealousy: Comment on Harris (2003). Personality and Social Psychology Review, 9, 62–75.
Simpson, J. A., & Campbell, L. (2005). Methods of evolutionary sciences. In D. M. Buss (Ed.), The handbook of evolutionary psychology (pp. 119–144). New York: Wiley.
Sperber, D. (1994). The modularity of thought and the epidemiology of representations. In L. A. Hirschfeld & S. A. Gelman (Eds.), Mapping the mind: Domain specificity in cognition and culture (pp. 39–67). New York: Cambridge University Press.
Sperber, D. (Ed.). (2000). Metarepresentations: A multidisciplinary perspective. New York: Oxford University Press.
Sperber, D. (2005). Modularity and relevance: How can a massively modular mind be flexible and context-sensitive? In P. Carruthers, S. Laurence, & S. Stich (Eds.), The innate mind: Structure and contents (pp. 53–68). New York: Oxford University Press.
Symons, D. (1979). The evolution of human sexuality. New York: Oxford University Press.
Tooby, J., & Cosmides, L. (1992). The psychological foundations of culture. In J. H. Barkow, L. Cosmides, & J. Tooby (Eds.), The adapted mind: Evolutionary psychology and the generation of culture (pp. 19–136). New York: Oxford University Press.
Trivers, R. (1972). Parental investment and sexual selection. In B. Campbell (Ed.), Sexual selection and the descent of man, 1871–1971 (pp. 136–179). Chicago: Aldine Publishing.

Received July 18, 2005
Revision received November 23, 2005
Accepted December 1, 2005

Journal of Personality and Social Psychology 2006, Vol. 91, No. 3, 519 –523

Copyright 2006 by the American Psychological Association 0022-3514/06/$12.00 DOI: 10.1037/0022-3514.91.3.519

Constraining Accommodative Homunculi in Evolutionary Explorations of Jealousy: A Reply to Barrett et al. (2006)

David DeSteno and Monica Y. Bartlett, Northeastern University
Peter Salovey, Yale University

This article responds to a critique by H. C. Barrett, D. A. Frederick, M. G. Haselton, and R. Kurzban (2006), wherein it is argued that manipulations of cognitive constraints cannot be used to test general evolutionary hypotheses regarding the architecture of mind. In making this argument, Barrett et al. focus on what they believe to be faulty logic in D. DeSteno, M. Y. Bartlett, J. Braverman, and P. Salovey’s (2002) use of such techniques to examine proposed sex differences in jealousy. In presenting their argument, however, Barrett et al. appear to disregard central findings presented in DeSteno et al. (2002) and, in so doing, fail to grasp the interrelations among findings that might readily address their concerns. Here, the authors present arguments for why and when manipulations of cognitive resources may prove useful in investigating evolved psychological mechanisms and, in so doing, situate their use within the ongoing debate concerning evolved sex differences in jealousy.

Keywords: emotion, evolutionary psychology, jealousy

Entities should not be multiplied beyond necessity.
—William of Ockham

The evidentiary basis for sex differences in jealousy has been vigorously contested for over a decade (e.g., Buss, Larsen, & Westen, 1996; Buss, Larsen, Westen, & Semmelroth, 1992; Buunk, Angleitner, Oubaid, & Buss, 1996; DeSteno, Bartlett, Braverman, & Salovey, 2002; DeSteno & Salovey, 1996; Green & Sabini, 2006; Harris, 2000, 2003; Harris & Christenfeld, 1996). The predictions comprising the proposed evolutionary sex difference (ESD)1 that began the controversy were quite straightforward: As a result of the biological constraints inherent in human mating, men need to be sensitive to sexual infidelities to prevent cuckoldry, whereas women, certain of their maternity, need concern themselves with keeping the parental investment of the man (Buss et al., 1992). Belying the dispute that was to follow, one phenomenon on which all sides of the debate do agree is that if one simply asks men and women to choose between instances of sexual or emotional infidelity as more upsetting, women, by a wide margin, select emotional infidelity and thereby demonstrate a seeming sex difference supporting the ESD. At this point, however, the consensus ends. The evolved nature of the cognitive processes that produce this difference and even the scientific validity of the difference itself have repeatedly been called into question, leading to seemingly endless back and forth volleys from opposing camps.

It was within this framework that we conducted the experiments comprising the article that is the subject of Barrett, Frederick, Haselton, and Kurzban’s (2006) critique. To our mind, that set of experiments (DeSteno et al., 2002) was unique, as it allowed us to examine the viability of the ESD effect itself rather than to become mired in debate regarding the form and origin of any alternative mediating mechanisms. In short, we set forth predictions that should hold if findings supporting the ESD stemmed from evolved modules. In this way, rather than continuing to argue about possible alternative mechanisms that might underlie ESD findings, and whether such alternatives were themselves the product of the evolutionary chisel, we simply set out to test whether the ESD would function as should any decision stemming from an evolved module. In brief, our 2002 article demonstrated that the ESD was methodologically limited to a forced-choice response format. On all other response measures, it was absent; men and women both reported greater aversion to sexual infidelity.2 In addition, we demonstrated that the disparity between participants’ responses on the forced-choice and all other measures disappeared when forced-choice decisions were made under conditions known to enhance outcomes derived from efficient cognitive processes. To our mind, these findings, when taken together, made a strong case that the

1 Note that the term evolutionary is used to refer to the theory specified by Buss and colleagues (Buss et al., 1992). This is only one of many possible theories of jealousy that could be derived from an evolutionary perspective. Thus, we wish to stress that in arguing against this view, we are in no way proposing that evolutionary pressures do not play a role in shaping jealousy, but rather that the specific predictions set forth by Buss and colleagues are not supported by the data.

2 Of great significance, the generality of this finding has recently been demonstrated with the largest and most representative sample to date: a national probability sample (Green & Sabini, 2006). Here again, no sex difference in jealousy emerged on nonforced-choice measures; instead, men and women both reported greater aversion to sexual than to emotional infidelity.

David DeSteno and Monica Y. Bartlett, Department of Psychology, Northeastern University; Peter Salovey, Department of Psychology, Yale University. This research was supported by Grant MH068240 from the National Institute of Mental Health. Correspondence concerning this article should be addressed to David DeSteno, Department of Psychology, Northeastern University, Boston, MA 02115. E-mail: [email protected]


DESTENO, BARTLETT, AND SALOVEY

canonical, reflexive response of men and women to infidelities was the same and, accordingly, that findings previously used to argue for the ESD were artifactual. Given the longevity of this debate, we were not surprised to discover a critique of our work. However, as we read Barrett et al.’s (2006) critique, we had to admit puzzlement about both its characterization of our positions and, most notably, its omission of fundamental aspects of our previous article. The article that Barrett et al. critique contained two distinct experiments, the findings of which only make sense in the context of each other. Yet, discussion of the first experiment is entirely absent from their critique; rather, they focus solely on use of a cognitive load methodology to study the ESD and, in so doing, present its rationale and findings out of context. Second, although they attribute to us an extremely rigid and, in some cases, slightly absurd view of modularity, the article they criticize contains discussions of several hypothesized architectures for evolved modules and details how our findings fit with each. In responding to their criticisms, we felt that rather than engage in a point-by-point rebuttal of every minute instance of disagreement, it would be more beneficial (and interesting to readers) in the limited space allowed to focus on the major issues of contention and, in so doing, provide an overview not only of where this debate has been, but also of where, in our opinion, future work on jealousy needs to go if generative progress is to be made. Accordingly, this article proceeds in two sections. The first addresses the fundamental criticisms raised by Barrett et al. (2006) regarding our methodology for examining the ESD. The second presents our view of why the architecture of evolutionary mechanisms, including those theorized to underlie the ESD, must be characterized by cognitive efficiency.

The Importance of Context: A Full Account of DeSteno et al. (2002)

In reading Barrett et al.’s (2006) critique, it becomes readily apparent that there is some disconnect between the contents of DeSteno et al. (2002) and how such contents are presented in Barrett et al. In many cases, this may be due to simple differences in interpretation of terms, but in others, consideration of fundamental components of our previous article is absent, making informed evaluation of our arguments difficult for readers.

Cutting to the heart of the matter, Barrett et al.’s (2006) primary criticism is that our use of a cognitive load methodology derives from a faulty assumption that evolved cognitive modules (ECMs) are necessarily automatic and accept “no input from deliberative or effortful processes” (Barrett et al., 2006, p. 513). Accordingly, they argue that the use of cognitive load to constrain cognitive resources is a flawed method to test whether the ESD stems from ECMs. They make this claim for two reasons. First, they argue that ECMs should receive input from nonautomatic mental processes. Second, they note that ECMs need not be characterized by automaticity. On the first point, we, of course, agree. On the second, we do not, but wish to point out that acceptance or rejection of it is not entirely necessary to support the claims made in DeSteno et al. (2002).

The “faulty” assumption regarding modular input that is attributed to us seemed a bit off the mark. We have never stated that ECMs should be unable to receive input from conscious or

domain-general mechanisms. If that were our view of the way ECMs functioned, then we would never have bothered to study jealousy by using hypothetical scenarios and vignettes. Such methodologies necessarily depend on the ability to create and consciously manipulate simulations. Indeed, we relied on just such tactics in the first experiment of DeSteno et al. (2002).

This point aside, Barrett et al. (2006) go on to argue that the use of cognitive load to study ECMs would be appropriate only if one were to embrace such a restrictive view of allowed input. They correctly point out that any manipulation of cognitive resources that completely inhibited the ability to engage in simulations could bias the output of ECMs: the old garbage-in, garbage-out problem. We completely agree, which is why DeSteno et al. (2002) contained two experiments that must be considered together. The first showed that the ESD is absent under normal processing conditions on all preference measures except the forced choice; the second demonstrated the conditions (i.e., cognitive constraint) under which the aberrant responses on the forced-choice measure move into parity with responses on the multiple other measures. Thus, the cognitive load study was not meant to be a stand-alone investigation. Rather, it was meant to explain a divergence among responses on different measures that is known to occur under normal processing conditions. The importance of this interpretation cannot be overstated; hence, we will outline the logic of our previous work below.

In testing the validity of the ESD, we made two assumptions. First, we argued that the ESD should not be limited to a forced-choice response format but rather should occur on a majority of self-report preference measures (e.g., Likert scale, agree–disagree, checklists). Second, we argued that the mechanisms underlying the ESD should function efficiently.
Accordingly, the first experiment required individuals to report the jealousy they would feel in response to sexual and emotional infidelities. They did so by using not only the traditional forced-choice response format but also multiple types of continuously scaled measures that assessed responses to sexual and emotional infidelity individually. Analysis of these data produced a very consistent effect: Men and women both reported more distress from sexual than from emotional infidelity on all measures save the forced choice. The data pattern was the same whether we examined individual item responses or used covariance structure modeling to remove method-specific error variance among the continuous measure scales. It is interesting to note, however, that these same participants produced the usual ESD on the forced-choice measure. At this point, our suspicion that the ESD was a methodological artifact gained traction. Limitation of the ESD to a single-response format seemed quite a difficult conundrum for supporters of the ESD to explain. Surely the canonical case for which such mechanisms developed was not one where an individual simultaneously came upon a partner engaged in sexual activity on one hand and emotional bonding on the other. Instead, heightened jealousy should occur whenever the target infidelity threat is considered. Our hunch for why the ESD emerged only on forced-choice measures was that it was driven by a decision strategy induced by the specific question format. It has long been known that preference questions can produce conflicting results as a function of response format (Payne, 1982). Indeed, much work from the decision-making literature has demonstrated that a forced-choice format, as opposed to others, leads

REPLY TO BARRETT ET AL. (2006)

individuals to make a comparison by considering the trade-offs of each possibility (Lichtenstein & Slovic, 1973; Payne, 1982; Tversky, Sattath, & Slovic, 1988). In fact, our earlier work had shown that consideration of just such a trade-off often accounts for the ESD (DeSteno & Salovey, 1996). To the degree that one believed emotional infidelity was more likely to lead to sexual infidelity than the converse, one’s probability of selecting emotional infidelity as most distressing increased. In essence, selection of emotional infidelity implied a double-shot of infidelity. Of import, such conditional expectations not only accounted for the sex difference on the ESD but also predicted choices within members of each gender. Of course, this trade-off represents only one of many that individuals could consider when forced to choose between the two options. To demonstrate that it was such an effortful analysis that drove the ESD, as opposed to an ECM, we decided to use a cognitive load manipulation. The basic logic was quite simple. If an effortful analysis of trade-offs were guiding choice, a manipulation that reduced working memory capacity should inhibit such reasoning and thereby reduce or remove the ESD. The most central point here, and the one that Barrett et al. (2006) seem to have misunderstood, is that we were using cognitive load to see whether we could explain findings that we knew to occur under normal processing conditions. That is, we had already demonstrated that the ESD is absent on all other preference measures under normal processing conditions. Cognitive load, then, manipulated the ability of our proposed mediator (i.e., an effortful cognitive analysis) to function and, in so doing, provided a clear test of its role in producing the ESD.
Our expectation was supported by the data: When effortful reasoning was limited, both men and women reported greater jealousy to sexual infidelity, thereby bringing their responses on the forced-choice measure into parity with their responses on all other measures completed under normal simulation conditions. Barrett et al.’s (2006) argument regarding the possibility that a load manipulation might inhibit deliberative processes that provide input to ECMs is sound. Indeed, had the cognitive load manipulation produced a random result, we would not have published it as we could not be sure whether the null finding resulted from inhibition of an effortful analysis or from degradation of the simulation of infidelities. Here again is why the first experiment in our previous article (DeSteno et al., 2002) becomes so important. It provided a baseline condition demonstrating the absence of the ESD on all measures save one under normal simulation conditions. In so doing, it allowed us to have confidence that the greater universal aversion to sexual infidelity reported by those under load is valid; it matches what these same people reported under normal processing conditions on multiple jealousy measures. We found it somewhat curious how Barrett et al. (2006) attempted to explain away the fact that the participants under load in our experiment did not produce a random result. They suggest that imagining different types of infidelity might recruit different simulation systems. It could be, they note, that “[t]he instruction to imagine one’s partner in love or forming a deep emotional attachment with another individual might require more recruitment of resource-taxing, deliberative processes” (Barrett et al., 2006, p. 516). Although many things may be possible, this conjecture is certainly not probable. To accept this possibility is to ignore much that is known in the judgment and decision-making literature and


to disregard the principle of parsimony. In essence, one would need to assume that the reason cognitive load produced a greater universal aversion to sexual infidelity on the forced-choice measure had nothing to do with a decision strategy known to be induced by forced-choice response formats. Rather, one would need to assume that responses on the two classes of measures (i.e., forced choice vs. continuous) moved into parity because of the operation of two distinct processes: load-induced differential simulation abilities for the forced-choice task and lack of a format-induced trade-off analysis for the continuous measures. Such an explanation strikes us as Procrustean. In an attempt to insulate the ECM view, multiple new entities or phenomena are being invoked, without theoretical or empirical backing, to account for a finding that is much more simply explained as a methodological artifact.

The use of cognitive load, or other methodologies that manipulate processing constraints, can, we believe, be of use in testing evolutionary, or most any other, cognitive models. We completely agree that care must be taken to ensure that such constraints are not unduly inhibiting input to proposed mechanisms. This criterion, however, can often be met through the use of control studies, as was the case in DeSteno et al. (2002). Naive use of cognitive load can make interpretation difficult. Judicious use, however, can provide a clear test of competing mediating mechanisms. We also wish to note that our use of cognitive load was not meant to be a magic bullet to remove all effortful processing; load has no such ability. Rather, it simply functioned to inhibit effortful analyses to some degree, which would allow us to test an ordinal prediction regarding movement of the data in the predicted direction. It is true that we believe that ECMs must function with some degree of efficiency.
Thus, we also felt that use of cognitive load was an appropriate way to assess whether ECMs produced the ESD. However, we wish to stress that acceptance of this view is not necessary to accept the claim that the ESD is a methodological artifact. Rather, all one must accept is that load inhibits an effortful trade-off analysis and that the lack of the ESD also occurs under conditions of normal simulation abilities. Still, we found it quite odd that Barrett et al. (2006) attributed to us such a tightly constrained view of modules. A quick perusal of the discussion section of DeSteno et al. (2002) clearly presents multiple views of modularity. We agree that the Fodorian view (Fodor, 1983) is very tightly constrained and, in fact, that a functionally modular view might be more appropriate. In such models, multiple automatic mechanisms may be linked and, in the absence of conscious intervention, function as an integrated computational system. Automatic here does not mean anatomically encapsulated or insulated from top-down modification. All it means is that such systems regularly function with high efficiency and may do so without continual volitional guidance (cf. Bargh, 1994; Wegner & Bargh, 1998). To us, that seems the least one can expect of an ECM, a point to which we now turn.

On the Need for Efficiency: Banishing Accommodative Homunculi

As we noted earlier, Barrett et al. (2006) believe it incorrect to attribute properties of automaticity or efficiency to ECMs by default. Accordingly, they argue that the use of experimental manipulations that restrict executive functions does not provide valid tests of general evolutionary psychological hypotheses. In


making this argument, they note that many evolutionary psychologists have been explicit in rejecting Fodor’s (1983) arguments that features such as automaticity and encapsulation are necessary properties of evolved modules. The primary architectural concept for evolutionary psychologists, they argue, is functional specialization. Thus, for Barrett et al., the only design feature that is minimally necessary for a jealousy ECM is that “each sex should have cognitive mechanisms that cause them to be sensitive to cues indicating these [infidelity] events and to adjust their behavior to minimize the probability of these events occurring. Nothing about the effortlessness or automaticity of the processes guiding behavior is mandated by the logic of the hypothesis” (Barrett et al., 2006, pp. 515–516). In considering their argument, we suggest that the attack of Fodor’s (1983) views in isolation is not particularly useful for resolving the current controversy. At present, there is healthy debate regarding some of the features of modules that Fodor suggested. We agree that anatomical encapsulation is unlikely to be a general feature of modularity. We also agree that impenetrable automaticity may not characterize all modules. However, we completely disagree that one can define a module without invoking some type of automaticity. There are several ways in which a mental process may be automatic (Bargh, 1994; Wegner & Bargh, 1998). At a minimum, however, we assert that ECMs must possess two properties of automaticity: They must be able to be initiated without conscious intention, and they must function efficiently. In principle, this view simply implies that ECMs must be triggered automatically by specific forms of input, whether received by perceptual or simulation systems, and must then produce a computational result based on a series of built-in assumptions that do not require conscious deliberation. 
As Pinker (1997) noted, grand examples of ECMs are assumed to underlie the visual system. For example, certain ECMs must always interpret differences in shading in a certain manner in order to produce a coherent representation of object orientation. Such a process happens automatically, whether one is perceiving or imagining a scene. The involved ECMs do not cogitate on what to assume or how to interpret shading; they simply put the potentially ambiguous input through a specified algorithm that provides what is hopefully a meaningful result. In arguing for efficiency, we are in no way stating that the input to or output from ECMs cannot be modified by executive systems; we do not preclude the possibility that ECMs interact with more central, deliberative systems. Although a more impenetrable view may be appropriate for some basic perceptual modules, we agree that it is not a necessary feature for all modules. However, where we believe the reasoning of Barrett et al. (2006) fails is in the assumption that the computations of the ECMs themselves need not be characterized by automaticity. If such computations are not effortlessly initiated in response to stimuli and do not arrive at output by using specified assumptions, then we must attribute to modules the ability to freely cogitate and, thereby, unmask them as homunculi. In posing this view, we are in no way making what is sometimes referred to as the zombie argument, wherein “natural selection creates innate reflexes that cause humans to act automatically, like zombies” (Barrett et al., 2006, p. 517). Such a view could be true only if the output of specific ECMs were believed to mediate behavior without the possibility of being influenced by other

modules or executive control systems. No one to our knowledge has endorsed this view. The point we are making is that ECMs can be expected to function in a multiprocess cognitive system where decisions are jointly determined. The contribution of each ECM, however, must itself be based on an automatic computational process if the homunculus trap is to be avoided. In accord with this view, we are willing to agree with Barrett et al.’s (2006) notion that a man’s jealousy in response to a late arrival home by his wife, who he also knows to have a new male coworker, may be moderated by contextual information, such as knowledge of a transit strike. However, we completely disagree with their notion that an ECM would be sensitive to such circumstantial factors received as input. It is inconceivable to us that an ECM would be sensitive to such abstract, and modern, contextual knowledge. If it were, it would have to have the power of volitional reason, implying a homunculus-like ability to learn and then embrace or discount such information. Even if one were to theorize the existence of a multitude of accommodative modules sensitive to every eventuality (e.g., transit strike, power outage), the result would be a virtual homunculus; it would allow no possibility for falsification. Yet, as we noted, we do agree that contextual moderation of jealousy is likely to prevent our wayward heroine from finding an enraged zombie at home in the event of a transit strike. Such moderation, we assert, will stem from modulation of ECM output by more central control systems. Humans’ general reasoning abilities often function to shape or correct the output of more evolutionarily primitive computational systems (Chaiken & Trope, 1999; LeDoux, 1996). Indeed, recent evidence indicates that evolved mechanisms influencing moral judgment function in just this manner (Greene, Nystrom, Engell, Darley, & Cohen, 2004; Greene, Sommerville, Nystrom, Darley, & Cohen, 2001). 
These more primitive systems work on the assumption that it is better to be wrong than to be dead (or cuckolded, as the case may be), but the most optimal outcome is often derived from contextual moderation of modular output by central systems.3 This general flexibility is one of the factors that gives humans such adaptability. Accordingly, awareness of a transit strike might allow one to tamp down rising jealousy stemming from an ECM. The case would be similar to a familiar one wherein an individual automatically recoils from a snake in a zoo exhibit, but then quickly relaxes as the conscious mind realizes that a piece of Plexiglass will prevent a strike by the serpent. Given this view, we believe that the use of cognitive load can offer an appropriate methodology to examine questions regarding proposed ECMs, as long as it is established that interference with volitional simulations or stimuli reception cannot likely be identified as the reason for an absence of effects. We believe we have met this criterion by demonstrating the lack of the ESD under normal simulation conditions and by explicitly predicting its absence under cognitive load on the basis of well-established precedents regarding use of forced-choice response formats. In the

3 This is not to say that all contextual sensitivity must be driven by higher order systems. It is certainly likely to be the case that some sets of modules may be linked in ways that allow one to modify the output of others. Such cases, however, will likely be limited to cognitive systems associated with long-standing informational value (e.g., affective states; cf. LeDoux, 1996).


present case, inhibition of the central system should not have diminished the ESD if it stemmed from ECMs. Cognitive load does not function to zap the cortex; it simply allows more efficient systems to proceed unimpeded. As one might expect, we also believe that manipulations of cognitive resources may be used, given certain constraints, to examine hypotheses of modularity in many areas. Indeed, they have profitably been used to examine the decoding of nonverbal behavior, an ability many postulate as mediated by ECMs. For example, Ambady and Gray (2002), by manipulating levels of deliberative analysis, demonstrated increased accuracy for nonverbal perception under conditions designed to favor increased automatic processing. Although such manipulations may not be relevant, or even possible, for examining all posed ECMs, we favor their use when meaningful predictions can be made. Another reason we endorse the use of such methodologies is that they have the potential to avoid the aforementioned problem of accommodative homunculi. In the debate over jealousy ECMs, this is an issue with which we have struggled. As an example, after DeSteno and Salovey (1996) demonstrated that the differential infidelity implications individuals attached to each type of infidelity were responsible for driving the ESD on the forced-choice measure, Buss et al. (1996) suggested that such beliefs about whether emotional infidelity also implied sexual infidelity, and vice versa, might actually be part of the evolutionary mechanism that drives the ESD. A similar argument could be made to accommodate any proposed alternative mediator. This fact is what drove us to engage in a set of process-oriented experiments focusing not on proposed alternative mechanisms but on the validity of the supportive evidence itself. The only clear way to banish accommodative homunculi is to show their true nonmodular nature.

Coda

A reasonable question at this point concerns where the study of jealousy is headed. The problem of accommodation mentioned above suggests that the direction for progress must entail a paradigm shift with respect to the theories and methodologies that currently form the canonical basis for investigations of jealousy. If research on jealousy is to progress and avoid taking on characteristics of intransigence, then both sides of the debate must become more sophisticated in their theorizing and methodologies while also resisting temptations to insulate their views behind accommodative hypotheses (cf. Lakatos, 1977). Thus, we agree with Barrett et al. (2006) that work in this area could benefit greatly from increased specificity of the architecture underlying proposed sex differences in jealousy as well as from process-oriented experiments meant to put specific architectural models to the test. No doubt there will be much debate concerning the appropriate ways to go about this task, but a continued reliance on the use of single questionnaires to assess responses to imagined infidelities, in the absence of other manipulations, is unlikely to provide further resolution to the controversy. We feel that experiments focusing on process hold the best potential to move the debate forward.


References

Ambady, N., & Gray, H. M. (2002). On being sad and mistaken: Mood effects on the accuracy of thin-slice judgments. Journal of Personality and Social Psychology, 83, 947–961.
Bargh, J. A. (1994). The four horsemen of automaticity: Awareness, intention, efficiency, and control in social cognition. In R. S. Wyer & T. K. Srull (Eds.), Handbook of social cognition (2nd ed., pp. 1–40). Mahwah, NJ: Erlbaum.
Barrett, H. C., Frederick, D. A., Haselton, M. G., & Kurzban, R. (2006). Can manipulations of cognitive load be used to test evolutionary hypotheses? Journal of Personality and Social Psychology, 91, 513–518.
Buss, D. M., Larsen, R., & Westen, D. (1996). Sex differences in jealousy: Not gone, not forgotten, and not explained by alternative hypotheses. Psychological Science, 7, 373–375.
Buss, D. M., Larsen, R., Westen, D., & Semmelroth, J. (1992). Sex differences in jealousy: Evolution, physiology, and psychology. Psychological Science, 3, 251–255.
Buunk, B., Angleitner, A., Oubaid, V., & Buss, D. M. (1996). Sexual and cultural differences in jealousy: Tests from the Netherlands, Germany, and the United States. Psychological Science, 7, 359–363.
Chaiken, S., & Trope, Y. (1999). Dual-process theories in social psychology. New York: Guilford Press.
DeSteno, D., Bartlett, M. Y., Braverman, J., & Salovey, P. (2002). Sex differences in jealousy: Evolutionary mechanism or artifact of measurement? Journal of Personality and Social Psychology, 83, 1103–1116.
DeSteno, D. A., & Salovey, P. (1996). Evolutionary origins of sex differences in jealousy? Questioning the “fitness” of the model. Psychological Science, 7, 367–372.
Fodor, J. A. (1983). The modularity of mind. Cambridge, MA: MIT Press.
Green, M. C., & Sabini, J. (2006). Gender, socioeconomic status, age, and jealousy: Emotional responses to infidelity in a national sample. Emotion, 6, 330–334.
Greene, J. D., Nystrom, L. E., Engell, A. D., Darley, J. M., & Cohen, J. D. (2004). The neural bases of cognitive conflict and control in moral judgment. Neuron, 44, 389–400.
Greene, J. D., Sommerville, R. B., Nystrom, L. E., Darley, J. M., & Cohen, J. D. (2001, September 14). An fMRI investigation of emotional engagement in moral judgment. Science, 293, 2105–2108.
Harris, C. R. (2000). Psychophysiological responses to imagined infidelity: The specific innate modular view of jealousy reconsidered. Journal of Personality and Social Psychology, 78, 1082–1091.
Harris, C. R. (2003). A review of sex differences in sexual jealousy, including self-report data, psychophysiological responses, interpersonal violence, and morbid jealousy. Personality and Social Psychology Review, 7, 102–128.
Harris, C. R., & Christenfeld, N. (1996). Gender, jealousy, and reason. Psychological Science, 7, 364–366.
Lakatos, I. (1977). The methodology of scientific research programmes: Philosophical papers (Vol. 1). Cambridge, England: Cambridge University Press.
LeDoux, J. E. (1996). The emotional brain. New York: Simon & Schuster.
Lichtenstein, S., & Slovic, P. (1973). Response-induced reversals of preference in gambling: An extended replication in Las Vegas. Journal of Experimental Psychology, 101, 16–20.
Payne, J. W. (1982). Contingent decision behavior. Psychological Bulletin, 92, 382–402.
Pinker, S. (1997). How the mind works. New York: Norton.
Tversky, A., Sattath, S., & Slovic, P. (1988). Contingent weighting in judgment and choice. Psychological Review, 95, 371–384.
Wegner, D. M., & Bargh, J. A. (1998). Control and automaticity in social life. In D. T. Gilbert, S. T. Fiske, & G. Lindzey (Eds.), The handbook of social psychology (4th ed., pp. 446–496). Boston: McGraw-Hill.

Received April 20, 2006
Revision received May 6, 2006
Accepted May 11, 2006

PERSONALITY PROCESSES AND INDIVIDUAL DIFFERENCES

Conserving Self-Control Strength

Mark Muraven, Dikla Shmueli, and Edward Burkley
University at Albany, State University of New York

Individuals may be motivated to limit their use of self-control resources, especially when they have depleted some of that resource. Expecting to need self-control strength in the future should heighten the motivation to conserve strength. In 4 experiments, it was found that depleted participants who anticipated exerting self-control in the future performed more poorly in an intervening test of self-control than participants who were not depleted, and more poorly than those who did not expect to exert self-control in the future. Conversely, those who conserved strength performed better on tasks that they conserved the strength for as compared with those who did not conserve. The underlying economic or conservation of resource model sheds some light on the operation of self-control strength.

Keywords: self-control strength, self-regulation, motivation, goals, conservation

One of the primary strengths of humans is that they can consider the future. The consequences, outcomes, or implications of an action often shape our present behavior. The desire to be healthy in the future may lead one to quit smoking or persist on a diet. In that way, the future can affect the present, as individuals change their behavior to foster the occurrence of a desired outcome in the future. Indeed, the whole process of self-regulation entails guiding current behavior to reach a future goal (Carver & Scheier, 1998; Higgins, 1996).

Self-regulation comes at a cost, however. Besides entailing the forgoing of pleasures (Tice, Bratslavsky, & Baumeister, 2001), the exertion of self-control appears to deplete a limited resource (self-control strength) needed for the success of self-control (Muraven & Baumeister, 2000). This strength is recovered slowly, so that immediately after exerting self-control, there is less of this vital resource. Because of the crucial but limited nature of self-control strength, people must be judicious in their management of it (as with other limited resources; see Hobfoll, 2002). If a person spends too much of this resource on a low-priority project (e.g., being nice to fellow drivers during rush hour), he or she might not have enough strength for critical, high-priority projects (e.g., not binging on alcohol).

For that reason, understanding how people allot limited resources may be important for understanding self-control. Given the limited nature of self-control, individuals are making decisions (likely unconsciously) about how to apportion a limited resource whenever they exert self-control. This is important because the allocation of self-control resources might help explain why after one exerts self-control, subsequent self-control performance suffers (e.g., Baumeister, Bratslavsky, Muraven, & Tice, 1998; Muraven, Collins, & Nienhaus, 2002; Muraven, Tice, & Baumeister, 1998; Vohs & Heatherton, 2000). In particular, we argue that the desire to conserve resources for future demands may help explain why self-control suffers. Individuals who recently exerted self-control may have a greater desire to conserve energy than individuals who did not exert self-control, which may help explain why exerting self-control leads to poorer performance just afterward.

The Nature of Self-Control

We define self-control as the overriding or inhibiting of automatic, habitual, or innate behaviors, urges, emotions, or desires that would otherwise interfere with goal-directed behavior (Barkley, 1997a; Baumeister, Heatherton, & Tice, 1994; Kanfer & Karoly, 1972). When people exert self-control, they inhibit their normal, typical, or automatic behavior (Bargh & Chartrand, 1999). For example, if someone typically smokes after eating, then it requires the exertion of self-control to alter this habit and not smoke after dinner. If the person does not exert self-control, he or she will behave automatically and smoke (Tiffany, 1990). People exert self-control because they want to follow a rule (either externally or internally determined) or delay gratification (Barkley, 1997a; Hayes, 1989; Shallice & Burgess, 1993). Overcoming a habitual or instinctual pattern of acting by definition requires self-control.

Mark Muraven, Dikla Shmueli, and Edward Burkley, Department of Psychology, University at Albany, State University of New York. Edward Burkley is now at the University of North Carolina, Chapel Hill. This research was based in part on the doctoral dissertation of Mark Muraven, who thanks the members of his committee, Roy Baumeister, Dianne Tice, Tim Curran, and Jagdip Singh, for their guidance. Portions of this research were supported by National Institute on Alcohol Abuse and Alcoholism Grant AA12770 and National Institute on Drug Abuse Grant DA015131. Correspondence concerning this article should be addressed to Mark Muraven, Department of Psychology, University at Albany, State University of New York, Albany, NY 12222. E-mail: [email protected]

Journal of Personality and Social Psychology, 2006, Vol. 91, No. 3, 524–537
Copyright 2006 by the American Psychological Association 0022-3514/06/$12.00
DOI: 10.1037/0022-3514.91.3.524


CONSERVING STRENGTH

Recent research supports the argument that individuals treat self-control as a limited resource that gets depleted with use. In particular, if self-control is a limited resource, then performance on subsequent self-control tasks should be diminished after the exertion of self-control (e.g., Baumeister et al., 1998; Muraven et al., 1998; Vohs & Heatherton, 2000; Wallace & Baumeister, 2002). When placed in a situation that required drinking restraint (participants believed they would be taking a driving test after sampling some beer), participants who had to override their thoughts (a task that requires self-control) prior to drinking consumed more alcohol and were more intoxicated than participants who solved simple arithmetic problems prior to drinking (Muraven et al., 2002). Notably, only the amount of self-control exerted in the first phase of the experiment was related to the subsequent inability to regulate alcohol intake; participants did not drink more after regulating their thoughts because they were more frustrated, irritated, aroused, or otherwise unhappy. Consistent with prior research, exerting self-control (and only self-control; see Baumeister et al., 1998; Muraven & Slessareva, 2003) leads to a subsequent decline in self-control performance.

Research has found a correlation between the amount of self-control exerted and subsequent self-control performance (e.g., Muraven et al., 2002). However, additional research suggests that the relationship between self-control resources and self-control performance is perhaps more complicated. Most notably, the motivation to exert self-control and the rewards people receive for exerting self-control moderate the relationship between depletion and self-control performance (Muraven & Slessareva, 2003). This is significant because a complete theory of self-control strength will have to explain why the exertion of self-control often (but not always) leads to poorer subsequent self-control performance.

The moderation of depletion by motivation suggests that self-control suffers in many situations because individuals are not unable but instead are not willing to exert sufficient self-control to overcome the impulse. There are several reasons why people may be less willing to exert sufficient self-control: It requires much effort, it is unpleasant, or it is too costly. The current research addresses the last point: People may fail at self-control because they are more concerned with conserving strength than with the outcomes of self-control. This may help explain why exerting self-control has been shown to lead to a decline in self-control performance and further develop the ideas underlying the self-control strength model.

Conservation of Resources

The desire to conserve strength and the associated aversion to losing strength may explain why anticipating a self-control demand leads to poorer self-control on an unrelated task. From an economic perspective (Tversky & Kahneman, 1981), self-control can be viewed as the investment of a limited resource (self-control strength) in an unknown but risky endeavor. This means that the loss of self-control strength is given more weight than a comparable gain (Hobfoll, 2002; Hobfoll, Johnson, Ennis, & Jackson, 2003; Tversky & Kahneman, 1981). Consistent with that economic idea, theories of the management of limited resources have shown that for limited and depletable resources (e.g., energy, social support) that are recovered slowly, people are motivated to obtain, retain, and protect their supply (Hobfoll, 2002). Individuals hold energy in reserve when the benefits of using the resource do not outweigh the costs of depleting it (Schönpflug, 1983). In short, people may try to minimize the amount of self-control strength they deplete by being selective in their self-control efforts.

Furthermore, this motivation to conserve should be intensified among those who have recently lost resources. Individuals who have already suffered a loss or who have fewer resources value and defend their remaining resources more, and are less likely to use them, than individuals who have greater resources (Norris & Kaniasty, 1996; Park & Folkman, 1997). Those whose self-control strength has been more depleted should try to avoid expending more strength as compared with others who are less depleted.

Finally, anticipation of future demands on resources may affect people, especially those lower in resources, as they prepare for the potential loss (Folkman & Lazarus, 1985; Gottlieb, 1987). People may prepare for future losses by conserving what resources they have (Aspinwall & Taylor, 1997). This preparation for future losses may be magnified by the degree of the future demand as well as the person’s personal resources. Those who are lower in resources should be more motivated to conserve and protect what resources they have as compared with those with greater resources.

Hence, because we posit that self-control strength is a limited resource that is recovered slowly (for more on the recovery of lost self-control strength, see Tice, Baumeister, Shmueli, & Muraven, 2005), depleted persons should be more motivated to conserve strength than less depleted persons. This increased motivation to conserve (at either a conscious or an unconscious level) may be reflected in poorer self-control performance, as the person becomes less willing to invest the required resources to succeed at self-control.
As outlined earlier, the motivation to conserve may be heightened when the person expects to exert self-control in the future, especially when the person is already motivated to protect his or her resources. Put another way, anticipating self-control in the future may have a larger effect on depleted individuals as compared with less depleted individuals. Self-control performance may depend on both past and future self-control demands. We tested the hypotheses generated by this conservation model in four experiments. If the decline in self-control performance after exerting self-control is a result of increased conservation of resources, then depleted participants who expect to exert self-control (and only self-control) in the future should value their strength more, be more concerned with future demands, and try to conserve their strength to a greater degree. Hence, depleted participants should perform more poorly on an intervening test of self-control before this anticipated task. Looking at how future self-control demands affect depleted persons allows us to determine whether conservation of limited resources underlies many of the phenomena associated with self-control strength.

Experiment 1

In Experiment 1, participants’ desire to conserve strength was enhanced by having them believe that they would have to engage in self-control in the near future. They were told that after the dependent measure they would take another test of performance. The final test of performance was described as difficult and effortful. Half the participants were led to believe that the task would also require self-control, whereas the other half believed that the task would not require self-control.

MURAVEN, SHMUELI, AND BURKLEY


The desire to conserve self-control strength should be the strongest when the individual has little strength and expects to need his or her remaining strength in the future. Therefore, participants who are more depleted and who expect to engage in self-control in the future should perform more poorly on a test of self-control than depleted participants who do not expect to engage in self-control in the future, and more poorly than less depleted participants.

Method

Participants

Ninety-four undergraduate students (61 men and 33 women) were recruited for Experiment 1. One male participant who indicated that he had a health problem that precluded putting his hand in ice water was excused; thus, the data for 93 participants are reported here. Participants received partial course credit in return for their participation. Each individual testing session lasted about 30 min.

Procedure

The experimenter told participants that they were taking part in an experiment looking at the role of concentration on performance. Participants were told that they would take several general tests of concentration and distraction. They were not informed of the depletion model, nor were they told that their self-control performance was being investigated. The experimenter was unaware of the research hypotheses.

In this experiment (and the ones that follow), the same basic procedures were followed. First, participants’ self-control strength was depleted (Task 1). After that, they were told about two additional tests: the one that would serve as the dependent measure (Task 2) and a task on which anticipation of future need for self-control could be manipulated (Task 3). In particular, participants were told that after Task 2, they would engage in a third task that might require some degree of self-control. They then engaged in Task 2 only.

Depletion phase. Participants were then randomly assigned to either solve moderately difficult multiplication problems (math problem condition) or suppress the thought of a white bear (thought suppression condition). Participants in the thought suppression condition were asked to write down their thoughts on a piece of paper while trying to avoid thinking about a white bear (Wegner, Schneider, Carter, & White, 1987). Participants in the math problem condition solved difficult multiplication problems. Previous research (Muraven et al., 1998, 2002) and pretesting have found that participants rate thought suppression as equally frustrating, difficult, unpleasant, effortful, and arousing as working on arithmetic problems. The only difference between the two conditions was the amount of inhibition or self-control required (see Muraven et al., 2002).
Suppressing one’s thoughts requires overriding or inhibiting in order to succeed at the task, whereas solving math problems is a relatively automatic activity that should require far less inhibition. Therefore, only thought suppression should produce depletion. After 5 min had passed, the experimenter reentered the room and gave participants a brief manipulation check. Participants were asked about the amount of effort they exerted on that task (“How much effort did you exert on the white bears/math problems?” on a 25-point scale ranging from no effort to all my effort) and any frustration generated by the task (“How frustrating was that task?” rated on a 25-point scale ranging from not frustrating to extremely frustrating).

Anticipation of future task. The next two tests were then explained to participants. First, they were told about Task 2, the cold pressor task (described later in this article). They were then told that the third and final task would be a test of emotional awareness and that they would watch a short video of a stand-up comedian. The video was described to them in detail, and participants were repeatedly told that the comedian was extremely funny. Half the participants were given no further instructions: They were led to believe that after Task 2 they would watch the video and answer a few questions about it (anticipate no self-control condition). The other half of the participants were told that they would have to control their emotions while watching the video (anticipate self-control condition). They should not laugh or smile, no matter how funny they found the video. The experimenter told participants that not laughing at the video is extremely difficult and requires a great deal of self-control. Previous research has indeed found that not laughing requires a great deal of self-control and is depleting (Muraven et al., 1998). Thus, participants in the anticipate self-control condition anticipated a difficult attempt at self-control that might use up some of their self-control resources following Task 2. At this point, participants completed another brief manipulation check to assess their perception of the future tasks. In particular, they were asked how much effort they expected to exert on the third task (“How much effort did you plan to exert on Task 3?” rated on a 25-point scale ranging from no effort to all my effort) and how much energy that task would demand (“How much energy did you expect the last task would require?” rated on a 25-point scale ranging from very little to very much).

Measurement phase. Participants then engaged in Task 2. This test of self-control, known as the cold pressor task, required them to hold their nondominant hand motionless in ice cold water for as long as possible. Holding one’s hand in cold water is uncomfortable, and people typically wish to remove their hand from the water as soon as possible. People must therefore engage in self-control to override their desire to remove their hand from the water. Participants with greater self-control typically keep their hand in the water longer than participants with less self-control (Baker & Kirsch, 1991; Litt, 1988).
The water temperature was maintained using a mixture of ice and water, and a pump was used to keep the water circulating. To ensure an equal starting point, all participants first held their hand in room temperature water for 1 min before they put their hand in the ice water. The ambient room temperature was maintained at 18 °C. Participants were told to place their hand in the ice water and to not move the hand or fingers. They were told to hold their hand in the water as long as they could and that they should only remove their hand when they could not bear the cold any more. Thus, participants were instructed to fight against the urge to remove the hand. The experimenter started timing using a stopwatch the moment their hand was fully submerged (half the forearm underwater) and stopped timing once the entire hand was removed from the water.

After the cold pressor task, participants were told that there would not be enough time for the third task and that the experiment was over. At this point, they completed a final manipulation and procedure check. They were asked whether they were trying to conserve strength for the final task (“How much were you trying to conserve your energy for the third task?” rated on a 25-point scale ranging from not at all to very much) and how important it was to them to conserve strength (“How important was it to you to conserve strength for the final task?” rated on a 25-point scale ranging from not at all to very much). They were also asked about their performance on the cold pressor task (“How much effort did you exert to keep your hand in the water?” rated on a 25-point scale ranging from no effort to all my effort). Finally, participants were carefully debriefed about their research experience. No one reported awareness of the conservation hypothesis.

Results

Manipulation Check

As shown in Table 1, participants in the thought suppression condition reported that they exerted the same amount of effort as participants in the math problem condition, t(90) = 0.80, ns, and were no more frustrated by the first task, t(90) = 0.86, ns. This replicates the previous experiments that have found that although thought suppression is just as frustrating and difficult as solving math problems, thought suppression requires more self-control than solving math problems (e.g., Muraven et al., 1998). A power analysis suggested that we had sufficient power to detect a medium effect size (d = 0.5) approximately 76% of the time, which indicates that we should have been able to detect meaningful differences between the conditions, had any existed.

Consistent with the experimental design, participants viewed future tasks that required self-control as demanding more energy and effort than future tasks that were not described as requiring self-control. Participants in the future self-control condition reported that they expected the third task to require more energy than did participants in the no future self-control condition, t(90) = 6.11, p < .001. Likewise, participants who expected to exert self-control in the future expected to exert more effort on the third task, t(90) = 5.70, p < .001.

Table 1
Experiment 1: Responses on Key Variables Based on Initial Task and Expectation for Future Self-Control

Depleted
                                                Anticipate self-control   Anticipate no self-control
Variable                                        M         SD              M         SD
Effort on initial task                          12.58a    5.89            12.23a    7.45
Frustration on initial task                     8.28a     7.49            10.11a    7.36
Anticipated self-control effort on third task   17.72a    4.10            11.06b    5.63
Anticipated energy required by third task       14.89a    5.29            8.26b     4.74
Importance of conservation                      7.72a     6.36            4.00b     5.27
Time in ice water                               59.06a    49.58           101.54b   43.48

Not depleted
                                                Anticipate self-control   Anticipate no self-control
Variable                                        M         SD              M         SD
Effort on initial task                          14.04a    6.83            13.09a    6.65
Frustration on initial task                     8.86a     6.64            9.04a     7.27
Anticipated self-control effort on third task   17.80a    4.74            12.11b    5.93
Anticipated energy required by third task       15.49a    5.59            9.15b     5.08
Importance of conservation                      4.01b     5.60            3.88b     3.71
Time in ice water                               102.32b   52.17           128.80c   48.374

Note. N = 93. Means that do not share subscripts differ at p < .05 using the Tukey correction for multiple tests. Time in ice water was measured in seconds.
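The power figure quoted in the manipulation check can be approximated with a short calculation. The sketch below is illustrative only and is not the authors' procedure: it uses a normal approximation to the two-sample t test, and the group sizes (46 and 46, chosen to match the 90 degrees of freedom reported for the manipulation check) and the function name two_sample_power are assumptions introduced here.

```python
from statistics import NormalDist


def two_sample_power(d, n1, n2, alpha=0.05, two_sided=True):
    """Approximate power of a two-sample comparison of means.

    Uses a normal approximation (an assumption; an exact calculation
    would use the noncentral t distribution).
    d      -- standardized effect size (Cohen's d)
    n1, n2 -- group sizes (hypothetical here; not reported per group)
    """
    z = NormalDist()
    # Expected value of the test statistic under the alternative.
    ncp = d * (n1 * n2 / (n1 + n2)) ** 0.5
    if two_sided:
        crit = z.inv_cdf(1 - alpha / 2)
        # Probability of rejecting in either tail.
        return (1 - z.cdf(crit - ncp)) + z.cdf(-crit - ncp)
    crit = z.inv_cdf(1 - alpha)
    return 1 - z.cdf(crit - ncp)


# Medium effect (d = 0.5), assumed 46 participants per initial-task group.
power_one_tailed = two_sample_power(0.5, 46, 46, two_sided=False)
power_two_tailed = two_sample_power(0.5, 46, 46, two_sided=True)
```

Under these assumptions, the one-tailed value falls close to the reported 76%, whereas the two-tailed value is lower. The article does not state which test or group sizes were used, so the comparison is only suggestive.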

Dependent Measure

The time participants held their hand in the ice water was analyzed using a 2 (initial task) × 2 (anticipation for the future) analysis of variance (ANOVA). Replicating previous research (Muraven et al., 1998), we found that individuals who exerted self-control in the first phase of the experiment tended to remove their hand from the water sooner than those who did not exert self-control initially, F(1, 89) = 3.19, p < .07. The main effect for future task was not significant, however, F(1, 89) = 0.16. Most important, the interaction between initial task and future task was significant, F(1, 89) = 4.24, p < .05.

Several results from the post hoc tests reported in Table 1 should be noted. First, replicating previous research on self-control strength, participants who had to suppress their thoughts and did not anticipate self-control in the future removed their hand from the water sooner than participants who solved arithmetic problems and did not anticipate self-control in the future. Second, consistent with the conservation hypothesis, a focused contrast found that participants who suppressed their thoughts and expected to exert self-control in the future removed their hand from the ice water sooner than people in the other three conditions, t(89) = 2.11, p < .05. The length of time participants held their hand in the water was normally distributed; thus, there was no restriction of range or ceiling effect. This is consistent with the argument that people are motivated to conserve self-control strength, and this desire to conserve is stronger among people who have recently exerted self-control.
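The pattern of cell means for time in ice water in Table 1 can be summarized as two contrasts. The sketch below is descriptive arithmetic on the published means only. The dictionary keys and variable names are hypothetical labels introduced here, and no inferential statistic is recomputed, because that would require the raw data and cell sizes, which the article does not report.

```python
# Mean time in ice water (seconds), copied from Table 1.
means = {
    ("depleted", "anticipate"): 59.06,
    ("depleted", "no_anticipate"): 101.54,
    ("not_depleted", "anticipate"): 102.32,
    ("not_depleted", "no_anticipate"): 128.80,
}

# Simple effect of depletion at each level of anticipation
# (negative values mean the depleted group quit sooner).
depletion_effect_anticipate = (
    means[("depleted", "anticipate")] - means[("not_depleted", "anticipate")]
)
depletion_effect_no_anticipate = (
    means[("depleted", "no_anticipate")] - means[("not_depleted", "no_anticipate")]
)

# Interaction contrast: the difference between the two simple effects.
interaction_contrast = depletion_effect_anticipate - depletion_effect_no_anticipate

# Focused contrast: the depleted/anticipate cell versus the unweighted
# mean of the other three cells.
others = [v for k, v in means.items() if k != ("depleted", "anticipate")]
focused_contrast = means[("depleted", "anticipate")] - sum(others) / len(others)
```

Both contrasts come out negative, in the direction the conservation hypothesis predicts: the depleted participants who anticipated self-control held their hand in the water for the shortest time.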

Motivation to Conserve

Finally, we examined how the future task affected participants’ perception and motivation for holding their hand in the ice water. As shown in the post hoc tests in Table 1, participants who had to exert self-control in the first part of the experiment and who expected to exert self-control in the future felt it was more important to conserve than did other participants. There was also a correlation between how much participants were trying to conserve their energy for the third task and how long they held their hand in the ice water. Across the entire sample, participants who felt it was less important to conserve energy kept their hand in the ice water longer, r(93) = 0.21, p < .05. This is consistent with the argument that anticipating self-control in the future increases the value of remaining strength, and the stronger the motivation to conserve this strength, the more poorly the person performs on a test of self-control.
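The significance of the reported correlation can be checked by converting r to a t statistic. The sketch below is an illustration under stated assumptions: it takes df = N − 2 = 91 (the article prints 93 in parentheses, which matches the sample size rather than the degrees of freedom), and it uses a normal approximation for the p value, so the result is approximate. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def r_to_t(r, df):
    """Convert a Pearson correlation to a t statistic with df degrees of freedom."""
    return r * sqrt(df / (1 - r * r))


def approx_two_sided_p(t):
    """Two-sided p value from a normal approximation (slightly optimistic for small df)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


# Magnitude of the reported correlation; df = 91 is an assumption (N = 93).
t_value = r_to_t(0.21, 91)
p_value = approx_two_sided_p(t_value)
```

With these inputs the t statistic is a little above 2, which is in line with the reported p < .05.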

Discussion

The results of Experiment 1 indicate that anticipating the need for self-control in the future and exerting self-control in the past combine to influence current self-control performance. Individuals apparently anticipate self-control tasks and alter their behavior to conserve a limited resource. Moreover, participants who exerted self-control in the first part of the experiment and therefore should have depleted some of their strength were more concerned with conserving strength than were those who were not as depleted. There was a relationship between desire to conserve and self-control outcome as well: The more they reported conserving strength, the more poorly participants performed on the test of self-control. These findings are consistent with the economic or conservation of resource model of depletion outlined earlier: Self-control performance may suffer because people are trying to conserve what self-control strength they have, and the desire to conserve is stronger among people with less strength.

However, one shortcoming of Experiment 1 was that the future task involved either suppressing emotions while watching the comedy or simply paying close attention. Thus, it is possible that a variable unrelated to self-control (e.g., difficulty or effort) may have influenced participants’ desire to conserve in Experiment 1. Moreover, expecting to watch a comedy may have had unexpected positive affective consequences, which could help replenish lost strength (Tice et al., 2005). We designed Experiment 2 to address those concerns. In particular, if expecting to exert self-control in the future is indeed the critical factor, then individuals who anticipate a demanding task that does not require self-control in the future should perform equally well on a test of self-control as individuals who are not told about any future task. People in either of those conditions should perform better than people who expect to exert self-control in the future, even when that future self-control task does not seem any more unpleasant, harder, or difficult than the task that did not require self-control.

Experiment 2

Method

Participants

A total of 103 (57 male and 46 female) undergraduate students participated in the study in partial fulfillment of a psychology class requirement. Participants were tested in small groups. Each testing session lasted approximately 30 min.

Procedure

Upon arrival at the laboratory, participants were greeted by an experimenter and were seated in a cubicle in front of a computer without interacting with one another. They were told that the purpose of the experiment was to examine how individuals regulate attention and that they would be taking several different tests of attention. All instructions were presented on the computer, and the computer randomly assigned participants to condition at run time as well. Thus, the experimenter was unaware of participants’ assigned condition and did not interact with participants during the course of the experiment.

Depletion phase. For the first task, participants were instructed to retype a short paragraph that appeared on the computer screen as quickly as they could. Participants could not see what they typed, although the computer recorded all keystrokes. The computer randomly assigned participants to one of two conditions. In the type-all-letters condition, participants retyped the paragraph as it appeared on the screen. In contrast, participants assigned to the no-e’s condition were told not to type any e’s or spaces as they retyped the paragraph. Following such a rule likely required the person to override the natural inclination to type every letter and therefore should have required self-control (Rieger, 2004). After typing the paragraph (the computer stopped timing when they typed the three letters in the last word), participants were then given the Brief Mood Introspection Scale (BMIS; Mayer & Gaschke, 1988) to assess their mood and arousal. They also completed a short manipulation check to assess their self-control efforts (“How much were you fighting against an urge on that task?” rated on a 7-point scale ranging from not at all to very much and “How much did you have to control yourself on that task?” rated on a 7-point scale ranging from not at all to very much).
We also assessed whether the initial task differed in effort required (“Did that task require much effort?” rated on a 7-point scale ranging from definitely no to definitely yes).

Future tasks. The computer then presented instructions to the participants about the next task, a test of attention regulation (see the following paragraph). The last paragraph of the instructions informed participants that after the test of attention regulation, they would be taking one final test. This final paragraph varied randomly across participants. One group of participants, future self-control, was warned as follows: “Immediately after this test, you will take a final test of performance. This test will require you to remember many numbers while being distracted.” Concentrating in the face of distractions has been found to require self-control, and thus this task likely would require self-control. Another group, future hard problems, was advised as follows: “Immediately after this test, you will take a final test of performance. This test will require you to solve math problems of varying degrees of difficulty.” Prior research (Muraven et al., 2002) suggests that most individuals find solving math problems difficult, unpleasant, effortful, and demanding, but that solving problems does not require overcoming a strong impulse. Hence, participants in this condition expected a difficult and demanding task in the future but did not expect to exert much self-control. Finally, there was a no-instructions condition. Participants in this condition were not informed about any future task.

Measurement phase. Participants then took the test of attention regulation. Numbers were flashed in rapid succession on the computer screen. Participants were told to press the space bar when the number 4 followed the number 6. The numbers were presented at an irregular interval (750 ms ± 300 ms) and were on the screen for a relatively short time (250 ms ± 100 ms). The entire trial lasted approximately 12 min.
This task required a great deal of continuous concentration on participants’ part: Even a moment’s distraction could lead to missing a number. This test has a long history of being used to measure problems with attention regulation (See, Howe, Warm, & Dember, 1995) and indeed has been used as a screening test for self-control problems (White et al., 1994). Participants who are trying to conserve self-control strength and hence are less likely to exert self-control may have trouble overriding the natural tendency of the mind to wander and hence miss more targets (have a lower hit rate) than participants who are less motivated to conserve strength.

Finally, after completing this second task, participants were given a brief manipulation check questionnaire to assess whether the groups differed in their perception of the future task. In particular, participants were asked how difficult they thought the future task would be (“How difficult do you think the next task will be?” rated on a 7-point scale ranging from very easy to very difficult), how much the future task was a distraction (“How much did you think about what is going to happen next?” rated on a 7-point scale ranging from very little to very much), and how fun the next task would be (“Will the next task be fun?” rated on a 7-point scale ranging from definitely no to definitely yes). They also answered whether the future task would require self-control (“How much will you have to stop yourself during the future task?” rated on a 7-point scale ranging from very little to very much). Participants were then carefully debriefed by the experimenter. No participant reported awareness of the true nature of this experiment or the experimental hypotheses.
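The stimulus timing and scoring of the attention regulation test described in the measurement phase can be sketched in a few lines. This is a minimal illustration, not the software used in the experiment. The uniform jitter, the onset-to-onset reading of the 750 ms interval, the uniformly random digits 0 to 9, and every function name are assumptions introduced here, because the article gives only the ranges and the target rule (a 4 that follows a 6).

```python
import random


def make_schedule(total_ms=12 * 60 * 1000, seed=0):
    """Return a list of (onset_ms, duration_ms, digit) tuples for one session."""
    rng = random.Random(seed)
    schedule = []
    t = 0.0
    while t < total_ms:
        duration = rng.uniform(150, 350)   # on screen for 250 ms +/- 100 ms
        digit = rng.randrange(10)          # assumed: digits 0-9, equally likely
        schedule.append((t, duration, digit))
        t += rng.uniform(450, 1050)        # next onset after 750 ms +/- 300 ms
    return schedule


def target_positions(digits):
    """Indices at which a 4 immediately follows a 6."""
    return [i for i in range(1, len(digits))
            if digits[i - 1] == 6 and digits[i] == 4]


def count_hits(digits, pressed):
    """Count targets on which the space bar was pressed.

    `pressed` is a set of stimulus indices at which a press was recorded.
    """
    return sum(1 for i in target_positions(digits) if i in pressed)


# Usage: a short hand-made digit stream with one correct press and one false alarm.
example_digits = [6, 4, 1, 6, 6, 4, 4]
example_hits = count_hits(example_digits, pressed={1, 6})
```

In this reading, a lapse of attention shows up as a target index that is missing from the set of presses, which lowers the hit count that serves as the dependent measure.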

Results

Manipulation Checks

As shown in Table 2, there was no difference across groups on mood or arousal (all Fs < 2.00, p > .10). The interaction between future task and typing condition also was unrelated to mood and arousal (Fs < 1.50, p > .10). There was a main effect for typing condition for fighting against an urge, F(1, 97) = 5.38, p < .025, and how much self-control was exerted, F(1, 97) = 6.61, p < .01. Not typing e’s required more self-control than typing all the letters. Neither the main effect for instructions nor the interaction between instructions and typing condition was related to how much self-control was required (all Fs < 1.00, p > .10).

An examination of the manipulation check for the anticipated third task found few meaningful differences. The instructions to ignore distractors were perceived to require more self-control than the instructions to solve math problems or no instructions, F(2, 97) = 3.89, p < .05. Despite the difference in self-control required, the future task did not differ in how distracting it was, how difficult it seemed, or how much fun participants thought it would be: The main effects for typing condition and for anticipated task, and the interaction between those variables, were not significant (all Fs < 2.00, p > .10). In short, the future task was perceived the same across groups, except in how much self-control it would require.

Table 2
Experiment 2: Responses on Key Variables Based on Initial Task and Expectation for Future Self-Control

Depleted
                                        Future self-control   Future hard           No future (control)
Variable                                M          SD         M          SD         M          SD
Mood                                    1.92a      13.81      −1.47a     8.54       0.73a      12.38
Arousal                                 24.16a     3.56       23.86a     7.31       22.73a     5.37
Fight against an urge on first task     5.08a      2.12       5.42a      0.61       4.21a      1.83
Self-control exerted on first task      4.38a      2.06       4.68a      1.70       3.94a      1.81
Anticipated self-control on third task  5.54a      1.85       4.74b      1.85       4.13c      1.32
Anticipated difficulty of third task    3.77a      2.28       3.37a      1.89       3.39a      2.08
How distracting was third task?         4.08a      2.14       3.16b      1.95       3.96a      2.03
How fun will the third task be?         2.23a      1.24       2.53a      1.65       3.09a      1.70
Hits on concentration                   20.23a     4.01       25.57b     3.78       25.39b     5.78

Not depleted
                                        Future self-control   Future hard           No future (control)
Variable                                M          SD         M          SD         M          SD
Mood                                    0.23a      13.50      0.95a      9.38       1.38a      9.84
Arousal                                 20.64a     4.05       22.04a     3.42       22.61a     3.88
Fight against an urge on first task     3.23b      1.18       2.74b      1.88       3.00b      1.71
Self-control exerted on first task      2.62b      1.56       2.32b      1.25       2.70b      1.46
Anticipated self-control on third task  5.23a      1.64       4.79b      1.65       4.56b      1.37
Anticipated difficulty of third task    2.92a      1.66       3.00a      2.19       3.81a      1.47
How distracting was third task?         3.82a      2.30       4.58b      2.17       4.44b      1.90
How fun will the third task be?         2.15a      1.21       2.68a      1.89       2.56a      1.21
Hits on concentration                   27.77c     5.35       27.68c     5.00       28.06c     5.57

Note. N = 103. Means that do not share subscripts differ at p < .05 using the Tukey correction for multiple tests.

Dependent Measure

We conducted a 2 (initial task: type all, no e’s) × 3 (future task: demanding, self-control, none) ANOVA on second task performance, with targets found on the concentration task as the dependent measure (see Table 2). First, there was a main effect for initial task, F(1, 97) = 9.38, p < .005. Replicating previous work, individuals who exerted self-control in the first part of the experiment performed more poorly on the dependent measure than those who did not exert self-control. Participants who expected to exert self-control in the future also tended to miss more targets, although this result did not reach conventional levels of significance, F(2, 97) = 2.31, p < .10. These main effects were qualified by a significant interaction between initial task and future task, F(2, 97) = 5.17, p < .01.

To further explore the interplay between past exertion of self-control and future exertion of self-control, we first removed the participants who were not told of a future task from the analysis. When just the anticipate-demanding-task and anticipate-self-control-task groups were considered, the interaction between initial task and future task remained significant, F(1, 60) = 9.22, p < .005. Similarly, when participants who were told to expect a demanding task that did not require self-control in the future were omitted and participants who were not told about a future task were included, the interaction between past self-control effort and future task was close to conventional levels of significance, F(1, 61) = 3.07, p < .08. Finally, when the participants who anticipated a task that required self-control were omitted and just the demanding-future-task and no-future-task groups were considered, the interaction between initial task and anticipated task was not significant, F(1, 73) = 2.01, ns. In other words, the effects were specific to expecting to exert self-control in the future and were not the product of merely anticipating a forthcoming task.

We also examined the relationship between the manipulation checks and performance on the vigilance task. Across the entire sample, expecting to exert more self-control in the future was negatively related to participants’ performance on the vigilance task, r(103) = −.27, p < .05. On the other hand, the amount of effort participants expected to exert on the future task was not related to performance on the vigilance task, r(103) = −.14, ns, nor was the expected difficulty of the future task, r(103) = −.11, ns. Participants’ ratings of how distracting the final task was were not related to performance on the vigilance task either, r(103) = −.14, ns. Finally, how fun participants felt the future task would be was not related to targets missed on the vigilance task, r(103) = −.01, ns. In short, the critical feature of the future task appeared to be how much self-control it required, rather than distraction or simple effort demanded.
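As a descriptive aid, the shape of the Initial Task × Future Task interaction can be read directly from the cell means for hits on the concentration task in Table 2. The Python sketch below computes, for each future-task condition, the difference between not depleted and depleted participants. It is only arithmetic on the published means; it does not reproduce the ANOVA, which would require the raw data and cell sizes. The dictionary layout and function names are hypothetical.

```python
# Cell means for "Hits on concentration", copied from Table 2.
HITS = {
    ("depleted", "self-control"): 20.23,
    ("depleted", "hard"): 25.57,
    ("depleted", "none"): 25.39,
    ("not depleted", "self-control"): 27.77,
    ("not depleted", "hard"): 27.68,
    ("not depleted", "none"): 28.06,
}

def depletion_cost(future):
    """Mean hits lost by depleted participants within one future-task condition."""
    return HITS[("not depleted", future)] - HITS[("depleted", future)]

def interaction_contrast():
    """Depletion cost under future self-control minus the average cost elsewhere."""
    others = (depletion_cost("hard") + depletion_cost("none")) / 2
    return depletion_cost("self-control") - others

for future in ("self-control", "hard", "none"):
    print(future, round(depletion_cost(future), 2))
print("contrast", round(interaction_contrast(), 2))
```

The depletion cost is roughly 7.5 hits when a self-control task is anticipated and roughly 2 to 3 hits in the other two conditions, which is the pattern the significant interaction describes.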

Discussion

The results of Experiment 2 replicated and extended the previous experiment. In particular, as in Experiment 1, participants who anticipated exerting self-control in the future and who exerted self-control in the past performed more poorly on a test of self-control than participants who did not exert self-control in the past and more poorly than participants who did not anticipate exerting self-control in the future. This implies that individuals are concerned with the amount of self-control they will have to exert in the future. It appears that the heightened desire to conserve resources, based on a depleted state and future demands, leads to diminished self-control performance.

To further disentangle whether these effects were merely due to anticipating a difficult task, we included three conditions: no future task, a task that would require self-control, and a difficult task that did not require self-control. The results indicated that participants who expected a difficult task that did not require self-control did not differ from participants who did not expect any task in the future. Despite the differences in self-control performance, individuals who expected to exert self-control in the future did not believe that the task would be more unpleasant or difficult than participants who anticipated a difficult task that did not require overriding an impulse. Moreover, the groups reported thinking about the future task to the same extent, which suggests the cognitive demands of the future task were equal across groups. In summary, the effects of expecting to exert self-control appear to make a unique and distinct contribution to self-control performance, as predicted by conservation of resource theories.

Experiment 3

Experiment 3 used slightly different methods than the previous experiments to test the conservation hypothesis. Rather than explicitly giving participants a task to expect in the future, we prompted participants to think about the amount of self-control demands they would expect to face in the coming hours. Thus, this experiment has greater ecological validity than the previous studies, as it tests whether anticipating naturally occurring demands in the future can also lead participants to perform more poorly on immediate tests of self-control.

In this experiment, participants completed a questionnaire assessing the amount of self-control demands that they anticipated dealing with after they left the experiment and before going to bed that night. Such an assessment should remind participants of the need to conserve strength and might lead to poorer self-control performance, especially among depleted individuals. To moderate the strength of this effect, some participants completed the questionnaire before their self-control was assessed, whereas others completed it after the main dependent measure. Presumably, there should be a strong, inverse relationship between anticipated future self-control demands and present self-control performance for participants who are reminded of those demands; the effect should be much smaller for those not reminded of future self-control demands. Moreover, previous exertion of self-control should moderate this effect: Individuals who have already exerted self-control should be the most sensitive to future self-control demands, especially when reminded of them. That is, self-control performance should be a function of anticipated self-control demands, cuing of those demands, and previous self-control exertion.

Method

Participants

Sixty-two undergraduate students (38 women and 24 men) received partial course credit for their participation in Experiment 3. Each individual testing session lasted approximately 45 min.

Procedure

An experimenter who was unaware of the research hypotheses greeted participants. They were told that the purpose of the study was to investigate the relationship between eating and cognitive functioning. Participants did not know that their self-control performance was being evaluated.

Depletion phase. Participants were then presented with a plate of chocolate chip cookies and a plate of celery. Participants who were randomly assigned to the no-cookie condition were told that they were in the celery group. They were asked to eat at least one or two pieces of celery but not to eat any of the cookies. Participants in the no-celery condition were given similar instructions to sample a cookie but not to eat any celery. Previous research (Baumeister et al., 1998) found that cookies are much more tempting than vegetables and that it requires more self-control to resist eating cookies. Thus, participants assigned to not eat the cookies should deplete more self-control strength than participants assigned to not eat the celery. Following the first experimental task, participants completed the BMIS (Mayer & Gaschke, 1988) to assess their mood and arousal. They also completed a short manipulation check to assess whether conditions differed in unpleasantness (“How unpleasant was that task?” rated on a 30-point scale ranging from very unpleasant to very pleasant) and frustration (“How frustrating was that task?” rated on a 30-point scale ranging from very frustrating to not frustrating at all).

Future task. Half the participants then completed a 27-item measure of future self-control demands that they anticipated facing from the end of the experiment until they went to sleep that night. This instrument assessed a broad domain of self-control, including anger control (“I will have to control my temper”); the need to fight temptations to smoke, drink, or eat to excess (“I will have to resist the temptation to smoke”); and the need to regulate attention (“I will have to concentrate”). These items were rated on a 7-point scale with anchors of very unlikely and very likely; thus, higher scores indicate more future self-control demands. A factor analysis found that the items strongly loaded on one factor, and an examination of the factor loadings (using a scree plot) found that a single-factor solution was appropriate. This single factor, named Future Self-Control Demands, had excellent internal reliability (coefficient alpha was .91). The other half of the participants completed this questionnaire of future self-control demands after the dependent measure, so that they were not as aware of their future self-control demands while their self-control was measured behaviorally. The experimenter was aware of the order of the tasks but did not know participants’ level of future self-control demands.

Measurement phase. Participants’ self-control was then assessed using a stop signal task. The stop signal task is a cognitive test of inhibition that requires participants to respond as quickly as possible to a judgment task (de Jong, Coles, Logan, & Gratton, 1990; Logan & Cowan, 1984). On approximately 25% of the trials, a tone sounds, indicating that participants should inhibit their response. Previous research has found that performance on the stop signal task is positively related to self-control capacity. For example, children lower in self-control (such as children with attention deficit hyperactivity disorder; see Barkley, 1997b) are less able to stop themselves and perform more poorly on the stop signal task than children with greater self-control (Oosterlaan & Sergeant, 1996; Schachar & Logan, 1990; Schachar, Tannock, & Logan, 1993). Thus, the stop signal task is a well-established measure of self-control. Individuals who are depleted and conserving strength for a future task should perform more poorly on a stop signal task than those who are less depleted or not as motivated to conserve. Following Oosterlaan and Sergeant (1996), participants were instructed to indicate the position of a square on a computer screen by pressing the appropriate key on the keyboard.
They were asked to respond as quickly and as accurately as possible. From time to time, a tone would sound. Participants were asked to suppress their response (i.e., not to hit any key on the keyboard when they saw a square) whenever they heard a tone. All participants received standard instructions that appeared on the screen. The task was composed of five blocks, each consisting of 64 trials. To familiarize participants with the task, the first block was the practice block. Participants were not informed that the first block was designed for practice only. After the third block was completed, the computer instructed participants to take a short break, close their eyes, and relax.

Auditory stop signals were presented randomly and occurred on 25% of the trials. As a means of controlling for individual differences in reaction times and differences in reaction time over the experiment, participants’ mean primary reaction time (MRT; how quickly they responded to the square) was calculated for each block. The auditory stop signals were then presented 50, 200, 350, and 500 ms before the MRT calculated in the preceding blocks. Auditory stop signals were introduced at or after the presentation of a square on the computer screen. Each trial began with the introduction of a fixation point that appeared in the center of the screen for 500 ms. Following the fixation point, a white square was presented for 1,000 ms. Within each block, the stimulus presentation was counterbalanced, such that (a) half of the squares appeared on the right side and half of the squares appeared on the left side of a computer screen; (b) auditory stop signals given 50, 200, 350, or 500 ms before a participant’s MRT were presented with equal frequency; (c) auditory stop signals were counterbalanced across right and left presentation of the square; and (d) presentation of 50-, 200-, 350-, and 500-ms trials was randomized within each block. Participants’ ability to stop themselves from responding was calculated for each stop signal time as the proportion of responses withheld when the tone sounded. Upon completion of the stop signal task, participants completed some additional questionnaires and were carefully debriefed using a modification of the funnel debriefing procedure (Chartrand & Bargh, 1996). None of the participants reported awareness that the tasks were related or that self-control was the primary focus of the study.
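The timing and scoring rules just described can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the experimental software: the function names are hypothetical, and clamping the tone onset at zero is one assumed way to satisfy the statement that tones occurred at or after the presentation of the square.

```python
OFFSETS_MS = (50, 200, 350, 500)  # tone lead times before the MRT, from the text

def stop_signal_onsets(mrt_ms):
    """Tone onset in ms after the square appears, for each offset before the MRT.

    Assumption: onsets that would fall before the square are clamped to 0.
    """
    return {off: max(0.0, mrt_ms - off) for off in OFFSETS_MS}

def inhibition_by_offset(trials):
    """Proportion of stop-signal trials with no response, per offset.

    `trials` is an iterable of (offset_ms, responded) pairs for tone trials only.
    Offsets with no trials map to None.
    """
    counts = {off: [0, 0] for off in OFFSETS_MS}  # [inhibited, total]
    for off, responded in trials:
        counts[off][1] += 1
        if not responded:
            counts[off][0] += 1
    return {off: (inh / tot if tot else None)
            for off, (inh, tot) in counts.items()}

# Hypothetical participant with a 400-ms mean primary reaction time.
print(stop_signal_onsets(400))
print(inhibition_by_offset([(50, True), (50, False), (200, False)]))
```

Under this rule the 50-ms offset places the tone latest in the trial, closest to the moment of the expected response.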

Results

Manipulation Check

As in the previous experiments, participants’ BMIS arousal and mood did not differ across conditions; for arousal, not eat cookies, M = 31.61, SD = 6.63; not eat celery, M = 33.94, SD = 4.73, t(60) = 1.61, ns; for mood, not eat cookies, M = 1.26, SD = 8.36; not eat celery, M = −1.42, SD = 7.56, t(60) = 1.32, ns. The groups likewise did not differ in self-reported unpleasantness of the initial task (cookies, M = 25.12, SD = 5.00; celery, M = 25.80, SD = 5.87), t(60) = 0.51, ns, or frustration (cookies, M = 24.41, SD = 6.31; celery, M = 25.63, SD = 5.82), t(60) = 0.80, ns. Most important, the two groups did not differ in their self-report of future self-control demands (cookies, M = 98.10, SD = 27.11; celery, M = 105.33, SD = 26.18), t(60) = 1.06, ns. Individuals who were told that they could not eat the cookies reported that they expected the same amount of self-control demands in the future as participants who could not eat the celery. Whether the assessment of future self-control demands occurred before or after the dependent measure also was not related to the amount of demands anticipated in the future, t(60) = 0.43, ns.
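Readers who wish to check a reported t statistic against the summary statistics can use the pooled-variance formula sketched below in Python. The group sizes are not reported, so the example assumes 31 participants per condition (62 in total); that split is an assumption, which is why the result only approximates the reported t(60) = 1.61 for arousal.

```python
import math

def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t with pooled variance; returns (t, df)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df

# Arousal: not eat celery vs. not eat cookies; n = 31 per group is assumed.
t, df = pooled_t(33.94, 4.73, 31, 31.61, 6.63, 31)
print(round(t, 2), df)
```

With equal assumed group sizes the statistic comes out near 1.59; a modestly unequal split would move it slightly.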

Dependent Measure

We focused on the most difficult stop signal delay: 50 ms prior to their typical response (longer offsets would occur closer to the appearance of the box on the screen, giving participants more time to stop themselves, and therefore required far less self-control). Using moderated multiple regression, we tested whether the proportion of responses not made when the tone sounded was related to the amount of self-control demands they anticipated that day (standardized), when they were cued into those demands (before or after the stop signal task), and the prior exertion of self-control (resist cookies or celery).

The slope for the main effect of the initial task was close to conventional levels of significance, B = −0.25, SE = 0.13, t(54) = 1.92, p < .06, indicating that more depleted participants may have been less able to stop themselves on the stop signal task than the less depleted participants. The main effects for order of questionnaires, B = 0.93, SE = 0.91, t(54) = 1.01, ns, and self-control demands, B = −0.01, SE = 0.066, t(54) = 0.080, ns, were not significant. The two-way interaction between depletion and order of the questionnaire was significant, B = −0.30, SE = 0.13, t(54) = 2.34, p < .025. Depleted individuals reminded about future self-control demands performed worse than those not reminded about future self-control demands. The two-way interaction between depletion and demands did not reach conventional levels of significance, B = −0.16, SE = 0.09, t(54) = 1.75, p < .09, but suggested depleted participants may perform worse when they had more demands in the future. The two-way interaction between order and demands was not significant, B = 0.03, SE = 0.09, t(54) = 0.75, ns. These effects are modified by a nearly significant three-way interaction between depletion, amount of future demands, and order of the questionnaires, B = −0.26, SE = 0.13, t(54) = 1.94, p < .057.

We then examined the simple relationship between future demands and self-control performance for each of the four groups (see Figure 1). Daily demands were nearly significantly related to self-control performance for participants who had to resist eating the cookies and who completed the self-control demand questionnaire before the stop signal task, B = −0.007, SE = 0.003, t(54) = 1.83, p < .07. The relationship was not significant for participants who had to resist eating the cookies and who completed the self-control questionnaire after the stop signal task, B = 0.004, SE = 0.003, t(54) = 1.21, ns. Similarly, future demands were unrelated to self-control performance for participants who had to resist eating the celery, irrespective of whether the questionnaire came before, B = 0.002, SE = 0.003, t(54) = 0.73, ns, or after the stop signal task, B = 0.001, SE = 0.003, t(54) = 0.18, ns. Put another way, the amount of self-control expected in the future was related to performance only when participants were depleted and those demands were made salient. As predicted by the conservation model, participants who were depleted, who anticipated more self-control demands in the future, and who were reminded of those demands performed more poorly on a test of self-control.
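The structure of a moderated regression of this kind can be sketched in Python. The sketch only builds the predictor columns and shows how simple slopes follow from the coefficients; it does not fit the model or reproduce the reported estimates. The +1/−1 effect coding, the column order, and all function names are assumptions, because the report does not state how the categorical predictors were coded.

```python
from statistics import mean, stdev

def standardize(values):
    """z-score a list of raw demand scores (sample standard deviation)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def design_row(depleted, cued_before, z_demands):
    """Intercept, three main effects, three two-way terms, one three-way term.

    Assumed coding: depleted = +1 (resist cookies) or -1 (resist celery);
    cued_before = +1 (questionnaire first) or -1 (questionnaire last).
    """
    d = 1 if depleted else -1
    o = 1 if cued_before else -1
    z = z_demands
    return [1, d, o, z, d * o, d * z, o * z, d * o * z]

def residual_df(n_participants, n_columns=8):
    """Error degrees of freedom for the full model."""
    return n_participants - n_columns

def simple_slope(coefs, depleted, cued_before):
    """Slope of demands within one cell; coefs follow the design_row order."""
    d = 1 if depleted else -1
    o = 1 if cued_before else -1
    return coefs[3] + coefs[5] * d + coefs[6] * o + coefs[7] * d * o

print(design_row(True, False, 0.5))
print(residual_df(62))
```

With 62 participants and eight estimated coefficients, the residual degrees of freedom equal 54, matching the t(54) tests reported above.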

Discussion

This experiment replicated the results of the previous experiments using different methods of exerting self-control, measures of self-control performance, and future self-control demands. There was a relationship between self-control performance and anticipated future self-control demands for depleted participants when the future demands were made salient; this relationship was far weaker for depleted individuals when the future demands were not salient. Although there was a main effect for depletion (i.e., participants who had to exert self-control performed worse on the stop signal task than participants who did not have to exert self-control), this effect was moderated by future self-control demands. This suggests that depleted people may perform more poorly on tests of self-control because they are motivated to conserve strength for the future demands.


Figure 1. Experiment 3: Relationship between stop signal performance (proportion of responses inhibited when tone sounded) and future self-control demands, based on previous self-control exertion and order of presentation (measure of future self-control demands before or after stop signal task).

Experiment 4

Experiment 4 had several purposes. The first purpose was to use different tasks to replicate and generalize the results found in the previous experiments. As in the previous experiments, we predicted that depleted participants who anticipated exerting more self-control in the future should perform more poorly on a test of self-control than depleted participants who anticipated exerting little self-control in the future and more poorly than those who were less depleted.

The second purpose was to actually measure people’s performance on a third task. That is, after participants’ self-control performance on the main dependent measure was assessed, they then engaged in a third task. By measuring their self-control performance a third time, we hoped to get a sense of whether conservation is effective. Conserving energy can be a beneficial strategy, as it allows individuals to make optimal use of their limited resources. Hence, we predicted that if participants are saving energy for a third task, they should perform worse on the second task than those who are not as motivated to save strength. However, by saving strength, participants should perform better on the third task than those who were less motivated to conserve strength. There should be a trade-off between performance on the second and third self-control tasks, especially among those who expected that the third task would require self-control.

Finally, when the performance measure does not require self-control, the effects of exerting self-control in the past and expecting to exert self-control in the future should be eliminated. Put another way, if participants are motivated to conserve self-control resources (and those resources are depleted by the previous exertion of self-control), then performance on this non-self-control task should not be related to past or future self-control operations. On the other hand, if the effects in the previous experiments are being driven by mood, arousal, or distraction, then it should not matter whether the task requires self-control or not: The effects will not be specific to tasks that require self-control. By comparing performance on a test that requires self-control with a test that does not require self-control, we hoped to test the specificity of self-control in the conservation model.

Method

Participants

A total of 152 undergraduate students (92 male and 60 female) participated in the study for extra credit or partial fulfillment of a psychology class requirement. Each testing session lasted approximately 30 min.

Procedure

The experiment was conducted on a computer, which presented all instructions to participants and randomly assigned them to a condition at run time. When participants entered the laboratory, the experimenter explained their rights to them, had them sign a consent form, seated them in front of a computer, and started the program. Participants were not informed of the true purpose of the study or that the study involved self-control. Instead, they were told, via the computer, that the experimenters were investigating the relationship among several cognitive processes and that they would be completing several different tasks that measured various aspects of these processes.

Depletion phase. Participants’ initial self-control demands were manipulated using a typing task, like that in Experiment 2. In particular, they were directed to retype a paragraph that was displayed on the computer screen. Half of the participants were instructed to type the paragraph as it appeared; the other half were instructed to retype the paragraph without using the letter e or the space bar. After typing the paragraph, they then completed a brief manipulation check to assess whether the conditions differed in how much self-control was required (“How much were you fighting against an urge on that task?” rated on a 7-point scale with anchors ranging from not at all to very much). Participants’ mood and arousal were also assessed using the BMIS (Mayer & Gaschke, 1988).

Future task. Next, the amount of self-control required by the third and final task in the future was manipulated. Participants were told by the computer that there were two more tests: a test of color-word identification (Stroop) and anagrams. They were told that the Stroop test would be next, followed by the anagrams. The instructions for the Stroop were the same for all participants, but the description of the anagrams differed across conditions and served as the manipulation of future self-control. In particular, all participants read a brief paragraph explaining that previous research has found that students find anagrams difficult and challenging. The last sentence differed across conditions, however. Participants who anticipated a difficult task were told that they should expect to “think hard while working on this task.” Participants who anticipated self-control were told that they should expect to “work hard at overriding impulses while working on this task.” Thus, all participants expected a difficult task; what differed across conditions was the amount of self-control they expected to exert in the future. Participants’ perception of this future task was then assessed. In particular, they were asked about effort (“How hard do you think the third task will be?”), how unpleasant they thought that future task would be (“How unpleasant does the third task sound?”), and how much self-control they thought this future task would require (“How much do you think you will have to override an urge while working on the third task?”), each rated on a 7-point scale ranging from not at all to very much.

Measurement phase.
After they were told about this final task, their self-control performance was assessed using a Stroop task. Previous research (Wallace & Baumeister, 2002) found that performance on the Stroop entails self-control. Inhibiting the word to say the ink color is harder for participants lower in self-control capacity, and hence depleted individuals should take longer to report the ink color than would less depleted individuals. In the present experiment, the words were presented on the computer screen and participants had to report the ink color using the keyboard. Participants saw a total of 80 words. How long it took them to respond after the word appeared on the screen was measured by the computer. Incorrect responses were rejected by the computer. If participants took longer than 2 s to supply a correct answer, the time for that trial was recorded as 2 s. Less than 3% of all the data points were handled in this way; no single participant had more than four responses (5%) cut off. In addition, to demonstrate that the effects are specific to tasks that require self-control, for half of the participants, the color words matched the font color. For the other half of the participants, there was a mismatch between the font color and the word. Presumably, the effects of exerting self-control in the past and anticipating self-control in the future should only affect performance on tasks that require self-control (mismatch condition); there should be no effect of future or past self-control on tasks that do not require self-control (match condition). After completing the Stroop, participants completed one more manipulation check to assess how much they were conserving resources for the third task (“Were you conserving energy for the next task?”) and how much they thought about that next task (“How much did you think about the next task while working on that task?”), answered on a 7-point scale with anchors ranging from not at all to very much.

Final task.
After the manipulation check, participants engaged in the anagram task. The anagrams appeared on the screen, one at a time. There were five anagrams in total, each consisting of six letters. Participants could click a button to indicate that they had solved the anagram or wished to quit working on that particular anagram. Most critically, the third and fifth anagrams were unsolvable. Working on frustrating and difficult tasks when the option to quit is available and salient requires overriding the desire to stop. Consistent with previous research on self-control (Muraven et al., 1998), how long participants worked on these impossible anagrams served as a measure of self-control performance, with greater self-control indicated by longer effort. After completing that task, participants were then carefully debriefed about their research experience by the experimenter. Participants reported having no awareness that the second task was the critical task, nor did they suspect that performance on the second task was affected by either the initial task or the future task.
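The two scoring rules described in this Method section, capping slow Stroop responses at 2 s and using time spent on the unsolvable anagrams as the persistence measure, can be sketched in Python. The function names are hypothetical, and summing the time across the third and fifth anagrams is an assumption; the report says only that time on the impossible anagrams served as the measure.

```python
CAP_S = 2.0  # Stroop response times longer than 2 s were recorded as 2 s

def cap_rts(rts_s, cap_s=CAP_S):
    """Return (capped response times, number of trials that were capped)."""
    capped = [min(rt, cap_s) for rt in rts_s]
    n_capped = sum(1 for rt in rts_s if rt > cap_s)
    return capped, n_capped

def persistence_s(times_s, unsolvable=(3, 5)):
    """Total seconds spent on the unsolvable anagrams (1-based positions).

    Assumption: the two unsolvable items are summed rather than averaged.
    """
    return sum(times_s[i - 1] for i in unsolvable)

# Hypothetical data for one participant.
capped, n_capped = cap_rts([0.8, 2.5, 1.9, 2.0])
print(capped, n_capped)
print(persistence_s([10, 20, 30, 40, 50]))
```

Capping rather than dropping slow trials keeps all 80 responses per participant in the analysis while limiting the influence of extreme times.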

Results

Manipulation Checks

The manipulation checks were analyzed using a 2 (typing instructions: type all vs. no e’s) × 2 (future task: self-control vs. difficult) × 2 (ink color word: match vs. mismatch) ANOVA. As shown in Table 3, mood and arousal did not differ across conditions, nor were any of the interactions significant (all Fs < 2.50, p > .10). For the amount of inhibition exerted on the initial (typing) task, there was a main effect of instructions, F(1, 144) = 20.01, p < .001. Not typing e’s required more self-control than typing all the letters. No other main effect or interaction was significant (all Fs < 2.00, p > .10).

Participants’ expectation of how difficult the future task would be did not differ across conditions (all main effects and interactions, Fs < 1.40, p > .10). The same was true for how unpleasant they thought the third task would be (all Fs < 1.25, p > .10). Thus, whether the future task required self-control or was merely difficult was unrelated to participants’ expectations of how difficult they expected the future task to be or how unpleasant the third task would be. Likewise, the instructions had no apparent effect on participants’ cognitions: They reported thinking about the future task the same amount (all Fs < 1.10, p > .10). Hence, the future task was no more distracting when it required self-control than when it did not. Participants did feel that the future task that was described as requiring self-control would require more restraint than the future task that was described as requiring thinking hard, F(1, 144) = 4.01, p < .05. No other main effect or interaction was significant for this variable, however (Fs < 2.50, p > .10).

Stroop Performance

Group differences. Participants’ performance on the Stroop task was analyzed using a 2 (typing instructions: type all vs. no e’s) × 2 (future task: self-control vs. difficult) × 2 (ink color word: match vs. mismatch) ANOVA. The time it took participants to respond to the 80 words was normally distributed; the score distributions for the match and mismatch conditions were the same (although the means differed). As would be expected, there was a main effect for ink color-word match, F(1, 144) = 20.83, p < .001. It took participants longer to indicate the ink color when the word did not match the ink color than when it did. The main effects for typing instructions, F(1, 144) = 0.97, ns, and for the future task, F(1, 144) = 1.12, ns, were not significant. There was no interaction between typing instructions and future task, F(1, 144) = 0.50, ns, nor between future task and Stroop instructions, F(1, 144) = 0.25, ns. The interaction between Stroop and typing instructions was significant, F(1, 144) = 3.93, p < .05. Individuals who had to exert self-control in the first part of the experiment took longer to respond to mismatches between word and ink color than to

MURAVEN, SHMUELI, AND BURKLEY


Table 3
Experiment 4: Responses on Key Variables Based on Initial Task and Expectation for Future Self-Control

Depleted Future self-control Variable

M

Not depleted Future hard

SD

M

Future self-control

Future hard

SD

M

SD

M

SD

1.11 3.52 1.04 2.01 2.12 1.94 2.01 1.74 15.2 57.5

4.15a 20.60a 4.46a 4.92a 5.38a,b 1.86c 4.92a,b 3.15a 91c 195a

4.85 6.25 1.61 1.61 1.94 0.38 1.61 1.28 11.8 91.3

−1.08a 20.81a 2.92b 3.00b 4.83a,b 2.33b,c 5.00a,b 3.58a 94c 151a,b

9.67 5.22 1.56 1.91 2.62 1.44 1.91 1.78 13.2 165

10.6 6.14 1.64 1.79 1.52 0.45 1.79 1.10 13.7 71.3

0.67a 23.30b 4.44a 4.44c 5.44a 3.56b 4.44a 3.22a 76b 163a

8.90 2.87 2.00 1.74 1.51 1.33 1.74 1.48 12.8 47.4

−1.14a 20.66a 2.71b 3.29b 5.43a 3.55b 5.29a,b 3.43a 81a,b 233

9.78 4.99 2.36 1.98 2.44 1.92 1.98 1.99 8.22 85.0

Mismatch Stroop Mood Arousal Inhibition required Future self-control Future difficult Save energy for future Distracting thoughts about future Future unpleasant Stroop Time on anagrams

2.89a 19.14a 4.30a 4.56a 5.89a 4.22a 4.56a 3.22a 114a 187a

13.85 2.98 1.32 1.94 1.54 1.30 1.94 1.39 17.5 71.8

−3.00a 22.26a 2.45b 4.14a,b 5.09a,b 3.17b 4.64a 3.73a 104b 107b

Match Stroop

Mood Arousal Inhibition required Future self-control Future difficult Save energy for future Distracting thoughts about future Future unpleasant Stroop Time on anagrams

6.17a 25.78a 3.33a,b 5.67a 5.17a,b 4.85a 5.67b 3.33a 86a 118a

7.70 4.35 1.86 1.37 2.56 1.21 1.37 1.86 11.6 19.9

2.20a 20.76b 3.20a 3.20b 4.60b 3.20b 5.20a,b 3.80a 82a,b 142a

Note. N = 152. Means that do not share subscripts differ at p < .05 using the Tukey correction for multiple tests. Stroop and time on anagrams were measured in seconds.

matches. This suggests that prior exertion of self-control affects only tasks that require inhibition and has no effect on tasks that do not. Finally, the three-way interaction was significant, F(1, 144) = 5.71, p < .025 (see Table 3). To best understand this interaction, we examined the effects of ink color-word match and mismatch separately. For the match condition, the main effects for future task, F(1, 73) = 0.08, ns, and typing instructions, F(1, 73) = 0.10, ns, as well as the interaction between these terms, F(1, 73) = 1.00, ns, were not significant. When a task does not require self-control, neither the past nor the future exertion of self-control has an effect. A very different pattern emerged for the mismatch condition. Although the main effect for future instructions was not significant, F(1, 73) = 1.06, the main effect for typing instructions was significant, F(1, 73) = 8.61, p < .01. Replicating previous work (e.g., Wallace & Baumeister, 2002), individuals who were depleted performed more poorly on the Stroop task than individuals who were not. Finally, consistent with the conservation hypothesis, there was an interaction between future task and previous exertion of self-control, F(1, 73) = 6.97, p < .025. A focused contrast indicated that participants who exerted self-control in the past and who expected to exert self-control in the future performed more poorly on the Stroop task compared with the other three conditions, t(73) = 3.56, p < .001.

Motivation to conserve. Participants were asked how much they were trying to save energy for the third task. The typing

instructions had no effect on conservation, F(1, 144) = 1.25, ns, nor did the future task, F(1, 144) = 0.05, ns. However, the interaction between these two terms was significant, F(1, 144) = 7.44, p < .01. People who exerted self-control in the past and who expected to exert self-control in the future were much more motivated to conserve energy than participants who did not exert self-control. To test the contribution of the motivation to conserve on Stroop performance, we conducted a moderated multiple regression. There was a main effect for Stroop condition (match vs. mismatch), B = 14.4, SE = 4.81, t(148) = 2.99, p < .001, which merely indicates that it took participants longer to read the mismatched than the matched Stroop words. The overall effect for self-reported motivation to conserve was not significant, B = 1.99, SE = 1.04, t(148) = 1.44, ns. The interaction between Stroop condition and motivation to conserve was significant, however, B = 2.09, SE = 1.04, t(148) = 2.01, p < .05. An analysis of the simple slopes found that the relationship between motivation to conserve and Stroop performance was significant for participants who saw Stroop words that did not match the ink color, B = 1.31, SE = 0.61, t(148) = 2.17, p < .05. The relationship between motivation to conserve and Stroop performance was not significant for participants who saw the matching Stroop words, however, B = −1.29, SE = 1.27, t(148) = 1.01, ns. The more they were trying to conserve, the longer it took participants to say the Stroop words in the

CONSERVING STRENGTH

mismatch condition. Conservation had no effect on tasks that did not require self-control, however.
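The simple-slopes logic in the regression above can be sketched in a few lines. This is only an illustration under stated assumptions: it supposes a model of the form time = b0 + b1(motivation) + b2(condition) + b3(motivation × condition) with Stroop condition dummy coded (0 = match, 1 = mismatch). The coding scheme, the function name `simple_slope`, and the coefficient values are hypothetical and are not taken from the reported analysis.

```python
def simple_slope(b_predictor, b_interaction, moderator_value):
    """Slope of the outcome on the predictor at a fixed moderator value.

    For y = b0 + b_predictor*X + b_mod*Z + b_interaction*X*Z, the slope
    of X when Z = z equals b_predictor + b_interaction * z.
    """
    return b_predictor + b_interaction * moderator_value


# Hypothetical coefficients, for illustration only (not the study's estimates).
b_motivation = -0.5   # slope of Stroop time on motivation when condition = 0
b_interaction = 2.0   # change in that slope when condition = 1

slope_match = simple_slope(b_motivation, b_interaction, 0)     # match coded 0
slope_mismatch = simple_slope(b_motivation, b_interaction, 1)  # mismatch coded 1
print(slope_match, slope_mismatch)
```

With dummy coding, the coefficient for the predictor is itself the simple slope in the group coded 0, and the interaction coefficient is the difference between the two groups' slopes. A different coding scheme (e.g., effect coding) would change how the coefficients are read but not the simple slopes themselves.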

Anagram Performance

Finally, we examined participants’ performance on the third task: persistence on impossible anagrams. The main effect for initial self-control was significant, F(1, 144) = 10.82, p < .01. Depleted individuals persisted less on the anagrams. There were no main effects for Stroop or future self-control (all Fs < 0.54, ps > .10). The two-way interaction between Stroop condition and future self-control was significant, F(1, 144) = 14.75, p < .01. The two-way interactions between Stroop and depletion, F(1, 144) = 1.87, ns, and depletion and future self-control, F(1, 144) = 1.77, ns, were not significant, however. The three-way interaction between initial self-control task, Stroop condition, and expectation for the future was significant, F(1, 144) = 4.25, p < .01.¹ Because of the relative complexity of this interaction, we examined the means separately for the match versus mismatch Stroop groups. In the mismatch group (ink color did not match the color word), the main effects for initial task, F(1, 73) = 0.05, ns, and for the future task, F(1, 73) = 0.00, ns, were not significant. The interaction between these terms was significant, however, F(1, 73) = 6.65, p < .025. Examining the means (see Table 3) indicates that initially depleted participants who did not expect to exert self-control in the future quit working on the anagrams sooner than participants in any other condition, which was confirmed by a focused contrast, t(76) = 2.62, p < .025. This suggests they spent all their resources on the initial two tasks, which led to reduced persistence on the anagrams. On the other hand, participants who were initially depleted but expected to exert self-control in the future performed as well as participants who were not depleted. This could be the result of conservation: They knew about the future task, so they saved resources for it and hence performed better on it.
When the ink color matched the word, there was no main effect for future tasks, F(1, 73) = 0.03, ns. However, the main effect for initial self-control task was close to conventional levels of significance, F(1, 73) = 3.09, p < .08. The interaction between future task and initial task was not significant, F(1, 73) = 1.00. Because the matched Stroop task does not require self-control, these results replicate previous work on depletion: Individuals quit working on self-control tasks sooner if they exerted self-control in the past. It also should be noted, however, that individuals who were not depleted, did not expect to exert self-control in the future, and saw matching Stroop words performed better than those in all other conditions. This may be because these individuals were the least depleted by any task and were the least concerned about the future and hence had “energy to burn.”
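A focused contrast of the kind reported above, comparing one cell against the other three, can be sketched as follows. This is a minimal illustration using the standard planned-contrast formula for independent cell means, t = Σcᵢ·Mᵢ / sqrt(MSE · Σcᵢ²/nᵢ). The weights (3, −1, −1, −1), the cell means, the cell sizes, and the error term are all hypothetical; the article does not report the weights or error term it used.

```python
import math


def contrast_t(means, ns, weights, mse):
    """t statistic for a planned contrast among independent cell means.

    means, ns, weights: one entry per cell; mse: pooled error variance
    (mean square error) from the ANOVA.
    """
    if abs(sum(weights)) > 1e-12:
        raise ValueError("contrast weights must sum to zero")
    estimate = sum(w * m for w, m in zip(weights, means))
    std_error = math.sqrt(mse * sum(w * w / n for w, n in zip(weights, ns)))
    return estimate / std_error


# Hypothetical persistence times in seconds (illustration only).
means = [100.0, 150.0, 150.0, 150.0]  # first cell is the one singled out
ns = [20, 20, 20, 20]                 # assumed equal cell sizes
weights = [3, -1, -1, -1]             # one cell versus the mean of the rest
mse = 2500.0                          # assumed pooled error variance

t = contrast_t(means, ns, weights, mse)
print(round(t, 3))
```

The resulting t is evaluated against the error degrees of freedom of the ANOVA (total N minus the number of cells). A negative value here simply reflects that the singled-out cell has the lower mean.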

Trade-Offs

Overall, then, participants appear to have been making a trade-off between the Stroop and anagram tasks, especially when expecting that the anagrams would require self-control. When warned about the future task, depleted participants reduced their efforts greatly (see the results for the Stroop task), which preserved some resources for the final task. Using multiple regression, we found that the relationship between Stroop performance and persistence on the anagrams was not significant, B = 1.81, SE = 1.44, t(148) = 1.26. There also was no effect for Stroop condition, B = 1.33, SE = 0.84, t(148) = 1.59, ns. However, consistent with our theory of conservation, the interaction between Stroop condition and Stroop performance was significant, B = 0.42, SE = 0.20, t(148) = 2.15, p < .05. An analysis of the simple slopes found that in the mismatch Stroop condition, there was a relationship between Stroop and anagram performance, B = 0.89, SE = 0.43, t(148) = 2.07, p < .05 (because long persistence but short Stroop times reflect better self-control, the direction of the relationship is positive). In the matching Stroop condition, this relationship was not significant, B = −1.02, SE = 0.80, t(148) = 1.28. In other words, better self-control on the Stroop was associated with worse self-control on the anagrams. When the Stroop task did not require self-control (the ink matched the word color), this relationship was not found, however. This indicates that participants are trading better self-control performance on the Stroop for worse performance on the anagrams.

Discussion

As in the previous experiments, participants who exerted self-control in the first part of the experiment and who anticipated exerting self-control in the future performed worse on an intervening measure of self-control. Moreover, this poor performance was limited to tasks that actually required self-control. This appears to reflect the desire to conserve resources for the future, rather than a general decline in performance due to mood, arousal, frustration, or distraction. Participants’ performance on the anagrams also was consistent with the conservation hypothesis. In particular, when the second task did not require self-control (the ink matched the word color), performance on the final task was merely related to how much self-control the first task required. This result replicated previous work on self-control strength with the addition of an intervening task. When the second task required self-control (there was mismatch between ink and word color), performance on the third task suffered, unless the individual conserved resources (at the cost of hurting performance on the second task). This experiment went beyond the previous experiments by showing that conservation appears to be a reasonable and successful strategy. There was a relationship between reports of conserving energy on the second task and performance on the third task. Although individuals who conserved energy performed worse on the second task, they actually did better on the third task as compared with individuals who were less motivated to conserve self-control strength. Indeed, there is evidence of a trade-off between Stroop and anagram performance, at least for those who were told to expect self-control in the future. These results further imply that self-control is a resource that can be conserved for the future and allocated among tasks, based on individually determined priorities.

¹ It might be argued that analysis of persistence on the insoluble items should control for speed of solutions on soluble items. However, two factors counter this argument. First, there is evidence that depletion also impairs performance on soluble anagrams (Baumeister et al., 1998). Second, a substantial number of participants failed to solve, or solved incorrectly, one or more soluble items. Consistent with Baumeister et al. (1998), the incorrect/missing solutions did not occur at random, but related to experimental condition, χ²(7, N = 152) = 13.2, p < .05, because depleted participants (especially those who anticipated self-control in the future and who worked on the mismatching Stroop) were especially likely to not answer or give the incorrect solution to the solvable anagrams. For these reasons, such a control appeared to represent an overcorrection.

General Discussion

The results of these four experiments suggest that self-control performance is related to how much self-control participants exerted previously and the amount of self-control they expect to exert in the near future. In particular, people who exerted self-control in the recent past and who anticipated exerting self-control in the near future performed more poorly on a test of self-control than those who did not exert self-control recently or those who did not anticipate exerting self-control in the future. These results are consistent with theories of conservation of limited resources. Exerting self-control may deplete a limited resource needed for self-control, which heightens individuals’ desire to conserve what remains. This motivation to conserve is intensified by the anticipation of future demands. Indeed, secondary analyses found that participants whose self-control strength was depleted reported being more concerned with the future task and reported conserving more strength when this future task required self-control. Overall, the more motivated they were to conserve strength, the poorer they performed on immediate tests of self-control. The final task in Experiment 4 suggests that conservation may be a useful policy for managing a limited personal resource when there are future demands. As was found in the previous experiments, individuals who reported conserving energy performed more poorly on the intervening task. There is evidence that they were engaged in a trade-off: Worse performance on the second task was compensated for by better performance on the last task. Thus, limiting how much energy is exerted was related to better performance on future demands, even though it appeared to hurt performance on the intervening tasks.
These experiments can also help explain the previously established but puzzling result that merely anticipating a stressor in the future can lead to a loss of self-control (Spacapan & Cohen, 1983): The desire to conserve resources for future demands is enough to lead to poorer performance in the present. There was no main effect for future demands in the present experiments; it only interacted with self-control in the past. Less depleted people likely need a very good reason to conserve, as they are less sensitive to future demands than are more depleted individuals. If we used a more powerful future task (such as putting one’s hand in cold water, as done in Spacapan & Cohen, 1983), nondepleted participants also may have conserved strength. Overall, the results imply that the willingness to use further self-control resources drops after people exert self-control. This can be contrasted with an alternative model that posits self-control becomes more difficult and effortful when people are depleted, much like one must work harder to get water out of a well as it dries up. Because future demands should affect the willingness to use self-control resources, by making the remaining resources more valuable without making present self-control more difficult, the present study clearly supports the first alternative: Depletion

increases individuals’ motivation to conserve self-control resources. Hence, it appears that the reason self-control fails, especially after the previous exertion of self-control, is that people become more unwilling, and not less able, to exert self-control.

These experiments help illuminate the inner workings of self-control strength. It appears that self-control strength is treated like any other limited, valued, and slow-to-replenish resource: The less one has, the more one values what remains (as predicted by the economic law of diminishing marginal utility). This motivation to conserve the self-control strength resource is consistent with other theories of the management of limited resources (Hobfoll, 2002) and can help explain why self-control fails: The more strength is depleted, the less people want to expend what strength remains. This also explains why external motivators can help encourage individuals to overcome depletion (Muraven & Slessareva, 2003): Concerns about the present task are more important than concerns about the future when one is given a strong incentive. Finally, the idea of protecting or conserving strengths, energies, and other resources underlies many theories of the management of limited resources (Hobfoll, 2002). We believe that adding perceptions of future demands may be an important component of how people treat these resources. This also points to other variables that have to be examined in relationship to self-control strength. For example, we would expect the conservation effect to be larger the more a person values the future tasks, especially relative to any intervening task. Individual differences in self-management and awareness of self-control may also moderate the overall effect.

In conclusion, people are concerned with conserving their limited resources, especially when they expect to exert self-control in the future. This conservation leads to poorer self-control performance.
Individuals lower in that resource, because of previous demands, should be particularly concerned with future demands and conservation. Thus, self-control fails not because people deplete all their resources but because they are concerned with having enough of the resource for the future. Conservation of limited resources, due to anticipated demands in the future, can lead to poorer self-control. This is a paradox of self-regulation: Human forethought and desire to exert self-control can lead to a breakdown of self-control in certain circumstances.

References

Aspinwall, L. G., & Taylor, S. E. (1997). A stitch in time: Self-regulation and proactive coping. Psychological Bulletin, 121, 417–436.
Baker, S., & Kirsch, I. (1991). Cognitive mediators of pain perception and tolerance. Journal of Personality and Social Psychology, 61, 504–510.
Bargh, J. A., & Chartrand, T. L. (1999). The unbearable automaticity of being. American Psychologist, 54, 462–479.
Barkley, R. A. (1997a). ADHD and the nature of self-control. New York: Guilford Press.
Barkley, R. A. (1997b). Behavioral inhibition, sustained attention, and executive functions: Constructing a unifying theory of ADHD. Psychological Bulletin, 121, 65–94.
Baumeister, R. F., Bratslavsky, E., Muraven, M., & Tice, D. M. (1998). Ego-depletion: Is the active self a limited resource? Journal of Personality and Social Psychology, 74, 1252–1265.
Baumeister, R. F., Heatherton, T. F., & Tice, D. M. (1994). Losing control: How and why people fail at self-regulation. San Diego, CA: Academic Press.
Carver, C. S., & Scheier, M. F. (1998). On the self-regulation of behavior. New York: Cambridge University Press.
Chartrand, T. L., & Bargh, J. A. (1996). Automatic activation of impression formation and memorization goals: Nonconscious goal priming reproduces effects of explicit task instructions. Journal of Personality and Social Psychology, 71, 464–478.
de Jong, R., Coles, M. G., Logan, G. D., & Gratton, G. (1990). In search of the point of no return: The control of response processes. Journal of Experimental Psychology: Human Perception and Performance, 16, 164–182.
Folkman, S., & Lazarus, R. S. (1985). If it changes it must be a process: A study of emotion and coping during three stages of a college examination. Journal of Personality and Social Psychology, 48, 150–170.
Gottlieb, B. H. (1987). Using social support to protect and promote health. Journal of Primary Prevention, 8, 49–70.
Hayes, S. C. (1989). Rule-governed behavior: Cognition, contingencies, and instructional control. New York: Plenum Press.
Higgins, E. T. (1996). The “Self digest”: Self-knowledge serving self-regulatory functions. Journal of Personality and Social Psychology, 71, 1062–1083.
Hobfoll, S. E. (2002). Social and psychological resources and adaptation. Review of General Psychology, 6, 307–324.
Hobfoll, S. E., Johnson, R. J., Ennis, N., & Jackson, A. P. (2003). Resource loss, resource gain, and emotional outcomes among inner city women. Journal of Personality and Social Psychology, 84, 632–643.
Kanfer, F. H., & Karoly, P. (1972). Self-control: A behavioristic excursion into the lion’s den. Behavior Therapy, 3, 398–416.
Litt, M. (1988). Self-efficacy and perceived control: Cognitive mediators of pain tolerance. Journal of Personality and Social Psychology, 54, 149–160.
Logan, G. D., & Cowan, W. B. (1984). On the ability to inhibit thought and action: A theory of an act of control. Psychological Review, 91, 295–327.
Mayer, J. D., & Gaschke, Y. N. (1988). The experience and meta-experience of mood. Journal of Personality and Social Psychology, 55, 102–111.
Muraven, M., & Baumeister, R. F. (2000). Self-regulation and depletion of limited resources: Does self-control resemble a muscle? Psychological Bulletin, 126, 247–259.
Muraven, M., Collins, R. L., & Nienhaus, K. (2002). Self-control and alcohol restraint: An initial application of the self-control strength model. Psychology of Addictive Behaviors, 16, 113–120.
Muraven, M., & Slessareva, E. (2003). Mechanisms of self-control failure: Motivation and limited resources. Personality and Social Psychology Bulletin, 29, 894–906.
Muraven, M., Tice, D. M., & Baumeister, R. F. (1998). Self-control as a limited resource: Regulatory depletion patterns. Journal of Personality and Social Psychology, 74, 774–789.
Norris, F. H., & Kaniasty, K. (1996). Received and perceived social support in times of stress: A test of the social support deterioration deterrence model. Journal of Personality and Social Psychology, 71, 498–511.
Oosterlaan, J., & Sergeant, J. A. (1996). Inhibition in ADHD, aggressive, and anxious children: A biologically based model of child psychopathology. Journal of Abnormal Child Psychology, 24, 19–36.
Park, C. L., & Folkman, S. (1997). Stability and change in psychosocial resources during caregiving and bereavement in partners of men with AIDS. Journal of Personality, 65, 421–447.
Rieger, M. (2004). Automatic keypress activation in skilled typing. Journal of Experimental Psychology: Human Perception and Performance, 30, 555–565.
Schachar, R. J., & Logan, G. D. (1990). Impulsivity and inhibitory control in normal development and childhood psychopathology. Developmental Psychology, 26, 710–720.
Schachar, R. J., Tannock, R., & Logan, G. (1993). Inhibitory control, impulsiveness, and attention deficit hyperactivity disorder. Clinical Psychology Review, 13, 721–739.
Schönpflug, W. (1983). Coping efficiency and situational demands. In G. R. J. Hockey (Ed.), Stress and fatigue in human performance (pp. 299–330). New York: Wiley.
See, J. E., Howe, S. R., Warm, J. S., & Dember, W. N. (1995). Meta-analysis of the sensitivity decrement in vigilance. Psychological Bulletin, 117, 230–249.
Shallice, T., & Burgess, P. (1993). Supervisory control of action and thought selection. In A. Baddeley & L. Weiskrantz (Eds.), Attention: Selection, awareness, and control (pp. 171–187). Oxford, England: Oxford University Press.
Spacapan, S., & Cohen, S. (1983). Effects and aftereffects of stressor expectations. Journal of Personality and Social Psychology, 45, 1243–1254.
Tice, D. M., Baumeister, R. F., Shmueli, D., & Muraven, M. (2005). Restoring the self: Positive affect helps improve self-regulation following ego depletion. Manuscript submitted for publication.
Tice, D. M., Bratslavsky, E., & Baumeister, R. F. (2001). Emotional distress regulation takes precedence over impulse control: If you feel bad, do it! Journal of Personality and Social Psychology, 80, 53–67.
Tiffany, S. T. (1990). A cognitive model of drug urges and drug-use behavior: Role of automatic and nonautomatic processes. Psychological Review, 97, 147–168.
Tversky, A., & Kahneman, D. (1981, January 30). The framing of decisions and the psychology of choice. Science, 211, 453–458.
Vohs, K. D., & Heatherton, T. F. (2000). Self-regulatory failure: A resource-depletion approach. Psychological Science, 11, 243–254.
Wallace, H. W., & Baumeister, R. F. (2002). The effects of success versus failure feedback on further self-control. Self and Identity, 1, 35–42.
Wegner, D. M., Schneider, D., Carter, S. R., & White, T. L. (1987). Paradoxical effects of thought suppression. Journal of Personality and Social Psychology, 53, 5–13.
White, J. L., Moffitt, T. E., Caspi, A., Bartusch, D. J., Needles, D. J., & Stouthamer-Loeber, M. (1994). Measuring impulsivity and examining its relationship to delinquency. Journal of Abnormal Psychology, 103, 192–205.

Received July 1, 2005
Revision received March 31, 2006
Accepted April 4, 2006

Journal of Personality and Social Psychology 2006, Vol. 91, No. 3, 538–552

Copyright 2006 by the American Psychological Association 0022-3514/06/$12.00 DOI: 10.1037/0022-3514.91.3.538

Five Types of Personality Continuity in Childhood and Adolescence

Filip De Fruyt
Ghent University

Meike Bartels
Free University of Amsterdam

Karla G. Van Leeuwen, Barbara De Clercq, Mieke Decuyper, and Ivan Mervielde
Ghent University

This study examines 5 types of personality continuity (structural, mean-level, individual-level, differential, and ipsative) in a representative population (N = 498) and a twin and sibling sample (N = 548) of children and adolescents. Parents described their children on 2 successive occasions with a 36-month interval using the Hierarchical Personality Inventory for Children (I. Mervielde & F. De Fruyt, 1999). There was evidence for structural continuity in the 2 samples, and personality was shown to be largely differentially stable. A large percentage had a stable trait profile indicative of ipsative stability, and mean-level personality changes were generally small in magnitude. Continuity findings were explained mainly by genetic and nonshared environmental factors.

Keywords: personality continuity and assessment, five-factor model, childhood, adolescence, behavior genetics

Filip De Fruyt, Karla G. Van Leeuwen, Barbara De Clercq, Mieke Decuyper, and Ivan Mervielde, Department of Developmental, Personality, and Social Psychology, Ghent University, Ghent, Belgium; Meike Bartels, Department of Biological Psychology, Free University of Amsterdam, Amsterdam, the Netherlands. Correspondence concerning this article should be addressed to Filip De Fruyt, Department of Developmental, Personality, and Social Psychology, Ghent University, H. Dunantlaan 2, B-9000, Ghent, Belgium. E-mail: [email protected]

In the past decades, there has been a wealth of studies on personality continuity that were recently summarized in two meta-analyses (MAs) of longitudinal data on differential (Roberts & DelVecchio, 2000) and mean-level (Roberts, Walton, & Viechtbauer, 2006) continuity. Differential continuity describes the degree to which the relative differences among individuals remain invariant across time, whereas mean-level stability refers to the extent to which personality scores change over time. Longitudinal designs are required to investigate differential stability, looking at trait correlations across time, whereas mean-level stability can be studied with the use of longitudinal data (Roberts et al., 2006). In addition, mean trait scores from cross-sectional age cohorts are useful for mean-level stability comparisons (McCrae et al., 1999, 2000). A MA on differential continuity (Roberts & DelVecchio, 2000) showed that people acquire increasingly stable relative trait positions with age, with largely linear increases in stability until a plateau is reached at least after age 50. A MA on mean-level stability (Roberts et al., 2006) showed that people tend to increase, especially in their 20s to 40s, in social dominance (a facet of extraversion), conscientiousness, and emotional stability. People further demonstrate increases on social vitality (another facet of extraversion) and openness in adolescence, but decrease on both traits in old age. Absolute changes for agreeableness were observed only in old age. These two MAs convincingly illustrate that differential continuity and distinct mean-level change patterns can coexist, underscoring the independence of these two types of continuity (Block, 1971). Moreover, the two MAs combined help to advance a clear conceptual distinction between the two continuity types and are further useful to solve apparent inconsistencies among studies, summarizing the available data collected by the different researchers involved in the personality continuity debate.

The two MAs raised new questions about additional forms of personality continuity and about the potential moderators and antecedents of stability. Although the MA on rank-order continuity (Roberts & DelVecchio, 2000) provided stability estimates from 3 years to old age, the majority of studies in the MA for the youngest age groups relied on a limited set of temperament and Q-sort measures. Comprehensive and hierarchically organized age-specific personality measures might be more appropriate to study homo- and heterotypic developmental changes at a young age (Caspi, 1998). The MA on mean-level changes (Roberts et al., 2006) started from age 10 onward, because there is a dearth of studies on this continuity type for younger ages. Today, the MAs allow for estimation of the amount of mean-level change and the stability correlation between any two ages from age 10 onward, but there is less evidence on younger age groups.

In addition to differential and mean-level continuity, other types of personality development have been examined, although less frequently (Caspi, 1998). Means on trait dimensions cannot be compared directly across measurement points when the covariance structure varies across time. Structural continuity refers to the invariance of the covariance structure across time and is a necessary requirement for the assessment of mean-level stability (Biesanz, West, & Kwok, 2003). In addition to the analysis of group means for different ages, one can also examine individual-level change. Individual-level change refers to the magnitude of increase or decrease exhibited by a person on any given trait. These changes may be masked in an analysis of mean-level continuity, because equal numbers of individuals may increase or decrease on a trait, resulting in no change for the entire group. Finally, ipsative stability refers to the continuity of the configuration of traits within the individual and provides information on the stability of the patterning of traits within a person across time, hence facilitating a person-centered approach to personality development (Robins & Tracy, 2003).

These additional types of personality continuity have been studied mainly from adolescence to adulthood (McCrae et al., 1999, 2000; Roberts & DelVecchio, 2000; Robins, Fraley, Roberts, & Trzesniewski, 2001), although a few studies have addressed particular types of personality continuity in childhood (e.g., Van Lieshout & Haselager, 1994). Caspi, Roberts, and Shiner (2005) argued that there are relatively few studies that assess a comprehensive set of personality variables to track continuities and changes over time. No study has addressed all five types simultaneously across a substantial time interval in childhood and adolescence using a comprehensive and hierarchical five-factor model (FFM) personality measure.

The present study examines all five types of continuity in two different samples of schoolchildren and adolescents assessed at two time points spanning a 36-month interval. To assess whether the observed continuity and change patterns generalize across studies, we examined continuity types in a representative population sample and a genetic-informative sample of twins and siblings. The different nature of the samples further enabled us to investigate important additional questions. The representative sample allowed the description of patterns of continuity and change that are notable in the general population of children and adolescents.
The twin and sibling sample allowed the examination of genetic and environmental influences on personality continuity and change, enabling a genetic–environmental decomposition of the personality trait variances cross-sectionally but also of the trait covariance across assessment points. McGue, Bacon, and Lykken (1993) conducted a similar study of young adulthood, demonstrating that the stable core of personality is strongly associated with genetic factors and that personality change largely reflects environmental factors. They reported that, on average, “over 80% of the variance of the stable component of the Time 2 phenotype was associated with genetic factors” but also that “a majority of the Time 2 personality variance is unrelated to variance expressed at Time 1” (McGue et al., 1993, p. 105). As far as we know, the present work is the first behavioral genetic study seeking to identify and characterize genetic and environmental influences on individual differences in stability and change in children and adolescents using a comprehensive FFM measure. We examined personality continuity for a set of basic personality dimensions and facets using a lexically based measure assessing a broad range of personality traits relevant for childhood and adolescence. Adopting a comprehensive, age-appropriate inventory increases the likelihood of detecting changes and enhances the generalizability of findings. Although parents are the primary informants on children’s and adolescents’ personalities, the representative population sample also completed a self-report personality adjective measure so that we could examine shared method variance.

Temperament and Personality

Developmental and child psychologists traditionally describe stable and observable differences in young children by relying on


temperamental constructs such as Negative Emotionality (Thomas & Chess, 1977) or Emotionality (Buss & Plomin, 1984), Sociability (Buss & Plomin, 1984) or Surgency (Rothbart & Derryberry, 1981), Task Persistence (Thomas & Chess, 1977) or Effortful Control (Rothbart & Derryberry, 1981), and Activity Level (Buss & Plomin, 1984; Goldsmith & Campos, 1982; Thomas & Chess, 1977). They assumed that temperamental differences are expressions of neurobiological mechanisms that have a strong genetic basis (Mervielde, De Clercq, De Fruyt, & Van Leeuwen, 2005). In contrast, personality traits are mainly used to chart stable latent differences in adults, presumed to be partly influenced by temperament and interaction with the environment. The revival of trait psychology, and especially the preponderance of the FFM (Digman, 1990), challenged these viewpoints. First, personality psychologists tend to agree that five broad dimensions, that is, Neuroticism, Extraversion, Openness to Experience, Agreeableness, and Conscientiousness, can be considered the basic dimensions underlying adult personality. Personality psychologists interested in the developmental antecedents of this FFM subsequently showed that these five are also useful to describe individual differences in childhood and adolescence (Digman, 1963; Digman & Inouye, 1986; John, Caspi, Robins, Moffitt, & Stouthamer-Loeber, 1994; Kohnstamm, Halverson, Mervielde, & Havill, 1998; Lamb, Chuang, Wessels, Broberg, & Hwang, 2002). John et al. (1994) introduced the basic personality dimensions to developmental psychologists and demonstrated that the “Little Five” predict externalizing problem behavior in children. The qualifier “Little” refers to dimensions denoting trait differences in childhood, paralleling the label “Big” used to refer to dimensions characterizing adults.
Second, McCrae and colleagues (2000) argued against the artificial distinction between temperamental constructs and personality traits, because there are strong empirical and conceptual links between the domains of temperament and personality. The defining characteristics of temperament variables also apply to traits, including early observability, genetic basis, and pervasive impact on a range of behaviors (McCrae & Costa, 1997). Indeed, behavior genetic studies consistently document the strong genetic basis of FFM traits (Jang, McCrae, Angleitner, Riemann, & Livesley, 1998), with heritability estimates ranging from .40 to .60 depending on the trait and measure. McCrae and Costa (1996) conceptualized the FFM dimensions as basic tendencies shaping the interactions with the environment, resulting in characteristic (mal)adaptations such as interests, values, and attitudes but also problem behavior and psychopathology. Mervielde, De Clercq, et al. (2005) compared the major models of temperament with the FFM, illustrating the similarities and the conceptual overlap between temperament models and the FFM. Caspi et al. (2005) recently concluded that temperament and personality increasingly appear to be more alike than different.

Assessing Personality at Young Age

A comprehensive assessment of age-specific indicators of traits is crucial to studying personality at young age and especially its development. Different approaches have been adopted to assess FFM dimensions in children and adolescents. Often, FFM measures, initially developed for adults, are used to describe differences in younger age groups (see, e.g., studies using the Revised NEO Personality Inventory [NEO-PI–R] to assess adolescents’


DE FRUYT ET AL.

personality: De Clercq & De Fruyt, 2003; De Fruyt, Mervielde, Hoekstra, & Rolland, 2000; McCrae et al., 2002). Other studies adapted the phrasing of personality items for younger age groups (e.g., the junior version of Eysenck’s Personality Questionnaire measure, EPQ–J; Eysenck, 1963; Eysenck, Makaremi, & Barrett, 1994). However, it can be argued that these adapted measures are probably not suitable for a fine-grained assessment of childhood and adolescent personality differences and especially not to assess developmental change (De Clercq, De Fruyt, & Van Leeuwen, 2004). Therefore, an alternative approach that is more sensitive to subtle personality differences at young age should be developed on the basis of the full range of personality differences observable prior to adulthood. The lexical approach to personality description (De Raad & Perugini, 2002) provides a convincing rationale for the development of a comprehensive child and adolescent personality taxonomy. Mervielde and De Fruyt (1999) adopted this approach to construct such a taxonomy for classification of a large pool of parental personality descriptions of Flemish children aged between 6 and 13 years (Kohnstamm et al., 1998). They subsequently developed the Hierarchical Personality Inventory for Children (HiPIC), representing the content of parental descriptions in short sentence items referring to concrete and observable behavior. Therefore, the HiPIC can be considered a lexically based measure of the active parental vocabulary, in contrast with the NEO-PI–R, in which the facets are not derived empirically but are selected after a careful search of the adult personality literature. The HiPIC items span five broad domains, labeled as Extraversion, Benevolence, Conscientiousness, Emotional Instability or Neuroticism, and Imagination. Some domain labels differ from the lexical adult Big Five (Goldberg, 1993).
The HiPIC dimensions Extraversion, Conscientiousness, and Emotional Instability refer to content that is similar to the adult Big Five counterparts and hence received the same label. The HiPIC Benevolence factor, however, refers to a broader set of traits than the adult Big Five or FFM Agreeableness factor because it includes traits linked to the “easy–difficult” child concept described in the temperament literature (Thomas, Chess, Birch, Hertzig, & Korn, 1963). The Benevolence factor refers to differences in the manageability of the child from the perspective of the parent-informant. The HiPIC Imagination domain comprises both Intellect and Openness to Experience items, blending the two alternative labels for the fifth factor emerging from adult adjective-based lexical studies (Goldberg, 1993) and the questionnaire-oriented FFM approach (Costa & McCrae, 1992). Given its specific focus and comprehensiveness, the HiPIC can be considered a more sensitive measure to assess personality change at young age (De Clercq et al., 2004).

Personality Development in Childhood and Adolescence

Different developmental theories conceive puberty as a significant stage for social and personality development (Grotevant, 1998), involving changing social interactions with parents and peers and increasing societal influences. The importance of adolescence as a key transitional phase is also acknowledged by Erikson (1950, 1968), who considered adolescence a second individuation stage in which change is likely. Similarly, behavioral theories (Robin & Foster, 1989) emphasize that during adolescence, learning processes and contingencies are embedded in novel social networks and environments, including changing peer

groups. In addition, organismic theories such as Piaget’s theory of cognitive development (Piaget, 1983) suggest that newly acquired cognitive structures influence the way children and adolescents interact with their environment. Finally, personality theories such as Cloninger’s theory of character development (Cloninger, Svrakic, & Przybeck, 1993) discern qualitatively different life stages that an individual has to master before a more advanced developmental phase or level can be achieved. All these theories emphasize developmental discontinuities, but it remains to be established whether these discontinuities reflect changes in basic tendencies or whether they are restricted to changing characteristic adaptations (McCrae & Costa, 1996). The demonstration of different forms of stability across the FFM trait hierarchy in childhood and adolescence would underscore McCrae and Costa’s Five-Factor Theory (McCrae & Costa, 1996) and hence requires that the previously reviewed theories of personality development account not only for trait change but also for trait stability. Roberts and DelVecchio (2000) recently conducted an MA review of differential stability and examined whether and when stability peaks during the life course, challenging Costa and McCrae’s claim (1994; McCrae & Costa, 1984, 1990) that personality is “set like plaster” after age 30. They analyzed 3,217 test–retest correlation coefficients from 152 longitudinal studies and demonstrated an increase in stability from .31 during childhood to .54 in young adulthood, mounting to .64 by age 30, with the highest stability of .74 observed between 50 and 70 years, when the time interval was held constant at 6.7 years. These data suggest that personality is less stable during the preadult years. The estimated population correlations controlling for the time interval of the longitudinal study are .31 for ages 0–2.9, .49 for ages 3–5.9, .43 for ages 6–11.9, and .43 for ages 12–17.9, respectively.
McCrae and colleagues (2002) examined both mean-level and differential changes during adolescence in self-descriptions using the NEO-PI–R, including an analysis of continuity at the level of the individual. They found that mean-level personality scores for Extraversion, Agreeableness, and Conscientiousness were stable between ages 12 and 18, but Neuroticism appeared to increase in girls and Openness to Experience increased in both boys and girls. A 4-year longitudinal study of intellectually gifted students showed a considerable degree of rank-order instability for gifted students across the two assessment points. Stability coefficients for boys across a 4-year interval ranged from .31 (Agreeableness) to .49 (Conscientiousness) and for girls from .30 (Neuroticism) to .63 (Conscientiousness). Individual-level continuity analyses relying on the reliable change index (RCI; Jacobson & Truax, 1991) indicated that about 60% of the sample did not change over the 4-year interval for each of the FFM dimensions. A similar analysis for older college-age individuals (Robins et al., 2001) showed, however, that almost 80% of the individuals were stable on each of the FFM dimensions, suggesting that differential stability is significantly lower during adolescence. The MA of Roberts et al. (2006) on mean-level change during adolescence (10–18 years) showed a small but insignificant increase for Openness to Experience (d = .23; K = 13, N = 2,911), and significant increases for Social Dominance (d = .20; K = 5, N = 1,700) and Emotional Stability (d = .16; K = 23, N = 10,557). No significant mean-level changes were reported for Social Vitality, Agreeableness, and Conscientiousness during adolescence.
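The reliable change index mentioned above can be illustrated with a minimal sketch. It follows the usual Jacobson and Truax (1991) formulation, in which the difference between two scores is divided by the standard error of the difference. The function names, the example scores, the standard deviation, and the reliability below are hypothetical values chosen for illustration; they are not taken from any of the studies reviewed here.

```python
import math


def reliable_change_index(score_t1, score_t2, sd_t1, reliability):
    """RCI in the usual Jacobson and Truax (1991) form.

    sd_t1 is the Time 1 standard deviation of the scale and reliability
    is a reliability estimate for the scale (both assumed known).
    """
    sem = sd_t1 * math.sqrt(1.0 - reliability)  # standard error of measurement
    s_diff = math.sqrt(2.0) * sem               # standard error of a difference score
    return (score_t2 - score_t1) / s_diff


def classify_change(rci, critical=1.96):
    """Label a person as increased, decreased, or stable at the 95% level."""
    if rci > critical:
        return "increased"
    if rci < -critical:
        return "decreased"
    return "stable"


# Hypothetical example: a domain score rising from 3.2 to 3.6 on a 5-point scale.
rci = reliable_change_index(score_t1=3.2, score_t2=3.6, sd_t1=0.5, reliability=0.80)
print(round(rci, 2), classify_change(rci))
```

Counting the share of participants labeled "stable" on each dimension gives the kind of percentage summarized in the text (about 60% in adolescence versus almost 80% in college-age samples).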


Block (1971) and Hart, Hofmann, Edelstein, and Keller (1997) used the California Q-set to examine within-person (ipsative) change, but to our knowledge no patterns of trait change in children and adolescents were examined with an inventory specifically designed to assess the FFM. Studies investigating stability of personality prototype classification using three personality profiles, that is, resilient, under-, and overcontrolled children, provided partial evidence of ipsative stability (De Fruyt, Mervielde, & Van Leeuwen, 2002). However, it was unclear whether the instability of prototype classification for some individuals should be attributed to real developmental change, measurement error, or the procedure to derive prototypes (De Fruyt et al., 2002). Structural continuity has not been studied explicitly in childhood and adolescence.

Method

Participants

Two longitudinal samples of children and young adolescents were available for the present analyses.1

Representative population sample. The first sample included children participating in a follow-up study investigating parenting, children’s personalities, and problem behavior at two assessment occasions separated by a 36-month interval. Participating families including 1 child were recruited by random sampling of elementary and secondary schools. The sample was stratified by province (East and West Flanders), region (rural or urban), school type (public, private, or Catholic schools) and grade (3rd, 4th, 5th, and 6th year) for elementary schools. For secondary schools, subject sampling was based on province (East and West Flanders), type of curriculum (vocational, technical, and general education), and grade (1st and 2nd year). Eighty percent of the elementary schools and 60% of the secondary schools granted permission to contact parents for this study. In case of refusal, schools were replaced with other randomly selected schools. A letter addressed to the parents informed them about the goal and the procedures of the research project. The response rate for parents of primary schoolchildren was 41%, and for parents with children in secondary schools, 39%. At Time 2, 82% of the families continued collaboration, 12% refused, and 6% could not be reached. This sampling method resulted in a well-balanced sample regarding socioeconomic status, gender, and age (Van Leeuwen, 2004). Families were visited at home by a trained psychology student at each assessment point. The mother, father, and child were instructed to independently complete a series of questionnaires. Parents described their own personality using the authorized Dutch translation of the NEO-PI–R (Costa & McCrae, 1992) and the personality of their child(ren) using the HiPIC (Mervielde & De Fruyt, 1999).
Children also provided self-ratings of their personality using an adjective inventory at Time 2. Only participants who participated at both Time 1 and Time 2 were included in the study, resulting in a sample of 498 families, including 238 boys and 260 girls. There were no mean-level personality differences between the dropouts and those who continued participation, although both mothers and fathers who continued participation scored lower on Neuroticism, and fathers also scored higher, on average, on Extraversion, Openness to Experience, and Agreeableness. Indeed, continuing participation depended more on the personality of the parent than on that of the child. The mean age of the participants was 10.9 years (SD = 1.8 years; range 7–15) at Time 1, and 13.9 years (SD = 1.8 years; range 10–18) at Time 2.

Twin and sibling sample. The second sample included parents and children participating in a small-scale twin family study, providing HiPIC ratings of their twins and 1 sibling, if eligible, per family. HiPIC data on two assessment points with a 36-month interval were available for 548 children, all between the ages of 5 and 14 years, with a mean of 8.65 years (SD = 2.11 years). The sample included 271 boys and 277 girls.


Complex and differential interdependencies existed within the twin and sibling sample at the level of the units of observation, that is, within monozygotic (MZ) and dizygotic (DZ) twin pairs and between twins and siblings, but also at the level of the informants, that is, parents rating 2 (in the case of twins only) or 3 (twins + 1 sibling) children of the same family. The interdependencies at the level of the units of observation included a mixture of interchangeable (Griffin & Gonzalez, 1995) and noninterchangeable (Gonzalez & Griffin, 1999) cases, with same-sex twins to be considered interchangeable and different-sex or twin-sibling pairs as distinguishable. Such interdependencies might affect both the nature of the relationships observed across time and the level of significance (Gonzalez & Griffin, 2000). Because very different and complex analyses per type of personality continuity would have to be conducted to control for all possible dependencies that exist within such a dataset, a random sample of 1 child or adolescent twin per family was extracted to make the results directly comparable with the results obtained from the other samples involved in the present research, but also in previous MA work. The random twin sample included 208 twin children (boys, N = 106; girls, N = 102), whose ages were on average 8.50 years (SD = 1.92). All twins and siblings for whom complete data records were available across the two assessment moments were used for the genetic analyses, taking into account their interdependencies. The twin sample included 35 MZ male pairs; 44 MZ female pairs; and respectively, 33 male, 35 female, and 56 mixed-sex DZ pairs of twins. The average age of the twins was 8.34 years (SD = 1.94), and the mean age for the siblings was 9.79 years (SD = 2.34).

Questionnaires

HiPIC. Parents rated both samples at the two assessment occasions on the HiPIC (Mervielde & De Fruyt, 1999). The HiPIC includes 144 items grouped into 18 facets that are hierarchically organized under the higher order factors. Parents were instructed to “describe the child by the way he or she has most often behaved over the last year” by indicating on a 5-point Likert-type scale the degree to which each statement was characteristic of the child to be assessed, with scale anchors labeled as barely characteristic, slightly characteristic, more or less characteristic, characteristic, and highly characteristic. All items have a similar grammatical format and are formulated in the third-person singular, avoiding negations in the item and excluding personality-descriptive adjectives. Facet labels directly reflect the nature of the parental free descriptions and are in some cases indicative of opposite poles of the domain scale they are assigned to, requiring the computation of reversed scores (R) before facets can be aggregated into a domain score.
The HiPIC structure is outlined as follows (a sample item is included in parentheses): Extraversion consists of Energy (“bubbles with life”), Expressiveness (“shows feelings”), Optimism (“laughs through life”), and Shyness (R: facet score to be reversed; “needs time to get used to peers”); Benevolence comprises the facets Altruism (“defends the weak”), Dominance (R; “acts the boss”), Egocentrism (R; “takes him/herself into consideration first”), Compliance (“sticks to arrangements”), and Irritability (R; “is quick to take offense”); Conscientiousness includes Concentration (“works with sustained attention”), Perseverance (“keeps at it when the going gets tough”), Orderliness (“takes care of his/her possessions”), and Achievement motivation (“wants to be among the best”); Emotional Instability measures Anxiety (“is afraid of failure”) and Self-Confidence (R; “has confidence in own abilities”); and finally, Imagination is composed of the facets Creativity (“can use

1 Both samples were already used for other research purposes. The representative population sample was already used to study the interaction between child personality and parental behavior as predictors of problem behavior (Van Leeuwen, Mervielde, et al., 2004), and the twin and sibling sample for a study on the personality type approach (De Fruyt et al., 2002). However, for both these purposes no data were reported on the continuity of personality.


everyday things in a new way”), Intellect (“grasps the meaning of things quickly”), and Curiosity (“asks many ‘why’ questions”). Benevolence is conceptually and empirically related to the adult FFM domain of Agreeableness, whereas Imagination is associated with the Openness to Experience domain (De Fruyt et al., 2000). The HiPIC’s robust factor structure and high internal consistencies of domains and facets have been documented in various studies with clinical and nonclinical samples (Van Hoecke, De Fruyt, De Clercq, Hoebeke, & Van de Walle, 2006; Van Leeuwen, De Fruyt, & Mervielde, 2004; Van Leeuwen, Mervielde, Braet, & Bosmans, 2004; Vollrath & Landolt, 2005). Domain scale reliabilities for the representative population and twin and sibling samples ranged from .76 (Extraversion, Time 1; representative population sample) to .89 (Conscientiousness, Time 1; twin and sibling sample), and for the facet scales, from .77 (Self-Confidence, Time 2; representative population sample) to .91 (Intellect, Time 1; twin and sibling sample).

Questionnaire Big Five. The population sample also provided self-ratings on the Questionnaire Big Five (QBF; Gerris et al., 1998; Goldberg, 1992), a Dutch shortened version of Goldberg’s 100 adjectives, at the second measurement point. The QBF includes 30 adjectives, 6 per FFM dimension, presented with a 7-point Likert-type scale. Scholte, van Aken, and van Lieshout (1997) and Dubas, Gerris, Janssens, and Vermulst (2002) have demonstrated that the QBF provides valid estimates of an individual’s standing on the FFM dimensions. Scale reliabilities were .77, .78, .67, .82, and .88 for Emotional Stability, Extraversion, Resourcefulness (the Openness scale of the QBF), Agreeableness, and Conscientiousness, respectively (Van Leeuwen, 2004).

Informants

In most cases, both parents provided HiPIC ratings of the children in the population and the twin and sibling sample. Given their high intercorrelations, ranging from .57 (Emotional Instability, representative population sample) to .77 (Conscientiousness, representative population sample), father and mother ratings were averaged in both samples to obtain more reliable scores. Self-ratings on the QBF at follow-up were also available for all children and adolescents of the representative sample, enabling an examination of shared method effects.

Data Analytic Approach

The two samples are age heterogeneous (at Time 1, age ranged from 7 to 15 years in the representative sample and from 5 to 14 years in the random selection of twins sample). The within-sample age range is thus much larger than the time interval of 36 months between the two assessment points. To account for this large age range and to make the analyses more developmentally informative, we combined the representative sample and the random selection of twins (N = 682) and grouped the participants by age at the first assessment: 6–7 years (N = 88), 8–9 years (N = 183), 10–11 years (N = 201), and 12–13 years (N = 210). The five continuity types are examined primarily within these different age groups.

Results

Structural Stability

The demonstration of invariance of the correlation matrix across the two measurement occasions was a conditio sine qua non before other forms of continuity could be examined. Table 1 reports the intercorrelations among the Time 1 and Time 2 domain scores averaged across mothers and fathers for the four age groups. Similarly to Robins et al. (2001), we examined structural continuity using structural equation modeling comparing the fit of two models. The correlations among the five factors were freely estimated at each measurement occasion in the first model, whereas in the second model the intercorrelations were constrained to be equivalent across assessments. A significant difference in the fit between these two models was considered indicative of structural change. The baseline model for the structural analysis at the Big Five factor level was a single-indicator latent variable model, with one latent variable associated with each of the 10 scores (five dimensions × two assessment occasions). This is a fully saturated model, with the variances of the latent variables fixed to 1 and the variances of the residuals fixed to 0. The correlations among the latent variables were freely estimated. In the second model, the correlations between all pairwise traits across the two assessment points were constrained to be equal. For example, the correlation between Extraversion and Emotional Instability at Time 1 was forced to equal the correlation between Extraversion and Emotional Instability at Time 2. Inspection of Table 2 shows that the intercorrelations among the five HiPIC domains were invariant across measurement occasions for all age groups: age group 6–7, Δχ²(10, N = 88) = 15.69, p > .05; age group 8–9, Δχ²(10, N = 183) = 8.47, p > .05; age group 10–11, Δχ²(10, N = 201) = 9.52, p > .05; and age group 12–13, Δχ²(10, N = 210) = 11.45, p > .05. The previous analyses at the domain level could also be extended to examine structural invariance for the 18 facets across time. This very stringent test examining the invariance of 153 intercorrelations across time could not be done for the youngest age group because the number of parameters exceeds the number of participants. The 153 intercorrelations were not stable over time for age group 8–9, with Δχ²(153, N = 183) = 207.27, p < .01; or for age group 10–11, with Δχ²(153, N = 201) = 198.34, p < .01.
Constraining the model for age group 12–13 showed no reduction in fit, with Δχ²(153, N = 210) = 175.31, p > .05, indicating that the 153 facet intercorrelations were invariant for the oldest age group across the 3-year interval. Factor structures of varimax-rotated principal component analyses of the HiPIC facets could also be compared, calculating factor congruences across assessment occasions for each age group. Table 3 shows that the factor congruence coefficients were all higher than .90 except for the coefficients for Imagination from the age groups 8–9 and 10–11, which were .85 and .84, respectively.
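The chi-square difference tests used for the model comparison can be checked with a short calculation. The sketch below is an illustration only, not the authors' software: it evaluates the upper-tail chi-square probability with the closed-form expression that holds for even degrees of freedom, which covers the domain-level tests with 10 degrees of freedom. The function name is an assumption, and the facet-level tests (153 degrees of freedom, an odd number) would need a general incomplete gamma routine instead.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate for even df.

    For df = 2k the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# Domain-level chi-square differences reported in the text (df = 10 each).
domain_tests = {"6-7": 15.69, "8-9": 8.47, "10-11": 9.52, "12-13": 11.45}
p_values = {age: chi2_sf_even_df(x, 10) for age, x in domain_tests.items()}

for age, p in p_values.items():
    print(age, round(p, 3))
```

All four probabilities exceed .05, in line with the conclusion that the domain intercorrelations were invariant across the 36-month interval; small differences from the tabled p values in the third decimal can arise because the chi-square values are themselves rounded.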

Differential Continuity

Differential continuity coefficients across the 36-month interval (see Table 4), uncorrected for unreliability, were uniformly high for all domains across the four age groups, with values ranging from .61 (age group 6–7, Emotional Instability) to .86 (age group 6–7, Imagination), almost equaling test–retest reliabilities reported in the manual (Mervielde, De Fruyt, & De Clercq, 2005). For the first three age groups, coefficients for Emotional Instability were smaller than those for the other FFM domains, and the magnitude of the correlations decreased slightly with increasing age for Extraversion (p < .01 between the youngest and the oldest age groups). Very similar conclusions could be drawn for the HiPIC facets, with values ranging between .57 (age group 6–7, Anxiety) and .83 (age group 6–7, Altruism), suggesting that considerable differential stability is manifested across all hierarchical levels of the FFM. Cross-time correlations computed separately for maternal and paternal ratings (not reported in Table 4) were about .05 to .10


Table 1
Intercorrelations Among the Five HiPIC Dimensions at Time 1 and Time 2 by Age Group

HiPIC dimension               1         2         3         4         5

Age 6–7 years
1. Emotional Instability      —       −.33**    −.41***   −.05       .04
2. Extraversion             −.47***     —        .28**     .19       .00
3. Imagination              −.36***    .45***     —        .14       .42***
4. Agreeableness             .03      −.04      −.01        —        .42***
5. Conscientiousness         .02      −.09       .38***    .39***     —

Age 8–9 years
1. Emotional Instability      —       −.40***   −.27***   −.18*     −.04
2. Extraversion             −.40***     —        .48***    .03      −.04
3. Imagination              −.33***    .46***     —        .15*      .40***
4. Agreeableness            −.07      −.02       .01        —        .50***
5. Conscientiousness        −.03      −.02       .35***    .44***     —

Age 10–11 years
1. Emotional Instability      —       −.40***   −.37***   −.16*     −.20**
2. Extraversion             −.37***     —        .48***    .03       .04
3. Imagination              −.43***    .45***     —        .16*      .48***
4. Agreeableness            −.10      −.00       .07        —        .47***
5. Conscientiousness        −.22***   −.01       .45***    .42***     —

Age 12–13 years
1. Emotional Instability      —       −.44***   −.39***   −.19**    −.17*
2. Extraversion             −.43***     —        .52***    .13       .22***
3. Imagination              −.44***    .49***     —        .22**     .54***
4. Agreeableness            −.13       .06       .10        —        .49***
5. Conscientiousness        −.20**     .17*      .52***    .33***     —

Note. HiPIC = Hierarchical Personality Inventory for Children. Intercorrelations at Time 1 are reported below the diagonal, and intercorrelations at Time 2 are reported above the diagonal; 6–7 years: N = 88, 8–9 years: N = 183, 10–11 years: N = 201, and 12–13 years: N = 210.
* p < .05. ** p < .01. *** p < .001.

lower than the averaged parental ratings for domains and facets, suggesting that the magnitude of the stability coefficients does not primarily result from the increased reliability to be expected from averaging across parents. The continuity coefficients computed for single maternal ratings could be corrected for unreliability using 12-week test–retest reliabilities described in the manual (Mervielde, De Fruyt, & De Clercq, 2005) for maternal raters: .72 (Emotional Instability), .74 (Extraversion), .83 (Imagination), .78 (Benevolence), and .82 (Conscientiousness). Adopting these corrections for unreliability provided averaged stability coefficients for maternal domain ratings of .93, .91, .89, and .86 for the age groups 6–7, 8–9, 10–11, and 12–13, respectively. Correcting the coefficients obtained for the facets showed averaged corrected facet stability coefficients of .92, .90, .88, and .85 for the age groups 6–7, 8–9, 10–11, and 12–13, respectively.

Table 2
HiPIC Domain and Facet-Level Structural Continuity Analyses by Age Group Across a 36-Month Interval

Age group        Chi-square    df     p         CFI

Domain level
6–7 years          15.69       10     .109      0.98
8–9 years           8.47       10     .583      1.00
10–11 years         9.52       10     .483      1.00
12–13 years        11.45       10     .324      1.00

Facet level
6–7 years a
8–9 years         207.27      153     .00230    0.99
10–11 years       198.34      153     .00797    1.00
12–13 years       175.31      153     .10457    1.00

Note. HiPIC = Hierarchical Personality Inventory for Children; 6–7 years: N = 88, 8–9 years: N = 183, 10–11 years: N = 201, and 12–13 years: N = 210; CFI = comparative fit index.
a Total sample size is smaller than the number of parameters.
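The correction for unreliability applied to the single maternal ratings can be sketched with the classical disattenuation formula, in which an observed correlation is divided by the square root of the product of the two reliabilities. The text does not spell out the exact formula that was used, so this form is an assumption. The reliabilities below are the 12-week test–retest values quoted from the manual; the observed stability coefficient is a hypothetical value, because the single-rater coefficients are not reported.

```python
import math


def disattenuate(r_observed, reliability_t1, reliability_t2):
    """Classical correction for attenuation of a cross-time correlation."""
    return r_observed / math.sqrt(reliability_t1 * reliability_t2)


# 12-week test-retest reliabilities for maternal raters, as quoted in the text.
maternal_reliability = {
    "Emotional Instability": 0.72,
    "Extraversion": 0.74,
    "Imagination": 0.83,
    "Benevolence": 0.78,
    "Conscientiousness": 0.82,
}

# Hypothetical observed 36-month stability for one domain (not a reported value).
observed_r = 0.65
rel = maternal_reliability["Emotional Instability"]
corrected_r = disattenuate(observed_r, rel, rel)
print(round(corrected_r, 2))
```

Because the same reliability is assumed at both occasions, the correction reduces to dividing the observed coefficient by that reliability, which is why modest observed stabilities can translate into corrected values near .90.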

Table 3
HiPIC Factor Congruence Analyses Across a 36-Month Interval per Age Group

Age group        EI      E       I       B       C

6–7 years        .90     .95     .90     .98     .95
8–9 years        .97     .96     .85     .98     .91
10–11 years      .97     .95     .84     .99     .98
12–13 years      .99     .98     .97     .99     .98

Note. HiPIC = Hierarchical Personality Inventory for Children; 6–7 years: N = 88, 8–9 years: N = 183, 10–11 years: N = 201, and 12–13 years: N = 210; EI = Emotional Instability; E = Extraversion; I = Imagination; B = Benevolence; C = Conscientiousness.
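Factor congruence coefficients of the kind shown in Table 3 are commonly computed as Tucker's congruence coefficient between the loadings of the same factor at two occasions. The text does not name the specific index, so treating it as Tucker's coefficient is an assumption, and the loading vectors below are hypothetical illustrations rather than HiPIC results.

```python
import math


def tucker_congruence(loadings_a, loadings_b):
    """Tucker's congruence coefficient between two loading vectors."""
    if len(loadings_a) != len(loadings_b):
        raise ValueError("loading vectors must have the same length")
    numerator = sum(a * b for a, b in zip(loadings_a, loadings_b))
    denominator = math.sqrt(
        sum(a * a for a in loadings_a) * sum(b * b for b in loadings_b)
    )
    return numerator / denominator


# Hypothetical loadings of five facets on one factor at Time 1 and Time 2.
time1 = [0.80, 0.75, 0.10, -0.05, 0.20]
time2 = [0.78, 0.70, 0.15, 0.02, 0.25]
phi = tucker_congruence(time1, time2)
print(round(phi, 3))
```

In practice the coefficient would be computed for each of the five rotated components across all 18 facet loadings, once per age group, which yields a table with the same layout as Table 3.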

DE FRUYT ET AL.


Table 4
HiPIC Differential Continuity Analyses by Age Group Across a 36-Month Interval

| HiPIC domain and facet | 6–7 years | 8–9 years | 10–11 years | 12–13 years |
|---|---|---|---|---|
| Emotional Instability | .61 | .65 | .63 | .69 |
| Anxiety | .57 | .62 | .60 | .65 |
| Self-confidence | .61 | .63 | .66 | .66 |
| Extraversion | .83 | .78 | .76 | .66 |
| Energy | .81 | .71 | .76 | .63 |
| Expressiveness | .74 | .75 | .69 | .67 |
| Optimism | .75 | .64 | .70 | .59 |
| Shyness | .77 | .78 | .68 | .70 |
| Imagination | .86 | .69 | .77 | .69 |
| Creativity | .82 | .72 | .73 | .65 |
| Intellect | .82 | .70 | .76 | .74 |
| Curiosity | .80 | .64 | .74 | .62 |
| Benevolence | .77 | .71 | .79 | .75 |
| Altruism | .83 | .73 | .71 | .69 |
| Dominance | .72 | .61 | .78 | .70 |
| Egocentrism | .69 | .63 | .71 | .69 |
| Compliance | .66 | .69 | .73 | .66 |
| Irritability | .75 | .69 | .76 | .70 |
| Conscientiousness | .76 | .82 | .77 | .74 |
| Concentration | .77 | .80 | .75 | .73 |
| Perseverance | .72 | .74 | .73 | .67 |
| Order | .68 | .80 | .73 | .73 |
| Achievement Striving | .72 | .76 | .72 | .66 |

Note. HiPIC = Hierarchical Personality Inventory for Children; 6–7 years: N = 88, 8–9 years: N = 183, 10–11 years: N = 201, and 12–13 years: N = 210. All correlations significant at p < .001.

To evaluate effects of having the same informants (parents) and measures (HiPIC) across assessment occasions, we also examined differential stability using different measures and raters in the representative population sample. HiPIC-averaged parental ratings obtained at Time 1 were correlated with QBF self-ratings assessed 3 years later. Coefficients are provided in Table 5. To enable an evaluation of the effect of having different raters (self vs. parents) and measures, we also describe the correlations between HiPIC and QBF at follow-up. The follow-up coefficients primarily reflect the different informant and measures’ perspectives, with coefficients, uncorrected for unreliability, varying between .21 (Benevolence–Agreeableness) and .47 (Conscientiousness), with all coefficients on the diagonal showing the largest magnitude. These coefficients are in line with other studies using self- and parental reports in children and adolescents (De Clercq, De Fruyt, Koot, & Benoit, 2004). The coefficients across the 36-month interval were in general about .10 lower than the same-time coefficients, with again all coefficients at the diagonal being largest.

Table 5
Cross-Method/Informant Correlations Across 36 Months (Representative Population Sample)

| HiPIC domain | 1. QBF–EmS | 2. QBF–Ext | 3. QBF–Res | 4. QBF–Agr | 5. QBF–Con |
|---|---|---|---|---|---|
| 3-year interval | | | | | |
| 1. HiPIC–EI | −.27*** | −.22*** | −.07 | .02 | .06 |
| 2. HiPIC–E | .07 | .33*** | .19*** | .11* | −.08 |
| 3. HiPIC–I | .08 | .09* | .26*** | .05 | .02 |
| 4. HiPIC–B | .10* | −.05 | −.01 | .12** | .11* |
| 5. HiPIC–C | .05 | −.01 | .07 | .08 | .36*** |
| Same-time assessment | | | | | |
| 1. HiPIC–EI | −.38*** | −.31*** | −.12** | −.02 | .07 |
| 2. HiPIC–E | .15*** | .47*** | .23*** | .19*** | −.01 |
| 3. HiPIC–I | .13** | .13** | .35*** | .10* | .10 |
| 4. HiPIC–B | .13** | −.04 | −.03 | .21*** | .24*** |
| 5. HiPIC–C | .03 | −.05 | .07 | .17*** | .47*** |

Note. HiPIC = Hierarchical Personality Inventory for Children; QBF = Questionnaire Big Five; QBF–EmS = Emotional Stability; QBF–Ext = Extraversion; QBF–Res = Resourcefulness; QBF–Agr = Agreeableness; QBF–Con = Conscientiousness; HiPIC–EI = Emotional Instability; HiPIC–E = Extraversion; HiPIC–I = Imagination; HiPIC–B = Benevolence; HiPIC–C = Conscientiousness.
* p < .05. ** p < .01. *** p < .001.

PERSONALITY CONTINUITY IN CHILDHOOD AND ADOLESCENCE

The differences in magnitude between the different-measures and different-raters’ correlations across time, relative to the same-measure parental correlations across the 36-month interval, are thus largely attributable to the different informants and measures, rather than evidence of poor differential continuity across this time lag. McCrae (1994) showed by path-analytic arguments that the stability of the true score can be easily estimated by dividing the predictive correlation by the concurrent correlation. The concurrent correlation between observers (self vs. parent) and/or different instruments (HiPIC vs. QBF) sets an upper limit to the agreement that can be expected, and the closer the cross-time correlation across observers is to this concurrent correlation, the more stable the trait (McCrae, 1994, p. 163). Applying this estimation method to the present data yields true-score stability estimates of .71 (.27/.38), .70 (.33/.47), .74 (.26/.35), .57 (.12/.21), and .77 (.36/.47) for Emotional Instability, Extraversion, Imagination, Benevolence, and Conscientiousness, respectively.
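The ratio estimate attributed to McCrae (1994) can be reproduced directly from the diagonal coefficients reported in Table 5. The short sketch below does only that arithmetic; the function and variable names are illustrative and not part of the original analysis.

```python
# (3-year predictive r, same-time concurrent r) between parental HiPIC ratings
# and QBF self-ratings, taken from the diagonal values reported in the text.
CORRELATIONS = {
    "Emotional Instability": (-0.27, -0.38),
    "Extraversion": (0.33, 0.47),
    "Imagination": (0.26, 0.35),
    "Benevolence": (0.12, 0.21),
    "Conscientiousness": (0.36, 0.47),
}


def true_score_stability(predictive_r, concurrent_r):
    """Predictive correlation divided by concurrent correlation."""
    return predictive_r / concurrent_r


estimates = {
    domain: round(true_score_stability(pred, conc), 2)
    for domain, (pred, conc) in CORRELATIONS.items()
}
for domain, value in estimates.items():
    print(f"{domain}: {value:.2f}")
```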

Mean-Level Continuity

We examined mean-level stability using repeated measures analysis of variance, including gender as a covariate (Costa, Terracciano, & McCrae, 2001; McCrae et al., 2002). The analyses of mean-level domain differences presented in Table 6 demonstrate strong parallels across the youngest age groups, with no mean-level differences reported for the age groups 6–7 and 8–9 for each of the Big Five. Small mean-level decreases in Emotional Instability were found for age groups 10–11, F(1, 199) = 9.57, p < .01, ε² = .05, and 12–13, F(1, 208) = 9.86, p < .01, ε² = .05; and decreases in Imagination, F(1, 208) = 10.11, p < .01, ε² = .05, and Conscientiousness, F(1, 208) = 4.13, p < .05, ε² = .02, were found for age group 12–13. No mean-level domain changes were found for Extraversion and Benevolence. Mean-level stability analyses at the facet level (not reported in a table) showed a mean decrease for Dominance, F(1, 86) = 5.27, p < .05, ε² = .06, Optimism, F(1, 86) = 6.42, p < .05, ε² = .07, and Creativity, F(1, 86) = 6.89, p < .01, ε² = .07, for age group 6–7; a mean increase for Altruism, F(1, 181) = 5.46, p < .05, ε² = .03, for age group 8–9; and mean decreases for Energy, F(1, 199) = 10.87, p < .001, ε² = .05, Anxiety, F(1, 199) = 12.10, p < .05, ε² = .06, and Creativity, F(1, 199) = 48.55, p < .001, ε² = .20, for age group 10–11. The largest number of changes was found for age group 12–13, with mean decreases for facets across all domains: Irritability, F(1, 208) = 11.52, p < .001, ε² = .05; Achievement Striving, F(1, 208) = 13.31, p < .001, ε² = .06; Energy, F(1, 208) = 4.84, p < .05, ε² = .03; Expressiveness, F(1, 208) = 7.62, p < .01, ε² = .04; Anxiety, F(1, 208) = 17.91, p < .001, ε² = .08; Creativity, F(1, 208) = 15.31, p < .001, ε² = .07; and Curiosity, F(1, 208) = 14.41, p < .001, ε² = .07. In general, the magnitude of these differences was small, except for the decrease in Creativity for age group 10–11, and the majority of the observed differences were not significant after application of the Bonferroni correction for the number of statistical tests.

Individual-Level Continuity

We also examined whether mean-level continuity extended to the individual level, examining the number of individuals showing decreased, equal, or increased trait scores, using the RCI (see Footnote 2; Christensen & Mendoza, 1986; Jacobson & Truax, 1991). Robins et al. (2001) and Roberts, Caspi, and Moffitt (2001) used this RCI to examine reliable change across the FFM traits. The RCI was developed to assess the clinical significance of change after therapeutic intervention and is computed as RCI = (X2 − X1)/Sdiff, where X1 represents a person’s score at Time 1, X2 represents that same person’s score at Time 2, and Sdiff is the standard error of difference between the two test scores. Computing RCIs for a person’s Big Five profile enables researchers to determine how many individuals remain stable on their five-trait pattern across time but also provides information on the frequency of FFM individual-level change patterns. A child can, for example, be stable on four of the FFM dimensions but decline on Emotional Instability. This type of analysis is particularly useful to examine the kind and direction of individual changes but is also informative to describe the denser or more frequent trait-change configuration patterns in a large sample. Frequent individual trait-change configurations point to the kind of changes that can be expected during particular life stages, especially when replicable across samples of the same age range. Difference scores per individual are compared with the distribution of change scores that would be expected from error of measurement alone, which separates true change from change attributable to measurement error. Such analyses require independent test–retest reliability estimates. For the present analyses, HiPIC test–retest estimates for maternal ratings (see Footnote 3) across a 12-week interval reported in the manual (Mervielde, De Clercq, et al., 2005) were used, that is, .72 (Emotional Instability), .74 (Extraversion), .83 (Imagination), .78 (Benevolence), and .82 (Conscientiousness). Change scores exceeding a 95% confidence interval are assumed to represent true change (McCrae et al., 2002; Robins et al., 2001).
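The RCI computation described above can be sketched as follows. The sketch assumes the Jacobson and Truax (1991) definition of the standard error of the difference, built from a scale standard deviation and a test–retest reliability. The reliability of .72 is the manual value for Emotional Instability quoted in the text; the standard deviation of 5.0 and the two scores are hypothetical.

```python
import math


def reliable_change_index(x1, x2, sd, reliability):
    """RCI = (X2 - X1) / Sdiff, with Sdiff following Jacobson and Truax (1991)."""
    se_measurement = sd * math.sqrt(1.0 - reliability)  # standard error of measurement
    s_diff = math.sqrt(2.0 * se_measurement ** 2)       # standard error of the difference
    return (x2 - x1) / s_diff


def classify(rci, critical=1.96):
    """Label change outside the 95% band around zero as reliable."""
    if rci > critical:
        return "increased"
    if rci < -critical:
        return "decreased"
    return "stable"


# Hypothetical child: Emotional Instability score of 22 at Time 1 and 14 at Time 2.
rci = reliable_change_index(x1=22.0, x2=14.0, sd=5.0, reliability=0.72)
print(round(rci, 2), classify(rci))
```

Applying `classify` to each of the five domains for one child gives the kind of five-trait change pattern that is tallied across the sample.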
The majority of participants did not change FFM positions during the 36-month interval across the different age groups. Sixty-seven percent (age group 8–9 years) to 77% (age group 6–7 years) were ascribed similar FFM scores at the two assessment points. If children or adolescents changed on personality scores, change was usually restricted to one (about 20% across age groups) or two (about 5%–10%) FFM domains. None of the participants in the total sample (N = 682) changed on each of the FFM domains, and only 1 adolescent of age group 12–13 changed scores on four of the FFM domains. Some trait change patterns were denser across age groups, with a substantial number of individuals (more than 10%) from age groups 8–9 and 12–13 years exhibiting decreased scores on Imagination.

Footnote 2. There has been some debate about the appropriateness of the RCI to account for regression-to-the-mean effects when used to evaluate the effects of therapeutic interventions. However, this criticism applies less to the present application of the RCI, because no explicit assumptions are made regarding the direction of the expected change. Individuals can become, for example, more extraverted or more introverted.

Footnote 3. An anonymous reviewer pointed out that we do not have short-term test–retest correlations for HiPIC ratings averaged across parents, which is true. However, as we demonstrated, single parental ratings are only about .05 to .10 smaller across a 3-year interval than averaged ratings, so it can be expected that differences are even smaller across a shorter interval.

Table 6
HiPIC Domain Mean-Level Continuity Analyses by Age Group Across 36 Months

| Domain and age group | Time 1 | Time 2 | F | p | ε² |
|---|---|---|---|---|---|
| Emotional Instability | | | | | |
| 6–7 years | 22.41 | 22.79 | .36 | ns | |
| 8–9 years | 21.85 | 20.57 | .76 | ns | |
| 10–11 years | 21.39 | 20.32 | 9.57 | .01 | .05 |
| 12–13 years | 21.69 | 20.11 | 9.86 | .01 | .05 |
| Extraversion | | | | | |
| 6–7 years | 29.17 | 28.67 | 2.33 | ns | |
| 8–9 years | 28.85 | 28.15 | .22 | ns | |
| 10–11 years | 28.70 | 27.63 | 1.29 | ns | |
| 12–13 years | 27.82 | 26.89 | 2.14 | ns | |
| Imagination | | | | | |
| 6–7 years | 30.38 | 29.25 | .92 | ns | |
| 8–9 years | 30.67 | 29.45 | 2.57 | ns | |
| 10–11 years | 29.86 | 28.59 | 3.31 | ns | |
| 12–13 years | 28.68 | 27.58 | 10.11 | .01 | .05 |
| Benevolence | | | | | |
| 6–7 years | 26.56 | 26.94 | .83 | ns | |
| 8–9 years | 27.90 | 28.42 | .03 | ns | |
| 10–11 years | 28.00 | 28.44 | 1.31 | ns | |
| 12–13 years | 27.90 | 27.92 | 1.88 | ns | |
| Conscientiousness | | | | | |
| 6–7 years | 26.20 | 25.66 | .01 | ns | |
| 8–9 years | 26.59 | 26.28 | 1.13 | ns | |
| 10–11 years | 26.24 | 26.02 | .03 | ns | |
| 12–13 years | 25.80 | 25.22 | 4.13 | .05 | .02 |

Note. 6–7 years: N = 88, 8–9 years: N = 183, 10–11 years: N = 201, and 12–13 years: N = 210; F according to Wilks’s lambda; degrees of freedom for all analyses are (1, 86) for 6–7 years, (1, 181) for 8–9 years, (1, 199) for 10–11 years, and (1, 208) for 12–13 years; ns = nonsignificant. HiPIC = Hierarchical Personality Inventory for Children.

Ipsative Continuity

Ipsative continuity is usually examined with two methods. The first approach relies on Cronbach and Gleser’s (1953) observation that individual profiles can vary in three major ways: elevation (the average level of scores), scatter (the variability of scores), and shape (the patterning of scores). Cronbach and Gleser (1953) developed three indices, D², D′², and D″², for quantifying these sources of variance. D² is sensitive to differences in elevation, scatter, and shape and quantifies the squared differences between Big Five traits at two assessment occasions. D′² is only sensitive to differences in scatter and shape and quantifies the squared differences between Big Five profiles after each profile has been centered around its mean. Finally, D″² only reflects differences in shape and quantifies the squared differences between profiles after each profile has been standardized (Cronbach & Gleser, 1953; Robins et al., 2001). Using these profile similarity indices, Robins and colleagues (2001) demonstrated that only 17% of their undergraduate sample showed significant changes in the shape of their profile across a 4-year interval. The majority of the changes in profiles were related to changes in elevation and/or scatter but not shape.

Parallel to the study by Robins et al. (2001, p. 626), probabilities were estimated by simulating trait scores on a sample of 50,000 individuals with identical levels of elevation, scatter, and shape in a person’s profile at the two measurement points and examining corresponding distributions. Simulated trait scores were based on means, variances, and covariances estimated from the real data, and the test–retest reliability coefficients reported in the manual (Mervielde, De Clercq, et al., 2005) were used to estimate the error variance. Across the four age groups, the D² values ranged from 35.70 to 45.68, the D′² values from 22.88 to 30.46, and the D″² values from .20 to .22. To interpret these values, we simulated four samples of 50,000 participants, assuming no reliable changes in profiles across time: that is, with similar scores for elevation, scatter, and shape per age group. This simulation produced distributions for the D², D′², and D″² indices per age group, with the 95th percentiles for age group 6–7 years at 98.65, 84.46, and 1.31, respectively; for age group 8–9 years at 89.62, 76.16, and 1.19, respectively; for age group 10–11 years at 93.24, 79.11, and 1.20, respectively; and finally, for age group 12–13 years at 108.60, 92.58, and 1.39, respectively. Individuals with values beyond these 95th percentile values were considered to have significantly changed profiles. Respectively, 9.1%, 14.8%, 14.9%, and 16.7% of the children per age group had D² values beyond the simulated cutoffs; 5.7%, 10.4%, 11.4%, and 12.9% had D′² values beyond the simulated cutoffs; and 9.1%, 8.2%, 6.0%, and 9.0% had D″² values beyond the respective cutoffs, suggesting that children’s and adolescents’ profiles primarily reflected changes of elevation and scatter, but less so in terms of shape. Less than 10% of the individuals across all age groups exhibited changes in the shape of the profile.

A second but related approach to examining ipsative stability is to compute Q correlations, that is, within-person correlations across the HiPIC domains or facets at Time 1 and Time 2. Robins et al. (2001) found that Big Five profile correlations ranged from −.95 to .97 during college years, with a mean of .61 (SD = .39) and a median of .76. Within-person correlations have to be evaluated against the distribution of within-person correlations that can be found in a sample with a similar mean and standard deviation, but in which profiles are randomly paired across assessment occasions. Robins et al. (2001) found an average value of .20 for a simulated data set.
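The three Cronbach and Gleser (1953) indices described above can be sketched for a single person as follows. The sketch assumes that "standardized" means centering each profile and dividing by its scatter, taken here as the square root of the summed squared deviations, so that D″² ranges from 0 to 4. The five domain scores are hypothetical, and a completely flat profile (zero scatter) is not handled.

```python
import math


def _center(profile):
    """Remove elevation: subtract the profile mean from every score."""
    mean = sum(profile) / len(profile)
    return [x - mean for x in profile]


def _unit_scatter(centered):
    """Remove scatter: divide a centered profile by its scatter (assumed definition)."""
    scatter = math.sqrt(sum(x * x for x in centered))
    return [x / scatter for x in centered]


def _sum_sq_diff(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def profile_distances(time1, time2):
    """Return (D2, D'2, D''2) for one person's profile at two occasions."""
    d2 = _sum_sq_diff(time1, time2)                   # elevation + scatter + shape
    c1, c2 = _center(time1), _center(time2)
    d2_prime = _sum_sq_diff(c1, c2)                   # scatter + shape
    d2_double_prime = _sum_sq_diff(_unit_scatter(c1), _unit_scatter(c2))  # shape only
    return d2, d2_prime, d2_double_prime


# Hypothetical domain scores (EI, E, I, B, C) for one child.
t1 = [22.0, 29.0, 30.0, 27.0, 26.0]
t2_shifted = [x + 2.0 for x in t1]            # pure change in elevation
t2_reshaped = [26.0, 27.0, 30.0, 29.0, 22.0]  # same elevation, different shape
print(profile_distances(t1, t2_shifted))
print(profile_distances(t1, t2_reshaped))
```

In the first example only D² is nonzero, because a uniform shift changes elevation but neither scatter nor shape; in the second, all three indices are positive.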


Q correlations across the 18 HiPIC facets were computed per age group and compared with the distributions of Q correlations for simulations randomly pairing Time 1 with Time 2 assessments (i.e., data were from the same group of persons, but scores were paired randomly across time) per age group, having identical means and standard deviations at Time 1 and Time 2 as the real data. Across the four age groups, the median Q correlations ranged from .81 to .85, far above the median Q correlations in the simulated data (.35 to .43), suggesting stability of the within-individual facet-trait profile for a large number of children and adolescents.
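The comparison of observed Q correlations with a random-pairing baseline can be sketched as below. The data are simulated purely for illustration (50 hypothetical children, 18 facets, a shared normative facet profile plus individual deviations and noise). The number of shuffles and all parameter values are assumptions, not the settings used in the study.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def q_correlations(time1, time2):
    """Within-person correlation across facets, one value per person."""
    return [pearson(p1, p2) for p1, p2 in zip(time1, time2)]


def random_pairing_median(time1, time2, n_shuffles=200, seed=0):
    """Average median Q correlation when Time 2 profiles are paired at random."""
    rng = random.Random(seed)
    medians = []
    for _ in range(n_shuffles):
        shuffled = time2[:]
        rng.shuffle(shuffled)
        medians.append(statistics.median(q_correlations(time1, shuffled)))
    return statistics.fmean(medians)


# Simulated illustration: a normative facet profile shared by all children,
# stable individual deviations, and occasion-specific noise.
rng = random.Random(1)
facet_means = [rng.gauss(0.0, 1.0) for _ in range(18)]
time1, time2 = [], []
for _ in range(50):
    true_profile = [m + rng.gauss(0.0, 1.0) for m in facet_means]
    time1.append([t + rng.gauss(0.0, 0.5) for t in true_profile])
    time2.append([t + rng.gauss(0.0, 0.5) for t in true_profile])

observed_median = statistics.median(q_correlations(time1, time2))
baseline_median = random_pairing_median(time1, time2)
print(round(observed_median, 2), round(baseline_median, 2))
```

The baseline is above zero because randomly paired children still share the normative facet profile, which is why observed Q correlations are judged against it rather than against zero.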

Behavior Genetic Determinants of Continuity

Saturated model. Because of the small sample size, we combined MZ twins and DZ twins across gender to increase the power of the study. If nonsignificant effects of sex on variances and covariances are found, MZ male and MZ female twins can be combined in one group, and DZ male, DZ female, and DZ twins of opposite sex can be combined in one group. To allow sex differences in means, we implemented sex as a covariate in the genetic model fitting. Tests were conducted with a saturated model in Mx (see http://www.psy.vu.nl/mxbib).

Genetic model fitting. Cross-sectional heritabilities and genetic and environmental influences on stability of the domains were estimated with the use of bivariate genetic analyses in Mx (Neale, 2003). A Cholesky decomposition, with sex and age as covariates (definition variables), was used. The Cholesky decomposition is descriptive and not driven by a specific developmental hypothesis. It decomposes a covariance matrix into genetic and nongenetic covariance matrices and is a first approach to obtaining genetic and environmental correlations across time in longitudinal datasets. Because of the small sample sizes, the analyses lacked power to test for the significance of the distinct variance components. However, significant effects of sex (girls showed higher ratings of Emotional Instability, Benevolence, and Conscientiousness at Time 1 and Time 2) and age (older children showed lower levels of Emotional Instability and Extraversion at Time 1 and Time 2) on the mean were found. Estimates for the full ACE model with the 95% confidence intervals are given.

No significant differences in variances and covariances for boys and girls were observed for any of the domains (p > .05 in the saturated model fitting for all domains at the two time points), so for the remaining genetic analyses, MZ boys and girls were combined in one group, and DZ boys and girls (including twins of opposite sex) were combined in one group. Standardized estimates (with 95% confidence intervals) of additive genetic, shared environmental, and nonshared environmental influences on all domains at Time 1 and Time 2 and on the covariance between Time 1 and 2 are given in Table 7. Heritability for the distinct domains varied from 7% for Emotional Instability at the second assessment occasion to 43% for Imagination at Time 1. Relatively large influences of nonshared environment were observed, partly representing measurement error. Shared environmental influences were small, except for Imagination, but it should be kept in mind that this variance component is hard to detect, and insufficient power gives rise to underestimation of the effects of shared environment. Stability of the personality domains between the two assessment points was accounted for by additive genetic, shared environmental, and nonshared environmental influences (see the boldfaced items in Table 7). The additive genetic standardized estimate for the trait covariance across time varied between .19 (Emotional Instability) and .51 (Benevolence). Estimates for the cross-time trait covariance for the shared environment were usually small, except for Imagination (.34), and estimates for the nonshared environment ranged from .31 (Imagination) to .68 (Emotional Instability). Genetic and environmental correlations (i.e., overlap in influencing genes or environmental factors) between the first and the second assessment points are given in Table 8. Genetic correlations and shared environmental correlations were 1.0 or almost 1.0

Table 7
Standardized Estimates of Additive Genetic, Shared Environmental and Nonshared Environmental Influences on Variances and Covariances

| HiPIC domains | A at Time 1 | A at Time 2 | C at Time 1 | C at Time 2 | E at Time 1 | E at Time 2 |
|---|---|---|---|---|---|---|
| Emotional Instability | | | | | | |
| Time 1 | .23 (.005–.43)ᵃ | | .11 (.00–.23)ᵈ | | .66 (.49–.87)ᵍ | |
| Time 2 | **.19 (−.02–.46)**ᶜ | .07 (.00–.23)ᵇ | **.13 (.00–.27)**ᶠ | .06 (.00–.16)ᵉ | **.68 (.45–.90)**ⁱ | .87 (.71–.97)ʰ |
| Extraversion | | | | | | |
| Time 1 | .39 (.16–.56) | | .06 (.00–.16) | | .55 (.40–.76) | |
| Time 2 | **.40 (.12–.62)** | .28 (.05–.49) | **.10 (.00–.23)** | .10 (.00–.23) | **.50 (.32–.76)** | .61 (.44–.83) |
| Imagination | | | | | | |
| Time 1 | .43 (.28–.57) | | .29 (.17–.40) | | .28 (.20–.41) | |
| Time 2 | **.35 (.18–.52)** | .29 (.11–.45) | **.34 (.21–.46)** | .30 (.20–.41) | **.31 (.20–.46)** | .41 (.29–.57) |
| Benevolence | | | | | | |
| Time 1 | .38 (.17–.54) | | .00 (.00–.05) | | .61 (.46–.83) | |
| Time 2 | **.51 (.24–.69)** | .40 (.16–.57) | **.02 (−.02–.10)** | .07 (.00–.16) | **.47 (.30–.74)** | .53 (.38–.76) |
| Conscientiousness | | | | | | |
| Time 1 | .36 (.14–.53) | | .00 (.00–.07) | | .64 (.47–.86) | |
| Time 2 | **.46 (.20–.65)** | .38 (.15–.55) | **.00 (.00–.07)** | .00 (.00–.06) | **.54 (.35–.80)** | .62 (.45–.85) |

Note. Boldfaced values indicate covariances, and 95% confidence intervals are in parentheses. HiPIC = Hierarchical Personality Inventory for Children; A = heritability; C = shared environment; E = nonshared environment.
ᵃ Heritability Time 1; ᵇ Heritability Time 2; ᶜ Influence of A on covariance between Time 1 and Time 2.
ᵈ Shared environment Time 1; ᵉ Shared environment Time 2; ᶠ Influence of C on covariance between Time 1 and Time 2.
ᵍ Nonshared environment Time 1; ʰ Nonshared environment Time 2; ⁱ Influence of E on covariance between Time 1 and Time 2.


Table 8 Genetic and Environmental Correlations for the HiPIC Domains HiPIC domain and time Emotional instability Time 1 Time 2 Extraversion Time 1 Time 2 Imagination Time 1 Time 2 Benevolence Time 1 Time 2 Conscientiousness Time 1 Time 2

A at Time 1

A at Time 2

C at Time 1

C at Time 2

1 1.00

1

1 1.00

1

1

1 1.00

1

1 .94 1

1

1

.67

1

1 .94

1

1 1.00

1

1 1.00

1

.74

1

1

1 1

.57 1

1

.98

E at Time 2

1

.82

1

E at Time 1

.63

1

1 .97

1

.67

1

Note. HiPIC = Hierarchical Personality Inventory for Children; A = heritability; C = shared environment; E = nonshared environment.

for all HiPIC domains, suggesting one underlying set of genes and one underlying set of shared environmental influences for personality at both assessment occasions. Nonshared environmental correlations varied from .57 to .74, suggesting that large parts overlap but that there are also time-specific nonshared environmental influences, presumably measurement error.
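How genetic and environmental correlations, and each component's share of the cross-time covariance, follow from a bivariate Cholesky decomposition can be illustrated with a small sketch. It is not the Mx model used in the study: the path coefficients are hypothetical, and covariates, confidence intervals, and model fitting are left out.

```python
import math


def component_summary(paths):
    """paths = (p11, p21, p22): lower-triangular Cholesky paths for one component."""
    p11, p21, p22 = paths
    var_t1 = p11 ** 2              # component variance at Time 1
    var_t2 = p21 ** 2 + p22 ** 2   # component variance at Time 2
    cov = p11 * p21                # component covariance across time
    return var_t1, var_t2, cov


def cholesky_stats(a, c, e):
    """Cross-time correlation and share of phenotypic covariance for A, C, and E."""
    parts = {"A": component_summary(a), "C": component_summary(c), "E": component_summary(e)}
    total_cov = sum(cov for _, _, cov in parts.values())
    stats = {}
    for name, (v1, v2, cov) in parts.items():
        corr = cov / math.sqrt(v1 * v2) if v1 > 0 and v2 > 0 else float("nan")
        stats[name] = {"correlation": corr, "share_of_covariance": cov / total_cov}
    return stats


# Hypothetical paths: A and C have no Time 2-specific path (p22 = 0), so their
# cross-time correlations equal 1; E has a sizeable occasion-specific part.
stats = cholesky_stats(a=(0.60, 0.55, 0.0), c=(0.28, 0.20, 0.0), e=(0.75, 0.50, 0.60))
for name, values in stats.items():
    print(name, round(values["correlation"], 2), round(values["share_of_covariance"], 2))
```

A correlation of 1.0 for a component means the Time 2-specific path is zero, which corresponds to the interpretation of one underlying set of influences at both occasions.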

Discussion

The present study examined five types of personality continuity in children and adolescents as rated by their parents on a lexically based measure specifically designed to assess personality at a young age. The present work extends two recent MAs on differential (Roberts & DelVecchio, 2000) and mean-level (Roberts et al., 2006) personality continuity, assessing the two continuity types for younger age groups and examining three additional types of continuity for children and adolescents. Moreover, genetic and environmental influences on trait continuity were also estimated.

Structural Continuity

The SEM analyses showed that the covariance among HiPIC domains was clearly invariant across a 36-month interval for the different age groups, providing evidence for structural stability at the five-factor level from childhood to late adolescence. The results from these analyses at the domain level were also confirmed by the congruency analyses comparing factor structures for the different age groups across time points. All congruence coefficients were larger than .85, a value considered indicative of structural replicability (Haven & ten Berge, 1977), except for a minor deviation (.84) for Imagination in the 10–11 age group. The most stringent test of structural personality continuity, examining the longitudinal invariance of the 153 facet intercorrelations, was positive only for the oldest age group of 12–13 years, suggesting that structural invariance applies to both hierarchical levels of the HiPIC, at least from 12–13 to 15–16 years. Structural invariance at the HiPIC facet level could not be demonstrated for the 8–9-year-olds and the 10–11-year-olds, although the chi-square values were small considering the large number of parameters, and the CFI indices were 1.0 or close to 1.0. The present analyses clearly underscored that the positioning of the major dimensions of personality in childhood and adolescence is stable across a substantial time interval for different ages in childhood and adolescence. The demonstration of structural continuity for these age groups is a prerequisite before ipsative, mean-, and individual-level personality continuity data can be adequately examined and interpreted (Biesanz et al., 2003) and hence further legitimates the interpretation of MA findings reported for adolescence (Roberts et al., 2006).

Differential or Rank-Order Continuity

The rank ordering of individuals across the 3-year interval was demonstrated to be very stable, with coefficients (uncorrected for unreliability) usually beyond .70 for the domains of Extraversion, Imagination, Benevolence, and Conscientiousness. Differential continuity for Emotional Instability was somewhat lower but still above .60. These findings largely generalized across the different age groups, except for a decline in rank-order continuity for Extraversion with increasing age. Analyses at the facet level produced very similar results. Slightly lower correlation coefficients were observed for maternal ratings. These could be further corrected for unreliability, showing correlations, averaged across FFM domains, ranging from .93 for the 6–7-year-olds to .86 for the 12–13-year-olds.

The results obtained here differ in three respects from previous reports on rank-order continuity. First, much higher correlations were found than those described in the MA on rank-order stability of Roberts and DelVecchio (2000) and the self-report results described by McCrae et al. (2002). The coefficients obtained in the present work double the correlations reported for this life stage in the MA of Roberts and DelVecchio (2000), estimated to be .43 for 12-year-olds, and they are also much higher than the median 4-year stability coefficient of .38 reported by McCrae et al. (2002) in a sample of gifted youth. The present coefficients obtained from parental ratings are similar to the correlations reported for adults, underscoring that FFM traits, when rated by parents, already show differential stability at a young age. Second, correlation coefficients were similar across age groups, except for a decrease of differential stability for Extraversion in the oldest group. These findings do not confirm the expectation (Roberts & DelVecchio, 2000) that differential continuity will increase with age. According to the present data, individuals have a stable positioning relative to each other in elementary and the first years of secondary school, except for Extraversion, at least when parental reports are considered. A final difference is that in adulthood, self- and peer-reported rank-order continuity provide more similar coefficients (Costa & McCrae, 1988), whereas the findings from the present study, compared with those reported by Roberts and DelVecchio (2000) and McCrae et al. (2002), suggest large differences in rank-order stability of adolescent self-reports versus parental reports.

It could be argued that the higher rank-order coefficients for parental ratings are due to the tendency of parents to retain a lasting image of their child despite real changes. Indeed, involving the same observers and measures across assessment occasions makes it difficult to disentangle observer from developmental effects. However, the analysis correlating HiPIC parental assessments at Time 1 with QBF self-ratings at follow-up showed that the lower cross-time correlations were mainly attributable to the different informant perspectives (parent vs. self-rating) and the different inventories (HiPIC vs. QBF), rather than questioning differential stability. Relying on path-analytic arguments (McCrae, 1994), we see that true-score stability estimates for all HiPIC domains were .70 or higher, except for a lower value of .57 for Benevolence in a design with different raters but also different measures. These values are certainly higher than those meta-analytically computed by Roberts and DelVecchio (2000) but remain considerably lower than similarly estimated true-score stabilities in adults as reported by Costa and McCrae (1988). The lower validities of the self-reports in other studies might be alternatively explained by a limited insight in personality in early adolescence.

Mean-Level Stability

No mean-level changes in childhood (age groups 6–7 and 8–9 years) were reported at the HiPIC domain level. In line with the findings of Roberts et al. (2006), mean-level decreases in Emotional Instability were reported from age 10 onwards, and no changes were reported for Benevolence. Contrary to the Roberts et al. (2006) findings, Conscientiousness and Imagination showed small decreases in age group 12–13, whereas Roberts et al. found no change for Conscientiousness and found increased scores, although insignificant, for Openness to Experience. Roberts et al. (2006) distinguished two facets of Extraversion, Social Vitality and Social Dominance, and reported a significant increase for Social Dominance in adolescence. Social Vitality reflects traits such as sociability, positive affect, gregariousness, and energy level and is probably closely related to the HiPIC’s Extraversion factor, including Energy, Shyness (R), Expressiveness, and Optimism, whereas their Social Dominance category is probably best reflected by the HiPIC’s Dominance facet of Benevolence. No mean-level changes for Extraversion were found in the present work, and mean-level Dominance scores slightly decreased for the youngest age group in childhood but did not show normative changes thereafter. The majority of differences at the facet level were reported for the late adolescents in our sample, although the magnitude of these differences was small, except the decrease in Creativity reported for the 10–11-year-olds. Curiosity and also Creativity showed a slight decrease in the 12–13-year-olds. These findings are counter to those of Roberts et al., who found an increase in Openness during adolescence. An anonymous reviewer suggested that perhaps some of the HiPIC Curiosity and Creativity items are less applicable for these older age groups and are hence less frequently endorsed. However, inspection of the content of the Curiosity and Creativity items suggests that all items are applicable to a broader age range, including adolescence.

Individual-Level Changes

The analysis of the individual trait-change patterns demonstrated that two thirds (age group 8–9) to three quarters (age group 6–7) of all individuals did not show reliable change on any of the FFM dimensions, underscoring individual-level continuity for a substantial number of individuals. If change occurred, it was usually restricted to one FFM domain. Only one trait-change pattern occurred more frequently, namely, children and adolescents showing decreased scores on Imagination, especially in the 8–9-year-old group and the 12–13-year-old group. These observations are counter to the findings of Roberts et al. (2006) and McCrae et al. (2002). The implications of the individual-level findings for the personality research agenda are twofold. First, the present findings underscore the necessity to study the determinants of stability (Caspi et al., 2005), because the majority of children and adolescents are ascribed stable positions on traits. Second, determinants of change have to be investigated (Mroczek & Spiro, 2003), although the absence of dense trait-change patterns in the present samples suggests that the nature and direction of trait changes vary widely within the population, underscoring the necessity to examine very large groups. If the divergent pattern of individual trait changes were replicated in such large samples, this would suggest that trait change is largely specific to the individual or even random, rather than normative.

Ipsative Stability

Although both individual-level and ipsative continuity indices focus on the individual, they are psychometrically very different. The computation of the RCI to inspect individual-level changes requires a comparison with the standard deviation of the population, whereas the analysis of ipsative stability relies exclusively on the individual’s scores (De Fruyt, Van Leeuwen, Bagby, Rolland, & Rouillon, 2006). Ipsative stability was examined using two procedures applied to different hierarchical levels of the FFM. Q correlations were computed across the individual’s facet scores at the two assessment points, whereas Cronbach and Gleser’s (1953) D², D′², and D″² indices were used to examine differences among HiPIC domain-trait profiles across time. The results from both types of analyses strongly converged. Substantially higher Q correlations within individuals across time were observed than for simulated ratings, assuming a random pairing of Time 1 and Time 2 scores. Using Cronbach and Gleser’s (1953) D indices, we found that less than 10% of the sample exhibited changes in the shape of their domain profile, suggesting that if change occurs, it is mostly change of elevation or scatter. Similar ipsative analyses conducted across a 4-year interval in undergraduate students (Robins et al., 2001) showed that 17% demonstrated changes in the shape of their profile, suggesting that individual-level changes are observed more frequently in young adulthood than in childhood or adolescence.

Behavior genetic determinants of continuity. The twin and sibling sample, although small in size, provided a unique opportunity to decompose trait variances cross-sectionally as well as trait covariances across assessment points. In line with previous behavior genetic studies (Caspi et al., 2005; Loehlin, 1992), we estimated the additive genetic variance for the HiPIC domains to be around .30, varying between .07 (Emotional Instability, Time 2) and .43 (Imagination, Time 1). Smaller estimates were obtained for the shared environment, except for Imagination, and the remaining variance was explained by the nonshared environment, including measurement error. The small variance explained by the shared environment for the HiPIC domains (except for Imagination) has to be interpreted with caution, because the small sample size makes it difficult to detect shared environmental influences. These results are much in line with the patterns observed in other behavior genetic studies on individual differences in personality in adulthood (Jang et al., 1998).
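To make the logic of the variance decomposition concrete, the following minimal Python sketch splits trait variance into additive genetic (A), shared environmental (C), and nonshared environmental (E) components. It deliberately substitutes Falconer's classical approximation from monozygotic and dizygotic twin correlations for the model-fitting approach that a twin and sibling design would normally call for, so it should be read as an illustration of the idea rather than as the procedure behind the estimates reported here. The function name and the two twin correlations are hypothetical.

```python
def falconer_ace(r_mz, r_dz):
    """Approximate standardized A, C, and E components from twin correlations.

    r_mz and r_dz are the trait correlations in monozygotic and dizygotic
    twin pairs. The three components sum to 1 by construction.
    """
    a2 = 2.0 * (r_mz - r_dz)  # additive genetic variance
    c2 = 2.0 * r_dz - r_mz    # shared (common) environment
    e2 = 1.0 - r_mz           # nonshared environment, including measurement error
    return {"A": a2, "C": c2, "E": e2}


# Hypothetical twin correlations, chosen only for illustration
components = falconer_ace(r_mz=0.50, r_dz=0.35)
```

With these illustrative inputs the additive genetic share is approximately .30, the shared environmental share approximately .20, and the remainder is nonshared environment. In small samples the approximation can yield negative estimates for C, which is one reason model fitting is usually preferred.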
Most interesting were the analyses on the genetic–environmental decomposition of the trait covariances (representing stability), showing very similar patterns for Extraversion, Benevolence, and Conscientiousness, with 40% (Extraversion) to 51% (Benevolence) of the covariance across time accounted for by genetic factors, 0% to 10% explained by the shared environment, and around 50% by the nonshared environment. The covariance of Emotional Instability was largely explained by the nonshared environment, to a moderate extent by genetic factors, and for 13% by the common environment. Finally, the three variance components explained an almost equal amount of covariance for Imagination. The magnitude of the genetic and shared environmental correlations (1.0 or near 1.0) suggests that the same set of genes and shared environmental influences determines the continuity observed across time, whereas the lower values for the nonshared environment suggest that different environmental factors, which are unique to an individual, operate across time.

Strengths and limitations. The present study has a number of strengths, including the use of a hierarchical and comprehensive personality measure specifically designed to assess traits at a young age, as well as the availability of a population-based representative sample and a genetic-informative sample. However, this work also has a number of limitations that should be taken into account when interpreting the results. First, continuity was examined across a limited time lag with the use of only one inventory. It could be argued that more change is to be expected across longer time intervals, and the observed trait-change patterns, or the absence of change, could be specific to the HiPIC. For example, the decrease in Curiosity in adolescence might be particular to the HiPIC. It cannot be excluded that more changes will be observed across a longer time interval and for different personality measures.
Second, the continuity types were examined with parents as primary informants on children’s and adolescents’ personality. Useful extensions of this research could include involving teachers as informants for young children and asking adolescents to provide self-ratings. Third, the genetic-informative sample is relatively small, hampering the statistical power of the analyses. The analyses that were conducted are informative, considering the dearth of genetic-informative studies on individual differences in personality development in childhood and adolescence, but larger samples would enable a series of more detailed and powerful analyses, including the exploration of the FFM lower level traits, and behavior genetic analyses of different types of personality continuity, for example, the trait pattern across time. Fourth, a potential threat for longitudinal research is that more stable individuals do not drop out, and hence personality continuity is more likely to be demonstrated than change. However, continuity is demonstrated across all traits, not only for traits related to dropout in longitudinal research, such as Conscientiousness (De Fruyt & Mervielde, 1999). In addition, prolonged participation in studies with children depends more on parental decisions. Selective retention would then require that parents who are more likely to continue participation describe their children as more consistent, which is unlikely to be the case. Finally, the present study included only two assessment occasions. Additional assessment points, preferably across a longer time span, would allow the use of latent-growth curve modeling, provide better ways to handle measurement error, and offer opportunities to address new questions.

In conclusion, the evidence for different types of personality continuity supports and extends previous research demonstrating that the level of continuity in childhood and adolescence is higher than often expected (e.g., compared with young adulthood; Roberts et al., 2006). Caspi and colleagues (2005) argued in this respect that personality trait development is not a continuity-versus-change proposition, but that continuity and change coexist. The major challenge for developmental theories will be to account not only for trait changes but also for trait continuity.

References

Biesanz, J. C., West, S. G., & Kwok, O.-M. (2003). Personality over time: Methodological approaches to the study of short-term and long-term development and change. Journal of Personality, 71, 905–941.
Block, J. (1971). Lives through time. Berkeley, CA: Bancroft.
Buss, A. H., & Plomin, R. (1984). Temperament: Early developing personality traits. Hillsdale, NJ: Erlbaum.
Caspi, A. (1998). Personality development across the life-course. In W. Damon & N. Eisenberg (Eds.), Handbook of child psychology: Social, emotional, and personality development (pp. 311–388). New York: Wiley.
Caspi, A., Roberts, B. W., & Shiner, R. L. (2005). Personality development: Stability and change. Annual Review of Psychology, 56, 453–484.
Christensen, L., & Mendoza, J. L. (1986). A method assessing change in a single subject: An alteration of the RC index. Behaviour Therapy, 17, 305–308.
Cloninger, C. R., Svrakic, D. M., & Przybeck, T. R. (1993). A psychobiological model of temperament and character. Archives of General Psychiatry, 50, 975–990.
Costa, P. T., Jr., & McCrae, R. R. (1988). Personality in adulthood: A six-year longitudinal study of self-reports and spouse ratings on the NEO Personality Inventory. Journal of Personality and Social Psychology, 54, 853–863.
Costa, P. T., Jr., & McCrae, R. R. (1992). Revised NEO Personality Inventory and Five-Factor Inventory Professional Manual. Odessa, FL: Psychological Assessment Resources.
Costa, P. T., Jr., & McCrae, R. R. (1994). Set like plaster? Evidence for the stability of adult personality. In T. F. Heatherton & J. L. Weinberger (Eds.), Can personality change? (pp. 21–40). Washington, DC: American Psychological Association.
Costa, P. T., Jr., Terracciano, A., & McCrae, R. R. (2001). Gender differences in personality traits across cultures: Robust and surprising findings. Journal of Personality and Social Psychology, 81, 322–331.
Cronbach, L. J., & Gleser, G. C. (1953). Assessing similarity between profiles. Psychological Bulletin, 50, 456–473.
De Clercq, B., & De Fruyt, F. (2003). Personality disorder symptomatology in adolescence: A Five-Factor Model perspective. Journal of Personality Disorders, 17, 269–292.
De Clercq, B., De Fruyt, F., Koot, H. M., & Benoit, Y. (2004). Quality of life, perceived competence and traits in children surviving cancer: A state-trait multiple-rater perspective. Journal of Pediatric Psychology, 29, 579–590.
De Clercq, B., De Fruyt, F., & Van Leeuwen, K. (2004). A “little five” lexically-based perspective on personality disorder symptoms in adolescence. Journal of Personality Disorders, 18, 479–499.
De Fruyt, F., & Mervielde, I. (1999). RIASEC types and Big Five traits as predictors of employment status and nature of employment. Personnel Psychology, 52, 701–727.
De Fruyt, F., Mervielde, I., Hoekstra, H. A., & Rolland, J.-P. (2000). Assessing adolescents’ personality with the NEO PI–R. Assessment, 7, 329–345.
De Fruyt, F., Mervielde, I., & Van Leeuwen, K. (2002). The consistency of personality type classification across samples and five-factor measures. European Journal of Personality, 16, 57–72.
De Fruyt, F., Van Leeuwen, K. G., Bagby, R. M., Rolland, J. P., & Rouillon, F. (2006). Assessing and interpreting personality change and continuity in patients treated for major depression. Psychological Assessment, 18, 71–80.
De Raad, B., & Perugini, M. (2002). Big Five factor assessment: Introduction. In B. de Raad & M. Perugini (Eds.), Big Five assessment (pp. 1–26). Göttingen: Hogrefe & Huber.
Digman, J. M. (1963). Principal dimensions of child personality as inferred from teacher’s judgements. Child Development, 34, 43–60.
Digman, J. M. (1990). Personality structure: Emergence of the five-factor model. Annual Review of Psychology, 41, 417–440.
Digman, J. M., & Inouye, J. (1986). Further specification of the 5 robust factors of personality. Journal of Personality and Social Psychology, 50, 116–123.
Dubas, J. S., Gerris, J. R. M., Janssens, J. R. M., & Vermulst, A. A. (2002). Personality types of adolescents: Concurrent correlates, antecedents, and type × parenting interactions. Journal of Adolescence, 25, 79–92.
Erikson, E. (1950). Childhood and society. New York: Norton.
Erikson, E. H. (1968). Identity: Youth and crisis. New York: Norton.
Eysenck, S. B. G. (1963). Junior Eysenck Personality Inventory. San Diego, CA: Educational and Industrial Testing Service and Human Services.
Eysenck, S. B. G., Makaremi, A., & Barrett, P. T. (1994). A cross-cultural study of personality: Iranian and English children. Personality and Individual Differences, 16, 203–210.
Gerris, J. R. M., Houtmans, M. J. M., Kwaaitaal-Roosen, E. M. G., De Schipper, J. C., Vermulst, A. A., & Janssens, J. M. A. M. (1998). Parents, adolescents and young adults in Dutch families: A longitudinal study. Nijmegen, the Netherlands: Institute of Family Studies.
Goldberg, L. R. (1992). The development of markers for the Big-Five factor structure. Psychological Assessment, 4, 26–42.
Goldberg, L. R. (1993). The structure of phenotypic personality traits. American Psychologist, 48, 26–34.
Goldsmith, H. H., & Campos, J. J. (1982). Toward a theory of infant temperament. In R. N. Emde & R. J. Harmon (Eds.), The development of attachment and affiliative systems (pp. 161–193). New York: Plenum.
Gonzalez, R., & Griffin, D. (1999). The correlational analysis of dyad-level data in the distinguishable case. Personal Relationships, 6, 449–469.

Gonzalez, R., & Griffin, D. (2000). On the statistics of interdependence: Treating dyadic data with respect. In W. Ickes & S. Duck (Eds.), The social psychology of personal relationships (pp. 181–213). London: Wiley & Sons.
Griffin, D., & Gonzalez, R. (1995). Correlational analysis of dyad-level data in the exchangeable case. Psychological Bulletin, 118, 430–439.
Grotevant, H. D. (1998). Adolescent development in family contexts. In W. Damon & N. Eisenberg (Eds.), Handbook of child psychology: Social, emotional, and personality development (pp. 1097–1149). New York: Wiley.
Hart, D., Hofmann, V., Edelstein, W., & Keller, M. (1997). The relation of childhood personality types to adolescent behavior and development: A longitudinal study of Icelandic children. Developmental Psychology, 32, 195–205.
Haven, S., & ten Berge, J. M. F. (1977). Tucker’s coefficient of congruence as a measure of factorial invariance: An empirical study (Tech. Rep. No. 290 EX). Groningen, the Netherlands: University of Groningen.
Jacobson, N. S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59, 12–19.
Jang, K. L., McCrae, R. R., Angleitner, A., Riemann, R., & Livesley, W. J. (1998). Heritability of facet-level traits in a cross-cultural twin sample: Support for a hierarchical model of personality. Journal of Personality and Social Psychology, 74, 1556–1565.
John, O. P., Caspi, A., Robins, R. W., Moffitt, T. E., & Stouthamer-Loeber, M. (1994). The little 5: Exploring the nomological network of the 5-factor model of personality in adolescent boys. Child Development, 65, 160–178.
Kohnstamm, G. A., Halverson, C. F., Mervielde, I., & Havill, V. L. (1998). Parental descriptions of child personality. Developmental antecedents of the Big Five? The LEA series in personality and clinical psychology. Mahwah, NJ: Erlbaum.
Lamb, M. E., Chuang, S. S., Wessels, H., Broberg, A. G., & Hwang, C. P. (2002). Emergence and construct validation of the Big Five in early childhood: A longitudinal analysis of their ontogeny in Sweden. Child Development, 73, 1517–1524.
Loehlin, J. C. (1992). Genes and environment in personality development. London, UK: Sage.
McCrae, R. R. (1994). The counterpoint of personality assessment: Self-reports and observer ratings. Assessment, 2, 159–172.
McCrae, R. R., & Costa, P. T., Jr. (1984). Emerging lives, enduring dispositions: Personality in adulthood. Boston: Little, Brown.
McCrae, R. R., & Costa, P. T., Jr. (1990). Personality in adulthood. New York: Guilford.
McCrae, R. R., & Costa, P. T., Jr. (1996). Toward a new generation of personality theories: Theoretical contexts for the five-factor model. In J. S. Wiggins (Ed.), The five-factor model of personality: Theoretical perspectives (pp. 51–87). New York: Guilford.
McCrae, R. R., & Costa, P. T., Jr. (1997). Personality trait structure as human universal. American Psychologist, 52, 509–516.
McCrae, R. R., Costa, P. T., Jr., de Lima, M. P., Simoes, A., Ostendorf, F., Angleitner, A., et al. (1999). Age differences in personality across the adult life span: Parallels in five cultures. Developmental Psychology, 35, 466–477.
McCrae, R. R., Costa, P. T., Jr., Ostendorf, F., Angleitner, A., Hrebickova, M., Avia, M. D., et al. (2000). Nature over nurture: Temperament, personality and lifespan development. Journal of Personality and Social Psychology, 78, 173–186.
McCrae, R. R., Costa, P. T., Jr., Terracciano, A., Parker, W. L., Mills, C. J., De Fruyt, F., & Mervielde, I. (2002). Personality trait development from age 12 to 18: Longitudinal, cross-sectional, and cross-cultural analyses. Journal of Personality and Social Psychology, 83, 1456–1468.
McGue, M., Bacon, S., & Lykken, D. T. (1993). Personality stability and change in early adulthood: A behavioral genetic analysis. Developmental Psychology, 29, 96–109.

Mervielde, I., De Clercq, B., De Fruyt, F., & Van Leeuwen, K. (2005). Temperament, personality and developmental psychopathology as childhood antecedents of personality disorders. Journal of Personality Disorders, 19, 171–201.
Mervielde, I., & De Fruyt, F. (1999). Construction of the Hierarchical Personality Inventory for Children (HiPIC). In I. Mervielde, I. Deary, F. De Fruyt, & F. Ostendorf (Eds.), Personality psychology in Europe. Proceedings of the Eight European Conference on personality psychology (pp. 107–127). Tilburg, the Netherlands: Tilburg University Press.
Mervielde, I., De Fruyt, F., & De Clercq, B. (2005). Handleiding Hiërarchische Persoonlijkheidsvragenlijst voor Kinderen [Manual Hierarchical Personality Inventory for Children]. Ghent, Belgium: Ghent University, Department of Developmental, Personality, and Social Psychology.
Mroczek, D. K., & Spiro, A., III. (2003). Modeling intraindividual change in personality traits: Findings from the Normative Aging Study. Journal of Gerontology Psychology Science, 58B, 153–165.
Neale, M. C. (2003). Mx: Statistical modeling (3rd ed.). Richmond: University of Virginia.
Piaget, J. (1983). Piaget’s theory. In P. H. Mussen (Ed.), Handbook of child psychology: Vol. 1, History, theory, and methods (pp. 103–128). New York: Wiley.
Roberts, B. W., Caspi, A., & Moffitt, T. E. (2001). The kids are alright: Growth and stability in personality development from adolescence to adulthood. Journal of Personality and Social Psychology, 81, 670–683.
Roberts, B. W., & DelVecchio, W. F. (2000). The rank-order consistency of personality traits from childhood to old age: A quantitative review of longitudinal studies. Psychological Bulletin, 126, 3–25.
Roberts, B. W., Walton, K., & Viechtbauer, W. (2006). Patterns of mean-level change in personality traits across the life course: A meta-analysis of longitudinal studies. Psychological Bulletin, 132, 1–25.
Robin, A. L., & Foster, S. L. (1989). Negotiating parent–adolescent conflict: A behavioral–family systems approach. New York: Guilford Press.
Robins, R. W., Fraley, R. C., Roberts, B. W., & Trzesniewski, K. H. (2001). A longitudinal study of personality change in young adulthood. Journal of Personality, 69, 617–640.
Robins, R. W., & Tracy, J. L. (2003). Setting an agenda for a person-centered approach to personality development. Monographs of the Society for Research in Child Development, 68, 110–122.
Rothbart, M. K., & Derryberry, D. (1981). Development of individual differences in temperament. In M. E. Lamb & A. L. Brown (Eds.), Advances in developmental psychology (pp. 37–86). Hillsdale, NJ: Erlbaum.
Scholte, R. H. J., van Aken, M. A. G., & van Lieshout, C. F. M. (1997). Adolescent personality factors in self-ratings and peer nominations and their prediction of peer acceptance and peer rejection. Journal of Personality Assessment, 69, 534–554.
Thomas, A., Chess, L. A., Birch, H. G., Herzig, M. E., & Korn, S. (1963). Behavioral individuality in early childhood. New York: New York University Press.
Thomas, A., & Chess, S. (1977). Temperament and development. New York: Brunner/Mazel.
Van Hoecke, E., De Fruyt, F., De Clercq, B. J., Hoebeke, P., & Van de Walle, J. (2006). Internalising and externalising problem behavior in childhood enuresis: A Five-Factor Model perspective. Journal of Pediatric Psychology, 31, 460–468.
Van Leeuwen, K. (2004). Parenting and personality as predictors of child and adolescent internalizing and externalizing problem behavior. Unpublished doctoral dissertation, Ghent University, Ghent, Belgium.
Van Leeuwen, K., De Fruyt, F., & Mervielde, I. (2004). A longitudinal study of the utility of the resilient, overcontrolled and undercontrolled personality types as predictors of children’s and adolescents’ problem behavior. International Journal of Behavioral Development, 28, 210–220.
Van Leeuwen, K. G., Mervielde, I., Braet, C., & Bosmans, G. (2004). Child personality and parental behavior as moderators of problem behavior: Variable- and person-centered approaches. Developmental Psychology, 40, 1028–1046.
Van Lieshout, C. F. M., & Haselager, G. J. T. (1994). The Big Five personality factors in Q-sort descriptions of children and adolescents. In C. F. Halverson, Jr., G. A. Kohnstamm, & R. P. Martin (Eds.), The developing structure of temperament and personality from infancy to adulthood (pp. 293–318). Hillsdale, NJ: Erlbaum.
Vollrath, M., & Landolt, M. A. (2005). Personality predicts quality of life in pediatric patients with unintentional injuries: A one-year follow-up study. Journal of Pediatric Psychology, 30, 481–491.

Received May 12, 2005
Revision received April 4, 2006
Accepted April 6, 2006

Journal of Personality and Social Psychology 2006, Vol. 91, No. 3, 553–567

Copyright 2006 by the American Psychological Association 0022-3514/06/$12.00 DOI: 10.1037/0022-3514.91.3.553

Terror Management and Religion: Evidence That Intrinsic Religiousness Mitigates Worldview Defense Following Mortality Salience

Eva Jonas and Peter Fischer
Ludwig-Maximilians-Universität

Terror management theory suggests that people cope with awareness of death by investing in some kind of literal or symbolic immortality. Given the centrality of death transcendence beliefs in most religions, the authors hypothesized that religious beliefs play a protective role in managing terror of death. The authors report three studies suggesting that affirming intrinsic religiousness reduces both death-thought accessibility following mortality salience and the use of terror management defenses with regard to a secular belief system. Study 1 showed that after a naturally occurring reminder of mortality, people who scored high on intrinsic religiousness did not react with worldview defense, whereas people low on intrinsic religiousness did. Study 2 specified that intrinsic religious belief mitigated worldview defense only if participants had the opportunity to affirm their religious beliefs. Study 3 illustrated that affirmation of religious belief decreased death-thought accessibility following mortality salience only for those participants who scored high on the intrinsic religiousness scale. Taken as a whole, these results suggest that only those people who are intrinsically vested in their religion derive terror management benefits from religious beliefs.

Keywords: intrinsic versus extrinsic religiousness, terror management theory, mortality salience, worldview defense, terrorist attacks

Best of all, of course, religion solves the problem of death, which no living individuals can solve, no matter how they would support us. (Ernest Becker, 1973, p. 203)

In many cultures, religious beliefs and practices provide a forum for the expression of spirituality and serve several psychological functions, such as providing a system of shared meaning and social practices (Becker, 1971; Berger, 1969), explaining the unknown (Goodenough, 1986), and protecting people against the terror of death (Solomon, Greenberg, & Pyszczynski, 1991). In addition, religion is a source of support and guidance that can help to maintain and enhance character and mental health (Allport, 1950; Bergin, 1991). Religion is an important component of many people’s lives. In fact, a recent Gallup Poll found that 89% of Americans perceived themselves as being religious, 59% indicated that their beliefs were very important to them, and 44% had attended a church or synagogue in the last 7 days (Newport, 2004).

We propose that one of the key functions of religion is to provide people with the opportunity to live a meaningful life, to feel significant and eternal. We hypothesize that in this way, religion helps to manage the terror inherent in the most basic of all human fears: the fear of death. We will apply terror management theory (TMT; Greenberg, Solomon, & Pyszczynski, 1997) to explain why and how religion serves this function. Although the idea that religion’s core function is to provide people with a means to transcend death is not new (e.g., Allen, 1897/2000; Feuerbach, 1843/1980; Freud, 1915/1959), so far there is no empirical evidence that religion defuses the existential terror that the awareness of mortality imposes on the human cognitive system. In the current article, we will provide evidence suggesting that intrinsically religious people are able to effectively manage the terror of death.

TMT and Research

Based on the work of Ernest Becker (1973), TMT assumes that a motivation to deny death serves as a unifying concept for explaining people’s thinking and behavior. The theory posits that the pairing of an instinct for self-preservation with a capacity for self-consciousness makes humans aware of their mortality and creates the potential for paralyzing terror. However, TMT suggests that two key psychological structures have evolved to help people manage this terror (unconsciously). These structures are (a) the adoption of a cultural worldview and (b) self-esteem. Cultural worldviews are cultural belief systems that provide an explanation for existence; imbue the world with order, meaning, and permanence; and provide a set of standards through which individuals can attain a sense of personal value. Moreover, these worldviews promise protection and death transcendence to those who meet the standards of value by providing literal or symbolic immortality (e.g., through concepts of soul, heaven, or nirvana with regard to literal immortality and lasting cultural achievements as a vehicle for symbolic immortality). Self-esteem is acquired by believing in the cultural worldview and living up to its standards. Together, these psychological structures constitute an anxiety buffer that helps people to cope with the problem of death.

Eva Jonas and Peter Fischer, Department of Psychology, Social Psychology, Ludwig-Maximilians-Universität, Munich, Germany. We are grateful to Andreas Kastenmüller for his help in the data collection. We would like to thank Jeff Greenberg for his helpful comments on an earlier version of this article. Correspondence concerning this article should be addressed to Eva Jonas, who is now at the Department of Psychology, University of Salzburg, Hellbrunner Straße 34, A-5020, Salzburg, Austria. E-mail: [email protected]

JONAS AND FISCHER


Over the last 20 years, a wide array of studies has been conducted to test hypotheses derived from TMT. The vast majority of TMT studies have shown that mortality salience (MS) intensifies people’s efforts to sustain faith in their worldview and to strive for self-esteem (for review, see Solomon, Greenberg, & Pyszczynski, 2004). For example, following MS, people have been found to identify with important in-groups (Dechesne, Janssen, & van Knippenberg, 2000) and to react more positively to others who support their worldview and more negatively to those who violate or challenge their worldview (Greenberg et al., 1997). When reminded of their own death, people are more inclined to reward others who follow cultural norms and to punish those who violate cultural norms (Rosenblatt, Greenberg, Solomon, Pyszczynski, & Lyon, 1989). Furthermore, people themselves behave more closely in accordance with salient cultural norms, by showing increased tolerance to different others, generosity, fairness, or appearance-enhancing behavior (see, e.g., Arndt, Schimel, & Goldenberg, 2003; Greenberg, Simon, Pyszczynski, Solomon, & Chatel, 1992; Jonas, Fritsche, Greenberg, Martens, & Niesta, 2006). Related research suggests that the reactions observed are specific to reminders about one’s own death and do not occur in similar strength if people are simply reminded of some other negative factor, such as failing an exam or suffering from dental pain (Greenberg et al., 1997). In addition, it has been shown that worldview defense reactions and self-esteem striving following MS are paralleled by death-related thought that is accessible but outside of one’s focal attention (see, e.g., Pyszczynski, Greenberg, & Solomon, 1999).

Terror Management and Religion

Religion has an important function in helping people to cope with the problem of death. For centuries, people’s thinking and feelings about death have been colored by their religious beliefs and rituals (Parrinder, 1983). Several studies suggest that religiousness is negatively related to concerns about death (e.g., Feifel & Nagy, 1981; Kahoe & Dunn, 1975; Spilka, Stout, Minton, & Sizemore, 1977; Templer, 1970). From the perspective of TMT, we hypothesize that religion plays a protective role in terror management because it provides a basis for cultural worldviews and culturally derived self-esteem. It serves as a meaningful system in which human life and human activities can be embedded, in which people can feel significant and be secure in the knowledge that they will live on in some form after physical death.

From this point of view, one could propose that MS would increase investment in religion for those who adhere to religious beliefs. However, with regard to the question of whether reminders of death induce stronger religiousness, the empirical evidence is mixed. Whereas Norenzayan and Hansen (2006) recently found that MS led people to report being more religious and to believe more strongly in God and divine interventions, Burling (1993) failed to find similar effects when measuring religiousness on Batson and Ventis’s (1982) interactional and internal scales. Yet, with regard to the question of whether reminders of death motivate people to defend their religious beliefs, the evidence is more conclusive. Greenberg et al. (1990), for example, demonstrated that MS led people to defend and bolster their religious faith by favoring those who shared their religion but derogating those who believed in another religion. In addition, Greenberg, Simon, Porteus, Pyszczynski, and Solomon (1995) found that MS made it more difficult to use religious symbols in an inappropriate way: In a problem-solving task in which the most effective solution was to use a crucifix as a hammer, participants who had been reminded of death had greater difficulty solving the task than participants who could use a block of wood as a hammer.¹ Furthermore, religion provides standards to live up to and gives people a sense of worth if they meet these standards. In line with this function, terror management research has shown that MS increases the severity of moral judgments (Rosenblatt et al., 1989) and adherence to salient cultural norms, such as fairness and generosity (Jonas et al., 2006).

Although TMT proposes that all cultural worldviews help to manage the terror of death, given the centrality of death transcendence beliefs in most religions, the central function of religion may be to defuse the terror of death by providing belief in an afterlife. In support of this notion, Osarchuk and Tatz (1973) found increased belief in a life after death among those people who had been exposed to slides depicting morbid scenes and who were confident about the existence of the hereafter. Findings by Schoenrade (1989) suggest that participants with a strong belief in an afterlife reacted with both more positive and negative associations to reminders of death, whereas participants who did not or only weakly believed in an afterlife showed fewer positive associations. Finally, Dechesne et al. (2003) found evidence that belief in an afterlife based on supposed scientific evidence regarding the near-death experience reduced self-esteem striving and worldview defense following MS.

Taken as a whole, these findings provide converging evidence that religious beliefs serve a terror management function and mitigate the effects of MS. Surprisingly, however, this idea has hardly been addressed in the empirical research on terror management to date. To our knowledge, only one qualitative analysis exists: Greenberg et al. (1997, p. 97) grouped participants according to whether they had mentioned religious topics, such as a deity or soul, in their writings when reporting thoughts about their own death. This differentiation did not reveal any effects regarding participants’ worldview defense or self-esteem strivings. We propose that one reason for this could be that the researchers did not differentiate between intrinsic and extrinsic religious orientations (Allport, 1950).

Intrinsic Versus Extrinsic Religious Orientation

The proposition that individuals are religious in psychologically different ways has its origin in the work of James (1902), who recognized two types of trait-based religious responding: First, he described the “healthy-minded” person, who tends to be happy, optimistic, and grateful toward God. Second, he proposed the “sick soul,” who is aware of the evils of the world and to whom the inevitability of death is clearly salient. In subsequent research validating interindividual differences regarding people’s religiousness empirically, Allport and Ross’s (1967) distinction between intrinsic and extrinsic religious orientation has been one of the most important and influential additions to this area. Extrinsic religiousness involves a utilitarian approach to religion. Religion is used instrumentally to obtain other ends, such as safety, solace,

¹ Similar results were shown with regard to the inappropriate use of a nonreligious cultural symbol, the U.S. flag.

TERROR MANAGEMENT AND RELIGION

social standing, and self-justification and to protect the self from infractions (Allport, 1966). Intrinsic religiousness, on the other hand, can be characterized by the striving for meaning and value. It “regards faith as a supreme value in its own right. It is oriented toward a unification of being, takes seriously the commandment of brotherhood, and strives to transcend all self-centered needs” (Allport, 1966, p. 455). Intrinsically religious individuals have a mature and sincerely honest religious orientation with deeply internalized religious beliefs, by which they are guided without considering external consequences. Religion is the principal motive in the lives of these individuals; it is the framework within which they live their lives, and it provides them with motivation and direction. Allport and Ross (1967) created a Religious Orientation Scale measuring intrinsic and extrinsic orientations that has been widely used in empirical studies of religion. Originally, intrinsic versus extrinsic religious orientation was conceptualized along a single bipolar continuum assuming a negative correlation between the two dimensions. However, as Donahue (1985) summarized in his meta-analysis, there is considerable evidence that the two dimensions tend to load on two separate factors. Furthermore, the mean correlation across the 34 samples reported by Donahue (1985) was −.06 (comprising a range from −.58 to .24). Although other authors extended the dichotomous intrinsic versus extrinsic view of religiousness—for example, religion as a quest (Batson, 1976) and extrinsic–personal (protection, consolation) or extrinsic–social religiousness (religious participation, social status; Gorsuch & McPherson, 1989)—the distinction between intrinsic and extrinsic religiousness dominates the literature on religious orientations. 
Accordingly, various authors have argued that the distinction between intrinsic and extrinsic religiousness has been the most useful distinction for research on religiousness in different contexts, such as the relationship between religiousness and mental health (see Gorsuch, 1988). Research on religious orientations has shown positive correlations between extrinsic religiousness and negatively evaluated characteristics, such as prejudice and dogmatism, whereas intrinsic religiousness tends to be uncorrelated with these negative characteristics (e.g., Batson, 1976). In his meta-analysis, Donahue (1985) found a mean correlation across all measures of prejudice of about .34 for extrinsic religiousness and about −.05 for intrinsic religiousness. Similarly, for measures of dogmatism, the mean correlation was about .36 on average for extrinsic religious orientation but only .06 for intrinsic religious orientation. Intrinsic religiousness, on the other hand, was positively correlated with measures of religious belief and commitment, whereas extrinsic religiousness seemed to be uncorrelated with these religious measures (for review, see Donahue, 1985). In this vein, intrinsically religious people have been shown to be more orthodox and to ascribe more importance to religion than highly extrinsic religious individuals do (e.g., Batson, 1976). Donahue (1985) reported that the average correlation between intrinsic religiousness and religious belief scales was .39, whereas for extrinsic religiousness there was a correlation of only about .16. When the importance of religion or religious commitment is examined, one can even find correlations of about .76 for intrinsically religious people versus .03 for extrinsically religious people. As the extrinsic Religious Orientation Scale measures not so much religiousness per se but religion as an instrument to obtain other ends (such as solace or social standing)


and treats religion as one of many influences on life, it is not surprising to find extrinsic religiousness basically uncorrelated with measures of religious belief and commitment. Moreover, studies and reviews are consistent in finding intrinsic religiousness to be positively correlated with mental health outcomes, such as self-regulation, personal adjustment, self-control, or sense of well-being, whereas by contrast negative correlations were found for extrinsic religiousness (e.g., Bergin, Masters, & Richards, 1987; Maltby & Day, 2000). More recently, Saroglou (2002) showed in a meta-analysis that intrinsic religiousness was associated with emotional stability, in contrast with extrinsic religiousness, which was related to neuroticism. Furthermore, Maltby and Day (2004) demonstrated that extrinsic religious orientation was associated not only with neuroticism but also with negative religious coping (i.e., an insecure relationship to God with negative outcomes for mental health), whereas intrinsic religiousness was associated with positive religious coping (i.e., a secure relationship to God with positive outcomes for mental health) and had a negative relationship with psychoticism. Intrinsically religious individuals are also known to report lower levels of anxiety, whereas extrinsic religious orientation correlates positively with anxiety (Baker & Gorsuch, 1982; Bergin et al., 1987; Maltby, Lewis, & Day, 1999; Sturgeon & Hamley, 1979). Finally, and most important for our line of reasoning, there are several studies showing that various measures of fear of death are negatively correlated with intrinsic and positively correlated with extrinsic religious orientation (e.g., Bolt, 1977; Minton & Spilka, 1976; Spilka et al., 1977; for review, see Donahue, 1985²). 
Genia (1996) concluded that intrinsic religiousness can be seen as a strong predictor of psychospiritual health and that intrinsic believers’ sense of a personal relationship with God may make them less vulnerable to existential angst. Building on this research, we assume that intrinsic religiousness serves as a buffer against existential terror and therefore mitigates the need for other terror management defenses. Extrinsic religiousness, on the other hand, should not be capable of buffering existential anxiety and should therefore not alter terror management reactions. These predictions are based on the fact that people who demonstrate an intrinsic orientation toward religion have a deeply internalized belief system that provides secure knowledge about the meaning of life and death. Compared with extrinsically religious people—who consider religion not an end in itself but rather a means to an end—the meaning system of intrinsics should be more effective in providing those psychological resources that are important for terror management. The association of intrinsic religiousness with mental health and low anxiety (most important, low existential anxiety) might be considered a consequence of their more secure belief system.

² In the studies mentioned in the meta-analysis by Donahue (1985), there is one exception: Spilka, Pelligrini, and Dailey (1968) reported a positive relationship between different indicators of fear of death and intrinsic religiousness. However, the authors admitted that the scales used in this research were earlier versions and different from the scales used in the other studies (Donahue, 1985).

JONAS AND FISCHER

Affirmation of Religious Beliefs

When facing a crisis, people often turn to religion as a coping response (e.g., Carver, Scheier, & Weintraub, 1989; McCrae & Costa, 1986). Effective coping thereby implies that people pray to God, search for answers on the basis of their belief system, and trust that they receive personal support from God (Ebaugh, Richman, & Chafetz, 1984; Maton, 1989)—in other words, people affirm their religion in order to cope with a life crisis or stress. Using a self-affirmation procedure (Steele, 1988), Schmeichel and Martens (2005) recently demonstrated in a nonreligious context that affirming a value that is important to an individual personally decreased worldview defense and death-thought accessibility following MS. It is important to note that this effect could not be explained by a possible self-esteem boost caused by affirming the value. The authors reasoned that—to the extent that worldviews serve to mitigate existential anxiety— bolstering one’s worldview should reduce defensiveness in the face of MS. These findings are an important addition to the previous terror management literature because they are the first to show that not only boosting people’s self-esteem (Harmon-Jones et al., 1997) but also upholding an important value prevents terror management defense. However, the findings are not conclusive because they only partially provide support for the authors’ line of reasoning. Using the evaluation of a pro- versus anti-American essay as a measure of worldview defense, Schmeichel and Martens found an interaction effect indicating that self-affirmation negated defensive reactions only for the negative essay but not for the positive one. Moreover, the authors themselves suggested that bolstering one’s worldview serves a terror management function only “if one’s worldview is strong, held with conviction and a sense of value” (Schmeichel & Martens, 2005, p. 659). However, they did not differentiate between different kinds of worldviews. 
They presented a list with 12 values or characteristics to their participants (e.g., aesthetic appreciation, sense of humor, relations with family and friends, social skills) and asked them to pick 1 item that was important to them and to write a short narrative about it. Across the two studies, 66% of the participants wrote about their relations with family and friends. Given that about two thirds of their participants bolstered their relations with family and friends, it could have been that participants just boosted their social identity instead of affirming any value from their worldview. With the distinction between intrinsic and extrinsic religious orientation, in the current research we want to focus on values that have been internalized differently. Affirming intrinsic religiousness might be even more effective for terror management than just affirming any value at all because intrinsically religious people have strongly internalized beliefs and, as Allport and Ross (1967) put it, not only use their values but live them. Moreover, religious worldviews might be more effective in terror management than secular worldviews. Purely secular bases of immortality are never completely satisfying as they cannot fully convince people of their eternal significance. Religious worldviews, on the other hand, address the fear of death directly by claiming that death is not the end of existence, thereby providing cosmic significance and greater peace than secular worldviews (Becker, 1971; see also Greenberg, Solomon, & Landau, in press). To summarize, we predict that intrinsic religiousness provides effective protection from mortality concerns and that the affirmation of intrinsic religious beliefs prevents worldview defense. We further hypothesize that the affirmation of extrinsic religious beliefs does not prevent worldview defense. We report three studies to test this idea.

Study 1

Terrorist attacks induce fear, anxiety, and concern about death (Boscarino, Figley, & Adams, 2004; Gallup News Service, 2001; Saad, 2001). More than 60% of Americans who were asked about their emotions in the aftermath of September 11, 2001, responded that their personal sense of security had been shaken (Saad, 2001), and (compared with 24% in the preceding year) 54% feared that they or a member of their family would become a victim of a future terrorist attack (Gallup News Service, 2001). TMT research has revealed that even subliminal primes of symbols, such as 9/11 or WTC, increase death-thought accessibility (Landau, Solomon, et al., 2004). As terrorist attacks seem to be a natural reminder of mortality, in Study 1 we looked at the relation between religiousness and worldview-relevant defense in the light of the Istanbul terrorist attacks in November 2003, when Turkey’s largest city was shocked by a series of devastating bomb blasts. Terrorism experts said the attacks were the work of al Qaeda, but as Turkey had never been considered a prime target of al Qaeda terrorism, and given that 2.1 million Turks live in Germany, the suicide attacks in Istanbul raised the question of whether Germany might also become a target of al Qaeda terrorism. We manipulated MS quasi-experimentally by the time lag to the Istanbul terrorist attacks: immediately after the attacks (high MS) versus 1 week later (low MS). Our study started on November 20, 2003, the day of the attacks on the British Consulate and the Turkish headquarters of the London-based HSBC bank, which killed 15 people and wounded more than 300, and 5 days after the suicide car bombers’ attacks on two synagogues in central Istanbul, which killed 23 people and injured more than 250. We propose that these incidents induced MS. 
However, for the delayed time of measurement when no further incidents occurred, we suggested that the fear experienced immediately after the attack subsided, and the death threat dissipated. We assumed that after the Istanbul terrorist attacks, religious people were likely to affirm their religion to be better able to cope with the threat of terrorism. As outlined above, when facing a crisis, people often turn to religion as a coping response. Similar reactions have been observed after terrorist attacks. In the United States, a nationwide survey, conducted by the University of Chicago, revealed that 84% of the people who were asked whether they had said special prayers in response to the September 11th terrorist attack answered in the affirmative (T. W. Smith, Rasinski, & Toce, 2001). Furthermore, Pyszczynski, Solomon, and Greenberg (2003) reported that after September 11th, the highest level of church attendance since the 1950s was observed in America. Similar spikes were recorded in Canada, England, and Australia; Bible sales flourished; and Internet religion boomed. Yet, by December 2001, church attendance had returned to previous levels. With regard to Germany, observations have also shown that immediately after the September 11th terrorist attacks, many Germans congregated in churches to find a sense of security and solace (Grimm, 2005).


We expected that for intrinsically religious people, the affirmation of religion serves terror management functions and that they therefore would show less worldview defense when reminded of death by the terrorist attacks than would extrinsically religious or nonreligious participants. However, for the delayed time of measurement, we hypothesized that the death threat would dissipate and, as a consequence, that there would no longer be any need to affirm religion. Thus, we hypothesized that the immediate assessment condition would function as an MS/self-affirmation condition for intrinsics but as an MS/no-affirmation condition for extrinsics. To measure worldview defense, we used a selective exposure procedure similar to the one used in a terror management study by Jonas, Greenberg, and Frey (2003): We measured participants’ striving for cognitive consistencies by looking at the preference for information supporting a preexisting belief compared with information conflicting with this belief. From the perspective of TMT, cognitive inconsistency should be experienced as aversive because it undermines the stability of participants’ worldview. In support of this notion, Jonas et al. (2003) found that participants’ preference for consistency-restoring cognitions increased following MS. In a similar vein, McGregor, Zanna, Holmes, and Spencer (2001) showed that people cope with personal inconsistency due to a threat by spontaneously emphasizing certainty and conviction about personal attitudes toward social issues and thus restore cognitive consistency. In our study, we used a combination of these two procedures. We presented positive and negative statements regarding the question of how likely terrorist attacks in Germany were considered to be and measured how participants evaluated the statements with respect to their credibility, importance, and strength. 
We predicted that intrinsically religious people would have less need to defend their personal conviction when reminded of death than would extrinsically religious or nonreligious people.

Method

Participants. Seventy-eight patrons of a students’ coffee shop–pub in Munich (Germany) volunteered to participate in this study. The study was conducted in the evening between 6:30 p.m. and 10:30 p.m. The sample consisted of 32 women and 46 men, ranging in age from 22 to 60 years (M = 29.97 years, SD = 7.51). One participant who was an outlier of about three standard deviations from the mean of the main dependent variable was excluded from the analyses (Kirk, 1995). Furthermore, we excluded 3 foreign participants who turned out to have severe problems with the German language and 4 participants who did not follow the instructions (because they interrupted filling out the questionnaire, started eating, talked to each other, and generated doubt that they were fully concentrated on the content of our questionnaire). This left 70 participants in the sample. Of these participants, 21 reported that they were Catholic, 13 were Protestant, and 36 either filled in no denomination or left a blank box.

Design. The experiment was based on a 2-factorial (time of measurement: immediately after the terrorist attacks in Istanbul [November 20, 2003] vs. delayed [November 28, 2003]) between-subjects design. The dependent variable of worldview defense was based on participants’ defense of their position regarding the question of whether terrorist attacks were likely to happen in Germany.

Procedure. We recruited participants in a students’ coffee shop–pub close to the university campus of Ludwig-Maximilians-Universität in Munich by asking them to answer a questionnaire concerning terrorism. The packet started with a short description of the terrorist attacks in Istanbul


and raised the question of whether similar attacks would also happen in Germany. Next, we asked participants to answer some questions regarding their estimates about the likelihood that such attacks would happen in Germany. Among these questions were the following: “How likely do you think it is that similar terrorist attacks will happen in Germany in the near future?” (answers ranged from 0 = not at all likely to 10 = extremely likely); and “If you had to make a decision, is it true or is it not true that similar terrorist attacks will happen in Germany?” (check a box for yes or no). After that, participants were asked to fill out the German version of the Intrinsic and Extrinsic Religious Orientation scales by Feagin (1964; Zwingmann, Hellmeister, & Ochsmann, 1994), which was based on the original Religious Orientation Scale by Allport and Ross (1967).³ The measures of intrinsic and extrinsic religious orientation each consisted of 6 items and were scored on a 5-point scale. The intrinsic scale (α = .74) includes such items as “I try hard to carry my religion over into all my other dealings in life”; “My religious beliefs are what really lie behind my whole approach to life”; “The prayers I say when I am alone carry as much meaning and personal emotion as those said by me during services”; and “It is important to me to spend periods of time in private religious thought and meditation.” The extrinsic scale (α = .62) includes items like “The primary purpose of prayer is to gain relief and protection”; “The purpose of prayer is to secure a happy and peaceful life”; “Religion helps to keep my life balanced and steady in exactly the same way as my citizenship, friendships, and other memberships do”; and “The church is most important as a place to formulate good social relationships.”⁴ To measure the dependent variable, we next told our participants that we would now present them with some short summaries of newspaper articles on the terrorist attacks. 
They had to evaluate each summary and to indicate whether they would be interested in reading the article at the end of the study. We then presented eight summaries to the participants, of which four pointed out that it was unlikely that terrorist attacks would happen in Germany and four stated the opposite (i.e., that it was very likely that terrorist attacks would happen in Germany). A piece of information rendering terrorist attacks in Germany likely might state, for example, “The terrorists’ intent is to terrorize all Western and pro-Western states. Therefore, it is only a question of time before the first terrorist attacks happen in Germany.” A piece of information contradicting the position that terrorist attacks would happen in Germany might state, “The terrorist attacks in Istanbul were directed at the secular Muslim democracy in Turkey, which according to the terrorists’ point of view betrays Islam and has to be punished. Because Germany has never been Muslim, attacks in Germany are unlikely.” Participants were asked to evaluate each summary with regard to how credible they thought this article was (responses on a scale from 0 = not at all credible to 10 = extremely credible), how important they considered the article to be (scale from 0 = not at all important to 10 = extremely important), and to what extent this article supported or contradicted the notion that terrorist attacks would happen in Germany (scale from −5 = strongly contradicting to 5 = strongly supporting) (α =

³ Whereas the Religious Orientation Scale by Allport and Ross (1967) was originally conceptualized along a single bipolar continuum, Feagin (1964) proposed two separate unipolar scales, which are now considered psychometrically preferable to the Allport and Ross scales (e.g., Hood, 1971; see also Donahue, 1985).

⁴ The finding of a low alpha for the extrinsic orientation scale is not uncommon (see alpha coefficients of about .54, .62, and .66 reported by Genia, 1996, and Hutchinson, Patock-Peckham, Cheong, & Nagoshi, 1998). Moreover, we also found the differences in reliabilities between intrinsic and extrinsic religiousness to be comparable to other studies (e.g., Hutchinson et al., 1998; Maltby & Day, 2004; Zwingmann et al., 1994).


.84).⁵ After the study was over, participants were debriefed and thanked for their participation.
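The reliability coefficients reported in this section are Cronbach's alpha values. As a minimal sketch of how such a coefficient is commonly computed (the function name and the demo ratings are hypothetical and are not taken from the study), alpha compares the summed item variances with the variance of the total score:

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha for one scale.

    `items` is a list of columns, one per scale item; each column holds
    that item's scores for the same respondents in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if any(len(col) != n for col in items):
        raise ValueError("every item needs one score per respondent")
    # Total scale score for each respondent.
    totals = [sum(col[i] for col in items) for i in range(n)]
    item_variance_sum = sum(pvariance(col) for col in items)
    total_variance = pvariance(totals)
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical ratings: 3 items answered by 5 respondents (not study data).
demo_items = [
    [4, 2, 5, 3, 1],
    [5, 2, 4, 3, 2],
    [4, 1, 5, 2, 2],
]
demo_alpha = cronbach_alpha(demo_items)
```

Items that rise and fall together push alpha toward 1, whereas unrelated items push it toward 0, which is one reason a heterogeneous scale such as the extrinsic orientation scale can yield a lower coefficient.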

Results

Check for interfering effects. We first checked whether there were any systematic differences among our participants between the two times of measurement and found no systematic differences among the sample from November 20, 2003, to November 28, 2003, regarding age, t(68) < 1; gender, χ²(1, N = 70) = 1.33, p > .24; or occupation: 72% employees, 28% students vs. 68% employees, 32% students. We then ran regression analyses by using MS (operationalized by time of measurement: immediately after the terrorist attacks in Istanbul [November 20, 2003] vs. delayed [November 28, 2003]), intrinsic religious orientation, extrinsic religious orientation, the 3 two-way interaction terms, and the three-way interaction as predictors. Following Aiken and West’s (1991) recommendation, we converted the continuous predictor variables (intrinsic and extrinsic religious orientation) to z scores before we computed the interaction terms. We first ran a regression analysis by using the estimated likelihood that terrorist attacks might happen in Germany as the criterion variable to check whether there was a relation between people’s intrinsic or extrinsic religious orientation and this measure. However, the regression analysis did not show any significant effect (all ts < 1.04, all ps > .30). Overall, participants considered the likelihood that terrorist attacks might happen in Germany to be about the mean (M = 5.21, SD = 2.38, on a scale from 0 to 10). Regarding the participants’ opinion about whether they thought terrorist attacks would happen in Germany, we found that overall, 36 participants thought terrorist attacks would happen in Germany and 34 participants thought that no attacks would happen in Germany.

Main analyses. Our dependent measure of worldview defense was the evaluation of information supporting versus conflicting with participants’ belief that terrorist attacks would or would not happen in Germany. 
We created a composite measure of credibility, importance, and strength of pieces of information supporting the participants’ position regarding the question of whether terrorist attacks would happen in Germany minus ratings for pieces of information conflicting with their position. This composite is a measure for worldview defense. We then checked whether there was any significant effect of gender, and none was found. Next, we ran a regression analysis by using the composite measure for worldview defense as the criterion variable and MS (operationalized by time of measurement: immediately after the terrorist attacks in Istanbul [November 20, 2003] vs. delayed [November 28, 2003]), intrinsic religious orientation, extrinsic religious orientation, the 3 two-way interaction terms, and the three-way interaction as predictors. As predicted, this analysis yielded a significant interaction effect between MS and intrinsic religious orientation (B = −1.52, β = −.44), |t(62)| = 2.20, p < .04, partial r² = .07. To illustrate the nature of this interaction, we inserted standardized intrinsic orientation scores at values one standard deviation above and below the mean and the two levels for MS into the regression equation (see Figure 1). This analysis revealed that immediately after the terrorist attacks, people low in intrinsic religiousness showed a strong bias in favor of their own position, whereas people high in intrinsic religiousness showed a

Figure 1. Worldview defense depending on people’s intrinsic religiousness and time of measurement relative to terrorist attacks in Study 1.

more balanced information evaluation. However, at the delayed time of measurement after the attacks had happened, this difference between people with low versus high intrinsic religiousness had disappeared, and people in both groups showed only a small bias in favor of their position. Post hoc probing confirmed that there was a significant relationship between intrinsic religiousness and worldview defense only at the immediate time of measurement (B = −1.51, β = −.43), t(62) = 2.91, p < .01, but not at the delayed time of measurement, |t| < 1. The regression analysis yielded no other significant effect. Most important, there was neither a main effect nor an interaction with regard to extrinsic religiousness (all ts < 1, all ps > .45). The correlation between the two scales of intrinsic and extrinsic religious orientation was .42. We will reflect on the size of correlations between the two scales more thoroughly in the General Discussion.
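The moderated regression and the probing at one standard deviation above and below the mean can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' analysis script: it keeps only MS, standardized intrinsic orientation, and their product (the reported model also contained extrinsic orientation and the remaining interaction terms), it assumes dummy coding of MS (0 = delayed, 1 = immediate), which the text does not specify, and all function names and demo numbers are hypothetical.

```python
from statistics import mean, stdev


def zscores(values):
    """Standardize a continuous predictor (mean 0, sample SD 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X)b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    p = len(rows[0])
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for k in range(p):
            if k != col:
                factor = a[k][col] / a[col][col]
                a[k] = [x - factor * z for x, z in zip(a[k], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


def moderated_regression(ms, intrinsic, defense):
    """Fit defense = b0 + b_ms*MS + b_int*z + b_inter*(MS*z).

    The predictor is z-scored before the product term is formed.
    """
    z = zscores(intrinsic)
    rows = [[1.0, m, zi, m * zi] for m, zi in zip(ms, z)]
    return ols(rows, defense)


def simple_slope(b_int, b_inter, ms_level):
    """Slope of standardized intrinsic orientation at one MS level."""
    return b_int + b_inter * ms_level


def predicted(coefs, ms_level, z):
    """Model-implied defense score, e.g. at z = -1 or z = +1."""
    b0, b_ms, b_int, b_inter = coefs
    return b0 + b_ms * ms_level + b_int * z + b_inter * ms_level * z


# Hypothetical, noise-free demo data (not study data).
ms = [0, 0, 0, 0, 1, 1, 1, 1]
intrinsic = [1, 2, 3, 4, 1, 2, 3, 4]
z = zscores(intrinsic)
defense = [2.0 + 1.0 * m - 1.5 * m * zi for m, zi in zip(ms, z)]
coefs = moderated_regression(ms, intrinsic, defense)
low_immediate = predicted(coefs, 1, -1.0)   # one SD below the mean
high_immediate = predicted(coefs, 1, 1.0)   # one SD above the mean
```

Under the assumed dummy coding, the simple slope at the immediate measurement equals the intrinsic coefficient plus the interaction coefficient. The reported values (interaction B of −1.52, immediate slope B of −1.51) would then imply a slope of about 0.01 at the delayed measurement, in line with the nonsignificant relationship reported there.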

Discussion

The results of Study 1 showed that immediately after the terrorist attacks in Istanbul, which can be considered as a naturally occurring reminder of mortality, people who scored high on intrinsic religiousness did not react with worldview defense, whereas people low on intrinsic religiousness did. People’s extrinsic religious orientation, on the other hand, had no effect on worldview defense. As Burling (1993) and Greenberg et al. (1997) did not differentiate between people low or high in intrinsic religiousness,

⁵ In addition, participants were asked whether they wanted to read the complete article and to check a box accordingly. However, we did not further explore this variable because 57 of the 70 participants (81%) did not want to read any article at all—an effect probably caused by the specific situation in the coffee shop–pub. In addition, a second, unrelated, question also investigated in this study was how religiousness, well-being, and self-efficacy were related. Therefore, after the information evaluation and search was finished, participants filled out a subjective well-being and self-efficacy measure. Parts of these findings are reported elsewhere (Fischer, Greitemeyer, Kastenmüller, Jonas, & Frey, 2006).


this lack of distinction might explain the null effects found by these authors. Furthermore, we propose that the reason for the difference between people high versus low in intrinsic religiousness at the immediate time of measurement was that the affirmation of intrinsic religious beliefs serves terror management functions. Research has shown that people often respond to a crisis by spontaneously affirming their religion. They turn to religion as a coping response (e.g., Carver et al., 1989; McCrae & Costa, 1986) and, for example, have reported that they said special prayers in response to the September 11th terrorist attacks (T. W. Smith et al., 2001). Unfortunately, there were no measures of religious activity included in the materials that could have more directly assessed such aspects as amount of praying or churchgoing. Hence, to provide more direct evidence for our assumption that after the Istanbul terrorist attacks religious people increased their praying and other religious behavior to be better able to cope with the threat, we ran a follow-up study in which we tested whether people indeed affirmed their faith when confronted with terrorist attacks. At the end of their participation in a different study, in which participants had already filled out the Intrinsic and Extrinsic Religious Orientation Scale, 39 students from the University of Munich were given a questionnaire that started with the same short description of the terrorist attacks in Istanbul, as in Study 1. After that, we asked participants to answer some questions on a scale (from 0 = not at all to 10 = very much) regarding the matter of how terrorism affected people. The first question referred to whether participants could remember how they had reacted in the days immediately after the attacks in Istanbul. 
We presented five reactions: (a) “I talked to other people about the danger of terrorist attacks more often” (M = 3.38, SD = 2.51); (b) “I paid special attention to the terrorist attacks in the media” (M = 3.54, SD = 2.48); (c) “I said additional prayers that there be no more terrorist attacks” (M = 1.62, SD = 2.17); (d) “I went to church more often” (M = 0.33, SD = 0.70); and (e) “I tried harder to avoid thinking about possible terrorist attacks” (M = 3.18, SD = 2.90). In addition, participants had the opportunity to report other reactions. The second question asked participants, “In case you are religious, to what extent does believing in God help you to cope with the fear of terrorist attacks?” (M = 3.17, SD = 3.40). Finally, participants were asked to describe in a few words how believing in God helped them to cope with the fear of terrorist attacks. We ran correlations between intrinsic religiousness and the responses to the different questions. The corresponding correlations for extrinsic religiousness are reported in brackets. The results revealed that the higher the participants’ intrinsic religious orientation, the more often they reported that in response to the terrorist attacks they had said additional prayers that there be no more terrorist attacks (r = .35, p < .03 [extrinsic religiousness: r = .25, p < .13]), the more often they went to church (r = .30, p < .07 [extrinsic religiousness: r = .38, p < .02]), and the more effectively believing in God helped them to cope with the fear of terrorist attacks (r = .75, p < .001 [extrinsic religiousness: r = .57, p < .002]). However, with regard to other reactions, such as talking to other people about the danger of terrorist attacks (r = .14, p > .40 [extrinsic religiousness: r = .06, p > .70]), paying attention to the terrorist attacks in the media (r = .20, p > .20 [extrinsic religiousness: r = −.04, p > .70]), and trying to avoid thinking about possible terrorist attacks (r = .07, p > .65 [extrinsic

559

religiousness: r ⫽ .05, p ⬎ .70]6), religious participants in our follow-up study did not differ from nonreligious people. These findings are thus in accordance with our assumption that intrinsically religious participants cope with the fear aroused by terrorist attacks by displaying an increased tendency to affirm their religious faith. Furthermore, as noted earlier, our argument is that religious affirmation did not affect the dependent variable a week later (a week during which there were no further incidents) because the death threat had dissipated by that point. However, we are aware that this explanation is still only speculative. Therefore, in the next study, we more directly tested our hypothesis that affirmation of religion is indeed the necessary precondition for finding reduced terror management defense reactions among intrinsically religious people. In addition, although terrorist attacks remind people of personal mortality (reports about terrorist attacks usually contain facts and pictures of injured and dead victims), they also raise other kinds of threats as well, such as uncertainty, lack of control, or violation of sacred values. Recent evidence indicates that these kinds of threats can also cause defensive reactions (e.g., Burris & Rempel, 2004; McGregor et al., 2001; Tesser, Crepaz, Collins, Cornell, & Beach, 2000; Tetlock, Kristel, Elson, & Green, 2000; van den Bos, Poortvliet, Maas, Miedema, & van den Ham, 2005).7 To make sure that our effects were caused by MS, in the next two experiments we took an MS induction from the TMT literature and directly asked participants to think about their own death. By using classic TMT procedures and materials, we could also address another potential alternative explanation for our results derived from the findings of Abbott-Chapman and Denholm (2001) that religious individuals have a more risk-averse perceptual style than less religious individuals (see also Miller & Hoffmann, 1995). 
In combination with the fact that the newspaper articles arguing for a high likelihood of future terrorist attacks in Germany might have been more subjectively persuasive than the articles arguing for a low likelihood of future attacks, high intrinsically religious participants could have appeared less worldview defensive than low intrinsically religious participants. To circumvent this problem in the following studies, we used classic TMT paradigms that should not be affected by intrinsically religious individuals’ tendency to be more risk averse.

6 These results illustrate that participants with an extrinsic religious orientation also tended to affirm their religiousness in the face of terrorist attacks. This is in accordance with the assumption that extrinsic religiousness is used instrumentally to obtain safety and solace. However, different from intrinsically religious individuals, extrinsics emphasized going to church rather than praying to God as a coping response. Moreover, all in all, they found believing in God less effective in helping to cope with the fear of terrorist attacks than did intrinsics. Although it is uncertain to what extent people are able to introspect what helps them to cope with existential terror, this pattern of results might be another hint supporting our hypothesis that intrinsic religiousness is more effective than extrinsic religiousness in helping to cope with existential threat.

7 These findings are compatible with TMT because the theory does not claim that death is the only threat to give rise to defensive reactions but that MS is not merely an (for example) uncertainty or aversive treatment, a conclusion based in part on previous studies contrasting MS with uncertainty control topics and so forth (e.g., Landau, Johns, et al., 2004).

To conclude, the results of Study 1 provide a first indication that intrinsic religious beliefs may play an important role in terror management defense. In addition, although the results of the follow-up study are consistent with our interpretation that affirmation of religious beliefs is needed to help people to cope effectively with the terror of death, they are not conclusive and remain speculative. Therefore, in Study 2, we more directly tested the role of affirmation of religious beliefs for terror management defense. Furthermore, we used a traditional worldview defense measure and thereby increased the divergence between the independent and dependent variables.

Study 2

In the context of a naturally occurring reminder of mortality, our first study suggested that intrinsic religiousness might indeed serve as an anxiety buffer against existential threat and thus prevent worldview defense. In addition, the results, taken together with those of the follow-up study, hinted that the affirmation of religiousness might be a crucial factor in the process. In Study 2, we wanted to test this notion under more controlled circumstances. We hypothesized that we could replicate the finding that religious beliefs prevent worldview defense following MS with regard to a secular worldview defense measure consisting of the defense of the participants’ home city as a place to live. However, we predicted that this would be the case only if participants had the opportunity to affirm their religious beliefs before being confronted with reminders of death. In choosing the defense of participants’ home city as a place to live as a measure of worldview defense, we followed the terror management literature in which the most prominent dependent variable is the evaluation of a pro- versus anti-American essay (see Greenberg et al., 1997). However, German participants are less patriotic about their country than Americans (e.g., Mummendey, Klink, & Brown, 2001; Simon, Pantaleo, & Mummendey, 1995). On the other hand, as a legacy of the fact that Germany was decentralized for a long period of time in history and as a legacy of the epoch of romanticism, local regions are of special importance for Germans (cf. Heimat, 2005). This is especially the case for a city such as Munich in Bavaria, which is often called the “secret capital” of Germany because of its cultural, economic, and academic strength and the natural beauty of the surrounding area, to which its citizens feel particularly committed (cf. Die Corps, 2005; Goetheinstitut, 2005). 
To manipulate the opportunity to affirm religious beliefs, we asked participants to fill out the German version of the Intrinsic and Extrinsic Religious Orientation Scale by Feagin (1964; Zwingmann et al., 1994) either before the MS treatment or at the end of the experiment—after the worldview defense measure had been collected (for a similar procedure when priming tolerance, see Greenberg et al., 1992). We predicted that religion would prevent worldview defense after MS only for those people who had highly internalized religious beliefs and filled out the scale prior to the measure of worldview defense. By contrast, no effect was expected for high and low intrinsically religious participants who filled out the Religious Orientation Scale after the measure of worldview defense. In addition, we did not predict any effect to occur with regard to the Extrinsic Religious Orientation Scale.

Method

Participants. To get a wide range of people scoring very differently on the Religious Orientation Scale, we decided to collect our data at two different universities in Munich: 65 participants were from the Hochschule für Philosophie (University of Philosophy), a university led by the Jesuit order with many religious students, and 46 participants were from the Ludwig-Maximilians-Universität, with a representative range of religious students. The sample consisted of 65 women and 46 men, ranging in age from 18 to 63 years (M = 27.35 years, SD = 10.30). Sixteen participants were excluded from the analyses because they were not German citizens, and we were not sure to what extent the dependent variable applied to them. Thus, 95 participants remained in our sample (56 reported that they were Catholic, 20 were Protestant, 18 reported to have no denomination or did not fill out the corresponding box, and one answer was illegible).

Design. The experiment was based on a 2 (MS: yes vs. no) × 2 (affirmation of religious beliefs: yes vs. no) between-subjects design. The dependent variable of worldview defense was based on participants’ defense of Munich as a place to live.

Procedure. We recruited the participants in the university buildings by asking whether they would be willing to take part in a set of diverse psychological studies in which they would have to fill out some questionnaires about personality, urban areas, and the religious side of life. If they agreed, they were given a small packet of short questionnaires. The packet started with questions about different cities in Germany. 
After that, participants in the affirmation of religious beliefs condition filled in the German translation of the Intrinsic (α = .84) and Extrinsic (α = .76) Religious Orientation Scale by Feagin (1964), whereas participants in the no affirmation of religious beliefs condition filled in this questionnaire at the end of the study after the dependent variable worldview defense had been collected. Then we administered the MS versus control treatment by using an adaptation of the Projective Life Attitudes Assessment, which has been used in previous studies on terror management (cf. Dechesne et al., 2003). In the MS condition, participants were asked to write down the first sentence that came to their mind when they thought about their own death. In the control condition, the same question was asked with regard to dental pain. We then asked participants to respond to a mood assessment (a German translation of the Positive and Negative Affect Schedule; Watson, Clark, & Tellegen, 1988)8 to provide a delay between the MS manipulation and the dependent measure (cf. Greenberg, Pyszczynski, Solomon, Simon, & Breus, 1994). We measured the dependent variable worldview defense by giving participants two essays that dealt with an evaluation of Munich as a place to live. The essays were ostensibly written by two students who had been living in Munich for 2 years. One essay was a positive report focusing on Munich as being one of the most beautiful cities in Germany, where life is fun because of the wonderful parks, buildings, beer gardens, and cultural events. In addition, there were descriptions of the great job opportunities and lower unemployment problems compared with other cities in Germany. The author stated that one reason for this was that Munich had great universities with excellent scientists and teachers. The negative essay accused the people living in Munich of being conservative, narrow-minded, and unworldly. 
Moreover, the city was criticized as being excessively expensive, and the economic situation and job market were said to be deteriorating. With regard to cultural events, the city was described as becoming less attractive because many young artists were moving to Berlin. The author stated that because Munich was so crowded, it was hard to enjoy the opportunities the city provided. Finally, the universities were also characterized as being overcrowded and the professors as always being stressed. Order of presentation of the essays was counterbalanced. After having read each essay, participants were asked to evaluate it with regard to five questions: how much the participants liked the author, how intelligent and how knowledgeable they thought he was, how much they agreed with his opinion, and how true what he said was (these questions were taken from former terror management studies; see, e.g., Harmon-Jones et al., 1997). Each question had to be answered on a scale from 1 (not at all) to 10 (very much). Cronbach’s alpha was .87 for the five ratings of the positive essay and .89 for the negative essay. In the end, participants were debriefed and thanked for their participation.

8 In prior research, the PANAS scales have consistently been used as filler, and MS has not shown any reliable effects on affect (see Greenberg et al., 1997). In line with this research, we found that the multivariate analysis of variance was not significant. The same was found for Study 3.
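The internal consistencies reported for the five ratings of each essay are Cronbach's alpha coefficients. As a minimal sketch of how such a coefficient and a per-essay mean evaluation could be computed, assuming ratings are stored as one row per participant and one column per question (the data layout, the function names, and the example ratings are hypothetical and not taken from the study):

```python
from statistics import mean, pvariance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a table with one row per participant, one column per item."""
    k = len(ratings[0])  # number of items (five questions per essay in Study 2)
    item_variances = [pvariance(col) for col in zip(*ratings)]
    total_variance = pvariance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def essay_evaluation(row):
    """Mean of one participant's ratings of one essay (each rated from 1 to 10)."""
    return mean(row)


# Hypothetical ratings of one essay by four participants (NOT study data).
example = [
    [8, 7, 8, 9, 8],
    [5, 5, 6, 5, 4],
    [9, 9, 8, 9, 9],
    [3, 4, 3, 2, 3],
]
alpha = cronbach_alpha(example)
```

The same variance estimator is used for the items and for the total score, so the ratio inside the formula does not depend on whether population or sample variances are chosen.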

Results

Our dependent measure of worldview defense was a composite of the difference of the mean evaluations of pro-Munich minus anti-Munich essays (cf., e.g., Greenberg et al., 1994). We first checked whether there was any effect depending on the location where our study took place. We found that there was no difference and also no interaction with the experimental conditions.9 Next, we ran a regression analysis by using the composite measure for worldview defense as the criterion variable and MS, affirmation of religious beliefs, intrinsic and extrinsic religious orientation, and the corresponding two-way, three-way, and four-way interactions as predictor variables. This analysis yielded a significant effect of MS (B = 1.64, β = .35), t(80) = 2.04, p < .05, partial r² = .05; a marginal interaction between MS and intrinsic religiousness (B = 1.20, β = .36), t(80) = 1.78, p < .08, partial r² = .04; and most important, the predicted interaction between MS, intrinsic religious orientation, and affirmation of religious beliefs (B = −2.20, β = −.48), |t(80)| = 3.05, p < .004, partial r² = .10. There were no other significant effects, and it is important to note that there were none with regard to extrinsic religious orientation.10 To understand the nature of the significant three-way interaction, we ran two separate regression analyses for the affirmation and no affirmation conditions. Without affirmation of religious beliefs, we found only the usual MS effect (B = 1.39, β = .32), t(43) = 2.23, p < .04, partial r² = .10, indicating that MS led to an increased worldview defense. However, with affirmation of religious beliefs, there was no main effect but only the predicted interaction between MS and intrinsic religious orientation (B = −1.74, β = −.51), |t(44)| = 2.28, p < .03, partial r² = .11. 
To illustrate the nature of this interaction, we inserted standardized intrinsic orientation scores at values one standard deviation above and below the mean and the two levels for MS into the regression equation (see Figure 2). We found that following MS, people high in intrinsic religiousness did not react with increased worldview defense, whereas people low in intrinsic religiousness did. Post hoc probing confirmed that there was a significant relationship between intrinsic religiousness and worldview defense following MS (B = −1.38, β = −.37), t(44) = 2.87, p < .01, but not when mortality was not salient, |t| < 1. The correlation between the two scales of intrinsic and extrinsic religious orientation was .47. We will return to this point in the General Discussion.
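The two computational steps described in this section, forming the difference-score composite and plugging values one standard deviation above and below the mean into the regression equation, can be sketched as follows. This is only an illustration under stated assumptions: the coefficient values are hypothetical (they are not the estimates reported in the article), MS is assumed to be dummy coded 0 = control and 1 = MS, and all function names are invented for the example.

```python
from statistics import mean


def worldview_defense(pro_ratings, anti_ratings):
    """Mean evaluation of the pro essay minus mean evaluation of the anti essay."""
    return mean(pro_ratings) - mean(anti_ratings)


def predicted_defense(coef, ms, intrinsic_z):
    """Predicted score from a regression with an MS x intrinsic interaction term."""
    return (coef["intercept"]
            + coef["ms"] * ms
            + coef["intrinsic"] * intrinsic_z
            + coef["ms_x_intrinsic"] * ms * intrinsic_z)


def ms_effect_at(coef, intrinsic_z):
    """Simple slope of MS at a given standardized intrinsic orientation score."""
    return coef["ms"] + coef["ms_x_intrinsic"] * intrinsic_z


# Hypothetical coefficients for illustration only; NOT the article's estimates.
# Assumed coding: MS is 0 (control) or 1 (mortality salience).
coef = {"intercept": 1.0, "ms": 1.0, "intrinsic": 0.2, "ms_x_intrinsic": -1.0}

for z in (-1.0, 1.0):  # one SD below and one SD above the mean
    for ms in (0, 1):
        print(f"intrinsic z={z:+.0f}, MS={ms}: {predicted_defense(coef, ms, z):.2f}")
```

With a negative interaction coefficient, as in this made-up example, the effect of MS shrinks as intrinsic orientation increases, which is the qualitative pattern the plotted values are meant to convey.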

Figure 2. Worldview defense following affirmation of religious beliefs depending on people’s intrinsic religiousness and mortality salience in Study 2.

Discussion

The current study suggests that MS increases worldview defense for participants low in intrinsic religiousness and for participants high in intrinsic religiousness who did not affirm their religious beliefs. However, people with a high intrinsic orientation who had the opportunity to affirm their religious beliefs did not exhibit intensified worldview defense in response to MS. In accordance with Study 1, this effect did not occur for people high in extrinsic religiousness. Whereas in Study 1 the dependent variable was closely connected to the independent variable (salience of terrorist attacks), in Study 2 the independent and dependent variables were more divergent. The fact that we replicated the findings for intrinsically religious participants from Study 1 in Study 2 underlines the generalizability of our results and suggests that only intrinsic religiousness seems to be effective in managing terror. The finding that extrinsic religiousness did not alter terror management reactions supports our assumption that extrinsic religiousness is not capable of acting as a buffer against existential anxiety. Considering that Donahue (1985) reported a correlation of about .76 with importance of religion or religious commitment for intrinsic religiousness but only .03 for extrinsic religious orientation, it is not surprising that extrinsic religiousness serves no protective function for terror management. Only the deeply internalized belief system of those who demonstrate an intrinsic orientation toward religion seems to be capable of buffering existential concerns. Regarding the role of affirmation in Study 1, we suggested that high intrinsics had previously been affirming their worldview when confronted with the terrorist attacks and thus did not show increased worldview defense. On the basis of this logic, in Study 2 we then manipulated whether participants had the opportunity to affirm their religious orientation prior to MS and confirmed that giving intrinsics such an opportunity negated the need to defend. Thus, we can conclude that the opportunity to affirm their religious beliefs allowed the high intrinsically religious participants to react nondefensively after being reminded of their own mortality. In Study 3, our aim was to shed more light on the process that might underlie this finding. As our research illustrates that affirming important values prevents worldview defense, Study 2 could be seen as a conceptual replication of the research by Schmeichel and Martens (2005). However, our results also go beyond these findings because these authors did not differentiate between different kinds of worldviews, and most of the participants bolstered their relations with family and friends. Our approach has two advantages over the study by Schmeichel and Martens: First, we avoided a possible influence due to the asymmetric choice of a relevant topic. Second, our approach should have a priori excluded the possibility that a potential TMT effect could be due to a self-esteem boost, because, by definition, intrinsic religiousness represents a connection to God that exhibits less emphasis on the ego or self. In contrast, extrinsic religiousness represents a connection to God without turning away from the ego or self (Allport & Ross, 1967; see also Hills & Francis, 2003). On that score, the affirmation of the self should have been a priori excluded by affirming intrinsic religious values, rendering the alternative explanation of a potential terror management effect due to a self-esteem boost unlikely. However, to provide additional evidence that affirming intrinsic religious beliefs did not boost self-esteem and thereby reduce defensiveness in the face of MS (see Harmon-Jones et al., 1997), in Study 3 we included the State Self-Esteem Scale (Heatherton & Polivy, 1991), although we did not expect any changes in self-esteem (for similar findings, see Galinsky, Stone, & Cooper, 2000; Schmeichel & Martens, 2005).

9 We also checked whether there was any effect regarding participants’ gender or order of presentation of the two essays, but no significant effects were found. For 3 participants, we replaced the value for extrinsic religious orientation with the mean, and for 2 participants, we replaced the value for intrinsic religious orientation with the mean. This procedure is an established method to deal with low numbers of missing values (Roth, Switzer, & Switzer, 1999).

10 We found only a marginal interaction with affirmation of religious beliefs; that is, MS was not involved in this interaction (B = 1.35, β = .40), t(80) = 1.68, p < .10, partial r² = .03.

Study 3

A variety of studies within the terror management literature focus on the process that connects MS and worldview defense. These studies have illustrated that worldview defense primarily occurs when death-related thoughts are highly accessible but outside of current consciousness or focal attention (Greenberg et al., 1994). The initial defense against death-related thoughts involves suppressing these thoughts and distracting oneself from them, resulting in reduced death-thought accessibility (Pyszczynski et al., 1999). However, when death-related thoughts are no longer in focal consciousness, death-thought accessibility increases, triggering worldview defense, which in turn reduces the accessibility of death-related thoughts (Arndt, Greenberg, Solomon, Pyszczynski, & Simon, 1997). If worldview defense is triggered by increased accessibility of death-related thoughts, the question arises of whether people high in intrinsic religiousness who affirmed their religiousness exhibit lower death-thought accessibility than nonreligious people or religious people who did not affirm their religiousness. However, an alternative explanation for our results could be that affirming one’s faith might have helped people high in intrinsic religiousness to better cope with the problem of death. Thus, although death-thought accessibility might still increase for this group of participants, it probably does not create the potential for existential fears.

In Study 3, we therefore tested whether affirmation of religious beliefs decreased death-thought accessibility in high intrinsically religious participants but not in low intrinsically religious participants, providing us with a possible explanation for the findings in Studies 1 and 2.

Method

Participants. Fifty students from two universities in Munich, Germany, who had not participated in Study 2 volunteered to participate in the study. Twenty-four participants were from the Hochschule für Philosophie (University of Philosophy) led by the Jesuit order, and 26 participants were from the Ludwig-Maximilians-Universität. The sample consisted of 20 women and 30 men, ranging in age from 19 to 49 years (M = 24.04 years, SD = 4.68). Regarding their religious denomination, 28 of the participants reported that they were Catholic, 12 were Protestant, and 10 participants reported that they had no denomination or did not answer the corresponding question.

Design. The experiment was based on a one-factorial between-subjects design with two conditions: MS versus no MS. The dependent variable was the accessibility of death-related words.

Procedure. We recruited participants in the university buildings by asking them whether they would be willing to take part in a set of diverse psychological studies in which they had to fill out some questionnaires about different aspects of their lives. If they agreed, they were given a small packet of short questionnaires. The packet started with the German translation of the Intrinsic (α = .86) and Extrinsic (α = .72) Religious Orientation Scale by Feagin (1964). After that, we used the same procedure as in Study 2 to institute the MS or control and distracter treatment. Then we measured the dependent variable death-thought accessibility by using a word-stem completion task (a German version based on Greenberg et al., 1994). We presented 24 different word fragments to the participants and asked them to find for each fragment a word that made sense and to then complete each word fragment by putting the corresponding letters into the blanks. For 6 of these word fragments, participants could insert either a death-related word (such as grave, corpse, or coffin) or a neutral word. The other fragments served as filler items. 
Participants then received a German version of the 12-item State Self-Esteem Scale by Heatherton and Polivy (1991; Sellin & Schütz, 2001) (α = .74). Finally, for exploratory reasons, we asked some additional questions regarding participants’ religious beliefs and practices. In the end, participants were debriefed and thanked for their participation.
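The scoring of the word-stem completion task amounts to counting how many of the critical fragments a participant completed with a death-related word. A minimal sketch of that count is given below; the target list uses the three English examples mentioned in the text as stand-ins, because the German items actually used are not listed here, and the function name and example answers are hypothetical.

```python
# English stand-ins taken from the examples in the text; the German target
# words used in the study are not given, so this set is an assumption.
DEATH_WORDS = {"grave", "corpse", "coffin"}


def death_thought_accessibility(completions, death_words=DEATH_WORDS):
    """Count completions that match a death-related target word.

    `completions` holds one participant's answers to the critical fragments;
    filler fragments are assumed to have been dropped beforehand.
    """
    return sum(1 for word in completions if word.strip().lower() in death_words)


# Hypothetical answers of one participant to six critical fragments.
answers = ["Grave", "grape", "coffin", "coffee", "table", "corpse "]
score = death_thought_accessibility(answers)
```

Normalizing case and surrounding whitespace before matching is a design choice made for the example; handwritten answers would need to be transcribed first.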

Results

Our dependent measure was the number of death-related words the participants inserted in the word completion task.11 In the regression analysis, this variable served as criterion. Because of the low sample size and the left-skewed dependent variable death-thought accessibility (cf. Green, 1991; Tabachnick & Fidell, 2001), in the following regression analysis we reduced the number of predictor variables and included only MS and intrinsic religious orientation. The results revealed the predicted significant interaction between MS and intrinsic religious orientation (B = −.47, β = −.45), |t(46)| = 2.10, p < .05, partial r² = .09. No other effects occurred. To illustrate the nature of this interaction, we inserted standardized intrinsic orientation scores at values one standard deviation above and below the mean and the two levels for MS into the regression equation (see Figure 3). This analysis revealed that following MS, people low in intrinsic religiousness showed an increased accessibility of death-related thoughts, whereas for high intrinsically religious people, this was not the case. Post hoc probing confirmed that there was a significant relationship between intrinsic religiousness and death-thought accessibility only following MS (B = −.37, β = −.32), |t(46)| = 2.65, p < .05, but not when mortality was not salient, |t| < 1. The correlation between the two scales of intrinsic and extrinsic religious orientation was .54. We will return to this point in the General Discussion. Finally, we checked whether the decreased accessibility of death-related words for intrinsically religious participants could be explained by increased self-esteem of the participants. However, we found that intrinsic religiousness correlated negatively with self-esteem (r = −.27, p < .07). We next entered self-esteem into the regression analysis to check for mediation. We found that in this analysis, the significant interaction between intrinsic religious orientation and MS remained unaffected (B = −0.52, β = −.49), |t(45)| = 2.17, p < .04, partial r² = .09. This leads to the conclusion that the difference in the accessibility of death-related words cannot be explained by any change in self-esteem.

11 We first checked whether there was any difference depending on the two locations where our study took place. However, there was no effect. There was also no effect regarding participants’ gender.
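The effect sizes reported alongside the regression tests can be cross-checked from the t statistics and their degrees of freedom. The sketch below assumes the standard identity partial r² = t² / (t² + df); the article does not state how its effect sizes were computed, so this is a plausibility check rather than a description of the authors' procedure, and the function name is invented for the example.

```python
def partial_r2(t, df):
    """Squared partial correlation implied by a t statistic and its error df."""
    return t ** 2 / (t ** 2 + df)


# t values and degrees of freedom as reported for Studies 2 and 3.
reported = [
    (3.05, 80),  # Study 2: MS x intrinsic x affirmation interaction
    (2.28, 44),  # Study 2: MS x intrinsic interaction, affirmation condition
    (2.10, 46),  # Study 3: MS x intrinsic interaction
    (2.17, 45),  # Study 3: same interaction with self-esteem entered
]
values = [round(partial_r2(t, df), 2) for t, df in reported]
```

Under this assumption the rounded values agree with the effect sizes given in the text for these four tests.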

Discussion

Whereas Study 2 illustrated that for people high in intrinsic religiousness affirmation of religious beliefs prevented worldview defense following MS, Study 3 suggested that this group of participants also did not show heightened accessibility of death-related thoughts when reminded of death, although low intrinsically religious participants showed the usual delayed increase in death-thought accessibility. Taken together, these findings are an important addition to previous research showing that active commitment to religious beliefs and practices reduces fear of death (e.g., Feifel & Nagy, 1981; Kahoe & Dunn, 1975; Spilka et al., 1977; Templer, 1970). Thus, Study 3 suggests that people high in intrinsic religiousness do not seem to be better able to cope with increased death-thought accessibility; rather, MS does not induce increased accessibility of death-related thoughts as it does for nonreligious or low intrinsically religious people.

Figure 3. Death-thought accessibility depending on people’s intrinsic religiousness and mortality salience in Study 3.


The finding that the affirmation of the intrinsic religious belief is critical as well as the more direct test with the State Self-Esteem Scale argues against the idea that our effects could simply be explained by difference in self-esteem between high and low intrinsically oriented religious people. This result is consistent with other value-based self-affirmation studies (Galinsky et al., 2000; Schmeichel & Martens, 2005). Study 3 also helps to rule out an alternative account for Study 2, which could possibly also have been explained in line with a trivialization–centrality alternative. By this account, high intrinsics who affirmed their beliefs might have been less prone to worldview defense because the domain of defense occurred with regard to a less central aspect of their belief system. Living in Munich might simply have been less important for high intrinsics, and previous TMT work has shown that people defend only those domains that are important to them. Similarly, one could alternatively suggest that by affirming religion, living in Munich no longer seemed that important. However, the data from Study 3 support our view and refute this alternative by finding that death-thought accessibility is not elevated among intrinsically religious participants. The alternative account would have predicted no such attenuation.

General Discussion

Although people’s thinking and feelings about death have been colored by their religious beliefs and rituals for centuries (Parrinder, 1983), in the context of TMT there has been a paucity of research exploring the role of religion in mitigating defensive reactions to existential concerns. Our research focuses on the important function of religion in terror management and goes even further by integrating the importance of intrinsic versus extrinsic belief and affirming those beliefs. By illustrating that MS induced reluctance to use religious symbols in an inappropriate way and increased bias toward members of one’s own religion compared with affiliates of another religion, previous terror management studies have suggested that the reminder of mortality leads people to defend their religious faith (Greenberg et al., 1990). Going beyond this research and in line with research from other areas showing that religiousness is generally associated with lower death anxiety, our research provides converging evidence that religious belief plays a protective role in terror management. Our studies suggest that affirming intrinsic religiousness reduces the use of terror management defenses with regard to both a secular belief system and death-thought accessibility following MS. This suggests an important qualifier of the conclusion that religious faith plays a terror management function and protects people from their concerns about death. It appears that only those intrinsically vested in their religion (i.e., those for whom religion serves as a framework for life by providing both meaning and value) derive terror management benefits from religious beliefs. Extrinsically oriented religious people (i.e., those with a utilitarian approach to religion), on the other hand, do not seem to be protected from the terror of death. 
Thus, our research suggests that the distinction between intrinsic and extrinsic religious participants is crucial to understand under which conditions religion plays a protective role in terror management. This underlies the idea that it is not religiousness per se that helps people to buffer mortality concerns and that intrinsically religious people are not just more religious in a quantitative sense than extrinsically religious people. In contrast, the quality distinction between intrinsic and extrinsic religious orientation seems to be responsible for the difference in terror management defense. This finding extends prior work showing that only intrinsic religiousness correlates with measures of religious beliefs and commitment, whereas extrinsic religiousness has been shown to be relatively uncorrelated with these measures (Donahue, 1985). In this context, however, an interesting question is whether the quality and quantity of religious beliefs might be inherently confounded in the concept of intrinsic and extrinsic religiousness. Part of what defines intrinsic religiousness is that it is more central and internalized. Thus, it appears that intrinsically religious people are also more religious in a quantitative sense (e.g., they pray more often) than extrinsic religious people. As religion can be understood as multidimensional (Pargament, 2002), it might be a fruitful endeavor for future research to further investigate qualitative and quantitative differences in different concepts of religion. A complicating factor, however, was that in all three studies we found considerable correlations between intrinsic and extrinsic religious orientation. Although Allport (1950) originally proposed a strong negative correlation between intrinsic and extrinsic religiousness, later research reported a wide range of correlations from −.58 to .56 (Donahue, 1985; Hutchinson et al., 1998; Maltby & Day, 2000; Zwingmann et al., 1994). Thus, noticeably positive correlations between the two dimensions are not uncommon. In addition, previous literature has shown that different kinds of samples reveal different correlations. 
Positive correlations can be found in samples in which the majority of participants are nonreligious, whereas negative correlations appear more often in religious samples or in especially orthodox and conservative samples (Donahue, 1985; Hutchinson et al., 1998). Moreover, particularly in studies involving student samples and in other studies conducted in Germany, high correlations can be observed (Zwingmann et al., 1994). Thus, the positive correlations we found in our studies were probably due to the fact that most of our participants were students who held less traditional views. However, we think this problem is lessened by the fact that the regression analyses suggested that the interaction between intrinsic religious orientation and the experimental factors was independent from extrinsic religious orientation. This supports our interpretation and replicates previous findings showing that in the context of religious variables, no interactions with extrinsic religiousness were found. Nevertheless, such interactions could be observed with respect to dependent variables from a nonreligious context (Donahue, 1985). To summarize, our findings extend previous research by shedding more light on the role religion plays in terror management. Our work shows that level of religiousness, and not just its momentary salience, matters. Moreover, it also shows that type of religiousness matters, forging a link between traditional conceptualizations of religiousness and TMT and providing evidence that only intrinsic religiousness serves a terror management function, a finding not explicitly anticipated in prior writings on religiousness or TMT. By integrating the distinction between intrinsic and extrinsic religious beliefs and stressing the importance of the previous affirmation of those beliefs, we specified how religion mitigates defensive reactions to existential concerns. Thus, our work is also related to the areas of self-determination and self-affirmation.

For future research, it would be a fruitful endeavor to investigate the interplay between religiousness and TMT with further religious orientations that could not be addressed in the present research. For example, beyond the distinction between intrinsic and extrinsic religiousness, Batson (1976; see also Batson & Schoenrade, 1991) introduced a third orientation called “religion as a quest.” This religious orientation addresses a self-critical, doubt-valuing, and reflective approach to religiousness and thus a dimension that has so far not been addressed by the intrinsic–extrinsic distinction. In addition, on the basis of self-determination theory (Deci & Ryan, 1991), Ryan, Rigby, and King (1993) distinguished different styles of religious internalization: Introjection represents religious internalization that is determined by fear, guilt, or external pressure, whereas identification is characterized by greater volition and personal value. Future research might want to investigate whether the effects of the present studies can be replicated or perhaps even further differentiated with these additional types of religious orientations.

Another interesting question for future research is whether the distinction between intrinsic and extrinsic orientation for terror management defense also extends to worldviews other than religious ones. Allport and Ross (1967, p. 434) identified intrinsically oriented religious people as those “who live their religion” and extrinsically oriented people as those “who use their religion.” Maybe a similar distinction can be made between people who live their cultural worldviews and those who use their cultural worldviews, or between those who internally identify with culturally relevant in-groups and those who instrumentally use cultural in-groups to achieve other goals, such as status, social support, or self-justification. Just as intrinsic and extrinsic orientations are distinguished in religion, it might be useful to extend this distinction to cultural worldviews in general. Perhaps within TMT, the distinction between intrinsic and extrinsic orientation is of greater importance than presently appreciated. Similar to the findings of Sheldon and Kasser (1998), which showed that the pursuit of only intrinsically oriented goals (but not extrinsically oriented ones) led to greater well-being, future research may indeed find similar effects for coping with existential fear resulting from the awareness of death.

Furthermore, the question arises as to whether intrinsic values in general or intrinsic religiousness in particular is generally a better defense against all types of psychological threats, or whether there are forms of threat (e.g., to self-esteem, social rejection) for which extrinsic religiousness provides a better defense than intrinsic religiousness. Recent research indicates that intrinsic religiousness is especially effective in coping with uncontrollable aversive events and threats, such as terrorism, severe illness, or death (e.g., Fischer et al., 2006; Park, Cohen, & Herb, 1990). However, because of the stronger relation of extrinsic religiousness to the self and to social networks and related social support, extrinsic religiousness might be especially effective in coping with controllable events requiring self-esteem and social support, such as being successful on the job, studying for exams, or dealing with frustrations in the family. Future research could address this question more directly, for example, by varying the controllability of a future event (low vs. high) and measuring coping success dependent on religious orientation (extrinsic vs. intrinsic).

Conclusion

Religion is obviously a central aspect of so many people’s lives. The religious belief system can accompany people from birth to death. Religion is unique among meaning systems because it equips “individuals to respond to situations in which they come face-to-face with the limits of human power and control and are confronted with their vulnerability and finitude” (B. W. Smith, Pargament, Brant, & Oliver, 2000, p. 171). Our research helps to explain why religion plays such a critical role in contemporary cultures and has always been important to people. Becker (1971) argued that religious worldviews are so effective for terror management because they provide cosmic significance and directly address the fear of death by claiming that death is not the end of existence. Thus, TMT might shed new light on understanding why intrinsic religiousness has been shown to have such positive implications for people’s physical and mental health (Hill & Pargament, 2003).

References

Abbott-Chapman, J., & Denholm, C. (2001). Adolescents’ risk activities, risk hierarchies and the influence of religiosity. Journal of Youth Studies, 4, 279–297.
Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Newbury Park, CA: Sage.
Allen, G. (2000). The evolution of the idea of God. Escondido, CA: The Book Tree. (Original work published 1897)
Allport, G. W. (1950). The individual and his religion. New York: Macmillan.
Allport, G. W. (1966). The religious context of prejudice. Journal for the Scientific Study of Religion, 5, 447–457.
Allport, G. W., & Ross, J. M. (1967). Personal religious orientation and prejudice. Journal of Personality and Social Psychology, 5, 432–443.
Arndt, J., Greenberg, J., Solomon, S., Pyszczynski, T., & Simon, L. (1997). Suppression, accessibility of death-related thoughts, and cultural worldview defense: Exploring the psychodynamics of terror management. Journal of Personality and Social Psychology, 73, 5–18.
Arndt, J., Schimel, J., & Goldenberg, J. L. (2003). Death can be good for your health: Fitness intentions as a proximal and distal defense against mortality salience. Journal of Applied Social Psychology, 33, 1726–1746.
Baker, M., & Gorsuch, R. (1982). Trait anxiety and intrinsic–extrinsic religiousness. Journal for the Scientific Study of Religion, 21, 119–122.
Batson, C. D. (1976). Religion as prosocial: Agent or double agent? Journal for the Scientific Study of Religion, 15, 29–45.
Batson, C. D., & Schoenrade, P. A. (1991). Measuring religion as quest: 1. Validity concerns. Journal for the Scientific Study of Religion, 30, 416–429.
Batson, C. D., & Ventis, L. W. (1982). The religious experience. New York: Oxford University Press.
Becker, E. (1971). The birth and death of meaning. New York: Free Press.
Becker, E. (1973). The denial of death. New York: Free Press.
Berger, P. L. (1969). A rumor of angels. Garden City, NY: Doubleday.
Bergin, A. E. (1991). Values and religious issues in psychotherapy and mental health. American Psychologist, 46, 394–403.
Bergin, A. E., Masters, K. S., & Richards, P. S. (1987). Religiousness and mental health reconsidered: A study of an intrinsically religious sample. Journal of Counseling Psychology, 34, 197–206.
Bolt, M. (1977). Religious orientation and death fears. Review of Religious Research, 19, 73–76.
Boscarino, J. A., Figley, C. R., & Adams, R. E. (2004). Compassion fatigue following the September 11 terrorist attacks: A study of secondary trauma among New York City social workers. International Journal of Emergency Mental Health, 6, 57–66.
Burling, J. W. (1993). Death concerns and symbolic aspects of the self: The effects of mortality salience on status concern and religiosity. Personality and Social Psychology Bulletin, 19, 100–105.
Burris, C. T., & Rempel, J. K. (2004). “It’s the end of the world as we know it”: Threat and the spatial-symbolic self. Journal of Personality and Social Psychology, 86, 19–42.
Carver, C. S., Scheier, M. F., & Weintraub, J. K. (1989). Assessing coping strategies: A theoretically based approach. Journal of Personality and Social Psychology, 56, 267–283.
Dechesne, M., Janssen, J., & van Knippenberg, A. (2000). Defense and distancing as terror management strategies: The moderating role of need for structure and permeability of group boundaries. Journal of Personality and Social Psychology, 79, 923–932.
Dechesne, M., Pyszczynski, T., Arndt, J., Ransom, S., Sheldon, K. M., van Knippenberg, A., & Janssen, J. (2003). Literal and symbolic immortality: The effect of evidence of literal immortality on self-esteem striving in response to mortality salience. Journal of Personality and Social Psychology, 84, 722–737.
Deci, E. L., & Ryan, R. M. (1991). A motivational approach to self: Integration in personality. In R. Dienstbier (Ed.), Nebraska Symposium on Motivation: Perspectives on motivation (Vol. 38, pp. 237–288). Lincoln: University of Nebraska Press.
Die Corps. (2005). München—die Hauptstadt (zumindest) der Bayern [Munich—the capital (at least) for the Bavarians]. Retrieved September 24, 2005, from http://www.die-corps.de/Muenchen.352.0.html
Donahue, M. (1985). Intrinsic and extrinsic religiousness: Review and meta-analysis. Journal of Personality and Social Psychology, 48, 400–419.
Ebaugh, H. R. F., Richman, K., & Chafetz, J. S. (1984). Life crisis among the religiously committed; do sectarian distances matter? Journal for the Scientific Study of Religion, 23, 19–31.
Feagin, J. R. (1964). Prejudice and religious types: A focused study of Southern fundamentalists. Journal for the Scientific Study of Religion, 4, 3–13.
Feifel, H., & Nagy, V. T. (1981). Another look at fear of death. Journal of Consulting and Clinical Psychology, 49, 278–286.
Feuerbach, L. (1980). Thoughts on death and immortality: From the papers of a thinker, along with an appendix of theological-satirical epigrams. Berkeley: University of California Press. (Original work published 1843)
Fischer, P., Greitemeyer, T., Kastenmüller, A., Jonas, E., & Frey, D. (2006). Coping with terrorism: The impact of increased salience of terrorism on mood and self-efficacy of intrinsically religious and nonreligious people. Personality and Social Psychology Bulletin, 32, 365–377.
Freud, S. (1959). Thoughts for the time on war and death, 1915. Collected papers (Vol. 4). New York: Basic Books. (Original work published 1915)
Galinsky, A. D., Stone, J., & Cooper, J. (2000). The reinstatement of dissonance and psychological discomfort following failed affirmations. European Journal of Social Psychology, 30, 123–147.
Gallup News Service. (2001). Attack on America: Key trends and indicators. In a graphic summary of public opinion following September 11th. Retrieved December 20, 2003, from http://www.gallup.com/poll/releases/pr01092c.asp
Genia, V. (1996). I, E, quest, and fundamentalism as predictors of psychological and spiritual well-being. Journal for the Scientific Study of Religion, 35, 56–64.
Goetheinstitut. (2005). Bundesländer 3 Bayern: Munich [Federal states 3 Bavaria: Munich]. Retrieved September 24, 2005, from http://www.goethe.de/dll/pro/lkpc/Muenchen.htm

Goodenough, E. R. (1986). The psychology of religious experiences. Lanham, MD: University Press of America.
Gorsuch, R. L. (1988). Psychology of religion. Annual Review of Psychology, 39, 201–221.
Gorsuch, R. L., & McPherson, S. E. (1989). Intrinsic/extrinsic measurement: I/E-Revised and single-item scales. Journal for the Scientific Study of Religion, 28, 348–354.
Green, S. B. (1991). How many subjects does it take to do a regression analysis? Multivariate Behavioral Research, 26, 499–510.
Greenberg, J., Pyszczynski, T., Solomon, S., Rosenblatt, A., Veeder, M., Kirkland, S., & Lyon, D. (1990). Evidence for terror management theory: II. The effects of mortality salience on reactions to those who threaten or bolster the cultural worldview. Journal of Personality and Social Psychology, 58, 308–318.
Greenberg, J., Pyszczynski, T., Solomon, S., Simon, L., & Breus, M. (1994). Role of consciousness and accessibility of death-related thoughts in mortality salience effects. Journal of Personality and Social Psychology, 67, 627–637.
Greenberg, J., Simon, L., Porteus, J., Pyszczynski, T., & Solomon, S. (1995). Evidence of a terror management function of cultural icons: The effects of mortality salience on the inappropriate use of cherished cultural symbols. Personality and Social Psychology Bulletin, 21, 1221–1228.
Greenberg, J., Simon, L., Pyszczynski, T., Solomon, S., & Chatel, D. (1992). Terror management and tolerance: Does mortality salience always intensify negative reactions to others who threaten one’s worldview? Journal of Personality and Social Psychology, 63, 212–220.
Greenberg, J., Solomon, S., & Landau, M. (in press). What is the primary psychological function of religion? In D. M. Wulff (Ed.), Handbook of the psychology of religion. New York: Oxford University Press.
Greenberg, J., Solomon, S., & Pyszczynski, T. (1997). Terror management theory of self-esteem and cultural worldviews: Empirical assessments and conceptual refinements. In M. P. Zanna (Ed.), Advances in experimental social psychology (Vol. 29, pp. 61–139). San Diego, CA: Academic Press.
Grimm, R. (2005). Der Terror als Alltag—Die Menschen reagieren auf Terroranschläge heftig—und vergessen schnell [Terror as part of everyday life—People’s reactions to terrorist attacks are intense—but forgotten fast]. Retrieved October 3, 2005, from http://www.tagesspiegel.de/tso/sonderthema9/artikel.asp?TextID=52020
Harmon-Jones, E., Simon, L., Greenberg, J., Pyszczynski, T., Solomon, S., & McGregor, H. (1997). Terror management theory and self-esteem: Evidence that increased self-esteem reduces mortality salience effects. Journal of Personality and Social Psychology, 72, 24–36.
Heatherton, T. F., & Polivy, J. (1991). Development and validation of a scale for measuring state self-esteem. Journal of Personality and Social Psychology, 60, 895–910.
Heimat [Homeland]. (2005). Retrieved September 14, 2005, from http://www.wikipedia.org/wiki/Heimat
Hill, P. C., & Pargament, K. I. (2003). Advances in the conceptualization and measurement of religion and spirituality: Implications for physical and mental health research. American Psychologist, 58, 64–74.
Hills, P., & Francis, L. J. (2003). Discriminant validity of the Francis Scale of attitude towards Christianity with respect to religious orientation. Mental Health, Religion & Culture, 6, 277–282.
Hood, R. (1971). A comparison of the Allport and Feagin scoring procedures for intrinsic/extrinsic orientation. Journal for the Scientific Study of Religion, 10, 370–374.
Hutchinson, G. T., Patock-Peckham, J. A., Cheong, J. W., & Nagoshi, C. T. (1998). Personality predictors of religious orientation among Protestant, Catholic, and non-religious college students. Personality and Individual Differences, 24, 145–151.
James, W. (1902). The varieties of religious experience: A study in human nature. New York: Modern Library.

Jonas, E., Fritsche, I., Greenberg, J., Martens, A., & Niesta, D. (2006). Focus theory of normative conduct and terror management theory: The interactive impact of mortality salience and norm salience on social judgment and behavior. Unpublished manuscript, University of Munich, Munich, Germany.
Jonas, E., Greenberg, J., & Frey, D. (2003). Connecting terror management and dissonance theory: Evidence that mortality salience increases the preference for supporting information after decisions. Personality and Social Psychology Bulletin, 29, 1181–1189.
Kahoe, R. D., & Dunn, R. F. (1975). The fear of death and religious attitudes and behavior. Journal for the Scientific Study of Religion, 14, 379–382.
Kirk, R. E. (1995). Experimental design: Procedures for the behavioral sciences. Pacific Grove, CA: Brooks/Cole.
Landau, M. J., Johns, M., Greenberg, J., Pyszczynski, T., Martens, A., Goldenberg, J. L., & Solomon, S. (2004). A function of form: Terror management and structuring the social world. Journal of Personality and Social Psychology, 87, 190–210.
Landau, M. J., Solomon, S., Greenberg, J., Cohen, F., Pyszczynski, T., Arndt, J., et al. (2004). Deliver us from evil: The effects of mortality salience and reminders of 9/11 on support for President George W. Bush. Personality and Social Psychology Bulletin, 30, 1136–1150.
Maltby, J., & Day, L. (2000). Depressive symptoms and religious orientation: Examining the relationship between religiosity and depression within the context of other correlates of depression. Personality and Individual Differences, 28, 383–393.
Maltby, J., & Day, L. (2004). Should never the twain meet? Integrating models of religious personality and religious mental health. Personality and Individual Differences, 36, 1275–1290.
Maltby, J., Lewis, C. A., & Day, L. (1999). Religious orientation and psychological well-being: The role of the frequency of personal prayer. British Journal of Health Psychology, 4, 363–378.
Maton, K. I. (1989). The stress-buffering role of spiritual support: Cross-sectional and longitudinal investigations. Journal for the Scientific Study of Religion, 28, 310–323.
McCrae, R. R., & Costa, P. T., Jr. (1986). Personality, coping, and coping effectiveness in an adult sample. Journal of Personality and Social Psychology, 54, 385–405.
McGregor, I., Zanna, M. P., Holmes, J. G., & Spencer, S. J. (2001). Compensatory conviction in the face of personal uncertainty: Going to extremes and being oneself. Journal of Personality and Social Psychology, 80, 472–488.
Miller, A. S., & Hoffmann, J. P. (1995). Risk and religion: An explanation of gender differences in religiosity. Journal for the Scientific Study of Religion, 34, 63–75.
Minton, B., & Spilka, B. (1976). Perspectives on death in relation to powerlessness and form of personal religion. Journal of Death and Dying, 7, 261–268.
Mummendey, A., Klink, A., & Brown, R. (2001). Nationalism and patriotism: National identification and out-group rejection. British Journal of Social Psychology, 40, 159–172.
Newport, F. (2004, December 23). Update: Americans and Religion—Eighty-four percent of Americans identify with a Christian religion. Retrieved December 29, 2004, from http://www.gallup.com/poll/content/default.aspx?ci=14446
Norenzayan, A., & Hansen, I. G. (2006). Belief in supernatural agents in the face of death. Personality and Social Psychology Bulletin, 32, 174–187.
Osarchuk, M., & Tatz, S. (1973). Effect of induced fear of death on belief in an afterlife. Journal of Personality and Social Psychology, 27, 256–260.
Pargament, K. I. (2002). The bitter and the sweet: An evaluation of the costs and benefits of religiousness. Psychological Inquiry, 13, 168–181.
Park, C., Cohen, L. H., & Herb, L. (1990). Intrinsic religiousness and religious coping as life stress moderators for Catholics versus Protestants. Journal of Personality and Social Psychology, 59, 562–574.
Parrinder, E. G. (1983). World religions: From ancient history to the present. New York: Facts on File Publications.
Pyszczynski, T., Greenberg, J., & Solomon, S. (1999). A dual-process model of defense against conscious and unconscious death-related thoughts: An extension of terror management theory. Psychological Review, 106, 835–845.
Pyszczynski, T., Solomon, S., & Greenberg, J. (2003). In the wake of 9/11: The psychology of terror. Washington, DC: American Psychological Association.
Rosenblatt, A., Greenberg, J., Solomon, S., Pyszczynski, T., & Lyon, D. (1989). Evidence for terror management theory: I. The effects of mortality salience on reactions to those who violate or uphold cultural values. Journal of Personality and Social Psychology, 57, 681–690.
Roth, P. L., Switzer, F. S., & Switzer, D. M. (1999). Missing data in multiple item scales: A Monte Carlo analysis of missing data techniques. Organizational Research Methods, 2, 211–232.
Ryan, R. M., Rigby, S., & King, K. (1993). Two types of religious internalization and their relations to religious orientations and mental health. Journal of Personality and Social Psychology, 65, 586–596.
Saad, L. (2001). Personal impact on Americans’ lives: Women express much more fear of terrorism than do men. Retrieved December 7, 2001, from http://www.gallup.com/poll/releases/pr010914e.asp
Saroglou, V. (2002). Religion and the five factors of personality: A meta-analytic review. Personality and Individual Differences, 32, 15–25.
Schmeichel, B. J., & Martens, A. (2005). Self-affirmation and mortality salience: Affirming values reduces worldview defense and death-thought accessibility. Personality and Social Psychology Bulletin, 31, 658–667.
Schoenrade, P. (1989). When I die . . . : Belief in afterlife as a response to mortality. Personality and Social Psychology Bulletin, 15, 91–100.
Sellin, I., & Schütz, A. (2001, September). Validierung neuer deutschsprachiger Skalen zu Facetten des Selbstwertgefühls [Validation of a new German Multidimensional Self-Esteem Scale]. Poster session presented at the 6th Conference of the German Section of Differential Psychology, Personality Psychology and Psychological Diagnostics, Leipzig, Germany.
Sheldon, K. M., & Kasser, T. (1998). Pursuing personal goals: Skills enable progress, but not all progress is beneficial. Personality and Social Psychology Bulletin, 24, 1319–1331.
Simon, B., Pantaleo, G., & Mummendey, A. (1995). Unique individuals or interchangeable group member? The accentuation of intragroup differences versus similarities as an indicator of the individual self versus the collective self. Journal of Personality and Social Psychology, 69, 106–119.
Smith, B. W., Pargament, K. I., Brant, C., & Oliver, J. M. (2000). Noah revisited: Religious coping by church members and the impact of the 1993 Midwest flood. Journal of Community Psychology, 28, 169–186.
Smith, T. W., Rasinski, K. A., & Toce, M. (2001). America rebounds: A national study of public response to the September 11th terrorist attacks: Preliminary findings. Retrieved December 16, 2002, from the National Opinion Research Center, University of Chicago, Web site: http://www.norc.uchicago.edu/projects/reaction/pubresp.pdf
Solomon, S., Greenberg, J., & Pyszczynski, T. (1991). A terror management theory of social behavior: The psychological functions of self-esteem and cultural worldviews. In M. P. Zanna (Ed.), Advances in experimental social psychology (Vol. 24, pp. 93–159). San Diego, CA: Academic Press.
Solomon, S., Greenberg, J., & Pyszczynski, T. (2004). The cultural animal: Twenty years of terror management theory and research. In J. Greenberg, S. Koole, & T. Pyszczynski (Eds.), Handbook of experimental existential psychology (pp. 13–34). New York: Guilford Press.
Spilka, B., Pelligrini, R. J., & Dailey, K. (1968). Religion, American values & death perspectives. Sociological Symposium, 1, 57–66.
Spilka, B., Stout, L., Minton, B., & Sizemore, D. (1977). Death and personal faith: A psychometric investigation. Journal for the Scientific Study of Religion, 16, 169–178.
Steele, C. M. (1988). The psychology of self-affirmation: Sustaining the integrity of the self. In L. Berkowitz (Ed.), Advances in experimental social psychology (Vol. 21, pp. 261–302). New York: Academic Press.
Sturgeon, R. S., & Hamley, R. W. (1979). Religiosity and anxiety. Journal of Social Psychology, 108, 137–138.
Tabachnick, B. G., & Fidell, L. S. (2001). Using multivariate statistics (4th ed.). Boston: Allyn & Bacon.
Templer, D. I. (1970). Death anxiety in religiously very involved persons. Psychological Reports, 31, 361–362.
Tesser, A., Crepaz, N., Collins, J. C., Cornell, D., & Beach, R. H. (2000). Confluence of self-esteem regulation mechanisms: On integrating the self-zoo. Personality and Social Psychology Bulletin, 26, 1476–1489.
Tetlock, P. E., Kristel, O. V., Elson, S. B., & Green, M. C. (2000). The psychology of the unthinkable: Taboo trade-offs, forbidden base rates, and heretical counterfactuals. Journal of Personality and Social Psychology, 78, 853–870.
van den Bos, K., Poortvliet, P. M., Maas, M., Miedema, J., & van den Ham, E. J. (2005). An enquiry concerning the principles of cultural norms and values: The impact of uncertainty and mortality salience on reactions to violations and bolstering of cultural worldviews. Journal of Experimental Psychology, 41, 91–113.
Watson, D., Clark, L. A., & Tellegen, A. (1988). Development and validation of brief measures of positive and negative affect: The PANAS scales. Journal of Personality and Social Psychology, 54, 1063–1070.
Zwingmann, C., Hellmeister, G., & Ochsmann, R. (1994). Intrinsische und extrinsische Orientierung: Fragebogenskalen zum Einsatz in der empirisch-religionspsychologischen Forschung [Intrinsic and extrinsic religious orientation: Questionnaire scales in use for psychological studies of religion]. Zeitschrift für Differentielle und Diagnostische Psychologie, 15, 131–139.

Received January 3, 2005
Revision received March 15, 2006
Accepted March 20, 2006

Journal of Personality and Social Psychology 2006, Vol. 91, No. 3, 568–581

Copyright 2006 by the American Psychological Association 0022-3514/06/$12.00 DOI: 10.1037/0022-3514.91.3.568

The Thrill of Victory and the Agony of Defeat: Spontaneous Expressions of Medal Winners of the 2004 Athens Olympic Games

David Matsumoto, San Francisco State University
Bob Willingham, The World of Judo Magazine

Facial behaviors of medal winners of the judo competition at the 2004 Athens Olympic Games were coded with P. Ekman and W. V. Friesen’s (1978) Facial Action Coding System (FACS) and interpreted using their Emotion FACS dictionary. Winners’ spontaneous expressions were captured immediately when they completed medal matches, when they received their medal from a dignitary, and when they posed on the podium. The 84 athletes who contributed expressions came from 35 countries. The findings strongly supported the notion that expressions occur in relation to emotionally evocative contexts in people of all cultures, that these expressions correspond to the facial expressions of emotion considered to be universal, that expressions provide information that can reliably differentiate the antecedent situations that produced them, and that expressions that occur without inhibition are different from those that occur in social and interactive settings.

Keywords: emotion, facial expressions, Olympic Games, universality, FACS

Since Darwin’s (1872/1998) classic work, the nature and function of facial expressions of emotion have been a topic of interest and debate. These expressions are universally recognized (Elfenbein & Ambady, 2002; Matsumoto, 2001; but see critiques by Russell, 1991, 1994, and rejoinders by Ekman, 1994; Izard, 1994), are displayed by congenitally blind infants and children (Charlesworth & Kreutzer, 1973) and nonhuman primates (Chevalier-Skolnikoff, 1973; Geen, 1992; Hauser, 1993; Snowdon, 2003), are associated with distinct physiological signatures (Davidson, 2003; Ekman, Levenson, & Friesen, 1983; Levenson, Ekman, & Friesen, 1990; Levenson, Ekman, Heider, & Friesen, 1992; Tsai & Levenson, 1997), and correspond to emotion taxonomies around the world (Romney, Moore, & Rusch, 1997; Shaver, Murdaya, & Fraley, 2001; Shaver, Schwartz, Kirson, & O’Connor, 1987). However, questions exist concerning whether or not spontaneous expressions of emotion occur in real-life, naturalistic settings. We contribute to this literature by reporting on our examination of emotional expressions of athletes at the 2004 Athens Olympic Games in three very different but highly charged emotional contexts to address basic questions about the nature of spontaneous facial expressions of emotion.

The Link Between Emotionally Evocative Contexts and Facial Expressions of Emotion

Laboratory studies have demonstrated that emotionally evocative stimuli reliably elicit discrete facial expressions of emotion when there is no reason for participants to manage or modify their expressions (Berenbaum & Oltmanns, 1992; Bonanno & Keltner, 2004; Camras, Oster, Campos, Miyake, & Bradshaw, 1992; Chesney, Ekman, Friesen, Black, & Hecker, 1990; Ekman, 1972; Ekman, Davidson, & Friesen, 1990; Ekman, Friesen, & Ancoli, 1980; Ekman, Friesen, & O’Sullivan, 1988; Ekman, Matsumoto, & Friesen, 1997; Ellgring, 1986; Frank, Ekman, & Friesen, 1993; Gosselin, Kirouac, & Dore, 1995; Heller & Haynal, 1994; Keltner, Moffitt, & Stouthamer-Loeber, 1995; Keltner & Bonanno, 1997; Rosenberg & Ekman, 1994; Ruch, 1993, 1995).

Field studies, however, suggest otherwise. In Kraut and Johnson’s (1979) Studies 1 and 2, bowlers smiled less when facing the pins and watching the action compared with when they faced their companions. In Study 3, smiles occurred when something favorable to the home team had happened, and when fans were socially involved with one another. In Study 4, pedestrians smiled more on nice days than on bad days, and more when they were in social interaction with someone else than when they were not. In Ruiz-Belda, Fernandez-Dols, Carrera, and Barchard’s (2003) Study 1, bowlers smiled more when turned to the pit (and presumably in social interaction) than when they were facing the pins. In their Study 2, soccer fans smiled more frequently when they interacted with another fan than when watching the action on television. Finally, Fernandez-Dols and Ruiz-Belda (1995) reported that Duchenne smiles occurred most frequently and for the longest duration when athletes from the 1992 Barcelona Olympic Games interacted with dignitaries to receive their medals, but not when standing behind the podium or listening to the anthem. These reports of nonfindings are commensurate with Fridlund’s
These reports of nonfindings are commensurate with Fridlund’s (1994, 2002) behavioral ecology view of expressions, which posits that facial expressions do not reflect underlying emotional states

David Matsumoto, Department of Psychology, San Francisco State University; Bob Willingham, The World of Judo Magazine, Bristol, England. We thank Yasuko Sato for FACS coding the expressions; Robert Levenson for his assistance in producing the EMFACS predictions; Paul Ekman for his review of the EMFACS predictions and comments on a draft of this article; Satoko Hirayama, Ana Maria Anguas Wong, Natalia Kouznetsova, Yasuko Sato, and Seung Hee Yoo for their comments on a draft of this article; and Akiko Terao, Marija Drezgic, Andres Olide, Devon McCabe, and Sanae Nakagawa for their assistance in the general laboratory program. Correspondence concerning this article should be addressed to David Matsumoto, Department of Psychology, San Francisco State University, 1600 Holloway Avenue, San Francisco, CA 94132. E-mail: [email protected] 568

SPONTANEOUS EXPRESSIONS

per se but instead are displays of intent aimed at controlling “the trajectory of a given social interaction” (Fridlund, 1994, p. 130). In this view, facial expressions serve social motives that are mostly determined by the presence of a true or internalized audience, and any link between facial expression and emotion is a circumstantial by-product of the concurrence between social motives and some emotions.

Methodological Considerations Although arguments about the meaning of facial expressions of emotion and their relationship with emotion-eliciting situations have peppered the literature (Ekman, 1992, 1999, 2003; Fridlund, 1997; Keltner, Ekman, Gonzaga, & Beer, 2003; Russell, Bachorowski, & Fernandez-Dols, 2003; Russell & Fernandez-Dols, 1997), the field is relatively short on rigorous expressive data to address these issues. Laboratory experiments typically lack the investigation of expression in social situations with other interactants1 and in many cases involve unnatural emotion elicitation techniques created for the purpose of studying emotion in the laboratory, an unnatural situation itself. Studies of expressive behavior outside the laboratory are important, not only because they address theoretical questions in real life but also because they describe behavior in their richness and complexity (Goffman, 1959). The conclusions from the field studies cited earlier, however, were premature because of limitations in their methodologies, some discussed elsewhere (Bonanno & Keltner, 2004). Here we focus on three directly relevant to our study.

Strong Elicitation of Emotion When studying emotion, it is imperative that emotion be elicited strongly enough to be studied. It is questionable as to whether this occurred in the three field reports cited earlier. For instance, the assumption in the studies involving bowlers (Kraut & Johnson, 1979; Ruiz-Belda et al., 2003) is that they should feel and express happiness each and every time they bowl a spare or a strike. Spares and strikes are good, but whether they need to be antecedents to happiness each and every time in a game is questionable. Many athletes in the middle of competition or individuals in the middle of a task keep their emotions in check pending the final outcome (Kerr, Wilson, & Nakamura, 2005). Emotions are not akin to physical reactions; the same antecedent that brings about an emotion once is likely not to bring about that same emotion again even though the same antecedents repeatedly occur (Ekman, Friesen, & Simons, 1985).

Allowing Time for Expressions to Unfold The time frame analyzed has to be long enough for expressions to unfold. This may not have been the case in the field studies described earlier. For instance, in Ruiz-Belda and colleagues’ (2003) study, the time frame for expressions to be observed in bowlers facing the pins was 1.35 s. Assuming that a second is required for the ball to go through the pins and the final result to occur, that leaves about 0.35 s for sensation, perception, appraisal, emotion elicitation, innervation of the facial nerve, and facial muscle movement to onset and reach apex. Given that the bowlers immediately turned to the pit, their expressions may have occurred


while they were turning toward the pit. If so, then it makes sense that they would smile much more while facing the pit because of the short amount of time recorded while facing the pins. That the amount of time facial behavior was recorded in interaction was 2.83 s supports this notion, if the average time window of an expression unfolding (about 4 s) is considered (Ekman, 2003). Ruiz-Belda and colleagues (2003) attempted to remedy this time differential by examining facial behaviors during the first 1.35 s of the second, interactive phase. However, this procedure does not correct for the fact that the first time window is too short; the better procedure would have been for the first time window to be long enough to allow expressions to occur if they were going to.2
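The timing argument above comes down to simple arithmetic, sketched here in Python. The durations are those quoted in the text; the function name is hypothetical, and the 1-s allowance for the ball to reach the pins is the working assumption stated above, not a measured value.

```python
# Approximate time an expression needs to unfold, as cited in the text (about 4 s).
EXPRESSION_WINDOW_S = 4.0


def usable_window(recorded_s, delay_before_outcome_s=0.0):
    """Seconds left for an expression to unfold once the outcome is known."""
    return max(recorded_s - delay_before_outcome_s, 0.0)


# Bowlers facing the pins: 1.35 s recorded, minus about 1 s (assumed) for the ball to hit.
facing_pins = usable_window(1.35, delay_before_outcome_s=1.0)
# Bowlers turned toward the pit: 2.83 s recorded in interaction.
facing_pit = usable_window(2.83)

for label, window in [("facing pins", facing_pins), ("facing pit", facing_pit)]:
    share = window / EXPRESSION_WINDOW_S
    print(f"{label}: {window:.2f} s usable, {share:.0%} of a 4-s expression window")
```

Under these assumptions the pins-facing window covers well under a tenth of the time an expression typically needs, which is the asymmetry the critique turns on.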

Measuring Expressions in Precise, Moment-to-Moment Fashion

The face is one of the most complex signal systems available to humans, and facial expressions can be fleeting, rich, and incredibly varied. Observer judgments cannot capture the richness and variety of the face. For example, Kraut and Johnson (1979) never distinguished between Duchenne smiles and non-Duchenne smiles. Duchenne smiles involve both the zygomatic major (which pulls the lip corners up) and the orbicularis oculi (the muscle surrounding the eyes); they have been associated with enjoyment (Ekman et al., 1990; Frank & Ekman, 1993; Soussignan, 2002). Non-Duchenne smiles do not involve the orbicularis oculi and occur when people smile to be pleasant or because of social circumstance, even though they may not feel positive emotion. It is possible, therefore, that the individuals in Kraut and Johnson’s study produced more non-Duchenne smiles in the social interaction conditions. Although Ruiz-Belda and colleagues (2003) scored the facial behavior with the Facial Action Coding System (FACS; Ekman & Friesen, 1978), they did not differentiate between Duchenne and non-Duchenne smiles, either.
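The Duchenne distinction can be expressed directly in terms of FACS action units. The Python sketch below is illustrative only: it assumes the standard FACS numbering in which AU 12 is the lip corner puller (zygomatic major) and AU 6 is the cheek raiser (orbicularis oculi), and the function name and set-based input are hypothetical conveniences rather than part of any coding software.

```python
AU_CHEEK_RAISER = 6        # orbicularis oculi, the muscle surrounding the eye
AU_LIP_CORNER_PULLER = 12  # zygomatic major, pulls the lip corners up


def classify_smile(action_units):
    """Label a set of coded AU numbers as a Duchenne smile, non-Duchenne smile, or no smile."""
    if AU_LIP_CORNER_PULLER not in action_units:
        return "no smile"
    if AU_CHEEK_RAISER in action_units:
        return "Duchenne smile"
    return "non-Duchenne smile"


print(classify_smile({6, 12}))  # both muscles active
print(classify_smile({12}))     # lip corners only
print(classify_smile({4, 15}))  # no AU 12 at all
```

An observer who records only "smile present" collapses the first two cases, which is why observer judgments cannot separate enjoyment smiles from social ones.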

1 Ekman’s (1972) study is a notable exception. Studies of expressive behavior in marital interactions are also notable exceptions (Ekman, 2003).

2 The same is true for their study of soccer fans. The mean duration of the noninteractive phases was just 2.0 s, whereas the mean duration for the interactive phases was 7.2 s. If expressions of spontaneous emotion generally last on the face between 0.5 and 4 s after emotion is aroused, then the 2.0 s time frame for the noninteractive phase is just too short to fairly determine whether expressions will occur or not, given that emotion was aroused. Again, Ruiz-Belda and colleagues (2003) attempted to rectify this discrepancy by analyzing facial data from only the first 2 s of the interactive phase and demonstrated that smiles occurred more readily there anyway. However, if the timing of expression dynamics vis-à-vis emotion arousal is considered, it would be expected that expressions occur precisely during that time frame, which would be within 4 s of the beginning of the emotional episode. That Ruiz-Belda and colleagues reported a high prevalence of smiling during this time frame, therefore, may actually be evidence in favor of the notion that smiles are natural expressions of the presence of happiness.

The Current Study

The purpose of this study was to investigate expressive behavior in a naturalistic setting while addressing the methodological issues described earlier. We examined the spontaneous facial expressions of athletes immediately after they had just won or lost a medal

MATSUMOTO AND WILLINGHAM


match at the 2004 Athens Olympic Games judo competition. A judo match lasts for 5 min and starts with two contestants in a standing position vying for a grip on each other. There are throwing techniques that originate from standing, and there are grappling techniques on the ground. Points are awarded by throwing the opponent to the ground on the back or by applying a pin, choke, or armlock. Instant wins (the equivalent of a knockout in boxing) are awarded for clean throws to the back, pinning the opponent on the ground for 25 s, or when an opponent submits because of a choke or armlock. Because instant wins can occur at any time during a match, the outcome of a match is never decided until the end of competition time or when an instant win occurs. Athletes participate in a tournament system requiring them to compete in many matches in a single day; thus judo competition at the Olympic Games requires tremendous strength and conditioning. Because the Olympic Games occur only once every 4 years, winning or losing a medal here is one of the most powerful emotional experiences in the lives of these athletes.3 The expressions were photographed by a high-speed digital camera and measured in precise, moment-to-moment fashion using FACS coding (Ekman & Friesen, 1978), continuously from immediately after match completion until the winner was awarded the match—approximately 10 to 15 s—to allow for expressions to unfold. We coded all facial behaviors, not just smiles, to examine a range of expressions. Moreover, we examined the spontaneous expressions of the same athletes at two times during the medal ceremonies, once when interacting with a dignitary when receiving a medal, the second time on the podium posing. Contrasting expressions that occurred immediately at match completion in the heat of battle with those that occurred during the medal ceremonies allowed us to examine expressive behavior in different contexts, one of which was clearly more social and public.

Theoretical Questions Do Expressions Occur in Emotionally Evocative Situations? The main question addressed by our data concerns whether or not expressions occur at all in emotionally evocative situations. We contend that they do, provided the situations are highly charged, sufficient time is allotted to examine whether expressions occur, and expressions are measured with moment-to-moment precision.

The Expression of Victory Sport competition, like many achievement situations, elicits many emotions, especially related to success and failure. One of the most salient emotions for the winners is that of enjoyment. There are many different types of enjoyable emotions; Ekman (2003) distinguished among 16: five sensory pleasures, amusement, contentment, excitement, relief, wonder and awe, ecstasy and bliss, fiero (good feelings about oneself at the moment of accomplishment), elevation, gratitude, schadenfreude (joy in another’s misfortune), and naches (pride–pleasure in the achievement of one’s children). Shaver’s (Shaver et al., 1987, 2001) work on the emotion lexicon grouped positive emotions into two categories: love, with three subcategories (adoration, arousal, and

longing), and happiness with five (amusement, enthusiasm, contentment, enthrallment, and relief). Ekman (2003) suggested that all enjoyment emotions are signaled in the face by Duchenne smiles. Many studies have shown, in fact, that Duchenne smiles are related to enjoyment and amusement (Frank et al., 1993; Hess, Banse, & Kappas, 1995; Keltner, 1995; Keltner & Bonanno, 1997; Smith, 1995). To date, however, no study has examined the facial signals of enjoyment related to achievement and accomplishment. Our study does so.

The Expression of Defeat What about athletes who lose their medal matches? Of course winning any medal in Olympic competition is an amazing accomplishment. Folk beliefs suggest a linear decrease in positive emotions, with the gold medalists experiencing the most, followed by the silver medalists, and then bronze. However, in reality, the silver medalist loses the gold medal match, and the bronze medalists win their last match, capturing the bronze and avoiding going home without a medal. Medvec, Madey, and Gilovich (1995) showed that bronze medalists from the 1992 Barcelona Games appeared happier than silver medalists at the end of the match and on the podium, and that silver medalists’ comments were much more characterized by counterfactual thinking about “what might have been” and were associated with greater regret and less enjoyable emotion. What is the expression associated with defeat? It may simply be the case that athletes who lose their medal match—silver medalists and fifth placers—are less happy and thus smile less or differently; they may show nothing, neutralizing their expressions to be a good loser; or they may show other emotions. No study has examined these possibilities. In this study, we do so by examining the facial behaviors of the athletes who lose their medal match—silver medalists and fifth placers.

The Timing of Expressions

Spontaneous emotional expressions last between 0.5 and 4 s on the face (Ekman & Friesen, 1982b; Frank et al., 1993; Hess & Kleck, 1990; Leonard, Voeller, & Kudau, 1991; Richardson, Bowers, Bauer, Heilman, & Leonard, 2000). We do not know, however, how much time is needed from the occurrence of an emotion-eliciting stimulus to the beginning of its expressive reaction. Facial reactions to startling stimuli occur within 100 ms of stimulus occurrence (Ekman et al., 1985). However, startles are reflexes, not emotions, and probably bypass the emotion appraisal process. In the solitary emotion-elicitation condition of Schmidt, Cohn, and Tian’s (2003) study, the time from stimulus occurrence to the beginning of its expressive reaction was 2.4 s. Ruiz-Belda et al.’s (2003) data also showed that expressions generally occurred after 2 s but within 4 s of stimulus occurrence. Schmidt and colleagues, however, did not analyze their onset data systematically. This study contributes to this literature by directly measuring the time from stimulus occurrence (match ending) to the beginning of expression onset. Understanding this aspect of expression timing has implications for our theoretical understanding of emotion.

3 We do not bring an uninformed background to this topic. David Matsumoto is a 6th-degree black belt in judo, team leader for the U.S. Olympic Judo Team in Atlanta (1996), director of international coaching and training for the U.S. judo team in Sydney (2000), and a technical official of the International Judo Federation in Athens (2004). Bob Willingham is a 4th-degree black belt in judo, former All England heavyweight champion, and former member of the British national judo squad.

Expressive Differences as a Function of Social Situation There have been only a couple of studies that have examined expression differences in different contexts in the same individuals in a within-subjects design. Ekman (1972) demonstrated that, although Americans and Japanese both showed their displays of disgust, anger, fear, and sadness while alone, more Japanese smiled to mask their negative feelings when viewing the stimuli in the presence of a higher status experimenter. Ekman and Friesen (1969) coined the term cultural display rules to explain such expressive differences. Matsumoto and Kupperbusch (2001) extended these findings by demonstrating that collectivistic participants deamplified expressions of both positive and negative feelings when with the higher status experimenter. How might expressive behavior be different in the current study? We expected that all athletes regardless of place finish would spontaneously engage in smiling behavior during the medal ceremonies because of the very social nature of the event. Yet the specific types of smiles for silver medalists should be different than for the gold or bronze medalists, congruent with the findings of Medvec et al. (1995). We did not know exactly how, however, their smiles would differ, because expressive data from individuals in comparable situations do not exist. The data from the current study should allow us to examine whether all athletes do engage in smiling and whether the silver medalists’ smiles are different from those of the gold or bronze medalists.

Method

The Setting

The staging of the judo competition at the Olympic Games provided the perfect controlled environment for a naturalistic, field study. Judo competition was held at the Ano Liossia Competition Hall in Athens, Greece, which has a seating capacity of 8,000. In the stadium there were two competition areas in the field of play, both on an elevated platform. Each competition area consisted of an 8 × 8 m contest area, with a 3-m safety area bordering three of the outside edges and a 4-m safety area in between the common edges of the contest areas. A large runway ran around the entire competition areas for athlete and staff entrances and exits on one side and for all technical officials on the other three sides.4 The main spectator seating areas were along three of the sides of the competition area; the fourth side was reserved for VIPs and broadcast companies. The main television cameras were situated on the side of the spectators in between the competition areas. Bob Willingham, who is a professional photographer and the official photographer of the International Judo Federation, was situated in between both competition areas on the side of the technical officials, that is, opposite the main spectator seating and the main television cameras. Thus it was impossible for the photographer to obtain expressions when the athletes faced the crowd. There are seven weight categories each for men and women, and competition occurred for 7 days; each day one weight category for both sexes was contested. Competition occurs in a standard elimination system where winners proceed through preliminary rounds and then quarter-, semi-, and final rounds. Athletes who lose to the semifinalists are placed in a loser’s bracket (known as the repechage) and compete in an elimination


system. The gold medalist is the athlete who wins the final round (thus not having lost any matches); the silver medalist is the loser of the final round. Bronze medals are awarded to the winners of the match between the winner of the repechage and the loser of the semifinals; because there are two semifinals and two repechages, two bronze medals are awarded. Preliminary rounds started every morning at 10:30 a.m., and ended by 2 p.m. At 4:30 p.m. the semifinal, repechage finals, bronze medal matches, and then gold medal matches were contested for both men and women. During the preliminary, semifinals, repechage finals, and bronze medal matches, contests occurred on both competition areas; thus the photographer (Bob Willingham) took shots on both mats, alternating between one and the other, depending on the action and the athletes competing. Final, gold medal matches, however, occurred one at a time; thus the photographer was able to focus all attention on those matches. The photographer took action shots during the contests. For the purposes of this study, however, the photographer was instructed to also take shots of the athletes after match completion. Matches can end in one of five ways: time running out; an athlete is thrown cleanly with speed, force, and control, onto his or her back; an athlete pins the opponent to the ground for 25 s; an athlete submits because of the effects of a choke; or an athlete submits because of the effects of an armlock. Photographs were taken of the athletes after match completion until the match was awarded by the referee, a time period that generally occurred within 15 s. The photographer was told that the focus of the study was on expressions; however, no information was given about what specific type of expression or channel, no mention was made of emotion, and he had no formal training in psychology nor knew the literature related to the study or the specific hypotheses to be tested. 
Medal ceremonies occurred in the middle of the competition area, and generally about 30 min after the completion of the last match of the day. Athletes were marched in single file, stood behind the podium, stood up onto the podium when their names were called, and received their medal and wreath from a dignitary. After all athletes had received their medals, they stood for the playing of the national anthem of the gold medalist and then gathered on the gold medal podium for a group photo. They then marched around all four sides of the field of play, stopping to greet fans and allow their photos to be taken.

Photographic Equipment

The camera used was a Nikon D2H professional digital camera. It has a high frame rate (8 frames per second, with a 37-ms shutter time lag) and high resolution (4.1 megapixels effective). The camera was set to autofocus and manual exposure, using available light and saving images in JPEG format. The ISO (International Organization for Standardization) sensitivity setting ranged between 400 and 800, giving shutter speeds of around 1/500 s. A variety of interchangeable Nikkor lenses were used, including 28–70 mm f/2.8, 70–200 mm f/2.8, and 300 mm f/2.8.

Photo Selection For the purposes of this study we selected photos from the bronze and gold medal matches. Although winning or losing any match can be an antecedent for joy or distress, we wanted to use situations that were clearly the most emotionally charged. These are the medal matches. When a medal match is completed, athletes are done with their Olympic competition, and the final outcome of their performance is determined. For many, it is the end of their competitive career. Although athletes compete in, win and lose, many matches throughout their career, competing in a medal round in the

4 In the Olympic Games, the field of play is a highly secure area, and only individuals with a certain security credential are allowed onto the field.


Olympics is a once-in-a-lifetime experience. It is likely to elicit strong emotions, determining whether an athlete gets a medal or not (losers of the bronze medal matches do not get a medal) or wins the gold or not. Given the finality and rarity of the situation, it is highly likely that medal match completion, therefore, will be associated with strong emotions for both winners and losers. An additional merit to the focus on medal matches is the fact that the medalists participated in the medal ceremony. Whereas the medal matches are likely to lead to relatively uninhibited expressions because of the nature of the situation and competition (more on this later), the medal ceremonies are clearly a social event, produced for the purpose of a viewing audience both in the arena and on television. By focusing on the athletes in the medal matches, we had a chance to observe and measure their spontaneous behavior in two very different situations. This is not true, however, for all other athletes who are eliminated in the preliminary rounds. Approximately 3,000 shots were taken from each day’s competition, resulting in about 21,000 photographs across the 7 days of competition. Of these, we examined all shots taken immediately at the end of each medal match, from the precise moment in time when the match was over and the outcome was known to the time the decision was announced by the referee, a span of approximately 10 to 15 s. Shots were also examined at two times during the medal ceremonies—when the athlete received the medal from the dignitary and when the four athletes were on the gold medal stand posing. Across the 7 days of competition, this resulted in a preliminary selection of 2,735 shots. Because this study focused on facial behavior, photos were then selected for detailed FACS coding. Photos were selected if there was at least a clear profile view of the face of the athlete and if any facial muscles were contracted. 
When an expression was on a face, we examined the series of photographs that depicted the beginning of the expression to its end. We selected photographs in which expressions had reached their apex for detailed FACS coding. If an expression was already on a face, a new expression was determined to exist if different facial muscles were contracted or existing facial muscles changed by at least two intensity levels according to FACS coding (Ekman & Friesen, 1978). This resulted in the selection of 190 photographs, 117 from the medal matches and 73 from the medal ceremonies.5
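The rule above for deciding when a new expression begins can be sketched in Python. This is only an illustration under stated assumptions: the dictionary representation and the function name are hypothetical, and the mapping of intensity letters A to E onto the integers 1 to 5 assumes a five-point intensity scale, which may not match the scoring conventions of the 1978 manual the authors used.

```python
# Assumed five-point intensity scale, mapped to integers for comparison.
INTENSITY = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}


def is_new_expression(previous, current, min_change=2):
    """Compare two codings, each a dict mapping AU number -> intensity letter.

    A new expression is flagged when the set of contracted AUs differs, or
    when any shared AU changes by at least `min_change` intensity levels.
    """
    if set(previous) != set(current):
        return True
    return any(
        abs(INTENSITY[current[au]] - INTENSITY[previous[au]]) >= min_change
        for au in current
    )


print(is_new_expression({12: "B"}, {12: "C"}))          # one-level change
print(is_new_expression({12: "B"}, {12: "D"}))          # two-level change
print(is_new_expression({12: "B"}, {6: "B", 12: "B"}))  # an additional AU appears
```

The first call returns False and the other two return True, matching the two triggers named in the text: a change in which muscles are contracted, or a change of two or more intensity levels.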

Athlete Participants Whose Expressions Were Measured The individuals who contributed expressions in this study, therefore, were the 84 gold, silver, bronze, and fifth place winners of the judo competition at the 2004 Athens Olympic Games. They represented 35 different countries around the world from six continents. As such, they constituted a sample of the most culturally diverse individuals in whom spontaneous expressions that occurred in a highly charged, emotional event in three situations—immediately at the end of match completion and two times in the medal ceremonies— have been examined.

Expression Coding and Emotion Predictions All selected expressions were coded using Ekman and Friesen’s (1978) FACS. FACS identifies each of the functionally anatomical facial muscle movements (action units; AUs) that can occur independently, as well as head and eye positions. All expressions were coded by two certified FACS coders (one was David Matsumoto; the other was blind to the hypotheses and goals of the study); interrater reliability, calculated by doubling the number of codes on which coders agreed and dividing by the total number of codes used, was .79. All AU combinations were then compared with the Emotion FACS (EMFACS) dictionary to obtain emotion predictions (Ekman & Friesen, 1982a; Matsumoto, Ekman, & Fridlund, 1991). The dictionary was accessed via a computer program available to all researchers who have FACS data (Levenson, 2005). EMFACS identifies AUs that are theoretically

related to facial expressions of emotion posited by Darwin (1872/1998) and later Tomkins (1962, 1963) and empirically verified by studies of spontaneous expression and judgments of expressions by Ekman and colleagues over 20 years (Ekman et al., 1990; Ekman & Friesen, 1971; Ekman et al., 1980; Ekman, Friesen, & Ellsworth, 1972; Ekman et al., 1988; Ekman, Sorenson, & Friesen, 1969). The facial configurations associated with the emotion predictions were first listed in Ekman (1972) and in the original FACS manual (Ekman & Friesen, 1978). Prototypic examples of the emotion facial configurations were described in Ekman and Friesen’s (1975) Unmasking the Face and portrayed in their Pictures of Facial Affect (Ekman & Friesen, 1976) and the Japanese and Caucasian Facial Expressions of Emotion (Matsumoto & Ekman, 1988) sets. The EMFACS dictionary has been used in many published studies (Berenbaum & Oltmanns, 1992; Ekman et al., 1990, 1997; Keltner et al., 1995; Matsumoto, Haan, Gary, Theodorou, & Cooke-Carney, 1986; Rosenberg & Ekman, 1994; Rosenberg, Ekman, & Blumenthal, 1998; Rosenberg et al., 2001; Steimer-Krause, Krause, & Wagner, 1990), as well as in studies that used FACS and then virtually the same dictionary codes to produce emotion predictions but did not mention EMFACS (Chesney et al., 1990; Ekman et al., 1988; Frank et al., 1993; Gosselin et al., 1995; Heller & Haynal, 1994; Keltner, 1995; Levenson, Carstensen, Friesen, & Ekman, 1991; Messinger, Fogel, & Dickson, 2001; Ruch, 1993, 1995; Sayette et al., 2003). The results of many of these more recent studies were used to adjust the EMFACS emotion predictions. The predictions were then reviewed by Paul Ekman, who was blind to the condition (match vs. medal, winner vs. loser) from which the AUs were generated. Ekman checked all predictions and provided updated predictions based on later findings between face–emotion relationships that had not been updated in the EMFACS dictionary. In actuality, these resulted in only 10 updates of 230 expressions.
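The interrater agreement index described in this section (twice the number of agreed codes divided by the total number of codes used) can be sketched in Python as follows. The function name and the set-based treatment of each coder's AUs are assumptions for illustration, and the example codes are invented, not data from the study.

```python
def agreement_ratio(codes_a, codes_b):
    """Twice the number of codes both coders used, divided by all codes used by either."""
    set_a, set_b = set(codes_a), set(codes_b)
    total = len(set_a) + len(set_b)
    if total == 0:
        raise ValueError("At least one coder must have scored a code.")
    agreed = len(set_a & set_b)
    return 2 * agreed / total


# Hypothetical example: the coders agree on AUs 12 and 25 and disagree on one AU each.
coder_a = [6, 12, 25]
coder_b = [12, 25, 26]
print(round(agreement_ratio(coder_a, coder_b), 2))
```

Here the coders share 2 codes out of 3 + 3 scored, so the ratio is 4/6, or about .67; identical codings give 1.0 and fully disjoint codings give 0.0.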

Results

Did Expressions Occur at Match Completion?

Of the 84 athletes, there were no usable photos for 6 (see Footnote 6). Of the remaining 78, 67 (86%) provided at least one expression that was FACS codable. Of these, 33 (49%) provided at least 2 expressions, 13 (19%) provided at least 3, and 5 (7%) provided 4, for a total of 67 + 33 + 13 + 5 = 118 expressions. Of the 118 expressions coded, only 5 did not produce an emotion prediction by the EMFACS dictionary (see Table 1). Thus, the vast majority of the athletes produced expressions at match completion, and these corresponded to emotions predicted by EMFACS. (A complete listing of all FACS codes used and their emotion predictions is available from David Matsumoto.) Recall that it was impossible for us to obtain expressions when athletes were facing the crowd. To examine whether the athletes were facing others when they produced their expressions, we coded whether or not the athlete was directly facing anyone (crowd, opponent, or referee) when the expression was captured. Seventy-two percent of the first expressions occurred when the athletes were not facing anyone.

5 The number of predictions is larger than the number of photographs selected because the final podium shots were photographs that involved expressions by all four medalists in a single shot.

6 One bronze medal match did not occur because of injury to 1 of the athletes. Another bronze medal match ended quickly, and the photographer was focused on the other simultaneously occurring bronze medal match. And for two gold medal matches there was no usable photo for the silver medalist.


Table 1
Raw Frequencies of Athletes Displaying Various Expressions at Match Completion and During Medal Ceremonies

Occasion and type of expression         Gold  Silver  Bronze  Fifth

Match completion
  Smiles
    Duchenne smiles by themselves          2       0       1      0
    Duchenne smiles with open mouth       10       0      14      1
    Duchenne smiles with sadness           0       0       2      0
    Other smiling                          1       0       1      0
  Contempt                                 0       2       1      4
  Disgust                                  0       0       0      1
  Fear                                     0       0       0      1
  Sadness                                  0       6       5      9
  Undifferentiated negative                0       0       0      1
  No prediction                            1       0       1      3
  No expression                            0       4       1      6
  No usable photo                          0       2       2      2
  Total                                   14      14      28     28

Receipt of medal
  Smiles
    Duchenne smiles by themselves          2       3       3
    Duchenne smiles with open mouth       12       3      20
    Duchenne smiles with control           0       3       3
    Non-Duchenne smile                     0       1       0
    Non-Duchenne smiles with control       0       1       1
    Smile-sadness blends                   0       2       0
  Contempt                                 0       0       1
  No expression                            0       1       0
  Total                                   14      14      28

Podium posing
  Smiles
    Duchenne smiles by themselves          3       4       2
    Duchenne smiles with open mouth        9       1      13
    Duchenne smiles with control           1       3       6
    Non-Duchenne smile                     1       1       1
    Non-Duchenne smiles with control       0       0       4
    Smile-sadness blends                   0       1       0
  Contempt                                 0       1       0
  Sadness                                  0       2       0
  No prediction                            0       1       2
  Total                                   14      14      28

All analyses involving match completion presented subsequently used only the first expression displayed by each athlete, to avoid problems of independence of the data.

The Expression of Victory

Thirteen of the 14 gold medalists and 18 of 26 bronze medalists (2 had unusable photos) smiled (binomial test with chance set conservatively at 50% here and later, z = 2.67, p < .01, and z = 1.96, p < .05, respectively; see Table 1). Of these 31 smiles, 29 were Duchenne smiles, and 24 of these were open mouth. These results provide support for the notion that the Duchenne smile is a signal of enjoyment, replicating previous studies (Frank et al., 1993; Hess et al., 1995; Keltner & Bonanno, 1997; Smith, 1995).

The Expressions of Defeat

None of the silver medalists, and only 1 of the 26 fifth placers, smiled (z = –3.74, p < .0001, and z = 4.71, p < .00001, respectively; see Table 1). There was not a unique expression of defeat; most athletes displayed sadness (43% and 35%), nothing (29% and 23%), or contempt (14% and 15% for silver medalists and fifth placers, respectively). It is possible that the 10 athletes
who showed nothing expressed shame in other, nonfacial cues, and future studies examining nonfacial, nonverbal behaviors can investigate this possibility. The difference between the facial behaviors of the victors (gold and bronze medalists) and defeated athletes (silver medalists and fifth placers) was striking (see Table 2, top). The victors were much more likely to smile, especially to display Duchenne smiles; the losers were much more likely to display negative emotions or nothing. These findings are directly in line with those of Medvec et al. (1995), who showed that the 1992 Barcelona Olympic silver medalists were judged less happy than bronze medalists at match completion. Moreover it is not the case that the silver medalists (and fifth placers) are just less happy; those who displayed something displayed discrete, negative emotions. We then examined whether the distribution of expressions differed according to culture. Because of small sample sizes for individual countries, we combined them into three categories: North America–Western Europe (Australia, Austria, Belgium, Canada, Spain, France, Great Britain, Germany, Greece, Israel, Italy, The Netherlands, the United States), East Asia (China, Japan, South Korea, Mongolia, North Korea), and all others (Algeria, Argentina, Azerbaijan, Belarus, Brazil, Bulgaria, Cuba, Estonia,

Georgia, Iran, Moldova, Poland, Romania, Russia, Slovenia, Tunisia, Ukraine). This gross classification separated North America–Western Europe from East Asia on a number of cultural dimensions, including Hofstede’s (2001) Individualism versus Collectivism, Power Distance, and Long versus Short Term Orientation, and Schwartz’s (2004) Affective and Intellectual Autonomy, Egalitarianism, Embeddedness, and Hierarchy, with the other countries occupying a more intermediate position. We computed three chi-square values, one for each three-way contingency table of the match completion data in the top of Table 2, using type of expression, place finish (gold and bronze vs. silver and fifth), and culture (three categories mentioned earlier) as independent variables. None were significant: smiling versus all other expressions, χ²(4, N = 84) = 0.70, ns; Duchenne smiles versus all other expressions, χ²(4, N = 84) = 0.37, ns; negative emotions or no expressions versus all other expressions, χ²(4, N = 84) = 0.23, ns, indicating that there were no cultural differences in expression at match completion.

Table 2 Cross-Tabulations of Type of Expression Based on Emotion Facial Action Coding System Predictions at Three Different Points in Time: Match Completion, Receiving Medal, and On the Podium

Occasion and type of expressions                    Gold and bronze  Silver and fifth  χ²      df   p        r
Match completion
  Any smiling (AU 12)                               31               1                 45.43   1    .00      .74
  All other expressions                             11               41
  Duchenne smiles                                   29               1                 40.65   1    .00      .70
  All other expressions                             13               41
  Negative emotions or no expressions               6                34                37.42   1    <.001    .67
  All other expressions                             36               8
Receipt of medal
  All Duchenne smiles                               40               9                 9.20    1    .0024    .41
  All other expressions                             2                5
  All uncontrolled Duchenne smiles                  37               6                 12.06   1    .0005    .46
  All other expressions                             5                8
Podium posing
  All Duchenne smiles                               34               8                 3.18    1    .0747    .24
  All other expressions                             8                6
  All uncontrolled Duchenne smiles                  27               5                 3.5     1    .0613    .25
  All other expressions                             15               9
  Negative emotion or uninterpretable expressions   2                5                 9.20    1    <.01     .41
  All other expressions                             40               9

Note. AU = action unit.
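The two-way statistics in Table 2 can be checked from the cell counts alone. The sketch below is not the authors' analysis code; it assumes an uncorrected Pearson chi-square for a 2 × 2 table and an effect size of r = sqrt(χ²/N), which appear consistent with the tabled values. The function names are illustrative.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and the total N.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    return chi2, n

def effect_size_r(chi2, n):
    """Effect size r for a 1-df chi-square (assumed formula: sqrt(chi2 / N))."""
    return math.sqrt(chi2 / n)

# Match completion, any smiling (AU 12) vs. all other expressions.
# Rows: expression type; columns: gold and bronze, silver and fifth.
chi2, n = chi_square_2x2(31, 1, 11, 41)
print(round(chi2, 2), round(effect_size_r(chi2, n), 2))  # prints 45.43 0.74
```

Under the same assumptions, the receipt-of-medal counts for all Duchenne smiles (40, 9, 2, 5) give a chi-square of about 9.20 and r of about .41, matching the table.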

How Much Time Elapsed From the Occurrence of an Emotion-Eliciting Stimulus to Its Expressive Reaction? The photographic equipment date and time stamped each photograph (to the second), along with information about technical settings. We calculated the time that elapsed after each match ended to the onset of the expression that was FACS coded, from

onset to apex (which was FACS coded), and the total time from end of match to apex (see Table 3). The relevant data are for the gold and bronze medalists; the data for the silver medalists and fifth placers are not valid markers of expression timing because the photographer captured the expressive behavior of the victors first (common in sport photography) and then the defeated athletes after that. The mean time from match completion to expression onset was 2.90 s and 2.57 s for the gold and bronze medalists; from match completion to apex, it was 3.48 s and 3.57 s, respectively. These data suggest that the time windows used by previous researchers who failed to show that expressions occurred in emotionally evocative contexts (Ruiz-Belda et al., 2003) were too short to allow for the expressions to occur in the first place (under 2 s after stimulus). That these same researchers demonstrated that most expressions occurred in the first 2-s window of the subsequent phase of analysis is congruent with our data, as that would be the normal time for expressions to unfold anyway.

Table 3 Mean Time (in Seconds) From Match End to Expression Onset, Expression Onset to Expression Apex, and Match End to Expression Apex

Place finish   Match end to expression onset   Expression onset to expression apex   Match end to expression apex
Gold           2.90                            0.58                                  3.48
Silver         7.33                            1.63                                  8.96
Bronze         2.57                            1.00                                  3.57
Fifth          6.14                            2.40                                  8.54
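The three intervals in Table 3 follow from per-photograph time stamps recorded to the second. The sketch below shows one way such intervals could be derived; the time stamps, their format, and the function name are hypothetical placeholders, not values or procedures taken from the study.

```python
from datetime import datetime

# Assumed time-stamp format; the camera's actual format is not given in the text.
FMT = "%Y-%m-%d %H:%M:%S"

def elapsed_seconds(start, end):
    """Seconds elapsed between two time stamps given as strings in FMT."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()

# Hypothetical time stamps for one athlete (illustrative only).
match_end = "2000-01-01 12:00:00"
expression_onset = "2000-01-01 12:00:03"
expression_apex = "2000-01-01 12:00:04"

end_to_onset = elapsed_seconds(match_end, expression_onset)
onset_to_apex = elapsed_seconds(expression_onset, expression_apex)
end_to_apex = elapsed_seconds(match_end, expression_apex)

# The total is the sum of the two component intervals.
assert end_to_apex == end_to_onset + onset_to_apex
```

The same additivity holds for the means in Table 3, for example 2.90 + 0.58 = 3.48 for the gold medalists.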

How Did Expressive Behavior Differ During the Medal Ceremonies? Receiving the medal. As expected, 54 of the 56 athletes who participated in the medal ceremonies smiled when they received their medal (z = 6.95, p < .0001). However, when the specific type of smile was differentiated, differences emerged according to place finish. All 14 gold medalists displayed Duchenne smiles, 12 of them with the open mouth (z = 2.67, p < .01). Only 6 of the 14 silver medalists displayed Duchenne smiles by themselves or with the open mouth. Three displayed controlled Duchenne smiles (described more fully subsequently). One silver medalist displayed a non-Duchenne smile, and 1 displayed a controlled, non-Duchenne smile. Two displayed smile–sadness blends, and 1 displayed nothing. Because facial actions related to control were prominent in the medal ceremonies, we describe them in more detail here. Behaviorally, these were smiles that co-occurred with buccinator (AU 14), sometimes in combination with mentalis and/or orbicularis oris (AUs 17 and 24). These lower face actions give the appearance that the expresser is making a conscious effort to control their facial behaviors and/or words, as if they are “biting their lip.” That they often occurred with both Duchenne and non-Duchenne smiles suggested that these facial actions qualified the meaning of the smile, adding information to the message of the smile beyond the signal of enjoyment. The bronze medalists’ data were also revealing. Twenty of the 28 athletes displayed the open-mouth version of the Duchenne smile (z = 2.27, p < .05). Three displayed Duchenne smiles by themselves, 3 displayed controlled Duchenne smiles, 1 displayed a controlled, non-Duchenne smile, and 1 displayed contempt. Collapsing the data demonstrated differences as a function of medal (see Table 2). Gold and bronze medalists were much more likely to display Duchenne smiles, and especially uncontrolled Duchenne smiles, than were silver medalists. These same findings occurred when only the silver and bronze medalists’ data were compared against each other. These data provide behavioral evidence for the judgment data originally reported by Medvec et al. (1995). The silver medalists indeed did not display felt, enjoyable emotions as much as either the gold or the bronze medalists. We tested for cultural differences in these expressions using the country classification described earlier and computing chi-square values on the 2 three-way contingency tables in the middle of Table 2. Neither was significant: all Duchenne versus all other expressions, χ²(4, N = 56) = 0.90, ns; all uncontrolled Duchenne smiles versus all other expressions, χ²(4, N = 56) = 0.35, ns. Thus there were no cultural differences in smiling behavior when athletes received their medals.

Podium posing. The athletes’ expressions when they posed on the podium were also revealing. Fifty of the 56 athletes displayed some kind of smile, but when the specific type of smile was examined, differences emerged. All of the gold medalists smiled; 13 of 14 were Duchenne smiles (z = 3.21, p < .01), and 9 of these were the open-mouth version. In contrast, only 9 of the 14 silver medalists smiled, and only 5 of these were uncontrolled. Three were controlled Duchenne smiles, 1 was a non-Duchenne smile, and 1 was a smile–sadness blend. Two of the silver medalists displayed sadness, 1 contempt, and 1 an uninterpretable expression. Twenty-six of the 28 bronze medalists smiled; 13 of these were open-mouth Duchenne smiles, 2 were Duchenne smiles by themselves, 6 were controlled Duchenne smiles, 4 were controlled non-Duchenne smiles, and 1 was a non-Duchenne smile. Comparisons of the medalists’ expressions (see Table 2) demonstrated again that the gold and bronze medalists were more likely to display Duchenne smiles and uncontrolled Duchenne smiles. Silver medalists were more likely to display negative or uninterpretable emotions. These data also provided behavioral evidence for the data reported by Medvec et al. (1995). We tested for cultural differences in these expressions using the country classification described earlier and computed chi-square values on the 3 three-way contingency tables on the data in the bottom of Table 2. The chi-square for all Duchenne smiles versus all other expressions was significant, χ²(4, N = 56) = 14.71, p < .01. Follow-up analyses revealed that gold and bronze medalists from North America–Western Europe and East Asia were much more likely to display Duchenne smiles (96%) than all other expressions, whereas gold and bronze medalists from other countries were more likely to display other expressions, particularly non-Duchenne smiles (47%), than Duchenne smiles. The three-way chi-square values for the other two analyses were both nonsignificant.
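The z values reported for the medal-ceremony counts are not accompanied by a named test. As a hedged illustration, the sketch below assumes a one-sample test of a proportion against chance (.5) using the normal approximation to the binomial, which appears to reproduce the reported magnitudes. The function names are illustrative.

```python
import math

def z_against_chance(k, n, p0=0.5):
    """z for observing k of n against a null proportion p0 (normal approximation)."""
    return (k - n * p0) / math.sqrt(n * p0 * (1 - p0))

def two_sided_p(z):
    """Two-sided p value for a standard normal z."""
    return math.erfc(abs(z) / math.sqrt(2))

# 54 of the 56 athletes smiled when receiving their medal.
print(round(z_against_chance(54, 56), 2))  # prints 6.95

# 20 of the 28 bronze medalists displayed the open-mouth Duchenne smile.
z_bronze = z_against_chance(20, 28)
print(round(z_bronze, 2), two_sided_p(z_bronze) < 0.05)  # prints 2.27 True
```

Under the same assumption, 12 of 14 and 13 of 14 gold medalists give z values of about 2.67 and 3.21, respectively.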
The data from both points of the medal ceremonies suggested that, despite the fact that the overwhelming majority of athletes smiled, in reality there were differences in their smiles, and those differences may have been reliably related to different expressions displayed at match completion. To test this idea, we crosstabulated whether or not the athletes displayed Duchenne smiles at the end of the match with whether or not they displayed Duchenne smiles during the medal ceremonies (see Table 4). Those who showed signs of genuinely enjoyable emotions at the end of the match were more likely to show those same signs of enjoyable emotions when they received the medal and posed on the podium. Conversely, those who did not display signs of enjoyable emotions at the end of the match were likely to not display such signs during the medal ceremonies, despite the fact that most athletes smiled. The same findings occurred when Duchenne smiles were differentiated according to whether or not they were accompanied by facial signs of control.
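The odds ratios in Table 4 can be recovered from the cross-tabulated counts. The sketch below assumes the conventional cross-product ratio for a 2 × 2 table, with rows giving the expression at the ceremony and columns the expression at the end of the match; it is an illustration, not the authors' code, and the function name is hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio (a*d)/(b*c) for the table [[a, b], [c, d]].

    Returns None when the ratio is undefined because b or c is zero.
    """
    if b == 0 or c == 0:
        return None
    return (a * d) / (b * c)

# Receipt of medal, uncontrolled Duchenne smiles vs. all other expressions.
print(round(odds_ratio(25, 18, 2, 11), 2))  # prints 7.64

# Receipt of medal, any Duchenne smile: one cell is zero, so no ratio is reported.
print(odds_ratio(27, 22, 0, 7))  # prints None
```

The zero cell in the second example corresponds to the "n/a" entry in the table.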

Discussion This study produced provocative findings about facial expressions of emotion in emotionally charged, naturalistic situations; however, it was not conducted without limitations. First, we did not obtain self-reports of experience because it was logistically impossible. Although we have been careful to focus on the link between emotionally evocative situations and expressive behavior, and although we believe that the expressions are themselves tied to powerful emotional states in the athletes, our data cannot be used to make claims about the relationship between expression and experiences. Second, we used only one photographer, who was

situated in the middle of the two competition areas. This severely limited the photos to be analyzed, because he could only focus on 1 athlete at a time. (This was less a problem for the gold medal matches, which occurred one at a time.) Also, the dynamic nature of the competition precluded us from capturing expressions that may have occurred when athletes were facing away from the camera. For these reasons we have probably underestimated the expressions that did occur.

Table 4 Cross-Tabulation of the Expressions Produced at Match Completion With Expressions Produced When Athletes Received Their Medals

Occasion and type of expression     End of match                                           χ²      df   p       Odds ratio   r
Receipt of medal                    Any Duchenne smiles            All other expressions
  Any Duchenne smiles               27                             22                      7.45    1    .0063   n/a          .37
  All other expressions             0                              7
Receipt of medal                    Uncontrolled Duchenne smiles   All other expressions
  Uncontrolled Duchenne smiles      25                             18                      7.31    1    .0068   7.64         .36
  All other expressions             2                              11
On the podium                       Any Duchenne smiles            All other expressions
  Any Duchenne smiles               23                             19                      2.885   1    .0894   3.03         .23
  All other expressions             4                              10
On the podium                       Uncontrolled Duchenne smiles   All other expressions
  Uncontrolled Duchenne smiles      20                             12                      6.103   1    .0134   4.05         .33
  All other expressions             7                              17

Do Expressions Occur in Emotionally Evocative Contexts in Naturalistic Settings That Should Elicit Strong Emotions? Yes. Eighty-six percent of the athletes provided an expression at match completion (within 2.5–3 s after stimulus onset for the gold and bronze medalists), and the EMFACS dictionary characterized 97.46% of them, producing predictions for a wide range of emotion, including contempt, disgust, fear, sadness, and multiple types of smiles. The expressions corresponded to those reported previously by Ekman (Ekman, 1972; Ekman & Friesen, 1971; Ekman et al., 1972, 1969) and others (reviewed earlier), in Ekman and Friesen’s (1975) Unmasking the Face, in their stimulus set Pictures of Facial Affect (Ekman & Friesen, 1976), and in Matsumoto and Ekman’s (1988) Japanese and Caucasian Facial Expressions of Emotion (JACFEE) set. That there were no cultural differences in the first expressions at match completion is supportive of the universality of these expressions to occur when emotion is aroused. These results are in contrast to the findings of previous

field studies reporting nonfindings (Fernandez-Dols & Ruiz-Belda, 1995; Fernandez-Dols, Sanchez, Carrera, & Ruiz-Belda, 1997; Kraut & Johnson, 1979; Ruiz-Belda et al., 2003). We contend that the methodology we used corrected methodological limitations of the previous studies. Some may argue that the expressions were produced because the athletes were in a social situation. Indeed, the athletes were competing against each other, being judged by referees, on center stage in a packed auditorium, and on television. In addition, athletes compete under the rules of competition, which regulate specific actions that can and cannot occur. Thus the event has many social ties, and athletes need to internalize many social conventions that affect behavior, even with minimal conscious awareness. We argue, however, that these factors probably did not affect the very first expressions displayed at match completion (which are the ones we analyzed) for a number of reasons. The Olympic Games is the pinnacle of sport competition, and Olympic athletes compete in the most intense matches of their lives, and perhaps for the only times in their lives. Judo is combat, and competition requires tremendous concentration and focus, because athletes struggle to throw, pin, strangle, or apply joint locks to each other. Each match requires extraordinary strength and conditioning, as athletes’ hearts have been clocked at 200 beats per minute. Being in a medal match means that the athletes have won numerous matches earlier in the day, and by the time they are in the medal match they are at the edge of physical and mental exhaustion, in the most important match of


their lives, in the most exciting and important sporting event in the world. Moreover, because instant wins can occur at any time during a judo match, outcomes are never decided until the very end, no matter how much of a lead an athlete has. Our main analyses used the very first expressions produced, which began 2.5 to 3 s after match completion, and were thus the very first reactions of the athletes. The setup precluded us from obtaining expressions when athletes faced the crowd, because the photographer was opposite the crowd and main television cameras, and most expressions occurred before the athlete turned to face the crowd. Finally, 72% of the coded expressions occurred when the athletes were not directly facing anyone. For these reasons we contend that, at the precise moment when matches were completed and outcomes determined, athletes’ initial expressions probably were not produced because of the social nature of the event. After having been engaged in combat in the most important match of their lives, the athletes’ expressions were probably reflections of their emotional reactions to the outcome of the match, relatively unaffected by the social nature of the event (although there is an interesting possibility that the intensity of the expressions differed depending on whether the athlete was socially engaged or not; this should be followed in future studies). Combat is a social event, but the expressions occurred not because combat is a social event but because of the results of that combat. That they occurred within a social event is not surprising because emotions evolved to deal with social events of great import. Emotions and expression can, however, of course occur when alone as well, but there is little question that battle is a social event that elicits strong emotional responses, just as birth and seduction. 
Although we did not analyze the second, third, or fourth expressions after match completion because of statistical independence issues and small sample size, they provide cues about how social convention may have influenced expression soon after the athletes’ initial reactions. One of the gold medalists, for instance, produced a large Duchenne smile immediately when he threw his opponent for an instant win (AUs 6E + 12E + 25), 1 s after the referee announced the score, and while he was facing away from the crowd. Three s later, however, after he got up and before the match was awarded, he controlled his large smile by pushing his lower lip up, tightening the corners of his lips, and rolling his lips together (AUs 6C + 12C + 14B + 17B + 28B). One s after that he again produced a Duchenne smile without the control, which lasted for several seconds. Then 10 s after that he again produced the Duchenne smile with the control. Another way in which expressions changed over the course of a few seconds was when initial smiling faces were transformed to blends of happiness and sadness. These occurred only in individuals who won their medal match (1 gold medalist and 8 bronze medalists). In these cases these blended expressions occurred after an athlete’s initial smile, where the athlete began to show sadness–distress in addition to the smile as he or she began to cry. Although there are several theories that account for such crying when happy (Frijda, 2001; Scheff, 1979; Vingerhoets & Cornelius, 2000), we believe that one way to understand this phenomenon in the context of achievement is related to the display rule of not showing one’s sadness or distress if one loses (i.e., being a good loser). That is, both athletes probably felt distressed about the impending outcome of the match and the possibility that they might lose. For winners,


however, this display rule is lifted once the outcome is determined, and thus it becomes more appropriate for them to display their distress. (A different display rule that exists, however—namely, to be a good winner and not boast—would explain why some victorious athletes also curb their expressions of joy as described earlier.) For athletes who lose their final match, however, the display rule remains. Blended expressions of happiness and sadness did occur in the silver medalists, in fact, but only during the medal ceremonies. These probably reflected either the athlete’s simultaneous and genuine joy of accomplishment and the sadness of having lost the final match, or sadness of having lost the final match qualified by a smile.

The Facial Signs of Victory and Defeat The facial signs of victory were Duchenne smiles and, in particular, the open-mouth version of the Duchenne smile. These data provide further support for the view that Duchenne smiles are associated with enjoyable emotions (Ekman et al., 1990; Frank et al., 1993; Hess et al., 1995; Keltner & Bonanno, 1997; Smith, 1995). Because no other expression was as dominant among the victors, the data also suggest that the Duchenne smile may be the only facial marker of different types of enjoyable emotions (Ekman, 2003), including fiero—the joy of victory. There is probably evolutionary reason why this may be so. Facial behaviors provide rapid, reliable communication of emotional states to others, and it may not have been as adaptively necessary to communicate different enjoyable emotions to others immediately and from a distance as it was to communicate differentiated negative emotions such as anger, disgust, or fear. Enjoyable emotions do not represent immediate threats to survival; when one is threatened, however, it is important to know whether to run or attack. Future research will need to examine whether other channels communicate distinctly different enjoyable emotions. Ekman (2003) suggested that the voice may differentiate enjoyable emotions. Tracy and Robins (2004) indicated that pride is associated with direct eye contact; Duchenne smiles; an open, expanded, body posture; and hand or hands above the head or on the hips. The expressions of the defeated athletes were strikingly different. Of the 42 athletes who lost their medal match, only 1 smiled; the others showed a variety of negative emotions, including sadness, contempt, disgust, and fear. Moreover, a not insubstantial number of them also displayed no emotion. 
That they did not simply show less smiling strongly suggests that their emotional experiences were substantially different than those of the gold and bronze medalists; thus, there is not a linear decrease in smiling from gold, silver, and bronze medalists. Of course winning a silver medal at the Olympic Games should be a cause for joy in anyone. Thus, it is interesting that silver medalists (and fifth placers) did not smile. In fact, there are several reasons why this might have occurred. First, all athletes at this level of competition are intense competitors who do not like to lose. They were probably extremely disappointed by the fact that they lost their match, which leaves a bittersweet aftertaste when competition is completed. Thus, they probably did not smile because losing is not enjoyable. Because we did not obtain self-reports of subjective experience from the athletes, we can only speculate about why they displayed the various negative emotions they did. One explanation might be


that the expressions of sadness, contempt, disgust, and fear all signal a single, undifferentiated, negatively valenced emotional state (Russell et al., 2003; Russell & Feldman Barrett, 1999). We do not, however, agree with this possibility, because it is unclear as to why the same antecedent (losing the match) and the same supposed emotional reaction (an undifferentiated negative one) would lead to uniquely different expressions, regardless of what label EMFACS might call it. Instead, we believe that losing the match was appraised in different ways that led to different emotional reactions, which, in turn, led to discretely different emotion signals. There are several ways this can occur. For instance, countries differ considerably in the pressure placed on athletes to win Olympic medals. In some countries, the difference between gold and silver can mean the difference between a life of comfort or not, stardom or not, and making a living or not. Moreover, there are many prior expectations coming into the games. Winners of the previous year’s world championships, for example, are often considered favorites to win Olympic gold. When they do not, they may not be as joyful with a silver or a bronze medal, despite the fact that obtaining a silver or a bronze medal at the Olympic Games is a tremendous achievement. Both silver medalists who displayed contempt, for example, and the lone bronze medalist, were previous world champions who came to the Olympics as gold medal favorites and who might have appraised the situation as one in which they were superior to the winner despite the loss. A not insubstantial number of silver medalists and fifth placers displayed nothing on their faces or displayed expressions that were not interpretable. 
We believe that this finding is related to the display rule for athletes to “be a good loser.” This interpretation is bolstered by the fact that all of these expressions involved relatively strong, bilateral buccinator activity (AU 14), oftentimes with lower lip raise (AU 17) and lip press (AU 24). This same muscle action (AU 14) denoted controlled smiling. And these expressions occurred later than the initial reactions (see Table 4), which, as we discussed earlier, probably allowed time for display rules and social conventions to begin influencing expression more. We believe that these expressions occurred because the athletes were controlling their facial displays so as to not signal their disappointment at having lost the match. Tomkins (1978) suggested that such lower face facial actions signal backed-up affect. When occurring with anger, Tomkins (1978) suggested that these facial actions evolved to prevent angry individuals from biting and attacking others. We speculate, therefore, that there is no unique face of defeat. Instead, athletes appraise losses in individual ways, some eliciting sadness and distress from not obtaining their goal of winning, others being superior over their opponents, others being disgusted at the opponent or the result, and others still being fearful of the consequences of having lost. In addition, there may be individual differences in expressivity that may have influenced these expressions (Gross & John, 1995; King & Emmons, 1990; Kring, Smith, & Neale, 1994), as well as those of the victors (e.g., differentiating between those who continuously smile with or without controls vs. those who smile and then cry). Differences in the meaning and thus appraisals of the loss for each of the athletes, therefore, bring about different emotional reactions, which elicit different expressions (Brown & Dutton, 1995).

How Much Time Elapses From the Occurrence of an Emotion-Eliciting Stimulus to Its Expressive Reaction? The expressions began between 2 and 3 s after the eliciting stimulus occurred and reached apex within 3 and 4 s. These findings explain why Ruiz-Belda et al. (2003) did not find expressions in the solitary conditions of their experiment; the time frame they examined (1.35 s from stimulus occurrence) was not long enough to allow for expressions to unfold in reaction to the antecedent event. In fact, their data demonstrated that most smiles occurred within 4 s of the antecedent, which is congruent with our data. These data open the door to studies of how emotional reactions are activated. Expressive behaviors in the face are one component of the emotion package, and different components may have different timing features. Stepping onto a curb and perceiving a bus coming at you at the last minute would elicit almost immediate (certainly less than 2.5 s) motor response to get out of the way, suggesting that the motor response system has a different timing characteristic than the face. These motor responses, in turn, probably have different timings than cognitions and physiology. It is also possible that these timing characteristics differ by emotion and context. Fear may produce relatively quick reactions; enjoyable emotions may produce relatively longer ones. Moreover there is probably a large moderating role of the contents of activated states of the mind when emotion is aroused. The athletes in our study were engaged in intense combat until match completion. When the match finally ended, athletes needed to switch from “combat mode” to a mode in which they could process and evaluate the results. 
When an individual is in a comedy club with friends, however, and has been repeatedly laughing at jokes, the time from stimulus occurrence (the next joke) to expression may be considerably shorter because the already activated network makes it easier for such emotions to be elicited and expressed. Future studies can examine how different activated states of mind may moderate the timing of emotional responses, and the differences among different types of responses.

The Effects of Social Situation on Expressive Behavior Nearly all athletes spontaneously smiled during both periods of the medal ceremonies, probably because of the highly staged and public nature of the ceremonies. Here, athletes have had time to process the results of their performance, need to interact with dignitaries, and are pressured to put on a good face for the crowd and television. That this was true for the silver medalists, especially given the fact that none of them had smiled at match completion and nearly all had displayed a negative expression or no expression, demonstrates the powerful influence of social context on expressive behavior. However, the smiles of the silver medalists were differentiated from the smiles of the gold and the bronze medalists. Gold and bronze medalists displayed Duchenne smiles, whereas silver medalists were more likely to display controlled Duchenne smiles, non-Duchenne smiles, or smiles blended with sadness. On the podium, after receiving the medal and after the national anthem of the gold medalist was played, some silver medalists did not smile at all, instead displaying contempt, sadness, or uninterpretable expressions. These data suggested that, although the silver medalists attempted to be socially appropriate by smiling during the medal ceremonies, they probably did not experience solely enjoyable emotions. Instead they were probably either experiencing negative emotions and masking or qualifying them with smiles or were experiencing blends of enjoyable and negative emotions. One of these might be regret, which would be commensurate with the findings of Medvec et al. (1995). It is interesting that it is only on the podium that cultural differences in expression were observed. The fact that cultural differences emerged here and not immediately at match completion nor while interacting with the dignitary is probably due to the fact that the effects of context were too powerful in these latter situations to allow for cultural differences to emerge. At match completion, the initial expressions were probably largely determined by the athlete’s strong emotional reactions to the outcomes. When athletes were interacting with the dignitary, their expressions were probably largely determined by the fact that they were in a highly public situation and were interacting with a dignitary. On the podium, however, the pressure of the medal ceremonies and interacting with the dignitary was lifted, loosening the power of context to override culture and allowing some cultural differences to occur. However, our data cannot speak to what aspect of culture produced the differences. There were no differences between North Americans–Western Europeans and East Asians, thus ruling out a simple individualism versus collectivism interpretation. Future studies will need to examine more specifically what aspects of culture helped to produce these, and other, cultural differences in spontaneous expressions.

References
Berenbaum, H., & Oltmanns, T. (1992). Emotional experience and expression in schizophrenia and depression. Journal of Abnormal Psychology, 101, 37–44.
Bonanno, G. A., & Keltner, D. (2004). The coherence of emotion systems: Comparing “on-line” measures of appraisal and facial expressions, and self-report. Cognition & Emotion, 18, 431–444.
Brown, J. D., & Dutton, K. A. (1995). The thrill of victory, the complexity of defeat: Self-esteem and people’s emotional reactions to success and failure. Journal of Personality and Social Psychology, 68, 712–722.
Camras, L. A., Oster, H., Campos, J., Miyake, K., & Bradshaw, D. (1992). Japanese and American infants’ responses to arm restraint. Developmental Psychology, 28, 578–583.
Charlesworth, W. R., & Kreutzer, M. A. (1973). Facial expressions of infants and children. In P. Ekman (Ed.), Darwin and facial expression (pp. 91–168). New York: Academic Press.
Chesney, M. A., Ekman, P., Friesen, W. V., Black, G. W., & Hecker, M. H. (1990). Type A behavior pattern: Facial behavior and speech components. Psychosomatic Medicine, 52, 307–319.
Chevalier-Skolnikoff, S. (1973). Facial expression of emotion in nonhuman primates. In P. Ekman (Ed.), Darwin and facial expression (pp. 11–89). New York: Academic Press.
Darwin, C. (1998). The expression of emotion in man and animals. New York: Oxford University Press. (Original work published 1872)
Davidson, R. J. (2003). Parsing the subcomponents of emotion and disorders of emotion: Perspectives from affective neuroscience. In R. J. Davidson, K. R. Scherer, & H. H. Goldsmith (Eds.), Handbook of affective sciences (pp. 8–24). New York: Oxford University Press.
Ekman, P. (1972). Universal and cultural differences in facial expression of emotion. In J. R. Cole (Ed.), Nebraska Symposium on Motivation (pp. 207–283). Lincoln: University of Nebraska Press.


Ekman, P. (1992). Are there basic emotions? Psychological Review, 99, 550 –553. Ekman, P. (1994). Strong evidence for universals in facial expressions: A reply to Russell’s mistaken critique. Psychological Bulletin, 115, 268 – 287. Ekman, P. (1999). Basic emotions. In T. D. A. T. Power (Ed.), The handbook of cognition and emotion (pp. 45– 60). Sussex, England: John Wiley. Ekman, P. (2003). Emotions revealed. New York: Times Books. Ekman, P., Davidson, R. J., & Friesen, W. V. (1990). The Duchenne smile: Emotional expression and brain physiology: II. Journal of Personality and Social Psychology, 58, 342–353. Ekman, P., & Friesen, W. (1969). The repertoire of nonverbal behavior: Categories, origins, usage, and coding. Semiotica, 1, 49 –98. Ekman, P., & Friesen, W. (1971). Constants across culture in the face and emotion. Journal of Personality and Social Psychology, 17, 124 –129. Ekman, P., & Friesen, W. V. (1975). Unmasking the face: A guide to recognizing emotions from facial clues. Englewood Cliffs, NJ: Prentice Hall. Ekman, P., & Friesen, W. (1976). Pictures of facial affect. Palo Alto, CA: Consulting Psychologists Press. Ekman, P., & Friesen, W. V. (1978). Facial action coding system: Investigator’s guide. Palo Alto, CA: Consulting Psychologists Press. Ekman, P., & Friesen, W. (1982a). EMFACS. Unpublished manuscript. Ekman, P., & Friesen, W. V. (1982b). Felt, false, and miserable smiles. Journal of Nonverbal Behavior, 6, 238 –258. Ekman, P., Friesen, W., & Ancoli, S. (1980). Facial signs of emotional experience. Journal of Personality and Social Psychology, 39, 1125– 1134. Ekman, P., Friesen, W. V., & Ellsworth, P. (1972). Emotion in the human face: Guidelines for research and an integration of findings. New York: Pergamon Press. Ekman, P., Friesen, W. V., & O’Sullivan, M. (1988). Smiles when lying. Journal of Personality and Social Psychology, 54, 414 – 420. Ekman, P., Friesen, W. V., & Simons, R. C. (1985). Is the startle reaction an emotion? 
Journal of Personality and Social Psychology, 49, 1416 – 1426. Ekman, P., Levenson, R. W., & Friesen, W. V. (1983, September 16). Autonomic nervous system activity distinguishes among emotions. Science, 221, 1208 –1210. Ekman, P., Matsumoto, D., & Friesen, W. (1997). Facial expression in affective disorders. In P. Ekman & E. L. Rosenberg (Eds.), What the face reveals: Basic and applied studies of spontaneous expression using the facial action coding system (FACS) (pp. 331–341). New York: Oxford University Press. Ekman, P., Sorenson, E. R., & Friesen, W. V. (1969, April 4). Pancultural elements in facial displays of emotion. Science, 164, 86 – 88. Elfenbein, H. A., & Ambady, N. (2002). On the universality and cultural specificity of emotion recognition: A meta-analysis. Psychological Bulletin, 128, 205–235. Ellgring, H. (1986). Nonverbal expression of psychological states in psychiatric patients. European Archives of Psychiatry and Neurological Sciences, 236, 31–34. Fernandez-Dols, J. M., & Ruiz-Belda, M. A. (1995). Are smiles signs of happiness? Gold medal winners at the Olympic Games. Journal of Personality and Social Psychology, 69, 1113–1119. Fernandez-Dols, J. M., Sanchez, F., Carrera, P., & Ruiz-Belda, M.-A. (1997). Are spontaneous expressions and emotions linked? An experimental test of coherence. Journal of Nonverbal Behavior, 21, 163–177. Frank, M. G., & Ekman, P. (1993). Not all smiles are created equal: The differences between enjoyment and nonenjoyment smiles. Humor: International Journal of Humor Research, 6, 9 –26. Frank, M. G., Ekman, P., & Friesen, W. V. (1993). Behavioral markers and


recognizability of the smile of enjoyment. Journal of Personality and Social Psychology, 64, 83–93. Fridlund, A. (1994). Human facial expression: An evolutionary view. San Diego, CA: Academic Press. Fridlund, A. (1997). The new ethology of human facial expressions. In J. A. Russell & J. M. Fernandez-Dols (Eds.), The psychology of facial expression (pp. 102–129). Cambridge, England: Cambridge University Press. Fridlund, A. (2002). The behavioral ecology view of smiling and other facial expressions. In M. Abel (Ed.), An empirical reflection on the smile (Vol. 4, pp. 45– 82). Lewiston, NY: Edwin Mellen Press. Frijda, N. H. (2001). Foreward. In A. J. J. M. Vingerhoets & R. R. Cornelius (Eds.), Adult crying: A biopsychosocial approach (pp. xiii– xviii). Hove, England: Brunner-Routledge. Geen, T. (1992). Facial expressions in socially isolated nonhuman primates: Open and closed programs for expressive behavior. Journal of Research in Personality, 26, 273–280. Goffman, E. (1959). The presentation of self in everyday life. Oxford, England: Doubleday. Gosselin, P., Kirouac, G., & Dore, F. (1995). Components and recognition of facial expression in the communication of emotion by actors. Journal of Personality and Social Psychology, 68, 83–96. Gross, J. J., & John, O. P. (1995). Facets of emotional expressivity: Three self-report factors and their correlates. Personality & Individual Differences, 19, 558 –568. Hauser, M. (1993, July 23). Right hemisphere dominance for the production of facial expression in monkeys. Science, 261, 475– 477. Heller, M., & Haynal, V. (1994). Depression and suicide faces. Cahiers Psychiatriques Genevois, 16, 107–117. Hess, U., Banse, R., & Kappas, A. (1995). The intensity of facial expression is determined by underlying affective states and social situations. Journal of Personality and Social Psychology, 69, 280 –288. Hess, U., & Kleck, R. E. (1990). Differentiating emotion elicited and deliberate emotional facial expressions. 
European Journal of Social Psychology, 20, 369 –385. Hofstede, G. H. (2001). Culture’s consequences: Comparing values, behaviors, institutions and organizations across nations (2nd ed.). Thousand Oaks, CA: Sage. Izard, C. E. (1994). Innate and universal facial expressions: Evidence from developmental and cross-cultural research. Psychological Bulletin, 115, 288 –299. Keltner, D. (1995). The signs of appeasement: Evidence for the distinct displays of embarrassment, amusement, and shame. Journal of Personality and Social Psychology, 68, 441– 454. Keltner, D., & Bonanno, G. A. (1997). A study of laughter and dissociation: The distinct correlates of laughter and smiling during bereavement. Journal of Personality and Social Psychology, 73, 687–702. Keltner, D., Ekman, P., Gonzaga, G. C., & Beer, J. (2003). Facial expressions of emotion. In R. J. Davidson, K. G. Scherer, & H. H. Goldsmith (Eds.), Handbook of affective sciences (pp. 415– 432). New York: Oxford University Press. Keltner, D., Moffitt, T., & Stouthamer-Loeber, M. (1995). Facial expressions of emotion and psychopathology in adolescent boys. Journal of Abnormal Psychology, 104, 644 – 652. Kerr, J. H., Wilson, G. V., & Nakamura, I. (2005). Emotional dynamics of soccer fans at winning and losing games. Personality & Individual Differences, 38, 1855–1866. King, L. A., & Emmons, R. A. (1990). Conflict over emotional expression: Psychological and physical correlates. Journal of Personality and Social Psychology, 58, 864 – 877. Kraut, R. E., & Johnson, R. E. (1979). Social and emotional messages of smiling: An ethological approach. Journal of Personality and Social Psychology, 37, 1539 –1553. Kring, A. M., Smith, D. A., & Neale, J. M. (1994). Individual differences

in dispositional expressiveness: Development and validation of the emotional expressivity scale. Journal of Personality and Social Psychology, 66, 934 –949. Leonard, C. M., Voeller, K. K. S., & Kudau, J. M. (1991). When’s a smile a smile? Or how to detect a message by digitizing the signal. Psychological Science, 2, 166 –172. Levenson, R. W. (2005). FACS/EMFACS emotion predictions [Computer software]. Berkeley: University of California, Department of Psychology. Levenson, R. W., Carstensen, L. L., Friesen, W. V., & Ekman, P. (1991). Emotion, physiology, and expression in old age. Psychology and Aging, 6, 28 –35. Levenson, R. W., Ekman, P., & Friesen, W. V. (1990). Voluntary facial action generates emotion-specific autonomic nervous system activity. Psychophysiology, 27, 363–384. Levenson, R. W., Ekman, P., Heider, K., & Friesen, W. V. (1992). Emotion and autonomic nervous system activity in the Minangkabau of West Sumatra. Journal of Personality and Social Psychology, 62, 972–988. Matsumoto, D. (2001). Culture and emotion. In D. Matsumoto (Ed.), The handbook of culture and psychology (pp. 171–194). New York: Oxford University Press. Matsumoto, D., & Ekman, P. (1988). Japanese and Caucasian facial expressions of emotion and neutral faces (jacfee and jacneuf). Retrieved from http://www.paulekman.com Matsumoto, D., Ekman, P., & Fridlund, A. (1991). Analyzing nonverbal behavior. In P. W. Dowrick (Ed.), Practical guide to using video in the behavioral sciences (pp. 153–165). New York: Wiley. Matsumoto, D., Haan, N., Gary, Y., Theodorou, P., & Cooke-Carney, C. (1986). Preschoolers’ moral actions and emotions in prisoner’s dilemma. Developmental Psychology, 22, 663– 670. Matsumoto, D., & Kupperbusch, C. (2001). Idiocentric and allocentric differences in emotional expression and experience. Asian Journal of Social Psychology, 4, 113–131. Medvec, V. H., Madey, S. F., & Gilovich, T. (1995). When less is more: Counterfactual thinking and satisfaction among Olympic medalists. 
Journal of Personality and Social Psychology, 69, 603– 610. Messinger, D. S., Fogel, A., & Dickson, K. L. (2001). All smiles are positive, but some smiles are more positive than others. Developmental Psychology, 37, 642– 653. Richardson, C., Bowers, D., Bauer, R., Heilman, K., & Leonard, C. M. (2000). Digitizing the moving face during dynamic displays of emotion. Neuropsychologia, 38, 1028 –1039. Romney, A. K., Moore, C. C., & Rusch, C. D. (1997). Cultural universals: Measuring the semantic structure of emotion terms in English and Japanese. Proceedings of the National Academy of Sciences, USA, 94, 5489 –5494. Rosenberg, E. L., & Ekman, P. (1994). Coherence between expressive and experiential systems in emotion. Cognition & Emotion, 8, 201–229. Rosenberg, E. L., Ekman, P., & Blumenthal, J. A. (1998). Facial expression and the affective component of cynical hostility in male coronary heart disease patients. Health Psychology, 17, 376 –380. Rosenberg, E. L., Ekman, P., Jiang, W., Babyak, M., Coleman, R. E., Hanson, M., et al. (2001). Linkages between facial expressions of anger and transient myocardial ischemia in men with coronary heart disease. Emotion, 1, 107–115. Ruch, W. (1993). Extraversion, alcohol, and enjoyment. Personality & Individual Differences, 16, 89 –102. Ruch, W. (1995). Will the real relationship between facial expression and affective experience stand up: The case of exhilaration. Cognition & Emotion, 9, 33–58. Ruiz-Belda, M. A., Fernandez-Dols, J. M., Carrera, P., & Barchard, K. (2003). Spontaneous facial expressions of happy bowlers and soccer fans. Cognition & Emotion, 17, 315–326.

Russell, J. A. (1991). Culture and the categorization of emotions. Psychological Bulletin, 110, 426–450. Russell, J. A. (1994). Is there universal recognition of emotion from facial expression? A review of cross-cultural studies. Psychological Bulletin, 115, 102–141. Russell, J. A., Bachorowski, J.-A., & Fernandez-Dols, J. M. (2003). Facial and vocal expressions of emotion. Annual Review of Psychology, 54, 329–349. Russell, J. A., & Feldman Barrett, L. (1999). Core affect, prototypical emotional episodes, and other things called emotion: Dissecting the elephant. Journal of Personality and Social Psychology, 76, 805–819. Russell, J. A., & Fernandez-Dols, J. M. (1997). The psychology of facial expressions. New York: Cambridge University Press. Sayette, M., Wertz, J., Martin, C., Cohn, J., Perrott, M., & Hobel, J. (2003). Effects of smoking opportunity on cue-elicited urge: A facial coding analysis. Experimental and Clinical Psychopharmacology, 11, 218–227. Scheff, T. J. (1979). Catharsis in healing, ritual and drama. Berkeley: University of California Press. Schmidt, K. L., Cohn, J. F., & Tian, Y. (2003). Signal characteristics of spontaneous facial expressions: Automatic movement in social and solitary smiles. Biological Psychology, 65, 49–66. Schwartz, S. H. (2004). Mapping and interpreting cultural differences around the world. In H. Vinken, J. Soeters, & P. Ester (Eds.), Comparing cultures, dimensions of culture in a comparative perspective (pp. 43–73). Leiden, The Netherlands: Brill. Shaver, P. R., Murdaya, U., & Fraley, R. C. (2001). The structure of the Indonesian emotion lexicon. Asian Journal of Social Psychology, 4, 201–224. Shaver, P. R., Schwartz, J. C., Kirson, D., & O’Connor, C. (1987). Emotion knowledge: Further exploration of a prototype approach. Journal of Personality and Social Psychology, 52, 1061–1086.


Smith, M. C. (1995). Facial expression in mild dementia of the Alzheimer type. Behavioural Neurology, 8, 149–156. Snowdon, C. T. (2003). Expression of emotion in nonhuman animals. In R. J. Davidson, K. R. Scherer, & H. H. Goldsmith (Eds.), Handbook of affective sciences (pp. 457–480). New York: Oxford University Press. Soussignan, R. (2002). Duchenne smile, emotional experience, and autonomic reactivity: A test of the facial feedback hypothesis. Emotion, 2, 52–74. Steimer-Krause, E., Krause, R., & Wagner, G. (1990). Interaction regulations used by schizophrenic and psychosomatic patients: Studies on facial behavior in dyadic interactions. Psychiatry, 53, 209–228. Tomkins, S. S. (1962). Affect, imagery, and consciousness: Vol. 1. The positive affects. New York: Springer Publishing Company. Tomkins, S. S. (1963). Affect, imagery, and consciousness: Vol. 2. The negative affects. New York: Springer Publishing Company. Tomkins, S. S. (1978). Script theory: Differential magnification of affects. In H. E. Howe & R. A. Dienstbier (Eds.), Nebraska Symposium on Motivation (Vol. 26, pp. 201–236). Lincoln: University of Nebraska Press. Tracy, J. L., & Robins, R. W. (2004). Show your pride: Evidence for a discrete emotion expression. Psychological Science, 15, 194–197. Tsai, J. L., & Levenson, R. W. (1997). Cultural influences on emotional responding: Chinese American and European American dating couples during interpersonal conflict. Journal of Cross-Cultural Psychology, 28, 600–625. Vingerhoets, A. J. J. M., & Cornelius, R. R. (2000). Adult crying: A model and review of the literature. Review of General Psychology, 4, 354–377.

Received March 11, 2005 Revision received February 13, 2006 Accepted March 21, 2006






Emerging Adults in America Coming of Age in the 21st Century

JEFFREY JENSEN ARNETT, PHD AND JENNIFER LYNN TANNER

CONTENTS:

Chapter 1. Emerging Adulthood: Understanding the New Way of Coming of Age • Chapter 2. Emerging Adulthood, A Critical Period of Life Span Human Development • Section II: Lives – Chapter 3. Emerging Structures of Adult Thought • Chapter 4. Emerging Adulthood as an Institutionalized Moratorium: Risks and Benefits to Identity Formation • Chapter 5. Ethnic Identity Exploration in Emerging Adulthood • Chapter 6. Mental Health in Emerging Adulthood: Continuities and Discontinuities in Course, Content, and Meaning • Chapter 7. Resilience in Emerging Adulthood • Section III: Contexts – Chapter 8. Family Relationships and Support Systems in Emerging Adulthood • Chapter 9. Friendships and Romance in Emerging Adulthood: Assessing Distinctiveness in Close Relationships • Chapter 10. Sex is Just a Normal Part of Life: Sexuality in Emerging Adulthood • Chapter 11. School, Work, and Emerging Adulthood • Chapter 12. Emerging Adults in a Mediated World • Section IV: Conclusion – Chapter 13. The Study of Emerging Adulthood: What Is Known, and What Remains to Be Known?

Emerging Adults in America portrays the lives of young Americans between adolescence and young adulthood, a distinct developmental stage that editor Jeffrey Jensen Arnett describes as emerging adulthood. Over the past 40 years, the average age of marriage and parenthood has risen dramatically, and the years from the late teens through the mid-20s are no longer dedicated to settling into traditional adult roles. Instead, the focus has shifted to pursuing higher education, self-exploration, and shaping a future that best suits personal goals and desires. Along with coeditor Jennifer Lynn Tanner, Arnett has compiled a collection of chapters in this ground-breaking work that cover a range of topics, from relationships with parents to views about love, sex, and marriage; from experiences in college to those in the workplace; and from religious beliefs to beliefs about the concept of adulthood. This insightful book will be a valuable resource for developmental psychologists, therapists, and mental health practitioners who work with emerging adults and will appeal to young people and their families. 2005. 376 pages. Hardcover.

ISBN 1-59147-329-2 • Item # 4317092 • List: $79.95 • APA Member/Affiliate: $49.95

ALSO AVAILABLE THE ORIGINS OF HUMAN NATURE

Evolutionary Developmental Psychology DAVID F. BJORKLUND AND ANTHONY D. PELLEGRINI 2001. 444 pages. Hardcover. List: $49.95 APA Member/Affiliate: $39.95 ISBN 1-55798-878-1 Item # 431671A

SEX AND LOVE IN INTIMATE RELATIONSHIPS ROBERT W. FIRESTONE, LISA A. FIRESTONE AND JOYCE CATLETT 2005. 304 pages. Hardcover. List: $49.95 APA Member/Affiliate: $39.95 ISBN 1-59147-286-5 Item # 4317085

PREVENTING YOUTH VIOLENCE IN A MULTICULTURAL SOCIETY NANCY G. GUERRA AND EMILIE PHILLIPS SMITH 2005. 304 pages. Hardcover. List: $69.95 APA Member/Affiliate: $49.95 ISBN 1-59147-327-6 Item # 4316064


AMERICAN PSYCHOLOGICAL ASSOCIATION AD0406

2007

Graduate Study in Psychology is the best source of information related to graduate programs in psychology and provides information related to approximately 600 graduate programs in psychology in the U.S. and Canada. This volume contains information about the number of applications received by a program, number of individuals accepted in each program, dates for applications and admission, types of information required for an application (GRE scores, letters of recommendation, documentation concerning volunteer or clinical experience, etc.), in-state and out-of-state tuition costs, availability of internships and scholarships, employment information of graduates, orientation and emphasis of departments and programs, plus other relevant information. 2007. 832 pages. Paperback. Series: Graduate Study in Psychology

CONTENTS:

Considering Graduate Study ■ Accreditation in Professional Psychology ■ Programs, Degrees, and Employment ■ Admission Requirements ■ Competition for Admission ■ Time to Degree ■ Tuition and Financial Assistance ■ Application Information ■ Rules for Acceptance for Offers for Admission and Financial Aid ■ Explanation for Program Listings ■ Contact Information ■ Department Information ■ Programs and Degrees Offered ■ APA Accreditation Status ■ Student Applications/Admissions ■ Financial Information/Assistance ■ Employment of Department Graduates ■ Additional Information Application Information ■ Department Listings by State ■ Index of Programs by Area of Study Offered ■ Alphabetical Index

List: $24.95 • APA Member/Affiliate: $21.95 • ISBN 1-59147-423-X • Item # 4270090 • ISBN-13: 978-1-59147-423-4

ALSO AVAILABLE GETTING IN A Step-by-Step Plan for Gaining Admission to Graduate School in Psychology 1993 • 221 pages • Paperback • List: $14.95 APA Member/Affiliate: $14.95 • ISBN 1-55798-219-8 Item # 4313011 • ISBN-13: 978-1-55798-219-3

INTERNSHIPS IN PSYCHOLOGY, 2005-2006 The APAGS Workbook for Writing Successful Applications and Finding the Right Match Edited by Carol Williams Nickelson and Mitchell J. Prinstein 2005 • 142 pages • Paperback • List: $24.95 • APA Member/Affiliate: $19.95 • ISBN 1-59147-209-1 • Item # 4313004 • ISBN-13: 978-1-59147-209-4

GETTING MENTORED IN GRADUATE SCHOOL W. Brad Johnson and Jennifer M. Huwe 2003 • 210 pages • Paperback • List: $29.95 • APA Member/Affiliate: $24.95 ISBN 1-55798-975-3 • Item # 4313082 • ISBN-13: 978-1-55798-975-8


AMERICAN PSYCHOLOGICAL ASSOCIATION AD0468