THE PSYCHOLOGY OF LEARNING AND MOTIVATION Advances in Research and Theory
VOLUME 22
EDITED BY GORDON H. BOWER
STANFORD UNIVERSITY, STANFORD, CALIFORNIA
ACADEMIC PRESS, INC. Harcourt Brace Jovanovich, Publishers
San Diego New York Berkeley Boston London Sydney Tokyo Toronto
COPYRIGHT © 1988 BY ACADEMIC PRESS, INC. ALL RIGHTS RESERVED. NO PART OF THIS PUBLICATION MAY BE REPRODUCED OR TRANSMITTED IN ANY FORM OR BY ANY MEANS, ELECTRONIC OR MECHANICAL, INCLUDING PHOTOCOPY, RECORDING, OR ANY INFORMATION STORAGE AND RETRIEVAL SYSTEM, WITHOUT PERMISSION IN WRITING FROM THE PUBLISHER.
ACADEMIC PRESS, INC. San Diego, California 92101
United Kingdom Edition published by ACADEMIC PRESS, INC. (LONDON) LTD. 24-28 Oval Road, London NW1 7DX
LIBRARY OF CONGRESS CATALOG CARD NUMBER: 66-30104
ISBN 0-12-543322-0 (alk. paper)
PRINTED IN THE UNITED STATES OF AMERICA
88 89 90 91 9 8 7 6 5 4 3 2 1
CONTENTS
Contributors ..... ix
FORAGING AS OPERANT BEHAVIOR AND OPERANT BEHAVIOR AS FORAGING: WHAT HAVE WE LEARNED?
Sara J. Shettleworth
I. Introduction ..... 1
II. Optimal Foraging Theory ..... 3
III. Simulating Foraging in the Laboratory ..... 5
IV. Prey Selection and Delay Reduction ..... 8
V. Patch Departure and the Marginal-Value Theorem ..... 16
VI. Sampling and Information ..... 26
VII. Risk ..... 33
VIII. Time Horizons ..... 36
IX. Conclusions ..... 41
References ..... 43
THE COMPARATOR HYPOTHESIS: A RESPONSE RULE FOR THE EXPRESSION OF ASSOCIATIONS
Ralph R. Miller and Louis D. Matzel
I. Introduction ..... 51
II. The Comparator Hypothesis ..... 54
III. Punctate Comparator Stimuli ..... 61
IV. Some Applications of the Comparator Hypothesis ..... 62
V. Implications for Conditioned Inhibition Theory ..... 67
VI. Training a CS in Multiple Contexts ..... 78
VII. The Temporal Window for Comparisons ..... 78
VIII. Comparator Stimuli for Comparator Stimuli ..... 79
IX. Relationship of the Comparator Hypothesis to Other Models ..... 80
X. Postconditioning Inflation of Comparator Stimuli ..... 82
XI. Generalization to Instrumental Behavior ..... 86
XII. Appraisal of the Comparator Hypothesis ..... 87
References ..... 88
THE EXPERIMENTAL SYNTHESIS OF BEHAVIOR: REINFORCEMENT, BEHAVIORAL STEREOTYPY, AND PROBLEM SOLVING
Barry Schwartz
I. Introduction ..... 93
II. Experiment 1: Effects of Pretraining Variation ..... 104
III. Experiment 2: Rules, Payoffs, and Contingency Assessment ..... 106
IV. Experiment 3: Rules, Payoffs, and the Apprehension of Logical Form ..... 114
V. Experiment 4: Melioration, Optimization, and Pretraining ..... 120
VI. General Discussion ..... 128
References ..... 135
EXTRACTION OF INFORMATION FROM COMPLEX VISUAL STIMULI: MEMORY PERFORMANCE AND PHENOMENOLOGICAL APPEARANCE
Geoffrey R. Loftus and John Hogden
I. Introduction ..... 139
II. A Model of Information Acquisition and Picture Memory ..... 143
III. Phenomenological Appearance ..... 166
IV. Concluding Remarks ..... 180
V. Appendix ..... 184
References ..... 188
WORKING MEMORY, COMPREHENSION, AND AGING: A REVIEW AND A NEW VIEW
Lynn Hasher and Rose T. Zacks
I. Introduction ..... 193
II. The Theoretical Framework: From General Capacity to Working Memory ..... 194
III. The Empirical Evidence: Aging, Inference Formation, and Retrieval Problems ..... 199
IV. Criticisms of the Reduced Processing Resource Approach ..... 208
V. A New Framework: Inhibition and the Contents of Working Memory ..... 212
References ..... 220
STRATEGIC CONTROL OF RETRIEVAL STRATEGIES
Lynne M. Reder
I. Introduction ..... 227
II. A Two-Factor Theory of Memory Retrieval ..... 228
III. Influencing Strategy Selection: Extrinsic Variables and Intrinsic Variables ..... 235
IV. When Does the Strategy-Selection Stage Operate? ..... 240
V. Influencing Strategy Selection: Intrinsic Variables ..... 245
VI. Conclusions ..... 254
VII. Appendix ..... 256
References ..... 258
ALTERNATIVE REPRESENTATIONS
Ruth S. Day
I. Introduction ..... 261
II. Bus Schedules ..... 264
III. Medication Instructions ..... 274
IV. Text-Editing Commands ..... 283
V. Overview of Experiments ..... 297
VI. Research Strategy ..... 299
VII. Toward a Comprehensive View of Representation ..... 300
References ..... 303
EVIDENCE FOR RELATIONAL SELECTIVITY IN THE INTERPRETATION OF ANALOGY AND METAPHOR
Dedre Gentner and Catherine Clement
I. Introduction ..... 307
II. Three Accounts of Metaphor and Analogy ..... 309
III. Experiments Contrasting Structure-Mapping and Salience Imbalance ..... 316
IV. Systems of Relations ..... 344
V. Appendix A: Scoring Propositional Structure ..... 352
VI. Appendix B: Grammatical Categories Used in Syntactic Scoring ..... 353
References ..... 355
Index ..... 359
Contents of Recent Volumes ..... 369
CONTRIBUTORS
Numbers in parentheses indicate the pages on which the authors’ contributions begin.
Catherine Clement, Department of Psychology, University of Illinois, Champaign, Illinois 61820 (307)
Ruth S. Day, Department of Psychology, Duke University, Durham, North Carolina 27706 (261)
Dedre Gentner, Department of Psychology, University of Illinois, Champaign, Illinois 61820 (307)
Lynn Hasher, Department of Psychology and Center for the Study of Aging and Human Development, Duke University, Durham, North Carolina 27706 (193)
John Hogden, Department of Psychology, Stanford University, Stanford, California 94305 (139)
Geoffrey R. Loftus, Department of Psychology, University of Washington, Seattle, Washington 98195 (139)
Louis D. Matzel, National Institute of Neurological and Communicative Disorders and Stroke, National Institutes of Health, Bethesda, Maryland 20892 (51)
Ralph R. Miller, Department of Psychology, State University of New York at Binghamton, Binghamton, New York 13901 (51)
Lynne M. Reder, Department of Psychology, Carnegie-Mellon University, Pittsburgh, Pennsylvania 15213 (227)
Barry Schwartz, Department of Psychology, Swarthmore College, Swarthmore, Pennsylvania 19081 (93)
Sara J. Shettleworth, Department of Psychology, University of Toronto, Toronto, Ontario, Canada M5S 1A1 (1)
Rose T. Zacks, Department of Psychology, Michigan State University, East Lansing, Michigan 48824 (193)
FORAGING AS OPERANT BEHAVIOR AND OPERANT BEHAVIOR AS FORAGING: WHAT HAVE WE LEARNED?
Sara J. Shettleworth
I. Introduction
A redshank walks slowly over the tidal flats, occasionally probing the mud with its bill and pulling out a worm. A crab crawls over a mussel bed, lifting the mollusks with its claws. Some it drops after a brief manipulation; others it crushes and eats. A pigeon confronts two lighted disks in a small chamber. It pecks one for a few seconds, then the other. A hopper of grain appears and the pigeon eats.
The crab and the redshank have been studied by behavioral ecologists testing models of foraging behavior (Elner & Hughes, 1978; Goss-Custard, 1977). The pigeon is representative of the thousands of pigeons that have been subjects in psychologists’ studies of reinforcement and choice. Yet, clearly, the pigeon is in some sense foraging while the crab and the redshank are experiencing what could be described as reinforcement schedules. One might well ask, then, what studies of foraging and studies of food-reinforced operant behavior have to do with each other. In the last decade this question has inspired considerable research and discussion, including substantial parts of three interdisciplinary conferences (Commons, Kacelnik, & Shettleworth, 1987; Kamil, Krebs, & Pulliam, 1987; Kamil & Sargent, 1981). Within biology, the development of optimal foraging theory has fostered experiments remarkably similar to studies of reinforcement schedules. Within psychology, experiments that might once have been labeled studies of maze-running behavior or schedule effects are now seen as studies of the memory and decision processes animals use in foraging for food in the wild (e.g., Batson, Best, Phillips, Patel, & Gilliland, 1986).
Although animals working for food in the laboratory can be seen as foraging, they are undoubtedly doing so under conditions considerably different from those in the wild. It is not self-evident that data and theory about foraging are relevant to principles of reinforcement and choice, or vice versa. That the two are nevertheless intimately related has been argued most carefully by Lea (1981, 1982; see also Baum, 1982b, 1983; Collier & Rovee-Collier, 1981; Kamil, 1983; Kamil & Yoerg, 1982; Staddon, 1980, 1983). On the surface, Lea notes, observations like those at the beginning of this chapter merely suggest an analogy between foraging and operant behavior. However, detailed comparisons show that phenomena like those observed in the field can be produced on appropriately designed schedules in the laboratory. It seems to follow that "the same principles . . . govern instrumental performance in the field as in the laboratory; and that, in consequence, operant psychology may be able to supply the behavioral mechanisms of some of the effects observed by ecologists, while foraging may supply the evolutionary rationale for some of the phenomena observed in the laboratory" (Lea, 1981, p. 355). Belief in Lea's conclusion or a desire to test it has undoubtedly helped to stimulate research on foraging-related problems in psychological laboratories. An optimistic proponent of a unified science of animal behavior might conclude that, in the area of foraging, a synthesis is well on its way (Kamil, 1988, and Rozin & Schull, 1988, discuss such a synthesis). Others might conclude that this research has merely served to confirm Lea's conclusion without producing any new psychological insights.
This contribution surveys the results of laboratory simulations of various aspects of foraging. Subsets of this work have been thoroughly discussed by others (e.g., Fantino & Abarca, 1985; Kamil & Roitblat, 1985; Lea, 1981). However, by reviewing a number of recent developments in one place, it is possible to address the question: To what extent have psychologists (or, for that matter, behavioral ecologists) learned anything new from laboratory simulations of foraging? Are there, for example, general principles of reinforcement or choice which become more evident when foraging is considered or which are not evident otherwise?
Because so much of this work is influenced by optimal foraging theory, it is necessary to begin with a brief discussion of that theory. More detailed presentations can be found elsewhere (e.g., Krebs & McCleery, 1984; Stephens & Krebs, 1986).
II. Optimal Foraging Theory
A. WHAT DO OPTIMAL FORAGING MODELS DO?
Evolution tends to produce traits that maximize the fitness of individuals possessing them. This plausible assumption is the starting point for optimal foraging theory as well as for much of behavioral ecology. Reproductive success (i.e., fitness) is difficult to measure directly except in short-lived species, but one can measure various currencies that, it seems fair to assume, are related directly to fitness. In the case of male sexual behavior, the currency might be the total number of females inseminated in a lifetime; in the case of foraging, it is usually energy intake per unit time spent foraging (E/T). Developing an optimal foraging model begins by formally characterizing the foraging situation of interest and specifying the currency to be maximized together with relevant constraints on the forager (for example, perhaps it cannot eat and search for prey at the same time). The behavior that maximizes E/T can then be specified and experiments can be designed to discover whether behavior has the predicted properties.
This description of the bare bones of optimal foraging theory conceals an important point which is sometimes misunderstood. Optimal foraging models (and there are many of them) specify what the outcome of behavior, given certain assumptions, should be. They do not say anything about the underlying mechanisms through which that outcome might be achieved. Thus, optimality models and theories of the mechanisms of learning or choice are complementary, not alternative, accounts of behavior. Optimality models answer the question, What should animals do? Psychological explanations answer the question, How do animals do it? The distinction is essentially that between functional (evolutionary or ultimate) and causal (mechanistic or proximate) questions about behavior (Hogan, 1984; Houston, 1987; Tinbergen, 1951).
It follows that the way in which the optimum is derived by the theorist carries no implications about how it is achieved by animals, if it is. For example, the optimal rate of sampling a fluctuating patch to see if it has changed for the better can be derived by balancing the total losses of potential prey from sampling too often against those from not sampling often enough (Stephens, 1987; see Section VI). But this kind of derivation does not mean that optimal foraging theory depicts animals as consciously weighing the costs and benefits of different courses of action, nor does it mean that animals must be “literal optimizers” (Herrnstein & Vaughan, 1980) in all, or indeed any, situations. For the predictions of an optimality model to be fulfilled, an animal need only respond in some way that leads to the predicted outcome in the environment of its species. This might mean responding to a simple stimulus like prey size that is a reliable correlate of E/T in the species’ natural habitat. That is to say, animals use mechanisms that approximate optimal behavior in nature but may not do so in arbitrary laboratory situations. (Simple decision mechanisms are sometimes referred to as “rules of thumb” by foraging theorists; for further discussion see Houston, 1987; Staddon, 1983.) Animals do not need to use complex cognitive mechanisms, or even psychologically very interesting decision processes, to forage close to optimally. This possibility is easy to accept in the case of “lower” animals like crabs and mantids, but it is often overlooked when birds and mammals are being studied.
B. TESTS OF FORAGING MODELS: MODIFYING THE CONSTRAINTS
Foraging models have been tested both by making observations in the field and by designing laboratory simulations more or less like natural foraging situations. On the whole, tests of optimal foraging models have been reasonably successful (see reviews in Pyke, 1984; Schoener, 1987; Stephens & Krebs, 1986; but see also Gray, 1987). Most often, qualitative predictions have been fulfilled while precise quantitative predictions have not. Experiments and observations are seen as testing not the basic assumption that evolution tends to maximize fitness but the assumptions of particular models. If a model fails to predict behavior accurately, and the assumptions of the model are actually met by the testing situation (not a trivial problem; see Stephens & Krebs, 1986), the theorist considers whether the model makes correct assumptions about the currency being optimized and the constraints within which optimization is achieved.
Classical optimal foraging models make rather few and simple constraint and currency assumptions. This makes them very general, but at the same time inappropriate for many real situations. In modifying the constraint assumptions of classical models, psychological research and optimal foraging theory make contact in a particularly interesting way. For example, the classical prey selection model (Stephens & Krebs, 1986) assumes the predator recognizes prey types instantly and perfectly. This assumption leads to the prediction of all-or-nothing choice between two prey types: below a threshold abundance of the more profitable type the predator should be unselective; above the threshold, it should take only the more profitable prey. The behavior of real predators is accounted for much better when the assumption of perfect recognition is changed for one that incorporates a signal detection model (Getty, Kamil, & Real, 1987). As another example, the optimal way to sample a fluctuating patch is to visit it at regular intervals, for example, on precisely every fifteenth foraging trip (Stephens, 1987). But animals are generally incapable of such accurate counting or timing (Gibbon & Church, 1981). Stephens’s model can be made more realistic by assuming that animals sample at random and solving for the optimal sampling probability rather than the optimal sampling interval (Shettleworth, Krebs, Stephens, & Gibbon, 1988; see Section VI).
This cyclical (not circular) process of incorporating ever more realistic and specific constraints into foraging models has been discussed most fully by Cheverton, Kacelnik, and Krebs (1985; see also Houston & McNamara, 1985). They point out that information about behavioral mechanisms, in the form of constraint assumptions, is an essential part of optimal foraging models. This makes functional and causal accounts of foraging not merely complementary but interacting (see also Rozin & Schull, 1988; Shettleworth, 1983). Some second- and third-generation optimal foraging models incorporate information already available about psychological processes like discrimination, timing, and counting. However, tests of foraging models can also stimulate investigations of novel or poorly understood behavioral mechanisms (Shettleworth, 1987b). These sorts of interactions between optimality models and studies of behavioral mechanisms are emphasized in this article.
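The contrast between sampling at a fixed interval and sampling at random can be made concrete with a minimal sketch. It assumes, purely for illustration, that a forager decides independently on each trip, with a constant probability p, whether to sample; the function names are hypothetical, and the sketch is not the model of Stephens (1987) or of Shettleworth et al. (1988). Under that assumption the number of trips between sampling visits follows a geometric distribution with mean 1/p, so p = 1/15 matches the fifteen-trip interval only on average.

```python
def interval_probability(p, k):
    """Probability that the next sampling visit falls exactly k trips after the last one.

    Assumes each trip is a sampling visit independently with probability p
    (an illustrative assumption, giving a geometric distribution).
    """
    return p * (1.0 - p) ** (k - 1)


def mean_interval(p):
    """Expected number of trips between successive sampling visits."""
    return 1.0 / p


def probability_gap_exceeds(p, k):
    """Probability that more than k trips pass without a sampling visit."""
    return (1.0 - p) ** k


if __name__ == "__main__":
    p = 1.0 / 15.0
    print("mean interval:", mean_interval(p))
    print("P(gap longer than 15 trips):", probability_gap_exceeds(p, 15))
```

A random sampler with the same mean interval will therefore often leave the patch unvisited for longer than the fixed-interval sampler would, which is one reason the optimal sampling probability need not simply equal the reciprocal of the optimal interval.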
III. Simulating Foraging in the Laboratory
A. TWO EXAMPLES
In one of the first experimental tests of optimal prey choice, Krebs, Erichsen, Webber, and Charnov (1977) trained great tits to take pieces of mealworm from a conveyor belt. The profitability of the “prey” was varied by varying their size and attaching a small piece of sticky tape to some of them. In this kind of experiment animals are tested in a controlled way in captivity, but they are choosing between real prey like those they normally eat. The mealworm pieces moving past on the conveyor belt are like insects running among leaves. In Lea’s (1979) pioneering simulation of prey selection using pigeons in an operant chamber, the “prey” are colors on the pecking key, associated with different delays to a fixed amount of grain. The bird accepts an item of a given type by pecking the associated key color. It rejects the item by failing to peck or by pecking another key which returns it to the “search” state.
Both of these experiments simulate a foraging situation that has been studied in the field (e.g., Goss-Custard, 1977). Although the formal properties of both simulations fit the classical model of prey choice (Stephens & Krebs, 1986), they differ from each other in how many details are the same as they would be for animals of the same species foraging in the wild. In principle, each of these differences (for example, substituting pecking at a lighted disk for handling food) should be justified. In fact, because the results of parallel laboratory and field experiments are generally so similar, the very abstract nature of operant foraging simulations is seldom questioned. The results help to justify the claim (e.g., Lea, 1981, 1982; Staddon, 1980, 1983) that the same mechanisms are used in “foraging” in the laboratory as in the wild. Although some researchers (e.g., Fantino & Abarca, 1985) have recognized that principles found in the laboratory must ultimately be tested in successively more natural situations, this has rarely been tried. However, Kacelnik and Cuthill (1987) describe a research program combining field and laboratory studies on the same species. A potentially very fruitful approach is to compare the mechanisms used in parallel studies of the same foraging problem in an operant simulation and a natural situation (Schull, Gelch, Vitale, Allen, James, & Harrison, 1985).
B. WHY SIMULATE FORAGING IN THE PSYCHOLOGY LABORATORY?
1. To test optimal foraging models. Well-designed simulations allow tests of theoretical predictions under much better-controlled conditions than would ever be possible in the field. Indeed, it has been argued (Hanson, 1987) that operant simulations are necessary because only in this way can the values of important variables be known precisely. A more modest claim (Kamil & Yoerg, 1982; Pulliam, 1981; Schoener, 1987) is that the methods of operant psychology are useful for testing foraging models. As well, psychologists have already encountered and overcome some of the same problems that are raised by analyses of foraging behavior, and behavioral ecologists would be unwise to ignore their findings (Kamil & Yoerg, 1982; Kamil, 1983).
2. To discover whether operant behavior is optimal. Viewing responding on operant schedules as foraging has led some to ask whether behavior on standard schedules is optimal. This question ignores the caution that even if animals optimize in nature they do not necessarily do so in arbitrary situations. Much discussion has centered on whether reinforcement rate is maximized by matching relative responses to relative reinforcements on concurrent variable-interval schedules (Commons, Herrnstein, & Rachlin, 1982). An optimality analysis of the contingencies on such schedules shows that maximizing does not require matching (Houston & McNamara, 1981). A related question is whether matching is the outcome of a decision process that maximizes reinforcement in natural situations (Mazur, 1981; Staddon, 1980, 1983). The large and sometimes confused literature on this subject is succinctly summarized by Houston (1987).
3. To discover more ecologically valid behavioral principles. Data and ideas about foraging have influenced laboratory research on animal learning and cognition by inspiring a flight from strictly controlled testing situations to more naturalistic ones (e.g., Baum, 1983; Mellgren, 1982). It has been suggested that by studying traditional problems of learning and motivation under more naturalistic conditions (for example, by substituting "travel" between pecking keys for a changeover delay; Baum, 1982a), it will be possible to discover more ecologically valid principles of learning and choice. The most influential work of this type is that of Collier and colleagues (Collier & Rovee-Collier, 1981; Collier, 1983; Collier, Johnson, Hill, & Kaufman, 1986). By studying animals working for meals on continuously available schedules, Collier has raised important new questions about motivation and schedule effects.
4. To analyze examples of learning from the field. Observations of animals foraging in the wild have suggested new questions about memory and cognition which have then been studied in the laboratory. This approach has been particularly productive in the study of memory in food-hoarding birds (Kamil & Balda, 1985; Sherry, 1987; Shettleworth & Krebs, 1982), the analysis of search image formation (Guilford & Dawkins, 1987; Pietrewicz & Kamil, 1977), and studies of how animals learn to avoid aposematic prey (Roper & Wistow, 1986). This sort of approach, like that described in (3) above, is based on a premise opposite to simulations like Lea's. Rather than making natural situations more simple and abstract so as to better control and understand them, it prescribes making laboratory situations more natural so as to analyze processes that have some existence in the world outside the laboratory. A bridge between these approaches is research aimed at discovering whether principles discovered in conventional laboratory studies are applicable in foraging-like situations (e.g., Rashotte, O'Connell, & Beidler, 1982).
5. To make foraging models more realistic. Early experimental studies of foraging like that of Krebs et al. (1977) aimed to test predictions of simple optimal foraging models. Early operant simulations like that of Lea (1979) aimed to see whether foraging-like phenomena could be obtained on appropriately designed schedules. In more recent work, predictions from optimal foraging theory and predictions from psychological studies of learning and choice have sometimes been brought to bear on a single situation within a single study (e.g., Kacelnik & Krebs, 1985a; Shettleworth et al., 1988). This juxtaposition can lead to refinement of the constraints assumed in the foraging model as well as to new insights into behavioral processes. The next few sections emphasize the best-developed examples of this kind of work.
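The matching relation discussed under item 2 above can be stated compactly: on two concurrent schedules, the proportion of responses directed to one alternative equals the proportion of reinforcements obtained from it. The following is a minimal sketch of that comparison; the function names and the numbers in the usage lines are hypothetical illustrations, not data from any study cited here.

```python
def relative_rate(first, second):
    """Proportion of the total contributed by the first of two alternatives."""
    total = first + second
    if total == 0:
        raise ValueError("at least one of the two counts must be nonzero")
    return first / total


def matching_deviation(responses, reinforcements):
    """Relative responses minus relative reinforcements for alternative 1.

    Each argument is a pair (alternative 1, alternative 2).
    A value of 0 means behavior matches exactly.
    """
    return relative_rate(*responses) - relative_rate(*reinforcements)


if __name__ == "__main__":
    # Hypothetical session totals: pecks and grain deliveries on two keys.
    print(matching_deviation((300, 100), (45, 15)))  # exact matching
    print(matching_deviation((240, 160), (45, 15)))  # too few responses on key 1
```

Note that this only describes the distribution of behavior. Whether such a distribution also maximizes overall reinforcement rate is a separate question that depends on the schedule contingencies, as the optimality analysis cited above makes clear.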
IV. Prey Selection and Delay Reduction
A. HOW SHOULD ANIMALS SELECT PREY?
Models of prey selection deal with what a forager should do when it successively encounters items of different types. On each encounter it must decide whether to accept the item at hand or continue searching. Handling (i.e., capturing, preparing, and consuming) the item at hand precludes further search. Prey types are assumed to differ in their profitability [ratio of net energy yield to handling time ($e/h$)], and the predator is assumed to be able to recognize prey types perfectly and instantaneously. The classical prey selection model, originally proposed by several authors (see Schoener, 1987; Stephens & Krebs, 1986), answers the question: How should prey be selected to maximize the net rate of energy intake? The answer can be derived by considering, for each item type $i$, its net energy yield ($e_i$), handling time ($h_i$), and encounter rate in items/unit time ($\lambda_i$), and then solving for the probability of acceptance that maximizes total energy intake/time. The result can be deduced by considering the situation at the moment of encounter with an item, say, type 1. If the predator accepts the item at hand, it will obtain $e_1$ units of energy in the next $h_1$ time units. If it continues to search for type 2 items instead, it can expect $e_2$ units of energy in the time necessary to find ($1/\lambda_2$) and handle ($h_2$) a type 2 item. Thus, its rate of energy intake will be $e_2/(1/\lambda_2 + h_2)$. The predicted decision depends on which of these alternatives offers the greater net rate of energy intake.
The important predictions of this model are that (1) prey are included in the diet in order of profitability; (2) choice is all or nothing: a prey type should either always or never be attacked when encountered; and (3) acceptance of a prey type depends not on its own abundance but on the abundance of higher-ranked types. These predictions have been at least qualitatively confirmed in a large number of field studies as well as in laboratory experiments under seminatural conditions (Stephens & Krebs, 1986; but see Gray, 1987).
Models with modified recognition constraints have also been developed. For example, like shore crabs that identify mussels by lifting them, the forager can be assumed to take time to recognize items (Elner & Hughes, 1978; Houston, Krebs, & Erichsen, 1980). The consequences of imperfect discrimination among item types have also been explored (Getty et al., 1987). In this case partial preferences are predicted.
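The comparison made at the moment of encounter in the two-type case can be sketched in a few lines. This is a minimal illustration of the argument above, not a full diet model: the function and parameter names are hypothetical, the item at hand plays the role of type 1 and the better alternative the role of type 2, and the numerical values are invented for the example.

```python
def profitability(energy, handling_time):
    """Net energy yield per unit handling time (e/h)."""
    return energy / handling_time


def rate_if_rejected(e_better, h_better, lam_better):
    """Intake rate from passing up the item and waiting for the better type.

    The expected time to find the better type is 1/lam_better, after which
    it must be handled for h_better: e2 / (1/lambda2 + h2).
    """
    return e_better / (1.0 / lam_better + h_better)


def should_accept(e_item, h_item, e_better, h_better, lam_better):
    """True if taking the item at hand yields the higher rate of intake.

    The encounter rate of the item at hand does not appear among the
    arguments, which mirrors prediction (3) in the text.
    """
    return profitability(e_item, h_item) > rate_if_rejected(
        e_better, h_better, lam_better
    )


if __name__ == "__main__":
    # Hypothetical values: poorer type e=6, h=3; better type e=10, h=2.
    for lam_better in (0.1, 0.5):
        print(lam_better, should_accept(6.0, 3.0, 10.0, 2.0, lam_better))
```

Because the rule returns a plain yes or no for any given set of parameters, it also illustrates prediction (2): under these assumptions a prey type is either always or never accepted, with the switch occurring at a threshold abundance of the better type.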
B. OPERANT SIMULATIONS OF PREYSELECTION Beginning with the work of Lea (1979) and Collier (Collier & RoveeCollier, 19811, prey selection has probably been the subject of more operant simulations than any other single foraging problem (review in Fantino &
Foraging and Operant Behavior
9
t 0
Active Key A=Amber R=Red G =Green
@ Dark Key
Fig. I . Flow diagram of a schedule simulating prey choice (Hanson. 1987) showing t h e states of the two pecking keys during the search. handling. and consumption phases. P, Probability value; FR, fixed ratio schedule: VR. variable ratio. The values of the schedule parameters. ( I . I’. and g . can be changed to vary search and handling times respectively. (After Hanson, 1987.)
Abarca, 1985). Hanson’s (1987) version of the schedule devised by Lea is diagrammed in Fig. 1. The pigeon pecks a key to “search” for items. From time to time a second key lights up to signal an encounter with an item. Each item type is associated with a distinctive color on the key. The bird can accept the item by pecking the key through a handling time or reject it by continuing to peck the search key. When an item is accepted, the handling time ends in opportunity to eat from a hopper full of grain for a few seconds, after which the search state is reinstated. Collier (e.g.. 1983)has subjected rats and other animals to similar schedules. However, the results are of limited usefulness as quantitative tests of optimality because the animal usually receives a self-determined “meal” rather than a fixed packet of energy. Like researchers who have studied prey selection with items that must actually be handled, Lea obtained results qualitatively, but not always quantitatively, in accord with the predictions of the optimal diet model. The birds always accepted nearly all of the more profitable (shorter handling time) items offered, and their probability of accepting the items with longer handling time did increase as more profitable items became scarce. However, this transition was not all or none. Partial preferences were
Sara J. Shettleworth
observed. Also, 100% of the poor (longer handling time) items were not accepted as soon as they should have been when density of the more profitable items decreased. There was a bias against the long handling time. Finally, with a constant density of profitable items, more poor items were accepted as their density increased.
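The qualitative predictions against which Lea's data were judged can be made concrete with a short numerical sketch of the standard two-item diet model (as presented in Stephens & Krebs, 1986). Everything specific here is an assumption made for illustration: the function names, the encounter rates `lam1` and `lam2` (items per second of search), and the energy values and handling times, none of which are taken from Lea's experiments. The sketch shows that, in the model, whether the poorer item should be taken depends only on the encounter rate with the better item, which is why an effect of the poorer item's own density counts as a deviation.

```python
def intake_rate(prey, accepted):
    """Long-term energy intake per unit time for a forager that accepts only
    the listed prey types. prey maps name -> (lam, e, h): encounter rate
    while searching, energy per item, and handling time per item."""
    energy = sum(prey[k][0] * prey[k][1] for k in accepted)
    handling = sum(prey[k][0] * prey[k][2] for k in accepted)
    return energy / (1.0 + handling)


def should_accept_poor(lam1, e1, h1, e2, h2):
    """Zero-one rule: take the poorer item (type 2) only if its profitability
    e2/h2 exceeds the intake rate obtained by specializing on type 1."""
    return e2 / h2 > lam1 * e1 / (1.0 + lam1 * h1)


# Hypothetical values: equal energy, handling times of 5 s and 20 s.
e1, h1, e2, h2 = 1.0, 5.0, 1.0, 20.0
lam2 = 0.1  # density of the poorer item; it does not enter the decision rule
for lam1 in (0.5, 0.1, 0.02):
    prey = {"good": (lam1, e1, h1), "poor": (lam2, e2, h2)}
    specialist = intake_rate(prey, ["good"])
    generalist = intake_rate(prey, ["good", "poor"])
    print(lam1, should_accept_poor(lam1, e1, h1, e2, h2),
          round(specialist, 4), round(generalist, 4))
```

With these illustrative numbers the poorer item is excluded while the better item is common and included only at the lowest `lam1`, and the switch is all or none. Partial preferences and a bias against long handling times are exactly what this version of the model leaves out.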
C. WHY PARTIAL PREFERENCES?
Partial preference rather than 100% acceptance or rejection of relatively unprofitable prey seems to be the rule in tests of prey selection (Stephens & Krebs, 1986). A number of reasons for partial preferences have been advanced (Krebs & McCleery, 1984; McNamara & Houston, 1987b). Most of them assume the animal's behavior is optimal in some broader framework than the standard prey selection model. For example, one possibility is that to maximize long-term gain the animal must occasionally sample the relatively poor prey types (see Section VI). Alternatively, it may be necessary to assume that partial preferences are inevitable because animals have thresholds of acceptance varying randomly around the optimum and accept this as a constraint in prey models (Stephens, 1985). McNamara and Houston (1987b) have explored the consequences of making variable decisions an intrinsic part of optimization models. Under these conditions, costly deviations from optimality should be rare, as indeed they seem to be (Houston, 1987). Several experiments inspired by Lea's work have aimed to eliminate partial preferences by means of various procedural modifications. For example, Abarca and Fantino (1982) tested the hypothesis that performance would become more nearly optimal if (1) pigeons were trained longer on each experimental condition, thereby making it more likely that they knew the values of all the experimental parameters, and (2) variable-interval (VI) rather than fixed-interval (FI) schedules were used to represent handling time. Their birds did behave more nearly optimally than Lea's, but partial preferences were still observed. Snyderman (1983a) took things one step further by training birds to very strict stability criteria, requiring over 50 sessions in some cases. His data show that animals may need very extensive experience to "settle down" on schedules of this type and that under some conditions partial preferences can be almost abolished.
Snyderman's procedure, while still an accurate simulation of item selection, differed in several ways from that used by other investigators. Perhaps most important, all components of the schedule were fixed times, and handling time for all items was a constant 10 sec composed of a wait before food, similar to the handling time in Lea's study, a variable-duration presentation of the food hopper, and a postfood wait. Thus while "prey" in earlier operant studies varied in h but not e, those in this simulation
varied in e but not h. (Other methods have also been used, cf. Abarca, Fantino, & Ito, 1985; Ito & Fantino, 1986.) Moreover, Snyderman included the entire handling time in calculating optimal behavior. Other investigators typically include only prefood delay, and the fit of the data to optimality has sometimes been made worse by adding eating time (Ito & Fantino, 1986). Strictly speaking, Snyderman’s approach is correct. He has also shown, however, that when search is precluded for a constant time, pigeons greatly prefer the alternative with shorter prefood delay even if e/h is less. Preference for short delays to food may be a constraint in optimality models (Snyderman, 1983a), even though disproportionate liking for short delays may itself be an adaptation to an uncertain world (Kagel, Green, & Caraco, 1986). Noting that searching for and handling real food items is like a ratio schedule because working faster brings the food closer, some investigators have substituted ratio for interval schedules in simulations like Lea’s (Collier & Rovee-Collier, 1981; Hanson, 1987; Peden & Rohe, 1984). The results are similar to those of other simulations. One problem in this area is that possible factors leading to partial preferences and other deviations from optimality have not been investigated independently. Thus, while it is clear that Snyderman’s animals did show long-term learning, it is not clear whether other aspects of his schedule also helped to reduce partial preferences. McNamara and Houston (1987b) conclude that what is needed here as well as in other cases is a systematic analysis of how partial preferences depend on circumstances.
D. DELAY REDUCTION
Lea (1979) concluded that a theory of choice to explain both operant behavior and choice in foraging situations should be developed. Fantino and colleagues (Abarca & Fantino, 1982; Fantino & Abarca, 1985; Fantino, Abarca, & Ito, 1987; Fantino, 1987) have attempted to show that such a theory is provided by the delay-reduction hypothesis, originally developed to account for choice on concurrent chain schedules (Fantino, 1981). Concurrent chains differ from simulated prey selection in that during the choice phase (initial link) of the schedule the animal has two options available simultaneously. Each of these leads to a second link associated with a different mean delay to food (e.g., a VI schedule). The distribution of responses in the first link is such that the proportion of responses on each alternative matches the relative reduction in delay to reinforcement associated with the respective second link. Stated in this way, the delay-reduction hypothesis is, like the matching law, a rule describing overall allocation of responses in a simultaneous choice situation. It can be applied to prey selection simulations by considering the moment when the animal chooses between accepting an encountered item and continuing to search. Exactly as in the classical prey selection model, the animal is predicted to select the alternative providing the shortest expected delay to food (see Fantino & Abarca, 1985). This rule does seem to provide an account of much of what goes on in operant simulations of foraging, but the delay reduction hypothesis as originally proposed was more a molar description of response allocation than a rule of thumb (Staddon, 1983). A number of experiments by Fantino and colleagues (see Fantino & Abarca, 1985) have shown that besides accounting for the results of operant prey selection experiments, the delay reduction hypothesis correctly predicts effects of variables like travel time. A weakness of some of this work is that the potential power of juxtaposing two quantitative models, optimality and delay reduction, is not exploited. Rather than precisely calculating predicted behavior for the particular situation under study, the authors have often only compared qualitative predictions (Kacelnik & Krebs, 1985b; Shettleworth, 1985b).
E. SCALAR EXPECTANCY AND MEMORY FOR DELAYS
The delay reduction hypothesis leaves unspecified how the animal represents the delays to food associated with each stimulus. In spite of the fact that behavior on concurrent chains (and, possibly, operant prey selection schedules) is influenced by whether FI or VI schedules are used (Abarca & Fantino, 1982) the theorists' calculations for optimality or delay reduction involve only means. Moreover, because the delay reduction hypothesis for concurrent chains involves choice proportions, it implicitly accepts some variability in behavior, without addressing where it comes from or why fixed and variable schedules differ. A recently developed model of choice incorporating what is known about animals' memory for time deals with exactly these points (Gibbon, Church, Fairhurst, & Kacelnik, 1988). It has been applied quantitatively with great success to one foraging simulation (see Section VI), but it clearly has the qualitative features necessary to deal with others. In this model, an animal exposed to a schedule offering reward after a delay or mixture of delays is assumed to accumulate an exhaustive memory of the delays experienced after each response. Each delay is remembered with a standard deviation proportional to its mean, an assumption supported by experiments on timing (e.g., Gibbon & Church, 1981; Roberts, 1981). Each time it makes a choice, the animal is assumed to take one random sample from each of its distributions of remembered delays and pick the alternative with the shortest remembered delay. The scalar expectancy model thus describes a process of maximizing under constraint, the constraints being the variance in
memory for times and the limited sample taken from memory on each choice. How the predictions of this mechanistic model compare to those of the optimality model with variability developed by McNamara and Houston (1987b) would be well worth exploring. The scalar expectancy model immediately generates the well-known finding that animals prefer a variable delay to reward over a fixed delay with the same mean (see Section VII). It also predicts that if an animal is choosing between two alternatives which both offer variable times to reward, it will sometimes choose the less favorable alternative insofar as the remembered distributions of times to reward overlap. For the same reason, the objectively better item may occasionally be rejected in favor of continued searching. Information about the properties of the schedules allows quantitative predictions of choice probabilities (Gibbon et al., 1988). However, two qualitative points are of importance here. First, some variability in choice is to be expected. Second, variability should be greater when the animal has been exposed to variable schedules than to fixed schedules. Dealing as they do with means, neither the optimal prey selection model nor the delay reduction hypothesis makes this point. It has not been explicitly tested, but it is consistent with Snyderman’s finding of greatly reduced partial preferences after extended exposure to fixed schedules. The scalar expectancy model was designed to deal with performance after lengthy exposure to a given set of schedules, at which point the animal is assumed to have memories for all the delays the schedules have to offer. Immediate past history on a given alternative, for example, a run of bad luck, is assumed to have no effect. How quickly the memory distributions are altered is an open question.
In later sections we will see evidence that recent experience can have large effects, suggesting that the “memory window” (Cowie, 1977) over which animals accumulate their distributions of times to reward may sometimes be rather small.
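The sampling process described in this section lends itself to a small Monte Carlo sketch. It is only an illustration of the verbal description given here, not the quantitative model of Gibbon et al. (1988): the Gaussian form of the memory noise, the coefficient `gamma`, the function names, and the example delays are all assumptions. Each experienced delay is remembered with a standard deviation proportional to its value, one sample is drawn from memory for each alternative, and the alternative with the shorter sample is chosen.

```python
import random


def remembered_sample(delays, gamma, rng):
    """Draw one remembered delay: pick an experienced delay at random, then
    add Gaussian noise whose SD is gamma times that delay (scalar property)."""
    d = rng.choice(delays)
    return rng.gauss(d, gamma * d)


def choice_probability(delays_a, delays_b, gamma=0.3, trials=20000, seed=1):
    """Estimate how often alternative A is chosen when the animal takes one
    sample from each memory distribution and picks the shorter one."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        sample_a = remembered_sample(delays_a, gamma, rng)
        sample_b = remembered_sample(delays_b, gamma, rng)
        if sample_a < sample_b:
            wins += 1
    return wins / trials


# Hypothetical schedules, delays in seconds.
p_variable = choice_probability([2.0, 18.0], [10.0])  # variable vs. fixed, same mean
p_short = choice_probability([5.0], [10.0])           # fixed 5 s vs. fixed 10 s
print(p_variable, p_short)
```

Under these assumptions the variable alternative is chosen somewhat more than half the time even though the means are equal, and the shorter of two fixed delays is preferred strongly but not exclusively, which is the kind of partial preference discussed above.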
F. HOW WELL DO SCHEDULES SIMULATE REAL PREY SELECTION?
The original purpose of operant simulations of foraging (e.g., Lea, 1979) was to discover whether phenomena predicted by optimal foraging models and observed in more naturalistic studies could be duplicated on appropriately designed reinforcement schedules and, if so, whether they could be accounted for by general models of operant performance. The studies reviewed in this section have been very successful in achieving these aims. Perhaps it is not surprising that a principle like delay reduction originally proposed for one kind of reinforcement schedule should be able to account for behavior on a second kind of schedule. However, attempts to refine this approach by applying the model of Gibbon et al. (1988) to operant
prey selection could still be illuminating. They might also lead to incorporating a specific constraint (namely, information about how animals average times) into optimal prey selection models. If the delay-reduction hypothesis, refinements of it, or other models of operant behavior (e.g., Killeen, 1985) can account for behavior on schedules simulating prey selection, does this mean that the same model accounts for prey selection in nature? Some authors have acknowledged the necessity of eventually increasing the realism of their simulations to check on this. Such tests of external validity (Fantino & Abarca, 1985) have not been done as yet. However, there are some indications that selection among real prey items can differ in important ways from choice between signals on pecking keys. One major difference is that operant simulations, particularly those using interval schedules, model handling time as a delay to reinforcement. However, although handling is merely a time that precludes eating in optimality and delay reduction models, animals do not necessarily treat all delays equivalently. For example, rats prefer husking a sunflower seed to waiting through the handling time and then getting an already husked seed (Shettleworth & Jordan, 1986). This finding is from a simultaneous choice test, not a test of prey selection, but it does suggest that results with real handling could differ from those found when handling is pecking a key through a delay. In particular, the bias against accepting long handling times found by Lea (1979) might not be observed. Operant simulations neatly partition handling time into a wait, parallel to husking a seed or shelling a nut, followed by eating. Natural handling times need not include a time when eating is precluded. Consumption can occur throughout handling, as when prey must be slowly gnawed or dismembered bit by bit.
As another example, I presented pigeons with either 15 small or 1 big, pea-sized pellet (Shettleworth, 1985a, 1987a). The collection of small pellets required longer to eat than the single one but had the same total weight. If pigeons are sensitive to handling time under these conditions, they should choose to feed in a patch with large pellets. However, in an operant situation, they chose the "patch" with food in 15-pellet packets at the expense of a reduction in rate of energy intake. Again, the alternatives were not offered in a standard prey-selection paradigm, but the results show that members of the same species that behaves so nearly optimally when handling times are delays do not invariably respond optimally to variations in e/h. Snyderman (1983a) makes the same point with respect to the differing effects of pre- and postfood delays. Operant schedules model accepting or rejecting an item as performing a learned response to a signal associated with a given e/h. But signals for profitability are not always learned. They can be unlearned or relatively fixed responses to reliable signals for profitability in the species' natural
environment. For example, some animals can be described as using a rule of thumb, “take the larger,” and they can be fooled in artificial conditions (Barnard & Brown, 1981). Similarly, pigeons seem to have a rule of thumb, “more items are more food.” In contrast, operant studies of prey selection and the delay reduction hypothesis apply most obviously to cases where animals learn about profitability, something that has not been much studied with natural prey items. It is easy to think of examples, however. Whenever an animal learns a skill necessary to prepare a food item for consumption (as a squirrel learns to crack nuts or a bee learns to exploit certain kinds of flowers), it may continuously update its estimate of each item’s profitability. This kind of learning has also been involved in at least one artificial prey selection experiment. Houston et al. (1980; see also Erichsen, Krebs, & Houston, 1980) studied great tits preying on pieces of mealworm inside lengths of plastic drinking straws. In order to conform to the predictions of optimality as closely as they did, each bird must have tracked its own efficiency at handling the straws. In fact, some individual differences in prey selection could be accounted for by considering differences in times taken by different birds to manipulate the straws. Given the success of operant simulations, future research should consider which prey selection problems are actually being modeled and what questions about learning these raise. Both optimal prey selection models and the delay-reduction hypothesis deal with long-term means: mean energy values, mean handling times, and mean encounter rates. We have already seen reason to believe that variance in these quantities may be important as well. In addition, some animals respond disproportionately to recent patterns of prey availability.
For example, shore crabs are more likely to accept a relatively unprofitable mussel the longer it has been since the last encounter with a relatively profitable mussel (Elner & Hughes, 1978; see Shettleworth, 1984, Fig. 7.3). Great tits are similarly affected by time since the last encounter (Lucas, 1987). Nor are such effects confined to “real” items. Hanson (1987) mentions informal observations that pigeons in an operant simulation of prey selection are more likely to take relatively unprofitable “items” after a run of such items. This observation contrasts with Snyderman’s finding that pigeons require many sessions to settle down when parameters of a prey selection schedule are changed. If only the most recent delays mattered, behavior would change very rapidly with schedule changes. Response to short-term changes can be modeled as a threshold of acceptability which falls continuously after an item is eaten and resets with each item accepted. It represents a rule of thumb similar to that in the motivational model of patch leaving proposed by Waage (1979) for parasitoid wasps (see Stephens & Krebs, 1986, Chapter 8). While such short-term fluctuation in responsiveness might generate close to optimal behavior
under some conditions, it is not what is envisioned in more cognitive models of schedule behavior like that of Gibbon et al. (1988). One way in which short-term fluctuations can be reconciled with the notion of long-term averaging is by varying the memory window over which averaging takes place. It must be long for pigeons on the schedules analyzed by Gibbon et al., short for shore crabs preying on mussels. In conclusion, operant simulations of prey selection have been successful in many ways, but this does not mean that behavior under more natural conditions is always produced in the same way as on schedules in the laboratory. Optimal foraging models are very general and do not recognize the different mechanisms animals might use to meet their predictions. It would be unreasonable to expect all real prey selection to involve the same mechanisms of learning and choice that pigeons use on schedules. However, it might now be time to begin analyzing some more naturalistic examples of prey selection to see how, if at all, they involve mechanisms like those implicated in laboratory simulations.
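The short-term rule mentioned earlier in this section, a threshold of acceptability that falls after an item is eaten and resets with each item accepted, can be sketched in a few lines. The exponential form of the decline, the parameter names `reset_value` and `decay_rate`, and the profitability values are assumptions chosen for illustration; the text specifies only that the threshold falls continuously and then resets.

```python
import math


def acceptance_threshold(time_since_last_meal, reset_value=1.0, decay_rate=0.05):
    """Threshold of acceptability: reset_value just after an item is accepted,
    then decaying exponentially with time since that item (assumed form)."""
    return reset_value * math.exp(-decay_rate * time_since_last_meal)


def forage(encounters, reset_value=1.0, decay_rate=0.05):
    """encounters is a list of (time, profitability) pairs in time order.
    Returns one accept/reject decision per encounter."""
    last_meal = 0.0
    decisions = []
    for t, profitability in encounters:
        threshold = acceptance_threshold(t - last_meal, reset_value, decay_rate)
        accept = profitability >= threshold
        decisions.append(accept)
        if accept:
            last_meal = t  # eating resets the threshold
    return decisions


# Hypothetical sequence: good items have profitability 1.0, poor items 0.3.
encounters = [(2.0, 0.3), (4.0, 1.0), (8.0, 0.3), (40.0, 0.3), (42.0, 0.3)]
print(forage(encounters))
```

In this hypothetical sequence a poor item is refused shortly after a meal but taken after a long gap, and the very next poor item is refused again. A rule of this kind responds only to the time since the last accepted item, so it adjusts at once to changes in availability, unlike a mechanism that averages over a long memory window.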
V. Patch Departure and the Marginal-Value Theorem
A. HOW LONG SHOULD A FORAGER STAY IN A PATCH?
The prey selection problem discussed in the last section confronts a forager that has already chosen a patch to feed in. A second major foraging problem is how patches should be selected in the first place and how long a forager should stay in each one before traveling in search of another. Patches are discrete collections of prey items, separated by areas with no prey. Examples might be freshly watered lawns for robins, oak trees for squirrels, or swarms of mosquitoes for bats. If patches never change in quality, the forager should find the patch offering the highest E/T and stay there forever. But in the real world and in many interesting operant simulations of it, patches do change in quality, and this creates a number of problems for the forager. Some of them are discussed in Section VI. This section, however, deals mainly with how animals should exploit patches that deplete as they forage. Depletion can be sudden, in which case the forager should start in the best patch, move on to the next best when it depletes, and so on. For example, a pigeon feeding alone in a patch of scattered grain removes items systematically. As a result, it experiences a sudden drop in E/T when it finishes its sweep of the patch (Baum, 1987). A theoretically more challenging problem is what the forager should do when it experiences resource depression, a gradual decline in rate of intake with time in a patch. This might be the case for the pigeon in our example if it was feeding in a flock.
The well-known marginal-value theorem (Charnov, 1976) shows how long to stay in patches with resource depression: to maximize long-term E/T, the forager should stay until its rate of intake in the patch falls to the average intake rate for the environment. This means the forager should spend more time in a patch of a given quality the lower the quality of, and the greater the travel time between, other patches. Patch quality and travel time have nearly always had the predicted qualitative effects on patch residence times in laboratory and field tests (Stephens & Krebs, 1986). The confusion between the rules used by theorists and the rules used by animals has been particularly marked in discussions of the marginal-value theorem. It has often been assumed that the marginal-value theorem means animals must assess instantaneous rate of gain in a patch and leave when this rate reaches a threshold determined by environmental quality. For this to make sense, prey items should not be discrete. Moreover, the average rate of gain in the environment is itself determined by the forager's choice of residence times. In addition, the patch model assumes that the forager recognizes patch types immediately upon encountering them and chooses a residence time for each patch type. If an animal assesses the quality of a patch as it forages, the marginal value theorem does not apply. Thus it is important to distinguish the marginal value theorem, which the theorist can use to calculate optimal behavior with given gain functions and travel times (as in Cowie, 1977), from the marginal value rule, which may or may not govern an animal's patch-leaving decisions and which may or may not be the optimal rule in a given environment (Stephens & Krebs, 1986).
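The theorist's use of the marginal-value theorem can be illustrated with a minimal numerical sketch for an environment with a single patch type. The negative-exponential gain function, its parameters `g_max` and `c`, the function names, and the travel times are assumptions chosen only for illustration; they are not taken from Charnov (1976) or Cowie (1977). The code finds the residence time at which the instantaneous rate of gain in the patch equals the overall rate, counting travel, and requires a travel time greater than zero.

```python
import math


def gain(t, g_max=100.0, c=0.1):
    """Cumulative energy gained after t time units in a patch (assumed form)."""
    return g_max * (1.0 - math.exp(-c * t))


def gain_rate(t, g_max=100.0, c=0.1):
    """Instantaneous intake rate in the patch, the derivative of gain()."""
    return g_max * c * math.exp(-c * t)


def optimal_residence_time(travel_time, g_max=100.0, c=0.1):
    """Solve gain_rate(t) = gain(t) / (travel_time + t) by bisection.
    The left side times (travel_time + t), minus gain(t), starts positive
    and decreases steadily, so it crosses zero exactly once."""
    def excess(t):
        return gain_rate(t, g_max, c) * (travel_time + t) - gain(t, g_max, c)

    lo, hi = 0.0, 1000.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for travel_time in (5.0, 20.0, 80.0):
    t_star = optimal_residence_time(travel_time)
    overall = gain(t_star) / (travel_time + t_star)
    print(travel_time, round(t_star, 2), round(overall, 3), round(gain_rate(t_star), 3))
```

In this sketch the calculated residence time lengthens as travel time increases, and at the optimum the last two printed columns agree. The calculation needs the whole gain function and the travel time in advance, which is exactly the information a foraging animal may not have.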
Perhaps in part because the marginal-value theorem makes the forager's task seem so difficult, there has been considerable theoretical interest in the performance of simple mechanisms (often referred to in this context as rules of thumb) that animals might plausibly use (review in Stephens & Krebs, 1986, Section 8.3). There are four sorts of candidates for leaving rules: (1) leave after a fixed number of prey have been found; (2) leave after a fixed time in the patch; (3) leave a fixed time after the most recent capture (i.e., after a run of bad luck, a giving-up time rule); and (4) leave when a certain rate of prey capture is reached (the marginal-value rule). In assuming a fixed threshold of time or number, the first three of these clearly involve unrealistic assumptions about animals' abilities to time or count. Moreover, optimal behavior is different when known features of animals' timing abilities are incorporated into a residence time model (Houston & McNamara, 1985). The best E/T that could be obtained by a forager using a given patch-leaving rule can be compared to the optimal E/T calculated from the marginal-value theorem and/or other rules under consideration. Which rule does best depends on how prey are distributed (Iwasa, Higashi, & Yamamura, 1981).
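The comparison of leaving rules described above can be sketched as a small simulation. This is a hedged illustration rather than the analysis of Iwasa et al. (1981): the discrete time steps, the random-search capture model, the two patch distributions, the thresholds, and the safety cap `max_stay` (which stops a forager from waiting forever in an empty patch under the fixed-number rule) are all assumptions. Only the first three rules are included; the rate-based rule would require an additional assumption about how capture rate is estimated.

```python
import random


def forage_rate(rule, param, prey_per_patch, travel=20, capture=0.05,
                n_patches=2000, max_stay=500, seed=0):
    """Captures per time step (travel included) for one leaving rule.
    rule is 'number', 'time', or 'gut'; param is that rule's threshold.
    prey_per_patch(rng) returns the initial prey count of a new patch."""
    rng = random.Random(seed)
    total_prey, total_time = 0, 0
    for _ in range(n_patches):
        remaining = prey_per_patch(rng)
        caught, stay, since_last = 0, 0, 0
        while stay < max_stay:
            if rule == "number" and caught >= param:
                break
            if rule == "time" and stay >= param:
                break
            if rule == "gut" and since_last >= param:
                break
            stay += 1
            # Random search: capture chance is proportional to prey remaining.
            if rng.random() < min(1.0, capture * remaining):
                remaining -= 1
                caught += 1
                since_last = 0
            else:
                since_last += 1
        total_prey += caught
        total_time += stay + travel
    return total_prey / total_time


def constant_patches(rng):
    return 10  # every patch holds the same number of prey


def variable_patches(rng):
    return rng.choice([0, 0, 0, 40])  # most patches empty, a few rich


for name, patches in (("constant", constant_patches), ("variable", variable_patches)):
    for rule, param in (("number", 6), ("time", 20), ("gut", 8)):
        print(name, rule, param, round(forage_rate(rule, param, patches), 4))
```

Changing the thresholds or the patch distributions changes the ranking of the rules, so the printed rates should be read as properties of these particular assumptions. The sketch is meant only to show how the E/T achieved by each rule can be obtained and compared for a given prey distribution.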
B. OPERANT SIMULATIONS OF PATCH PROBLEMS: WHAT DO ANIMALS DO?
Patch departure problems have been simulated in the laboratory in a wide range of ways with a number of different species. Table 1 lists published examples with vertebrates. These include laboratory studies with no operant schedule components (Krebs, Ryan, & Charnov, 1974; Mellgren, Misasi, & Brown, 1984), a field study with progressive interval schedules (Kacelnik, 1984), and several studies in which all components of the patch problem were operant schedules (e.g., Cuthill, Kacelnik, & Krebs, 1987; Hanson, 1987; Kamil, Yoerg, & Clements, 1988).
TABLE 1
LABORATORY SIMULATIONS OF PATCH-LEAVING (MARGINAL-VALUE) PROBLEMS (a)

Reference                      Species    PT   Remarks
All patches depleting
  Cowie (1977)                 Great tit  1    Suggests memory window for capture rate
  Cowie and Krebs (1979)       Great tit  2    Results after change fit time or number expectation
  Cuthill et al. (1987)        Starling   1    Mixed travel times; stay longer after long travel times
  Hanson (1987)                Pigeon     2    Leave after a prey item; deplete patches to same level
  Kacelnik et al. (1981)       Great tit  1    Deterministic depleting patch
  Kamil et al. (1982)          Blue jay   2    Patches have 0 or 1 item: GUT used when items cryptic
  Krebs et al. (1974)          Great tit  3    GUT same for all patches in environment
  Mellgren et al. (1984)       Rat        8    Gain functions, etc., not directly measured
  Ydenberg (1984)              Great tit  1    Varies pattern of schedule to test for use of GUT
Central place foraging
  Kacelnik (1984)              Starling   1    Progressive intervals: leave after capture is optimal
  Kacelnik and Cuthill (1987)  Starling   2    Load size increases with TT even without depletion
  Killeen et al. (1981)        Rat        1    Travel to central place equals work for accumulated food
One depleting, one stable patch
  Bhatt and Wasserman (1987)   Pigeon          Depleting vs. multiple VR: birds nearly optimal
  Kacelnik et al. (1987)       Starling        Abandon depleted patch sooner when it was richer
  Kamil and Yoerg (1985)       Blue jay        No effect of prey distribution before depletion
  Kamil et al. (1988)          Blue jay        Birds use both ROBL and number of prey captured

(a) PT, number of patch types; TT, travel time varied; PQ, patch quality varied; GUT, evidence obtained for use of giving-up time; no entry, not applicable or relevant data not collected.
[The entries of the TT, PQ, and GUT columns (X, Y, Y?, N) cannot be assigned to rows in this copy and are not reproduced.]
Some authors (e.g., Hanson, 1987; Mellgren et al., 1984) have elected to simulate the full richness of a world in which the animal encounters patches of different qualities. The problem can be simplified by reducing the number of patch types to one. Here the patch is simulated by a progressive ratio schedule (i.e., the number of responses required for a reinforcer increases with each food item earned). At any time the animal can leave the patch and complete a “travel” requirement which resets the progressive ratio to its minimum value. In central-place foraging simulations the animal accumulates food at a decreasing rate while in a patch and then must travel to a central place to consume it or deliver it to the young. A simplification similar to offering just one patch type is to offer one depleting and one stable patch. At the beginning of a trial or session the depleting patch is better than the stable alternative, but at some point it becomes worse. The resource depression can be gradual (Bhatt & Wasserman, 1987) or sudden (e.g., Kacelnik, Krebs, & Ens, 1987). This is usually referred to as a problem of patch assessment (Stephens & Krebs, 1986) because, in theory, rather than selecting a residence time on encounter the forager bases its decision to leave a patch on continuous assessment of its quality. The issue of experimental control vs. realism arises here just as with simulations of prey choice. For example, travel figures as a time in graphical solutions of the marginal-value theorem, and this would appear to justify operant simulations (e.g., Hanson, 1987; Kamil et al., 1988) in which the animal “travels” by staying in one place and working on a schedule for a given time. At least on concurrent VI-VI schedules, moving from place to place has qualitatively similar results to pecking a key (Baum, 1982a). However, this may not always be the case. The energetics of locomotion may influence the fit between predictions of the marginal-value theorem and data (Cowie, 1977; Kacelnik & Cuthill, 1987). Moreover, travel may have other costs when the animal has to move across an open area where predators may be lurking (Mellgren, 1982). A second important artificiality of some simulations of patch choice is that a single key or perch may stand for many patches in different locations. Some authors (e.g., Olson & Maki, 1983) have suggested that spatial problems bring into play different learning and memory mechanisms than do nonspatial problems. Plowright (1988) has shown that in one operant foraging simulation, the spatial arrangement of the patches can influence how closely behavior approaches the optimum. The possibility that “patches” on keys are fundamentally different in some way from spatially defined patches will have to be addressed if the mechanisms shown to work on schedules are to be invoked to explain behavior in the field. Nevertheless, the fact remains that, in all the studies listed in Table 1, travel and/or patch quality had the predicted effects on patch residence times, just as they do in more naturalistic situations.
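The single-patch-type simplification described in this section, a progressive ratio that is reset by completing a travel requirement, has an optimum that is easy to compute. The sketch below is only an illustration: the linear progression, the parameter values, and the function names are assumptions, and cost is counted in responses on the further assumption that the animal responds at a constant rate, so that responses stand in for time.

```python
def responses_per_item(n_items, travel_cost, first_ratio=4, step=4):
    """Average responses paid per food item when the animal earns n_items on
    the progressive ratio and then completes the travel requirement."""
    in_patch = sum(first_ratio + step * i for i in range(n_items))
    return (travel_cost + in_patch) / n_items


def best_items_per_visit(travel_cost, first_ratio=4, step=4, max_items=200):
    """Number of items per patch visit that minimizes responses per item."""
    return min(range(1, max_items + 1),
               key=lambda n: responses_per_item(n, travel_cost, first_ratio, step))


for travel_cost in (50, 200):
    n = best_items_per_visit(travel_cost)
    print(travel_cost, n, round(responses_per_item(n, travel_cost), 2))
```

With these hypothetical values the best policy is 5 items per visit when travel costs 50 responses and 10 items when it costs 200, so a larger travel requirement favors staying longer in the patch, the same qualitative effect of travel that the studies in Table 1 report.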
C. THE PATCH-LEAVING DECISION: HOW DO ANIMALS DO IT?
In order to leave a resource-depressed patch at the optimal time, a forager must in some way keep track of its intake rate in the current patch and compare this to the state of the rest of the environment. Necessary information about the rest of the environment includes remembered travel times, rates of gain in other patches, and probabilities of encountering different patch types. It is usually said that a major difference between this problem and prey selection is that successive choice is involved in prey selection whereas leaving a patch is a simultaneous choice problem. This can be seen by comparing Fig. 2, which diagrams Hanson’s (1987) operant simulation of
Fig. 2. Flow diagram of a schedule simulating patch choice (Hanson, 1987) showing the states of the two pecking keys during the phases simulating travel between patches and search within patches. Key legend: active key, A = amber, R = red, G = green; dark key. P, probability value; FR, fixed ratio schedule; VR, variable ratio. The values of the schedule parameters, a, r, and g, can be changed to vary travel time and density of food within patches, respectively. In schedules simulating depleting patches, r and g increase with each successive item earned and reset to their initial values when the animal returns to the “travel” state. (After Hanson, 1987.)
patch choice, to Fig. 1, which shows his operant simulation of prey choice. The main difference between the two is that having chosen a patch (itself a “prey” problem) the animal can leave at any time. However, at the moment of choice, the situation facing the animal in both cases is whether to stick with the schedule at hand or search for another. This analysis shows that both problems can be seen as simultaneous choice and, as with prey selection, behavior ought to be predicted by the delay-reduction hypothesis (Cuthill et al., 1987; Hanson, 1987). The studies of patch choice in Table 1 are distinguished by several clever analyses of the information predators actually use in leaving patches. Taken together, these studies indicate that animals can use a number of patch-leaving rules. Exactly how these are related to delay reduction has not always been worked out. A number of studies have provided evidence that animals tend to leave a patch after a run of bad luck, i.e., they use a giving-up-time (GUT) rule: leave after so long with no prey. In one of the clearest examples (Ydenberg,
1984), great tits worked on a progressive random ratio schedule which they could reset by traveling to a nearby perch. The distribution of leaving times with respect to the last reward fit a GUT model. In probe trials, the schedule was modified after the first reward by introducing a long run of bad luck (ROBL) or by adding extra rewards at short intervals. Residence times were significantly shorter in the former case than in the latter, consistent with the use of a giving-up-time rule. In fact, however, the best rule for a stochastic schedule is a residence-time rule, because the expected reward probability declines with time or responses in the patch regardless of recent reward history (Iwasa et al., 1981). A GUT rule is optimal when prey are clumped, and it may be the rule great tits use because at least some of their natural prey are clumped (Ydenberg, 1984). However, this is not the only patch-leaving rule great tits can use. On a deterministic schedule (progressive fixed ratio) great tits learn to leave immediately after finding a prey item (ROBL of 0), which is optimal for this type of prey distribution (Ydenberg, 1984; but see Kacelnik, Houston, & Krebs, 1981). Thus it appears that animals can use different rules depending on the circumstances. Kamil et al. (1988) showed that, in addition, they may use a combination of cues. Blue jays were given a choice between a patch that offered reward 50% of the times it was chosen, but which depleted to zero after a fixed number of rewards, and a nondepleting patch with reward 25% of the times it was chosen. The birds learned to choose the depleting patch at the beginning of the session and switch to the nondepleting patch later, using both the number of items already found in the patch and the ROBL since the last prey as cues to switch (Fig. 3).
When the number of prey in the depleting patch was increased from three to nine, the birds first stayed for too short a time but gradually increased their residence times as they gained experience with the nine-prey patch. Had they used a GUT rule alone, they would have shifted their residence time immediately to that appropriate to the new number; Cowie and Krebs (1979) report similar data. Kacelnik et al. (1987) trained starlings in a patch-assessment task similar to that used by Kamil et al. (1988) except that the time at which the unstable patch depleted to zero could not be predicted by the birds. Rather than looking at molecular aspects of behavior like GUTs, Kacelnik et al. asked whether the birds' behavior could be predicted by any of several simple linear operator learning models (see Dow & Lea, 1987a, for another application of these models to patch problems). None of the models could deal with the fact that when the unstable patch initially offered 75% reward the birds came more quickly to choose it a higher proportion of the time than when it offered 25% reward, while also abandoning it sooner when it suddenly depleted. This finding parallels the partial reinforcement extinction effect. To account for it with a linear operator learning model,
Foraging and Operant Behavior
[Figure 3 graphic not reproduced: three panels (3, 6, and 9 prey/patch); x-axis, prey found; legend, run of bad luck.]
Fig. 3. Data from Kamil et al. (1988), showing how blue jays' probability of leaving a depleting patch depends on both the number of prey already found in the patch and the number of prey the birds have learned to expect in the patch as well as on the length of the run of bad luck (ROBL), i.e., number of successive responses without a reward. Data are shown for two ROBL lengths; those for ROBLs of 1 or 2 were intermediate. The birds were trained successively with operant "patches" that provided 9, 6, or 3 prey items before depleting suddenly to zero. In each case, items occurred with a probability of .50 per response until the patch depleted; the alternative patch had a constant probability of .25 of yielding an item.
Kacelnik et al. added a process representing the animal's "confidence" that a patch has not changed for the worse. In this way they were able to deal with the fact that the effect of a given number of unrewarded trials depends on past reward history (see also McNamara & Houston, 1980). One marked difference between the behavior of Kamil et al.'s blue jays and Kacelnik et al.'s starlings is that the starlings continued to choose the depleted patch for much longer than they should have while the blue jays became very efficient at switching. This may be largely a matter of the different amounts of experience given the animals in the two experiments. Together, the results of these two experiments could be taken to support suggestions that experience can modify the memory window over which animals average experience (Cowie, 1977; Killeen, 1984). A run of bad luck or some other change in the schedule could, in principle, be either a cue to future reward contingencies or an element in the animal's average of experience in the patch. Gibbon et al.'s (1988) model (Section IV) depicts animals as basing their decisions on memories of a large segment of the recent past. Such a large memory window is inappropriate for a rapidly changing environment. There is also evidence that it is not always used even when it would be optimal. Cuthill et al. (1987) tested starlings in a simulation with varying travel times to a single stochastically depleting patch type. The optimal behavior with varying travel times is to stay in the patch a constant time appropriate to the mean of the travel times. However, the starlings stayed longer after a long travel than they did after a short travel, suggesting that the most recent travel time was overrepresented in their average of experienced travels. When they were exposed for many sessions to alternating long and short travels, some of the birds learned to use travel time as a cue.
They stayed a shorter time after long travels, suggesting they had learned to expect a short travel (in terms of delay reduction, a short delay to reward after leaving the patch) after a long travel. The delay-reduction hypothesis has been applied in most detail to patch leaving by Hanson (1987; Hanson & Green, 1987). For some schedule values in his patch simulation, a straightforward application of the delay-reduction hypothesis predicts different results than does the marginal-value theorem. However, the expected delay to reward on a depleting schedule depends on how many rewards are considered. Data that do not fit delay reduction with respect to the very next reward may fit when mean delay to the next several rewards is used. Taking the expected delay to the next two or three rewards into account gives approximately the right fit to Hanson's data. The data of Bhatt and Wasserman (1987) also support the suggestion that animals take into account more than the next reward on a progressive
ratio. In a study that was not strictly a simulation of patch choice, Bhatt and Wasserman offered pigeons a choice between a progressive fixed ratio and each of several constant fixed ratios. The question was whether the birds would optimize by switching from the progressive ratio just before the response requirement exceeded that on the fixed alternative. The birds did well on the whole, but at some schedule values they tended to switch away from the progressive schedule too soon. Bhatt and Wasserman offer no very convincing explanation of this tendency other than saying it represents a win-shift strategy, but it is consistent with the birds taking into account more than the very next ratio value in assessing the expected delay to reward on the progressive ratio key. Mazur and Vaughan (1987) have developed a model of how animals integrate a string of expected future reinforcers which could be applied to foraging simulations using progressive ratios.
D. PATCH ASSESSMENT
Lima (1984, 1985) studied a patch problem where the environment consisted of visually indistinguishable patches that have either no prey or some number of prey distributed at random. For example, patches might consist of 24 holes and, in different phases of the experiment, 6, 12, or 24 would have prey. Optimal behavior is to "sample" a fixed number of holes in each patch and leave if no prey have been found at that point. This is really a problem of incomplete information (Stephens & Krebs, 1986) related to those discussed in the next section, but it seems to bring into play decision processes similar to those used in other patch-leaving problems. Both woodpeckers (Lima, 1984) and starlings (Lima, 1985) behave close to optimally in this situation. Not surprisingly, the number of holes sampled before leaving an empty patch varies rather than being fixed, but the mode is generally quite close to the predicted value. Probe trials (Lima, 1984) show that, as in Kamil et al.'s (1988) experiments, the birds are using runs of bad luck in a conditional way. If there were two prey items in the first few holes and the rest of the holes were empty, woodpeckers would open nearly all the holes in the patch, but they would leave after a much shorter run of bad luck if it was encountered before any prey were found in the patch. Several authors have noted the similarity of these results to studies of the partial reinforcement effect. In both cases, the lower the prevailing reward rate, the longer animals persist in the face of nonreward. The partial reinforcement effect can be analyzed functionally in much the same way as Lima's situation (McNamara & Houston, 1980). However, in Lima's
studies the birds were repeatedly extinguished, whereas in most partial reinforcement experiments, as in the study of Kacelnik et al. (1987), animals experience fewer runs of unrewarded trials.
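The logic of Lima's patch-assessment problem can be made concrete with a short calculation. The sketch below is illustrative only. It assumes a forager that knows the number of holes and the number of prey in a stocked patch, and it assumes a prior probability of .5 that a patch is empty, a value chosen for illustration and not taken from Lima (1984, 1985). The function name posterior_empty is likewise hypothetical. It returns the probability that a patch is empty after a run of empty holes opened without replacement.

```python
from math import comb

def posterior_empty(holes, prey_if_stocked, empty_opened, prior_empty=0.5):
    """Probability the patch is empty after `empty_opened` holes, all empty.

    Assumes prey occupy `prey_if_stocked` of `holes` holes at random and
    that holes are opened without replacement. `prior_empty` is an
    illustrative assumption, not a value from the experiments.
    """
    # Chance of seeing only empty holes so far if the patch is stocked.
    like_stocked = (comb(holes - prey_if_stocked, empty_opened)
                    / comb(holes, empty_opened))
    # An empty patch always yields empty holes (likelihood 1).
    return prior_empty / (prior_empty + (1 - prior_empty) * like_stocked)

if __name__ == "__main__":
    # 24-hole patches with 6, 12, or 24 prey when stocked.
    for prey in (6, 12, 24):
        row = [round(posterior_empty(24, prey, n), 3) for n in range(7)]
        print(prey, row)
```

In this sketch the posterior rises faster when stocked patches hold more prey, so a shorter run of empty holes is enough to justify leaving. How closely any such calculation matches what the birds do is a separate, empirical question.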
E. QUESTIONS FOR THE FUTURE
Unlike prey selection, patch choice has been simulated in the laboratory with a variety of species and operant methods (Table I). Taken together, these studies raise a number of questions about how animals achieve close-to-optimal behavior in patches with resource depression. Some of the same questions also apply to prey choice. Most important is the question of how animals average. What determines the memory window over which they accumulate experience and the time horizon over which they take future rewards into account? Although discussions of optimal foraging have sometimes included qualitative statements about what the memory window should be (e.g., Cowie, 1977), standard optimality models have not usually dealt with the times over which averages are taken. In terms of studies of mechanisms, there are a number of possibilities, however. For example, patch leaving can be modeled by simple increment decay mechanisms in which the tendency to stay in a patch increases with each prey capture and decays during runs of bad luck (Stephens & Krebs, 1986; Waage, 1979). In this kind of model, a run of bad luck in effect decreases the animal's assessment of patch quality, just as a recent long travel changed the starlings' estimate of travel in the experiment of Cuthill et al. (1987). But sometimes a run of bad luck or a specific travel time may be used as a cue, perhaps in conjunction with other informative cues (e.g., Kamil et al., 1988). Moreover, the same species that leaves after a ROBL can leave immediately after a prey capture under other circumstances (Ydenberg, 1984). Apparently experience in a given situation can influence how animals use the information in it. Rather than more demonstrations that behavior approaches the optimum in laboratory simulations of patch problems, what is needed now is more attempts to formulate and answer questions about how animals solve such problems.
VI. Sampling and Information
A. SAMPLING vs. MOMENTARY MAXIMIZING
In a changing environment, the patch or prey item that was best yesterday may not be best today. Nuts may have ripened, flowers bloomed, or insects hatched in a part of the environment the forager previously rejected as unprofitable. The foraging literature is replete with suggestions
that to keep track of these kinds of changes, animals should constantly sample alternatives other than the one that is currently the best. Krebs, Kacelnik, and Taylor (1978, p. 27) state the issue raised by this suggestion:
We suggest that there are two simple types of maximising rules. On the one hand the predator might attempt to maximise its rate of food intake at every instant in time by always foraging in the patch with the highest expected reward rate. We refer to this strategy as “immediate maximising.” Alternatively, the predator might attempt to maximise its intake over the total foraging time and sacrifice short-term gain in order to acquire more information.
In contrast to foraging theorists’ suggestions that animals should sometimes sacrifice short-term for long-term gain, studies of operant behavior suggest that animals nearly always choose the alternative that maximizes reinforcement from moment to moment (Staddon, 1983). On this view, choices of other than the currently best alternative are errors of some kind, perhaps due to inherent variability (cf. McNamara & Houston, 1987b). That is to say, animals might sample in the functional sense, gaining information by occasionally choosing alternatives other than that with the best payoff at the moment, but whether or not a special mechanism underlies this behavior is a separate question. This distinction is illustrated by the examples in the next section.
B. TWO EXAMPLES
Like many subsequent researchers, Krebs et al. (1977, see Section IV) found that prey selection was not all or none: when great tits should have taken only large (profitable) items they took a few small (relatively unprofitable) ones as well. Krebs et al. suggested that one reason for such partial preferences was that animals “follow a deliberate sampling strategy” (p. 36) to keep track of the quality of small items. They recognized, however, that “it will be impossible to distinguish between ‘mistakes’ and ‘deliberate sampling’ until we have devised a specific predictive sampling model.” Rechten, Avery, and Stephens (1983) took the essential next step. They assumed that if Krebs et al.’s great tits were “deliberately sampling” they would sometimes reject a large item while waiting for a small one to sample, i.e., “reject-large” errors should be positively correlated with “take-small” errors. If, however, the birds were confusing large and small items, the two types of errors should be inversely related in a way predicted by signal detection theory. The latter turned out to be the case. This example makes two important points.
(1) It is not enough to attribute deviations from short-term optimality to “sampling” for long-term gain. Behavior must be compared to an explicit model. (2) When this has
been done, it has turned out in all the cases examined so far that "sampling" (in the functional sense) is the outcome of a more general mechanism rather than a special kind of information-gathering behavior. The first of these points has not always been heeded. For example, Zeiler (1987) trained pigeons in a kind of probability-learning experiment in which reinforcement was assigned to one alternative on each trial and remained available there until it was collected. The optimal behavior here is always to select the alternative with the higher probability or shorter delay of reinforcement and choose the other one next if the first choice is not reinforced. The pigeons did not consistently follow this policy, even after extended training. Zeiler concludes that they were "sampling" the less profitable alternative and that the results are explained by "a combination of local optimality and sampling, supplemented by reinforcement . . ." (p. 31). This statement is vacuous because no attempt is made to distinguish failures of optimality or reinforcement from "sampling." In fact, it appears likely that a combination of delay reduction and scalar expectancy (see Section IV) could account for the data in a quantitative way.
C. TRACKING A FLUCTUATING ENVIRONMENT
One type of situation that potentially requires sampling is diagrammed in Fig. 4. For simplicity, the environment consists of two patches, a stable patch which pays off with a constant but mediocre probability and a fluctuating patch which is sometimes empty but is sometimes much better than the stable patch. The forager can only tell the state of the fluctuating patch by visiting it, but then it can tell immediately. Clearly, the value of sampling the fluctuating patch will depend on parameters of the situation like the value of the stable patch, the value of the fluctuating patch in its good state, and the frequency of changes in the fluctuating patch. For a specifiable region of parameter values, a rate-maximizing forager should periodically sample the fluctuating patch and switch to it when it finds it in the good state. Sampling should be at regular intervals, e.g., every tenth foraging trip (Stephens, 1987). Shettleworth et al. (1988) tested pigeons in an operant simulation of the situation modeled by Stephens (see Tamm, 1987, for another test with similar results). The probability of change in the unstable patch was constant throughout the experiment, and the predicted probability of sampling was varied by varying the random ratio in the stable patch (the higher the ratio, the less sampling predicted) and the random ratio in the good state (the higher the ratio, the more the birds should sample). The pigeons did sample, the frequency of sampling varied roughly as predicted by the model (Fig. 5), and the birds obtained nearly the maximum number of reinforcements possible in the situation.
[Figure 4 graphic not reproduced; x-axis, trials.]
Fig. 4. Environment simulated in the experiments of Shettleworth et al. (1988). A stable patch provides a constant probability of reward (Ps) while the fluctuating patch switches unpredictably between a bad state (probability of reward Pb) and a good state (probability of reward Pg). Optimal behavior is to spend most time in the stable patch while the fluctuating patch is bad, but to sample occasionally and switch to exclusive choice of the fluctuating patch when it is found in the good state.
However, sampling occurred at random, not regular, intervals. This could be dealt with by incorporating a constraint of random sampling into the model. Two problems remained, however: variations in the good state had little or no effect on the amount of sampling, and the birds continued to sample when the good state was no better than the stable patch. Both of these aspects of the data, as well as random sampling, are well accounted for by the general model of choice between reinforcer delays developed by Gibbon et al. (1988; Shettleworth et al., 1988, Appendix 2). Moreover, the scalar expectancy model accounts in quantitative detail for an aspect of the data not addressed by the optimality model: “reverse sampling,” or occasional choice of the stable side after the bird has discovered the good state. In this case, therefore, behavior which functions as sampling can be accounted for by the same constrained optimal choice mechanism which can account for behavior on conventional schedules.
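The trade-off in the fluctuating-patch problem can be explored with a small simulation. What follows is a minimal sketch, not Stephens's (1987) analytical model: it assumes a forager that visits the fluctuating patch on every k-th trial, stays there for as long as it finds the good state, and otherwise returns to the stable patch. The bad state is taken to be extinction and the per-trial probability of a change to be .002, as in the experiments described above; the function name, trial count, and seed are illustrative assumptions.

```python
import random

def mean_reward(p_stable, p_good, sample_every, p_change=0.002,
                n_trials=200_000, seed=0):
    """Mean reward per trial for a regular-interval sampling rule.

    sample_every=None means the forager never visits the fluctuating patch.
    The bad state pays nothing, and a visit reveals the state at once.
    """
    rng = random.Random(seed)
    good = False          # current state of the fluctuating patch
    exploiting = False    # True once the good state has been found
    since_sample = 0      # stable-patch trials since the last visit
    rewards = 0
    for _ in range(n_trials):
        if rng.random() < p_change:
            good = not good
        due = sample_every is not None and since_sample >= sample_every - 1
        if exploiting or due:
            # Visit the fluctuating patch; stay only while it is good.
            since_sample = 0
            exploiting = good
            if good and rng.random() < p_good:
                rewards += 1
        else:
            since_sample += 1
            if rng.random() < p_stable:
                rewards += 1
    return rewards / n_trials

if __name__ == "__main__":
    for k in (None, 2, 10, 50, 250):
        print(k, round(mean_reward(0.3, 1.0, k), 3))
```

Comparing the printed rates across sampling intervals shows the cost of sampling too often (trials wasted on an empty patch) against the cost of sampling too rarely (good states discovered late). The regular interval is a property of this sketch; as noted above, the pigeons sampled at random intervals.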
D. THE TWO-ARMED BANDIT
In the examples modeled by Lima (1984, 1985; see Section V) and Stephens (1987), sampling reveals which of a known set of densities prevails in a new or fluctuating patch. A more realistic and theoretically more
[Figure 5 graphic not reproduced: two panels; x-axis, experimental conditions (values of Ps and Pg).]
Fig. 5. Predictions of two models (histograms) and data from two groups of pigeons tested by Shettleworth et al. (1988). Top panel shows predictions of Stephens's optimal sampling model with the constraint of random rather than regular sampling. Lower panel shows the predictions of the causal model of constrained maximization developed by Gibbon et al. (1988). Data points represent each group's mean probability of choosing the fluctuating side when it was in the bad state (i.e., sampling) for the values of Ps and Pg that were tested. The bad state was always extinction and the probability of the fluctuating patch changing from bad to good or from good to bad was always .002 per trial.
difficult problem arises when the forager must learn patch densities in the first place, a problem that has become known as the two-armed bandit (Krebs et al., 1978). An animal confronted with unknown patches must spend enough time visiting each (i.e., sampling) to be reasonably certain of choosing the best, but it should not unnecessarily spend time in unprofitable patches. This is clearly the problem of the optimal rate of learning, and because learning is so close to the heart of psychology, the first study of optimal behavior on the two-armed bandit (Krebs et al., 1978) is an important point of interaction between behavioral ecology and operant psychology (cf., for example, Kamil & Yoerg, 1982; Plowright & Plowright, 1987).
Although the general nature of the solution to the two-armed bandit seems obvious intuitively, it cannot be arrived at analytically but must be sought with dynamic programming. To make this task easier, Krebs et al. calculated the optimum under the constraint that foraging was divided into “sampling,” during which the animal strictly alternated between the patches, and “exploitation,” during which it exclusively chose one. (It is sometimes concluded that they showed that the unconstrained optimum consists of sampling and then exploitation; on the contrary, they assumed it.) The major predictions from this approach, like those from the unconstrained optimum (Yakowitz, 1969), are that sampling should continue longer (or preference for the better patch should develop more slowly) the more similar the patch densities and the longer the time available for foraging (i.e., the time horizon). Krebs et al. (1978) tested the predictions of this constrained optimal model (CONOP; Houston, Kacelnik, & McNamara, 1982) using great tits working on two concurrent random ratio schedules. Although concurrent ratio schedules have been often used (e.g., Herrnstein & Loveland, 1975), steady-state behavior rather than acquisition has usually been analyzed, and the work of Krebs et al. helped to stimulate an interest in models of acquisition. Krebs et al. assessed the accuracy of CONOP’s predictions by dividing behavior into sampling and exploitation periods and showing that the more different the ratios on the two “arms” of the bandit, the earlier the point of transition. Their data only indirectly addressed the question of whether or not the amount of sampling was adjusted to the time horizon since session length was not varied, but Kacelnik (1979; see Houston et al., 1982; Dow & Lea, 1987b) later reported data consistent with the model’s predictions (but see Section VIII).
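What seeking the optimum with dynamic programming involves can be sketched for a short time horizon. The example below is an illustration under stated assumptions, not the calculation of Krebs et al. (1978): it assumes a forager that starts with a uniform prior on each patch's reward probability and updates it by counting rewarded and unrewarded visits. The function name and the state representation are hypothetical. The value of each state is found by working backwards from the last trial.

```python
from functools import lru_cache

def optimal_expected_rewards(horizon):
    """Expected rewards of the best policy on a two-armed bandit.

    Assumes uniform priors on both reward probabilities (an illustrative
    assumption), so the expected probability of reward on an arm with
    s successes and f failures is (s + 1) / (s + f + 2).
    """
    @lru_cache(maxsize=None)
    def value(s1, f1, s2, f2, remaining):
        if remaining == 0:
            return 0.0
        p1 = (s1 + 1) / (s1 + f1 + 2)
        p2 = (s2 + 1) / (s2 + f2 + 2)
        # Expected total reward if arm 1 is chosen now, then play optimally.
        v1 = (p1 * (1 + value(s1 + 1, f1, s2, f2, remaining - 1))
              + (1 - p1) * value(s1, f1 + 1, s2, f2, remaining - 1))
        # Same for arm 2.
        v2 = (p2 * (1 + value(s1, f1, s2 + 1, f2, remaining - 1))
              + (1 - p2) * value(s1, f1, s2, f2 + 1, remaining - 1))
        return max(v1, v2)

    return value(0, 0, 0, 0, horizon)

if __name__ == "__main__":
    for t in (1, 2, 5, 10, 20):
        print(t, round(optimal_expected_rewards(t), 4))
```

The number of states that must be evaluated grows rapidly with the horizon, which gives some feel for why simpler rules are of interest as accounts of what animals do.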
The two-armed bandit is a clear case where the optimal solution, obtained by working backwards, cannot possibly represent what animals actually do. Therefore, as in the case of patch departure, there has been considerable interest in the performance of plausible rules of thumb (Dow & Lea, 1987a; Houston et al., 1982). The most appealing type of rule is given by some kind of linear operator learning model (Bush & Mosteller, 1951) in which the animal’s estimate of food density in a patch is an exponentially weighted moving average of past and recent experience. The models developed with patch choice in mind (reviewed in Dow & Lea, 1987a; Kacelnik et al., 1987; Lea & Dow, 1984) differ in such factors as when experience is updated and how decisions are based on the estimates in the two patches. It is likely to be difficult, if not impossible, to decide which, if any, of these models best represents acquisition on a two-armed bandit because there are so many arbitrary aspects of fitting such a model to a particular experimental situation. An additional problem is that although successive choices are independent in a true two-armed bandit, they often have not been independent in experimental tests (e.g., Dow & Lea, 1987a,b; Krebs et al., 1978).
Dow and Lea (1987a) compared behavior of a group of pigeons on a set of bandit problems with various initial reward probabilities and rules for depletion with several possible rules of thumb. No one rule was consistently the best in terms of overall reinforcers earned, and none best represented the behavior of the pigeons across all of the problems. Houston et al. (1982) drew a similar conclusion when they compared rules of thumb to the behavior of Kacelnik’s (1979) great tits. Nevertheless, linear operator models have done reasonably well in accounting for how groups of animals distribute themselves between two patches (reviewed in Kacelnik & Krebs, 1985a) and how individual pigeons switch between two depleting patches (Shettleworth, 1987a). A deficiency of all such models is that they do not allow for the possibility of an animal learning the rules of a situation, for example, that a patch repletes when it has been left for a while (Dow & Lea, 1987a). Another deficiency of this kind was highlighted by Kacelnik et al.’s (1987) study of partial reinforcement effects in suddenly depleting patches (Section V). Staddon and Reid (1987) suggest that animals use different learning rules in different types of situations, and some of these rules may not resemble linear operators. As they point out, an important task for the future is to understand what determines the specific learning rule which is called into play. Dow and Lea as well as Houston et al. compared the performance of hypothetical learning and decision mechanisms with the performance of real animals at a molar level in terms of total reinforcers earned. Another approach to answering the question of how animals do it is to analyze molecular aspects of behavior such as trial-by-trial patterns of choice.
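The core of a linear operator model, an exponentially weighted moving average of experience, can be written in a few lines. The sketch below shows only the updating step; the function names, the learning-rate parameter alpha, and the starting estimate are illustrative assumptions, and no decision rule is included because, as noted above, the published models differ on that point.

```python
def linear_operator_update(estimate, outcome, alpha):
    """Move the estimate a fraction `alpha` of the way toward the outcome."""
    return estimate + alpha * (outcome - estimate)

def estimate_after(outcomes, alpha, initial=0.0):
    """Apply the update to a sequence of outcomes (1 = prey, 0 = none)."""
    estimate = initial
    for outcome in outcomes:
        estimate = linear_operator_update(estimate, outcome, alpha)
    return estimate

def weights(n, alpha):
    """Weight carried by each of the last n outcomes, oldest first."""
    return [alpha * (1 - alpha) ** (n - 1 - i) for i in range(n)]

if __name__ == "__main__":
    history = [1, 1, 0, 1, 0, 0, 0]      # hypothetical visits to one patch
    for alpha in (0.1, 0.5):
        print(alpha, round(estimate_after(history, alpha, initial=0.5), 3))
    print(weights(4, 0.5))
```

A larger alpha gives recent outcomes more weight and so corresponds to a shorter memory window; a smaller alpha averages over a longer stretch of the past.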
For example, Plowright (1988) finds that pigeons on a two-armed bandit with independent trials switch sides much more often than either optimality or a momentary maximizing rule would predict. Concurrent ratios are usually said to lead to maximizing (choosing the better alternative all the time: Commons et al., 1982; Herrnstein & Loveland, 1975), but sampling of the worse side is not uncommon (Timberlake, 1984b). Among other things, it is influenced by the spatial arrangement of the patches (Plowright, 1988). Whether or not it is appropriately referred to as “sampling” in the causal sense remains to be determined. It does seem clear, however, that the initial period of visiting both poor and good patches reflects not a special kind of sampling behavior but a gradual learning process. The two-armed bandit is a foraging problem for which theory has outrun data. This may seem surprising since it is so similar to well-studied reinforcement schedules and to traditional learning problems such as successive reversal and probability learning. What seems to be needed is additional detailed analysis of how animals solve it.
VII. Risk
A. RESPONSE TO VARIABLE AMOUNTS AND DELAYS
The currency of original optimal foraging models was mean E/T. The data and theory concerning whether animals should also respond to variance in outcomes (i.e., be sensitive to risk) have been thoroughly discussed elsewhere (e.g., Caraco & Lima, 1987; Stephens & Krebs, 1986). Risk sensitivity is briefly reviewed here because it illustrates particularly well some of the issues raised in other sections. Some of the closest and most productive interactions between behavioral ecology and psychology occur in this area, and considerable progress has been made toward integrating theory and data on response to risk from animal behavior, human decision making, and economics (Kagel et al., 1986; Rachlin, Logue, Gibbon, & Frankel, 1986; Staddon & Reid, 1987). A small bird foraging in winter needs to have enough energy reserves by dusk to survive the night. If its reserves and/or current rate of intake are low as dusk approaches (i.e., its energy budget is negative), its only chance for survival may be with the more favorable value of a high-variance option. It should therefore prefer variance, or be risk prone, under these conditions. If its energy budget is positive, on the other hand, it should avoid variance. Data consistent with this prediction have been obtained by behavioral ecologists testing a variety of small birds and mammals in the laboratory (see the reviews in Caraco & Lima, 1987; Stephens & Krebs, 1986). These animals have most often been risk averse, but when food deprived and/or given food at a low rate during the test, they may become risk prone. This result contrasts with the finding that rats or pigeons working on reinforcement schedules prefer variance, sometimes quite overwhelmingly (e.g., Herrnstein, 1964). Risk preference in this case can be seen as the outcome of fundamental mechanisms of reinforcement perception and averaging (Gibbon et al., 1988; Mazur, 1986).
On this view it is not immediately obvious how response to risk could change with energy budget. Tests of risk sensitivity by behavioral ecologists and studies of schedule behavior by psychologists have typically differed in a number of potentially important respects: small wild animals vs. larger laboratory rats and pigeons; discrete-trial vs. free-operant procedures; variations in amount vs. delay of reinforcement. Animals that can go for hours or days without
starving might be expected to have a less well-developed response to risk than tiny wild shrews or sparrows, but the key difference between the two bodies of work is probably in whether delay or the amount of food is varied. This suggestion is supported by a recent demonstration that pigeons offered a choice between a constant number of food pellets and a variable number with the same mean prefer the constant amount on both discrete-trial and free-operant procedures (Hamm & Shettleworth, 1987). Under the same conditions of deprivation and rate of intake in our discrete-trial procedure, they choose a variable delay to food over a fixed delay with the same mean (Shettleworth & Hamm, unpublished). Another development of optimal risk-sensitivity models makes functional sense of this finding (McNamara & Houston, 1987a). In some regions of a positive energy budget, animals should be risk averse in amount and risk prone in delay. There is also a region of negative energy budget where the opposite pair of tendencies is optimal. Our pigeons could be said to be on a positive energy budget because, although their food was restricted between experimental sessions and they lost weight, within experimental sessions they were feeding at a high enough rate to obtain all their daily intake. Staddon and Reid (1987) develop a regulatory account of behavior on schedules which also predicts that under some conditions animals will be risk prone in delay and risk averse in amount. These functional accounts of why risk sensitivity should differ with amounts and delays have a natural causal counterpart in accounts of how reinforcers are perceived and averaged. Preference for a mixture of two delays over a constant delay equal to their mean can be explained as follows: (1) the value of food decreases disproportionately as delay to food increases and (2) perceived values of a mixture of delays are averaged arithmetically (Mazur, 1986; Gibbon et al., 1988, give a slightly different account).
The response to delay can be seen as a discounting of future rewards, a response that could be an adaptation to the inherent uncertainty of the future (Kagel et al., 1986). In the case of amounts, one need only assume that the animal's perception or valuation of the amount of food follows Weber's Law (i.e., an increase in amount of food represents less than a proportionate increase in value) and that, as with delays, perceived values are averaged arithmetically. How this averaging process leads to risk proneness in delay but risk aversion in amount is shown graphically in Fig. 6.
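The averaging argument can also be checked numerically. The sketch below compares a fixed amount or delay m with a 50:50 mixture of 0 and 2m, which has the same mean. The two value functions are illustrative assumptions chosen only for their shape: a logarithmic function for amount (each increase in amount adds less than proportionately to value) and a reciprocal function for delay (value falls steeply at short delays and more slowly at long ones). Neither is offered as the function used in any of the models cited, and the function names are hypothetical.

```python
from math import log1p

def value_of_amount(amount):
    """Concave value of food amount (assumed form: log(1 + amount))."""
    return log1p(amount)

def value_of_delay(delay, k=1.0):
    """Value of food after a delay (assumed form: 1 / (1 + k * delay))."""
    return 1.0 / (1.0 + k * delay)

def fixed_vs_variable(value, m):
    """Return (value of fixed m, mean value of a 50:50 mix of 0 and 2m)."""
    fixed = value(m)
    variable = 0.5 * (value(0.0) + value(2.0 * m))
    return fixed, variable

if __name__ == "__main__":
    m = 4.0
    fixed_a, variable_a = fixed_vs_variable(value_of_amount, m)
    fixed_d, variable_d = fixed_vs_variable(value_of_delay, m)
    # Amount: the fixed option has the higher averaged value (risk aversion).
    print("amount", round(fixed_a, 3), round(variable_a, 3))
    # Delay: the variable option has the higher averaged value (risk proneness).
    print("delay ", round(fixed_d, 3), round(variable_d, 3))
```

Any value function that bends the same way gives the same ordering; only the curvature matters for the direction of the preference, not the particular formula assumed here.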
B. EFFECTS OF ENERGY BUDGET
Although models of optimal risk sensitivity predict changes in response to risk with changes in energy budget, the evidence that response to risk can change is not clear because most experimental tests have confounded
[Figure 6 graphic not reproduced: panels for amount of food and delay to food, with axis values 0, m, and 2m; one panel label reads "Risk proneness."]
Fig. 6. Schematic representation of how preference for fixed amounts of food and for variable delays to food can arise from a simple averaging mechanism. The case depicted is that where the animal compares a given amount or delay to an alternative with the same mean but providing no delay or no food equally often as twice the given delay or amount. In each case the animal is assumed to compare the perceived value of the constant amount or delay (Vm) to the arithmetic mean of the perceived values of twice the mean (V2m) and the minimum amount or delay (V0).
other factors with energy budget. For example, in switching juncos from a positive to a negative energy budget, Caraco, Martindale, and Whittam (1980) changed the mean reward size, the deprivation level at the start of the session, and the mean intertrial interval. Moreover, the rule determining intertrial interval was such that the side with the variable amount of food also had variable intertrial intervals. The birds therefore could have been responding to the variable delays between rewards rather than to the variable amounts. Similar problems afflict most other published work on this subject (McNamara & Houston, 1987a; Staddon & Reid, 1987). However, Hamm and Shettleworth (1987) reduced risk aversion in pigeons by decreasing only the rate at which food was presented on variable-interval schedules. As on other concurrent schedules, preference moved toward indifference as the VI value increased, an effect accounted for by the delay-reduction hypothesis (Fantino, 1981). However, preference did not actually reverse. The case might be different with delays. Although the effect of energy budget on response to variable delays has apparently not been tested, Kagel et al. (1986) review considerable evidence that hungrier animals discount the future more strongly. This should lead to more strongly risk-prone behavior in hungry animals choosing between fixed and variable delays. Another problem needing investigation is just what determines any energy-budget effects that may be found. As mentioned previously, Caraco et al. (1980) sought changes in response to risk by varying a number of factors relevant to energy budget. Hamm and Shettleworth (1987) observed the predicted effect of energy budget when they varied feeding rate alone, but the data reviewed by Kagel et al. (1986) suggest that hunger alone should have an influence. The role of each of these factors needs to be examined separately. In addition, psychophysical studies of reinforcement value such as those of Mazur (1986) need to be done with amount as well as with delay, and with the role of factors relevant to energy budget taken into account. Finally, the question of whether or not different kinds of animals (for example, big vs. small) differ in their response to risk remains to be explicitly addressed. In keeping with the generality of optimal foraging models, tests of risk sensitivity have involved a wide range of species, from bees to rats (Stephens & Krebs, 1986), but species have not been explicitly compared.
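The energy-budget argument referred to in this section can be illustrated with a small calculation. The sketch below is not taken from any of the studies cited. It assumes a forager that must accumulate a fixed requirement within a fixed number of feeding trials and that commits to one option for the whole period (a simplification, since the published models allow choice to change as the deadline approaches). The fixed option is assumed to deliver the mean amount on every trial, and the variable option nothing or twice the mean with equal probability. All function names and parameter values are hypothetical.

```python
from math import ceil, comb


def p_meet_requirement_fixed(requirement, trials, mean):
    """Probability of reaching the requirement when every trial yields `mean`."""
    return 1.0 if trials * mean >= requirement else 0.0


def p_meet_requirement_variable(requirement, trials, mean):
    """Probability of reaching the requirement when each trial yields 0 or
    2 * mean with equal probability (same mean as the fixed option)."""
    # Number of rewarded trials needed to reach the requirement.
    wins_needed = max(0, ceil(requirement / (2 * mean)))
    # Binomial tail: P(at least wins_needed successes in `trials` fair draws).
    favourable = sum(comb(trials, k) for k in range(wins_needed, trials + 1))
    return favourable / 2 ** trials


# Assumed illustration: 10 trials, mean reward of 1 unit per trial.
positive_budget = (p_meet_requirement_fixed(8, 10, 1),
                   p_meet_requirement_variable(8, 10, 1))
negative_budget = (p_meet_requirement_fixed(12, 10, 1),
                   p_meet_requirement_variable(12, 10, 1))
```

Under these assumptions, a forager on a positive budget (a requirement of 8 units in 10 trials) is certain to succeed with the fixed option, whereas on a negative budget (a requirement of 12 units) only the variable option offers any chance of meeting the requirement.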
VIII. Time Horizons
A. THE PROBLEM
Few animals can expect to forage continuously. Foraging may be interrupted by the arrival of rivals or predators; night may fall or the forager may need to groom, drink, or return to its nest. Thus, the time available for foraging, the forager’s time horizon, is usually limited. In models of several foraging problems discussed so far, time horizon partly determines optimal behavior (see the review in Krebs & Kacelnik, 1984). For instance, the model of risk sensitivity discussed in Section VII is derived from the assumption that a given energy requirement must be met by a given time, such as nightfall. If the animal is never interrupted, risk aversion is always optimal. Not only the changes in response to risk that accompany changes in energy budget but also the discounting of future rewards can be derived on the assumption that foraging is unpredictably interrupted. McNamara and Houston (1987a) suggest that foragers may invariably behave as if time horizons are rather short. This suggestion enjoys some experimental support. For example, Staddon (1980) concludes a review of choice in foraging and operant situations as follows: “The evidence presently available suggests that animals in choice situations act as local maximizers of some sort, picking the alternative with the highest local rate or probability of payoff. They behave as if they are unable to look very far ahead” (p. 127). Timberlake (1984a; see also this section) drew a similar conclusion from his demonstration that rats did not “take into account” whether free food would be available at the end of a session of bar pressing for food.
The belief supported by these authors, that the behavior of animals both in the wild and in operant chambers is controlled by short-term, local events, is reflected in the fact that, until recently, psychologists paid little attention to the possibility that length of experimental sessions might influence food-rewarded behavior. The length of operant sessions has traditionally been a matter of convenience, but it has now become an issue in connection with discussions of the merits of experiments in open vs. closed economies (Hursh, 1980; Collier, 1983). Animals working for food in an open economy are food restricted and receive supplementary food in their home cages. In a closed economy, the animal receives all of its food in the experiment. Experiments with closed economies have typically [though not always (see Abarca et al., 1985; Timberlake & Peden, 1987)] differed from those with open economies in having animals live in the experimental apparatus rather than visit it for a short time each day. Thus, animals in closed and open economies have vastly different time horizons. Although differences between results from closed and open economies have been reported (Collier, 1983; Collier et al., 1986; Rashotte & O’Connell, 1986), these could well be due either to differences in the subjects’ motivation or to details of experimental procedure (Staddon & Reid, 1987; Timberlake & Peden, 1987; but see Collier et al., 1986). If unambiguous responses to time horizon were to be demonstrated, these would have implications for experiments in which closed and open economies are compared. Only a few tests of optimal foraging models have explicitly investigated whether or not variations in time horizon influence behavior. An effect of time horizon per se does not seem to correspond to any phenomena known from the operant laboratory, with the possible exception of closed vs. open economy differences.
Thus, the question for psychologists is this: Through what mechanism does the influence of time horizon, if any, occur? Particularly intriguing is the possibility that anticipation of the time available for foraging influences choice. However, time-horizon-like effects could result from other factors that are confounded with time horizon in specific situations. For example, a forager expecting to be interrupted by predators may be more fearful than one not expecting to be interrupted, and different levels of fear rather than anticipation of different times for foraging could influence its behavior. Models that prescribe effects of time horizon do so without reference to specific factors that set the time horizon. If animals do modify their foraging behavior as a function of time horizon in itself, such models would gain plausibility.
B. TIME HORIZONS ON THE TWO-ARMED BANDIT
Models of optimal behavior on the two-armed bandit (Section VI) predict that there should be more sampling with long time horizons than with
short time horizons. This prediction makes some intuitive sense: The longer the time available to use it, the more valuable information is. In contrast to CONOP (Houston et al., 1982) and the unconstrained optimum (Yakowitz, 1969), most rules of thumb for the two-armed bandit that have been explored do not predict an effect of number of trials (Houston et al., 1982). Linear operator learning models cannot account for different rates of acquisition in sessions of different length without ad hoc modifications to memory windows or learning rate parameters (cf. Dow & Lea, 1987b). Krebs et al. (1978) reported that their great tits met the criterion of a switch from sampling to exploitation at a point in the session close to that predicted for the average session length they experienced (but see Plowright & Plowright, 1987, for corrected predictions). Kacelnik (1979; see Houston et al., 1982) later explicitly varied session length in a similar experimental arrangement. Although short of statistical significance, his results were consistent with the predictions of CONOP. Dow and Lea (1987b) ran pigeons on a two-armed bandit with sessions of 256 or 1024 choices and had similar results. However, there are some problems with both of these studies. Neither was a true two-armed bandit with independent trials; in each case there was more cost to switching sides between trials than to staying on the same side, and this meant that the birds actually "sampled" rather little, especially in the study of Dow and Lea. More important, it is not clear to what extent any effect was produced by information about different times available for foraging as opposed to differential carryover from sessions of different lengths. This problem arises because, in the experiments of Kacelnik and of Dow and Lea, the birds were taught about the current time horizon by exposing them to a series of either long or short sessions.
In a long session there is more time to develop a strong preference for the better side of the bandit and, therefore, a greater likelihood that preference will carry over from one session to the next and either facilitate or interfere with development of a new preference. In Dow and Lea's experiment, the better side of the bandit was usually switched from one session to the next. Retarded acquisition with long sessions could, therefore, have been due to differential carryover. Shettleworth and Plowright (1988) tested pigeons on concurrent random ratio schedules in two experiments designed to separate carryover effects from effects of time horizon per se. We found no consistent effect of time horizon: in some cases birds learned faster in 50-trial sessions than in 250-trial sessions, but in other cases there was a significant difference in the opposite direction. In nature, which patch was better yesterday is likely to be a good predictor of which patch is better today. Thus, carryover of preference from one day to the next could be adaptive in itself. In the experiments of Shettleworth and Plowright, carryover effects overwhelmed
any possible effects of time horizon, suggesting that purported demonstrations of time horizon effects that have not controlled for carryover should be interpreted cautiously. One possibly important difference between our methods and those of previous authors is that our sessions were much longer in terms of time: each trial took about 6 sec instead of a single peck or hop on a perch. Data reviewed later in this section suggest that animals generally cannot “look ahead” for more than about 15 min. Clear effects of time horizon, such as those predicted by optimality models, might only be expected when foraging bouts are very short. A study taking into account both possible confounding carryover effects and total session lengths remains to be done. Meanwhile, whether or not, and if so how, time horizon influences performance on a two-armed bandit remains an open question.
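The prediction that opens this subsection, that optimal sampling increases with time horizon, can be illustrated with a deliberately simplified bandit. The sketch below is not CONOP or any rule from the studies cited. It assumes one arm whose reward probability is known, one arm whose reward probability is unknown with a uniform prior, and a forager that maximizes the expected number of rewards over a fixed number of trials by backward induction. The function names and the value 0.55 for the known arm are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def action_values(successes, failures, trials_left, p_known):
    """Expected total reward for (choosing the known arm, choosing the
    unknown arm) now and behaving optimally afterwards."""
    if trials_left == 0:
        return 0.0, 0.0
    # Posterior mean of the unknown arm under a uniform (Beta(1, 1)) prior.
    p_unknown = (successes + 1) / (successes + failures + 2)
    stay = max(action_values(successes, failures, trials_left - 1, p_known))
    after_win = max(action_values(successes + 1, failures, trials_left - 1, p_known))
    after_loss = max(action_values(successes, failures + 1, trials_left - 1, p_known))
    known = p_known + stay
    unknown = p_unknown * (1.0 + after_win) + (1.0 - p_unknown) * after_loss
    return known, unknown


def best_action(successes, failures, trials_left, p_known):
    """Return 'unknown' when sampling has the higher expected payoff."""
    known, unknown = action_values(successes, failures, trials_left, p_known)
    return "unknown" if unknown > known else "known"


def samples_before_giving_up(horizon, p_known):
    """Number of consecutive unrewarded samples of the unknown arm that the
    optimal policy takes before settling on the known arm."""
    failures = 0
    while (failures < horizon and
           best_action(0, failures, horizon - failures, p_known) == "unknown"):
        failures += 1
    return failures


P_KNOWN = 0.55  # assumed value, slightly better than the prior mean of 0.5
first_choices = {h: best_action(0, 0, h, P_KNOWN) for h in (1, 2, 20)}
```

With these assumed values the known arm is chosen when only a single trial remains, whereas the unknown arm is sampled first when 2 or 20 trials remain, even though its expected immediate payoff is lower. The function `samples_before_giving_up` can be used to explore how persistence in sampling changes with horizon.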
C. DIET SELECTION IN SHORT BOUTS
The standard model of diet selection (Section IV) prescribes rejecting an unprofitable prey item whenever the forager cannot expect to finish handling it before the next profitable item arrives. This prescription applies only if foraging time is infinite. If the foraging bout will surely end before the next profitable item can be expected, the predator has no reason to reject the unprofitable item at hand. Thus, with given prey items and densities, predators should be less selective if foraging in short bouts; selectivity should also decrease toward the end of a bout provided this time is predictable (Lucas, 1983, 1987). Thus, as with risk sensitivity (see McNamara & Houston, 1987a), taking time horizon into account leads to a dynamic model in which behavior changes through time. Lucas (1983, 1987) reviews considerable literature that depicts animals as being constantly interrupted by environmental events or other animals and forced to forage in short bouts. There is some evidence for reduced selectivity under these conditions. For instance, when salamanders are in an area marked by another salamander, they continually interrupt foraging with territorial marking and other behavior. Selectivity between different sizes of flies is less than when they are on their own territory. While such results clearly fit Lucas’s functional model, from the point of view of a mechanistic analysis it is not clear whether the change in selectivity is brought about by a change in motivational state or by information about the time horizon. In the one reported test that successfully eliminated the influence of factors other than time horizon (Lucas, 1987), great tits selected prey in the conveyor belt situation used by Krebs et al. (1977). The birds were accustomed to 30- or 60-sec bouts for many trials before data were collected. Selectivity was less in the shorter bouts (i.e., more unprofitable
prey were taken). It also decreased more markedly within short bouts than within long bouts, an effect attributable to the birds being better able to discriminate the end of a short bout (Gibbon & Church, 1981). One possible confound in this experiment is that the birds apparently obtained twice as much food in sessions with long foraging bouts and, therefore, would have been less hungry in the last part of the session than they ever were in sessions with 30-sec bouts. Hunger does decrease selectivity in diet selection and other situations (Snyderman, 1983b; Kagel et al., 1986). However, while differences in average hunger levels might account for differences in average selectivity, they cannot explain the difference in how selectivity changed within long and short bouts. This seems most plausibly accounted for as due to the birds accurately anticipating the end of the short foraging bouts.
D. PATCH DEPARTURE AND TIME HORIZONS
A custom in many operant laboratories is not to feed food-restricted animals immediately after a session of working for food. If the animal expects free food, the reasoning goes, its behavior will not be as well controlled by the conditions for obtaining food in the experiment. This notion seems to be built partly on the assumption that animals should minimize the total effort they expend for food over periods of hours or days. It is clearly contradicted by evidence that animals heavily discount future rewards (Kagel et al., 1986) and by data such as those cited at the beginning of this section indicating that animals maximize locally without necessarily maximizing globally. Timberlake (1984a) put the conventional wisdom about postsession feeding to the test by giving rats a session of bar pressing on a progressive ratio schedule followed at various times later by a session of free food sufficient to make up the rats' daily requirements. If the rats could integrate information from the two sessions and if they minimized total effort per unit of food over a time horizon of more than an hour, they should not be willing to work on very high ratios when the free food was going to be available soon. In fact, however, bar pressing was unaffected by the presence of free food except when the free food was given just before the session. Timberlake, Gawley, and Lucas (1987) further explored the limits of this phenomenon by making the low-cost food available for pressing a lever in the same chamber where rats were working on a progressive ratio. The low-cost food became available on a continuous reinforcement schedule at various times up to 2 hours after the session began. Thus, unlike the case in the former experiment, the animals could leave the progressive ratio side of the chamber at any time and wait for the free
food. They did do this to some extent at all delays, but working for food was only suppressed when the low-cost food was delayed 16 min or less after the start of the session. As Timberlake (1984; Timberlake et al., 1987) points out, in some ways this situation is like that in experiments reviewed in Section V where animals choose between continuing to work in a depleting patch and traveling to a richer patch. It also has some similarities to experiments on “self-control” in which animals choose between an immediate small reward and a larger delayed reward. In all three cases the animal has to integrate information about current and future availability of food. We have seen, however, that the data on patch-leaving decisions may be accounted for at a molecular level in terms of the delay-reduction hypothesis. In these terms, animals do not choose to wait for cheap food rather than work for costly food. Rather, they compare waits to the next few food items and take the shortest. It would be worth looking at the temporal details of behavior in Timberlake’s situation to see if it is susceptible to the same kind of account. Certainly, these data do not contradict the view summarized at the beginning of this section, namely that, in general, animals behave as if they have very short time horizons. It is notable that the clearest evidence for an effect of time horizon such as that predicted by an optimal foraging model, that from Lucas’s (1987) experiment, involves very short intervals of the sort animals can time quite accurately (Gibbon & Church, 1981).
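The kind of prediction tested in Lucas's experiment (Section VIII,C), that selectivity should fall as the end of a bout approaches, can be illustrated with a finite-horizon calculation. The sketch below is not Lucas's model. It assumes discrete time steps, one search step before each possible encounter, and no credit for an item whose handling cannot be completed before the bout ends. The prey parameters and function names are hypothetical, chosen so that the standard infinite-time model would reject the unprofitable item.

```python
def search_values(max_steps, prey):
    """Expected future energy at the start of a search step, indexed by the
    number of time steps left in the bout.

    prey: list of (encounter_probability, energy, handling_steps).
    """
    values = [0.0] * (max_steps + 1)
    no_encounter = 1.0 - sum(p for p, _, _ in prey)
    for steps_left in range(1, max_steps + 1):
        after_search = steps_left - 1  # searching costs one step
        total = no_encounter * values[after_search]
        for p, energy, handling in prey:
            reject = values[after_search]
            if handling <= after_search:
                accept = energy + values[after_search - handling]
            else:
                accept = float("-inf")  # item cannot be finished in time
            total += p * max(accept, reject)
        values[steps_left] = total
    return values


def accepts(item, steps_left, values):
    """True if taking the encountered item beats resuming search."""
    _, energy, handling = item
    if handling > steps_left:
        return False
    return energy + values[steps_left - handling] > values[steps_left]


# Assumed prey types: (encounter probability, energy, handling steps).
PROFITABLE = (0.1, 10.0, 5)
UNPROFITABLE = (0.5, 1.0, 2)

values = search_values(200, [PROFITABLE, UNPROFITABLE])
late_choices = {m: accepts(UNPROFITABLE, m, values) for m in (2, 3, 4, 5, 6, 200)}
```

With these illustrative parameters the unprofitable item is rejected when 200 or 6 steps remain but accepted when only 2 to 5 steps remain, which is the qualitative pattern of declining selectivity near the end of a bout.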
IX. Conclusions
Much of the literature reviewed here takes for granted Lea’s (1981) conclusion that psychology can supply mechanistic analyses of foraging and goes on to analyze the role of particular behavioral mechanisms in particular aspects of foraging. Developments in both optimal foraging theory and psychology have considerably enriched this enterprise since it was reviewed by Lea (1981) in its early stages. New foraging models have appeared. Psychological data and theory are integral to tests of those dealing with risk and information (Sections VI and VII). Standard models of prey and patch choice have been refined to take into account constraints on the forager such as imperfect counting, timing, and discrimination. At the same time, new theory and data on timing and choice have encouraged the analysis of foraging behavior at a finer level of detail. Analysis of any specific foraging problem leaves a number of unanswered questions. Some apply to many aspects of foraging. For example, the success of operant schedules in simulating foraging should encourage researchers to begin exploring the ground between schedules and natural
foraging behavior. A second general problem is that while standard foraging models often predict all-or-nothing choice, experiments nearly always reveal partial preferences. This discrepancy can be dealt with by incorporating known facts about imperfect memory, discrimination, or timing into optimality models as constraints (Cheverton et al., 1985). Alternatively, variability can be made intrinsic to an optimality model (McNamara & Houston, 1987b). How the degree of partial preference varies with conditions and how it can best be accounted for are questions that are ripe to be addressed. Another issue, pointed out by Lea (1981) but still not resolved, lies in the tension between molar and molecular accounts of behavior. What determines the memory window over which animals average and the time horizon over which anticipation of future events influences their choices? Reviewed in this contribution are cases where very recent events matter disproportionately and other cases where behavior changes only gradually with experience. As well, experience can apparently influence how animals use the information in a given situation and what learning and performance rules they bring into play. Theory and descriptive data dealing with how this happens need to be developed. Kamil and Yoerg (1982) identified several barriers to effective communication between psychologists and behavioral ecologists. They concluded that these made a truly synthetic study of learning and foraging unlikely in the immediate future. More recent developments have belied this conclusion. A number of recent reviews of foraging problems include integrated discussions of literature from operant laboratories and experiments by behavioral ecologists. Some of the best examples are in the area of risk sensitivity (e.g., Kagel et al., 1986; McNamara & Houston, 1987a; Staddon & Reid, 1987).
Investigations in which quantitative models of optimal behavior and quantitative predictions from psychological models are juxtaposed are the most fruitful expression of this kind of integration (e.g., Cuthill et al., 1987; Fantino & Abarca, 1985; Hanson, 1987; Kacelnik et al., 1987; Shettleworth et al., 1988). They have already led to refinements of several foraging models and are leading to new insights into the control of operant behavior as well. In answer to the question posed by the title of this article, it would not be true to say that operant studies of foraging have produced any wholly new accounts of operant behavior nor that they have provided many new answers to old questions (but see McNamara & Houston, 1980). Viewing operant behavior as foraging has, however, led to new questions being asked, particularly in areas where functional models predict phenomena different from what would be expected on the basis of psychological research and theory. Apparently conflicting functional and mechanistic predictions have been tested in the areas of sampling behavior, response to time horizon, and the effects of energy budget on response to risk. In the
process, some insights have been gained into the possible function of phenomena observed in the laboratory as well as into the mechanisms animals may use in foraging and the kinds of constraints future optimal foraging models may need to incorporate. Ultimately, of course, functional and causal approaches should converge on complementary explanations of behavior. Optimal foraging theory is not without its critics (Gray, 1987; Stephens & Krebs, 1986). The standard models referred to here often seem too simple to apply to behavior in real situations. Nevertheless, however future developments may transform foraging theory, it will have drawn attention to an aspect of behavior in which animals acquire and use information about the environment in interesting ways. Whether they perform optimally in terms of current models or not, understanding how they forage presents an important challenge to students of learning and choice.
ACKNOWLEDGMENTS
I thank Cynthia Thomas for help with the more tedious aspects of preparing this contribution and also for 5 years’ help in studying foraging. Production of the article as well as the research from my laboratory described in it have been generously and steadily supported by the Natural Sciences and Engineering Research Council of Canada. I thank Shannon Hamm, Alasdair Houston, John Krebs, Catherine Plowright, and Pamela Reid for comments on the manuscript.
REFERENCES
Abarca, N., & Fantino, E. (1982). Choice and foraging. Journal of the Experimental Analysis of Behavior, 38, 117-123.
Abarca, N., Fantino, E., & Ito, M. (1985). Percentage reward in an operant analogue to foraging. Animal Behaviour, 33, 1096-1101.
Barnard, C. J., & Brown, C. A. J. (1981). Prey size selection and competition in the common shrew. Behavioral Ecology and Sociobiology, 8, 239-243.
Batson, J. D., Best, M. R., Phillips, D. C., Patel, H., & Gilliland, K. K. (1986). Foraging on the radial-arm maze: Effects of altering the reward at a target location. Animal Learning and Behavior, 14, 244-248.
Baum, W. M. (1982a). Choice, changeover, and travel. Journal of the Experimental Analysis of Behavior, 38, 35-49.
Baum, W. M. (1982b). Instrumental behavior and foraging in the wild. In M. L. Commons, R. J. Herrnstein, & H. Rachlin (Eds.), Quantitative analyses of behavior: Vol. 2. Matching and maximizing accounts (pp. 227-240). Cambridge: Ballinger.
Baum, W. M. (1983). Studying foraging in the psychological laboratory. In R. L. Mellgren (Ed.), Animal cognition and behavior (pp. 253-283). Amsterdam: North-Holland Publ.
Baum, W. M. (1987). Random and systematic foraging, experimental studies of depletion, and schedules of reinforcement. In A. C. Kamil, J. R. Krebs, & H. R. Pulliam (Eds.), Foraging behavior (pp. 587-607). New York: Plenum.
Bhatt, R. S., & Wasserman, E. A. (1987). Choice behavior of pigeons on progressive and multiple schedules: A test of optimal foraging theory. Journal of Experimental Psychology: Animal Behavior Processes, 13, 40-51.
Bush, R. R., & Mosteller, F. (1951). A mathematical model for simple learning. Psychological Review, 58, 313-323.
Caraco, T., & Lima, S. L. (1987). Survival, energy budgets, and foraging risk. In M. L. Commons, A. Kacelnik, & S. J. Shettleworth (Eds.), Quantitative analyses of behavior: Vol. 6. Foraging (pp. 1-21). Hillsdale, NJ: Erlbaum.
Caraco, T., Martindale, S., & Whittam, T. S. (1980). An empirical demonstration of risk-sensitive foraging preferences. Animal Behaviour, 28, 820-830.
Charnov, E. L. (1976). Optimal foraging: Attack strategy of a mantid. American Naturalist, 110, 141-151.
Cheverton, J., Kacelnik, A., & Krebs, J. R. (1985). Optimal foraging: Constraints and currencies. In B. Holldobler & M. Lindauer (Eds.), Experimental behavioral ecology and sociobiology (pp. 109-126). Stuttgart: Fischer.
Collier, G. (1983). Life in a closed economy: The ecology of learning and motivation. In M. D. Zeiler & P. Harzem (Eds.), Advances in analysis of behavior: Vol. 3. Biological factors in learning (pp. 223-274). New York: Wiley.
Collier, G. H., Johnson, D. F., Hill, W. L., & Kaufman, L. W. (1986). The economics of the law of effect. Journal of the Experimental Analysis of Behavior, 46, 113-136.
Collier, G. H., & Rovee-Collier, C. K. (1981). A comparative analysis of optimal foraging behavior: Laboratory simulations. In A. C. Kamil & T. D. Sargent (Eds.), Foraging behavior: Ecological, ethological, and psychological approaches (pp. 39-76). New York: Garland STPM.
Commons, M. L., Herrnstein, R. J., & Rachlin, H. (Eds.) (1982). Quantitative analyses of behavior: Vol. 2. Matching and maximizing accounts. Cambridge, MA: Ballinger.
Commons, M. L., Kacelnik, A., & Shettleworth, S. J. (Eds.) (1987). Quantitative analyses of behavior: Vol. 6. Foraging. Hillsdale, NJ: Erlbaum.
Cowie, R. J. (1977). Optimal foraging in great tits (Parus major). Nature (London), 268, 137-139.
Cowie, R. J., & Krebs, J. R. (1979). Optimal foraging in patchy environments. In R. M. Anderson, B. D. Turner, & R. L. Taylor (Eds.), Population dynamics (pp. 183-205). Oxford: Blackwell.
Cuthill, I. C., Kacelnik, A., & Krebs, J. R. (1987). Starlings exploiting patches: The effects of recent experience on foraging decisions. Kings College Research Memo. Kings College, Cambridge.
Dow, S. M., & Lea, S. E. G. (1987a). Foraging in a changing environment: Simulations in the operant laboratory. In M. L. Commons, A. Kacelnik, & S. J. Shettleworth (Eds.), Quantitative analyses of behavior: Vol. 6. Foraging (pp. 89-113). Hillsdale, NJ: Erlbaum.
Dow, S. M., & Lea, S. E. G. (1987b). Sampling of schedule parameters by pigeons: Tests of optimizing theory. Animal Behaviour, 35, 102-114.
Elner, R. W., & Hughes, R. N. (1978). Energy maximization in the diet of the shore crab, Carcinus maenas. Journal of Animal Ecology, 47, 103-116.
Erichsen, J. T., Krebs, J. R., & Houston, A. I. (1980). Optimal foraging and cryptic prey. Journal of Animal Ecology, 49, 271-276.
Fantino, E. (1981). Contiguity, response strength, and the delay-reduction hypothesis. In P. Harzem & M. D. Zeiler (Eds.), Advances in analysis of behavior: Vol. 2. Predictability, correlation and contiguity (pp. 169-201). Chichester: Wiley.
Fantino, E. (1987). Operant conditioning simulations of foraging and the delay-reduction hypothesis. In A. C. Kamil, J. R. Krebs, & H. R. Pulliam (Eds.), Foraging behavior (pp. 193-214). New York: Plenum.
Fantino, E., & Abarca, N. (1985). Choice, optimal foraging, and the delay-reduction hypothesis. Behavioral & Brain Sciences, 8, 315-330.
Fantino, E., Abarca, N., & Ito, M. (1987). Choice and optimal foraging: Tests of the delay-reduction hypothesis and the optimal-diet model. In M. L. Commons, A. Kacelnik, & S. J. Shettleworth (Eds.), Quantitative analyses of behavior: Vol. 6. Foraging (pp. 181-207). Hillsdale, NJ: Erlbaum.
Getty, T., Kamil, A. C., & Real, P. G. (1987). Signal detection theory and foraging for cryptic or mimetic prey. In A. C. Kamil, J. R. Krebs, & H. R. Pulliam (Eds.), Foraging behavior (pp. 525-548). New York: Plenum.
Gibbon, J., & Church, R. M. (1981). Time left: Linear versus logarithmic subjective time. Journal of Experimental Psychology: Animal Behavior Processes, 7, 87-108.
Gibbon, J., Church, R. M., Fairhurst, S., & Kacelnik, A. (1988). Scalar expectancy theory and choice between delayed rewards. Psychological Review, 95, 102-114.
Goss-Custard, J. D. (1977). Optimal foraging and the size selection of worms by redshank, Tringa totanus, in the field. Animal Behaviour, 25, 10-29.
Gray, R. D. (1987). Faith and foraging: A critique of the “paradigm argument from design.” In A. C. Kamil, J. R. Krebs, & H. R. Pulliam (Eds.), Foraging behavior (pp. 69-140). New York: Plenum.
Guilford, T., & Dawkins, M. S. (1987). Search images not proven: A reappraisal of recent evidence. Animal Behaviour, 35, 1838-1845.
Hamm, S. L., & Shettleworth, S. J. (1987). Risk aversion in pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 13, 376-383.
Hanson, J. (1987). Tests of optimal foraging using an operant analogue. In A. C. Kamil, J. R. Krebs, & H. R. Pulliam (Eds.), Foraging behavior (pp. 335-362). New York: Plenum.
Hanson, J., & Green, L. (1987). Foraging decisions: Patch choice and exploitation by pigeons. Submitted.
Herrnstein, R. J. (1964). Aperiodicity as a factor in choice. Journal of the Experimental Analysis of Behavior, 7, 179-182.
Herrnstein, R. J., & Loveland, D. H. (1975). Maximizing and matching on concurrent ratio schedules. Journal of the Experimental Analysis of Behavior, 24, 107-116.
Herrnstein, R. J., & Vaughan, W. (1980). Melioration and behavioral allocation. In J. E. R. Staddon (Ed.), Limits to action: The allocation of individual behavior (pp. 143-176). New York: Academic Press.
Hogan, J. A. (1984). Cause, function, and the analysis of behavior. Mexican Journal of Behavior Analysis, 10, 65-71.
Houston, A. I. (1987). The control of foraging decisions. In M. L. Commons, A. Kacelnik, & S. J. Shettleworth (Eds.), Quantitative analyses of behavior: Vol. 6. Foraging (pp. 41-61). Hillsdale, NJ: Erlbaum.
Houston, A. I., Kacelnik, A., & McNamara, J. (1982). Some learning rules for acquiring information. In D. J. McFarland (Ed.), Functional ontogeny (pp. 140-191). London: Pitman.
Houston, A. I., Krebs, J. R., & Erichsen, J. T. (1980). Optimal prey choice and discrimination time in the great tit (Parus major L.). Behavioral Ecology and Sociobiology, 6, 169-175.
Houston, A. I., & McNamara, J. (1981). How to maximize reward rate on two variable-interval paradigms. Journal of the Experimental Analysis of Behavior, 35, 367-396.
Houston, A. I., & McNamara, J. M. (1985). The variability of behavior and constrained optimization. Journal of Theoretical Biology, 112, 265-273.
Hursh, S. R. (1980). Economic concepts for the analysis of behavior. Journal of the Experimental Analysis of Behavior, 34, 219-238.
Ito, M., & Fantino, E. (1986). Choice, foraging, and reinforcer duration. Journal of the Experimental Analysis of Behavior, 46, 93-103.
Iwasa, Y., Higashi, M., & Yamamura, N. (1981). Prey distribution as a factor determining optimal foraging strategy. American Naturalist, 117, 710-723.
Kacelnik, A. (1979). Unpublished doctoral dissertation. Oxford University.
Kacelnik, A. (1984). Central place foraging in starlings (Sturnus vulgaris). I: Patch residence time. Journal of Animal Ecology, 53, 283-299.
Kacelnik, A., & Cuthill, I. C. (1987). Starlings and optimal foraging theory: Modeling in a fractal world. In A. C. Kamil, J. R. Krebs, & H. R. Pulliam (Eds.), Foraging behavior (pp. 303-333). New York: Plenum.
Kacelnik, A., Houston, A. I., & Krebs, J. R. (1981). Optimal foraging and territorial defense in the great tit (Parus major). Behavioral Ecology and Sociobiology, 8, 35-40.
Kacelnik, A., & Krebs, J. R. (1985a). Learning to exploit patchily distributed food. In R. M. Sibly & R. H. Smith (Eds.), Behavioural ecology: Ecological consequences of adaptive behaviour (pp. 189-205). Oxford: Blackwell.
Kacelnik, A., & Krebs, J. R. (1985b). Rate of reinforcement matters in optimal foraging theory. Behavioral & Brain Sciences, 8, 340-341.
Kacelnik, A., Krebs, J. R., & Ens, B. (1987). Foraging in a changing environment: An experiment with starlings (Sturnus vulgaris). In M. L. Commons, A. Kacelnik, & S. J. Shettleworth (Eds.), Quantitative analyses of behavior: Vol. 6. Foraging (pp. 63-87). Hillsdale, NJ: Erlbaum.
Kagel, J. H., Green, L., & Caraco, T. (1986). When foragers discount the future: Constraint or adaptation? Animal Behaviour, 34, 271-283.
Kamil, A. C. (1983). Optimal foraging theory and the psychology of learning. American Zoologist, 23, 291-302.
Kamil, A. C. (1988). A synthetic approach to the study of animal intelligence. Nebraska Symposium on Motivation, in press.
Kamil, A. C., & Balda, R. C. (1985). Cache recovery and spatial memory in Clark's nutcrackers (Nucifraga columbiana). Journal of Experimental Psychology: Animal Behavior Processes, 11, 95-111.
Kamil, A. C., Krebs, J. R., & Pulliam, H. R. (Eds.) (1987). Foraging behavior. New York: Plenum.
Kamil, A. C., Peters, J., & Lindstrom, F. J. (1982). An ecological perspective on the study of the allocation of behavior. In M. L. Commons, R. J. Herrnstein, & H. Rachlin (Eds.), Quantitative analyses of behavior: Vol. 2. Matching and maximizing accounts (pp. 189-203). Cambridge: Ballinger.
Kamil, A. C., & Roitblat, H. L. (1985). The ecology of foraging behavior: Implications for animal learning and memory. Annual Review of Psychology, 36, 141-169.
Kamil, A. C., & Sargent, T. D. (Eds.) (1981). Foraging behavior: Ecological, ethological, and psychological approaches. New York: Garland STPM.
Kamil, A. C., & Yoerg, S. I. (1982). Learning and foraging behavior. In P. P. G. Bateson & P. H. Klopfer (Eds.), Perspectives in ethology (Vol. 5, pp. 325-364). New York: Plenum.
Kamil, A. C., & Yoerg, S. I. (1985). The effects of prey depletion on patch choice of foraging blue jays (Cyanocitta cristata). Animal Behaviour, 33, 1089-1095.
Kamil, A. C., Yoerg, S. I., & Clements, K. C. (1988). Rules to leave by: Patch departure in foraging blue jays. Animal Behaviour, 36, 843-853.
Killeen, P. R. (1984). Incentive theory 3: Adaptive clocks. In J. Gibbon & L. Allen (Eds.), Timing and time perception. Annals of the New York Academy of Sciences, 423, 515-527.
Killeen, P. R. (1985). Delay reduction: A field guide for optimal foragers? Behavioral & Brain Sciences, 8, 341-342.
Killeen, P. R., Smith, J. P., & Hanson, S. J. (1981). Central place foraging in Rattus norvegicus. Animal Behaviour, 29, 64-70.
Krebs, J. R., Erichsen, J. T., Webber, M. I., & Charnov, E. L. (1977). Optimal prey selection in the great tit (Parus major). Animal Behaviour, 25, 30-38.
Foraging and Operant Behavior
47
Krebs. J . R.. & Kacelnik. A. (1984). Time horizons of foraging animals. I n J . Gibbon & L. Allan (Eds.). Tinling irtrd /C?ri8pi~rwp/iotr.Annirls of /hi>N e w York Ai.irdcwiy c ~ ~ S i ~ i i ~ n i ~ i ~ . s . 423, 278-291.
Krebs. J . R.. Kacelnik. A.. & Taylor. P. (1978). Test of optimal sampling by foraging great tits. Nirirrri, (Lotrikin). 275, 27-3 I . Krebs. J. R.. & McCleery. R. H. (1984). Optimization in behavioral ecology. In J . R. Krebs & N. B. Davies (Eds.). Brhiriiorrrid ivologv (2nd ed.. pp. 91-121). Oxford: Blackwell. Krebs. J. R.. Ryan. J. C.. & Charnov. E. L. (1974). Hunting by expectation or optimal foraging’! A study of patch use by chickadees. Animrl Bi4riri~iotrr.22, 953-964. Lea. S. E. G. ( 1979). Foraging and reinforcement schedules in the pigeon: Optimal and nonoptimal aspects of choice. Atriinirl Brlrtrihr~r.27, 875-886. Lea. S. E. G. (1981). Correlation and contiguity in foraging behavior. In P. Harzem & M. Zeiler (Eds.). Adiwicx,s in imtrlvsis o f bch\ior: Vol. 2. Privlic/irhiliip. i ~ i i r r i h / i i ) nirnd . iwnrigrriry (pp. 355-406). New York: Wiley. Lea. S. E. G. (1982). The mechanism of optimality in foraging. In M. L. Commons. R. J . Herrnstein. & H. Rachlin (Eds.). Qrrirntitutiiv c1ni11y.sc.sqfhi4iri~ior:Vol. 2. M i i i i h i n a irtrd ~nit.~imi:ing irciwin/.s (pp. 169-188). Cambridge: Ballinger. Lea. S. E. G., & Dow. S. M. (1984). The integration of reinforcements over time. In J . Gibbon & L. Allan (Eds.). Timing irnd titire percepplion. Antrtrls of’thi~N P M * York Acirdiwiy (?fS(’il’t?l‘l’.V.423, 269-279. Lima. S. L. (1984). Downy woodpecker foraging behavior: Eflicient sampling in simple stochastic environments. Eidogy. 65, 166-174. Lima. S. L. ( 1985). Sampling behavior of starlings foraging in simple patchy environments. Belririiorirl Eidogy trtrd Soiiohiologv, 16, 135-142. Lucas. J. R . ( 1983). The role of foraging time constraints and variable prey encounter in optimal diet choice. Ainivkwn Nir/rrrrrli.s/. 122, 191-209. Lucas. J . R. (1987). Foraging time constraints and diet choice. In A. C. Kamil. J. R. Krebs. & H. R. Pulliam (Eds.). Forriging hehirvior (pp. 239-269). New York: Plenum. Mazur, J . E. (1981). 
Optimization theory fails to predict performance of pigeons in a tworesponse situation. Siii~ncv.214, 823-825. Mazur. J . E. ( 1986). Fixed and variable ratios and delays: Further tests of an equivalence rule. Jorrrnirl of’Espc~rin~m/cil P.spc./ro/o~v: Anirncrl Bekiiiior Proc~c~ssc~s. 12, I 16- 124. Mazur. J. E., & Vaughan. W. (1987). Molar optimization versus delayed reinforcement as explanations of choice between fixed-ratio and progressive-ratio schedules. Jorrrt~trli~/’ /hr E.rpi~rinien/iilAnir1y.si.s of B i h t i o r . 48, 25 1-261. McNamara. J.. & Houston. A. (1980). The application of statistical decision theory to animal behavior. Jorrrnirl qf‘ Tlreorc~rii~ol Biology. 85, 673-690. McNamara. J. M., & Houston. A. 1. (1987a). A general framework for understanding the effects of variability and interruptions on foraging behavior. Actii Bio/hrori~iii~rr. 36. 322. McNamara. J. M.. & Houston, A. I. (1987b). Partial preferences and foraging. Atii/iitrl Si,hlrliorrr. 35, 10x4-1099.
Mellgren. R. L. (1982). Foraging in simulated natural environments: There’s a rat loose in the lab. Jorrrnirl c ? f ’ / / i i ~ E.rpi~rirnivr/rrlAnu1v.si.s of Behrriior. 38, 93-100. Mellgren. R.. Misasi. L.. & Brown, S. W. (1984). Optimal foraging theory: Prey density and travel requirements and Ro//u.s norvegiiws. Jorrrnrrl of Ciiinpirrirlii*c, 98, 142-153. Olson. D. J.. & Maki. W. S. (1983). Characteristics of spatial memory in pigeons. Jorirnirl of E,rpi~rirnrtr/itlPsyi~l~ology: Anitnul Brhiivior Proi~esses.9, 266-280. Peden. B. F.. & Kohe. M. S. ( 1984). Effects of search cost on foraging and feeding: A threecomponent chain analysis. Jotrrnirl if/he Expivirnm/irl Ancrlwis c~f’Bc~lirri*ior. 42, 2 I I221.
48
Sara J. Shettleworth
Pietrewicz. A. T., & Kamil. A. C. (1977). Visual detection of cryptic prey by blue jays (Cvunocittii crisliita). Science, 195, 580-582. Plowright. C. M. S. (1988). Ph.D. thesis, University of Toronto. Plowright, C. M. S., & Plowright, R. C. (1987). Oversampling by great tits? A critique of Krebs. Kacelnik, and Taylor's (1978). "Test of optimal sampling by great tits." Ciinudiun Joiirniil c?f z i w / o g v e 65, 1282-1283. Pulliam. H. R. (1981). Learning to forage optimally. In A. C. Kamil & T. D. Sargent (Eds.). (pp. 379-388) Foriiging hehinior: Ecological. et/rokigica/, und p.syiho/ogic~ii/iipproiichs (pp. 379-388). New York: Garland STPM. Pyke. G. H. ( 1984). Optimal foraging theory: A critical review. Anncial R e i i i w f#f2kiJ/fJg.Y rind Systeincirii~s.IS, 523-575. Rachlin, H.. Logue. A. W.. Gibbon, J.. & Frankel. M. (1986). Cognition and behavior in studies of choice. Psycliologicul Review. 93, 33-45. Rashotte. M. E.. & O'Connell. J. M. (1986). Pigeon's reactivity to food and to Pavlovian signals for food in a closed economy: Effects of feeding time and signal reliability. Joitrnul qf Experirnentcil Psycltokogv: Animal Behuvior Processes, 12, 235-247. Rashotte, M. E., O'Connell. J. M.. & Beidler, D. L. (1982). Associative influence on the foraging behavior of pigeons (Coliimhu livia). Joiirniil of Experitnenrcil Psyc~lrology:Ani m d BehiiviiJr Processes. 8, 142-153. Rechten, C.. Avery. M., & Stevens, A. (1983). Optimal prey selection: Why do great tits show partial preferences? Aniinul Be/ruviour. 31, 576-584. Roberts. S. ( 19811. Isolation of an internal clock. Joiirnul of Experiinentd P.Sv~/lfJ/i~~.V: Aniiniil Bekii vior Proc~essrs.7 , 242-268. Roper, T. J.. & Wistow. R. (1986). Aposematic colouration and avoidance learning in chicks. Qiiorter1.v Joiirniil qf Experimental Psvckology, 38B. 14 1-149. Rozin. P.. & Schull. J. (1988). The adaptive-evolutionary point of view in experimental psychology. In R. C. Atkinson, R. J. Herrnstein. G. Lindzey. & R. D. 
Luce (Eds.). Hiindhook qfExperimentii1 Psycliology. in press. New York: Wiley (Interscience). Schoener. T. W. (1987). A brief history of optimal foraging ecology. In A. C. Kamil. J. R. Krebs. & H. R. Pulliam (Eds.). Foraging hehuvior (pp. 5-67). New York: Plenum. Schull. J., Gelch. H.. Vitale, J.. Allan. A., James, S.. & Harrison, M. (1985). OpriiniiI .fi)ruging in depleting putches: Operunt simiilulions compured with semi-niitiirul ohseriwtions. Paper presented at meetings of the Animal Behavior Society, Raleigh. NC. Sherry, D. F. (1987). Foraging for stored food. In M. L. Commons, A. Kacelnik. & S. J . Shettleworth (Eds.). Qiiuiitiiutiw unii1vse.s of be/ruvior: Vol. 6 . Foriiging (pp. 209-227). Hillsdale, NJ: Erlbaum. Shettleworth. S. J. (1983). Function and mechanism in learning. In M. D. Zeiler & P. Harzem (Eds.). A d w n c e s in cina1.vsi.s of hehuvior: Vol. 3. Bio/ogica/frctor.s in Ieiirnirrg (pp. I39). New York: Wiley. Shettleworth. S. J. (1984). Learning and behavioral ecology. In J. R. Krebs, & N . B. Davies (Eds.). Behui~ii)riiIew1og.v (2nd ed., pp. 170-194). Oxford: Blackwell. Shettleworth. S. 3. (1985a). Handling time and choice in pigeons. k J i i m d qf the ELrpi~riinentcil Aniilvsis of Bchaviiw. 44, 139-155. Shettleworth. S. J. (1985b).Questions about foraging. Behiwioriil & Bruin Scieni'es. 8, 347348. Shettleworth. S. J. (1987a). Individual differences in choice of food items by pigeons. Brlrciviorul Processes. 14, 305-3 18. Shettleworth. S. J. (1987b). Learning and foraging in pigeons: Effects of handling time and changing food availability on patch choice. In M. L. Commons. A. Kacelnik. & S. J. Shettleworth (Eds.), Qircmtitutive unulwes of hehiivior: Vol. 6 . Foruginl: (pp. I 15-132). Hillsdale. NJ: Erlbaum.
Foraging and Operant Behavior
49
Shettleworth. S. J.. &Jordan, V. (1986). Rats prefer handlingfood to waiting for it. Anirncrl Belitr iiiirrr. 34, 925-927. Shettleworth. S. J.. & Krebs. J. R. (1982). How marsh tits find their hoards: The roles of site preference and spatial memory. Joirrnctl ofE.uperirnenrit1 P.sycholugy: Aninicrl Behovior Prcicesses. 8, 354-375. Shettleworth. S. J.. Krebs, J. R.. Stephens, D. W.. & Gibbon, J. (1988). Tracking a fluctuating environment: A study of sampling. Anirnrrl Behcri8iciirr.36, 87-105. Shettleworth. S. J.. & Plowright. C. M.S. (1988). Time horizons of pigeons on a two-armed bandit. Animal Belittviurrr. in press. Snyderman. M. ( 1983a). Optimal prey selection: Partial selection, delay of reinforcement and self control. B e / i d o r Anrrlvsis Letters. 3, 131-147. Snyderman. M.( 1983b). Optimal prey selection: The effects of food deprivation. Behcivior Anu1.vsi.v Letters, 3, 359-369. Staddon. J. E. R. (1980). Optimality analyses of operant behavior and their relation to optimal foraging. In J. E. R. Staddon (Ed.), Limits /o r i i ~ i o n(pp. 101-141). New York: Academic Press. Staddon. J. E. R. ( 1983). Aderpriiv hehcivior crnd krrrning. Cambridge: Cambridge University Press. Staddon. J. E. R., & Reid, A. K. ( 1987).Adaptation to reward. In A. C. Kamil. J. R. Krebs. & H. R. Pulliam (Eds.). Furcrging hektr17ior (pp. 497-5231, New York: Plenum. Stephens. D. W. ( 1985).How important are partial preferences? Anirnitl Bdrctiiorrr. 33,667669. Stephens. D. W. ( 1987).On economically tracking a variable environment. T/icwri~tii~rl Popriltrtion Biology. 32, 15-25. Stephens, D. W.. & Krebs, J. R. (1986).Forrrging /heor?. Princeton. NJ: Princeton University Press. Tamm. S. (1987). Tracking varying environments: Sampling by hummingbirds. AnirmtI Brh t i i i o i r r . 35, 1725-1734. Timberlake. W. (1984a). A temporal limit on the effect of future food on current performance in an analogue on foraging and welfare. Jurrrncrl qfrlre Experimenter/ Antrlvsis cfBi4iinior. 
41, 117-124. Timberlake, W. ( 1984b). Behavior regulation and learned performance: Some misapprehensions and disagreements. Jortrnul c?f the Experirnentirl Analvsis c?f Behavior. 41, 355-375. Timberlake. W., Gawley, D. J., & Lucas. G. A. (1987). Time horizons in rats foraging for Anirncrl food in temporally separated patches. Jairrnul (if Expivirncwtril P.s~i~lioliig.v: Brkir vier Processes. 13, 302-309. Timberlake. W., & Peden, B. F. (1987). On the distinction between open and closed economies. Jorrrnrrl i f the Expcrirnenrul Anolysis of BeIi~vior.48, 35-60. Tinbergen. N . (195 I ). The srridv uf instinct. Oxford: Clarendon. Waage. J. K. ( 1979). Foraging for patchily-distributed hosts by the parasitoid. Nc~nii~riti.s cirnescens. Joirrncrl of Anirntil Eidugy, 48, 353-371 . Yakowitz. S. J . ( 1969). Morhemutics c$[tdciptive control prcii~i~ssi~s. New York: Elsevier. Ydenberg, R. C. (1984). Great tits and giving-up times: Decision rules for leaving patches. Brlictviorrr. 90, 1-24. Zeiler. M. D. ( 1987). On optimal choice strategies. Jotrrntil c!f E.upc~rirnc~rilolP.syi.ho/ogv: Anitnal Brkiiiior Proce.sses. 13, 3 1-39.
This Page Intentionally Left Blank
THE COMPARATOR HYPOTHESIS: A RESPONSE RULE FOR THE EXPRESSION OF ASSOCIATIONS
Ralph R. Miller
Louis D. Matzel
The Psychology of Learning and Motivation, Vol. 22. Copyright © 1988 by Academic Press, Inc. All rights of reproduction in any form reserved.

I. Introduction

Traditional learning theories were built on a foundation laid by the British empiricist philosophers (e.g., John Locke, David Hume, and John Stuart Mill), who believed that learning consisted of the formation of mental links, called associations, between internal representations of events. Associations between events were most apt to be formed when the events had similar stimulus attributes and occurred in temporal and spatial proximity to one another, that is, when the events were contiguous. The British empiricists were interested in thoughts, not behavior, and gave little consideration to the fact that thoughts have to be translated into observable behavior before they can be subjected to scientific analysis. Consequently, it is not surprising that the British empiricists focused their attention on acquisition processes.

Even today, most theories of learning are built on a framework of associationism, and the various models are distinguished by how they delimit contiguity and the conditions, in addition to contiguity, that they regard as essential for learning to occur. The associationist tradition has resulted in a continuing emphasis on the processes underlying the acquisition and storage of information and a relative neglect of the many other processes that likely occur between the learning event and any consequent change in behavior, e.g., retrieval from inactive memory, response selection, and response generation. This emphasis has often prompted researchers to explain the absence of a particular acquired response in terms of acquisition failure if conditioned responding during training was not observed, or in terms of retention failure if responding during training was observed, without consideration of other possible sources of the behavioral deficit. The emphasis traditionally has been so strong that students of information processing in humans and animals have been said to study learning, a term which does not begin to encompass the complexity of the acquisition, processing, and expression of information by organisms, which in fact really should be the subject of interest. The title of this series, The Psychology of Learning and Motivation, exemplifies this bias toward acquisition and, on a small scale, illustrates the historical roots of this bias, namely, the fact that Kenneth and Janet Spence coined the title when they inaugurated the series over two decades ago. The recent shift in category label by some researchers from learning to cognitive psychology is motivated in part by a desire to escape from this enduring bias.

The assertion that expressed behavior is not a perfect reflection of what has been learned has been made repeatedly over the years (e.g., Ballard, 1913; Bartlett, 1932; Tolman, 1932). Tolman, for example, focused largely on nonassociative factors, particularly motivation, that he called performance variables. He and others of like mind showed that in instrumental situations learning was not always expressed in behavior. Specifically, Tolman demonstrated that latent learning could occur both when subjects were not internally motivated to express the acquired information and when incentives appropriate to the subject’s motivational state were not available (e.g., Tolman & Honzik, 1930).
A Pavlovian parallel to the phenomenon of latent learning is seen in sensory preconditioning, in which first pairing two “neutral” stimuli and then reinforcing one of these stimuli results in conditioned responding to the nonreinforced stimulus that is appropriate for the reinforced stimulus (Brogden, 1939; Prewitt, 1967).

Currently, few researchers question the need to match motivational state and incentive in order to control behavior, acquired or unconditioned (although today most experiments with humans use demand characteristics to motivate the expression of acquired information). However, motivational performance variables are not the only factors that influence the expression of associations. Cognitive investigators studying humans have exhibited considerable interest in some of the associative aspects of information processing beyond the conditions necessary for acquisition, particularly in the structure of memory and the form of the representations within memory (e.g., Anderson, 1983). They have also given some, but often limited, attention to the expression of associations (e.g., see work on priming such as Neely, 1976). Although students of information processing in animals have been
slower to leave the comfortable simplicity of the associationist tradition with its stress on acquisition (perhaps wisely so), new data are rendering models of information processing that focus exclusively on acquisition less and less viable. For example, Miller, Kasprow, and Schachtman (1986) have summarized evidence for the importance of differentially effective retrieval in explaining many differences in the acquired behavior of animals. Retrieval is clearly dependent upon associative processes, but processes that occur quite independently of acquisition itself.

One method of demonstrating a failure to express an association, as opposed to a failure of acquisition, is to induce recovery with a “reminder” treatment that precludes new, relevant learning. Among the putative acquisition deficits that were subsequently reversed by reminder treatments are overshadowing (Kasprow, Cacheiro, Balaz, & Miller, 1982), blocking (Balaz, Gutsin, Cacheiro, & Miller, 1982), the conditioned stimulus (CS) preexposure effect (Kasprow, Catterson, Schachtman, & Miller, 1984), and retrograde amnesia produced by electroconvulsive shock (e.g., Miller & Springer, 1972). Prior to these demonstrations of reminder-induced recovery, theoretical bias caused researchers to assume that these deficits were due to irreversible acquisition failures. Even now, with evidence available that these deficits arise at some postacquisition stage of information processing, these observations do not seem to have entered the mainstream of theorizing concerning animal behavior, perhaps because the prevailing models of “learning” emphasize acquisition and have relatively little to say about the many other stages of processing that are surely involved between input of information and output of modulated behavior.
This emphasis on acquisition is analogous to that of the trace-dependent theories of human memory which prevailed prior to Tulving’s work emphasizing the importance of retrieval cues (e.g., Thomson & Tulving, 1970).

Many of the complexities of modern learning theory reflect attempts to explain the presumed learning deficits of overshadowing, blocking, and the CS-preexposure effect (e.g., Mackintosh, 1975). If these behavioral deficits are recognized as arising from inadequacies of postacquisition information processes such as retrieval, then there is no reason for acquisition mechanisms to have to explain them, thereby allowing for far simpler models of acquisition than are currently fashionable. This places an added explanatory burden on theories of retrieval and other postacquisition processes, but the reminder data require that models of postacquisition processing address these issues. Given this unavoidable burden on models of postacquisition processing, models of acquisition should take full advantage of the resulting reduction in their explanatory burden.

Retrieval of an association is necessary but not sufficient for successful performance on a long-term retention test. In addition to reactivation, response rules determine how the reactivated association will be expressed
in behavior. Moreover, the experimenter’s interpretation of the behavior is dependent on the particular response system that he or she chooses to monitor. Response systems are not necessarily equivalent in the information that they reflect (e.g., Holland, 1980; Perruchet, 1985). This article attempts to demonstrate both the potential explanatory power of a specific response rule and its implications for models of acquisition. We do not believe that the proposed response rule is precisely correct. Rather, we offer it as a working hypothesis to provide a sense of what response rules can contribute to theory. Owing to our background, the examples provided come largely from animal behavior laboratories that employ classical conditioning procedures. However, the conclusions drawn are sufficiently elemental that we expect they will prove applicable across a variety of tasks and species.

In principle, all comprehensive theories concerning the processing of acquired information must explain how acquired information impacts behavior. Due in large part to the tradition of associationism, this issue has received little attention in contemporary learning theories and only marginally more in those theories that might be called cognitive. For instance, the elegant and frequently cited model of Rescorla and Wagner (1972) states only that responding to a conditioned stimulus will be monotonically related to the associative strength of the stimulus; that is, response strength and associative strength should vary in the same direction. In practice, a linear relationship between associative strength and response strength is often implicit in their data analyses. Regardless of whether they are linear or merely monotonic, such simple response rules contribute little toward our understanding of behavioral differences between treatment groups. As we shall see, a somewhat more detailed statement concerning the expression of acquired information can have appreciable explanatory value.

II. The Comparator Hypothesis
We call our response rule the comparator hypothesis. It was originally inspired by Rescorla’s (1968) contingency theory and a strong sense that differential acquisition contributed less than was traditionally assumed to the differences in behavior observed between individuals (see discussion of reminder-induced recovery from various behavior deficits in Section I). Rescorla noted that if the number and frequency of conditioned stimulus-unconditioned stimulus (CS-US) pairings are held constant, unsignaled presentations of the US during training attenuate conditioned responding. This observation complemented the long-recognized fact that the delivery of nonreinforced presentations of the CS during training also
attenuates conditioned responding. The symmetry of the two findings prompted Rescorla to propose that during training, subjects inferred both the probability of the US in the presence of the CS, P(US|CS), and the probability of the US in the absence of the CS, P(US|no CS), which is roughly equivalent to P(US|context), and they then established a CS-US association based upon a comparison of these quantities. Specifically, he suggested that conditioned excitation developed to the extent that P(US|CS) was greater than P(US|no CS) and that conditioned inhibition developed to the degree that P(US|CS) was less than P(US|no CS). This relationship is illustrated in Fig. 1.

[Figure 1: CONTINGENCY THEORY (Rescorla, 1968). Recovered axis label: P(US|no CS).]

Fig. 1. Rescorla's (1968) contingency theory states that total experience with a CS and US can be represented as a point in the contingency space depicted here. The strengths of resultant excitatory and inhibitory associations are assumed to be proportional to the perpendicular displacement of the point representing training from the line of indifference. From Matzel et al. (1988a). Copyright 1988 by Academic Press.

Although the initial tests of Rescorla's (1968) contingency theory found the behavioral predictions that it generated to be generally accurate, the model contains a logical contradiction. Sensing P(US|CS) and P(US|no CS) requires at least temporary memory of CS-US trials along with memory of nonreinforced CSs and unsignaled USs. By such logic, the subject's perception of contingency requires that a CS-US association already exist! In addition, two major assumptions of contingency theory went untested. First, the model assumed that the associative strength of the CS (i.e., its predictive power) was compared to the associative strength of the training context as opposed to the test context; yet, the same context was always used for training and testing. Second, the model assumed that the comparison took place at the time of training as opposed to at the time of testing, but P(US|context) was never altered between training and testing. The comparator hypothesis arose from our efforts to test these assumptions.

A. IS THE CS COMPARED TO THE TRAINING CONTEXT OR THE TEST CONTEXT?
In the experiments from our laboratory described here, (1) the subjects were water-deprived, naive, adult rats; (2) the CSs were distinctive auditory and visual cues that were known not to generalize to one another; and (3) the US was a brief, mild foot shock. Three contexts that differed in shape, color, and texture were employed. One context (A) was used for CS training, the second context (B) was used for CS testing when testing was to occur outside the training context, and the third context (C) was used for excitatory conditioning of a CS to be used as a known excitor in those experiments that employed a stimulus summation test. The first two contexts contained water tubes that were connected to lickometers. Counterbalanced within groups, one of the two contexts with water available served as Context A and the other served as Context B.

Typically, all animals were acclimated to drinking in both Contexts A and B, during which time the CSs were presented twice without reinforcement in order to minimize later unconditioned suppression (external inhibition) of drinking by the CSs in control subjects that had not received the CS during training. Then, over a 4-day period, CS conditioning occurred in Context A. Ordinarily, conditioning consisted of presenting the US immediately following the CS on 25 or 33% of its presentations and presenting the US in the absence of the CS with either a substantially higher (negative contingency) or substantially lower (positive contingency) density than with the CS. This was followed a few days later by testing. Testing consisted of allowing subjects to complete 25 licks, which ordinarily took 4 to 10 sec, and then presenting the test CSs and measuring the subject’s latency to complete 25 additional licks. Thus, our index of manifest memory was the degree of conditioned suppression of ongoing drinking.
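The positive and negative contingencies described above can be made concrete with a small calculation. The sketch below is an illustration only, not the authors' analysis. It assumes, as a simplification of ours, that a session can be divided into equal time bins, and it estimates P(US|CS) and P(US|no CS) from counts. The function names, the bin counts, and the shock counts in the example are hypothetical; only the idea that excitation or inhibition depends on which probability is larger, and that strength tracks displacement from the line of indifference, comes from the text.

```python
from math import sqrt


def contingency(us_with_cs, cs_bins, us_without_cs, no_cs_bins):
    """Return (P(US|CS), P(US|no CS), difference) estimated from bin counts."""
    p_us_cs = us_with_cs / cs_bins
    p_us_no_cs = us_without_cs / no_cs_bins
    return p_us_cs, p_us_no_cs, p_us_cs - p_us_no_cs


def classify(delta_p, tol=1e-9):
    """Label the outcome of comparing P(US|CS) with P(US|no CS)."""
    if delta_p > tol:
        return "excitatory (positive contingency)"
    if delta_p < -tol:
        return "inhibitory (negative contingency)"
    return "indifferent (zero contingency)"


def displacement(p_us_cs, p_us_no_cs):
    """Signed perpendicular distance from the line P(US|CS) = P(US|no CS)."""
    return (p_us_cs - p_us_no_cs) / sqrt(2)


# Hypothetical session: 6 CS bins with 2 reinforced (33%), and
# 30 no-CS bins containing either 2 or 23 unsignaled shocks.
positive = contingency(2, 6, 2, 30)
negative = contingency(2, 6, 23, 30)
print(classify(positive[2]), round(displacement(*positive[:2]), 3))
print(classify(negative[2]), round(displacement(*negative[:2]), 3))
```

Holding the CS-US pairings fixed and changing only the number of unsignaled shocks moves the point across the line of indifference, which is the manipulation the experiments in this section rely on.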
Our first goal was to determine whether the probability of the US in the presence of the CS is compared to the probability of the US in the context in which the CS was trained, or to the probability of the US in the context in which the CS was tested. During each conditioning session, the subjects were switched back and forth between Context A and Context B on one of two schedules such that each subject had three daily minisessions in each context. During one minisession in Context A, the CS was presented six times and reinforced with foot shock on half of these
occasions. During the minisessions that immediately preceded and followed this minisession, half of the subjects received 24 unsignaled foot shocks (US alone) either in Context A or in Context B. The schedule used for context switching ensured that all subjects had equivalent exposure to both contexts and that the intervals between the unsignaled shocks and the CS presentations were identical for all subjects that received the unsignaled shocks. Finally, lick suppression was assessed either in Context A or in Context B. Thus, the study employed a 2 × 2 × 2 factorial design in which the order of context exposure (Context A or B immediately before and after CS training in Context A), the occurrence of unsignaled shock immediately before and after CS training (present or absent), and the test context (A or B) were varied. The procedural details of this experiment can be found in Experiment 1 of Kasprow, Schachtman, and Miller (1987).
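The eight cells of this design can be enumerated mechanically. The following sketch is only an illustration; the factor names and level labels are our own shorthand for the three variables named above, not identifiers taken from Kasprow et al. (1987).

```python
from itertools import product

# Factor labels are illustrative shorthand, not terms from the original report.
factors = {
    "flanking_context": ("A", "B"),             # context just before and after CS training
    "unsignaled_shock": ("present", "absent"),  # US-alone presentations in those minisessions
    "test_context": ("A", "B"),                 # where lick suppression was assessed
}

# One dictionary per cell of the 2 x 2 x 2 design.
cells = [dict(zip(factors, levels)) for levels in product(*factors.values())]

for number, cell in enumerate(cells, start=1):
    print(number, cell)
```

Listing the cells this way makes it easy to see which pairs of groups differ in only one factor, for example the two groups that received unsignaled shock in Context A but were tested in different contexts.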
[Figure 2. Horizontal axis: LOCATION DURING SUBSESSIONS 3 AND 5.]

Fig. 2. Mean latencies to emit 25 licks in the presence of the CS (noise). Left panel, unsignaled shock during subsessions 3 and 5; right panel, no unsignaled shock during subsessions 3 and 5. All subjects received identical CS-US (foot shock) pairings in Context A during subsession 4. During subsessions 3 and 5, subjects either did or did not receive a high density of unsignaled foot shocks in either the CS training context (A) or an alternate context (B). Subsessions 1, 2, and 6 contained neither CS nor shock presentations; they served to equate exposure to the contexts. Subjects were tested in either Context A or B. Higher scores indicate more robust conditioned excitation evidenced in suppression of licking. Brackets depict standard errors. From Kasprow et al. (1987). Copyright 1987 by the American Psychological Association. Reprinted by permission of the publisher.
Examination of test trial suppression latencies (see Fig. 2) determined that, independent of both order of context exposure and test context, subjects trained without unsignaled shock exhibited equally robust suppression to the CS. Moreover, unsignaled shock in Context B did not influence suppression to the CS, but unsignaled shock in Context A (the CS training context) reduced suppression to the CS almost to pretraining levels. Thus, as was assumed by Rescorla (1968), the unsignaled US presentations in the training context, not in the test context, were found to inversely affect responding to the CS. This observation is fully compatible with not only Rescorla's contingency theory, but also most other contemporary models of conditioning. Most of them would explain the results in terms of the excitatory training context blocking acquisition of the CS-US association.

B. DOES THE COMPARISON OCCUR DURING TRAINING OR TESTING?
In principle, the putative comparison between P(US|CS) and P(US|training context) could occur at either the time of training or the time of testing (or, although unlikely, even during the retention interval). To answer this question, we reinforced 2 out of 6 daily CS presentations in Context A which were intermingled with 23 unsignaled shocks in Context A. This specific negative contingency procedure had been shown previously to make the CS into a conditioned inhibitor as defined by both retardation and summation tests (Matzel, Gladstein, & Miller, 1988a). Following the last training session, all subjects received CS-US pairings in Context A to determine the extent to which negative contingency training interfered with subsequent acquisition of an excitatory response (i.e., a retardation test). Following these CS-US pairings, the training context was extinguished over 8 days for some of the subjects, i.e., these animals were simply placed in Context A for 90 min each day. Then responding to the CS was assessed in a neutral context (Context B). Relative to animals for which Context A was not extinguished, responding to the CS was enhanced by extinction of Context A; i.e., extinction of Context A largely eliminated retardation (see Fig. 3).

In a parallel study, the same negative contingency training was administered and followed, for some of the subjects, by extinction of the training context. Inhibitory summation was subsequently assessed by measuring suppression in Context B to a conditioned excitatory stimulus that had been trained in Context C and to the same excitor compounded with the putative conditioned inhibitor (i.e., a summation test). Extinction of Context A was found to decrease the degree to which the putative inhibitor attenuated responding to the excitatory stimulus (see Fig. 4). (Details of these manipulations are described in Experiments 2A and 2B in Kasprow et al., 1987.)

These studies converged on a common conclusion. Following extinction
The Comparator Hypothesis
v)
Y
?In
2.21
T
2.0
N
2
a
w I
ND
DC
DA
TREATMENT Fig. 3. Mean latencies in a neutral context (B) to complete 25 licks in the presence of the CS (noise) (stippled bars) and latencies to complete 25 licks in Context A (training context) in the absence o f any punctate stimuli (striped bars). A l l subjects received negative contingency inhibitory training with the CS and US (foot shock) and. as part o f a retardation test. subsequent CS-US pairings, both in Context A. On inhibitory training days, unsignaled USs equivalent to those given i n Context A during inhibitory training were administered to all subjects in Context C. Group D A had Context A deflated (extinguished) between inhibitory training and the CS-US pairings. Group DC had Context C deflated between inhibitory training and the CS-US pairings. Group ND received no context-deflation treatment. Lower scores indicate more retardation of conditioned responding (lick suppression). Brackets depict standard errors. From Kasprow ef ol. (1987). Copyright 1987 by the American Psychological Association. Reprinted by permission of the publisher.
of the training context, less conditioned inhibition (as measured by both retardation and negative summation tests) was seen than without extinction. That extinction of the conditioning context following CS training was able to affect responding to the CS in a neutral context indicates that the response potential of the CS was determined by a comparison of the associative strength of the CS with that of the training context that occurred at the time of testing rather than at the time of training (also see Kaplan, 1985; Kaplan & Hearst. 1985; Miller & Schachtman, 1985). Con-
Ralph R. Miller end Louis D. Matzel
In Y
0
2.0
T
J
10 N
T
18
0
16
w
In
I .4 0
z w
t-I
I.2
10
ND-C
D-C
ND-CN
D-CN
TREATMENT
Fig. 4. Stippled bars represent mean latencies in a neutral context (B)to complete 25 licks either in the presence of an excitor (clicks = -C) previously paired with foot shock or in the presence of that excitor compounded with a noise CS (clicks + noise = -CN), i.e., a negative summation test of the inhibitory potential of the noise. All subjects initially received negative contingency inhibitory training with the noise CS and foot shock US in Context A. Half the subjects then had the noise training context ( A ) deflated (D), i.e.. extinguished, between inhibitory training and testing. The remaining subjects received nodeflation (ND) treatment. Lower scores for the CN compound than the C stimulus indicate negative summation. Striped bars represent posttreatment latencies to complete 25 licks in the noise training context (A) without punctate stimuli present. Brackets depict standard errors. From Kasprow e / a / . (1987). Copyright 1987 by the American Psychological Association. Reprinted by permission of the publisher.
sistent with the first experiment, the comparison appeared to be with the training context (Context A) despite the fact that testing occurred in a different context (Context B). Moreover, control subjects for whom there were two excitatory contexts (the CS-training context and’an irrelevant context) demonstrated that extinction of an excitatory context other than the CS-training context had no effect upon responding to the CS. Thus, the effect of posttraining context extinction appears to be specific to the CS-training context (see Fig. 3). That the putative comparison takes place at the time of testing rather than at the time of training suggests that the comparison is part of the
response mechanism, not part of the acquisition process as had been assumed by Rescorla (1968). The implication is that, at the time of testing, presentation of the CS reactivates not only the CS-US association but also the association between the CS and the training context (as extinction of the training context had an effect even when testing occurred elsewhere), with the representation of the training context, in turn, reactivating the training context-US association. Over the retention interval, the subject must have stored three associations: CS-US, CS-training context, and training context-US. The absence of any of these three would have interfered with the effect observed in the present experiment.
We should mention that preliminary experiments determined that fewer than eight 1-hour context extinction sessions were ineffectual. In far less than 8 hours, freezing behavior, indicative of fear of the training context, vanished. Yet, the effect of context extinction on the inhibitory strength of the CS appeared only with extensive overextinction. This fact is anomalous and suggests either that the comparison is sensitive to context-US associations that are subthreshold with respect to direct responding (which we think is unlikely) or that the comparison is to a type of context-US association that is qualitatively different from that which supports direct excitatory responding.
The two preceding experiments, each of which we have successfully replicated with a variety of procedures and parameters, are the basis for the comparator hypothesis. The comparator hypothesis is a qualitative response rule stating that the response to a CS will be a direct function of the CS-US associative strength and an inverse function of the strengths of the associations between the US and other cues that were present during training of the CS.
III. Punctate Comparator Stimuli
In both of the preceding experiments, the training context served as the comparator stimulus. This raises the question of whether or not brief, discrete (punctate) stimuli present during training of the target stimulus could also serve as comparator stimuli. Surely there will always be a context of some sort present during training of the CS, so it is likely that this training context will always contribute to some degree to the putative comparison, but might not a punctate stimulus also play a role? To answer this question, we repeated the previous extinction experiment using Pavlov's conditioned inhibition procedure (Y+/YX-) rather than the negative contingency procedure (+/X-) that we had initially used. In a preliminary study, Stimulus Y was repeatedly paired with the US
in Context A, thereby making it excitatory. Then, still in Context A, consistent reinforcement of Stimulus Y alone was intermingled with 25% reinforcement of a YX compound stimulus. Testing Stimulus X in Context B for both retardation and summation relative to appropriate control groups determined that X was functionally a conditioned inhibitor. Having determined that our Pavlovian procedure did in fact make X into a conditioned inhibitor, we then repeated the procedure, but this time after inhibitory training we extinguished Stimulus Y in Context A for half of the subjects. The nonextinguished control subjects spent equivalent time in Context A in order to equalize any effect of extinguishing the training context. Finally, all subjects were tested in Context B. Extinction of Stimulus Y was found to decrease retardation and inhibitory summation. Thus, punctate stimuli present during the training of a CS appear to contribute to the comparator term for that CS. Details of these experiments are described in Hallam, Matzel, Sloat, and Miller (1987; also see Lysle & Fowler, 1985).
An excitatory test of Stimulus X (direct responding) also found that posttraining extinction of Stimulus Y made Stimulus X mildly excitatory. Thus, posttraining extinction of comparator stimuli can not only reduce conditioned inhibition, it can increase conditioned excitation. Further research determined that the appearance of conditioned excitation in this study depended upon the 25% partial reinforcement of the YX trials during inhibitory training. With no reinforcement of the YX compound, it was not possible to extinguish the comparator stimulus to a sufficiently low associative level relative to Stimulus X to obtain conditioned excitatory behavior.
IV. Some Applications of the Comparator Hypothesis
To illustrate the explanatory potential of the comparator hypothesis, we provide two examples of how phenomena traditionally explained by prevailing theories in terms of acquisition failure can be reinterpreted in terms of performance deficits arising from comparator processes. The supporting data in each instance are not only compatible with the comparator hypothesis, they are inexplicable in terms of the traditional explanations. Our first example (overshadowing) illustrates the influence of a punctate comparator stimulus, and our second example (the US-preexposure effect) illustrates the influence of the conditioning context as a comparator stimulus.
A. OVERSHADOWING
Overshadowing refers to a deficit in responding to Stimulus X following reinforced training with the compound XY relative to reinforced training with X alone (Pavlov, 1927). Mackintosh (1971) has presented data indicating that overshadowing arises from something more profound than mere stimulus generalization decrement as a result of training with XY and testing with X. Prevailing theories such as those of Mackintosh (1975) and Rescorla and Wagner (1972), while differing with respect to the underlying mechanism, explain overshadowing as a consequence of Y accruing associative strength at the expense of X. Both of these theories are generally successful at predicting multitrial overshadowing, although neither can predict overshadowing that occurs with a single training trial (James & Wagner, 1980; Mackintosh & Reese, 1979), reminder-induced reversal of multitrial overshadowing (Kasprow et al., 1982), or spontaneous recovery from overshadowing over an extended retention interval (Kraemer, Lariviere, & Spear, 1988). These latter effects indicate that overshadowing cannot properly be considered an acquisition deficit.
Kaufman and Bolles (1981) and Matzel, Schachtman, and Miller (1985) gave rats reinforced compound trials (XY+) followed for some of the subjects by extinction of the Y stimulus. Testing on X found greater conditioned responding in subjects for which Y had been extinguished than in those for which Y had not been extinguished (see Fig. 5). Matzel et al. also found that extinction of an excitatory stimulus other than the overshadowing stimulus did not produce a recovery from overshadowing. Thus, this effect appears to be specific to extinction of the overshadowing stimulus as opposed to postovershadowing extinction of any excitatory stimulus. Further research by Matzel, Shuster, and Miller (1987b) examined serial overshadowing (see Egger & Miller, 1962) in which Y immediately preceded X during training. Consistent with the data obtained from simultaneous overshadowing, postovershadowing extinction of Y resulted in robust conditioned responding to X.
Fig. 5. Mean latencies to complete 25 licks in the presence of the overshadowed stimulus (light) and in the presence of the overshadowing stimulus (tone). Overshadow (O) and extinction (E) groups received the light-tone compound reinforced with foot shock, whereas the overshadowing control (OC) group received the light alone paired with shock. Between training and testing, Group E had the tone extinguished. Lower scores indicate more overshadowing of conditioned responding (lick suppression). Brackets depict standard errors. From Matzel et al. (1985). Copyright 1985 by Academic Press.
The implication of these overshadowing studies is that part or all of the overshadowing deficit is due to Stimulus Y being a component of the comparator stimulus for Stimulus X, and acquired X-US associations not being expressed because of the high excitatory status of Y. This comparator explanation of overshadowing does not speak to whether the X-US association is as strong as the Y-US association or somewhat less so. The only clear implication is that the excitatory strength of Y masks the excitatory strength of X at the time of testing.
Because X and Y have the same temporal and spatial relationship to each other, one might expect that if Y is part of the comparator stimulus for X, then X ought to serve as part of the comparator stimulus for Y. This is tantamount to predicting reciprocal overshadowing between X and Y. However, overshadowing of a stimulus element, X, typically occurs only when X has the lower associability of the two compounded elements (Mackintosh, 1976), with “associability” indexed by the rapidity seen in naive subjects with which each stimulus by itself comes to elicit conditioned responding when paired with the US. According to the comparator hypothesis, the lack of reciprocal overshadowing occurs because the more salient stimulus, Y, competes more successfully with the training context to become part of the comparator stimulus for X than Stimulus X competes with the training context to become part of the comparator stimulus for Y.
Although the training context also contributes (to various degrees) to the comparator stimuli of X and Y, it has little impact because all of the USs during training were signaled by X and Y. Thus, the overshadowing of X by Y but not Y by X following XY compound conditioning may reflect not a difference in associative strength per se between X and Y, but a difference between the degree to which Y becomes part of X’s comparator stimulus and X becomes part of Y’s comparator stimulus. Before departing the topic of overshadowing, note might be taken of the similarities between overshadowing and blocking, which suggest that comparator processes may well contribute to blocking under appropriate circumstances.
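The comparator account of overshadowing described above can be made concrete with a small numerical sketch. It assumes, purely for illustration, the subtractive form of the comparison that the chapter hazards later (Section V,B,2); the comparator hypothesis itself is stated only qualitatively. The function name `response_potential`, the zero threshold, and every associative strength below are hypothetical values, not estimates from the experiments cited.

```python
def response_potential(v_cs, v_comparators, threshold=0.0):
    """Hypothetical difference rule: CS strength minus summed comparator strength.

    Returns the excitatory response potential, floored at zero when the
    difference does not exceed the (assumed) threshold.
    """
    difference = v_cs - sum(v_comparators)
    return max(difference - threshold, 0.0)


# Illustrative associative strengths after reinforced XY+ training (assumed).
v_x = 0.6          # X-US association (overshadowed stimulus)
v_y = 0.5          # Y-US association (overshadowing stimulus)
v_context = 0.05   # weak context-US association; all USs were signaled

# Test X with Y intact: Y's strength masks the X-US association.
masked = response_potential(v_x, [v_y, v_context])

# Extinguish Y after training (its strength falls toward zero), then test X.
recovered = response_potential(v_x, [0.0, v_context])

print(f"X before Y extinction: {masked:.2f}")
print(f"X after Y extinction:  {recovered:.2f}")
```

In this sketch the X-US association is identical in both tests; only the comparator term changes. That is the sense in which the account treats overshadowing as a deficit in performance rather than in acquisition.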
B. THE US-PREEXPOSURE EFFECT
Having just described a case in which a punctate stimulus contributes to a comparator term, we now turn to an example in which the training context serves as the comparator stimulus. Following unsignaled exposures
to the US, subjects ordinarily require more CS-US pairings before conditioned responding is observed (e.g., Tomie, 1976). This retardation in the appearance of conditioned responding typically has been attributed to the context becoming excitatory as a result of the unsignaled USs and the context subsequently blocking acquisition of a CS-US association during the CS-US pairings. Both extinction of the context between the unsignaled USs and the CS-US pairings and a change in context between the unsignaled USs and the CS-US pairings have been found to eliminate the US-preexposure deficit (e.g., Randich & LoLordo, 1979). These two observations generally are viewed as evidence for an acquisition failure at the time of the CS-US pairings that is mediated by the excitatory context, i.e., blocking by context. However, both of these observations are equally compatible with the view that the CS conditioning context acts as the comparator stimulus for the CS. Not only can the comparator hypothesis explain the effects of context shifts and context extinction before CS training, but unlike the blocking-by-context explanation, it further predicts that extinction of the conditioning context following the CS-US pairings will reduce the US-preexposure effect. Using rats in a conditioned lick suppression task, Matzel, Brown, and Miller (1987a) tested this prediction. They found that extinction of the training context after CS training almost entirely eliminated the US-preexposure effect (see Fig. 6). Moreover, another experiment in the same series determined that, following CS training, extinction of an excitatory context other than that used for CS training had no effect upon responding to the CS. Comparable data using an appetitive US have been reported by Timberlake (1986).
Thus, as predicted by the comparator hypothesis, the attenuation of the US-preexposure deficit resulting from context extinction after CS-US pairings appears to be specific to the context in which the CS was trained. As blocking of acquisition by the context depends only upon the associative status of the context at the time of CS training, this effect of extinction of the training context following CS training is inexplicable in terms of blocking. Notably, the blocking explanation views the US-preexposure deficit as the result of an acquisition deficit, whereas the comparator explanation views it as the consequence of a response rule.
As an aside, we note that extinction of the conditioning context after CS-US pairings largely restored responding to the CS, but across a number of experiments we frequently observed a small residual deficit after extinction (Matzel et al., 1987a). As this small residual deficit was not observed following extinction of the training context prior to CS training, one possible explanation of the residual deficit was contextual blocking of acquisition as predicted by Rescorla and Wagner (1972). However, an alternative possibility was that the excitatory context was producing a conditioned analgesic effect that reduced the effective intensity of the foot shock US during the CS-US pairings. In support of this conjecture, injection of the rats with naloxone (an opiate antagonist) immediately prior to a single CS-US pairing (used to assess retardation) was found to eliminate the residual deficit not already accounted for by comparator processes (Matzel, Hallam, & Miller, 1988b). Moreover, direct tests of pain sensitivity determined that exposure to the context in which unsignaled USs had been given did induce analgesia. Although this conditioned decrease in sensitivity to the US (and, hence, effectiveness of the US) might be regarded as underlying apparent blocking by the context (Schull, 1979), such a blocking mechanism is different from that espoused by information-processing interpretations (e.g., Rescorla & Wagner, 1972), which view blocking as the consequence of the subject’s correct anticipation of the US without any diminution in its perceived intensity.
Fig. 6. Mean latencies in a neutral context (B) to complete 25 licks in the presence of the CS (noise). Groups +A, +A/b, and +A/a received US preexposure in Context A, whereas Group +C received US (foot shock) preexposure in Context C. All subjects then received, as part of a retardation test, CS-US pairings in Context A. Group +A/b had the conditioning context (A) extinguished before US preexposure and the CS-US pairings. Group +A/a had the conditioning context extinguished after the CS-US pairings. Groups +A and +C received no extinction treatment. Lower scores indicate greater retardation of excitatory responding (lick suppression). Brackets depict standard errors. From Matzel et al. (1987a). Copyright 1987 by the American Psychological Association. Reprinted by permission of the publisher.
V. Implications for Conditioned Inhibition Theory
Conditioned inhibition has been conceptualized in a number of different ways. For example, Konorski (1967) viewed the behavioral indicators of inhibition as reflecting a link (association) between a neural center representing the CS and a center representing “no US” (specific to the US in question). The activation of this no-US center was thought to negatively summate with activation of the corresponding US center. Konorski (1948) and Rescorla (1979) proposed that conditioned inhibition arose from an elevation of the threshold of reactivated US trace strength necessary for conditioned responding to occur. Wagner and Rescorla (1972) regarded conditioned inhibition as being indicative of CS-US associations with negative value, i.e., symmetrical with excitation resulting from CS-US associations with positive value. All of these conceptualizations of conditioned inhibition seemed to us either implausible because they make incorrect predictions (e.g., see the discussion of extinction of conditioned inhibition in Section V,E) or inadequate because they fail to emphasize the dependency of conditioned inhibition (maintenance as well as acquisition) on underlying excitatory associations (see Lysle & Fowler, 1985). The comparator hypothesis is unique in denying that there is any underlying associative mechanism responsible for conditioned inhibition that differs from the mechanisms responsible for conditioned excitation. Rather, it posits that associations can only be excitatory. Conditioned excitatory responding is presumed to be a positive monotonic function of the associative value of the CS and a negative monotonic function of the associative value of the CS’s comparator stimuli, whereas the inverse relationships hold for conditioned inhibitory behavior, i.e., retardation and negative summation test performance.
As traditional models of conditioned inhibitory mechanisms are moderately successful in explaining behavioral effects observed following inhibitory training (e.g., retardation test performance, summation test performance, and superconditioning), the comparator hypothesis, in denying the existence of traditional inhibitory mechanisms, is obligated to provide alternative explanations of these phenomena. By application of comparator processes and/or evocation of other established behavioral phenomena, we believe that viable alternative explanations can be provided.
A. RETARDATION TESTS
Following inhibitory training with a particular CS and US, that CS is typically slow to acquire excitatory strength when paired with the US relative to a condition in which that same CS had not undergone inhibitory training (Rescorla, 1969). We propose that retardation test performance
can be explained without recourse to traditional inhibitory mechanisms such as negative associations. Rather, we view retardation as the joint consequence of (1) loss of associability owing to CS preexposure, (2) habituation to the US, (3) contextual blocking of acquisition, and (4) the comparator mechanism itself (which may encompass factor 3; see Section IV,A). Of these phenomena, the first three are well established and known to occur under appropriate circumstances. If inhibitory training includes presentations of the CS alone, as is the case in negative contingency inhibitory training (+/X-), retardation arising simply from this exposure to the CS prior to the CS-US pairings, that is, the CS-preexposure effect (Lubow & Moore, 1959), is a distinct possibility. The retardation seen following CS preexposure presumably is not due to true inhibitory learning because an initially neutral stimulus presented without reinforcement is known to fail a negative summation test for conditioned inhibition (Reiss & Wagner, 1972; Rescorla, 1971a). Instead, the CS-preexposure effect is generally assumed to arise from a decrease in the associability of the stimulus (see Section V,B for a discussion of how this view can be reconciled with the same stimulus passing a negative summation test). Like CS preexposure, US preexposure can interfere with subsequent conditioning. For example, repeated exposure to a US can, under certain circumstances, result in habituation to the US (e.g., Thompson & Spencer, 1966). Consequently, repeated exposure to a US during inhibitory training possibly could reduce the perceived intensity of the US and, therefore, the magnitude of the conditioned response arising from associations to the trace of the US. Additionally, to the extent that inhibitory training makes the context excitatory at the time of the retardation test CS-US pairings, blocking by the context is a distinct possibility (Randich & LoLordo, 1979).
[The degree to which blocking at the time of acquisition is an acquisition deficit or a deficit in subsequent retrieval is an open question at this time (Balaz et al., 1982).]
In addition to the three well-established phenomena which contribute to the retardation that is seen following inhibitory training, the comparator process itself suggests another source of retardation. If during inhibitory training the stimuli that will later serve as the comparator stimuli for the CS (i.e., background cues or discrete stimuli) become excitatory, comparator processes at the time of testing would be expected to attenuate the conditioned excitatory responding observed on a retardation test. Schachtman, Brown, Gordon, Catterson, and Miller (1987) investigated this possibility and found that, following negative contingency inhibitory training (+/X-), a change in context between inhibitory training and the retardation test CS-US pairings reduced retardation. Presumably, the neutral context used for the CS-US pairings of the retardation test contributed to the CS comparator term at the expense of the contribution of the training context that was used for inhibitory training (see Section VI for further discussion of the effects of training in multiple contexts). Furthermore, after negative contingency inhibitory training with one CS, retardation was seen when a second CS was paired with the US provided that the same context was used. These two observations are compatible with both the comparator hypothesis and blocking by the training context, but not with any of the traditional explanations of inhibition. However, additional experiments in this series found that extinction of the training context following both negative contingency conditioned inhibition training (+/X-) and subsequent retardation test CS-US pairings (in the training context) attenuated retardation to the same degree as did extinction of the training context following negative contingency conditioned inhibition training but prior to the retardation test CS-US pairings (see Fig. 7). Such results indicate that the retarded acquisition of excitatory responding seen following inhibitory training cannot be attributed entirely to an acquisition deficit. Moreover, none of the other factors that might contribute to the retardation that is seen after inhibitory training (e.g., CS preexposure, habituation, blocking of acquisition, or traditional inhibitory mechanisms) can explain this loss of retardation. (Most studies of conditioned inhibition have controlled for some of these factors, but none has simultaneously controlled for all of them.)
Fig. 7. Mean latencies in a neutral context (B) to complete 25 licks in the presence of the CS. Groups DB, ND, and DA received negative contingency inhibitory training with the noise CS and foot shock US in Context A. Group HC (habituation control) received equivalent shocks in Context C. As part of a retardation test, all subjects then received CS-US pairings in Context A. Group DB had Context A deflated (extinguished) before the CS-US pairings. Group DA had Context A deflated after the CS-US pairings. Groups ND and HC received no-deflation treatment. Lower scores indicate more retardation of conditioned responding (lick suppression). Brackets depict standard errors. From Schachtman et al. (1987). Copyright by the American Psychological Association. Reprinted by permission of the publisher.
B. SUMMATION TESTS
Following inhibitory training with a CS, presenting that CS simultaneously with a known conditioned excitor ordinarily attenuates responding to the excitor more than if inhibitory training had not occurred (provided that the USs used in inhibitory training and in establishment of the known excitor were the same). Unlike retardation test performance, for which we have strong evidence that traditional inhibitory mechanisms are not necessary to explain behavior commonly regarded as indicative of inhibition, to date we have only preliminary results to support our two nontraditional alternative explanations of negative summation test performance following inhibitory training.
1. Two Attentional Variables
The rationale of Hearst (1972) and Rescorla (1969) in proposing that both retardation and negative summation tests be used to assess conditioned inhibition was that retardation could arise from a loss of attention to the target stimulus as well as from conditioned inhibition, and negative summation could arise from heightened attention to the target stimulus (distraction from the known excitor) as well as from conditioned inhibition. Consequently, passage of both tests must uniquely reflect conditioned inhibition because attention to the target stimulus could not simultaneously decrease and increase. This reasoning is compelling provided that “attention” with respect to the two tests is assumed to refer to the same variable. However, there may be two relatively independent attentional variables, one that refers to attention as it impacts acquisition and a second that refers to attention as it impacts immediate performance. We choose to call the former associability and the latter salience. (This suggestion was inspired by, but differs appreciably from, the attentional model of Pearce & Hall, 1980.) For example, stimuli that have been consistently reinforced have been found to have lost associability, but, as they elicit conditioned behavior, they likely have gained salience (Hall & Pearce, 1979; Kasprow, Schachtman, & Miller, 1985). A retardation test is a measure of readiness to learn new associations to a stimulus, whereas a summation test is a measure of how a stimulus influences immediate responding. Thus, conditioned inhibition training might increase salience of a stimulus because
the stimulus has become informative about the US, even if the information is that the stimulus announces a decrease in the likelihood of the US. Consequently, summation test performance with a “conditioned inhibitor” in part or whole might reflect an increase in the salience of the inhibitory stimulus that effectively reduces attention to the known excitor and, hence, reduces responding to the known excitor. Simultaneously, conditioned inhibition training might decrease associability of the stimulus owing to the CS having consistent consequences (at least with certain inhibitory training parameters; there are several other mechanisms that can contribute to retardation test performance following inhibitory training, see Section V,A).
2. Comparator Rule f o r Responding to Compounded Stimuli That Were Independently Trained Responding to two simultaneously presented, independently trained conditioned stimuli X and Y raises a question for the comparator hypothesis. The Rescorla-Wagner (1972) model of acquisition predicts with at least qualitative success that responding will reflect the summed associative strength of X and Y regardless of whether both X and Y are excitatory (positive associative strengths) or one is excitatory and one is inhibitory (negative associative strength). By denying the existence of negative associative strengths, the comparator hypothesis becomes obligated to provide an alternative summation rule that is at least as successful. In presenting the comparator hypothesis, we intentionally avoided making a statement concerning the mathematical representation of the putative comparison. We doubt that acquired behavior, which is profoundly influenced by a multitude of variables (and poorly understood variables at that), lends itself at this stage of our science to quantitative predictions regarding the detailed behavior of a single individual. Thus, we do not wish to see the comparator hypothesis rise or fall based on the accuracy of any quantitative prediction. Nevertheless, summation test performance may best be qualitatively understood through a mathematical rule for obtaining response potentials for compounded stimuli. Such a rule first requires a statement concerning the mathematical representation of the comparison process itself. We hazard the suggestion that conditioned responding reflects the difr ference between the associative strengths of the CS and the comparator stimulus, as opposed to a ratio of associative strengths such as proposed by Gibbon and Balsam (1981). 
Subtracting the associative strength of the comparator stimuli from that of the CS, excitatory responding would be expected to increase with increasing differences (with no responding below some difference near zero) and “inhibitory” behavioral effects would be
expected to increase with decreasing differences (with no inhibitory behavior above some difference near zero). The fact that the mathematical operations of subtraction and addition are associative (in the mathematical sense) makes moot the question of whether the sum of the associative strengths of X and Y is compared to the sum of the associative strengths of their respective comparators A and B, i.e., (X + Y) - (A + B), or the outcomes of the comparisons of X to A and Y to B are summed, i.e., (X - A) + (Y - B). The existing summation literature indicates that, in most cases, compounding two excitors yields more responding than to either alone, whereas compounding an excitor and a nominal inhibitor yields less responding than to the excitor alone (for a review, see Weiss, 1972). Both of these observations, as well as unpublished data from our laboratory suggesting that a compound of two nominal inhibitors yields enhanced inhibition relative to either inhibitor alone, are consistent with the proposed difference rule (as well as with the Rescorla-Wagner, 1972, model). The ultimate value of a comparator difference rule in explaining summation test performance remains to be determined.
C. SUPERCONDITIONING
Superconditioning, or supernormal conditioning as it has sometimes been called, refers to the enhanced responding to a CS that is observed when the CS is trained in the presence of a “conditioned inhibitor” relative to the CS being trained in the presence of the same additional stimulus but without prior inhibitory training with that stimulus (Rescorla, 1971b). Thus, the effect can be viewed as the opposite of blocking, which takes the form of impaired responding to a stimulus that was trained in the presence of a conditioned excitor.
The traditional explanation of superconditioning, mirroring that of blocking, is that the “inhibitor” produces a negative expectation of the US on the compound trials, making the actual delivery of the US more surprising than it would have been in the absence of the inhibitory stimulus, and consequently “superconditions” the added element (Rescorla & Wagner, 1972). As a result of the broad acceptance of this interpretation, superconditioning has sometimes been regarded as a third test for conditioned inhibition (e.g., Revusky & Garcia, 1970). Navarro-Guzman, Hallam, Matzel, and Miller (1987) have presented data supporting an alternative interpretation of superconditioning that makes no mention of negative associations or other traditional mechanisms of conditioned inhibition. As mentioned previously, superconditioning typically has been inferred when a neutral stimulus comes to elicit excitatory responding more rapidly when reinforced in compound with an inhibitor than in compound with a neutral stimulus. Navarro-Guzman et al. noted that all of the significant demonstrations of superconditioning had
omitted control subjects that were trained without the added cue being present. Thus, there is no evidence that the superior responding to the CS by the superconditioned subjects arose from enhanced performance of that group as opposed to impaired responding by the standard control subjects for which the added cue was neutral. Such impairment might result from overshadowing of the CS by the added element. Inhibitory training might have reduced the associability of the added cue, thereby attenuating the potential of that stimulus to overshadow the target CS in the “superconditioned” group. To test this hypothesis, Navarro-Guzman et al. (1987) conducted a superconditioning study with both the standard superconditioning control group (i.e., with the overshadowing stimulus neutral rather than inhibitory) and an overshadowing control group (i.e., with the overshadowing stimulus absent). In Phase 1, the superconditioning group (N-NL) and the overshadowing control (N-L) received explicitly unpaired inhibitory training with a white noise stimulus serving as the inhibitor, whereas the standard superconditioning control group (C-NL) received as a control treatment equivalent inhibitory training with a click train as the inhibitor. In Phase 2, Groups N-NL and C-NL received simultaneous noise-light presentations immediately followed by foot shock, whereas Group N-L received equivalent reinforcement of the light alone. Subsequent testing on the light found more robust conditioned responding in Group N-NL than in Group C-NL, i.e., traditional superconditioning, but the performance of Group N-NL was less than that of Group N-L (see Fig. 8). This outcome raises the possibility that empirical superconditioning is no more than reduced overshadowing.
Fig. 8. Mean latencies to complete 25 licks in the presence of the superconditioned CS (light; striped bars) and in the presence of the CS (noise; stippled bars) with which the light was compounded during excitatory conditioning with the US (foot shock). Groups N-NL and N-L initially received explicitly unpaired inhibitory training with the noise CS (N-) and shock US, while Group C-NL received explicitly unpaired inhibitory training with a click CS (C-) and shock US. Then Groups N-NL and C-NL received the noise and light compounded (NL) and immediately followed by shock. Group N-L received the light alone (L) followed by shock. Greater suppression to the light by Group N-NL than by Group C-NL indicates superconditioning. Suppression to the light by Group N-L indicates excitatory conditioning of the light without overshadowing by the noise. Brackets depict standard errors. From Navarro-Guzman et al. (1987).
In further experiments, Navarro-Guzman et al. (1987) found that a “conditioned inhibitor” that had been intermittently reinforced failed to supercondition the CS with which it was subsequently reinforced. Presumably, partial reinforcement prevented the inhibitor from losing its associability (Pearce, Kaye, & Hall, 1984), and consequently the inhibitor still was able to overshadow the added element. Moreover, mere CS preexposure to a stimulus, which is believed to reduce associability of the stimulus but not make it into an inhibitor (Rescorla, 1971a), provided that stimulus with the potential to supercondition the CS with which it was subsequently reinforced (see Fig. 9). Thus, convergent data suggest that empirical superconditioning can be better explained in terms of decreased overshadowing of the target CS than in terms of subzero expectations of the US arising from the presence of the conditioned inhibitor. Although this view of superconditioning does not invoke the comparator hypothesis, it does obviate the need to retain traditional views of conditioned inhibition in order to explain superconditioning.
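The reduced-overshadowing account can be illustrated with a toy simulation. The sketch below is an assumption-laden illustration, not the analysis of Navarro-Guzman et al.: it borrows a shared-error learning rule of the Rescorla-Wagner type simply because that rule produces overshadowing, and it starts every cue at zero strength so that no negative associative strength is involved. The function name `target_strength` and all parameter values are hypothetical. The only thing varied across conditions is the associability of the added cue.

```python
from typing import Optional


def target_strength(alpha_added: Optional[float], alpha_target: float = 0.3,
                    n_trials: int = 10, lam: float = 1.0) -> float:
    """Associative strength of the target CS after reinforced trials.

    Every cue present on a trial is updated by alpha * (lam - summed strength).
    alpha_added=None means that the target is trained without an added cue.
    """
    v_target, v_added = 0.0, 0.0
    for _ in range(n_trials):
        error = lam - (v_target + v_added)   # shared prediction error
        v_target += alpha_target * error
        if alpha_added is not None:
            v_added += alpha_added * error
    return v_target


alone = target_strength(None)          # no added cue (analogous to Group N-L)
low_assoc = target_strength(0.05)      # added cue of reduced associability (Group N-NL)
normal_assoc = target_strength(0.3)    # added cue of normal associability (Group C-NL)
```

Under these assumptions the target acquires most strength when trained alone, somewhat less when compounded with a cue of reduced associability, and least when compounded with a cue of normal associability. That ordering matches the ordinal pattern reported for Groups N-L, N-NL, and C-NL, although the sketch says nothing about the size of the effects.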
Fig. 9. Mean latencies to complete 25 licks in the presence of the "superconditioned" CS (light; striped bars) and in the presence of the CS (noise; stippled bars) with which the light was compounded during excitatory conditioning with the US (foot shock). Groups N-NL and N-L initially received nonreinforced exposures to the noise (N-) in the absence of any USs (i.e., CS-preexposure treatment), while Group C-NL received nonreinforced exposures to the clicks (C-). Then Groups N-NL and C-NL received the noise and light compounded (NL) and immediately followed by shock. Group N-L received the light alone (L) followed by shock. Greater suppression to the light by Group N-NL than by Group C-NL suggests an effect analogous to superconditioning. Suppression to the light by Group N-L indicates excitatory conditioning of the light without overshadowing by the noise. Brackets depict standard errors. From Navarro-Guzman et al. (1987).
D. CONDITIONED INHIBITION AND CONDITIONED EXCITATION ARE NOT MUTUALLY EXCLUSIVE
Many early students of learning (e.g., Pavlov, 1927; Hull, 1943; Konorski, 1967) believed that a CS could be simultaneously an excitor and an inhibitor. However, the majority view at the present time is that conditioned excitation and conditioned inhibition are opposite halves of the possible values that can be assumed by a single variable representing associative strength (Rescorla & Wagner, 1972). Specifically, positive values of this variable are assumed to correspond to excitation and negative values are assumed to correspond to inhibition. This assumption leads directly to the conclusion that excitation and inhibition are mutually exclusive. Although such a conclusion is intuitively appealing in its simplicity, there are almost no data that support this conclusion, largely because investigators have not tested the same stimulus for both excitation and inhibition following training that was likely to produce both effects. In terms of a contingency space (see Fig. 10), conditioned excitation would be expected to increase as the point representing the current training conditions moved toward the upper left corner and conditioned inhibition would be expected to increase as the point moved toward the lower right corner. Thus, excitation and inhibition would be most apt to coexist following training that corresponds to points relatively near the putative "line of indifference" (see Fig. 1). Matzel et al. (1988a) trained rats in a lick suppression situation with P(US|CS) = .68 and P(US|no CS) = .33. Following this conditioning phase, they found that the CS not only passed the retardation and summation tests for conditioned inhibition, but also passed a direct test for
conditioned excitation. Thus, the potentials for excitatory responding and inhibitory responding were seen to coexist in a single stimulus. The comparator hypothesis, unlike Rescorla's (1968) contingency theory and the Rescorla-Wagner (1972) model, allows simultaneous excitation and inhibition because it only predicts that increases in P(US|CS) and decreases in P(US|no CS) will increase behavioral excitation and decrease behavioral inhibition and that decreases in P(US|CS) and increases in P(US|no CS) will decrease behavioral excitation and increase behavioral inhibition. In retrospect, the observation of simultaneous potentials for excitation and inhibition should not have been surprising because the tests for excitation and inhibition are not opposites of one another. They measure three different qualities of a CS (direct conditioned responding, retardation, and summation) that, without a strong theoretical bias, one would not expect to be perfectly correlated. Figure 10 depicts, on a hypothetical contingency space, the zone in which excitation might be expected and the zone in which inhibition might be expected. However, it oversimplifies things by assuming that the boundaries for obtaining retardation and negative summation are the same. The actual boundaries in a real situation will, of course, depend upon a number of variables that are not represented in a contingency space, such as arousal level and associative strength of the known excitor used on the summation test (Mackintosh & Cotton, 1985).
Fig. 10. Theoretical contingency space with zones marked in which conditioned excitatory and inhibitory behaviors might appear based on research indicating that excitation and inhibition are not mutually exclusive. One axis is labeled "Excitatory associative strength of comparator stimuli based on P(US|no CS)." See text for elaboration. From Matzel et al. (1988a). Copyright 1988 by Academic Press.
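The purely directional character of these predictions can be made explicit in a short sketch. The function name `predicted_shift` and its return strings are hypothetical. The treatment of mixed changes (both probabilities moving in the same direction) as yielding no ordinal prediction is an assumption of the example, since the text above addresses only the two cases in which the changes work together.

```python
def predicted_shift(delta_p_us_given_cs: float,
                    delta_p_us_given_no_cs: float) -> str:
    """Qualitative prediction for a change in the training contingency.

    Arguments are signed changes in P(US|CS) and P(US|no CS).
    """
    favors_excitation = delta_p_us_given_cs >= 0 and delta_p_us_given_no_cs <= 0
    favors_inhibition = delta_p_us_given_cs <= 0 and delta_p_us_given_no_cs >= 0
    if favors_excitation and not favors_inhibition:
        return "more excitation, less inhibition"
    if favors_inhibition and not favors_excitation:
        return "less excitation, more inhibition"
    # Either nothing changed or the two changes oppose one another (assumed case).
    return "no ordinal prediction"
```

Nothing in the function forces excitation and inhibition to be mutually exclusive at any single point in the contingency space; it states only how each is expected to move when the contingency changes.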
E. EXTINCTION OF CONDITIONED INHIBITION
Although the Rescorla-Wagner (1972) model predicts that operational extinction of a conditioned inhibitor, i.e., CS-alone presentations, should reduce inhibition, numerous studies have failed to observe such an effect (e.g., Zimmer-Hart & Rescorla, 1974). However, the comparator hypothesis, in assuming that only excitatory associations exist, predicts that conditioned inhibition will increase with both increasing associative strength of the training context and decreasing associative strength of the CS. Consequently, CS-alone presentations following inhibitory training with a CS should not merely fail to decrease conditioned inhibition, but should increase conditioned inhibition. Such an effect would depend upon the comparator stimuli not being appreciably extinguished during operational extinction of the CS, and the CS having some associative strength to lose following inhibitory training. To test this prediction of the comparator hypothesis, Miller and Schachtman (1985) gave rats negative contingency conditioned inhibition training with 33% reinforcement of the CS and 68% reinforcement of the absence of the CS (based on intervals lacking the CS being divided into temporal units equal in duration to the CS). This resulted in conditioned inhibition as measured by both retardation and summation tests. Consistent with the prediction, subsequent operational extinction of the CS resulted in an increase in conditioned inhibition as seen in retardation and summation tests. Other researchers have obtained similar effects (e.g., DeVito & Fowler, 1986; Holland & Gory, 1986; Pearce, Nicholas, & Dickinson, 1982).
Zimmer-Hart and Rescorla (1974) presumably did not obtain an increase in inhibition as a result of CS-alone presentations because they initially gave explicitly unpaired CS and US presentations during inhibitory training and their parameters were such that the CS accrued little or no second-order excitatory value from CS-context associations. Thus, their inhibitor had no associative strength to lose. To determine the degree to which the increase in inhibition that we observed depended upon partial reinforcement of the CS during inhibitory training, we operationally extinguished inhibitory CSs that were either partially reinforced or explicitly unpaired with the US during inhibitory training (unpublished research). The partially reinforced CS yielded the larger increase in inhibition. However, the explicitly unpaired CS also showed small increases in inhibition which we believe arose from the CS losing second-order excitation that was mediated by CS-context and context-US associations.
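The logic of this prediction can be sketched numerically. The example below is a minimal illustration under stated assumptions: a difference rule is used for the comparison, each CS-alone presentation is assumed to remove a fixed fraction of the CS's remaining excitatory strength, the comparator is assumed to be unaffected by the extinction treatment, and all starting strengths are arbitrary. The function name `difference_after_extinction` is hypothetical.

```python
def difference_after_extinction(v_cs: float, v_comparator: float,
                                n_cs_alone: int, decay: float = 0.2) -> float:
    """CS-minus-comparator difference after n CS-alone presentations.

    Each nonreinforced presentation removes a fixed fraction (decay) of the
    CS's remaining excitatory strength; the comparator is left untouched.
    """
    for _ in range(n_cs_alone):
        v_cs -= decay * v_cs
    return v_cs - v_comparator


trials = (0, 5, 10)
# Inhibitor that was partially reinforced, so it has some strength to lose.
partially_reinforced = [difference_after_extinction(0.3, 0.6, n) for n in trials]
# Inhibitor assumed to have acquired no excitatory strength at all.
explicitly_unpaired = [difference_after_extinction(0.0, 0.6, n) for n in trials]
```

In this sketch the difference for the partially reinforced CS grows more negative as CS-alone presentations accumulate, which corresponds to enhanced inhibition. The CS with no strength to lose shows no change, which is the situation attributed above to Zimmer-Hart and Rescorla (1974). The small second-order excitation discussed for explicitly unpaired CSs is not modeled.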
VI. Training a CS in Multiple Contexts
The preceding discussion has assumed that one and only one context is used during the training of a single CS. However, there is, in principle, nothing that prevents training in more than one context. Based on a review of the literature and our own preliminary data, the effective comparator stimulus in such cases appears to be a hybrid of the two (or more) contexts in which training of the CS has occurred. The relative weight given to one training context as opposed to another seems to be positively correlated with the percentage of CS training trials given in the context and the relative recency of these training trials. The recency gradient, which has yet to be systematically investigated, is important in understanding what the effective comparator context of a CS is. One practical implication of the possible compounding of comparator contexts concerns test trials. In order to investigate comparator effects without confounding the response elicitation potential of the CS by test trial summation with the response elicitation potential of the comparator stimuli (often the training context), testing should ordinarily be performed in a neutral context. Even then, only the first test trial could possibly serve as a pure measure of comparator processes, since, as far as the subject is concerned, a test trial is functionally another training trial. Consequently, after even one test trial, the test context will be incorporated into the comparator term. Although the test context following a single test trial may have been the site of a very small percentage of the total number of CS trials, this trial will probably be disproportionately weighted owing to its relative recency.
VII. The Temporal Window for Comparisons
The comparator hypothesis states that a CS is compared at the time of testing to the internal representation of the stimuli that were present at the time of training.
The question arises as to whether the “time of training” is only that during which the CS and US are present, or does it extend forward and backward in time (and, if the latter, how far forward and backward from each CS training trial). In the limiting case, contributions to the comparator could come from events occurring at any time during the training session (or even temporally proximal events outside the training session). Furthermore, the potential of a moment separated in time from a CS presentation to contribute to the comparator term need not be all or none; it more reasonably might be expected to be represented by a gradient that decreases with the temporal distance from the immediately preceding and subsequent CS presentations. The slope of such a
gradient around a CS will not necessarily be symmetrical between the time before and the time after CS presentation, and its specific shape will likely depend upon the parameters of the situation. Although there is a need for more research before all of these questions can be answered, a considerable body of data already suggests that the comparator term arises from some sort of local context (i.e., cues temporally and spatially proximal to the CS rather than the entire CS training session). For example, using pigeons in an autoshaping preparation, Reilly and Schachtman (1987) found that those parts of an intertrial interval that were filled with an irrelevant stimulus did not contribute appreciably to the comparator term, presumably because the larger context (including the irrelevant stimulus) was discriminated from the local context of the CS. Furthermore, Farley (1980), also using autoshaping with pigeons, found that a local context that signals a decrease in reinforcement will support responding to a CS even when the reinforcement rate for the context over the entire training session is greater than that for the CS. Both of these studies employed discrete stimuli to temporally differentiate local and distal contexts. Consequently, the subjects may have discriminated between what they perceived as different contexts rather than different temporal separations within a single context. However, Kleiman and Fowler (1984) found that the inhibitory potential of a stimulus following explicitly unpaired CS and US presentations (+/X-) increases in direct proportion to the temporal proximity of the CS to the US, provided that very short intervals were avoided (i.e., the CS just prior to the US which would favor excitatory responding to the CS). Collectively, these experiments make a case for the role of local context per se.
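One way to picture such a gradient is as a weighting function over time relative to the CS. The sketch below is purely illustrative: the exponential form, the two time constants, and the name `comparator_weight` are assumptions made for the example, since the discussion above specifies only that the gradient declines with temporal distance and need not be symmetrical.

```python
import math


def comparator_weight(seconds_from_cs: float, tau_before: float = 30.0,
                      tau_after: float = 60.0) -> float:
    """Relative contribution of a moment's cues to the comparator term.

    Negative values are moments before the CS, positive values moments after
    it. Separate (assumed) time constants allow an asymmetrical gradient.
    """
    tau = tau_before if seconds_from_cs < 0 else tau_after
    return math.exp(-abs(seconds_from_cs) / tau)
```

For instance, `comparator_weight(-60.0)` is smaller than `comparator_weight(-10.0)`, and with the assumed constants a moment 30 s before the CS is weighted less than a moment 30 s after it. Which side of the CS actually carries more weight, if either, is an empirical question that the sketch does not settle.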
Despite such evidence supportive of the relative importance of local context in modulating conditioned responding, distal context might be expected to contribute via generalization to the comparator term when the distal context is highly similar to the local context (e.g., Baker, 1977). Since comparator stimuli such as the CS training context are ordinarily not highly localized in time, studies of this issue would best use punctate comparator stimuli, e.g., Pavlovian conditioned inhibition (A+/AX-) rather than negative contingency conditioned inhibition (+/X-).
VIII. Comparator Stimuli for Comparator Stimuli
In all of the previously cited experiments from our laboratory, we measured direct response elicitation by the putative comparator stimuli, whether punctate cues (plus background cues) or background cues alone, and assumed that this behavior reflected the associative strength of the comparator stimuli. However, just as responding to a target CS appears
to be modulated by the associative status of its comparator stimuli, so too should responding to a comparator stimulus be modulated by the associative status of its comparator stimuli. In a typical animal conditioning experiment, the home cage which the animal occupies before and after the session is apt to contribute to the comparator term for the conditioning context. CSs embedded in the conditioning session might also contribute to the comparator term. However, strong reciprocal contributions by a CS to the comparator term for the CS’s comparator probably would be rare, owing to the presumably nonsymmetrical temporal relationship between the CS and its comparator stimuli. We have recently gathered preliminary data demonstrating the potential of a surrounding context to modulate responding to an embedded context. A more interesting and yet unanswered question is whether variation in the associative status of such a surrounding context would influence responding to a CS trained in the embedded context. On the one hand, the gradient for local context might simply reach out far enough to allow the surrounding context to contribute to the comparator term. On the other hand, changing the response potential of the embedded context without altering its associative strength might directly influence responding to the CS. This is tantamount to asking whether the comparator term represents the response potential of the comparator stimuli or, as we have previously proposed, the associative strength of the comparator stimuli. If the former possibility proves correct, it would call for a modification of the comparator hypothesis.
IX. Relationship of the Comparator Hypothesis to Other Models
A. COMPARISON TO THE GIBBON-BALSAM MODEL
The comparator hypothesis has many features in common with the Gibbon-Balsam model (1981; also see Balsam, 1984; Jenkins, Barnes, & Barrera, 1981), which is based on scalar timing theory (Gibbon, 1977). Basically, the Gibbon-Balsam model states that conditioned responding reflects the ratio (comparison) of US waiting time in the presence of the context to US waiting time in the presence of the CS. Despite the obvious similarities, the comparator hypothesis can be distinguished from the Gibbon-Balsam model on a number of levels. First, in the Gibbon and Balsam formulation, the effective comparator stimulus is always the context, whereas the comparator hypothesis assumes that all cues, punctate or otherwise, present during or proximal to CS training will contribute to the comparator term. Second, Gibbon and Balsam are silent concerning whether the context serving in the comparison is the training context or the test context, while the comparator hypothesis clearly states that it is
the training context that participates in the comparison. (Several investigators have mistakenly concluded that the comparator context in the Gibbon-Balsam model is that of CS testing, e.g., Ayres, Bombace, Shurtleff, & Vigorito, 1985; Durlach, 1983; Grau & Rescorla, 1984.) Third, Gibbon and Balsam claim that acquisition with respect to a CS and a context are independent, which leads to the prediction that a CS will not overshadow acquisition to a context. (The data concerning this issue are mixed, e.g., Durlach, 1983; Gibbon & Balsam, 1981; Grau & Rescorla, 1984; Jenkins et al., 1981.) In contrast, the comparator hypothesis is exclusively a response rule (but see Section IX,C) and, consequently, does not take a position concerning overshadowing of contexts by CSs. Fourth, Gibbon and Balsam would not predict enhanced excitatory responding to a CS following extinction of its comparator stimuli because they assume that waiting times are recalculated only following a US presentation. The comparator hypothesis makes the more traditional assumption that associative strengths of the CS and its comparator stimuli increase with reinforced presentations of these stimuli and decrease with nonreinforced presentations of these stimuli. Fifth, the Gibbon-Balsam model does not offer an explanation of behavior indicative of conditioned inhibition, whereas the comparator hypothesis (with the assistance of various established phenomena that are outside the domain of the comparator hypothesis, see Section V) can explain inhibitory behavior without recourse to any of the traditional theoretical mechanisms of inhibition. Finally, Gibbon and Balsam (1981) espouse a ratio rule for their comparison, while we are loath to subscribe to a mathematical formulation for the comparison until we feel that sufficient data are available to warrant such a precise formulation. However, forced to choose, we are currently inclined toward a difference rule (see Section V,B,2).
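The practical distance between a ratio rule and a difference rule can be examined with a small sketch. Both functions below operate on hypothetical associative strengths chosen for the example. The ratio function is a stand-in for a ratio-type comparison in general and is not an implementation of the waiting-time formulation of Gibbon and Balsam (1981).

```python
def difference_rule(v_cs: float, v_comparator: float) -> float:
    """Comparison expressed as CS strength minus comparator strength."""
    return v_cs - v_comparator


def ratio_rule(v_cs: float, v_comparator: float) -> float:
    """Comparison expressed as CS strength divided by comparator strength."""
    return v_cs / v_comparator


# Same CS, weaker versus stronger comparator (assumed values).
weak_comparator = (0.8, 0.2)
strong_comparator = (0.8, 0.6)

# Both CS and comparator differ between the two cases (assumed values).
pair_1 = (0.2, 0.1)
pair_2 = (0.9, 0.6)
```

When only the comparator is varied, as in the first two cases, both rules order the cases identically, so a greater-than/less-than prediction cannot tell them apart. When both terms vary, as in `pair_1` and `pair_2`, the two rules can order the cases in opposite directions, and it is in designs of that kind that the choice of rule would matter.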
We are painfully aware that by refusing to stand by a specific mathematical relationship we lose predictive precision, but we would rather be vague than be wrong. By not offering a mathematical relationship, the comparator hypothesis is reduced to being merely a qualitative statement. However, the greater-than/less-than predictions afforded by a qualitative model are in reality no different than the predictions that are actually tested and reported in the journals (as opposed to the quantitative predictions that are possible, but rarely made) based on quantitative models of learning such as that of Rescorla and Wagner (1972).
B. IMPLICATIONS FOR THE RESCORLA-WAGNER MODEL
The Rescorla-Wagner (1972) model is primarily a theory of acquisition, whereas the comparator hypothesis is concerned only with the translation of associations into behavior. Consequently, the two models in principle could complement each other. However, the previously mentioned effects
of posttraining deflation of comparator stimuli on phenomena such as conditioned inhibition, overshadowing, and the US-preexposure effect suggest that many phenomena commonly explained by the Rescorla-Wagner model are better accounted for by the comparator hypothesis. These shifts in explanation of select phenomena do not indicate that the Rescorla-Wagner model is incorrect in its basic thrust, but they do reduce the overall explanatory value of the model. The greatest undermining of the Rescorla-Wagner model by the comparator hypothesis comes from the data indicating that inhibitory phenomena can be explained without recourse to negative associations. The possible occurrence of negative associations is a central tenet of the Rescorla-Wagner model. If they do not exist, the Rescorla-Wagner model would require major revision.
C. IMPLICATIONS FOR LEARNING THEORIES IN GENERAL
In the narrow theoretical sense, learning theory is concerned with the acquisition of associations. Most theories of learning, particularly those in the Pavlovian tradition, place little or no importance on response rules. Instead, they have attempted to explain most performance deficits in terms of acquisition failures. This explanatory burden has resulted in relatively complex theories of learning. To the degree that some of these phenomena are more readily explained in terms of response rules, they allow for less complicated models of acquisition. This shift in the locus of explanation does not necessarily represent a net simplification of theory, as the response rule that is the comparator hypothesis is more complex than the response rules that are typically attached to learning theories that emphasize acquisition. However, the data reviewed here, independent of the validity of the comparator hypothesis in particular, demand a more elaborate response rule than is traditionally offered. If we are forced toward a more complex response rule regardless of the complexity of the accompanying acquisition rules, we should at least take advantage of such response rules by adopting the simpler models of acquisition that would then be allowed. In the limiting case, contiguity theory, as ancient a notion as it is, might prove to be essentially correct in stating a sufficient condition for the formation of associations. Such a view is receiving increasing support from cellular investigations of learning (e.g., Farley & Alkon, 1987; Hawkins & Kandel, 1984).
X. Postconditioning Inflation of Comparator Stimuli
A. ASSESSMENT OF THE 2 X 2 MATRIX
The comparator hypothesis makes clear predictions concerning how decreases and increases in the associative strength of a comparator stimulus following the conclusion of CS training will influence responding to
the CS. (Changes in the associative strength of comparator stimuli in either direction before or during CS training have the effects predicted by the comparator hypothesis; however, the Rescorla-Wagner (1972) model and most other current models of learning are equally compatible with these observations.) In the case of excitatory measures, decreases in the associative strength of the comparator stimulus (extinction) following CS training are expected to enhance responding to the CS, and increases in comparator strength are expected to attenuate responding to the CS. In the case of behavior indicative of inhibition, posttraining decreases in the associative strength of the comparator stimuli are expected to attenuate manifest inhibition and increases are expected to enhance manifest inhibition (see Table 1). Research in our laboratory has consistently found that posttraining extinction of comparator stimuli attenuates manifest conditioned inhibition as seen on both retardation and negative summation tests (see Section II,B). Additionally, we have observed enhancement of excitatory responding following posttraining extinction of comparator stimuli (e.g., reductions in overshadowing and the US-preexposure effect, see Section IV). While these latter effects of posttraining extinction of comparator stimuli upon excitatory responding have been reliable, they have not proven as ubiquitous as are the effects of posttraining extinction of comparator stimuli in attenuating inhibition. Why some situations exist that fail to yield the typically observed enhancement of excitatory responding is not clear to us at this time. We have not been able as yet to see any clear pattern to the occasional failures. Nevertheless, enhanced responding is the more usual outcome.
TABLE I
THE 2 X 2 MATRIX: PREDICTED AND OBSERVED EFFECTS OF CHANGING ASSOCIATIVE STRENGTH OF COMPARATOR STIMULI FOLLOWING CS TRAINING
Associative strength of                          Response to CS
comparator stimuli for the CS     Excitation                       Inhibition
Decrease                          Predict enhancement              Predict attenuation
                                  Observe enhancement (usually)    Observe attenuation
Increase                          Predict attenuation              Predict enhancement
                                  Observe no effect                Observe no effect
In contrast to the general support found for the comparator hypothesis following posttraining extinction of the comparator stimuli, posttraining pairings of the comparator stimuli and US (associative inflation) have proven almost uniformly unsuccessful across numerous unpublished studies performed in our laboratory (see also Ayres & Benedict, 1973; Kaplan & Hearst, 1985). Although these experiments have had null results, their variety and number begin to tell a story. Moreover, in each instance, tests of direct responding to the comparator stimuli indicate that these stimuli did gain associative strength as a result of the inflation treatment. Possibly, posttraining increases in the associative strength of comparator stimuli do have effects, but these effects are masked by opposing factors. For example, in the experiments in which these failures have been obtained, the comparator stimuli (usually the training context) have typically been made, at most, only mildly excitatory before or during CS training in order to provide the opportunity to increase the strength of the comparator stimuli following CS training. However, this may have resulted in the subject paying less attention to the context during CS training than would have occurred had the context been more excitatory during CS training, with the result that the CS-context association would be weak. Subsequent associative inflation of the context may have increased the context-US associative strength but not have been evident in responding to the CS because of the weak CS-context association. While we continue to investigate the possibility that the response potential of a CS can be affected by posttraining increases in the associative strength of its comparator stimulus, we suspect that such effects may not exist.
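The contents of Table I can be restated as two small lookup tables, which makes the asymmetry easy to see. The sketch below simply transcribes the table; the dictionary layout and the name `disconfirmed_cells` are choices made for this example, and the qualifier "usually" attached to one observation is kept only as a comment.

```python
# Keys are (change in comparator strength after CS training, type of behavior).
PREDICTED = {
    ("decrease", "excitation"): "enhancement",
    ("decrease", "inhibition"): "attenuation",
    ("increase", "excitation"): "attenuation",
    ("increase", "inhibition"): "enhancement",
}
OBSERVED = {
    ("decrease", "excitation"): "enhancement",  # reported as "usually"
    ("decrease", "inhibition"): "attenuation",
    ("increase", "excitation"): "no effect",
    ("increase", "inhibition"): "no effect",
}


def disconfirmed_cells(predicted: dict, observed: dict) -> list:
    """Return the cells in which the observation departs from the prediction."""
    return sorted(cell for cell in predicted if predicted[cell] != observed[cell])


mismatches = disconfirmed_cells(PREDICTED, OBSERVED)
```

Both mismatching cells fall in the row for posttraining increases in comparator strength, which is the asymmetry taken up next.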
B. IMPLICATIONS OF THE LACK OF EFFECT OF POSTTRAINING INCREASES IN THE ASSOCIATIVE STRENGTH OF COMPARATOR STIMULI
The apparent lack of effect of increasing the associative value of the comparator following CS training is an embarrassment to the comparator hypothesis. Assuming that such effects are not being masked, there are three courses of action available. First, the hypothesis can be abandoned. However, this would seem ill advised given the numerous successes of the hypothesis in suggesting illuminating experiments (e.g., extinction of comparators following inhibitory training, overshadowing, and US-preexposure retardation pairing trials) and explaining previously observed phenomena that were otherwise inexplicable (e.g., the effects of operational extinction of a conditioned inhibitor). Second, the comparator hypothesis could be modified to accommodate the lack of effect of posttraining increases in the associative strength of comparators. As a highly speculative example, the asymmetry between posttraining decreases and increases in associative status of comparators
could reflect a genetic predisposition arising from foraging patterns in the species’ natural habitat. In the field, rats ordinarily search for food and deplete a source once it is found; they do not ordinarily stay in a given location waiting for more food to arrive. As this example suggests, the major drawback to applying such bandaids to the comparator hypothesis is that the modifications tend to be post hoc. On the other hand, if they generate predictions that can be confirmed in the laboratory, their post hoc nature would be a moot issue. In the present example, this would involve examining the effects of posttraining increases in the associative strength of comparators across several species as a function of foraging strategy. Third, a strategy could be adopted of acknowledging the comparator hypothesis’ failure and still using the hypothesis to generate predictions and, when viable, to explain observations. Although such an approach may seem inconsistent, it just might be the wisest course to take until some new model is formulated that is able to handle the existing data and suggest interesting new experiments. Even though such a strategy is contrary to the popular view of science, it is, in fact, how science proceeds. For example, the Rescorla-Wagner (1972) model has been disconfirmed by at least two dozen phenomena (e.g., failure of inhibitors to lose inhibitory value when they are presented alone; see Section V,E), but researchers still use it when it fits their data. The reason for this is that the model continues to suggest interesting experiments and is elegantly simple. Moreover, no distinctly better model of acquisition has been proposed to date (although the comparator hypothesis suggests that contiguity theory might be closer to truth than has been generally believed in recent years; see Section IX,C).
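The disconfirmed prediction mentioned above can be made concrete with a minimal sketch of the Rescorla-Wagner update, in which the change in associative strength on a trial is proportional to the difference between the asymptote supported by the US and the summed strength of the stimuli present. The function name, the single shared learning-rate value, and the starting strength of the inhibitor are illustrative assumptions, not values taken from any experiment.

```python
def rescorla_wagner_trial(strengths, present, lam, alpha=0.3, beta=1.0):
    """Apply one Rescorla-Wagner trial and return the updated strengths.

    strengths: dict mapping stimulus name -> associative strength V
    present:   stimuli presented on this trial
    lam:       asymptote supported by the US (0.0 when the US is omitted)
    alpha, beta: learning-rate parameters (ASSUMPTION: one shared alpha
                 for every stimulus, chosen arbitrarily for illustration)
    """
    total = sum(strengths[s] for s in present)  # summed V of present stimuli
    error = lam - total                         # discrepancy term (lambda - sum V)
    updated = dict(strengths)
    for s in present:
        updated[s] = strengths[s] + alpha * beta * error
    return updated


# Hypothetical starting point: X is a conditioned inhibitor (negative V).
v = {"X": -0.5}
for _ in range(20):
    # X presented alone, with no US, so lambda is zero.
    v = rescorla_wagner_trial(v, present=["X"], lam=0.0)

# Under the model, V for X drifts toward zero, i.e., inhibition is lost.
print(round(v["X"], 4))
```

Under these assumptions the model drives the inhibitor's strength toward zero when it is presented alone, which is the prediction that the text describes as failing empirically.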
The lack of effect of posttraining increases in associative strength of comparators might be viewed as evidence that the putative comparison takes place at the time of acquisition. However, the frequently observed effects of posttraining decreases in the associative value of comparators are inconsistent with such a conclusion. This latter observation can be understood only in terms of the putative comparison occurring at the time of testing. In turn, this conclusion suggests that all comparisons occur at the time of testing because a subject does not “know” at the time of training if the associative status of the comparator is going to be increased, decreased, or left unchanged. The implication is that subjects compare the associative strength of the CS to the current associative strength of the comparator at the time of testing if the comparator has been associatively devalued (i.e., a recency effect) and to the associative strength of the comparator at the time of CS training if the comparator has been associatively inflated (i.e., a primacy effect). As direct tests of the comparator’s associative strength indicate that associative inflation was successful, while comparator processes reflect the old associative value of the comparator, the subject must retain simultaneously at least two comparator-US associations. This suggestion is contrary to the assumption of associative path independence that has been a hallmark of learning theory since its inception. Acknowledgement of the view that subjects can retain information about past associative states alongside contrary information about current associative states would better correspond to reality than do the present path-independent models (see Miller & Matzel, 1987), but would possibly complicate our learning theories to a degree that is currently unacceptable.
XI. Generalization to Instrumental Behavior
The examples of the putative comparator process described to this point have all involved Pavlovian conditioning. However, there is considerable reason to believe that the basic tenets of the comparator hypothesis hold as readily in instrumental situations. Empirically, the vigor of instrumental responding appears to reflect the relative strength as opposed to the absolute strength of the supporting association (e.g., behavioral and simultaneous contrast; Mackintosh, 1974). Furthermore, decreases in the value of the comparator-reinforcer association following training have been found to augment responding. For example, Dickinson and Charnock (1985) reported that thirsty rats responding operantly for liquid reinforcement will decrease responding when free liquid is introduced and that responding is restored by extinction of the training context following termination of instrumental training (also see Innes & Mills, 1985). Although the former effect could simply reflect reduced thirst, the latter effect lends itself to explanation in terms of comparator processes. In this example, responding with reinforcement (earned) was embedded in a context of not responding with reinforcement (free).
Put in the more traditional terms of contrast studies, the net value of responding (i.e., rate of earned reinforcement times quality of reinforcement) appears to be compared to the net value of not responding (i.e., rate of free reinforcement times quality of reinforcement). In many choice situations, subjects have to choose between different schedules of reinforcement and/or reinforcers that differ in quality or quantity. If the response potential of each choice is modulated by the incentive value of all alternative choices available during training, the comparator hypothesis could begin to address contrast effects in general. The applicability of the comparator hypothesis to contrast situations depends upon the degree to which contrast effects hinge upon options that were available during training as opposed to options available during testing. The traditional use of a single operant setting for such research has tended to obscure this distinction. Although a context shift between training with multiple behavioral options available and testing with a single option available would likely result in attenuated performance owing to generalization decrement, such attenuation would presumably be uniform and contrast effects arising from comparator processes should transfer. Of course, such effects would have to be assessed rapidly because testing would constitute further training. Without going into detail, we also should note that the comparator hypothesis could readily be converted into the terminology of economic modeling and has much in common with optimal foraging theory (Krebs & McCleery, 1984; Zeiler, 1987).
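The comparison described above, between the net value of responding and the net value of not responding, can be sketched in a few lines. This is an illustration only: the text says the two net values are compared but does not specify how, so the use of a simple difference is an assumption, as are the function names and the numbers.

```python
def net_value(rate, quality):
    """Net value of an option: reinforcement rate times reinforcer quality."""
    return rate * quality


def relative_value_of_responding(earned_rate, free_rate, quality):
    """Net value of responding minus net value of not responding.

    ASSUMPTION: the comparison is modeled as a simple difference; the
    source states only that the two net values are compared.
    """
    return net_value(earned_rate, quality) - net_value(free_rate, quality)


# Hypothetical rates (reinforcers per minute), for illustration only.
baseline = relative_value_of_responding(earned_rate=2.0, free_rate=0.0, quality=1.0)
with_free = relative_value_of_responding(earned_rate=2.0, free_rate=1.5, quality=1.0)

# Adding free reinforcement lowers the relative value of responding
# even though the earned rate itself is unchanged.
print(baseline, with_free)
```

On this reading, the absolute value of responding is the same in both cases; only its standing relative to the comparator changes.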
XII. Appraisal of the Comparator Hypothesis
The comparator hypothesis is a qualitative response rule that in principle can complement any model of acquisition. However, as it appears to explain a number of phenomena that various learning theories try to explain in terms of acquisition deficits, the comparator hypothesis better complements some learning theories than others. That the comparator hypothesis is qualitative (but see Section V,B,2) and does not directly address acquisition are intentionally imposed theoretical limitations. At the empirical level, the seeming lack of effect of posttraining increases in the associative value of comparators appears to be a failure of the comparator hypothesis (see Section X); however, this apparent failure pales in contrast to the successes of the hypothesis. What are the merits of the comparator hypothesis? First, it already has served and continues to serve as a useful heuristic device. Experiments suggested by the model have shed new light on the concept of conditioned inhibition (see Section V) and have provided a better understanding of phenomena such as the US-preexposure effect (see Section IV,A), overshadowing (see Section IV,B), and superconditioning (see Section V,C). Second, the comparator hypothesis highlights the fact that behavior depends far more on relative associative strength than on absolute associative strength. Although the specifics of the comparator hypothesis are probably incorrect, subsequent response rules will likely retain the notion of relativity in some form. Third, and perhaps most important, the comparator hypothesis draws our attention to postacquisition processing of information. Largely for historical reasons (see Section I), researchers have tended to overemphasize variability in acquisition and neglect the effects of differential processing subsequent to acquisition. If the comparator hypothesis stimulates an increased interest in postacquisition information processing, we will be well served.
ACKNOWLEDGMENTS
This contribution was prepared with the support of NSF Grant BNS 86-00755. Thanks are due David G. Payne and Stanley Scobie for their comments on the model, and Nick Grahame, Steve Hallam, Fran Held, and Joan Wessely for their critical reading of an early draft of the manuscript.
REFERENCES
Anderson, J. R. (1983). The architecture of cognition. Cambridge, MA: Harvard University Press.
Ayres, J. J. B., & Benedict, J. O. (1973). US-alone presentations as an extinction procedure. Animal Learning and Behavior, 1, 5-8.
Ayres, J. J. B., Bombace, J. C., Shurtleff, D., & Vigorito, M. (1985). Conditioned suppression tests of the context-blocking hypothesis: Testing in the absence of the preconditioning context. Journal of Experimental Psychology: Animal Behavior Processes, 11, 1-14.
Baker, A. G. (1977). Conditioned inhibition arising from a between-sessions negative correlation. Journal of Experimental Psychology: Animal Behavior Processes, 3, 144-155.
Balaz, M. A., Gutsin, P., Cacheiro, H., & Miller, R. R. (1982). Blocking as a retrieval failure: Reactivation of associations to a blocked stimulus. Quarterly Journal of Experimental Psychology, 34B, 99-113.
Ballard, P. B. (1913). Oblivescence and reminiscence. British Journal of Psychology Monograph Supplements, 1, 1-82.
Balsam, P. D. (1984). Relative time in trace conditioning. Annals of the New York Academy of Sciences, 423, 211-227.
Bartlett, F. C. (1932). Remembering: A study in experimental and social psychology. Cambridge: Cambridge University Press.
Brogden, W. J. (1939). Sensory pre-conditioning. American Journal of Psychology, 52, 46-55.
DeVito, P. L., & Fowler, H. (1986). Effect of contingency violations on the extinction of a conditioned fear inhibitor and a conditioned fear excitor. Journal of Experimental Psychology: Animal Behavior Processes, 12, 99-115.
Dickinson, A., & Charnock, D. J. (1985). Contingency effects with maintained instrumental reinforcement. Quarterly Journal of Experimental Psychology, 37B, 397-416.
Durlach, P. J. (1983). The effect of signaling intertrial USs in autoshaping. Journal of Experimental Psychology: Animal Behavior Processes, 9, 374-389.
Egger, M. D., & Miller, N. E. (1962). Secondary reinforcement in rats as a function of information value and reliability of the stimulus. Journal of Experimental Psychology, 64, 97-104.
Farley, J. (1980). Automaintenance, contrast, and contingencies: Effects of local vs. overall, and prior vs. impending reinforcement context. Learning and Motivation, 11, 19-48.
Farley, J., & Alkon, D. (1987). In vitro associative conditioning of Hermissenda: Cumulative depolarization of Type B photoreceptors and short-term associative behavioral changes. Journal of Neurophysiology, 57, 1639-1668.
Gibbon, J. (1977). Scalar expectancy theory and Weber's law in animal timing. Psychological Review, 84, 279-325.
Gibbon, J., & Balsam, P. (1981). Spreading association in time. In C. M. Locurto, H. S. Terrace, & J. Gibbon (Eds.), Autoshaping and conditioning theory (pp. 219-253). New York: Academic Press.
Grau, J. W., & Rescorla, R. A. (1984). Role of context in autoshaping. Journal of Experimental Psychology: Animal Behavior Processes, 10, 324-332.
Hall, G., & Pearce, J. M. (1979). Latent inhibition of a CS during CS-US pairings. Journal of Experimental Psychology: Animal Behavior Processes, 5, 31-42.
Hallam, S. C., Matzel, L. D., Sloat, J., & Miller, R. R. (1987). Excitation and inhibition as a function of the excitatory cue used in Pavlovian inhibitory training. Submitted.
Hawkins, R. R., & Kandel, E. R. (1984). Is there a cell alphabet for simple forms of learning? Psychological Review, 91, 375-391.
Hearst, E. (1972). Some persistent problems in the analysis of conditioned inhibition. In R. A. Boakes & M. S. Halliday (Eds.), Inhibition and learning (pp. 5-39). London: Academic Press.
Holland, P. C. (1980). Influence of visual conditioned stimulus characteristics on the form of Pavlovian appetitive conditioned responding in rats. Journal of Experimental Psychology: Animal Behavior Processes, 6, 81-97.
Holland, P. C., & Gory, J. (1986). Extinction of inhibition after serial and simultaneous feature negative discrimination training. Quarterly Journal of Experimental Psychology, 38B, 245-265.
Hull, C. L. (1943). Principles of behavior. New York: Appleton.
Innes, N. K., & Mills, W. A. (1985). The role of context cues in operant responding in rats. Behavioural Processes, 10, 211-218.
James, J. H., & Wagner, A. R. (1980). One-trial overshadowing: Evidence of distributed processing. Journal of Experimental Psychology: Animal Behavior Processes, 6, 188-205.
Jenkins, H. M., Barnes, R. A., & Barrera, F. J. (1981). Why autoshaping depends on trial spacing. In C. M. Locurto, H. S. Terrace, & J. Gibbon (Eds.), Autoshaping and conditioning theory (pp. 255-284). New York: Academic Press.
Kaplan, P. S. (1985). Explaining the effects of relative time in trace conditioning: A preliminary test of a comparator hypothesis. Animal Learning and Behavior, 13, 235-238.
Kaplan, P. S., & Hearst, E. (1985). Excitation, inhibition, and context: Studies of extinction and reinstatement. In P. D. Balsam & A. Tomie (Eds.), Context and learning (pp. 195-224). Hillsdale, NJ: Erlbaum.
Kasprow, W. J., Cacheiro, H., Balaz, M. A., & Miller, R. R. (1982). Reminder-induced recovery of associations to an overshadowed stimulus. Learning and Motivation, 13, 155-166.
Kasprow, W. J., Catterson, D., Schachtman, T. R., & Miller, R. R. (1984). Attenuation of latent inhibition by postacquisition reminder. Quarterly Journal of Experimental Psychology, 36B, 53-63.
Kasprow, W. J., Schachtman, T. R., & Miller, R. R. (1985). Associability of a previously conditioned stimulus as a function of qualitative changes in the US. Quarterly Journal of Experimental Psychology, 37B, 33-48.
Kasprow, W. J., Schachtman, T. R., & Miller, R. R. (1987). The comparator hypothesis of conditioned response generation: Manifest conditioned excitation and inhibition as a function of relative excitatory associative strengths of CS and conditioning context at the time of testing. Journal of Experimental Psychology: Animal Behavior Processes, 13, 395-406.
Kaufman, M. A., & Bolles, R. C. (1981). A nonassociative aspect of overshadowing. Bulletin of the Psychonomic Society, 18, 318-320.
Kleiman, M., & Fowler, H. (1984). Effects of contingency violations on the extinction of conditioned fear inhibition and conditioned fear excitation. Learning and Motivation, 15, 127-155.
Konorski, J. (1948). Conditioned reflexes and neuron organization. Cambridge: Cambridge University Press.
Konorski, J. (1967). Integrative activity of the brain: An interdisciplinary approach. Chicago: University of Chicago Press.
Kraemer, P. J., Lariviere, N., & Spear, N. E. (1988). Expression of a taste aversion conditioned with an odor-taste compound: Overshadowing is relatively weak in weanlings and decreases over a retention interval in adults. Animal Learning and Behavior, 16, 164-168.
Krebs, J. R., & McCleery, R. H. (1984). Optimization in behavioural ecology. In J. R. Krebs & N. B. Davies (Eds.), Behavioural ecology: An evolutionary approach (2nd ed., pp. 91-121). Sunderland, MA: Sinauer.
Lubow, R. E., & Moore, A. U. (1959). Latent inhibition: The effect of nonreinforced exposure to the conditioned stimulus. Journal of Comparative and Physiological Psychology, 52, 415-419.
Lysle, D. T., & Fowler, H. (1985). Inhibition as a "slave" process: Deactivation of conditioned inhibition through extinction of conditioned excitation. Journal of Experimental Psychology: Animal Behavior Processes, 11, 71-94.
Mackintosh, N. J. (1971). An analysis of overshadowing and blocking. Quarterly Journal of Experimental Psychology, 23, 118-125.
Mackintosh, N. J. (1974). The psychology of animal learning. London: Academic Press.
Mackintosh, N. J. (1975). A theory of attention: Variations in the associability of stimuli with reinforcement. Psychological Review, 82, 276-298.
Mackintosh, N. J. (1976). Overshadowing and stimulus intensity. Animal Learning and Behavior, 4, 186-192.
Mackintosh, N. J., & Cotton, M. M. (1985). Conditioned inhibition from reinforcement reduction. In R. R. Miller & N. E. Spear (Eds.), Information processing in animals: Conditioned inhibition (pp. 89-111). Hillsdale, NJ: Erlbaum.
Mackintosh, N. J., & Reese, B. (1979). One-trial overshadowing. Quarterly Journal of Experimental Psychology, 31, 519-526.
Matzel, L. D., Brown, A. M., & Miller, R. R. (1987a). Associative effects of US preexposure: Modulation of conditioned responding by an excitatory training context. Journal of Experimental Psychology: Animal Behavior Processes, 13, 65-72.
Matzel, L. D., Shuster, K., & Miller, R. R. (1987b). Covariation in conditioned response strength between elements trained in compound. Animal Learning and Behavior, 15, 439-447.
Matzel, L. D., Gladstein, L., & Miller, R. R. (1988a). Conditioned excitation and conditioned inhibition are not mutually exclusive. Learning and Motivation, 19, 99-121.
Matzel, L. D., Hallam, S. C., & Miller, R. R. (1988b). Contribution of conditioned opioid analgesia to the shock-induced associative US-preexposure deficit. Animal Learning and Behavior, in press.
Matzel, L. D., Schachtman, T. R., & Miller, R. R. (1985). Recovery of an overshadowed association achieved by extinction of the overshadowing stimulus. Learning and Motivation, 16, 398-412.
Miller, R. R., Kasprow, W. J., & Schachtman, T. R. (1986). Retrieval variability: Sources and consequences. American Journal of Psychology, 99, 145-218.
Miller, R. R., & Matzel, L. D. (1987). Memory for associative history of a CS. Learning and Motivation, 18, 118-130.
Miller, R. R., & Schachtman, T. R. (1985). Conditioning context as an associative baseline: Implications for response generation and the nature of conditioned inhibition. In R. R. Miller & N. E. Spear (Eds.), Information processing in animals: Conditioned inhibition (pp. 51-88). Hillsdale, NJ: Erlbaum.
Miller, R. R., & Springer, A. D. (1972). Induced recovery of memory in rats following ECS. Physiology & Behavior, 8, 645-651.
Navarro-Guzman, J. I., Hallam, S. C., Matzel, L. D., & Miller, R. R. (1987). Superconditioning and overshadowing. Submitted.
Neely, J. H. (1976). Semantic priming and retrieval from lexical memory: Evidence for facilitatory and inhibitory processes. Memory and Cognition, 4, 648-654.
Pavlov, I. P. (1927). Conditioned reflexes. London: Oxford University Press.
Pearce, J. M., & Hall, G. (1980). A model for Pavlovian conditioning: Variations in the effectiveness of conditioned but not unconditioned stimuli. Psychological Review, 87, 332-352.
Pearce, J. M., Kaye, H., & Hall, G. (1984). Predictive accuracy and stimulus associability: Development of a model for Pavlovian learning. In M. L. Commons, R. J. Herrnstein, & A. R. Wagner (Eds.), Quantitative analyses of behavior: Vol. 3, Acquisition (pp. 241-255). Cambridge, MA: Ballinger.
Pearce, J. M., Nicholas, D. J., & Dickinson, A. (1982). Loss of associability by a conditioned inhibitor. Quarterly Journal of Experimental Psychology, 33B, 149-162.
Perruchet, P. (1985). A pitfall for the expectancy theory of human eyelid conditioning. Pavlovian Journal of Biological Science, 20, 163-170.
Prewitt, E. P. (1967). Number of preconditioning trials in sensory preconditioning using CER training. Journal of Comparative and Physiological Psychology, 64, 360-362.
Randich, A., & LoLordo, V. M. (1979). Preconditioning exposure to the unconditioned stimulus affects the acquisition of a conditioned emotional response. Learning and Motivation, 10, 245-277.
Reilly, S., & Schachtman, T. R. (1987). The effects of ITI fillers in autoshaping. Learning and Motivation, 18, 202-219.
Reiss, S., & Wagner, A. R. (1972). CS habituation produces a "latent inhibition effect" but no active "conditioned inhibition." Learning and Motivation, 3, 237-245.
Rescorla, R. A. (1968). Probability of shock in the presence and absence of CS in fear conditioning. Journal of Comparative and Physiological Psychology, 66, 1-5.
Rescorla, R. A. (1969). Pavlovian conditioned inhibition. Psychological Bulletin, 72, 77-94.
Rescorla, R. A. (1971a). Summation and retardation tests of latent inhibition. Journal of Comparative and Physiological Psychology, 75, 77-81.
Rescorla, R. A. (1971b). Variation in effectiveness of reinforcement following prior inhibitory conditioning. Learning and Motivation, 2, 113-123.
Rescorla, R. A. (1979). Conditioned inhibition and extinction. In A. Dickinson & R. A. Boakes (Eds.), Mechanisms of learning and motivation (pp. 83-110). Hillsdale, NJ: Erlbaum.
Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64-99). New York: Appleton.
Revusky, S., & Garcia, J. (1970). Learned associations over long delays. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 4, pp. 1-83). New York: Academic Press.
Schachtman, T. R., Brown, A. M., Gordon, E., Catterson, D., & Miller, R. R. (1987). Mechanisms underlying retarded emergence of conditioned responding following inhibitory training: Evidence for the comparator hypothesis. Journal of Experimental Psychology: Animal Behavior Processes, 13, 310-322.
Schull, J. (1979). A conditioned opponent theory of Pavlovian conditioning and habituation. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 13, pp. 57-90). New York: Academic Press.
Thompson, R. F., & Spencer, W. A. (1966). Habituation: A model phenomenon for the study of neuronal substrates of behavior. Psychological Review, 73, 16-43.
Thomson, D. M., & Tulving, E. (1970). Associative encoding and retrieval: Weak and strong cues. Journal of Experimental Psychology, 96, 255-262.
Timberlake, W. (1986). Unpredicted food produces a mode of behavior that affects rats’ subsequent reactions to a conditioned stimulus: A behavior-system approach to context blocking. Animal Learning and Behavior, 14, 276-286.
Tolman, E. C. (1932). Purposive behavior in animals and men. New York: Century.
Tolman, E. C., & Honzik, C. H. (1930). Introduction and removal of reward, and maze performance in rats. University of California Publications in Psychology, 4, 257-275.
Tomie, A. (1976). Interference with autoshaping by prior context conditioning. Journal of Experimental Psychology: Animal Behavior Processes, 2, 323-334.
Wagner, A. R., & Rescorla, R. A. (1972). Inhibition in Pavlovian conditioning: Application of a theory. In R. A. Boakes & M. S. Halliday (Eds.), Inhibition and learning (pp. 301-336). London: Academic Press.
Weiss, S. J. (1972). Stimulus compounding in free-operant and classical conditioning: A review and analysis. Psychological Bulletin, 78, 189-208.
Zeiler, M. D. (1987). On optimal choice strategies. Journal of Experimental Psychology: Animal Behavior Processes, 13, 31-39.
Zimmer-Hart, C. L., & Rescorla, R. A. (1974). Extinction of a Pavlovian conditioned inhibitor. Journal of Comparative and Physiological Psychology, 86, 837-845.
THE EXPERIMENTAL SYNTHESIS OF BEHAVIOR: REINFORCEMENT, BEHAVIORAL STEREOTYPY, AND PROBLEM SOLVING
Barry Schwartz
I. Introduction
What determines behavioral units in organisms? Into what categories does behavior get organized? For a time in the history of learning theory, it was thought by many that behavior had to be explained at the level of individual muscle movements (Hull, 1943, 1952; Logan, 1956, 1960). But some learning theorists thought this strategy inappropriate. Tolman (1932) argued that the molecular level of analysis was simply the wrong one for understanding instrumental behavior. Rats trained to press a lever with the left paw could use the right one if the left paw was disabled; rats trained to run down an alley could swim down the alley if it was flooded. In general, preventing the execution of specifically trained muscle movements did not prevent execution of the “act.” Tolman argued that instrumental conditioning involved the development of expectations about means-ends relations, not the mechanical connection of muscle movements to outcomes. Learning involved the formation of representations that were considerably more abstract than connections between sensory inputs and particular muscle twitches (see Roitblat, 1982). Skinner (1935, 1938) also objected to molecular explanation and suggested instead that units of behavior be defined functionally in terms of their effect on the environment and that generalizations should be sought relating environmental events such as reinforcement to the occurrences of these functional units. Rather than having the experimenter specify in
advance what the appropriate behavioral units should be, one could establish a broad functional class and let the animal's behavior reveal what the units of behavior actually were. Thus, the lever press, or the key peck, was understood to be a response class, and though members of the class might be distinct from one another in form, they would be treated as functionally equivalent as long as they possessed the property that defined membership in the class. The class established would be appropriate if relations between reinforcement variables and the occurrence of members of the class were orderly. Orderly relations would be the indication that experimental procedures had "carved nature at its joints" (e.g., Fodor, 1983). Though Skinner's ideas seemed largely methodological in intent, there is in them an implicit theory of how units of behavior are determined in nature. The rat starts out initially with a diverse and inefficient set of lever presses. Presses differ a great deal from one to another and include much movement and energy expenditure (for example, rearing, sniffing, turning around, stalking) that is unnecessary. Gradually, as the rat learns that lever depression is the only one of its movements that is reliably and consistently followed by reinforcement, extraneous movements drop out, variability decreases, and an efficient, stereotyped lever press emerges. In effect, the contingency of reinforcement has not revealed a functional behavioral unit so much as it has created one. And the form of this creation is determined by what the contingency of reinforcement specifies together with the differential reinforcement of response forms that meet the specification with ever greater efficiency. Though there are no doubt some constraints on what kinds of behavioral units can be created, within broad limits, the Skinnerian answer to questions about behavioral units is that they are defined by the prevailing contingencies of reinforcement.
Thus, the discipline that is traditionally known by its practitioners as "the experimental analysis of behavior" might be more accurately described as "the experimental synthesis of behavior." Until fairly recently, studies of the form or the structure of instrumental behavior were rather infrequent, partly because the interests of investigators lay elsewhere and partly because the responses that were typically studied were so simple in form that whatever changes in structure or organization occurred in the course of training were essentially complete within the first experimental session. As interest in studying the structure and organization of behavior has grown in recent years, the complexity of required responses has been greatly increased, so that the development of efficient, stereotyped response forms takes time and can be investigated. Most often, this increase in complexity has been achieved by making simple responses like the lever press or the key peck single components of multicomponent sequences that together constitute the required response
(e.g., Capaldi, Verry, Nawrocki, & Miller, 1984; Fountain, Henne, & Hulse, 1984; Hulse & Dorsky, 1977, 1979; Marr, 1979; Page & Neuringer, 1985; Piscaretta, 1982; Wasserman, Nelson, & Larew, 1980; Vogel & Annau, 1973). One particularly useful technique that provides the foundation for all the work to be discussed later was reported by Vogel and Annau (1973). They exposed pigeons to a task that required exactly three pecks on each of two response keys for reinforcement. These pecks could occur in any order as long as three pecks were directed at each key before a fourth peck was directed at either. The pigeons were guided in this task by a 4 x 4 matrix of lights. A trial began with the top-left matrix light illuminated. Each peck on one of the keys moved the light down one position, and each peck on the other key moved the light one position to the right. Reinforcement occurred with the illumination of the bottom-right light. What is especially nice about this task is that the contingency of reinforcement does not require stereotyped, structured behavioral sequences. There is room for a great deal of easily measured behavioral variability with little or no cost to the organism in reinforcement; 20 different response sequences satisfy the reinforcement contingency. Nevertheless, Vogel and Annau found that each pigeon developed a particular, stereotyped response sequence that occurred on up to 90% of all trials. We have extended the Vogel and Annau findings with procedures quite similar to theirs, except that the matrix of lights is 5 x 5 and reinforcement requires four pecks on each key so that 70 different sequences satisfy the contingency (Schwartz, 1980, 1981a,b, 1982a,b, 1985, 1986a,b; Schwartz & Reilly, 1983, 1985). Like them, we have found that stereotyped response sequences develop for each pigeon, occurring on 65% or more of all trials.
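The counts of 20 and 70 permissible sequences follow from simple combinatorics: a reinforced trial is any ordering of n pecks on one key and n pecks on the other. The sketch below enumerates these orderings directly and checks them against the binomial coefficient. The labels "L" and "R" for the two keys and the function name are illustrative choices, not part of the original procedure.

```python
from itertools import product
from math import comb


def valid_sequences(pecks_per_key):
    """Return every peck order with exactly `pecks_per_key` pecks on each key.

    Keys are labeled "L" and "R" (hypothetical labels). A sequence has
    2 * pecks_per_key pecks in total, since a trial ends once either key
    would receive one peck too many.
    """
    length = 2 * pecks_per_key
    return [
        "".join(seq)
        for seq in product("LR", repeat=length)
        if seq.count("L") == pecks_per_key
    ]


# 4 x 4 matrix: three pecks per key; 5 x 5 matrix: four pecks per key.
for n in (3, 4):
    sequences = valid_sequences(n)
    # The enumeration should agree with "2n choose n".
    print(n, len(sequences), comb(2 * n, n))
```

Run as written, the enumeration should reproduce the 20 sequences for the three-peck task and the 70 sequences for the four-peck task stated in the text.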
Furthermore, these sequences seem to develop into integrated behavioral units that are resistant to disruption and that respond to variation in reinforcement parameters in the way that individual responses do. Some representative findings that support these points are these:
1. In well-trained animals, extinction (the withholding of reinforcement) reduces the likelihood that response sequences will be initiated. But if a response sequence is initiated, it is almost always the animal's dominant, stereotyped one, and pecks within the sequence occur at the same rate as when sequences are being reinforced (Schwartz, 1981b).
2. When sequences are being reinforced on intermittent schedules of various types, including multiple and concurrent schedules, one observes the same functional relations between sequence responding (with entire sequences treated analytically as single responses) and various parameters of reinforcement as are routinely reported in the extensive literature involving simple responses. So, for example, the well-known "matching
Barry Schwartz
law” (Herrnstein, 1970) provides an accurate description of response allocation when whole sequences are taken as behavioral units, but not when the individual responses that comprise them are (Schwartz, 1982b, 1986a).
3. When pigeons are subjected to a lengthy posttraining retention interval (up to 60 days) in which they engage in no experimental activity, sequence stereotypy is almost perfectly preserved (Schwartz & Reilly, 1985).

Impressive evidence like this that response sequences become stereotyped and structured even when neither stereotypy nor structure is required for reinforcement led us to investigate the effects of procedures that demand structure and stereotypy. We found that pigeons learn, with little apparent difficulty, to produce sequences that require particular internal patterns, such as beginning with two left-key pecks or with a left-right alternation (Schwartz, 1980, 1982a). However, we also found (as had Vogel and Annau) that when reinforcement depends upon the absence of stereotypy, stereotyped sequences develop anyway. In two experiments (Schwartz, 1980, 1982a), one with experienced and one with naive pigeons, reinforcement required that a sequence of four left and four right pecks differ from the sequence that had occurred on the immediately preceding trial. Pigeons showed almost no evidence of responsiveness to this variability contingency. Thus, we tentatively concluded that stereotypy might be an inevitable consequence of procedures involving reinforcement. Reinforcement strengthened whatever responses preceded it, with the result that stereotyped repetition of particular behavior patterns was all that reinforcement could produce.

Our tentative conclusion proved too bold and overstated. Page and Neuringer (1985) repeated our experiments with some procedural modifications. They found that pigeons were able to produce varied sequences, but only if no additional response requirement was operative. 
That is, if pigeons were only required to make eight pecks on two keys with no demand that they peck each key four times, they could master a contingency that required intersequence variability. Page and Neuringer argued that pigeons were, in effect, learning to be random response generators, with such randomness yielding a high degree of trial-to-trial variability. The trouble with random choice of left and right keys, in our procedure but not in Page & Neuringer’s, is that it will often result in more than four pecks at one of the keys and, therefore, fail to produce reinforcement. In support of this analysis, Page and Neuringer showed that a computer program designed to simulate a random responder attained about the same level of success under our variability contingencies as did our pigeons. Thus, one is tempted to conclude that pigeons can learn to produce variation, but only by behaving randomly.
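The argument about random responding can be made concrete with a small simulation. The sketch below is only an illustration written for this passage, not Page and Neuringer's program. It assumes that each peck goes left or right with probability .5, that a trial fails as soon as either key receives a fifth peck, and that the variability contingency compares a sequence with the immediately preceding trial whatever that trial's outcome. The names `random_trial` and `simulate` are hypothetical.

```python
import random
from math import comb


def random_trial(rng, pecks_per_key=4):
    """Peck left or right at random until the trial ends.

    Returns the sequence if it contains exactly `pecks_per_key` pecks on each
    key, or None if one key receives an extra peck first.
    """
    counts = {"L": 0, "R": 0}
    pecks = []
    while True:
        key = rng.choice("LR")
        counts[key] += 1
        pecks.append(key)
        if counts[key] > pecks_per_key:
            return None  # a fifth peck on one key: no reinforcement
        if counts["L"] == pecks_per_key and counts["R"] == pecks_per_key:
            return "".join(pecks)


def simulate(n_trials, seed=0):
    """Return (proportion of valid trials, proportion of reinforced trials)."""
    rng = random.Random(seed)
    previous = None
    valid = reinforced = 0
    for _ in range(n_trials):
        seq = random_trial(rng)
        if seq is not None:
            valid += 1
            if seq != previous:  # assumed variability rule: differ from last trial
                reinforced += 1
        previous = seq
    return valid / n_trials, reinforced / n_trials


# Exact chance that eight random pecks split four and four.
print(comb(8, 4) / 2 ** 8)  # 0.2734375
print(simulate(10_000))
```

Under these assumptions a random responder completes a valid sequence on only about 27% of trials, which shows why the four-pecks-per-key requirement, rather than the variability requirement, is what limits a random strategy.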
Are there alternatives to stereotypy on the one hand and randomness on the other? Interestingly, versions of this question have surfaced in recent discussions of whether apparently intelligent human behavior can be explained in terms of the operation of unintelligent procedures of (random) variation and selection (e.g., Dennett, 1975; Rosenberg, 1986a,b; Skinner, 1981). In the domain of human behavior, the intuition is certainly strong that stereotypy and randomness do not exhaust the behavioral possibilities. People seem quite capable of engaging in intelligent and systematic variation, as in the case of the scientist who goes about constructing and testing hypotheses in an attempt to analyze some phenomenon. Indeed, it may be just this systematicity that distinguishes intelligent human problem solving from the problem solving that characterizes natural selection (but see Campbell, 1974; Ruse, 1986, for the alternative view that human problem solving is essentially isomorphic with natural selection).

It was largely to explore issues like these that we conducted a series of experiments with human subjects that roughly paralleled our experiments with pigeons (Schwartz, 1982c). College students (in this and all subsequent studies, Swarthmore College undergraduates) sat in a small experimental room that contained a 5 x 5 matrix of white lights, two response buttons, one red light, and a counter (see Schwartz, 1982c, for details). Pushes on the two response buttons moved the illuminated light (always the top-left light at the start of a trial) in the matrix either down or to the right. Four pushes on each button before a fifth on either lit the red light and incremented the counter. Accumulated points were redeemed for cash at the end of a session. We found that, like the pigeons, each subject settled on one sequence and produced it on almost every trial. Indeed, human subjects evidenced greater stereotypy than pigeons. 
Figure 1 presents representative data averaged across four subjects. It is clear that, with experience, the proportion of trials containing the dominant sequence increased until subjects were producing the same sequence on almost every trial of each 50-trial session. When subjects completed the experiment, they were asked what they thought was required to produce points. About 65% of all subjects asked this question answered with a description of their dominant sequence. That is, they identified the particular sufficient condition that they had developed as a necessary condition. This kind of error has been reported in quite different contexts by Wason and Johnson-Laird (1972, Chapters 6 and 13; see also Wason, 1964) and by Mynatt, Doherty, and Tweney (1977, 1978) and discussed by them as reflecting a bias to overvalue confirming and undervalue disconfirming evidence with respect to whatever hypothesis is under consideration. Though our subjects' erroneous understanding of their task was easily corrected with one or two simple probe questions, this finding nevertheless suggested the possibility that a history of reinforcement might contribute to the confirmation bias that other investigators have observed (see Einhorn & Hogarth, 1978, for a similar suggestion). This led us to examine the effects of reinforcement on hypothesis testing and rule discovery in our experimental situation.

Fig. 1. Percentage of trials in each 50-trial session of the sequence procedure in which each subject’s dominant sequence occurred. Data are from Schwartz (1982c).

In one experiment (Schwartz, 1982c, Experiment 4), a constraint was imposed so that four pushes on each button were necessary, but not sufficient, to obtain reinforcement. In other words, only a subset of the 70 sequences of four pushes on each button was actually effective. Subjects were instructed that they would earn 5 cents for each correct sequence. They were also asked to discover the rule that determined whether or not their responses produced points. Data for six subjects are presented in Fig. 2. None of these subjects developed full-blown stereotypies until they had discovered the rule (downward arrows). Until rule discovery occurred, they varied their response patterns reasonably systematically and effectively, the way laboratory scientists might. What this experiment demonstrated was that reinforcement for correct responses does not inevitably result in stereotypy. If subjects are motivated to discover generalizations rather than to maximize points (money), they will behave in ways that are consistent with that motivation. Just as a sensible scientist will not keep repeating the same experiment because it “works,” the sensible rule seeker does not keep repeating the same sequence.
Fig. 2. Percentage of trials in each 50-trial session of the sequence, rule-discovery procedure in which each subject’s dominant sequence occurred. Separate symbols identify subjects S14, S15, S16, S17, S18, and S19. Arrows indicate the session in which S15, S18, and S19 correctly identified the rule and show that when stereotypy occurred, it did not develop until after the rule was discovered. S14, S16, and S17, whose sequences never became stereotyped, discovered the rule in the 10th, 12th, and 8th sessions, respectively. Data are from Schwartz (1982c).
However, when subjects were confronted with this rule discovery task after several sessions of pretraining in which correct sequences earned points (money) according to the same contingencies and no mention was made of rule discovery, their performance on the rule-discovery task was not as efficient as the performance of naive subjects. They were less likely to discover the operative rule and took longer to do so (Schwartz, 1982c, Experiment 5). This result suggests that whether or not a reinforcement contingency that rewards and produces stereotyped repetition will gain control of behavior depends upon subjects’ past history with similar contingencies. The rule discovery situation is one that has conflicting influences on subjects. The explicit payoffs encourage stereotypy. After all, each time a subject produces a sequence that differs from previously used successful ones, there is a distinct possibility that it won’t work. On the other hand, the instructions encourage variation, since that is the only way to discover correct generalizations. In effect, the procedure is asking subjects to choose between acting like research scientists (whose task is to discover general rules or laws) and engineers (whose task is to discover and implement any procedure that works). Pretraining makes our subjects into payoff-maximizing engineers.
This conclusion was supported by the results of another study (Schwartz, 1982c, Experiment 6). Subjects were exposed to a procedure in which reinforcement required that a given sequence be different from the ones that had occurred on the last two trials. Subjects succeeded on this task by developing higher-order stereotypies. That is, they would switch systematically between three or four different sequences. This is, of course, a perfectly efficient way to produce what the contingency demands. However, it turned out to have a surprising and inefficient consequence when the task was changed. In the new task, a sequence was only reinforced the first time it occurred in any block of 50 trials. Thus, producing 50 reinforcements required 50 different sequences of four left and four right responses. Since there are only 70 such sequences possible, success on this task requires either a prodigious memory for sequences already produced or some systematic way of varying sequences so that repetition is impossible or unlikely. Subjects (12) were exposed to this task for several 50-trial blocks. Half were pretrained on the variability task just described and half were naive. The results for each group are presented in Fig. 3, which plots the mean percentage of reinforced (correct) trials in each 50-trial block.

Fig. 3. Percentage of trials in each 50-trial session that contained novel sequences in the sequence procedure in which only novel sequences were reinforced. Naive subjects (n = 6) had no prior experience while pretrained subjects (n = 6) had been exposed to a procedure that reinforced only sequences different from the previous two. Data are from Schwartz (1982c).

The six naive subjects very quickly mastered the contingency so that they were obtaining 48 or 49 reinforcements per 50-trial block. They did so by developing explicit strategies to avoid repetition. The pretrained subjects never developed such strategies. Though the variability of their behavior increased over the course of training, their asymptotic performance was considerably inferior to that of the naive subjects. Thus, even though the pretrained subjects seemed to have the advantages of (1) general familiarity with the task, and (2) familiarity with contingencies in which reinforcement on a given trial depends upon performance on previous trials, and reinforcement depends upon variability, they performed less efficiently than naive subjects. Postexperimental interviews suggested that the advantage naive subjects had resulted from their being more actively engaged with the task than pretrained subjects. Pretrained subjects did not seem to view the task as something that required the active engagement of the intellect, perhaps because their previous task had not. The fact that they were only moderately successful was apparently not enough to shake them out of this belief, so their performance improved gradually just the way the performance of pigeons might, but it never approached the perfection that insight coupled with a strategy shift would have made possible.

These two experiments gave strong indications that contingencies of reinforcement that are effective in creating efficient and successful patterns of behavior in the short term may be counterproductive in the long term if patterns of intelligent hypothesis construction and test are required. We examined this possibility systematically in a final experiment (Schwartz, 1982c, Experiment 7). 
College students, some naive and some with substantial pretraining on the sequence task in which correct sequences earned them money, were exposed to a series of sequence problems. Each problem required four responses on each button, but additional constraints were added so that only a subset of all such sequences was successful. All subjects were told that they would receive a series of problems and that their task was to figure out the rule that determined whether a particular sequence of responses would succeed or not. Within this general directive to find the rule, subjects differed. Some subjects were told that they would get 1 dollar for each rule they discovered, some were told they would get 1 cent for each correct sequence, some could earn money both for correct sequences and for rule discoveries, and some were simply paid a flat fee for being in the experiment and could not earn anything. The questions of interest were these: Did pretraining help or hinder rule discovery? Did payoff contingencies affect rule discovery? And was there an interaction between payoff contingencies and pretraining? The results
of the study suggested that the answers to the questions were these:
1. Pretraining interfered with rule discovery by reducing the number of problems subjects solved and increasing the number of trials required for problem solution.
2. Pretraining had its principal negative effect in the groups for which each correct sequence produced a payoff.
3. Payoff contingencies affected rule discovery only in subjects with pretraining; in naive subjects, payoffs seemed to have no effect. Thus, one of the effects of pretraining was to increase the sensitivity of subjects to future contingencies of reinforcement.

These effects are presented graphically in Figs. 4 and 5. Figure 4 presents the percentage of problems solved in each group (out of a maximum of 40; 4 problems by 10 subjects). In each payoff condition, pretrained subjects did worse than naive ones, but the difference was especially large in the two conditions that involved payoffs for each correct sequence as opposed to a bonus for discovering the rule. Note also that the performance of naive subjects was constant across payoff conditions. The same pattern is apparent in Fig. 5, which presents the number of trials (in 10-trial blocks) that subjects required to discover the rule. Naive subjects were always faster than pretrained ones and, unlike the pretrained ones, were unaffected by the payoff contingencies that were in effect.

Fig. 4. Percentage of sequence problems solved by pretrained and naive subjects in each of four different incentive conditions (payoff + bonus, bonus only, payoff only, no reward) identified on the x axis. Striped bars, naive; open bars, sequence. Data are from Schwartz (1982c).

Fig. 5. Mean number of trial blocks required per problem by naive (striped bars) and pretrained (open bars) subjects in each of four different incentive conditions (payoff + bonus, bonus only, payoff only, no reward) identified on the x axis. Data are from Schwartz (1982c).

This pattern of results, from pigeons and from people, is interesting for both theoretical and practical reasons. Theoretically, the role of contingencies of reinforcement in the organization of action has certainly been underappreciated. Our data seem to show that reinforcement can induce behavioral stereotypy and that the effect persists even when contingencies that might favor stereotypy are removed. Practically, knowing that reinforcement can have these stereotyping effects may lead to a more careful evaluation of the effectiveness of techniques involving reinforcement in applied settings, especially those concerned with education. The research with human subjects indicated that reinforcement contingencies may have negative transfer effects, and if these effects are general, there is good reason to worry about the use of reinforcement in applied settings. However, our previous data provide no indications about generality. Reinforced pretraining was always of one particular sort, and later testing involved problems and settings that were virtually identical to pretraining. Before one begins to make general claims about negative transfer effects of reinforcement, one must show that the effects are indeed general. It was for this reason that further experiments were undertaken. These experiments examined a range of different pretraining procedures and looked for transfer effects on a range of different target tasks.
II. Experiment 1: Effects of Pretraining Variation
In our previous experiments, pretraining confounded two variables of potential significance. First, it included reinforcement; second, it involved a contingency that produced a high degree of stereotypy. Either of these features of pretraining might be responsible for the negative transfer observed in subsequent rule-discovery tasks. The obvious experimental strategy is to have subjects engage in the sequence task without reinforcement. However, it is not clear that subjects would persevere on the sequence task without reinforcement. Instead, we employed several different reinforced pretraining conditions that past research had indicated yield a range of stereotypy. Also, the subsequent transfer test did not involve explicit instructions to look for a rule. Instead, a contingency was employed requiring sequence novelty that could only be mastered by subjects who had in fact discovered the rule even though not instructed to. This task was the same one employed in Experiment 6 in Schwartz (1982c).

The experiment included four groups of eight experimentally naive subjects. After different types of pretraining, all subjects were exposed to a procedure we labeled DIF-25. In any block of 25 trials, a sequence of four responses on each of two buttons earned a point only if it had not already occurred. Thus, within a block of 25 trials, repetitions of correct sequences were never reinforced. The subjects received 12 of these 25-trial blocks, with individual blocks separated by a 5-min rest period during which data from the previous block were collected and stored. Each sequence that met the contingency requirements earned the subject 3 cents. The entire procedure lasted about 90 min.

Pretraining occurred 1-3 days prior to testing. For one group of subjects (SEQ), it involved four blocks of 50 trials in which reinforcement depended on sequences of four left and four right responses that began with a left-right alternation. 
The contingency was the same for a second group (SEQ-.5), except that only a random 50% of correct sequences was reinforced. Past research (Schwartz, 1982c, Experiments 2 and 3) had suggested that, at least in the short run, probabilistic reinforcement produces less stereotyped behavior than regular reinforcement. A third group was exposed to a DIF-3 procedure, in which reinforcement followed only correct sequences that differed from the previous three sequences. Thus, this procedure was simply a variant on what was to be the test procedure. Finally, a fourth group had no pretraining. Instructions were minimal. Subjects were told (1) that they could earn points by pressing on two keys and (2) that when they produced correct behavior, a red light would light up for a second and a counter would increment. Each point would earn them 3 cents. When they returned for the second (DIF-25) session, they were given no additional instructions.
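The difference between the DIF-3 pretraining contingency and the DIF-25 test contingency can be illustrated with a short scoring sketch. It is a hypothetical reconstruction for illustration only: the function names `score_lag` and `score_block` are introduced here, and every trial is assumed to be a valid sequence of four responses on each button.

```python
from itertools import product

# All 70 orders of four left (L) and four right (R) responses.
ALL_SEQUENCES = ["".join(p) for p in product("LR", repeat=8) if p.count("L") == 4]


def score_lag(trials, lag):
    """Reinforce a sequence only if it differs from each of the previous `lag` trials."""
    return sum(1 for i, seq in enumerate(trials) if seq not in trials[max(0, i - lag):i])


def score_block(trials, block_size):
    """Reinforce a sequence only the first time it occurs within each block."""
    reinforced = 0
    for start in range(0, len(trials), block_size):
        reinforced += len(set(trials[start:start + block_size]))
    return reinforced


# A higher-order stereotypy: cycling through the same four sequences.
cycle = ALL_SEQUENCES[:4]
cycling = [cycle[i % 4] for i in range(25)]

# A systematic strategy: step through the enumeration without repeating.
systematic = ALL_SEQUENCES[:25]

print(score_lag(cycling, 3))         # 25: every trial satisfies DIF-3
print(score_block(cycling, 25))      # 4: only four trials satisfy DIF-25
print(score_block(systematic, 25))   # 25
```

On these assumptions, a cycle of four sequences earns every available point under DIF-3 but very few under DIF-25, whereas a rule for generating new sequences succeeds under both.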
The results of this experiment showed first that our pretraining manipulation achieved its objective. Figure 6 presents the proportion of the last 100 pretraining trials in which each subject’s dominant sequence occurred, averaged across each of the three groups. Differences in sequence stereotypy were highly significant (F(2,21) = 28.7; for this and all subsequent references to statistical significance, a criterion probability of less than .05, two-tailed, has been adopted). In addition, Scheffé’s tests showed that all between-group differences were also significant. Despite this difference in stereotypy, there was little difference in performance efficiency. The mean number of correct sequences in the last 100 trials was 91.8 for group SEQ, 90.2 for group SEQ-.5, and 81.6 for group DIF-3. Only the difference between DIF-3 and SEQ groups approached significance.

The results for the test phase of the experiment are presented in Fig. 7, which presents the percentage of trials that were successful in the first, second, and final block of 100 DIF-25 trials experienced by each subject. The data are group means. As is clear from the figure, the group without pretraining was substantially less effective than the pretrained subjects at the beginning and substantially more effective at the end. At no point in the test phase were the pretrained groups statistically different from each other. Each was significantly better than the naive group at the start and significantly worse at the end.
Fig. 6. Percentage of trials in Experiment 1 in which each subject’s dominant sequence occurred in each of the pretraining conditions (DIF-3, SEQ, and SEQ-.5) identified on the x axis.

Fig. 7. Percentage of trials averaged per 25-trial block that contained novel sequences in the sequence procedure in which only novel sequences were reinforced. Data are averaged across the first, middle, and last four 25-trial blocks (blocks 1-4, 5-8, and 9-12) and are presented separately for the four different pretraining groups of Experiment 1 (Naive, DIF-3, SEQ, and SEQ-.5).
Thus, it appears that even when reinforced pretraining did not produce highly stereotyped responding, it interfered with performance that required the discovery of a rule. This was true even though subjects were not told explicitly that the task required them to discover a rule. No mention was made of rules or rule discovery in the test phase of this experiment. It was only the fact that the almost flawless performance of the naive group would be virtually impossible unless the subjects had discovered the rule that led us to infer that their behavior was actually rule governed. In addition, most of the subjects in this group, but not the others, were able to articulate the rule accurately when they were questioned at the end of the experiment.

III. Experiment 2: Rules, Payoffs, and Contingency Assessment
Our understanding of conditioning has been transformed in the last 20 years with the discovery that organisms are sensitive to the degree of contingency or informativeness between events in both Pavlovian and instrumental contexts (e.g., Alloy & Tabachnik, 1984; Rescorla, 1972; see Schwartz, 1984, for a textbook treatment of this extensive literature). These developments in the animal laboratory have sparked a great deal
of interest in the study of contingency detection in humans. In general, the results of such studies indicate that humans are inaccurate in various systematic ways in assessing contingencies (e.g., Allan & Jenkins, 1980; Alloy & Abramson, 1979; Arkes & Harkness, 1983; Crocker, 1981; Einhorn & Hogarth, 1978; Jenkins & Ward, 1965), an outcome that is somewhat surprising in light of the apparent accuracy of pigeons and rats when faced with similar tasks. There have been a few attempts to reconcile the two literatures, the most sweeping of which was offered by Alloy and Tabachnik (1984), who suggested that accuracy in contingency assessment is largely determined by whether assessment is based upon expectations derived from past experience or upon analysis of current situational data. Humans are perhaps especially susceptible to having theories based upon past experience that shape expectations, which, in turn, influence the perception of current events. These theory-based perceptions are the source of various biases that have been well documented in the psychological laboratory, in the applied clinical or industrial setting, and even in the history of science (e.g., Alloy & Tabachnik, 1984; Einhorn & Hogarth, 1978; Nisbett & Ross, 1980; Platt, 1964; Tweney, Doherty, & Mynatt, 1981).

One particular source of error in contingency judgment has often been called a “confirmation bias.” It refers to people’s tendency both to seek evidence that can only confirm hypotheses and to overvalue confirming evidence and undervalue disconfirming evidence (e.g., Einhorn & Hogarth, 1978; Schwartz, 1982c; Wason & Johnson-Laird, 1972; but see Baron, 1985; and Klayman & Ha, 1987, for a critical discussion of confirmation bias). A suggestive account of where this particular bias might come from was offered by Einhorn and Hogarth (1978). They suggested that, in natural settings, people rarely have the opportunity to falsify hypotheses. 
A variety of practical constraints operate to make only partial tests of hypotheses possible, and these partial tests are often of the sort that make confirmation easy and disconfirmation difficult. Consider, for example, the personnel manager who uses an aptitude test to screen job candidates. Candidates who score above a criterion are hired and those who score below it are not. Most people hired perform their job well. The personnel manager concludes that the aptitude test is a good diagnostic. Although this may be a good, practical conclusion, it is unwarranted by the data without a comparison group of workers with low aptitude test scores doing the same jobs. Such comparison groups are rarely (if ever) established, however, because of various practical constraints such as the firm’s interest in making money rather than in scientific progress. Einhorn and Hogarth’s point is that practical constraints such as these can lead to habits of inference that prevent people from either collecting
the appropriate data or processing appropriate data in an appropriate fashion, even in circumstances in which practical constraints are absent. It appears from our research that a history of reinforcement for successful sequences may create just such habits of inference. When a subject evolves a particular sequence of responses to produce payoffs, his or her orientation is toward producing desirable outcomes rather than true generalizations, and this tendency persists even in the face of explicit instructions to do otherwise.

In Experiment 2, we examined whether or not a history of reinforcement for correct sequences in fact reduces accuracy in a contingency-detection task conducted in a different context. The reasons for this study were (1) to assess the account of bias in contingency detection suggested by Einhorn and Hogarth (1978) and supported by our own work (Schwartz, 1982c) and (2) to attempt to explore the situational generality of the negative transfer effects we have previously observed. The study was a replication of contingency detection experiments reported by Alloy and Abramson (1979). They found that judgments of control were inaccurate in several ways. When the environmental events in question were frequent, people judged that they controlled their occurrence, even when in fact they did not. Also, when the environmental events in question were hedonic in nature (wins or losses of money), control estimates were affected by the hedonic nature of the events. People gave higher estimates of control over good outcomes than over bad ones even though the actual degree of control that they had was the same. More specifically, Alloy and Abramson gave subjects a series of trials in which they could either push a button or not push it in any 3-sec period. At the end of the 3 sec, a light would either come on or not come on. This was the environmental event the control of which subjects had to estimate. 
In some circumstances, each time a light came on subjects won money; in other circumstances, each time the light failed to come on they lost money. In still other circumstances, no money was involved. The actual degree of control that subjects had was manipulated by varying two conditional probabilities: the probability of the light given a response and the probability of the light given no response. When these probabilities were equal, subjects had no control. When they were unequal, subjects had control to a degree that could be quantified as the difference between the two conditional probabilities. After a series of such trials, subjects were asked to estimate, on a 100-point scale, the degree of control they had.

In our experiment, there were three major groups of subjects. Group Sequence experienced a 300-trial session of the sequence task, in which correct sequences (four pushes, in any order, on each of the two buttons) earned 2 cents, 1 or 2 days prior to the contingency estimation task. Group
Rule experienced three sequence problems in which correct sequences had to include four responses on each button and satisfy additional constraints (for example, beginning with two left-button pushes), and they were instructed to discover the rule that determined whether or not they earned points. These subjects were paid $5 for participating in the session, which occurred 1 day or 2 days prior to the contingency-estimation task. Finally, group Naive had no pretraining. Each of these three major groups was divided into three subgroups (n = 10). The subgroups differed in whether the contingency estimation task gave them the opportunity to win money, lose money, or did not involve money at all. Subjects in the “win” groups were told that each time the light went on they would win 5 cents. Subjects in the “lose” groups were told that they would start the session with 10 dollars and that each time the light did not go on they would lose 5 cents. The other subjects were simply given 5 dollars for participating (amounts were chosen so that all subjects would earn about $5 for the session). Subjects were seated at a console that contained a push button, a white light, and a red light. They were then read the following instructions (virtually identical to the instructions used by Alloy & Abramson):

In this experiment, it is your task to learn what degree of control you have over whether or not the red light comes on. When the white light comes on, it indicates the start of a trial, the occasion to do something. For each trial, you have the option of pressing the button or not pressing the button. You need press the button only once in a trial. If you intend to press the button in a given trial, you must do so within three seconds after the white light comes on; otherwise, the trial will be counted as a not-press trial. So, in this experiment, there are two things you can do; either press the button within three seconds after the white light comes on or do nothing. 
Any questions? You may find that the red light goes on in some percentage of the trials after you press the button. You may also find that the red light goes on in some percentage of trials after you do not press the button. Also, the red light may not go on on some percentage of trials after you push the button, and it may not go on on some percentage of trials after you don’t push the button. So all together, there are four possibilities as to what may happen on any given trial: button push and red light; button push and no red light; no button push and red light; and no button push and no red light. Since your task is to learn how much control you have over whether or not the red light comes on, it is to your advantage to press on some trials and not on others, so that you know what happens when you don’t press as well as when you do press.
Barry Schwartz
Forty trials will constitute the problem. After the problem, you will be asked to put an “X” somewhere on this scale. Put an “X” at 100 if you have complete control over the onset of the red light, at 0 if you have no control over the light, and somewhere between these extremes if you have some but not complete control over the light. Complete control means that whether or not the light goes on is completely determined by what you do on a trial. No control means that you have found no way to influence whether or not the light goes on, that it is determined not by what you do, but by luck or chance. [At this point, subjects in one of the two incentive conditions were told about the payoff structure of the task.] All together, you will have a series of five of these problems. After 40 trials, you will put your “X” on the control scale, I’ll come and collect it, and then we’ll begin the next series. Any questions?

The conditional probabilities of red light given a response or no response in the five series were .75-.75, .75-.50, .75-.25, .50-.50, and .25-.25. Subjects received them in random order and were counterbalanced as to whether the first probability in each pair was associated with pressing or not pressing the button. Also, half of the subjects in each group were male and half were female. The results of the pretraining phase were as expected: subjects in group Sequence developed stereotyped sequences that occurred in over 90% of the last 100 trials, while subjects in group Rule did not develop stereotyped sequences and, in virtually all cases, discovered each of the rules (if a subject had not guessed a rule in 200 trials, we went on to the next rule; this happened only three times in 30 subjects). The data of primary interest in the contingency estimation phase of the study were, of course, the subjects’ estimates of control. For purposes of analysis, the five series of contingencies experienced by the subjects were treated as two groups of three.
In one group (.75-.75, .50-.50, .25-.25) the contingency was always zero and what varied was the probability of the outcome. In the second group (.75-.75, .75-.50, .75-.25) the degree of contingency actually varied. A 3 (contingency series) x 3 (incentive condition) x 3 (pretraining condition) analysis of variance was conducted for each of the two groups of three contingency series. In both analyses, there were significant effects of contingency series, of incentive condition, and of pretraining. The results are organized graphically in Figs. 8-11. Figure 8 presents control estimates across the series where degree of control actually varied. The data are presented separately by pretraining but averaged across incentive conditions. It is clear from the figure that subjects in group Rule were quite sensitive to the degree of actual contingency while subjects in group Sequence were insensitive. The naive
Fig. 8. Estimates of control in Experiment 2 in the conditions in which the actual control that subjects had varied. Data are presented separately for each pretraining group, but are averaged across incentive conditions. [Plot not reproduced; horizontal axis: CONTINGENCY (.75-.75, .75-.50, .75-.25); legend: NAIVE, SEQUENCE, RULE.]
subjects were in the middle; their estimate of control in the .75-.25 series was significantly higher than in the other two. Separate analyses for each incentive condition yielded essentially the same pattern as in Fig. 8 for each of the three groups; that is, group Sequence subjects gave essentially the same control estimates across problems no matter what incentive condition they were in, group Rule subjects gave essentially accurate estimates of control no matter what incentive condition they were in, and naive subjects differentiated the .75-.25 condition from the other two no matter what incentive condition they were in. Figure 9 presents similar data from the three problem series in which subjects had no control. The data for naive subjects were very much like those reported by Alloy and Abramson; the degree of control was consistently overestimated (since actual control was zero), and the magnitude of the overestimation varied directly with the frequency of the outcome. For group Sequence, the effect of frequency was diminished because estimates of control were high at all frequencies. For group Rule, the effect of frequency of outcome was also diminished because all estimates of control were quite modest. Essentially, this pattern of results occurred for subgroups within each incentive condition.
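As an illustrative aside, the degree of actual control in each of the five series can be computed from the two conditional probabilities. The sketch below assumes that actual control is indexed by the absolute difference between the probability of the red light given a press and given no press, expressed on the same 0-100 scale the subjects used; that index, and the names `SERIES`, `actual_control`, and `outcome_frequency`, are assumptions introduced here rather than definitions taken from the study.

```python
# Illustrative sketch only. Assumption: "actual control" is indexed by
# 100 * |P(light | press) - P(light | no press)|; the chapter does not
# spell out the index in this passage.

# The five contingency series, as (P(light | one option), P(light | other option)).
SERIES = {
    ".75-.75": (0.75, 0.75),
    ".75-.50": (0.75, 0.50),
    ".75-.25": (0.75, 0.25),
    ".50-.50": (0.50, 0.50),
    ".25-.25": (0.25, 0.25),
}


def actual_control(p_first, p_second):
    """Control index on the 0-100 scale.

    The absolute value makes the index indifferent to the counterbalancing
    of which probability went with pressing and which with not pressing.
    """
    return 100 * abs(p_first - p_second)


def outcome_frequency(press_fraction, p_light_press, p_light_no_press):
    """Overall probability of the red light for a subject who presses on
    the given fraction of trials."""
    return (press_fraction * p_light_press
            + (1 - press_fraction) * p_light_no_press)


control_by_series = {name: actual_control(*probs) for name, probs in SERIES.items()}
```

Under this index the three series .75-.75, .50-.50, and .25-.25 all yield zero control and differ only in how often the light comes on, which is the distinction the two analyses rely on.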
Fig. 9. Estimates of control in Experiment 2 in the conditions in which subjects had no control but the overall frequency of the outcome varied. Data are presented separately for each pretraining group, but are averaged across incentive conditions. [Plot not reproduced; horizontal axis: CONTINGENCY (.25-.25, .50-.50, .75-.75); legend: NAIVE, SEQUENCE, RULE.]
To assess the effects of incentive condition on estimates of control, we subtracted estimates in the lose condition (in which subjects started with $10 and lost money on unsuccessful trials) from estimates in the win condition (in which subjects started with nothing and gained money on successful trials). If assessments of control were based on rough computations of conditional probability and were independent of reward condition, as would be normatively appropriate, these difference scores would be zero. Alloy and Abramson had found that control estimates were dramatically affected by whether subjects were winning or losing. Figures 10 and 11 present the data. Figure 10, like Fig. 8, is from the three series in which the degree of control actually varied, and Fig. 11, like Fig. 9, is from the series in which event frequency varied but control was actually zero. The results are clear. Both naive and sequence subjects gave estimates of control that varied with whether they were winning or losing money. In contrast, subjects pretrained in rule finding were much less influenced by whether they were winning or losing money. The two incentive conditions did not have symmetrical effects. In all cases, the win condition estimates were statistically indistinguishable from the no-incentive condition estimates. Thus, the incentive effect was essentially that being in a losing situation reduced control estimates. Being in a winning situation did not inflate them. Perhaps with stakes this small, getting the target light to go on had about the same hedonic effect as winning money.

Fig. 10. Difference in control estimates in Experiment 2 between positive (win) and negative (lose) incentive conditions. Data are presented separately for each pretraining group from just those conditions in which the actual degree of control that subjects had varied. [Plot not reproduced; horizontal axis: CONTINGENCY (.75-.75, .75-.50, .75-.25).]

As described thus far, the results of the study clearly indicate that pretraining involving rule discovery yields much more accurate control estimates than pretraining involving reinforcement for correct sequences. Another question that can be asked is whether or not this difference reflects both a negative transfer effect of reinforcement and a positive transfer effect of rule finding. Specifically, we can ask if group Sequence was significantly less accurate than naive subjects and if group Rule was significantly more accurate than naive subjects. A series of post hoc Scheffé tests was done to make these comparisons. The estimates of group Rule were significantly more accurate than the estimates of the naive subjects in all incentive conditions and almost all problem series. So the positive effect of rule-finding pretraining was clear. Less clear was the negative effect of reinforcement. Though the estimates of group Sequence were almost always less accurate than those of the naive subjects, the differences were not generally statistically significant.
Fig. 11. Difference in control estimates in Experiment 2 between positive (win) and negative (lose) incentive conditions. Data are presented separately for each pretraining group from just those conditions in which subjects had no control, but the overall frequency of the outcome varied. [Plot not reproduced; horizontal axis: CONTINGENCY (.25-.25, .50-.50, .75-.75).]
It is perhaps worth noting that the win versus lose incentive conditions are a species of “frame” effect, not unlike the contrast between calling prices discounts or surcharges (Kahneman & Tversky, 1984; Tversky & Kahneman, 1981). Every trial in which one does not lose money is, in effect, a trial in which one has won money, and vice versa. Alloy and Abramson demonstrated that this frame effect can be quite powerful in naive subjects. Pretraining in a rule-discovery task, which did not in itself involve either winning or losing money, apparently reduced this frame effect substantially. The connection between rule discovery and frame effects of this sort is not transparent, but is of enough potential significance to encourage future investigation.

IV. Experiment 3: Rules, Payoffs, and the Apprehension of Logical Form
The bias that reveals itself in inaccurate estimates of contingency relations has also been investigated in other contexts. Perhaps the best known of these is the so-called “selection task” (Wason, 1964; Wason & Johnson-Laird, 1972). Subjects are shown four cards, two of which have
letters (A or B), and two of which have numbers (1 or 2), and are made to understand that each card has a letter on one side and a number on the other. They are then read a proposition and asked to select the card or cards they would have to inspect to assess the validity of the proposition. The proposition is typically of the conditional form, if p, then q. For example, the proposition might be: “If there is an A on one side, there is a 1 on the other.” Propositions of this form are false only when the antecedent (p) is true and the consequent (q) is false. Faced with four cards displaying A, B, 1, and 2, and the proposition “If there is an A on one side, there is a 1 on the other,” the schooled logician would know that the A card (p) and the 2 card (not q) would have to be examined. The other two cards, while relevant to evaluating some kinds of propositions, are not relevant to this one. Schooled logicians may know this, but typical college students do not. Very few subjects choose all and only the relevant cards. The most common choices are of card A alone or of card A and card 1. The evidence on card A is relevant to efforts at either confirmation or falsification of the proposition (a 1 on the hidden side confirms and a 2 falsifies). However, the evidence on card 1 can only confirm (an A on the hidden side confirms and a B is irrelevant). The fact that subjects choose card 1, which can only confirm, and ignore card 2, which can only falsify, is what leads to the conclusion that subjects come to tasks like this with a confirmation bias. Does pretraining of various kinds enhance or diminish people’s tendencies to appreciate the value of some kinds of evidence and ignore the value of other kinds? Our interest in this issue was kindled by our very first sequence experiment with human subjects.
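The logic of the selection task described above can be made concrete with a short sketch. It assumes the four-card version given in the text (faces A, B, 1, 2, and the proposition "if A, then 1"); the function names `violates` and `can_falsify` are hypothetical and introduced only for illustration. A card is worth turning over only if some possible hidden face would make it a counterexample.

```python
# Illustrative sketch of the selection-task logic for the proposition
# "If there is an A on one side, there is a 1 on the other."
# Helper names are hypothetical, not taken from the original studies.

LETTERS = ("A", "B")
NUMBERS = ("1", "2")


def violates(letter, number, antecedent="A", consequent="1"):
    """A card is a counterexample only when p is true and q is false."""
    return letter == antecedent and number != consequent


def can_falsify(visible_face):
    """True if at least one possible hidden face would falsify the rule."""
    if visible_face in LETTERS:
        return any(violates(visible_face, number) for number in NUMBERS)
    return any(violates(letter, visible_face) for letter in LETTERS)


# The cards that must be inspected are exactly those that could falsify.
cards_to_turn = [face for face in ("A", "B", "1", "2") if can_falsify(face)]
```

Card 1 never appears in `cards_to_turn` because neither an A nor a B on its hidden side can produce the combination "A with not-1", which is why choosing it can only confirm.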
When subjects were asked what one had to do to get a point, they reliably offered the particular, sufficient sequence that had dominated their own performance as if it were necessary. This mistake was easily corrected, but nevertheless it led to the suspicion that pretraining could result in biased interpretation of questions with various logical forms. The statement “if p, then q” is logically equivalent to “p is sufficient for q,” while “if q, then p” is equivalent to “p is necessary for q.” In this experiment, we asked whether people would behave appropriately when required to test propositions that were presented in varying linguistic but equivalent logical form, and whether pretraining would influence subjects’ strategies for testing propositions in systematic ways. As in the last experiment, there were three groups of subjects. They were screened to exclude people who had taken a course in logic. Group Sequence subjects (n = 10) experienced a 300-trial session in which correct sequences earned 2 cents. Group Rule subjects (n = 10) were paid 5 dollars for the session, experienced three sequence problems, and were instructed
to discover the rule that determined whether or not they earned points. Group Naive (n = 10) had no pretraining. As in previous work, the Sequence subjects developed stereotyped sequences and the Rule subjects did not. Within 2 days of their pretraining, all subjects then experienced an identical test phase of the experiment. In the test phase, they were seated in front of the sequence apparatus and given the following instructions:

I am going to give you a series of problems to work on. Each problem will be presented on a sheet of paper, like this [holds up paper]. On the paper is a diagram of this 5 x 5 light matrix [points to matrix] and a statement. One of the 25 squares of the matrix is shaded. The statement refers to the shaded square. It might say something like, “if the light goes through the shaded square, I get a point”, or “if I got a point, the light went through the shaded square”. Your task is to test the statement, to decide if it’s true or false. The way to decide is to experiment. Whenever the top-left light of the matrix lights up, the equipment is ready. At that point, you can move the matrix light by pushing these two buttons [points to buttons]. To get a point, you have to light up the bottom-right square; that is, you have to push each button four times. The statements are about aspects of the task in addition to this requirement that you push each button four times. Whenever you get a point, this counter will increase by one, and this red light will light up for a few seconds. You can make notes to yourself right on the problem sheet. If you are ready to make a guess about whether the statement is true or false, push this button [points], and I will come in. You can write down your guess on the sheet, and I will give you another problem to work on. I won’t tell you whether any of your guesses are right or wrong. All together, you will have eight problems to work on. You will be paid $5 for your participation. Any questions?
For subjects in the Naive group, the experimenter then ran through a series of five trials to demonstrate how the equipment worked. Obviously, this was unnecessary for subjects who were pretrained. Subjects were then given a series of eight problems, each including a matrix diagram with one shaded square and a statement. These materials occupied about half of an 8.5 by 11-inch piece of white paper. The remainder of the sheet was available for subjects to use for taking notes, framing hypotheses, or whatever they thought useful. Four of the eight problems contained statements that were logically equivalent forms of the claim “if p, then q,” with p standing for the shaded square and q standing for the getting of a point. Each of these statements was a claim about sufficiency. The other
four problems contained statements that were claims about necessity, logically equivalent to “if q, then p.” Specifically, the eight statements were these (the first four relate to sufficiency, the last four to necessity):

1. If the light goes through the shaded square I get a point.
2. To get a point, it is sufficient for the light to go through the shaded square.
3. Either the light does not go through the shaded square, or I get a point.
4. The light went through the shaded square only if I got a point.
5. If the light does not go through the shaded square, I don’t get a point.
6. To get a point, it is necessary for the light to go through the shaded square.
7. Either I didn’t get a point, or the light went through the shaded square.
8. I get a point only if the light goes through the shaded square.
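The claimed equivalences among the eight statements can be checked mechanically with truth tables. The sketch below is an illustration, not part of the original procedure: it encodes each statement under its standard propositional reading (for instance, "X only if Y" as "if X, then Y"), with p standing for "the light goes through the shaded square" and q for "a point is earned". The dictionary `STATEMENTS` and the helper names are assumptions introduced here.

```python
from itertools import product

# Illustrative truth-table check. p: the light goes through the shaded
# square; q: a point is earned. Each statement is encoded under its
# standard propositional reading, which is an assumption of this sketch.


def implies(a, b):
    """Material conditional: false only when a is true and b is false."""
    return (not a) or b


STATEMENTS = {
    1: lambda p, q: implies(p, q),          # if p, then q
    2: lambda p, q: implies(p, q),          # p is sufficient for q
    3: lambda p, q: (not p) or q,           # either not p, or q
    4: lambda p, q: implies(p, q),          # p only if q
    5: lambda p, q: implies(not p, not q),  # if not p, then not q
    6: lambda p, q: implies(q, p),          # p is necessary for q
    7: lambda p, q: (not q) or p,           # either not q, or p
    8: lambda p, q: implies(q, p),          # q only if p
}

ROWS = list(product([True, False], repeat=2))  # (p, q) pairs


def truth_table(statement):
    """Truth values of a statement over all four (p, q) combinations."""
    return tuple(statement(p, q) for p, q in ROWS)


def falsifying_rows(statement):
    """The (p, q) combinations that would show the statement to be false."""
    return [(p, q) for p, q in ROWS if not statement(p, q)]


sufficiency_tables = {truth_table(STATEMENTS[n]) for n in (1, 2, 3, 4)}
necessity_tables = {truth_table(STATEMENTS[n]) for n in (5, 6, 7, 8)}
```

Each set contains a single truth table, so the four statements within a family are interchangeable. The two families differ in their one falsifying case: square lit with no point for sufficiency, and point earned without the square lit for necessity.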
Subjects received these eight problems in random order, and for each subject, a random half of both the necessity and the sufficiency statements were true. Because we expected that pretraining might enhance an already existing tendency in subjects to treat questions about necessity as if they were questions about sufficiency, the results of the experiment were analyzed separately for the two types of statements. Figure 12 presents data
Fig. 12. Percentage of correct guesses about the validity of the statements to be tested in Experiment 3. Data for statements about sufficiency are on the left, and data for statements about necessity are on the right. Data are presented separately for each pretraining group. [Plot not reproduced; horizontal axis: PROBLEM TYPE (SUFF, NECC); legend: NAIVE, SEQUENCE, RULE.]
on accuracy: the percentage of correct conclusions for each group on each subset of the problems. On the problems asking about sufficiency, there were differences between the groups, but they were not significant. On the necessity problems, the analysis of variance was significant, as were all between-group comparisons. Thus, rule-trained subjects were significantly more accurate than naive ones, who in turn were significantly more accurate than sequence-pretrained subjects. A second measure of performance was the number of trials subjects took in evaluating each problem, and these data are presented in Fig. 13. First, evaluations of necessity took more trials than evaluations of sufficiency. This difference was significant overall and in group-by-group comparisons. Second, within problem types, there was a significant effect of pretraining, and Scheffé tests revealed that, in the case of both types of problems, group Sequence took significantly more trials than either of the other groups, which did not, in turn, differ from each other. When questions ask about sufficiency, they can only be falsified by trials in which the shaded square is in fact illuminated. The statement that, for example, “if the middle square in the 5 x 5 matrix is illuminated you get a point” is only tested by trials in which that square is illuminated.
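To make the notion of an appropriate test concrete, the following sketch enumerates the button sequences that satisfy the four-presses-per-button requirement and sorts them by whether they light a given shaded square. It rests on assumptions that this passage does not state: that one button moves the light one square down and the other one square right from the top-left corner, and that only sequences meeting the four-and-four requirement are considered. Trials that light the shaded square can bear on a sufficiency claim; trials that avoid it are the ones that can bear on a necessity claim. All names are hypothetical.

```python
from itertools import combinations

# Illustrative sketch. Assumptions (not stated in this passage): one button
# moves the light down one row ("D"), the other moves it right one column
# ("R"), starting from the top-left square of the 5 x 5 matrix.

SIZE = 5


def all_sequences():
    """Every ordering of four 'D' and four 'R' presses."""
    steps = 2 * (SIZE - 1)
    for down_positions in combinations(range(steps), SIZE - 1):
        yield "".join("D" if i in down_positions else "R" for i in range(steps))


def squares_visited(sequence):
    """Set of (row, col) squares lit by a sequence, including the start."""
    row, col = 0, 0
    visited = {(row, col)}
    for press in sequence:
        if press == "D":
            row += 1
        else:
            col += 1
        visited.add((row, col))
    return visited


def kind_of_test(sequence, shaded):
    """'sufficiency' if the trial lights the shaded square, else 'necessity'."""
    return "sufficiency" if shaded in squares_visited(sequence) else "necessity"


middle_square = (2, 2)
counts = {"sufficiency": 0, "necessity": 0}
for seq in all_sequences():
    counts[kind_of_test(seq, middle_square)] += 1
```

Under these assumptions a subject who repeats one favored sequence generates only one kind of test for a given square, which is one way stereotyped responding could limit the evidence a subject collects.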
Fig. 13. Mean number of trials per statement taken by subjects in Experiment 3 prior to making their guesses about statement validity. SUFF, statements about sufficiency; NECC, statements about necessity. Data are presented separately for each pretraining group. [Plot not reproduced; horizontal axis: PROBLEM TYPE (SUFF, NECC).]
Conversely, statements about necessity can only be falsified by trials that do not illuminate the square in question. To assess the claim that “you get a point only if the middle square is illuminated,” one must generate trials that avoid illuminating the middle square. For each subject, we evaluated the proportion of trials generated that actually were appropriate tests of the proposition in question. In other words, what proportion of trials that evaluated statements about sufficiency were actually tests of sufficiency, and what proportion of trials that evaluated statements of necessity were actually tests of necessity? The data are presented in Fig. 14. On sufficiency problems, both pretrained groups generated a significantly higher proportion of trials that actually tested for sufficiency than did the naive subjects. This might lead to the conclusion that both kinds of pretraining improved performance. However, the data from necessity problems suggest quite a different conclusion. Here, the Rule group was significantly better than the Naive group while the Sequence group was significantly worse. The appropriate conclusion
Fig. 14. Percentage of trials in Experiment 3 in which the tests subjects generated were appropriate to the claims in the statements being tested. Thus data on the left (SUFF) present the percentage of tests of statements about sufficiency that were actually tests of sufficiency, while data on the right (NECC) present the percentage of tests of statements about necessity that actually were tests of necessity. Data are presented separately for each pretraining group. [Plot not reproduced; horizontal axis: PROBLEM TYPE (SUFF, NECC).]
from these data seems to be that rule pretraining indeed helps subjects in their analysis of the logical form of various propositions so that they understand what claim is being made and test it correctly. In contrast, sequence pretraining induces subjects to treat all logical claims as claims about sufficiency and to test them as such. Or, alternatively, perhaps it induces them to attempt to produce positive results and ignore logic altogether. This will improve the accuracy and appropriateness of subjects' behavior only as long as the claims in question actually are about sufficiency. It seems, therefore, that the kind of bias identified and studied by Wason and Johnson-Laird (1972) and by Einhorn and Hogarth (1978) is ameliorated by experience with systematic hypothesis testing and rule discovery and exacerbated by experience generating particular behavior patterns to produce particular outcomes.

V. Experiment 4: Melioration, Optimization, and Retraining
In recent years, research on reinforcement schedules has focused on the study of choice behavior. Within the study of choice, there has been a persistent dispute between those who claim that choice behavior is governed by a set of maximization or optimization rules and those who claim that choice is governed by principles of melioration or satisficing (e.g., Herrnstein, 1970; Herrnstein & Vaughan, 1980; Prelec, 1982; Rachlin, Battalio, Kagel, & Green, 1981). In many circumstances, one of the difficulties in resolving the dispute is that optimization rules and melioration rules converge on the same asymptotic pattern of behavioral allocation. Most commonly, that pattern is described by what is known as the matching law (Herrnstein, 1970), which says that behavior is distributed so that the relative frequency of responding on the various alternatives matches the relative frequency of reinforcement available on the alternatives. For melioration views of behavioral allocation, matching is not merely the end result of the choice process; it is also the mechanism that yields the end result. For maximization views of behavioral allocation, while matching may describe the end result, it does not describe the choice process itself. In efforts to test between the views, there have been several attempts to design procedures in which the two different kinds of allocation rules, matching and maximizing, will yield clearly different patterns of behavior. One ingenious experiment of this type was reported by Mazur (1981). In Mazur's experiment, pigeons had a choice between two pecking keys, each of which was associated with intermittent food reinforcement. A single timer determined the availability of reinforcement at random intervals, and each reinforcer was allocated to the left or right key randomly with an equal probability. When a reinforcer was allocated to the left key,
a peck on that key was required to produce it; pecks on the right key had no effect. Moreover, and critically, until that left-key reinforcer was actually collected, the timer was stopped so that no progress toward the next reinforcer could be made until the currently available one was actually earned. Thus, for example, if the next reinforcer was allocated to the left key and, for some reason, the pigeon spent 20 min pecking at the right key, it would earn no reinforcer during that period, nor would it get any closer to earning a reinforcer. Only after a left-key peck produced the reinforcer that was currently available would the timer that determined reinforcer availability start working again. A pigeon that was optimizing its reinforcement rate on a procedure like this would peck the two keys with equal frequency. Unfortunately, since equal frequencies of reinforcement would be available on the two keys, such a maximization strategy would also be a matching strategy. Suppose, however, the procedure was changed so that while reinforcement periods were still assigned with equal probability to left and right keys, only 1 in 10 reinforcement periods actually contained food for left key pecks. The other 9 just involved a brief period of dead time in which no food was available and no pecks could occur. What should the maximizing organism do in such a procedure? The answer is that it should continue to peck the two keys with equal frequency. It may not “like” the consequences of left-key pecks as much as it “likes” the consequences of right-key pecks, but it can’t get access to the next right-key reinforcement period until it gets through the current left-key one. In other words, there is nothing the pigeon can do about the fact that half of its reinforcement periods will be associated with the left key. 
By confining itself largely to the right key a pigeon will introduce substantial delays between the time that a reinforcement period sets up on the left key and the time that it is earned, and as a result it will reduce the overall frequency of reinforcement obtained. On the other hand, under these new conditions right-key pecks will be reinforced 10 times as frequently as left-key pecks, and if pigeons obey the matching law, they should start pecking the right key almost exclusively and, in the process, cost themselves reinforcement. Mazur introduced several conditions like this and found that, by and large, pigeons moved in the direction of matching. In the particular case just described, pigeons switched from pecking the two keys about equally to making about 85% of their pecks on the right key. As a result, they obtained about a third less reinforcement than they would have had they continued to distribute their responses equally. We attempted to replicate Mazur’s findings with human subjects. We were interested, first of all, in whether or not people were more likely to develop a maximizing strategy than were pigeons. The contingencies that operate in this procedure are both complex and subtle, and discovering
the maximizing strategy may well be beyond the capacities of organisms that might nevertheless be out to maximize in general. Second, we were interested in seeing whether the effects of pretraining (both positive and negative) of the sort we have already found would transfer to a task that is quite different from the others we have examined. As in our previous experiments, there were three main groups of subjects. One was naive, one had experienced a session of the sequence procedure in which correct sequences earned points and, thus, money, and one had experienced a session of the sequence procedure with the task of discovering rules. As in the previous studies, the sequence pretraining induced stereotyped sequences and the rule-discovery pretraining did not. Each subject then began a series of three 1-hr sessions in the next phase of the experiment. These sessions began no more than 2 days after pretraining and were conducted on consecutive days, except for a few instances in which subjects had to postpone a scheduled session. In no case did more than 1 day interrupt these sessions, and in two cases subjects participated in two sessions on the same day because of scheduling difficulties. In the test phase of the experiment, subjects were seated at a computer keyboard and instructed that they could press two of the keys at any time they wanted, with presses on the other keys having no effect. They were also told that they could not press the two keys simultaneously. The computer screen was blank until the session began, at which time a diamond pattern appeared on the screen with the words “SESSION RUNNING: KEYS ACTIVE” inside it. There was no feedback for key pressing except for the reinforced key presses. When a key press was reinforced, a square pattern replaced the diamond, with the words “TIMEOUT: YOU HAVE JUST EARNED A POINT. CONGRATULATIONS!” inside it. The time-out lasted for 3 sec, after which the diamond reappeared.
During phases of the experiment in which not every reinforcement period earned a point, the message inside the square simply said “TIMEOUT” when reinforcement was not forthcoming. A counter in the upper right corner of the screen kept a running total of the points subjects earned. The incentive conditions in this experiment were as follows. Subjects were paid 3 dollars per session for participating. They were informed that they would be competing with other subjects during the sessions and that the subject who earned the most points would win 30 dollars while the runner-up won 20 dollars. It was thought (and some pilot work confirmed) that prizes like these would have much greater incentive effects than would the marginal advantages (in pennies) of optimal response allocation during a session. In other words, this seemed to be an effective way to keep subjects serious about and attentive to what was otherwise an extremely boring task. These prizes were paid to the most successful subjects in each experimental condition.
Ten subjects from each of the pretraining groups were exposed to a series of three phases, each lasting for a single session. In the first session, reinforcement opportunities were allocated with equal probability to each of the response alternatives. The reinforcement opportunities occurred at random intervals averaging 30 sec, so that responses on each alternative were effectively reinforced on a 1-min random-interval (RI 1-min) schedule. The random-interval program stopped timing until an available reinforcement opportunity was collected. Furthermore, each reinforcement opportunity actually contained a reinforcer. That is, each response-produced time-out also produced a point. Subjects who responded efficiently on this procedure could obtain about 60 points for responding on each alternative, or a total of 120 in the 1-hr session. This procedure was essentially identical to the one studied by Mazur. In the second phase, allocation of reinforcement opportunities to response alternatives was as in the first phase. However, for 1 of the 2 responses, only 1 in 10 reinforcement periods actually produced reinforcement; the others just produced time-outs. The maximizing subject here would continue to distribute responses equally between the 2 alternatives, as in the first phase, even though 1 of the 2 alternatives was yielding 10 times as many points. In the third phase of the procedure, allocation of reinforcement was again as in the first phase. Now, however, the alternative for which each reinforcement period had been producing a reinforcement was changed so that only 25% of the reinforcement periods produced reinforcement, while reinforcement in the previous 10% alternative was increased to 100%. In this last phase, even though 1 alternative was yielding 4 times as much reinforcement as the other, the maximizing strategy was still to distribute responses equally between alternatives.
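The arithmetic behind these three phases can be sketched as follows. The first function computes the relative frequency of reinforcement on the leaner alternative when opportunities are assigned equally and all are collected. The second is a deliberately simplified model of the procedure, offered as an assumption rather than as the analysis used in the study: the subject responds at a constant rate (the value `response_rate` is hypothetical), each response goes independently to alternative A with probability `p_a`, the timer is stopped while an opportunity waits to be collected, and the 3-sec time-out is ignored.

```python
# Illustrative sketch under stated assumptions; not the authors' analysis.
# Opportunities are assigned to the two alternatives with equal probability.


def relative_reinforcement(payoff_lean, payoff_rich):
    """Share of earned points coming from the leaner alternative when every
    opportunity is collected and assignment is 50-50."""
    return payoff_lean / (payoff_lean + payoff_rich)


def points_per_hour(p_a, payoff_a, payoff_b,
                    mean_interval=30.0, response_rate=2.0):
    """Expected points per hour when a fraction p_a of responses goes to A.

    payoff_a, payoff_b: probability that a collected opportunity on that
    alternative yields a point. response_rate (responses per second) is a
    hypothetical value. With independent responses, the expected wait for
    a response on the required key is 1 / (probability * response_rate).
    """
    if not 0.0 < p_a < 1.0:
        raise ValueError("p_a must lie strictly between 0 and 1")
    wait_a = 1.0 / (p_a * response_rate)
    wait_b = 1.0 / ((1.0 - p_a) * response_rate)
    # The timer is stopped while an opportunity waits to be collected.
    cycle_seconds = mean_interval + 0.5 * wait_a + 0.5 * wait_b
    points_per_cycle = 0.5 * payoff_a + 0.5 * payoff_b
    return 3600.0 * points_per_cycle / cycle_seconds


def best_allocation(payoff_a, payoff_b):
    """Grid search for the allocation that maximizes expected points."""
    grid = [i / 100 for i in range(5, 96)]
    return max(grid, key=lambda p: points_per_hour(p, payoff_a, payoff_b))


phase_2 = relative_reinforcement(0.10, 1.00)  # about .09
phase_3 = relative_reinforcement(0.25, 1.00)  # .20
```

In this simplified model the payoff probabilities change how many points a cycle is worth but not how long a cycle takes, so `best_allocation` returns an equal split in every phase, whereas a matching subject would shift responding toward the richer alternative and lengthen the average cycle.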
Our principal interest in this study was in whether or not subjects pursued the maximizing strategy. In all cases, this involved responding equally on the two alternatives. While such behavior could indeed be a reflection of maximization, it could also be a reflection of complete indifference to the task. To evaluate this possibility, another three-phase procedure was run with 10 subjects from each of the three pretraining conditions. In this procedure, unlike the first one, maximization required shifts in response allocation from one phase to the next. The first phase of this procedure was the same as in the first procedure. In the second phase, each reinforcement period continued to yield a reinforcer as in the first phase. However, reinforcement periods were no longer assigned equiprobably to the two alternatives. Now, 75% of them were assigned to one alternative and only 25% to the other. This manipulation meant that reinforcement opportunities were three times more likely for one alternative than the other. Both maximization and matching accounts of response allocation would predict a shift from .50-.50 toward .75-.25. The third phase of this procedure was also identical
Barry Schwartz
to the third phase of the first procedure. Reinforcement opportunities were equiprobable, but one actually yielded reinforcement every time while the other only yielded it 25% of the time. Here, as in the first phase, maximization would involve equal distribution of responses between the alternatives; matching would not. To summarize, in the first procedure, though relative reinforcement frequency in the three phases varied from .50 to .09 to .20, the maximizing relative response allocation stayed constant at .50; in the second procedure, relative reinforcement frequency varied from .50 to .25 to .20, and the maximizing relative response allocation varied from .50 to .25 to .50. The data of critical interest concerned response allocation between the two alternatives. To determine subjects' asymptotic responsiveness to contingencies, we examined responding during the last 10 min of each session. Figure 15 presents relative response rate as a function of relative reinforcement rate for the three pretraining conditions in the first procedure. Recall that the maximizing strategy here was to respond equally on the two alternatives in all phases of the procedure. The rule-pretraining group did exactly this. The prediction that a matching- or melioration-based account of choice would make is that relative responding would track relative reinforcement rate. This was approximately true of the subjects with sequence pretraining for points.
Fig. 15. Relative response rate as a function of relative reinforcement rate (x axis: .50, .09, .20) in Experiment 4. These data are from the procedure in which maximization required that relative response rate stay constant at .5 in all three phases of the experiment. Data are presented separately for each pretraining group.
They actually “undermatched,” that is, the relative response rate was more evenly distributed between the alternatives than was the relative reinforcement rate, but this is quite a common result. Mazur (1981) observed the same pattern in his experiment with pigeons. Furthermore, subjects only had 1 hour of exposure to each condition, and, with more exposure, it is possible that response rates would have drifted closer to the matching relation. The naive subjects were between the two pretraining groups. Response allocation changed dramatically from phase to phase, but not as dramatically as in the sequence-pretraining group. Statistical analysis revealed that both the pretraining condition and the phase of the experiment had significant effects on choice. Scheffé tests indicated that the rule-pretraining group was significantly different from the other two groups in both the second and third phases of the procedure. The difference between naive and point-pretraining groups was only significant in the final phase of the procedure. Did failure to follow a maximization strategy on the part of naive and sequence-pretrained subjects actually make a difference? Did it cost them reinforcements? The answer to these questions is yes. The data are presented in Fig. 16, which plots the total reinforcements obtained by each
Fig. 16. Total reinforcement obtained by the three different pretraining groups in Experiment 4 in the second and third phases of the procedure in which maximization required that relative response rate stay constant at .5 despite the variations in relative reinforcement rate (.09 and .20) indicated on the x axis.
pretraining group in the second and third phases of the procedure. Analyses of variance yielded significant pretraining effects for both phases. As in the case of response rates, Scheffé tests indicated that the rule-pretraining group was significantly different from the other two groups, which were not, in turn, different from each other. Data from the second choice experiment are presented in Figs. 17 and 18. In this procedure, the maximizing and matching strategies yield the same choice patterns in the second phase; in either case, relative response rate and relative reinforcement rate should be about equal. As Fig. 17 indicates, subjects in all three groups moved in that direction. Though all groups undermatched, there were no significant differences between them. In the third phase, maximizing and matching strategies diverge, with maximization yielding a relative rate of .50 and matching yielding a relative rate of .20. Figure 17 shows that the rule-pretraining group seemed to be maximizing. Between the second and third phases, relative response rate moved up from .32 to .47 even though relative reinforcement rate moved down from .25 to .20. Responding of the other two groups did not change from the second to the third phase, so that the rule group was significantly
Fig. 17. Relative response rate as a function of relative reinforcement rate (x axis: .50, .25, .20) in Experiment 4. These data are from the procedure in which maximization required that relative response rate change from one phase of the experiment to the next. Data are presented separately for each pretraining group (naive, sequence, rule).
different from each of them while they did not differ from each other. And again, as Fig. 18 shows, the failure to adopt a maximizing strategy had costs in reinforcement. Subjects in the rule-pretraining group obtained significantly more reinforcement in the last phase of the experiment than did subjects in either of the other groups. The results of this experiment seem to indicate that only subjects with pretraining in rule discovery were able to pursue a maximization strategy effectively. This difference we observed between groups could have resulted from two quite distinctive possible differences in the underlying process. One possibility is that all subjects were trying to maximize, but that only subjects with rule pretraining were able to do so. The contingencies involved in this experiment were complex and subtle, and subjects had rather limited exposure to them. With more extended exposure, all subjects might have converged on the same patterns of behavior. The second, more interesting possibility is that behavioral differences reflected genuine differences in underlying process. Melioration as a process can be relatively passive and unintelligent. The idea behind it is that behavior keeps moving in the direction of what is better without concern for or computation of what is best. One way that melioration can occur is through
Fig. 18. Total reinforcement obtained by the three different pretraining groups in Experiment 4 in the second and third phases of the procedure in which maximization required shifts in relative response rate from one phase of the experiment to the next (x axis: relative reinforcement, .50, .25, .20).
simple strengthening effects of reinforcement. Reinforcement increases the likelihood of behavior that produces it, and the more frequently reinforcement occurs, the more the relevant behavior is strengthened. Through this positive feedback loop, behavior slowly shifts in the direction of the alternative that pays off more frequently. Equilibrium is reached at about the point where the allocation of responses matches the allocation of reinforcement. Often, but not always, this equilibrium will be the same one that is reached by an organism pursuing maximization. But the process of seeking a maximum seems to demand more active hypothesis construction and testing than does the process of melioration. The potential maximizer must be able to resist the positive feedback effects of reinforcement. Indeed, the maximizer must sometimes shift behavior in a direction that is opposite the direction of reinforcement frequency advantage. The rule-pretrained subjects in our experiment were clearly able to do this, and while it is possible that the other subjects eventually would have as well, they gave no indication, either in performance or in postexperimental interview, that they were operating with a maximizing orientation.
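The contrast between the two processes can be made concrete with a toy model of the first choice procedure. The sketch below is an illustration under stated assumptions, not the analysis used in the experiment: it assumes one response per second and a single timer that sets up an opportunity on average every 30 sec, assigns it to either alternative with probability .5, and halts until that opportunity is collected. The names `points_per_hour`, `maximizing_allocation`, and `meliorate` are hypothetical, and melioration is modeled crudely as a small shift toward whichever alternative currently pays more per response.

```python
def points_per_hour(p, q_a, q_b, mean_interval=30.0):
    """Expected points per hour when a fraction p of responses goes to A.

    q_a, q_b: probability that a collected opportunity on A or B pays a point.
    An opportunity waiting on A takes about 1/p responses to collect, one
    waiting on B about 1/(1 - p); the timer is halted in the meantime.
    """
    cycle = mean_interval + 0.5 / p + 0.5 / (1.0 - p)  # seconds per opportunity
    return 3600.0 * 0.5 * (q_a + q_b) / cycle


def maximizing_allocation(q_a, q_b):
    """Grid search for the allocation to A that earns the most points."""
    grid = [i / 100 for i in range(1, 100)]
    return max(grid, key=lambda p: points_per_hour(p, q_a, q_b))


def meliorate(q_a, q_b, p=0.5, step=0.005, iterations=2000):
    """Nudge p toward the alternative with the higher payoff per response."""
    for _ in range(iterations):
        local_a = q_a / p          # proportional to points per response on A
        local_b = q_b / (1.0 - p)  # proportional to points per response on B
        if local_a > local_b:
            p = min(p + step, 0.99)
        elif local_a < local_b:
            p = max(p - step, 0.01)
    return p


# Second phase of the first procedure: A pays 1 time in 10, B every time.
best = maximizing_allocation(0.1, 1.0)
settled = meliorate(0.1, 1.0)
print(f"maximizing allocation to A: {best:.2f}")
print(f"melioration settles near:   {settled:.2f}")
print(f"points/hr at each: {points_per_hour(best, 0.1, 1.0):.1f} "
      f"vs {points_per_hour(settled, 0.1, 1.0):.1f}")
```

In this toy model the point-maximizing allocation stays at .50 whatever the payoff probabilities are, because only the time spent collecting pending opportunities depends on allocation. The melioration rule instead drifts toward the matching point q_a / (q_a + q_b), at some cost in points.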
VI. General Discussion
The four experiments described here extend our previous work (Schwartz, 1982c) and show that a history of reinforcement contingent on particular successful patterns of behavior can have negative transfer effects in later tasks that are substantially different from the pretraining task. These experiments also indicate that pretraining requiring subjects to seek rules can have positive transfer effects. Experiment 1 showed that reinforced pretraining can have negative effects even when the pretraining itself does not result in stereotyped behavior. Experiment 2 showed both positive effects of rule pretraining and negative effects of sequence pretraining in a contingency judgment task. Experiment 3 showed that rule pretraining improved the accuracy with which subjects interpreted and tested claims that varied in logical form. Finally, Experiment 4 showed that rule pretraining facilitated the discovery of maximizing solutions in complex choice situations, while sequence pretraining seemed to discourage the discovery of such solutions. Taken together, these studies raise several issues, both theoretical and practical. What is the source of the negative transfer effect produced by contingent reinforcement? What does the search for rules do that promotes positive transfer? What is the relation between stereotypy, variability, and intelligent problem solving? We will take up each of these issues briefly in turn.
A. NEGATIVE EFFECTS OF REWARD
There is a substantial literature on the negative effects of reward (e.g., Deci, 1975; Lepper & Greene, 1978). Most of it is focused on accounts based upon motivation. The idea is that when rewards are introduced into a situation contingent on behavior that is already occurring without them, they gain control over the behavior. Whatever other reasons the individual may have had for engaging in the activity (motivation that is said to be intrinsic to the activity), these reasons are overwhelmed by the power of the extrinsic reinforcer. The result is that the reinforcer controls the behavior. In consequence, intrinsic characteristics of the activity are no longer sufficient to sustain it. Reinforcement has “turned play into work.” It may be that what is critical about the motivational shift induced by contingent reinforcement is not so much that once reinforced, a target activity will no longer occur without reinforcement, but that the form that an activity takes will itself depend upon the motivation that supports the activity. Reinforcement encourages the repetition of what has worked in the past, in part because the aim of the activity is not to produce something like a general principle or a rule, but to produce another reinforcer. Generally, such a tendency to repeat what has worked in the past is a sensible adaptation to environmental contingencies. But there are some settings where it is not so sensible. The experimental scientist, for example, identifies a problem, formulates a hypothesis, designs an experiment to test the hypothesis, does the experiment, and revises or endorses the hypothesis in light of the results. Results that confirm the hypothesis will surely be reinforcing, if anything is. But such results will not lead the scientist to repeat precisely the sequence of activities that has just occurred. Good science involves a kind of win-shift strategy characterized by intelligent, systematic variation from one experiment to the next.
Of course, this is not a characteristic restricted to science. Typically, when one is interested in uncovering a generalization, one does not simply repeat what has worked in the past. One varies one’s activities from occasion to occasion to discover what is critical and what is not. If one wants to know the essentials of successful bread baking (and not merely a procedure that is sufficient for baking good bread), one must experiment in the kitchen just as in the laboratory. Despite the clear advantages of behaving “scientifically” in many natural contexts, there is evidence that people do so only imperfectly. We have already reviewed evidence that people overestimate the value of certain classes of evidence and undervalue or ignore altogether the value of other classes of evidence. It has been persuasively argued by Einhorn and Hogarth (1978), and confirmed by some of the present results, that one source of this bias is that most of the time, people are operating under
constraints that make normatively appropriate tests of hypotheses difficult or impossible. The constraints are such that the specific outcomes of tests matter. It may be true that one learns as much or more about baking from a failed loaf of bread as from a successful one. But this will be small comfort to one’s dinner guests as they break their teeth trying to bite into a falsified hypothesis. Horton (1967) has pointed out that one of the extraordinary things about Western science as an institution, the thing that most distinguishes it from patterns of thought characteristic of traditional African societies, is that science is characterized by pure or segregated motives while traditional African thought is characterized by mixed motives. What Horton means by this is that the abstract and the practical are interconnected in traditional African thought. Hypotheses are not framed in such a way that all outcomes are equally interesting or informative. The African farmer might do an “experiment” to improve crop yield, but the point of the experiment is not to discover a true generalization about agriculture; the point is to improve crop yield. In contrast, Western science separates the pure and the practical or applied. The laboratory is a place where outcomes don’t matter except for the information they provide. As we tell our students, the well-designed experiments are the ones where all possible outcomes are interesting and informative. This is true not only because it reduces the likelihood that one will design experiments whose outcomes make them a waste of time. It is also true because it reduces the likelihood that one will have a personal stake in which way the experiment comes out, and this, in turn, reduces the likelihood that one will tamper with or falsify experimental results. Segregation of motives goes a long way toward preserving scientific integrity.
Seen in this light, the unfortunate motivational effect of reinforcement is precisely that it conflates the “pure” and the “applied,” the abstract and the practical. On the one hand, particular results of each experiment (trial) matter. After all, one would rather win money than lose it. On the other hand, the best route to discovering the truth involves risking some sure positive outcomes. What is one to do? Different people will make different compromises when faced with these conflicting demands. In our experiments, reinforced pretraining did not have overwhelming effects. But our experiments involved trivial manipulations of outcome consequences, and they were done with subjects (talented college students) who presumably were highly interested in and attuned to the process of discovering true generalizations. So, given the weakness of our reinforcement manipulation and the nonrepresentativeness of our subject population, what is perhaps most surprising is that we produced any negative effects at all. Arguments like these about the negative effects of reward have been
around for some time. They are frequently directed at the use of reinforcement contingencies in classroom settings. A point that is often made in response to these arguments is that no one in a classroom setting would dream of tampering with desirable activities that are already occurring with high frequency. One does not reinforce reading in children whose noses are always in a book. The problem faced in the classroom is precisely one of making low probability activities into high probability activities. Thus, one might use reinforcement procedures to teach reading. In initial acquisition, reading is not high probability. Reading is simply not fun when a child must struggle with every word. Once the skill has been acquired, it becomes fun. At this point, reinforcement can be discontinued, with the expectation that the activity will be self-sustaining (Feingold & Mahoney, 1975; Staats, 1975). Such a defense of the use of rewards in the classroom depends upon the assumptions that the effects of rewards do not carry over beyond acquisition into later occurrences of the activity in question and do not transfer to related, but different, activities. The lesson of the experiments reported here is that both of these assumptions are, at best, suspect. Pretraining experiences involving reinforcement have effects that transfer to new situations that may not themselves involve reinforcement. There is no reason to believe that there is anything self-contained about the effects of a reinforcement regimen (see Greene, Sternberg, & Lepper, 1976, for further confirmation). As a result, the negative effects of reinforcement reported here are always a possibility that must be weighed when any application using reinforcement contingencies is contemplated.
B. POSITIVE EFFECTS OF RULE FINDING
There is less to be said about the positive effects of pretraining with rule finding than there is about the negative effects of pretraining that only requires the production of correct sequences. The rule-finding task teaches several lessons, all of which might be relevant to the subsequent tasks. First, it teaches subjects the informational value of variation, of experimentation. Second, it teaches subjects that their first guesses are likely to be wrong; that the world is at least a little more complex than it appears. Third, it teaches subjects that even when they have an explanation (a rule) that is consistent with all of the data generated thus far, it may not be the correct explanation, that theories are always underdetermined by data. Fourth, it teaches subjects that rules can change. The practical import of these lessons may be that confirmation bias is weakened, that the tendency to test multiple hypotheses against each other is strengthened (Platt, 1964), or that the range of data considered to be relevant is increased. It may seem implausible that talented college students actually need to
be taught lessons that boil down to truisms like “keep an open mind,” “try a range of possibilities,” and so on. Indeed, it seems quite likely that our subjects already knew all of these lessons long before they participated in our experiments. However, if the last decade or so of research on inference and decision making has taught us anything, it is that even bright and highly trained people are capable of making the most elementary errors if the context is right (e.g., Baron, 1985; Kahneman, Slovic, & Tversky, 1982; Nisbett & Ross, 1980). For example, our subjects may know these truisms only abstractly and not know how to apply them in particular situations. Or they may know them extremely concretely as a set of very specific strategies that govern their activity in certain quite familiar domains. What our rule-finding pretraining may do is help bridge the gap between the very abstract and the very concrete. It may increase subjects’ accessibility to the problem solving strategies they have already been using without knowing it in particular situations (Rozin, 1976). It may also make them aware of some of the errors to which they are susceptible and allow them to guard against similar errors later. Baron (1985) has offered a careful and detailed theory of what good thinking in problem-solving situations like these requires. His account includes a catalog of the ways in which people can and do go wrong along with some suggestions for correcting or even preventing the most likely sorts of errors. Subjects who have been through three sequence problems in our rule-finding task are likely to have made most of these errors and corrected them. It is clear how improved problem-solving skills of the general sort being described here would have positive effects in the test tasks we employed.
Maximally effective performance in Experiment 1 would have been virtually impossible without systematic variation, and subjects could easily have learned the value of systematic variation in rule pretraining. Accurate contingency estimation in Experiment 2 required sampling all the possibilities and giving all outcomes appropriate weight. The task instructions virtually ensured that all subjects would sample all of the relevant possibilities, but nothing assured the appropriate weighting of evidence. Rule-discovery pretraining may have helped teach subjects the value of falsification and the significance of nonoccurrences of expected events. Rule-discovery pretraining may also have sensitized subjects to the differences between necessary conditions and sufficient conditions, which may in turn have induced them to examine the logical status of varied verbal formulations a good deal more carefully than naive subjects did. Finally, rule-discovery pretraining may have encouraged the wide-ranging variation in behavior that is probably necessary if subjects are to discover how to maximize as rapidly as they did in Experiment 4. None of these possible connections between pretraining and test tasks is incontrovertible, and analytic experiments would be required to pin down specific positive
transfer effects. However, the present account is at least plausible, in part because it connects the positive effects of pretraining to known weaknesses in reasoning that occur without pretraining (see Baron, 1985).
C. STEREOTYPY, RANDOMNESS, AND INTELLIGENT VARIATION
Our interest in the effects of reinforced pretraining on subsequent behavior in people arose from our finding with pigeons that while reinforcement was good at producing stereotyped repetition, it was perhaps too good at it. When the contingencies demanded variation, it did not occur (Schwartz, 1980, 1982a). This led us to suggest that, perhaps, it was impossible to train variability using contingencies of reinforcement. Page and Neuringer (1985) demonstrated that we were wrong. When pecks can be distributed between two alternatives in any order or amount without constraint (unlike our procedures, which require four pecks on each key), pigeons are capable of learning to produce variable sequences. Page and Neuringer showed that pigeons accomplish this variation by choosing in a quasi-random manner. Randomness produces enough sequence-to-sequence variation to satisfy quite stringent novelty requirements, as long as no other constraints are placed on sequences. Random choice will fail to satisfy our variability task requirements about as often as the pigeons we studied failed to satisfy them. It appears, therefore, that we also succeeded in training pigeons to choose at random; it is just that, in our task, randomness was not good enough. To many people working in fields as diverse as the experimental analysis of behavior, evolutionary biology, cognitive psychology, the philosophy of mind, artificial intelligence, and the history and philosophy of science, randomness and stereotypy appear to exhaust the possible categories of action (e.g., Campbell, 1974; Dennett, 1984; Flanagan, 1984; Rosenberg, 1986a,b; Ruse, 1986; Skinner, 1981). Some other category, marked by what looks like intention, foresight, planning, teleology, and the like, must itself be explained by appeal to mechanisms of randomness and stereotypy.
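How often unbiased random pecking would meet a four-pecks-per-key requirement is a short calculation. The sketch below assumes a trial is scored over eight pecks, each an independent fair choice between two keys; those assumptions, and the function name `prob_four_each`, are for illustration only and are not taken from the procedures described here. It covers only the four-and-four constraint, not any additional novelty requirement.

```python
from math import comb


def prob_four_each(pecks=8, per_key=4):
    """Chance that `pecks` fair two-key choices put exactly `per_key` on one key."""
    # Number of orderings with per_key pecks on one key, over all 2**pecks orderings.
    return comb(pecks, per_key) / 2 ** pecks


print(f"{prob_four_each():.3f}")
```

Under these assumptions only 70 of the 256 equally likely eight-peck orderings split four and four, so a purely random chooser would meet the constraint on roughly a quarter of trials.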
One of the appeals of the theory of natural selection in evolutionary biology was that it got “unscientific” notions of design out of the explanatory scheme. Design required a designer, some unseen intelligence that ran the universe. So taken were people by the apparent success of the theory of natural selection that it became the model for the explanation of seemingly intelligent human action. Indeed, several people who have written on this topic who otherwise disagree violently on practically everything seem to be in agreement that no account of intelligence other than one based on random variation and selection is even possible (Dennett, 1975; Skinner, 1981). If the source of action were other than random, then some process or mechanism that was generating action selectively would be
said to be intelligent, and the operation of that mechanism would require an explanation. Though these views are now widely held within the scientific community, there are several good reasons to regard them as suspect (Davidson, 1980; Lacey & Schwartz, 1986, 1987; Rosenberg, 1986a,b; Schwartz & Lacey, 1982). In an argument directed explicitly at the inadequacy of behavior theory as a framework for the explanation of intelligent human action, Lacey and Schwartz (1986, 1987) have given a detailed account of the explanatory and predictive power of the everyday language of intention and teleology that people use to make sense of the actions of themselves and others in their social world. We have suggested that attempts to eliminate this language and the conceptual scheme that underlies it dramatically impoverish the descriptions we can give of human action and the distinctions we can make among types of action. There are as yet no good reasons, either scientific or practical, for abandoning these explanatory categories, and there are good reasons for retaining them and using them with rigor and precision. Without intentional categories, it becomes difficult to know how to distinguish planned, intelligent, systematic variation in action from mere randomness. Such a distinction seems to be necessary. Clearly, subjects who were able to master our variability task (Experiment 1; Schwartz, 1982c, Experiment 6) were not responding randomly because, as Page and Neuringer (1985) have shown, random responding does not work. Nor were they engaged in stereotyped responding, except perhaps at the level of following the rule that describes what is required for reinforcement. What, then, were they doing?
Equipped with the blinders that the random variation-selection framework supplies, Catania (1987) recently described our demonstrated failures to produce variability in pigeons as having been “refuted by an experiment showing the opposite when artifactual constraints on the required sequence are removed (Page & Neuringer, 1985)” (Catania, 1987, p. 255, emphasis added). What made these constraints “artifactual?” Perhaps they were judged to be artifactual because they seemed to demand (and, in humans, produce) a class of behavior, intelligent variation, that the theoretical, random variation-selection framework says does not exist. We do not need experimental demonstrations to convince us that intelligent variation in human behavior does exist. But a lesson of the research reported here is that whether it will continue to exist in the future may well be an empirical question. If the application of instructional techniques involving reinforcement contingencies of the sort we have been studying becomes sufficiently widespread, then the problematic category of intelligent variation may disappear as a characteristic of human behavior
in need of explanation. This will make it easier to construct scientific accounts of human action at the same time that it makes it harder to find people who are capable of formulating such accounts.
ACKNOWLEDGMENTS
Most of the research reported in this contribution was supported by NSF grants BMS 78-15461 and BNS 82-06670, as well as several Swarthmore College faculty research grants. The article was written with the support of a Eugene M. Lang Faculty Fellowship. I am grateful to Alan Heubert and Heidi McBride, who between them provided 4 years of invaluable assistance with the design, execution, and analysis of this research, and to Jon Baron and Allen Neuringer for helpful comments on and stimulating discussion of much that is contained herein.
REFERENCES
Allan, L. G., & Jenkins, H. M. (1980). The judgment of contingency and the nature of the response alternatives. Canadian Journal of Psychology, 34, 1-11.
Alloy, L. B., & Abramson, L. Y. (1979). Judgment of contingency in depressed and nondepressed students: Sadder but wiser? Journal of Experimental Psychology: General, 108, 441-485.
Alloy, L. B., & Tabachnik, N. (1984). Assessment of covariation by humans and animals: The joint influence of prior experience and current information. Psychological Review, 91, 112-149.
Arkes, H. R., & Harkness, A. R. (1983). Estimates of contingency between two dichotomous variables. Journal of Experimental Psychology: General, 112, 117-135.
Baron, J. (1985). Rationality and intelligence. New York: Cambridge University Press.
Campbell, D. T. (1974). Evolutionary epistemology. In P. A. Schilpp (Ed.), The philosophy of Karl Popper (pp. 413-463). LaSalle, IL: Open Court Publ.
Capaldi, E. J., Verry, D. J., Nawrocki, T. M., & Miller, D. J. (1984). Serial learning, interitem associations, phrasing cues, interference, overshadowing, chunking, memory, and extinction. Animal Learning and Behavior, 12, 7-20.
Catania, A. C. (1987). Some Darwinian lessons for behavior analysis: A review of Bowler’s The eclipse of Darwinism. Journal of the Experimental Analysis of Behavior, 47, 249-257.
Crocker, J. (1981). Judgment of covariation by social perceivers. Psychological Bulletin, 90, 272-292.
Davidson, D. (1980). Essays on actions and events. Oxford: Oxford University Press.
Deci, E. L. (1975). Intrinsic motivation. New York: Plenum.
Dennett, D. (1975). Why the law of effect will not go away. Journal for the Theory of Social Behavior, 5, 169-187.
Dennett, D. (1984). Elbow room. Cambridge, MA: MIT Press.
Einhorn, H. J., & Hogarth, R. M. (1978). Confidence in judgment: Persistence of the illusion of validity. Psychological Review, 85, 395-416.
Feingold, B. D., & Mahoney, M. J. (1975). Reinforcement effects on intrinsic interest: Undermining the overjustification hypothesis. Behavior Therapy, 6, 367-377.
Flanagan, O. J. (1984). The science of the mind. Cambridge, MA: MIT Press.
Fodor, J. (1983). The modularity of mind. Cambridge, MA: MIT Press.
Fountain, S. B., Henne, D. R., & Hulse, S. H. (1984). Phrasing cues and hierarchical organization in serial pattern learning by rats. Journal of Experimental Psychology: Animal Behavior Processes, 10, 30-45.
Greene, D., Sternberg, B., & Lepper, M. R. (1976). Overjustification in a token economy. Journal of Personality & Social Psychology, 34, 1219-1234.
Herrnstein, R. J. (1970). On the law of effect. Journal of the Experimental Analysis of Behavior, 13, 243-266.
Herrnstein, R. J., & Vaughan, W. (1980). Melioration and behavioral allocation. In J. E. R. Staddon (Ed.), Limits to action (pp. 143-176). New York: Academic Press.
Horton, R. (1967). African traditional thought and Western science. Africa, 37, 50-71; 155-187.
Hull, C. L. (1943). Principles of behavior. New York: Appleton.
Hull, C. L. (1952). A behavior system. New Haven, CT: Yale University Press.
Hulse, S. H., & Dorsky, N. P. (1977). Structural complexity as a determinant of serial pattern learning. Learning and Motivation, 8, 488-506.
Hulse, S. H., & Dorsky, N. P. (1979). Serial pattern learning by rats: Transfer of a formally defined stimulus relationship and the significance of non-reinforcement. Animal Learning and Behavior, 7, 211-220.
Jenkins, H. M., & Ward, W. C. (1965). Judgment of contingency between responses and outcomes. Psychological Monographs, 79 (1, Whole No. 594).
Kahneman, D., Slovic, P., & Tversky, A. (Eds.) (1982). Judgment under uncertainty: Heuristics and biases. New York: Cambridge University Press.
Kahneman, D., & Tversky, A. (1984). Choices, values and frames. American Psychologist, 39, 341-350.
Klayman, J., & Ha, Y. (1987). Confirmation, disconfirmation and information. Psychological Review, 94, 211-228.
Lacey, H., & Schwartz, B. (1986). Behaviorism, intentionality and sociohistorical structure. Behaviorism, 14, 193-210.
Lacey, H., & Schwartz, B. (1987). The explanatory power of radical behaviorism. In S. Modgil & C. Modgil (Eds.), B. F. Skinner: Consensus and controversy (pp. 165-176). New York: Falmer.
Lepper, M. R., & Greene, D. (Eds.) (1978). The hidden costs of reward. Hillsdale, NJ: Erlbaum.
Logan, F. A. (1956). A micromolar approach to behavior theory. Psychological Review, 63, 63-73.
Logan, F. A. (1960). Incentive. New Haven, CT: Yale University Press.
Marr, M. J. (1979). Second-order schedules and the generation of unitary response sequences. In M. D. Zeiler & P. Harzem (Eds.), Advances in the analysis of behavior. Vol. 1. Reinforcement and the organization of behavior (pp. 223-260). Chichester: Wiley.
Mazur, J. E. (1981). Optimization theory fails to predict the performance of pigeons in a two-response situation. Science, 214, 823-825.
Mynatt, C. R., Doherty, M. E., & Tweney, R. D. (1977). Confirmation bias in a simulated research environment: An experimental study of scientific inference. Quarterly Journal of Experimental Psychology, 29, 85-95.
Mynatt, C. R., Doherty, M. E., & Tweney, R. D. (1978). Consequences of confirmation and disconfirmation in a simulated research environment. Quarterly Journal of Experimental Psychology, 30, 395-406.
Nisbett, R., & Ross, L. (1980). Human inference: Strategies and shortcomings of social judgment. New York: Prentice-Hall.
The Experimental Synthesis of Behavior
I37
Page. S.. & Neuringer, A. (1985). Variability i s an operant. Joiirnul of Experimental Psyclioliigy: Anitnu1 Behavior Processes. 11, 429-452. Piscaretta. R. (1982). Some factors that influence the acquisition of complex, stereotyped response sequences in pigeons. Joirrnul of the Experimentul Anulysis of Beliuvior, 37, 359-369. Strong inference. Science, 146, 347-353. Platt. J . R. (1%). Prelec, D. (19821. Matching, maximizing, and the hyperbolic reinforcement feedback function. Psvcliologicril Review, 89, 189-230. Rachlin, H. Battalio, R. C . . Kagel, J . H., & Green, L. (1981). Maximization theory in behavioral psychology. Behuvir~rd& Bruin Scienres, 4, 37 1-388. Rescorla, R. A. ( 1972). Informational variables in Pavlovian conditioning. In G . H.Bower (Ed.). The psvcliologv of Ieurning und motivution (Vol. 6, pp. 216-282). New York: Academic Press. Roitblat. H. ( 1982). The meaning o f representation i n animal memory. The Behuviorul & Bruin Sciences. 5, 353-372. Rosenberg. A. (1986a). Intentional psychology and evolutionary biology. Part I: The uneasy analogy. Bdiuiiorism. 14, 15-26. Rosenberg. A. ( 1986b). Intentional psychology and evolutionary biology. Part 11: The crucial disanalogy. Beliriiiorism, 14, 125- 138. Rozin. P. (1976). The evolution o f intelligence and access to the cognitive unconscious. I n E. Stellar & J . M. Sprague (Eds.), Progress in psychobiology and physiologicul psychology (Vol. 6. pp. 124-180). New York: Academic Press. Ruse, M. (1986). Tuking Darwin seriortsly. New York: Blackwell. Schwartz. B. ( 1980). Development of complex, stereotyped behavior in pigeons. Joirrnal qf the Erperitnentd Anrilysis qf Behuviiir, 33. 153-166. Schwartz. B. (1981a). Control of complex, sequential operants by systematic visual information in pigeons. Joirrniil of Erperimentrrl Psvclilogy: Animul Behavior Processes. 7, 31-44. Schwartz. B. (1981b). Reinforcement creates behavioral units. Behrivioirr Anri1.vsis Letters. I, 33-41. Schwartz. B. (1982a). 
Failure to produce response variability with reinforcement. Jorimul of the Experiinentcil Ancilysis of Beliciiior. 37, 171-181. Schwartz. B. (1982b). Interval and ratio reinforcement o f a complex sequential operant in pigeons. Joiirnul of tlie Experiinentul Anulv.sis of Behavior. 37, 349-357. Schwartz. B. ( 1982~).Reinforcement induced behavioral stereotypy: How not to teach people to discover rules. Joiirnril of Experiinentul P.s.vc/iology: Generul. I1 1, 23-59. Schwartz. B. (1984). Tlie p.sydio1og.v ofletrrning und heliuvior. New York: Norton. Schwartz, B. ( 1985). On the organization of stereotyped response sequences. Aniinul Leurning rind Behririor. 13, 261-268. Schwartz. B. (1986a). Allocation o f a complex, sequential operant on multiple and concurrent schedules of reinforcement. Joiirnul i f the Experimentol Anr11v.si.sqf Bt4irivior, 45, 32 I335. Schwartz. B. (1986b). Response stereotypy without automaticity: Not quite involuntary attention in the pigeon. Letrrning rind Motivtrtion. 17, 347-365. Schwartz. B . . & Lacey. H. (1982). Behaviorism, science und hiiinrrn nutiire. New York: Norton. Schwartz. B.. & Reilly, M. (1983). Response stereotypy without automaticity in pigeons. Leurning rind Motivcition, 14, 253-270. Schwartz. B.. & Reilly, M. (1985). Long-term retention of a complex operant in pigeons. Jorrrncil of Experimental Psychology: Aniinul Be/icriv?w Praivsses. 1I , 337-355.
Barry Schwartz
I38
Skinner. B. F. (1935). The generic nature of the concepts of stimulus and response. Joirrniil of Generul P.v.vchiilogy, 12, 40-65.
Skinner, B. F. (1938). The hehuvior of orgunisms. New York: Appleton. Skinner. B. F. (1981). Selection by consequences. Science. 213, 501-504. Staats. A. W. ( 1975). Sociiil behoviorism. Homewood. IL: Dorsey. Tolman. E. C. (1932). Piirposive hehuvior in utiimals und men. New York: Appleton. Tversky. A., & Kahneman. D. (1981). The framing of decisions and the psychology of choice. Science. 211, 453458.
Tweney, R. D., Doherty, M. E.. & Mynatt, C. R., Eds. (1981). On scientific thinking. New York: Columbia University Press. Vogel, R.,& Annau, Z. (1973). An operant discrimination task allowing variability of response patterning. Joiirniil iif the Experimental Analysis of Behuvicir, 20, 1-6. Wason. P. C. ( 1964). The effect of self-contradictionon fallacious reasoning. Qiiiirterlv Jorirniil of Experimentul Psychology, 16, 30-34.
Wason. P. C., & Johnson-Laird, P. N. (1972). The p.s.vch~ilogvof reiisiining. Cambridge, MA: Harvard University Press. Wasserman. E. A.. Nelson. K. R..& Larew, M. B. (1980). Memory for sequences of stimuli and responses. Joiimul of the Experimental Anulysis of Behuvior. 34, 49-59.
EXTRACTION OF INFORMATION FROM COMPLEX VISUAL STIMULI: MEMORY PERFORMANCE AND PHENOMENOLOGICAL APPEARANCE

Geoffrey R. Loftus
John Hogden
I. Introduction

An observer viewing a visual stimulus forms a representation of the stimulus in memory that can later be accessed in a variety of ways. The encoding processes by which the representation is formed are undoubtedly diverse and complex (cf. Potter, 1976; Intraub, 1984; Loftus & Ginn, 1984; Loftus, Hanna, & Lester, 1988). They may, however, be conveniently divided into those that operate on (1) the physical stimulus, (2) the iconic image (hereafter icon) that follows the physical stimulus, and (3) the short-term representation of the stimulus that follows the icon's termination. Using the terminology of Intraub (1980), Loftus and Ginn (1984), and Potter (1976), we call the first two types of processes perceptual and the third conceptual. Our focus in this article is on perceptual processes, and our first goal is to construct and test a model of the relation between the perceptual processing of some stimulus and the quality of the stimulus's eventual memory representation.

Given our definitions of perceptual and conceptual processing, it is evident that perceptual processing occurs in conjunction with conscious (or phenomenological) awareness of the to-be-encoded stimulus. Indeed, it is the existence of such awareness that underlies the perceptual/conceptual dichotomy to begin with; a common intuition is that there must be a raw information-extraction process that can only occur if the stimulus is phenomenologically present. Our second goal is to extend our perceptual-processing model to encompass the phenomenological awareness of a visual stimulus. To briefly anticipate, we will argue and present data favoring the proposition that phenomenological awareness is a consequence of perceptual processing rather than the other way around.

Much of the empirical work described here focuses on extraction of information from an icon and phenomenological awareness of the icon. In keeping with past terminology, we shall often refer to the latter as visible persistence. The appropriateness of these issues as topics of scientific investigation is a source of some debate (see Haber, 1983, 1985). While we are not neutral in that debate (cf. Loftus, 1983, 1985b), we note that our present interest is not so much in the icon per se, but rather in the icon as a tool for investigating the relation between information extraction and phenomenological awareness.

We start by reviewing evidence that the same perceptual processes operate on a physical stimulus and on an icon. This evidence sets the stage for a series of new experiments in which we investigate factors that influence perceptual processing of the icon and then go on to argue that these same factors are intimately involved in the operation of perceptual processes in general.

Empirically, our starting point is a series of picture-memory experiments reported by Loftus, Johnson, and Shimamura (1985). This work was motivated by the existence of two important similarities between a picture and the icon that follows. First, information that is useful in a subsequent memory test can be extracted from the icon, just as it can be extracted from the physical stimulus.
Second, there is no phenomenological dividing line between the offset of the physical stimulus and the onset of the icon; indeed, naive subjects think that an icon is a fading extension of the physical stimulus. These similarities led Loftus et al. to hypothesize that a stimulus and an icon are equivalent in terms of (1) the potentially extractable information that they contain, (2) the perceptual processes that operate on them, and (3) their influence on whatever mental machinery is responsible for phenomenological awareness.

Loftus et al. (1985) were concerned chiefly with information extraction. They reasoned that if icon/stimulus equivalence held, then information extracted from an icon might be parsimoniously characterized in terms of information that could potentially be extracted from a physical extension of the stimulus. To investigate this possibility, Loftus et al. assessed memory performance for pictures that had been followed either by an immediate noise mask (which did not permit an icon) or by a 300-msec delayed noise mask (which did permit an icon). Generic results from these experiments are shown in Fig. 1.

Fig. 1. Generic results from Loftus et al. (1985): memory performance (in the experiments, measured by detail recall, recognition, or ratings) as a function of stimulus exposure duration. The two curves are for immediate-mask (no-icon) and delayed-mask (icon) stimuli. The curves are horizontally parallel, displaced from one another by 100 msec. [Ordinate: performance, from poor to good.]

As expected on the basis of past data (and on the basis of common sense), performance increased with increasing exposure duration.¹ The finding of primary interest, however, was that the physical exposure duration required to achieve any given performance level was approximately 100 msec longer for immediate-masked pictures, relative to delay-masked pictures. This result was independent of the picture's exposure duration; moreover, it obtained for three performance measures, four sets of pictures, and two levels of stimulus luminance. Loftus et al. concluded that the additional information that could be extracted from an icon was approximately equal to the additional information that could be extracted from a 100-msec extension of the physical stimulus.

¹The major findings of the Loftus et al. experiments held over a variety of dependent variables (detail recall, yes-no recognition, and rated visibility). For this reason, the ordinate of Fig. 1 is simply "performance" rather than some specific performance measure.
Accordingly, they characterized the icon as having an equivalent physical duration, or a worth, of w = 100 msec.² This invariance of an icon's worth over such a wide variety of conditions could be entirely coincidental. It seems more likely, however, that the invariance is not coincidental. In particular, it could result from the kind of equivalence hypothesis sketched previously: that from the cognitive system's perspective, an icon is equivalent to a literal (albeit fading) extension of the physical stimulus. In this article, we expand on this idea and show how the invariance of the icon's worth follows from such equivalence.

The article is divided into two main sections. In the first section, we propose a formal model, incorporating the notion of icon/stimulus equivalence, that accounts for the Loftus et al. (1985) data as well as for other findings in the picture-memory literature. In the second section, we extend this model to account for subjective accounts of visible persistence. In each section we present experiments in support of the model.

Our model incorporates two fundamental propositions. The first is that a stimulus and its icon are equivalent with respect to both the kind of information that they provide the observer and the perceptual processes that operate on them. Given this viewpoint, it is appropriate to use the term visual stimulus to encompass both the physical stimulus and any icon that follows. The second proposition, which we formalize in Section II, is that phenomenological experience of a stimulus results from extraction of information from that stimulus. This means that one sees an icon for the same reason that one sees a physical stimulus: the information-extraction process is the same in both cases, so the phenomenology is the same in both cases.
An implication of this second proposition is that extraction of information from a visual stimulus on the one hand and the subjective experience of seeing the stimulus on the other are mediated by the same processes. As applied to the icon, this notion has recently come under attack. The most extensive argument against it was made by Coltheart (1980), who compared degree of information extraction, as assessed in a Sperling (1960) partial-report task, with the duration of visible persistence, as assessed in a synchrony-judgment task (e.g., Efron, 1970) or a temporal-integration task (e.g., Eriksen & Collins, 1967; Di Lollo, 1980). Coltheart noted a particular variable, stimulus duration, that has different effects on the two phenomena. Stimulus duration has little, if any, effect on partial-report performance, but a substantial effect on the estimated duration of visible persistence; longer stimuli show less persistence than do shorter stimuli. Our model accounts for this effect.³

²To forestall confusion, it is worthwhile at this point to distinguish between an icon's worth and an icon's duration. These two entities are related, but they are not the same. Worth, as noted, refers to the amount of time by which a masked physical stimulus must be extended in order to extract the same amount of information as would be extracted from an icon. Duration refers to the maximum time following stimulus offset during which the icon continues, by some criterion, to exist. Later we will compare worth and duration in detail.

II. A Model of Information Acquisition and Picture Memory
Our goal in this section is to formulate a model of the relation between picture viewing and later picture memory. After describing the model, we show that it accounts for some robust findings in the visual-memory literature, and we then present three picture-recognition experiments in support of it. We will not be concerned with phenomenological appearance in this section; we defer extension of the model to this domain until Section III.

We present the model in two forms: a general and a quantitative form. The general form is composed of five qualitative assumptions that we believe may correspond to psychological reality. In the quantitative form of the model, two of these qualitative assumptions are replaced with corresponding quantitative forms. These quantitative assumptions are stronger than their qualitative counterparts in that the former imply the latter, but not vice versa. Although we have substantially less faith in the accuracy of the quantitative assumptions, they may be approximately correct and, in any event, are useful for illustrating relationships and predictions.

A. THE MODEL

1. Overview
Consider a situation in which an observer views a briefly presented visual stimulus with the intent of being able to remember it later on. Within our model, the stimulus is treated as a bundle of information that must be extracted and eventually encoded in some relatively long-term memory. The model does not precisely characterize what "information" is. It does, however, incorporate the assumption that information is unidimensional, i.e., that the amount of extracted information is representable by a single value on some ordinal scale. We mention this assumption here because it is crucial: if it is incorrect, the remaining four assumptions make no sense. At the end of this article, we discuss possible limitations on the unidimensionality assumption's validity along with concomitant restrictions on the model itself.

³Coltheart also asserted that stimulus luminance has an effect similar to that of stimulus duration: that is, he asserted that luminance has no effect on partial-report performance, but a negative effect on persistence duration. However, as we discuss later, Adelson and Jonides (1980) showed a small effect of stimulus luminance on partial-report performance, while Long and Beaton (1982) showed larger and more robust effects. In addition, the effect of luminance on persistence duration turns out to be somewhat complicated, both empirically and theoretically. We also discuss luminance in some detail later in this article.

2. Assumptions
The model consists of five assumptions involving (1) available (potentially extractable) stimulus information, (2) unidimensionality of information, (3) the rate at which available information is extracted, (4) the relation between extracted information and subsequent memory performance, and (5) the basis of phenomenological appearance. We describe the first four assumptions in this section, and the fifth in Section III.

Assumption 1: Available Information. A stimulus consists of information that is potentially available to a subsequent information-extraction process. While the stimulus is physically present, all information is available; when the stimulus physically disappears, available information decays over time. The proportion of total stimulus information available at time t following stimulus onset is designated a(t). In the general model,

$$a(t) = \begin{cases} 1.0 & \text{for } t < d \\ b(t - d) & \text{for } t > d \end{cases} \tag{1g}$$

where d is stimulus duration and b is the poststimulus decay function. The function b is assumed to be nonnegative and monotonically decreasing, with the constraints that b(0) = 1.0 and that the integral of b from 0 to infinity is equal to w (recall that w is the icon's worth in units of time).⁴ Because the argument of b is (t - d), the shape of b and the value of its integral are independent of d, the picture's duration. In the quantitative model, b is a negative exponential; thus

$$a(t) = \begin{cases} 1.0 & \text{for } t < d \\ e^{-(t - d)/w} & \text{for } t > d \end{cases} \tag{1q}$$

Equation (1q) is illustrated in Fig. 2 (top) for two values of d: 20 and 270 msec. The icon's worth, w, is set to 100 msec, the value obtained by Loftus et al. (1985).

⁴Note that a(t), being a proportion, is a dimensionless number. Therefore, its integral over time is in units of time.
Fig. 2. Quantitative model: illustration of a(t), r(t), and I(t) as functions of time since stimulus onset. [Abscissa in each panel: time since stimulus onset (ms).]
Assumption 2: Unidimensionality. Information is unidimensional; that is, both the amount of information available in the stimulus and the amount of information extracted by the observer can be represented by a single value on some ordinal scale.

Assumption 3: Information-Extraction Rate. The proportion of total stimulus information extracted by time t is designated I(t). New information is extracted at a rate r(t), where r(t) is the derivative of extracted information with respect to time, i.e., r(t) = dI/dt. The information-extraction rate is determined by two things. First, r(t) is assumed to be a multiplicative function of a(t), the available information [since with zero available information, r(t) should be zero]. Second, r(t) is assumed to be a decreasing function of I(t), the proportion of information already extracted; i.e., earlier information is extracted faster than later information. This assumption (in conjunction with unidimensionality) has been incorporated, in one form or another, into a variety of information-acquisition models (e.g., Kowler & Sperling, 1980; Krumhansl, 1982; Loftus & Kallman, 1979; Massaro, 1970; Rumelhart, 1969). The idea is that easier- (i.e., faster-) to-extract information is acquired earlier than harder- (i.e., slower-) to-extract information (just as, for example, the earlier words in a crossword puzzle are filled in faster than the later words). In the general model,

$$r(t) = a(t)\,h[I(t)] \tag{2g}$$

where h is a nonnegative, monotonically decreasing function that approaches zero as I(t) approaches 1.0. The constraints embodied in Eqs. (1g) and (2g) provide the model with certain desirable properties. First, they instantiate the ideas sketched previously that r(t) is multiplicatively related to a(t) but negatively related to I(t). Second, I(t) cannot exceed 1.0. Third, if the stimulus remains physically present indefinitely, I(t) approaches or reaches 1.0. The general form of I(t) can be derived from Eqs. (1g) and (2g). As shown in Section V,A, it is

$$I(t) = \begin{cases} H^{-1}[t + H(0)] & \text{for } t < d \\ H^{-1}[d + H(0) + B(t - d)] & \text{for } t > d \end{cases} \tag{3g}$$

where

$$H(I) = \int [1/h(I)]\,dI \quad \text{and} \quad B(t - d) = \int_{d}^{t} b(t' - d)\,dt'$$

and $H^{-1}$ is the inverse function of H, i.e., $H^{-1}\{H[I(t)]\} = I(t)$.
The interpretation of Eq. (3g) is, essentially, that I(t) is a function, $H^{-1}$, of two components, which are seen in the bottom part of the equation. The first, indicated by [d + H(0)], corresponds to information extracted from the physical stimulus, and the second, indicated by B(t - d), corresponds to information extracted from the icon. That the same function, $H^{-1}$, is applied to both components reflects the proposition that the same processes are applied to both the physical stimulus and the icon.

For the quantitative form of h, we have chosen a function that describes a variety of physical situations: h(I) is proportional to [1.0 - I(t)], the as-yet unextracted available information. Thus,

$$h[I(t)] = c\,[1.0 - I(t)]$$

where c, the constant of proportionality, is a free parameter with units of sec⁻¹. This leads to the equation for r(t):

$$r(t) = c\,a(t)\,[1.0 - I(t)] \tag{2q}$$

Equation (2q) is illustrated in Fig. 2 (middle) with c = 3.7, a value that was estimated in an experiment to be described in Section II.⁵

The function h(I) is central to the model in that the effects of a variety of independent variables are assumed to be mediated by their influence on h(I). In the quantitative model, on which we will later rely heavily, h(I) is controlled by the parameter c. It is evident from Eqs. (2q) and (2g) that the parameter c and, more generally, the function h determine both the initial value of r(t) [that is, the value of r(t) when t = 0] and how fast r(t) declines with increases in I(t). When we later characterize some independent variable (e.g., stimulus luminance) as affecting h(I), this effect is instantiated in the quantitative model by variation in c across levels of the independent variable. In general, a high c value (e.g., with bright stimuli) implies an r(t) that is initially high but declines rapidly over time. Conversely, a low c value (e.g., with dim stimuli) implies an r(t) that is initially lower but declines more slowly over time. The proportion of extracted information, I(t), always increases more rapidly the higher the value of c.

The quantitative-model equation for I(t) can be derived from Eqs. (1q) and (2q). It is

$$I(t) = \begin{cases} 1.0 - e^{-ct} & \text{for } t < d \\ 1.0 - e^{-c\,[d + w(1 - e^{-(t - d)/w})]} & \text{for } t > d \end{cases} \tag{3q}$$

Equation (3q) is illustrated in Fig. 2 (bottom).

⁵The parameter c takes on this value when t is expressed in seconds.
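As a rough check on the quantitative model, the sketch below integrates dI/dt = c a(t)[1 - I(t)] numerically and compares the result with the closed form that follows from it. It assumes exponential icon decay with worth w and uses the values quoted in the text (c = 3.7 per sec, w = 100 msec); the function names and the simple Euler scheme are illustrative assumptions rather than the authors' procedure.

```python
import math


def available(t, d, w):
    """a(t): 1.0 while the stimulus is on, exponential decay (area w) afterwards."""
    return 1.0 if t <= d else math.exp(-(t - d) / w)


def extracted_closed_form(t, d, c=3.7, w=0.100):
    """I(t) obtained by solving dI/dt = c * a(t) * (1 - I) with I(0) = 0.

    All times are in seconds; c is in 1/sec.
    """
    if t <= d:
        effective_exposure = t
    else:
        # physical exposure plus the part of the icon's area used so far
        effective_exposure = d + w * (1.0 - math.exp(-(t - d) / w))
    return 1.0 - math.exp(-c * effective_exposure)


def extracted_euler(t, d, c=3.7, w=0.100, steps=50_000):
    """Forward-Euler integration of the same differential equation."""
    dt = t / steps
    info = 0.0
    for k in range(steps):
        rate = c * available(k * dt, d, w) * (1.0 - info)  # r(t) at step start
        info += rate * dt
    return info
```

Comparing `extracted_closed_form(0.6, d)` with `extracted_euler(0.6, d)` for d = 0.020 and d = 0.270 gives a quick consistency check; for large t the closed form levels off at 1 - exp[-c(d + w)], so the icon contributes exactly as much as w additional seconds of physical exposure.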
Assumption 4: Memory Performance. Memory performance, however measured, is a monotonic function of extracted information; i.e., P(d), memory performance for pictures presented for a duration of d sec, is given by m applied to the extracted information, where m is a monotone increasing function. Concern with the nature of m is beyond the scope of this article (here we will mostly be concerned with model predictions that do not depend on strong assumptions about m). We note in passing that m is determined by such things as the nature of postperceptual (conceptual) processing of the stimulus, the nature of events occurring during the study-test interval, the nature of the memory test, and the nature of the retrieval process.

B. APPLICATIONS OF THE MODEL

The model as described thus far accounts for several salient aspects of picture-memory performance that we will briefly describe. First, however, we describe how we apply the model to data.

1. Evaluation Procedures

The model allows calculation of I(t), the information extracted by time t. However, because the model specifies the function m relating I(t) to memory performance to be no stronger than monotonic, it is not possible to predict exact performance for a given experimental condition. There are, however, two other ways in which we can apply the model to data. First, the model can predict the ordering of performance values across a set of experimental conditions; thus, we can evaluate whether the across-conditions relation between predicted I(t) and observed memory performance is monotonic. Second, the model can, in some instances, predict equivalence properties; that is, it can specify the sets of exposure durations that produce equal memory performance under different levels of some independent variable; thus we can test whether these equivalence properties hold. We shall use both evaluation procedures in application of the model to existing data. We use only the first procedure in application of the model to Experiments 1-3.

2. Applications of the Model to Existing Data

In this section we describe application of the model to five kinds of picture-memory data: effects of stimulus exposure duration, stimulus luminance, subjects' age, stimulus priming, and stimulus masked/unmasked. We also describe application of the model to a partial-report paradigm.
a. Stimulus Duration. Numerous experiments have shown that performance increases with increasing stimulus duration (e.g., Loftus, 1972; Loftus & Bell, 1975; Loftus & Kallman, 1979; Potter, 1976; Potter & Levy, 1969; Shaffer & Shiffrin, 1972). The model's account of this finding is straightforward: although declining over time, the information-extraction rate, r(t), is always positive. Therefore I(t), the integral of r(t), must increase over time.

b. Three Multiplicative Variables: Stimulus Luminance, Stimulus Priming, and Subjects' Age. Empirically, an independent variable bears a multiplicative relation to exposure duration when it is observed that

$$P_i(d) = P_j(cd) \tag{5}$$

Here, P_i(x) and P_j(x) denote performance for two levels, i and j, of the independent variable following some exposure duration, x, and c is a dimensionless constant. The interpretation of Eq. (5) is that the exposure duration required to achieve any given performance level is greater by some factor, c, for level j relative to level i of the independent variable. Note that Eq. (5) defines an equivalence property; it specifies the circumstances under which performance is equal under the different levels, i and j, of some independent variable.

Multiplicative relations have been demonstrated for three independent variables: stimulus luminance (Loftus, 1985a, 1986), subject age (Loftus, Truax, & Nelson, 1986), and stimulus priming (Reinitz, 1987; Tulving, Mandler, & Baumal, 1964). For example, Loftus (1985a) varied luminance during initial viewing in a picture-recognition paradigm. He found that when luminance was reduced by two log units, exposure duration had to be multiplied by approximately 2.0 in order to maintain the same performance level. The form of Eq. (5) that represents this finding is

$$P_{\text{BRIGHT}}(d) = P_{\text{DIM}}(2.0\,d)$$

where P_BRIGHT and P_DIM refer to memory performance for high-luminance (bright) and low-luminance (dim) pictures, respectively.

The model accounts for multiplicative relationships by assuming variation in the information-extraction rate, r(t), across levels of the independent variable. In particular, suppose that for the two levels, i and j, of the independent variable,

$$r_i(t_i) = dI/dt_i = f(I) \tag{6a}$$

and

$$r_j(t_j) = dI/dt_j = f(I)/c \tag{6b}$$

where f is some monotone function. Note that Eqs. (6a) and (6b) conform to the r(t) functions of the general model [Eq. (2g)];⁶ in addition, they imply that for any given information-acquisition value, I, r(t) is different by some factor, c, for level i relative to level j of the independent variable. Then, as shown in Section V,C,2, a multiplicative effect of the independent variable will obtain.

c. An Additive Variable: Immediate/Delayed Mask. Empirically, an independent variable bears an additive relation to exposure duration when it is observed that
P,(d) = P,(k
+ d)
(7)
Here, P i x ) . Pi(&),and d are defined as in Eq. (51, and k is a constant in units of time. The interpretation of Eq. (7) is that the exposure duration required to achieve any given performance level is greater by k msec for level j relative to level i of the independent variable. Equation (7). like Eq. ( 5 ) . defines an equivalence property. As we discussed earlier, Loftus et a / . (1985) found an additive relation for stimuli maskedunmasked: performance for (d + 100)-msec. immediatemasked pictures (i.e., pictures that were not followed with an icon) was equal to performance for d-msec. delay-masked pictures (i.e., pictures that were followed by an icon). The form of Eq. (7)that describes this finding is
where PI,,,, and PNOICON refer to memory performance for pictures followed by an icon and not followed by an icon, respectively. To account for this additive relation, we assume that presentation of a noise mask reduces r ( t ) to zero. We show in Section V. B that the Loftus et a / . result then follows from the general model. d. The Partial-Report Paradigm. Sperling's ( 1960) classic article introduced the partial-report paradigm and provided the foundation for almost three decades of work on the icon. In the partial-report paradigm, a matrix of items is briefly presented to an observer. Suppose, for the sake of illustration, that a 3 x 3 matrix of letters is presented. In a wholereport condition. the observer reports as many of the nine letters as possible. In a purtial-report condition, the observer is cued via a high-. me'In all experiments in which multiplicative variables have been found, stimuli have been masked at offset. In this configuration, there is no icon and the r ( r ) functions concern only the situation in which the stimulus is physically present. Therefore, u(r) = 1.0.
Information Extraction from Visual Stimuli
151
dium-, or low-frequency tone to report only one of the three rows. To estimate the number of available letters in the partial-report condition, the number of reported letters per row is multiplied by the number of rows (three in this example). Sperling and legions of subsequent investigators' found that as the interstimulus interval (1S1) between stimulus and cue increases, the estimated number of available letters decreases and asymptotes at the whole-report level (about 4-5 letters) after an IS1 of about 300 msec. The explanation was that information was being read out of a rapidly decaying information store, and the notion of the icon was born. Our model's account of these data rests on the idea that information extraction does not begin until the cue is presented.' Essentially. this means that total extracted information-and thus partial-report performance-depends only on the value of u, the available information, at the time of cue presentation. To be more precise, suppose that the cue is presented at a delay of q msec following stimulus offset, i.e.. at time t = (d + 4).At that time, I ( r ) = 0 and u(r - d) = u(q). Therefore, at time (d + 4).we know from Eq. (2g) that
It is evident from Eq. (8g) that r(t) does not depend on d, the stimulus duration; it depends only on q, the cue delay. Therefore, r(t)'s integral, I(t), which determines partial-report performance, also depends only on q, in accordance with results reported by Sperling (1960) and Yeomans and Irwin (1986). To illustrate using the quantitative model, the total information extracted from the icon given a cue delay, q, is
as shown in Section V, D. Thus, total extracted information, I(t), does not depend on d. However, as q becomes larger, I(t) becomes smaller, which is the classical delay-of-cue finding. As indicated in Eqs. (8g) and (8q), however, r(t), and therefore its integral, I(t), and partial-report performance, do depend on the function h. Later, we discuss the influence of stimulus luminance on h. Briefly, if stimulus luminance is low enough, then h(I) is lowered, as demonstrated by Loftus (1985a, Experiment 3, using alphanumeric stimuli). With similarly low stimulus luminance, r(t), and thus partial-report performance, must also be lowered. Adelson and Jonides (1980) have confirmed this prediction.

⁷The partial-report procedure as applied to picture perception is described by Biederman (1972), Biederman, Mezzanotte, and Rabinowitz (1981), and Biederman, Rabinowitz, Glass, and Stacy (1974).

⁸The instructions in the partial-report task may lead the observer either to refrain from extracting information prior to the cue or to extract random information from the array prior to the cue. In each case, we assume that information extraction from the cued row begins anew at the time that the cue is processed.
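The estimation step of the partial-report procedure is simple arithmetic and can be sketched as follows. The function name and the sample value of 2.5 reported letters are illustrative assumptions, not data from Sperling (1960).

```python
def estimate_available_items(reported_in_cued_row: float, n_rows: int) -> float:
    """Partial-report estimate: items reported from the cued row times the number of rows."""
    if n_rows <= 0:
        raise ValueError("n_rows must be positive")
    return reported_in_cued_row * n_rows


# Hypothetical observer who reports 2.5 of the 3 letters in the cued row
# of a 3 x 3 matrix (illustrative value only).
estimated_available = estimate_available_items(2.5, n_rows=3)
print(estimated_available)  # 7.5
```

The multiplication rests on the assumption that the observer cannot know in advance which row will be cued, so performance on the cued row is taken as representative of every row.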
C. EXPERIMENTS 1-3: NEW DATA CONCERNING THE DURATION OF PERCEPTUAL PROCESSING FOLLOWING STIMULUS OFFSET
The Loftus et al. (1985) experiments were designed to assess an icon's worth: how much information can be extracted from the icon in terms of additional physical exposure duration. Another salient feature of an icon that has been the subject of substantial investigation and that will play a major role in our arguments is an icon's duration. Our goal in Experiments 1-3 was to measure icon duration in the sort of picture-memory paradigm used by Loftus et al. Duration here refers to the length of time following stimulus offset during which perceptual processing (i.e., extraction of useful information from the icon) continues to occur. Measurement of the icon's duration in this way constitutes a preliminary test of the proposition that information extraction and visible persistence are two effects of the same process. If this proposition is correct, then the icon's duration should be in the 200-300 msec range found in persistence experiments.

To measure the duration of perceptual processing, we used a paradigm reported by Loftus and Ginn (1984; see also Erwin, 1976; Erwin & Hershenson, 1974; Irwin & Yeomans, 1986), in which briefly presented target pictures are followed by a noise mask that is either bright or dim. The bright mask is such that when it is physically superimposed on a target picture, no features from the target can be seen (thereby fulfilling Eriksen's, 1980, "minimal test" for a mask). The dim mask is such that, while the mask itself can be perceived when target and mask are physically superimposed, target features are still available. The fundamental assumption underlying this paradigm is that variation in mask luminance affects only perceptual processes. Thus, if mask luminance is observed to affect subsequent picture memory, it is inferred that perceptual processing was ongoing at the time that the mask occurred.
If the mask occurred sometime following stimulus offset, a mask-luminance effect further implies the continuing perceptual processing of (i.e., extraction of information from) the icon. Thus, the stimulus-mask interstimulus interval (ISI) at which mask luminance no longer affects memory performance is an estimate of icon duration.⁹

⁹Empirically, the question of when asymptote has been reached is not simple to determine. We take the traditional hypothesis-testing approach and, for each stimulus-mask ISI, determine whether there is a significant performance difference between masked and unmasked stimuli. This procedure has the disadvantage that an estimate of asymptotic ISI will systematically depend on experimental power. Across the current experiments, power is approximately constant, so results are comparable. However, to compare across different sets of experiments, an absolute criterion masked-unmasked performance difference that defines asymptote should ideally be specified.

A recognition memory procedure was used in Experiments 1-3. In an initial study phase, target stimuli were presented, one by one, for inspection. Immediately following the study phase was a test phase in which the target stimuli, randomly intermingled with distractor stimuli, were presented, again one by one, in an old/new recognition memory test.

1. Experiment 1

In Experiment 1, two independent variables were factorially combined in the study phase. They were target-mask ISI, which ranged from 0 to 300 msec, and mask luminance, which was either high or low (hereafter, bright or dim).

a. Method. University of Washington undergraduates (110) participated in a 1-hr session for course credit. They were run in 22 groups of 5 subjects per group. Stimuli were 132 naturalistic color pictures, prepared as 35-mm slides, depicting seascapes, landscapes, cityscapes, and weddings. They were randomly placed into two slide trays of 66 slides per tray. A noise mask consisted of a jumble of black lines on a white background. The noise mask could be projected at either of two luminances, bright or dim. The bright noise mask, projected at normal projector luminance, was such that when it was physically superimposed on a stimulus picture, the stimulus could not be seen. The dim noise mask was attenuated by 2 log units relative to the bright mask using a neutral-density filter. The dim mask could be seen when it was superimposed on a stimulus picture, but it did not prevent extraction of any stimulus features. When nothing was being projected, a dim adapting field was present. All relevant luminances are shown in Table I.

TABLE I
STIMULUS LUMINANCE

Stimulus                      Luminance (millilamberts)
Adapting field                 0.07
Projector on, no slide        38.43
Fixation spot                  0.38
Pattern mask
  Bright background           25.19
  Black markings               2.57

The same apparatus was used in all seven experiments that we report. Stimuli were displayed by a Kodak random-access slide projector and subtended a visual angle that ranged from 15 to 22° horizontal and from 10 to 15° vertical, depending on where the subject sat. A Kodak standard projector was used to display the noise mask, and a second standard projector was used to display a dim fixation point that preceded each target. Filter wheels were positioned in front of the stimulus and mask projectors. All projectors were equipped with Gerbrands tachistoscopic shutters with rise and fall times of approximately 1 msec. Subjects made all responses on individual 16-key response boxes. All display and response apparatuses were controlled by an Apple II computer system described by Loftus, Gillispie, Tigre, and Nelson (1984).

An experimental session consisted of a study phase followed by a test phase using the stimuli in the first slide tray and then another study and test phase using the stimuli in the second slide tray. On each study trial, a target stimulus was displayed for 40 msec. A 500-msec noise mask followed most target pictures at one of 5 ISIs: 0, 100, 200, 250, or 300 msec. There were thus 5 ISIs x 2 mask luminances = 10 conditions. In addition, there was a control condition in which no mask was shown; hence, there were 11 conditions in all. Within each tray, 33 stimuli were presented during the study phase. The 11 conditions were presented in random order with the restriction that each condition occurred once during each of the three 11-trial blocks within each slide tray.

The sequence of events on each study trial was as follows. First, a 1-sec tone signaled the subjects to fixate a dim spot that concurrently appeared at the center of the viewing field. A target picture was then presented for 40 msec, followed, except in the control condition, by the mask, presented for 500 msec at its appropriate ISI and luminance. The stimulus onset asynchrony (SOA) between study trials was 3 sec. In the no-mask control condition, only the adapting field was present between the offset of the target picture and the start of the next study trial.

At the time of test, all 66 stimuli in the slide tray were shown in random order. The target-distractor ordering was different for the two slide trays but, for each tray, was identical for all 22 groups in the experiment. For each test stimulus, subjects were asked to respond "old" or "new" corresponding to whether they thought they had or had not seen the stimulus in the just-preceding study phase. Each subject responded by pressing the appropriate key on his or her response box.
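The restricted random ordering of study conditions (each of the 11 conditions once per 11-trial block, three blocks per tray) can be sketched as below. This is only an illustration of the stated restriction; the function name, the integer condition codes, and the seed are assumptions, and the original Apple II control software is not shown in the chapter.

```python
import random


def blocked_random_order(n_conditions: int, n_blocks: int, rng: random.Random) -> list:
    """Return a trial order in which every condition occurs once per block."""
    order = []
    for _ in range(n_blocks):
        block = list(range(n_conditions))  # condition codes 0..n_conditions-1
        rng.shuffle(block)                 # random order within the block
        order.extend(block)
    return order


# 11 conditions x 3 blocks = 33 study trials per slide tray.
study_order = blocked_random_order(n_conditions=11, n_blocks=3, rng=random.Random(0))
print(len(study_order))  # 33
```

Shuffling within blocks rather than over the whole list keeps each condition evenly spread across the study phase, which is the purpose of the restriction described in the Method.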
Each test trial began 0.5 sec after all subjects had responded to the previous test picture. Each of the 132 stimuli appeared as a target for half of the groups and as a distractor for the other half. Each stimulus appeared once in each of the 11 conditions over the 11 groups for which it appeared as a target.

b. Results and Discussion. Because all study conditions were randomly intermingled within a study tray, there was only a single false-alarm probability for each tray. Averaged over the two slide trays, the false-alarm probability was 0.278. Figure 3 shows stimulus performance (hit probability) as a function of stimulus-mask ISI. Different curves are shown for the two mask luminances, and the far right-hand point represents control-condition performance.

Fig. 3. Experiment 1 data. Each data point is based on 792 observations. [Plot of hit probability against ISI (ms); the no-mask control is shown as a separate point.]

Performance in the no-mask control condition (0.707) was significantly higher than performance in any of the other conditions [lowest t(109) = 2.21]; thus any mask, be it bright or dim, lowers memory performance for the picture it follows, at least in the ISI range of 0-300 msec.¹⁰ Performance increased as a function of stimulus-mask ISI, both when the mask was dim [F(4,436) = 8.25] and when it was bright [F(4,436) = 50.33]. Recall that at a given ISI, the presence of perceptual processing (which indicates the continuing existence of the icon) is implied by a superiority of dim-mask performance over bright-mask performance. Accordingly, individual one-tailed t tests comparing the two mask-luminance conditions were performed at each of the five ISIs. The results are shown in Table II.

¹⁰As shown by Loftus, Hanna, and Lester (1988), one effect of a noise mask is to impair conceptual as opposed to perceptual processing. This is why a mask can cause a performance deficit relative to a no-mask condition, even if the mask occurs following icon termination.

TABLE II
EXPERIMENT 1: t VALUES BETWEEN DIM- AND BRIGHT-MASK PERFORMANCE AT EACH STIMULUS-MASK ISIᵃ

ISI (msec)    t(109)
  0            8.16
100            1.80
200            1.35
250            1.95
300           -0.80

ᵃPositive values indicate dim-mask performance superiority.

It is evident that dim-mask performance significantly exceeds bright-mask performance at ISIs of 0 and 100 msec. At ISIs of 200 and 250 msec, dim-mask performance also exceeded bright-mask performance; however, this difference was significant at 250 msec but not at 200 msec. At a 300-msec ISI, the performance difference was reversed. Collapsed over the 200-300-msec ISI range, the dim-bright difference was not significant [t(108) = 1.43].
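The decision rule applied to Table II can be sketched as follows. The t values are those reported for Experiment 1; the critical value of about 1.659 (one-tailed, alpha = .05, df = 109) is an assumption, since the chapter does not state the alpha level it used.

```python
# Assumed one-tailed critical t for alpha = .05 and df = 109 (approximate value;
# the alpha level is not stated in the chapter).
T_CRITICAL_ONE_TAILED = 1.659

# t values from Table II, keyed by stimulus-mask ISI in msec.
T_VALUES_BY_ISI_MS = {0: 8.16, 100: 1.80, 200: 1.35, 250: 1.95, 300: -0.80}


def significant_isis(t_values: dict, t_critical: float) -> list:
    """ISIs at which dim-mask performance reliably exceeds bright-mask performance."""
    return [isi for isi, t in sorted(t_values.items()) if t > t_critical]


print(significant_isis(T_VALUES_BY_ISI_MS, T_CRITICAL_ONE_TAILED))  # [0, 100, 250]
```

Under this assumed criterion the sketch reproduces the pattern described in the text: reliable effects at 0, 100, and 250 msec, but not at 200 or 300 msec, which is the ambiguity that motivated the replication.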
Perceptual processing appears to be largely complete by 200 msec following stimulus offset and entirely complete by 300 msec. However, given the pattern of t values in Table II, the results are somewhat ambiguous. One purpose of Experiment 2 was to replicate Experiment 1 with additional statistical power.

c. Application of the Model. We applied the quantitative form of our model to the data of Experiment 1. To do so, it was necessary to select a value for the free parameter, c, and also to make assumptions about the effects of the superimposed noise masks. We set c to 3.7, a value estimated in an experiment to be described in Section III. Based on other data (Loftus & Hogden, 1988; see also Sperling, 1986), we assumed that superimposing a noise mask would lower r(t) and, thereby, constitute a multiplicative effect as defined earlier. That is, we assumed that increasing mask luminance lowers r(t), the rate of extracting stimulus information. The bright mask is such that its superimposition reduces the information-extraction rate to zero. We allowed the reduction in r(t) due to dim-mask superimposition to be a free parameter. We then found the value of this parameter that maximized the rank-order correlation between I, the predicted extracted information, and d', the obtained recognition-memory performance,¹¹ over 33 total conditions: the 11 conditions from Experiment 1 along with 22 conditions from Experiments 2 and 3. The best fitting dim-mask reduction was 51%, which produced an overall rank-order correlation of 0.89. For the 11 conditions of Experiment 1 only, the obtained d'/predicted I correlation was 0.92.

¹¹We used d' scores to correct for false-alarm probabilities across the different experiments.

2. Experiment 2: Information from the Icon and from the Physical Stimulus

Experiment 2 had two purposes. The first, as noted, was to replicate the essential aspects of Experiment 1 with more statistical power. The second was to begin investigating a central proposition of our model, which is that the same kind of perceptual processing is applied both to a physical stimulus and to an icon. If this proposition is correct, then any independent variable must have the same qualitative effect whether the variable is applied to the physical stimulus or to the icon. In Experiment 1, we discovered that increased mask luminance led to decreased performance when the mask was superimposed on the icon. We inferred this decrement to be mediated by the mask's effect on perceptual processes. If the perceptual processes that operate on the physical stimulus are the same as those that operate on the icon, then increasing mask luminance must similarly lead to a performance decrement when the mask is superimposed on the stimulus. This prediction was tested in Experiment 2.

a. Method. University of Washington undergraduates (130) participated in a 1-hr session for course credit. They were run in 24 groups of 5-8 subjects per group. The same stimuli used in Experiment 1 were used in Experiment 2; however, the number of stimuli per tray was increased from 66 to 72, and a third 72-slide tray was added. The noise mask was the same as in Experiment 1 and was displayed at the same two luminances.

An experimental session consisted of a study phase followed by a test phase using each of the three slide trays in sequence. On each study trial, a target was displayed for 100 msec. A noise mask accompanied each target presentation at a target-mask ISI of -50, -25, 0, 40, 100, or 200 msec.¹² There were thus 6 ISIs x 2 mask luminances for a total of 12 experimental conditions. Within each tray, 36 stimuli were presented in the study phase. The 12 conditions were presented in random order with the restriction that each condition occurred once during each of the three 12-trial blocks.

¹²When the ISI was negative, the mask temporally overlapped with the physical stimulus.
The sequence of events on each study trial was similar to that in Experiment 1. Following the warning tone/fixation point, a 100-msec target stimulus was presented in conjunction with the 500-msec noise mask at its appropriate ISI and luminance. The SOA between study trials was 3 sec. The test phase was identical to that of Experiment 1, except that 72 test stimuli were presented in each of the three trays. Each of the 216 stimuli appeared as a target for 12 of the 24 groups and as a distractor for the other 12 groups. Each stimulus appeared once in each of the 12 conditions over the 12 groups for which it appeared as a target.

b. Results and Discussion. The false-alarm probability was 0.309. Figure 4 shows hit probability as a function of stimulus-mask ISI; different curves are shown for the two mask luminances. The vertical dashed line indicates stimulus offset. All essential aspects of Experiment 1 were replicated in Experiment 2. Performance increased with increasing ISI, both when the mask was dim [F(5,645) = 5.84] and when it was bright [F(5,645) = 92.58]. As indicated in Table III, t tests were again used to contrast the two masking conditions at each ISI in order to assess the duration over which perceptual processes operate. These t(129)s were significant over ISIs from -50 to 100 msec. At an ISI of 200 msec, the difference of less than a percentage point between the masking conditions was not significant. At this ISI, the probability of a Type II error is less than .05 if the true bright-mask/dim-mask performance difference is greater than .03. Taken together, the results of Experiments 1 and 2 suggest that perceptual processing is largely complete by about 200 msec following stimulus offset.

Fig. 4. Experiment 2 data. The dashed vertical line represents stimulus offset. Each data point is based on 1170 observations. [Plot of hit probability against ISI (ms), with separate curves for the dim and bright masks and a standard-error bar.]

TABLE III
EXPERIMENT 2: t VALUES BETWEEN DIM- AND BRIGHT-MASK PERFORMANCE AT EACH STIMULUS-MASK ISIᵃ

ISI (msec)    t(129)
-50           12.86
-25           11.00
  0            7.06
 40            3.04
100            1.93
200            0.53

ᵃPositive values indicate dim-mask performance superiority.

Note in Fig. 4 that ISIs less than zero correspond to a superimposition of mask over the physical stimulus plus the icon, whereas ISIs of zero or more correspond to a superimposition of the mask over the icon only. The mask-luminance effect is qualitatively the same in these two situations. This is consistent with the proposition that the same perceptual processes govern information extraction from the physical stimulus and from the icon.

c. Application of the Model. The predicted I values for the 12 conditions of Experiment 2 were computed using the 51% dim-mask r(t) reduction estimated from the 33 total conditions of Experiments 1-3. The across-conditions, rank-order correlation between predicted I and obtained d' for Experiment 2 was .92.
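The d' scores entering these correlations correct hit probabilities for false alarms. A minimal sketch of that correction is given below, assuming the standard equal-variance signal-detection formula d' = z(hit) - z(false alarm); the chapter does not spell out its exact computation, so the formula and the function name are assumptions. The example values are the Experiment 1 no-mask control hit probability (0.707) and false-alarm probability (0.278).

```python
from statistics import NormalDist


def d_prime(hit_probability: float, false_alarm_probability: float) -> float:
    """Equal-variance signal-detection d': z(hit) minus z(false alarm)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_probability) - z(false_alarm_probability)


# Experiment 1 no-mask control: hit probability 0.707, false-alarm probability 0.278.
control_d_prime = d_prime(0.707, 0.278)
print(round(control_d_prime, 2))
```

Because the three experiments had different false-alarm probabilities (0.278, 0.309, and .294), a measure of this kind puts their conditions on a common scale before the across-experiment rank-order correlation is computed.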
3. Experiment 3: Is Icon Duration Controlled by Time Since Stimulus Onset or by Time Since Stimulus Offset?
Di Lollo (1980, 1985) has proposed a model that is similar to ours in the sense that visible persistence is assumed to result from active processing. In Di Lollo's model, as in ours, the magnitude of active processing depends on time since stimulus onset; thus, persistence duration similarly depends on time since stimulus onset. Based on data from his missing-dot paradigm (Di Lollo, 1980, described in detail later in this section), Di Lollo contends that the kind of processing that generates visible persistence should be complete by roughly 150 msec following stimulus onset. Accordingly, Di Lollo (1985) argued that the sort of processing that immediately follows short stimuli (shorter than about 200 msec) is qualitatively different from the sort of processing that immediately follows longer stimuli. Di Lollo claims that the former is based on a visible representation of the stimulus (an icon), whereas the latter is based on a nonvisible representation of the stimulus. In our terms, Di Lollo would claim that any processing following the offset of a stimulus that is longer than about 150 msec is conceptual, not perceptual.

Experiment 3 was designed to evaluate Di Lollo's prediction and used the same paradigm as Experiments 1 and 2. Again, the presence or absence of an icon was inferred from the presence or absence of a mask-luminance effect. Stimulus duration was either 20 or 270 msec, and stimulus-mask ISI was either 0 or 250 msec. Of particular interest was a comparison of the 20-msec stimulus/250-msec ISI condition with the 270-msec stimulus/0-msec ISI condition. Time since stimulus onset is the same in these conditions (270 msec), while ISI differs (0 vs. 250 msec). If, as Di Lollo argues, information extraction is determined by time since stimulus onset, then any mask-luminance effect must be the same in these two conditions.

a. Method. University of Washington undergraduates (133) participated in a 1-hr session for course credit. They were run in 20 groups of 5-8 subjects per group. The two slide trays used in Experiment 1 were used in Experiment 3; however, there were 80 slides in each of the two trays. The noise mask was the same as in Experiments 1 and 2 and was displayed at the same two luminances.

An experimental session consisted of a study phase followed by a test phase using each of the two slide trays. On each study trial a target was displayed for either 20 or 270 msec, and the stimulus-mask ISI was either 0 or 250 msec. As in Experiments 1 and 2, the mask was either bright or dim.
In addition to the 2 x 2 x 2 = 8 conditions produced by this factorial design, there were two no-mask control conditions involving stimulus durations of 20 and 270 msec. There were thus 10 conditions in all. Within each tray, 40 stimuli were presented at study. The 10 conditions were presented in random order with the restriction that each condition occurred twice during each of the two 20-trial blocks.

The sequence of events on each study trial was similar to that of Experiments 1 and 2. Following the warning tone/fixation point, the target was presented for its appropriate duration and followed by the appropriate ISI, which was followed, except in the control conditions, by a 500-msec mask at its appropriate luminance. The SOA between study trials was 3 sec. The test phase was identical to that of Experiments 1 and 2 except that 80 test pictures were presented in each of the two trays.
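The logic of the critical comparison can be made concrete by enumerating the design. The sketch below builds the 2 x 2 x 2 masked conditions, adds the two no-mask controls, and finds the duration/ISI combinations that share the same time from stimulus onset to mask onset. Variable names are illustrative assumptions.

```python
from collections import defaultdict
from itertools import product

DURATIONS_MS = (20, 270)      # stimulus durations
ISIS_MS = (0, 250)            # stimulus-mask ISIs
MASKS = ("dim", "bright")     # mask luminances
N_CONTROL_CONDITIONS = 2      # no-mask controls, one per stimulus duration

masked_conditions = list(product(DURATIONS_MS, ISIS_MS, MASKS))
total_conditions = len(masked_conditions) + N_CONTROL_CONDITIONS

# Group duration/ISI combinations by time from stimulus onset to mask onset.
by_onset_to_mask = defaultdict(set)
for duration, isi in product(DURATIONS_MS, ISIS_MS):
    by_onset_to_mask[duration + isi].add((duration, isi))

matched = {soa: sorted(pairs) for soa, pairs in by_onset_to_mask.items() if len(pairs) > 1}
print(total_conditions)  # 10
print(matched)           # {270: [(20, 250), (270, 0)]}
```

Only the 20-msec/250-msec ISI and 270-msec/0-msec ISI combinations share an onset-to-mask time (270 msec), so they are the pair that separates an onset-based account from an offset-based account.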
Each of the 160 stimuli appeared as a target for 10 of the 20 groups and as a distractor for the other 10 groups. Each stimulus appeared once in each of the 10 conditions over the 10 groups for which it appeared as a target.

b. Results and Discussion. The false-alarm probability was .294. Figure 5 shows performance (hit probability) as a function of stimulus-mask ISI. The top panel shows performance for the 270-msec targets, while the bottom panel shows performance for the 20-msec targets. In both panels, different curves represent the two mask luminances, and the far right-hand points represent control-condition performance.

Fig. 5. Experiment 3 data. Top, 270-msec stimuli; bottom, 20-msec stimuli. Each data point is based on 1064 observations. [Each panel plots hit probability against ISI (ms) for the dim and bright masks, with the no-mask control as a separate point and a standard-error bar.]

Under what circumstances does perceptual processing occur? Essentially, the results shown in Fig. 5 indicate that the mask-luminance effect is present (and, by inference, perceptual processing is ongoing) at 0-msec ISI following both short and long stimuli. The results also indicate that there is no perceptual processing following a 250-msec ISI, for either short or long stimuli. Of particular interest is a comparison of the two 270-msec SOA conditions. For the 270-msec stimulus/0-msec ISI condition, the mask-luminance effect is relatively large (about 8%), whereas for the 20-msec stimulus/250-msec ISI condition, the mask-luminance effect is relatively small (less than 1%). In short, the results of Fig. 5 provide evidence for a decaying icon following the offset of both short and long stimuli.

Table IV provides the statistical evidence for these assertions. It shows t values contrasting the dim- and bright-mask conditions for the four combinations of stimulus duration and ISI. For both stimulus durations, the differences are statistically significant at the 0-msec ISI, but are nonsignificant at the 250-msec ISI.

TABLE IV
EXPERIMENT 3: t VALUES BETWEEN DIM- AND BRIGHT-MASK PERFORMANCE IN EACH STIMULUS-MASK ISI/STIMULUS DURATION CONDITIONᵃ

                              ISI (msec)
Stimulus duration (msec)      0        250
 20                           8.93     0.40
270                           3.31     0.84

ᵃPositive values indicate dim-mask superiority, and each t is based on 132 degrees of freedom (df).

c. Application of the Model. The predicted I values for the 10 conditions of Experiment 3 were computed using the 51% dim-mask r(t) reduction estimated from the 33 total conditions of Experiments 1-3. The across-conditions, rank-order correlation between predicted I and obtained d' for Experiment 3 was 1.00.

d. Why Do Our Conclusions Differ from Di Lollo's? Recall that, based on his data, Di Lollo concluded that visible persistence is determined by time since stimulus onset. The stimuli in Di Lollo's missing-dot paradigm consist of 24 dots that occupy 24 of the 25 squares in an imaginary 5 x 5 grid. The observer's task is to detect the location of the missing dot, an easy task if all 24 dots are presented simultaneously. To investigate the properties of visible persistence, Di Lollo presented the 24-dot array as two 12-dot groups separated in time. The idea is that detection of the missing dot depends on the degree to which the two 12-dot groups can be visually integrated, which in turn depends on the magnitude of group 1 visible persistence at the time of group 2 presentation.

Two factors affect performance in the missing-dot paradigm. First, as group 1/group 2 ISI increases, performance decreases. Second, however, as the duration of the group 1 presentation increases, performance decreases in virtually an identical manner. Even with a group 1/group 2 ISI of zero, performance is essentially at chance when group 1 duration is longer than about 200 msec. This finding formed the primary basis of Di Lollo's claim that visible persistence is determined by time since stimulus onset (SOA), rather than time since stimulus offset (ISI).

Why does the paradigm used in the present experiments yield a different conclusion? There are several possibilities. First, as Coltheart (1980) argues, it may be that visible persistence (underlying performance in the missing-dot paradigm) and information extraction (underlying performance in the present paradigm) are mediated by fundamentally different processes. Second, it may be that the same process mediates the results of both paradigms, but that some quantitative difference between stimuli in the two paradigms is responsible for the difference in results. We argue for the latter possibility. We suggest, in particular, that relevant information is extracted much faster from relatively simple dot patterns than from relatively complex naturalistic pictures. In our model, faster information extraction is represented by a higher value of the function h (or, in the quantitative model, a higher c value) for dots relative to pictures.

Figure 6 shows the predicted Experiment 3 I values for two situations. The top panel shows predictions for the c value of 3.7 used so far. The bottom panel shows predictions for a much higher value, c = 15.
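The effect of c on how much extraction remains to be done at stimulus offset can be illustrated numerically. The sketch below assumes that, while the stimulus is physically present, the extraction rate follows r(t) = exp(-c t) with t in seconds. This functional form is an assumption: the defining equations are not reproduced in this section, and the form was chosen because it reproduces the two rates quoted in this section for a 270-msec stimulus (about 0.368 for c = 3.7 and about 0.017 for c = 15).

```python
import math


def rate_at_offset(c: float, duration_sec: float) -> float:
    """Assumed extraction rate at stimulus offset: r(t) = exp(-c * t), t in seconds."""
    return math.exp(-c * duration_sec)


# Complex pictures (c = 3.7) versus simple dot patterns (c = 15), 270-msec stimulus.
for c in (3.7, 15.0):
    print(c, round(rate_at_offset(c, 0.270), 3))
```

With the lower c value, a substantial extraction rate remains when the stimulus ends, so a mask placed at offset can still interfere; with the higher value, extraction is nearly finished and a mask at offset has little left to disrupt.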
(We will demonstrate later that c = 3.7 is appropriate for the complex pictures used in the present experiments, whereas c = 15 is appropriate for the simple dot patterns used by Di Lollo.) When c = 3.7, r(t) is still about 0.368 at the offset of a 270-msec stimulus [because r(t) is in units of the proportion of total stimulus information/sec, a value of 0.368 is relatively high]. In contrast, when c = 15, r(t) has fallen to about 0.017 at the offset of a 270-msec stimulus. In general, when c = 3.7, the predicted results correspond well to our obtained Experiment 3 results. When c = 15, the predicted results correspond to Di Lollo's (1985) prediction: perceptual processing on the 270-msec stimulus has ceased by the time of stimulus offset.

4. General Discussion: Experiments 1-3
Experiments 1-3 produced several noteworthy empirical findings. First, the results of Experiments 1 and 2 indicate that perceptual processing continues for approximately 200 msec following stimulus offset, at least when stimuli are complex pictures presented for 40-100 msec. Second, the results of Experiment 2 indicate that the effect of mask luminance is qualitatively the same whether the mask luminance is applied to the physical stimulus or to the icon that follows. Third, the results of Experiment 3 indicate that perceptual processing continues following stimulus offset, even when stimuli are as long as 270 msec. The Experiment 3 results also indicate that perceptual processing has ended by 250 msec following stimulus offset, at least for pictures ranging from 20 to 270 msec in duration.

Fig. 6. Quantitative model: Experiment 3 predictions for two values of c. Top, results corresponding to present Experiment 3 (c = 3.7, complex stimuli); bottom, results predicted by Di Lollo (1985) (c = 15, simple stimuli). [Each panel plots predicted I against ISI (ms) for the dim and bright masks.]
u. Model-Dufu Comparisions. Qualitatively, the data were in accord with the model. The estimated poststimulus duration of perceptual pro-
Information Extraction from Visual Stimuli
I65
cessing was about 200 msec, which is within the range of visible persistence durations estimated using other paradigms. The influence of at least one variable-mask luminance-was qualitatively the same when applied to stimulus or to icon. These qualitative tests of the model are, however, quite weak. Of somewhat more interest is that the quantitative tits of the model to the data were also quite good. Over all 33 conditions of Experiments 1-3. the correlation between model and data was 0.89. Because the three experiments used different subjects. were run at different times, and had different falsealarm probabilities, one would expect the within-experiment tits to be better than the between-experiment fits, and indeed the within-experiment correlations ranged from 0.92 to I .00. b. The Relation between Icon Worth and Icon Durution. In Experiments I and 2, we estimated the duration of poststimulus perceptual processing to be roughly 200 msec; this was the duration at which mask luminance no longer had a statistically significant effect. This estimate agrees reasonably well with previous estimates of icon duration obtained from quite different paradigms (e.g., Eriksen & Collins, 1%7; Haber & Standing, 1970; Sperling. 1960). thereby lending credence to the proposition that the same process is being measured in all instances. But does a 200-msec duration estimate correspond reasonably with the 100-msec estimate of icon worth obtained by Loftus et ul. (1985)? Icon worth and icon duration are not the same thing, but they are certainly related, and within the context of our model they are related quite specifically: the icon's worth is the area under b(t), the iconic-decay function (see Fig. 2, top). In particular, as expressed in Eq. (lq), the quantitative model posits exponential decay in which the icon's worth is the decay constant. Suppose that we take the quantitative model seriously. From Eq. 
(1q), we can calculate that, at 200 msec following stimulus offset, available information, a(t), is about 0.14. Thus, according to the model, about 14% of stimulus information remains available, and information extraction continues to occur, at a poststimulus interval by which, according to the results of Experiments 1 and 2, perceptual processing has ceased. Does this mean that Experiments 1 and 2 disconfirm the quantitative model? There are two reasonable answers to this question. The first is yes: exponential decay may be an incorrect description of available poststimulus information. The second, however, is that exponential decay, or something close to it, may be correct, but our experimental power may be insufficient to detect any 200-msec ISI, bright-mask/dim-mask performance difference that actually exists. To assess this possibility we can use the model to calculate total extracted information in both the dim- and bright-mask 200-msec ISI conditions. For a 40-msec stimulus (as used
in Experiment 1), these values are I = 0.389 and I = 0.374 for the dim- and bright-mask conditions, respectively. For a 100-msec stimulus (as used in Experiment 2), the corresponding I values are 0.511 and 0.498. In both cases, the predicted I difference between the 200-msec ISI dim- and bright-mask conditions is quite small. What about predicted performance differences? Although the function m that maps I onto performance is not specified by the model, it can nonetheless be estimated. Recall that we obtained a rank-order correlation of 0.89 between predicted I and obtained d' over the 33 conditions of Experiments 1-3. This d'(I) function is an estimate of the particular m that maps I onto d'. From it, along with the observed false-alarm probabilities, we can obtain a corresponding estimate of the m that maps I onto hit probability (the performance measure on which the statistical analyses were performed). In the range of interest, roughly I = 0.3 to I = 0.5, this latter function is approximately unit-slope linear. This means that the 200-msec ISI, dim-mask/bright-mask performance differences are predicted to be only about 0.015 and 0.013 for the 40- and 100-msec stimuli of Experiments 1 and 2. These predicted differences are less than the standard errors of the mean. In summary, given a 100-msec icon worth, the model specifies that approximately 14% of stimulus information remains available at a 200-msec poststimulus interval and that information extraction continues to occur. However, a close examination reveals that the predicted difference in amount of available information that is actually acquired in the dim- vs. bright-mask condition, and the corresponding performance difference, is too small to be detected experimentally. This, in turn, means that a 100-msec icon worth is consistent with the results of Experiments 1 and 2.
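The arithmetic behind this argument can be checked with a minimal sketch. It assumes, as the text states, exponential decay of available information with a decay constant equal to the 100-msec icon worth, and it takes the approximately unit-slope mapping from I to hit probability at face value. The function and variable names are illustrative and are not part of the model.

```python
import math

ICON_WORTH_MSEC = 100.0  # icon worth, used here as the exponential decay constant


def available_information(t_msec, icon_worth_msec=ICON_WORTH_MSEC):
    """Proportion of stimulus information still available t msec after offset."""
    return math.exp(-t_msec / icon_worth_msec)


# Available information 200 msec after stimulus offset (about 14%).
a_200 = available_information(200.0)

# Predicted extracted information, I, quoted in the text for the 200-msec ISI
# conditions: (dim mask, bright mask) for each stimulus duration in msec.
predicted_I = {40: (0.389, 0.374), 100: (0.511, 0.498)}

# With an approximately unit-slope mapping from I to hit probability, the
# predicted performance difference is roughly the difference in I itself.
predicted_difference = {
    duration: round(dim - bright, 3)
    for duration, (dim, bright) in predicted_I.items()
}

if __name__ == "__main__":
    print(f"a(200 msec) = {a_200:.3f}")
    print(predicted_difference)
```

The differences computed this way are the 0.015 and 0.013 values cited above, which the text compares with the standard errors of the mean.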
III. Phenomenological Appearance
As we noted earlier, a salient characteristic of an icon is that it appears to be an extension (albeit a fading extension) of the physical stimulus. We
now concern ourselves with this phenomenology. We first extend our model to account for the conscious experience of a stimulus, and we then present four experiments in support of this extension.
A. EXTENSION OF THE MODEL
1. Overview
What might underlie phenomenological appearance? Sperling (e.g., 1960, 1963, 1967; Averbach & Sperling, 1961; see also Erwin, 1976) characterized a fading icon in terms of decay of available information. This suggests an
extension of the model in which phenomenological appearance is equated with a(t), the proportion of available information. By this notion, the icon would remain phenomenologically present until a(t) dropped below some criterion, a_crit. This model is illustrated in Fig. 7, where a(t) is shown as a function of time since stimulus onset for 20- and 270-msec stimuli. The horizontal line represents a_crit, and duration of visible persistence is represented by the double-headed arrows between the time of stimulus offset and the time at which a(t) crosses a_crit. Figure 7 indicates one obvious property of this model: persistence duration is independent of physical stimulus duration. However, this property conflicts with data from a variety of paradigms in which estimated persistence duration is found to be a decreasing function of physical stimulus
Fig. 7. Quantitative model: an extension in which phenomenological appearance is represented by a(t), the available information. The horizontal line represents the criterion a(t) below which the stimulus is reported to have disappeared. Panels: 20-msec stimuli and 270-msec stimuli; horizontal axis: time since stimulus onset (msec).
duration. One such paradigm is the Di Lollo missing-dot procedure described earlier. Another, which we use in Experiments 4-7, is a synchrony-judgment paradigm (e.g., Efron, 1970). In a synchrony-judgment paradigm, a target stimulus is presented for some duration d. Following stimulus offset is a variable interval, at the end of which is a synchrony signal, such as an audible click or a second visual stimulus. The observer's task is to adjust the stimulus-signal interval such that the signal appears to just coincide with the phenomenological disappearance of the target stimulus. The duration of the interval set by the observer thereby constitutes an estimate of visible persistence duration.
2. The Information-Extraction Rate as a Mediator of Persistence
We have seen that equating phenomenological appearance with a(t) will not account for the observed negative relation between stimulus duration and persistence duration. Another means of extending the model is to equate phenomenological appearance with r(t), the rate of extracting information from this stimulus. By this notion, the icon would remain phenomenologically present until r(t) dropped below some criterion, r_crit. This idea is not entirely new; similar proposals have been made by Di Lollo (1980) and Erwin (1976; Erwin & Hershenson, 1974). As we illustrate below, such an extension accounts for the relation between stimulus duration and persistence duration. It also accounts for other data showing effects on persistence duration of the amount of to-be-extracted information in the stimulus (Avant & Lyman, 1975; Erwin, 1976; Erwin & Hershenson, 1974).
Assumption 5: Phenomenological Appearance. An observer remains phenomenologically aware of a stimulus until r(t), the rate of extracting information from the stimulus, falls below some criterion, r_crit.
This model is illustrated in Fig. 8, where r(t) is shown as a function of time since stimulus onset for 20- and 270-msec stimuli. The horizontal line represents r_crit, and duration of visible persistence is represented by the double-headed arrows between the time of stimulus offset and the time at which r(t) crosses r_crit. Assumption 5 is essentially that conscious experience of a stimulus results from extracting information from the stimulus. This notion is similar to one in the selective attention literature that conscious experience results from attending to the stimulus (cf. James, 1890; Norman, 1976).
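The contrast between the two candidate criteria can be illustrated with a minimal sketch. It assumes exponential decay of a(t) after stimulus offset with a 100-msec decay constant and, as a labeled assumption that is not stated in this section, an extraction rate of the form r(t) = a(t) c [1 - I(t)]. The values of c, a_crit, and r_crit and all function names are illustrative only.

```python
import math

OMEGA = 0.100   # sec; assumed decay constant of a(t) after stimulus offset
C = 3.7         # per sec; illustrative rate parameter
A_CRIT = 0.10   # illustrative criterion on available information
R_CRIT = 0.085  # per sec; illustrative criterion on extraction rate


def rate_after_offset(p, d, c=C, omega=OMEGA):
    """Extraction rate p sec after the offset of a d-sec stimulus.

    Hypothetical form: r(t) = a(t) * c * (1 - I(t)), with a(t) = 1 while the
    stimulus is on and a(t) = exp(-p / omega) afterwards.
    """
    a = math.exp(-p / omega)
    # Closed-form solution of dI/dt = a(t) * c * (1 - I) with I(0) = 0.
    one_minus_i = math.exp(-c * d - c * omega * (1.0 - a))
    return a * c * one_minus_i


def persistence_a_criterion(d, a_crit=A_CRIT, omega=OMEGA):
    """Time after offset at which a(t) falls to a_crit; d plays no role."""
    return omega * math.log(1.0 / a_crit)


def persistence_r_criterion(d, r_crit=R_CRIT, upper=2.0, steps=60):
    """Time after offset at which r(t) falls to r_crit, found by bisection."""
    if rate_after_offset(0.0, d) <= r_crit:
        return 0.0
    lo, hi = 0.0, upper
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if rate_after_offset(mid, d) > r_crit:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for d in (0.020, 0.270):
        print(d, persistence_a_criterion(d), persistence_r_criterion(d))
```

Under these assumptions the a(t) criterion yields the same persistence for both durations, whereas the r(t) criterion yields shorter persistence after the longer stimulus, which is the qualitative pattern the text describes.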
B. APPLICATIONS OF THE MODEL
1. Evaluation Procedures
This extension of the model makes a global prediction: any variable that affects r(t), the information-extraction rate, must concomitantly affect
Fig. 8. Quantitative model: an extension in which phenomenological appearance is represented by r(t), the information-extraction rate. The horizontal line represents the criterion r(t) below which the stimulus is reported to have disappeared. Horizontal axis: time since stimulus onset (msec).
persistence duration. We show for both existing data and for the new data of Experiments 4-7 that this prediction is confirmed when r(t) is manipulated in a variety of ways. Most versions of our model, including the quantitative version described earlier, make strong predictions about the effect of one particular independent variable, stimulus duration, on persistence duration. In Experiment 4, we vary stimulus duration and find the best-fitting value of the model parameter, c. Using the best-fitting c value we then illustrate predictions about the effects on persistence duration of several other independent variables. Following our evaluation of these predictions in Experiments 5-7, we discuss the limitations of this kind of model-evaluation procedure.
2. Application of the Model to Existing Data
In this section, we describe application of the model to stimulus duration, stimulus luminance, and stimulus informational content.
a. Stimulus Duration. It has typically been found that persistence duration decreases with increasing stimulus duration (e.g., Efron, 1970; Di Lollo, 1980; Haber & Standing, 1970). The model's account of this effect is illustrated in Fig. 8, and it can be seen that the model correctly predicts the data for the following reason. Because r(t) decreases with increasing I(t), r(t) decreases over the time during which the stimulus remains physically present. Therefore, following a short stimulus, r(t) is relatively high at stimulus offset and takes a relatively long time to fall to any given criterion level. Conversely, following a longer stimulus, r(t) is lower at stimulus offset and takes less time to fall to the same criterion.
b. Stimulus Luminance. The effects of stimulus luminance on persistence duration are somewhat mixed. The typical effect of increasing luminance is to decrease persistence duration (e.g., Allport, 1970; Bowen, Pola, & Matin, 1974; Dixon & Hammond, 1972; Efron & Lee, 1971), although, occasionally, increasing luminance increases persistence duration (e.g., Sakitt, 1976). Loftus (1985a) has shown that manipulating stimulus luminance can affect r(t), the information-extraction rate: with sufficiently low luminance, r(t) is decreased. Luminance, or any variable that affects r(t), can have two counteracting effects on persistence duration. The first, and most straightforward, is that decreasing r(t) decreases persistence duration since persistence duration depends on r(t). This is illustrated in the left panel of Fig. 9, where bright and dim stimuli have c values of 3.7 and 2.0, respectively. It is evident that bright stimuli are predicted to have longer persistence than dim stimuli, contrary to most (although not all) of the extant data. Recall, however, that r(t) decreases with increasing I(t). This means that a sufficiently large initial r(t) can cause such a rapid increase in I(t) that r(t) itself rapidly declines. Under appropriate circumstances, the decline is such that r(t) eventually becomes less than it would have been had r(t) been smaller to begin with. This seemingly convoluted assertion is illustrated in the right panel of Fig. 9, which depicts a situation in which the absolute c values are very large: bright and dim stimuli have c values of 20 and 11, respectively (note the change in scale in the right relative to the left panel of Fig. 9). Here, r(t) is initially much larger for bright than for dim stimuli; however, the bright and dim r(t) curves eventually
Fig. 9. Quantitative model: predicted information-extraction rate for bright and dim stimuli. Left, variation in stimulus luminance (complex photo stimuli); right, variation in stimulus luminance (simple stimuli). In both panels, the horizontal line represents the criterion r(t) below which the stimulus is reported to have disappeared. Horizontal axis: time since stimulus onset (msec).
cross, and persistence duration is thus greater for dim than for bright
stimuli. As discussed earlier, large absolute c values are characteristic of simple stimuli from which relevant information is extracted very quickly. Small absolute c values, in contrast, are characteristic of complex stimuli, from which relevant information is extracted more slowly. Thus, the right panel of Fig. 9 describes the simple stimuli typically used, and its prediction is confirmed (e.g., by Bowen et al., 1974). The left panel of Fig. 9 describes the complex stimuli used in the present research. Its prediction, that persistence duration should increase with luminance, was confirmed in Experiment 7, to be described shortly.
c. Informational Content of the Stimulus. Erwin (1976) measured persistence duration of letter strings that varied in terms of approximation to English. In one set of conditions, the letters had to be remembered and eventually reported; in the other set of conditions, the letters did not have to be remembered. Erwin found that persistence duration decreased with increasing approximation to English, but only for to-be-remembered letters. Thus, higher informational content (as instantiated by a lower approximation to English) leads to longer persistence. The model's account of these data is straightforward. With more information to be extracted, the information-extraction rate must remain higher for a longer period of time; thus, with more information it will take longer for r(t) to fall to any criterion level. This leads to longer persistence duration.
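The reversal of the luminance effect between small and large c values, described above for Fig. 9, can be explored with a minimal sketch. The c values (3.7 and 2.0; 20 and 11) are those quoted in the text. Everything else is an assumption made for illustration: the rate form r(t) = a(t) c [1 - I(t)], the 100-msec decay constant, the 100-msec stimulus duration, and the use of the criterion r_crit = 8.5%/sec reported for Experiment 4.

```python
import math

OMEGA = 0.100     # sec; assumed decay constant of available information
R_CRIT = 0.085    # per sec; criterion value borrowed from the Experiment 4 fit
DURATION = 0.100  # sec; hypothetical stimulus duration for this illustration


def rate_after_offset(p, c, d=DURATION, omega=OMEGA):
    """Assumed rate r = a * c * (1 - I), p sec after offset of a d-sec stimulus."""
    a = math.exp(-p / omega)
    one_minus_i = math.exp(-c * d - c * omega * (1.0 - a))
    return a * c * one_minus_i


def persistence(c, r_crit=R_CRIT, upper=2.0, steps=60):
    """Time after stimulus offset at which the rate falls to r_crit (bisection)."""
    if rate_after_offset(0.0, c) <= r_crit:
        return 0.0
    lo, hi = 0.0, upper
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if rate_after_offset(mid, c) > r_crit:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Small c values (complex stimuli): bright = 3.7, dim = 2.0.
    print("complex:", persistence(3.7), persistence(2.0))
    # Large c values (simple stimuli): bright = 20, dim = 11.
    print("simple: ", persistence(20.0), persistence(11.0))
```

With these illustrative settings the brighter stimulus persists longer when c is small and the dimmer stimulus persists longer when c is large, in line with the two panels of Fig. 9.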
C. EXPERIMENTS 4-7: DURATION OF VISIBLE PERSISTENCE FOLLOWING STIMULUS OFFSET
In Experiments 4-7, we used a synchrony-judgment task to estimate persistence duration. Recall that in a synchrony-judgment task, the observer adjusts the ISI between a stimulus and a synchrony signal such that the signal occurs at the time of phenomenological stimulus offset. In each experiment, we tested the model's prediction that decreasing the information-extraction rate, r(t), must decrease persistence duration.
1. Experiment 4: Decreasing Information-Extraction Rate by Increasing Stimulus Duration
As indicated, previous experiments have demonstrated a negative relation between stimulus duration and persistence duration. However, the stimuli used in these experiments were very simple, often consisting of small, monochromatic light patches. The first purpose of Experiment 4 was to replicate the stimulus-duration effect using the complex scenes from Experiments 1-3. The second purpose was to estimate the model parameter, c.
a. Method. University of Washington graduate and undergraduate students (6) served as paid subjects. Each participated individually in a 1.5-hr session. The stimuli were 12 of the slides used in Experiments 1-3. All stimuli were attenuated by one log unit relative to the projector luminance. The masking slide used in Experiments 1-3 was used in Experiments 4-7 as a synchrony signal. An experimental session consisted of 12 practice trials followed by 144 test trials. Each trial involved a single target stimulus presented at one of six durations: 20, 80, 140, 200, 260, or 320 msec. A trial consisted of a series of presentations, each presentation made up of a 500-msec warning tone/fixation light, followed by the target stimulus, followed by a blank ISI, followed by the noise mask. The subject's task was to adjust the ISI across presentations in such a way that the mask appeared to coincide with the complete phenomenological disappearance of the stimulus. Within a trial, stimulus duration remained constant across presentations. The stimulus-mask ISI adjustment procedure worked as follows. At the start of each trial, the ISI was set either to 0 or to 480 msec. Following each presentation, the subject requested, via one of two response keys, either an increment or a decrement in ISI. The ISI of the next presentation was accordingly lengthened or shortened by an increment/decrement that was initially set to 80 msec.
After each reversal (a requested decrement followed by a requested increment or vice versa) the magnitude of the increment/decrement was halved. The persistence duration estimated on
each trial was defined to be the mean of the two ISIs just preceding and following the fourth reversal. The six target durations were factorially combined with the two start intervals to produce 12 conditions. For each subject, the 156 total trials (12 practice trials plus 144 test trials) were divided into 13 12-trial blocks. Within each block, each stimulus and condition was presented once. The 12 stimuli were counterbalanced over the 12 conditions across the 12 test blocks via a Latin Square. The initial ordering of conditions across trials within a block was randomized anew for each subject.
b. Results and Discussion. There was no interaction of start ISI with stimulus duration [F(5,25) < 1]; accordingly, the data were collapsed across start ISI. Figure 10 shows the function relating persistence duration to stimulus duration, d (the solid lines through the data points are described below). As expected from past results, this function declines. In the present experiment, the decline is approximately linear. The slope, approximately -0.31 msec of persistence duration per millisecond of stimulus duration, is substantially shallower than the -1 slope obtained by others (e.g., Efron, 1970). We shall have more to say about this shallower slope shortly.
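The ISI adjustment procedure described in the Method can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' apparatus code: the observer is a hypothetical callback, the step is assumed to be halved before it is applied on the reversing presentation, the ISI is assumed not to go below zero, and "the two ISIs just preceding and following the fourth reversal" is read as the ISI at which the fourth reversing request was made and the ISI that immediately follows it.

```python
def adjust_isi(wants_longer, start_isi, step=80.0, target_reversal=4,
               max_presentations=1000):
    """Run one trial of the ISI adjustment procedure and return the estimate.

    wants_longer(isi) is a hypothetical observer callback: True requests an
    increment in ISI, False requests a decrement. All values are in msec.
    """
    isi = start_isi
    previous_request = None
    reversals = 0
    for _ in range(max_presentations):
        request = wants_longer(isi)
        # A reversal is a decrement request after an increment, or vice versa.
        if previous_request is not None and request != previous_request:
            reversals += 1
            step /= 2.0  # halve the increment/decrement after each reversal
        next_isi = isi + step if request else max(0.0, isi - step)
        if reversals == target_reversal:
            # Mean of the ISIs just before and just after the fourth reversal.
            return (isi + next_isi) / 2.0
        previous_request = request
        isi = next_isi
    raise RuntimeError("observer never produced the required reversals")


def ideal_observer(true_persistence):
    """Hypothetical noise-free observer with a fixed persistence duration."""
    return lambda isi: isi < true_persistence


if __name__ == "__main__":
    observer = ideal_observer(300.0)
    print(adjust_isi(observer, start_isi=0.0))
    print(adjust_isi(observer, start_isi=480.0))
```

Running the sketch from both start intervals (0 and 480 msec) mirrors the two start conditions of the experiment; with a noise-free observer both runs settle within a few milliseconds of the observer's persistence duration.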
Fig. 10. Experiment 4 data. Diamonds represent data points and the solid line represents the model prediction. Each data point is based on 144 observations. Horizontal axis: stimulus duration (ms).
c. Application of the Model. We applied the quantitative form of our model to the data, allowing two free parameters. The first is c, which reflects how quickly the information-extraction rate, r(t), declines with increases in I(t), the total extracted information. The second is r_crit, the criterion rate at which the stimulus is reported to have vanished (see Fig. 8). The best fit was provided by c = 3.70 and r_crit = 8.5%/sec.* The predictions of the model are shown by the solid lines in Fig. 10.† The root-mean-square error between predictions and data (5 msec) is less than the standard error of the data (15.6 msec), indicating a statistical confirmation of the model. The relatively shallow slope relating persistence duration to physical stimulus duration is consistent with the model. As we have noted, simpler stimuli, such as those used by Efron, are associated with rapid extraction of relevant stimulus information. Within the context of our general model, rapid information extraction is represented by higher values of the function h or, within the context of the quantitative model, a higher c value. Higher h (e.g., c) values, in turn, lead to more dependence of r(t) on stimulus duration and, thus, to a steeper slope. In the quantitative model, for example, a -1 slope emerges when c is about 15.
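The dependence of the slope on c can be examined with a minimal sketch. The six stimulus durations, c = 3.7, c = 15, and r_crit = 8.5%/sec come from the text. The rate form r(t) = a(t) c [1 - I(t)] and the 100-msec decay constant are assumptions made here for illustration, so the slopes it produces should be read as indicative only, not as the authors' fitted predictions.

```python
import math

OMEGA = 0.100   # sec; assumed decay constant of available information
R_CRIT = 0.085  # per sec; criterion rate reported for Experiment 4
DURATIONS = [0.020, 0.080, 0.140, 0.200, 0.260, 0.320]  # sec, Experiment 4


def rate_after_offset(p, c, d, omega=OMEGA):
    """Assumed rate r = a * c * (1 - I), p sec after offset of a d-sec stimulus."""
    a = math.exp(-p / omega)
    one_minus_i = math.exp(-c * d - c * omega * (1.0 - a))
    return a * c * one_minus_i


def persistence(c, d, r_crit=R_CRIT, upper=2.0, steps=60):
    """Time after stimulus offset at which the rate falls to r_crit (bisection)."""
    if rate_after_offset(0.0, c, d) <= r_crit:
        return 0.0
    lo, hi = 0.0, upper
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if rate_after_offset(mid, c, d) > r_crit:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fitted_slope(c, durations=DURATIONS):
    """Least-squares slope of persistence duration on stimulus duration."""
    values = [persistence(c, d) for d in durations]
    mean_d = sum(durations) / len(durations)
    mean_p = sum(values) / len(values)
    numerator = sum((d - mean_d) * (p - mean_p)
                    for d, p in zip(durations, values))
    denominator = sum((d - mean_d) ** 2 for d in durations)
    return numerator / denominator


if __name__ == "__main__":
    print("slope for c = 3.7:", fitted_slope(3.7))
    print("slope for c = 15: ", fitted_slope(15.0))
```

Under these assumptions the slope for c = 3.7 is shallow and negative, and the slope for c = 15 is close to -1, which is the qualitative relation between c and slope described above.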
2. Experiment 5: Reducing Information-Extraction Rate with a Superimposed Mask
Our model's account of the Experiment 4 data incorporates the idea that stimulus duration affects r(t) at the time of stimulus offset; a longer stimulus leads to a lower r(t), which, in turn, leads to shorter persistence durations. However, there is an alternative explanation involving sensory adaptation. Each stimulus presentation is preceded and followed by relative darkness. Perhaps longer stimuli lead to greater light adaptation, which, in turn, somehow leads to a shorter perceived icon. Experiment 5 was designed in part to evaluate this possibility. In Experiment 5, all target stimuli were shown for 150 msec. Three conditions were defined by what occurred during the first 50 msec of stimulus exposure. In a bright-mask condition, a bright mask was superimposed over the target for the first 50 msec; in a dim-mask condition, a dim mask was superimposed over the target during this period; in a control condition, no mask was superimposed.
*The parameter c was also fit by Loftus, Hanna, and Lester (1988) in a set of picture-recognition experiments that, apart from using the same stimuli, bore no resemblance to the present Experiment 4. Loftus et al. obtained a best-fitting c value of 3.4, which is remarkably close to the value of 3.7 obtained here.
†The linearity of the model's fit is only approximate, i.e., the assumptions of the model do not imply linearity of this function. With other parameter values, substantial departures from linearity would occur.
We know from the results of Experiment 2 that the greater the luminance of a superimposed mask, the more impaired is information acquisition. According to our model, this means that increased mask luminance can lead to smaller I(t) and, thus, a greater r(t) sometime following stimulus offset. This prediction is illustrated in Fig. 11 for the bright- and no-mask conditions. Here, c is the usual 3.7 for the no-mask condition. We assume that superimposing a bright mask reduces r(t) to 0 during the time the bright mask is physically present and that following bright-mask offset, r(t) returns exponentially to 3.7 with a time constant of 100 msec. The model predicts persistence to be greater in the bright-mask condition. The adaptation-level hypothesis, in contrast, predicts that increased mask luminance should lead to greater light adaptation and, thus, to shorter persistence duration.
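The bright-mask prediction can be sketched numerically. The 150-msec stimulus, the 50-msec mask, the value 3.7, and the 100-msec recovery time constant come from the text. The following are assumptions made only for this illustration: the mask is modeled as suppressing the rate parameter c to zero and letting it recover exponentially, the rate has the form r(t) = a(t) c(t) [1 - I(t)], available information decays with a 100-msec constant after offset, and the criterion is the 8.5%/sec value from Experiment 4.

```python
import math

OMEGA = 0.100     # sec; assumed decay constant of available information
C_NORMAL = 3.7    # per sec; value quoted for the no-mask condition
R_CRIT = 0.085    # per sec; criterion borrowed from the Experiment 4 fit
STIMULUS = 0.150  # sec; target duration in Experiment 5
MASK = 0.050      # sec; superimposed mask covers the first 50 msec
RECOVERY = 0.100  # sec; time constant of recovery after mask offset


def rate_parameter(t, masked):
    """Assumed time course of c: zero under the mask, exponential recovery after."""
    if not masked:
        return C_NORMAL
    if t < MASK:
        return 0.0
    return C_NORMAL * (1.0 - math.exp(-(t - MASK) / RECOVERY))


def persistence(masked, dt=1e-4, t_max=2.0):
    """Integrate dI/dt = a(t) * c(t) * (1 - I) with simple Euler steps and
    return the time after stimulus offset at which the rate drops below R_CRIT."""
    extracted = 0.0
    for k in range(int(t_max / dt)):
        t = k * dt
        a = 1.0 if t < STIMULUS else math.exp(-(t - STIMULUS) / OMEGA)
        rate = a * rate_parameter(t, masked) * (1.0 - extracted)
        if t >= STIMULUS and rate < R_CRIT:
            return t - STIMULUS
        extracted += rate * dt
    return t_max - STIMULUS


if __name__ == "__main__":
    print("no mask:    ", persistence(masked=False))
    print("bright mask:", persistence(masked=True))
```

Under these assumptions the masked stimulus has the longer predicted persistence, which is the direction of the prediction stated above; the absolute values depend on the assumed parameters and are not estimates of the data.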
a. Method. Members of the University of Washington Psychology Department (12) served as subjects. Each participated individually in a 1-hr session. The stimuli were 60 of the pictures that had been used in Experiments 1-3 and included the 12 pictures used in Experiment 4. Two copies of the noise mask were prepared. The first was used as a synchrony signal, exactly as in Experiment 4. The second was, in some conditions, projected physically superimposed over a target stimulus. The procedures for obtaining estimates of persistence duration were identical to those of Experiment 4.
Fig. 11. Quantitative model: predictions of variation in mask luminance for Experiment 5. The curves show information-extraction rate for stimuli over which a bright or a dim mask is superimposed during the first 50 msec of stimulus exposure. The horizontal line represents the criterion r(t) below which the stimulus is reported to have disappeared.
Three mask-luminance conditions were defined by superimposing a bright mask, a dim mask, or no mask on the target picture during the first 50 msec of the target's 150-msec total duration. The dim and bright masks were as in Experiments 1-3. The three mask-luminance conditions were factorially combined with the two start ISIs for a total of six experimental conditions. Each of the 60 stimuli was shown only once to a given subject; thus, there was a total of 60 trials. The six conditions were presented in random order with the restriction that each condition occurred twice during each 12-trial block. Stimuli were counterbalanced over the six conditions across subjects; thus, across the 12 subjects there were two complete replications.
b. Results and Discussion. There was no interaction of start ISI with mask luminance [F(2,22) = 1.21]; accordingly, the data were collapsed across start ISI. Table V shows estimated persistence duration for the three masking conditions, again collapsed over start interval. Persistence duration is longer with greater mask luminance, thereby confirming our model and disconfirming the adaptation-level hypothesis.
3. Experiment 6: Increasing Information-Extraction Rate by Lowering Stimulus Luminance
It might be argued that the results of Experiment 5 could be explained by simply assuming that the brighter the overall stimulus configuration, the longer the persistence duration. This result has occasionally been found (e.g., Sakitt, 1976), although the opposite relationship, a negative relation between stimulus luminance and persistence duration, is more typical (cf. Coltheart, 1980). Experiment 6 was designed to test this possibility. Experiment 6 was similar to Experiment 5 except that the luminance of the superimposed mask remained constant while the luminance of the target stimulus was varied. A great deal of evidence indicates that a mask will interfere with dimmer stimuli more than with brighter stimuli (e.g., Eriksen, 1966; Eriksen & Lappin, 1964).

TABLE V
EXPERIMENT 5 DATA: ESTIMATED PERSISTENCE DURATION (MSEC) FOR THE THREE MASK-LUMINANCE CONDITIONS*

No mask     Dim mask     Bright mask
217         284          306

*Standard error, 11 msec. Each data point is based on 240 observations.

Fig. 12. Quantitative model: predictions of variations in stimulus luminance (superimposed mask) for Experiment 6. The curves show information-extraction rate for bright or dim stimuli over which a mask is superimposed during the first 50 msec of stimulus exposure. The horizontal line represents the criterion r(t) below which the stimulus is reported to have disappeared. Horizontal axis: time since stimulus onset (msec).

According to our model, therefore, dimmer stimuli, from
which less information has been extracted, will have a higher information-extraction rate following stimulus offset. This prediction is illustrated in Fig. 12. Here, it is assumed that r(t) for a bright stimulus is unaffected by the mask, but that r(t) for a dim stimulus is reduced to zero by the mask. It is further assumed that r(t) is normally 3.7 for bright stimuli and 3.0 for dim stimuli. Again, following mask offset, r(t) returns exponentially to its normal level. The prediction is that dimmer stimuli persist longer than brighter stimuli.
a. Method. Members of the University of Washington Psychology Department (6) served as subjects. Four had participated in Experiment 5. Each participated individually in a 1-hr session. The stimuli and noise masks were those used in Experiment 5. The target pictures were presented for 150 msec. The superimposed mask still occurred during the first 50 msec of target presentation. Target pictures were either the same luminance as they were in Experiment 4, or they were attenuated by 0.5 or 1.0 log units. The concurrent mask that occurred during the first 50 msec of stimulus presentation was attenuated by 1.0 log unit relative to the bright mask of Experiment 4. The three target luminance conditions were factorially combined with the two start ISIs for a total of six experimental conditions. The counterbalancing procedures were identical to those of Experiment 5.
b. Results and Discussion. There was no interaction of start ISI with stimulus luminance [F(2,10) < 1]; accordingly, the data were collapsed across start ISI. Table VI shows estimated persistence duration for the three stimulus-luminance conditions, again collapsed over start interval. Persistence duration is longer with lower stimulus luminance, again confirming our model.
4. Experiment 7: Reducing Information-Extraction Rate by Lowering Stimulus Luminance
The effects of stimulus luminance on persistence duration have been described earlier. In most reported experiments, the effect of lowering luminance is to increase persistence duration. However, the experiments that have demonstrated this effect have used simple stimuli in which absolute information-extraction rate is higher than with the present complex stimuli. Figure 9 illustrated why our model predicts that reducing luminance should, in contrast, decrease persistence duration when complex stimuli are used. Experiment 7 was designed to test this prediction.
a. Method. The same 6 subjects, 12 stimuli, and apparatus used in Experiment 4 were used in Experiment 7. The Experiment 7 procedure was identical to the Experiment 4 procedure, except that the major independent variable was stimulus luminance rather than stimulus duration. Stimulus duration in Experiment 7 was 100 msec. Stimulus luminance was either unattenuated or attenuated by 1.0 or 2.0 log units. Initial stimulus-signal ISI was either 0 or 480 msec; thus, there were six experimental conditions. Each subject received six 12-trial blocks for a total of 72 trials. Each of the 12 stimuli occurred once, and each of the six experimental conditions occurred twice during each block. The 12 stimuli were counterbalanced through the six conditions over the six blocks.
b. Results and Discussion. There was no interaction of start ISI with stimulus luminance [F(2,22) < 1]; accordingly, the data were collapsed across start ISI. Table VII shows the results. Persistence is longer with higher luminance, again confirming our model.
TABLE VI
EXPERIMENT 6 DATA: ESTIMATED PERSISTENCE DURATION (MSEC) FOR THE THREE STIMULUS-LUMINANCE CONDITIONS*

Bright      Moderate     Dim
273         317          336

*Standard error, 8 msec. Each data point is based on 120 observations.
TABLE VII
EXPERIMENT 7 DATA: ESTIMATED PERSISTENCE DURATION (MSEC) FOR THE THREE STIMULUS-LUMINANCE CONDITIONS OF EXPERIMENT 7*

Bright      Moderate     Dim
345         310          297

*Standard error, 14 msec. Each data point is based on 288 observations.
5. General Discussion: Experiments 4-7
The results of Experiments 4-7 demonstrated several variables that affect persistence duration. The negative effect of physical stimulus duration (Experiment 4) was expected on the basis of numerous past results. The positive effect of pure variation in stimulus luminance (Experiment 7) was generally unexpected on the basis of past results, although there exist comparable data (e.g., Sakitt, 1976). The effects of superimposed noise masks (Experiments 5 and 6) represent a new manipulation and are, therefore, not predictable on the basis of past results. The effects of all four experiments, whether expected on the basis of past results or not, can be accounted for by our model. As illustrated in Figs. 8, 9, 11, and 12, in each experiment the manipulated variable affects r(t), the information-extraction rate, following stimulus offset. To the degree that r(t) is higher (because of a shorter stimulus, a brighter superimposed mask, a dimmer stimulus over a superimposed mask, or a brighter otherwise uncontaminated stimulus) the model predicts the longer persistence that, indeed, was found. It is especially noteworthy, as demonstrated in Fig. 9, that the ordinal effect of luminance on persistence duration is predicted to depend on the state of other variables (such as the stimulus complexity). In a broad sense, this means that the model can account for the somewhat mixed effects of stimulus luminance that exist in the literature. To be candid, we must point out that we have used the quantitative form of our model with quite specific parameter values in order to formulate the predictions shown in Figs. 9, 11, and 12. Within the context of our quantitative model, other parameter values could be found that would incorrectly predict the outcomes of Experiments 5-7; likewise, other instantiations of the general model could be found that would incorrectly predict the results of all four experiments.
So while we have shown the general model to be capable of predicting the results of Experiments 4-7, we have not shown that it must invariably make the correct predictions for these experiments. [In contrast, for example, we have shown in Section
V, B that any form of the general model must predict the results of the Loftus et al. (1985) experiments.] In principle, therefore, the results of Experiments 4-7 place constraints on what forms of the general model are viable, although it is beyond the scope of this article (and probably very difficult) to formally characterize these constraints.
IV. Concluding Remarks
A. INFORMATION ACQUISITION
Assumptions 1-4, discussed in Section II of this chapter, constitute a model of both information extraction and the relation between extracted information and picture-memory performance. Assumptions 1 (available information), 2 (unidimensionality), and 3 (information-acquisition rate) are similar to corresponding assumptions in other information-acquisition models (e.g., Kowler & Sperling, 1980; Krumhansl, 1982; Loftus & Kallman, 1979; Massaro, 1970; Rumelhart, 1969). Assumption 4 weakly ties the main entity produced by Assumptions 1-3 [extracted information, I(t)] to any observed measure of memory performance.
1. Perceptual Processing and "Information": The Unidimensionality Assumption
As indicated earlier, the model does not precisely characterize what is "information." Our crucial unidimensionality assumption, however, is that information, whatever it may be, can be represented by a single number. If one considers a picture's eventual memory representation, this assumption is probably incorrect. A picture's long-term representation probably consists of a variety of different kinds of information that differ qualitatively from one another. For instance, a good deal of evidence favors a dual-code model, which incorporates the fundamental assumption that a visually presented stimulus is encoded both visually and verbally (e.g., Paivio, 1971). Within the context of such a model, the information constituting a picture's representation must be characterized by at least two numbers, one representing the state of visual information and the other representing the state of verbal information. It should be kept in mind, however, that the present model is not designed to characterize all of the encoding that results in a picture's eventual memorial representation; it is designed to characterize perceptual processing only. The output of perceptual processing could reasonably be unidimensional.
2. Conceptual Processing
As discussed by others (e.g., Intraub, 1980, 1984; Potter, 1976; Loftus, Hanna, & Lester, 1988), the output of perceptual processing is transient in the sense that, without subsequent encoding, eventual memory performance is very low. The presumed encoding that operates on the output of perceptual processing and produces an ultimate memory representation has been termed conceptual processing. The exact nature of conceptual processing has not been formally explicated by anyone. Roughly and informally, however, conceptual processing may be thought of as including rehearsal, verbal recoding, association of features within the picture, association of the picture to other pictures, and other higher-level, controlled cognitive processes. Loftus, Hanna, and Lester (1988) provide a model of conceptual processing that uses extracted information, I(t), as defined in the present model, as input.
3. On the Relation between Extracted Perceptual Information and Memory Performance: The Monotonicity Assumption
Our model assumes a monotonic relation between extracted perceptual information, I(t), and memory performance, P(t). However, monotonicity cannot universally apply at the individual-item level for the following reason. As just discussed, memory performance must be based on some eventual memorial representation of the picture, and this eventual representation constitutes the output of conceptual, as well as perceptual, processing. As we have noted, conceptual processing is not simple; rather it should be viewed as a class of diverse cognitive operations that are under the subject's control. Therefore, given the same amount of extracted perceptual information, different patterns of conceptual processing could give rise to different memory representations and, thus, to different values of memory performance. Such a situation could arise, for example, when different subjects see the same picture under identical circumstances or when one subject sees different pictures under identical circumstances. In a properly counterbalanced experiment to which the model is applied, however, the monotonicity assumption could reasonably be correct at the statistical level; on the average, two items from which the same perceptual information has been extracted are expected to show the same performance.
B. PHENOMENOLOGICAL APPEARANCE
Assumption 5 extends the model by equating phenomenological experience with information extraction.
1. The Causal Relation between Phenomenology and Information Extraction
There are two fundamental ways of viewing the relation between information extraction and phenomenological experience. The first is that phenomenological experience is an automatic process determined strictly by physical stimulus properties (e.g., luminance, contrast, duration). In this view, information extraction from the stimulus requires the phenomenological presence of the stimulus. The second view is that information extraction takes precedence. In this view, information extraction depends on physical properties of the stimulus and on the goals of the observer, whereas phenomenological experience is a by-product of the information-extraction process. Along with others (most notably Di Lollo, e.g., 1980, and Erwin, 1976), we favor this second view. We believe that phenomenological experience results from an active process rather than from the passive state of an informational store. Erwin (1976) has demonstrated a close relation between information extraction and phenomenological appearance. As we described earlier, Erwin showed that persistence duration decreases with increasing approximation to English. In a second experiment with the same stimuli, Erwin used a paradigm very similar to that of Experiments 1-3. He presented the stimuli followed by a mask after varying ISIs and determined the ISI at which further ISI increases had no additional beneficial effect on subsequent memory performance. Erwin found that the approximation-to-English variable had the same effect on this "crucial ISI" as it did on persistence duration. Erwin claimed this effect as evidence for the proposition that the same variable underlies the icon as a basis for information extraction and the icon as a basis for phenomenological appearance. Erwin (see also Erwin & Hershenson, 1974) interpreted these results in terms of a two-component model of persistence that is similar in many respects to our model.
Erwin (1976) characterizes these two components as "a physical component whose duration is unrelated to stimulus parameters and an informational component whose duration is inversely related to the efficiency of encoding stimulus information" (p. 191). These two components correspond, in essence, to the a(t) and r(t) of the present model. A crucial difference between the two models is that Erwin ascribes phenomenological properties to both persistence components, whereas our model ascribes phenomenological properties to r(t) only.

Footnote: The major difference between Erwin's paradigm and the paradigm used in Experiments 1-3 is that we varied mask luminance whereas Erwin compared all masking conditions (i.e., all ISIs) to a no-mask control condition. As we have noted earlier, however, a mask can have both perceptual and conceptual effects. For this reason, it is probably more accurate to use the mask luminance effect as the measure of perceptual processing.
2. The "Indefinite-Duration Stimulus" Problem
Our model makes a seemingly paradoxical prediction. Consider that phenomenological appearance depends on r(t), which declines with increasing I(t). This means that if the physical stimulus is left on indefinitely, r(t) should eventually decline to the point that the observer will cease to be phenomenologically aware of the stimulus. At first glance, this prediction seems unreasonable. Assuming the model's validity, there are two resolutions to this issue. First, this problem may simply represent a boundary condition on the model; that is, the rules that determine phenomenological experience may change for long relative to brief stimuli. Second, it may be true that the stimulus does phenomenologically vanish after some period of time. This second view is actually quite reasonable. Anecdotally, we have all had the experience of gazing at a visual stimulus (a page of text, a conversational partner, the scenery outside a car window) and suddenly realizing that "our thoughts have been elsewhere" and we have not been at all aware of what we were looking at. Empirically, the dichotic listening literature supports such a notion, at least in the auditory domain; when a subject is forced to attend to one auditory channel, e.g., by having to shadow it, there is no evidence of any conscious awareness of anything on the nonattended channel: subjects can remember nothing from it, and notice nothing that happens on it, even a change in language from English to German (Cherry, 1953; Cherry & Taylor, 1954; Moray, 1969). It may well be that phenomenological experience depends on a good deal more than simply what enters the sensorium.
C. MEDIATING PROCESSES
A major issue that we have sought to address in this article is that of whether information extraction and phenomenological experience are simply two consequences of (i.e., are mediated by) the same process.
Bamber (1979) provides an excellent formalization and discussion of the nature of mediation. Among other things, Bamber describes necessary conditions for concluding that two (or more) performance measures are mediated by the same underlying hypothetical variable. One necessary condition is that any independent variable must affect one of the performance measures if and only if it affects the other(s) as well. As described earlier, Coltheart (1980) and others assert that visible persistence duration and performance in a partial-report procedure cannot be explained by the same underlying process because of certain independent variables (particularly stimulus duration) that affect one performance measure but not the other. If “underlying process” means a single, unidimensional variable, then, by Bamber’s logic, this assertion is correct.
In this contribution, however, we have implicitly taken the position that an underlying process can be more complex than a single unidimensional variable. In particular, our model posits that one variable, total extracted information, mediates any kind of memory performance, whereas another variable, information-extraction rate, mediates phenomenological awareness and, thus, persistence duration. While not the same, these two variables are intimately related: one is the derivative of the other. It is in this sense that we consider the system composed of an information-extraction rate and the resulting extracted information to be a unified process, and it is this process that mediates both memory performance and phenomenological experience.
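The derivative relation between the two mediating variables can be illustrated numerically. The following is only a minimal sketch under stated assumptions: it assumes the specific forms h(I) = c(1 - I) and an exponentially decaying a(t) after stimulus offset, and the parameter values C, D, and W as well as all function names are hypothetical choices made for illustration, not values taken from the experiments.

```python
import math

# Hypothetical illustration parameters (not taken from the chapter's fits).
C = 0.01    # assumed extraction-rate constant, per msec
D = 100.0   # assumed stimulus duration d, msec
W = 100.0   # assumed decay constant of available information after offset, msec


def extracted_info(t):
    """I(t) under the assumed forms h(I) = C * (1 - I), a(t) = 1 for t <= D,
    and a(t) = exp(-(t - D) / W) afterwards."""
    if t <= D:
        exposure = t
    else:
        exposure = D + W * (1.0 - math.exp(-(t - D) / W))
    return 1.0 - math.exp(-C * exposure)


def extraction_rate(t):
    """r(t) = a(t) * h(I(t)) under the same assumptions."""
    a = 1.0 if t <= D else math.exp(-(t - D) / W)
    return a * C * (1.0 - extracted_info(t))


def numerical_rate(t, eps=1e-4):
    """Central-difference estimate of dI/dt, for comparison with r(t)."""
    return (extracted_info(t + eps) - extracted_info(t - eps)) / (2.0 * eps)


if __name__ == "__main__":
    for t in (50.0, 150.0, 400.0):
        print(t, extraction_rate(t), numerical_rate(t))
```

Under these assumptions the two printed columns agree at every time point, both before and after stimulus offset, which is the sense in which the rate and the accumulated information form a single system.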
V. Appendix

A. DERIVATION OF EQUATION (3C)

We start with the equation for $r(t)$ and assume that $t < d$. Thus,

$$r(t) = \frac{dI}{dt} = a(t)\,h(I) \tag{1}$$

or,

$$\frac{dI}{h(I)} = a(t)\,dt$$

Since $a(t) = 1.0$ whenever $t < d$,

$$\frac{dI}{h(I)} = dt \tag{2}$$

Integrating both sides of Eq. (2),

$$H(I) = t + x \tag{3}$$

where $H(I)$ is the integral of $1/h(I)$ and $x$ is the constant of integration. To determine $x$, we use initial conditions of $I = 0$ when $t = 0$; thus, $x = H(0)$ and

$$H(I) = t + H(0) \tag{4}$$

or,

$$I = H^{-1}[t + H(0)] \tag{5}$$

which constitutes the top part of Eq. (3g).

Now assume that $t > d$. The equation for $r(t)$ is the same as in Eq. (1), but $a(t) = b(t - d)$. Thus,

$$\frac{dI}{h(I)} = b(t - d)\,dt \tag{6}$$

Integrating both sides of Eq. (6),

$$H(I) = B(t - d) + x \tag{7}$$

where $B(t - d)$ is the integral of $b(t - d)$ from 0 to $(t - d)$. To determine $x$, the constant of integration, we use initial conditions deriving from Eq. (5), that $I = H^{-1}[d + H(0)]$ when $t = d$. Furthermore, $B(t - d) = B(0)$ when $t = d$ and, according to the model, $B(0) = 0$. Therefore,

$$x = H\{H^{-1}[d + H(0)]\} = d + H(0) \tag{8}$$

and, substituting Eq. (8) into Eq. (7),

$$H(I) = d + H(0) + B(t - d)$$

Finally,

$$I = H^{-1}[d + H(0) + B(t - d)] \tag{9}$$

which constitutes the bottom part of Eq. (3g).

B. DERIVATION OF THE LOFTUS, JOHNSON, AND SHIMAMURA RESULT
Consider a $(d + w)$-msec masked picture. Because the mask reduces $r(t)$ to 0, there is no icon, and the amount of extracted information is obtained by Eq. (5), Section V, A:

$$I = H^{-1}[d + H(0) + w] \tag{10}$$

Now consider a $d$-msec delayed-mask picture. Such a picture is followed by an icon, and the amount of extracted information is determined by Eq. (9), with $t$ equal to $\infty$. (The value of $t$ should actually be 300 msec, the delay time of the mask. However, given the actual parameter values, $B(300)$ is approximately equal to $B(\infty)$.) By the model, $B(\infty) = w$. Therefore,

$$I = H^{-1}[d + H(0) + w] \tag{11}$$

The equality of $I$ in Eqs. (10) and (11) indicates that the amount of information extracted from an immediate-mask, $(d + w)$-msec picture is equal to the amount of information extracted from a delayed-mask, $d$-msec picture. This is the Loftus, Johnson, and Shimamura finding.
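The equivalence derived in this section holds for any form of h(I), and this can be checked by brute-force numerical integration. The sketch below is an illustration under stated assumptions rather than the procedure used in the chapter: the exponential form of the post-offset decay (chosen so that its integral from 0 to infinity equals w), the two candidate h functions, the parameter values, and all function names are hypothetical.

```python
import math

D = 100.0   # assumed stimulus duration d, msec
W = 100.0   # assumed icon "worth" w, msec


def extract(h, a, t_end, dt=0.05):
    """Euler-integrate dI/dt = a(t) * h(I) from I(0) = 0 up to t_end (msec)."""
    info = 0.0
    for k in range(int(round(t_end / dt))):
        info += a(k * dt) * h(info) * dt
    return info


def a_immediate(t):
    """(d + w)-msec picture with an immediate mask: full availability, then none."""
    return 1.0 if t < D + W else 0.0


def a_delayed(t):
    """d-msec picture followed by an icon; assumed exponential decay whose
    integral from 0 to infinity equals W."""
    return 1.0 if t < D else math.exp(-(t - D) / W)


# Two arbitrary candidate forms of h(I); both are assumptions for illustration.
CANDIDATE_H = {
    "linear": lambda i: 0.01 * (1.0 - i),
    "quadratic": lambda i: 0.01 * (1.0 - i) ** 2,
}


def compare(t_end=D + 30.0 * W):
    """Return {name: (I for immediate mask, I for delayed mask)}."""
    return {
        name: (extract(h, a_immediate, t_end), extract(h, a_delayed, t_end))
        for name, h in CANDIDATE_H.items()
    }


if __name__ == "__main__":
    for name, (immediate, delayed) in compare().items():
        print(name, round(immediate, 4), round(delayed, 4))
```

For each candidate h, the two conditions yield the same extracted information to within the integration error, although the common value differs between the two forms of h. That is the pattern the derivation predicts: the equality depends only on the total integrated availability, d + w, not on the shape of h.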
C. PROOF THAT EQUATIONS (6A) AND (6B) IMPLY A MULTIPLICATIVE EFFECT

Consider level $i$ of the independent variable:

$$r(t_i) = \frac{dI}{dt_i} = f(I)$$

or,

$$\frac{dI}{f(I)} = dt_i \tag{1}$$

Integrating both sides of Eq. (1),

$$F(I) = t_i + x$$

where $F$ is the integral of $1/f$ and $x$ is the constant of integration. With initial conditions of $I = 0$ when $t = 0$, $x = F(0)$. Therefore,

$$F(I) = t_i + F(0)$$

or,

$$I = F^{-1}[t_i + F(0)] \tag{2}$$

Now consider level $j$ of the independent variable:

$$r(t_j) = \frac{dI}{dt_j} = c\,f(I)$$

or,

$$\frac{dI}{f(I)} = c\,dt_j \tag{3}$$

Integrating both sides of Eq. (3),

$$F(I) = c\,t_j + x$$

Again with initial conditions of $I = 0$ when $t = 0$, $x = F(0)$, and

$$F(I) = c\,t_j + F(0)$$

or,

$$I = F^{-1}[c\,t_j + F(0)] \tag{4}$$

Equal performance for levels $i$ and $j$ implies equality of $I$ in Eqs. (2) and (4). Setting this equality, applying the function $F$ to both sides, and canceling the $F(0)$s yields

$$t_i = c\,t_j$$

which is the definition of a multiplicative effect.
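A small numerical example may make the multiplicative relation concrete. The sketch below assumes, purely for illustration, that f(I) = 1 - I in arbitrary time units, so that F(I) = -ln(1 - I); the function names and the value of c are hypothetical.

```python
import math


def time_to_reach(target_info, rate_multiplier):
    """Exposure time needed to reach target_info when dI/dt = rate_multiplier * (1 - I)
    and I = 0 at t = 0 (assumed form; F(I) = -ln(1 - I))."""
    return -math.log(1.0 - target_info) / rate_multiplier


def info_after(t, rate_multiplier):
    """I(t) under the same assumed form."""
    return 1.0 - math.exp(-rate_multiplier * t)


def duration_ratios(c, targets=(0.2, 0.5, 0.8)):
    """Ratio t_i / t_j of the durations giving equal performance at level i
    (multiplier 1) and level j (multiplier c), for several performance levels."""
    return [time_to_reach(p, 1.0) / time_to_reach(p, c) for p in targets]


if __name__ == "__main__":
    print(duration_ratios(c=2.5))
```

Under these assumptions the ratio of equal-performance durations is the same constant c at every performance level, which is what distinguishes a multiplicative effect from, say, an additive shift in duration.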
D. PROOF OF EQUATION (8y)

At time $(t - d)$,

$$a(t) = e^{-(t - d)/w}$$

Therefore,

$$r(t) = \frac{dI}{dt} = c\,[1.0 - I(t)]\,e^{-(t - d)/w}$$

or,

$$\frac{dI}{1.0 - I} = c\,e^{-(t - d)/w}\,dt \tag{1}$$

Integrating both sides of Eq. (1),

$$-\ln[1.0 - I(t)] = -c\,w\,e^{-(t - d)/w} + x \tag{2}$$

where $x$ is the constant of integration. To solve for $x$, we use initial conditions of $I(t) = 0$ at time $(t - d) = q$. This gives

$$x = c\,w\,e^{-q/w} \tag{3}$$

Substituting this value of $x$ into Eq. (2),

$$-\ln[1.0 - I(t)] = c\,w\,[e^{-q/w} - e^{-(t - d)/w}] \tag{4}$$

The amount of information from the array is $I(t)$ when $t = \infty$ (since information extraction continues until available information has vanished). Substituting $t = \infty$ into Eq. (4),

$$I(t) = 1.0 - e^{-[c\,w\,\exp(-q/w)]}$$

which is Eq. (8q).

ACKNOWLEDGMENTS

The writing of this article and the research described was supported by Grant No. MH41637 from the National Institute of Mental Health. We thank Tony Greenwald, Aura Hanna, Buz Hunt, Tom Nelson, John Palmer, and Karen Preston for useful comments on the previous versions of the manuscript.
REFERENCES

Adelson, E. H., & Jonides, J. (1980). The psychophysics of visual storage. Journal of Experimental Psychology: Human Perception and Performance, 6, 486-493.
Allport, D. A. (1968). The rate of assimilation of visual information. Psychonomic Science, 12, 231-232.
Allport, D. A. (1970). Temporal summation and phenomenal simultaneity: Experiments with the radius display. Quarterly Journal of Experimental Psychology, 22, 686-701.
Avant, L., & Lyman, P. (1975). Stimulus familiarity modifies perceived duration in prerecognition visual processing. Journal of Experimental Psychology: Human Perception and Performance, 3, 205-213.
Averbach, E., & Coriell, A. S. (1961). Short-term memory in vision. Bell System Technical Journal, 40, 309-328.
Averbach, E., & Sperling, G. (1961). Short-term storage in vision. In C. Cherry (Ed.), Symposium on Information Theory (pp. 196-211). London: Butterworth.
Bamber, D. (1979). State trace analysis: A method of testing simple theories of causation. Journal of Mathematical Psychology, 19, 137-181.
Biederman, I. (1972). Perceiving real-world scenes. Science, 177, 77-80.
Biederman, I., Mezzanotte, R. J., & Rabinowitz, J. C. (1981). Detecting and judging scenes with incongruous contextual relations. Cognitive Psychology, 14, 143-177.
Biederman, I., Rabinowitz, J. C., Glass, A. L., & Stacy, E. W., Jr. (1974). On the information extracted from a glance at a scene. Journal of Experimental Psychology, 103, 597-600.
Bowen, R. W., Pola, J., & Matin, L. (1974). Visual persistence: Effects of flash luminance, duration and energy. Vision Research, 14, 295-303.
Cherry, E. C. (1953). Some experiments on the recognition of speech with one and with two ears. Journal of the Acoustical Society, 25, 975-979.
Cherry, E. C., & Taylor, W. K. (1954). Some further experiments on the recognition of speech with one and two ears. Journal of the Acoustical Society, 26, 554-559.
Coltheart, M. (1980). Iconic memory and visible persistence. Perception & Psychophysics, 27, 183-228.
Di Lollo, V. (1980). Temporal integration in visual memory. Journal of Experimental Psychology: General, 109, 75-97.
Di Lollo, V. (1985). E pluribus unum: Rursus? A comment on Loftus, Johnson, and Shimamura's "How much is an icon worth?" Journal of Experimental Psychology: Human Perception and Performance, 11, 379-383.
Dixon, N. F., & Hammond, J. (1972). The attenuation of visual persistence. British Journal of Psychology, 63, 243-254.
Efron, R. (1970). Effect of stimulus duration on perceptual onset and offset latencies. Perception & Psychophysics, 8, 231-234.
Efron, R., & Lee, G. N. (1971). The visual persistence of a stroboscopically moving object. American Journal of Psychology, 84, 365-375.
Eriksen, C. W. (1966). Temporal luminance symmetry effects in backward and forward masking. Perception & Psychophysics, 1, 87-92.
Eriksen, C. W. (1980). The use of a visual mask may seriously confound your experiment. Perception & Psychophysics, 28, 89-92.
Eriksen, C. W., & Collins, J. F. (1967). Some temporal characteristics of visual pattern perception. Journal of Experimental Psychology, 74, 476-484.
Eriksen, C. W., & Lappin, J. C. (1964). Luminance summation-contrast reduction as a basis for certain forward and backward masking effects. Psychonomic Science, 1, 313-314.
Erwin, D. E. (1976). Further evidence for two components in visual persistence. Journal of Experimental Psychology: Human Perception and Performance, 2, 191-209.
Erwin, D. E., & Hershenson, M. (1974). Functional characteristics of visual persistence predicted by a two-factor theory of backward masking. Journal of Experimental Psychology, 103, 249-254.
Golden, R. (1984). Pictures at recognition: Multidimensionally scaled pictures and forced-choice recognition performance. Unpublished doctoral dissertation, University of Washington.
Haber, R. N. (1983). The impending demise of the icon: A critique of the concept of iconic storage in visual information processing. Behavioral & Brain Sciences, 6, 1-11.
Haber, R. N. (1985). An icon can have no worth in the real world: Comments on Loftus, Johnson, and Shimamura's "How much is an icon worth?" Journal of Experimental Psychology: Human Perception and Performance, 11, 374-378.
Haber, R. N., & Standing, L. (1970). Direct estimates of the apparent duration of a flash. Canadian Journal of Psychology, 24, 216-229.
Intraub, H. (1980). Presentation rate and the representation of briefly glimpsed pictures in memory. Journal of Experimental Psychology: Human Learning and Memory, 6, 1-12.
Intraub, H. (1984). Conceptual masking: The effects of subsequent visual events on memory for pictures. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 115-125.
Irwin, D. E., & Yeomans, J. M. (1986). Sensory registration and informational persistence. Journal of Experimental Psychology: Human Perception and Performance, 12, 343-360.
James, W. (1950). The principles of psychology. New York: Dover (originally published 1890).
Kowler, E., & Sperling, G. (1980). Transient stimulation does not aid visual search: Implications for the role of saccades. Perception & Psychophysics, 27, 1-10.
Krumhansl, C. (1982). Abrupt changes in visual stimulation enhance processing of form and location information. Perception & Psychophysics, 32, 511-523.
Loftus, G. R. (1972). Eye fixations and recognition memory for pictures. Cognitive Psychology, 3, 525-551.
Loftus, G. R. (1983). The continuing persistence of the icon. Behavioral & Brain Sciences, 6, 28.
Loftus, G. R. (1985a). Picture perception: Effects of luminance level on available information and information-extraction rate. Journal of Experimental Psychology: General, 114, 342-356.
Loftus, G. R. (1985b). On worthwhile icons: Replies to Di Lollo and Haber. Journal of Experimental Psychology: Human Perception and Performance, 11, 384-388.
Loftus, G. R. (1986). Information acquisition rate, short-term memory and cognitive equivalence: Reply to Sperling. Journal of Experimental Psychology: General, 115, 295-298.
Loftus, G. R., & Bell, S. M. (1975). Two types of information in picture memory. Journal of Experimental Psychology: Human Learning and Memory, 104, 103-113.
Loftus, G. R., Gillispie, S., Tigre, R. A., & Nelson, W. W. (1984). A computerized slide-projector laboratory. Behavior Research Methods, Instrumentation, and Computers, 16, 447-453.
Loftus, G. R., & Ginn, M. (1984). Perceptual and conceptual processing of pictures. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 435-441.
Loftus, G. R., Hanna, A., & Lester, L. (1988). Conceptual masking: How one picture steals attention from another picture. Cognitive Psychology, 20, 237-282.
Loftus, G. R., & Hogden, J. (1988). Picture perception: Information extraction and phenomenological persistence. In preparation.
Loftus, G. R., Johnson, C. A., & Shimamura, A. P. (1985). How much is an icon worth? Journal of Experimental Psychology: Human Perception and Performance, 11, 1-13.
Loftus, G. R., & Kallman, H. J. (1979). Encoding and use of detail information in picture recognition. Journal of Experimental Psychology: Human Learning and Memory, 5, 197-211.
Loftus, G. R., Truax, P. E., & Nelson, W. W. (1986). Age-related differences in visual information processing: Quantitative or qualitative? In C. Schooler & K. W. Schaie (Eds.), Cognitive functioning and social structures over the life course.
Long, G. M., & Beaton, R. J. (1982). The case for peripheral persistence: Effects of target and background luminance on a partial-report task. Journal of Experimental Psychology: Human Perception and Performance, 8, 383-391.
Massaro, D. W. (1970). Perceptual processes and forgetting in memory tasks. Psychological Review, 77, 557-567.
Moray, N. (1969). Attention in dichotic listening: Affective cues and the influence of attention. Quarterly Journal of Experimental Psychology, 11, 56-60.
Norman, D. A. (1976). Memory and attention. New York: Wiley.
Paivio, A. (1971). Imagery and verbal processes. New York: Holt.
Potter, M. C. (1976). Short-term conceptual memory for pictures. Journal of Experimental Psychology: Human Learning and Memory, 2, 509-522.
Potter, M. C., & Levy, E. I. (1969). Recognition memory for a rapid sequence of pictures. Journal of Experimental Psychology, 81, 10-15.
Reinitz, M. (1987). The effects of semantic priming on visual encoding. Unpublished doctoral dissertation, University of Washington.
Rumelhart, D. E. (1969). A multicomponent theory of the perception of briefly exposed visual displays. Journal of Mathematical Psychology, 7, 191-218.
Sakitt, B. (1976). Iconic memory. Psychological Review, 83, 257-276.
Shaffer, W. O., & Shiffrin, R. M. (1972). Rehearsal and storage of visual information. Journal of Experimental Psychology, 92, 292-295.
Sperling, G. (1960). The information available in brief visual presentations. Psychological Monographs, 74, 1-29.
Sperling, G. (1963). A model for visual memory tasks. Human Factors, 5, 19-31.
Sperling, G. (1967). Successive approximations to a model for short-term memory. Acta Psychologica, 27, 285-292.
Sperling, G. (1986). A signal-to-noise theory of the effects of luminance on picture memory: Commentary on Loftus. Journal of Experimental Psychology: General, 115, 189-192.
Tulving, E., Mandler, G., & Baumal, R. (1964). Interaction of two sources of information in tachistoscopic word recognition. Canadian Journal of Psychology, 18, 62-71.
Yeomans, J. M., & Irwin, D. E. (1986). Stimulus duration and partial report performance. Perception & Psychophysics, 37, 163-169.
WORKING MEMORY, COMPREHENSION, AND AGING: A REVIEW AND A NEW VIEW
Lynn Hasher
Rose T. Zacks
I. Introduction
A reasonable case can be made for the view that linguistic competence remains invariant across the adult life span (Light, 1988). Contrast this conclusion with one derived from the literature on aging and memory: Here age deficits of varying sizes are common (see Craik, 1983; Kausler, 1982). This is important to our general concern with discourse comprehension because there is every reason to believe that linguistic performance is constrained by memory functioning (Clark & Clark, 1977; Just & Carpenter, 1987). Consider, for example, the performance of younger and older adults on two different tasks tapping knowledge of word meanings (Bowles & Poon, 1985). Younger and older adults did not differ on a task which required them to determine if each of a series of letter strings was a word. However, older adults showed poorer performance (as measured by accuracy and speed) on a task which required them to produce target words when cued with their definitions. The important difference between the two tasks appears to be the greater retrieval demands made by the definition task. These results fit well with the contention that memory factors are important determinants of the degree of age differences in linguistic performance. Indeed, even the overarching objective of linguistic competence, comprehension, is constrained by performance circumstances that may well
be memory based: understanding and remembering are substantially impaired for older adults as compared to younger adults when a message is presented rapidly rather than slowly (e.g., Stine, Wingfield, & Poon, 1986) or when it contains syntactic structures that put heavy as compared to light demands on working memory (e.g., left-branching clauses vs. right-branching clauses; Kemper, 1988). We begin this article with an overview of the theoretical and empirical literatures that address aging and discourse comprehension. We then present a series of five studies which were guided by a particular working memory viewpoint regarding the formation of inferences during discourse processing. The data, although in broad agreement with our initial framework, suggested that an altered viewpoint might be more useful in guiding further research. In the next section we turn to a critique of working memory models and of the broader category of limited capacity models that they exemplify. Our data, together with these criticisms, lead us to propose, in the final section, a new framework for conceptualizing working memory, one that draws on ideas from current parallel-architecture attention theories, from social cognition, from classic interference theory of forgetting, from work on reading and discourse comprehension, and from cognitive gerontology. It is a framework developed from our interest in normal aging and from our assessment that breakdowns in cognition, as occur with aging (and possibly with depression, chronic high arousal, and chronic illness), may prove to be as valuable a window into normal cognitive functioning as breakdowns in amnesia and aphasia are currently proving to be (e.g., Squire, 1987).
II. The Theoretical Framework: From General Capacity to Working Memory
The research reported here is part of an evolving understanding of the relations among aging, memory, and discourse processing. We began research on this topic using a "general capacity" view, which was modified from an earlier perspective (Hasher & Zacks, 1979, 1984) to be appropriate for discourse processing in general and inference generation in particular. Two central assumptions characterize most general capacity models: (1) cognitive functioning is constrained by the resources that are momentarily available, and (2) the multiple components assumed to occur in almost every task vary in the resources that each needs for maximal performance (e.g., Hasher & Zacks, 1979, 1984). This general framework was then extended to develop an understanding of the cognitive deficits associated with aging. As applied to adult age differences in cognitive functioning, a third central assumption is that capacity declines across the adult lifespan.
Taken together, these three capacity assumptions predict that the degree of age-related decline on a particular task will depend on the resources of the individuals involved and, critically, on the demands made by the subcomponents of that task. When those demands are minimal (as when most of the processing is nearly automatic), age deficits too should be minimal. Age deficits, however, should increase as the cognitive demands of task components increase. Such a view is consistent with that of Craik and collaborators (1983; Craik & Byrd, 1982; Craik & Rabinowitz, 1984), who postulated an age-related decline in the "processing resources" available for capacity-demanding cognitive operations. This deficit reduces, at both encoding and retrieval, the kinds of self-initiated activities that result (1) in the generation of elaborated and distinctive memory traces and (2) in subsequent access to those traces, at least under circumstances of limited environmental support (or cues) to guide retrieval. Such viewpoints provide a useful heuristic for integrating a considerable research literature (see Craik, 1983). For example, we have reported some age differences in memory for frequency of occurrence (Hasher & Zacks, 1979) which disappear when subjects are tested using a task that provides considerable contextual support for retrieval (Attig & Hasher, 1980). Even the often reported age deficit in memory for particular items, found in tasks such as keeping track of the elements in a series (Zacks, 1982), free recall, and list learning (see Craik, 1983; Kausler, 1982, for reviews), can be eliminated when testing provides substantial contextual support for retrieval (Light & Singh, 1987). In the Light and Singh work, perceptual identification and word completion of previously presented words were compared to the identification and completion of new words in an implicit memory task.
On these two memory measures, age differences were small and unreliable, a result which can be attributed to the strong contextual support for memory trace access that these two tasks provide (see Jacoby, 1983; but see also Light & Singh, 1987). Discourse comprehension is an ideal domain for assessing limited capacity frameworks because most models of discourse processing assume that multiple components, demanding substantially different levels of cognitive resources (e.g., LaBerge & Samuels, 1974; Perfetti, 1985), are involved. Thus, for example, access to a lexical representation from either a visual array or an auditory message is virtually capacity free. Other components, such as inference generation, may vary from being extremely undemanding (as in inferring that a gun was used in a shooting incident) to extremely demanding.
A. DISCOURSE PROCESSING AND WORKING MEMORY
In the discourse processing literature, the notions of limited capacity and of differential demands on that capacity by component tasks are most
often embedded in the concept of working memory. Working memory, in turn, is conceived of as playing a central role in discourse comprehension. It bears the burden of orchestrating and enabling the multiple processes that co-occur to make skilled comprehension possible (Daneman & Green, 1986; Just & Carpenter, 1987; LaBerge & Samuels, 1974; Sanford & Garrod, 1981; Stanovich & West, 1983). In most of this research (although see Baddeley, 1986),¹ working memory is conceived of as a limited capacity mechanism which shares its resources between a storage function (as when it holds an earlier phrase, clause, or larger unit) and a set of processing functions (as when it analyzes syntax, establishes local meanings, and integrates meanings across psychological units). Demanding inferences may be seen as placing a large burden on the storage function of working memory, requiring it to serve two types of information: (1) that presented in preceding portions of a message and (2) any prior knowledge activated by the message.² There is no doubt that having prior information accessible in working memory is critical for comprehension to flow smoothly. The research of Glanzer and collaborators (Fischer & Glanzer, 1986; Glanzer & Nolan, 1986) shows clearly that having access to the information from the one or two immediately preceding sentences enables reading to proceed apace. Even a brief interruption will disturb this process, presumably by displacing relevant, antecedent information from working memory and substituting it with something less relevant to the ongoing message. If, however, the amount of capacity allocated to higher-level processing components, such as establishing connections among elements of a text, leaves little for the support of necessary information in active storage, these processes can be expected to malfunction, preventing inferences from being drawn and preventing the establishment of a coherent and integrated representation.
Because of the intuitive and logical appeal of this conception as well as the introduction of a measure of working memory that seemed to be a powerful individual difference variable (Daneman & Carpenter, 1980, 1983), the notion of working memory has both guided considerable research in the 1980s and been widely used as an explanatory concept. This has certainly been true in cognitive gerontology where the notion that working memory capacity declines with age has been especially compelling. Our application of a working memory capacity model to aging and discourse comprehension hinges on one assumption beyond those already articulated for a general capacity view: in the competition for a decreasing supply of capacity, the processing component of working memory has higher priority than does the storage component (Spilich, 1983; cf. Light & Anderson, 1983). This latter argument stands on a 2-fold rationale: (1) the initial phases of linguistic processing are probably largely stimulus driven (consider, for example, lexical access), and (2) social convention requires that a listener take turns and/or nod or verbalize agreement at appropriate points in the transmission of a message. To do this, ongoing processing of words, phrases, and larger units must occur.
Of direct relevance to the assumption that the storage function of working memory is differentially reduced with aging is evidence suggesting that variables that make demands on the storage component have a particularly disruptive effect on older as compared to younger adults. For example, Light and Capps (1986) have shown that the ability of younger and older adults to determine the antecedent of a pronoun is a function of how much information intervenes between the antecedent and its subsequent pronoun (see also Cohen & Faulkner, 1984; Wright, 1981). If the noun-pronoun relationship occurs in contiguous sentences, older and younger adults are equally able to determine the antecedent. If the relationship is separated by intervening sentences, older adults are substantially disadvantaged. These studies (and others, see Light & Albertson, 1988) suggest that when the demands on storage are high, older adults are particularly disadvantaged compared to younger adults.
¹ Baddeley's research (see Baddeley, 1986) has focused not on the central executive of working memory, to which this description applies, but on three subsidiary components of working memory: the articulatory loop, the phonological store, and the visual spatial scratch pad. Thus, although the concept of working memory as used in some areas of the discourse-processing field originated with the Baddeley and Hitch (1974) notion, developments in discourse comprehension have not necessarily followed Baddeley's lead in the subsequent development of this concept.
² In most current work, the phrase "prior knowledge" refers to text-relevant information. As will be seen, other types of stored information are important for comprehension and memory as well.
There is one possible exception to the pattern of evidence consistent with an age-related loss of storage capacity in working memory: adults with high verbal ability and/or levels of education seem to be buffered from the considerable aging declines often found in discourse comprehension tasks (e.g., Hultsch & Dixon, 1984; Hultsch, Hertzog, & Dixon, 1984; Zelinski & Gilewski, 1988). We and others (Cohen, 1988; Hunt, 1985; Perfetti, 1985; Zacks, Hasher, Doren, Hamm, & Attig, 1987) have suggested that adults with high verbal ability may use more efficient processing strategies than adults with lower ability, thereby enabling at least some sparing of functioning, including sparing of the storage component in working memory. Some suggestions in the individual difference literature support such a view (see Carpenter & Just, 1988), with correlations between working memory span and vocabulary generally ranging from approximately .35 (Zacks & Hasher, unpublished data; Hartley, 1986, 1988) to approximately .50 (Daneman & Green, 1986; but see Baddeley, Logie, Nimmo-Smith, & Brereton, 1985, for an exception).³
B. WORKING MEMORY AND INFERENCES
Thus, a working memory capacity view has some validity for integrating existing research on aging and discourse comprehension and memory. The research reported in this section attempts to apply this view to the ability to create and remember inferences. Inferences are critical to the establishment of a coherent and integrated representation of a text, without which comprehension is poor and retrieval extremely impaired (Bransford, 1979; Bransford & Johnson, 1972). The establishment of an integrated representation requires that a listener or reader produce at least some of the anaphoric and causal connections that are implicit in a text (e.g., Clark, 1977; Sanford & Garrod, 1981; O’Brien & Myers, 1987).⁴ In turn, inferences vary in the demands they place on working memory capacity. Some require the integration of two familiar pieces of information (e.g., that the vase on the dinner table was the container for the roses that were being admired by the guests). Some inferences require only the integration of a well-learned fact with the incoming information (e.g., the sentence “He shot her” implies that a gun was used). Such inferences should create relatively few problems, even for those who have reduced working memory capacity, because they require so little storage. Indeed there is evidence that older adults are not particularly disadvantaged in the formation of such inferences (see Belmore, 1981). By contrast, other inferences require the maintenance of substantial information in working memory, often from different segments of a message, and sometimes from general knowledge as well, in order for processing to establish the logical or pragmatic connections among the relevant aspects of information. Inferences that place high demands on storage capacity would be expected to show substantial age differences, and they do, as will shortly be seen (see also, e.g., Cohen, 1979; Light & Albertson, 1988; Light & Capps, 1986; Light, Zelinski, & Moore, 1982).
That research reveals that older adults have problems forming even critically important inferences, at least under some circumstances. A central limitation to the formation of inferences during on-line comprehension appears to be the inability of older adults to retrieve quickly and efficiently the antecedent information necessary to form an inference. This insight led us to address the question of what factors might function to limit on-line retrieval; it also led to an answer, proposed in the final section of this article, with broad implications for discourse comprehension and memory across the lifespan.
³ For a measure of working memory capacity, see Daneman and Carpenter (1980, 1983) and Daneman and Green (1986).
⁴ There is increasing evidence that even young adults do not spontaneously form all of the connecting inferences that a text permits (see Alba & Hasher, 1983; McKoon & Ratcliff, 1986). Some inferences may be more critical for text cohesion than others. Current evidence suggests that anaphoric (pronoun to referent) and causal-sequential inferences are among those that are drawn on line (see O’Brien & Myers, 1987). As will be seen, our research relies on inferences that are critical to a logical understanding of a paragraph.
III. The Empirical Evidence: Aging, Inference Formation, and Retrieval Problems
The three experiments we report first represent tests of the notion that reductions in the storage capacity of working memory explain most of the age-related reductions in inference formation. Two other experiments addressed the role of retrieval problems in inference formation.
A. INFERENCE DIFFICULTY STUDIES: EXPERIMENTS 1-3
The materials for these studies were 12 paragraph-length passages (borrowed from Alba, 1984) designed to vary the storage demands on working memory for encoding an inference about a central target event. For example, central to a correct understanding of the passage entitled "The Artist" (see Table I) is the recognition that a phone call the protagonist received "while busily working one day" presented him with the news that he had three more months to live. There were three different versions of each passage, including one which stated the target fact directly (Explicit version) and two (the Expected and Unexpected versions) which required inferences for encoding the target fact. The Expected passages provided strong contextual support for the target inference from their beginning; the Unexpected versions did not and, in fact, tended to initially elicit an alternative interpretation of the situation (e.g., that the artist had three more months to finish the painting he was working on). In the Unexpected but not in the Expected passages, generation of the target inference was presumed to require a reevaluation of the encoding initially assigned and retrieval of sufficient passage information to resolve the discrepancy in understanding. Because of the working memory demands of this process, age deficits were expected to be largest for Unexpected passages. The limited demands on the storage function of working memory made by the Expected passages might be insufficient to disrupt younger adults but sufficient to disrupt inference formation in older adults. Because no particular demand is placed on storage capacity for encoding the target information in the Explicit passages, no age differences were expected.
These expectations have been largely confirmed in experiments in which subjects listened to or read a series of such passages under comprehension instructions and then were tested using direct questions (e.g., "The artist had three more months to do what?"). These studies also used similar questions to test recall of two factual, explicitly presented control details for each passage. Based on the large literature showing age differences on detailed information that is not central to the understanding of a message, we expected to find age differences on these items across all three types of passages.
TABLE I
AN EXAMPLE OF MATERIALS USED IN THE RESEARCH
The Artist
Explicit and Expected versions (in the latter, the phrase in parentheses is omitted)
The artist was busily painting one day when he received the phone call he had been expecting from his doctor’s office. He was concerned about the results of a series of lab tests he had taken. The artist was told he had three more months (to live). He was shocked to hear this kind of news from his doctor. Although he had not been feeling well, he still had not expected to hear such bad news. His doctor expressed sympathy and hung up. Suddenly the painting was no longer important. The artist mixed himself a strong drink.
Unexpected version
The artist was concerned about having his painting ready for the exhibit deadline. While he was busily painting one day, he received a phone call. The artist was told he had three more months. He was shocked to hear this kind of news from his doctor. Although he had not been feeling well, he still had not expected to hear such bad news. His doctor expressed sympathy and hung up. Suddenly, the painting was no longer important. The artist mixed himself a stiff drink.
Target fact question
The artist was told he had three more months to do what?
Across the studies, materials were presented in one of three ways. In Zacks et al. (1987), subjects listened to the passages, read at a comfortable, preset rate of presentation which was not under the subjects' control. In a second study (see Zacks & Hasher, 1988), subjects read the materials themselves and controlled the rate of presentation by pressing a key on a computer to present the next sentence on a screen. In this study (and a third, which also used a self-paced presentation procedure), we could examine reading rate data for individual sentences in the different passage versions. The second two studies differed from each other in that the presentation method in one (Zacks & Hasher, 1988) simulated an auditory presentation mode in which there was no access to previous information, except via one's own memory, once a sentence disappeared from the screen (this was termed a "noncumulative" presentation procedure). In the other, the presentation method simulated, at least within each passage, a standard reading procedure in which each new sentence joined, rather than replaced,
the preceding sentences (this was called a “cumulative” presentation procedure). Such a procedure provides an “external memory” and, if our working memory capacity rationale is correct, an external memory should eliminate the need to rely on the storage component of working memory which, in turn, should eliminate age differences even in the most demanding condition (the Unexpected passages) because the participants no longer need to rely on their own memories.⁵
Across the three experiments, the procedures (other than those concerning the presentation method and timing controls) were basically the same. Subjects read one version of each of the 12 passages, with four from each of the conditions (Explicit, Expected, and Unexpected). Across subjects, passages were counterbalanced so that each was used equally often in each condition of an experiment. The most critical point is that the identical target information was tested equally often in all three of the inference conditions. Subjects read or heard a series of six passages under oral presentation and 12 passages under written presentation and were then tested, in the order of initial presentation (which was varied across subjects) and in the presence of a passage title (e.g., “The Artist”), for their answers to three questions per passage, one concerning the target inference and two control details. The same procedure was followed for the remaining six passages.
In all three studies, the young subjects were university students who were mainly in their first 2 years of college. They were typically volunteers from a department subject pool and were not paid for their participation. The older adults, by contrast, were paid. In all but the first experiment, older participants ranged in age from 63 to 75 and most had some college education. In the first experiment, the upper limit was 90 years of age.
Participants were tested either in a university-based laboratory setting or in a community-based setting arranged so as to simulate a laboratory. In either case, older participants arranged for their own transportation to participate in the testing. All subjects were administered the vocabulary subtest of the WAIS-R. There were 24 younger and older subjects in the oral presentation experiment, 60 in each age group for the noncumulative procedure, and 30 in each age group for the cumulative procedure. The data on target recall are shown in Table II, with the difference score used as a shorthand measure to indicate the effect of age on each of the three inference conditions.
⁵ Details of the procedures used in the first two experiments may be found in Zacks et al. (1987) and in Zacks and Hasher (1988). We note that a portion of the third experiment reported here entailed a replication of the visually presented noncumulative procedure reported in Zacks and Hasher (1988). For ease of presentation, here we collapsed the two replications of the noncumulative procedure, creating one younger and one older group of 60 subjects each.
TABLE II
PERCENTAGE CORRECT RECALL OF TARGET INFORMATION AS A FUNCTION OF AGE, INFERENCE CONDITION, AND PRESENTATION MODE
                      Oral presentation            Written, Noncumulative       Written, Cumulative
                      (48 subjects/age group)      (60 subjects/age group)      (30 subjects/age group)
Inference condition   Young   Old    Difference    Young   Old    Difference    Young   Old    Difference
Explicit              88.0    85.4    2.6          97.1    92.1    5.0          93.2    90.8    2.4
Expected              90.1    72.4   17.7          94.6    90.4    4.2          94.2    91.7    2.5
Unexpected            80.7    60.9   19.8          87.5    72.9   14.6          88.3    85.0    3.3
These data yield two important conclusions. The first is that in all cases, age differences are small and nonsignificant for target facts presented in Explicit passages, indicating that older and younger adults are equally able to encode and retrieve the centrally important target facts when they are actually presented. Furthermore, with identical information having been tested in the implicit conditions, any age differences in these conditions are most readily interpreted as being due to encoding differences. That is, older adults have greater difficulty forming inferences, even ones such as those used here, which are central to an understanding of a passage. The second major conclusion is that the age differences found in inference generation are a function of presentation mode and/or pacing as well as inference difficulty. In particular, with oral presentation [and no control over pacing (Experiment 1)], there are large and reliable age deficits for both easy and hard inferences. With the self-paced noncumulative presentation mode, only the Unexpected inferences show a deficit; older and younger adults no longer differ in their likelihood of forming an Expected inference. With the self-paced, cumulative presentation used in the final experiment, there is no age deficit in target recall even for the Unexpected passages. In all three experiments, recall of the control facts was lower (by 8-21%) for older than for younger adults. The size of the age difference was somewhat smaller with written than with oral presentation, but was unsystematically related to inference condition. Information on how the pattern of recall of target inferences comes about can be derived from the reading times in the two self-paced conditions using cumulative and noncumulative presentations. Table III presents the reading times per word for each of two sentences, called Critical
and Postcritical. For participants receiving the Unexpected version of a passage, the Critical sentence contained the first direct indication that their current understanding was incorrect. For example, in "The Artist," the sentence "He was shocked to hear this kind of news from his doctor" made it clear that "the three months" did not concern a painting deadline. The Postcritical sentence was the immediately following sentence in each passage. For the Explicit and Expected versions of a passage, the information in the Critical sentence was not particularly surprising, nor was it new. Rather, it fit well with subjects' current interpretation of the passage. Thus, reading time for the Critical sentence in these two conditions can be considered a baseline against which to assess the effects of information that, in the Unexpected version, leads to a reconsideration of the appropriate interpretation of the passage. Consider first the data from the Noncumulative presentation condition. Both older and younger adults slowed their reading of the Unexpected passages when they came to the Critical sentence which first clued the target inference. Although the magnitude of the slowdown is approximately equal for younger and older participants, younger adults were nonetheless more likely to form and remember this inference than were older adults (see Table II). Thus, time alone is not the critical factor in inference formation. Either or both of two alternatives must be considered as the potential source of the failure of the slowdown to result in satisfactory inference formation: (1) older adults suffer from an inability to quickly retrieve information no longer in working memory and/or (2) although they can retrieve the information they have trouble doing the reconsideration necessary to establish the inference in the Unexpected passages.
TABLE III
READING TIME (IN MSEC/WORD) FOR INFERENCE, CRITICAL, AND POSTCRITICAL SENTENCES
                    Noncumulativeᵃ                        Cumulativeᵇ
Sentence            Explicit   Expected   Unexpected      Explicit   Expected   Unexpected
Younger subjects
  Critical          361        369        451             449        455        590
  Postcritical      323        350        404             341        346        461
Older subjects
  Critical          520        514        585             532        583        790
  Postcritical      432        449        503             483        584        681
ᵃ n = 60/age group. ᵇ n = 30/age group.
The data from the Cumulative presentation condition are helpful in suggesting the locus of this deficit. We note first that the provision of additional information in the Cumulative condition slows down reading somewhat for both younger and older subjects, as is clearly seen for the Critical sentence. This slowdown is apparent throughout the passage after the initial sentence. Of special interest here is the slowdown in the Unexpected versions: it is larger in the Cumulative condition than in the Noncumulative condition, suggesting that both younger and older subjects pause at this point in the passage to consult preceding information that is physically available on the computer screen. Especially noteworthy in these data is the fact that the slowdown in the Unexpected version is substantially larger for older than for younger adults. Recall that in this condition only (Table II) is there evidence that older adults can form and remember the central inference in the Unexpected version as successfully as younger adults. It is clear then, that when the relevant information is provided and is available, older adults have little trouble with the logical work necessary to accommodate the change in understanding that is required under the Unexpected condition (a similar conclusion on reasoning may be found in Light et al., 1982). By exclusion then, the fundamental problem older adults seem to have in dealing with these passages (and potentially in a wide range of other materials and tasks) is their relative inability to retrieve needed information.
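The slowdown comparison drawn from Table III can be made concrete with a little arithmetic. The sketch below is illustrative only and is not part of the original analysis: it takes the Critical-sentence reading times from Table III and, as an assumption made here, treats the mean of the Explicit and Expected times as the baseline against which the Unexpected time is compared. The names `CRITICAL_MS`, `slowdown`, and `SLOWDOWNS` are hypothetical.

```python
# Critical-sentence reading times (msec/word) transcribed from Table III.
# Keys: (age group, presentation condition) -> times by passage version.
CRITICAL_MS = {
    ("younger", "noncumulative"): {"explicit": 361, "expected": 369, "unexpected": 451},
    ("younger", "cumulative"): {"explicit": 449, "expected": 455, "unexpected": 590},
    ("older", "noncumulative"): {"explicit": 520, "expected": 514, "unexpected": 585},
    ("older", "cumulative"): {"explicit": 532, "expected": 583, "unexpected": 790},
}


def slowdown(times):
    """Return the Unexpected time minus the mean of the two baseline versions.

    Averaging Explicit and Expected as the baseline is an assumption made
    for this sketch, not a procedure stated in the chapter.
    """
    baseline = (times["explicit"] + times["expected"]) / 2
    return times["unexpected"] - baseline


# Slowdown on the Critical sentence for each age group and condition.
SLOWDOWNS = {cell: slowdown(times) for cell, times in CRITICAL_MS.items()}

if __name__ == "__main__":
    for (age, mode), value in SLOWDOWNS.items():
        print(f"{age:8s} {mode:14s} slowdown = {value:6.1f} msec/word")
```

Under that baseline assumption the slowdown is 86 (younger) and 68 (older) msec/word in the Noncumulative condition, and 138 (younger) and 232.5 (older) msec/word in the Cumulative condition, which is the pattern described in the text.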
B. PRIMING STUDIES
1. Experiment 4
The suggestion from the inference studies is that, unless conditions are optimal, older adults have difficulty retrieving sufficient information into working memory to form the difficult Unexpected inferences. The next set of studies focuses on possible age deficits in retrieval of prior information. In general, we expected to find such deficits primarily when demands on working memory capacity were fairly high. In particular, in the first of the studies, which examined the ability of a word to cue (or prime) the retrieval of another word from a recently presented sentence, we did not expect an age difference. Because only short sentences were used in the study, if a sentence was remembered, its activation by a cue should allow the whole sentence to be as available in the working memory of older as of younger participants.
The procedure of this experiment was modeled after that of Ratcliff and McKoon (1981, Experiment 2). In each of 24 blocks of trials, subjects were first presented (at a 4-sec rate) six unrelated noun₁-verb-noun₂ sentences (e.g., "The scientist nudged the sheriff"). Then they performed a recognition test on nouns from the preceding set of sentences. Each test noun was paired with a word that served as a "prime." In the conditions of greatest interest, the test nouns had appeared in one of the preceding six sentences and the prime was either the other noun from the same sentence (Within prime) or a noun from another of the five sentences (Between prime). If a prime cues the retrieval of its sentence, subjects should be able to recognize test words more quickly in the Within- than in the Between-prime condition (the Priming effect). Different groups of 24 younger and older adults were tested with delays (SOAs) of 300 or 1000 msec between the onset of the prime word and the onset of the test noun.
The primary dependent measure was the reaction time for correct recognitions of the test word. These data are summarized in Table IV, where "forward" priming refers to cases in which noun₁ primed noun₂ and "backward" priming refers to the reverse situation. This is of potential interest because a different pattern of forward vs. backward priming for the two age groups might indicate age differences in the memory structures of the sentences. Ratcliff and McKoon's (1981) finding of equal forward and backward priming for young adults suggests that this age group stores the sentences in memory in an abstract, nondirectional form (cf. Howard, Heisey, & Shaw, 1986). The priming effect in Table IV (the faster response times for Within as compared to Between primes) varies somewhat across conditions, but statistical analysis shows that significant priming is obtained for both ages at both SOAs and in the backward as well as the forward direction. Using a slightly different paradigm, Howard et al. (1986) recently obtained similar results, except that their older adults showed no priming for sentences
presented once, whereas ours did. The conflicting results are probably due to differences in list length and test delay. We tested subjects after each set of six sentences; they tested them only at the end of a long list. In any case, both sets of findings suggest that older adults can retrieve into working memory an entire sentence, if that sentence is short enough and if it is accessible in long-term memory. The equal forward and backward priming for both age groups further indicates that the stimulus sentences have similar memorial structures across the age groups. The question of the limits of sentence complexity or length for which this is true is left for future work.
TABLE IV
MEAN RECOGNITION TIMES (IN MSEC) AS A FUNCTION OF AGE, SOA, AND PRIME TYPE
                Forward priming                                  Backward priming
Group           Within prime   Between prime   Priming effect    Within prime   Between prime   Priming effect
Young
  300 SOA       674            696             22                670            697             27
  1000 SOA      640            701             61                632            699             67
Old
  300 SOA       918            941             23                891            966             75
  1000 SOA      919            985             66                923            947             24
2. Experiment 5
The next experiment also assessed retrieval using a priming procedure, one developed by Dell, McKoon, and Ratcliff (1983) that measures priming on-line during a speeded text-processing task. This time the question was whether a prime (an anaphor) could retrieve information from a preceding sentence (the antecedent). In this study, as in the original Dell et al. study, there were pre-existing relations between the prime (e.g., pet) and the antecedent target (e.g., cat) information. If there were an age-related decline in priming using such materials, this would predict even less priming on-line for older adults when the prime-antecedent relation is newly established by the text. Although the procedure of this experiment has received some criticism, careful examination has established its validity (O'Brien, Duffy, & Myers, 1986). In it, subjects read paragraphs whose initial sentences introduce an antecedent (e.g., cat) which will, some sentences later, be referred to by an anaphor (e.g., pet). The question is whether the presentation of the anaphor reminds the subject of the antecedent, and if so, how quickly. This is asked in an on-line situation such that the test of whether or not reminding is successful occurs at different points within a critical sentence in which the anaphor or its control word (a word that meaningfully fits into the sentence and has about the same frequency of occurrence in the language but that does not point to the antecedent) is presented as the second word. In our variant of the procedure, the test of activation (essentially how long it takes the subject to recognize that a target word, cat, was actually in the current paragraph) occurred following either the prime or its control by 250-, 500-, or 750-msec time intervals that were filled with the presentation of the remainder of the sentence. As can be seen in Table V, the data are quite straightforward.
As in previous research, there was speeded reminding of an antecedent for young adults at all tested points during sentence comprehension. Thus, an anaphor activates its antecedent and that activation remains available at least throughout a relevant sentence. This is critical given the substantial evidence that although some interpretational processes occur on-line in a word-by-word manner (the "immediacy hypothesis"), there is also processing that is held in abeyance until the sentence boundary. Such "wrap-up" processing may be particularly critical for the establishment of coreference across sentences that is critical for an integrated representation of a message (see Just & Carpenter, 1987, for a recent overview of these arguments).
Older adults produced a rather dramatically different priming pattern. They showed no reliable priming 250 msec after an anaphor; priming was not reliable until 500 msec later. Failure to find semantic priming for older adults at short SOAs has been reported before (Howard et al., 1986), although some success has been reported as well (Light & Albertson, 1988).⁶ Thus, even when there is a strong, preexisting connection between words in a passage, older adults are slower to gain access to an antecedent. This can easily limit the quality of understanding that older adults can produce in on-line comprehension. Such persistent slowing might eventually lead to a strategy of failing to search. Recall from the reading time data in the inference study that older adults in the Noncumulative condition (where retrieval was required for successful completion of the inference in the Unexpected version) spent approximately the same amount of time on the critical sentence as did the more successful younger adults. This equal search time may be the product of an invariance across the adult life span in subjectively permissible search times, or it may be the product of considerable experience with retrieval failures.
⁶ The source of the discrepancy between the two studies is unclear. The procedures used were very similar, in part because Leah Light generously provided us with the computer program. Thus, the differences seem most likely to reside in aspects of the materials (e.g., in the strength of the preexisting connections between the anaphors and their referents) or the subjects. Light and collaborators' subjects tend to come from a very highly able sample, to judge from WAIS vocabulary scores which are often above a mean of 65 and sometimes above a mean of 70. Our samples of older adults, in order to match our sample of younger adults, have mean WAIS vocabulary scores that range from the mid-40s to the mid-50s.
TABLE V
TIME OF CORRECT REFERENT RECOGNITIONS AS A FUNCTION OF AGE, CUEING CONDITION, AND TIMING OF TESTᵃ
              Younger                 Older
Condition     250    500    750       250     500     750
Anaphor       613    553    636       1047    965     979
Control       651    638    670       1025    1016    1063
Effect         38     85     34        -22      51      84
ᵃ Delay from anaphor in msec.
C. EMPIRICAL CONCLUSIONS
The major conclusions from these studies are clear. The ease of inference formation for older adults varies with demands tied to pacing and to the need to retrieve information from memory. With information presented at a standard speaking rate (at least for young adults), older adults have trouble forming both easy and hard inferences. With control over pacing, older adults continue to have difficulty with just the hard, memory-demanding inferences. This difficulty disappears when older adults no longer need to rely on their own memories for retrieval. Retrieval problems can be seen clearly in Experiment 5, in which older adults are slower to retrieve a cued (primed) word despite the existence of preexisting connections among words.
IV. Criticisms of the Reduced Processing Resource Approach

Although our findings and others in the literature (see, e.g., Cohen, 1988; Light & Burke, 1988) are generally supportive of the idea that older adults have reduced capacity for cognitive processing, it may be time to reconsider the heavy reliance on reduced capacity views in cognitive gerontology and elsewhere. A number of factors are persuasive on the need for such a reevaluation. These include recent criticisms of limited capacity views in general (e.g., Allport, 1987; Hirst & Kalmar, 1987; Navon, 1984; Neumann, 1987) and of their application to accounts of age-related changes in cognition (e.g., Light, 1988; Salthouse, 1982). It seems clear that the causes of age changes in cognition are more complex than reduced capacity views assume. For example, these views ignore the social and affective concomitants of aging and the impact these factors might have on performance in cognitive tasks (cf. Labouvie-Vief & Blanchard-Fields, 1982).

As others have discussed in detail and as we therefore review only briefly, there are serious conceptual and empirical shortcomings to the limited capacity approach. One problem is the lack of agreement about, and often the lack of specification of, the nature of the central resource that is limited in quantity. For example, Craik (1983) deals with processing resources in a way that suggests an energy metaphor. Our working model, by contrast, suggests a space metaphor. Although it is difficult to determine whether these differing notions of capacity are contradictory, it is clear that they
implicate different factors (e.g., effort vs. storage load) as determining resource demands.

A second problem with current capacity views is the lack of agreement about (and frequently the lack of specification of) how capacity limitations and their presumed age-related reductions actually affect mental functioning. Among the possibilities here are constraints on the amount of effortful, elaborative processing at encoding and retrieval (Craik's view) and constraints on active storage of information (our view).

Another controversial issue concerns whether the individual's limited resources should be thought of as a single pool of general-purpose processing resources or as a set of independent resource pools, each tied to a different mental function (Allport, 1987; Navon, 1984). In part, this controversy stems from failed attempts to develop broadly applicable measures of available capacity and of the capacity demands of different mental processes.* Consider, for example, the secondary task procedure, which may be the best of the available methods for capacity measurement. To determine the relative capacity demands of a set of primary mental tasks, subjects are required to perform each in conjunction with a standard secondary task, such as responding to a tone (e.g., Kerr, 1973). The slower the reaction time on the secondary task, the greater the presumed capacity demand of the concurrent primary task. The problems with this line of argument for single resource pool views are that different secondary tasks may give different rankings of the target primary tasks and that the amount of mutual interference between two primary tasks performed simultaneously is often inconsistent with estimates based on secondary task performance (e.g., Allport, 1987; Hirst & Kalmar, 1987; Navon, 1984). Even the possibly less ambitious goal of finding a general measure of working memory capacity has proved elusive: Daneman and Carpenter's (1980) sentence span test has not, as was initially hoped, proved broadly useful as a predictor of individual differences across a range of cognitive tasks (although it appears to be extremely useful as a predictor of language comprehension; Carpenter & Just, 1988). Instead, the best predictor for different tasks may be a working memory measure that is fairly closely tied to each task (Daneman & Green, 1986; Daneman & Tardiff, 1987).

*One serious consequence of the lack of a standard measure of capacity demands is the tendency to categorize mental operations as automatic or effortful on an ad hoc basis (cf. Light & Burke, 1988). In the aging literature, this tendency sometimes takes the form of arguing that automaticity is demonstrated by age invariance while effortfulness is demonstrated by age change, a line of argument which is clearly circular if age comparisons are the sole basis of classification.

However, multiple resource pool views also have their problems. Such views do not readily accommodate the fact that there is always some mutual interference when two tasks, even very different ones, must be performed together. Also, there is the difficulty, in the absence of a priori
criteria, of determining the number of different pools an adequate model must include.

We do not intend to imply with these criticisms of the limited capacity notion that it has not been useful in cognitive psychology. Indeed, it has been an extremely powerful idea. In the cognitive gerontology literature, in particular, it has helped organize and make comprehensible complex (and on the surface, possibly conflicting) sets of findings. It has also generated much interesting research. The criticisms, however, clearly suggest that a new direction in theory development is required.

As a first step toward motivating the particular direction our theorizing has recently taken, we raise a number of additional issues which are not so much criticisms of reduced capacity views as questions that either are not addressed or are not adequately answered by such views. The theoretical approach we present in the next section includes potential answers to these questions.

One question is why working memory capacity (or any other mental resource) should decline with increasing age. Possibly relevant here is the obvious mental slowing that occurs as we grow older. Some (e.g., Cohen, 1988) have conceived processing resources in terms of speed of processing. Taken to an extreme (e.g., Salthouse, 1982), such views imply that mental slowing is the fundamental cause of age declines, with the apparent capacity losses being a consequence of the cognitive slowing. The opposite position is, of course, also possible, and, in fact, differentiating between them may be very difficult (cf. Salthouse, 1985). In any case, it is probably foolhardy to ignore mental slowing phenomena.

Another issue concerns the potential impact of noncognitive age differences (for example, in values, goals, and interests) on the cognitive performance of older adults. 
Labouvie-Vief and collaborators (e.g., Labouvie-Vief & Blanchard-Fields, 1982) have argued that the "poorer" performance of older adults on cognitive tasks is, at least in part, a result of factors such as lack of interest in the tasks or performance goals that do not match the investigator's criteria of good performance. For example, older adults may try to recall a story in a way that is interesting rather than in a way that is as true to the original as possible (e.g., Arbuckle & Harsany, 1985). Cognitive psychologists have, perhaps, too fully ignored this type of claim.

As a final step in motivating our new theoretical orientation, we report some findings that were germane to our thinking. These findings come from Verneda Hamm's dissertation, now nearing completion at Temple University. In a study using the inference difficulty passages, she attempted to determine what subjects were thinking about at key times in their reading of these passages: (1) near the midpoint, prior to the time at
which subjects in the Unexpected version would learn that they had the wrong understanding, or (2) at the end of the story. She did this by asking subjects to make a speeded decision about an individual word, which was either the final (Target) inference or the initial (Competing) inference from the Unexpected versions. (For “The Artist,” the two words would be live and finish, respectively.) The passages were presented on a computer screen for self-paced reading. When, on occasion, a single word appeared on the screen, the subjects were to decide whether or not that word was consistent with their current understanding of the passage. Subjects read a total of 24 passages, half each in the Expected and the Unexpected formats. Consistency judgments were made either midway through or at the end of the passage.

On Expected passages, younger and older subjects were highly similar in their responses. Both groups judged the target inference to be consistent with their interpretation at both the midpoint and at the end of the passage (although the agreement figure increased from a midpoint value of 67% to a final value of 90%). For the Unexpected passages, by contrast, the two age groups were quite different. At the midpoint of the Unexpected version, younger subjects were more likely than older subjects (81 vs. 72%) to judge the competing inference (finish) to be consistent with their understanding. This suggests that given the limited context provided at the beginning of the Unexpected versions, older adults have a more difficult time generating an appropriate inference. By the end of the passage, younger and older subjects judged the Target inference as consistent with their current interpretation equally often (90% and 88%, respectively). 
The most intriguing finding was the outcome of those trials in which subjects were tested at the end of an Unexpected passage with the competing inference: Young subjects judged those items to be consistent with their interpretation 28% of the time. Older adults judged those same items to be consistent fully 48% of the time. Thus, although live was more likely to be consistent with a reader’s understanding at the end of the Unexpected version of “The Artist” than was finish, finish nonetheless retained greater credibility for older adults than it did for younger adults. It is as if older adults, having formed an inference, are slower and/or more reluctant than younger adults to give it up even when, as the data indicate, they have arrived at a more appropriate one.

That older adults are more likely than younger adults to maintain both inferences for Unexpected passages is clearly at variance with a view (such as our initial one) that assumes that older adults have less storage capacity in working memory. If anything, the reduced working memory capacity position would have suggested the opposite pattern of data across ages. The unexpected nature of Hamm’s results suggested a hitherto unsuspected source of differential disruption
between younger and older adults. This source, an age-related decline in the efficiency of inhibitory processes, is incorporated in the modified theoretical framework to which we now turn.

V. A New Framework: Inhibition and the Contents of Working Memory
We begin with the assumption that working memory is centrally involved in comprehension. Now, however, we focus on the contents of working memory rather than on its capacity. Central to the efficient operation of working memory, and to selective and intensive attention as well (e.g., Neumann, 1987), are inhibitory mechanisms which, when normally functioning, serve to limit entrance into working memory to information that is along the "goal path" of comprehension. This refers to information necessary to the establishment of an objective understanding of a message, one which largely coincides with the intentions of the speaker or writer.* By contrast, off-goal-path ideas are irrelevant or peripheral to the formation of a coherent and detailed representation of a text. Because attentional gating will not be perfect, non-goal-path information, such as personally relevant thoughts, contextually inappropriate interpretations of words or phrases having multiple meanings, and daydreams, may on occasion enter working memory. When they do, normally functioning inhibitory mechanisms will rapidly dampen the activation of the non-goal-path thoughts.

*A listener may have goals other than that of extracting the objective meaning of a message. Such goals might be enacted when one tries to be polite to a bore or when one tries to discern the "hidden agenda" of a meeting or conversation.

When the goal of the listener is not restricted to the establishment of the objective meaning of the message, the comprehension situation will be slightly altered from that just described. This is because the goals of the reader or listener are critical determinants of what is and is not inhibited (see Navon, 1986). With respect to older adults whose interests, values, beliefs, and goals may be different from those of younger adults, this issue assumes particular importance. If there is an age-related increase in the importance of one's personal values and experiences along with an age-related increase in the tendency to apply these concerns to a wider range of information, more information that is off the goal path of establishing objective meaning is almost certain to enter working memory.

We turn now to consider what might happen if inhibitory mechanisms malfunction or become inefficient, as might occur whenever central neural functioning is slowed and/or when goals differ from the determination of objective meaning.
Breakdowns in the efficiency of inhibitory processes may have profound consequences. In parallel-architecture attention systems, a breakdown in inhibition will lead to cross talk among simultaneously active messages, preventing organized responses. Behavioral consequences may include such abnormal examples as schizophrenia and attention deficit disorder (see, e.g., Posner, 1987). But the consequences of inefficient inhibition need not be so profound as these in order to limit comprehension and memory.

Consider what happens (as we suggest occurs with normal aging, when inhibitory mechanisms become inefficient) when goals change (see Fig. 1). Inefficient inhibition will enable the initial entrance into working memory of information that is off the goal path. It will also result in the prolonged maintenance of such information in working memory. At least three categories of off-goal-path thoughts may be identified: irrelevant environmental details, personalistic memories or concerns, and off-goal-path interpretations.

Thoughts about irrelevant contextual and/or environmental details might occur when one listens to a research presentation, loses track of the message, and begins to wonder about the speaker rather than the content of the talk: “I wonder if her squash game is still good?” or “This auditorium looks like a pig sty: imagine what the dorms must look like!” Or the thoughts might be highly personalistic ones that involve particular events (“Remember the time we all went to the pier instead of the meeting?”), plans (“I better remember to make dinner reservations right after the talk”), or evaluations (“This is not the best talk he’s ever given, but the ideas are interesting”). These might be initiated by something in the content of a primary input (perhaps even a minor detail, 
as when a footnote acknowledging that an experiment was carried out during a sabbatical at an English university initiates thoughts about one’s own visit there) or by something in the environment.

[Fig. 1. Diagram of the proposed framework; the figure did not survive extraction. Legible label fragments: inhibition, retrieving, goal path, comprehension.]

A final category of activated responses encompasses responses tied to
the linguistic message itself but not on the ultimate goal path. These include the activation of contextually irrelevant meanings of words or phrases, as occurs when the bank being discussed is a financial institution but the listener briefly considers a geological formation (Simpson, 1984). Contextually irrelevant meanings also occur with accidental and intentional garden-path messages (as is true for the Unexpected passages in our inference studies) when the initial interpretation the listener or reader assigns is plausible but not the one the communicator intended. Consider the following example: "The assistant baseball coach reached into the equipment cabinet and saw the bat that was trying to get out."

What are the consequences of the entrance into working memory of irrelevant information? Obviously, these will depend on the degree of concentration (or the intensity of attention) accorded the off-goal-path information. If this assumes the proportions accorded fascinating daydreams (e.g., Klinger, 1978), some goal path aspects of the message will go unprocessed. This may result in an awkward pause in a conversation when the listener fails to nod or articulate appropriate interest or encouragement. Or in reading, one may realize that although the eyes have moved down the page, little information has been absorbed (see Just & Carpenter, 1987). These are probably extreme examples of what can happen when attention is directed to irrelevant information. Indeed, these examples may be the self-generated analogs of what happens when the reader/listener has so little appropriate knowledge accessible that he or she is unable to establish coreference among the phrases and sentences of a passage (e.g., Bransford & Johnson, 1972). The determination of coreference is a fundamental requirement for the establishment of a coherent representation of a complex message. 
Coherent representations, in turn, play a major role in one's sense of comprehending a message as well as in determining the ease or likelihood of recall of that message (Bransford & Johnson, 1972), if not necessarily of its recognition (e.g., Alba, Alexander, Hasher, & Caniglia, 1981).

Another consequence of decreased inhibitory functioning will be a reduction in the ability to switch attention from one target or category of events to another (e.g., Logan, 1985; Posner, 1987). For discourse comprehension, this suggests that changes in the scene or setting of a text or in some other aspects of a mental model (Sanford & Garrod, 1981; van Dijk & Kintsch, 1983; Johnson-Laird, 1983) should be particularly difficult to respond to for those with diminished inhibitory mechanisms. Thus, narrative passages containing many scene changes may pose particular problems.

The argument to this point is that the inefficiency of inhibitory mechanisms sets up the circumstances that permit more objectively irrelevant information to enter working memory and, once it has entered, permits
the irrelevant information to receive more sustained activation than it otherwise would. This combination of events creates the ideal circumstances for the operation of at least two classic mechanisms of forgetting: (1) weaker or poorer quality initial encodings (e.g., Hasher, Griffin, & Johnson, 1977) and (2) competition among related ideas (e.g., Postman & Hasher, 1972). Although our central interest is with competition at retrieval, these two sources of forgetting are not independent. Ideas that cannot be retrieved due to poor encoding and/or competition cannot enter into distributed rehearsal cycles, and these are critical both to the development of integrated representations (e.g., macrostructures, Kintsch & van Dijk, 1978) and to the ultimate likelihood of recall (e.g., Hasher & Griffin, 1978; Keppel, 1967).

We are proposing that reductions in inhibition “enrich” the contents of working memory with more non-goal-path information, which, in turn, creates the conditions necessary for competition at retrieval. Heightened competition is the direct consequence of the greater time-sharing that reduced inhibition permits for off- and on-goal-path information. These categories of information will be linked in memory as a result of their temporal proximity in working memory in such a way as to serve as sources of mutual competition at retrieval. Thus, the off-path information may be able to prevent, at least momentarily, access to target information. Depending on the nature of this target information, a failure to retrieve it can be more or less damaging to comprehension. Suppose the target (which co-occurred with irrelevant thoughts about a friend whose behavior was similar to that of a main character) is the referent of a subsequent pronoun. 
The failure to retrieve the target in a timely manner because of competition from the irrelevant thought will limit the reader/listener’s ability to establish coreference and, so, will limit his or her ability to establish a coherent representation.

In on-line comprehension, sources of self-generated competition include any non-goal-path antecedent or interpolated thoughts that enter working memory. Thus, the greater amount of non-goal-path information that is reflected on by persons (such as older adults) with reduced inhibitory functioning sets the stage for an increase in a major source of temporary forgetting. The resulting reduction in access to antecedent information may be particularly disruptive whenever performance conditions (e.g., rapid pacing) increase competition. Indeed, discourse presented at faster as compared to slower rates is differentially disruptive for older adults (Stine & Wingfield, 1987).

Thus, a person with reduced inhibitory functioning can be expected to show more distractibility, to make more inappropriate responses and/or to take longer to make competing appropriate responses, and, finally, to be more forgetful than others. Existing evidence in the cognitive gerontology literature suggests that the behavior of older adults is consistent with the expectations that stem from a diminished inhibition view. In fact, the increased presence of irrelevant thoughts in working memory (and the attendant consequences) may well be the factors that produce the behaviors that have made it appear as if older adults have reduced capacity for cognitive functions.

Anecdotal evidence on the failure of older adults to inhibit non-goal-path thoughts comes from observations of their tendencies to infuse conversations with personalistic intrusions (see Obler, 1980). These may represent a combination of heightened interest in non-goal-path thoughts as well as reduced ability to inhibit those thoughts. More substantial evidence of age-related decreases in inhibition of thoughts can be seen in increased intrusion rates in free recall (e.g., Stine & Wingfield, 1987), in heightened rates of false recognition responses to semantic associates of actually presented words (Rankin & Kausler, 1979; Smith, 1975), and in reduced ability to inhibit both well-practiced and newly learned response patterns in order to acquire a new one (e.g., Hess, 1982, Experiment 3; Kausler & Hakami, 1982). In the semantic memory literature (see Nebes, Boller, & Holland, 1986, Experiment 2), there is evidence that older adults are less successful than younger adults in dampening an activated thought when it is inappropriate for the situation. Thus, the empirical basis for the assumption of an age-related breakdown in the ability to inhibit thoughts is suggestive.

Evidence also suggests an age-related decline in the ability to inhibit responses to actually presented stimuli. Consider the Stroop procedure: it demands inhibitory control of responses in the version in which both the name of the printed word and the color of the ink the word is printed in are activated. The former response must be inhibited so that the latter can be produced. 
The Stroop interference effect increases with age (Cohn, Dustman, & Bradford, 1984; Comalli, Wapner, & Werner, 1962), suggesting a diminution of inhibition that operates on responses to actually presented stimuli. Research on visual selective attention (see Plude & Hoyer, 1985) suggests that when there is some degree of spatial uncertainty about the location of a target, an increase in the number of distractors in an array will slow target detection more for older adults than for younger adults (e.g., Madden, 1983; Plude & Hoyer, 1985; Scialfa, Kline, & Lyman, 1987, but see Madden, 1983, for an alternative interpretation). Thus, there is evidence that “noise” in the environment poses particular problems for older adults (Layton, 1975; Welford, 1984), a problem that may stem from the inefficient functioning of inhibitory mechanisms.

We turn now to a brief consideration of evidence consistent with the view that retrieval problems contribute substantially to age-related deficits in memory. Indeed, this view is widely argued in the cognitive gerontology literature (e.g., Burke & Light, 1981; Till, 1985). A strong argument can
be made that on-line retrieval failures lie at the heart of inference failures reported for older adults (e.g., Light & Capps, 1986, as well as data reported here). These failures are consistent with earlier research showing that when the level of original learning is not held constant, older adults suffer more from the classic sources of forgetting identified by interference theory (see Kausler, 1982; Hess, 1982, Experiment 3).

COMPENSATORY MECHANISMS

If older adults have profound retrieval problems that diminish even on-line comprehension, one might be tempted to ask how it is that older adults can function at all. Not only is it clear from personal experience that impairment is often far from overwhelming, but the empirical literature is also clear on this point. Indeed, a major observation of age-related performance is the increase in variability among participants. As well, age differences are more pronounced for tasks that require speeded responding than for others (e.g., Stine et al., 1986). And they are more pronounced for tasks requiring the participant to retrieve information from memory with relatively little contextual support from cues (see Craik, 1983). When contextual support is great and interference is minimal (as it may be in at least some implicit memory tasks, see Graf & Schachter, 1987), age differences are also minimal (see Light & Singh, 1987).

In the face of what may well be massive memory problems associated with the elevated competition at retrieval that is the inevitable consequence of an inefficient inhibitory system, how are we to understand the sustained functioning of many older adults, especially those of high verbal ability, as has often been reported in the empirical literature (see Zelinski & Gilewski, 1988)? 
We propose that the answer lies in the fact that repeated encounters with retrieval failures lead to reduced attempts to retrieve and, thus, to compensatory changes in comprehension styles that rely heavily on two sources of information for their effective functioning: (1) information that is easily accessed from memory and (2) information that is in the surrounding environment.

Consider the two major memorial sources of easily accessible information. One includes the current contents of working memory. The other includes memories or thoughts that have (what in an earlier day would have been called) high levels of learning (probably the result of many, distributed practice trials). Both the traditional verbal learning literature and research on everyday memory (see Bahrick, Bahrick, & Wittlinger, 1975) show clearly the importance of these variables in influencing accessibility. For all of us, some personal experiences, some opinions, some values are particularly highly accessible. If relatively little of the recent input can be retrieved, these can be used to fill the interstices of a conversation. As noted earlier, older adults seem particularly likely
to make personalistic productions a salient part of their conversation (Obler, 1980).

These concerns lead us to a reconsideration of the framework depicted in Fig. 1. There, we assumed that the increased importance of personal values and experiences alters the goals involved in determining what enters working memory. An alternative is that changes in goals with age are the consequence of increased experience with inhibition-produced retrieval failures, leading to an increased reliance on highly accessible (by virtue of practice) personal memories. The representation of the framework thus includes an arrow leading from changes in comprehension style back to the box representing the increased importance of personal information. At this point, as the question mark indicates, the issue is indeterminate.

But easily accessed memorial information can only account for some of the preserved skills that older adults show. What accounts for the rest? The answer lies with information that is in the current stimulus array or that can easily be made to become so, as can the contents of a nearby closet once the door is opened. Certainly, some cues in the environment will be excellent, direct cues to a stored memory. Indeed, we have known since McGeoch (1932) that one major determinant of memory performance is the degree of overlap along a similarity dimension between cues at an initial experience and those at another point in time (Tulving & Thomson, 1973). Thus, we speculate that the power of retrieval cues, coupled with predictability and redundancy in the environment, is likely to leave a person who must rely on such cues only modestly impaired in mental functioning in the real world, compared to someone who is easily able to search memory or to generate his or her own retrieval cues.

Indeed, there are several empirical findings which are consistent with the view that older adults rely more heavily on the immediate array than do younger adults. 
For example, they make greater use of context in priming tasks (Cohen & Faulkner, 1983), leading occasionally to larger priming effects for older adults than for younger adults (Balota & Duchek, 1988; Bowles & Poon, 1985). There are also findings consistent with the view that older adults rely more heavily on easily accessible memories than do younger adults: (1) they often make more intrusions in recall, most often ones that are consistent with the "theme" of the material (Arbuckle & Harsany, 1985; Stine & Wingfield, 1987; Mueller, Kausler, Faherty, & Oliveri, 1980); (2) they make greater use of personal experience and knowledge in interpretations (Labouvie-Vief & Blanchard-Fields, 1982); and (3) they may be more likely than younger adults to make decisions based on plausibility rather than to search memory to retrieve a fact (Reder, Wible, & Martin, 1986).

Loosely speaking, this general view of the functioning of memory might be termed a "sloppy desk" model. If one is missing a particular, desired
Working Memory, Comprehension, and Aging
219
piece of information, one can either search through a filing system (analogous to a search of memory) or one can search through the perceptual array. If the array is rich (the desk is piled with pieces of information), there is a greater general likelihood of an effective retrieval cue occurring during a search of the immediate environs. Thus, if we ultimately learn that there is an age-related change in the degree to which memory vs. the environment is searched, this change can be viewed as a compensatory strategy . Compensatory strategies may be used with different degrees of likelihood across the life span, largely as a function of the efficiency with which inhibitory mechanisms function, because these largely determine the facility with which memory can be searched. If a search through memory enables the sustained activation of too many off-goal-path ideas, competition will be too great to produce the target item with a high degree of regularity. There is no reason to assume that inhibitory problems are confined to elderly adults. Inhibitory problems may occur in the attentional mechanisms of depressed adults, attention deficit disordered children, and schizophrenics (see Posner, 1987). We suggest it is conceivable that environmentally driven retrieval (the product of breakdowns in inhibition) may occur for young adults during periods of sustained physical illness, high arousal, and, possibly, depression (see Jacobs & Nadel, 1985). The diminution of inhibitory processes during these periods may ultimately be identified as a key mechanism in memory. To summarize, we discuss briefly the memory framework proposed here. The central assumption is that under some circumstances (most notably here, aging), the efficiency of the inhibitory processes that underlie selective attention is reduced. This decrement in inhibition allows more irrelevant information to enter working memory, and once entered, it allows the irrelevant information to receive sustained activation. 
This then sets the stage for subsequent reduced rates of success in accessing required information from memory. The consequences for discourse comprehension in particular may be profound because the establishment of a coherent representation of a message (and, thus, ultimately of understanding and recalling that message) hinges on the timely retrieval of information necessary to establish coreference among certain critical ideas. Repeated failures to successfully retrieve searched-for information in memory may set the stage for changes in cognitive styles, at least some of which may come to function as successful compensatory mechanisms. The current approach can handle the considerable data that have otherwise been taken to support reduced capacity views. The approach also explicitly acknowledges the contribution of individual and group differences in personal values and interests to the regulation of the content of thought processes. We note that although the framework was devised in the context of research on older adults, its usefulness extends beyond that group. Indeed, it may be useful in developing an understanding of breakdowns in memory-dependent cognitive functions in physically and/or emotionally ill young adults as well as an understanding of individual differences in the likelihood of using memory searches vs. environmental searches as strategies for cognitive functioning.
ACKNOWLEDGMENTS

L. H. is grateful to the John Simon Guggenheim Memorial Foundation and to Duke University for their support of a sabbatical leave which enabled the development of the theoretical framework proposed here. The research reported here and the preparation of this article were supported by Grant R01 MH33140 from the National Institute on Aging to both authors. The earliest research on this project was supported by grants from the Biomedical Research Support funds at both Temple and Michigan State Universities. We are indebted to Bonnie Doren and Verneda Hamm and particularly to Karen C. Rose and Linda Gerard for their sustained contributions to this project.
REFERENCES

Alba, J. (1984). Nature of inference generation. American Journal of Psychology, 97, 215-233.
Alba, J., Alexander, S., Hasher, L., & Caniglia, K. (1981). The role of context in the encoding of information. Journal of Experimental Psychology: Human Learning and Memory, 7, 283-292.
Alba, J., & Hasher, L. (1983). Is memory schematic? Psychological Bulletin, 93, 203-231.
Allport, A. (1987). Selection for action: Some behavioral and neuropsychological considerations of attention and action. In H. Heuer & A. F. Sanders (Eds.), Perspectives on perception and action (pp. 395-419). Hillsdale, NJ: Erlbaum.
Arbuckle, T. Y., & Harsany, M. (1985). Adult age differences in recall of a moral dilemma under intentional, incidental and dual task instructions. Experimental Aging Research, 11, 175-177.
Attig, M. S., & Hasher, L. (1980). The processing of frequency of occurrence information by adults. Journal of Gerontology, 35, 66-69.
Baddeley, A. D. (1986). Working memory. New York: Oxford University Press.
Baddeley, A. D., & Hitch, G. J. (1974). Working memory. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 8, pp. 47-90). New York: Academic Press.
Baddeley, A. D., Logie, R., Nimmo-Smith, I., & Brereton, N. (1985). Components of fluent reading. Journal of Memory and Language, 24, 119-131.
Bahrick, H. P., Bahrick, P. O., & Wittlinger, R. P. (1975). Fifty years of memory for names and faces: A cross-sectional approach. Journal of Experimental Psychology: General, 104, 54-75.
Balota, D. A., & Duchek, J. M. (1988). Age-related differences in lexical access, spreading activation, and simple pronunciation. Psychology and Aging, 3, 84-93.
Belmore, S. M. (1981). Age-related changes in processing explicit and implicit language. Journal of Gerontology, 36, 316-322.
Bowles, N. L., & Poon, L. W. (1985). Aging and retrieval of words in semantic memory. Journal of Gerontology, 40, 71-77.
Bransford, J. D. (1979). Human cognition: Learning, understanding, and remembering. Belmont, CA: Wadsworth.
Bransford, J. D., & Johnson, M. K. (1972). Contextual prerequisites for understanding: Some investigations of comprehension and recall. Journal of Verbal Learning and Verbal Behavior, 11, 717-726.
Burke, D. M., & Light, L. L. (1981). Memory and aging: The role of retrieval processes. Psychological Bulletin, 90, 513-546.
Carpenter, P. A., & Just, M. A. (1988). The role of working memory in language comprehension. In D. Klahr & K. Kotovsky (Eds.), Complex information processing: The impact of Herbert A. Simon. Hillsdale, NJ: Erlbaum.
Clark, H. H. (1977). Inferences in comprehension. In D. LaBerge & S. J. Samuels (Eds.), Basic processes in reading: Perception and comprehension (pp. 243-263). Hillsdale, NJ: Erlbaum.
Clark, H. H., & Clark, E. V. (1977). Psychology and language: An introduction to psycholinguistics. New York: Harcourt.
Cohen, G. (1979). Language comprehension in old age. Cognitive Psychology, 11, 412-429.
Cohen, G. (1988). Age differences in memory for text: Production deficiency or processing limitation? In L. L. Light & D. M. Burke (Eds.), Language, memory, and aging. New York: Cambridge University Press, in press.
Cohen, G., & Faulkner, D. (1983). Word recognition: Age differences in contextual facilitation effects. British Journal of Psychology, 74, 239-251.
Cohen, G., & Faulkner, D. (1984). Memory for text: Some age differences in the nature of information that is retained after listening to texts. In H. Bouma & D. G. Bouwhuis (Eds.), Attention and performance X: Control of language processes (pp. 501-514). London: Erlbaum.
Cohn, N. B., Dustman, R. E., & Bradford, D. C. (1984). Age-related decrements in Stroop color test performance. Journal of Clinical Psychology, 40, 1244-1250.
Comalli, P. E., Jr., Wapner, S., & Werner, H. (1962). Interference effects of Stroop color-word test in childhood, adulthood, and aging. Journal of Genetic Psychology, 100, 47-53.
Craik, F. I. M. (1983). On the transfer of information from temporary to permanent memory. Philosophical Transactions of the Royal Society of London, Series B, 302, 341-359.
Craik, F. I. M., & Byrd, M. (1982). Aging and cognitive deficits: The role of attentional resources. In F. I. M. Craik & S. Trehub (Eds.), Aging and cognitive processes (pp. 191-211). New York: Plenum.
Craik, F. I. M., & Rabinowitz, J. C. (1984). Age differences in the acquisition and use of verbal information: A tutorial review. In H. Bouma & D. G. Bouwhuis (Eds.), Attention and performance X: Control of language processes (pp. 471-499). London: Erlbaum.
Daneman, M., & Carpenter, P. A. (1980). Individual differences in working memory and reading. Journal of Verbal Learning and Verbal Behavior, 19, 450-466.
Daneman, M., & Carpenter, P. A. (1983). Individual differences in integrating information between and within sentences. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9, 561-583.
Daneman, M., & Green, I. (1986). Individual differences in comprehending and producing words in context. Journal of Memory and Language, 25, 1-18.
Daneman, M., & Tardiff, T. (1987). Working memory and reading skill re-examined. In M. Coltheart (Ed.), Attention and performance XII: The psychology of reading (pp. 491-508). Hove, U.K.: Erlbaum.
Dell, G. S., McKoon, G., & Ratcliff, R. (1983). The activation of antecedent information during the processing of anaphoric reference in reading. Journal of Verbal Learning and Verbal Behavior, 22, 121-132.
Fischer, B., & Glanzer, M. (1986). Short-term storage and the processing of cohesion during reading. Quarterly Journal of Experimental Psychology, 38A, 431-460.
Glanzer, M., & Nolan, S. D. (1986). Memory mechanisms in text processing. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 20, pp. 275-317). Orlando, FL: Academic Press.
Graf, P., & Schacter, D. L. (1987). Selective effects of interference on implicit and explicit memory for new associations. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13, 45-53.
Hamm, V. P. (1988). Aging and the formation of inferences. Ph.D. dissertation, Temple University, in preparation.
Hartley, J. T. (1986). Reader and text variables as determinants of discourse memory in adulthood. Psychology and Aging, 1, 150-158.
Hartley, J. T. (1988). Aging and individual differences in memory for written discourse. In L. L. Light & D. M. Burke (Eds.), Language, memory, and aging. New York: Cambridge University Press, in press.
Hasher, L., & Griffin, M. (1978). Reproductive and reconstructive processes in forgetting. Journal of Experimental Psychology: Human Learning and Memory, 4, 318-330.
Hasher, L., Griffin, M., & Johnson, M. (1977). More on interpretive factors in forgetting. Memory and Cognition, 5, 41-45.
Hasher, L., & Zacks, R. T. (1979). Automatic and effortful processes in memory. Journal of Experimental Psychology: General, 108, 356-388.
Hasher, L., & Zacks, R. T. (1984). Automatic processing of fundamental information: The case of frequency of occurrence. American Psychologist, 39, 1372-1388.
Hess, T. M. (1982). Visual abstraction processes in young and old adults. Developmental Psychology, 18, 473-484.
Hirst, W., & Kalmar, D. (1987). Characterizing attentional resources. Journal of Experimental Psychology: General, 116, 68-81.
Howard, D. V., Heisey, J. G., & Shaw, R. J. (1986). Aging and priming of newly learned associations. Developmental Psychology, 22, 78-85.
Howard, D. V., Shaw, R. J., & Heisey, J. G. (1986). Aging and the time course of semantic activation. Journal of Gerontology, 41, 195-203.
Hultsch, D. F., & Dixon, R. A. (1984). Memory for text materials in adulthood. In P. B. Baltes & O. G. Brim, Jr. (Eds.), Life-span development and behavior (Vol. 6, pp. 77-108). New York: Academic Press.
Hultsch, D. F., Hertzog, C., & Dixon, R. A. (1984). Text recall in adulthood: The role of intellectual abilities. Developmental Psychology, 20, 1193-1209.
Hunt, E. (1985). Verbal ability. In R. J. Sternberg (Ed.), Human abilities: An information-processing approach (pp. 31-58). New York: Freeman.
Jacobs, W. J., & Nadel, L. (1985). Stress-induced recovery of fears and phobias. Psychological Review, 92, 512-531.
Jacoby, L. L. (1983). Perceptual enhancement: Persistent effects of an experience. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9, 21-38.
Johnson-Laird, P. N. (1983). Mental models: Towards a cognitive science of language, inference, and consciousness. Cambridge, MA: Harvard University Press.
Just, M. A., & Carpenter, P. A. (1987). The psychology of reading and language comprehension. Boston: Allyn & Bacon.
Kausler, D. H. (1982). Experimental psychology and human aging. New York: Wiley.
Kausler, D. H., & Hakami, M. K. (1982). Frequency judgments by young and elderly adults for relevant stimuli with simultaneously present irrelevant stimuli. Journal of Gerontology, 37, 438-442.
Kemper, S. (1987). Life-span changes in syntactic complexity. Journal of Gerontology, 42, 323-328.
Kemper, S. (1988). Geriatric psycholinguistics: Syntactic limitations of oral and written language. In L. L. Light & D. M. Burke (Eds.), Language, memory, and aging. New York: Cambridge University Press, in press.
Keppel, G. (1967). A reconsideration of extinction-recovery theory. Journal of Verbal Learning and Verbal Behavior, 6, 476-486.
Kerr, B. (1973). Processing demands during mental operations. Memory and Cognition, 1, 401-412.
Kintsch, W., & van Dijk, T. A. (1978). Toward a model of text comprehension and production. Psychological Review, 85, 363-394.
Klinger, E. (1978). Modes of normal conscious flow. In K. S. Pope & J. L. Singer (Eds.), The stream of consciousness: Scientific investigations into the flow of human experience (pp. 225-258). New York: Plenum.
LaBerge, D., & Samuels, S. J. (1974). Toward a theory of automatic information processing in reading. Cognitive Psychology, 6, 293-323.
Labouvie-Vief, G., & Blanchard-Fields, F. (1982). Cognitive ageing and psychological growth. Ageing and Society, 2, 183-20.
Layton, B. (1975). Perceptual noise and aging. Psychological Bulletin, 82, 875-883.
Light, L. L. (1988). Language and aging: Competence versus performance. In J. E. Birren & V. L. Bengston (Eds.), Handbook of theories of aging. New York: Springer, in press.
Light, L. L., & Albertson, S. (1988). Comprehension of pragmatic implications in young and older adults. In L. L. Light & D. M. Burke (Eds.), Language, memory, and aging. New York: Cambridge University Press, in press.
Light, L. L., & Anderson, P. A. (1983). Memory for scripts in young and older adults. Memory and Cognition, 11, 435-444.
Light, L. L., & Burke, D. M. (1988). Patterns of language and memory in old age. In L. L. Light & D. M. Burke (Eds.), Language, memory, and aging. New York: Cambridge University Press, in press.
Light, L. L., & Capps, J. L. (1986). Comprehension of pronouns in young and older adults. Developmental Psychology, 22, 580-585.
Light, L. L., & Singh, A. (1987). Implicit and explicit memory in young and older adults. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13, 531-541.
Light, L. L., Zelinski, E. M., & Moore, M. (1982). Adult age differences in reasoning from new information. Journal of Experimental Psychology: Learning, Memory, and Cognition, 8, 435-447.
Logan, G. D. (1985). On the ability to inhibit simple thoughts and actions: II. Stop-signal studies of repetition priming. Journal of Experimental Psychology: Learning, Memory, and Cognition, 11, 675-691.
Madden, D. J. (1983). Aging and distraction by highly familiar stimuli during visual search. Developmental Psychology, 19, 499-507.
McGeoch, J. A. (1932). Forgetting and the law of disuse. Psychological Review, 39, 352-370.
McKoon, G., & Ratcliff, R. (1986). Inferences about predictable events. Journal of Experimental Psychology: Learning, Memory, and Cognition, 12, 82-91.
Mueller, J. H., Kausler, D. H., Faherty, A., & Oliveri, M. (1980). Reaction time as a function of age, anxiety, and typicality. Bulletin of the Psychonomic Society, 16, 473-476.
Navon, D. (1984). Resources-a theoretical soupstone? Psychological Review, 91, 216-234.
Navon, D. (1986). Visibility or disability: Notes on attention (ICS Report 86-06). San Diego: University of California.
Nebes, R. D., Boller, F., & Holland, A. (1986). Use of semantic context by patients with Alzheimer's disease. Psychology and Aging, 1, 261-269.
Neumann, O. (1987). Beyond capacity: A functional view of attention. In H. Heuer & A. F. Sanders (Eds.), Perspectives on perception and action (pp. 361-394). Hillsdale, NJ: Erlbaum.
Obler, L. (1980). Narrative discourse style in the elderly. In L. Obler & M. Albert (Eds.), Language and communication in the elderly (pp. 75-90). Lexington, MA: Heath.
O'Brien, E. J., Duffy, S. A., & Myers, J. L. (1986). Anaphoric inferences during reading. Journal of Experimental Psychology: Learning, Memory, and Cognition, 12, 346-352.
O'Brien, E. J., & Myers, J. L. (1987). The role of causal connections in the retrieval of text. Memory and Cognition, 15, 419-427.
Perfetti, C. A. (1985). Reading ability. New York: Oxford University Press.
Plude, D. J., & Hoyer, W. J. (1985). Attention and performance: Identifying and localizing age deficits. In N. Charness (Ed.), Aging and human performance (pp. 47-99). New York: Academic Press.
Posner, M. I. (1987). Structures and functions of selective attention. Paper presented at the American Psychological Association Meeting, New York.
Postman, L., & Hasher, L. (1972). Conditions of proactive inhibition in free recall. Journal of Experimental Psychology, 92, 276-284.
Rankin, J. L., & Kausler, D. H. (1979). Adult age differences in false recognitions. Journal of Gerontology, 34, 58-65.
Ratcliff, R., & McKoon, G. (1981). Automatic and strategic priming in recognition. Journal of Verbal Learning and Verbal Behavior, 20, 204-215.
Reder, L. M., Wible, C., & Martin, J. (1986). Differential memory changes with age: Exact retrieval versus plausible inference. Journal of Experimental Psychology: Learning, Memory, and Cognition, 12, 72-81.
Salthouse, T. A. (1982). Adult cognition: An experimental psychology of human aging. New York: Springer-Verlag.
Salthouse, T. A. (1985). Speed of behavior and its implications for cognition. In J. E. Birren & K. W. Schaie (Eds.), Handbook of the psychology of aging (2nd ed., pp. 400-426). New York: Van Nostrand-Reinhold.
Sanford, A. J., & Garrod, S. C. (1981). Understanding written language: Explorations in comprehension beyond the sentence. Chichester: Wiley.
Scialfa, C. T., Kline, D. W., & Lyman, B. J. (1987). Age differences in target identification as a function of retinal location and noise level: Examination of the useful field of view. Psychology and Aging, 2, 14-19.
Simpson, G. B. (1984). Lexical ambiguity and its role in models of word recognition. Psychological Bulletin, 96, 316-340.
Smith, A. D. (1975). Partial learning and recognition memory in the aged. International Journal of Aging and Human Development, 6, 359-365.
Spilich, G. J. (1983). Life-span components of text processing: Structural and procedural differences. Journal of Verbal Learning and Verbal Behavior, 22, 231-244.
Squire, L. R. (1987). Memory and brain. New York: Oxford University Press.
Stanovich, K. E., & West, R. F. (1983). On priming by a sentence context. Journal of Experimental Psychology: General, 112, 1-36.
Stine, E. L., & Wingfield, A. (1987). Process and strategy in memory for speech among younger and older adults. Psychology and Aging, 2, 272-279.
Stine, E. L., Wingfield, A., & Poon, L. W. (1986). How much and how fast: Rapid processing of spoken language in later adulthood. Psychology and Aging, 1, 303-311.
Till, R. E. (1985). Verbatim and inferential memory in young and elderly adults. Journal of Gerontology, 40, 316-323.
Tulving, E., & Thomson, D. M. (1973). Encoding specificity and retrieval processes in episodic memory. Psychological Review, 80, 352-373.
van Dijk, T. A., & Kintsch, W. (1983). Strategies of discourse comprehension. Orlando, FL: Academic Press.
Welford, A. T. (1984). Between bodily changes and performance: Some possible reasons for slowing with age. Experimental Aging Research, 10, 73-88.
Wright, R. E. (1981). Aging, divided attention, and processing capacity. Journal of Gerontology, 36, 605-614.
Zacks, R. T. (1982). Encoding strategies used by young and elderly adults in a keeping track task. Journal of Gerontology, 37, 203-211.
Zacks, R. T., & Hasher, L. (1988). Capacity theory and the processing of inferences. In L. L. Light & D. M. Burke (Eds.), Language, memory, and aging. New York: Cambridge University Press, in press.
Zacks, R. T., Hasher, L., Doren, B., Hamm, V., & Attig, M. S. (1987). Encoding and memory of explicit and implicit information. Journal of Gerontology, 42, 418-422.
Zelinski, E. M., & Gilewski, M. J. (1988). Memory for prose and aging: A meta-analysis. In M. L. Howe & C. J. Brainerd (Eds.), Cognitive development in adulthood. New York: Springer-Verlag, in press.
STRATEGIC CONTROL OF RETRIEVAL STRATEGIES
Lynne M. Reder

THE PSYCHOLOGY OF LEARNING AND MOTIVATION, VOL. 22
Copyright © 1988 by Academic Press, Inc. All rights of reproduction in any form reserved.

I. Introduction

Virtually all complex cognitive tasks can be accomplished using one of several different strategies. Not only do different people use different strategies to accomplish a given task, but the same individual will often choose to switch strategies within a short period of time even though the task or goal of this individual has not changed. Why does a person do this? How does a person decide to do this, and is it in his or her best interest to switch strategies so often? Although I have investigated these issues (e.g., Reder, 1982, 1987) within the domain of question answering, many of the arguments that hold for question answering can likewise be applied to other higher-level cognitive tasks, e.g., arithmetic, divided attention tasks, problem solving, and mental rotation. For the purposes of this article I will focus on question answering.

Clearly, we use multiple strategies for question answering. Consider how differently you would approach questions such as “What country recently had a nuclear power plant disaster?” vs. “Does Ronald Reagan take naps often?” vs. “How many Lithuanians live in Saskatoon?” It is most likely that attempts to answer these questions would look qualitatively different in terms of the strategies employed. So how do we decide if we know anything that is relevant to answering a particular question? How do we decide the most appropriate procedure for answering a given question?

This article will try to argue for a particular type of architectural framework which accounts for basic memory phenomena and then, based on these simple assumptions, argue for the importance of assuming a strategy-selection component prior to careful memory search. These arguments will be both theoretically and empirically based. I will describe the variables that affect strategy selection and the possible mechanisms involved in selection and speculate on the generality of these mechanisms for other cognitive tasks.

II. A Two-Factor Theory of Memory Retrieval
Before describing my work on strategy selection in question answering, it will be useful to describe the type of memory framework on which I build. This framework is attractive in that it uses only two constructs to account for much of the memory retrieval literature. These concepts are level of activation of a memory trace and strength of a memory trace. This framework also makes some architectural assumptions about memory. Specifically, it assumes that memory consists of interconnected ideas (concept nodes) and that these interconnections or associations between ideas vary in strength. Strengths of associations are determined by how often the associations are encountered. Strength builds up slowly over multiple encounters and also dies away slowly from disuse (much like a muscle).

What a person is currently attending to is in a state of mental activation, and this activation can spread along pathways or connections to associated concepts. Concepts can become active from stimulation, either external, i.e., from the environment, or internal, i.e., something one is thinking about activates that concept. Activation spreads very quickly to other concepts but dies off very quickly, with a half-life of less than 1 sec. How much activation spreads to another concept node is partly a function of the strength of the connection between the two ideas, how many other associations or connections also share the activation, and how much activation there is to share. Activation determines the ease of access of information: the more active the information, the easier it is to access. However, there is only a finite amount of activation that can be shared among concepts, so only a small portion of memory can be active at any one time. The rate of processing of an idea depends on the level of activation of the relevant concepts, i.e., those concepts involved in the procedures to be executed by memory.
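The activation assumptions above can be illustrated with a minimal sketch. It is not part of the framework as stated: the proportional sharing rule, the function names, and the 0.5-sec half-life are assumptions chosen only to respect the constraints in the text (a finite amount of activation divided among associations according to their strength, and decay with a half-life of less than 1 sec).

```python
from typing import Dict


def spread_activation(source_activation: float,
                      link_strengths: Dict[str, float]) -> Dict[str, float]:
    """Divide a finite amount of activation among associated concepts.

    Assumption (illustrative only): each association receives a share
    proportional to its strength relative to all links leaving the source.
    """
    total = sum(link_strengths.values())
    if total <= 0:
        return {concept: 0.0 for concept in link_strengths}
    return {concept: source_activation * strength / total
            for concept, strength in link_strengths.items()}


def decay(activation: float, elapsed_sec: float,
          half_life_sec: float = 0.5) -> float:
    """Exponential decay of activation over time.

    The 0.5-sec default is a hypothetical value, picked only because the
    text says the half-life is less than 1 sec.
    """
    return activation * 0.5 ** (elapsed_sec / half_life_sec)


if __name__ == "__main__":
    # A strongly practiced link and a weakly practiced link share one source.
    shares = spread_activation(1.0, {"strong_link": 3.0, "weak_link": 1.0})
    print(shares)
    # Activation left on the strong link one second later.
    print(decay(shares["strong_link"], elapsed_sec=1.0))
```

Under these assumptions, adding more links to a concept lowers the share that reaches any one of them, while strengthening a link raises its share at the expense of the others.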
An easy way to explain these constructs is by using some examples that we have probably all used to explain terms such as "short-term memory" to a nonpsychologist. In this situation, we remind the layperson of a past experience where a telephone number was obtained from directory assistance and forgotten prior to its use because someone else asked him or her a question before the number was dialed. This situation and others are represented in Fig. 1. This figure shows the four possible states of a memory trace assuming binary values on these two dimensions.

[Figure 1: a 2 × 2 grid crossing STRENGTH (rows: low, high) with ACTIVATION (columns: low, high).
Low strength, low activation: forgotten events.
Low strength, high activation: what you are thinking about NOW, but will forget.
High strength, low activation: memories that can be retrieved on demand.
High strength, high activation: what you are currently reminiscing about.
The figure also carries the labels short-term memory, long-term memory, and working memory (the high-activation column).]

Fig. 1. Two-factor theory of memory retrieval.

The right-hand column describes information that is currently in a highly active state, namely, those thoughts of which we are currently aware. In other words, working memory consists of thoughts that are in a highly active state, and nothing can be in working memory that is not active. The top row refers to information that is low in strength, i.e., the connections between the elements (concepts) are weak because they have not been practiced together very much. So, for example, the phone number we get from directory assistance would be described by the upper right-hand box as long as we continued to rehearse it (and no one interrupted us to ask us a question, thereby blocking further rehearsals; see Baddeley, 1986, for an elegant discussion). We know that this phone number is low in strength because otherwise we would not have asked directory assistance for the number, but would instead have retrieved it from our long-term memory. The bottom row refers to those items that are high in strength and, thus, have been retrieved from memory. The lower right-hand cell describes the information that we are currently reminiscing about. Once we stop thinking about that information, it falls from its active state, but it is not forgotten since the connections are still strong. Later they can be retrieved when wanted. The phone number, however, is long gone once we dial it and talk to the party on the other end. Of course, we all know that the phone number could become strengthened from repeated trials of using it (e.g., Hebb, 1961) or learned easily if the number were especially mnemonic (e.g., Prytulak, 1971). In the latter case, associations are built to prior structures that are already especially strong, and one only needs to rehearse the pointer or connection to the prior strong set of connections.

A. COMPLICATING THE PICTURE: COMPETITION FOR ACTIVATION
Unfortunately, we cannot simply say that our ability to retrieve from memory depends on our ability to activate the relevant memory traces and that the strength of a trace or association affects how easily it can be activated. This picture needs to be complicated because the amount of activation is influenced by the relative strength of the associations, that is, the strength of an association in comparison to the other competing links also associated with the activated concept.¹ Indeed, many phenomena concerned with ease of learning, probability of forgetting, or time to respond can be explained in terms of competing associations to the same stimulus. The paired-associate learning paradigm (e.g., Postman, 1971; Postman & Underwood, 1973) is an obvious example, as is the "fan" paradigm (e.g., Anderson, 1983). The latter paradigm shows that the more facts that are committed to memory about a particular concept, the slower a person is to recognize or reject (as not studied) any statement sharing that concept. Recently, there has been much research associated with "overwriting memory" (Loftus, 1979; Loftus, Miller, & Burns, 1978; Loftus, Schooler, & Wagenaar, 1985; McCloskey & Zaragoza, 1985; Tversky & Tuchin, 1987). This research is probably best thought of as instances of interference. This is shown both in the reaction time results of Donders, Schooler, and Loftus (1987) and in the research on probability of recognition by Tversky and Tuchin (1987).

These two variables, strength and interference, have opposite effects on response time: practice makes retrieval faster and competition makes it slower; however, the effects can be combined in a regular fashion. For example, Anderson (1983) showed that the time to recognize a sentence could be fit by a power function but that sentences with greater fan were fit to a different power function with longer retrieval times (RTs) for the same amount of practice. Similarly, Peterson and Potts (1982) and Lewis and Anderson (1976) showed that real-world facts are verified more quickly than new facts (due to their greater strength) but even real-world facts are verified more slowly when more new facts are learned that are related to them.

¹One metaphor that might make this easier to understand is to equate activation with the amount of water that flows through an irrigation channel. The strength of a connection maps onto the depth of the channel: the deeper the channel, the more water (activation) can pass through. The more competing irrigation channels that share one water source (more links emanating from one activated node), however, the slower the water will go down a given channel (the less activation for any particular path).
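The opposing effects of practice and fan described in this section can be sketched numerically. The functional form below is an assumption made for illustration, not the function fitted in the studies cited: response time falls as a power function of practice, and fan raises the scale of that function. The function name and all parameter values are hypothetical.

```python
def recognition_time(practice_trials: float, fan: int,
                     asymptote: float = 0.4,
                     scale: float = 0.6,
                     fan_cost: float = 0.3,
                     exponent: float = 0.5) -> float:
    """Illustrative response time (sec) for recognizing a studied fact.

    Assumed form: asymptote + (scale + fan_cost * (fan - 1)) * practice ** -exponent
    - practice_trials: number of prior exposures (at least 1)
    - fan: number of facts sharing the probed concept (at least 1)
    Every parameter value here is hypothetical.
    """
    if practice_trials < 1 or fan < 1:
        raise ValueError("practice_trials and fan must both be at least 1")
    fan_scaled = scale + fan_cost * (fan - 1)
    return asymptote + fan_scaled * practice_trials ** (-exponent)


if __name__ == "__main__":
    for fan in (1, 3):
        times = [round(recognition_time(n, fan), 3) for n in (1, 4, 16)]
        print(f"fan={fan}: {times}")
```

In this sketch each fan level follows its own power function of practice, and the higher-fan curve lies above the lower-fan curve at every amount of practice, which is the qualitative pattern the text describes.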
B. THE PARADOX OF THE EXPERT
It seems quite sensible, a priori, that strength should be associated with greater efficiency in retrieval; however, it seems counterintuitive to believe that knowing more about a concept should produce interference and weaker performance (Smith, Adams, & Schorr, 1978). Nonetheless, subjects are slower to verify a fact when additional facts are known about that person or topic. When thought of as "interference," the fan effect makes sense, but when thought of in terms of expertise of knowledge, it seems paradoxical. Perhaps the best way to understand this phenomenon is to assume that experts do not typically try to retrieve any one specific fact even to verify an assertion; rather, they frequently draw on their general knowledge and compute whether or not an answer seems plausible. There are several lines of research that are consistent with this point of view. For example, Reder (1976, 1979) found that verification times for statements about a story were strongly influenced by the statement's plausibility even when that statement had been explicitly mentioned in the story. This suggests that we do not always first search memory to see if a statement has been presented when we are asked to decide on a statement's plausibility. Reder and Anderson (1982) found that people will often use a plausibility or consistency strategy even when asked to make recognition judgments. They had people study sentences such as the following: (1) the teacher went to the train station; (2) the teacher bought a ticket for the 10:00 train; and (3) the teacher arrived on time at Grand Central Station. The fan effect was not found when the foils (statements to be rejected as not studied) were not thematically related, e.g., the teacher called to have a phone installed. On the other hand, the traditional fan effect reappears when the foils preclude using an inference strategy to make recognition judgments, e.g., the test probe was the teacher checked the Amtrak schedule. In addition, the density of errors in the recognition task increased with the fan of the foils, suggesting that with more consistent information, the plausibility of the foil increased. This result supports the view that we often prefer to adopt a plausibility strategy even when asked to make recognition judgments and that sometimes we even adopt such a strategy when it will cause us to make errors (Reder, 1982). The paradox is not completely resolved by showing that the interference effects of additional knowledge are dissipated when "experts" can adopt a plausibility-like strategy to make recognition judgments: knowing more should actually facilitate judgments. Experiments by Reder and Ross (1983)
and Reder and Wible (1984) actually did find situations where there is facilitation in verification due to increases in fan. In these studies, all subjects learned thematically related sets of information and were tested about this information in different conditions. In Reder and Ross, subjects were tested in two types of recognition blocks similar to Reder and Anderson; however, they were also tested in a consistency judgment task where they were to say "yes" to both studied facts and facts that were not studied but were thematically consistent. For that block of trials, subjects were only to say "no" to facts that were thematically inconsistent with the material studied. Subjects were much faster to verify statements with a greater fan if they could verify them using a plausibility (or consistency) strategy. In the Reder and Wible study, subjects were also tested at a 2-day delay.² They replicated the findings of Reder and Ross (1983) and Reder and Anderson (1982) in the comparable conditions; however, they found greater facilitation with fan for the stated probes at longer delays, suggesting a greater reliance on the plausibility or consistency strategy at longer delays. These facilitation effects can be explained by assuming that the semantically related facts are organized into a subnode attached to the main concept (or person) node. The more facts emanating from the subnode, the less activation will go to any one fact; however, if the subnode itself is sufficient to determine that a fact is plausible, then the link from the main node to the subnode will be stronger when more facts fan out of it.³
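A comparable account, in which facts are sampled at random from the person node until a relevant one is found, makes the same qualitative prediction, and the arithmetic can be sketched briefly. The sketch below is only an illustration under assumptions that are not in the studies themselves: sampling is without replacement, every fact is equally likely to be drawn, and "time" is counted in draws. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def expected_draws(n_facts, n_relevant):
    """Expected number of random draws (without replacement) until the
    first relevant fact appears, using the closed form (n + 1) / (k + 1)."""
    if not 1 <= n_relevant <= n_facts:
        raise ValueError("need 1 <= n_relevant <= n_facts")
    return Fraction(n_facts + 1, n_relevant + 1)

def expected_draws_bruteforce(n_facts, n_relevant):
    """Same quantity by enumeration: every set of positions the relevant
    facts could occupy in the draw order is equally likely, and the first
    relevant fact is found at the smallest of those positions."""
    placements = list(combinations(range(1, n_facts + 1), n_relevant))
    return Fraction(sum(min(p) for p in placements), len(placements))

# With six facts stored about a person, more consistent facts means fewer
# draws, on average, before one that supports a plausibility judgment.
for k in (1, 2, 3, 6):
    print(k, expected_draws(6, k))
```

On these assumptions the expected search length falls as the number of consistent facts rises, which is the direction of the facilitation effect; the sketch says nothing about how draws map onto seconds.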
²Actually, in the Reder and Wible study, different subjects made recognition vs. consistency judgments; making the judgments at two different delays was, however, a within-subject variable.
³A comparable model would be that facts are retrieved from the person node at random and the greater the number of consistent facts, the sooner will any one relevant fact be found.
C. THE ROLE OF ELABORATIONS IN MEMORY: WHEN DO THEY HELP AND WHEN DO THEY HURT?
Thus far, we have discussed the variables that affect retrieval, e.g., the amount of activation that a memory trace receives. Related facts can compete for activation, thereby slowing response times (causing the fan effect) or hurting recall probabilities; however, we saw that, in certain situations, related facts can actually cause subjects to respond faster. The phenomenon that subjects verify a statement's plausibility more readily when there are more consistent facts would be of limited interest if it could only be demonstrated in fan-effect paradigms where the experimenter systematically varies the number of relevant facts. However, the idea that
people "recognize" statements by inferring semantically consistent facts can be shown in domains other than memorization of sets of facts. The use of plausible reasoning to make recognition judgments has been shown for episodically learned materials, such as stories (e.g., Owens, Bower, & Black, 1979; Reder, 1976, 1982, 1987). It has also been shown for passages that refer to famous figures (e.g., Sulin & Dooling, 1974; Dooling & Christiaansen, 1977). Owens et al. showed that people remembered more of a passage when there was additional information making the passage more interesting, viz., that the protagonist was concerned that she had become pregnant by her professor. In that situation, more veridical facts were recalled, but also more intrusions were made. Similarly, Sulin and colleagues found that passages about Helen Keller or Adolf Hitler were better remembered than identical passages about nonentities; however, there were also significantly more false alarms to statements that were true of the famous characters but had not been stated. Arkes and Freedman (1984) replicated the well-known finding (e.g., Chiesi, Spilich, & Voss, 1979) that high-knowledge subjects have better memory for the information given; however, they showed that when the task forces the high-knowledge subjects to discriminate facts actually presented in the passage from plausible inferences, their performance was significantly worse than that of novices. This, too, suggests that we normally elaborate and that in most situations these elaborations facilitate rather than interfere.
We have thus seen that whether elaborations facilitate or interfere with memory retrieval depends on the situation. When we are able to answer a question by using a reconstructive or plausible reasoning strategy, more elaborations help our ability to answer questions; however, when we are required to use "direct retrieval," elaborations are interfering. In my own studies (e.g., Reder, 1979, 1982, 1987),
I have found that people are faster to verify highly plausible statements than moderately plausible statements when they had been presented as part of the story, but also more prone to falsely recognize them when they had not been presented. Given that there are situations where using reconstructive memory (or plausible reasoning) is the preferred way to answer a question and that there are other situations where direct retrieval is preferred, the question then becomes this: How does a person determine which procedure is preferable in a given situation? Some of my past research (e.g., Reder, 1982, 1987; Reder & Wible, 1984) suggests that the tendency to adopt one strategy or another varies with the situation and that different question-answering procedures are not simply executed in parallel with the faster procedure winning. Instead, these experiments support the view that people make a decision as to which procedure to prefer, i.e., that it can be under a person's conscious control. Before reviewing the specifics of some of these experiments, let me
first describe the general paradigm that characterizes them. Subjects read a series of simple stories and are then asked to make judgments about statements based on these stories. The stories are presented sentence by sentence on a computer screen, and the subject presses a space bar to see the next sentence of the story. The questions are asked after each story, after every 2 stories, after 5 stories, after all 10 stories, or a few days later. Subjects are typically asked either to make recognition judgments (i.e., "Did you see this sentence when you read the story?") or to make plausibility judgments (i.e., "Is this sentence plausible given the story you read?"). Statements that should be judged as plausible fall into two plausibility categories: they were rated by other subjects either as highly plausible or as moderately plausible. A subset of these plausible statements are randomly selected to be presented in the story as part of the story even for subjects asked to make plausibility judgments.⁴ For subjects asked to make plausibility judgments, there were an equal number of implausible statements also asked at the time of the test so that subjects would respond positively to 50% of the items. An example of the type of story used and its questions are listed in Section VII. Reaction times for correct responses are used to infer which strategy subjects use to answer the questions. The difference in response times between previously stated and not-stated test probes measures the degree of use of the direct retrieval strategy, while the tendency to employ the plausibility strategy is operationalized as the difference in RT between the moderately plausible and highly plausible statements. In earlier studies (Reder, 1982) it was found that subjects became faster in making plausibility judgments as the delay increased between reading the stories and the test of the information, while they became slower to make recognition judgments at comparable delays.
The speedup for plausible reasoning judgments was due almost exclusively to the improvement in verifying statements that had not been presented as part of the story; those statements that had been asserted in the story were verified as quickly when tested immediately as when they were tested at a delay. One explanation of this result is that, at short delays, subjects prefer to search memory for the specific fact, while at longer delays they do not bother to first search memory before trying to infer the answer. In fact, the error rates for the recognition judgments went up dramatically from the short to the long delay, but only for those plausible statements that had not been presented. This also suggests a shift away from the use of a direct retrieval strategy toward a plausible reasoning strategy.
⁴This random selection was done separately for each subject so that effects due to materials were part of the subject error term.
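The two reaction-time difference measures that define strategy use in this paradigm can be written out explicitly. The following is a minimal sketch, not the analysis code used in the studies: the trial records, field names, and reaction times are invented for illustration, and only correct responses enter the means, as described above.

```python
from statistics import mean

def strategy_indices(trials):
    """Return (direct_retrieval_index, plausibility_index) in seconds.

    direct_retrieval_index: mean RT for not-stated probes minus stated probes.
    plausibility_index:     mean RT for moderately minus highly plausible probes.
    Only correct responses are included.
    """
    correct = [t for t in trials if t["correct"]]

    def mean_rt(field, value):
        return mean(t["rt"] for t in correct if t[field] == value)

    direct = mean_rt("stated", False) - mean_rt("stated", True)
    plausibility = mean_rt("plausibility", "moderate") - mean_rt("plausibility", "high")
    return direct, plausibility

# Hypothetical trials for one subject (RT in seconds).
trials = [
    {"rt": 2.0, "stated": True,  "plausibility": "high",     "correct": True},
    {"rt": 2.2, "stated": True,  "plausibility": "moderate", "correct": True},
    {"rt": 2.6, "stated": False, "plausibility": "high",     "correct": True},
    {"rt": 3.0, "stated": False, "plausibility": "moderate", "correct": True},
    {"rt": 9.9, "stated": False, "plausibility": "moderate", "correct": False},  # error trial, excluded
]

direct_index, plausibility_index = strategy_indices(trials)
print(round(direct_index, 3), round(plausibility_index, 3))
```

A large first index is read as reliance on direct retrieval, and a large second index as reliance on plausible reasoning; the sketch computes the contrasts only and makes no claim about how large a difference is meaningful.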
III. Influencing Strategy Selection: Extrinsic Variables and Intrinsic Variables
A. STRATEGIC VARIABLES
In this section, I describe some recent work (Reder, 1987) that further supports the existence of an independent strategy-selection stage. These experiments use the paradigm described previously, namely, subjects read a series of stories and are asked to make judgments about them.
1. Are Subjects Sensitive to the Probability of Finding a Statement Stored in Memory?
In one of these experiments, subjects were asked to make plausibility judgments about statements immediately after reading a story. We varied the probability that the statement to be judged had been asserted in the story. For the first six stories that subjects read, the ratio of presented to not-presented statements was either 20:80 or 80:20, with different subjects randomly assigned to one of these two ratios. The "stated" implausibles were contradictions of statements from the story: a statement from the story had one word replaced with its opposite to create the contradiction. After subjects judged statements for six stories with this uneven ratio of presented to not-presented statements, they read four more stories where the ratio was an even 50:50 of presented to not presented (and an equal number of implausibles and exact contradictions). Subjects were never explicitly told of the ratio, nor were they told that the answers to some of the questions had been stated in the story. One question of interest is whether a person's probability of employing a particular strategy varies as a function of the prior history of success with that strategy. If so, then several conclusions can be drawn about the question-answering process. First, if subjects prefer a strategy that is sensitive to the ratio of presented to not-presented statements, then they are not always searching first for an exact match in memory before trying the plausible reasoning strategy. Second, they would not simply be letting both the search process and the inference process run in parallel, with the faster process providing the answer. The results of this experiment are displayed in Fig. 2, which plots a difference measure of reaction times for not-stated plausible probes minus stated plausible probes as a function of the ratio of presented to not-presented probes. A large difference in RT suggests that subjects were using the direct retrieval strategy.
A small difference suggests that subjects were employing the plausibility strategy. Note that for subjects who received a disproportionately large number of probes that had been stated in the
Fig. 2. Difference in reaction time between stated and not-stated statements (collapsed over plausibility). [Plot not reproduced: the abscissa is story block (stories 1-6 vs. 7-10); larger differences are labeled "Direct Retrieval Bias" and smaller differences "Plausibility Bias."]
story, there was a big difference in response time between the stated and not-stated probes. This suggests that these subjects became accustomed to searching memory first for the probe and only resorted to plausible reasoning when that strategy failed. In contrast, for those subjects who rarely found the probe stated in memory, the difference in RT was much smaller, suggesting that they tended to adopt the plausible reasoning strategy. A second type of analysis examines the difference in response time between highly plausible statements and moderately plausible statements, collapsing over whether the statement had actually been presented as part of the story. In that case, the pattern is exactly reversed: there is a large difference in RT for subjects who rarely found a probe stated in the story, which is consistent with the view that they were using the plausible reasoning strategy as a first choice. Similarly, subjects who were biased to use the direct retrieval strategy because of the large proportion of stated probes showed a very small difference in RT between highly and moderately plausible statements, suggesting that they tended to prefer the direct retrieval strategy. The size of the differences in RT (between the stated vs. not-stated probes and the moderately vs. highly plausible probes) grew
with exposure to the ratio. Analyses that compare differences after three stories and after six stories showed the increasing trend, and the differences between the two ratio groups declined during the last four stories (comparing stories 7 and 8 with stories 9 and 10).
2. Are Subjects Sensitive to Official Task Requirements When These Do Not Matter Objectively?
A slightly different type of experiment also provides support for the view that the type of question-answering procedure employed is under strategic control. In this experiment subjects also read stories and made judgments about statements concerning each story. Subjects were randomly assigned either to the recognition task or to the plausibility judgment task. In recognition, subjects were asked to decide whether or not a statement had in fact been seen as part of the story; the other subjects judged whether or not a statement was plausible given the story they had read. Although the official tasks were different, the experiment was constructed in such a way that subjects could follow either the plausibility strategy or the direct retrieval strategy and always get the correct answer. This effect was achieved by including all plausible statements in the story. So for the recognition task, all not-presented statements were implausible, thereby allowing subjects to use a plausibility strategy to make a recognition judgment. Likewise for the plausibility task, subjects could use a direct retrieval strategy because if the statement was plausible, it would have been previously presented. Some subjects were asked to answer these questions after each story while others were randomly assigned to the delay condition (approximately a 20-min delay) where questions were asked about each story after all ten stories had been read. The results of this experiment indicate that subjects do not simply employ the official strategy requested of them; however, they are definitely influenced by the task demands.
Figure 3 graphs the tendency to employ the plausibility strategy as a function of official task demands and delay condition. In the last experiment described, we had two converging measures of strategy use, one being the difference between the two levels of plausibility and the other being the difference between the stated and not-stated probes (of otherwise equal plausibility). In this experiment, since all plausible statements had been presented, we can only use the former measure. Note that the difference in RT is much greater for subjects who were asked to make plausibility judgments than for those subjects who were asked to make recognition judgments; however, at the longer delay interval, the tendency to use the plausibility strategy over the direct retrieval strategy has increased for both groups such that subjects in the delay recognition condition have as great a tendency to use the plausibility
Fig. 3. Difference in reaction time between moderately and highly plausible statements. [Plot not reproduced: the abscissa is test delay (immediate vs. delay), with separate curves by task.]
strategy as do subjects assigned to the immediate plausibility condition. The increased use of the plausibility strategy with delay is not, by itself, an argument that subjects bias their preference for one strategy over another. However, it is the most parsimonious explanation: task instructions differentially affect the tendency to select one strategy over the other, even though subjects do not always use the strategy that corresponds to the official task. In sum, it seems clear that subjects do control which strategy they will use at any given time.
3. Strategic Control from Trial to Trial
The next experiment does more than support the view that subjects have strategic control of their use of question-answering processes. It also shows that subjects can adjust their strategy preference from trial to trial rather than only changing their long-term bias on the basis of prior history of success with a strategy or official task demands. In this study subjects
were advised prior to each question which strategy was most likely to be effective. Again, subjects read a series of stories, each story followed by a set of statements to be rated on their plausibility with respect to the last story read. Half of the plausible statements had been included in the story and half of the implausible statements had their exact contradiction presented in the story. On half of the trials, subjects were advised that the next probe (or the exact contradiction of the probe) could probably be found in memory. On the other half of the trials, subjects were advised that they probably would not find the next test probe (or its exact contradiction) in memory. The advice was accurate 80% of the time. For the first two stories, the advice was always correct in order to motivate subjects to pay attention to the advice. The remaining eight stories were analyzed as a function of advice appropriateness, whether or not the statement had been presented in the story, and as a function of probe plausibility. The correct RTs are displayed in Fig. 4 as a function of correctness of advice and whether the probe had been stated or not in the story. These two variables define the type of advice given to the subject. For instance if the statement had not been
Fig. 4. Reaction times for stated vs. not-stated probes (collapsed over plausibility). D. RET., direct retrieval; INF., inference. [Plot not reproduced: the abscissa is advice (correct vs. incorrect); separate plot symbols mark not-stated and stated probes.]
presented and the advice was inappropriate, then the advice must have been to try searching memory for the fact (the direct retrieval strategy). Subjects were significantly faster when they were given the correct advice as compared with the inappropriate advice, so, clearly, subjects were acting on the advice given to them. The RTs for not-stated probes when direct retrieval was advised were much slower than all others because this was the only condition where the advised strategy would not work. The advice to infer when the probe was stated will work; however, it is nonoptimal since, at such a short delay, direct retrieval is a faster strategy.
IV. When Does the Strategy-Selection Stage Operate?
Taken together, these studies provide evidence for the existence of a strategy-selection stage that precedes strategy execution. If subjects simply let both the direct retrieval and plausibility strategies run in parallel, then giving the wrong advice prior to receiving the question should have no effect. Likewise, official task demands and prior history of success should have no effect on behavior. On the other hand, our results do not shed light on several other related issues. First of all, these studies do not discriminate between a system where the competing procedures run in parallel with differential allocation of resources and a system where only one of the two procedures executes at one time (i.e., a serial system). In either case, however, there must first be a mechanism that decides which process to favor. Second, these studies do not answer the question of whether or not people can adjust which strategy they will prefer based on an initial examination of the question. That is, the evidence supports both the need for and the existence of a strategy-selection phase; however, none of these data is inconsistent with the view that this decision stage occurs prior to seeing a sentence.
It could be the case that subjects make specific decisions about what strategy is the best to use in the absence of parsing a question, yet are unable to adjust strategy selection based on seeing the question itself. Prior history of success with a strategy (base rate of presented statements, etc.), advice prior to seeing a question, official demands of the experiment (task to judge plausibility vs. recognition), and prior knowledge of the delay between reading the story and test questions could all be explained within a model that posits subjects adjusting strategy preference prior to seeing a question. Several experiments will be described that lend support to the view that strategy selection can also take place after an initial parse of the test probe.
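The argument at the start of this section, that a pure parallel race cannot produce an advice effect whereas a selection stage can, may be easier to see in a toy model. The sketch below is an illustration only: the three timing constants are invented, both functions are hypothetical, and the selection model shown is just one simple serial variant (try the advised strategy first, fall back on inference if retrieval fails).

```python
# Made-up processing times, in seconds (assumptions, not measured values).
T_RETRIEVE = 2.0      # direct retrieval succeeds because the probe was stated
T_SEARCH_FAIL = 1.0   # time spent on a memory search that finds nothing
T_INFER = 2.5         # answering by plausible inference

def race_rt(stated, advice):
    """Pure race: both strategies run in parallel and the first to finish
    supplies the answer. The advice argument is deliberately ignored."""
    finish_times = [T_INFER]
    if stated:
        finish_times.append(T_RETRIEVE)  # retrieval can only succeed for stated probes
    return min(finish_times)

def selection_rt(stated, advice):
    """Selection stage: the advised strategy is executed first, and
    inference serves as the fallback when retrieval fails."""
    if advice == "retrieve":
        return T_RETRIEVE if stated else T_SEARCH_FAIL + T_INFER
    return T_INFER

for stated in (True, False):
    for advice in ("retrieve", "infer"):
        print(stated, advice, race_rt(stated, advice), selection_rt(stated, advice))
```

In this toy version the race model gives identical times whatever the advice, while the selection model reproduces the qualitative pattern reported for Fig. 4: not-stated probes with advice to retrieve are slowest, and advice to infer on a stated probe works but is slower than direct retrieval.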
A. CAN WE SELECT A STRATEGY AFTER WE HAVE SEEN THE QUESTION?
Strategy Selection in a Mixed-Delay Design
This study (Reder & Wells, 1986) was quite similar in design to many of the studies described previously. Subjects read ten stories; half were asked to make recognition judgments and half were asked to make plausibility judgments about statements concerning the story. The critical change in design was that before a question appeared on the computer screen the subject did not know whether it would be from a story that he had just read or from a story read 2 days earlier. Subjects read five stories 2 days prior to answering questions and read the other five stories just prior to answering questions. Subjects received a random mix of questions from a story they had just read and from one read 2 days earlier.⁵ Using this design, we could determine whether subjects could use a rapid inspection of a statement in order to bias question-answering strategy use. If it is possible to bias strategy selection after seeing the question, presumably questions from stories read 2 days earlier would be answered using a plausibility strategy while questions from stories that had just been read would be answered (when possible) using the direct retrieval strategy. The results of this study are presented in both tabular form (see Table I) and in two figures (see Figs. 5 and 6). The data are represented in the figures as a function of the delay between reading the story and seeing the test probe (immediate vs. 2 days) and as a function of task (plausibility vs. recognition). The ordinate in Fig. 5 plots a difference measure of correct response times for not-stated probes minus stated probes. Figure 6 plots the same contrast for accuracy. The variable of plausibility is ignored in these figures, although it is reported in the table. It should be pointed out that the function for the plausibility subjects represents only "yes" responses while the correct responses for not-presented statements in the recognition task are "no" responses.
⁵After reading a story on the second day, a subject was shown a title from one of the five stories from the previous session. The questions from both the story referred to by the title and the story just read were then presented. In this way, subjects knew which two stories were to be queried in any set of questions.
Figure 5 shows that the effect of whether an item was stated or not is much greater for subjects who were asked to make recognition judgments than for those asked to make plausibility judgments; in the immediate condition, there is an effect of stated vs. not-stated for both groups. This
TABLE I
MEAN RESPONSE TIMES (SEC) TO MAKE VERIFICATION JUDGMENTS, AND PROPORTION CORRECT, MIXED-DELAY EXPERIMENTᵃ
Recognition
Stated
Plausibility
Not stated
Stated
Not slated
2.518 (.74) 2.571
1.888
2.195 (.88) 3.024 (.73) 2.676 (36)
Immediate High
Moderate
(.W Implausiblekontradictory
Delayed High Moderate Implausiblekontradictory
2.159
1.92) 2.529 (.89)
_.
2. I67 (.771 2.320 t.73)
(.%I
2.840 (S3) 2.932
(.a)
2.238 2.520 (33) 2.693 (.71)
2.261 (32) 2.615 (.69) 2.695 t.82)
"Proportion correct is shown in parentheses.
triple interaction is significant [F(1,39) = 10.2, p < .01]. This means that subjects who were asked to make plausibility judgments were adopting the direct retrieval strategy in the immediate condition but not when the statement had been read 2 days earlier. Consistent with this view, the effect of statement plausibility (in the plausibility task) is also significantly larger for stories that were read 2 days earlier if the statements had not been presented, i.e., plausibility × stated × delay [F(1,20) = 7.23, p < .01]. (For statements that had been presented in the story, the effect of plausibility is almost always attenuated.) Now consider the performance of subjects who were asked to make recognition judgments. Although the response times are considerably longer for statements that were read 2 days earlier (see Table I), the difference function (between not stated and stated) does not change much with delay. This is because if subjects adopt the plausibility strategy, they are very likely to make an error on the not-stated probes, and those RTs are not reflected in these means. On the other hand, the error rates do show an increased tendency to use the plausibility strategy for the statements that were read earlier (see Fig. 6). Subjects make significantly more errors to not-stated probes that were read 2 days earlier [F(1,19) = 8.53, p < .01], suggesting that they used the plausibility strategy for "older" statements instead of using direct retrieval and, thus, erroneously said "yes" to not-stated probes. The reliance on a strategy that produces errors for subjects who were asked to make recognition judgments produced a main effect of task on errors (p < .01). Recognition subjects were much more likely to make errors on highly plausible statements (plausibility × task, p < .01). They were also much more error prone for not-stated items at a delay than were the plausibility subjects (stated × delay × task, p < .01). All subjects were more likely to make errors to not-stated probes (p < .01), but the effect was most marked for subjects who were asked to make recognition judgments on highly plausible, not-stated items (plausibility × stated × task, p < .01). Here, subjects are erroneously accepting the highly plausible statements which they are supposed to reject in that condition. As one would expect, performance was also more error prone for older memory traces for all subjects, i.e., there was a main effect of delay on errors.
Fig. 5. Difference in reaction time between not-stated and stated probes (collapsed over plausibility). [Plot not reproduced: the abscissa is test delay (immediate vs. delay), with separate curves by task.]
Fig. 6. Percentage difference in accuracy between stated and not-stated probes (collapsed over plausibility). [Plot not reproduced: the abscissa is test delay (immediate vs. delay).]
TABLE II
EXAMPLES OF TYPES OF QUESTIONS USED IN GAME SHOW EXPERIMENTS
Easy
    What month follows September? (October)
    How many letters are there in the alphabet? (26)
Moderate
    Who wrote "Romeo and Juliet"? (William Shakespeare)
    Who invented the phonograph? (Thomas Edison)
Hard
    What was Mark Twain's real name? (Samuel Langhorne Clemens)
    Which well-known artist painted "Guernica"? (Picasso)
Impossible
    What size collar does Lassie wear?
    What is George Bush's telephone number?
In sum, there was evidence of use of strategies other than the prescribed strategy for both tasks, viz., use of the direct retrieval strategy by subjects who were asked to make plausibility judgments at short delays and use of the plausibility strategy by subjects who were asked to make recognition judgments at long delays. This replicates other findings by Reder (1987), but in a paradigm where subjects cannot know prior to viewing the question how old the memory trace is. One can, therefore, conclude that subjects are able to adjust their strategy choice after seeing a question. It is not something that can only be adjusted in advance of processing the question.
V. Influencing Strategy Selection: Intrinsic Variables
Now that we have established that there must be a strategy selection mechanism that biases strategy use, there remains the issue of how this strategy selection is done. Given that factors such as prior history of success with a strategy, official task demands, and advice prior to seeing a question can all influence strategy selection, we know that there is a mechanism that is independent of question parsing that can influence this selection. Earlier work (Reder, 1982) had shown that delay between reading a story and test of the story's content influenced strategy use. It was unclear if such strategy preference was a conscious decision based on the subject's knowledge of when the story had been read or was influenced by the parsing of the question itself. The previous experiment described showed that people can ascertain the age of the information by inspecting the question itself and adjust their strategy use accordingly. Therefore, we know that not all mechanisms associated with strategy selection are concerned with cues that are external to the probe such as prior history of success. Thus, we need to specify a mechanism that can help people to determine, on the fly, what is the best strategy to use in a given situation.
A. FEELING OF KNOWING AS AN INTRINSIC STRATEGY SELECTION MECHANISM
One possible mechanism that can help people determine the best strategy involves "feeling of knowing" judgments. Previous work on feeling of knowing has demonstrated that when people are unable to answer a question, they still can assess their ability to recognize the answer (e.g., Nelson, Gerler, & Narens, 1984). Conceivably, the same mechanism that is involved in assessing our feeling of knowing when we cannot answer a question is also involved in assessing our ability to answer those questions that we can answer.
To address this hypothesis, we developed the "game show" paradigm (Reder, 1987) where subjects first estimate whether they think they will be able to answer a question before they actually attempt to answer it. We call it this because in some television game show formats, contestants are motivated to press a buzzer indicating their willingness to attempt an answer to a question even though they have not yet heard the full question. We asked half of our subjects to respond in this game show fashion, giving a rapid first impression or best guess as to their ability to subsequently answer a given question. The other half of the subjects were asked to immediately answer the questions. The experimental question is: can people estimate their ability appreciably faster than they can generate the answer, without suffering a loss in accuracy? Table II gives examples of the types of questions subjects might be asked to answer. The results were very clear: subjects were able to estimate that they could answer a question significantly faster than they could answer it. The data are presented in Table III for time to estimate an answer or give an answer, proportion attempted and proportion correct. There is clearly a sizeable RT advantage for subjects in the estimate group. This effect is even larger if the first 25% of the trials are excluded. Presumably, this practice effect reflects the fact that subjects are not accustomed to overtly estimating their ability to answer a question. One potential confounding factor in this experiment is that subjects in the answer condition had to articulate a response that they had not said before in the course of the experiment while subjects in the estimate condition were either saying "yes" or "no," short one-syllable words they had practiced a lot during the experiment. So we conducted a control experiment where the answer group pressed a button when ready to give the answer or when they had decided that they could not answer the question.
The estimate group pressed a button if they thought they would be able to answer a question or the other button if they thought they could not. The difference in the two conditions, apart from the instructions, was that the question disappeared from the screen in the answer condition after the subject pressed a key (indicating that the answer was in mind) while the question remained on the screen after the subject in the estimate condition indicated that he probably could answer it. The data for this control study are displayed in Fig. 7. The time to push the button is displayed in the bottom curve. The upper curve is the total time required to generate an answer. Therefore, the distance between the bottom curve and the top curve reflects the amount of time required for the subject to generate the answer after pushing the button. Note that subjects who were asked to estimate whether they could answer a question took (significantly) less time to push the button than did subjects who were asked to push a button when the answer had been retrieved from

TABLE III
TIME TO ESTIMATE OR ATTEMPT ANSWERS, PROPORTION OF QUESTIONS ATTEMPTED, PROPORTION ANSWERED CORRECTLY, AND ACCURACY OF ATTEMPTS AS A FUNCTION OF TASK AND QUESTION DIFFICULTY; GAME SHOW CONTROL EXPERIMENT^a

                      Time to attempt    Proportion         Total correct     Accuracy of
                      (sec)              attempted (%)      (%)               attempt
Question difficulty   Est      Ans       Est      Ans       Est      Ans      Est      Ans
Easy                  1.650    2.512     86.02    91.62     80.62    80.35    93.60    87.54
Moderate              1.828    2.654     67.46    72.29     59.48    54.97    87.98    76.0?
Hard                  1.728    2.388     42.38    47.09     34.13    28.42    81.44    59.07
Impossible            1.524    2.925      7.56    25.13     -        -        -        -
All attempted^b       1.735    2.518     65.29    70.34     58.08    54.58    87.67    74.23

                                Est (sec)   Ans (sec)
RT for answered correctly^b     1.722       2.396
RT for answered incorrectly^b   2.069       3.00
RT for not attempted^b          1.851       3.197

^a Est, estimate; Ans, answer.
^b These means do not include the impossible items. The pattern is quite similar with their inclusion.

memory. On the other hand, they took significantly more time to generate the answer once they decided that they probably could find the answer than did answer subjects, who claimed that they already had the answer in mind. Indeed, the total time required to answer a question was effectively identical for the two groups, which suggested that the processes used in both tasks are the same, but partitioned differently. It is important to note that the significant speed advantage for the estimate group is not associated with poorer accuracy. On the contrary, the estimate group was appreciably more accurate or calibrated (88%) than the answer group (74%). Accuracy or calibration is defined as the percentage correct divided by the percentage attempted. Although the estimate group attempted considerably fewer questions, they seemed to have an excellent sense of which questions they will not be able to answer, while people in the answer group seem to try to answer too many. The point that subjects are more accurate in the estimate condition allows us to dismiss the issue of a possible speed/accuracy trade-off. Clearly, the estimate condition is a useful and quick process that might well play a role in an initial evaluation phase used to select an appropriate strategy.
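The accuracy (calibration) measure defined above can be illustrated with a short calculation. The following is a minimal sketch in Python, assuming the percent-attempted and percent-correct values are read directly from Table III; the dictionary layout and the function name accuracy_of_attempt are illustrative and are not part of the original study materials. Ratios computed this way from the group percentages need not match the tabulated accuracy values exactly, presumably because the published figures were averaged over subjects or difficulty levels rather than computed from pooled percentages.

```python
# Percent attempted and percent correct, copied from Table III.
# The keys and layout are illustrative assumptions, not the original data files.
TABLE_III = {
    "estimate": {
        "easy": (86.02, 80.62),
        "moderate": (67.46, 59.48),
        "hard": (42.38, 34.13),
        "all attempted": (65.29, 58.08),
    },
    "answer": {
        "easy": (91.62, 80.35),
        "moderate": (72.29, 54.97),
        "hard": (47.09, 28.42),
        "all attempted": (70.34, 54.58),
    },
}


def accuracy_of_attempt(attempted_pct, correct_pct):
    """Return percentage correct divided by percentage attempted, in percent."""
    if attempted_pct <= 0:
        raise ValueError("accuracy is undefined when nothing was attempted")
    return 100.0 * correct_pct / attempted_pct


if __name__ == "__main__":
    # Print the ratio for every group and difficulty level.
    for group, rows in TABLE_III.items():
        for difficulty, (attempted, correct) in rows.items():
            value = accuracy_of_attempt(attempted, correct)
            print(f"{group:8s} {difficulty:13s} {value:5.1f}")
```

In every row, the ratio for the estimate group exceeds the ratio for the answer group, which is the pattern the text relies on to rule out a speed/accuracy trade-off.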
Fig. 7. Time to push button and total time to initiate response. (x-axis: group, ANSWER and ESTIMATE; upper curve: total time.)
B. VARIABLES THAT AFFECT FEELING OF KNOWING AND ABILITY TO ANSWER QUESTIONS
The previous section provided support for the idea that people's feeling of knowing operates fast enough to be a viable mechanism involved in strategy selection. Now we can ask what variables intrinsic to the question affect our feeling-of-knowing judgments. One possible variable would be familiarity or recency of exposure to the terms in the question. Presumably, a feeling of knowing occurs for a question when there are recent memory traces of words used in the question and there is no concomitant association of not knowing the answer. That is, the question topic might well seem familiar to a person if most of the content words in the question had recently been talked about.
1. Spurious Feelings of Knowing
In another experiment, we attempted to give subjects a spurious feeling of knowing by giving them prior recent exposure to some of the terms of some of the questions. We did this by having subjects make frequency-of-occurrence estimations on words which would later be used in some of the questions that they were asked to judge and answer. We did not want subjects to be aware of our attempt to give them a spurious feeling of knowing, so we only primed one-third of the questions. We were not certain whether subjects could be given a feeling of knowing about the questions without their also being aware that terms used in the question had also been seen a few minutes earlier in the same experimental context. We selected questions that varied in the probability that a subject could answer them correctly. The pool of questions that we used had been normed by Nelson et al. (1984). For each question, we selected two words or expressions from the question that seemed central to it. For example, for the question "What is the term in golf for scoring one under par?" we selected the words golf and par. For the question "What was the name of the clown on the 'Howdy Doody' television show?" the selected terms were clown and Howdy Doody. Although a pair of words was designated as central for every question, only one-third of the questions were randomly selected for priming for any given subject. All subjects were tested on all questions, but each subject received a different random selection of priming terms. After the word-frequency estimation task, subjects were randomly assigned to either the estimate condition or the answer condition of the game show experiment. We expected different effects of priming for the two tasks even though priming was hypothesized to affect feeling of knowing in all cases. In the estimate task, we expected subjects to be more inclined to think that they could answer a question and, hence, have an elevated propensity to attempt to answer questions. On the other hand, we did not expect this effect to show up in the answer condition since answer subjects should not be more likely to actually retrieve the answer.
The spurious feeling of knowing should manifest itself in longer search times before saying "can't say" for that group. The results essentially supported our hypothesis. Figure 8 plots the effect of priming on percentage of questions attempted for the two groups as a function of question difficulty. The effect of priming is the difference in proportion attempted between the primed and unprimed questions. In the estimate condition, subjects are more inclined to attempt a difficult question if it was primed. Presumably, the reason priming did not matter for easier questions is that subjects already had a feeling of knowing for these questions.* For the answer condition, there is essentially no effect of priming. The effect of priming is manifest for the answer group in the time to say "don't know" shown in Fig. 9. These subjects also have a spurious feeling of knowing, but in this case it causes them to search longer before giving up.
*In fact, the situation is slightly more complex than this. With very easy questions, subjects attempted even fewer if they were primed. We believe this is because subjects are sensitive to the priming manipulation and try to adjust their threshold for feeling of knowing if they recognize that the question has been primed. The adjustment tends to overcorrect for easy questions and undercorrect for very difficult ones. See Reder (1987) for additional discussion.
Fig. 8. Difference in percentage attempted between primed and unprimed statements. (x-axis: question difficulty, EASIER to HARDER.)
2. Are Feelings of Knowing Influenced by Self-Assessments of Topic Knowledge?
In the following study (Reder & Fabri, 1984), we were interested in seeing how one's self-assessment of expertise in an area affected one's willingness to attempt to answer a question. We developed trivia questions in four domains: movies, sports, geography, and U.S. history. We varied whether the questions contained a lot of topic-relevant terms or were fairly short in length and contained few topic-relevant terms. Our interests here were in determining whether extra concepts helped to "prime" an answer, in discovering if this priming manipulation behaved differently for feeling-of-knowing judgments as compared with just answering questions, and, finally, in determining whether extra priming or topic terms interacted with one's own assessment of expertise in an area.
Fig. 9. Time to say “don’t know” instead of attempting to answer. (x-axis: priming, PRIMED and UNPRIMED.)
Our initial intent was to control length of question while varying the number of terms in the question that pertained to one of the four categories mentioned previously; however, it proved virtually impossible to control length while varying number of concepts without significantly changing the style and structure of the questions. So, instead, we made sure that all additional priming terms occurred at the end of each question.* Examples of the short and long versions of a question are “What is the name of the longest river in the United States?” and “What is the name of the longest river in the United States whose mouth is at New Orleans?” After subjects rank ordered their knowledge of the four topics, they were randomly assigned to either the answer condition or the estimate condition. In the latter condition, they were asked to estimate their ability to answer the presented question prior to actually answering it. These subjects were told to say “yes” or “no” as quickly as they could because we were interested in their first impression of their ability to retrieve the answer to the question. Only after “yes” responses were subjects asked to give the answer.
*If questions with more primes are faster, then it could not be due to getting a good clue early.
Subjects in the answer condition spoke their answer to a question directly into the microphone. They were instructed to respond only when they were sure they had an answer in mind. If subjects did not know the answer to a question, they were told to respond “don’t know.” In both conditions, RTs were measured from the time a question appeared on the screen until the subject began to vocalize the answer. There were a number of interesting results associated with this experiment. Figure 10 displays the percentage of questions attempted as a function of task and topic familiarity. The percentage of questions that subjects attempted to answer differed as a function of the question topic, such that questions from the topics that subjects rated as their best topic were attempted more often than questions from the topics that subjects rated as their worst topic. This result in itself is not surprising; what is more interesting is that subjects’ self-classifications of relative knowledge of these topics seem to have a greater impact in the estimate condition than in the answer condition. That is, subjects seem to use their own impressions of what they know fast enough to estimate whether or not they could answer a question. Unfortunately, this effect did not reach significance.
Fig. 10. Percentage attempted as a function of self-rated topic knowledge and task. (x-axis: topic familiarity, BEST and WORST.)
A related finding was that there was more of an effect of rated topic knowledge in the answer condition upon the time to say "don't know" (see Fig. 11). This is quite similar to the effects in previously described experiments which concerned spurious feelings of knowing, where primed questions took longer to reject. Here, subjects in the answer condition took longer to say "don't know" to a fact if it concerned an area they knew more about.
C. AN UNEXPECTED RESULT FROM MANIPULATING FEELING OF KNOWING
A side result from the work designed to investigate which variables affect feeling of knowing was the phenomenon that priming the terms of a question seemed to actually increase the probability that the answerer could answer the question. That is, in one of the game show studies described previously, there seemed to be a tendency for subjects to respond more accurately in the answer condition when the question had been primed than when it had not been primed. Conceivably, priming the terms
Fig. 11. [Caption not recovered. Labels: Estimate; BEST, WORST.]
of a question not only gives one a feeling of knowing but actually raises the level of activation for relevant information such that the answer is more likely to pass over some kind of threshold necessary to elicit an answer. This result led us to do further tests to confirm the result. We conducted another study (Reder, Dennler, & Wells, 1985) where subjects were asked to generate sentences using terms that would later appear in questions they had to answer. In other words, just as in the "spurious priming" study described previously, questions were randomly selected to be primed for each subject. In the earlier study, subjects rated the terms for familiarity. Here they were required to generate original sentences using two terms. Again, subjects were not told that these terms would later appear in questions. Of those questions that were selected for priming, some were assigned to the conjoint and some to the disjoint condition. In the conjoint condition, subjects composed a sentence using two terms that came from the same eventual question; in the disjoint condition the two terms came from two different questions. For example, if the test questions were "What is the name of the clown on the Howdy Doody show?" and "In what town did Lady Godiva make her famous ride?" the primed terms might be recombined into the pairs clown-ride and Howdy Doody-Lady Godiva.* The results are displayed in Table IV. Subjects were significantly faster to answer a question in the conjoint condition, as compared with the disjoint or the unprimed condition (p < .05). Subjects were also more accurate for primed than for unprimed questions and slower to say "don't know" to primed than to unprimed questions, but these latter effects did not reach significance. It appears that the same manipulation that affects feeling of knowing may also influence a person's ability to answer a question, namely, level of activation of the relevant structures.
Note that this effect cannot be explained in terms of a simple lexical encoding effect. If the result were simply due to ease of encoding the words in the question, then the effect should be as large for the disjoint condition as the conjoint condition; on the contrary, the disjoint condition is at least as bad as the unprimed condition.
*Of the terms that were selected for priming, half were assigned to a three-sentence condition and half to a one-sentence condition. "Three" or "one" referred to the number of sentences that subjects were required to write for a given priming pair. This variable had no effect on the data.

TABLE IV
ANSWER GROUP ONLY: CORRECT ANSWER RTs (SEC), CONJOINT/DISJOINT SENTENCE-GENERATION EXPERIMENT

                      Primed
        Unprimed   Disjoint   Conjoint
Easy    3.05       3.16       2.33
Hard    3.33       3.73       2.75

VI. Conclusions
This article has presented some of the reasons for the importance of considering the strategic components of memory retrieval when developing a model of memory and question answering. The experiments described here help to identify the variables that affect this strategy selection, both when the selection appears to be under conscious control and when it appears to be an automatic by-product of processing the question. The variables extrinsic to a memory probe that influence strategy selection include prior history of success with a strategy: when the strategy is successful, the subject stays with it; when it fails, the subject tends to adopt an alternative as the preferred strategy. Other situational variables, such as explicit advice about successful strategies, task instructions, and knowledge of the age of the memories tested, can also influence this bias in strategy selection. Data were presented that support the view that strategy selection can occur while or after the question is understood, i.e., it does not have to be decided before reading the question. Evidence was reviewed that supports the idea that our feeling of knowing may be involved in this strategy selection. Several variables were shown to affect our feeling of knowing as well. Both recent exposure to concepts involved in the question and knowledge of the general topic referenced by the question affect one's estimates of ability to answer a question.
Further research is needed to uncover the mapping between feeling of knowing and strategy selection. Feeling of knowing must affect strategy selection. It seems obvious that if nothing is known about a topic, no question-answering strategy is used at all. In this case, people quickly say "don't know." Certain manipulations have been shown to influence feeling of knowing, and others have been shown to influence strategy selection. What we have not shown is that variables that manipulate feeling of knowing also influence the type of strategy attempted. Surprisingly little work has been done on strategy selection in question answering.
Work on strategy selection mechanisms is limited in general; however, there have been a few investigations in other domains (e.g., Dixon & Just, 1986; Payne, Bettman, & Johnson, 1988; Siegler & Shrager,
1984). Siegler & Shrager looked at strategy use in children’s arithmetic. They also believed that the competing strategies are carried out serially (as opposed to a parallel race with the faster strategy forcing a response). However, they also believed that for this domain, children would always first attempt the direct retrieval strategy and only try the computational strategy if the correct answer was not stored. More recently, Siegler (1987) has argued that children do, in fact, choose between the competing strategies and that they can elect to use a computational strategy even when they have an answer stored for an arithmetic problem. He posits similar types of variables that might affect strategy selection, e.g., frequency of problem presentation, knowledge of related problems, and difficulty of using the computational strategy. Payne et al. (1988; Bettman, Johnson, & Payne, 1987) have looked at adaptive strategy selection in the domain of decision making. They find that people are very adaptive at switching strategies. In this case strategy selection depends on the cognitive effort involved in a strategy, how difficulty interacts with time pressure, and the probability of error with a particular strategy. Their notion that cognitive effort and accuracy predict strategy use fits nicely with my own conception in that ease of use and accuracy of the direct retrieval strategy will vary with delay between reading and test and with task demands. Further, the probability of success with a strategy, either based on the ratio of presented to not-presented statements or based on the advice of the experimenter, is also consistent with the idea of effort/accuracy tradeoffs. As a final note, it is important that memory researchers appreciate that people do not always search memory first for an exact match to a memory probe. Therefore, we cannot answer the question of which inferences are made during reading by examining differences in latencies among types of inferences.
Some of the more recent research by McKoon and Ratcliff (1987) and Keenan, Potts, and Golding (1987) using word priming in a lexical decision task or in naming latencies provides examples of promising approaches to understanding which inferences are made while reading. Question answering after a passage is read is a useful technique for understanding the strategies people use to answer questions, but not for inferring the processes involved during reading.
VII. Appendix
This story, the “Riverboat Race,” is an example of the type of story used in the experiments. The following incident occurred in the river town of Napoleon, Arkansas.
A steamboat race was held on the 4th of July. The Kentucky was favored to beat the Walter Scott easily. The boats ran neck and neck over the first part of the course. They were stripped of all ornamentation. Then the Kentucky ran out of fuel. The captain took charge of the situation. There was no time to stop for more fuel. The crew gathered on deck. The Walter Scott began to pull far ahead. The crew of the Kentucky began to lose hope. They chopped up everything wooden on board. They worked carefully. The woodwork kept the fires going. The big stern paddlewheel went on turning. The Walter Scott was about to win. The finish line was in sight. The Walter Scott struck a sand bar. Her bow was stuck fast. All that was left of the Kentucky was the hull and engines. The captain ordered full speed ahead. She had no trouble with the sand bar. The crowd was stunned as the Kentucky crossed the finish line. The night was spent carousing. The winners of bets bought drinks for the losers. The next day business as usual resumed.
The following statements were to be judged for plausibility.
Moderately plausible statements:
  The race was a big event in the town.
  Ornamentation slowed down boats.
  Bets were being placed on the two ships.
Highly plausible statements:
  The Kentucky’s crew chopped and burned everything but the hull and the engines.
  The Kentucky’s crew was going to use the wood for fuel.
  The Kentucky won the race.
Contradictory statements:
  The Kentucky struck a sand bar.
  The losers of bets bought drinks for the winners.
  The first mate took charge of the Kentucky’s situation.
Implausible statements:
  The race occurred on a cold, windy day.
  Each boat was allotted the same amount of fuel.
  The townspeople had a very sober nature.
ACKNOWLEDGMENTS
The work reported here was sponsored by Grant BNS-0371 from the National Science Foundation and, in part, by the Office of Naval Research, Contract No. N00014-84-K-00h3, Contract Authority Identification Number NR667-529. I thank John Anderson and Gail Wells for their comments.
REFERENCES
Anderson, J. R. (1983). Retrieval of information from long-term memory. Science, 220, 25-30.
Anderson, J. R. (1985). Cognitive psychology and its implications. New York: Freeman.
Arkes, H. R., & Freedman, M. R. (1984). A demonstration of the costs and benefits of expertise in recognition memory. Memory and Cognition, 12, 84-89.
Baddeley, A. (1986). Oxford Psychology Series, No. 11. Working memory. New York: Oxford University Press.
Bettman, J. R., Johnson, E. J., & Payne, J. W. (1987). Cognitive effort and decision making strategies: A componential analysis of choice. Unpublished manuscript, Duke University.
Chiesi, H. L., Spilich, G. J., & Voss, J. F. (1979). Acquisition of domain-related information in relation to high and low domain knowledge. Journal of Verbal Learning and Verbal Behavior, 18, 257-273.
Dixon, P., & Just, M. A. (1986). A chronometric analysis of strategy preparation in choice reactions. Memory and Cognition, 14, 488-500.
Donders, K. A., Schooler, J. W., & Loftus, E. F. (1987). Troubles with memory. Talk presented at the 28th Annual Meeting of The Psychonomic Society, Seattle, WA.
Dooling, D. J., & Christiaansen, R. E. (1977). Episodic and semantic aspects of memory for prose. Journal of Experimental Psychology: Human Learning and Memory, 3, 428-436.
Hebb, D. O. (1961). Brain mechanisms and learning. In J. F. Delafresnaye (Ed.), Distinctive features of learning in the higher animal. New York: Oxford University Press.
Keenan, J. M., Potts, G. R., & Golding, J. M. (1987). Methods for assessing the occurrence of elaborative inferences. Talk presented at the 28th Annual Meeting of The Psychonomic Society, Seattle, WA.
Lewis, C. H., & Anderson, J. R. (1976). Interference with real world knowledge. Cognitive Psychology, 7, 311-335.
Loftus, E. F. (1979). The malleability of memory. American Scientist, 67, 312-320.
Loftus, E. F., Miller, D. G., & Burns, H. J. (1978). Semantic integration of verbal information into a visual memory. Journal of Experimental Psychology: Human Learning and Memory, 4, 19-31.
Loftus, E. F., Schooler, J. W., & Wagenaar, W. (1985). The fate of memory: Comment on McCloskey and Zaragoza. Journal of Experimental Psychology: General, 114, 375-380.
McCloskey, M., & Zaragoza, M. (1985). Misleading postevent information and memory for events: Arguments and evidence against memory impairment hypotheses. Journal of Experimental Psychology: General, 114, 3-18.
McKoon, G., & Ratcliff, R. (1987). Semantic association and inference processes. Talk given at the 28th Annual Meeting of The Psychonomic Society, Seattle, WA.
Nelson, T. O., Gerler, D., & Narens, L. (1984). Accuracy of feeling-of-knowing judgments for predicting perceptual identification and relearning. Journal of Experimental Psychology: General, 113, 301-323.
Owens, J., Bower, G. H., & Black, J. B. (1979). The "soap opera" effect in story recall. Memory and Cognition, 7, 185-191.
Payne, J. W., Bettman, J. R., & Johnson, E. J. (1988). Adaptive strategy selection in decision making. Journal of Experimental Psychology: Learning, Memory, and Cognition, in press.
Peterson, S. B., & Potts, G. R. (1982). Global and specific components of information integration. Journal of Verbal Learning and Verbal Behavior, 21, 403-420.
Postman, L. (1971). Organizing and interference. Psychological Review, 78, 290-302.
Postman, L., & Underwood, B. J. (1973). Critical issues in interference theory. Memory and Cognition, 1, 19-40.
Prytulak, L. S. (1971). Natural language mediation. Cognitive Psychology, 2, 1-56.
Reder, L. M. (1976). The role of elaborations in the processing of prose. Doctoral dissertation, University of Michigan. Available through University Microfilms, Ann Arbor.
Reder, L. M. (1979). The role of elaborations in memory for prose. Cognitive Psychology, 11, 221-234.
Reder, L. M. (1982). Plausibility judgments vs. fact retrieval: Alternative strategies for sentence verification. Psychological Review, 89, 250-280.
Reder, L. M. (1987). Strategy selection in question answering. Cognitive Psychology, 19, 90-138.
Reder, L. M., & Anderson, J. R. (1982). Effects of spacing and embellishment on memory for the main points of a text. Memory and Cognition, 10, 97-102.
Reder, L. M., Dennler, C., & Wells, G. (1985). Conjoint vs. disjoint priming experiment. Unpublished raw data, Carnegie-Mellon University.
Reder, L. M., & Fabri, S. (1984). World knowledge experiment. Unpublished raw data, Carnegie-Mellon University.
Reder, L. M., & Ross, B. H. (1983). Integrated knowledge in different tasks: The role of retrieval strategy on fan effects. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9, 55-72.
Reder, L. M., & Wells, G. (1986). Mixed delay experiment. Unpublished raw data, Carnegie-Mellon University.
Reder, L. M., & Wible, C. (1984). Strategy use in question-answering: Memory strength and task constraints on fan effects. Memory and Cognition, 12, 411-419.
Siegler, R. S. (1987). How domain-general and domain-specific knowledge interact to produce strategy choices. Unpublished paper, Carnegie-Mellon University.
Siegler, R. S., & Shrager, J. (1984). Strategy choices in addition and subtraction: How do children know what to do? In C. Sophian (Ed.), Origins of cognitive skills: The eighteenth annual Carnegie Symposium on Cognition. Hillsdale, NJ: Erlbaum.
Smith, E. E., Adams, N., & Schorr, D. (1978). Fact retrieval and the paradox of interference. Cognitive Psychology, 10, 438-464.
Sulin, R. A., & Dooling, D. J. (1974). Intrusion of a thematic idea in retention of prose. Journal of Experimental Psychology, 103, 255-262.
Tversky, B., & Tuchin, M. (1987). Memory impairment by misleading postevent information: A reconciliation. Talk presented at the 28th Annual Meeting of The Psychonomic Society, Seattle, WA.
ALTERNATIVE REPRESENTATIONS Ruth S. Day
I. Introduction
For a long time the concept of information processing was the core concept in cognitive psychology; it focused attention on processes such as the acquisition, storage, and retrieval of information. Then the concept of representation emerged to take its place alongside information processing (and perhaps even surpass it). For example, Anderson and Bower (1973, p. 51) contended that "The most fundamental problem confronting cognitive psychology today is how to represent theoretically the knowledge that a person has." Although attention was still paid to processes involved in using a given representation, focus clearly shifted toward structural aspects of knowledge, both in terms of its content and procedures for using it. As more investigators turned to problems of representation, basic textbooks began to treat it as a core problem as well (e.g., Reed, 1982; Reynolds & Flagg, 1983; Anderson, 1985; Glass & Holyoak, 1986; Solso, 1988). Previously, the term was often confined to discussions of the Gestalt tradition in problem solving (e.g., Duncker, 1945).
A. TYPICAL APPROACHES
A major goal in much of the research on representation has been to determine what the representation is: Do people use linguistic, spatial, propositional, or other forms of representation? In some cases, the search for operative representations took place within specific task environments; for example, Clark (1969) argued that subjects solve three-term series
problems using a linguistic representation, while Huttenlocher (1968) argued that they use a spatial representation. In other cases, the search was extended to broader areas of cognition, such as semantic memory or mental imagery. For example, some investigators argued that people generally store information in imagistic or spatial form (e.g., Paivio, 1971; Bower, 1972; Kosslyn, 1980; Yuille, 1983) or in propositional form (e.g., Anderson & Bower, 1973; Pylyshyn, 1973; Kintsch, 1974; Norman & Rumelhart, 1975). Rumelhart and Norman (in press) distinguished four families of representational systems: propositional, analogical, procedural, and distributed knowledge systems. Each has its own vigorous advocates. A popular approach in studying representation is to have subjects perform a task and then try to determine what representation they used. This approach is often useful, especially when the task is set up to examine contrasting theories of representation. Sometimes the focus is on whether or not subjects used “the correct” representation, especially in problem-solving tasks; thus, subjects who represent a problem one way may be more likely to solve it than those who do so another way (e.g., Wickelgren, 1974).
B. THE ALTERNATIVE REPRESENTATIONS APPROACH
The research strategy reported here takes a different approach. Instead of giving subjects a task to perform and then trying to determine what representation they used, we give them a particular external representation at the outset; subjects study this representation, then perform the task. We also provide alternative representations for the same information in order to assess how different formats affect performance. Types of representational formats include lists, matrices, outlines, tree diagrams, networks, and various types of spatial and pictorial formats. No claim is made for a one-to-one correspondence between the external representation provided and subjects’ internal representation. However, if we obtain systematic, robust performance differences across alternative representations, we can conclude that the internal representation is more similar to the format subjects studied than to other possible representations. Then we can study how specific properties of each representation affect performance in various cognitive tasks. Another feature of this approach is that it assesses the efficacy of a given representation across several tasks, including perception, memory, comprehension, and problem solving. Research using the alternative representations approach may well provide new ways to devise and test theories about the nature of human representation.
Although the present approach is quite different from that typically used in representation research, it retains some concerns common to older traditions. Underwood (1963) argued that the stimulus experimenters present to subjects may be only a nominal stimulus; that is, subjects may encode it in various ways so that the stimulus they actually use differs from the one provided. The nominal-functional stimulus distinction has been quite useful in understanding results from many paradigms using simple stimulus ingredients such as nonsense syllables and numbers (e.g., Martin, 1971). The present research acknowledges that subjects may also encode more complex information in many different ways; however, it provides subjects with alternative representations in order to observe potential differential effects more directly.
This article reports experiments using information from three quite different domains: bus schedules, medication instructions, and computer text editing. Each set of experiments begins with an everyday problem in which people want to perform some task but often do so in an inaccurate, inefficient, and/or frustrated way; in each case, the representation of relevant information appears to be a significant part of the difficulty. One of the alternative representations we test is always taken from the everyday situation; alternative representations of the same information are then devised (hopefully) to make the everyday task easier. These improvements are typically based on well-known concepts or principles in cognitive psychology (such as semantic grouping). Then a task is devised which mimics the real-world situation but takes place in a more controlled experimental environment. Therefore, the alternative representations approach is both “ecologically valid” (using real-world problems and representations) and experimentally controlled (achieving appropriate control over both representations and tasks). In the three domains presented in the following sections, only initial experiments from longer sets of studies are presented.
These initial experiments ask whether or not alternative representations of the same information affect performance, based on principles of potential interest for theories of representation. Follow-up experiments are then designed to determine how these alternative representations affect cognitive processes; for example, do they affect acquisition, storage, and/or retrieval? Such follow-up experiments are reported elsewhere (Day, in preparation), along with related projects in many other domains. Certain methods are common to the experiments reported here. Subjects were undergraduates at Duke University participating as part of a basic psychology course (elsewhere in this research, other types of subjects are included, such as experts in the domains studied). Subjects were assigned randomly to representation conditions, were not allowed to write on the display provided during the initial study phase of the experiment, and were not allowed (with one exception) to reinspect the display during subsequent test phases.
Ruth S. Day
The next sections present experiments from the bus schedule, medication instructions, and text-editing domains. Subsequent sections provide an overview of the experiments, suggest a research strategy, and sketch a general view of human representation.

II. Bus Schedules

A. THE PROBLEM
The travel industry claims to reduce the mental and physical demands on travelers. We do not have to stay alert to guide a vehicle during a trip, plan routes, monitor fuel consumption, or worry about many other aspects of travel. Thus, we are urged to sit back, relax, and “fly the friendly skies” with an airline or “leave the driving” to a bus company. Nevertheless, these systems frequently create new problems, such as making it difficult to find and select the plane, train, or bus that best fits our schedule. For example, how would you select the best bus to get from New York City to Baltimore in time for dinner tomorrow using the schedule shown in Fig. 1a? The schedule itself¹ suggests that the answer is “with considerable difficulty.” For some departure and destination points it is hard to tell whether you can get there at all on a particular day.

The bus schedule shown in Fig. 1 poses many problems for the user. For example, it is difficult to distinguish and interpret the various type fonts and decipher the small print. Even if these surface problems were eliminated, it would still be easy to make errors, such as selecting a bus which does not run on the day you want to travel or one which has no meal stop during a long haul. A major problem with this display is that there are many footnotes on the basic schedule matrix. Some footnotes place restrictions on the matrix information (e.g., “Fridays only”) while others elaborate it (e.g., “rest stop”). Although a list of footnote symbols and their definitions is provided on a separate page of the schedule folder, it introduces yet new problems, as shown in Fig. 1b. Some problems with the footnote definition list are local in nature. For example, some restrictions or elaborations are useful to many passengers (e.g., whether there is a meal stop), while others are useful to very few (e.g., “Operates via Walt Whitman Bridge and I-295”).
Other definitions are ambiguous, such as “Stops only to discharge passengers at agency or in town.” Does this mean that there are other places buses generally stop, that the passengers can get off but cannot take their baggage, that there is no package delivery service, and/or something else? The major problems with the footnote display, however, stem from the overall way the information is represented and the consequent demands it places on working memory and scanning procedures. These problems are discussed in the next section.

¹Greyhound Bus Schedule, 1982.
B. ALTERNATIVE REPRESENTATIONS

Assuming that all the footnotes and their exact definitions must be provided (which is certainly arguable), two aspects of the display cause processing difficulties for the user. The entries provide information about various semantic categories (such as holiday restrictions), yet items within each category are often scattered unsystematically throughout the list. Also, the relationship between a given symbol and its definition is sometimes meaningful (e.g., the symbol TNJ means "Transport of New Jersey") and sometimes arbitrary (e.g., a cloverleaf symbol means "Operates via I-295 between NJTP Exit #4 and Delaware Memorial Bridge"). Furthermore, some symbols are difficult to encode linguistically and/or are visually confusable with others (e.g., the squiggly symbol for "Sundays only" versus a similar one for "Saturday and Monday only"), thereby making it difficult for the user to remember the symbol when searching for its definition on the footnote page. Admittedly, it is impossible to provide intrinsically meaningful symbols for all the definitions, yet a more systematic selection of symbols within categories is possible.

The haphazard arrangement of symbols in the footnote list and their often arbitrary nature can easily produce the following scenario. Users find a footnote symbol next to a potential selection in the schedule matrix and must remember what it looks like as they scan the footnote list. Since many symbols have little or no meaning in themselves (e.g., the squiggly lines), bear no meaningful or even a misleading relationship to their definitions (e.g., a for "Mondays only"; h for "holds for connection" but not "holidays"), and/or are visually confusable with other symbols (e.g., triangles pointing up and down), it is easy to forget the target symbol; this may result in selecting the wrong definition or in having to reconsult the schedule page and then rescan the footnote list.
Thus, the display places a heavy load on working memory and can easily lead to error and time-consuming backtracking. Also, users must scan many entries in the unorganized list and compare those symbols with their memory of the target symbol; such search is inefficient and, again, quite time consuming. In order to convey the footnote information more effectively, alternative representations were devised. They varied systematically in terms of semantic grouping and symbol meaning, as illustrated at the top of Fig. 2. All displays retained the definitions provided in the original bus schedule folder, but represented them in different ways. To facilitate legibility, Fig.
2 shows details of only one alternative representation (the grouped-meaningful display). Additional information concerning all of the displays is provided in the following sections.

Fig. 1. Excerpts from a bus schedule folder: (a) Typical schedule matrix. (b) Footnotes (including typeface conventions) and their definitions provided on a separate page of the schedule folder; although the information has been retyped to enhance legibility, the symbols, definitions, and spatial arrangement of entries are the same as in the original bus schedule folder.

[Fig. 1a, the schedule matrix, is not reproduced. The entries of Fig. 1b follow in their extracted order; the original layout is not preserved, and pictorial symbols that did not survive are shown as [symbol].]

AM - Light Face Type. PM - Bold Face Type.
Times shown in ITALICS indicate service via connecting schedule.
All schedules operate daily unless otherwise noted.
f - Flag stop - bus will stop on signal to pick up and discharge passengers.
GL - Greyhound Lines, Inc.
TNJ - Transport of New Jersey.
MoF - Mondays and Fridays only.
Fri - Fridays only.
TWTSa - Tuesdays, Wednesdays, Thursdays and Saturdays only.
k - Saturdays only.
FSU - Fridays and Sundays only.
[symbol] or SaMo - Saturday and Monday only.
ESuH - Daily except Sundays and Holidays.
[symbol] - Daily except Saturdays and Sundays.
[symbol] - Rest stop.
[symbol] - Saturdays and Sundays only.
[symbol] - Meal stop.
R - See Restrictions.
[symbol] or Sun - Sundays only.
a - Mondays only.
[symbol] - Interstate service only see restriction.
EssH - Daily except Saturdays, Sundays and Holidays.
[symbol] - Agency station handling tickets only.
[symbol] - Full service agency handling tickets, baggage and express, including C.O.D. express.
[symbol] - Agency handling tickets, all express, but no baggage.
[symbol] - Operates via Benjamin Franklin Bridge and NJTP #3 on Sundays only.
[symbol] - Operates via Walt Whitman Bridge & I-295 on Sun. only.
[symbol] - Operates via I-295 between NJTP Exit #4 and Delaware Memorial Bridge.
h - Holds for connection.
HS - Highway stop does not go into town or via agency.
D - Stops only to discharge passengers at agency or in town. Times shown are approximate.
W - Stops to discharge passengers and package express only. Times shown are approximate.
[symbol] - Will operate Wednesday, November 25, Thursday, December 24 and Thursday, December 31 instead of Friday, November 27, December 25 and January 1.
[symbol] - Will operate Thursday, November 26, Friday, December 25 and Friday January 1 instead of Saturday, November 28, December 26, and January 2.
[symbol] - Will not operate Thursday, November 26, Friday, December 25 or Friday, January 1.

Fig. 2. Alternative representations of the bus schedule footnotes. Top portion defines the four displays, which vary in terms of semantic grouping (0 = ungrouped, + = grouped) and the relationship between the symbol and its definition (arbitrary, meaningful). The rest of the figure shows the display which differs most from the original footnote list (Fig. 1b). Where possible, symbols in the meaningful displays provide intrinsic cues for their definitions; other cases provide cues only about category membership (e.g., holiday changes).

[The top portion of Fig. 2 is not reproduced. The entries of the grouped-meaningful display follow; category titles did not survive, and pictorial symbols that did not survive are shown as [symbol].]

am - morning. pm - afternoon/evening.
Times shown in bold face indicate service via connecting schedule.

[symbol] - Greyhound.
TNJ - Transport of New Jersey.

FS - Flag stop - bus will stop on signal to pick up and discharge passengers.
HS - Highway stop - does not go into town or via agency.
CS - Connection stop - bus holds for connection.
DS - Discharge stop for passengers and package express only. Times shown are approximate.
AS, TS - Agency stop or town stop, to discharge passengers only. Times shown are approximate.

All schedules operate daily unless otherwise noted.
R - See Restrictions.
✓ - Operates ONLY on these days.
X - Operates daily EXCEPT on these days.
M - Monday; T - Tuesday; W - Wednesday; Th - Thursday; F - Friday; Sa - Saturday; Su - Sunday; H - Holiday.
EXAMPLES: ✓MF - Operates Mondays and Fridays only. XSuH - Operates daily except Sundays and Holidays.

[symbol] - Will operate Wednesday, November 25, Thursday, December 24 and Thursday, December 31 instead of Friday, November 27, December 25 and January 1.
[symbol] - Will operate Thursday, November 26, Friday, December 25 and Friday January 1 instead of Saturday, November 28, December 26, and January 2.
[symbol] - Will not operate Thursday, November 26, Friday, December 25 or Friday, January 1.

[symbol] - Rest stop.

[symbol] - Interstate service only see restriction.
[symbol] - Operates via Benjamin Franklin Bridge and NJTP #3 on Sundays only.
[symbol] - Operates via Walt Whitman Bridge & I-295 on Sun. only.
[symbol] - Operates via I-295 between NJTP Exit #4 and Delaware Memorial Bridge.

[symbol] - tickets only.
[symbol] - tickets and express.
[symbol] - tickets, baggage, express, and C.O.D.
1. Semantic Grouping

In the grouped displays, items belonging to a given semantic category are grouped together spatially beneath a title identifying their general content and separated from other categories by blank space. Thus, they group the entries into seven semantic categories while the nongrouped displays show 35 separate entries listed in unsystematic order.

2. Symbol Meaning
The meaningful symbol displays contain several changes. Where possible, the footnote symbol is meaningful with respect to its definition. Such symbols are meaningful abbreviations (e.g., FS means “flag stop,” HS means “highway stop”) or meaningful icons (e.g., the fork means “meal stop”). It is difficult and perhaps impossible to devise meaningful symbols for some categories; nevertheless, category membership is denoted by using visually similar symbols (e.g., holiday changes are all represented by filled geometric shapes). This approach creates meaning about category membership even though intrinsically meaningful symbols are not provided for individual items. The station services category combines an arbitrary convention (a circle plus lines) with a meaningful one (more lines through the circle indicate more services at the station). Thus, symbol meaning is achieved in a variety of ways, with all symbols “meaningful” to some extent and, often, much more meaningful than in the original display.

The meaningful symbol displays also systematize the relationships between footnote symbols and their definitions. Each definition has only one symbol, whereas in the original display a given definition may have multiple symbols (e.g., “Sundays only” is indicated by both a squiggly line and by the abbreviation Sun). Each day of the week has only one abbreviation, whereas the original display often has several (e.g., S, K, and s are all used for restrictions involving “Saturday”). Two morphemes are provided for each daily schedule restriction, one for the day(s) of the week and one for the type of restriction; thus, a check mark indicates “only on these days” while an X indicates “except these days.” The resulting two-part symbols greatly simplify representation of schedule restrictions, yet still provide the same basic information as the original display. All of these changes are designed to enhance the meaningfulness of the footnote symbols.
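The two-part restriction symbols can be read mechanically, which is one way to see why they reduce memory load. The following is a minimal sketch in Python (not part of the original study) that splits a code such as ✓MF or XSuH into a restriction type and a list of days. The day abbreviations and the two restriction marks come from the grouped-meaningful display; the function name `decode_restriction`, the table name `DAY_ABBREVIATIONS`, and the rule of trying two-letter abbreviations before one-letter ones are illustrative assumptions.

```python
# Day abbreviations as listed in the grouped-meaningful display.
DAY_ABBREVIATIONS = {
    "M": "Monday", "T": "Tuesday", "W": "Wednesday", "Th": "Thursday",
    "F": "Friday", "Sa": "Saturday", "Su": "Sunday", "H": "Holiday",
}

CHECK_MARK = "\u2713"  # the check mark meaning "only on these days"
RESTRICTION_TYPES = {CHECK_MARK: "only", "X": "except"}


def decode_restriction(code):
    """Split a two-part restriction symbol into (type, list of days).

    Hypothetical helper: the parsing rule (longest abbreviation first)
    is an assumption made for this sketch.
    """
    if not code or code[0] not in RESTRICTION_TYPES:
        raise ValueError("code must start with a check mark or X")
    kind = RESTRICTION_TYPES[code[0]]
    rest = code[1:]
    days = []
    i = 0
    while i < len(rest):
        two = rest[i:i + 2]
        if len(two) == 2 and two in DAY_ABBREVIATIONS:
            # Two-letter abbreviations (Th, Sa, Su) take precedence.
            days.append(DAY_ABBREVIATIONS[two])
            i += 2
        elif rest[i] in DAY_ABBREVIATIONS:
            days.append(DAY_ABBREVIATIONS[rest[i]])
            i += 1
        else:
            raise ValueError("unknown day abbreviation at position %d" % i)
    return kind, days


print(decode_restriction(CHECK_MARK + "MF"))
print(decode_restriction("XSuH"))
```

Under these assumptions, the first call yields the "only" type with Monday and Friday, and the second yields the "except" type with Sunday and Holiday, matching the two examples given in the display.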
3. All Displays

To summarize, there were four displays which varied in terms of semantic grouping (grouped, nongrouped) and symbol meaning (meaningful, arbitrary), as shown at the top of Fig. 2. Note that the original display is the least systematic (nongrouped-arbitrary) while the display shown at the bottom of Fig. 2 is the most systematic (grouped-meaningful).

C. SUBJECTS AND PROCEDURE

Subjects were assigned randomly to one of the four representation conditions. There were 6 in each condition except for the nongrouped-arbitrary condition which had 7, yielding 25 subjects in all. Subjects were asked to study the footnote display and learn the meaning of all symbols; they had 5 min to do so. Then they participated in two tasks. In the cued recall task, subjects saw the 35 symbols they had studied (in random order) and had to write a definition for each; they had 10 min to complete this task. They then did the matching task, in which they saw a list of the symbols next to a randomized list of their definitions; they had to write the number of the appropriate definition in the blank next to each symbol and had 10 min to do so.
D. RESULTS AND DISCUSSION

1. Cued Recall Task
The cued recall task was quite difficult, with overall performance only 55% correct. Semantic grouping facilitated performance, with subjects who studied grouped displays recalling 61% of the footnote definitions and those who studied the nongrouped definitions recalling only 49% [F(1, 24) = 5.72, p < .05], as shown on the left side of Fig. 3. This result is consistent with research showing that organized lists are easier to learn than unorganized ones. Most of those studies used familiar words as stimuli, while the present one used unfamiliar footnote symbols and definitions. Many previous studies used familiar semantic categories such as minerals and sophisticated organizational principles such as hierarchies (e.g., Bower, Clark, Lesgold, & Winzenz, 1969), while the present one used unfamiliar categories and only simple categorical organization. Thus, the bus schedule results extend the generality of the finding that organized lists are easier to remember than unorganized ones. Despite the fact that many of the footnote definitions were sometimes difficult to comprehend, they were easier to recall when they were represented by meaningful footnote symbols (69% correct) than by arbitrary ones (43%) [F(1, 24) = 32.31, p < .0001]. A long line of studies going back to Ebbinghaus (1964, originally published in 1885) demonstrates that the
meaningfulness of items affects their acquisition. Typically, subjects had to learn lists or pairs of items (nonsense syllables or words) scaled for meaningfulness (e.g., Noble, 1952). The present study tested recall not of the high- or low-meaningful symbols themselves, but their ability to cue fairly long, complex, and unfamiliar bus schedule conditions. The fact that symbol meaningfulness affected memory for more complex information extends the generality of another well-known principle.

Both representational factors, semantic grouping and symbol meaning, clearly affected cued recall performance (with no interaction between them). However, it is not clear at what stage these effects occurred: during acquisition, storage, or retrieval of the bus schedule information. The grouped displays may have enabled subjects to encode the definitions more thoroughly or efficiently, establish more structured memory representations, and/or search memory more efficiently. The meaningful symbols may have enabled subjects to form associations between symbols and definitions more readily, store the symbols in more robust form, and/or decode them more easily during recall. The meaningful symbol displays also lessened memory load, since they eliminated multiple symbols for the same definitions and also made it clear which letters stood for days of the week by preceding them with an inclusionary or exclusionary symbol (thus the symbol FS indicating “flag stop” would not be misinterpreted as a “Friday and Saturday” restriction). The condition which combined semantic grouping with meaningful symbols further reduced memory load. Whereas the other conditions listed 35 entries (including three typeface conventions), the grouped-meaningful condition listed only 23 of the original entries (including one typeface convention), two new definitions ("only on these days" and "except on these days"), and eight familiar yet consistent abbreviations for the days of the week plus "holidays," yielding 33 entries in all. Thus, this condition gave somewhat fewer entries, simpler entries, and two simple rules for generating all possible daily restrictions rather than a complex array of symbols for this category such as TWTSa, 0, a, EssH, and squiggly lines.

Fig. 3. Percent correct performance from the bus schedule experiment for representations which varied in terms of semantic grouping (0 = ungrouped, + = grouped) and symbol meaning. Left side shows data from the cued recall task; right side shows data from the matching task.

[Plot not reproduced: panel title CUED RECALL; x-axis SEMANTIC GROUPING (0, +).]
2. Matching Task

A large part of the problem in performing the cued recall task is that subjects must remember original definitions, which are often complex and/or ambiguous, as well as how they are paired with the footnote symbols. The matching task was designed to eliminate the reproductive memory component; the original definitions were provided and subjects simply had to match them to the appropriate footnote symbols. As shown on the right side of Fig. 3, overall performance was 73% correct, up 18 percentage points over that for the cued recall task, largely because there was less to remember.² The matching task is more like a recognition task than a recall task, and indeed these results are similar to those typically found when recognition and recall tasks are compared (Klatzky, 1980, provides a useful review of this contrast). The matching performance of subjects who studied the meaningful symbols was excellent, with an overall average of 90% correct. Although subjects who studied the arbitrary symbols also improved, they averaged only 58% correct [F(1, 24) = 27.73, p < .0001]. However, semantic grouping did not affect performance [F(1, 24) = 2.37, n.s.], nor was there any interaction between the two experimental variables [F(1, 24) = 0.85, n.s.]. Thus, the two representational factors had differential effects in the two tasks: symbol meaning affected performance in both tasks, while semantic grouping only affected cued recall performance. These results suggest that some representational conventions may, in general, affect a wider range of memory tasks than others. Additional studies are needed to confirm this possibility, perhaps using more traditional paradigms to contrast recall and recognition.

3. General Discussion
The experiment reported here is just a first step: it shows that alternative representations can affect memory for bus schedule information. Admittedly, the original footnote display is quite awful, but many displays of information in everyday life are awful. The fact that even very bright people³ have trouble working with such displays and can be helped considerably when using alternative representations is important. The same general finding, that alternative representations affect cognitive processes, occurs in situations involving far less information, as shown in the experiments from other domains presented shortly.

With this initial experiment in hand, the next step is to study how the four bus schedule representations affect problem-solving situations like those encountered by actual travelers. The importance of representation in problem solving has been acknowledged for a long time, from the Gestalt psychologists to the modern approach established by Newell and Simon (1972). However, much of this work has focused on what happens when subjects do or do not have “the correct representation” for a given problem (Anderson, 1985, p. 222, provides an overview of this tradition). In contrast, work in progress using bus schedules examines the differential effects of various alternative representations on problem solving. Briefly, subjects study one of the footnote lists, then answer questions about specific travel problems while viewing the schedule matrix. In some cases they select the best bus from the original schedule given certain constraints (e.g., departure location, destination location, maximum transit time, approximate arrival time). In other cases, they answer questions about transit details of a given bus (e.g., “If you leave at 9:30 A.M. from Washington, D.C., how many rest stops will there be before your arrival at Union City, NJ?”). Dependent variables include percent correct, solution time, and (when permitted) whether subjects reconsult the footnote definition page.

²However, practice effects associated with doing the matching task after cued recall may have contributed somewhat to this increase.
Cases where footnotes on the schedule are not relevant for solution of particular problems are especially interesting; although people may not fully learn the footnote definitions during the study period, they may learn enough about the type of symbols used in each semantic category to know when a given symbol can be ignored (e.g., holiday restrictions). The main goal of this follow-up work is to determine whether alternative representations affect problem solving as well as memory.

4. Practical Applications
The bus schedule experiments suggest some practical applications. Simple modifications of current footnote practices may enable schedule users to find needed information more quickly and with fewer errors. Consequently, they may need to ask fewer questions of station agents (who often reply gruffly that “it’s all in the schedule, just read it”), thereby reducing staffing costs and the frequent complaints of travelers. Similar changes may be beneficial in the train, airline, and travel industry in general, thereby enabling people to spend less time making selections, yet making better ones. For example, the display conventions in the airlines' Official Airline Guide are difficult for both travelers and travel agents to use, resulting in time-consuming searches and some (subsequently) disappointing selections. In contrast, the American Automobile Association's Tour Books have few footnotes on hotel/motel listings and the symbols are often intrinsically meaningful (e.g., 2P/1B 77.00- means that "a room for two persons with one bed begins at $77.00"); nevertheless the user must still use a footnote key to understand why, for example, the University Inn in Pittsburgh is 40, X f , 10, F. In fact, any time a list of footnotes accompanies a larger document (other than numbered footnotes, as in manuscripts), the principles of semantic grouping and symbol meaning are potentially important. These principles may also affect the acquisition of other types of symbols across many domains (such as chemistry, mathematics, and computer commands), as well as new vocabulary and abbreviations in all content domains. Overviews of such to-be-acquired information are often provided in list form without grouping by semantic category and/or using sufficiently meaningful symbols.

³Duke students have combined Scholastic Aptitude Test scores above 1300.

III. Medication Instructions

A. THE PROBLEM
Every day, millions of people take prescription drugs and many of them do so incorrectly. They may forget to take pills, take too many, take them at the wrong times, or fail to follow relevant restrictions (such as avoiding alcohol). Hundreds of patient compliance studies show a dangerous trend. For example, one review found that the percentage of patients who complied with their medication instructions ranged from 18% to 89% (even for acute illnesses), with an average of 46% across various long-term illnesses (Sackett & Haynes, 1976). Another review found average compliances of 62% for short-term regimens, 57% for long-term preventive regimens, and 54% for long-term treatment regimens (Sackett & Snow, 1979). Similarly gloomy findings were reported in the definitive work on health care compliance which summarized 537 original articles (Haynes, Taylor, & Sackett, 1979). Although there are many methodological problems in the patient compliance literature, such as verifying which pills patients have and have not taken (see Epstein & Cluss, 1982, for a review), best estimates indicate that noncompliance is widespread and often has dire consequences, with patient error introducing new health complications or even death.
Patient medication errors are especially frequent among older adults. Approximately 75% of people aged 65 and above take one or more prescription drugs. One government publication estimates that more than half of older patients make errors, even when they are given explicit medication instructions (HEW, 1978). It also points out that about 22% of patients over 65 are admitted to hospitals each year due to their own medication errors. In addition to the resulting anguish caused such patients and their families, the price tag is staggering. For example, in 1982 Medicare spent $42 billion for in-hospital treatment, with approximately $9.24 billion of this due to patient medication errors. Adding in coverage through Medicaid, the Veterans Administration, and private sources pushes this estimate to about $12 billion.

Why do people have such difficulty taking their medications, especially when it can cause dire health and financial consequences? Most of the blame is usually heaped on the patients themselves; presumably they lack sufficient education, motivation, or (especially if they are over 65) memory ability. However, examination of medication instructions provided by health professionals suggests that representational format may be a significant part of the problem.
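The $9.24 billion figure appears to follow from applying the 22% admission figure to the $42 billion Medicare total. The text does not state this calculation, so the link is an inference. A short Python sketch of the arithmetic, with variable names chosen here purely for illustration:

```python
# Figures quoted in the text, in billions of US dollars (1982).
medicare_in_hospital = 42.0
total_all_payers = 12.0  # "about $12 billion" once other payers are added

# Assumption: the 22% admission figure is the share applied to the total.
share_due_to_medication_errors = 0.22

medicare_error_cost = round(
    medicare_in_hospital * share_due_to_medication_errors, 2
)
# Implied remainder attributed to the other payers (not stated in the text).
other_payers_cost = round(total_all_payers - medicare_error_cost, 2)

print(medicare_error_cost)
print(other_payers_cost)
```

On this reading, the Medicare share reproduces the quoted $9.24 billion, and roughly $2.76 billion would remain for Medicaid, the Veterans Administration, and private sources combined.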
B. ALTERNATIVE REPRESENTATIONS

The instructions provided by an actual doctor to a patient are shown on the left side of Fig. 4. The patient had suffered a mild stroke, responded well to hospital treatment, and received these written instructions when
he left the hospital. Over the next few days, he had difficulty remembering what pills to take, as well as what pills he had already taken. It would be easy to blame the patient; after all, he was 81 years old and had just had a stroke. However, he was highly intelligent, was still working full time (and had even begun a new and demanding career a few years earlier), was not otherwise disoriented, and was highly motivated to return to work and an active life-style. Modification of his written medication instructions as shown on the right side of Fig. 4 improved his memory and eliminated further errors, thereby suggesting that his doctor’s representation was largely responsible for his previous difficulties. The original and modified representations served as stimuli in the present experiment.

Fig. 4. Alternative representations for some medication instructions. The list format is identical to one given to an actual patient by his doctor.

LIST FORMAT
Inderal - 1 tablet 3 times a day
Lanoxin - 1 tablet every a.m.
Carafate - 1 tablet before meals and at bedtime
Zantac - 1 tablet every 12 hours (twice a day)
Quinaglute - 1 tablet 4 times a day
Coumadin - 1 tablet a day

MATRIX FORMAT
[Matrix not reproduced.]

1. List Format
The doctor’s display shown on the left side of Fig. 4 was used as one of the representations in the experiment. The medications are shown in list form, with phrases providing specific instructions for each. In order to illustrate some problems with this representation, try to answer the following questions by consulting the list format:
It is 12:00 noon; which pills should you take? If you leave home in the afternoon and will not be back until breakfast time the next day, how many Inderal should you take along?

It takes a considerable amount of time to answer such questions (which occur in the everyday life of patients), and it is easy to answer them incorrectly. This is true when the written instructions are present and even more so when they are not, which is the usual state of affairs. Part of the problem is that the instructions are given in different ways, namely, in terms of the number of pills per day (Inderal, Zantac, Quinaglute, Coumadin), general time zones (Lanoxin), amount of time since last pill (Zantac), or associated with everyday events such as meals (Carafate). Another problem is that the list format makes it difficult to see what pills are taken together. In order to solve these problems, a new representation was devised for these same instructions.
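The difficulty with the list can be made concrete with a small sketch. In the Python fragment below, each medication is keyed to daily events, so that both questions posed above reduce to simple lookups. The assignment of pills to events is a hypothetical reading of the list instructions (for example, Coumadin's single daily tablet is placed at bedtime arbitrarily, and "every 12 hours" is approximated by breakfast and bedtime); it is not taken from the doctor's instructions. The names `SCHEDULE`, `pills_at`, and `doses_needed` are likewise illustrative.

```python
# Hypothetical event-keyed schedule: which daily events each pill is tied to.
# The cell placement is an assumption made for this sketch.
SCHEDULE = {
    "Inderal": {"breakfast", "lunch", "dinner"},                # 3 times a day
    "Lanoxin": {"breakfast"},                                   # every a.m.
    "Carafate": {"breakfast", "lunch", "dinner", "bedtime"},    # meals, bedtime
    "Zantac": {"breakfast", "bedtime"},                         # ~every 12 hours
    "Quinaglute": {"breakfast", "lunch", "dinner", "bedtime"},  # 4 times a day
    "Coumadin": {"bedtime"},                                    # once a day
}


def pills_at(event):
    """Return the medications taken at one daily event, alphabetically."""
    return sorted(drug for drug, events in SCHEDULE.items() if event in events)


def doses_needed(drug, events_away):
    """Count the doses of one drug that fall within the events spent away."""
    return sum(1 for event in events_away if event in SCHEDULE[drug])


# Noon is treated as lunch (an assumption).
print(pills_at("lunch"))
# Away from the afternoon until breakfast: dinner and bedtime are missed.
print(doses_needed("Inderal", ["dinner", "bedtime"]))
```

Under this assumed schedule, the noon question is answered by reading one column, and the travel question by counting entries in one row over the events spent away. With the list format, the same answers require interpreting each instruction separately.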
2. Matrix Format
The same medication instructions are shown in matrix form on the right side of Fig. 4. All instructions are given in terms of their association with daily events (meals and bedtime). The doctor who prepared the original instructions verified that this display satisfies his original intentions. Linking pills to meals is usually a good idea since it provides a strong and
reliable cue for remembering to take them, although there are some exceptions (e.g., some medications should be taken on an empty stomach). Check marks in the matrix indicate when to take each pill; replacing them with numbers would enable it to be used for multiple dose regimens. The matrix representation makes it easy to see what pills are taken together and the number of each type of pill taken per day.
3. Both Displays
Both displays provide the same information, in the sense that they meet the intentions of the prescribing doctor. However, the matrix format provides the same type of instruction for all medications (i.e., take them at the time of various daily events) and also emphasizes the specific links between a given medication and the times it is taken (the union of the medication and time dimensions). If the matrix does indeed produce better performance in a relevant task, we can then go back to determine the relative contribution of the matrix format itself vs. the use of consistent instructions across medications.
C. SUBJECTS AND PROCEDURE
A group of 29 subjects viewed the list format and 28 viewed the matrix format (57 subjects in all). These groups were subdivided into different test conditions on a random basis, as described in the following sections.
1. Study Phase
During the study phase, subjects studied one of the two displays shown in Fig. 4. Subjects were told that they would see some medication instructions devised by a doctor for a patient who had been seriously ill, to envision themselves as this patient, and to study the instructions carefully so that they could follow the doctor's orders perfectly. They were also told they would answer questions about the display later. Then they studied the display for 2 min.
2. Filler Task
Next came the filler task, designed to prevent rehearsal and allow some forgetting to occur. Subjects wrote answers to questions concerning their general knowledge about medications. For example, they were asked to predict how often various populations (e.g., college students, business executives, children) take aspirin or aspirin substitutes. They had 3 min to complete this task, and most finished just as the time was up. There was a narrow range in the overall amount of information written across
subjects, suggesting that they were involved in this task to a comparable degree.
3. Test Phase
During the test phase, subjects were again asked to envision themselves as the patient taking the medications they had studied previously. The experimenter read aloud 12 questions concerning this information and subjects had 10 sec to write their answers to each. The questions concerned the number of pills taken per day (e.g., “How many Quinaglute do you take per day?”), when pills are taken (e.g., “When do you take Zantac?”), or combinations of factors (e.g., “Which pills do you take only in the morning?” or “How many pills do you take at dinner time?”). These questions were of two general types. Factual questions required subjects to recall facts explicitly provided in the displays (e.g., “When do you take Zantac?”), while inferential questions required them to go beyond the explicit information provided (e.g., “If you leave home in the afternoon and will not be back until breakfast time the next day, how many Inderal should you take along?”). There were seven factual questions and five inferential questions.
Subjects were divided into two conditions for the test phase. Some did not have the original displays available during the test and, hence, were in a memory condition. Others did have copies of the same display they had studied; since all they had to do was understand the information in the display, theirs was basically a comprehension condition.
4. Overview
The experiment used a 2 x 2 design, with representation (list, matrix) and test condition (memory, comprehension) as the factors. For convenience, the letters L and M are used to denote the list and matrix representations, while (+) and (0) denote the presence or absence of original displays during the comprehension and memory test conditions. The number of subjects in each condition was L(0) = 15, L(+) = 14, M(0) = 11, M(+) = 17 (57 subjects in all).
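The cell sizes just listed can be checked against the group sizes given earlier (29 list subjects, 28 matrix subjects). The short Python sketch below is only an arithmetic illustration; the dictionary layout and the function name `marginal` are assumptions introduced here, not part of the study.

```python
# Cell sizes as reported in the text.
# L = list, M = matrix; "0" = memory (display absent), "+" = comprehension (display present).
cells = {("L", "0"): 15, ("L", "+"): 14, ("M", "0"): 11, ("M", "+"): 17}


def marginal(cells, index, level):
    """Sum cell sizes over one level of one factor.

    index 0 selects the representation factor, index 1 the test condition.
    """
    return sum(n for key, n in cells.items() if key[index] == level)


list_n = marginal(cells, 0, "L")           # subjects who studied the list
matrix_n = marginal(cells, 0, "M")         # subjects who studied the matrix
memory_n = marginal(cells, 1, "0")         # subjects tested without the display
comprehension_n = marginal(cells, 1, "+")  # subjects tested with the display
total_n = sum(cells.values())

print(list_n, matrix_n, memory_n, comprehension_n, total_n)
```

The marginal totals agree with the 29 and 28 subjects reported for the two representations, and they show that the memory and comprehension conditions were of somewhat unequal size.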
1. Scoring
All pieces of information requested in a given question were required to get credit for the answer. Each question was worth 1 point, yielding a total of 12 possible points per subject. Two questions had more than one acceptable answer. Question 3 (“When do you take Zantac?”) could be answered, “Twice per day,” “Every 12 hours,” or “At lunch and bedtime.” For Question 12 (“At approximately what times during the day do you take pills?”), answers had to provide at least four separate times, two of them had to have 12 hr between them, and mealtime and bedtime (or general hours when these events occur) had to be mentioned. Since subjects frequently misspelled medication names, credit was given only when such answers were clearly identifiable and differentiated from other medication names. For example, “Inderol” was acceptable (for Inderal), but “Indoxin” was not (evidently a blend of Inderal and Lanoxin).
2. Overall Performance
Overall performance was 67% correct across all conditions, which means that one third of these bright, healthy subjects would apparently make mistakes in taking the prescribed medications, at least in terms of the procedures used here. The results are summarized at the top of Fig. 5. The alternative representations produced different effects on performance [F(1,56) = 13.63, p < .001]; subjects who studied the matrix format were correct on about three-quarters of the items (78% correct), while those who saw the list were correct on only about one-half the items (56% correct). As would be expected, having the original displays available during the test phase facilitated recall [F(1,56) = 36.08, p < .0001]; subjects in the comprehension conditions (+) had a mean score of 81% correct while those in the memory conditions (0) had only 50% correct. There was no interaction between representation and test condition. It is interesting to note that subjects who had the display present during the test (comprehension condition) still made errors, especially those who had the list format.
3. Question Type
Performance was better on the factual questions than on the inferential ones, as illustrated in the middle of Fig. 5 and confirmed by a repeated-measures analysis of variance [F(3,53) = 114.43, p < .0001], with 74% correct responses on the factual questions and only 57% correct on the inferential questions. The inferential questions yielded lower performance for both representations [F(3,53) = 0.88, n.s.] but had a greater effect on memory than on comprehension scores [F(3,53) = 4.78, p < .05]. It is interesting to note the wide range of scores across the various conditions, with subjects in the M(+) condition achieving 95% correct scores on factual questions and those in the L(0) condition achieving only 31% correct scores on the inferential questions.
Examination of individual questions shows some interesting response patterns. For example, when asked a scenario question (“If you leave home in the afternoon and will not be back until breakfast time the next day, how many Inderal should you take along?”), all subjects performed poorly except those in the M(+) condition, as shown at the bottom of Fig. 5. This example illustrates an important point: alternative representations can interact with specific task requirements. A similar point has been made in other contexts, such as memory-orienting tasks (Morris, Bransford, & Franks, 1977).

[Figure 5: panels ALL QUESTIONS, FACTUAL, INFERENTIAL, and SCENARIO; horizontal axis TEST CONDITION, 0 (Memory) vs. + (Comprehension); vertical axis percent correct.]
Fig. 5. Percent correct results from the medications experiment which used two types of representations (L = list, M = matrix) and two test conditions (0 = memory, + = comprehension). Pluses and zeroes reflect the presence or absence of the original displays during the test phase. All displays show the percentage of questions answered correctly; the top panel shows data for all questions, the middle panel contrasts factual vs. inferential questions, and the bottom panel shows results for a single scenario question (“If you leave home in the afternoon and will not be back until breakfast time the next day, how many Inderal should you take along?”).

4. General Discussion
The list format emphasizes item information, in this case the medication names. Since all instructions begin with “1 tablet,” it also highlights one aspect of the instructions, the number of pills to be taken. The particular list used in this experiment (Fig. 4) varies how the rest of the instructions are given (number of times per day, time of day, time elapsed between pills, association with daily events), making it hard to understand and remember other aspects of the regimen. In fact, most of the factual questions focused primarily on the type of information readily available from the list format: number of pills taken (Questions 1, 2, 4) and which pills are taken (Questions 5, 8, 10). The remaining factual question (3) asked a simple “when” question (“When do you take Zantac?”) which can be answered by quoting the definition verbatim (“1 tablet every 12 hours” or “1 tablet twice a day”). The matrix format emphasizes the union of item and time information (when to take what) and helps people learn this crucial part of the information. Even if the list format were modified to give all instructions in terms of their association with mealtimes and bedtime, it would still not emphasize the union of items and time, nor would it make it easy for users to combine item and time information in an efficient way. Some of the factual questions focus on just such combinations, such as Question 6 (“Which pills do you take with every meal?”). Another factual question asks for number-by-time information (Question 11: “How many pills do you take at dinner time?”). To answer this question with the matrix, users simply need to find the dinner column and count the number of check marks in it.
To answer it with the list format, even if instructions were all in terms of mealtime and bedtime, users must still scan each medication and read its instruction, determine whether or not dinner time is included in each, and accumulate a counter based on the yes-no results of each successive medication. These scanning operations take more time and provide greater opportunity for errors to occur. When list entries are given in different ways (i.e., in terms of number of pills per day, amount of time since last pill, and association with daily events), as in the present list, this task is even more difficult. The matrix is especially useful for answering everyday scenario questions. For example, when asked how many Inderal they should take along
when leaving home in the afternoon and returning for breakfast the next day, users simply find the Inderal row, scan across it to the appropriate time zone, and count the remaining check marks in the row. Even though the corresponding entry in the list format (“1 tablet 3 times a day”) seems easy to understand, people using this format have great difficulty answering the question (Fig. 5).
Medication instructions, however displayed, always have two basic underlying factors: medication names and times. It is crucial to understand how these factors are to be combined (when to take what). The matrix is ideally suited to display the union of factors in a clear and efficient manner. It enables users to comprehend and remember the instructions more readily as well as solve everyday scenario problems. The scenario problem shown in Fig. 5 was still difficult to solve without reconsulting the matrix, while having it available greatly improved performance. Hopefully, with repeated usage of the matrix in an everyday regimen, users will learn it sufficiently well to solve scenario problems as they arise. It is not clear whether extended use of the list format will produce comparable improvements in performance; if it does, it should still take longer to achieve such performance, with more opportunity for harmful errors during this time.
The medications experiment suggests a general display principle which should extend to other domains as well. Whenever a set of information has two factors which must be combined, the matrix format may be an optimal form of representation. If only one factor needs emphasis, and/or its defining information does not vary in a systematic way, the list format may suffice. Subsequent work (Day & Stoltzfus, in preparation) shows that the alternative representations studied here affect acquisition of the medication names themselves as well as knowing when to take them.
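The column and row operations described above (counting check marks in a time column, or counting the remaining check marks in a medication row) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the regimen below is hypothetical and is not the content of Fig. 4. Only the Zantac row (lunch and bedtime) and the fact that Inderal is taken three times a day come from the text; the slot names, the placement of the Inderal doses, the Carafate row, and all function names are invented for the example.

```python
# Hypothetical regimen for illustration only; NOT the contents of Fig. 4.
SLOTS = ["breakfast", "lunch", "dinner", "bedtime"]

# Matrix form: one row per medication, one check mark (True) per daily event.
MATRIX = {
    "Inderal":  [True,  True,  True,  False],  # 3 times a day; slot placement assumed
    "Zantac":   [False, True,  False, True],   # lunch and bedtime
    "Carafate": [True,  True,  True,  True],   # assumed row
}


def pills_at(slot):
    """Column lookup: count the check marks in one time column."""
    col = SLOTS.index(slot)
    return sum(row[col] for row in MATRIX.values())


def taken_together(slot):
    """Column lookup: which medications share a given time column."""
    col = SLOTS.index(slot)
    return [name for name, row in MATRIX.items() if row[col]]


def pills_to_pack(medication, last_slot_at_home):
    """Row lookup: count the check marks remaining after the last event spent at home."""
    start = SLOTS.index(last_slot_at_home) + 1
    return sum(MATRIX[medication][start:])


print(pills_at("dinner"))                # how many pills at dinner time
print(taken_together("bedtime"))         # which pills are taken together at bedtime
print(pills_to_pack("Inderal", "lunch")) # leave after lunch, return for breakfast
```

Each question reduces to a single scan of one row or one column. With a list whose entries are phrased in different ways, the same questions would first require interpreting every entry before anything could be counted.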
We are also examining people's ability to make changes in a well-learned regimen (a common real-world problem), the effects of other types of representations (such as tree diagrams), and more complex regimens.
5. Practical Applications
Many people are evidently aware that they have problems taking their medications since they buy special pillboxes advertised to reduce memory errors. Some of these boxes have separate compartments to be used for each pill, time zone, or day of the week. Some even have electronic beepers to remind the user that it is time to take a pill. Beyond problems with setting timers appropriately (which is not trivial for many people), these systems ignore a crucial problem: How do patients know what pills to put into each compartment? An appropriate initial representation, such as the matrix format, is needed to load the pillbox in the first place in
order to then take the pills in an accurate manner. Fancy pillboxes cannot replace adequate human representation.
Many health professionals are aware that patients have trouble taking their medications. Pharmacists in particular have been working on the problem, though with limited success. For example, one survey of patient counseling practices sent to 1000 pharmacists in Florida, home to many elderly patients on complex medication regimens, yielded a return rate of only 10% (Robinson & McKenzie, 1984). Most of those who did return the survey were against required patient counseling; the minority who favored it worked in hospitals rather than in independent or chain pharmacies, were staff pharmacists or assistant managers instead of owners or managers, and served patients with higher incomes. A large part of their reluctance involves the sheer amount of time needed to talk with patients on an individual basis (it simply is not cost-effective), and patients are probably unwilling to cover these costs through higher medication prices. A potential solution is to use simple display devices such as the matrix studied in the present experiment; most patients should understand it quickly and can then retain it as a continuing memory aid.
Some health professionals (especially pharmacists and nurses) do give their patients various memory aids, including matrix-type displays like the one studied here. However, they provide only subjective, anecdotal observations concerning their effectiveness. The present study provides scientific methods for evaluating such representations in an informed way; it also provides evidence that the matrix representation improves memory and comprehension of medication regimens, at least among young, healthy individuals. Work in progress is designed to determine whether similar effects occur among patients taking medications, especially the elderly. If so,
the practical benefits could be substantial: with a minimum of effort, we may be able to reduce self-medication errors, thereby producing healthier people and reducing medical costs.

IV. Text-Editing Commands
A. THE PROBLEM
Computer-based systems for producing and modifying text save considerable time and provide much greater flexibility than other means. However, learning to use such systems can be a time-consuming and frustrating process. As more people abandon their typewriters in the workplace and at home, there is increased need to make the learning process easier. There are several practical goals: to enable people to learn text-editing systems more quickly, to reduce training costs and frustration during the learning process, and to produce more proficient users. In order to accomplish these goals, many new text editors have been developed and others improved to make them more "user friendly." At the same time, many investigators have been studying human-computer interaction in the text-editing environment. They have found, for example, that even among expert users editor design makes a big difference, with display-oriented systems producing faster editing than teletypewriter-oriented systems; furthermore, the fastest user on the best editor may work as much as 12 times faster than the slowest user on the worst editor (Card, Moran, & Newell, 1983, p. 120). These differences are based in part on the relative amount of work required by each system as measured by the number of keystrokes required to perform a given task.
All text-editing systems provide many of the same basic functions but accomplish them in different ways. For example, consider one of the simplest problems in text editing: how to indicate places on the screen where new text is to be inserted or old text modified. There are over 1700 such locations in just one screenful for the typical display monitor (roughly 22 lines x 78 characters per line), so quick and easy procedures are needed for moving to all of them. In pointer-based systems, a mouse is used to move directly to the new location, while in many character-based systems, a control key (represented by the symbol ^) is held down while another key is pressed.⁴ These cursor-movement procedures are some of the first things a user must know to start producing and modifying text, which in turn makes learning other types of procedures easier. Therefore, quick and easy learning of cursor commands is especially important. Many investigators have examined the ease with which people can learn to associate the command letter(s) with cursor movements in character-based systems.
They have contrasted various types of command names, such as nonsense syllables, vowel-deletion, control-character-plus-first-letter, and truncation approaches, and found that they affect the ease with which people can learn and remember the commands and their definitions (see John and Newell, 1987, for an overview). Much of this difference in acquisition ease is based on the principle of stimulus-response compatibility, which was first characterized by Fitts and Seeger (1953) in another context. In previous studies of text editing, the stimulus is often the command name and the response its abbreviation; in the experiment reported shortly, these functions are reversed during initial learning, with the abbreviation serving as the stimulus and its definition as the response; in either case, the greater the similarity between the stimulus and response, the easier the items should be "to generate, recall, or interpret" (Rogers & Moeller, 1984). Thus, it should be easy to learn cursor movement to the end of the line using the command ^E (as in the EMACS editing system), since it matches the first letter of the keyword “end,” but harder to do so using ^QD (as in Wordstar), since it has no mnemonic value (and also has more letters).
Typical experiments on the acquisition of editing commands ask subjects to study the target commands and their definitions, then recall the pairs or judge whether a given command-definition pair is correct. The commands are provided in list form during the study phase; although the reports typically provide no information concerning the order of items within this list, alphabetical listing is quite likely since most manuals use this principle to summarize commands. The work presented here goes back a step and varies the way in which the original study materials are represented. It contrasts the typical list format with a spatial format which reflects how the commands actually move the cursor on the editing screen. Thus, it achieves a different (perhaps “deeper”) level of stimulus-response compatibility than studied previously since it combines information about direction and distance of cursor movement into an integrated map. The experiment also varies the stimulus-response compatibility between command letters and their definitions.

⁴ Also, arrow keys can move the cursor to adjacent positions, but require hand movement away from the main keyboard; other keys produce movement to specific locations such as the "end" of the screen, but they access only a tiny fraction of all possible locations.
B. ALTERNATIVE REPRESENTATIONS
Six elementary types of cursor movement were selected for study: movement one character to the left, right, above, and below the cursor, and movement to the beginning and end of the current line. Commands used to achieve these movements were taken from EMACS, a character-based text editor⁵ selected both because it requires only a single letter keystroke (with a control character) for each command and because few undergraduates in the pool of available subjects know it. The six command letters and their definitions were displayed in alternative representational formats as described in the following sections.
1. List Format
Although text-editing manuals typically introduce the user to various commands in discourse form, they usually summarize them in list form. The six cursor commands in the present study were displayed in list format, as shown in the top row of the matrix in Fig. 6. The commands are listed in alphabetical order, along with their definitions. Ordinarily the symbol ^ precedes a command letter to indicate that the control key is depressed at the same time; for simplicity it is not used in these displays (although it is used in this article to clearly identify letters as cursor commands). The alphabetical organization is easy to use when searching for the definition of a particular command. However, it is cumbersome to use when a certain operation is needed but its command name is not remembered; in this case, the retrieval cue is embedded in the list format in an unsystematic way. A spatial representation of the same information was designed to address this problem.

⁵ Developed by Richard M. Stallman at the MIT Artificial Intelligence Laboratory.

[Figure 6, list displays (top row of the matrix); the spatial displays and sample problems are not reproduced here.]
SYMBOL-DEFINITION CORRESPONDENCE
Mismatch (-)              Match (+)
A - beginning of line     A - ahead of line
B - left one              B - back one
E - last of line          E - end of line
F - right one             F - forward one
N - down line             N - next line
P - up line               P - previous line

Fig. 6. Text-editing cursor movements. The left side shows alternative representations for cursor-movement commands in a computer text-editing system (EMACS); they vary in terms of format and whether command symbols match (+) or mismatch (-) the key word in the definition. Sample problems on the right side show simplified computer "screens" in which the dashes are possible locations for characters, the filled rectangle is the cursor, and the asterisk shows the desired new location for the cursor.

2. Spatial Format
A spatial format for the same commands is shown in the bottom row of the matrix in Fig. 6 and features a centrally located box representing the cursor, arrows pointing to the six new locations, and the command letters and their definitions at the end of each arrow. Thus, it provides
all the same information as the list format, but does so in a way that corresponds directly to its content: how to move from one spatial location to another on a computer screen.
3. Symbol-Definition Correspondence
The definitions in displays shown in the right column of the matrix in Fig. 6 feature key words, each of which begins with the command letter. For example, ^B means “back one character” and ^F means “forward one character.” Displays shown in the left column of the matrix use the same command letters; however, their definitions use different key words while still preserving the same basic meaning. For example, ^B is defined as “left one” and ^F as “right one.” Thus, there were two conditions of symbol-definition correspondence, a match condition (+) and a mismatch condition (-). This distinction reflects the presence and absence of stimulus-response compatibility, as described previously.
4. All Displays
Four displays were used, comprising a 2 x 2 design with representation (list, spatial) and symbol-definition (matched or mismatched letters) as factors (Fig. 6). For convenience, these displays are called L(+), L(-), S(+), and S(-).
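The match and mismatch conditions can be stated as a simple rule: a definition matches when its key word begins with the command letter. The Python sketch below applies that rule to the two sets of definitions from the list displays in Fig. 6. It is an illustration only; treating the first word of each definition as its key word, and the names `MATCH`, `MISMATCH`, and `key_word_matches`, are assumptions made for the example.

```python
# Definitions as they appear in the list displays of Fig. 6.
MATCH = {
    "A": "ahead of line", "B": "back one", "E": "end of line",
    "F": "forward one", "N": "next line", "P": "previous line",
}
MISMATCH = {
    "A": "beginning of line", "B": "left one", "E": "last of line",
    "F": "right one", "N": "down line", "P": "up line",
}


def key_word_matches(letter, definition):
    """True when the first word of the definition begins with the command letter.

    Assumption: the key word is taken to be the first word of the definition.
    """
    key_word = definition.split()[0]
    return key_word[0].upper() == letter.upper()


def compatibility(display):
    """Map each command letter to whether its definition is letter-compatible."""
    return {letter: key_word_matches(letter, text) for letter, text in display.items()}


print(compatibility(MATCH))     # every command is compatible in the match displays
print(compatibility(MISMATCH))  # no command is compatible in the mismatch displays
```

Under this rule the two conditions differ on every one of the six commands, so the manipulation is all-or-none rather than graded.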
C. SUBJECTS AND PROCEDURE
1. Introductory Phase
The 55 subjects in the experiment were assigned randomly to the four representation conditions, with 14 in each except the L(-) condition (which had 13). The 2 subjects who knew EMACS had been discarded; inclusion of others was not based on previous knowledge of text-editing systems, although information about such knowledge was obtained during the session (as described later). Subjects inspected a schematic diagram of a simplified computer screen, consisting of a box filled with three rows of seven dashes and a small, filled rectangle in the center blank. They were told that the dashes represent possible locations for characters and that the rectangle represents the cursor, and were told why it is important to be able to move the cursor to any new location on the screen. They were also told that they would be learning single-letter commands to move the cursor to various new locations; although these commands are ordinarily combined with a control key, the control symbol would be omitted to simplify the experiment.
2. Study Phase
Subjects studied one of the four displays so that they “would be able to use the commands in the actual word-processing system.” They had 2 min to do so.
3. Prior Knowledge
Subjects provided information concerning their previous knowledge of text-editing systems. They listed the word-processing systems they knew (if any) and indicated their proficiency in each, using a 5-point rating scale (where 5 = excellent, 4 = good, 3 = adequate, 2 = fair, 1 = poor). These ratings were summed to obtain a prior-knowledge score for each subject, with zeros entered for those with no knowledge of any system. There was a wide range of prior knowledge, with scores ranging from 0 to 20 (mean = 3.4, SD = 4.8). All scores were then categorized to obtain two levels of knowledge, high (scores ≥ 3) and low (scores ≤ 2). The cutoffs were inspected to be sure that no one was categorized in the high group based on fair-to-poor scores on multiple systems. Even though subjects had been assigned randomly to representation conditions, categorization into knowledge groups produced essentially the same number in all conditions [six highs and eight lows, except six and seven in the L(-) condition]. Subjects had 3 min to work on this knowledge question and answer other questions about their computer knowledge. Virtually all were finished when time was called.
4. Problem Solving
In order to test many subjects quickly, a page of schematic computer screens was devised to simulate cursor-movement problems in paper-and-pencil form. Sample problems are shown on the right side of Fig. 6. Each problem consisted of a 2 1/4 x 1 in. rectangle with 3 rows of 7 dashes to represent a computer screen and possible locations for characters. The cursor box was always located in the central blank and an asterisk was located in one of the other blanks to indicate the desired new location. There were 10 problems and the minimum number of letter commands needed to solve them was 1, 2, or 3 (with 3, 4, and 3 cases of each, respectively). The asterisk appeared in the top, middle, or bottom line over trials (with 3, 3, and 4 cases of each, respectively). Subjects tried to determine which command letter or letters were needed to move the cursor to the new location in each problem and wrote their response next to each display. If they wrote more than one letter, they could enter them in any order. They were encouraged to work quickly but accurately and were given 2 min to complete the task. Since additional time was needed to give instructions for this and the prior knowledge
task, as well as to answer subjects’ procedural questions, about 10 min elapsed between the end of the study task and the beginning of the problem-solving task.
5. Command Memory
In order to assess what subjects remembered about each command, three brief tests were given. The test page consisted of three sections, all of which listed the six commands (in alphabetical order) with a blank next to each. In the definition test, subjects wrote a brief definition for each command, trying to capture its “basic gist.” The other tests required subjects to go beyond the explicit information provided in the study phase. In the line-movement test, they entered the word “between” or “within” next to each command to indicate how the cursor moves with respect to lines; thus, ^A, ^B, ^F, and ^E move within lines, while ^P and ^N move between lines. In the distance-traveled test, they entered “adjacent” or “distant” next to each command to indicate the amount of movement produced by each; thus, ^B, ^F, ^P, and ^N move to immediately adjacent locations, while ^A and ^E move to distant locations. Subjects had as much time as they needed to complete these tests.
D. RESULTS AND DISCUSSION
1. Problem-Solving Accuracy
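The paper-and-pencil problems and the scoring rule used in this section (a response is correct if it reaches the target, whatever the number or order of commands) can be made concrete with a small simulation. The Python sketch below is an illustration, not the procedure used in the study. It assumes a 3 x 7 grid with the cursor in the central blank, that moves are clamped at the screen edges, that vertical moves keep the column, and that efficiency is the response length minus the shortest possible solution. It also assumes that the target of the sample problem is the first position of the top line, which is what the listed correct answers imply. All function names are hypothetical.

```python
from collections import deque

ROWS, COLS = 3, 7
START = (1, 3)  # (row, column) of the cursor box in the central blank


def step(pos, cmd):
    """Apply one command letter to a (row, column) position.

    Assumption: moves that would leave the simplified screen are clamped.
    """
    row, col = pos
    if cmd == "A":
        return (row, 0)                       # beginning of line
    if cmd == "E":
        return (row, COLS - 1)                # end of line
    if cmd == "B":
        return (row, max(col - 1, 0))         # back (left) one
    if cmd == "F":
        return (row, min(col + 1, COLS - 1))  # forward (right) one
    if cmd == "P":
        return (max(row - 1, 0), col)         # previous (up) line
    if cmd == "N":
        return (min(row + 1, ROWS - 1), col)  # next (down) line
    raise ValueError(f"unknown command: {cmd!r}")


def run(commands, start=START):
    """Return the cursor position after a sequence of command letters."""
    pos = start
    for cmd in commands:
        pos = step(pos, cmd)
    return pos


def is_correct(commands, target):
    """Scoring rule: correct if the requested location is attained."""
    return run(commands) == target


def min_keystrokes(target, start=START):
    """Breadth-first search for the fewest commands that reach the target."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        pos = queue.popleft()
        if pos == target:
            return depth[pos]
        for cmd in "ABEFNP":
            nxt = step(pos, cmd)
            if nxt not in depth:
                depth[nxt] = depth[pos] + 1
                queue.append(nxt)
    raise ValueError("target not reachable")


def excess_keystrokes(commands, target):
    """Keystrokes beyond the minimum; meaningful only for correct responses."""
    return len(commands) - min_keystrokes(target)


target = (0, 0)  # assumed target: first position of the top line
for response in ("AP", "PA", "BBBP", "PBBB"):
    print(response, is_correct(response, target), excess_keystrokes(response, target))
```

Separating correctness from excess keystrokes mirrors the distinction between accuracy and efficiency: a long response can still be correct while costing more keystrokes than necessary.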
Problem-solving responses were scored as correct if they attained the requested cursor location irrespective of the number or order of commands given. Thus, for Problem 2 in Fig. 6, both ^AP and ^PA are correct, as well as the less-efficient sequences ^BBBP and ^PBBB.⁶ Overall performance was good, with subjects solving 84% of the problems. Performance was good partly because there were only six commands to learn, a leisurely amount of time was given to learn them, a brief amount of time elapsed between study and test phases, and the problems were quite easy (since the cursor was always located in the center position).

⁶ Ordinarily, such multiple responses are written as ^P^B^B^B, to indicate that the control key is held down during all commands; for simplicity only an initial symbol is used here. Again, subjects never saw the control symbol.

a. Representation. Subjects who studied the spatial format clearly had an advantage; they solved 91% of the problems while those who studied the list format solved only 77% [F(1,54) = 4.32, p < .05]. The spatial representation comes closer to the intrinsic meaning of the commands: how they move the cursor from one spatial location to another. It does so both by placing the command letters in the actual locations they produce
and by connecting the cursor to these locations with directed arrows. Thus, it provides an integrated map of two features, the direction and distance of cursor movements, and also highlights their differences. As subjects study the display, they must scan to each location to read both the command letter and its definition. Therefore, physical location of the to-belearned information provides additional, intrinsically meaningful cues for remembering the information; it is also possible that eye and/or head movements involved in scanning these displays provide some sort of generalized motor elaboration as well. b. Symbol-Definition Correspondence. Matching stimuli and responses in a compatible way generally facilitates the acquisition and recall of material. As shown on the left side of Fig. 7. however, the mismatched condition produced better performance, although this advantage was not signifcant (F(I .54) = 2.91, p = .W]; subjects in the mismatched conditions solved 90% of the problems, while those in the matched conditions solved 78%. This result is puzzling until the command letters are examined closely. All six were taken directly from EMACS and then defined with
[Figure 7. Panels: ACCURACY (left), EFFICIENCY (right). Conditions plotted: FORMAT (List, Spatial) and CORRESPONDENCE (Mismatch (-), Match (+)).]
Fig. 7. Results from the text-editing experiment for representations which varied in format and symbol-definition correspondence. The left side shows problem-solving accuracy, measured as the percentage of problems solved correctly. The right side shows problem-solving efficiency, measured by excess keystrokes; thus, higher scores indicate lower efficiency.
Alternative Representations
either matched or mismatched key words. Thus, a potentially important part of the picture is missing: subjects may generate different (implicit) labels for the moves, which then conflict with those provided in the study materials. The control experiment described next evaluated this possibility.

In order to evaluate the potential role of subject-generated definitions in producing better performance in the mismatched condition, 50 new subjects from the same pool saw six display “screens” like those in the present experiment. Each contained a centrally located cursor box, with an asterisk in a location attained by using only one of the six EMACS commands. Subjects were told that the goal in each case was to “move the small rectangle from its current location to the location of the star (*)” and were asked to “write a brief phrase next to each display to describe this movement.” The vast majority of responses for horizontal movement were “left” (92% for ^A and 92% for ^B) and “right” (88% for ^E and 94% for ^F), while those for vertical movement were “up” (96% for ^P) and “down” (94% for ^N). The geographical labels “north, south, east, west” ran a very distant second (5% overall). The remaining 2% of responses were “back,” “over,” and “forward.” Thus, subject-generated definitions for the six cursor movements were very different from those used in the experiment. Apparently, subjects performed better in the mismatched condition because some of its key words correspond to their implicit definitions (^B = “left one,” ^F = “right one,” ^N = “down line,” ^P = “up line”) even though the command letters do not match their key words. People may learn cursor commands more effectively if the commands correspond to these implicit definitions: ^L, ^R, ^D, ^U. Of course, these letters may already be dedicated to other functions, such as the “reverse search” procedure for ^R in EMACS.

c. Prior Knowledge. Prior text-editor knowledge had no effect on problem-solving scores (neither a significant main effect nor interactions with other variables). Thus, both high- and low-knowledge subjects performed better with the spatial representation and with mismatched symbol-definition pairs. Prior knowledge did affect problem-solving efficiency, as described in the following section.
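The accuracy scoring rule used in this section (a response is correct if it reaches the requested location, whatever the number or order of commands) can be illustrated with a minimal sketch. The 3-line by 7-column grid is an assumption inferred from the 21 possible locations and the centered cursor; the clamping of moves at the screen edges, and all function names, are likewise hypothetical and not taken from the experiment.

```python
# Assumed layout: 3 lines x 7 columns = 21 locations, cursor starts in the center.
ROWS, COLS = 3, 7
START = (1, 3)  # (row, column), zero-based

def move(pos, key):
    """Apply one of the six cursor commands (control key implied)."""
    row, col = pos
    if key == "B":      # left one
        col = max(col - 1, 0)
    elif key == "F":    # right one
        col = min(col + 1, COLS - 1)
    elif key == "A":    # beginning of line
        col = 0
    elif key == "E":    # end of line
        col = COLS - 1
    elif key == "P":    # previous line
        row = max(row - 1, 0)
    elif key == "N":    # next line
        row = min(row + 1, ROWS - 1)
    else:
        raise ValueError(f"unknown command: {key!r}")
    return row, col

def final_position(response, start=START):
    """Run a whole response (a string of command letters) from the start."""
    pos = start
    for key in response:
        pos = move(pos, key)
    return pos

def is_correct(response, target):
    """Correct if the target is reached, regardless of length or order."""
    return final_position(response) == target

UPPER_LEFT = (0, 0)  # target of Problem 2 under the assumed layout
for response in ["AP", "PA", "BBBP", "PBBB"]:
    print(response, is_correct(response, UPPER_LEFT))
```

Under these assumptions all four responses to Problem 2 are scored as correct, which matches the scoring described in the text.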
2. Problem-Solving Efficiency
Although subjects may solve a given problem, they may not do so efficiently. The sample problems in Fig. 6 illustrate the idea of problem-solving efficiency. Problem 2 can be solved efficiently with a minimum of two keystrokes (^AP or ^PA) or inefficiently with four keystrokes (^BBBP or ^PBBB). Not all problems afforded the opportunity for excess keystrokes to occur. For example, Problem 1 in the figure has one solution (^F), while Problem 3 has several solutions but all have the same number
of keystrokes (^NBB, ^BBN, ^ANF, ^NAF). Only the three problems which provided opportunities for excess keystrokes were used to evaluate the efficiency of correct responses; Problem 2 required movement to the upper-left corner, Problem 9 to the left edge of the middle line, and Problem 10 to the lower-right corner. Thus, all required subjects to use a distant-moving keystroke (^A or ^E) for efficient solution, with or without an adjacent-moving one. The minimum number of keystrokes (K_min) needed to solve each problem was determined, as well as the total keystrokes per response (K_tot). Problem-solving efficiency was then assessed by calculating the excess keystroke score (K_ex) for each correct response:

K_ex = K_tot - K_min
For example, the response ^BBBP for Problem 2 yields K_ex = 2 (two excess keystrokes). There were 0.78 excess keystrokes for Problem 2, 0.65 for Problem 9, and 0.55 for Problem 10; the numbers of subjects who solved each of these problems, and hence contributed to these means, were 46, 46 (a different subset), and 40, respectively. The magnitude of these excess keystrokes may not seem very great until aspects of everyday editing are considered. Ordinarily, many cursor-movement problems need to be solved, often in the hundreds for manuscript preparation. Also, everyday problems are much more complex than those studied here; the display screen has many more locations for characters, documents are much longer than one screenful of material, and the cursor can be in any location relative to the desired new location. Thus, inefficient use of cursor commands can accumulate to substantially lengthen the editing process. Inefficiency may also affect document quality by making the editing process so tedious that the user resists making many substantive or stylistic changes.

a. Representation. Subjects who studied the list format had 2.1 excess keystrokes across all target problems, while those who studied the spatial format had only 1.2. This effect occurred for Problem 2 [F(1, 45) = 3.80, p = .06] and Problem 9 [F(1, 45) = 4.08, p = .05]. Thus, representation affected problem-solving efficiency as well as accuracy. In both cases the list format adversely affected performance, making it less accurate and, when accurate, less efficient. Although there were no reliable interactions of representation with other variables, it is interesting to note that high-knowledge subjects who studied the spatial displays gave no excess keystrokes whatsoever.

b. Symbol-Definition Correspondence. The matched or mismatched status of symbols and their definitions had no effect on efficiency scores for any of the target problems.
c. Prior Knowledge. Subjects with low prior knowledge of text-editing systems had 2.3 excess keystrokes across all target problems, while those with high knowledge had only 1.7. This effect occurred for Problem 2 [F(1, 45) = 5.91, p < .05] and Problem 10 [F(1, 39) = 3.40, p = .08]. Thus, knowledge level affected problem-solving efficiency even though it did not affect accuracy.
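The excess keystroke score defined in this section is simple arithmetic, and a short sketch may make the bookkeeping concrete. The function names and the sample list of responses below are hypothetical illustrations, not data from the experiment; only correct responses are assumed to be passed in, as in the text.

```python
def excess_keystrokes(response, k_min):
    """Excess keystroke score for one correct response: K_tot - K_min."""
    k_tot = len(response)  # one letter per keystroke, control key implied
    if k_tot < k_min:
        raise ValueError("a correct response cannot be shorter than K_min")
    return k_tot - k_min

def mean_excess(correct_responses, k_min):
    """Mean excess keystrokes over the correct responses to one problem."""
    scores = [excess_keystrokes(r, k_min) for r in correct_responses]
    return sum(scores) / len(scores)

# Problem 2 has a minimum of two keystrokes (^AP or ^PA).
print(excess_keystrokes("BBBP", k_min=2))  # two excess keystrokes
print(excess_keystrokes("AP", k_min=2))    # efficient solution, no excess

# Hypothetical set of correct responses, for illustration only.
sample = ["AP", "PA", "BBBP", "PBBB"]
print(mean_excess(sample, k_min=2))
```

Because the score is computed only over correct responses, the number of subjects contributing to each problem's mean can differ, as noted for Problems 2, 9, and 10.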
3. Command Memory
Responses on the definition test were scored as correct only if they included both the direction and distance produced by the cursor command. For example, the response “forward” for ^F is wrong since it does not indicate how far forward the cursor moves (one character? to the end of the line?). Variant wordings were scored as correct, as long as the basic gist was correct. Scoring the line-movement and distance-traveled tests was straightforward since each permitted only two possible responses (“between/within” or “adjacent/distant”).

a. Experimental Variables. Performance on all command-memory tests was excellent: 92% correct for definitions, 92% correct for line movement, and 8Wo correct for distance traveled. Because of ceiling effects, it is difficult to observe any effects of the various experimental variables on these command-memory scores. Nevertheless, some marginal results are of interest. The spatial format facilitated performance on the definition test [F(1, 54) = 2.W, p = .09], suggesting that representation can affect memory for the six commands as well as the ability to use them in problem solving. Subjects with high prior knowledge of text-editing systems also had a marginal advantage in recalling the definitions [F(1, 54) = 2.96, p = .09]. Symbol-definition correspondence did not affect memory for the commands, just their use in problem solving (as described previously). There were no interactions among combinations of the experimental variables on the definition test and no main effects or interactions on the line-movement or distance-traveled tests.

b. Declarative and Procedural Knowledge. Subjects clearly acquired knowledge about simple cursor movement in EMACS. But what kind of knowledge was it? The definition test showed that they were at least able to learn and recall the original definitions. But did they fully comprehend them? High scores on the other command-memory tests suggest that they did.
However, the relationships among these three scores present a somewhat different picture. There was no correlation between definition scores and either line-movement or distance-traveled scores. Yet line-movement and distance-traveled scores were highly related (r = .75, p < .0001). Thus, there was a dissociation between definition scores which measure
explicit knowledge of the commands and the other tests which measure implicit knowledge. This dissociation suggests that some subjects may have had declarative knowledge but not full procedural knowledge (Anderson, 1982). In order to evaluate the potential role of the declarative-procedural distinction, relationships between command-memory and problem-solving scores were examined. Problem-solving accuracy was related to all command-memory scores; its correlation was .35 (p < .01) with definition score, .36 (p < .01) with distance traveled, and .24 (marginal, p < .10) with line movement. Problem-solving efficiency showed a different pattern of relationships with command-memory scores. Excess keystrokes were correlated with both line-movement scores (r = -.31, p < .05) and distance-traveled scores (r = -.32, p < .05), but were unrelated to definition scores. Although these correlations are small and do not account for much of the variance, their overall pattern is consistent with the declarative-procedural distinction. Thus, declarative knowledge could apparently enable subjects to get the job done, but procedural knowledge helped them do so more efficiently.

4. General Discussion
Alternative representations of the same information had pervasive effects in this experiment: they affected problem-solving accuracy, problem-solving efficiency, and command memory. The effects were dramatic given several aspects of the situation: (1) the paucity of information to be learned (only six pairs, each consisting of a single letter and a two- to three-word definition), (2) the easy pace of the experiment (20 sec to learn each command and 12 sec to solve each problem), (3) the simplicity of the problems (all “screens” had only 21 possible locations and the cursor was always in the center position), (4) the few problems given (10), and (5) the academic aptitude of the students (see footnote 3). Factors which made the experiment easy pushed performance toward the ceiling and thereby yielded some effects which were only marginally reliable. Subsequent work (Day & Diaz, in preparation) retained the same six commands and representations but made other aspects of the experiment more difficult, and its results magnify some of those reported here. For example, subjects who studied the list format were highly inefficient: they gave over a dozen times more excess keystrokes than those who studied the spatial format.

The most appropriate representation for displaying text-editing commands may depend on the nature of the task to be performed. If the goal is simply to find definitions for particular commands, then an alphabetical list may be adequate. However, if the goal is to learn the commands well enough to use them effectively, then a representation which emphasizes their meaning is needed. Other types of list organization may be helpful,
such as those that group similar commands together, as shown in the following outline:

LEFTWARD
    B - left one
    A - beginning of line
RIGHTWARD
    F - right one
    E - end of line
UPWARD
    P - previous line
DOWNWARD
    N - next line

This display emphasizes that there are only four directions of movement. It highlights the differences between commands that move in the same direction, and it introduces grouping labels that match the implicit definitions subjects possess. Other possible organizing principles include grouping by distance traveled (adjacent, distant) or by line movement (between, within). Nevertheless, the spatial format used in the present experiment goes beyond the advantages provided by such grouped lists because it locates the commands in the actual spatial locations they control. Thus, the structure of this display is meaningful in itself.

The disadvantages of the list representation demonstrate what might be called the Procrustean Principle. The mythological Greek character Procrustes (Diodorus, circa 36 B.C.) proclaimed that he had a bed that fit all overnight guests perfectly. In fact it did, for he chopped off their heads and feet if they were too long, or literally stretched them out if they were too short, obviously to their disadvantage. Similarly, it may often be disadvantageous to force information into an a priori structure (such as a list) without taking its intrinsic meaning into account. In other words, structure should be made to match content whenever possible, not vice versa, especially when people must learn, comprehend, and use the information.

The initially surprising finding that subjects did somewhat better when command letters and key words did not match suggests that an otherwise well-established principle, stimulus-response compatibility, may need some amplification. Evidently this principle can operate on various levels, perhaps at the same time.
In the present case these levels include the degree of match between (1) the command letter and the first letter of the key word, (2) the nominal key word (provided by the experimenter) and the functional key word (implicitly provided by the subjects), and (3) the external representation provided and one which is intrinsic to the information itself. All of these compatibilities (or the lack of them) may have contributed to the present results. In fact, whenever people must learn definitions for items, similar types of compatibilities may be involved.

This research may hold some interesting implications for current theories of text editing in particular and human-computer interaction in general. For example, if the GOMS model (goals, operations, methods, and selection rules) of Card et al. (1980) is implemented as a simulation program (e.g., Polson & Kieras, 1985), the number of algorithms needed may differ depending on the initial representation of the command-definition pairs. Such a situation would then increase any discrepancy between the simulation and data from experimental subjects. Thus, it is important to consider the initial representation of information given to both computers and humans when evaluating a model.

5. Practical Applications

When people learn a text-editing system, using the spatial format for cursor commands should enable them to learn the commands more easily and thoroughly. This is important for users who are new to computers, for cursor commands enable them to do some useful work almost immediately, thereby providing an initial positive experience (and perhaps reducing their computer anxiety). The spatial format also introduces them to basic cursor-movement concepts (e.g., movement occurs in only four directions), as well as to the specific commands in the system they learn. In fact, the spatial display without any commands or definitions present might be useful as an initial orienting device to introduce new users to the general nature of cursor movement; command letters can then be placed in the empty slots, along with their definitions. Experienced users should also benefit from using the spatial format when they learn an additional character-based editing system.
They already know the basic concepts behind cursor movement, but must set aside known commands for each move and replace them with new ones. However, interference from old commands is a very frequent and frustrating problem. Placing the new command letters on the spatial format, perhaps along with the old letters, should both facilitate acquisition and reduce interference effects. In other words, an appropriate representation should facilitate transfer to a different system. This research suggests some strategies for designing new text-editing systems. First, all commands are grouped by semantic content, such as cursor movement, insertions, deletions, searching, and replacing. Then potential users (with and without prior experience) generate definitions for commands within each category. Although previous research (Landauer, Galotti, & Hartwell, 1983) found no agreement in terms generated across semantic categories, there may be agreement within categories, as
demonstrated here. When there is high agreement concerning a given definition, it is accepted and the first letter of its key word becomes the command letter. Conflicts involving the use of the same letter for different commands are then noted and resolved by considering the amount of intersubject agreement for each definition and the relative frequency with which most people will use each command. Finally, representational formats are developed which best highlight the nature of each semantic category. Thus, the principles of semantic grouping, implicit labeling, and intrinsic representation can be used to design cognitively based editing systems. Such an approach should enable users to become more proficient more quickly, and perhaps even enjoy the learning process.

V. Overview of Experiments
The experiments on bus schedules, medication instructions, and text-editing commands all lead to the same basic conclusion: alternative representations of the same information have clear and often dramatic effects on cognition. The experiments themselves differed in many ways, including whether the amount of information to be learned was relatively heavy (bus schedules) or light (text editing); whether the domain occurs in everyday situations (bus schedules, medications) or more specialized settings (text editing); whether a list representation was compared with another type of list (bus schedules), a matrix (medications), or a spatial format (text editing); and whether the cognitive processes examined were memory (bus schedules, medications, text editing), comprehension (medications, text editing), or problem solving (text editing). Despite all these differences, some representations always facilitated performance while others hindered it.

A. GENERAL PROPERTIES OF REPRESENTATIONS
All representations studied here contained items (letters, words, or pictorial symbols) which were paired with either definitions or instructions. Thus, subjects had to learn the meaning of footnote symbols in bus schedules, when to take medications, and how commands move the cursor on an editing screen. Representation of such pairs can be provided in many forms. When they were given in list form, subjects had difficulty-especially when items were listed in random order (bus schedules, medications) or alphabetical order (text editing). However, items in lists can often be categorized in terms of their meaning. When such categories were made explicit through spatial grouping, performance improved (bus schedules). Sometimes each item in a list can be characterized in terms of a single
semantic dimension. For example, medications can be characterized by the times when they are to be taken, which reflects the immediate concerns of patients. If the subjects were medical personnel reviewing patients' drug regimens, then the purpose of the medications (e.g., heart regulation, blood pressure reduction) might be more important. Thus, the semantic dimensions selected to enhance the meaning of information should be relevant to the intended users. When the mappings between items and a relevant dimension were made explicit by the matrix format, performance improved.

Sometimes, list items can be characterized in terms of multiple dimensions. For example, cursor commands can be characterized by the directions in which they move and the extent of that movement. One way to emphasize the role of multiple dimensions is to use a hierarchical representation, such as the outline shown in the general discussion section of the text-editing experiment. However, other types of representations may be more intrinsic to the overall meaning of the items. Thus, when the locations controlled by the cursor were used to construct a spatial format, performance improved.

To summarize, an important part of selecting an effective representation is to determine whether items can be grouped into separate semantic categories or along one or more semantic dimensions. Representations appropriate in one of these situations may not be appropriate in another. For example, the bus schedule items could be shown in a matrix with the names of their semantic categories across the top (e.g., stopping procedures, schedule restrictions, holiday changes, station services). However, once the check marks are filled in, each item will have only one check mark since each item belongs to only one category.
If the items are listed in haphazard order along the side of the matrix, the pattern of check marks will be quite confusing and induce many additional scanning operations; if items are listed by semantic category, then the check marks will be entirely redundant and just add visual clutter. One goal of this research is to identify general properties of representations and the nature of their correspondence to general aspects of information. Then we can construct potentially appropriate (and inappropriate) representations for specific sets of information and evaluate their effectiveness as people use them. If we can characterize general features of effective representation based on empirical data, then we can design representations to facilitate the acquisition of almost any type of information by almost any type of person.
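The point that the same items can be organized along different semantic dimensions can be sketched with the six cursor commands. The definitions follow the grouped outline given in the text-editing discussion; the attribute values for distance and line movement are my reading of the adjacent/distant and between/within distinctions, and the data layout and function name are hypothetical.

```python
from collections import defaultdict

# One record per command; the attribute values are an interpretation of the text.
COMMANDS = [
    {"key": "B", "definition": "left one", "direction": "leftward",
     "distance": "adjacent", "line": "within"},
    {"key": "A", "definition": "beginning of line", "direction": "leftward",
     "distance": "distant", "line": "within"},
    {"key": "F", "definition": "right one", "direction": "rightward",
     "distance": "adjacent", "line": "within"},
    {"key": "E", "definition": "end of line", "direction": "rightward",
     "distance": "distant", "line": "within"},
    {"key": "P", "definition": "previous line", "direction": "upward",
     "distance": "adjacent", "line": "between"},
    {"key": "N", "definition": "next line", "direction": "downward",
     "distance": "adjacent", "line": "between"},
]

def group_by(commands, dimension):
    """Group command letters by one semantic dimension, keeping list order."""
    groups = defaultdict(list)
    for command in commands:
        groups[command[dimension]].append(command["key"])
    return dict(groups)

# The same six items yield different groupings along different dimensions.
for dimension in ("direction", "distance", "line"):
    print(dimension, group_by(COMMANDS, dimension))
```

Each grouping corresponds to a different list organization of identical content; which one is useful would depend on the task and the intended users, as argued above.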
B. HOW REPRESENTATIONS ARE USED

Although it is important to study general properties of representations, it is also important to determine how people use them to acquire, store,
recall, comprehend, and use the information displayed. The experiments reported here provide some preliminary discussion of processes such as scanning, comparing, retrieving, and elaborating information. Subsequent work (Day, in preparation) is designed to examine these processes in a more detailed way. For example, during acquisition people may inspect some parts of a given display more often, for longer periods of time, or in a particular order. These differences may show up in subsequent free recall and affect other memory, comprehension, and problem-solving tasks.

VI. Research Strategy
Although representation has been studied in many different areas of cognitive psychology, most investigators follow a similar research strategy. The typical approach is to search for the representation people use. Sometimes this search is data driven; an interesting effect emerges from empirical work and a particular type of representation is then proposed to interpret the results. Sometimes the search is concept driven; the investigator has a general theory about the nature of a given representation and then performs experiments or simulations to test it. The work presented here suggests a different research strategy, as illustrated in Fig. 8. There are two stages in this alternative representations approach; for lack of better terms they are called the do-do and can-do approaches. In the do-do approach, we set up a task and let subjects
[Figure 8. RESEARCH STRATEGIES.]
Fig. 8. Two strategies for studying representation.
perform it using whatever representation(s) they wish. We observe what subjects do do, that is, what representation(s) they apparently use as suggested by features of their performance and perhaps by verbal descriptions of what they did. The do-do approach provides baseline performance levels reflecting subjects’ abilities when left to their own devices. It also generates a variety of possible representations, some of which may not have occurred to the experimenter, that can be used as external formats in the can-do phase of the research.

In the can-do approach, we provide subjects with an external representation in a particular alternative form, have them perform the same task, and observe performance differences relative to those obtained in the do-do approach. The resulting internal representations may differ in various ways from the external one, yet still bear a closer resemblance to it than to other possibilities; in fact, this approach can be used to study the degree of isomorphism (Shepard, 1978) between internal and external representations. The internal representations should also be more similar across subjects and less variable within subjects. Furthermore, subjects should spend less time generating, selecting, and trying possible representations. In short, much of the variability inherent in the do-do approach is reduced in the can-do approach. Since external representations constrain subjects’ internal representations, they enable us to evaluate their effects on performance more carefully and compare the relative effectiveness of alternative representations.

The results can have considerable educational as well as theoretical significance. For example, work in progress shows that representations not previously in subjects’ repertoire can sometimes work better than those that are. Therefore, teaching people new representations may enable them to learn and use information more effectively and, perhaps, increase the overall quality of their thinking.
VII. Toward a Comprehensive View of Representation
Previous research on representation has been very informative. It has examined representation in a variety of content domains and cognitive tasks. However, the representations studied often vary widely across these situations, making it difficult to generalize the results. Perhaps it is time to reassess our current approach and add some new components. In order to achieve a comprehensive view of the role of representation in human cognition, interactions among content domain, form of representation, and cognitive processes should be studied, as illustrated in Fig. 9 and discussed in the following sections.
Fig. 9. A comprehensive view of representation.

A. DOMAINS
Much of the research on representation employs one type of stimulus material, such as words, sentences, connected discourse, semantic categories, geometric figures, or pictures. Although this approach is useful in many ways, it misses potential interactions between types of materials and representations. That is why the experiments reported here examined a variety of content domains from both everyday life and more specialized settings. In other parts of this research program we are studying alternative representations for many other domains, including recipes, diet plans, consumer decisions, dance routines, lecture notes (Day, 1980), chemical molecules, text, computer programming, and hardware design. The representational traditions and intrinsic needs of these domains are rich and varied; by considering each domain anew without seeking “the” optimal representation, we can distinguish general properties of representations that are domain dependent from those that are independent. On a more practical note, we plan to export representational conventions from some domains to those which are representationally poor, in order to make them easier to learn and understand.
Given that it is useful to study a variety of content domains, how do we select particular ones to pursue? Whether we have a specific theoretical framework or just want to explore the properties of alternative representations in general, one approach is to begin with situations in which people have trouble learning, remembering, understanding, and/or using information. Everyday situations are especially useful, since people are already highly motivated: motivated to select the best bus, plane, rental car, video cassette recorder, lawnmower, computer, college, job, or diet plan; to follow directions correctly to find someone's house, make Ukrainian poppyseed cake, assemble a do-it-yourself kit, or set up a new stereo system; to solve problems in games (checkers, chess, Monopoly, bridge); and to perform well in academic settings (solve problem sets, write papers, study for tests) and job situations (increase sales, optimize worker productivity). Admittedly, these domains are more complex than the simple word lists and other materials used in many cognitive experiments; nevertheless, they can be partitioned into manageable subsets of information (as illustrated for the EMACS text editor here) and represented in carefully controlled ways. Furthermore, by selecting subjects who are more or less knowledgeable in these domains, we can study how representation varies with degrees of expertise.
B. FORMS OF REPRESENTATION

Much of the previous research has argued that representation is basically linguistic, pictorial, or propositional in nature, or that knowledge is represented in terms of entities such as features, sets, or networks. Some studies compared two types of representation, such as text and diagrams in physics or mechanical problems (Larkin & Simon, 1987; Hegarty, in press). As argued here, representation of the same information can often take many alternative forms, including text, list, matrix, outline, tree, spatial map, graph, network, and pictorial formats. Nevertheless, some types of representation are traditionally used for specific content domains. Graphs are used widely in the natural and social sciences but rarely in the humanities, biology relies heavily on taxonomies, chemistry uses pictorial representations of molecules, and so forth. However, there may be less reason to limit a given domain to the types of representations dictated by tradition than previously thought. For example, historical information can sometimes be usefully represented by graphs (Tufte, 1983). Thus, we need to take a given set of information, represent it in traditional and nontraditional ways, and evaluate their effects on cognitive tasks. This alternative representations approach should suggest new types of useful representations for many domains, yet still identify characteristics of domains that make certain types of representation essential.
C. COGNITIVE PROCESSES
Research on representation has studied a variety of general cognitive processes, including perception, memory, comprehension, and problem solving. Although investigators acknowledge that such global processes overlap and interact, most study representation in only one (or at best two) of them. It is important to study representation across many types of processes, in order to determine whether a general theory of representation is realistic and to avoid making pronouncements about “human representation” which are, in fact, limited to certain types of cognition (or worse, are paradigm bound). Some representations may have similar effects across cognitive processes, while others may have quite different effects. By observing the relative success and failure of alternative representations across broad areas of cognition, and for various tasks within them, we can more fully understand the processes they influence. For example, they may affect which types of operations are performed (e.g., searching, comparing, retrieving, elaborating) and/or the time needed to perform them.

D. CONCLUSION
The argument presented here is not that our previous approaches to studying representation are wrong. They have, after all, yielded many important theoretical concepts and empirical results. The suggestion is that it is time to extend our research strategy in order to examine possible interactions among forms of representation, content domains, and cognitive processes. Only then can we get a comprehensive view of the role of representation in human cognition.

ACKNOWLEDGMENTS

The framework for this research was developed during a sabbatical year at Carnegie-Mellon University. Jim Staszewski provided useful comments on a previous draft of the manuscript, as did Ellen Stoltzfus, Lynne Diaz, Bonnie John, and Fran Whaley.
REFERENCES
Anderson, J. R. (1982). Acquisition of cognitive skill. Psychological Review, 89, 369-406.
Anderson, J. R. (1985). Cognitive psychology and its implications. New York: Freeman.
Anderson, J. R., & Bower, G. H. (1973). Human associative memory. Washington, DC: Winston.
Bower, G. H. (1972). Mental imagery and associative learning. In L. Gregg (Ed.), Cognition in learning and memory. New York: Wiley.
Bower, G. H., Clark, M. C., Lesgold, A. M., & Winzenz, D. (1969). Hierarchical retrieval schemes in recall of categorical word lists. Journal of Verbal Learning and Verbal Behavior, 8, 323-343.
Card, S. K., Moran, T. P., & Newell, A. (1983). The psychology of human-computer interaction. Hillsdale, NJ: Lawrence Erlbaum.
Clark, H. H. (1969). Linguistic processes in deductive reasoning. Psychological Review, 76, 387-404.
Day, R. S. (1980). Teaching from notes: Some cognitive consequences. In W. J. McKeachie (Ed.), New directions for teaching and learning: Learning, cognition, and college teaching (pp. 95-112). San Francisco: Jossey-Bass.
Day, R. S. (in preparation). Ways to show it: Mental representation of ideas.
Day, R. S., & Diaz, L. T. (in preparation). Alternative representations for text-editing commands.
Day, R. S., & Stoltzfus, E. R. (in preparation). Alternative representations for medication instructions.
Diodorus (circa 36 B.C.). In L. Dindorf (Ed.), The library of history of Diodorus of Sicily, IV.59.5 (pp. 232-233). Paris: A. Firmin Didot, 1855.
Duncker, K. (1945). On problem solving. Psychological Monographs, 58 (270).
Ebbinghaus, H. E. (1964). Memory: A contribution to experimental psychology. New York: Dover. (Originally published 1885.)
Epstein, L. H., & Cluss, P. A. (1982). A behavioral medicine perspective on adherence to long-term medical regimens. Journal of Consulting and Clinical Psychology, 50, 950-971.
Fitts, P. M., & Seeger, C. M. (1953). S-R compatibility: Spatial characteristics of stimulus and response codes. Journal of Experimental Psychology, 46, 199-210.
Glass, A. L., & Holyoak, K. J. (1986). Cognition. New York: Random House.
Haynes, R. B., Taylor, D. W., & Sackett, D. L. (Eds.). (1979). Compliance in health care. Baltimore, MD: Johns Hopkins Press.
Hegarty, M. (in press). Understanding machines from text and diagrams. In H. Mandl & J. Levin (Eds.), Knowledge acquisition from text and picture. Amsterdam: North Holland Publ.
HEW. (1978). Workshop on pharmacology and aging. Department of Health, Education and Welfare Publication No. (NIH) 78-353.
Huttenlocher, J. (1968). Constructing spatial images: A strategy in reasoning. Psychological Review, 75, 550-560.
John, B. E., & Newell, A. (1987). Predicting the time to recall computer command abbreviations. Proceedings of the CHI 1987 Conference on Human Factors in Computing, pp. 33-40.
Kintsch, W. (1974). The representation of meaning in memory. Hillsdale, NJ: Erlbaum.
Klatzky, R. L. (1980). Human memory: Structures and processes. San Francisco: Freeman.
Kosslyn, S. M. (1980). Image and mind. Cambridge, MA: Harvard University Press.
Landauer, T. K., Galotti, K. M., & Hartwell, S. (1983). Natural command names and initial learning: A study of text-editing terms. Communications of the ACM, 26, 495-503.
Larkin, J. H., & Simon, H. A. (1987). Why a diagram is (sometimes) worth ten thousand words. Cognitive Science, 11, 65-99.
Martin, E. (1971). Verbal learning theory and independent retrieval phenomena. Psychological Review, 78, 314-332.
Morris, C. D., Bransford, J. D., & Franks, J. J. (1977). Levels of processing versus transfer appropriate processing. Journal of Verbal Learning and Verbal Behavior, 16, 519-533.
Newell, A., & Simon, H. A. (1972). Human problem solving. Englewood Cliffs, NJ: Prentice-Hall.
Noble, C. E. (1952). The role of stimulus meaning (m) in serial verbal learning. Journal of Experimental Psychology, 42, 437-446.
Norman, D. A., & Rumelhart, D. E. (1975). Explorations in cognition. San Francisco: Freeman.
Paivio, A. (1971). Imagery and verbal processes. New York: Holt.
Polson, P. G., & Kieras, D. E. (1985). A quantitative model of the learning and performance of text editing knowledge. Proceedings of the CHI 1985 Conference on Human Factors in Computing, pp. 207-212.
Pylyshyn, Z. W. (1973). What the mind's eye tells the mind's brain: A critique of mental imagery. Psychological Bulletin, 80, 1-24.
Reed, S. K. (1982). Cognition: Theory and applications. Monterey, CA: Brooks/Cole.
Reynolds, A. G., & Flagg, P. W. (1983). Cognitive psychology. Boston: Little, Brown.
Robinson, J. D., & McKenzie, M. W. (1984). Pharmacists' views on mandatory patient counseling. Drug Intelligence and Clinical Pharmacy, 18, 913-917.
Rogers, W. H., & Moeller, G. (1984). Comparison of abbreviation methods: Measures of preference and decoding performance. Human Factors, 26, 49-59.
Rumelhart, D. E., & Norman, D. A. (in press). Representation in memory. In R. C. Atkinson, R. J. Herrnstein, G. Lindzey, & R. D. Luce (Eds.), Stevens' handbook of experimental psychology. New York: Wiley.
Sackett, D. L., & Haynes, B. R. (1976). Compliance with therapeutic regimens (pp. 9-25). Baltimore: Johns Hopkins Press.
Sackett, D. L., & Snow, J. C. (1979). The magnitude of compliance and noncompliance. In R. B. Haynes, D. W. Taylor, & D. L. Sackett (Eds.), Compliance in health care. Baltimore: Johns Hopkins Press.
Shepard, R. N. (1978). The mental image. American Psychologist, 33, 125-137.
Solso, R. L. (1988). Cognitive psychology. Boston: Allyn & Bacon.
Tufte, E. R. (1983). The visual display of quantitative information. Cheshire, CT: Graphics Press.
Underwood, B. J. (1963). Stimulus selection in verbal learning. In C. N. Cofer & B. S. Musgrave (Eds.), Verbal behavior and learning: Problems and processes (pp. 33-48). New York: McGraw-Hill.
Wickelgren, W. A. (1974). How to solve problems. New York: Freeman.
Yuille, J. C. (1983). Imagery, memory, and cognition: Essays in honor of Allan Paivio. Hillsdale, NJ: Erlbaum.
EVIDENCE FOR RELATIONAL SELECTIVITY IN THE INTERPRETATION OF ANALOGY AND METAPHOR
Dedre Gentner
Catherine Clement
I. Introduction

Analogies and metaphors are pervasive in language and thought. They range from scientific analogies, such as “electricity is like flowing water,” to expressive comparisons, such as “the moon, the massy pearl of night,” to whole systems of extended meanings, such as “rising and falling GNP” (Lakoff & Johnson, 1980; Kittay & Lehrer, 1981; Nagy, 1974). In this article we ask how such comparisons are interpreted, that is, how the meaning of an analogy or metaphor is derived from the meanings of its terms. We compare four accounts of the interpretation process: Tourangeau and Sternberg’s (1981) multidimensional space account, Ortony’s (1979) salience imbalance theory, Holyoak’s (1985) pragmatic account, and Gentner’s (1980, 1983) structure-mapping theory. We argue, on both theoretical and empirical grounds, that despite its representational complexity, only the structure-mapping account is adequate to describe the phenomena.

Before laying out the theories let us be clear about the issues on which we are focusing and the kinds of comparisons we want to explain. Our concern is with the interpretation of metaphor and analogy, that is, with the way in which the meaning of a metaphor or analogy is derived from the prior representations of its constituent terms. We are not aiming at this stage to describe the real-time processing steps by which the interpretation is performed. Rather, the goal is a descriptive model of how the derived meaning of the comparison relates to the initial representations of its terms. Our interest is in what Marr (1982) called the computational level rather than in the algorithmic level. Or, to put it in Palmer and Kimchi's (1985) terms, we are primarily interested in the informational constraints on interpretation, rather than in the behavioral constraints.

TABLE I
A SELECTION OF COMPARISONS
1. Lemonade is like water.
2. Heat is like water: it flows down a temperature gradient.
3. If we do not plant knowledge when we are young, it will give us no shade when we are old.
4. Sharp wits, like sharp knives, do often cut their owner's fingers.
5. She allowed life to waste like a tap left running. (Virginia Woolf)
6. I have ventured, /Like little wanton boys that swim on bladders, /This many summers in a sea of glory: /But far beyond my depth. My high-blown pride /At length broke under me: and now has left me, /Weary and old with service, to the mercy /Of a rude stream, that must for ever hide me. (William Shakespeare)
7. The glorious lamp of heaven, the sun. (Robert Herrick)
8. Coffee is like a solar system.

Table I gives an idea of the range of comparisons we must consider. These assertions differ in the degree and kind of similarity they convey. Statement 1 expresses literal similarity: it tells us that most of what we know about water can be applied to lemonade. Statements 2-7 are nonliteral similarity comparisons, either analogies or metaphors. Many of the comparisons could be labeled as either analogy or metaphor (or simile),¹ and for many purposes these two categories are alike. We will combine analogy and metaphor for now; later we discuss their differences. Statement 8 is an anomaly because the two terms have nothing in common. It is included to underscore a simple point. In defining metaphor it is not sufficient to differentiate it from literal similarity; we must also differentiate it from anomaly. This means that in order to lay out the interpretation rules for metaphor we must consider what makes a nonliteral comparison apt.

¹Similes differ from metaphors in that they contain an explicit comparative term such as "like" or "as." However, since available evidence suggests that the underlying interpretation processes for similes and metaphors are highly similar (Reynolds & Ortony, 1980), they will be considered together throughout this article.

We begin by reviewing three theoretical approaches to analogy and metaphor in increasing order of the complexity of their representational assumptions: (1) the multidimensional space models of Rumelhart and Abrahamson (1973) and Tourangeau and Sternberg (1981); (2) Ortony's salience imbalance theory (Ortony, 1979; Ortony, Vondruska, Foss, & Jones, 1985); and (3) Gentner’s (1980, 1983, 1986a, 1988a) structure-mapping theory. In Section III we present three experiments contrasting structure-mapping with salience imbalance. In Section IV we take up the question of higher-order knowledge structures in analogy interpretation. We contrast structure-mapping with Holyoak’s (1985) pragmatic account, an alternative view which, like structure-mapping, utilizes a complex representational format.
II. Three Accounts of Metaphor and Analogy

A. MULTIDIMENSIONAL SPACE MODELS
Rumelhart and Abrahamson (1973) developed a model of analogy based on multidimensional space models of similarity (e.g., Shepard, 1974; Krumhansl, 1978). The model is based on the notion of constructing parallel vectors in a multidimensional space. An analogy such as “horse:zebra::dog:___” can be solved by constructing a vector from horse to zebra and then constructing the parallel vector from dog and reading off its end point (which might be “fox” in this case). Tourangeau and Sternberg (1981) extended this model to metaphor. In this model, a metaphor, such as “Brezhnev is a hawk,” is a mapping from a base subspace (e.g., birds) to a target subspace (e.g., political figures). A metaphor is understood by constructing a vector from the origin in the target subspace that is parallel to the original vector in the base subspace. The ideal comparison concept is found at the terminus of the target vector; the distance between it and the actual target term is a measure of the within-space fit of the metaphor. In the Tourangeau and Sternberg formulation, the aptness of a metaphor is greater the lower the within-space distance and the greater the between-space distance between the base and target subspaces.

Tourangeau and Sternberg (1981) found some support for the theory, although chiefly for the within-space predictions. They compared subjects’ aptness ratings for metaphors with the within-space and between-space distances obtained from similarity ratings on the items and found the predicted negative correlation between aptness and within-space distance. The predicted positive correlation between aptness and between-space distance was not as strong, although in one study they did find a correlation between quality (the sum of aptness, goodness, interestingness, and likability) and between-space distance. However, a more fundamental difficulty with the multidimensional-space representational format is that it imposes severe limits on the kinds of metaphors that can be modeled.
It can capture dimensional relations such as LARGER THAN (hawk, dove) and can metaphorically match different dimensions: e.g., LARGER THAN with MORE IMPORTANT THAN. But there is no good way to represent event relations, such as PURSUE (hawk, dove), and still less causal relations between events. Given these limitations, the utility of the multidimensional space approach is fairly limited. Indeed, in a later article, Tourangeau and Sternberg (1982) consider other representations and, in particular, remark that semantic networks may be a useful format for representing metaphors involving actions or whole sentences.
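The parallel-vector construction described in this section can be made concrete with a minimal Python sketch. The two dimensions and all coordinates below are invented for illustration (they are not taken from Rumelhart and Abrahamson's data), and the name solve_analogy is our own. The sketch shows only the geometry: the horse-to-zebra vector is added to dog, and the nearest remaining concept is read off as the answer.

```python
import math

# Hypothetical 2-D similarity space; the coordinates are made up for illustration.
SPACE = {
    "horse": (8.0, 3.0),
    "zebra": (8.0, 5.0),
    "dog": (3.0, 4.0),
    "fox": (3.0, 6.0),
    "cat": (2.0, 4.5),
    "wolf": (4.5, 8.0),
}

def solve_analogy(a, b, c, space):
    """Complete a:b::c:? by building the vector from c that is parallel to a->b."""
    ax, ay = space[a]
    bx, by = space[b]
    cx, cy = space[c]
    # End point of the parallel vector: the "ideal" answer location.
    ideal = (cx + (bx - ax), cy + (by - ay))
    # Choose the nearest concept that is not already a term of the analogy.
    candidates = [term for term in space if term not in (a, b, c)]
    answer = min(candidates, key=lambda term: math.dist(space[term], ideal))
    return answer, ideal

ANSWER, IDEAL = solve_analogy("horse", "zebra", "dog", SPACE)
```

With these made-up coordinates the end point of the parallel vector coincides with fox. The distance from that ideal point to whichever term is actually supplied plays a role analogous to the within-space distance in the Tourangeau and Sternberg formulation. Note that nothing in such a space can express an event relation like PURSUE (hawk, dove), which is the limitation noted above.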
B. SALIENCE IMBALANCE
The next approach we consider is Ortony's (1979) influential salience imbalance model of metaphor. Like Tversky's (1977) contrast model of similarity on which it is based, it uses featural representations rather than mental-distance representations.² In Tversky's model, the similarity between two items is a weighted function of their common features less the difference sets of nonshared features, with nonshared features of the target weighted more. Ortony (1979) proposed an extension to Tversky's model in which the salience of a feature is defined relative to the particular object of which it is an attribute. Thus, the same feature can have different salience in two different objects. He suggests that the difference between metaphor and literal similarity is largely due to a difference in the relative salience of the features shared between the base and target. In a metaphorical comparison, such as "billboards are like warts," the shared features (such as ugly) are of high salience in the base (warts) and of low salience in the target (billboards). In a literal similarity statement, such as "billboards are like placards," the shared features are of high salience in both the target and the base domain. He suggests that "the imbalance f(u,b) in salience levels of matching attributes of the two terms is a principle source of metaphoricity" (Ortony, 1979, p. 164).

²The "features" can refer to relations between objects as well as to simple object attributes.

One line of support for the salience imbalance account is the observation that metaphors tend to be strongly directional. Reversing the terms produces a relatively different meaning. For example, the simile "billboards are like warts" conveys something like "billboards are ugly bumps on the landscape." But the reverse order, "warts are like billboards," conveys something like "prominent advertising." In contrast, reversing the terms in a literal similarity comparison produces relatively little change in meaning, as in the statements "billboards are like placards" and "placards are like billboards." Ortony interprets this strong directionality in metaphor in terms of salience imbalance. Since the meaning of a metaphor depends on a match of high-salient features of the base (the second term) with low-salient features of the target (the first term), reversing the order of terms will, in general, change the meaning.

Ortony's core observation that metaphors tend to display directionality is extremely persuasive. We might ask, then, whether salience imbalance theory could provide an account of how analogies and metaphors are interpreted. To this end, we consider three proposals suggested by Ortony (1979). The first proposal, the one of most interest here, is that salience imbalance provides an interpretation rule for metaphors that specifies how the meaning of a metaphor is derived from the meaning of its terms. Thus, it provides the informational constraint that metaphor interpretation consists of high-salient features in the representation of the base and low-salient features in the representation of the target. The second proposal is that salience imbalance is constitutive of metaphor. That is, the degree of salience imbalance determines the degree to which we take a comparison to be metaphorical. The third proposal is that salience imbalance acts as a real-time processing rule for metaphor. Since there is no evidence for or against this proposal, and since our interest is in informational constraints rather than processing algorithms, we will not be concerned with this possibility further.

Three studies by Ortony et al. (1985) provide general support for the role of salience imbalance in metaphors but do not clearly support the position that salience imbalance constrains the interpretation of metaphors. In the first study subjects were presented with pairs of forward and reverse similes and literal comparisons. They judged which direction was preferable and rated the similarity of the base and target.
Order preferences were greatest for similes (suggesting that directionality affected the meaning of similes more than the meaning of literal comparisons), and the differences between similarity ratings of forward and reverse comparisons were strongest for similes. Thus, the results showed stronger directional asymmetry for metaphorical comparisons than for literal statements. The second and third experiments directly assessed the salience imbalance of comparison statements. In the second study subjects were given propositions taken from interpretations of similes and literal comparisons written by the experimenters.³ Subjects rated these propositions for their salience with respect to either the base terms or the target terms. Three converging measures of salience were used: applicability, conceptual centrality, and characteristicness. Results revealed higher salience ratings for propositions rated with respect to base terms than for those rated with respect to target terms. Salience imbalance was found for both similes and literal comparisons, but the effects appeared to be stronger for similes.

³These interpretations were validated in a previous experiment, in which subjects rated the adequacy of the interpretations. Results suggested that subjects were satisfied with the interpretations.
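The notions of object-relative salience and directionality used in this section can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the salience values are invented, and the imbalance score (the mean difference between base salience and target salience over the shared features) is one simple way to quantify the idea, not a formula given by Ortony or Tversky.

```python
# Hypothetical salience values in [0, 1]. Salience is stored per object, so the
# same feature (e.g., "ugly") can be salient for one term and not for another.
SALIENCE = {
    "warts": {"ugly": 0.9, "small bump": 0.7},
    "billboards": {"advertises": 0.9, "printed message": 0.8, "ugly": 0.2},
    "placards": {"advertises": 0.9, "printed message": 0.8},
}

def shared_features(target, base):
    """Features present in the descriptions of both terms."""
    return sorted(set(SALIENCE[target]) & set(SALIENCE[base]))

def salience_imbalance(target, base):
    """Mean of (salience in base - salience in target) over shared features.

    Illustrative measure only. Returns None when nothing is shared, as in an
    anomalous comparison.
    """
    shared = shared_features(target, base)
    if not shared:
        return None
    diffs = [SALIENCE[base][f] - SALIENCE[target][f] for f in shared]
    return sum(diffs) / len(diffs)

FORWARD = salience_imbalance("billboards", "warts")     # "billboards are like warts"
REVERSE = salience_imbalance("warts", "billboards")     # "warts are like billboards"
LITERAL = salience_imbalance("billboards", "placards")  # "billboards are like placards"
```

Under these invented values the forward simile has a large positive imbalance, the reversed simile has an equally large negative one, and the literal comparison has none, which mirrors the directionality argument. The sketch also makes the measurement problem discussed in this section visible: the result depends entirely on salience values that must be obtained independently of the comparison itself.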
A third experiment provided similar results. Subjects read either similes or literal similarity statements. They were then given either the base or the target term and asked to provide an attribute of the term that contributed to the metaphor. Finally, subjects rated the salience (actually, the “distinctiveness”) of this attribute with respect to the term. (Subjects were told that a distinctive attribute was easily brought to mind, that it was very characteristic of the object, and that it distinguished the object from other objects.) Results indicated that the difference in salience of attributes contributed by base terms and target terms was greater for similes than for literal similarity statements.

Although the two latter studies suggest that metaphors show more salience imbalance than literal similarity comparisons, they do not address the specific proposal of interest here: that salience imbalance functions as an interpretation rule. In the second study, subjects rated interpretations provided by the experimenter rather than generating them themselves. In the third study, subjects generated partial interpretations (i.e., they wrote out those attributes of one of the terms that contributed to the metaphor). However, they rated the salience of these attributes with respect to constituent terms after they read and interpreted the metaphor. This order opens the way for influence from the metaphor to the object descriptions, since placing terms in a similarity or metaphor comparison may increase the subjective salience of their common attributes (Elio & Anderson, 1981; Forbus & Gentner, 1986; Gick & Holyoak, 1983).⁴ Rips and Tourangeau have obtained evidence that placing object terms in metaphors does indeed affect the subjective salience of their attributes (L. J. Rips, personal communication, November, 1987).
Thus, we cannot assume that the salience of base and object attributes measured after the metaphor has been processed is representative of their normal salience. For this reason, the Ortony et al. studies do not address whether or not salience imbalance provides an interpretation rule for metaphor. We will return to this point in the following studies.

⁴Ortony (1979) suggested that “attribute promotion” may occur in the target; that is, features in the target may be heightened by the metaphor.

C. STRUCTURE-MAPPING
Gentner’s (1980, 1982, 1983, 1988a) structure-mapping theory is aimed at characterizing analogy and differentiating it from ordinary literal similarity. Like featural approaches, the structure-mapping approach is componential, but it assumes a propositional representation in which there are structurally different kinds of components which play different roles in the interpretation process. The basic intuition is that an analogy is a mapping of knowledge from one domain (the base) into another (the target), which conveys that a system of relations that holds among the base objects also holds among the target objects. Thus, an analogy is a way of noticing relational commonalities independently of the objects in which those relations are embedded. According to this view, in interpreting an analogy people seek a common relational structure. Computationally, the interpretation of analogy requires finding a one-to-one correspondence between the objects of the base and the objects of the target so as to obtain the maximal structurally consistent match in relational structure (Falkenhainer, Forbus, & Gentner, 1986, 1988; Gentner, Falkenhainer & Skorstad, 1987).

In addition to the general structural constraints of one-to-one correspondence and structural consistency, structure-mapping postulates two specific informational constraints on the interpretation of an analogy from its constituent terms. First, what is important is common relations, not common object descriptions. The corresponding objects in the base and target don't have to resemble each other; object correspondences are determined by roles in the matching relational structures. Second, the choice of which relations to match is guided by the principle of systematicity: rather than mapping isolated predicates, people prefer to match and carry over systems of predicates governed by higher-order constraining relations. The systematicity principle is a structural expression of people's tacit preference for coherence and deductive power in interpreting an analogy.

To take a familiar example, in the Rutherford analogy between the solar system and the hydrogen atom, the intended interpretation consists of a set of common relations: that the nucleus is more massive than the electron (just as the sun is more massive than the planet), that the nucleus attracts the electron, that this plus the mass relation causes the electron to revolve around the nucleus, and so on. Object descriptions are disregarded; there is no attempt to match the nucleus with the sun in color, size, or temperature. Moreover, the choice of the common relational structure of a central-force system is determined by the fact that it is the maximal systematic structure that can be found (or postulated) in both domains.

Besides analogy, other kinds of similarity can be distinguished in this framework according to whether the match is one of relational structure, object descriptions, or both. Analogies discard object descriptions and preserve relational structure. Mere-appearance matches preserve object attributes and discard relational structure. Literal similarity matches preserve both relational structure and object descriptions. Figure 1 shows the similarity space formed by varying the degree to which the base and target share different kinds of features.

1. Representational Distinctions
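As a concrete companion to the mapping constraints just described and to the predicate distinctions drawn in this subsection, the following Python sketch encodes propositions as nested tuples, treats one-argument predicates as attributes, computes predicate order as 1 + the maximum order of the arguments, and searches by brute force for the one-to-one object correspondence that preserves the most relational structure. It is an illustration under stated assumptions, not the matching algorithm of Falkenhainer, Forbus, and Gentner: the function names, the simplified form of the Rutherford example (CAUSE takes only two arguments here), and the use of summed predicate order as a crude stand-in for systematicity are all our own.

```python
from itertools import permutations

# A proposition is a nested tuple (PREDICATE, arg1, ...); a bare string is an object.

def order(term):
    """Order 0 for object constants; otherwise 1 + the maximum order of the arguments."""
    if isinstance(term, str):
        return 0
    return 1 + max(order(arg) for arg in term[1:])

def is_attribute(prop):
    """True for a one-argument predicate such as ("YELLOW", "sun")."""
    return len(prop) == 2

def translate(term, mapping):
    """Substitute target objects for base objects throughout a proposition."""
    if isinstance(term, str):
        return mapping[term]
    return (term[0],) + tuple(translate(arg, mapping) for arg in term[1:])

def best_mapping(base, target, base_objects, target_objects):
    """Try every one-to-one object correspondence (assumes the target has at
    least as many objects as the base) and score it by the summed order of the
    base relations that also hold in the target. Attributes are ignored."""
    target_set = set(target)
    best, best_score = None, -1
    for image in permutations(target_objects, len(base_objects)):
        mapping = dict(zip(base_objects, image))
        score = sum(order(p) for p in base
                    if not is_attribute(p) and translate(p, mapping) in target_set)
        if score > best_score:
            best, best_score = mapping, score
    return best, best_score

def central_force(center, satellite):
    """Simplified relational description of a central-force system."""
    attracts = ("ATTRACTS", center, satellite)
    revolves = ("REVOLVES_AROUND", satellite, center)
    return [("MORE_MASSIVE", center, satellite), attracts, revolves,
            ("CAUSE", attracts, revolves)]

# The base also carries object attributes, which the mapping disregards.
SOLAR_SYSTEM = central_force("sun", "planet") + [("YELLOW", "sun"), ("HOT", "sun")]
ATOM = central_force("nucleus", "electron")

MAPPING, SCORE = best_mapping(SOLAR_SYSTEM, ATOM,
                              ["sun", "planet"], ["nucleus", "electron"])
```

In this toy case the search pairs the sun with the nucleus and the planet with the electron purely because of their roles in the shared relations; the attributes YELLOW and HOT never enter the score. Exhaustive search over permutations is practical only for very small domains and is used here for clarity.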
We use a propositional representation in which (1) nodes or constants represent concepts treated as wholes and (2) predicates, when applied to the nodes, express propositions about the concepts (cf. Collins & Quillian, 1969; Miller & Johnson-Laird, 1976; Norman & Rumelhart, 1975; Palmer, 1978; Rumelhart & Ortony, 1977; Schank & Abelson, 1977). Two structural distinctions are important. First, to capture the distinction between object descriptions and relational structure, we distinguish object attributes (predicates taking one argument, such as YELLOW (x)) from relations (predicates taking two or more arguments, such as COLLIDE (x, y)).⁵ (See Section V for more details.) The second structural distinction is the order of a predicate, defined as follows: (1) constants and functions on constants have order 0 and (2) the order of a predicate is 1 + the maximum order of its arguments. Thus, a first-order predicate is one whose arguments are objects. A second-order predicate is one for which at least one argument is a first-order predicate, and so on. For example, if COLLIDE (x, y) and STRIKE (y, z) are first-order predicates, CAUSE [COLLIDE (x, y), STRIKE (y, z)] is a second-order predicate.

⁵Computing the meaning of a one-place predicate may involve an implicit extradomain comparison (Palmer, 1978; Rips & Turnbull, 1980). For example, to comprehend large (sun) requires an implicit comparison between the sun and other stars (since large for a star is a different size from large for, say, a mouse). However, despite this complexity, once the value of the attribute is computed, large (sun) can be psychologically treated as a one-place predicate in the domain. In contrast, a predicate such as larger than (sun, planet) is inherently a relation between two objects in the discourse. Thus the distinction between one-place and n-place predicates is well formed if all the objects are in the domain of discourse.

[Figure 1 not reproduced; recoverable axis label: Attributes Shared.]
Fig. 1. Similarity space formed by varying the kinds of features shared by the base and target of a comparison.

It is important to note that these distinctions among predicate types are intended to apply to psychological representations. Logically, the same proposition can be expressed in many formally equivalent ways. For example, a relation R(a,b,c) can be represented perfectly well as a one-place predicate Q(x), where Q(x) is defined to be true just in case R(a,b,c) is true. But our interest is not in all of the ways a domain could logically be represented, but in how it is psychologically represented at a given time for a given person. The claim is that, given the person's current representations of the base and target, the structure-mapping rules describe the informational constraints on the interpretation of an analogy.

2. Analogy and Metaphor
Structure-mapping was developed as an account of explanatory analogies, such as those used in science (e.g., Gentner, 1983; Collins & Gentner, 1983, 1987). However, we suggest that a large class of metaphors can also be encompassed in this framework (Gentner, 1982; Gentner et al., 1987). As shown in Fig. 1, we can divide metaphors into categories of relational metaphors, attributional metaphors, and combinations of these.⁶ Relational metaphors convey common relational structure and can be analyzed like analogies. Attributional metaphors are mere-appearance matches: their focus is on common object attributes. There are also metaphors that are combinations of attributes and relations.

⁶It must be noted that there are metaphors which do not fit the framework, e.g., metaphors that lack structural consistency (see Gentner, 1982). An example is Dylan Thomas's "On a star of faith pure as the drifting bread,/As the food and flames of the snow. . . ." Such metaphors are characterized by many cross-weaving connections with no one best mapping between base predicates and target predicates. We will not consider such metaphors here.

We can exemplify these distinctions with the comparisons in Table I. As discussed previously, Statement 1 is a literal similarity comparison and Statement 8 is an anomaly. Statement 2 would be called analogy, and Statements 3-6 are relational metaphors. (Note that they could easily be described as analogies.) Finally, Statement 7 is an attributional metaphor.

Although metaphors can be either relational or attributional, relational metaphors appear more characteristic of adult metaphorical language. We suggest that people seek relational interpretations of metaphors and prefer metaphors for which such interpretations can be found. Adding this preference assumption results in two general claims. First, in deriving the interpretation of a metaphor from the prior representations of the base and target, people should try to preserve common relations and disregard attributes. Second, people should judge aptness by the degree to which they are successful in finding a relational interpretation.

As a psychological model, structure-mapping is rather elaborate. It assumes that the comprehension of metaphors involves processing of complex representational structures and that the matching process is sensitive to distinctions about predicate structure. In contrast, salience imbalance makes far fewer representational assumptions. It requires only a set of features (or predicates) ordered by salience; no structural distinctions are required. It is reasonable to ask whether the elaborate assumptions of structure-mapping are necessary, or whether the simpler representational assumptions of salience imbalance are sufficient to account for metaphor interpretation. Therefore, we now present experiments that contrast salience imbalance and structure-mapping.

III. Experiments Contrasting Structure-Mapping and Salience Imbalance
Structure-mapping and salience imbalance make different predictions about how people derive the meaning of a metaphor from the prior representations of its terms. In this section we describe a series of studies that test these predictions. The method was straightforward. Subjects first wrote out descriptions of individual terms. Then they wrote out interpretations of forward or reverse metaphors containing these terms and rated the metaphors for metaphoricity and aptness. (They were not told about the metaphor task until after they had completed the description task.) Subsequently, judges evaluated the responses according to the predictions of salience imbalance and structure-mapping, as described shortly. The structure-mapping hypothesis states that people seek interpretations of metaphors that preserve relations from the base and drop object attributes. This generates three specific predictions. First, the metaphor interpretations should include more relations than object attributes. (The assessment of attributionality and relationality is described later). Second, this difference between the amount of relational information and the amount of attributional information should be greater for metaphor interpretations than object descriptions. Third, the more relations subjects can map from base to target, the more apt they should find the metaphor. Therefore, the aptness ratings for metaphors should be positively correlated with the degree to which the metaphor interpretation is relational. No such prediction holds for attributes: there should be either no correlation or a negative correlation between aptness ratings of metaphors and the attributionality of the interpretations. In drawing predictions from the salience imbalance theory, one difficulty is the lack of a clear definition of “salience.” In this study. following a suggestion by Ortony (l979), we operationalized salience as the order of mention of propositions in subjects’ object descriptions. 
With this proviso, salience imbalance makes three predictions. First, if salience imbalance
governs the interpretation of a metaphor, then the chief determinant of which aspects of the object descriptions are used in the metaphors should be salience imbalance; that is, the metaphor interpretations should contain a preponderance of features' that are mentioned early in the base description and late, if at all, in the target description. The remaining two predictions derive from the claim that metaphoricity depends on salience imbalance.' The second is that the rated metaphoricity of forward metaphors should be greater than that of reverse metaphors. This is because the feature matches for the forward metaphors (e.g., "cigarettes are like time bombs") should satisfy salience imbalance to a greater degree than should the reversed metaphors (e.g., "time bombs are like cigarettes"). Third, if the metaphors vary in the degree to which they display salience imbalance, the rated metaphoricity should depend on the degree of salience imbalance: that is, on the degree to which the predicates that appear in the metaphor interpretations appear early in the base descriptions and late in the target descriptions.
A. EXPERIMENT 1
1. Method
a. Subjects. Twenty undergraduate college students from the Cambridge, Massachusetts, area served in the basic metaphor interpretation task. They were paid for their participation. Two other groups served as judges in the scoring tasks: (1) 5 advanced undergraduate psychology students at the University of California at San Diego (U.C.S.D.), and (2) 22 undergraduate students, also from U.C.S.D. Both groups received course credit for participating.
b. Materials. The eight metaphors used were taken from Ortony (1979) and are shown in Table II. Two sets of metaphors were constructed, each containing half the metaphors in forward order (e.g., "sermons are like sleeping pills") and half in reverse order (e.g., "sleeping pills are like sermons"). Each set also contained eight filler metaphors that were always in forward order, i.e., in the most intuitively natural order. Half of the subjects saw each set so that the order assignment was counterbalanced.
'We stress that these features can include relations as well as object attributes. Ortony specifically mentions schemas as an instance of the kinds of representations he means the theory to apply to. However, although salience imbalance allows different kinds of predicates, distinctions among predicate types do not enter into the theory.
'Note that the scopes of the two theories are somewhat different. Structure-mapping makes strong predictions about aptness, but not about metaphoricity. Salience imbalance makes predictions about metaphoricity but not about aptness.
TABLE II
MATERIALS USED IN EXPERIMENT 1
Blood vessels are like aqueducts.
Surgeons are like butchers.
Education is like a stairway.
Sermons are like sleeping pills.
Cigarettes are like time bombs.
Science is like a glacier.
Encyclopedias are gold mines.
Billboards are like warts.
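The counterbalancing of direction across the two metaphor sets can be sketched in a few lines of Python. This is only an illustration: the chapter does not say which four metaphors appeared in forward order in which set, so the alternating assignment below, and the names `METAPHORS` and `build_sets`, are assumptions made for the example.

```python
# Sketch of the counterbalanced design (assumed alternating assignment;
# the actual split of items between the two sets is not reported).
METAPHORS = [  # (target, base) pairs in forward order, from Table II
    ("blood vessels", "aqueducts"),
    ("surgeons", "butchers"),
    ("education", "a stairway"),
    ("sermons", "sleeping pills"),
    ("cigarettes", "time bombs"),
    ("science", "a glacier"),
    ("encyclopedias", "gold mines"),
    ("billboards", "warts"),
]


def build_sets(metaphors):
    """Return two sets; each metaphor is forward in one set and reversed in the other."""
    set_a, set_b = [], []
    for index, (target, base) in enumerate(metaphors):
        forward = (target, base)
        reverse = (base, target)
        if index % 2 == 0:
            set_a.append(forward)
            set_b.append(reverse)
        else:
            set_a.append(reverse)
            set_b.append(forward)
    return set_a, set_b


set_a, set_b = build_sets(METAPHORS)
forward_in_a = sum(1 for pair in set_a if pair in METAPHORS)
forward_in_b = sum(1 for pair in set_b if pair in METAPHORS)
```

With eight items, each set ends up with four forward and four reversed metaphors, which is the property the design requires.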
The eight experimental and eight filler metaphors yielded 32 object terms for the object-description task.
c. Procedure. Subjects were first asked to write descriptions of each of the individual terms (e.g., sermons, sleeping pills). The 32 object terms were presented in random order, except that the 2 terms from a metaphor were never presented contiguously. Subjects were not told about the metaphor task until they had completed the object descriptions. Then they were given the 16 metaphors in random order, in workbooks, one to a page. They were told to write the intended meaning of each metaphor and to rate its metaphoricity and aptness on separate 1-5 scales. “Metaphoricity” concerned whether the comparison was literal or nonliteral, and “aptness” concerned how clever, interesting, and worthwhile the comparison was.
d. Scoring. To test the structure-mapping hypothesis, the relationality and attributionality of the metaphor interpretations and object descriptions were rated in two ways: (1) by 5 trained advanced undergraduates (judges' ratings) and (2) by a group of 22 undergraduate subjects with no special training (undergraduate A/R ratings). To test the salience imbalance hypothesis, 2 of the trained judges rated whether the propositions that occurred in the metaphor interpretations occurred early or late (if at all) in the object descriptions (salience ratings).
e. Judges' Ratings of Relationality and Attributionality. All five judges had some advanced training in linguistics or psycholinguistics. In addition, they received roughly 10 hours of training in the use of propositional notation to represent meaning. The judges were blind both to the subjects' aptness and metaphoricity ratings and to the direction of the original metaphors. Only one judge knew the hypotheses of the study. Three to five judges participated in each scoring session. The 20 interpretations for a given metaphor (10 from the forward presentation and 10
from the reverse) were read aloud in random order. Each judge rated the relationality and attributionality of the interpretation on separate 1-5 scales. Relationality was defined as the degree to which the predicates in the response expressed relations, either between objects (e.g., X hits Y) or between relations (e.g., X hitting Y causes Y to break). Attributionality was defined as the degree to which the predicates described objects in and of themselves. In most cases the decisions were straightforward. However, there were some cases in which deriving the conceptual structure from the surface information required a subjective decision. Section V gives a detailed discussion of the scoring. After rating all 20 responses for a given metaphor, the raters discussed their ratings and disagreements were resolved. The agreement before discussion was .91. Immediately after rating a metaphor interpretation, the judges rated the relationality and attributionality of the object descriptions for the same metaphor (20 descriptions for each of the two objects). Descriptions were read to the judges in a different random order from the metaphors.
f. Undergraduate A/R Ratings. As a check on the judges' ratings, a second method of scoring for relationality and attributionality was also used. This method differed from the previous rating method in three ways: (1) a large group of untrained subjects served as raters, (2) the metaphor interpretations were broken into individual propositions rather than being rated as a whole, and (3) one combined rating scale was used rather than separate scales for attributionality and relationality. Only propositions from the metaphor interpretations were rated; the object descriptions were not included in this task. The 22 raters were divided into two groups, each scoring responses for one of the two sets of metaphors. The propositions were read to the raters in random order within and across metaphors. Each proposition was rated on a 5-point composite scale ranging from 1 (highly attributional) to 5 (highly relational). Raters were given as examples of clearly attributional statements: “X is red” and “X is large.” Examples of relational statements were “X hits things” and “X causes explosions.”
g. Scoring for Salience Imbalance. Two trained judges rated the metaphor interpretations for salience imbalance. They were unaware of the hypothesis being tested and of the original subjects' aptness and metaphoricity ratings. Forward and reversed metaphors were scored separately. Judges compared each metaphor interpretation with the subjects' object descriptions and assessed whether the metaphor interpretation contained any propositions also found in either the base or target object descriptions. Judges scored on the basis of meaning, not for identical wording. The outcome of this scoring procedure was, for each metaphor, the number of propositions that the original subjects had included both
in the metaphor interpretation and in (1) the base, (2) the target, (3) the top half of the base, (4) the bottom half of the base, (5) the top half of the target, and (6) the bottom half of the target.
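The six counts produced by this scoring procedure, together with the two intersections used in the Results, can be illustrated with a small Python sketch. It assumes that the judges' meaning-based matching has already been done, so that each proposition is represented by a shared label. The function names, the example labels, and the rule that an odd middle item goes to the top half are all assumptions for illustration; the chapter does not specify how odd-length descriptions were split.

```python
def halves(description):
    """Split an ordered list of proposition labels into (top half, bottom half).

    Assumption: with an odd number of propositions, the middle one is
    assigned to the top half.
    """
    midpoint = (len(description) + 1) // 2
    return set(description[:midpoint]), set(description[midpoint:])


def salience_counts(interpretation, base_description, target_description):
    """Count interpretation propositions found in each region of the descriptions."""
    interp = set(interpretation)
    b1, b2 = halves(base_description)      # B1 = top half of base, B2 = bottom half
    t1, t2 = halves(target_description)    # T1 = top half of target, T2 = bottom half
    return {
        "B": len(interp & (b1 | b2)),
        "T": len(interp & (t1 | t2)),
        "B1": len(interp & b1),
        "B2": len(interp & b2),
        "T1": len(interp & t1),
        "T2": len(interp & t2),
        "B1_and_T2": len(interp & b1 & t2),  # pattern predicted by salience imbalance
        "B2_and_T1": len(interp & b2 & t1),  # reverse pattern
    }


# Hypothetical labels, loosely modeled on "cigarettes are like time bombs".
base = ["harm_after_delay", "has_timer", "is_explosive"]
target = ["contains_tobacco", "is_smoked", "is_widespread", "harm_after_delay"]
interpretation = ["harm_after_delay"]

counts = salience_counts(interpretation, base, target)
```

In this made-up case the single interpretation proposition is early in the base and late in the target, so it contributes to `B1_and_T2`, the intersection that salience imbalance predicts should dominate.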
2. Results
a. Structure-Mapping. The first two predictions are (1) that the metaphor interpretations should contain more relational information than attributional information and (2) that this relational advantage should be greater for the metaphor interpretations than for the object descriptions. Table III shows a typical response. Both relations and object attributes appear in the object descriptions, but only relational information appears in the metaphor interpretation.
TABLE III
SAMPLE RESPONSE IN EXPERIMENT 1: OBJECT DESCRIPTIONS AND METAPHOR INTERPRETATION OF "CIGARETTES ARE LIKE TIME BOMBS"
Base: time bomb (trained judges' ratings: Relationality 5, Attributionality 5)
    Explosive devices with detonator linked to timing device
    Explosion time can be pre-set
    Perpetrator doesn't have to be present
Target: cigarette (trained judges' ratings: Relationality 5, Attributionality 5)
    Chopped cured tobacco in paper roll
    With or without a filter at the end held in the mouth
    With or without menthol
    Lit with a match and breathed through to draw smoke into the lungs
    Found widely among humans
    Known by some cultures to be damaging to the lungs
    Once considered beneficial to health
Metaphor: cigarettes are like time bombs (trained judges' ratings: Relationality 5, Attributionality 1)
    They do their damage after some period of time during which no damage may be evident
    Aptness: 3
    Metaphoricity: 5
Fig. 2. Experiment 1: mean ratings by trained judges of relationality and attributionality of object descriptions and metaphor interpretations. [Plot of mean ratings; series: relationality and attributionality; conditions: object descriptions and metaphor interpretations.]
These predictions are borne out by the trained judges' ratings of metaphor interpretations and object descriptions. First, as shown in Fig. 2, the metaphor interpretations were rated as highly relational but not highly attributional [t(15) = 6.68, p < .0005, one-tailed]. This difference holds up for individual metaphors. The mean relationality rating was higher than the mean attributionality rating for every one of the 16 metaphors, both forward and reverse. Second, this difference between relationality and attributionality was specific to metaphor interpretations. The object descriptions were rated as high in both relational and attributional information. A 2 x 2 x 2 analysis of variance for the within-subjects factors of Directionality (forward vs. reverse), Task (metaphor vs. object), and Measure (relationality vs. attributionality) confirmed this prediction. There was a main effect of Task, simply reflecting that more was said (both attributes and relations) about the objects than the metaphors [F(1,19) = 262.44, p < .001]. Measure was also significant [F(1,19) = 419.08, p < .001], reflecting that, overall, the responses were judged as higher in relationality than in attributionality. There was no main effect of Direction [F(1,19) = 3.20, not significant], although a significant interaction between Direction and Task was found [F(1,19) = 11.30, p < .01]. Not surprisingly, Direction affected metaphors but not objects. Turning to the chief result, the predicted interaction of Task and Measure was significant [F(1,19) = 129.94, p < .001]. It appears that the drop in
attributionality from object descriptions to metaphors is steeper than the drop in relationality. However, planned comparisons revealed that both attributionality and relationality differed significantly between metaphors and objects [t(39) = 18.01, p < .001 and t(39) = 2.05, p < .05, respectively]. An item analysis revealed the same patterns of significance as the subjects analysis, except that the interaction between Direction and Task was not significant. Again, the key interaction of Task and Measure was significant [F(1,7) = 15.10, p < .01]. The third prediction of structure-mapping is that the aptness of the metaphors should be positively correlated with the relationality of the metaphor interpretations. In contrast, there should be no correlation, or even a negative correlation, between aptness and attributionality. This prediction was confirmed using both the judges' ratings and the undergraduate A/R ratings. Figure 3 shows a scatter plot of the subjects' aptness ratings plotted against the judges' ratings of relationality and attributionality. Figure 4 shows aptness plotted against the undergraduate A/R ratings. Pearson's product-moment correlations were performed on the mean ratings for the 16 metaphors. As predicted, aptness is positively correlated with both measures of relationality (r = .65, p < .01 for the judges' ratings and r = .56, p < .05 for the undergraduate A/R rating). There is no positive correlation between aptness and attributionality, and the trend is negative (r = -.31, not significant) for the judges' ratings. These results suggest that subjects consider metaphors apt when they find relational interpretations. Finally, as a check on the reliability of the measures, correlations were performed between the judges' mean ratings and the undergraduate A/R ratings. The measures proved to be consistent.
The correlation with A/R rating was positive for relationality and negative for attributionality [r(14) = .62, p < .05 and r(14) = -.65, p < .01, respectively].
b. Salience Imbalance. The first prediction of salience imbalance is that the metaphor interpretations should primarily include propositions that are of high salience (mentioned early) in the base description and of low salience (mentioned late) in the target. In order to give the hypothesis every possible opportunity, several variants of the predictions were tested. Table IV shows the results. The most straightforward prediction is that more assertions from the metaphor interpretations should be found in the top half of the descriptions of the base object and the bottom half of the descriptions of the target object than from the reverse intersection (the bottom half of the base and top half of the target). Figure 5 shows a schematic depiction of this prediction (“B1” refers to top half of base and “T2” refers to bottom half of target). This prediction is not confirmed, as shown in Table IV. The difference in the mean numbers of propositions in the two intersections is not significant [t(15) = .81, not significant].
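A comparison of this kind, between the per-metaphor counts in the two intersections, could be computed as a paired t statistic over the 16 metaphors. The sketch below is a generic textbook formula, not a reconstruction of the authors' analysis; the chapter does not state which form of t test was used, and the function name and the toy numbers are assumptions.

```python
import math


def paired_t(xs, ys):
    """Paired-samples t statistic and degrees of freedom for two matched lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Unbiased sample variance of the differences.
    variance = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    if variance == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t_value = mean_diff / math.sqrt(variance / n)
    return t_value, n - 1


# Toy data only (not from the study): differences are 1, 2, 3.
t_value, df = paired_t([2, 4, 6], [1, 2, 3])
```

With 16 metaphors as the unit of analysis, the degrees of freedom would be 15, which matches the t(15) reported in the text.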
Fig. 3. Experiment 1: correlations between subjects' aptness ratings and trained judges' ratings of relationality and attributionality. Top, relationality vs. aptness; bottom, attributionality vs. aptness.
Fig. 4. Experiment 1: correlation between subjects' aptness ratings and the undergraduate A/R ratings. [Scatter plot; one axis is labeled "Mean Original Aptness Rating."]
TABLE IV
RESULTS OF EXPERIMENT 1: MEAN NUMBERS OF PREDICATES OCCURRING IN METAPHOR INTERPRETATIONS
Predictions of salience imbalance    Results: mean number of predicates(a)
B1 ∩ T2 > B2 ∩ T1                    B1 ∩ T2 = .OM    B2 ∩ T1 = .025    NS
B > T                                B = 1.16         T = 1.04          NS
B1 > B2                              B1 = .58         B2 = .58          NS
T2 > T1                              T2 = .49         T1 = .56          NS
B1 > T1                              B1 = .58         T1 = .56          NS
T2 > B2                              T2 = .49         B2 = .58          NS
(a) NS, not significant, two-tailed t test.
Fig. 5. Experiment 1: schematic depiction of the salience imbalance prediction: Metaphor interpretations should include information from the hatched quadrants. (T2 = bottom half of target object descriptions; B1 = top half of base object descriptions). [Schematic for a metaphor of the form "A T is like a B," e.g., "sermons are like sleeping pills," with base (e.g., sleeping pills) and target (e.g., sermons) descriptions.]
Perhaps the halfway point is the wrong cutoff for high vs. low salience. All or most of the information subjects mentioned in their object descriptions may be of high salience. In that case the prediction is simply that more of the metaphor assertions should match assertions from the base description than from the target description. This prediction is also disconfirmed [t(15) = .31, not significant]. Although two plausible versions of the salience imbalance prediction have been disconfirmed, there remain four other patterns that could support salience imbalance: if the assertions from the metaphor interpretations (1) match more assertions from the top half of the base than from the bottom half of the base; (2) match more assertions from the bottom half of the target description than from the top half; (3) match more assertions in the top half of the base than from the top half of the target; or (4) match more assertions from the bottom half of the target than from the bottom half of the base. Each one of these predictions is disconfirmed. As shown in Table IV, the relevant means are nearly identical in all cases and none of the differences is significant. Overall, the first prediction of the salience imbalance hypothesis is not supported here. We found no evidence that salience imbalance determines the information people use in their metaphor interpretations. The second prediction of the salience imbalance hypothesis, that metaphoricity ratings should be higher for forward metaphors than for reversed metaphors, also failed to receive support. Table V shows the mean aptness and metaphoricity ratings (as well as the ratings of relationality and attributionality) for forward vs. reversed metaphors. As Table V suggests, the mean metaphoricity ratings are not significantly different for forward and reversed metaphors [t(7) = 1.21, not significant]. The forward
TABLE V
RESULTS OF EXPERIMENT 1: COMPARISON OF FORWARD AND REVERSED METAPHORS
                      Original ratings            Characteristics of interpretations
                      Aptness    Metaphoricity    Relationality          Attributionality     A/R rating
                                                  (trained judges)(a)    (trained judges)     (group raters)
Forward metaphors     3.31       3.80             4.91                   2.51                 3.10
Reversed metaphors    2.70       3.60             4.60                   2.24                 2.99
(a) The only significant difference between the forward and reversed conditions is in relationality [t(7) = 2.51, p < .05, one-tailed].
and reversed metaphors do appear to differ more in aptness than in metaphoricity, although the aptness difference is also not significant [t(7) = 1.77]. The only significant difference between forward and reversed metaphors is in relationality as rated by the trained judges [t(7) = 2.51, p < .05]. This difference in relationality is evidence for some directional asymmetry. However, there is no evidence that this asymmetry involves differences in metaphoricity. The third prediction of salience imbalance is that metaphoricity should be correlated with the degree of salience imbalance in the metaphors. That is, metaphoricity should be positively correlated with the proportion of metaphor interpretation statements also found in the base descriptions and negatively correlated with the proportion of interpretation statements found in the target descriptions. Instead, we find that metaphoricity is negatively correlated both with the number of statements from the target [r(14) = -.69, p < .01] and with the number of propositions from the base [r(14) = -.56, p < .05]. Since this is a key prediction for the salience imbalance theory, it seemed advisable to check if it held for the forward metaphors only. However, here, too, the results fail to show a positive correlation between metaphoricity and number of statements from the base and, indeed, show a negative trend [r(6) = -.65, not significant]. Finally, we tested whether the salience imbalance intuition might apply to aptness rather than to metaphoricity. This possibility also was disconfirmed. The correlations between aptness and number of propositions found in either the base or the target object descriptions are nonsignificant [r(14) = .28 and r(14) = .05, respectively].
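For readers who want to relate the reported correlations to their degrees of freedom, the following Python sketch computes a Pearson product-moment correlation and converts a given r with df = n - 2 into the equivalent t statistic. It uses the standard formulas only; the function names are illustrative assumptions, and no data from the study are reproduced.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, df):
    """Convert a correlation with df = n - 2 into the equivalent t statistic."""
    return r * math.sqrt(df) / math.sqrt(1.0 - r * r)


# The reported r(14) = -.69 corresponds to a t of about -3.57 on 14 df.
t_for_reported_r = t_from_r(-0.69, 14)
```

The notation r(14) reflects that the correlations were computed over the 16 metaphor means, leaving 16 - 2 = 14 degrees of freedom.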
3. Discussion
The results provide no support for the proposals that (1) salience imbalance determines the interpretation of a metaphor from its terms or that (2) salience imbalance is a principal source of metaphoricity. Contrary to the first proposal, the imbalance in salience of assertions in the base and target object descriptions did not predict metaphor interpretations. Contrary to the second proposal, no difference in metaphoricity was found between the forward and reversed metaphors.' More importantly, metaphoricity was not correlated with the degree of salience imbalance. The predictions of structure-mapping theory were confirmed. First, the metaphor interpretations were rated higher in relationality than in attributionality. Second, this relational advantage applied specifically to the metaphors; the object descriptions were rated high in both relational and attributional information. Third, aptness was correlated with relationality. Thus, the more relational information people can find to map from the base to the target, the more apt they find the metaphor. These results are evidence that relational structure serves as a selection constraint: when people interpret metaphorical comparisons, they tacitly assume that relational information, rather than information about object attributes, is meant to be preserved in the metaphor interpretation. One limitation of the research so far is that only eight metaphors (and their eight reversed counterparts) have been used. The small size of the stimulus set makes us suspect the null results for salience imbalance. It seemed advisable to replicate the study with a larger set of stimuli.
B. EXPERIMENT 2
Experiment 2 provides a second test of the predictions of structure-mapping theory and salience imbalance theory.'" The basic method was the same as in Experiment 1: subjects first gave object descriptions and then interpreted metaphors and rated them for aptness and metaphoricity. The interpretations were then scored by independent judges for relationality, attributionality, and salience, as estimated by order of mention with respect to the base and target.
"It should be noted that the test of this prediction is problematic. In an effort to ensure fairness to the salience imbalance position, the metaphors were taken from the set of examples that Ortony (1979) had used to illustrate the theory. However, A. Ortony (personal communication, 1986) states that at least one of these metaphors is reversible, in which case the predicted directional asymmetry in metaphoricity would not be expected to hold for that metaphor. Thus, the failure to find an overall directional asymmetry is not conclusive. However, the other two tests of salience imbalance and, in particular, the negative results concerning salience imbalance as an interpretation rule are not affected.
"This experiment was conducted as part of a larger study of the development of analogy and metaphor, and for this reason the metaphors were designed to be intelligible to children. For the present purposes only the adult data are of interest. See Gentner (1988b) for a presentation of the developmental findings.
A key aspect of this experiment was that new materials were used which represented three types of metaphors, as discussed earlier: attributional metaphors, relational metaphors, and double metaphors. In attribute metaphors, the predicates shared by the base and target objects were object attributes: e.g., "pancakes are nickels" (both are round). In relation metaphors, the shared predicates were relations: e.g., "a tire is a shoe" (both are used by moving figures as points of contact with the ground). In double metaphors, both attributes and relations were shared: e.g., "plant stems are like drinking straws" (both are long and cylindrical; both are used to bring liquids from below to nourish a living thing). In addition to broadening the range of materials, this collection of metaphor types provides some new tests of structure-mapping. First, since the theory predicts a correlation between aptness and relationality, the attribute metaphors should be judged as less apt than the relational metaphors. More important, for our purposes, is the test of the interpretation rules provided by the double metaphors. The results of Experiment 1 indicated that the metaphor interpretations were predominantly relational. However, it is possible that these results were obtained because only relational commonalities were possible. Since the double metaphors are designed to allow either a relational or an attributional interpretation, they provide a stronger test of the structure-mapping prediction that metaphor interpretations should be based on relations. Structure-mapping makes three predictions. First, the metaphor interpretations should be higher in relationality than in attributionality.
(This prediction applies only to the relational and double metaphors since the attribute metaphors do not permit a relational interpretation.) Second, the aptness ratings should be positively correlated with the relationality of the metaphor interpretations. Third, aptness should be lower for attribute metaphors than for relational and double metaphors." The predictions of salience imbalance are as in Experiment 1, except that in Experiment 2 the metaphors were presented in only one direction (the "forward direction"). Therefore, the predictions concerning metaphoricity and direction do not apply here. The first prediction is that metaphor interpretations should be determined by salience imbalance: that is, they should tend to include propositions mentioned early in the description of the base object and late (if at all) in the description of the target object. Second, metaphoricity should depend principally on salience imbalance: that is, the metaphoricity ratings should be positively correlated with the degree of salience imbalance.
"In this study the object descriptions were not rated for relationality and attributionality, so no comparisons were made between the relationality of the object descriptions and the metaphor interpretations.
1. Method
a. Subjects. The subjects were 10 college students from psychology classes at the University of California at San Diego.
b. Materials. As shown in Table VI, there were eight instances each of three metaphor types: (1) attribute metaphors, in which base and target shared many attributes but few relations; (2) relation metaphors, in which base and target shared many relations but few attributes; and (3) double metaphors, in which base and target shared both relations and attributes. All subjects interpreted all 24 metaphors. In addition, there were 48 objects in the object description task.
TABLE VI
MATERIALS USED IN EXPERIMENT 2
Relational metaphors
    The moon is like a lightbulb.
    A camera is like a tape-recorder.
    A ladder is like a hill.
    A cloud is like a sponge.
    A roof is like a hat.
    Treebark is like skin.
    A tire is like a shoe.
    A window is like an eye.
Attributive metaphors
    Jellybeans are like balloons.
    A cloud is like a marshmallow.
    A football is like an egg.
    The sun is like an orange.
    A snake is like a hose.
    Soap suds are like whipped cream.
    Pancakes are like nickels.
    A tiger is like a zebra.
Double metaphors
    A doctor is like a repairman.
    A kite is like a bird.
    The sky is like the ocean.
    A hummingbird is like a helicopter.
    Plant stems are like drinking straws.
    A lake is like a mirror.
    Grass is like hair.
    Stars are like diamonds.
c. Procedure. The procedure was the same as for Experiment 1. Subjects first wrote out descriptions of the 48 separate objects, which were presented in random order. They were then given the metaphor workbook and told to write their interpretations of the metaphors and to rate their aptness and metaphoricity.
d. Scoring. The metaphor interpretations were scored by trained judges (the same five advanced undergraduates as in Experiment 1). The method was as in Experiment 1: groups of from two to four judges were read the metaphor interpretations and rated them on two 5-point scales, a relational scale and an attributional scale. Agreement between raters ranged from 85 to 100% on different metaphors.
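One simple way to express agreement between raters as a percentage is the share of items on which two raters gave the same rating. The sketch below illustrates that calculation only; the chapter does not say how its agreement figures were computed, so treating agreement as exact matches on the 5-point scale, as well as the function name and the example ratings, is an assumption.

```python
def percent_agreement(ratings_a, ratings_b):
    """Percentage of items on which two raters gave identical ratings.

    Assumption: agreement means an exact match on the 5-point scale.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    matches = sum(1 for a, b in zip(ratings_a, ratings_b) if a == b)
    return 100.0 * matches / len(ratings_a)


# Hypothetical relationality ratings from two judges for four interpretations.
agreement = percent_agreement([5, 4, 1, 3], [5, 4, 2, 3])
```

With more than two judges in a session, one option would be to average this figure over all pairs of judges, which is again an illustrative choice rather than the reported method.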
2. Results
The results of this study largely replicated those of Experiment 1.
a. Structure-Mapping. The first prediction is that the interpretations of relation metaphors and double metaphors would be high in relational information but not attributional information. As discussed previously, the performance on double metaphors is of special interest since they were specifically designed to have attributional interpretations as well as relational interpretations. Table VII shows the rated relationality and attributionality of the interpretations for the three types of metaphor. As predicted, metaphor interpretations were rated higher in relationality than in attributionality for both relation and double metaphors [t(9) = 5.87, p < .001 and t(9) = 3.79, p < .05, respectively]. The second prediction of structure-mapping is that aptness ratings would be positively correlated with relationality, but not with attributionality.
TABLE VII
RESULTS OF EXPERIMENT 2: JUDGES' RATINGS OF THE RELATIONALITY AND ATTRIBUTIONALITY OF THE METAPHOR INTERPRETATIONS AND SUBJECTS' APTNESS RATINGS FOR THE THREE TYPES OF METAPHORS
                           Relationality    Attributionality    Aptness
Relational metaphors       4.6              1.7                 2.9
Double metaphors           3.9              3.0                 2.9
Attributional metaphors    1.4              4.4                 2.3
As in Experiment 1, aptness is positively correlated with relationality [r(22) = .55, p < .01]. However, the attributionality results are somewhat stronger than those for Experiment 1 in that aptness is negatively correlated with attributionality [r(22) = -.42, p < .05]. It appears that subjects found metaphors more apt to the extent that they found relational commonalities and less apt to the extent that they found attributional commonalities. The third prediction is that subjects should consider the relation and double metaphors more apt than the attribute metaphors since the first two permit relational interpretations and the latter does not. As Table VII shows, this prediction is confirmed; the mean aptness ratings for relation and double metaphors are considerably higher than those for attribute metaphors [t(9) = 5.24, p < .001 and t(9) = 7.31, p < .001, respectively]. Thus, all three predictions of structure-mapping are confirmed.
b. Metaphor Classes. One other set of findings concerns the materials. Crucial to this theory is the claim that the distinction between attributionality and relationality can be made reasonably clearly. The results provide evidence for the orderliness of the distinction (see Table VII). First, the relation metaphors were rated as more relational than both attribute and double metaphors [t(9) = 12.98, p < .001 and t(9) = 2.35, p < .05, respectively]. Second, the attribute metaphors were rated as more attributional than both relation and double metaphors [t(9) = 18.09, p < .001, and t(9) = 8.11, p < .001, respectively]. Finally, the double metaphors were intermediate on both rating scales. They were more relational than attribute metaphors [t(9) = 13.12, p < .001] and more attributional than relation metaphors [t(9) = 10.87, p < .001].
c. Salience Imbalance. The first prediction of salience imbalance is that the metaphor interpretations should tend to include propositions mentioned early in the description of the base and late in the description of the target. This result is not confirmed. Table VIII shows the results for the same set of six versions of the prediction tested in Experiment 1. Not one yields a significant difference.
Indeed, the pattern of results is remarkably similar to the negative results of Experiment 1. Thus, it does not appear that salience imbalance functioned as an informational constraint on which propositions subjects included in their metaphor interpretations. The second prediction of salience imbalance is that metaphoricity should depend positively on salience imbalance. That is, metaphoricity should be positively correlated with the number of assertions from the metaphor interpretation that match with the base description and negatively correlated with the number that match with the target. This prediction, too, is not confirmed. The correlations between rated metaphoricity and number of propositions from base and from target are not significant [r(22) = .32 and r(22) = .10, respectively].
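The significance pattern of the reported correlations can be checked with a short sketch. It assumes that r(22) denotes a Pearson correlation with 22 degrees of freedom (24 items) and uses the standard tabled two-tailed critical value of t at the .05 level for 22 degrees of freedom (2.074). The function names are illustrative and are not part of the original analysis.

```python
import math

# Two-tailed critical t at alpha = .05 for df = 22 (standard t-table value).
T_CRIT_05_DF22 = 2.074


def r_to_t(r: float, df: int) -> float:
    """Convert a Pearson r with df = n - 2 into the equivalent t statistic."""
    return r * math.sqrt(df / (1.0 - r * r))


def significant_at_05(r: float) -> bool:
    """True if |t| exceeds the two-tailed .05 critical value (df = 22 only)."""
    return abs(r_to_t(r, 22)) > T_CRIT_05_DF22


# Correlations reported for Experiment 2 (labels are descriptive only).
reported = {
    "aptness vs. relationality": 0.55,
    "aptness vs. attributionality": -0.42,
    "metaphoricity vs. base propositions": 0.32,
    "metaphoricity vs. target propositions": 0.10,
}
results = {label: significant_at_05(r) for label, r in reported.items()}
```

Under these assumptions the two aptness correlations exceed the critical value and the two metaphoricity correlations do not, which matches the pattern reported in the text.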
Dedre Gentner and Catherine Clement
TABLE VIII
RESULTS OF EXPERIMENT 2: MEAN NUMBERS OF PREDICATES OCCURRING IN METAPHOR INTERPRETATIONS

Predictions of salience imbalance    Results: mean number of predicatesᵃ
B1 ∩ T2 > B2 ∩ T1                    B1 ∩ T2 = .025    B2 ∩ T1 = .026    NS
B > T                                B = .63           T = .59           NS
B1 > B2                              B1 = .37          B2 = .28          NS
T2 > T1                              T2 = .23          T1 = .39          NS
B1 > T1                              B1 = .37          T1 = .39          NS
T2 > B2                              T2 = .23          B2 = .28          NS

ᵃ NS, not significant, two-tailed t test.
As in Experiment 1, we also tested the correlation between salience imbalance and aptness, even though that correlation is not predicted by the salience imbalance model. In Experiment 2, unlike Experiment 1, there was some support for a correlation between salience imbalance and aptness. That is, aptness was positively correlated with the number of propositions from the base [r(22) = .37, p < .01]. However, the further prediction that aptness should be negatively correlated with the number of propositions from the target was not confirmed [r(22) = .30, not significant].
3. Discussion
The results are consistent with the structure-mapping claim that relational selectivity acts as an informational constraint on the interpretation of a metaphor from its terms. First, the metaphor interpretations were rated high in relationality and low in attributionality, including the double metaphors, which could support either a relational or an attributional interpretation. Second, the aptness ratings were positively correlated with judged relationality and negatively correlated with judged attributionality. Finally, subjects rated the relational and double metaphors as more apt than the attribute metaphors. Subjects appear both to seek relational predicates in metaphor interpretation and to judge the aptness of the comparison according to the relationality of the interpretation. No support was found for the two predictions of salience imbalance tested here. First, the relative salience of assertions in the base and target object descriptions did not predict inclusion in the metaphor interpretations. Second, there was no correlation between degree of salience imbalance and metaphoricity. (However, some evidence was obtained for
Relational Selectivity in Metaphor
a relation between salience imbalance and aptness. We will return to this point later.) Both structure-mapping and salience imbalance postulate dependent variables that could be operationalized in various ways. In Experiments 3a and 3b we consider alternate methods of assessing the predictions of the two theories. In Experiment 3a we consider an alternative way to operationalize the structural distinctions necessary to test structure-mapping theory. In the previous experiments, we used two methods of rating the underlying propositional structure of subjects' responses. In both cases the results confirmed the structure-mapping predictions. However, these methods have the disadvantage that decisions about attributionality and relationality are, in part, subjective. Although the judges were not told the subjects' aptness or metaphoricity ratings, and among the five trained judges only one knew the hypothesis, it nevertheless seems desirable to have a less subjective measure of relationality and attributionality. Therefore, in Experiment 3a the structure-mapping predictions were evaluated using a syntactic scoring method. The task of the raters was simply to assign each word to a syntactic category. Then the categories were sorted into relational categories (e.g., transitive verbs and comparative adjectives) and object-attribute categories (e.g., common nouns and adjectives). Our reasoning was that the number of relational words in the surface syntax might be fairly well correlated with the relationality of the underlying propositional structure. If so, this syntactic scoring system should produce results similar to the two propositional scoring systems. Although this method entails some sacrifice in the richness of propositional scoring (that is, it will fail to capture all of the underlying relational structure), it has the advantage of being objective and easily describable.
C. EXPERIMENT 3A
1. Method
a. Materials. The descriptions of the 16 objects and interpretations of the 8 forward and reverse metaphors from Experiment 1 were given to the raters.
b. Raters. Five advanced undergraduates from the University of Illinois, who each received a brief training session in grammatical distinctions, served as raters.
c. Rating System. The raters were instructed to rate each word of every sentence used in the descriptions and interpretations according to the grammatical categories shown in Table IX. Section VI gives a detailed description of the scoring system used. Our interest was in scoring what
TABLE IX
EXPERIMENT 3A: GRAMMATICAL CATEGORIES USED IN SYNTACTIC SCORING SYSTEM

Category                                   Example
Noun
  Ordinary nounᵃ                           Dog
  Relational noun                          Father
  Not sure (noun)                          -
Pronoun                                    They
Adjective
  Ordinary adjectiveᵃ                      Red
  Relational adjective                     Edible
  Not sure (adjective)
  Pronominal adjective or possessiveᵇ      Their, theirs
Comparative modifierᵇ                      Bigger than
Prepositionᵇ                               Above
Adverb                                     Slowly
Verb
  Intransitive verb                        Fall
  Transitive verbᵇ                         Hit
  Linking verb                             Is, seems
  Verb plus particle                       Hang on
  Not sure (verb)                          -
Connectiveᵇ                                Because
Other

ᵃ Counted as attributional in subsequent analyses.
ᵇ Counted as relational in subsequent analyses.
subjects wrote about the objects (or, in the case of metaphor interpretation, about the comparison between the objects). Therefore, any mention of the original object terms as well as any pronouns that referred to them was omitted from the scoring. For example, if a subject interpreted the metaphor “an apple is like a fire engine” by saying “they both are red” or “apples are red,” then only the descriptor red would be scored; the words apples, they, and both would be omitted from scoring (since they merely refer to the objects given in the metaphor). Articles such as the and an were not scored.
d. Procedure. The raters worked in groups of three. After practice with filler materials, they scored the first three responses for each metaphor and object. After these ratings were recorded, disagreements were discussed. Finally, the rest of the statements were scored individually and the ratings were recorded. Any remaining disagreements were then discussed and resolved.
e. Analysis. The raters' categorizations were combined into three larger scores: relational, attributional, and other. The categories counted as relational were (1) pronominal adjectives and possessives, (2) comparative modifiers, (3) prepositions, (4) transitive verbs, (5) connectives, (6) relational nouns, (7) relational adjectives, and (8) verb-particle combinations. The categories counted as attributional were ordinary nouns and ordinary adjectives. A narrow criterion was also used, in which categories 6, 7, and 8 were omitted from the relational category. Certain categories were omitted from further analysis because they are neither clearly relational nor clearly attributional: (1) not sure (nouns), (2) not sure (adjectives), (3) adverbs, (4) linking verbs (e.g., copulas), and (5) other. For each response (that is, each metaphor interpretation and each object description), the number of words falling into each class (relational, attributional, and other) was computed.
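The aggregation step described above can be sketched as follows. This is a minimal illustration, not the raters' actual procedure: the category labels are written out as plain strings, the function name score_response is hypothetical, and any category that is not counted as relational or attributional (including those the text does not explicitly assign) is simply tallied as "other".

```python
# Category groupings follow the Analysis paragraph; string labels are illustrative.
NARROW_RELATIONAL = {
    "pronominal adjective or possessive",
    "comparative modifier",
    "preposition",
    "transitive verb",
    "connective",
}
# Categories 6-8: relational under the broad criterion only.
BROAD_ONLY_RELATIONAL = {
    "relational noun",
    "relational adjective",
    "verb plus particle",
}
ATTRIBUTIONAL = {"ordinary noun", "ordinary adjective"}


def score_response(tags, narrow=False):
    """Count relational, attributional, and other words in one response.

    tags: list of grammatical-category labels, one per scored word.
    narrow: if True, categories 6-8 are not counted as relational.
    """
    relational = set(NARROW_RELATIONAL)
    if not narrow:
        relational |= BROAD_ONLY_RELATIONAL
    counts = {"relational": 0, "attributional": 0, "other": 0}
    for tag in tags:
        if tag in relational:
            counts["relational"] += 1
        elif tag in ATTRIBUTIONAL:
            counts["attributional"] += 1
        else:
            counts["other"] += 1
    return counts


# Hypothetical tagged response (four scored words).
example_tags = ["transitive verb", "ordinary noun", "relational noun", "linking verb"]
broad_counts = score_response(example_tags)
narrow_counts = score_response(example_tags, narrow=True)
```

In this hypothetical response the relational noun counts as relational under the broad criterion and falls into the "other" class under the narrow criterion.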
2. Results and Discussion
The results again confirm the predictions of structure-mapping. As in Experiment 1, the proportion of attributional terms used in the object descriptions was significantly greater than the proportion used in the metaphor interpretations (.38 vs. .31). In contrast, the proportion of relational terms was similar: .39 in the object descriptions and .37 in the metaphor interpretations. Thus, the proportion of attributional information dropped as subjects moved from object descriptions to metaphor interpretations. This pattern held for both forward and reverse metaphors. A 2 x 2 x 2 analysis of variance of Direction (forward vs. reversed), Task (metaphor vs. object), and Category (relational vs. attributional) confirmed these patterns. There was a main effect of Task, reflecting the greater overall scores for object descriptions than for metaphors [F(1,19) = 16.85, p < .0001], and a main effect of Category, reflecting the overall greater amount of attribute information [F(1,19) = 8.89, p < .01]. Most importantly, there was an interaction between Task and Category, reflecting the fact that the proportion of attributional, but not relational, information was greater in the object descriptions than in the metaphor interpretations [F(1,19) = 5.56, p < .05]. No other effects were significant, including that of Direction.¹² Thus, the syntactic scoring system yields the same pattern of results, supporting the structure-mapping view, as was originally obtained by the propositional scoring system in Experiment 1.
¹² The results of the narrow summation were similar, although much compressed, since by this scheme about half the terms had to be omitted from the analysis. The pattern of significance was the same except that the main effect of Category was nonsignificant.
Having tested the predictions of structure-mapping under a different
method of assessment, in Experiment 3b we similarly reexamine the predictions of salience imbalance. The results of Experiments 1 and 2 failed to provide support for salience imbalance. However, before accepting these negative results, we must consider the possibility that the order-of-mention method used here simply failed to provide an adequate indicator of salience. Order of mention may be affected by multiple variables and, therefore, may be insensitive to the true salience order. It should be noted that some aspects of the data argue against this interpretation. First, the detailed patterns of negative results with respect to metaphoricity are nearly identical for Experiments 1 and 2. Second, the salience imbalance predictions fail not only on the detailed comparisons (e.g., top half of base vs. bottom half of base) but also on the simple comparison of the relative contribution of base vs. target. By any reasonable interpretation of the notion of salience, it would seem that subjects should have included at least some information in their object descriptions that they considered salient for the objects. Yet in neither experiment did the base contribute more to the metaphor interpretation than the target. Thus, the salience imbalance predictions fail both at the fine-structure level and at the global level of base versus target. However, despite these arguments, it remains the case that order of mention is only one measure of salience. Therefore, in Experiment 3b we used an alternative way of measuring salience.
D. EXPERIMENT 3B
In Experiment 3b, metaphor interpretations from Experiment 1 were assessed for salience relative to the base and target terms using a method similar to that used by Ortony et al. (1985). Subjects rated the assertions in the metaphor interpretations for their immediacy (how readily the thought comes to mind when thinking about the object term) and importance (how important or significant the thought is with respect to the object term) with respect to the base and target terms. Salience imbalance predicts that the assertions from the metaphor interpretation will be rated as more immediate and important with respect to the base than with respect to the target. Our interest is in whether metaphor interpretations can be predicted from the salience of information in the prior representations of base and target objects. Therefore, our method differs from that of Ortony et al. in that subjects were not told about the metaphor that generated the assertions they rated. This was done to ensure that subjects would give the immediacy and importance of the assertions simply with respect to the relevant object terms unbiased by the context of the metaphor. A second difference in methodology was that subjects rated assertions from the metaphor interpretations in the context of other assertions from the object descriptions. This meant that salience of the metaphor assertions
was assessed relative to the salience of other assertions about the base and target terms.
1. Method
a. Subjects. The subjects were 77 undergraduate and graduate students of the University of Illinois. They either received class credit or were paid for their participation.
b. Materials. Subjects rated statements both from the metaphor interpretations (both forward and reverse metaphors) and from the object descriptions given by subjects in Experiment 1. Each subject rated the immediacy and importance of three types of statements with respect to one of the terms of the original metaphor: (1) statements about the term itself, (2) statements from metaphors in which the term was the target, and (3) statements from metaphors in which the term was the base. For example, for the metaphor “blood vessels are like aqueducts,” some subjects rated statements with respect to “blood vessels” and others rated statements with respect to “aqueducts.” Subjects in the “blood vessels” group rated the immediacy and importance, with respect to blood vessels, of (1) all statements describing blood vessels, (2) all statements made in interpreting the forward metaphor “blood vessels are aqueducts,” and (3) all statements made in interpreting the reverse metaphor “aqueducts are blood vessels.”¹³ The statements were rated with respect to the 16 terms that entered into the metaphors in Experiment 1. The number of statements about a given term varied from 46 to 73 depending on how many statements were made by the original subjects. Statements that were repeated across subjects were rated only once. Repetitions were scored on the basis of similar or identical meaning. For example, “they transport fluid” as a description of “aqueducts” was counted as a repetition of “they carry liquid.” The weighted scoring system described next took into account the frequency of the original statements. To keep subjects unaware of the metaphors, we removed any mention of the original terms by Experiment 1 subjects.
For example, if the metaphor had been interpreted as “both blood vessels and aqueducts provide nutrients,” this statement was changed to “they provide nutrients.” This had the further advantage that it allowed the identical statements to be rated with respect to both “blood vessels” and “aqueducts.”
¹³ A few statements that were clearly inappropriate were omitted. For example, one subject described “aqueduct” as “a horse racing track in Queens, NY.” This response was omitted.
c. Procedure. Subjects were told that they would hear statements about a particular topic. Their job was to rate each statement on scales
of immediacy and importance with respect to that topic. Importance was defined as "how important or significant the thought is with respect to the (topic)." Importance was rated on a scale from 1 (not at all important) to 5 (highly important). Immediacy was defined as "how quickly and naturally the thought comes to mind when you are thinking of (the topic)" and was also rated on a scale from 1 to 5. As discussed previously, statements from the object descriptions were interspersed with statements from the interpretations of the metaphors that included that object. Each term was rated by 10 subjects. Of the 77 subjects, 56 rated two terms, 10 rated one, 8 rated three, 2 rated four, and 1 rated six.¹⁴ Subjects never rated two terms taken from the same metaphor.
d. Scoring. For each term, the ratings of the metaphor interpretation statements were summed separately for the forward and reverse metaphors. This gave us two totals across subjects, one for assertions about metaphors in which the term functioned as the base object and one for metaphors in which the term functioned as the target object. Both weighted and unweighted totals were computed. For the unweighted totals, the raw ratings of statements across subjects were simply summed for each term. For the weighted totals, the rating for each statement was weighted by the number of times the statement had been given by the original subjects.
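The two scoring schemes just described can be illustrated with a short sketch. The data structures, the function name totals, and the example ratings are all hypothetical; the sketch only shows the difference between summing raw ratings and weighting each statement's ratings by how often the original subjects produced it.

```python
def totals(ratings, frequency):
    """Return (unweighted, weighted) rating totals for one term in one role.

    ratings: dict mapping each distinct statement to the list of 1-5 ratings
        it received from the raters.
    frequency: dict mapping each statement to the number of times the
        original subjects gave it.
    """
    unweighted = sum(sum(rs) for rs in ratings.values())
    weighted = sum(sum(rs) * frequency[s] for s, rs in ratings.items())
    return unweighted, weighted


# Hypothetical ratings for two statements, each rated by three subjects.
ratings = {"they carry liquid": [4, 5, 3], "they are long": [2, 2, 1]}
frequency = {"they carry liquid": 3, "they are long": 1}
unweighted_total, weighted_total = totals(ratings, frequency)
```

With these made-up numbers the unweighted total is 17 and the weighted total is 41, because the statement given three times by the original subjects contributes its ratings three times over.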
2. Results and Discussion
If salience imbalance provides an interpretation rule for metaphors, then the statements that occurred in the metaphor interpretations should be rated as more salient with respect to base terms than target terms. As shown in Tables X and XI and Fig. 6, this prediction was not confirmed for either the immediacy or the importance ratings. Four 2 x 2 repeated-measures analyses of variance (ANOVAs) were performed over immediacy and importance using both the weighted and unweighted scores. For all four ANOVAs, the two factors were Position (base vs. target) and Directionality (forward vs. reverse metaphor interpretation), with items as the random variable. The key prediction of salience imbalance is a main effect of Position: the immediacy (or importance) of assertions should be higher with respect to the base than the target. This prediction was not confirmed. The effect of Position did not reach significance in any of the four analyses. For immediacy, the weighted and unweighted analyses yielded F(1,7) = 1.24 and F(1,7) = 1.23, respectively. For importance, F(1,7) = .69 and F(1,7) = 1.81 for the weighted and unweighted analyses, respectively.
¹⁴ This task was used as a filler task for other experiments. Thus, the number of terms rated by a subject varied according to the time allotted for the filler task.
Thus, for both measures of salience
TABLE X
RESULTS OF EXPERIMENT 3B: MEAN IMMEDIACY RATINGS OF INFORMATION IN THE METAPHOR INTERPRETATIONS WITH RESPECT TO THE BASE AND TARGET

              Weighted            Unweighted
              Base     Target     Base     Target
Forward       3.1?     2.59       3.04     2.55
Reverse       2.79     3.21       2.72     3.02
Combined      2.98     2.90       2.88     2.78
neither the weighted nor unweighted ratings revealed any significant asymmetry between the salience of the metaphor interpretations with respect to the base and their salience with respect to the target. The main effect of Directionality was not significant in any of the four analyses. Statements that occurred in the interpretations of forward metaphors were not considered either more immediate or more important than statements in the interpretations of reverse metaphors. Although no main effects were shown in any of the four ANOVAs, there was an interaction between Position and Direction for immediacy [F(1,7) = 13.72, p < .01 for the weighted analysis and F(1,7) = 9.18, p < .05 for the unweighted analysis]. Figure 6 shows the weighted immediacy results with respect to the base and target for forward and reverse metaphors. The forward metaphors show the predicted asymmetry (that is, statements from the metaphor interpretations are rated as more immediate in the base than in the target), but the reverse metaphors show the opposite pattern. (The importance ratings showed a nonsignificant crossover of the same form.) Had salience imbalance operated as a strong interpretation

TABLE XI
RESULTS OF EXPERIMENT 3B: MEAN IMPORTANCE RATINGS OF INFORMATION IN THE METAPHOR INTERPRETATIONS WITH RESPECT TO THE BASE AND TARGET

              Weighted            Unweighted
              Base     Target     Base     Target
Forward       3.23     2.87       3.18     2.90
Reverse       2.90     3.17       2.95     3.0
Combined      3.07     3.02       3.07     2.99
[Figure 6 plot not reproduced; horizontal axis: Role (Base, Target).]
Fig. 6. Experiment 3b: weighted immediacy ratings of propositions from the interpretations of forward and reverse metaphors with respect to base and target terms.
rule, the immediacy and importance of the metaphor assertions would have been higher for the base than for the target regardless of the direction of the metaphor. Thus, the results of this study largely confirm our previous negative findings concerning salience imbalance. Using immediacy and importance ratings as a measure of salience, we found no evidence that salience imbalance determines the information that is included in metaphor interpretations. The immediacy interaction provides support for a directional effect in that for forward metaphors the predicted asymmetry was obtained. However, the fact that the reverse metaphors show the reverse asymmetry indicates that, although salience imbalance may capture a genuine order preference, it does not determine subjects’ interpretations. Indeed, the pattern of results suggests that subjects had some other informational constraint (e.g., preserving common relational structure) that determined which commonalities belonged in their metaphor interpretations and simply included these commonalities regardless of their relative salience. If we make the further assumption that the “forward” metaphors are just those in which the relational commonalities are of higher salience in the base (and therefore that the “reverse” metaphors tend to have relational commonalities that are of higher salience in the target), then the fact that forward metaphors obeyed salience imbalance and reverse metaphors did not is explained. We will return to the issue of the respective roles of structure-mapping and salience imbalance in metaphor interpretation in the following discussion.
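The crossover pattern can be made concrete with a small descriptive sketch that uses the unweighted mean immediacy ratings reported in Table X. The variable and function names are illustrative, and the contrasts below are simple differences of cell means, not the significance tests reported in the text.

```python
# Unweighted mean immediacy ratings from Table X: (base, target) by direction.
means = {
    "forward": (3.04, 2.55),
    "reverse": (2.72, 3.02),
}


def asymmetry(direction):
    """Base-minus-target difference; positive values favor the base."""
    base, target = means[direction]
    return base - target


forward_asym = asymmetry("forward")
reverse_asym = asymmetry("reverse")

# Average asymmetry across directions (the Position contrast) and the
# difference between the two asymmetries (the Position x Direction contrast).
position_effect = (forward_asym + reverse_asym) / 2
interaction_contrast = forward_asym - reverse_asym
```

On these means the forward metaphors favor the base by about .49 and the reverse metaphors favor the target by about .30, so the average Position difference is small while the interaction contrast is comparatively large.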
E. GENERAL DISCUSSION: SALIENCE IMBALANCE AND STRUCTURE-MAPPING
1. Structure-Mapping
The results of the three experiments support the application of structure-mapping to metaphor. In Experiments 1, 2, and 3a, metaphor interpretations were found to contain predominantly relational information and to include relatively less attributional information than the object descriptions for their terms. Even metaphors deliberately designed to suggest either an attribute or relational interpretation (the double metaphors) received relational interpretations. This finding held under three different methods of scoring relationality and attributionality, two based on judgments of propositional structure and one based on syntactic judgments. Not only did subjects tend to focus their metaphor interpretations on relational information, but they also appeared to base their aptness judgments on the degree to which they succeeded in arriving at a relational interpretation. In both Experiments 1 and 2, the relationality of metaphor interpretations was positively correlated with aptness. In contrast, subjects appeared to find attribute matches irrelevant or even detrimental to their sense of how apt a metaphor was; the correlation between aptness and attributionality was negative in Experiment 2 and not significant (but with a negative trend) in Experiment 1. Further, in Experiment 2 subjects judged the aptness of the double metaphors according to the relationality of their interpretations, even though an attribute interpretation was also clearly possible, and they judged both relational and double metaphors as more apt than attribute metaphors.
2. Salience Imbalance
The results for salience imbalance are both less promising and more puzzling. We first consider the negative evidence which suggests that salience imbalance provides neither an interpretation rule nor a defining criterion for metaphor. We then consider some positive findings that suggest that salience imbalance does have a role in metaphor.
a. Negative Evidence Concerning Salience Imbalance. In all three experiments our methods, unlike those of Ortony et al. (1985), were aimed at predicting metaphor interpretations from the salience of information in the prior representation of the constituent terms. We found no evidence that the relative salience of information in the representations of the terms determines the meaning given to a metaphor. In Experiments 1 and 2, using an order-of-mention measure of salience, we found no significant
tendency for the metaphor interpretations to contain high-salient information from subjects' prior descriptions of the base terms and/or low-salient information from descriptions of target terms. The patterns of results in the two experiments are remarkably similar even though different metaphors were used. In Experiment 3, two alternative measures of salience were used. New subjects were given a pool of propositions from previous subjects' metaphor interpretations and object descriptions and rated their immediacy and importance with respect to the base or target terms. No overall asymmetry was found. In particular, contrary to the prediction of salience imbalance, there was no tendency for assertions from the metaphor interpretations to be rated as more salient (by either measure) with respect to the base than with respect to the target. Thus, again we found no support for salience imbalance as an interpretation rule for metaphor. The second strong construal of salience imbalance is that it is constitutive of metaphor. No evidence was found for this claim in either Experiment 1 or 2. There was no difference in the metaphoricity of forward and reverse metaphors in Experiment 1, and more importantly, no correlation was found between metaphoricity and salience imbalance, even when only forward metaphors were considered. This undermines the claim that metaphoricity depends crucially on salience imbalance. Aside from the findings reported here, a second reason to question salience imbalance as the source of the distinction between literal similarity and metaphor is that both literal and nonliteral similarity statements show directional asymmetry and salience imbalance (Rosch, 1976, 1975; Tversky, 1977; Ortony et al., 1985). Thus, salience imbalance and asymmetry are not defining characteristics of metaphor. On the other hand, the evidence of Ortony et al. 
suggests that directional asymmetries are stronger in metaphor than in literal similarity comparisons. (Note, however, that Conner & Kogan, 1980, failed to find greater asymmetry for metaphor). We return to this point shortly.
b. Positive Evidence for Salience Imbalance. The strong proposals that salience imbalance provides an informational constraint on metaphor interpretations or that it is definitive of metaphor do not appear to hold. However, there are three findings that indicate that salience imbalance plays a special role in metaphor. First, the results of Ortony et al. suggest that salience imbalance and directional asymmetry may be stronger for metaphors than for literal similarity statements. Second, in Experiment 2, but not in Experiment 1, we found some partial evidence that salience imbalance influences aptness judgments in that there was a positive correlation between aptness and order of mention in the base, though no negative correlation was found with order of mention in the target. Third,
the results of Experiment 3 showed evidence for salience imbalance (as measured by immediacy ratings) for forward metaphors, though not for reverse metaphors. These findings suggest that salience imbalance may have some special status in metaphor.
c. Salience Imbalance as a Communicative Norm. Perhaps the best account of the role of salience imbalance comes from an observation of Ortony et al. (1985). They suggest that directionality and salience imbalance may arise from a conversational contract, a version of the “given-new” contract described by Clark and Haviland (1977). When a speaker uses a simile such as “a is like b,” the hearer has a pragmatic understanding about what is likely to be conveyed: “In similes (and indeed in all similarity statements) the ‘given’ entity is the topic of the comparison and therefore is in the a-position. The ‘new’ information that is being communicated about the given entity is contained in the b-term. . . . Presumably, to convey the new information, a speaker selects a b-term for which the attributes to be communicated are highly salient” (Ortony et al., 1985, p. 571). Similar points have been made by Glucksberg (1980) and by Tourangeau and Sternberg (1981). Thus, salience imbalance may be the application of a conversational cooperativeness rule to comparatives: “If X is to be explained by comparison with Y, then the explanation should be more accessible for Y than for X.” By this account, salience imbalance should be viewed not as an informational constraint on the interpretation of metaphor but as a communicative norm. The findings discussed here are, in the main, consistent with this account of salience imbalance. We found no evidence that salience imbalance provides an interpretation rule. However, if we assume that metaphors with high “forward” salience imbalance are more communicatively appropriate, this could account for the directional preferences found by Ortony et al. (1985). 
Such a directional preference could also account for the relation between aptness and salience imbalance for forward metaphors found in Experiment 2. This account does not explain why salience imbalance and asymmetry are more characteristic of metaphor than of literal similarity. Perhaps a further aspect of the conversational contract is that the more demanding it is to comprehend an utterance, the more important it is for it to obey good communicative norms. If we make this assumption, and further assume that metaphors are more difficult to comprehend than literal similarity statements, then we can explain the greater directional asymmetry of metaphors. This line of argument suggests that the two theories are directed at different, though interrelated, aspects of metaphor comprehension. Structure-mapping describes a set of informational constraints on the kind of information that should enter a metaphor interpretation, while salience imbalance describes a communicative norm about the way new information should be presented in a metaphor. That is, when people hear a metaphor they seek a common system of relational information, and they assume that this system will be more obvious in the base than it is in the target. Or, to put it more informally, structure-mapping tells you what to look for and salience imbalance tells you where to look first. An interesting prediction that arises from this account is that salience imbalance effects should be most pronounced in communication contexts, whereas structure-mapping effects should hold equally within and outside of communication contexts.¹⁵
The results of the three experiments provide considerable support for the predictions of structure-mapping. To return to the questions raised in the introduction, we conclude that, despite the computational advantages of a simple representational system, we cannot model metaphor interpretation without distinguishing among different kinds of predicates, specifically between relations and attributes. In the next section we consider the further issue of higher-order relational structure in the interpretation of metaphor and analogy.
IV. Systems of Relations
Structure-mapping describes two interpretation rules for analogy and metaphor. The studies discussed so far provide support for the first of these claims, that people's interpretations tend to include relations common to both domains and tend to disregard common object attributes. We turn now to the second informational constraint postulated by structure-mapping, the systematicity principle, which states that people implicitly seek to interpret an analogy or metaphor in terms of a shared set of interconnected relations constrained by higher-order relations rather than in terms of isolated predicates. That is, among the (potentially large) set of common relations that could be included in an interpretation, people select those that are part of a common system of relations. Thus, the systematicity principle acts as a selection filter that selects which lower-order predicates are preserved in an interpretation. It reflects a tacit preference for coherence and deductive power in analogy and metaphor. In this section we discuss the role of higher-order relational structure in the interpretation of analogy and metaphor. We present empirical evidence for systematicity as an informational constraint on the interpretation of analogy.¹⁶ Finally, we consider a competing account of the role of higher-order knowledge in analogical processing.
¹⁵ Note that this account of salience imbalance is compatible with Ortony's (1979) third suggestion, that is, with a processing model in which an interpreter starts from the top of the base and works down, searching for components that match with the target. However, in light of the present results, we would add to this the structure-mapping constraint that the match be one of relational structure.
A. EVIDENCE FOR SYSTEMATICITY

Although researchers have just begun to test systematicity as a psychological interpretation rule, there is some evidence for the general importance of systematicity in analogical processing. For example, Gentner and Schumacher (1986) and Schumacher and Gentner (1988) have obtained evidence that systematicity can facilitate accurate analogical mapping. In their research, subjects were taught a device model and then asked to transfer their knowledge to an analogous device. Subjects were able to achieve accurate transfer in substantially fewer trials when their initial model possessed systematic structure than when it did not (even though the same procedures were taught in both cases).

Other research indicates that common systematic structure influences subjects’ judgments of how sound an analogy is (i.e., whether or not the analogy would yield justifiable inferences) (Gentner & Landers, 1985; Rattermann & Gentner, 1987). In these studies subjects rated the soundness of the analogical match between two brief stories. All the pairs of stories included matching lower-order relations. In some cases these lower-order matches were governed by matching higher-order relations; in other cases, the lower-order matches simply stood alone. Subjects rated the comparisons between stories as more sound when the higher-order matches were present.

These sets of results implicate systematicity in analogical processing and in the evaluation of analogies. However, this research does not tell us whether systematicity acts as a selection filter that influences the way the meaning of an analogy or metaphor is derived from the meaning of the base and target. Research by Clement and Gentner (1988) addresses this issue. We are conducting studies of the interpretation of analogy to examine whether systematicity constrains the selection of information to map between a base and target.
Specifically, we ask if the selection of which lower-order relations to include in the interpretation is governed by whether or not these relations belong to a larger system of relations shared by the two domains. In these studies subjects read descriptions of two analogous domains and are asked to say which aspects of the target contribute to the analogy. The base and target are descriptions of fictional worlds. In each case, the base and target share two key lower-order relations (which we will call “key facts”).

For example, one base domain describes the life cycle of some extraterrestrial creatures. The two key facts in this domain are (1) that the creatures periodically become dormant and (2) that they can only survive in one particular environmental niche. The analogous target domain concerns robots who explore planets. The target includes analogous versions of the two key facts described in the base: (1) the robots periodically shut down their primary functions, and (2) the robots are unable to function in different locations.

In both domains, the key facts are governed by higher-order systems of causal relations. The essential manipulation is whether or not these causal systems match between the base and the target. Thus, in one case the matching key facts are governed by matching causal systems (the matching-system case). For example, the lower-order facts that the creatures/robots become dormant might be caused by like events in both the base and target, e.g., lack of necessary resources. In contrast, the other pair of matching key facts (e.g., that the creatures/robots have restricted areas of functioning) is governed by different causes in the base and the target (the non-matching-system case). Note that in both cases there is a perfect match between the lower-order predicates. The difference is that in one case, the two like lower-order relations belong to like systems of relations while in the other, the two like lower-order relations belong to different systems of relations. (Which of the two facts is identically governed is counterbalanced across subjects.)

Subjects study the base and the target and then choose which of the two lower-order facts in the target best contributes to the analogy with the base. Our hypothesis is that subjects should select the matching-system lower-order relation over the non-matching-system lower-order relation. The results of the initial studies support this hypothesis: subjects chose (study 1) or predicted (study 2) the matching-system fact more often than the non-matching-system fact.

¹⁶The remaining experiments utilize analogies. If, as we have argued, relational metaphors are treated like analogies, the conclusions will apply to them as well as to analogies.
Note that the subjects’ overt task was to choose between two pairs of lower-order relations that are, by themselves, equally well matched. Yet subjects showed a significant preference for the pair which is embedded in a larger matching structure. This suggests that systematicity does indeed provide a selection filter on which lower-order relations are included in the interpretation of an analogy. Thus, we can go beyond simply postulating a relational preference and add systematicity as a further informational constraint on the interpretation of analogy.

Finally, there is computational support for the structure-mapping account. A computer simulation of the theory, called the Structure-Mapping Engine (SME) (written by Brian Falkenhainer and Ken Forbus), produces psychologically plausible interpretations of analogies and relational metaphors (Falkenhainer et al., 1986, in press; Gentner, 1988; Gentner et al., 1987).
Given predicate calculus representations of two potential analogs, it uses purely structural principles (one-to-one correspondence, structural consistency, and systematicity) to interpret and evaluate an analogy between two situations. It operates by first finding all possible relational identities between base and target; it then assigns to each of these match hypotheses an evaluation, based on the structural closeness of the match and on a kind of local systematicity by which a given pair of matching predicates is assigned a higher evaluation if their parents also match. SME then sweeps these matching pairs into the largest possible sets consistent with the structural constraints laid out previously and computes an overall evaluation. In addition, it hypothesizes candidate inferences, new facts about the target domain that are derived by analogy with the base domain. Thus, SME simulates both the matching of existing predicates in the two domains and the carryover of hypothesized predicates from one domain to the other.

We have compared the performance of SME with that of human subjects for a set of simple analogies using pairs of stories (Skorstad, Falkenhainer & Gentner, 1987). We find that SME's structural evaluations match fairly well with human soundness ratings for story analogies.

Some aspects of structure mapping have received convergent support. There is widespread agreement on the basic elements of one-to-one mappings of objects and structural consistency during matching and carryover of predicates, and many researchers use systematicity or a variant of it as a selection filter (Burstein, 1983; Hofstadter, 1984; Indurkhya, 1985; Kedar-Cabelli, 1985; Van Lehn & Brown, 1980; Winston, 1980, 1982). More generally, the idea that metaphor and analogy involve a mapping of complex knowledge structures is gaining currency in cognitive science (e.g., Black, 1962; Clement, 1987; Gick & Holyoak, 1980, 1983; Hesse, 1966; Hoffman, 1980; Keane, 1988; Kittay, 1987; Lakoff & Johnson, 1980; Miller, 1979; Reed, 1987; Rumelhart & Norman, 1981; Tourangeau & Sternberg, 1982; Verbrugge & McCarrell, 1977).
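The match-and-evaluate step described for SME can be illustrated with a toy sketch. The Python below is not SME itself. It is a minimal illustration that assumes nested-tuple expressions, a hypothetical `PARENT_BONUS` weight, and made-up causal predicates (`LACK`, `FRAGILE`, `PROGRAMMED`) loosely modeled on the creatures/robots materials. It finds identical relational matches and gives a pair a higher evaluation when their parents also match; the one-to-one constraint, the merging of matches into global interpretations, and candidate inferences are all omitted.

```python
from itertools import product

PARENT_BONUS = 0.5  # assumed weight for "local systematicity"; not from the source


def is_expr(x):
    """Expressions are tuples (predicate, arg1, ...); entities are plain strings."""
    return isinstance(x, tuple)


def subexpressions(expr, parent=None):
    """Yield (expression, parent) for expr and every expression nested inside it."""
    yield expr, parent
    for arg in expr[1:]:
        if is_expr(arg):
            yield from subexpressions(arg, expr)


def same_structure(b, t):
    """True if b and t have identical predicates at every level.

    Simplification: any entity may correspond to any entity, so the
    one-to-one mapping constraint is not enforced here.
    """
    if is_expr(b) != is_expr(t):
        return False
    if not is_expr(b):
        return True
    return (b[0] == t[0] and len(b) == len(t)
            and all(same_structure(x, y) for x, y in zip(b[1:], t[1:])))


def match_hypotheses(base, target):
    """Map each matching (base_expr, target_expr) pair to an evaluation score."""
    base_nodes = [n for fact in base for n in subexpressions(fact)]
    target_nodes = [n for fact in target for n in subexpressions(fact)]
    scores = {}
    for (b, b_parent), (t, t_parent) in product(base_nodes, target_nodes):
        if not same_structure(b, t):
            continue
        score = 1.0
        parents_match = (b_parent is not None and t_parent is not None
                         and same_structure(b_parent, t_parent))
        if parents_match:
            score += PARENT_BONUS  # higher evaluation when the parents also match
        scores[(b, t)] = max(score, scores.get((b, t), 0.0))
    return scores


# Hypothetical encodings: the dormancy facts share a cause, the restriction facts do not.
BASE = [
    ("CAUSE", ("LACK", "creatures", "resources"), ("DORMANT", "creatures")),
    ("CAUSE", ("FRAGILE", "creatures"), ("RESTRICTED", "creatures")),
]
TARGET = [
    ("CAUSE", ("LACK", "robots", "resources"), ("DORMANT", "robots")),
    ("CAUSE", ("PROGRAMMED", "robots"), ("RESTRICTED", "robots")),
]

scores = match_hypotheses(BASE, TARGET)
print(scores[(("DORMANT", "creatures"), ("DORMANT", "robots"))])
print(scores[(("RESTRICTED", "creatures"), ("RESTRICTED", "robots"))])
```

Under these assumptions the `DORMANT` pair, whose governing causal relation also matches, receives a higher evaluation than the `RESTRICTED` pair, which parallels the preference for the matching-system fact reported above.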
B. STRUCTURAL vs. GOAL-DRIVEN MODELS

As discussed previously, many researchers in cognitive science and artificial intelligence have made use of structural principles in modeling analogy. However, some analogy researchers have argued for a stronger focus on contextual goals and plans in addition to (or instead of) structural principles (Burstein, 1983; Burstein and Adelson, 1987; Carbonell, 1981, 1983; Holyoak, 1985; Kedar-Cabelli, 1985). For example, Carbonell (1981) proposed that metaphors and analogies are interpreted by means of an invariance hierarchy which captures the degree to which people expect different conceptual relations to be preserved in an interpretation. According to this view, when given a new metaphor or analogy people will first seek an interpretation in terms of a goal-expectation setting, then try planning and counterplanning strategies, then causal structures, and so on through 10 categories, with descriptive properties and object identity at the bottom of the list.

The goal-centered approach to analogy can be divided into three separable claims, not all of which are maintained by any one researcher:¹⁷

1. Analogies tend often to be about goals and plans; that is, the higher-order relations that govern the analogy are often goal structures because humans often need to reason about such things (e.g., Carbonell, 1983). In this view, the goal-centered approach can be treated as a specialization of the structural account that attempts to take into account the relative frequency of different kinds of relational structures in human reasoning. Carbonell's invariance hierarchy could be seen as an instance of this effort.

2. Structural principles are important in analogy but must be augmented by some mechanism for taking into account the current cognitive context, including the goals of the person. One reason for this is that structural principles alone are felt to be insufficient to decide among alternative possible interpretations of an analogy (Burstein, 1983; Kedar-Cabelli, 1985). These accounts supplement structural principles with contextual-pragmatic considerations (often in the initial selection of the knowledge given to the matching process) and are, in general, compatible with the views presented here (see Gentner, 1987, 1988a).

3. The strongest version of the goal-centered claim is that the interpretation mechanisms for analogy are defined with respect to the user's goals (e.g., Holyoak, 1985). Thus, an analogy can only be comprehended in the context of a current goal or plan. This strong form of the goal-oriented account cannot be viewed as a specialization of structure mapping. Rather, it seeks to replace structural principles with goal-driven selection mechanisms. Since this proposal is a distinct alternative to structure mapping, we discuss it here.
The most extreme of the goal-centered accounts is that given by Holyoak (1985). He proposes that analogy must be modeled as part of a goal-driven processing system: "Within the pragmatic framework, the structure of analogy is closely tied to the mechanisms by which analogies are actually used by the cognitive system to achieve its goals" (Holyoak, 1985, p. 76). He argues that the structure-mapping approach is "doomed to failure" because it fails to take account of goals. In Holyoak's pragmatic account, matching is governed entirely by the relevance of the predicates to the current goals of the problem solver. A crucial difference between the pragmatic account and the structure-mapping account¹⁸ is that in the pragmatic account the distinction between structural commonalities and surface commonalities is based solely on relevance. Holyoak's definitions of these terms are as follows:

    It is possible, based on the taxonomy of mapping relations discussed earlier, to draw a distinction between surface and structural similarities and dissimilarities. An identity between two problem situations that plays no causal role in determining the possible solutions to one or the other analog constitutes a surface similarity. Similarly, a structure-preserving difference, as defined earlier, constitutes a surface dissimilarity. In contrast, identities that influence goal attainment constitute structural similarities, and structure-violating differences constitute structural dissimilarities. Note that the distinction between surface and structural similarities, as used here, hinges on the relevance of the property in question to attainment of a successful solution. The distinction thus crucially depends on the goal of the problem solver. (Holyoak, 1985, p. 81)

The key point here is the final sentence: in the pragmatic account, the distinction between surface and structural similarities "hinges on the relevance of the property in question to attainment of a successful solution. The distinction thus crucially depends on the goal of the problem solver." Structural similarities are defined as "identities that influence goal attainment," and a surface similarity as "an identity between two problem situations that plays no causal role in determining the possible solutions to one or the other analog." This means that the distinction between surface and structural information, and therefore the decision as to what to include in the interpretation of an analogy, must be made with respect to the person's current goals.

Holyoak's emphasis on plans and goals has some appealing features. This account promises to replace the abstract formalisms of the structural approach with an ecologically motivated account centered around what matters to the individual. Further, to apply structure-mapping to a problem-solving situation requires at least two kinds of selection criteria: structural selection filters within the matcher, as discussed in this article, and a goal-relevance check on the output of the analogical match process (which is modeled as external to the matching process).¹⁹ Holyoak's account requires only one kind of selection criterion: relevance to the goal of the problem solver.

But there are severe costs to this simplification. First, since structural matches are defined only by their relevance to a set of goals, the pragmatic account requires a context that specifies what is relevant before it can operate. Thus, this position fails to capture people’s ability to comprehend a new metaphor or analogy without reference to a prior goal context. Such a mechanism could not interpret an analogy in isolation, nor could it interpret an analogy whose point is irrelevant to the current goal context. Such a position clearly fails as an account of the interpretation of analogy. Although prior contextual goals can provide advance expectations that facilitate the interpretation of an analogy, such expectations are not an inherent requirement for computing an interpretation. Indeed, people often perform the reverse computation; they derive the interpretation of an analogy through a structural computation, such as described here, and then infer from this the probable plans and goals of the speaker. For example, in the Clement and Gentner study, subjects derived systematic interpretations of the analogies without requiring any prior problem-solving goals. An example closer to hand is the analogies and metaphors in Table 1. Since their meanings are not supported by a current goal context, if the interpretation mechanism requires a goal context it should be impossible to comprehend these comparisons. We leave it to the reader to judge whether this is true.

To be fair, it should be noted that Holyoak’s (1985) pragmatic account is meant to apply only to analogies in problem solving. As we have shown, when it is applied to the more general question of how analogy is interpreted it encounters serious difficulties. One could, perhaps, preserve the pragmatic account by postulating separate mechanisms for analogy in a goal context and analogy in isolation. In the former case, “structural commonalities” would be defined in terms of goal relevance; in the latter, they would be defined in terms of predicate structure. However, such a dichotomy would lead to a loss in generality.

¹⁷Occasionally a fourth version of the goal-oriented approach is brought forth: the claim that "In analogy the person has a goal to comprehend the analogy." Since this claim is compatible with all of the views discussed, including the pure structural view, we will not discuss it further.

¹⁸It must be mentioned that Holyoak may have modified his views. In a recent talk, Holyoak and Thagard (1987) adopted a more structural approach.

¹⁹In fact, I have argued elsewhere that there are three separate criteria that must be applied to an analogy interpretation: structural soundness, goal relevance, and validity in the target (Gentner, 1988).
It seems more reasonable to assume that the basic mechanisms of analogy and similarity operate across different task contexts. Such a view is compatible with the structural account but not with the pragmatic account.

There are further difficulties with the pragmatic account. Because the interpretation of an analogy is defined in terms of relevance to the initial goals of the problem solver, the pragmatic view does not allow for unexpected outcomes in an analogical match. This means that many creative uses of analogy, such as scientific discovery, are out of bounds. Finally, the pragmatic account lacks any means of capturing the important psychological distinction between an analogy that fails because it is irrelevant and an analogy that fails because it is unsound. In short, although a good case can be made for the need to augment structural considerations with goal-relevant considerations (e.g., Burstein, 1983), the attempt to replace structural factors such as systematicity with pragmatic factors like relevance is misguided.
It is interesting to ask if the effects of higher-order structure operate in literal similarity as well as analogy. Although Tversky's (1977) theory of similarity makes no distinctions among predicate types, the structure-mapping framework presented previously (see Fig. 1) postulates that common relational structure (as well as common object attributes) is important in literal similarity. In an ongoing research project with Doug Medin and Rob Goldstone, we find support for this claim: subjects' literal similarity judgments show strong effects of relational structure (Goldstone, Medin, & Gentner, 1988). Moreover, the effects of relational similarity appear to be separable from the effects of attributional similarity. For example, given this triad, most people would agree that the first string is more similar to the second than to the third:

Xo    Xn    To
But if we add exactly one first-order feature (the letter o) to each of these strings, we can change this preference so that most people now choose the third string as most similar to the first:

oXo    oXn    oTo
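The two triads can be checked against a purely additive feature model. The sketch below is illustrative only: the position-by-position letter matching, the letter weights, and the symmetry bonus `W_SYMMETRY` are assumptions chosen for the example, not quantities taken from the study. It shows that adding the same letter to every string cannot reverse the ordering under additive features, whereas a single relational term for shared symmetry can.

```python
W_SYMMETRY = 2.0  # assumed weight for a shared symmetry relation (hypothetical)


def letter_weight(ch):
    # Assumption: a capital letter is a more salient feature than a lower-case one.
    return 2.0 if ch.isupper() else 1.0


def feature_sim(a, b):
    """Additive similarity over lower-order features (same letter, same position)."""
    return sum(letter_weight(x) for x, y in zip(a, b) if x == y)


def is_symmetric(s):
    """A string has the symmetry relation if it reads the same in both directions."""
    return len(s) > 1 and s == s[::-1]


def relational_sim(a, b):
    """Feature similarity plus a bonus when both strings share the symmetry relation."""
    bonus = W_SYMMETRY if is_symmetric(a) and is_symmetric(b) else 0.0
    return feature_sim(a, b) + bonus


def preferred(first, second, third, sim):
    """Return whichever of second/third is more similar to first under sim."""
    return second if sim(first, second) > sim(first, third) else third


print(preferred("Xo", "Xn", "To", feature_sim))        # Xn
print(preferred("oXo", "oXn", "oTo", feature_sim))     # oXn
print(preferred("oXo", "oXn", "oTo", relational_sim))  # oTo
```

With features alone, the added letter contributes equally to both comparisons, so the second string stays the closer one in both triads. Only the relational version picks the third string in the second triad.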
Since the same lower-order feature was added to each of the three strings, the change in similarity ordering cannot be accounted for as an additive effect of lower-order features. [Indeed, if the vocabulary were restricted to lower-order features this shift would constitute a violation of Tversky's (1977) independence principle.] Instead, it appears that the shift occurs because people are sensitive to the shared symmetry relation between the first and third string in the second set. Thus, relational structure seems to be an important aspect of ordinary similarity as well as analogy and metaphor.

D. CONCLUSION

We have examined four alternative accounts of the interpretation rules for analogy and metaphor. These accounts vary in their representational assumptions, that is, in whether multidimensional spaces, feature sets, or propositional structures are assumed. We have reviewed the structure-mapping theory, which posits a psychological difference between relations and attributes and between systems of relations and isolated relations. The theory is unique in postulating purely structural computations over higher-order predicate representations; they do not depend on the specific content of the representations or on a set of prior goals or expectations. The evidence presented supports these claims. These structural distinctions appear to be essential for capturing the informational constraints on the interpretation of analogy and metaphor.

This research suggests that analogies and many classes of metaphors can be viewed as devices for highlighting and carrying over relational structure. Because of this, analogies and metaphors allow us to focus on relational commonalities that would otherwise be difficult to express. Further, beyond their communicative uses, analogies and metaphors have enormous conceptual utility as tools for the extraction of relational structure. They allow us to become aware of potentially important relational structures that are not yet explicitly represented in our conceptual and linguistic system, and which may then be abstracted away from the objects to which they apply. In Russell’s words, “It must have required many ages to discover that a brace of pheasants and a couple of days were both instances of the number two.” Research in analogy and metaphor may provide a way to understand this achievement.

V. Appendix A: Scoring Propositional Structure
Attributionality and relationality are judgments about the conceptual predicate structure underlying the surface language. In most cases, the form of the surface expression makes it clear whether the underlying predicate is an attribute or a relation. For example, predicates that take two or more objects, such as transitive verbs, were scored as expressing relationships between their arguments, e.g., “X hit Y” and “X likes Y.” Also, comparatives such as “X is larger than Y,” which express relations between attributes of objects, were scored as relations. Adjectives often express single-object attributes. These were scored as object attributes whether or not they were stated as an adjectival proposition, e.g., the proposition “X is 10 feet tall” was scored as an object attribute.

For the cases discussed so far, there are clear surface signs of their relational or attributional usage. A more difficult set of cases arises when underlying relations are expressed as surface attributes through a process of abstraction (see Miller, 1979). For example, the adjective soporific in “X is soporific” is stated as though it were a quality of X, but in fact it conveys relational information that there exist beings whom X puts to sleep. It stands for a set of relational statements like “X puts Y to sleep,” “X puts Z to sleep,” etc. These kinds of terms are both relational, in their underlying meaning, and attributional, in that the person has chosen to express the quality as an attribute. Such abstracted relational adjectives were scored as conveying, in moderate degree, both relational and attributional meaning.
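The scoring rules in this appendix can be written out as a small decision procedure. The sketch below is only one possible rendering: the `Predicate` record, its field names, and the numeric values (1.0, 0.0, and `MODERATE = 0.5`) are assumptions made for illustration, since the appendix describes the categories but does not give a numeric scale.

```python
from dataclasses import dataclass

MODERATE = 0.5  # assumed value for "moderate degree"; the appendix gives no number


@dataclass(frozen=True)
class Predicate:
    """Hypothetical record for one scored statement."""
    text: str
    n_objects: int                     # number of objects the predicate takes
    comparative: bool = False          # e.g., "X is larger than Y"
    abstracted_relation: bool = False  # relation stated as an attribute, e.g., "soporific"


def score(p):
    """Return (relationality, attributionality) for a predicate."""
    if p.abstracted_relation:
        # Relational in meaning, attributional in surface form: credit both moderately.
        return (MODERATE, MODERATE)
    if p.comparative or p.n_objects >= 2:
        # Predicates over two or more objects, and comparatives, count as relations.
        return (1.0, 0.0)
    # Single-object predicates count as object attributes.
    return (0.0, 1.0)


examples = [
    Predicate("X hit Y", n_objects=2),
    Predicate("X is larger than Y", n_objects=2, comparative=True),
    Predicate("X is 10 feet tall", n_objects=1),
    Predicate("X is soporific", n_objects=1, abstracted_relation=True),
]
for p in examples:
    print(p.text, score(p))
```

Deciding whether an adjective is an abstracted relation is itself a judgment about meaning, so in this sketch it is supplied by the scorer as a flag rather than computed.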
VI. Appendix B: Grammatical Categories Used in Syntactic Scoring

Statement type:

A. Nouns
   Ordinary
      e.g., It has a very large hat.
            They both live in small houses.
   Relational
      e.g., It is a container for something. (It could contain something.)
            They are good providers for their young. (They give something to someone else.)

B. Adjectives
   Ordinary
      e.g., It has big feet.
            They are both red.
   Relational
      e.g., It is an edible leaf. (Someone can eat it.)
            Both are soporific. (They put a person to sleep.)

C. Comparative Modifiers
   Adjectives and adverbs that compare (often with -er suffix).
      e.g., Trees are taller than buses.
            Both run faster than their prey.

D. Prepositions
   A word that connects a noun or pronoun to another noun, usually with relative spatial location or direction.
      e.g., The book above the desk is the one.
            Fluid flows from the heart to the lungs.
E. Adverbs
   Modifiers of verbs (usually with -ly suffix).
      e.g., Both move swiftly.
            Both think keenly.

F. Verbs
   Transitive
      Verbs that require a direct object. Action passes from the subject across the verb to an object of the verb.
      e.g., Jerry hit the ball.
            Karen took the books.
   Intransitive
      Action terminates with the verb. Intransitive verbs do not require a direct object.
      e.g., Mary pondered.
            The motor raced.
   Linking
      Verbs that join the subject and the direct object or modifier together. Implies that they are equal or similar.
      e.g., be (is, am): The weather is nasty.
            taste: The pie tasted foul.
   Auxiliary
      Auxiliary or helping verbs are used with other verbs to express complex ideas from tenses, moods, voices, etc. When recording verbs, auxiliary verbs are combined with the verb phrase as a whole.
      e.g., is: He is laughing.
            might: She might do.
   Verb plus particle
      A verb combined with a short, invariable part of speech, such as a preposition.
      e.g., He hung up the phone.
            She worked out in the gym.

ACKNOWLEDGMENTS

This research was supported by the Department of the Navy, Office of Naval Research, under Contract No. N00014-79-C-0338, and under the current Contract No. N00014-85-K-0559. The developmental study was supported by the National Institute of Education under Contract Nos. NIE-400-80-0030 and NIE-400-80-0031. We thank Allan Collins, Ken Forbus, Don Gentner, Doug Medin, Andrew Ortony, Mary Jo Rattermann, Bob Schumacher, Ed Smith, Yvette Tenney, and Cecile Toupin for comments on this research. We also thank Judith Block, Philip Kohn, Betsy Perry, Patricia Stuart, Edna Sullivan, Ben Teitelbaum, and especially Monica Olmstead for their help with these studies.
REFERENCES

Black, M. (1962). Models and metaphors: Studies in language and philosophy. Ithaca, New York: Cornell University Press.
Burstein, M. H. (1983). Concept formation by incremental analogical reasoning and debugging. In R. Michalski, J. Carbonell, & T. Mitchell (Eds.), Machine learning (Vol. 2, pp. 351-369). Los Altos, California: Morgan Kaufmann.
Burstein, M., & Adelson, B. (1987). Analogical learning: Mapping and integrating partial mental models. Proceedings of the Ninth Annual Conference of the Cognitive Science Society, Seattle, pp. 11-22.
Carbonell, J. G. (1981). A computational model of analogical problem solving. Proceedings of the Seventh International Joint Conference on Artificial Intelligence, Vancouver, British Columbia, pp. 147-152.
Carbonell, J. G. (1983). Derivational analogy in problem solving and knowledge acquisition. Proceedings of the International Machine Learning Workshop, Department of Computer Science, University of Illinois at Urbana-Champaign, pp. 12-18.
Clark, H. H., & Haviland, S. E. (1977). Comprehension and the given-new contract. In R. O. Freedle (Ed.), Discourse production and comprehension (pp. 1-40). Norwood, New Jersey: Ablex.
Clement, C. A. (1987). The representation and use of principles derived from abstractions and examples. Unpublished manuscript.
Clement, C. A., & Gentner, D. (1988). Systematicity as a selection constraint in analogical mapping. Unpublished manuscript.
Collins, A., & Gentner, D. (1983). Multiple models of the evaporation process. Proceedings of the Fifth Annual Conference of the Cognitive Science Society, Rochester, New York, Paper Session No. 6.
Collins, A. M., & Gentner, D. (1987). How people construct mental models. In S. Holland & N. Quinn (Eds.), Cultural models in language and thought (pp. 243-265). Cambridge: Cambridge University Press.
Collins, A. M., & Quillian, M. R. (1969). Retrieval time from semantic memory. Journal of Verbal Learning and Verbal Behavior, 8, 240-247.
Conner, K., & Kogan, N. (1980). Topic-vehicle relations in metaphor: The issue of asymmetry. In R. Honeck & R. Hoffman (Eds.), Cognition and figurative language (pp. 283-306). Hillsdale, New Jersey: Erlbaum.
Elio, R., & Anderson, J. R. (1981). The effects of category generalizations and instance similarity on schema abstraction. Journal of Experimental Psychology: Human Learning and Memory, 7, 397-417.
Falkenhainer, B., Forbus, K. D., & Gentner, D. (1986). The structure-mapping engine. Proceedings of the American Association for Artificial Intelligence, Philadelphia, pp. 272-277. Revised version to appear in Artificial Intelligence, in press.
Forbus, K., & Gentner, D. (1986). Learning physical domains: Towards a theoretical framework. In R. M. Michalski, J. Carbonell, & T. Mitchell (Eds.), Machine learning: An artificial intelligence approach (Vol. II, pp. 311-348). Los Altos, California: Morgan Kaufmann.
Gentner, D. (1980). The structure of analogical models in science (Tech. Rep. No. 4451). Cambridge, Massachusetts: Bolt Beranek & Newman.
Gentner, D. (1982). Are scientific analogies metaphors? In D. Miall (Ed.), Metaphor: Problems and perspectives (pp. 106-132). Brighton: Harvester Press.
Gentner, D. (1983). Structure-mapping: A theoretical framework for analogy. Cognitive Science, 7, 155-170.
Gentner, D. (1986). Mechanisms of analogical learning. Paper presented at the Workshop on Analogy and Similarity, Monticello, Illinois; to appear in S. Vosniadou & A. Ortony (Eds.), Similarity and analogical reasoning, in press.
Gentner, D. (1988a). Analogical inference and analogical access. In Analogica, pp. 63-88. London: Pitman Publishing Co.
Gentner, D. (1988b). Structure-mapping in analogical development: The relational shift. Child Development, 59, 47-59.
Gentner, D., Falkenhainer, B., & Skorstad, J. (1987). Metaphor: The good, the bad and the ugly. Proceedings of the Third Conference on Theoretical Issues in Natural Language Processing, Las Cruces, New Mexico. Also to appear in D. H. Helman (Ed.), Analogical reasoning: Perspectives of artificial intelligence, computer science, and philosophy. Boston: Reidel.
Gentner, D., & Gentner, D. R. (1983). Flowing waters or teeming crowds: Mental models of electricity. In D. Gentner & A. L. Stevens (Eds.), Mental models (pp. 99-129). Hillsdale, New Jersey: Erlbaum.
Gentner, D., & Landers, R. (1985). Analogical reminding: A good match is hard to find. Proceedings of the International Conference on Systems, Man and Cybernetics, Tucson.
Gentner, D., & Schumacher, R. M. (1986). Use of structure-mapping theory for complex systems. Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, Atlanta, pp. 252-258.
Gick, M. L., & Holyoak, K. J. (1980). Analogical problem solving. Cognitive Psychology, 12, 306-355.
Gick, M. L., & Holyoak, K. J. (1983). Schema induction and analogical transfer. Cognitive Psychology, 15, 1-38.
Glucksberg, S. (1980). Remarks. Presented at the Symposium on Metaphor as Knowledge, meeting of the American Psychological Association, Montreal.
Goldstone, R., Medin, D., & Gentner, D. (1988, April). Relational similarity and the nonindependence of features in similarity judgements. Paper presented at the meeting of the Midwestern Psychology Association, Chicago.
Hesse, M. B. (1966). Models and analogies in science. Notre Dame, Indiana: University of Notre Dame Press.
Hoffman, R. R. (1980). Metaphor in science. In R. P. Honeck & R. R. Hoffman (Eds.), Cognition and figurative language (pp. 393-423). Hillsdale, New Jersey: Erlbaum.
Hofstadter, D. R. (1984). The Copycat project: An experiment in nondeterministic and creative analogies (MIT A.I. Laboratory Memo 755). Cambridge, Massachusetts: MIT Press.
Holyoak, K. J. (1985). The pragmatics of analogical transfer. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 19, pp. 59-87). New York: Academic Press.
Holyoak, K., & Thagard, P. (1987). Computational simulation of analogy. Paper presented at the meeting of the Society for Philosophy and Psychology, San Diego.
Indurkhya, B. (1985). Constrained semantic transference: A formal theory of metaphors (Tech. Rep. No. 85/008). Boston: Boston University, Department of Computer Science.
Keane, M. (1988). Analogical problem solving. Chichester, U.K.: Ellis Horwood.
Kedar-Cabelli, S. (1985). Purpose-directed analogy. Proceedings of the Seventh Annual Conference of the Cognitive Science Society, Irvine, CA, pp. 150-159.
Kittay, E. (1987). Metaphor: Its cognitive force and linguistic structure. Oxford: Clarendon Press.
Kittay, E., & Lehrer, A. (1981). Semantic fields and the structure of metaphor. Studies in Language, 5(1), 31-63.
Krumhansl, C. (1978). Concerning the applicability of geometric models to similarity data: The interrelationship between similarity and spatial density. Psychological Review, 85, 176-184.
Lakoff, G., & Johnson, M. (1980). Metaphors we live by. Chicago: University of Chicago Press.
Marr, D. (1982). Vision. San Francisco: Freeman.
Miller, G. A. (1979). Images and models, similes and metaphors. In A. Ortony (Ed.), Metaphor and thought (pp. 202-250). Cambridge: Cambridge University Press.
Miller, G. A., & Johnson-Laird, P. N. (1976). Language and perception. Cambridge, Massachusetts: Harvard University Press.
Nagy, W. (1974). Figurative patterns and the redundancy in lexicon. Doctoral dissertation, University of California at San Diego.
Norman, D. A., Rumelhart, D. E., & the LNR Research Group. (1975). Explorations in cognition. San Francisco: Freeman.
Ortony, A. (1979). Beyond literal similarity. Psychological Review, 86, 161-180.
Ortony, A., Vondruska, R. J., Foss, M. A., & Jones, L. E. (1985). Salience, similes, and the asymmetry of similarity. Journal of Memory and Language, 24, 569-594.
Palmer, S. E. (1978). Fundamental aspects of cognitive representation. In E. Rosch & B. B. Lloyd (Eds.), Cognition and categorization (pp. 259-303). Hillsdale, New Jersey: Erlbaum.
Palmer, S. E., & Kimchi, R. (1985). The information processing approach to cognition. In T. Knapp & L. C. Robertson (Eds.), Approaches to cognition: Contrasts and controversies (pp. 37-77). Hillsdale, New Jersey: Erlbaum.
Rattermann, M. J., & Gentner, D. (1987). Analogy and similarity: Determinants of accessibility and inferential soundness. Proceedings of the Ninth Annual Meeting of the Cognitive Science Society, Seattle, pp. 23-34.
Reed, S. K. (1987). A structure-mapping model for word problems. Journal of Experimental Psychology: Learning, Memory and Cognition, 13, 124-139.
Reynolds, R. E., & Ortony, A. (1980). Some issues in the measurement of children's comprehension of metaphorical language. Child Development, 51, 1110-1119.
Rips, L. J., & Turnbull, W. (1980). How big is big? Relative and absolute properties in memory. Cognition, 8, 175-185.
Rosch, E. (1973). On the internal structure of perceptual and semantic categories. In T. E. Moore (Ed.), Cognitive development and the acquisition of language (pp. 111-144). New York: Academic Press.
Rosch, E. (1975). Cognitive reference points. Cognitive Psychology, 7, 532-547.
Rumelhart, D. E., & Abrahamson, A. A. (1973). A model for analogical reasoning. Cognitive Psychology, 5, 1-28.
Rumelhart, D. E., & Norman, D. A. (1981). Analogical processes in learning. In J. R. Anderson (Ed.), Cognitive skills and their acquisition (pp. 335-359). Hillsdale, New Jersey: Erlbaum.
Rumelhart, D. E., & Ortony, A. (1977). Representation of knowledge. In R. C. Anderson, R. J. Spiro, & W. E. Montague (Eds.), Schooling and the acquisition of knowledge. Hillsdale, New Jersey: Erlbaum.
Schank, R., & Abelson, R. (1977). Scripts, plans, goals, and understanding. Hillsdale, New Jersey: Erlbaum.
Schumacher, R. M., & Gentner, D. (1988). Transfer of training as analogical mapping. IEEE Transactions on Systems, Man, and Cybernetics.
Shepard, R. N. (1974). Representation of structure in similarity data: Problems and prospects. Psychometrika, 39, 373-421.
Skorstad, J., Falkenhainer, B., & Gentner, D. (1987). Analogical processing: A simulation and empirical corroboration. Proceedings of the Meeting of the American Association of Artificial Intelligence, Seattle.
Tourangeau, R., & Sternberg, R. J. (1981). Aptness in metaphor. Cognitive Psychology, 13, 27-55.
Tourangeau, R., & Sternberg, R. J. (1982). Understanding and appreciating metaphors. Cognition, 11, 203-244.
Tversky, A. (1977). Features of similarity. Psychological Review, 84, 327-352.
Van Lehn, K., & Brown, J. S. (1980). Planning nets: A representation for formalizing analogies and semantic models of procedural skills. In R. E. Snow, P. A. Federico, & W. E. Montague (Eds.), Aptitude, learning and instruction: Cognitive process analyses (Vol. 2, pp. 95-137). Hillsdale, New Jersey: Erlbaum.
Verbrugge, R. R., & McCarrell, N. S. (1977). Metaphoric comprehension: Studies in reminding and resembling. Cognitive Psychology, 9, 494-533.
Winston, P. H. (1980). Learning and reasoning by analogy. Communications of the ACM, 23, 689-703.
Winston, P. H. (1982). Learning new principles from precedents and exercises. Artificial Intelligence, 19, 321-350.
Index

A
Acquisition, comparator hypothesis and, 51-54, 61
  applications, 62-65
  appraisal, 87
  conditioned inhibition theory, 68-71
  postconditioning inflation, 85
  relationship to other models, 81, 82
Activation, retrieval strategies and, 228, 230-232, 254
Age, alternative representations and, 275, 283
Aging, working memory and, see Working memory, aging and
Alternative representations, 261, 303
  approaches, 261-264
  bus schedules, 264-274
  comprehensive view, 300, 301
    cognitive processes, 303
    domains, 301, 302
    forms of representation, 302
  medical instructions, 274-283
  overview of experiments, 297
  research strategy, 299, 300
  text-editing commands, 283-297
Amnesia, working memory, aging and, 194
Analgesia, comparator hypothesis and, 65, 66
Analogy, relational selectivity in, 307-309
  multidimensional space models, 309
  structure mapping, 315
  systems of relations, 344-352
Analyses of variance (ANOVA), relational selectivity in metaphor and, 338, 339
Anomaly, relational selectivity in metaphor and, 308, 314
Aphasia, working memory, aging and, 194
Applicability, relational selectivity in metaphor and, 311
Arousal, comparator hypothesis and, 76
Associability, comparator hypothesis and, 64, 68, 70, 74
Associations, comparator hypothesis and, see Comparator hypothesis, associations and
Attention
  comparator hypothesis and, 70, 84
  retrieval strategies and, 227, 239
  working memory, aging and, 214, 216, 219
Attributionality, relational selectivity in metaphor and
  experiments, 319-323, 327-335, 341
  scoring propositional structure, 352, 353
  systems of relations, 351

B
Blocking, comparator hypothesis and, 53, 58
  applications, 64-66
  conditioned inhibition theory, 68-70, 72
Bus schedules, alternative representations and, 264-274, 297, 301

C
Carryover effects, foraging and, 38, 39
Categories
  alternative representations and
    bus schedules, 269, 272-274
    comprehensive view, 301
    overview of experiments, 298
    text-editing commands, 288, 296
  relational selectivity in metaphor and, 333, 335
Characteristicness, relational selectivity in metaphor and, 311
Cognition
  alternative representations and, 262, 263
    bus schedules, 273
    comprehensive view, 300, 303
    overview of experiments, 297
    text-editing commands, 297
  comparator hypothesis and, 52, 54
  foraging and, 7, 16
  relational selectivity in metaphor and, 347, 348
  working memory, aging and, 194
    inhibition, 219, 220
    reduced processing resource approach, 208
    theoretical framework, 194, 195
Cognitive gerontology, working memory and, 197, 208, 210, 215, 216
Cognitive tasks, retrieval strategies and, 227
Coherence
  relational selectivity in metaphor and, 344
  working memory, aging and, 214, 215, 219
Comparator hypothesis, associations and, 51-56
  applications, 62
    overshadowing, 62-64
    US-preexposure effect, 64-66
  appraisal, 87
  comparator stimuli, 79, 80
  comparison, 56-61
  conditioned inhibition theory, 67
    conditioned excitation, 74-77
    extinction, 77
    retardation tests, 67-70
    summation tests, 70-72
    superconditioning, 72-74
  generalization to instrumental behavior, 86, 87
  multiple contexts, 78
  postconditioning inflation
    assessment of 2 x 2 matrix, 82-84
    associative strength, 84-86
  punctate comparator stimuli, 61, 62
  relationship to other models
    Gibbon-Balsam, 80, 81
    learning theory, 82
    Rescorla-Wagner, 81, 82
  temporal window for comparisons, 78, 79
Competition
  retrieval strategies and, 230, 231
  working memory, aging and, 215, 217, 219
Compliance, alternative representations and, 274
Comprehension
  alternative representations and, 278-280, 283, 297, 299
  working memory, aging and
    inference difficulty, 200
    inhibition, 215, 217, 219
    priming, 206, 207
    reduced processing resource approach, 209
    theoretical framework, 197, 198
Conceptual centrality, relational selectivity in metaphor and, 311
Conceptual process, visual stimuli and, 139, 160, 181
Conditioned excitation, comparator hypothesis and
  applications, 65
  conditioned inhibition theory, 67, 68, 70-72, 74-77
  postconditioning inflation, 83, 84
  punctate comparator stimuli, 62
  relationship to other models, 81
  temporal window, 79
Conditioned inhibition theory, comparator hypothesis and, 59, 67
  appraisal, 87
  conditioned excitation, 74-77
  extinction, 77
  postconditioning inflation, 83, 85
  punctate comparator stimuli, 62
  relationship to other models, 81, 82
  retardation tests, 67-70
  summation tests, 70-72
  superconditioning, 72-74
  temporal window, 79
Conditioned stimuli, comparator hypothesis and, 54-61
  applications, 65, 66
  comparator stimuli, 80
  conditioned inhibition theory, 67-77
  multiple contexts, 78
  postconditioning inflation, 82-85
  punctate comparator stimuli, 61
  relationship to other models, 80, 81
  temporal window, 78, 79
Confirmation bias, experimental synthesis of behavior and, 107, 115, 131
Consistency
  relational selectivity in metaphor and, 313, 347
  retrieval strategies and, 231-233
Constrained optimal model, foraging and, 31, 38
Constraint, experimental synthesis of behavior and, 130, 134
Context
  comparator hypothesis and, 78-81, 84, 86
  working memory, aging and, 217
Contiguity, comparator hypothesis and, 51, 82, 85
Contingency
  comparator hypothesis and, 54, 55, 58
    conditioned inhibition theory, 68, 69, 75-77
    temporal window, 79
  experimental synthesis of behavior and, 94-96, 99-103
    assessment, 106, 114
    maximization, 121, 124
    negative effects of reward, 129, 131
    positive effects of rule finding, 132
    pretraining variation, 104
    stereotypy, 133, 134
  foraging and, 6, 24
Contradiction, retrieval strategies and, 235, 239
Contrast, comparator hypothesis and, 86
Crab, foraging and, 1, 4, 15
Cued recall, alternative representations and, 270-272, 277
Cues
  alternative representations and, 290
  comparator hypothesis and, 61, 73, 79, 80
  foraging and, 22, 24, 26
  retrieval strategies and, 245
  visual stimuli and, 150, 151
  working memory, aging and, 193
    inhibition, 217-219
    priming, 204, 205, 207

D
Daydreams, working memory, aging and, 212
Decay, visual stimuli and, 144, 165, 166
Decision
  experimental synthesis of behavior and, 132
  foraging and, 2, 6, 33
Delayed reduction, foraging and
  patch departure, 21, 24, 25, 28
  prey selection, 11-15
  risk, 35
Diet selection, foraging and, 39, 40
Discourse processing, working memory, aging and, 195
Distractibility, working memory, aging and, 215
Duration, visual stimuli and
  phenomenological appearance, 167-175, 178, 179, 182, 183
  picture memory model, 148, 149, 152, 153, 162

E
Elaboration
  alternative representations and, 299
  retrieval strategies and, 233
Encoding
  alternative representations and, 263, 265, 271
  retrieval strategies and, 254
  visual stimuli and, 139, 180-182
  working memory, aging and
    inference difficulty, 199, 200, m
Encoding, working memory, aging and (cont.)
  inhibition, 215
  reduced processing resource approach, 209
  theoretical framework, 195
Energy intake, foraging and, 3, 4, 16, 17, 33
Energy yield, foraging and, 8, 11, 14
Evolution, foraging and, 2, 3
Experimental synthesis of behavior, 93-103, 128
  apprehension of logical form, 114-120
  contingency assessment, 106-114
  maximization, 120-128
  negative effects of reward, 129-131
  positive effects of rule finding, 131-133
  pretraining variation, 104-106
  stereotypy, 133-135
Experts, retrieval strategies and, 231, 232
Exploitation, foraging and, 31, 38
External memory, working memory, aging and, 201
Extinction
  comparator hypothesis and, 58-61
    applications, 63-65
    conditioned inhibition theory, 69, 77, 78
    generalization, 86
    postconditioning inflation, 83, 84
    punctate comparator stimuli, 61, 62
  experimental synthesis of behavior and, 95
  foraging and, 22

F
Fear, foraging and, 37
Feedback, experimental synthesis of behavior and, 122, 128
Fixed interval schedule, foraging and, 10, 12
Foraging, comparator hypothesis and, 85, 87
Foraging, operant behavior and, 1, 2, 41-43
  laboratory simulation
    examples, 5, 6
    why, 6, 7
  optimal foraging theory
    models, 3
    -t% 4.5
  patch departure
    assessment, 25, 26
    decision, 20-25
    future, 26
    operant simulations, 18-20
    time, 16, 17
  prey selection
    delay reduction, 11, 12
    how, 8
    operant simulations, 8-10
    partial preferences, 10, 11
    scalar expectancy, 12, 13
    schedules, 13-16
  risk
    energy budget, 34-36
    response to variance, 33, 34
  sampling
    examples, 27, 28
    fluctuating environment, 28, 29
    momentary maximizing, 26, 27
    two-armed bandit, 29-33
  time horizons
    diet selection, 39, 40
    patch departure, 40, 41
    problem, 36, 37
    two-armed bandit, 37-39
Forgetting
  alternative representations and, 277
  retrieval strategies and, 229, 230
  working memory, aging and, 194, 215, 217
Free recall, working memory, aging and, 195, 216

G
Generalization
  comparator hypothesis and, 63, 79, 86, 87
  experimental synthesis of behavior and, 93, 98, 99
    contingency assessment, 108
    negative effects of reward, 129, 130
Gerontology, cognitive, working memory and, 197, 208, 210, 215, 216
Giving-up time rule, foraging and, 18, 19, 21, 22
Goals, relational selectivity in metaphor and, 347-350
Grammatical categories, relational selectivity in metaphor and, 353, 354

H
Habituation, comparator hypothesis and, 68, 70

I
Immediacy, relational selectivity in metaphor and, 338-340
Incentive, comparator hypothesis and, 86
Inference
  alternative representations and, 279, 280
  retrieval strategies and, 233-235, 239, 256
  working memory, aging and
    formation, 199-204, 208
    inhibition, 217
    priming, 207
    reduced processing resource approach, 210, 211
    theoretical framework, 194, 196, 198, 199
Information processing, alternative representations and, 261
Inhibition, working memory, aging and, 212-217
  compensatory mechanisms, 217-220
Instrumental behavior, comparator hypothesis and, 86, 87
Instrumental conditioning, experimental synthesis of behavior and, 93
Intelligence, experimental synthesis of behavior and, 133
Intelligent variation, experimental synthesis of behavior and, 97, 134
Interference
  alternative representations and, 296
  retrieval strategies and, 230, 231, 233
  working memory, aging and, 216, 217
Interpretation, relational selectivity in metaphor and, 307
  accounts, 312, 313, 315
  experiments, 316, 317, 319, 320, 326, 327, 336, 338-340, 343
  systems of relations, 344, 345, 347, 348, 350-352
Interstimulus interval, visual stimuli and, 151, 152, 154-156, 159-163, 165, 166
  phenomenological appearance, 172, 173, 177, 178, 182
Invariance hierarchy, relational selectivity in metaphor and, 347, 348

L
List learning, working memory, aging and, 195
Literal similarity, relational selectivity in metaphor and, 351
Long-term memory, aging and, 206
Luminance, visual stimuli and
  phenomenological appearance, 170, 172, 175, 176, 178, 179
  picture memory model, 148, 149, 151-162, 164, 165

M
Marginal-value theorem, foraging and, 17-20, 24
Masking, visual stimuli and, 185, 186
  phenomenological appearance, 172, 174-177, 179
  picture memory model, 148, 150, 152-162, 164-166
Matching
  experimental synthesis of behavior and, 120, 121, 123-126
  foraging and, 6
Matching law, experimental synthesis of behavior and, 96
Matching task, alternative representations and, 270, 272
Maximization, experimental synthesis of behavior and, 120-128, 132
Meaningfulness, alternative representations and, 271, 272, 274
Mediating processes, visual stimuli and, 183, 184
Medical instructions, alternative representations and, 264, 274-283, 297, 298, 301
Melioration, experimental synthesis of behavior and, 120, 124, 127, 128
Memory
  alternative representations and, 262
    bus schedules, 271, 272
    comprehensive view, 303
    medical instructions, 275, 276, 278-283
    text-editing commands, 289
  comparator hypothesis and, 51-53, 56
  experimental synthesis of behavior and, 100
Memory (cont.)
  foraging and, 2, 42
    laboratory simulation, 7
    patch departure, 20, 24, 26
    prey selection, 12, 13, 16
    time horizons, 38
  retrieval strategies and, 228, 254-256
    selection, 235, 236, 239, 240, 243, 245, 247, 248
    two-factor theory, 228, 230, 231, 233, 234
  visual stimuli and, 139-142
    information acquisition, 180, 181
    mediating processes, 184
    phenomenological appearance, 182
Metaphor, relational selectivity and, 307-309, 351, 352
  multidimensional space models, 309, 310
  salience imbalance
    accounts, 310-312
    experiments, 316-344
  scoring propositional structure, 352, 353
  structure mapping
    accounts, 312-316
    experiments, 316-344
  syntactic scoring, 353, 354
  systems of relations, 344, 345
    activity, 345-347
    literal similarity, 351
    modes, 347-351
Mnemonic device
  alternative representations and, 285
  retrieval strategies and, 230
Momentary maximizing, foraging and, 26, 27
Monotonicity, visual stimuli and, 148, 181
Motivation
  alternative representations and, 275, 276, 302
  comparator hypothesis and, 52
  experimental synthesis of behavior and, 129
  foraging and, 7, 15, 37, 39
  retrieval strategies and, 239, 246
Motives, experimental synthesis of behavior and, 130
Multidimensional space models, relational selectivity in metaphor and, 307, 309, 310

N
Naloxone, comparator hypothesis and, 66
Necessity, experimental synthesis of behavior and, 115, 117-119
Negative transfer, experimental synthesis of behavior and, 103, 128
  contingency assessment, 108, 113
  pretraining variation, 104

O
Operant behavior, foraging and, see Foraging, operant behavior and
Optimality, foraging and, 3-5, 42, 43
  laboratory simulation, 5-7
  patch departure, 18-20, 24-26
  prey selection, 10-13, 15, 16
  risk, 34, 36
  sampling, 27-29, 32
  time horizons, 36, 37, 39
Optimization, experimental synthesis of behavior and, 120, 121
Overshadowing, comparator hypothesis and, 53
  applications, 62-64
  appraisal, 87
  conditioned inhibition theory, 73, 74
  postconditioning inflation, 83, 84
  relationship to other models, 82
Overwriting memory, retrieval strategies and, 230

P
Paired-associate learning paradigm, retrieval strategies and, 230
Partial preference, foraging and, 9-11
Patch departure, foraging and
  assessment, 25, 26
  decision, 20-25
  future, 26
  operant simulations, 18-20
  time, 16, 17
  time horizons, 40, 41
Payoff, experimental synthesis of behavior and, 101, 102, 108
Perceptual process, visual stimuli and, 139, 140
  information acquisition, 180
  picture memory model, 156, 157, 159-165
Persistence, visual stimuli and
  mediating processes, 183
  phenomenological appearance, 169-174, 176, 178, 179, 182
Picture memory, visual stimuli and, 140, 142-148
  applications, 148-152
  duration of processing, 152-166
  information acquisition, 180
Pigeons
  comparator hypothesis and, 79
  experimental synthesis of behavior and, 133
  foraging and, 1
    patch departure, 16
    risk, 34
    sampling, 28, 30, 32
    time horizons, 38
Preexposure effect, comparator hypothesis and
  applications, 62, 64-67
  appraisal, 87
  conditioned inhibition theory, 68
  postconditioning inflation, 83, 84
  relationship to other models, 82
Pretraining, experimental synthesis of behavior and, 101-103, 128
  apprehension of logical form, 118, 120
  contingency assessment, 109, 110, 113, 114
  maximization, 122-128
  negative effects of reward, 130
  positive effects of rule finding, 131-133
  stereotypy, 133
  variation, 104-106
Prey selection, foraging and, 41
  delay reduction, 11, 12
  how, 8
  laboratory simulation, 5
  operant simulations, 8-10
  optimal foraging theory, 4
  partial preferences, 10, 11
  patch departure, 2 4 2 1
  scalar expectancy, 12, 13
  schedules, 13-16
Priming
  retrieval strategies and, 249-251, 254
  working memory, aging and, 204-208
Probability learning, foraging and, 33
Profitability, foraging and, 14, 15
Punctate comparator stimuli, comparator hypothesis and, 61, 62, 79

R
Random intervals, experimental synthesis of behavior and, 123
Randomness, experimental synthesis of behavior and, 97, 133, 134
Reaction time
  retrieval strategies and, 255
    selection, 235-239, 242, 243, 246, 248, 250-253
    two-factor theory, 234
  working memory, aging and, 205
Recall
  alternative representations and
    bus schedules, 270-272
    medical instructions, 278
    overview of experiments, 297, 299
    text-editing commands, 284, 285, 290
  retrieval strategies and, 232
  working memory, aging and, 200, 202, 214
Recency
  comparator hypothesis and, 78
  retrieval strategies and, 248
Recognition
  alternative representations and, 272
  retrieval strategies and
    selection, 237, 241, 243-245
    two-factor theory, 230-234
  visual stimuli and, 153, 156
  working memory, aging and, 205, 207, 214
Reduced processing resource approach, working memory, aging and, 208-212
Rehearsal, alternative representations and, 277
Reinforcement
  comparator hypothesis and, 52, 56, 58
    applications, 62
    conditioned inhibition theory, 68, 70, 72, 74, 77
    generalization, 86
    punctate comparator stimuli, 62
    relationship to other models, 81
    temporal window, 79
  experimental synthesis of behavior and, 93-96, 98-103
    contingency assessment, 108, 113
    maximization, 120-128
    negative effects of reward, 129-131
    pretraining variation, 104, 106
    stereotypy, 133, 134
  foraging and, 1, 2
    patch departure, 22, 25
    prey selection, 11, 13, 14
    risk, 33, 36
    sampling, 27-29, 32
    time horizons, 40
Reminiscing, retrieval strategies and, 229
Repetition, experimental synthesis of behavior and, 100, 101, 133
Resource depression, foraging and, 16, 18, 20, 26
Retardation tests, comparator hypothesis and, 58
  applications, 65
  conditioned inhibition theory, 67-71, 76, 77
  postconditioning inflation, 83, 84
  punctate comparator stimuli, 62
Retention
  comparator hypothesis and, 53, 61
  experimental synthesis of behavior and, 96
Retrieval
  alternative representations and, 261, 271, 299
  comparator hypothesis and, 51, 53
  visual stimuli and, 148
  working memory, aging and
    inference difficulty, 199, 200, 202-204, 208
    inhibition, 215-219
    priming, 204-206, 208
    reduced processing resource approach, 209
    theoretical framework, 195
Retrieval strategies, control of, 227, 228, 254-258
  selection, 235-240
    intrinsic variables, 245-254
    mixed-delay design, 241-245
  two-factor theory, 228-230
    competition for activation, 230, 231
    elaboration, 232-234
    experts, 231, 232
Retrograde amnesia, comparator hypothesis and, 53
Reward, negative effects of, experimental synthesis of behavior and, 129-131
Reward condition, experimental synthesis of behavior and, 102, 103
Risk, foraging and, 41, 42
  energy budget, 34-36
  response to variance, 33, 34
  time horizons, 39
Rules, experimental synthesis of behavior and
  apprehension of logical form, 118, 120
  contingency assessment, 114
  maximization, 122, 124, 126
  positive effects, 131-133
  pretraining variation, 106
Run of bad luck, foraging and, 19, 22, 23, 26

S
Salience, comparator hypothesis and, 70, 71
Salience imbalance, relational selectivity in metaphor and, 307, 309
  accounts, 310-312, 316
  experiments, 316-344
Sampling, foraging and
  examples, 27, 28
  fluctuating environment, 28, 29
  momentary maximizing, 26, 27
  two-armed bandit, 29-33
Scalar expectancy model, foraging and, 12, 13, 28, 29
Scanning, alternative representations and, 265, 298
Schizophrenia, working memory, aging and, 213, 219
Selection task, experimental synthesis of behavior and, 114
Selective attention
  visual stimuli and, 168
  working memory, aging and, 216, 219
Semantic grouping, alternative representations and
  bus schedules, 265, 269-274
  comprehensive view, 301
  overview of experiments, 298
  text-editing commands, 296, 297
Short-term memory, retrieval strategies and, 228
Signal detection, foraging and, 4
Similes, relational selectivity in metaphor and, 311, 312
Stereotypy, experimental synthesis of behavior and, 94-100, 103, 128, 133-135
  apprehension of logical form, 116
  contingency assessment, 110
  maximization, 122
  pretraining variation, 104, 105
Stimulus onset asynchrony
  visual stimuli and, 154, 158, 160, 162, 163
  working memory, aging and, 205, 207
Stimulus priming, visual stimuli and, 148, 149
Stimulus-response, alternative representations and, 285, 295
Storage capacity, working memory, aging and, 197, 280, 211
Structure mapping, relational selectivity in metaphor and, 307, 309
  accounts, 312-316
  experiments, 316-344
  systems of relations, 344, 346-349, 351
Successive reversal, foraging and, 32, 33
Sufficiency, experimental synthesis of behavior and, 115-120
Summation tests, comparator hypothesis and, 58
  conditioned inhibition theory, 67, 68, 70-72, 76, 77
  postconditioning inflation, 83
  punctate comparator stimuli, 62
Superconditioning, comparator hypothesis and, 72-75, 87
Symbol meaning, alternative representations and, 265, 269-272, 274
Synchrony, visual stimuli and, 168, 172, 175
Syntactic scoring, relational selectivity in metaphor and
  experiments, 333, 334
  grammatical categories, 353, 354
Systematicity, relational selectivity in metaphor and
  accounts, 313
  evidence for, 345-347
  systems of relations, 344
Systemic variation, experimental synthesis of behavior and, 97, 132, 134

T
Text-editing commands, alternative representations and, 264, 283-297, 301
Time horizons, foraging and, 41, 42
  diet selection, 39, 40
  patch departure, 40, 41
  problem, 36, 37
  two-armed bandit, 37-39
Transfer
  alternative representations and, 296
  experimental synthesis of behavior and, 128
    contingency assessment, 108, 113
    pretraining variation, 104
  relational selectivity in metaphor and, 345
Travel time, foraging and, 18, 19, 24, 26
Two-armed bandit, foraging and, 29-33, 37-39

U
Unconditioned stimulus, comparator hypothesis and, 54-58, 61
  applications, 64, 66
  appraisal, 87
  conditioned inhibition theory, 67-69, 71, 74-77
  postconditioning inflation, 83, 84, 86
  preexposure effect, 62, 64-66
  punctate comparator stimuli, 61
  relationship to other models, 80-82
  temporal window, 78, 79
Unidimensionality, visual stimuli and
  information acquisition, 180
  mediating processes, 184
  picture memory model, 143, 144, 146
Unmasked stimulus, visual stimuli and, 148, 150

V
Variability, experimental synthesis of behavior and, 128, 133, 134
Variable interval schedules, foraging and, 10, 12
Variance, foraging and, 33, 34
Visible persistence, visual stimuli and, 140, 165, 167, 168
Visual stimuli, information extraction from, 139-143, 184-188
  information acquisition, 180
    conceptual processing, 181
    monotonicity assumption, 181
    perceptual processing, 180
  mediating processes, 183, 184
  phenomenological appearance, 166, 181-183
    model applications, 168-171
    model extension, 166-168
    persistence, 172-180
  picture memory model, 143-148
Visual stimuli, information extraction from, picture memory model (cont.)
  applications, 148-152
  duration of processing, 152-166
Vocabulary, working memory, aging and, 197

W
Working memory, aging and, 193, 194
  inference difficulty, 199-204, 208
  inhibition, 212-217
    compensatory mechanisms, 217-220
  priming, 204-208
  reduced processing resource approach, 208-212
  theoretical framework, 194, 195
    discourse processing, 195-198
    inferences, 198, 199
Working memory, alternative representations and, 265, 266
CONTENTS OF RECENT VOLUMES

Volume 12
Experimental Analysis of Imprinting and Its Behavioral Effects
  Howard S. Hoffman
Memory, Temporal Discrimination, and Learned Structure in Behavior
  Charles P. Shimp
The Relation between Stimulus Analyzability and Perceived Dimensional Structure
  Barbara Burns, Bryan E. Shepp, Dorothy McDonough, and Willa K. Wiener-Ehrlich
Mental Comparison
  Robert S. Moyer and Susan T. Dumais
The Simultaneous Acquisition of Multiple Memories
  Benton J. Underwood and Robert A. Malmi
The Updating of Human Memory
  Robert A. Bjork
Subject Index

Volume 13
Pavlovian Conditioning and the Mediation of Behavior
  J. Bruce Overmier and Janice A. Lawry
A Conditioned Opponent Theory of Pavlovian Conditioning and Habituation
  Jonathan Schull
Memory Storage Factors Leading to Infantile Amnesia
  Norman E. Spear
Learned Helplessness: All of Us Were Right (and Wrong): Inescapable Shock Has Multiple Effects
  Steven F. Maier and Raymond L. Jackson
On the Cognitive Component of Learned Helplessness and Depression
  Lauren B. Alloy and Martin E. P. Seligman
A General Learning Theory and Its Application to Schema Abstraction
  John R. Anderson, Paul J. Kline, and Charles M. Beasley, Jr.
Similarity and Order in Memory
  Robert G. Crowder
Stimulus Classification: Partitioning Strategies and Use of Evidence
  Patrick Rabbitt
Immediate Memory and Discourse Processing
  Robert J. Jarvella
Subject Index

Volume 14
A Molar Equilibrium Theory of Learned Performance
  William Timberlake
Fish as a Natural Category for People and Pigeons
  R. J. Herrnstein and Peter A. de Villiers
Freedom of Choice: A Behavioral Analysis
  A. Charles Catania
A Sketch of an Ecological Metatheory for Theories of Learning
  Timothy D. Johnston and M. T. Turvey
SAM: A Theory of Probabilistic Search of Associative Memory
  Jeroen G. W. Raaijmakers and Richard M. Shiffrin
Memory-Based Rehearsal
  Ronald E. Johnson
Individual Differences in Free Recall: When Some People Remember Better Than Others
  Marcia Ozier
Index

Volume 15
Conditioned Attention Theory
  R. E. Lubow, I. Weiner, and Paul Schnur
A Classification and Analysis of Short-Term Retention Codes in Pigeons
  Donald A. Riley, Robert G. Cook, and Marvin R. Lamb
Inferences in Information Processing
  Richard J. Harris
Many Are Called but Few Are Chosen: The Influence of Context on the Effects of Category Size
  Douglas L. Nelson
Frequency, Orthographic Regularity, and Lexical Status in Letter and Word Perception
  Dominic W. Massaro, James E. Jastrzembski, and Peter A. Lucas
Self and Memory
  Anthony G. Greenwald
Children's Knowledge of Events: A Causal Analysis of Story Structure
  Tom Trabasso, Nancy L. Stein, and Lucie R. Johnson
Index

Volume 16
Skill and Working Memory
  William G. Chase and K. Anders Ericsson
The Impact of a Schema on Comprehension and Memory
  Arthur C. Graesser and Glenn V. Nakamura
Construction and Representation of Orderings in Memory
  Kirk H. Smith and Barbee T. Mynatt
A Perspective on Rehearsal
  Michael J. Watkins and Zehra F. Peynircioğlu
Short-Term Memory for Order Information
  Alice F. Healy
Retrospective and Prospective Processing in Animal Working Memory
  Werner K. Honig and Roger K. R. Thompson
Index

Volume 17
The Structure of Human Memory
  William F. Brewer and John R. Pani
A Simulation Model for the Comprehension of Technical Prose
  David Kieras
A Multiple-Entry, Modular Memory System
  Marcia K. Johnson
The Cognitive Map of a City-50 Years of Learning and Memory
  Harry P. Bahrick
Problem Solving Skill in the Social Sciences
  James F. Voss, Terry R. Greene, Timothy A. Post, and Barbara C. Penner
Biological Constraints on Instrumental and Classical Conditioning: Implications for General Process Theory
  Michael Domjan
Index

Volume 18
Nonanalytic Cognition: Memory, Perception, and Concept Learning
  Larry L. Jacoby and Lee R. Brooks
On the Nature of Categories
  Donald Homa
The Recovery of Unconscious (Inaccessible) Memories: Laboratory Studies of Hypermnesia
  Matthew Erdelyi
Origins of Behavior of Pavlovian Conditioning
  Peter C. Holland
Directed Forgetting in Context
  Mark Rilling, Donald F. Kendrick, and Thomas B. Stonebraker
Effects of Isolation Rearing on Learning by Mammals
  Robert Holson and Gene P. Sackett
Aristotle's Logic
  Marilyn Jager Adams
Some Empirical Justification for a Theory of Natural Propositional Logic
  Martin D. S. Braine, Brian J. Reiser, and Barbara Rumain
Index

Volume 19
Memory for Experience
  Janet Kolodner
The Pragmatics of Analogical Transfer
  Keith J. Holyoak
Learning in Complex Domains: A Cognitive Analysis of Computer Programming
  Richard E. Mayer
Posthypnotic Amnesia and the Dissociation of Memory
  John F. Kihlstrom
Unit Formation in Perception and Memory
  John Ceraso
How Infants Form Categories
  Barbara A. Younger and Leslie B. Cohen
Index

Volume 20
Recognition by Components: A Theory of Visual Pattern Recognition
  Irving Biederman
Associative Structures in Instrumental Learning
  Ruth M. Colwill and Robert A. Rescorla
The Structure of Subjective Time: How Time Flies
  John Gibbon
The Computation of Contingency in Classical Conditioning
  Richard H. Granger, Jr. and Jeffrey C. Schlimmer
Baseball: An Example of Knowledge-Directed Machine Learning
  Elliot Soloway
Mental Cues and Verbal Reports in Learning
  Francis S. Bellezza
Memory Mechanisms in Text Comprehension
  Murray Glanzer and Suzanne Donnenwerth Nolan
Index

Volume 21
An Integrated Computational Model of Stimulus-Response Compatibility and Practice
  Paul S. Rosenbloom and Allen Newell
A Connectionist/Control Architecture for Working Memory
  Walter Schneider and Mark Detweiler
The Intelligent Hand
  Roberta L. Klatzky and Susan J. Lederman
Successive Approximation to a Model of Human Motor Programming
  David A. Rosenbaum
Modular Analysis of Timing in Motor Skill
  Steven W. Keele and Richard I. Ivry
Associative Accounts of Causality Judgment
  David R. Shanks and Anthony Dickinson
Anxiety and the Amygdala: Pharmacological and Anatomical Analysis of the Fear-Potentiated Startle Paradigm
  Michael Davis, Janice M. Hitchcock, and Jeffrey B. Rosen
Index