THE PSYCHOLOGY OF LEARNING AND MOTIVATION Advances in Research and Theory
VOLUME 21
EDITED BY GORDON H. BOWER STANFORD UNIVERSITY, STANFORD, CALIFORNIA
ACADEMIC PRESS, INC. Harcourt Brace Jovanovich, Publishers
San Diego New York Berkeley Boston London Sydney Tokyo Toronto
Copyright © 1987 by Academic Press, Inc.
All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.

ACADEMIC PRESS, INC.
1250 Sixth Avenue, San Diego, California 92101

United Kingdom Edition published by
ACADEMIC PRESS INC. (LONDON) LTD.
24-28 Oval Road, London NW1 7DX
Library of Congress Catalog Card Number: 66-30104
ISBN 0-12-543321-2 (alk. paper)

Printed in the United States of America
87 88 89 90    9 8 7 6 5 4 3 2 1
CONTENTS

AN INTEGRATED COMPUTATIONAL MODEL OF STIMULUS-RESPONSE COMPATIBILITY AND PRACTICE
Paul S. Rosenbloom and Allen Newell
  I. Performance: Compatibility and Goal Hierarchies
  II. Learning: Practice and Chunking
  III. Discussion
  References

A CONNECTIONIST/CONTROL ARCHITECTURE FOR WORKING MEMORY
Walter Schneider and Mark Detweiler
  I. Introduction
  II. Traditional Views of Short-Term Memory
  III. A Connectionist/Control Architecture for Working Memory
  IV. Interpretation of the Working-Memory Literature
  V. Context Effects, Proactive Interference, and Release from Proactive Interference
  VI. Skilled Memory, Mnemonics, and Levels of Processing
  VII. Serial Outputs and Chunking
  VIII. Workload and Working Memory
  IX. Working Memory in Learning and Skill
  X. Final Comments
  References

THE INTELLIGENT HAND
Roberta L. Klatzky and Susan J. Lederman
  I. The Curious Discrepancy between Two Phenomena
  II. Haptic Apprehension and Recognition: Theoretical Issues
  III. Conclusions and Applications
  References

SUCCESSIVE APPROXIMATIONS TO A MODEL OF HUMAN MOTOR PROGRAMMING
David A. Rosenbaum
  I. Introduction
  II. The Hierarchical Decisions Model
  III. The Motor-Program Editor Model
  IV. Further Tests of the Hierarchical Decisions Model and Motor-Program Editor Model
  V. The Hierarchical Editor Model
  VI. Parallel Editing and Execution
  VII. Conclusions
  References

MODULAR ANALYSIS OF TIMING IN MOTOR SKILL
Steven W. Keele and Richard I. Ivry
  I. Introduction
  II. Issues in the Study of Timing
  III. Individual Differences in Timing
  IV. Further Analysis of Force Control and Maxi…
  V. Individual Differences in Skill
  VI. Neurological Analysis of Timing
  VII. Other Approaches to Modularity
  VIII. Conclusions
  References

ASSOCIATIVE ACCOUNTS OF CAUSALITY JUDGMENT
David R. Shanks and Anthony Dickinson
  I. Introduction
  II. Contiguity and Contingency
  III. Acquisition of Causality Judgments
  IV. Blocking by the Causal Background
  V. Retrospective Evaluation
  VI. Comparator Theories
  VII. Conclusion
  References

ANXIETY AND THE AMYGDALA: PHARMACOLOGICAL AND ANATOMICAL ANALYSIS OF THE FEAR-POTENTIATED STARTLE PARADIGM
Michael Davis, Janice M. Hitchcock, and Jeffrey B. Rosen
  I. Introduction
  II. The Fear-Potentiated Startle Paradigm
  III. The Pharmacology of Fear-Potentiated Startle
  IV. Neural Systems Involved in Fear-Potentiated Startle
  V. Sensitization of Startle by Footshocks
  VI. Anxiety and the Amygdala
  VII. Summary and Conclusions
  References

Index
Contents of Recent Volumes
AN INTEGRATED COMPUTATIONAL MODEL OF STIMULUS-RESPONSE COMPATIBILITY AND PRACTICE*

Paul S. Rosenbloom
DEPARTMENTS OF COMPUTER SCIENCE AND PSYCHOLOGY
STANFORD UNIVERSITY
STANFORD, CALIFORNIA 94305

Allen Newell
DEPARTMENT OF COMPUTER SCIENCE
CARNEGIE-MELLON UNIVERSITY
PITTSBURGH, PENNSYLVANIA 15213

  I. Performance: Compatibility and Goal Hierarchies
     A. Data: Stimulus-Response Compatibility
     B. Model: Goal Hierarchies
     C. Results: Compatibility Simulations
  II. Learning: Practice and Chunking
     A. Data: Practice
     B. Model: Chunking
     C. Results: Practice Simulations
  III. Discussion
  References
Consider the position of a subject in a typical reaction-time experiment. On each trial of the experiment the subject is presented with a stimulus display (visual, auditory, or kinesthetic) containing some information from which he must determine the proper response, which is usually a vocalization or a manipulation by the hands or fingers. A single condition of the experiment defines a particular task environment: a set of possible stimuli (the stimulus environment), a set of possible responses (the response environment), and the mapping of the stimulus displays into the responses. The entire experiment, from the point of view of one within-condition subject, consists of a sequence of trials from this task

*This article is condensed from Rosenbloom, P. S. (1986), The chunking of goal hierarchies: A model of practice and stimulus-response compatibility. In J. E. Laird, P. S. Rosenbloom, & A. Newell (Eds.), Universal subgoaling and chunking: The automatic generation and learning of goal hierarchies. Hingham, MA: Kluwer Academic Publishers. Reprinted by permission of the publisher.
environment. This basic paradigm holds over a number of experimental domains, including stimulus-response compatibility and practice. Theories of reaction-time phenomena usually focus on one particular domain, modeling well the data so circumscribed but effectively ignoring that each of these domains represents only a single aspect of what is in fact an integrated performance by the experimental subjects. When it comes time to build an integrated model of performance, this kind of approach lends itself best to a big-switch model. That is, each of the individual domain models is included as an independent component of the overall model, and a big conceptual switch selects the appropriate submodel when the experiment to be explained falls within its domain. The resulting model is little more than the sum of its parts and has special difficulties with situations that require interactions between the submodels. An alternative approach to modeling reaction-time phenomena is to do it in the context of a cognitive architecture (Newell, 1973; Anderson, 1983; Pylyshyn, 1984; Laird, Newell, & Rosenbloom, 1987). A cognitive architecture specifies the set of fixed mechanisms upon which cognition is based. A complete architecture would specify the mechanisms underlying learning, memory, problem solving, perception, motor behavior, etc. Human performance in a variety of domains can then be modeled as the interaction between the architecture and knowledge about tasks and strategies. Doing so has a number of potential benefits. First, it ensures that the various domain models can all coexist within a single working system. A big-switch model has problems when a task requires the interaction of two phenomena to be produced by submodels with incompatible assumptions. Second, architectures embody a set of mechanisms which may individually, or through their interaction, produce the desired phenomena without further assumptions.
The model will be much more than the sum of its parts when a small set of basic mechanisms interact to produce all of the phenomena. Third, the architectural mechanisms are usually motivated by need-the system will not run adequately without them-while mechanisms hypothesized to explain reactiontime phenomena are usually motivated by the degree to which they match the data. If a mechanism meets both of these criteria, its likelihood of being correct is greatly increased. Fourth, the reaction-time phenomena appear as side effects of a working system actually trying to perform the experimental tasks, just as they do in psychology experiments. And fifth, these studies can be a good way of investigating the nature of the cognitive architecture itself. In this article we present a theory of two reaction-time domains: stimulus-response compatibility and practice. This theory consists of two components: (1) a model of task performance, based on the concept of goal hierarchies, and (2) a model of learning, based on the concept of chunking. The compatibility and practice effects are produced by first constructing models of how subjects are performing specific experimental tasks, and then simulating these models to determine the time required to perform the tasks. The compatibility phenomena arise because of the differences
between the task-performance models underlying subject behavior in the different compatibility conditions. The practice phenomena arise because of changes wrought by the learning model to the task-performance models. Though these two components are discussed independently in this chapter, they are actually two integral parts of a single system capable of both performance and learning. Learning occurs in stimulus-response compatibility situations, and it is impossible to run a practice experiment without having subjects actually perform some task. The theory is implemented as a goal-based production-system architecture called Xaps3 (described in Rosenbloom, 1983; Rosenbloom & Newell, 1986). Though some of the architectural assumptions in Xaps3 are direct reflections of the theory, most of the assumptions are shared with the wider class of production-system architectures such as Act* (Anderson, 1983) and Ops5 (Forgy, 1981). The Xaps3 implementation of the theory has been used to generate simulated timing results for four experiments from the compatibility and practice literature (Seibel, 1963; Duncan, 1977; Fitts & Seeger, 1953; Morin & Forrin, 1962).¹ The Seibel (1963) experiment has been instrumental in driving our theoretical work on practice (Newell & Rosenbloom, 1981; Rosenbloom & Newell, 1982, 1987). It is used to evaluate the practice predictions of the model. The Duncan (1977), Fitts and Seeger (1953), and Morin and Forrin (1962) experiments are three of the major stimulus-response compatibility experiments that are relatively free of confounding phenomena. They are used to evaluate the compatibility predictions of the model. In addition, the Duncan (1977) and Fitts and Seeger (1953) experiments provide practice data that can be used as a further evaluation of the practice model, and of the interaction between compatibility and practice. The next three sections present the data to be modeled, the model, and the simulated results.
This presentation is divided into a section on the performance model and one on the learning model. Stimulus-response compatibility is discussed in the performance section, and practice is discussed in the learning section. The final section contains a discussion of the model along with some potential objections to it. I. Performance: Compatibility and Goal Hierarchies
In this section we lay out the basis for task performance, that is, how the actions of an experimental subject are determined by the interaction of the experimental stimuli with the subject's existing cognitive structures. This model, based on the concept of goal hierarchies, has been developed to

¹Because the model currently says little about the sources of performance errors, this work is focused entirely on timing data. For a speculative discussion of errors in this context, see Rosenbloom (1983).
model the main stimulus-response compatibility phenomena and to form the basis of performance in practice tasks. The body of this section consists of presentations of the relevant compatibility data, the model of performance, and results generated by the model.

A. DATA: STIMULUS-RESPONSE COMPATIBILITY
It was known by the early 1950s that the stimulus and response environments in an experimental situation could not be considered independently (Fitts & Seeger, 1953). The interaction between the two, as defined by the mapping, is often critical. This phenomenon is termed stimulus-response compatibility. Consider a concrete example in which there are two buttons, one above the other, that can be used to summon either an up elevator or a down elevator. In the compatible situation, the upper button summons the up elevator and the lower button summons the down elevator. In the incompatible situation, the relationship is reversed: the upper button summons the down elevator and the lower button summons the up elevator. In the compatible situation, people are faster and make fewer errors. These effects are robust and rather large. The problems encountered in performing in the incompatible situation do not stem from a lack of knowledge about the correct relationship (subjects learn the mapping from stimulus to response before the experiment begins); instead, it is a problem in actually performing the mapping. Turning to the experimental work on compatibility, the most straightforward instances of the phenomena occur when the stimulus and response environments do not vary across conditions; only the mapping varies. In Duncan (1977), the stimulus environment consisted of an oscilloscope on which a vertical line could appear in one of four horizontal positions (top part of Fig. 1). The response environment consisted of a set of four buttons, lying under the fore- and middle fingers of the subject's hands (bottom part of the figure). On each trial of the task, one of the lines would appear on the oscilloscope and the subject was to press the appropriate button. There were three conditions in the experiment, each of which specified a different mapping of line position to button. In the corresponding condition (Fig. 1a) each line was mapped to the button below it.
In the opposite condition (Fig. 1b) each line was mapped to the opposite button: the first line to the last button, the second line to the third button, the third line to the second button, and the last line to the first button. In the remaining mixed condition (Fig. 1c) half of the combinations (either the inner two or the outer two) were corresponding and the other half were opposite.*
*Duncan actually employed both mixed conditions, one in which the inner two lights were corresponding and one in which the outer two were. However, because we are not currently modeling differences in discriminability, we do not distinguish between these two variations.
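The three mappings can be written down concretely. The following sketch is our own illustration (not from the article), numbering the line positions and buttons 0-3 from left to right:

```python
# Illustrative sketch (ours, not from the article): Duncan's three
# mappings as functions from line position to button position, with
# positions numbered 0-3 from left to right.

def corresponding(pos):
    return pos  # each line maps to the button directly below it

def opposite(pos):
    return 3 - pos  # first line to last button, and so on

def mixed(pos):
    # one of Duncan's mixed conditions: inner two positions (1, 2)
    # corresponding, outer two (0, 3) opposite
    return pos if pos in (1, 2) else 3 - pos

for mapping in (corresponding, opposite, mixed):
    print(mapping.__name__, [mapping(p) for p in range(4)])
```

Note that on a corresponding trial of the mixed condition the stimulus alone does not reveal which rule applies; the subject must know which half of the positions is corresponding, which is one way of seeing why mixed trials are slower.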
Fig. 1. The three compatibility conditions in Duncan (1977): (a) corresponding, (b) opposite, (c) mixed.
Table I shows the reaction times for the three conditions. Though the mixed condition is only a single condition, each trial is itself either corresponding or opposite. Therefore, the data have been partitioned to reflect this split. The main thing to notice at the moment is that though the stimulus and response environments remain unchanged, manipulating the mapping yields differences in the time it takes to perform the task. The opposite trials are consistently slower than the corresponding trials, and the trials in the mixed condition are slower than the analogous ones in the nonmixed (pure) conditions. In fact, the two factors appear to be independent, with an extra 60 msec for the opposite mapping and 100 msec for a mixed condition. Even when the stimulus and response environments are modified across conditions, it is the mapping that accounts for most of the variance. Fitts and Seeger (1953) reported a nine-condition experiment in which three stimulus environments were crossed with three response environments. Table II shows the apparatuses used to define these environments as well as the mean reaction times for the nine conditions. Stimulus apparatus SA contains eight lights at the 45° points of a circle. On each trial in which it is used, exactly one light goes on. Stimulus apparatuses SB and SC both contain four lights. On each trial either one light comes on or a pair of adjacent lights (with respect to the circle) comes on. With apparatus SB the four lights are at the 90° points of a circle. With apparatus SC the display is twice as wide, the horizontal lights in one half and the vertical lights in the other. Adjacency for apparatus SC is defined as if the lights were still in the circle of apparatus SB. The light on the far left is "at" -90°, the middle light is "at" 90°, the top-right light is "at" 0°, and the bottom-right light is "at" 180°.
TABLE I
Mean Reaction Times (in msec) and Marginal Differences for the Four Types of Trials in Duncan (1977)

                  Pure    Mixed    Difference
Corresponding      431      529        98
Opposite           489      590       101
Difference          58       61
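The near-additivity of the two factors can be checked directly from the table. The following sketch is our own arithmetic on the Table I values, deriving the marginal costs and the prediction for the mixed-opposite cell:

```python
# Arithmetic sketch using the Table I values: check the additive
# reading RT = base + opposite_cost + mixed_cost.

rt = {
    ("pure", "corresponding"): 431,
    ("pure", "opposite"): 489,
    ("mixed", "corresponding"): 529,
    ("mixed", "opposite"): 590,
}

base = rt[("pure", "corresponding")]
opposite_cost = rt[("pure", "opposite")] - base        # 58 msec
mixed_cost = rt[("mixed", "corresponding")] - base     # 98 msec

# Additive prediction for the remaining cell, and how far off it is.
predicted_mixed_opposite = base + opposite_cost + mixed_cost
residual = rt[("mixed", "opposite")] - predicted_mixed_opposite
print(predicted_mixed_opposite, residual)  # 587 msec predicted, 3 msec off
```

The 3-msec residual on the mixed-opposite cell is what licenses treating the two costs as independent.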
[Apparatus diagrams omitted: SA has eight lights at the 45° points of a circle; SB has four lights at the 90° points; SC splits the horizontal and vertical light pairs across a double-width display. RA, RB, and RC are the analogous lever arrangements.]

TABLE II
Mean Reaction Times (in msec) for the Nine Stimulus-Response Conditions in Fitts and Seeger (1953)

        RA     RB     RC
SA     390    430    580
SB     450    410    580
SC     770    580    480
The three response apparatuses are defined analogously to the stimulus ones. In response apparatus RA there is a lever that can be pushed toward any of the eight 45° angles. When used in conjunction with SA, the lever is pushed in the direction of the light. With SB and SC, if one light is on, the lever is pushed in that direction; if two lights are on, then the lever is pushed toward the mean of the two angles. For example, if the right and top lights (which actually appear in the middle and top-right of the display, respectively) in apparatus SC are on, then the lever should be pushed at a 45° angle. Response apparatus RB allows the lever to be pushed at only the 90° angles. When it is used with either stimulus apparatus SB or SC, the lever is pushed in each direction specified by an on light. This may require the lever to be pushed either once or twice. When used with stimulus array SA the lever is pushed once if the light is at a multiple of 90° and twice otherwise (at an angular displacement of +45° and -45° from the light that is on). Response apparatus RC is analogous to RB except that it requires two hands to manipulate, one for each of the two orthogonal directions. For all three response apparatuses the reaction time is measured up until the first movement is begun. Because movement time is not included, two movements need not take longer than one. The first thing to notice about the reaction times for these conditions is that in each row and column in the table the fastest reaction time belongs to the one on the main diagonal. For each stimulus apparatus there is a different response apparatus that produces the fastest times.
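The "mean of the two angles" rule for lever RA can be made concrete. This sketch is our own reading of the geometry (the angle convention is an assumption, not from the article): the push direction is the circular mean of the lit lights' angles, which handles pairs that straddle the 0° point correctly.

```python
import math

# Geometry sketch (our own reading of the rule; the angle convention is
# an assumption): the direction to push lever RA, given the angular
# positions of the lit lights. For one light the answer is its angle;
# for two adjacent lights it is the circular mean of the pair.

def push_direction(lit_angles_deg):
    """Circular mean of the lit lights' angles, in degrees [0, 360)."""
    x = sum(math.cos(math.radians(a)) for a in lit_angles_deg)
    y = sum(math.sin(math.radians(a)) for a in lit_angles_deg)
    return math.degrees(math.atan2(y, x)) % 360

print(push_direction([90]))      # single light: push straight at it
print(push_direction([0, 90]))   # adjacent pair: split the difference
print(push_direction([270, 0]))  # the mean respects the circle's wrap
```

A plain arithmetic mean of 270° and 0° would give 135°, the opposite direction; the vector form is what "mean of the two angles" has to mean on a circle.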
In the analysis of variance reported by Fitts and Seeger, the effects of stimulus apparatus and response apparatus individually were significant, but in their words, “The variance that can be attributed to interaction is very much larger than the variance attributable to the primary effects of either stimulus or response sets alone” (p. 204).
This experiment also reveals that just labeling conditions as compatible and incompatible can miss a good deal of what is going on. Though the conditions on the main diagonal are compatible and those off the diagonal are incompatible, some of the incompatible times are faster than some of the compatible times. A theory of stimulus-response compatibility should explain these complexities. Both the Duncan and the Fitts and Seeger tasks are spatial tasks, but the phenomena of compatibility are not so limited. Morin and Forrin (1962) used a set of five symbolic tasks (Table III). In Conditions I and IV, each trial consisted of the visual presentation of an arabic numeral to which the subject was to respond by saying the number (for example, see "2", say two). The conditions differed in the number of alternatives in the task environment (2 and 4, respectively). In Conditions II and V, each trial consisted of the visual presentation of a symbol (+, □, ·, or △) to which the subject was to respond by saying a number that had been associated with it (4, 7, 2, and 8, respectively). Again the conditions differed in the number of alternatives that could be presented. Condition III was a mixed condition in which the stimulus environment consisted of two numbers (2 and 8) and two symbols (+ and □). In Table III, Condition III has been split according to whether a number (IIIa) or a symbol (IIIb) appeared as a stimulus on the trial. The reaction times for these conditions divide up into three groups separated by about 100 msec each. Conditions I and IV are the fastest (the "compatible" conditions), at around 500 msec. At around 600 msec we find Conditions II and IIIa, and at around 700 msec we find Conditions IIIb and V.

B. MODEL: GOAL HIERARCHIES
The performance model is based on work in three different areas: (1) stimulus-response compatibility theory, (2) applied information-processing psychology, and (3) artificial intelligence.

TABLE III
Compatibility Conditions and RT (in msec) from Morin and Forrin (1962)

Condition   S-R Pairs                  RT
I           2-2, 8-8                   490
II          +-4, □-7                   590
IIIa        2-2, 8-8                  ≈600
IIIb        +-4, □-7                  ≈700
IV          2-2, 8-8, 4-4, 7-7         520
V           +-4, □-7, ·-2, △-8         720
1. Stimulus-Response-Compatibility Theory
Though the phenomena of compatibility are important (especially in human factors, where the relationship between stimulus displays and simple manual controls is critical) and have been studied since the early 1950s, there are still no useful theories of compatibility. Welford (1980) sums up this situation nicely: "Surprisingly, after more than twenty years' research there is no metric for compatibility that transcends particular experimental conditions" (p. 99). Despite the lack of any complete theories, some theoretical statements have been made about aspects of compatibility. The earliest and most widely accepted is that compatibility is a function of the transformations or encodings that must be performed on the stimulus to yield the response. Deininger and Fitts (1955) put it this way: "The fundamental assumption for the model used in developing the S-R compatibility concept is that all perceptual-motor performance involves one or more steps of information transformation, since the response must always be encoded in a manner different from the stimulus" (p. 318). In addition, they went on to propose one other factor they felt was important: whether "the pairings of stimulus and response elements agree with strong population stereotypes." Brebner (1973) took the Deininger and Fitts model one step further, by considering what happens in a multistage transformation between stimulus and response. He proposed that in more compatible situations a single recoding or translation process can be applied to all of the steps. For example, Brebner's hypothesis would imply that it is easier to do a string of three additions than two additions with a subtraction sandwiched between them. The closest anyone has come to a metric was in the work of Morin and Grant (1955), though it was still for only one set of task variations. They examined a set of tasks based on eight lights and eight response keys, in which the mapping of light to key could be changed.
On each trial, a pattern of lights was presented and the subject pressed the appropriate keys. Morin and Grant compared reaction time with the rank correlation of the mapping between the stimuli and response (that is, the serial positions of the lights were compared with those of the buttons associated with them) and found an effect of the absolute value of the correlation (a quadratic effect) and the sign (a linear effect). Shepard (1961) showed that a partial account for this effect can be obtained by looking at the stimulus and response generalizations (or confusions) and the permutation matrix defining the mapping between stimuli and responses. To explain the results of mixed conditions, such as in the Duncan (1977) study described earlier, Smith (1977) proposed a parallel-iterative scheme. In this theory, each S-R pair has an association time, reflecting some unspecified function of the compatibility of that pair. Each iteration of the
process takes time proportional to a weighted sum of the association times for all of the pairs in the condition (not just of the pair for the current trial). Across iterations, excitation of the responses accumulates until one response reaches its threshold. Total reaction time is therefore the sum of the times required for all of the iterations until a response. This theory predicts that the time to do a mixed mapping will be between the times to do the respective pure mappings, because the corresponding mapping consists of four fast associations, the mixed mapping of two fast and two slow associations, and the opposite mapping of four slow associations. It also predicts that the addition of more S-R pairs to the condition will always raise the reaction time for the other pairs (though the effect may be small if the association times for the new pairs are small). Smith's theory treated each S-R pair as a distinct component in the response selection process. In contrast, Duncan (1977) proposed that "responses may be selected, not on the basis of individual S-R associations, but by use of a rule or system of rules." Duncan concluded that there were two factors determining spatial stimulus-response compatibility: Spatial CRT [Choice Reaction Time] is influenced by two different properties of the mapping. One is the spatial relationship of individual S-R pairs; that is, in terms of the proposed model, which transformation must be made. The other is the number of different relationships in the whole mapping; that is, the number of transformations from which selection must be made. (p. 60)
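Smith's parallel-iterative scheme described above can be sketched as a simple accumulator. All numeric parameters below (association times, weight, gain, threshold) are invented for illustration; only the structure follows the text:

```python
# Structural sketch of Smith's (1977) parallel-iterative scheme. The
# numeric parameters (association times, weight, gain, threshold) are
# invented for illustration; only the structure follows the text.

def reaction_time(assoc_times, weight=0.05, gain=1.0, threshold=10.0):
    """Every iteration costs weight * sum of ALL association times in
    the condition; excitation grows by gain per iteration until it
    reaches threshold, so total RT is iterations * iteration cost."""
    iteration_cost = weight * sum(assoc_times)
    iterations = threshold / gain
    return iterations * iteration_cost

FAST, SLOW = 10.0, 20.0  # hypothetical association times
pure_corresponding = reaction_time([FAST] * 4)
mixed = reaction_time([FAST, FAST, SLOW, SLOW])
pure_opposite = reaction_time([SLOW] * 4)

# The scheme's two predictions: mixed falls between the pure mappings,
# and adding S-R pairs slows responses to the pairs already present.
print(pure_corresponding, mixed, pure_opposite)
print(reaction_time([FAST] * 5) > reaction_time([FAST] * 4))
```

Because the iteration cost sums over every pair in the condition, both predictions fall out of the structure alone, whatever the particular parameter values.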
2. Applied Information-Processing Psychology
Card, Moran, and Newell (1980, 1983) proposed that, for tasks involving cognitive skill, a procedural representation of task performance provides good approximations to human performance. They have worked this out for one domain, computer text-editing and text-based command language interaction. They showed that models based on the concepts of goals, operators, methods, and selection rules (GOMS models) provide an excellent account of the behavior of human subjects. When this idea is paired with the earlier theoretical ideas, it yields the image of performance in compatibility tasks as being mediated by procedures or algorithms. An algorithm for a condition in a stimulus-response compatibility experiment specifies a sequence of operations that will compute the correct response for any stimulus that the subject may face during that condition. This idea is the basis for the GOMS model of stimulus-response compatibility.³ Compatibility phenomena are produced because the more incompatible tasks require algorithms that take longer to perform. In Duncan's terms, this is either because the "rule" for the condition is complex or because there are many rules for the condition (a mixed condition), necessitating numerous tests and branches in the algorithm. Given the specification of the GOMS model, the path to a compatibility metric is straightforward. The first step is to analyze the task situations to ascertain the algorithms employed in their performance. There may be one or more such algorithms, reflecting different strategies being employed by different subjects, or the same subject at different times (see Newell, 1973; Baron, 1978, for discussions of the issues surrounding the availability of multiple methods). One approach to the development of task algorithms is to perform a set of experiments geared toward ascertaining the algorithms actually used by subjects in the various task situations. Another approach is to perform an abstract task analysis similar in nature to the process a programmer goes through in developing an algorithm for a task. In this article we use a form of this abstract task-analysis approach exclusively. However, to increase the likelihood of developing algorithms that reflect what human subjects actually do, we have imposed two biases on this algorithm-development process. The first, and most important, bias is to not include mechanisms in the algorithms that violate in significant ways what is known about the capabilities and limitations of the human cognitive architecture. The second bias is to assume that subjects will tend to use the simpler (faster) algorithms when many are possible. Once algorithms have been derived, the second step is to perform a complexity analysis of the algorithms (Aho, Hopcroft, & Ullman, 1974). This involves assigning cost measures (i.e., amount of time to perform) to the primitive steps in the algorithm and determining how many of each type of step would be executed in the process of performing a trial.

³This model was referred to as the algorithmic model of stimulus-response compatibility in Rosenbloom (1983). The name has been changed to emphasize its relationship to the GOMS framework.
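The complexity-analysis step just described (assign costs to primitive steps, count executions, average over alternative algorithms) can be sketched as follows. The operator names, costs, and candidate algorithms are hypothetical, not taken from the article:

```python
# Sketch of the complexity-analysis recipe. Operator names, costs, and
# the two candidate algorithms are hypothetical, not from the article.

OP_COST = {"perceive": 50, "compare": 30, "map": 40, "respond": 60}  # msec

def algorithm_cost(op_counts):
    """Mean trial cost of one algorithm: sum of (executions * cost)."""
    return sum(OP_COST[op] * n for op, n in op_counts.items())

# Two hypothetical algorithms a subject might use in one condition.
direct = algorithm_cost({"perceive": 1, "map": 1, "respond": 1})
rule_based = algorithm_cost({"perceive": 1, "compare": 2, "map": 1, "respond": 1})

# Laplacian mixing assumption: with no data on relative frequencies,
# treat the alternative algorithms as equally likely and average them.
predicted_rt = (direct + rule_based) / 2
print(direct, rule_based, predicted_rt)  # 150 210 180.0
```

The averaging at the end is the minimal mixing assumption: absent frequency data, each algorithm is weighted equally across subjects.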
Given these values it is relatively straightforward to predict the mean reaction times for the conditions of the experiment. For each algorithm that might be used in a particular condition, the mean cost of performing a trial in that condition is determined. If there is more than one algorithm for the condition, then the costs from the different alternatives must be merged into a single cost for the condition. Since we have no data on the relative frequencies of the alternatives, we make the minimal (Laplacian) mixing assumption: assume the algorithms are equally likely (across subjects) and take their mean behavior. We have elsewhere presented an approximate version of the GOMS model of compatibility that facilitates quick back-of-the-envelope style calculations for predicting relative reaction times between conditions in compatibility experiments (Rosenbloom, 1983) and shown how such a
model can be useful in the domain of human-computer interaction (John, Rosenbloom, & Newell, 1985; John & Newell, 1987). In this article we present a goal-hierarchy version of this model that, while more complex than the GOMS model, fits in with current notions about cognitive architecture and facilitates its integration with the model of learning.

3. Artificial Intelligence
Goal hierarchies are a common control structure for the kinds of complex problem-solving systems found in artificial intelligence (Rich, 1983), but this is the first time they have been applied to the domain of reaction-time tasks. The foundation concept of the goal hierarchy is that of the goal: a data structure representing a desired state of affairs. A goal is not a procedure for bringing about the desired state; it is only a description of the state. In order to bring about the goal state, there must be one or more methods associated with the goal. A method could be a rigid algorithm, or it could be one of the more flexible weak methods (Newell, 1969), such as means-ends analysis (Ernst & Newell, 1969) or heuristic search (Nilsson, 1971). The current model is overly simplistic in that it conflates the properly distinct notions of goal and method into a single active concept. These conflated “goals” are active processes, much like functions, that take a set of parameters and return a set of results. This simplified treatment is sufficient for the reaction-time experiments modeled here because these tasks require very little in the way of sophisticated problem solving. More complex tasks require a more sophisticated model (see Laird, 1983; Laird et al., 1987, for one such model, which is also closely related to the current work). A single goal generates a goal hierarchy when the goal can be decomposed into a set of simpler goals, and those goals can be decomposed even further. The recursion terminates when the goals are so simple that they can be attained directly. To a large extent the terminal goals in the hierarchy do the actual work. Nonterminal (or internal) goals create the control structure that organizes the performance. Figure 2 shows the goal hierarchy we developed for the Duncan corresponding task. The top half shows the bare bones of the hierarchy as a tree structure.
The nodes are labeled with a number representing the order in which the goals are processed in a depth-first fashion. In depth-first processing there is always exactly one goal being actively worked on at any point in time. We refer to this goal as the active or current goal. When a goal generates a subgoal, the subgoal becomes the active goal, and the parent goal is suspended until control is returned to
1. Press-Button-Under-Stimulus-Line
   2. Get-Horizontal-Location-Of-Stimulus-Line
      3. Get-Stimulus-Line
      4. Get-Horizontal-Location-Of-Stimulus
   5. Press-Button-At-Horizontal-Location
Fig. 2. Goal hierarchy for the Duncan (1977) corresponding condition.
it by completion of the subgoal, at which point it again becomes the current goal. The bottom half of Fig. 2 shows how the goal hierarchy is traversed in a depth-first fashion. In this representation the tree structure loses its clarity, but the depth-first order in which the goals are processed becomes clear. In addition, the node labels can be expanded out to the full names of the goals. In both representations the bold-faced goals are the terminal goals. The hierarchy for the Duncan corresponding task is quite simple, consisting of only five goals, of which three are terminals. This simplicity reflects the inherent simplicity of the task. The top-level goal (Press-Button-Under-Stimulus-Line) is responsible for accomplishing the whole task. It is broken down into two subtasks: determining the horizontal location at which to press a button (Get-Horizontal-Location-Of-Stimulus-Line), and actually pressing the button (Press-Button-At-Horizontal-Location). The first subtask is itself decomposed into two smaller tasks: generating an internal representation for the line perceived in the stimulus display (Get-Stimulus-Line), and retrieving the horizontal location of the light from the representation of the stimulus (Get-Horizontal-Location-Of-Stimulus). This hierarchy assumes that the stimulus and response locations are represented in such a way that the stimulus location can be used directly as the location for the response. In general, subjects have a degree of flexibility in spatial tasks as to what type of representation to use for the task’s spatial information. They may use rectangular coordinates, polar coordinates, or something else entirely. In this article we make the assumption that subjects employ the coordinate system that they are led to by a combination of the instructions and the surface structure of the stimulus and response environments.
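The depth-first control regime described above can be sketched minimally as follows; this is an illustration only, not the authors' implementation, and the tuple encoding of Fig. 2 is ours.

```python
# A minimal sketch of depth-first goal processing: exactly one goal is
# active at a time; generating a subgoal suspends the parent until the
# subgoal completes, at which point the parent resumes.

def process(goal, order=None):
    """Process a goal tree depth-first, recording the order of activation."""
    if order is None:
        order = []
    name, subgoals = goal
    order.append(name)          # this goal becomes the active (current) goal
    for sub in subgoals:        # the parent is suspended while each subgoal
        process(sub, order)     # is processed to completion
    return order                # control returns; the parent resumes

# The five-goal hierarchy of Fig. 2, written as (name, subgoals) pairs.
fig2 = ("Press-Button-Under-Stimulus-Line", [
    ("Get-Horizontal-Location-Of-Stimulus-Line", [
        ("Get-Stimulus-Line", []),
        ("Get-Horizontal-Location-Of-Stimulus", []),
    ]),
    ("Press-Button-At-Horizontal-Location", []),
])

print(process(fig2))  # goals appear in the numbered order 1..5 of Fig. 2
```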
If the stimulus consists of a circle of lights, subjects will tend to use polar coordinates; if it is a linear display, rectangular coordinates will be
appropriate. If the stimulus and response environments have different spatial properties, different coordinate systems may be used for them. This coordinate-system assumption is a specialization of the stronger psychological assumption that subjects create problem spaces for task performance from the surface structure of the task (Hayes & Simon, 1974; Simon & Hayes, 1976). In the goal hierarchies for the Duncan tasks, a rectangular coordinate system has been used for both the stimulus and response environments. The origin is at the center of the stimulus display (and the response apparatus). We also assume that subjects define the two coordinate spaces so that they just bound the stimulus display and response apparatus, respectively. The minimal horizontal location corresponds to the location of the leftmost line in the stimulus display, and to the leftmost button in the response apparatus; the maximal horizontal location corresponds to the location of the rightmost line in the stimulus display, and to the rightmost button in the response apparatus. This assumption allows the analogous locations in the two environments to be directly linked, making the horizontal location of the desired button identical to that of the stimulus line. The hierarchy for the Duncan opposite condition (Fig. 3) is a slightly augmented version of the hierarchy for the corresponding condition. Goals 2, 3, and 4 behave identically, determining the horizontal location of the stimulus line. But now, instead of pressing the button at that location, the location must first be inverted (Compute-Opposite-Horizontal-Location) and then the button pressed at the inverted location (Press-Button-At-Opposite-Horizontal-Location). Neglecting to perform the inversion operation can result in what Duncan (1977) referred to as corresponding errors; that is, making the corresponding response when the opposite one is
1. Press-Button-Opposite-Stimulus-Line
   2. Get-Horizontal-Location-Of-Stimulus-Line
      3. Get-Stimulus-Line
      4. Get-Horizontal-Location-Of-Stimulus
   5. Press-Button-Opposite-Horizontal-Location
      6. Compute-Opposite-Horizontal-Location
      7. Press-Button-At-Opposite-Horizontal-Location
Fig. 3. Goal hierarchy for the Duncan (1977) opposite condition.
appropriate. Such errors could also occur if there was confusion as to which of the two locations to press. In addition to assuming a task goal hierarchy, the performance model assumes a working memory for the storage of the short-term information relevant to the processing that is going on, such as goals, parameters, and results. If the fourth goal in Fig. 3 has just been completed, the working memory would consist of Goals 1 and 2 (all of the active and suspended goals), the horizontal location of the stimulus line (the result of Goal 4), plus other stimulus information and goal results. This assumption is taken from the research on working memory in cognitive architectures (see, for example, Anderson, 1983; Newell, 1973). For each goal, the working memory is logically partitioned into two components: the initial state and the local state. The initial state consists of the data existing at the time the goal is first activated. The remainder of the working memory, consisting of the data created during the processing of the goal, makes up its local state. Only the local state of a goal can be modified during the processing of the goal; the initial state can be examined but not modified. The modularity resulting from this scoping rule increases the likelihood that an arbitrary set of goals can be pursued without interfering with each other and is important in ensuring correct performance of the chunking mechanism. We define the parameters of a goal to be those pieces of its initial state that were examined during the processing of the goal. For example, the horizontal location of the stimulus line will be a parameter to Goal 5 (Press-Button-At-Horizontal-Location) in Fig. 2 because that information must be examined in order to press the button at the right location. The results of a goal are those pieces of its local state that must remain available after the goal has terminated; the rest of the goal’s local state is removed from working memory at that time.
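The initial-state/local-state partition and its scoping rule can be sketched as follows; the function names and working-memory keys here are illustrative, not the model's actual representation.

```python
# A sketch of the working-memory scoping rule described above: a goal may
# examine but not modify its initial state; it writes only to its local
# state, and on termination only its declared results survive.

def run_goal(initial_state, body, result_keys):
    """Process a goal: body reads the initial state and writes local state;
    only the declared results survive after the goal terminates."""
    local_state = {}
    body(dict(initial_state), local_state)  # goal examines, but should not modify
    # The rest of the local state is discarded when the goal terminates.
    return {k: local_state[k] for k in result_keys}

# Illustrative stand-in for Goal 4 of Fig. 2: extract the horizontal location.
def get_horizontal_location(init, local):
    local["horizontal-location"] = init["stimulus-line"]["x"]
    local["scratch"] = "intermediate data"  # local state, discarded on exit

wm = {"stimulus-line": {"x": 3}}
print(run_goal(wm, get_horizontal_location, ["horizontal-location"]))
# {'horizontal-location': 3} -- the 'scratch' entry did not survive
```

The surviving result plays exactly the role described in the text: it stays in working memory so that a later goal can examine it as a parameter.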
The horizontal location of the stimulus line is a result of Goal 4 (Get-Horizontal-Location-Of-Stimulus). It is computed by the goal, yet must stay around after the goal has terminated so that Goal 5 can use it to press the button at the correct location. When a goal terminates, it has either succeeded or failed. On goal failure the system only knows that the goal has failed; it has no direct way of ascertaining why it failed. The goal may have failed because of the lack of an appropriate parameter, because some test has failed, because accomplishing the goal would have required too much time, or for any number of reasons. On goal success, most goals return a set of results to their parent goals. However, it is useful to talk about a subclass of goals called predicates that generate no symbolized results. Predicates are used to test for the truth of various conditions. If a predicate succeeds, its condition is true. If it fails, the condition is either false or the goal did not know how to test the condition
in the current situation. Predicates become operational through the use of branches. Branches differentially generate new subgoals based on the status of previously completed goals; that is, one subgoal can be generated if the goal has succeeded, and a different one can be generated if the goal has failed. These concepts are illustrated by the goal hierarchy for Duncan’s mixed condition (Fig. 4). The branches are shown in the top part of the figure as dashed lines between the predicate (which is starred) and the goal to attempt next. The lines are labeled with either an S for succeeded or an F for failed. In the bottom part of the figure the branches are given as IF-SUCCEEDED . . . THEN . . . or IF-FAILED . . . THEN . . . statements. Notice that the test of a predicate’s status, as occurs in a branch, is logically distinct from the actual evaluation of the predicate. The predicate, which is just a particular terminal goal in the hierarchy, is evaluated when it is reached in the depth-first processing of the goal hierarchy. The branch simply checks the status of the completed predicate. This hierarchy begins, as do the previous two structures, by determining the horizontal location of the line in the stimulus display. The position of the line is then tested to determine whether the line is central or distal. Because predicate failure does not necessarily imply falsity, both sides of the decision must be tested to assure correct behavior (Is-Horizontal-Location-In-The-Middle? and Is-Horizontal-Location-Outside-Of-Middle?).
1. Press-Button-At-Or-Opposite-Stimulus-Line
   2. Get-Horizontal-Location-Of-Stimulus-Line
      3. Get-Stimulus-Line
      4. Get-Horizontal-Location-Of-Stimulus
   5. Press-Button-At-Or-Opposite-Horizontal-Location
      6. Is-Horizontal-Location-In-The-Middle?
      IF-SUCCEEDED Is-Horizontal-Location-In-The-Middle? THEN
         7. Press-Button-Opposite-Horizontal-Location
            8. Compute-Opposite-Horizontal-Location
            9. Press-Button-At-Opposite-Horizontal-Location
      IF-FAILED Is-Horizontal-Location-In-The-Middle? THEN
         10. Possibly-Press-Button-At-Horizontal-Location
            11. Is-Horizontal-Location-Outside-Of-Middle?
            IF-SUCCEEDED Is-Horizontal-Location-Outside-Of-Middle? THEN
               12. Press-Button-At-Horizontal-Location
Fig. 4. Goal hierarchy for the Duncan (1977) mixed condition.
The order in which these predicates are tested is arbitrary, so there is another goal hierarchy for the same task for which the only difference is the reversal of the test order. Based on the results of the predicates, the hierarchy branches to one of two subgoals. In this particular variant of the mixed condition, the middle two lights are mapped opposite, while the outer two are mapped corresponding. So, if the line is in the middle (an opposite trial), the hierarchy branches to goal Press-Button-Opposite-Horizontal-Location, which is the same as Goal 5 in the opposite hierarchy. If the test of centrality fails, the distal test is performed. If this test succeeds (a corresponding trial), the hierarchy branches to goal Press-Button-At-Horizontal-Location, which is the same as Goal 5 in the corresponding hierarchy. Implicit in the hierarchy for the mixed condition is a model of choice reaction time: how reaction time varies with the number of alternatives from which the subject can choose. It is the Positive-Check Model discussed by Welford (1980). The alternatives are broken up into two classes (if possible), and the first class is checked. If it is correct, the choice has been made; otherwise the second class is also checked (that is, “positive check” means that a branch cannot be taken on an ELSE condition). The whole process continues in a hierarchical fashion as long as necessary. The positive-check model gives results that agree with Hick’s law (Hick, 1952): choice reaction time is approximately proportional to the logarithm of the number of choices (see Rosenbloom, 1983, for more on the relationship between our model and choice reaction time). The Duncan mixed hierarchy highlights another important aspect of task algorithms. The complexity of the hierarchy is a function of the entire task environment. Even though any particular task may not be too complex, the combination of all the tasks in the environment can lead to a complex hierarchy.
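A sketch of the positive-check process may make the connection to Hick's law concrete. The halving of the alternatives at each level is our illustrative assumption; the model requires only some hierarchical split into classes.

```python
# A sketch of the Positive-Check Model described above: the alternatives
# are split into two classes, the first class is explicitly checked, and
# if that check fails the second class is also explicitly checked (no
# ELSE branch); the process then recurses within the chosen class.

def positive_check(alternatives, target, checks=0):
    """Return the number of explicit checks needed to locate the target."""
    if len(alternatives) == 1:
        return checks
    mid = len(alternatives) // 2
    first, second = alternatives[:mid], alternatives[mid:]
    checks += 1                       # check: is the target in the first class?
    if target in first:
        return positive_check(first, target, checks)
    checks += 1                       # positive check of the second class too
    return positive_check(second, target, checks)

# Mean checks grow in proportion to log2(number of alternatives),
# in line with Hick's law.
for n in (2, 4, 8):
    alts = list(range(n))
    print(n, sum(positive_check(alts, t) for t in alts) / n)
# 2 -> 1.5 checks, 4 -> 3.0, 8 -> 4.5: an extra 1.5 per doubling
```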
In this task environment the subject may have to do either the corresponding task or the opposite task on each trial. It is this combination that makes the mixed hierarchy more complex than the two pure ones. General rule-like behavior, as described by Duncan (1977), occurs in our model when a single, branchless hierarchy is used for all of the trials in a condition. The more branches that exist to subhierarchies, the less rule-like the behavior is and the more the model looks like a collection of individual stimulus-response associations. For the mixed conditions we expect to find something in between: a hierarchy with a single branch point to the subhierarchies for the two subconditions.

C. RESULTS: COMPATIBILITY SIMULATIONS

Simulated timings for the three compatibility experiments have been generated by writing one or more goal hierarchies for each condition and
then running them within the Xaps3 architecture. Xaps3 is a conventional production system in most ways; it consists of a global working memory for the temporary storage of information, and a long-term production memory that can contain a set of productions (if-then rules). Productions are used to create subgoals, to perform branches, and to generate the results of subgoals. Processing follows the recognize-act cycle, in which first all of the productions are matched to working memory to determine which productions are eligible to execute, and then one or more of the eligible productions are executed. Xaps3 differs from other production-system architectures in the details of these mechanisms, and from most in its use of goals and chunking; in this it is closest to Grapes (Sauers & Farrell, 1982) and Soar (Laird, 1986). The recognize-act cycle (also called the production cycle, or just the cycle) is the basic unit of time in the model; it is explicitly assumed that each cycle takes the same amount of time. The simulated times for each experimental condition were computed by executing the goal hierarchy for that condition (encoded as a set of productions) and counting the number of production cycles occurring between the start of the simulation and the first response made.

1. Duncan (1977)

The goal hierarchies for the Duncan conditions have been presented and discussed above. The simulated times for the corresponding and opposite hierarchies are 15 and 21 cycles, respectively. Two versions of the mixed hierarchy (Fig. 4), corresponding to the two alternative test orders, were simulated. The results from these two simulations were averaged to yield net values for the mixed condition. For mixed corresponding trials, the two hierarchies required 25 and 20 cycles, respectively, for an average of 22.5 cycles. For mixed opposite trials the two values were 26 and 31, for an average of 28.5 cycles. In Fig. 5 these results are compared with the experimental data from Table I.
The figure and the following regression equation reveal that the simulation models the data quite well. The regression equation provides an estimate of 12 msec for the time per production-system cycle.

RT = 12.0 × Cycles + 250,    r² = 0.981    (1)
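Substituting the simulated cycle counts into the fitted equation gives the model's predicted reaction times for the four Duncan conditions; this is a simple arithmetic check of the regression, not new data.

```python
# Predicted reaction times from the regression RT = 12.0 * Cycles + 250 (msec).
def predicted_rt(cycles, msec_per_cycle=12.0, intercept=250.0):
    return msec_per_cycle * cycles + intercept

# Simulated costs for the Duncan conditions, in Xaps3 cycles.
for condition, cycles in [("corresponding", 15), ("opposite", 21),
                          ("mixed-corresponding", 22.5), ("mixed-opposite", 28.5)]:
    print(condition, predicted_rt(cycles))
# corresponding 430.0, opposite 502.0,
# mixed-corresponding 520.0, mixed-opposite 592.0
```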
2. Fitts and Seeger (1953)
For the Fitts and Seeger (1953) experiment we have developed hierarchies for the four conditions involving stimulus and response apparatuses A and B. (There are added complexities in those conditions involving the C apparatuses that have not yet been tackled.) For these tasks, a polar coordinate system with origin at the center
Fig. 5. Regression of reaction time against simulated cost for Duncan (1977).
of the circle of lights is used. Polar coordinates are used for the response apparatus; however, the origin is always reset to be at the current location of the lever. In condition SA-RA the subject must simply push the response lever at the same angle as the light that is on. This is a very compatible situation, with a goal hierarchy (Fig. 6) analogous to the one for the corresponding condition in Duncan's task (Fig. 2). There are three terminal goals that (1) get a description of the light that is on; (2) get the angle of that light with respect to the center; and (3) push the lever at that angle. The cost to execute this hierarchy is 15 cycles. In condition SA-RB, the subject is always presented with one light on, but must make one or two movements depending on the angle of the light. The hierarchy for this task (Fig. 7) uses a strategy of branching to one of two subhierarchies depending on whether the angle of the light that is on is a multiple of 90°. If it is, the lever is pushed in that direction; otherwise, the lever is pushed at the 90° angles on either side of the stimulus angle. As in the actual experimental situation, the simulated clock for both this condition and condition SB-RA runs only up until the first movement has begun. The order of the two tests (Is-Angle-Divisible-By-90? and Is-Angle-Not-Divisible-By-90?) is arbitrary, so there are two equivalent variations on this hierarchy to be simulated. Each simulation provided two kinds of data points: trials for which the stimulus angle was a multiple of 90°, and trials for which it was not. The simulated times were 20 and 25 (mean 22.5) for multiples of 90°, and 33 and 28 (mean 30.5) for nonmultiples. These values are averaged to yield the final cost of 26.5 production cycles. We should expect to find considerable variance in the results for this condition because the two kinds of trials differ by 8 cycles, or 96 msec if we use the time/cycle from the Duncan simulations.
1. Push-Lever-Towards-On-Light-Stimulus
   2. Get-Angle-Of-On-Light-Stimulus
      3. Get-On-Light-Stimulus
      4. Get-Angle-Of-Stimulus
   5. Push-Lever-At-Angle
Fig. 6. Goal structure for condition SA-RA in Fitts and Seeger (1953).
When stimulus apparatus SB is employed, either one or two lights are on during every trial. When SB is paired with response apparatus RA, the stimulus pattern is always converted into a single push of the lever. The goal hierarchy for condition SB-RA (Fig. 8) starts in the same fashion as the previous two. It gets the description of an on light from the stimulus display and retrieves the angle of the light from the description. This process always yields exactly one on light; if there are two lights on, then one of them is selected arbitrarily. At this point, the hierarchy branches to one of two subhierarchies according to whether one or two lights are on. If there is only one light on, the lever is pushed in the direction of that light. If there is more than one on light, the second on light must be found (the on light not at the angle of the first on light), and the angles of both lights must be determined. The lever is pushed between the two on lights (equivalent to averaging their angles). There are two variations on this hierarchy corresponding to the two orderings of the tests, and two types of trials in each variation: one or two lights on. When one light is on, the two versions cost 20 and 25 cycles (mean 22.5). With two lights on, they cost 45 and 40 cycles (mean 42.5). Thus,
1. Push-Lever-Orthogonal-To-On-Light-Stimulus
   2. Get-Angle-Of-On-Light-Stimulus
      3. Get-On-Light-Stimulus
      4. Get-Angle-Of-Stimulus
   5. Push-Lever-At-Orthogonal-Components-Of-Angle
      6. Is-Angle-Divisible-By-90?
      IF-SUCCEEDED Is-Angle-Divisible-By-90? THEN
         7. Push-Lever-At-Angle
      IF-FAILED Is-Angle-Divisible-By-90? THEN
         8. Possibly-Push-Lever-At-2-Orthogonal-Angles
            9. Is-Angle-Not-Divisible-By-90?
            IF-SUCCEEDED Is-Angle-Not-Divisible-By-90? THEN
               10. Push-Lever-At-2-Orthogonal-Angles
                  11. Push-Lever-At-Angle-Plus-45
                     12. Compute-Plus-45-Response-Angle
                     13. Push-Lever-At-Response-Angle
                  14. Push-Lever-At-Angle-Minus-45
                     15. Compute-Minus-45-Response-Angle
                     16. Push-Lever-At-Response-Angle
Fig. 7. Goal structure for condition SA-RB in Fitts and Seeger (1953).
1. Push-Lever-At-Or-Between-On-Light-Stimuli
   2. Get-Angle-Of-On-Light-Stimulus
      3. Get-On-Light-Stimulus
      4. Get-Angle-Of-Stimulus
   5. Push-Lever-At-Or-Between-Angles
      6. One-On-Light?
      IF-SUCCEEDED One-On-Light? THEN
         7. Push-Lever-At-Angle
      IF-FAILED One-On-Light? THEN
         8. Possibly-Push-Lever-Between-Angles
            9. Many-On-Lights?
            IF-SUCCEEDED Many-On-Lights? THEN
               10. Push-Lever-Between-Angles
                  11. Get-Angle-Between-Angles
                     12. Get-Angle-Of-Second-On-Light-Stimulus
                        13. Get-Second-On-Light-Stimulus
                        14. Get-Angle-Of-Stimulus
                     15. Compute-Mean-Angle
                  16. Push-Lever-At-Mean-Angle
Fig. 8. Goal structure for condition SB-RA in Fitts and Seeger (1953).
the overall mean is 32.5. The large ranges imply that there is even more variance in this condition than in condition SA-RB. In condition SB-RB, the lever must be pushed in each of the directions in which there is a light on (either one or two directions). In contrast to the other conditions, there are two qualitatively different goal hierarchies that subjects may be employing for this condition (Fig. 9). This is because it is the only condition in which a response can legitimately be made both before and after a decision: in condition SA-RA no decision is required at all; in the two cross-conditions (SA-RB and SB-RA), a manipulation of the angle may (or may not) be required before the first response is made, necessitating a decision before any response is made. The faster of the two hierarchies for condition SB-RB (the top hierarchy in the figure) supposes that the subject begins by making the total response for one light, and only then checks to see if there is another light on. This first hierarchy takes a constant time of 17 cycles, irrespective of both the order of the predicates and whether there are one or two lights on. This happens because the first response is generated before either predicate is tested. The system does not worry about those details until it has responded to one light. If instead the subject decides at the very beginning how many lights are on, he is using the second hierarchy (the bottom hierarchy in the figure). This hierarchy does predict different values as a function of predicate ordering and number of lights. For one light on, the two variations generate times of 20 and 25 cycles (mean 22.5). When two lights are on, the times are 27 and 22 (mean 24.5). The net result is 23.5 cycles.
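The Compute-Mean-Angle step of Fig. 8 (pushing the lever between the two on lights) amounts to averaging two angles. A minimal sketch, using a vector (circular) mean so that angles straddling 0° are handled correctly; this illustrates only the averaging operation, not the model's internal representation.

```python
import math

def mean_angle(a_deg, b_deg):
    """Circular mean of two angles in degrees: the direction midway
    between two lights on the circular display."""
    ax, ay = math.cos(math.radians(a_deg)), math.sin(math.radians(a_deg))
    bx, by = math.cos(math.radians(b_deg)), math.sin(math.radians(b_deg))
    deg = math.degrees(math.atan2(ay + by, ax + bx))  # sum the unit vectors
    return round(deg, 6) % 360

print(mean_angle(45, 135))  # 90.0
print(mean_angle(315, 45))  # 0.0 (not 180, as naive arithmetic averaging gives)
```

The vector form matters on a circular apparatus: naively averaging 315° and 45° would point the lever the wrong way.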
(#1)
1. Push-Lever-Towards-1-Or-2-On-Light-Stimuli
   2. Push-Lever-Towards-On-Light-Stimulus
      3. Get-Angle-Of-On-Light-Stimulus
         4. Get-On-Light-Stimulus
         5. Get-Angle-Of-Stimulus
      6. Push-Lever-At-Angle
   7. Done-Or-Second-On-Light
      8. One-On-Light?
      IF-FAILED One-On-Light? THEN
         9. Possibly-Push-Lever-Towards-Second-On-Light-Stimulus
            10. Many-On-Lights?
            IF-SUCCEEDED Many-On-Lights? THEN
               11. Push-Lever-Towards-Second-On-Light-Stimulus
                  12. Get-Angle-Of-Stimulus-Not-At-Response-Angle
                     13. Get-Angle-Of-Response
                     14. Get-Angle-Of-Second-On-Light-Stimulus
                        15. Get-Second-On-Light-Stimulus
                        16. Get-Angle-Of-Stimulus
                  17. Push-Lever-At-Angle

(#2)
1. Push-Lever-Towards-1-Or-2-On-Light-Stimuli
   2. One-On-Light?
   IF-SUCCEEDED One-On-Light? THEN
      3. Push-Lever-Towards-On-Light-Stimulus
         4. Get-Angle-Of-On-Light-Stimulus
            5. Get-On-Light-Stimulus
            6. Get-Angle-Of-Stimulus
         7. Push-Lever-At-Angle
   IF-FAILED One-On-Light? THEN
      8. Possibly-Push-Lever-Towards-2-On-Light-Stimuli
         9. Many-On-Lights?
         IF-SUCCEEDED Many-On-Lights? THEN
            10. Push-Lever-Towards-2-On-Light-Stimuli
               11. Push-Lever-Towards-On-Light-Stimulus
                  12. Get-Angle-Of-On-Light-Stimulus
                     13. Get-On-Light-Stimulus
                     14. Get-Angle-Of-Stimulus
                  15. Push-Lever-At-Angle
               16. Push-Lever-Towards-Second-On-Light-Stimulus
                  17. Get-Angle-Of-Stimulus-Not-At-Response-Angle
                     18. Get-Angle-Of-Response
                     19. Get-Angle-Of-Second-On-Light-Stimulus
                        20. Get-Second-On-Light-Stimulus
                        21. Get-Angle-Of-Stimulus
                  22. Push-Lever-At-Angle

Fig. 9. Goal structures for condition SB-RB in Fitts and Seeger (1953).
There is a conflict in what “equally probable” means for this condition. It could mean that the two major variations are equally likely, with the minor variations equally likely within a major variation. On the other hand, it could mean that all variations are equally likely. The first definition yields an estimate of 20.25, while the second yields 21.3. There is no good
justification for averaging these two values together, so we just take one of them (20.25) and note that they do not differ by much. (The value of r² is the same for both.) Figure 10 and Equation 2 summarize the results for the four conditions of this experiment. The fit is extremely good. However, the time/cycle (3.4 msec) is off by almost a factor of 4 from the time predicted from the Duncan simulations. This topic is picked up again below.

RT = 3.4 × Cycles + 340,    r² = 0.999    (2)
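The two readings of “equally probable” for condition SB-RB discussed above can be checked with a few lines of arithmetic; the cycle counts are those reported in the text.

```python
# Hierarchy #1 costs a constant 17 cycles regardless of predicate order;
# hierarchy #2 has two predicate orderings averaging 22.5 and 24.5 cycles.
h1_variants = [17.0]
h2_variants = [22.5, 24.5]

# Definition 1: the two major variations are equally likely,
# with minor variations equally likely within each.
major_mix = (sum(h1_variants) / len(h1_variants)
             + sum(h2_variants) / len(h2_variants)) / 2

# Definition 2: all individual variations are equally likely.
all_variants = h1_variants + h2_variants
flat_mix = sum(all_variants) / len(all_variants)

print(major_mix)           # 20.25
print(round(flat_mix, 1))  # 21.3
```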
3. Morin and Forrin (1962)

In this experiment there were five conditions and six data points of interest (Condition III is split into two cases), but there are only four goal hierarchies. The first and simplest hierarchy (Fig. 11) is sufficient for both Conditions I and IV: read one out of a set of either two or four numbers. The rule-like nature of the goal hierarchies is crucial here because it allows a single hierarchy to be used for reading numbers aloud, no matter how many alternatives there may be. There is one qualification on this statement: it assumes that reading numbers aloud is so well practiced that determining the vocal response to a visual numeral can be done in a single operation. If there is a relatively unlearned mapping (as appears in the other conditions of this same experiment), making that connection is more complex. The hierarchy for Conditions I and IV is analogous to the easiest conditions of the previous two experiments. The subject must (1) get a description of the stimulus (Get-Stimulus-Object), (2) get the name (such as “2”) from that description (Get-Stimulus-Name), and (3) say that name (Say-Name). The hierarchy requires 14 cycles to execute.
Fig. 10. Regression of reaction time against simulated cost for Fitts and Seeger (1953).
1. Read-Number-Say-Number
   2. Get-Stimulus-Object
   3. Say-Stimulus-Name
      4. Get-Stimulus-Name
      5. Say-Name
Fig. 11. Goal structure for Conditions I and IV in Morin and Forrin (1962).
Condition II, saying the number associated with either a plus or a square, is like Condition I except that the association between the visual stimulus and the vocal response is not well learned. It is assumed that while the subject assimilated the instructions for this condition, he created terminal goals like Get-Name-Of-Plus-Number that, when executed, give the number that was associated with the symbol +. The subject’s main problem, therefore, is to determine which of these goals to attempt. The hierarchy (Fig. 12) begins by retrieving the stimulus object (Get-Stimulus-Object) and its name (Get-Stimulus-Name). A sequence of tests is then performed (Is-Plus? and Is-Square?) until the system figures out which stimulus it has seen. It then retrieves the number associated with the symbol, using either Get-Name-Of-Plus-Number or Get-Name-Of-Square-Number. Once the name of the associated number is retrieved, it can be vocalized (Say-Name-Of-Number). The mean time for executing this hierarchy is the average over the two stimuli, automatically taking into account the possible orderings of the predicates. The times are 26 cycles for a plus and 32 cycles for a square, for a mean value of 29 cycles for this condition. Condition III is a mixed task in which either one of two numbers or one of two symbols will appear. If we assume that the description of a stimulus includes a statement of its class (either symbol or number) as well as its name, then the obvious goal hierarchy for this condition is the one in Fig. 13. The first thing to do is to get the representation of the stimulus (Get-Stimulus-Object). From this representation the stimulus’ class is determined (Get-Stimulus-Class). The decision is then made as to whether to branch to the goal to handle numbers or to the one to handle symbols. The goal to handle numbers (Say-Stimulus-Name) is the same as Goal 3 in the structure for Condition I.
Likewise, the goal to handle the two symbols (Say-Name-Of-Number-From-Plus-Or-Square-Stimulus) is the same as Goal 3 in the structure for Condition II. Two versions of this structure were run, corresponding to the two orderings of the Is-Symbol? and Is-Number? predicates. For the numeric case (Condition IIIa), the two versions took 25 and 30 cycles (mean 27.5). For the symbolic case, the results were averaged over both the ordering of the two class predicates and over the ordering of the predicates for the two symbols (plus and square). The plus required 42 and 37 cycles (mean
Paul S. Rosenbloom and Allen Newell

1. Read-Plus-Or-Square-Say-Number
2. Get-Stimulus-Object
3. Say-Name-Of-Number-From-Plus-Or-Square-Stimulus
4. Get-Stimulus-Name
5. Say-Name-Of-Number-From-Plus-Or-Square-Name
6. Get-Name-Of-Number-From-Plus-Or-Square-Name
7. Is-Plus?
IF-SUCCEEDED Is-Plus? THEN
8. Get-Name-Of-Plus-Number
IF-FAILED Is-Plus? THEN
9. Possibly-Get-Name-Of-Square-Number
10. Is-Square?
IF-SUCCEEDED Is-Square? THEN
11. Get-Name-Of-Square-Number
12. Say-Name-Of-Number
Fig. 12. Goal structure for Condition II in Morin and Forrin (1962).
39.5), while the square required 48 and 43 cycles (mean 45.5). Therefore, the total symbolic average (Condition IIIb) is 42.5.

Figure 14 shows the last goal hierarchy, the one for Condition V: say the number associated with one of four symbols. It is an extended version of the hierarchy for Condition II; the chain of predicates has been extended to
1. Read-Plus-Square-Or-Number-Say-Number
2. Get-Stimulus-Object
3. Say-Name-Of-Number-From-Plus-Square-Or-Number-Stimulus
4. Get-Stimulus-Class
5. Say-Name-Of-Number-From-Plus-Square-Or-Number-Class
6. Is-Number?
IF-SUCCEEDED Is-Number? THEN
7. Say-Stimulus-Name
8. Get-Stimulus-Name
9. Say-Name
IF-FAILED Is-Number? THEN
10. Possibly-Say-Name-Of-Number-From-Plus-Or-Square-Stimulus
11. Is-Symbol?
IF-SUCCEEDED Is-Symbol? THEN
12. Say-Name-Of-Number-From-Plus-Or-Square-Stimulus
13. Get-Stimulus-Name
14. Say-Name-Of-Number-From-Plus-Or-Square-Name
15. Get-Name-Of-Number-From-Plus-Or-Square-Name
16. Is-Plus?
IF-SUCCEEDED Is-Plus? THEN
17. Get-Name-Of-Plus-Number
IF-FAILED Is-Plus? THEN
18. Possibly-Get-Name-Of-Square-Number
19. Is-Square?
IF-SUCCEEDED Is-Square? THEN
20. Get-Name-Of-Square-Number
21. Say-Name-Of-Number
Fig. 13. Goal structure for Condition III in Morin and Forrin (1962).
Stimulus-Response Compatibility and Practice

1. Read-Plus-Square-Circle-Or-Triangle-Say-Number
2. Get-Stimulus-Object
3. Say-Name-Of-Number-From-Plus-Square-Circle-Or-Triangle-Stimulus
4. Get-Stimulus-Name
5. Say-Name-Of-Number-From-Plus-Square-Circle-Or-Triangle-Name
6. Get-Name-Of-Number-From-Plus-Square-Circle-Or-Triangle-Name
7. Is-Circle?
IF-SUCCEEDED Is-Circle? THEN
8. Get-Name-Of-Circle-Number
IF-FAILED Is-Circle? THEN
9. Get-Name-Of-Number-From-Plus-Square-Or-Triangle-Name
10. Is-Triangle?
IF-SUCCEEDED Is-Triangle? THEN
11. Get-Name-Of-Triangle-Number
IF-FAILED Is-Triangle? THEN
12. Get-Name-Of-Number-From-Plus-Or-Square-Name
13. Is-Plus?
IF-SUCCEEDED Is-Plus? THEN
14. Get-Name-Of-Plus-Number
IF-FAILED Is-Plus? THEN
15. Possibly-Get-Name-Of-Square-Number
16. Is-Square?
IF-SUCCEEDED Is-Square? THEN
17. Get-Name-Of-Square-Number
18. Say-Name-Of-Number
Fig. 14. Goal structure for Condition V in Morin and Forrin (1962).
handle the two additional symbols (circle and triangle). The decisions are arranged serially, rather than in a logarithmic (hierarchical) fashion, because there is no obvious and well-learned class distinction between any of the pairs of the four symbols. The mean time to perform this task is the average over the times for the four alternative stimuli (26, 32, 38, and 44 cycles), an average of 35 cycles.

Figure 15 and equation (3) summarize the results for the Morin and Forrin (1962) experiment. The slope of the equation lies between those found for the previous two experiments.

RT = 7.9 × Cycles + 391,    r² = 0.900    (3)
Though this is a worse fit than the previous two experiments, the model still accounts for over 90% of the variance.

4. Combining the Compatibility Results
So far the model has produced good fits to the three compatibility experiments individually, but to have a true cross-experimental theory more is needed. Specifically, we need a set of task-independent parameters. This is appropriate for the slope parameter (msec/cycle), but we cannot expect to
Fig. 15. Regression of reaction time against simulated cost for Morin and Forrin (1962).
find a task-independent intercept. The experiments differ in many ways not captured by the theory. For example, in the Morin and Forrin (1962) experiment responses are vocal, while in the other two experiments they are manual. Such differences will affect the intercept, but should leave the slope unchanged. The following equation is the result of regressing the 14 simulated data points against task times. Each of the three compatibility experiments (Duncan, 1977; Fitts & Seeger, 1953; Morin & Forrin, 1962) has its own intercept (specified as the coefficient of a Boolean variable which is true only for that experiment), but a single slope parameter is used.

RT = 7.5 × Cycles + 347 × Duncan + 244 × Fitts & Seeger + 403 × Morin & Forrin,    r² = 0.933    (4)
Figure 16 plots simulated reaction time versus true reaction time for the 14 compatibility conditions. Both the equation and the graph reveal the strong linear trend of the data. However, a waviness remains because of the differences in slopes among the experiments. Though this tends to disappear into the overall noise, it is an issue that cannot be considered totally resolved.
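A regression of this form, a single slope shared across experiments plus one dummy-coded intercept per experiment, can be computed without a statistics package via the within-group estimator: pool the within-experiment covariances and variances to get the common slope, then recover each intercept from its group means. A minimal sketch; the cycle counts and reaction times below are made up for illustration, not the experimental values:

```python
def fit_shared_slope(groups):
    """Least squares with one slope shared across all groups and a separate
    intercept per group (equivalent to the dummy-variable regression)."""
    num = den = 0.0
    means = {}
    for name, points in groups.items():
        xs = [x for x, _ in points]
        ys = [y for _, y in points]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        means[name] = (mx, my)
        # Pool within-group covariance and variance across groups.
        num += sum((x - mx) * (y - my) for x, y in points)
        den += sum((x - mx) ** 2 for x, _ in points)
    slope = num / den
    intercepts = {name: my - slope * mx for name, (mx, my) in means.items()}
    return slope, intercepts

# Illustrative data lying exactly on RT = 7.5 * cycles + intercept:
data = {
    "Duncan": [(c, 7.5 * c + 347) for c in (10, 20, 30)],
    "Fitts & Seeger": [(c, 7.5 * c + 244) for c in (15, 25)],
}
slope, intercepts = fit_shared_slope(data)
```

Demeaning each group removes its intercept, so the pooled slope is exactly the coefficient the dummy-variable regression would produce.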
II. Learning: Practice and Chunking

In this section we describe the learning component of the model. This component, based on the goal-hierarchy representation of task performance and on the concept of chunking, has been developed to model one of the ubiquitous regularities in human performance: the power law of practice. The body of this section consists of a description of the power law of practice, the chunking model of learning, and results generated by the model.
Fig. 16. Comparison of simulated and experimental reaction times for the three compatibility experiments combined.
A. DATA: PRACTICE

Task performance improves with practice. More precisely, the time to perform a task (T) decreases as a power-law function of the number of times the task has been performed (that is, the number of trials, N):

T = BN^(-α)    (5)
When plotted on log-log paper, where power laws plot as straight lines, practice curves are often linear over much of their range, but have deviations at their two ends. These deviations can be removed by using a four-parameter generalized power-law function. One of the two new parameters (A) takes into account that the asymptote of learning is likely to be greater than zero. In general, there is a nonzero minimum bound on performance time that is determined by basic physiological limitations and/or device limitations, if, for example, the subject must operate a machine. The other added parameter (E) takes into account the prior experience on the task. Power laws are not translation invariant. Practice occurring before the official
beginning of the experiment, even if it consists only of transfer of training from everyday experience, will alter the shape of the curve unless the effect is explicitly allowed for by the inclusion of this parameter. Augmenting the power-law function by these two parameters yields the following generalized function:

T = A + B(N + E)^(-α)    (6)
A generalized power law plots as a straight line on log-log paper once the effects of the asymptote (A) are removed from the time (T), and the effective number of trials prior to the experiment (E) are added to those performed during the experiment (N):

log(T - A) = log B - α log(N + E)    (7)
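The linearizing transformation just described can be checked numerically. A small sketch; the parameter values are arbitrary, not fitted to any data:

```python
import math

def generalized_time(n, a, b, e, alpha):
    # Generalized power law: T = A + B(N + E)^(-alpha)
    return a + b * (n + e) ** (-alpha)

A, B, E, ALPHA = 0.3, 50.0, 100.0, 0.7   # arbitrary illustrative values
trials = [1, 10, 100, 1000, 10000]

# Transformed coordinates: log(N + E) against log(T - A).
points = [(math.log(n + E), math.log(generalized_time(n, A, B, E, ALPHA) - A))
          for n in trials]

# On the transformed axes every pairwise slope equals -alpha,
# i.e., the practice curve has become a straight line.
slopes = [(y2 - y1) / (x2 - x1)
          for (x1, y1), (x2, y2) in zip(points, points[1:])]
```

Fitting the four parameters to real data is harder (the problem is nonlinear in A, E, and α), but once A and E are known the remaining fit is ordinary linear regression on the transformed points.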
Figure 17 shows a generalized power-law fit to a practice curve from a 1023-choice reaction-time task (Seibel, 1963). This is a perceptual-motor task in which there is a stimulus display of 10 lights, arranged (mostly) horizontally, and a response apparatus of 10 buttons, arranged (mostly) horizontally in such a way that each finger rests on one of them. The stimulus and response environments are set up so that there is a highly compatible one-to-one correspondence between the lights and buttons, each light directly above a button. On each trial of the experiment, some of the lights are on and some are off. The subject's task is to respond as quickly as possible by pressing the buttons corresponding to the lights that are on. Ten lights, with two possible states for each light, yields 2^10, or 1024, possible trials. The configuration with no lights on is not used, leaving 1023 choices. Each data point in this figure represents the mean reaction time over a block of 1023 trials. The curve is linear over the whole range of more than 75,000 trials.
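The combinatorics of the display can be checked by enumeration, which also illustrates a fact that matters for chunking later in the article: a pattern that fixes the state of one more light matches half as many displays. A sketch:

```python
from itertools import product

# All displays of 10 binary lights, excluding the unused all-off display.
displays = [bits for bits in product((0, 1), repeat=10) if any(bits)]

def count_matching(pattern):
    """Number of displays consistent with a fixed pattern of light states
    at specific positions (pattern maps position -> required state)."""
    return sum(1 for d in displays
               if all(d[i] == state for i, state in pattern.items()))

two_on = count_matching({0: 1, 1: 1})          # lights 0 and 1 both on
three_on = count_matching({0: 1, 1: 1, 2: 1})  # lights 0, 1, and 2 all on
```

A pattern constraining k positions leaves 10 - k positions free, so it matches 2^(10-k) displays: 256 for the two-light pattern, 128 for the three-light one.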
Fig. 17. Optimal general power-law fit to the Seibel data (log-log coordinates).
While the power law of practice was originally recognized in the domain of motor skills (Snoddy, 1926), it has recently become clear that it holds over a much wider range of human tasks, possibly extending to the full range of human performance. Newell and Rosenbloom (1981) brought together the evidence showing this for perceptual-motor skills (Snoddy, 1926; Crossman, 1959), perception (Kolers, 1975; Neisser, Novick, & Lazar, 1963), motor behavior (Card, English, & Burr, 1978), elementary decisions (Seibel, 1963), memory (J. R. Anderson, personal communication, 1980), routine cognitive skill (Moran, 1980), and problem solving (Neves & Anderson, 1981; Newell & Rosenbloom, 1981). Though the fits are impressive, it must be stressed that the power law of practice is only an empirical law. The true underlying law must resemble a power law, but it may have a different analytical form.

B. MODEL: CHUNKING

Newell and Rosenbloom (1981) showed that no existing models of practice predicted power-law practice curves. To remedy this situation, the chunking theory of learning was developed.³ It was based on the idea that practice improves performance via the acquisition of knowledge about patterns in the task environment. This pattern knowledge amounts to the chunks shown to be ubiquitous in the structuring of memory (Miller, 1956; DeGroot, 1965; Bower & Winzenz, 1969; Johnson, 1972; Chase & Simon, 1973; Chase & Ericsson, 1981).

The traditional view of chunks is that they are symbols representing the combination of several other symbols. For example, in one set of classic experiments Bower and colleagues (Bower & Winzenz, 1969; Bower & Springston, 1970; Bower, 1972) showed that recall of strings of numbers or letters is strongly influenced by the segmentation of the string. If the segmentation corresponds to a previously learned grouping of the items (for example, FBI-PHD-TWA-IBM), performance is better than if it results in meaningless groups (FB-IPH-DTW-AIB-M).
These results were interpreted as evidence for segmentation-guided chunking of familiar strings. By replacing a string of several letters with a single chunk, the subject's memory load is reduced, allowing more letters to be remembered. At recall time the chunks are decoded to yield the original items to be recalled. The existence of chunks implies that memory is hierarchically structured as a lattice (tangled hierarchy, acyclic directed graph, etc.), rooted in a set of preexisting primitives. A given chunk can be accessed in a top-down fashion, by decoding a chunk of which it is a part, or in a bottom-up fashion, by encoding from the parts of the chunk. Encoding is a recognition or parsing process.

³Anderson (1982) has also developed a model in which power-law practice curves are derived from the effects of a power-law forgetting process on a production-strengthening mechanism.
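The memory-load argument behind the Bower results can be made concrete with a crude model: a presented group costs one memory item if it matches a learned chunk, and one item per letter otherwise. This is only an illustrative simplification of the recall process, and the chunk inventory is the one from the example above:

```python
def memory_load(presented_groups, known_chunks):
    """Memory items needed to hold a presented string: one item per group
    that matches a learned chunk, otherwise one item per letter."""
    return sum(1 if group in known_chunks else len(group)
               for group in presented_groups)

known = {"FBI", "PHD", "TWA", "IBM"}   # previously learned groupings
good = memory_load(["FBI", "PHD", "TWA", "IBM"], known)       # 4 items
bad = memory_load(["FB", "IPH", "DTW", "AIB", "M"], known)    # 12 items
```

The same twelve letters cost 4 items under the familiar segmentation and 12 under the meaningless one, which is the direction of the experimental effect.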
Newell and Rosenbloom (1981) based the chunking theory of learning on three assumptions characterizing respectively the nature of performance, learning, and the task structure. The performance assumption (the performance program of the system is coded in terms of high-level chunks, with the time to process a chunk being nearly independent of the size of the chunk) is a shift from the traditional view of a chunk as a passive symbol to a view where the chunk directly represents an active process. The learning assumption (chunks are learned at a constant rate on average from the relevant patterns of stimuli and responses that occur in the specific environments experienced) views the human as a time-independent processing mechanism. The task-structure assumption (the probability of recurrence of an environmental pattern decreases as the pattern size increases) relates the expected effectiveness of chunks to their height in the hierarchy.

Chunking works in a bottom-up fashion, learning low-level (small) patterns first, and then gradually building upon existing chunks to learn higher-level (larger) patterns. If we ignore the task-structure assumption for a moment, this bottom-up learning leads to exponential speedups with practice. Each time a task is practiced, a new and higher level of chunks can be acquired, reducing the time to perform the task by a constant factor (the size of the chunks). The principal effect of the task-structure assumption is to slow down this learning to the point where it becomes more like a power law than an exponential. This happens because, as practice proceeds, the chunks that are learned become progressively larger. These larger chunks are encountered less often (for example, in the Seibel (1963) task described earlier, a chunk that contains three lights is encountered half as often as a chunk that contains two lights) and therefore contribute less to the overall speedup than do the smaller chunks which are encountered more often.
Learning actually continues at the same pace (the learning assumption), but what is learned becomes progressively less helpful. Approximate mathematical analyses of this process can be found in Newell and Rosenbloom (1981) and Rosenbloom (1983). Rosenbloom and Newell (1982, 1987) made the abstract assumptions of the chunking theory concrete by implementing a task-dependent formulation of the chunking theory for the Seibel (1963) task. In that task, patterns of stimuli (lights) must be related to patterns of responses (button presses). A chunk entered performance by encoding a pattern of lights and then decoding to a pattern of button presses. In the chunking theory of learning, in contrast to traditional views of chunking, encoding and decoding are not inverse processes between a symbol and a group of symbols. For example, what needs to be decoded in this task is not the pattern of lights, but the pattern of button presses. Based on this consideration, the chunking theory assumes that there are two symbols for each chunk, a stimulus symbol and a response symbol. Using a chunk consists of encoding the stimulus items to
the stimulus symbol (S1S2S3 → α), mapping the stimulus symbol to the response symbol (α → ρ), and decoding the response symbol to the response items (ρ → R1R2R3). The mapping serves as a point of control at which the choice of the appropriate response can be made, freeing the encoding and decoding processes to be fast, uncontrolled processes, capable of quickly encoding or decoding a hierarchy of chunks. Several lights can be encoded to a new stimulus symbol, and this new symbol can be combined with other symbols to form an even higher-level pattern covering more lights, and so on. At some point a stimulus symbol is mapped to a response symbol, which then goes through hierarchical decoding until the primitive button presses are reached. Though the higher-level chunk is usable in fewer situations (the task-structure assumption), it improves performance when it is usable because it takes essentially the same amount of time to process as one of the lower-level ones (the performance assumption).

The model described in this article is based on the earlier formulations of the chunking theory, but it has been generalized to a task-independent practice mechanism by taking advantage of the general performance model provided by goal hierarchies. Instead of relating patterns of lights to patterns of button presses, the new goal-oriented formulation of chunking relates patterns of goal parameters to patterns of goal results. Each chunk improves the performance of the system by eliminating the need to process fully a specific instance (a combination of parameter values) of a particular goal. A goal may (and almost always does) have more than one chunk, as each combination of parameter values requires a different chunk.
Chunking essentially implements a form of store-versus-compute trade-off, in which it replaces the normal processing (decomposition into subgoals for nonterminal goals, and direct execution of an action for terminal goals) with a direct connection between the relevant parameter values and results. It bears a family resemblance to such other store-versus-compute mechanisms as production composition (Lewis, 1978; Neves & Anderson, 1981; Anderson, 1982, 1986), memo functions (Michie, 1968), macro-operators (Fikes, Hart, & Nilsson, 1972; Korf, 1985), and explanation-based generalization (Mitchell, Keller, & Kedar-Cabelli, 1986). Discussions of the relationship of chunking to these other mechanisms can be found in Rosenbloom and Newell (1986), Laird, Rosenbloom, and Newell (1986), and Rosenbloom and Laird (1986).

As in the earlier model of chunking, each chunk still consists of three components (encoding, decoding, and connection, or mapping), with each component being implemented as a production. The goal's parameter values form the basis for the encoding component. Given the presence of those values in the working memory, the encoding component generates a new stimulus symbol representing their combination. Encoding is a parallel, goal-independent, data-driven process. Every encoding component executes
as soon as appropriate, irrespective of whatever else is happening in the system. The results of encoding components can themselves become parameters of other goals, leading to a hierarchical encoding process.

The results of the goal form the basis for the decoding component. Given the presence of an encoded result-symbol in the working memory, the decoding component generates the actual results returned by the goal. Decoding occurs when the results are needed. As with encoding, the set of decoding components forms a parallel, goal-independent, hierarchical process in which complex results are decoded to simpler ones, which are then decoded even further.

The connection component of the chunk generates the encoded result from the encoded parameter. Connections provide a locus of control by occurring serially and under the control of the goals. A connection can be made only when the system is working on the goal for which the chunk was formed (and after its encoding component has executed). This assures that only appropriate results are generated even though encoding and decoding are uncontrolled.

As a simple example of how chunking works, consider the three-goal hierarchy in Fig. 18. This structure computes the average of two numbers. The top-level goal (Compute-Average-Of-Two-Numbers) takes as parameters the two numbers to be averaged and returns a single result which is their mean. The first subgoal (Compute-Sum-Of-Two-Numbers) performs the first half of the computation. It takes the two numbers as parameters and returns their sum as its result. The second subgoal (Divide-Sum-By-2) finishes the computation by taking the sum as a parameter and returning half of it as its result.

Suppose that the first task is to average the numbers 3 and 7. Control would pass from Goal 1 to Goal 2. When Goal 2 finishes and returns its result of 10, a chunk of three components is created (bottom left of Fig. 19).
An encoding component is created that encodes the two parameters (3 and 7) into a new symbol (E1). It executes as soon as it is created because the parameters are in the working memory. A decoding component is created that, when necessary, decodes from a second new symbol (D1) to the result (10). A connection component (the horizontal line with the goal name above it and goal number below it) is created that generates the result symbol (D1) when it detects both the presence of the encoded parameter (E1) and that Goal 2 is the active goal. The connection does not execute immediately because Goal 2 is already complete when the chunk is created.
1. Compute-Average-Of-Two-Numbers
2. Compute-Sum-Of-Two-Numbers
3. Divide-Sum-By-2
Fig. 18. A simple three-goal hierarchy for the averaging of two numbers.
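The three chunk components for this hierarchy can be mimicked with three lookup tables, one per component. This is only a toy sketch of the mechanism (it omits the production-system details and the bottom-up creation condition for the top-level chunk), and all the names are illustrative:

```python
class Chunker:
    """Toy model of chunking for a goal: encode the parameters to a stimulus
    symbol, connect it to a response symbol, decode that to the result."""
    def __init__(self):
        self.encode = {}   # (goal, params) -> stimulus symbol
        self.connect = {}  # (goal, stimulus symbol) -> response symbol
        self.decode = {}   # response symbol -> result
        self.counter = 0

    def new_symbol(self, prefix):
        self.counter += 1
        return f"{prefix}{self.counter}"

    def lookup(self, goal, params):
        """Return the chunked result for this goal instance, if one exists."""
        sym = self.encode.get((goal, params))
        if sym is not None and (goal, sym) in self.connect:
            return self.decode[self.connect[(goal, sym)]]
        return None

    def learn(self, goal, params, result):
        """Create encoding, connection, and decoding components."""
        e, d = self.new_symbol("E"), self.new_symbol("D")
        self.encode[(goal, params)] = e
        self.connect[(goal, e)] = d
        self.decode[d] = result

chunks = Chunker()

def compute_sum(a, b):          # Goal 2
    cached = chunks.lookup("sum", (a, b))
    if cached is not None:
        return cached
    result = a + b              # full processing of the subgoal
    chunks.learn("sum", (a, b), result)
    return result

def divide_by_2(s):             # Goal 3
    cached = chunks.lookup("half", (s,))
    if cached is not None:
        return cached
    result = s / 2
    chunks.learn("half", (s,), result)
    return result

def average(a, b):              # Goal 1
    return divide_by_2(compute_sum(a, b))
```

Running average(3, 7) builds chunks for both subgoals; a later average(5, 5) reuses the Divide-Sum-By-2 chunk because its parameter (the sum, 10) is identical, which is the transfer-by-identical-elements point made in the text.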
Fig. 19. Sample chunks for the averaging goal hierarchy.
Following the termination of Goal 2, Goal 1 is reactivated, but then is suspended in favor of Goal 3. When this goal terminates successfully (returning the number 5) a chunk is created for it (bottom right of Fig. 19). The encoding component encodes the number 10 into the symbol E2; the decoding component decodes from the symbol D2 to the number 5; and the connection component connects E2 to D2 (in the presence of an active Goal 3). In contrast to the chunk for Goal 2, this chunk can be used in more than one task situation. It can be used whenever Goal 2 generates a sum of 10, whether it does it by adding 3 and 7, 5 and 5, or any other pair of numbers. This is a form of transfer of training based on identical elements (Thorndike, 1913).

Following the termination of Goal 3, Goal 1 is reactivated and terminates successfully (returning the number 5). No chunk is created for Goal 1 because chunks are created from the bottom up in the goal hierarchy. A chunk can be created for a goal when (1) the goal has just completed successfully and (2) all of the goal's subgoals were themselves processed by chunks. It is this bottom-up aspect of chunking that leads to hierarchical encoding and decoding networks. However, bottom-up chunking does not imply that all low-level chunks are learned before any high-level chunks are learned, or even that all of the chunks must be learned for a subgoal before any can be learned for its parent goal. The second condition on chunk creation merely states that chunks must exist for the goal's subgoals in the current situation. Whether other chunks exist or do not exist for the subgoals is irrelevant.

Given what was learned during the first trial of this task, the next time the same task is performed things will go differently. As soon as the task is restarted (again with the values 3 and 7) the encoding component from the chunk for Goal 2 executes, placing E1 in the working memory. Goal 1 is activated and then suspended in favor of Goal 2.
At this point the connection component for Goal 2 executes, generating D1 and successfully completing Goal 2. D1 is decoded to the number 10, which is then immediately reencoded to E2 by the encoding component for Goal 3. Following the subsequent reactivation and suspension of Goal 1, Goal 3 is activated. The connection
component for Goal 3 executes, generating D2, and returning D2 as the result to Goal 1. This time when Goal 1 terminates, a chunk is created (top of Fig. 19) because both of the subgoals were processed by chunks. The encoding component for this chunk builds upon the existing encodings by encoding E1 to a new symbol (E3); it does not go straight from the primitive parameters of Goal 1 (3 and 7). This happens (and causes hierarchical encoding) because, for this instance of Goal 1, E1 is the parameter, not 3 and 7. Recall from Section IB that the parameters of a goal consist of those pieces of the goal's initial state that are examined during the goal's performance. E1 is generated before Goal 1 is activated (so it is part of the goal's initial state) and examined by the connection component for Goal 2. On the other hand, neither of the objects representing the numbers 3 and 7 is examined during the processing of Goal 1. Therefore, E1 is a parameter (and included in the chunk), while the numbers 3 and 7 are not. The decoding component is created in a similarly hierarchical fashion. It decodes from a new symbol (D3) to D2. This occurs because D2 (and not the number 10) is the result of Goal 1. It never became necessary to decode D2, so it was passed directly up as the result of both Goals 3 and 1. The connection component of this chunk links E3 to D3 in a straightforward manner.

If the same task is performed yet again, the encoding components immediately generate E1, followed by E3. Goal 1 is activated, and its connection component executes, generating D3 and completing Goal 1. If the result is needed by some part of the system outside of the hierarchy it will be decoded to D2, and then to the number 5.⁶

The example that we have just gone through outlines the basics of how the chunking mechanism works. The next step is to look at chunking in the more complex goal hierarchy for the Seibel 1023-choice RT task (Fig. 20).
The task environment for the Seibel task has been modeled as two rectilinear arrays, one for the stimulus lights and one for the response buttons. Both of these arrays stretch from 0 to 1000 horizontally. The goal hierarchy is based on the processing of horizontal segments of these arrays. It implements a recursive divide-and-conquer algorithm in which the stimulus display is broken up into smaller and smaller segments until manageable pieces are generated. The recursion occurs at Goals 13 and 14 in Fig. 20. These goals are repetitions of the topmost goal in the hierarchy (Do-Lights-If-Any), but the scope of each is limited to one half of the display currently being processed. The numeric computation to obtain the middle of the segment (involving an addition and a division) could be viewed as too powerful

⁶There is one obvious modification that can be made to improve the efficiency of this process: avoid creating an encoding or decoding component for trivial situations, those that have only one item to be encoded or decoded. This technique is employed in the simulations described in Section II.C.
1. Do-Lights-If-Any(Min-X, Max-X)
2. No-Light-On?(Min-X, Max-X)
IF-FAILED No-Light-On? THEN
3. Do-Lights(Min-X, Max-X)
4. No-Light-Off?(Min-X, Max-X)
IF-SUCCEEDED No-Light-Off? THEN
5. Do-Press-All-Buttons(Min-X, Max-X)
IF-FAILED No-Light-Off? THEN
6. Do-Off-And-On(Min-X, Max-X)
7. One-Light-On?(Min-X, Max-X)
IF-SUCCEEDED One-Light-On? THEN
8. Press-Button-Under-On-Light(Min-X, Max-X)
9. Get-On-Light-X(Min-X, Max-X)
10. Get-On-Light-Stimulus(Min-X, Max-X)
11. Get-Stimulus-X
12. Press-Button-At-X
IF-FAILED One-Light-On? THEN
13. Do-Lights-If-Any(Min-X, [Min-X + Max-X]/2)
14. Do-Lights-If-Any([Min-X + Max-X]/2, Max-X)
Fig. 20. Goal hierarchy for the Seibel (1963) task.
a computation to appear where it does. However, this is only intended as an approximation to what a human subject would do in such a situation, namely divide the stimulus display into two (or three) roughly equal parts.

Three types of horizontal segments are manageable, that is, terminate the recursion by directly executing a response. The first type of manageable segment is one in which no lights are on. Such segments require no explicit processing, so the goal just returns with success. The opposite of the first type of segment, one in which no lights are off, is also manageable. For such a segment the system generates a single response specifying that a press action must occur in the entire region defined by the segment (using the Do-Press-All-Buttons goal); it is assumed that this will result in button presses by all of the fingers in the region. Specifying a single button press is actually a special case of this in which the region is just large enough to contain one button. Allowing multi-on-light segments to be manageable implies that sequences of adjacent on lights can be pressed simultaneously even before chunking has begun. Such behavior is seen very early in the trial sequence for some subjects (Rosenbloom & Newell, 1987). The remaining manageable
segments are those that contain exactly one light on. These segments are processed (using the Press-Button-Under-On-Light goal) by finding the location of that light and generating a button press at that location. If a generated segment does not meet any of these three criteria, it is unmanageable and is split into two smaller segments. This strategy produces performance characteristics much like those of the subjects in Rosenbloom and Newell (1987): left-to-right processing of groups of adjacent on lights.

Chunking starts in this structure with the terminal goals (numbers 2, 4, 5, 7, 10, 11, and 12). Take Goal 11 (Get-Stimulus-X), for example. Successful completion of this goal requires retrieving from the working memory the representation of a stimulus that has been perceived and generating a piece of information representing the horizontal location (X) of that stimulus. The parameter for this goal is the stimulus, and the result is the location. In the chunk for this situation the encoding and decoding components are trivial. They simply recode from the single parameter to a new symbol, and from a second new symbol to the result. The connection component tests for both an active Get-Stimulus-X goal and the encoding symbol, and produces the decoding symbol.

Chunks that directly relate groups of lights to groups of buttons, the kinds of chunks produced by the chunking mechanism in Rosenbloom and Newell (1987), are learned at the root/recursive step in the goal hierarchy. The root/recursive goal in the hierarchy (Do-Lights-If-Any) has parameters which represent lights in the stimulus display and generates results that are the button presses. The earliest chunks that can be created for this goal are those at the bottom of the recursion; that is, goals to process manageable segments of the display. Each of these chunks will represent either a single on light in a region, a region of solid on lights, or a region with no on lights.
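The divide-and-conquer strategy described above can be rendered as a recursive function. This sketch uses list indices in place of the model's 0-1000 horizontal coordinates and returns press regions instead of executing goals; it is a simplification for illustration, not the production-system implementation:

```python
def do_lights_if_any(lights, lo, hi):
    """Process lights[lo:hi] (1 = on, 0 = off), returning a list of
    (start, end) button-press regions, with end exclusive."""
    segment = lights[lo:hi]
    if not any(segment):            # manageable: no lights on
        return []
    if all(segment):                # manageable: no lights off
        return [(lo, hi)]
    if sum(segment) == 1:           # manageable: exactly one light on
        x = lo + segment.index(1)
        return [(x, x + 1)]
    mid = (lo + hi) // 2            # unmanageable: split and recurse
    return (do_lights_if_any(lights, lo, mid) +
            do_lights_if_any(lights, mid, hi))

# Example display: 1 = on, 0 = off
pattern = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0]
presses = do_lights_if_any(pattern, 0, len(pattern))
```

On this display the left half yields a solid press region for lights 0-1 and a single press for light 3; note that adjacent on lights which happen to straddle a split (lights 6-8 here) come out as separate press regions.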
Once the chunks exist for Goals 13 and 14 (and their sibling Goal 7, the predicate One-Light-On?) in a single situation, the parent goal (Goal 6: Do-Off-And-On) can be chunked. This yields a new chunk for the combined situation in both segments. This process continues up the hierarchy until Goal 1 is chunked for that level of recursion. But Goal 1 at that level is just Goal 13 or 14 at the next level up. Therefore, the level of aggregation of segments covered by chunks gradually increases (these chunks are acquired one at a time).

Figure 21 shows how the final encoding hierarchy would look if a single task situation (a pattern of 10 lights) were repeated until the root goal at the top level of the recursion has been chunked. The nodes in this hierarchy all represent chunks of the Do-Lights-If-Any goal (numbers 1, 13, and 14). The other goals in the hierarchy also contribute encodings, but they have been left out of this figure so that the hierarchical grouping of the lights is clear. Inside of each node is shown the pattern of lights that it covers. The
Fig. 21. The encoding hierarchy for one of the Seibel (1963) task situations. 0 is On,o is Off, and - is ignored, The numbers represent the horizontal locations at which the display is segmented.
numbers in the figure specify the horizontal location of the associated split. The left branch from the split represents the contribution from its Goal 13, while the right branch is from its Goal 14. The terminal nodes in the tree represent the manageable segments of the task. One of the terminal patterns (with 10)requires no explicit processing because it contains no on lights. Two of the patterns (the ones with and 0 ) have no off lights, and so are processed by goal Do-Press-All-Buttons. The remaining two manageable patterns contain a single on light and one or more off lights (the ones with c and 0 n 0 ) . These are processed by goal Press-Button-Under-On-Light. Once chunks are acquired for a pair of sibling terminal nodes, it is possible to acquire one for their combination (signified by their parent in the tree), and so on up the tree. If this process were taken to its conclusion, the tree in Fig. 21 would represent the hierarchy defined by the chunks’ encoding components. This process always leads to light-button chunks for contiguous lightbutton pairs. It does not lead to chunks for disjoint patterns such as (only) the two extremal (right and left) light-button pairs. This is not a limitation on the generality of the chunking mechanism. Instead, it is a function of the goal structure employed. A different goal structure (reflecting a different processing strategy) could lead to the creation of such disjoint chunk patterns. The following list of points summarizes the key aspects of chunking as it applies to goal hierarchies. 1 0
1. Each chunk represents a specific goal with a specific set of parameter values. It relates the parameter values to the results of the goal.
2. Chunks are created through experience with the goals processed.
3. Chunks are created bottom-up in the goal hierarchy.
4. A chunk consists of encoding, decoding, and connection components.
5. Chunk encoding and decoding are hierarchical, parallel, goal-asynchronous processes that operate on goal parameters and results (respectively).
6. Chunk connection is a serial, goal-synchronous process that generates (encoded) results from (encoded) parameters.
7. Chunks improve performance by replacing the normal processing of a goal (and its subgoals) with the faster processes of encoding, connection, and decoding.
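Points 3 and 7 of this list can be made concrete with a toy simulation, entirely our own illustration with invented costs: a goal may be chunked only once all of its subgoals were chunked before the trial began, so chunks climb one level of the hierarchy per repetition, and each chunk replaces a whole subtree's processing with a single fast step.

```python
# Toy bottom-up chunking over a goal tree. A goal tree is
# (name, [subtrees]); executing an unchunked goal costs 1 plus its
# subgoals (unchunked leaves cost 2). A goal becomes chunked (cost 1
# thereafter) only once all its subgoals were chunked before the trial.

def run_trial(tree, chunks):
    goal, subgoals = tree
    if goal in chunks:
        return 1
    if all(name in chunks for name, _ in subgoals):  # trivially true for leaves
        chunks.add(goal)          # available from the next trial on
    if not subgoals:
        return 2
    return 1 + sum(run_trial(sub, chunks) for sub in subgoals)

tree = ("root", [("left",  [("l1", []), ("l2", [])]),
                 ("right", [("r1", []), ("r2", [])])])
chunks = set()
times = [run_trial(tree, chunks) for _ in range(4)]   # → [11, 7, 3, 1]
```

The declining trial times show the bottom-up character of the learning: improvement continues until the root itself is chunked.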
C. RESULTS: PRACTICE SIMULATIONS

In this section we present the results of using the chunking theory of learning as a model of practice for the Seibel (1963), Duncan (1977), and Fitts and Seeger (1953) tasks (learning data does not exist for the Morin and Forrin, 1962, experiments). The same goal hierarchies as were used in the compatibility simulations are used here (except for the Seibel hierarchy, which was not used there). The only difference is that now a sequence of trials, with learning, is generated. The main phenomena of interest are the shapes of the learning curves for the individual conditions, rather than comparison of times between conditions. However, changes in the relative times are still of secondary interest because they reflect the interaction between compatibility and practice.

1. Seibel (1963)
The practice curve for one of Seibel's subjects has already been presented (Fig. 17) and shown to be a power law, so we can proceed directly to an examination of some simulated trial sequences. Two different sequences of trials were simulated for the Seibel task. The first sequence is the same as the one used in Rosenbloom and Newell (1987). The simulation completed 268 trials before it was terminated by a lack of memory space. A total of 682 productions was learned. On the second sequence of trials, from a newly generated random permutation of the 1023 possibilities, 259 trials were completed before termination. For this sequence, 652 productions were learned. Figure 22 shows the first sequence as fit by a general power law. Each point in the figure represents the mean value over five successive trials (except for the last one, which only includes three). The high degree of linearity of this curve implies that the model produces practice curves that are well fit by power laws. The apparently anomalous discontinuity in performance between trials 100 and 200 arises from the microstructure of the trial sequence; a string of atypically easy trials occurred at this point. For comparison purposes, the data for the first 268 trials of Subject 3 (Rosenbloom & Newell, 1987) for this same sequence of trials are reproduced in Fig. 23.⁷
⁷For reasons discussed in the next section, it was not possible to fit a general power law to this aggregated data.
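Fits like the one in Fig. 22 reduce to linear regression in log-log coordinates: T = BN^(-a) implies log T = log B - a log N. The following stdlib-only sketch is our own (the synthetic data merely exercise the function; they are not the simulation's output):

```python
import math

def fit_simple_power_law(times):
    """Least-squares fit of T = B * N^(-a) on (log N, log T).
    Returns (B, a, r2); trial numbers N are taken to be 1, 2, ..."""
    xs = [math.log(n + 1) for n in range(len(times))]
    ys = [math.log(t) for t in times]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return math.exp(intercept), -slope, 1 - ss_res / ss_tot

# Data that follow T = 3384 * N^(-0.16) exactly recover those parameters:
times = [3384 * (n + 1) ** -0.16 for n in range(100)]
B, a, r2 = fit_simple_power_law(times)   # B ≈ 3384, a ≈ 0.16, r² ≈ 1
```

The general power law T = A + B(N + E)^(-a) requires nonlinear fitting of the asymptote A and prior-practice correction E; the log-log regression above handles only the two-parameter simple form.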
Fig. 22. General power-law fit to 268 simulated trials of the Seibel (1963) task. [Abscissa: trial number, N + E.]
It shows much the same pattern as does the simulation, including the presence of the same discontinuity in performance. It differs primarily in being shallower and having more variation. The general power-law fit to the simulated data makes use of the E parameter, the correction for previous practice, to straighten out the curve. It may seem nonsensical to talk about previous practice for such a simulation, but a plausible interpretation is possible. In fact, there are two independent explanations; either one or both may be responsible. The first possibility is that the level at which the terminal goals are defined is too high (complex). If the “true” terminals are more primitive, then chunking starts at a lower level in the hierarchy. During the bottom-up chunking that would occur, the system would eventually reach the lowest level in the current hierarchy. All of the practice prior to that point is effectively previous practice for the current simulation. The other source of previous practice is the
Fig. 23. Simple power-law fit to the 227 non-error trials in the first 268 trials of the Seibel (1963) task for Subject 3 in Rosenbloom and Newell (1987).
goal hierarchy itself. At the beginning of the simulation this structure is already known perfectly. However, there must be a process of method acquisition by which the subject goes from the written (or oral) instructions to an internal goal hierarchy.

Table IV shows a slightly different analysis of this data. Included in this table are the exponential, simple power-law, and general power-law fits to the unaggregated data for both trial sequences. Exponential fits are included because the exponential is the main competitor of the power law as a model for the shape of practice curves (the hyperbolic has also been proposed, but it is simply the special case of the power law in which the power is -1). The left-hand column contains the fits to the human data, and the right-hand column contains the fits to the simulated data. The curves listed here differ slightly from those in Figs. 22 and 23 because of the difference in level of aggregation.

TABLE IV
EXPONENTIAL, SIMPLE POWER-LAW, AND GENERAL POWER-LAW FITS TO HUMAN DATA^a AND SIMULATED DATA FOR THE SEIBEL (1963) TASK

  Trial      Human data                            Simulated data
  sequence   Equation                       r²     Equation                        r²
  1          T = 383 + 1718e^(-0.0025N)     0.138  T = 8 + 85e^(-0.009N)           0.753
             T = 3384N^(-0.16)              0.135  T = 462N^(-0.57)                0.751
             T = 18 + 3679(N + 3)^(-0.17)   0.136  T = 0 + 4773(N + 33)^(-0.97)    0.811
  2                                                T = 4 + 88e^(-0.008N)           0.733
                                                   T = 413N^(-0.57)                0.746
                                                   T = 0 + 4161(N + 29)^(-0.97)    0.807

  ^a Subject 3 in Rosenbloom and Newell (1987).

The main columns of interest are the ones containing the r² values, the proportion of the data's variance accounted for by the fit. We see essentially the same pattern for the simulated data from both trial sequences. The simple power law is comparable to the exponential, while the general power law is better than either. The simple power law is a two-parameter fit, while the exponential has three parameters, so the power law should be favored as an explanation. The general power law has four parameters, though only three are actually needed for these fits (the asymptote is a different kind of parameter at 0 because it is constrained by the analysis program to be greater than or equal to 0).

The human data shows the same ambiguity between the exponential and simple power-law forms. However, there are two surprises in this data. The first surprise is the extremely low r² values for all three curve fits. We have no watertight explanation for this, but it appears to be due to the intrinsic variability of the data (some trials require one button press while others require nine), exacerbated by the lack of aggregation of data points, rather than to any inadequacy of the power-law and exponential models. The second surprise is the lack of any significant improvement when the general power law is used. This same data set does yield a power-law curve when the full 408 trials generated by the subject are used.

In conclusion, it appears that the simulation does produce power-law practice curves, but the evidence is not very strong. Longer simulated curves would be the best way to strengthen this conclusion, but because of memory limitations, the system in which the simulations were run was not capable of this.

2. Duncan (1977)

Figure 24 shows the practice curves generated by Duncan's subjects for the four different kinds of trials. This figure shows the mean data for experimental runs 2 through 6, at 144 trials/run/subject (averaged over 8 subjects for the pure cases and 16 subjects for the mixed case). The main result is that all four curves are well fit by straight lines (the r² values range between 0.91 and 0.99), implying a good match to a power law. However, this must be qualified by the fact that the curves are about as well fit by exponentials, with r² values ranging between 0.88 and 0.99.

Figure 25 shows the simulated practice curves for the Duncan conditions. The two pure conditions were each run for one random sequence of 25 trials. The mixed condition was run for two random sequences of 25 trials, one for each of the two hierarchy variations. The mixed-corresponding data from the two simulations were combined into a single curve. Likewise, the mixed-opposite data were combined into a single curve. These curves are messier because of the reduced amount of aggregation done (five trials/data point⁸) as compared to the human curves (1152 trials/data point). Because of the noise in these curves, not too much in the way of firm conclusions can be drawn from them.
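The exponential-versus-power-law question above comes down to which coordinates straighten the data: a power law T = BN^(-a) is linear in (log N, log T), while an exponential T = Be^(-cN) is linear in (N, log T). A stdlib-only sketch of that comparison, on synthetic data of our own devising:

```python
import math

def r2_of_line(xs, ys):
    """r² of the least-squares line through the points (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def compare_forms(times):
    """Return (r² of the power-law fit, r² of the exponential fit)."""
    ns = [float(n + 1) for n in range(len(times))]
    logt = [math.log(t) for t in times]
    return (r2_of_line([math.log(n) for n in ns], logt),  # log-log
            r2_of_line(ns, logt))                          # semilog

power_data = [500 * n ** -0.5 for n in range(1, 51)]
r2_pow, r2_exp = compare_forms(power_data)   # power-law fit wins here
```

With short, noisy curves, as in these simulations, the two r² values can come out nearly equal, which is exactly the ambiguity the text describes.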
However, the curves do appear to be exponential rather than power law. For three out of the four curves a higher r² value is obtained for an exponential than for the simple power law (the mixed-corresponding curve is the exception). Though this is somewhat unsatisfactory, it is not totally unexpected, because of the ambiguity of the human curves and because the approximate mathematical analysis of chunking presented in Rosenbloom (1983) predicts exponential curves in simple tasks where the number of initial states for a goal does not vary with the height of the goal in the hierarchy, as is the case at least for the pure conditions in this experiment.

⁸The final data points for the two mixed curves do not contain exactly five trials because the random sequences did not guarantee equal numbers of the two types of trials.
Fig. 24. Log-log plot of the practice curves from Duncan (1977). [Conditions: Corresponding, Opposite, Mixed-Corresponding, Mixed-Opposite; abscissa: experimental run.]
3. Fitts and Seeger (1953)
Fitts and Seeger (1953) provide practice curves for the conditions in which the A response apparatus, a lever that can be pushed in one of eight directions, is employed. The curves for conditions SA-RA and SB-RA can be seen in Fig. 26. These data are aggregated across 48 trials/session and 5 subjects, so that 240 trials are represented per data point. The SA-RA curve is a slightly better exponential (r² = 0.93) than a simple power law (r² = 0.92), while the SB-RA curve is a slightly better simple power law (r² = 0.91) than an exponential (r² = 0.89). Both curves are best fit by a general power law (r² = 0.96 and 0.95, respectively). Figure 27 shows the results of simulated practice on these same two conditions. The simulated practice curves for the other two conditions are in Fig. 28. These curves contain 50 trials each, aggregated by 5 trials/data
Fig. 25. Log-log plot of the simulated practice curves for the Duncan (1977) conditions. The mixed curves are averaged over the two hierarchy variations.

Fig. 26. Log-log plot of the practice curves for conditions SA-RA and SB-RA from Fitts and Seeger (1953).
point. Once again comparisons with the human data are difficult because of the large differences in levels of aggregation, but we comment on the major trends anyway. Overall, the simulated curves show a pattern not unlike the one evidenced by the human curves. Three of the curves are better exponentials than simple power laws, with one (SB-RB) the reverse. Three of the curves are better general power laws than exponentials, with one (SA-RA) appearing to be a true exponential.

III. Discussion
The performance model is based on a goal-structured representation of reaction-time tasks, in which each task has a performance algorithm
Fig. 27. Log-log plot of the simulated practice curves for conditions SA-RA and SB-RA from Fitts and Seeger (1953). The latter curve is the average over two hierarchy variations.
Fig. 28. Log-log plot of the simulated practice curves for conditions SA-RB and SB-RB from Fitts and Seeger (1953). The curves are the averages over two and three hierarchy variations, respectively.
represented as a goal hierarchy. It produces stimulus-response compatibility effects because different amounts of time are required to perform the goal hierarchies for different experimental conditions. The model produces excellent fits to the Duncan (1977), Fitts and Seeger (1953), and Morin and Forrin (1962) experiments individually. In addition, when all of the compatibility conditions are combined into a single analysis, a good linear fit is still obtained. Though compatibility has been studied for over 30 years, this model provides the first working proposal for a cross-experimental metric model.

Though the performance model has shown promise in its ability to model compatibility phenomena, there are several weaknesses that require further work. One weakness is the inadequacy of the base of experiments over which the model has been tested. Other experiments, both preexisting and novel, need to be analyzed. However, finding other preexisting experiments amenable to this kind of analysis has not been easy. The first experiment we attempted to model was that of Fitts and Deininger (1954). At that time we were not able to do an adequate job of analyzing it, so it was temporarily put aside. Since then, an effort using the approximate version of the model has yielded results in line with those exhibited here (B. E. John, personal communication, 1983), but those results have not yet been incorporated into the current detailed model. Many of the other preexisting experiments, such as Garvey and Knowles (1954), Broadbent and Gregory (1965), Brebner, Shephard, and Cairney (1972), and Brebner (1973), are complicated by their involvement of other effects, such as preparation and discrimination, that are not yet within the scope of the theory. A prime example of this problem is the Smith (1977) experiment. It was much like the Duncan (1977) experiment, except that Smith found that performance in the
mixed condition was better than in the opposite condition (called the reflected condition by Smith). Though this seems like it must contradict our model, his mixed condition involved only half as many response keys as did the opposite condition, decreasing the problem of discriminating which response key to use, and increasing the amount of practice on the mixed responses relative to the pure ones. These two factors could lower the reaction time for the mixed condition relative to the opposite condition enough to allow it to fall below. The practice effects are within the scope of our model, but the discriminability effects are not yet. Further development of the model is required before experiments like this can be analyzed adequately.

The development of new experiments allows extraneous phenomena to be controlled and can help to overcome the second weakness of the performance model: that it has only been used as a post-hoc analysis tool rather than as a predictive theory. In one recent experiment we compared abbreviation techniques for computer command names (John et al., 1985). Given a visually presented command, such as "delete," the subject had to type the correct abbreviation for it according to one of four rules: no abbreviation, nonsense (a randomly selected trigram such as wub), vowel deletion (delete becomes dlt), and special character plus first letter (for example, /d). Prior to the experiment, an extended version of the GOMS model of compatibility (Section I,B,2) was used to generate algorithms for the four conditions. The model fit the data with an r² of 0.974. Although this result extends the model to cover an additional new experiment, the model could not be used completely in a predictive fashion. Some of the algorithms were modified postexperimentally to deal with unexpected aspects of the subjects' behavior, such as the processing of syllable units rather than the expected letter or word units.
In a follow-on experiment, results for two new abbreviation techniques were successfully modeled preexperimentally and in a parameter-free fashion, making use of the parameter values from the earlier experiment (John & Newell, 1987). Parameter-free prediction is an important goal, not only because it provides the severest test of such a model, but because it is necessary for practical application of the model. For the two new techniques, the parameter-free predictions yielded an r² of 0.776. An overall r² of 0.917 was obtained when the results of the two command-name experiments were combined with those from the compatibility tasks presented in this article.

The third weakness of the performance model is the lack of a well-developed methodology for the creation of individual task algorithms. The algorithms used in this article were developed in a relatively ad hoc fashion. Three criteria were employed: (1) the algorithms had to be able to actually perform the tasks; (2) the algorithms should not violate what we know about the capabilities and limits of human performance; and (3) all else
being equal, the algorithms should be as simple as possible. More work is required to turn this informal method into a rigorous and generally usable methodology.

The learning model is complementary to the performance model. It exists in the background, observing the processing of goals and saving away chunks that allow the performance model to execute more efficiently in the future. The effect over time is that of a system practicing a task (or a set of tasks). The key criterion on which the learning model was to be evaluated was the functional form of the practice curves produced: it should be a power law. The simulations do produce power-law practice curves. Most importantly, they produce power laws for the one major practice experiment included here, Seibel (1963). However, though the model does produce practice curves for the compatibility experiments, a disturbing number of them are exponentials. We have presented some arguments why this is not unexpected for these very simple tasks and also shown that the human curves for these simple tasks reveal an ambiguity between power-law and exponential forms. However, given the ubiquity of the power law of practice, more power-law simulations are certainly called for.

The learning theory originally started from the observation that the power law of practice was ubiquitous throughout cognitive activity (Newell & Rosenbloom, 1981). The present model only pertains to a small domain, and one that is usually placed at the periphery of cognitive behavior at that. However, there is now evidence that chunking is indeed a good learning mechanism for a cognitive architecture quite generally. A general scheme for learning by chunking has been incorporated in a general cognitive architecture, called Soar (Laird, 1983, 1986; Laird et al., 1987), developed within artificial intelligence (AI) and operational over a wide range of AI tasks.
This scheme yields transfer of training within a single experimental trial, across trials, and across tasks (Laird, Rosenbloom, & Newell, 1984). Most of this transfer is positive, but some is negative, caused by the learning of chunks that are overgeneral, that is, that apply in situations when they should not. Moreover, the capability of chunking as a learning mechanism has been extended beyond the simple domain of practice to include more complex types of learning, such as the acquisition of search-control heuristics for a problem-solving system (Laird et al., 1984), the acquisition of macro-operators (Laird, Rosenbloom, & Newell, 1986), and explanation-based generalization (Rosenbloom & Laird, 1986). This evidence is not yet tied directly to human data, and the Soar scheme differs in some details from the model described here.⁹ Nevertheless, this increases the likelihood that the chunking model described here is not limited to the domain of stimulus-response compatibility.

⁹The Soar scheme arose from an attempt to incorporate the mechanisms of the present model into Soar.
The complete model is the combination of the performance and learning models. There is some, but not much, evidence bearing on this combination. The most important consideration has to be that a complete model of reaction-time phenomena must be capable of both performance and learning, and that the particular versions we propose work harmoniously together, producing both compatibility and practice phenomena. This works in the present model because the goal hierarchy throws a very fine-grained net over the entire task performance, with goals being executed at a rate of about one every 20 msec. This fineness allows both the modeling of small differences in compatibility reaction times and the incremental improvements in all aspects of task performance brought about by practice. In so doing, it provides a potential answer to one of the major criticisms of the chunking theory. Chase (1986) stated the criticism this way: “The theory, as it is presently formulated, cannot account for the fact that all subcomponents show a substantial speedup with practice” (p. 68). He went on to propose that some form of component strengthening, as in Anderson (1982), was required to explain how the primitive components are sped up. An alternative explanation is that what appear to be primitive components are in fact not primitive, but composed of even finer-grained subgoals. For example, in the earlier implemented version of chunking for the Seibel (1963) task, the primitive components (and chunks) directly related patterns of lights to patterns of button presses (Rosenbloom & Newell, 1987). In the current version, each of these components is broken up into a number of smaller subgoals. The existence of these finer-grained subgoals can explain why what appeared to be a primitive component can improve with practice.

The a priori plausibility of such a fine-grained goal net may seem low, given that choice-reaction tasks are often viewed as too elementary to be considered cognitive.
The Soar architecture, discussed above, provides an important demonstration of the feasibility of a fine-grained goal net in a general cognitive system that does large complex tasks (e.g., play games, solve puzzles, configure computer systems). Soar routinely sets up goals for all decisions, however small. A typical (and high frequency) example is the task of selecting between operators in a problem space. If the information to make the selection is not immediately at hand, Soar sets up a subgoal to do the selection; goes into a problem space for that selection; executes an operator or two, perhaps, to obtain additional information; and then exits the selection problem space to return and make the action selection. All of this would correspond to a second or two. This same goal scheme is used uniformly for all tasks, including the largest and most global. Thus, if Soar were to do the choice-response tasks analyzed in this article, it would set up goals at essentially the same grain as that of the model. Besides just providing a demonstration of plausibility for a fine-grained network, Soar makes clear one of its functions. Chunking in Soar attains its
results, for both practice and transfer, only because it works in conjunction with the fine-grained goals. Continuous improvement with practice depends on there not being large islands of behavior that cannot be improved. Since chunking is tied to goals, this implies that goals must descend to the smallest units of behavior. Likewise, transfer depends on there being common subtasks, which is to say, common subgoals. It is the low-level goals that provide the bulk of this. The higher the goal, the more unique it becomes. In Soar, both these effects of the fineness of the goal network occur because chunking is chunking of goal results. Conceivably, chunking could be tied to some other organizational aspect of behavior, although it is difficult to imagine what that could be. In any event, Soar ties the present model into general cognitive architectures.

Some existing experiments involve both practice and compatibility, notably Fitts and Seeger (1953), Garvey and Knowles (1954), and Duncan (1977). In all of these experiments, practice does reduce the time required by the subjects to perform the experimental tasks. The practice curves usually show convergence, but they do not cross and rarely meet; the compatibility conditions maintain their relative ordering over the span of practice. The import of this data for the model is unclear at present. The simulated practice curves for the compatibility conditions do show similar ordering relationships, but because these curves improve too rapidly, producing short noisy curves, little that is decisive can be said at this time. The theory also predicts comparable asymptotic performance for the goal hierarchies for all of the conditions. Unless there are portions of the performance that are not open to practice, the curves must eventually meet, or at least come arbitrarily close. It is unclear whether the human practice curves, given enough trials, would eventually meet.
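The earlier point that transfer flows through common low-level subgoals can be illustrated with a toy chunk cache (our own illustration with invented goal names and costs, not Soar itself): a chunk learned for a subgoal in one task applies immediately wherever another task reaches the same subgoal.

```python
# Toy illustration of transfer through shared subgoals: a chunk learned
# for a subgoal in one task applies wherever that subgoal recurs.
# Goal trees and costs are invented for illustration.

def run(tree, chunks):
    """Execute a goal tree, returning simulated time; every goal
    executed is chunked for future use (cost 1 when chunked)."""
    goal, subgoals = tree
    if goal in chunks:
        return 1
    time = 2 if not subgoals else 1 + sum(run(s, chunks) for s in subgoals)
    chunks.add(goal)
    return time

# Two tasks sharing the low-level subgoal "locate-light":
task_a = ("press-one-button", [("locate-light", []), ("press", [])])
task_b = ("press-all-buttons", [("locate-light", []), ("sweep", [])])

chunks = set()
t_a = run(task_a, chunks)   # full cost: 1 + 2 + 2 = 5
t_b = run(task_b, chunks)   # faster: "locate-light" transfers → 4
```

The second task is faster on its very first trial because the shared low-level subgoal is already chunked; the higher, task-specific goals contribute no transfer.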
Related to both compatibility and practice is the concept of a population stereotype, which was evoked by the early investigators as a major explanation of stimulus-response compatibility (Fitts & Deininger, 1954). Operationally, population stereotypes are determined by examining the responses subjects tend to make to a stimulus when the experimenter does not specify what is appropriate. As Wickens (1984) states, “The amount of practice given to a choice RT task is closely related to S-R compatibility and is clearly the major factor defining population stereotypes” (p. 354). That seems clearly correct. But the implication behind the use of population stereotypes as an explanation is that the compatibilities are essentially arbitrary, being whatever the population happens to bring with it due to the totality of its prior experience. The present model makes clear that there are two contributions to compatibility, one from practice and one from the intrinsic structure of the mapping. This latter is not an effect of population stereotype, but is due to the structure of the basic processing architecture and the structure of the task. Furthermore, the model describes exactly how these two components combine to determine the total effect of compatibility.
In summary, we have presented a model of stimulus-response compatibility and practice, built out of submodels for performance and learning. The performance model is supported by its ability to model stimulus-response compatibility experiments effectively and by the pervasive use of goal hierarchies in complex problem-solving systems. The learning model is supported by the well-established presence of chunks in human performance, by the theory's production of power-law practice curves, and by the power of the chunking mechanism when integrated into a sophisticated cognitive architecture. The combination is supported by the way the two submodels work together to produce joint compatibility and practice effects, and by the success of the Soar architecture in combining the mechanisms of the model with the other mechanisms needed for general cognitive behavior. Despite this general support, there are still a number of problems with the model, as well as a need to extend the model to the detailed treatment of nearby domains, such as other reaction-time tasks (see Rosenbloom, 1983).
ACKNOWLEDGMENTS

This research was principally sponsored by the Defense Advanced Research Projects Agency (DOD) under Contracts F33615-81-K-1539 and N00039-83-C-0136. Some additional support was provided by the Sloan Foundation. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Defense Advanced Research Projects Agency, the U.S. Government, or the Sloan Foundation. This work is based on the first author's dissertation (Rosenbloom, 1983), done while he was a member of the Department of Computer Science, Carnegie-Mellon University. We would like to thank John Anderson, Jaime Carbonell, and Geoff Hinton for their comments on the dissertation, Mark Gluck and Dirk Ruiz for comments on a draft of this paper, and John Laird for numerous invaluable discussions over the years about this material.
REFERENCES

Aho, A. V., Hopcroft, J. E., & Ullman, J. D. (1974). The design and analysis of computer algorithms. Reading, MA: Addison-Wesley.
Anderson, J. R. (1982). Acquisition of cognitive skill. Psychological Review, 89, 369-406.
Anderson, J. R. (1983). The architecture of cognition. Cambridge, MA: Harvard University Press.
Anderson, J. R. (1986). Knowledge compilation: The general learning mechanism. In R. S. Michalski, J. G. Carbonell, & T. M. Mitchell (Eds.), Machine learning: An artificial intelligence approach, Vol. II. Los Altos, CA: Morgan Kaufmann.
Baron, J. (1978). Intelligence and general strategies. In G. Underwood (Ed.), Strategies of information processing. London: Academic Press.
Bower, G. H. (1972). Perceptual groups as coding units in immediate memory. Psychonomic Science, 21, 217-219.
Bower, G. H., & Springston, F. (1970). Pauses as recoding points in letter series. Journal of Experimental Psychology, 83, 421-430.
Paul S. Rosenbloom and Allen Newell
50
Bower, G. H., & Winzenz, D. (1969). Group structure, coding, and memory for digit series. Journal of Experimental Psychology Monograph, 80, 1-17. (May, Pt. 2).
Brebner, J. (1973). S-R compatibility and changes in RT with practice. Acta Psychologica, 37, 93-106.
Brebner, J., Shephard, M., & Cairney, P. (1972). Spatial relationships and S-R compatibility. Acta Psychologica, 36, 1-15.
Broadbent, D. E., & Gregory, M. (1965). On the interaction of S-R compatibility with other variables affecting reaction time. British Journal of Psychology, 56, 61-67.
Card, S. K., English, W. K., & Burr, B. (1978). Evaluation of mouse, rate controlled isometric joystick, step keys, and text keys for text selection on a CRT. Ergonomics, 21, 601-613.
Card, S. K., Moran, T. P., & Newell, A. (1980). Computer text editing: An information-processing analysis of a routine cognitive skill. Cognitive Psychology, 12, 32-74.
Card, S. K., Moran, T. P., & Newell, A. (1983). The psychology of human-computer interaction. Hillsdale, NJ: Erlbaum.
Chase, W. G. (1986). Visual information processing. In K. R. Boff, L. Kaufman, & J. P. Thomas (Eds.), Handbook of perception and human performance: Vol. II, Cognitive processes and performance. New York: Wiley (Interscience).
Chase, W. G., & Ericsson, K. A. (1981). Skilled memory. In J. R. Anderson (Ed.), Cognitive skills and their acquisition. Hillsdale, NJ: Erlbaum.
Chase, W. G., & Simon, H. A. (1973). Perception in chess. Cognitive Psychology, 4, 55-81.
Crossman, E. R. F. W. (1959). A theory of the acquisition of speed-skill. Ergonomics, 2, 153-166.
DeGroot, A. D. (1965). Thought and choice in chess. The Hague: Mouton.
Deininger, R. L., & Fitts, P. M. (1955). Stimulus-response compatibility, information theory, and perceptual-motor performance. In H. Quastler (Ed.), Information theory in psychology. Glencoe, IL: Free Press.
Duncan, J. (1977). Response selection rules in spatial choice reaction tasks. In S. Dornic (Ed.), Attention and performance VI. Hillsdale, NJ: Erlbaum.
Ernst, G. W., & Newell, A. (1969). GPS: A case study in generality and problem solving. New York: Academic Press (ACM Monograph).
Fikes, R. E., Hart, P. E., & Nilsson, N. J. (1972). Learning and executing generalized robot plans. Artificial Intelligence, 3, 251-288.
Fitts, P. M., & Deininger, R. L. (1954). S-R compatibility: Correspondence among paired elements within stimulus and response codes. Journal of Experimental Psychology, 48, 483-492.
Fitts, P. M., & Seeger, C. M. (1953). S-R compatibility: Spatial characteristics of stimulus and response codes. Journal of Experimental Psychology, 46, 199-210.
Forgy, C. L. (1981). OPS5 manual. Pittsburgh, PA: Computer Science Department, Carnegie-Mellon University.
Garvey, W. D., & Knowles, W. B. (1954). Response time patterns associated with various display-control relationships. Journal of Experimental Psychology, 47, 315-322.
Hayes, J. R., & Simon, H. A. (1974). Understanding written problem instructions. In L. Gregg (Ed.), Knowledge and cognition. Potomac, MD: Erlbaum.
Hick, W. E. (1952). On the rate of gain of information. Quarterly Journal of Experimental Psychology, 4, 11-26.
John, B. E., & Newell, A. (1987). Predicting the time to recall computer command abbreviations. Proceedings of CHI '87. ACM/SIGCHI.
John, B. E., Rosenbloom, P. S., & Newell, A. (1985). A theory of stimulus-response compatibility applied to human-computer interaction. In L. Borman & B. Curtis (Eds.), Proceedings of CHI '85, Human Factors in Computing Systems. San Francisco: ACM/SIGCHI.
Johnson, N. F. (1972). Organization and the concept of a memory code. In A. W. Melton & E. Martin (Eds.), Coding processes in human memory. Washington, DC: Winston.
Kolers, P. A. (1975). Memorial consequences of automatized encoding. Journal of Experimental Psychology: Human Learning and Memory, 1, 689-701.
Korf, R. E. (1985). Macro-operators: A weak method for learning. Artificial Intelligence, 26, 35-71.
Laird, J. E. (1983). Universal subgoaling. Doctoral dissertation, Carnegie-Mellon University (available in Laird, J. E., Rosenbloom, P. S., & Newell, A. (1986). Universal subgoaling and chunking: The automatic generation and learning of goal hierarchies. Hingham, MA: Kluwer).
Laird, J. E. (1986). Soar user's manual (Version 4) (Tech. Rep. ISL-15). Palo Alto, CA: Xerox PARC.
Laird, J. E., Newell, A., & Rosenbloom, P. S. (1987). Soar: An architecture for general intelligence. Artificial Intelligence, 33, 1-64.
Laird, J. E., Rosenbloom, P. S., & Newell, A. (1984). Towards chunking as a general learning mechanism. In Proceedings of AAAI-84. Austin: AAAI.
Laird, J. E., Rosenbloom, P. S., & Newell, A. (1986). Chunking in Soar: The anatomy of a general learning mechanism. Machine Learning, 1, 11-46.
Lewis, C. H. (1978). Production system models of practice effects. Doctoral dissertation, University of Michigan.
Michie, D. (1968). "Memo" functions and machine learning. Nature (London), 218, 19-22.
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81-97.
Mitchell, T. M., Keller, R. M., & Kedar-Cabelli, S. T. (1986). Explanation-based generalization: A unifying view. Machine Learning, 1, 47-80.
Moran, T. P. (1980). Compiling cognitive skill (AIP Memo 150). Palo Alto, CA: Xerox PARC.
Morin, R. E., & Forrin, B. (1962). Mixing two types of S-R associations in a choice reaction time task. Journal of Experimental Psychology, 64, 137-141.
Morin, R. E., & Grant, D. A. (1955). Learning and performance of a key-pressing task as a function of the degree of spatial stimulus-response correspondence. Journal of Experimental Psychology, 49, 39-47.
Neisser, U., Novick, R., & Lazar, R. (1963). Searching for ten targets simultaneously. Perceptual and Motor Skills, 11, 427-432.
Neves, D. M., & Anderson, J. R. (1981).
Knowledge compilation: Mechanisms for the automatization of cognitive skills. In J. R. Anderson (Ed.), Cognitive skills and their acquisition. Hillsdale, NJ: Erlbaum.
Newell, A. (1969). Heuristic programming: Ill-structured problems. In J. Aronofsky (Ed.), Progress in operations research, III. New York: Wiley.
Newell, A. (1973). Production systems: Models of control structures. In W. G. Chase (Ed.), Visual information processing. New York: Academic Press.
Newell, A., & Rosenbloom, P. S. (1981). Mechanisms of skill acquisition and the law of practice. In J. R. Anderson (Ed.), Cognitive skills and their acquisition. Hillsdale, NJ: Erlbaum.
Nilsson, N. (1971). Problem-solving methods in artificial intelligence. New York: McGraw-Hill.
Pylyshyn, Z. W. (1984). Computation and cognition: Toward a foundation for cognitive science. Cambridge, MA: Bradford.
Rich, E. (1983). Artificial intelligence. New York: McGraw-Hill.
Rosenbloom, P. S. (1983). The chunking of goal hierarchies: A model of practice and stimulus-response compatibility. Doctoral dissertation, Carnegie-Mellon University (available in Laird, J. E., Rosenbloom, P. S., & Newell, A. (1986). Universal subgoaling and chunking: The automatic generation and learning of goal hierarchies. Hingham, MA: Kluwer).
Rosenbloom, P. S., & Laird, J. E. (1986). Mapping explanation-based generalization onto Soar. Proceedings of AAAI-86. Philadelphia: AAAI.
Rosenbloom, P. S., & Newell, A. (1982). Learning by chunking: Summary of a task and a model. Proceedings of AAAI-82. Pittsburgh: AAAI.
Rosenbloom, P. S., & Newell, A. (1986). The chunking of goal hierarchies: A generalized model of practice. In R. S. Michalski, J. G. Carbonell, & T. M. Mitchell (Eds.), Machine learning: An artificial intelligence approach, Vol. II. Los Altos, CA: Morgan Kaufmann.
Rosenbloom, P. S., & Newell, A. (1987). Learning by chunking: A production-system model of practice. In D. Klahr, P. Langley, & R. Neches (Eds.), Production system models of learning and development. Cambridge, MA: Bradford Books/MIT Press.
Sauers, R., & Farrell, R. (1982). Grapes user's manual (Tech. Rep.). Pittsburgh, PA: Carnegie-Mellon University, Department of Psychology.
Seibel, R. (1963). Discrimination reaction time for a 1,023-alternative task. Journal of Experimental Psychology, 66, 215-226.
Shepard, R. N. (1961). Role of generalization in stimulus-response compatibility. Perceptual and Motor Skills, 13, 59-62.
Simon, H. A., & Hayes, J. R. (1976). The understanding process: Problem isomorphs. Cognitive Psychology, 8.
Smith, G. A. (1977). Studies of compatibility and a new model of choice reaction time. In S. Dornic (Ed.), Attention and performance VI. Hillsdale, NJ: Erlbaum.
Snoddy, G. S. (1926). Learning and stability. Journal of Applied Psychology, 10, 1-36.
Thorndike, E. L. (1913). Educational psychology. Vol. II: The psychology of learning. New York: Bureau of Publications, Teachers College, Columbia University.
Welford, A. T. (1980). Choice reaction times: Basic concepts. In A. T. Welford (Ed.), Reaction times. London: Academic Press.
Wickens, C. D. (1984). Engineering psychology and human performance. Columbus, OH: Merrill.
A CONNECTIONIST/CONTROL ARCHITECTURE FOR WORKING MEMORY

Walter Schneider
Mark Detweiler
LEARNING RESEARCH AND DEVELOPMENT CENTER AND PSYCHOLOGY DEPARTMENT, UNIVERSITY OF PITTSBURGH, PITTSBURGH, PENNSYLVANIA 15260

I. Introduction
II. Traditional Views of Short-Term Memory
III. A Connectionist/Control Architecture for Working Memory
   A. Architectural Principles
   B. Microlevel Structure
   C. Macrolevel Structure
   D. System-Level Structure
   E. Context Storage
   F. Simulation Methods
IV. Interpretation of the Working Memory ...
   A. Buffer Phenomena
   B. Multiple Buffers
   C. Coding Item and Order Information
   D. Order of Output
   E. Rehearsal Loops
V. Context Effects, Proactive Interference, and Release from Proactive Interference
   A. Episodic Memory
   B. Proactive and Retroactive Interference
   C. Release from Proactive Interference
   D. Recency and Primacy Effects
   E. Overloading STM
VI. Skilled Memory, Mnem...
   Rules for Skilled Memory
VII. Serial Outputs and Chunking
   A. Sequential Hierarchical Output
   B. Chunking
VIII. Workload and Working Memory
IX. Working Memory in Learning and Skill Acquisition
   A. Distributing Practice
   B. Phases of Skill Acquisition
X. Final Comments
References
I. Introduction
The years since the mid-1950s have witnessed a number of important movements in the study of short-term memory (STM). Miller (1956) introduced the concept of capacity limits of STM, citing the magic number 7 ± 2; Broadbent (1958) drafted the first serious information-processing model of STM; and Brown (1958) and Peterson and Peterson (1959) rediscovered the technique of using an interpolated task to prevent rehearsal over a brief retention interval. In the 1960s, Melton (1963) advanced the view of interference as the source of all forgetting, Keppel and Underwood (1962) demonstrated the reality of proactive inhibition, and Waugh and Norman (1965) and Atkinson and Shiffrin (1968) proposed what must now be regarded as the modal models of STM. In the 1970s, Baddeley and Hitch (1974) developed and elaborated the idea of a working-memory system. And in the 1980s, research and theory building are continuing to further differentiate the phenomena and mechanisms behind working-memory systems, e.g., Baddeley (1986) and Chase and Ericsson (1981, 1982).
Since the mid-1970s, the modal model of STM has come under increasing criticism (Baddeley, 1976, 1986; Crowder, 1982; Klapp, Marshburn, & Lester, 1983; Klapp, 1987). STM capacity appears more variable than Miller suggested, varying from size 2 in digit canceling (Kirchner, 1958) to size 80 in a skilled memory task (Chase & Ericsson, 1981). More importantly, most real-world tasks could not be completed if working memory held only five to nine items. For example, production system models such as ACT* (J. R. Anderson, 1983) used to simulate real-world tasks typically require a working memory of 20 items to maintain variable information and the goal stack. Further, consider a task such as electronic troubleshooting. To troubleshoot effectively, one must at any point have in working memory the fault, the good state, the position in the circuit, the critical input and output signals, the expected signal, and the current hypothesis.
If technicians are temporarily interrupted while tracing a fault, they do not have to start all over. After a few seconds, they continue as if the interruption had never occurred. As a final difficulty of capacity-limited theories of STM, consider that practitioners interested in human workload have long sought to identify the "red line" at which performance undergoes catastrophic failure, e.g., air-traffic controllers being interrupted and completely losing their ability to direct air traffic. Such failures are very rare. Humans tend to become slower and somewhat more error prone with increases in task loading, but there is no "red line," and catastrophic failures do not appear suddenly when the task requires remembering more than seven chunks of information. In other words, human performance shows "graceful degradation" in situations of memory overload.
In this article we trace some of these developments and offer a view of working memory situated within a 1980s connectionist framework. We also discuss a number of phenomena which do not fit neatly into the textbook treatments of the modal model. And while we endorse the core idea of some bufferlike processes of the modal model, we seek to draw attention to the need for a new class of models that can handle a range of working memory phenomena, not just the standard digit-span task. In this article we describe one architecture from a class of architectures for working memory. We use the term "architecture" as it is used in computer science (see J. R. Anderson, 1983; Laird, Rosenbloom, & Newell, 1986), meaning a systematic approach to the configuration of computational components to accomplish some information-processing tasks. The proposed architecture illustrates both the limitations and capacities of human information processing. We also discuss human phenomena that identify qualitative features of human information processing, features that an architecture of working memory should exhibit. The connectionist/control architecture assumes processing occurs in a set of modules organized into levels and regions, e.g., vision, speech, semantic. The regions communicate with each other on an inner loop of connections. This loop allows information to be transferred among input, output, and other regions, e.g., semantic or context. The information transfer within and among regions is modulated by a control-processing system that controls the maintenance and output of information from modules. A new feature of this architecture is a proposed context-storage module that associates the content of messages in the inner loop with the temporal context. The context-storage system is able to reload modules after short-term information decays or is displaced.
In addition, it provides a means of achieving stable, robust processing under conditions of high workload. We define working memory in a manner similar to Baddeley (1986, p. 34) as "a system for the temporary holding and manipulation of information during the performance of a range of cognitive tasks such as comprehension, learning, and reasoning." To bound the temporal range of working memory, we examine tasks in which the expected time to load an element into or retrieve it from working memory is brief (operationally defined as less than 10 sec). We are not overly concerned with categorizing something as long- or short-term memory; rather, we define memory based on temporal dimensions and discuss experimental data and examples in terms of this new model. We begin by reviewing the traditional models of short-term and working memory. We then describe a connectionist/control architecture for cognitive processing that describes the types of memory and processing strategies that exist in such a system. The new architecture relates three modeling themes. First, the connectionist structure draws heavily from the
concepts of connectionist modeling (Rumelhart & McClelland, 1986b). Second, the control structure is based on automatic and controlled processing theory (Shiffrin & Schneider, 1977; Schneider, 1985; Schneider & Mumme, 1987). And third, the combination of connectionist and control structures enables the architecture to accomplish many of the information-processing operations associated with production systems (J. R. Anderson, 1983). We review a variety of literature on STM and provide an interpretation of it within the proposed architecture.

II. Traditional Views of Short-Term Memory
As noted above, two of the most influential models of STM were developed independently by Waugh and Norman (1965) and by Atkinson and Shiffrin (1968, 1971). Borrowing from James's (1890) terminology, Waugh and Norman proposed a model exhibiting independent primary and secondary memories. Primary memory was cast as a brief storage system markedly limited in capacity. This capacity can be roughly equated with a hypothetical buffer composed of a fixed number of slots. All information entering primary memory is either rehearsed or forgotten. If rehearsed, the information can be transferred to secondary memory, from which it decays more slowly. Information can be lost from the short-term store (STS) both as a function of delay over time and/or as a function of new items displacing old items. In other words, the longer an item resides in a slot without being rehearsed, the greater its degree of decay; an old item is thought to be displaced as a new item enters STS and occupies its slot. Note that in spite of the presumed rotational character of STS, early items from a list might not be lost if they are transferred into secondary memory. Shortly after Waugh and Norman published their model, Atkinson and Shiffrin described mathematical models of learning and memory known as the Atkinson-Shiffrin model of memory (1968, 1971). Theirs too is a dual-component concept of memory, albeit one comprising a sensory register in addition to the STS and a long-term store (LTS). This model was more differentiated than previous models, seeking to account for the richness of attention and memory phenomena; e.g., Atkinson and Shiffrin tried to specify how comparisons are made, how retrieval is controlled, and how items are transferred from the STS to the LTS. In doing so, they made the distinction between features of processing structure and control processes. The structure refers to the aforementioned register and stores, treated as a serial set of stages through which information is processed.
The control processes refer to components of processing such as decision rules, organizational schemes, retrieval strategies, and problem-solving techniques. In contrast to the permanent structural components, control processes were characterized as optional, i.e., under the subject’s direct control.
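The buffer-and-transfer account above can be sketched in a few lines of code. This is a toy illustration, not the authors' or the original models' actual mathematics: the slot capacity, rehearsal probability, and item names are all assumptions invented for the example.

```python
import random
from collections import deque

# Toy sketch of a modal (Atkinson-Shiffrin-style) model: a fixed-slot
# short-term store (STS) in which each new item may displace the oldest,
# plus an optional control process (rehearsal) that copies an STS item
# into a long-term store (LTS).

class ModalModel:
    def __init__(self, sts_slots=7, rehearse_p=0.3, seed=1):
        self.sts = deque(maxlen=sts_slots)   # displacement: oldest item falls out
        self.lts = set()                     # items transferred by rehearsal
        self.rehearse_p = rehearse_p
        self.rng = random.Random(seed)

    def present(self, item):
        self.sts.append(item)                # entering STS may displace a slot
        # control process: rehearsal occasionally transfers an item to LTS
        if self.sts and self.rng.random() < self.rehearse_p:
            self.lts.add(self.rng.choice(list(self.sts)))

    def recall(self):
        return set(self.sts) | self.lts

m = ModalModel(sts_slots=3, rehearse_p=0.5)
for item in ["A", "B", "C", "D", "E"]:
    m.present(item)
# With 3 slots, only the last 3 items remain in STS; earlier items
# survive only if rehearsal transferred them to LTS.
print(sorted(m.sts), sorted(m.lts))
```

Note how the two loss mechanisms of the text map onto the sketch: displacement is the `maxlen` of the store, and the "optional" character of control processes is the probabilistic rehearsal step.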
Baddeley and Hitch (1974) proposed a more complex STM system than those reflected in the unitary- or multiple-system theories of the late 1960s and early 1970s. They elaborated the idea of a working-memory system comprising separable subsystems. The articulatory loop is one of the subsystems, cast as a passive mechanism resembling a tape loop of limited duration used to store articulable information. In its later form (see, e.g., Baddeley, 1983, 1986), the articulatory subsystem is viewed as more active and made up of a phonological input store and an articulatory rehearsal process. A second subsystem is the visuo-spatial scratchpad, or as Baddeley (1986) prefers, the visuo-spatial sketchpad. This subsystem is described as being specialized to maintain and manipulate visuo-spatial images. It resembles the articulatory loop in that it is basically an input store. Further, it too is regarded as active in the sense that memory traces are thought to be regenerated by a process outside the store itself. Finally, the central executive is the subsystem assumed to coordinate information from the articulatory loop and visuo-spatial sketchpad. It serves the role of deploying attentional resources and of selecting and operating central control processes and strategies.

III. A Connectionist/Control Architecture for Working Memory

A. ARCHITECTURAL PRINCIPLES
In this section we examine working memory from the perspective of a new architecture. Rather than using a traditional computer metaphor for the structure, we propose a commingling of ideas from neurophysiology, connectionist modeling, and controlled and automatic processing theory. Five principles suggest architectural constraints. First, we assume that processing occurs in a network of modules having a similar structure but differing in their inputs and outputs. This is suggested by the similarity of structure of hypercolumns of cells in the cortex of the brain (see Mountcastle, 1979), except that the hypercolumns differ in their input and output connections. Second, we assume local specialization of function, i.e., that a given module specializes in a particular class of processing. For example, semantic modules may process words from a given semantic class. Evidence from neurophysiology suggests that a small region of cortex specializes in processing a small set of stimuli from a specific class, e.g., a 1 mm area of V4 visual cortex processes lines of given angles and colors from a 2° area of the visual field (Desimone, Schein, Moran, & Ungerleider, 1985). Cortical maps of the connection anatomy between regions of cortex are becoming very detailed in function (see Van Essen, 1985), suggesting that there is a great deal of specialization of the connections among small areas (e.g., 10 mm²) and localization of function.
Third, we assume that knowledge is stored in the connection weights between neural-like units in the system. Physiologically, the connection weights are likely to be the synaptic dendrite connections between neurons. The strength of the connection, or the size of the weight, is assumed to change with learning. The greater the weight between the input and output unit, the more the input unit activates the output. Storing information in connection weights is the defining characteristic of connectionist modeling (see Rumelhart & McClelland, 1986b; Schneider, 1987), and connections are very prevalent in the cortex. The connections provide an associative memory, such that a pattern in one module can evoke a pattern in another module. Associations are stored distributively, typically with many patterns in the same set of connections (see Hinton, McClelland, & Rumelhart, 1986). We assume that input to a module is a vector of activation; e.g., the letter A might be coded as 0, 1, 1, 1, 1, where the 0s and 1s represent the absence and presence of features, e.g., vertical lines, horizontal lines, backward slant, and forward slant. The set of connections, i.e., the association matrix, can store only one association per input vector, yet it can store approximately half as many random association pairs as there are connections. If input vectors are correlated, there is greater interference between the output associations (see below). Fourth, we assume the connection weights may change with a variety of rate constants. The rate constants determine how rapidly the connections change as a function of interpolated learning and the retention interval. Hinton and Plaut (1987) have demonstrated that having fast and slow rate constants in connectionist models can speed learning, reduce retroactive interference, and speed recovery of previously learned material. They refer to connections with large rate constants as fast weights because they change quickly.
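The idea that associations live in a matrix of connection weights can be illustrated with a minimal linear associator. The feature codings, vector sizes, and threshold rule below are assumptions made for this example, not the authors' simulation: each stored pair adds the outer product of its input and output vectors to the weight matrix, and presenting an input retrieves its associate by a weighted sum.

```python
# A minimal linear associator: knowledge is stored solely in the weight
# matrix W, and retrieval is a weighted sum through those connections.

def train(pairs, n_in, n_out):
    W = [[0.0] * n_in for _ in range(n_out)]
    for x, y in pairs:
        for i in range(n_out):
            for j in range(n_in):
                W[i][j] += y[i] * x[j]       # Hebbian outer-product update
    return W

def recall(W, x):
    # each output unit sums its weighted inputs; threshold at half the
    # input's self-activity to binarize the result
    raw = [sum(w * xj for w, xj in zip(row, x)) for row in W]
    theta = 0.5 * sum(v * v for v in x)
    return [1 if r > theta else 0 for r in raw]

# Two orthogonal (uncorrelated) input codes avoid cross-talk between the
# stored pairs; correlated inputs would interfere, as the text notes.
A_in, A_out = [1, 0, 1, 0], [1, 1, 0]
B_in, B_out = [0, 1, 0, 1], [0, 1, 1]
W = train([(A_in, A_out), (B_in, B_out)], n_in=4, n_out=3)
print(recall(W, A_in))  # recovers A's associate: [1, 1, 0]
```

Replacing the orthogonal codes with overlapping ones makes the retrieved vectors blend, which is the interference between correlated input vectors described above.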
Connections with low rate constants are called slow weights. At first glance one might object to multiple-speed weights as unparsimonious; however, neurophysiological evidence currently points to the existence of over 50 neuromessengers, with time courses ranging from milliseconds to 30 min (Siggins & Gruol, 1986); even a simple motor ganglion synapse exhibits three time constants (Barrett & Magleby, 1976). Mishkin, Malamut, and Bachevalier (1984) have proposed the existence of fast and slow learning based on monkey studies in which limbic lesions disrupt immediate memory for events occurring a few minutes prior without disrupting memory acquired slowly (i.e., after several trials) and tested after 24 hours. Consequently, it seems prudent to assume that multiple-speed weights exist, rather than a single-speed weight. Fifth, we assume a modulatory control system that regulates the flow of information among modules. This system has limited memory relating to control processes. It is the mechanism that produces attentional phenomena, in effect facilitating the sequencing and refreshing of information in the network. The control-processing system is a version of the system
(CAP1) for implementing automatic and controlled processing (see Shiffrin & Schneider, 1977; Schneider, 1985; Schneider & Mumme, 1987). We describe the architecture for working memory at three levels of detail. The microlevel represents a potential neural-like network that can produce associative processing and attentional phenomena, e.g., how visual features are associated to a code for a letter. The macrolevel represents the attentional control and message transmissions within the system, e.g., how memory scanning occurs. The system level represents the interactions among regions, e.g., how visual and auditory message transmissions are coordinated and how contextual biasing of message association occurs. The micro and macro levels of the model are the same as those used in the CAP1 model (see Schneider & Mumme, 1987). It is important to understand the relationship among the three levels. We recommend that the reader first get an overview of the three levels by examining Figs. 1-3, reading the captions, and then reading the text. Readers who are more familiar with buffer models than connectionist models might benefit from examining the figures in a top-down order: the system level (Fig. 3), illustrating regional processors and levels of processing; the macrolevel (Fig. 2), illustrating buffer phenomena and sequential processing; and the microlevel (Fig. 1), illustrating how a neural-like system could store, categorize, and transmit information. The following text goes bottom up, i.e., micro, macro, and system, illustrating how each level is built from elements of the previous level of detail.

B. MICROLEVEL STRUCTURE
Figure 1 illustrates the microlevel structure of the model. Information processing is assumed to occur in modules, e.g., M3 in Fig. 1. The message is represented by the state of the output units of the module. The set of activities of the output units is the message vector (MV) for that module, e.g., a code of 0, 1, 1, 0, 0, 1, 1. Each output unit sums the activity of its inputs. Associative knowledge is stored in the connections between the message vector and output units. Learning involves changing these connection weights. The activation of each output unit is a logistic function of its input. The logistic function produces a graded output as a function of the input, with both a minimum (no firing) and a maximum firing level of output. The output of the whole module is modulated by an attenuation unit. This unit modulates the vector message as a whole. If the attenuation unit is fully activated, all of the output units are inhibited and no message vector is output from the module. If the attenuation unit is not activated, there is no inhibition and the output units transmit the message vector at full strength to the modules at the next level of processing. In the CAP1 simulation, attenuation is implemented by multiplying all the output units of the module by a fraction (the attenuation level) to determine the strength of the message vector. Within each module, different types of cells, called report cells, send scalar information to controlled processing. The activity report from the
60
Walter Schaeider and Mark Detweiler
Fig. 1. Microlevel structure of the CAP1 simulation. Processing is assumed to occur in networks of neural-like units. Units are organized into modules (the box labeled M3 outlines the third module) that process a particular class of inputs. Information between modules is transferred as a Message Vector (MV) on fibers connecting the output of one module to the input of the next. In the diagram, information flows from left to right (e.g., the top-left MV might encode visual features, the two left modules letters, and the right module words). Each module contains a vector of output units (the seven small triangles in each module). The output units receive input from other modules and connect autoassociatively to themselves. The recurrent connections from the bottom of each output unit going up and connecting to the other output units in the same module represent the autoassociative connections. Each of the crossing points above the output units (to message vector or autoassociative fibers) represents an associative connection whose strength of connection can be changed with learning. In the rest of the diagram the reverse arrow-type connections represent excitatory influences and the flat connections represent inhibitory influences. A module's output is controlled by an attenuation unit (the large circle) within the module. The attenuation unit regulates information flow from the module. Each module's activity is regulated by a control structure (the box labeled C3 represents the control structure for the third module). Each module reports its activity to the lower-level control structure via activity report and priority report units. The lower units (labeled 1, 2, 3) illustrate a potential control circuit. Cell 1 receives the activity reports from the module and inhibits the activity of neighboring modules. Cell 2 inhibits Cell 3, reducing the attenuation activation, thus reducing inhibition of the output units, thus enabling a message vector to transmit.
Cell 2 is assumed to habituate, resulting in a burst of output and sequential switching of attention.
module to the control level communicates the module's assessment of the activity and importance of the current vector within the module.1 The control level uses the activity report to determine whether an input is recognized, if there is a match between inputs, or that a module has something

1In the CAP1 simulation there are two types of report units. The activity report is a measure of how active the module is, e.g., the sum of the squared activity of all output units. The priority report is a within-module association between the vector message and a priority tag; it indicates how important the present message is for further processing. A local circuit allows the priority cell to automatically transmit the vector, i.e., without modulation by external control processes. This provides the mechanism for automatic processing, which is the main topic of Schneider and Mumme (1987).
Working Memory Architecture
to transmit.2 The control processing for the system provides a method to sequence message transmissions in the network. The microlevel structure is based on general features of cortical neurophysiology. The output cell units have connections similar to cortical-cortical pyramidal cells, the report cells to cortical-subcortical cells, and the attenuator cells to chandelier cells (see Schneider & Mumme, 1987; Szentagothai, 1979). The box labeled M3 in Fig. 1 illustrates a simple circuit that allows messages from a set of modules to be output sequentially, as might occur in a memory-scanning experiment (Sternberg, 1966). The control structure coordinates the activity of multiple modules. If multiple messages are transmitted concurrently, interference results and information is lost. The above circuit illustrates how the control system can enable one module's message transmission, while inhibiting neighboring modules' transmissions (see Schneider & Desimone, 1985, for details). The proposed microlevel structure shows some parallels with the available evidence of cortical hypercolumn anatomy (see Schneider & Mumme, 1987).
C. MACROLEVEL STRUCTURE

Figure 2 illustrates the macrolevel interactions of a set of modules. The macrolevel of the model has two types of processing: a message type and a control type. Message-type processing involves sending information messages, i.e., MVs of activation from one module to another. An MV would represent a large vector, e.g., of size 200 in the simulation. Control-type processing involves monitoring the message traffic, clearing modules, and modulating the transmission of messages. These functions can be implemented with a circuit similar to that shown at the bottom of Fig. 1 (C3). Observe from Fig. 2 that there are three lines between the message vector and the control level. These lines carry scalar information and each represents only a single fiber. The control region receives an activity report regarding the activity of the current message (see Figs. 1 and 2). The FEEDBACK signal sets the autoassociative feedback within the module (see Fig. 2). The TRANSMIT signal (Fig. 2) enables the MV to be output by reducing the amount of attenuation of the output units (see Fig. 1). The modules are arranged in levels and regions. Levels represent different processing stages,

2For details of monitoring activity reports using external control process modulation and the development of automatic processing, see Schneider and Mumme (1987).

3In the control circuit (see C3 in Fig. 1), memory is implemented by the state of habituation of Cell 2. If two modules need to transmit, they will both inhibit each other. The module with the higher activity/priority will win, blocking the transmission of its neighbors. While the higher activity message is being transmitted, Cell 2 will habituate (see Fig. 1, bottom). After habituation, the second module will transmit its message and inhibit the transmission of the first module's message, thus enabling a sequential readout of messages.
Walter Schneider and Mark Detweiler
Fig. 2. Macrolevel structure. Each square represents a module in Fig. 1 (e.g., M3). The thick lines represent MVs that output an activity vector from one module to the next. The MV output flows from left to right. The thin lines represent control information between the level control and modules and between level control structures. The arrows indicate the direction of control information flow. The output of modules is modulated from a level control structure. (This is similar to the control circuit C3 of Fig. 1.) The control structure receives an ACTIVITY report from each module and outputs a FEEDBACK and TRANSMIT signal to the module. The FEEDBACK signal determines the autoassociative feedback within the module. The TRANSMIT signal reduces activation of the attenuation unit to allow output of the vector to other modules. Processing is assumed to occur in a series of levels. Each level communicates two control signals to the next level. The LOAD signal indicates a message should be loaded at the next level; the NEXT signal indicates the next level recognizes the message sent from the previous module and is ready for the next input. The figure illustrates how the sequentially loaded letters C, A, T can be transmitted as a group to the first word-level module of the word CAT.
e.g., visual dots, lines and bars, letters, and words, in which one level feeds information to the next. Each level communicates to the preceding and succeeding levels with two control signals. The LOAD signal between level control structures indicates that a level is transmitting to the succeeding level. The NEXT signal indicates to the preceding level that a signal has been recognized and that the preceding level can reset itself in preparation for additional input from its predecessors. Regions represent sets of levels specializing in a particular type or mode of processing, e.g., visual, auditory, motor, semantic, and lexical. Modules at one level of processing transmit vector messages to the next level of processing. The outputs from a region can occur in several modes. Modules can be loaded sequentially and then transmitted as a set to buffer input. Figure 2 illustrates how one might hear the letters C, A, T, sequentially buffer each
input, and transmit the set CAT to the next level of processing. To buffer output, the modules can be loaded as a set and then transmitted sequentially. The inhibition among the modules within a set (see C3 in Fig. 1) would produce the sequential behavior. This scheme is capable of implementing a sequential output system similar to the typing model proposed by Rumelhart and Norman (1982) (see also Section VII,A below). The sequential output mechanism can also be used to accomplish tasks such as memory comparison and visual search (see Schneider & Mumme, 1987).

D. SYSTEM-LEVEL STRUCTURE
Figure 3 illustrates the system-level interactions of regions of modules. Each region of processing may be a series of levels of modules and control structure as depicted in Fig. 2. Two types of processing that are analogous to the macrolevel exist at the system level (Fig. 3B). The central control structure receives activity reports from each region and modulates the transmission of messages in the central innerloop. The five control signals between the regions and the central control structure are analogous to those within a level, i.e., ACTIVITY, RESET, TRANSMIT, LOAD, and NEXT, except that the RESET signal involves resetting the control sequencing within a region, rather than changing the feedback within levels. The innerloop of processing refers to the communication between modules from each region that have connections to other regions. The central control system can be implemented using hardware similar to that used for controlling a single module (see C3, Fig. 1). The difference between the two is that the central control system receives the control signals from each region and routes the control signals among regions; e.g., if the motor region requests the next message, the central control structure may route the request to the speech-transmitting module. In contrast, within a region the LOAD and NEXT signals come from the next level within a region. The central control structure modulates the output of regions transmitting on the innerloop. As a result of preprocessing in each region, the central control structure need only process a single scalar value (the activity report) from each region and not directly deal with vector messages. The distribution of control among modules, regions, and the central control system avoids the homunculus problem, i.e., delaying all complex processing to a later stage that cannot be specified. An important feature of the system level is that there is no central executive through which the messages pass. 
A simplified system might have a single region receiving input and output from all other regions (see Baddeley, 1986; Barnard, 1985; Broadbent, 1984). The advantage of the current cross-connection system is that each region can communicate with other regions without passing through another module. This enables faster single-message transmission and allows multiple regions to jointly activate a
Fig. 3. A, System-level description of the model, a top-down view of the regions of processing within the system. Each region represents a series of processing levels, as in Fig. 2. The first or last level of a region (last level for input regions and first for output regions) is assumed to input to the innerloop of connections between regions. The modules on the innerloop have separate message vectors to each of the other modules they connect to. All the lines in Fig. 3A represent MVs (see Fig. 1). The context module sends an MV to all the other modules on the innerloop. The output for the context module is highlighted to illustrate this connection pattern. This figure represents a simple view of one of many possible connection patterns for regions on the innerloop. B, Side view of the system-level architecture. All of the regions of the innerloop connect to a central control system that routes control signals between regions on the innerloop. The system manages message traffic on the innerloop to maintain reliable communications across regions.
region. If regions automatically transmit their high-priority messages (see Schneider & Mumme, 1987), small numbers of high-priority messages can be rapidly processed without requiring controlled processing. The disadvantage of a cross-connection system is that concurrent message transmissions can cause interference. To avoid such interference, the central control structure sequences the transmissions serially (see Schneider & Detweiler, in press, for discussion of compensatory activities to limit interference). This forced sequencing can result in delay or omission of messages; e.g., if too many messages wait for transmission, the regional activity report may decay before the central controller enables the transmission.
The message vector connections among modules are not a single pathway or bus. Each region has its own set of fibers to other regions. In Fig. 3A, the darkened line from context illustrates how each module has its own pathway to the other modules. Interference is not based on whether a vector is being transmitted, but rather on whether the receiving module receives competing messages. Note that there is an independent association matrix between each input and output module, i.e., a set of connections between the transmitting vector and each receiving module. Thus the auditory region might transmit a vector for the sound of the word pen to the semantic region (evoking the meaning of the object) and to the speech region (evoking the speech output of the word). At the same time, the visual system might transmit a vector representing the visual features of a left arrow (evoking a left motor movement in the motor region), and a change in orientation in the spatial region. A neural system with large vectors could support many dissimilar concurrent transmissions without substantial degradation of performance.4 This system contrasts with Broadbent's (1984) Maltese cross view of memory in which all messages must flow through a single central-processing system. It also contrasts with single-bus architectures (such as Barnard, 1985). It is not the number of messages transmitted, but rather the number of competing messages received that determines limits on the number of concurrent message transmissions. If the region can determine when it is receiving interfering messages and signal the central controller, the central controller can then begin sequencing transmissions of regions waiting to transmit. The proposed system-level architecture parallels the amygdaloid complex in the brain, which is characterized as having direct and extensive connections to all cortical sensory systems (see Mishkin & Appenzeller, 1987).
The convergence and interconnections of the memory region to the amygdala could provide an innerloop style of communication.
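The claim that limits arise from competing messages at a receiver, not from the number of concurrent transmissions, can be illustrated with a toy sketch. The vectors, codes, and outer-product matrices below are hypothetical stand-ins for learned association matrices, not CAP1 parameters:

```python
import numpy as np

# Toy illustration: interference arises at the *receiving* module, which
# sees the superposition of concurrent messages. All quantities here are
# hypothetical stand-ins for learned weights.
rng = np.random.default_rng(3)
n = 200
msg_a = rng.choice([-1.0, 1.0], size=n)
msg_b = rng.choice([-1.0, 1.0], size=n)
code_a = rng.choice([-1.0, 1.0], size=n)   # code msg_a should evoke
code_b = rng.choice([-1.0, 1.0], size=n)   # code msg_b should evoke

W_a = np.outer(code_a, msg_a) / n          # association for msg_a
W_b = np.outer(code_b, msg_b) / n          # association for msg_b

def corr(x, y):
    return float(np.corrcoef(x, y)[0, 1])

# One message at a time: the evoked code is clean.
alone = corr(W_a @ msg_a, code_a)
# Two concurrent messages into the same module: a degraded mixture.
together = corr(W_a @ msg_a + W_b @ msg_b, code_a)
```

Sending the two messages to different receiving modules would leave both evocations clean, which is the point of giving each region its own fibers.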
E. CONTEXT STORAGE

Context plays a critical role in maintaining working memory in a system with many connections. For purposes of illustration we have labeled one of the regions in the innerloop as context (see Fig. 3, left side). We assume that context is a continuously varying representation of the internal state of the individual. It could be implemented in several ways. For instance, context could be the current state of the system, e.g., time of day, emotional state, hunger, or it could be a randomly varying vector, e.g., with each unit of the vector having a probability .5 of changing state every minute. The context vector can output its message to the innerloop. As with the other regions, an association matrix exists between the context vector and each region to which the context vector connects. Within a region, the context vector may connect to many modules. The context associations provide the potential

4An order of magnitude of 10,000 would seem reasonable based on the physiology of hypercolumn interactions (Mountcastle, 1979).
for a very large working memory storage. To illustrate, assume one has R regions and N modules connecting to the innerloop from each region. The context vector could then be associated to (R - 1)N modules. Assume further that there are 20 regions in the innerloop and 30 modules in each region that connect directly to the innerloop. A single context vector and its association matrices could store 570 codes (one/module). It would be unlikely that all the regions would ever have an association to a specific context code (see below). However, the system would certainly have the potential for storing much more than the 5 ± 2 typically associated with STM (Mandler, 1967). Storage is dependent on where learning takes place and what code is active in the receiving module. In the CAP1 model, associative learning occurs whenever a vector input is followed by a TRANSMIT-activated release of the vector that was previously stored in the module.5 Using the same learning rule here, the only connections modified after a transmission are those that were activated by the input and were connected to a module that received a TRANSMIT control signal shortly after the input transmission. For example, assume a transmission occurs from a module in the auditory region and the context region, followed by a controlled-processing release of a vector from the motor-system module that controls the hand. The only connections that would be modified would be those on incoming auditory and context fibers in the motor module controlling the hand. Although many modules may receive the auditory transmission (see Fig. 3A), only connections in the transmitting modules are changed. Attention in this architecture is the TRANSMIT-activated output of the MV (see Schneider & Mumme, 1987). Attention allocation is limited and varies with subject strategies. What is actually associated to the context is limited by what is attended (see Fisk & Schneider, 1984).
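The storage arithmetic above is easy to check; a minimal sketch (the function name is ours):

```python
# Back-of-envelope check of the context-storage arithmetic in the text:
# a context vector can associate to the (R - 1) * N innerloop modules
# outside its own region, one code per module.
def context_capacity(regions, modules_per_region):
    return (regions - 1) * modules_per_region

print(context_capacity(20, 30))  # -> 570, far more than the 5 +/- 2 of STM
```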
The availability of stored information is dependent on the learning constant and decay rate for connection weights. Most connectionist models use some type of delta learning rule in which the connection strengths are changed as some proportion of the difference between the output vector and the input vector (see Hinton & Sejnowski, 1986). The proportion ranges from 0 to 1.0, with values less than .1 being typical. The larger the learning constant, the smaller the number of learning trials needed. However, the larger the learning constant, the more serious is the problem of retroactive interference (see below).

5If CAP1 were to allow associative learning to occur after an automatic transmission, the association matrices would deteriorate and the network could no longer perform comparison tasks (Schneider & Mumme, 1987). Limiting learning to only modules that are attended, i.e., that receive TRANSMIT signals, coincides with experimental data suggesting that learning occurs only after controlled processing and not after automatic processing (see Fisk & Schneider, 1984).
The decay rate of connection weights determines how long the previous associations influence the output of a vector. Fast decay weights are advantageous because the association matrices can be used to provide a working memory that is unaffected by the information stored, say, five time constants earlier. For example, if the weight were to decay to half strength in one minute, there would be no proactive interference from any vectors stored five time constants earlier; the connections would have decayed to only .03 of their original strength. Fast decay weights are disadvantageous because the decayed information can no longer be retrieved, e.g., an association learned five time constants earlier cannot be retrieved. A system with multiple learning and weight decay rates can substantially enhance the performance of the network. Hinton and Plaut (1987) have found that having both fast and slow weights can substantially speed learning. Typically one set of weights has a high learning rate and decays rapidly. A second set of weights has a lower learning rate and decays more slowly. The fast weights learn quickly so that after a small number of trials the input can evoke the output. During the same trials, the slow weights change gradually. After a large number of trials the fast weights have less influence due to the effects of retroactive interference, weight decay, and the buildup of the slow weights (see Hinton & Plaut, 1987, and below). This multiple-weight scheme allows the network to temporarily alter the connection-weight space so that older memories can be recovered. For example, assume someone has learned a foreign language as a child but does not speak it regularly. The connection weights between the semantic and speech modules are modified as a result of practice with the current language. In a situation in which use of the first language is necessary, the fast weights change during the first few minutes of conversation.
This alters the connection space so that it more closely approximates that of the first language. Note that although much of the previous first-language knowledge returns, no significant change occurs in the long-term weights. As the person resumes use of the current language, the short-term weight changes may temporarily reduce availability of the current language. However, after a change in context or waiting five time constants, the person should be able to operate in the current language with no deficit. For the present discussion of working memory we assume that the context-region connections have fast learning and fast decay rates compared to the rates of the other regions in the system. This separation matches the physical separation of rapid and slow learning (Mishkin et al., 1984) occurring in different parts of the brain. It may be the case that all of the connections in the system have both fast and slow change rates. Nevertheless, for purposes of illustration it is easier to refer to (1) the context weights, implying that these are the fast learning rate and fast decay connections, and (2) the information weights (connections other than context weights), implying that they have relatively slow learning and decay rates. To help illustrate these ideas, consider
a simple learning example. If one were to try to associate a visual shape to a word presented auditorily, a fast-weight change would occur between the context and the speech output region, and a slow-weight change would occur between the visual and speech systems. After learning a single paired associate, one could perform 20 reaction-time tasks, e.g., saying the word for the shape using primarily the context associations. However, if one had to remember the shape-word pairing after an hour or had to learn five sets of such pairs, the context associations would be of little value due to the effects of weight decay or proactive interference with the other learned codes. The issue of working memory in this architecture is multifaceted, consisting of many memories within the system. At the microlevel, the activity level of each output unit decays with some time constant. Each module is assumed to have feedback connections that result in the categorization and maintenance of codes. The combination of feedback and decay may allow a module to maintain a well-known code indefinitely,6 assuming there are no new inputs to the module. In order for information inside a module to influence the rest of the network, it must be transmitted out of the module. This depends on the activities of the regional controller. This controller is assumed to have memory concerning which modules have been transmitted and of the activity and priority of the messages waiting to transmit. The regional controller may function as a buffer memory allowing the storage of a few vectors. In this form the regional controller illustrates phenomena similar to the buffer models of STM, e.g., Atkinson & Shiffrin (1968). In addition to the memory resulting from dynamic activity, a great deal of knowledge can be stored in the connection weights in the network. Much of this probably represents slow weights and should be considered LTM. 
However, we assume there is at least one set of fast weights connected to a context vector that is broadly connected to at least the innerloop of processing regions. This vector facilitates one form of intermediate storage and enables context storage of information. By associating vectors to context, working memory can be reloaded by transmitting the context vector.
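The fast-weight/slow-weight behavior discussed above can be sketched in a few lines. The learning rates, decay rates, and vector sizes below are hypothetical choices made only to show the qualitative pattern, not parameters from Hinton and Plaut (1987) or CAP1:

```python
import numpy as np

# Hypothetical fast-weight / slow-weight sketch (after Hinton & Plaut,
# 1987). All rates and sizes are illustrative, not CAP1 parameters.
rng = np.random.default_rng(0)
n = 100
stimulus = rng.choice([-1.0, 1.0], size=n)   # input vector S
response = rng.choice([-1.0, 1.0], size=n)   # desired output R

fast = np.zeros((n, n))   # high learning rate, fast decay
slow = np.zeros((n, n))   # low learning rate, slow decay

for _ in range(5):        # a few massed learning trials (delta rule)
    err = response - (fast + slow) @ stimulus
    fast += 0.4 * np.outer(err, stimulus) / n
    slow += 0.02 * np.outer(err, stimulus) / n

# Immediately after learning, recall is nearly perfect, carried mostly
# by the fast weights.
immediate_error = np.abs(response - (fast + slow) @ stimulus).mean()

# Retention interval: five fast-weight half-lives versus mild slow decay.
fast *= 0.5 ** 5
slow *= 0.99

# The small slow-weight trace now outweighs the decayed fast weights.
fast_share = np.linalg.norm(fast)
slow_share = np.linalg.norm(slow)
```

After the retention interval the fast weights retain about 3% of their strength, so what little association survives is carried by the slowly built, slowly decaying weights, which is the division of labor the text assigns to context versus information weights.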
F. SIMULATION METHODS

The simulation results described in this article were run using the CAP1 simulation program described in detail in Schneider and Mumme (1987). The model includes the associative and autoassociative models of J. A. Anderson's (1983) brain-state-in-a-box model. The model is a connectionist model with a control structure. The results reported in this article represent robust characteristics of the architecture and are not dependent on detailed parameter searches.

6The feedback only helps for patterns the module has previously learned (see Schneider & Mumme, 1987).
The components of the model are illustrated in Fig. 1. Each module is made up of a 200-element vector of output units. Each output unit sums its input linearly with the decayed value (typically 0.9) of the activity of the unit on the previous iteration. The output of a unit is a logistic function of the input with limits of +1.3 and -1.3 activity. Each output unit connects autoassociatively to half the other output units in the module. Autoassociation provides feedback; the strength of feedback varies between 0 and .6, depending on the level of control feedback input (see Fig. 2) to the module. Each output unit connects to half of the other output units in the same module and to half of the units in other modules. With 200-element vectors, a single output unit connects to 100 output units in its own module and 100 units in every other module that receives the message vector. The autoassociative and associative connection matrices are each 20,000 connections per module.7 The associative matrices between modules were initiated with a zero strength of connection between all elements. Learning was accomplished using a Hebb-type learning rule that modified the strength of connections so the input pattern would come to evoke the output pattern. CAP1 uses a delta or Widrow-Hoff learning rule (see J. A. Anderson, 1983; Kohonen, 1984). The strength of connection between the input and output is updated by a learning constant multiplied by the differences between the desired output and the output evoked by the input. In terms of matrix algebra, the change rule was delta A = C(R - AS)S^T, where A is the associative matrix, R the response (desired output) vector, S the stimulus vector, S^T the transposed stimulus vector, C the learning constant, and delta A the change in the strength of association. The output of the module is determined by the attenuation unit. To transmit a message from a module, the level control would activate the TRANSMIT signal (see Fig. 2).
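The matrix-algebra rule translates directly into code. In the sketch below the stimulus is normalized to unit length so that a learning constant of 0.1 shrinks the error by a fixed fraction per pairing; the simulation's own scaling of binary vectors may differ:

```python
import numpy as np

# Delta (Widrow-Hoff) rule from the text: delta A = C (R - A S) S^T.
def delta_update(A, S, R, C=0.1):
    return A + C * np.outer(R - A @ S, S)

rng = np.random.default_rng(1)
n = 200
# Normalized so S . S = 1 and the error shrinks by (1 - C) per pairing.
S = rng.choice([-1.0, 1.0], size=n) / np.sqrt(n)
R = rng.choice([-1.0, 1.0], size=n)

A = np.zeros((n, n))
for _ in range(30):
    A = delta_update(A, S, R)

# After repeated pairings the stimulus evokes the response almost exactly.
residual = np.abs(R - A @ S).max()   # about 0.9**30 of the initial error
```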
This signal inhibits the attenuation unit (Fig. 1), allowing the output units to transmit the message vector to the modules at the next level of processing. The attenuation is a multiplication of the activity of the output units (when transmitting, the strength is 0.3 or 70% attenuation; when it is not, it is 0 or 100% attenuation). All input codes are random 200-element binary vectors with a specified correlation. All the output codes had a zero correlation. The sequential context vectors were correlated to the previous context vector (typically .9). For example, between trials 10% (20 elements) of the context was changed from trial to trial. Before a simulated experiment was begun, the autoassociative matrices within the modules were taught the ensemble of target patterns. This is

7The results described here generalize to much larger matrices. The physiological data suggest a much larger number of connections are involved even in small regions of cortex (Mountcastle, 1979). Simulations with small matrices do not generalize due to the effects of spurious correlations with small numbers of elements.
analogous to testing subjects in an STM experiment and assuming they enter the room with knowledge of English and will be tested using high-frequency words. Typically the system was taught 50 random vectors presented 10 times each. This modified the autoassociative matrix on each presentation with a decaying learning constant of .1 on pass 1, .09 on pass 2, .081 on pass 3, to .039 on pass 10. After this procedure, the autoassociative matrices would produce positive feedback that matched (correlation above .99) the input for all the patterns (see J. A. Anderson, 1983; Schneider & Mumme, 1987). When a module transmitted to another module, it transmitted a short burst, typically eight iterations of output to the next module.8 This produced a short burst of output to the next stage. The next module performed autoassociative processing during and after the burst to receive and clean up the vector message. The dependent measure of memory retrieval for the simulations is the percentage of vector match between the retrieved vector and the desired vector. An output unit could be in one of three states: high, activity above .5; neutral, activity between .5 and -.5; or low, activity below -.5. The degree of vector match was based on how many elements matched between the retrieved and desired vector using a three-state city-block metric. By chance, vectors should have an average error of one/output unit. The percent of vector match metric is the percentage above chance that the retrieved vector matched the desired vector. A 100% match is a perfect match, a 50% match implies an exact match on half the vector and chance match on the rest, a 0% match implies only a chance match. The probability of recalling an item would be a monotonic function of the percentage of vector match. The actual recall rate would depend on the number of vectors learned, feedback, and similarity of vectors. The reader should treat the percentage of match as a simple approximation of the expected recall.
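The scoring scheme can be sketched as follows. The exact coding is our reconstruction from the verbal description, not the CAP1 source:

```python
import numpy as np

# Reconstruction of the three-state percent-vector-match score described
# in the text (the coding details are our assumption): units are classed
# high (> .5), neutral, or low (< -.5), scored with a city-block metric,
# and rescaled so chance -> 0% and a perfect match -> 100%.
def three_state(v):
    return np.where(v > 0.5, 1, np.where(v < -0.5, -1, 0))

def percent_match(retrieved, desired):
    dist = np.abs(three_state(retrieved) - three_state(desired)).mean()
    chance = 1.0   # expected city-block error per output unit by chance
    return 100.0 * (chance - dist) / chance

v = np.array([1.0, -1.0, 1.0, -1.0])
print(percent_match(v, v))    # perfect retrieval -> 100.0
print(percent_match(v, -v))   # every unit wrong -> -100.0
```

Under this rescaling, a vector exact on half its units and at chance on the rest scores 50%, matching the text's description of a 50% match.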
All simulated recall trials were run until the network settled on a vector representation. This typically required fewer than 10 iterations. Each iteration involved four components. First, the activity of the output vector was decreased by the decay rate, e.g., each output unit's activity was set to .9 of what it was on the last iteration. Second, the input vector was multiplied by the associative connection matrix (see the cross connections between message vector and output units in Fig. 1). This produced the input activity vector (one element of the vector for each output unit). This vector was multiplied by the feedforward (or associative) constant and added to the vector of output units. This occurred for the first five iterations after the stimulus was presented. Third, the output vector of the previous iteration

8This bursting can be accomplished by having the control cells habituate. For example, if Control Cell 2 in Fig. 1 habituates, the system will tend to output in bursts.
was multiplied by the autoassociative connection matrix (see M3, the cross connection between the output units and themselves within the module in Fig. 1). This was multiplied by the feedback (or autoassociative) constant and added to the input of the output units. Fourth, the activity of each output unit was set to a logistic function (a sigmoidal transformation) of the input to the unit. In memory, the input for each unit at Level 2 would be
I_2i = d O_2i + f Σ_j O_1j W_ij + b Σ_j O_2j W_ij

O_2i = -m + 2m / (1 + exp(-a I_2i / m))
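The four-step iteration described in the text can be sketched directly. The toy weight matrix below is our stand-in; in CAP1 the matrices are learned, and the feedforward input is applied only on the first few iterations of a burst:

```python
import numpy as np

# One relaxation iteration of a module: decayed previous activity plus
# feedforward associative input plus autoassociative feedback, squashed
# by a logistic bounded at +/- m. Defaults follow the text's parameters.
def iterate(O2, O1, W_assoc, W_auto, d=0.9, f=0.3, b=0.6, a=2.0, m=1.3):
    I2 = d * O2 + f * (W_assoc @ O1) + b * (W_auto @ O2)
    return -m + 2 * m / (1 + np.exp(-a * I2 / m))

rng = np.random.default_rng(2)
n = 200
O1 = rng.choice([-1.0, 1.0], size=n)   # message from the prior level
O2 = np.zeros(n)                       # receiving module starts silent
W_assoc = np.outer(O1, O1) / n         # toy association mapping O1 -> O1
W_auto = np.zeros((n, n))              # autoassociation omitted for brevity

for _ in range(8):                     # a short transmission burst
    O2 = iterate(O2, O1, W_assoc, W_auto)
# O2 has now settled on the transmitted pattern's signs.
```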
where the first subscript represents the level of processing and the second the unit within the level. The constants are d, decay; f, feedforward associative input; m, the maximum absolute value of the activity level; a, the slope of the logistic function; and b, feedback autoassociative input. I is the input activity, O the output activity, and W the connection strength. In the following discussion we describe only changes from the above parameters. The default parameters were the following: feedback, b = .6; feedforward to the next stage, f = .3; decay, d = .9; slope of logistic function, a = 2; number of iterations of a burst, 5; correlation of context vectors, .9; correlation of response vectors, .0; number of iterations before match determination, 8; and learning constant, 0.1. Typically 4 to 24 simulation runs were sufficient to produce stable data. A typical free recall sequence involves presenting a list of vectors to be learned and then having the network recall the vectors. On each trial the connection weights between the input and output vector are modified using the delta rule. The matrix is then presented with the input, whereupon the output is evoked and categorized via autoassociative feedback. The percentage of vector match is calculated, and this is considered the measure of learning. At the end of learning a series of vector pairs, all of the inputs are re-presented to the matrix and the end-of-list vector match is calculated. The end-of-list percentage of vector match is a metric for retention.

IV. Interpretation of the Working-Memory Literature
The present architecture provides an approach for representing a class of models of working memory. Within this architecture the modeling process is one of weaving together a set of parameters, modules, and connections to produce specific models for particular phenomena of working memory. We hope that a very similar structure can be used to represent many regions of
12
Walter Schneider and Mark Detweiler
processing. However, the brain is not a parsimonious processing system. We expect the human processing architecture to include many variants of one, or perhaps several, architectures. In this section we describe how the major phenomena of working memory can be interpreted within the connectionist/control architecture. Illustrative simulations of some of the core concepts are presented. The full elaboration of this system will require extensive research and modeling which we hope will be accomplished by a variety of researchers.9

The architecture includes a variety of processing elements to accomplish stable processing of real-world input. Context-based storage is critical to prohibit catastrophic failures of memory. For example, if one is distracted by a phone call in the midst of writing, one can reengage the ideas focused on when interrupted, even though the phone call may have required temporarily storing 20 chunks of information. The control of module transmissions is necessary to allow one to attend to information in situations with multiple stimuli, or information overload (see Schneider & Mumme, 1987; Schneider & Detweiler, in press). The sequential loading and transfer from regions is necessary to handle sequential input or output. The use of the central controller is necessary to allow multiple regions to evoke common codes while limiting message interference so that critical messages are not blocked, e.g., either seeing a red light or hearing the word brake can cause one's foot to press the brake pedal, but one can also selectively ignore, or at least reduce the influence of, one of the modalities. We now examine how particular features of the architecture can be used to interpret working memory phenomena.

A. BUFFER PHENOMENA
As mentioned at the outset of this article, the classic or "modal model" (Baddeley, 1986) of STM is a buffer model of the type exemplified by the Atkinson and Shiffrin (1968) model. Three major phenomena are often cited in support of buffer models. These phenomena are very stable and have probably been replicated hundreds of times (see Baddeley, 1976, 1986, for reviews). The first is the recency effect in free recall. When subjects are given a list of words to recall, the last few items are recalled dramatically better than the rest of the items (see Postman & Phillips, 1965). Items in the buffer are lost due to displacement rather than due to time delay. For example, Baddeley and Hitch (1977) had subjects classify a list of 12 names as male or female followed by either immediate free recall, a 30-sec blank delay, or 30 sec of copying digits. In the first two conditions, recall of the last item was over 90%, whereas in the filled delay it was only 52%. The

9We are developing computer modeling tools we can provide to other researchers to help expedite explorations of and extensions to this architecture.
Working Memory Architecture
13
lack of a delay effect in the blank interval indicates that trace decay was not a major factor in producing the recency effect. A second related phenomenon is the STM decay effect caused by interference. Peterson and Peterson (1959) showed that the probability of recall dropped from 80% to 10% in an exponentially decreasing function if subjects counted backward by 3s for 3 to 18 sec, respectively. A third related phenomenon is a span effect based on studies of digit and word span. When presented with a string of digits, letters, or words, subjects can typically recall an average of 8.2 digits, 7.2 letters, and 6.3 words, when span is defined by the length of list recalled correctly 50% of the time (Crannell & Parrish, 1957). The present architecture includes buffers in the regional controllers and potentially at multiple levels within the regional controllers (see Fig. 4). Many such buffers exist, and they are essential for the sequential input and output of information. For example, in speech perception typically one to three phonemes make up a syllable, one to four syllables make up a word, one to four words make up a phrase, and two to four phrases make up a sentence. The storage capacity for a specific task depends on the specific codes, stages, and regions involved. If the higher-level representations have codes for the lower-level sets of buffers, a region may be able to store as many words as
Fig. 4. Levels of processing from input to output. This diagram represents the activity of patterns in the network that would occur after sequentially reading the letters for the word cat and outputting the motor movement to reproduce the word. The codes in the boxes show the state at the time of the last motor movement. At the top level, the features of the last letter are active. At the second level, visual feature sets code the letter positions. At the third level, the three letters have been combined into a single visual word code. This visual word code is transmitted as a single visual message to all the modules on the innerloop. The Level 4 motor unit receives the visual code cat and translates it into a motor code for the task of reproducing the word cat. The motor-task code is transmitted to the motor-sequence modules which store each set of movements for each letter. The "T" motor code is transmitted to the motor movement modules converting the code into three separate movements. The motor movement modules are sequentially output, causing the sequential strokes to produce the letter T. The dotted boxes in the center of the diagram represent other regions on the innerloop not involved in this specific message transmission. The context storage mechanism could enable reloading of the modules on the innerloop. The control signals are the same as those described in Figs. 2 and 3.
consonants, because there may be as many modules for words as there are modules for consonants. Human STM is essentially equivalent whether it is measured in number of consonants or words (see Murdock, 1961). To implement a buffer scheme in the proposed architecture there must be methods to clear a module, maintain a code within a module, and output the code from a module. Clearing a module can occur by several methods, including (1) inhibiting all the units to zero activity, or (2) increasing the decay rate and reducing the feedback. To clear memory in the simulations we used a decay time of 0.07 in the absence of feedback, i.e., with feedback set to zero. This effectively clears the memory for a module in about 0.2 sec. Maintaining a code in a module is accomplished by a combination of activation decay and feedback within the module. Within each module an autoassociative matrix (see J. A. Anderson, 1983) associates each previously learned vector to itself. This has the benefit of cleaning up and categorizing noisy inputs (see Schneider & Mumme, 1987; Anderson & Mozer, 1981). With a feedback parameter of .4 and a decay of .9, the system can maintain a well-learned code indefinitely. Note, however, due to the critical role of feedback the module cannot maintain novel codes unless they are similar to previously learned codes; this is because the net feedback is zero for novel codes dissimilar to the learned codes. In order to buffer and hierarchically code information, all of the modules at one level of processing must feed into each of the modules at the next level of processing (see Fig. 4). If the input "C," "A," "T" enters sequentially while reading, it is critical that the module containing the first letter not be displaced by the second or third letters. One method to implement this involves multiplexing the input from the previous stage.
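These module dynamics can be sketched in a small simulation. The following is our illustrative reconstruction, not the authors' CAP1 code: it applies the Level-2 update rule from the preceding section with an autoassociative feedback matrix built from two learned codes. The module size (64 units), the random ±1 code vectors, the initial and input activity strengths, and the function names are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64                                  # units per module (an assumed size)
A = rng.choice([-1.0, 1.0], n)          # a previously learned code
B = rng.choice([-1.0, 1.0], n)          # a dissimilar (roughly orthogonal) code
W_fb = (np.outer(A, A) + np.outer(B, B)) / n  # autoassociative matrix (M3)

def step(O, ext, b, d=0.9, f=0.3, a=2.0, m=1.0):
    """One iteration of the update rule: decay + feedforward input +
    autoassociative feedback, passed through the logistic squashing."""
    I = d * O + f * ext + b * (W_fb @ O)
    return -m + 2 * m / (1 + np.exp(-a * I / m))

def run(O, ext, b, iters=50):
    for _ in range(iters):
        O = step(O, ext, b)
    return O

O0 = 0.8 * A                              # module loaded with code A
cleared    = run(O0, np.zeros(n), b=0.0)  # no feedback: activity decays away
maintained = run(O0, np.zeros(n), b=0.4)  # feedback .4: A is held indefinitely
locked     = run(O0, 0.5 * B,     b=0.6)  # feedback .6 while dissimilar B arrives:
                                          # A survives the interfering input
```

With feedback off the loaded code decays toward zero; with feedback .4 it settles at a stable nonzero fixed point; and with feedback .6 the held code keeps its sign pattern even while a dissimilar code is being transmitted in, in the spirit of the locking effect described next.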
If the feature level could selectively gate input to each letter module, loading the second letter would not interfere with the code in the first letter module. However, this requires substantial hardware at the front end of each stage of processing and is unnecessary. Feedback locking provides a method to lock codes in buffers so that the input to a second module at a level will not interfere with the code contained in the first module. If a code is loaded into a module and maintained by high feedback, the combination of the feedback and the nonlinearities of the output function can block the interfering effects of input codes that are dissimilar to the loaded codes. In the simulation, increasing feedback to .6 after the module is loaded will maintain the code within a few percent, even if other dissimilar input codes are activated to load neighboring modules. To illustrate, we loaded vectors A, B, C into three modules, each of which could store any of the vectors (see also Fig. 5). The accuracy of storing the first vector in Module 1 was .964 after A was loaded into the first module, .947 after storing B into Module 2, and .943 after storing C into Module 3. To load a new vector into a module we set feedback to zero during the ini-
tial load process. This clears the old message. This is the scheme we use in the simulation. When a level receives a LOAD signal it sets the feedback of the modules storing the information to zero. This causes the old vector to decay and the new vector to be activated; the feedback is then increased, locking the code in the module. The proposed sequential loading scheme has trouble loading similar codes into neighboring modules, thus predicting human problems with similar input. The reason for this is that the feedback effect can easily block an orthogonal code but it has difficulty blocking a similar code. Sequentially storing two similar vectors (e.g., with half the elements equal) into neighboring modules at the same level can cause several times the error of storing random vectors. The size of the disruption depends on factors such as the degree of similarity of the input and output codes, the number of codes, and the duration and strength of feedback.10 Storing similar codes disrupts the noncommon portions of the code. This increases the probability of confusions but generally does not result in omissions. To appreciate the influence of code similarity on processing, recall that Conrad (1964) has shown that most errors in STM are acoustic confusion errors, even when the information is presented visually. This would be expected if the vector code for the information were acoustic and should be more likely if the other items in the remainder of the list were acoustically similar. Baddeley (1966) illuminated this effect further by conducting an experiment to assess whether STM would be more disrupted by acoustic or semantic similarity. Here subjects were presented sequences of five acoustically or five semantically similar or dissimilar words and asked to recall each sequence immediately following its presentation.
The influence of acoustic similarity was striking; subjects correctly recalled only 9.6% of the similar sequences as opposed to 82.1% of the dissimilar ones. The influence of semantic similarity was much smaller; in this case subjects correctly recalled 64.7% of the similar sequences and 71.0% of the dissimilar ones. The last step of sequential processing involves having all active modules output as a set to the next stage of processing. For example, after the elements "C," "A," and "T" are sequentially loaded into the letter level, the entire set can be transmitted in parallel to activate the word cat at the next level. This can be accomplished by reducing the activation of the attenuation units of all the active modules in a level (see Fig. 1). As long as there are few interconnections among modules at the same level, transmitting information out of a module will not disturb the contents of the other modules at the same level. This is important because it allows one level to

10We are in the process of investigating these relationships. Currently, presenting the module-related codes reduces the likelihood of the module retaining the noncommon portions of the code, resulting in more confusions.
output to the next level with every new input until the next level recognizes the set of inputs. In this way higher-level modules can detect matches even without a blank. For example, for the input "C," "A," "T" the outputs would be "C," "CA," and "CAT," at which point the next level would attempt to recognize the input via autoassociation (see J. A. Anderson, 1983; Schneider & Mumme, 1987). After recognition at level N + 1, that level would return a NEXT signal (see Fig. 2) to reset level N so the next set of elements could be buffered at level N. Sequential information can be encoded and stored in several ways. The first scheme involves positional coding. If the modules at a level are filled in a prescribed order, the connections for each module can provide positional markers, i.e., we assume that each module has separate fibers to the next level, and that each set of fibers codes positional information. The disadvantage of this scheme is that the level must always know which position it is in, and the later modules in a level are rarely used. A second scheme for sequential storage is to have each module use a context-sensitive code to code its information (see Wickelgren, 1969). Rumelhart and McClelland (1986a) have used such a scheme, coding Wickelfeatures (Wickelgren, 1969), to encode speech sounds. For context-sensitive coding, each module must include in its code the present input plus at least some features of the previous and following input items. For example, to code the input "C," "A," "T" in a context-sensitive code, each module would code "-Ca," "cAt," and "aT-" (where the - indicates a terminator and the lowercase letters indicate degraded or coarse coding of the neighboring letters). With context-sensitive coding the letters can be represented in any module and the next level would detect the word cat.
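As a concrete illustration, this context-sensitive coding can be sketched in a few lines. This is a simplified toy of our own construction: the "coarse" neighbor features are represented simply as lowercase letters, with "-" marking the terminators, as in the example in the text.

```python
def context_codes(word: str) -> list[str]:
    """Return one context-sensitive code per letter: the letter itself
    plus coarse (lowercased) codes of its neighbors, '-' at the ends."""
    padded = "-" + word.upper() + "-"
    return [padded[i - 1].lower() + padded[i] + padded[i + 1].lower()
            for i in range(1, len(padded) - 1)]
```

For example, `context_codes("cat")` yields `["-Ca", "cAt", "aT-"]`; because each code carries its own local context, the codes identify the word regardless of which modules happen to hold them.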
The advantage of a context-sensitive coding scheme is that the inputs can be in any position as long as the same context-sensitive code does not reappear in the input. In speech production, such a coding scheme allows a position-independent representation of nearly all the words in spoken English and can predict many speech errors (Wickelgren, 1969). For the purposes of the present model, such a code could allow sequential storage without requiring specific sequential positions to always occur in specific modules. Thus "CAT" could be represented in many spatial positions, or perhaps in an ambiguous context that could not otherwise be interpreted, e.g., "XCAT." A third scheme for sequential storage is dynamic reallocation. In this scheme all of the memory at one level of processing is allocated for storage of the first input and the modules are reallocated to store additional information as needed. For example, assume the sequential string ABCD is stored in four modules. Input A would be stored in all the modules, AAAA; B in one-half, AABB; C in one-fourth, ACBB; and D in another one-fourth, ACBD. Position information could be encoded either by position
markers, e.g., with Module 2 encoding position 1, 1, 3, 3 if there were 1, 2, 3, or 4 codes stored in the level, or by a context-sensitive code. The advantages of dynamic reallocation are the following: (1) all of the memory is always used, thus producing more reliable and faster processing for smaller numbers of items; (2) the system need not know the sequence length at the beginning of the sequence; and (3) the system degrades gradually with list length, rather than processing without error and then failing completely, as in a fixed-slot buffer memory. With dynamic reallocation or context-sensitive coding, the modules at one level of processing can be treated as a ring buffer, rather than as a set of fixed positional slots. In a ring buffer the first element of the ring is the position next to the last, and given that it is a ring, there is always a next element to store into. After all of the slots have been stored once, the next item stored replaces the oldest code in the ring. In a ring buffer of size M, one can store the last M sequential inputs. For example, assume the buffer has seven slots. As the sequential input "CATCH" is entered, the codes "-CAT" and "-CATCH-" could be identified. By varying the number of elements back that are activated, groups of one to seven letters could be output to the next stage of processing. Having information stored in a ring allows higher levels of processing to review the previous input until the limits of the ring are exceeded. Baddeley (1986, p. 80) suggests that the articulatory loop can store the last 1.67 sec of reading time for short words. Assuming it is always the last 1.67 sec, this suggests a ring-buffer type of allocation, rather than fixed slots which are reset after each punctuation mark. The process of sequencing items serially followed by a parallel output to the next level is illustrated in Fig. 5. The activation of each code and the changes in the control parameters are shown.
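The dynamic-reallocation scheme can be sketched as follows. The splitting rule used here (each new item claims the second half of the largest remaining run, earliest run on ties) is our own reconstruction from the ABCD example in the text, not a rule the authors state explicitly.

```python
def reallocate(sequence, n_modules=4):
    """Sketch of dynamic reallocation: the first item fills every module;
    each later item takes over the second half of the largest run of
    identical codes (for a run of length 1, it replaces that single slot)."""
    slots = [sequence[0]] * n_modules
    history = ["".join(slots)]
    for item in sequence[1:]:
        # find contiguous runs of identical codes as (start, length) pairs
        runs, start = [], 0
        for i in range(1, n_modules + 1):
            if i == n_modules or slots[i] != slots[start]:
                runs.append((start, i - start))
                start = i
        # split the largest run (max returns the earliest run on ties)
        s, length = max(runs, key=lambda r: r[1])
        for j in range(s + length // 2, s + length):
            slots[j] = item
        history.append("".join(slots))
    return history
```

Running `reallocate("ABCD")` reproduces the allocation sequence from the text: AAAA, AABB, ACBB, ACBD.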
The simulation incorporates feedback locking in order to lock codes in modules. Observe that the feedback parameter is used to clear modules (by being set to zero) and to maintain the modules' codes while neighboring modules are loaded. Dynamic reallocation is used, i.e., modules are cleared, when needed. The code "CAT" is stored as "CCC," "CAC," "CAT" for each input. The LOAD signal from the letter-feature level (line 1, iterations 1-8) causes the letter-level controller to deactivate feedback (lines 3, 6, 9, iterations 1-8), loading "C" into all the modules. In iteration 9, the letter-level LOAD signal is deactivated, activating feedback (lines 3, 6, 9), and increasing the activity in Modules 1, 2, and 3. The rapid activity increase (iterations 9-12) indicates to the level controller that the code has been recognized. The letter-level controller sends a NEXT signal (not shown) to the letter-feature level, causing the letter-feature level to be cleared. The letter-level controller sends TRANSMIT signals (lines 4, 7, 10, iterations 10-16) to the letter modules, outputting the "C" code to the word level. It also sends a LOAD signal to the word-level controller.
[Figure 5 here: traces over iterations 1-56 of the control signals and activity for the letter-feature level (N-1: LOAD, line 1), the letter level (N: ACTIVITY, FEEDBACK, and TRANSMIT for each of three modules, lines 2-10), and the word level (N+1: NEXT, line 11).]
Fig. 5. Sequential input dynamics. This diagram shows the activity levels and control signals for sequentially loading three vectors in a CAP1 simulation. This represents the activity of levels 1, 2, and 3 of Fig. 4. The ACTIVITY signals (lines 2, 5, 8) show the activity of all of the output units of the vector. The maximum points on the line represent an absolute value of 1.1 units of activation. The FEEDBACK signals (lines 3, 6, 9) control the autoassociative feedback within the module. The high state represents a feedback constant of .6, the low state a feedback constant of 0. The LOAD signal (line 1) indicates the letter-feature modules transmitting a control signal to the letter-level modules (lines 2, 5, 8). This causes the letter level to reduce the FEEDBACK (lines 3, 6, 9) so the letter-level modules accept the transmitted message. The vertical dashed lines represent a causal influence in the diagram. The downward lines represent the LOAD signal altering feedback. The upward lines represent feedback changing the activity of the appropriate module. The TRANSMIT signal (lines 4, 7, 10) represents the outputting of the message vector to the next module in the sequence. At the same time that the TRANSMIT signal is activated, a LOAD signal is transmitted to the next level (not shown). As the next level recognizes an input pattern (for the pattern "CAT"), it returns a NEXT signal to the letter level causing the letter level to clear its memory and send a NEXT signal to its predecessor level.
On iteration 16 the LOAD signal (line 1) is activated for the second set of letter features. The letter-level controller dynamically reallocates a portion of the modules to store the incoming information and reduces the feedback of a module (line 6, iteration 17). With no feedback and new input, the old code is replaced with the new code (“A” replaces “C” in Module 2).
Notice that the transmission of the "A" code did not significantly influence the activity of modules for which feedback was still activated. The slight decrease in activity (lines 2 and 8, iterations 17-24) shows the minor impact on the modules receiving the "A" code while the "C" code is maintained by the feedback within the module. All the modules transmit their contents ("CAC") on iterations 26-32. The third letter is allocated to Module 3 and replaces the "C" code. On iteration 47 the word-level module recognizes the code and returns the NEXT signal, causing the letter level to reset. The dynamics illustrated in Fig. 5 show how the architecture in Figs. 1 and 2 performs sequential input processing. Information is buffered at each level, producing asynchronous input processing. Each level can hold or retransmit its information if the next level does not respond.11 The next level can detect patterns even without explicit boundary terminators, e.g., in the previous example, "CAT" was recognized because it was the first meaningful code encountered in the pattern "C," "CA," "CAT," not because there was a blank afterwards. The immediacy of perception in reading (see Just & Carpenter, 1987) is consonant with the proposed processing dynamics of the model. Just and Carpenter assume that at every level of processing there is a chunking process that stores the sequential inputs of the previous level and tends to clear the previous level. The lower the level of processing, the more frequently the buffers are cleared. For example, with each eye fixation the visual features of the previous fixation seem to be lost. McConkie and Zola (1979) examined students' reading of AlTeRnAtInG cAsE text. During some saccades the case of every letter was changed (e.g., cAsE becomes CaSe). This change was not perceived and had no effect on eye movements.
The visual features of the type that specify the difference between the upper- and lowercase letters are not integrated across fixations in reading. Jarvella (1971) found analogous results for the processing of clauses. He found people tend to forget the exact wording of a preceding text more rapidly if it is in a previous sentence than in the current sentence, even when the same wording is present. For example, recall of a six-word phrase was 42% if it was at the end of the previous sentence and 82% if it was at the beginning of the current sentence. The break was quite discontinuous, affecting all words in the clause. These data are consistent with a view that sequential processing involves a multilevel sequence of buffers where the higher-level buffer stores a representation of multiple items from the previous level and storage at the higher level tends to clear the previous level.

11The NEXT signal could automatically cause the level to prepare to clear all modules with the next LOAD signal from a preceding input. By delaying the act of clearing information, it would be possible for the level to retransmit information if needed.
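This multilevel buffering-and-clearing dynamic can be caricatured in a few lines of code. The toy below is our own construction, not the authors' simulation: a lower level accumulates items and retransmits the growing set after each input, and the higher level clears the lower buffer when it recognizes a chunk. The vocabulary of known word-level codes is an assumption.

```python
WORD_CODES = {"CAT", "DOG"}          # chunks known at the word level (assumed)

def read_stream(letters):
    """Accumulate letters at level N; clear N when level N+1 recognizes."""
    buffer, words = [], []
    for letter in letters:
        buffer.append(letter)        # LOAD the new letter into level N
        candidate = "".join(buffer)  # TRANSMIT the whole set upward
        if candidate in WORD_CODES:  # level N+1 recognizes the chunk...
            words.append(candidate)
            buffer.clear()           # ...and returns NEXT, clearing level N
    return words
```

Here `read_stream("CATDOG")` returns `["CAT", "DOG"]`: each word is recognized the moment it completes, with no blank or terminator between them, mirroring the immediacy of the recognition dynamics described above.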
B. MULTIPLE BUFFERS

The number of buffers available at a level of processing may vary depending on the nature of the material. Some buffers may be organized sequentially as in audition, some spatially as in vision, or some based on content as in semantic memory. As a single level is dynamically reallocated for greater numbers of buffers, the quality of the coding degrades. We assume there is a limit to the number of buffers at a given level of processing. In addition to the limited number of buffers, we assume the control region can manage only a small number of modules. For example, 50 semantic modules might exist, each specializing in a given class of words, e.g., for categories such as animals or vehicles. Nevertheless, if the controller can remember only the four most active buffers, the number of active semantic buffers would be effectively only four, regardless of the total number of modules. Based on our interpretations of the empirical literature, the number of active buffers seems to be in the range of three to four elements. In his review of how well Miller's (1956) magic number had withstood the succeeding decade and a half, Broadbent (1975) marshaled strong converging evidence to suggest that the magic number is closer to three than to seven items. Here we highlight only a select sample of such evidence. First, when people are asked to group common objects together, the modal category size is two or three. Second, when people are asked to divide strings of letters or digits into smaller groups, rehearsing in groups of three speeds learning better than other group sizes (see McLean & Gregg, 1967; Wickelgren, 1964, 1967). Third, pauses during free recall suggest that items are chunked into groups of no more than three or four items. Finally, mathematical analysis of hierarchical organizations suggests that chunks of three or four items provide the most efficient hierarchy for variable-order searches (Dirlam, 1972).
The present view is compatible with Baddeley and Hitch’s (1974) concept of multiple buffers or slave processors. If material is presented so that some of the information can be stored in the articulatory loop, visual-spatial store, and motor system, then the working memory system will exhibit a larger memory and a lack of competition among memories. Two recent sets of experiments illustrate the benefit of utilizing multiple buffers. Frick (1984) tested subjects’ digit spans auditorily, visually, and using a combined auditory and visual presentation. When the first four digits were presented visually and the remaining digits auditorily, subjects’ digit spans exceeded an auditory baseline measurement by three digits. In a second set of experiments, Reisberg, Rappaport, and O’Shaughnessy (1984) demonstrated how motor memory can be used to increase the overall holding capacity of the working memory system. Subjects were taught a simple coding scheme that enabled them to store numbers in a finger-based motor program. In the
first of six experiments, subjects were able to use this coding scheme to increase their digit spans by 33% over baseline. In subsequent studies these authors extended these findings and showed that by having subjects practice the coding scheme they were able to increase their digit spans by nearly 50%. We feel these studies offer strong support for the idea that multiple buffers can be exploited to enlarge the effective workspace of working memory. The present architecture requires an explicit mapping out of these buffers, and we hope that emerging physiological data concerning the structure and function of cortical regions may help guide subsequent inquiries into the nature of the regional processing systems. A system with multiple buffers provides a much more robust processing system. If one buffer is disturbed, information can be reloaded from the previous buffer. For example, in Fig. 4 the stimulus code for "CAT" is buffered at four levels. Even if three of the four levels are cleared, the system can still reload the output buffers to output the information.

C. CODING ITEM AND ORDER INFORMATION

In the present architecture, item and order information are coded in different ways, resulting in differential sensitivity of the two types of information. In STM experiments, order information is often lost more rapidly than item information, and there is some systematicity to what is lost when (see Estes, 1972). In his pioneering work on STM, Conrad (1959, 1960, 1964) was among the first to systematically document errors of omission, transposition, substitution, and serial-order intrusion. Healy (1974, 1982) devised methods to experimentally separate the retention of item and order information. Healy (1974) showed that the relationship between order errors and serial position differs from that between item errors and serial position.
Order errors (transpositions) reflect both primacy and recency effects; item errors (intrusions and omissions) reflect mostly a primacy effect. In addition, Healy showed that when a temporal sequence was to be recalled, subjects made a large number of phonemic confusion errors. In contrast, when a spatial sequence was to be recalled, subjects made no more phonemic confusion errors than expected by chance. Healy (1982) extended this methodology and provided additional evidence in support of the independence of item and order information. Order information may be coded by the buffer position, memory in the controller, or as a small part of the code if the module is storing a context-sensitive code. Item information is stored in the vector of the module. The learning mechanisms implemented in the present model can only associate vector code information. There is no LTM of control-type information. For example, a context vector may, through the use of fast weights, associate the vectors in a set of modules to the context but not to the control-type
information encoding the arrival order. This would enable context-based vectors to be reloaded (see below). However, the sequential-order information coded in the control regions may not be restored. This results in the expectation that order information may be lost rapidly. The loss of order information is not a serious problem if the next level of processing has a code for the full set of items including order. To illustrate, consider that at the word level “CAT” is a single code which could be associated to the current context, remembered, categorized, and maintained via the autoassociative processing within the module (see Schneider & Mumme, 1987). Now consider a novel letter string, “CET”; it does not have a stable code, and its order information is more dependent on the state of the control memory of the letter level which is not reset by context.
D. ORDER OF OUTPUT

The retrieval of information from buffers is likely to occur only in a forward sequential order. In order for the central control structure to request the next item, it would need to send a NEXT signal. This could be communicated with a single binary signal, requiring no memory in the central control structure. We assume the local controller can maintain information concerning the last item transmitted and respond to the NEXT request by transmitting the next most active element. In order to support random-access retrieval of one of N buffer elements, there must be at least log2 N + 1 request signals (the extra 1 indicates a random-address request). In addition, the central control structure would have to maintain information about which modules were active and which had already been read. These sorts of requirements begin to make the central control structure a homunculus, with little processing benefit, given how rarely random access is needed. A more likely architecture would include TRANSMIT, NEXT, and RESET control signals (see Fig. 3B). This would require little memory in the central control structure and allow retransmission of messages or sequences of messages that were not well received, e.g., as in the case after a message is transmitted and no other module in the network recognizes the message. The forward sequential coding of information at a level of processing can account for the difficulty humans have with digit canceling and reverse digit-span tasks. In a standard digit-canceling task, a series of digits is presented one at a time, typically at a rate of one digit every 0.5, 1, or 1.5 sec. The subject's task is to cancel a digit either when it first appears (immediate digit canceling), or some number back in the series. In a one-back task, e.g., the subject might see a sequence such as 7, 3, 8, 2, 4, and 5. The task would involve pressing (canceling) the number 7 when the 3 appeared, 3 when the 8 appeared, etc.
Similarly, in two-, three-, and four-back conditions, the subject responds by canceling the appropriate digit back in the series while updating his or her memory.
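The three-signal control scheme described above can be sketched concretely. The following is our own minimal illustration (the class and method names are ours, not the authors'): a local buffer controller that honors only TRANSMIT, NEXT, and RESET, so the central control structure needs no per-buffer memory.

```python
class BufferController:
    """Local controller for one buffer; remembers only the last position sent.

    Random access would instead require ceil(log2(N)) + 1 signal lines and
    central bookkeeping of which positions were already read.
    """
    def __init__(self, items):
        self.items = list(items)   # most-active-first ordering assumed
        self.pos = -1              # index of the last item transmitted

    def next(self):
        """NEXT: transmit the next most active element, or None if exhausted."""
        self.pos += 1
        if self.pos < len(self.items):
            return self.items[self.pos]
        return None

    def reset(self):
        """RESET: return the read pointer to the start of the sequence."""
        self.pos = -1

    def transmit(self):
        """TRANSMIT: resend the last item, e.g., if it was not received."""
        if self.pos < 0:
            return None
        return self.items[self.pos]
```

All state lives in the local controller; the center only pulses one of three lines, which is the point of the argument against a random-access homunculus.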
Working Memory Architecture
Within the current architecture, the one-back task is easy. Whenever an input occurs, the digit is stored in the speech output buffer. When the next digit occurs, the previous digit is output from the speech buffer while the current digit is entered into the auditory buffer. The two-back task becomes much more difficult. When the third digit is input, the buffer pointer must be reset to the first position and the first digit output, then that position must be cleared (without clearing the second position in the buffer), and the third element must be added to the buffer. This would be a very difficult task for a control structure that evolved to handle forward sequential coding. Errors of losing track of position would be very common. Humans frequently lose track of item and order information in digit-canceling tasks. Generally, the farther back in the series they must cancel, the more frequently they lose track of such information. For example, in an early variation of a canceling task using lights rather than digits, Kay (1953, in Welford, 1968) found that at the rate of one new light every 1.5 sec, subjects’ average correct performance declined from 95% (one back), to 67% (two back), to 47% (three back), to 35% (four back). Furthermore, when Mackworth (1959) used this same procedure and established presentation rates on the basis of ability to achieve an 80% accuracy criterion, she found subjects needed progressively more time to perform as they canceled farther back in a series, e.g., at one back they required 1 sec between items, at two back 1.6 sec, at three back 2.4 sec, and at four back 3.8 sec. We encourage the reader to get a sense of the difficulty of this task by having a colleague read a series of 10 random digits at a rate of 1/sec and then try to respond to the digits two back in the series. The use of forward sequential coding would also explain the difficulty of outputting sequential information in reverse order, as in reverse digit span.
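To see why forward-only readout makes the N-back versions progressively harder, consider the following bookkeeping sketch. It is our own illustration, not the authors' simulation; the operation count is an assumed cost model in which a two-or-more-back response forces a RESET and a re-read of the prefix on every input.

```python
def n_back_responses(digits, n):
    """Return the canceled digit for each input once n items have arrived.

    ops counts the pointer operations a forward-only controller would need:
    one-back outputs the previous digit directly (no pointer work), while
    n >= 2 requires a RESET plus NEXT steps back up to the target position.
    """
    responses, ops = [], 0
    for i in range(len(digits)):
        if i < n:
            continue                       # not enough items yet to respond
        responses.append(digits[i - n])
        if n >= 2:
            ops += 1 + (i - n + 1)         # RESET, then NEXT up to position i-n
    return responses, ops
```

The cost grows with both n and list position, consistent with the reported drop in accuracy from one-back to four-back conditions.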
The last item could be output by retransmission of the last input. However, to get to the second-to-last item, the list would have to be output from the beginning until the second-to-last item was reached. Normal forward digit span is typically about seven items (Lyon, 1977), whereas reverse digit span is typically two items less (Starr, 1929; Anders & Lillyquist, 1971). The current control structure could do reverse digit span by outputting the last item, then outputting from the start of the list until the next item to be output matched the last item output before the reset. With a buffer size of four, plus another single storage buffer, a reverse digit span of five would be possible. However, the reaction time to output the items in reverse order would be slow, particularly for the items near the end of the input list but not the last item (see Anders & Lillyquist, 1971).

E. REHEARSAL LOOPS
In the present architecture, a rehearsal loop can be implemented if a message can be passed between a series of modules and the message
Walter Schneider and Mark Detweiler
transmission results in invertible codes. For example, in Fig. 3 a rehearsal loop could be established between the auditory input buffer and the speech output buffer. A vector code for a word could be transmitted on the inner loop to the speech buffer. Then the speech buffer could transmit the word to the auditory input buffer. In order for rehearsal to be successful, the code must be invertible. An invertible code is one in which a one-to-one mapping exists between the two code sets. This is the case for speech; for every auditory representation of a spoken word, there is a motor representation of the spoken word. Hence, given that each code is associated uniquely to the other, it is possible to map from one to the other. Invertible codes are probably not the norm for communications on the inner loop. Most communications probably involve many-to-one mappings, e.g., there are many color patterns that map to the word red. The speech buffer might be able to bias the visual system, but it cannot produce a clear code. One would expect to be able to develop rehearsal loops where the codes are invertible. For example, training might allow one to build an auditory-to-motor rehearsal loop by mapping each word to a specific motor movement and each motor movement to a unique auditory code. The present architecture illustrates many of the buffer memory phenomena associated with working memory. There are buffers in many levels and in many regions of processing. The primary purpose of these buffers is not to provide a STM store, but rather to enable the robust processing of sequential input and output of temporal or spatial information. Different regions may have different numbers of buffers, levels, and time constants.
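The invertibility condition for a rehearsal loop can be stated as a small check. This is our own sketch (the mapping names are hypothetical): a loop works only if the forward code mapping is one-to-one, so that encoding and then decoding recovers the original item.

```python
def is_invertible(forward):
    """True if the mapping is one-to-one (no two keys share a value)."""
    return len(set(forward.values())) == len(forward)

def round_trip_ok(forward):
    """True if encoding then decoding recovers every original code,
    i.e., a rehearsal loop through this mapping is stable."""
    if not is_invertible(forward):
        return False
    inverse = {v: k for k, v in forward.items()}
    return all(inverse[forward[k]] == k for k in forward)
```

A speech-like code passes the check; a many-to-one code such as color patterns mapping to the word red fails it, which is why it cannot sustain a loop.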
V. Context Effects, Proactive Interference, and Release from Proactive Interference
A system having only a short-term buffer memory and a permanent LTM would probably not survive in a world with interruptions. The buffer memory would work well as long as the incoming stimuli were continuously being processed and there were enough buffers to contain and operate on relevant information. If the buffers were ever flushed, however, the system would lose its orientation in time and space and would have to search the environment to determine where it was and what it was doing. Having a system with only buffer memories would be like operating a computer system with only active memory and no backup tapes or disks. If the power were ever lost, the system would have to begin from scratch. LTM would help, but it is important not to use LTM as a working memory. This is because the faster LTM is changed, the greater the likelihood that retroactive interference will distort all the previously stored LTM, making it useless (see below and Fig. 6).
A. EPISODIC MEMORY

A system that includes a context or episodic memory is a much more stable system. As an analogy, consider how computer centers use periodic backup procedures. Every night the system backs up all of the files stored in the system. If there is a failure, such as a system disk crash, the most that is lost is one day’s work. The context module in the present system provides an analogous sort of backup. Every time a new vector is stored in the inner loop of modules, the connections between the context vector and the message vectors change so that the context vector can reevoke the message vectors. The context vectors may be changing continuously and providing a time stamp of the current contents of the system. Therefore, it might be possible to reestablish past contexts that occurred at various previous states, e.g., 30 sec, 5 min, 1 hr, 1 day earlier. A context or episodic memory provides three critical benefits to the system. First, it makes the system much more robust. If the active memory buffers are flushed to process some critical event, the buffers can be reloaded as a result of the context vector reestablishing the previous contents of the buffers. Second, it provides the features of temporal orientation that Tulving (1972, 1983, 1984) cites as benefits of episodic memory. These include maintaining space-time orientation, allowing time-tagged judgment and retrieval, and providing autobiographical memory. Third, the episodic memory may allow the use of remindings, i.e., remembering a previous sequence of actions (see Ross, 1984; Schank, 1982), to enable the performance of procedures before procedural knowledge has developed. Additional evidence for the potential value of context comes from the study of memory deficits. Some amnesiacs, such as HM, can perform STM tasks and learn procedural tasks, but they cannot recall or recognize words presented a few minutes before (Cohen & Squire, 1980; Cohen, Eichenbaum, Decacedo, & Corkin, 1985).
Perhaps HM’s pathology illustrates how debilitating processing might be if one had only a short- and long-term memory system without any type of time-tagging to maintain temporal orientation. If HM is distracted from a simple task, the task must be reexplained to him again from the beginning. HM cannot survive in everyday life and must be under close supervision at all times. Because he cannot remember where he is or why he is there, he is prone to wandering and getting lost.
B. PROACTIVE AND RETROACTIVE INTERFERENCE

In a context memory system, storing knowledge in fast connection weights will show rapid knowledge acquisition, but it will also exhibit severe proactive and retroactive interference. By using the fast learning rate, knowledge is quickly acquired but quickly forgotten. Figure 6 illustrates this
Fig. 6. Retention and recall: the effects of proactive interference (left) and retroactive interference (right). The numbers refer to the learning rate for the delta-learning rule used during acquisition. The vector match illustrates how accurately the retrieved vector matches the to-be-learned vector (0% is chance, 100% a perfect match). In contrast, the learning curve shows retention for that pattern on that trial before the next trial is begun. The retention curves represent recall of all four patterns after four learning trials. The context vectors were correlated .9 between trials. The input pattern was activated for a burst of five iterations. Autoassociative feedback was on for all iterations and the output occurred on iteration 8.
relationship for a simple simulation. The network was trained to associate four random output vectors to a changing context vector. The context vector was a 200-element binary vector. From one trial to the next, 10% of the elements were resampled, resulting in the context vectors having a correlation of .9. The learning curves (Fig. 6A) show how accurately the context vectors reproduced the desired response vectors. Figure 6A shows the information available at the end of each trial. This is similar to what would be expected in a STM experiment with a long period of distracting material, e.g., as in the Peterson and Peterson (1959) experiment with 18 sec of counting backwards by 3s. Figure 6B shows end-of-list recall for all four associates. The system presented the four context vectors for the four previous trials and compared the output to the originally learned vectors to determine the percentage of vector match. This would be similar to a free recall experiment for a four-item list with a distracting task to eliminate any retrieval from buffer memories. The first association is learned nearly perfectly over a wide range of learning rates. This is because in a new connection matrix there are no previously learned associations that activate incorrect connection patterns. In the present architecture, a pattern activation involves evoking a pattern
and then categorizing the pattern (via autoassociative feedback) to be one of the previously learned patterns. A vector only one-twentieth as strong as another association, e.g., learning constants of .05 and 1.0, produces nearly identical recall on Trial 1, e.g., 92.5% match after a learning trial with a .05 learning rate, 95% for .1, 98.5% for .4, and 100% for .8. Performance resembling a first trial occurs when either the previous connection weights are all zero or when the current retrieval vector is orthogonal to the previously learned vector (see below and Fig. 8). For example, in the current simulation, waiting 20 trial periods reduced the correlation to .12 and produced learning performance very similar to Trial 1 performance. Small learning rates (i.e., slowly changing weights) show serious proactive interference effects (see Fig. 6A). This is because the previously learned patterns interfere with the current pattern. If these patterns point the vector toward previously learned patterns, the feedback will retrieve combinations of old patterns as opposed to the current pattern. With a large learning rate, the current learning trial will swamp the effects of the previous learning trials, e.g., the .4 learning-rate condition in Fig. 6A. If the purpose of the memory is to reload the contents of the last trial, a high learning rate is beneficial. Large learning rates (i.e., fast-changing weights) show retroactive interference effects. Figure 6B shows the retention after learning four patterns. For a large learning constant (e.g., .8), Trial 1 retention is nearly at chance after only three intervening learning trials. Note that the orderings of retention conditions on Trials 1 and 4 are opposite. The highest learning constant (i.e., .8) produced the worst Trial 1 and best Trial 4 retention (see Fig. 6B). Fast learning develops the association in a single trial but at the expense of forgetting everything learned previously.
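The interplay of learning rate with proactive and retroactive interference can be reproduced in miniature. The sketch below is our own toy version of the simulation described above: the delta rule, the 10% context resampling, and the four items follow the text, while the seed and the sign-agreement match measure (chance = 0.5) are our simplifications.

```python
import random

def run_condition(rate, n_items=4, dim=200, seed=7):
    """Associate n_items random +/-1 target vectors with a drifting context
    via the delta rule; return the end-of-list match for each item."""
    rng = random.Random(seed)
    pick = lambda: rng.choice((-1.0, 1.0))
    targets = [[pick() for _ in range(dim)] for _ in range(n_items)]
    ctx = [pick() for _ in range(dim)]
    contexts = []
    W = [[0.0] * dim for _ in range(dim)]   # fast weights: context -> message

    def recall(c):
        return [sum(row[j] * c[j] for j in range(dim)) for row in W]

    def match(out, target):
        # fraction of elements whose sign agrees with the target (0.5 = chance)
        return sum((o > 0) == (t > 0) for o, t in zip(out, target)) / dim

    for t in range(n_items):
        # resample 10% of context elements per trial (correlation about .9)
        ctx = [x if rng.random() >= 0.10 else pick() for x in ctx]
        contexts.append(list(ctx))
        out = recall(ctx)
        for i in range(dim):                # delta rule: W += rate * error * ctx
            step = rate * (targets[t][i] - out[i]) / dim
            for j in range(dim):
                W[i][j] += step * ctx[j]
    return [match(recall(c), v) for c, v in zip(contexts, targets)]
```

Running both conditions shows the Fig. 6B pattern: a large rate recalls the last item almost perfectly while pushing the first item toward chance, and a small rate retains the first item better.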
Trial 4 was the last trial, and hence retroactive interference was not a problem. In sharp contrast to the condition with a large learning constant, the condition with the smallest learning rate (i.e., .1) showed the worst Trial 4 retention and the best Trial 1 retention of any of the learning rates. All conditions show a recency effect, with the effect being larger and involving fewer trials with the larger learning rates. The smallest learning-rate condition shows some evidence of a primacy effect. The retention data show that if the purpose of the memory is to retrieve information learned many trials previously, smaller learning rates are preferred.¹²

¹²There is a floor effect, in that very low learning rates do not modify the association matrix and illustrate poorer learning and retention. For example, in the current simulation a learning constant of .1 showed better learning and retention than a learning constant of .05.

The differences in proactive and retroactive interference for small and large learning rates illustrate the benefit of evolving a system with multiple learning rates. The large learning rates provide for quick acquisition and
allow the system to perform the task, while the small rates encode information for later retrieval. If the learner practices the task extensively, the small learning-rate connections (slow weights) will acquire the information before the large learning-rate connection associations deteriorate due to retroactive interference. The physiological evidence on rapid and slow learning (Mishkin et al., 1984) suggests that primates have evolved a two-speed learning system. The literature on proactive interference effects in STM research is consistent with the existence of a context memory with fast weights. Recall that the first trial of a STM experiment is nearly perfect (Keppel & Underwood, 1962). Baddeley and Scott (1971) found effectively no decay for the first STM trial for delay intervals ranging from 5 to 36 sec. Subjects’ performance declines markedly from the first to the fourth trials in a STM experiment, e.g., reducing from 68% to 25% (Goggin & Wickens, 1971). This proactive interference is a temporary phenomenon. In the present architecture, the longer the delay between trials, the more the context connection weights decay and the context vector changes, resulting in a greater release from the effects of proactive interference.

C. RELEASE FROM PROACTIVE INTERFERENCE
Research on release from proactive interference (see Wickens, 1970) may illustrate the local nature of proactive interference (PI) effects. If subjects are required to remember sets of three words from a single taxonomic category in a STM task, accuracy drops dramatically between the first and fourth items of the list. However, if the next word is selected from a new category, performance increases substantially, almost to the level seen on the first trial. This improvement is called the release from PI. Figure 7A illustrates data from Loess (1968) showing this effect. Subjects were presented sets of six words from one of four taxonomic categories. Group 4A received items alternately from the four categories, whereas group 4S received six items sequentially. Notice the dramatic peaks in the solid lines when the category was changed in the 4S condition, i.e., on Trials 7, 13, and 19. These peaks illustrate how switching categories can produce a release from PI. Figure 7A also shows a strong PI effect as a result of repeating the same category, even if three other sets of category items are interspersed between the repetitions, as shown by the drop-off in the 4A condition on Trial 5. The connectionist/control architecture will produce a category release from PI if different semantic categories are represented in different modules within the network. The buildup of PI is a result of storing multiple patterns in one set of connection weights. To illustrate, if one module codes vehicle information and another codes animal information, then there are two sets of connections (or association matrices) between the context and the modules containing the semantic information. In our current simulations, storage results when a transmission is succeeded by a follow-on transmission from the receiving module (see above and Schneider &
Fig. 7. Release from PI. A, Human data from Loess (1968). B, Results from the simulation. The solid lines are the 4S conditions (massing all the items for a category). The dashed lines are the 4A conditions (alternating all four categories before repeating an item).
Mumme, 1987). Hence, learning only occurs at the intersection of those fibers which input a message just before the module outputs a message. In a release from PI experiment, all of the modules could potentially receive a context message, yet only the module containing the rehearsed item would output a message. This means that only connections within that module would be changed. In the simulation of the category learning experiment, the model was presented 24 items from four categories. The four categories were represented by four sets of different association matrices. A word rehearsal was assumed to involve a transmission of the word from the auditory module to the semantic module, and a transmission of the semantic code to the auditory module. Prior to every transmission the context vector was transmitted. To simulate time delay, 10% of the context vector was changed randomly on every trial. The learning constant was .1. On each trial the word and the semantic vector were associated to the context. The context was then used to retrieve the vectors. Figure 7B plots the percentage of match between the retrieved and the to-be-learned vector. Both the word and the semantic vectors were retrieved. The percentage of vector match plotted in Fig. 7B represents the maximum
of the word and semantic vector. This produces slightly higher recall for the first few items than when only the semantic vectors are used. The probability of recall is a monotonic function, e.g., a logistic function, of the percentage of vector match. The actual probabilities depend on vector size, number of vector codes, feedback, and noise level. With appropriate parameters, a 50% match could produce a 20% recall, making the simulation data comparable to the Loess data. The simulation produces five qualitative features of release from PI as seen in the Loess (1968) data. First, there is marked PI for repetitions of words in the same category. This occurs both for the 4S condition (e.g., the difference between Trials 1, 2, 3, . . . 6) and the 4A condition (e.g., the difference between Trials 1-4, 5-8, . . . 21-24). The proactive interference is a result of interference from the previous learning trials (see discussion of Fig. 6). Second, there is a sharp increase in accuracy or release from PI when the category is changed in the 4S condition, i.e., on Trials 7, 13, 19, 23. This is because a new category is assumed to be stored in a different semantic module with a different set of connections to context. Third, excluding the release from PI trials, the 4A condition showed better recall than the 4S condition (Trials 2-6, 8-12, 14-18, 20-24). This occurs in the simulation because the context vectors are correlated .65 in the 4A condition and .9 in the 4S condition. Remember that in the 4A condition a category is repeated every fourth trial, leading to an average correlation of .9⁴ (about .65), compared with .9 in the 4S condition. Fourth, the second exemplar from each category in the 4S condition (Trials 2, 8, 14, 20) shows particularly poor performance relative to the first and third exemplars of the category. This occurs as a result of the very good learning of the first exemplar causing more proactive interference on the second trial.
The poorer second-trial learning causes less proactive interference on the third exemplar, thus producing better performance on Trials 3, 9, 15, and 21 relative to their predecessors.¹³ Fifth, the second repetition of the categories (Trials 5-8 in the 4A condition) is inferior to the preceding or succeeding set of categories. This is again due to overshoot from learning the first item of the category.

¹³This learning overshoot effect on the second exemplar does not occur for larger learning rates (see Fig. 6A, Trial 2 versus Trial 3).

The release from PI results provide an indication of how large the effective working memory might be. Each association matrix between the context module and every other module could store one or more vectors. If the context module were to produce orthogonal codes (see Kohonen, 1984), a matrix could store as many vectors as there are fibers. To illustrate, if there were 100 modules in the inner loop and 10 fibers from the context module to the other modules, the theoretical capacity could be as high as 1000 codes. To the extent that the codes are not orthogonal, capacity would be reduced
accordingly. Human data suggest that probably only one vector can be tied to the context vector for each module in the short term; and over extended periods (minutes), it may be possible to store several sets of vectors. Data from three sets of experiments are compatible with this view. Peterson and Gentile (1965) showed no effects of PI for the first items of a block when the blocks were separated by 91 sec; Loess and Waugh (1967) showed no effects of PI beyond 120 sec; and Kincaid and Wickens (1970) showed the greatest reduction of PI after 45 sec and a reduction of about 74% after 120 sec. In sum, these data suggest that a combination of a changing context vector and perhaps decaying weights allows the system to store a new set of codes every few minutes. The present connectionist model has some similarities to the context-retrieval procedures present in the search for associative memory model, SAM (Raaijmakers & Shiffrin, 1980, 1981). In the SAM model, retrieval is based heavily on having items associated with a list context and interitem associations. The model predicts a variety of LTM phenomena, including serial-position effects, list-length and presentation-time effects, temporal aspects of free recall, part-list cuing, and cued recall of paired associates. The current connectionist model provides a mechanism by which a cuing model such as SAM might be implemented in neural-like hardware.
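The fiber-count capacity argument can be checked directly. Below is our own back-of-envelope demonstration, not the authors' code: with fully orthogonal context codes, a matrix with n input fibers stores n associations with zero interference, which is the "as many vectors as there are fibers" claim (100 modules with 10 fibers each would then give the 1000-code figure).

```python
def store(pairs, dim):
    """Superimpose outer products (target x context) in one dim x dim matrix,
    the standard linear-associator storage rule."""
    W = [[0] * dim for _ in range(dim)]
    for ctx, target in pairs:
        for i in range(dim):
            for j in range(dim):
                W[i][j] += target[i] * ctx[j]
    return W

def retrieve(W, ctx):
    """Matrix-vector product: read out the pattern cued by this context."""
    return [sum(w * c for w, c in zip(row, ctx)) for row in W]
```

With orthogonal context codes (here, the standard basis), every stored target comes back exactly; correlated codes would instead blend the stored patterns, reducing capacity as the text notes.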
D. RECENCY AND PRIMACY EFFECTS

The use of context storage also provides an interpretation of recency and primacy effects in LTM. Tzeng (1973) had subjects perform a free-recall task in which they learned four 10-word lists. Each word was presented for 1.5 sec followed by 20 sec of counting backwards by 3s. Tzeng’s data showed a clear recency effect at end-of-list recall and end-of-session recall. The existence of a recency effect following 20 sec of interfering activity violates expectations of basic buffer models. This result would be expected, however, if the context at the time of recall were used as a retrieval cue. Retroactive interference would produce a positive recency effect, as is illustrated in Fig. 6B. Since there is typically a delay between the end of one list and the beginning of the next, there will be less information stored with the context vectors active at the ends of the lists, resulting in both primacy and recency effects. Some authors tend to interpret such long-term recency effects as data against the existence of a STM buffer (e.g., Tzeng, 1973; Crowder, 1982), primarily on the basis of parsimony. With the connectionist/control architecture, we expect both short- and long-term recency effects to exist and to have quite different mechanisms. These two mechanisms make different predictions that can be tested. First, increasing the duration of the interfering task should increase the recency effect for long-term retrieval and decrease it for short-term retrieval. Second, combining a short interfering
task at a normal presentation rate in a free-recall task, e.g., performing four digits of a two-back digit-canceling task, should greatly attenuate short-term recency effects.
E. OVERLOADING STM

Context storage enables the network to perform reasonably well, even in situations in which the short-term or buffer memory is heavily loaded. Klapp et al. (1983) present compelling evidence that such loading can occur without catastrophic effects. When subjects were required to retain letters in a span task, this activity did not interfere with their abilities to judge the correctness of greater-than/less-than statements, nor did it impair performance on a modified Sternberg-type scanning task. Similarly, Klapp and Philipoff (1983) found that subjects could retain letters and concurrently process digits in a missing-digits task. Logan (1979) has similarly loaded STM with six digits and found little interaction with number of letters searched in a visual search task. Such results are quite incompatible with the view that working memory has only seven slots. However, in the connectionist/control model, a context storage mechanism can account for these effects. The subject first rehearses the STM list. This connects the context vector to the rehearsed codes. The subject then performs the embedded task, perhaps processing information in the same buffers, but not rehearsing information in the buffers. After the embedded task is completed, the subject activates the context vector that was present at the time of rehearsal and activates the rehearsed items for sequential output. If subjects are required to rehearse similar material in the same modules, interference should occur due to retroactive interference from the second set of material. Mutual interference does occur, despite early rehearsal, if the embedded task requires item and order information to be retained (Klapp et al., 1983, Exp. 8; Klapp & Philipoff, 1983). Recall (see above) that context coding may provide only weak coding of order information. In summary, the present architecture can accommodate many of the effects of context and proactive interference.
Moreover, we submit that some type of context-based storage is needed to guarantee robust information processing, since a system with only active buffers and a slowly changing LTM is inherently labile and unlikely to survive in the real world. The proposed context-storage system provides interpretations for (1) how episodic memory might function; (2) how the effective working memory might be far larger than four to seven elements; (3) why there is a high level of retrieval and lack of interference effects on the first trial of a STM task; (4) why release from PI is expected if different semantic categories are represented in different modules; (5) how LTM recency effects might arise; and (6) how an information-processing system might still perform even when STM is heavily loaded.
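The reload strategy described above for loaded STM (rehearse, run the embedded task, then retransmit the old context) can be sketched schematically. This is our own illustration; the dictionary is a stand-in for the fast context-to-item connection weights, not a claim about the model's implementation.

```python
class ContextStore:
    """Stand-in for fast context-to-item connection weights."""
    def __init__(self):
        self.weights = {}

    def rehearse(self, context, items):
        # rehearsal transmits the items while a context is active,
        # binding them to that context
        self.weights[context] = list(items)

    def reload(self, context):
        # retransmitting a stored context reinstates the bound items
        return list(self.weights.get(context, []))

store = ContextStore()
buffers = ["3", "9", "4", "1"]          # span items held in active buffers
store.rehearse("ctx_t0", buffers)       # bind the list to the current context
buffers = ["7 > 4", "2 < 9"]            # embedded judgment task overwrites them
buffers = store.reload("ctx_t0")        # context transmission restores the list
```

The span survives the embedded task even though the active buffers were reused, which is how the model reconciles heavy STM loading with accurate recall.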
VI. Skilled Memory, Mnemonics, and Levels of Processing
If working memory includes a large number of regions, levels, and buffers, plus context storage and attentional control, then there are likely to be good and bad strategies for using it. The skilled use of working memory involves allocating and substituting resources to maintain information. We assume the overall resource pool is quite differentiated, with different resources varying in terms of what type of material can be stored, the time required to store the material, PI effects, retrieval time, trace decay, and the robustness of the storage. In terms of the model, we propose that storage is dependent on which modules are active, what the input vectors are to the modules, what codes are in the modules, and whether a module transmits messages after an input. A real-world example of the use of skilled memory comes from the study of a waiter, dubbed JC, by Ericsson and Polson (1987). JC was reported to be able to remember over 20 complete orders without using an external memory aid. In controlled experiments, Ericsson and Polson found that JC was indeed able to perform simulated order tasks with high accuracy. They speculated that JC used retrieval structures analogous to those used by experts in digit span (see below). To remember a sequence of orders, JC rehearsed the first four orders and developed a well-integrated structure for them before trying to remember the next four in the sequence. Ericsson and Polson characterize this structure in terms of a matrix with one dimension representing the relations among items comprising an order. JC associated the customer’s entree to context features (a customer’s face) by constructing an interactive representation. The other dimension organized the items into categories, i.e., into entrees, meat temperatures, kind of salad dressing, and kind of side dish (starch).
To create a unique retrieval cue for salad dressings, JC encoded them by their first letter; e.g., to remember four different salad dressings, JC encoded blue cheese, oil and vinegar, oil and vinegar, and thousand island as B-O-O-T. To remember the items in the other three categories, JC relied on different encoding schemes. Temperatures were encoded as spatial patterns, starches as serial patterns, and entrees in terms of repetitions and patterns that resulted from partitioning orders according to cost. JC developed within-category relationships dynamically, i.e., as he was given a new order he used the different category labels to know where to put a new item and then proceeded to order the old and new items into a coherent structure. Finally, it should be noted that when JC recalled dinner orders he always did so categorically. In the following section we offer a rudimentary framework of rules for thinking about how to develop skilled memory such as JC’s within the proposed architecture.
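The category-by-order matrix and the first-letter mnemonic just described can be illustrated with a small sketch. This is our own rendering (the sample orders are invented for illustration), showing the B-O-O-T compression and the categorical recall Ericsson and Polson report.

```python
# Each order is one column of the matrix; each category is one row.
orders = [
    {"entree": "steak",   "temp": "rare",   "dressing": "blue cheese",     "starch": "fries"},
    {"entree": "salmon",  "temp": "medium", "dressing": "oil and vinegar", "starch": "rice"},
    {"entree": "steak",   "temp": "well",   "dressing": "oil and vinegar", "starch": "baked"},
    {"entree": "chicken", "temp": "medium", "dressing": "thousand island", "starch": "fries"},
]

def dressing_mnemonic(orders):
    """First-letter code for the dressing category (B-O-O-T in the example)."""
    return "".join(o["dressing"][0].upper() for o in orders)

def recall_by_category(orders, category):
    """JC recalled categorically: all values of one category across the orders."""
    return [o[category] for o in orders]
```

Recalling row by row rather than order by order is what makes the structure a retrieval structure: each category supplies its own compact cue for the whole set of four orders.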
RULES FOR SKILLED MEMORY

The connectionist/control architecture suggests five rules for the skilled use of working memory. These rules describe methods of capitalizing on the relative strengths of different types of memory to maximize storage. RULE 1: Use multiple buffers to increase the skilled use and capacity of STM. If a subject is required to perform two tasks, X and Y, and task X can be performed in buffer A, and task Y can be performed in buffers A or B, then task X should be put in A and task Y in B. Many of the experiments on STM load suggest this type of allocation scheme. To be able to use buffering strategies effectively, one must first learn how to alter them depending on situational demands. Different task mixtures will be performed better with some allocation policies than others. For example, digits in a spatial relationship might be stored spatially, e.g., as a visual image of a grid, or verbally, e.g., as the proposition 5 to the left of 8. If the subject must perform a concurrent tracking task, it would be better to store the digits verbally. But when the concurrent task requires auditory processing, it would probably be better to store the digits spatially (Baddeley, Grant, Wight, & Thomson, 1974). If a buffer is likely to be disrupted by irrelevant input, the information should be shifted to a buffer isolated from that input. The unattended speech effect (Salame & Baddeley, 1982) and the suffix effect (Crowder & Morton, 1969) suggest that irrelevant input can disrupt auditory input buffers. To achieve the unattended speech effect, Salame and Baddeley had subjects perform a visual digit-span task with an irrelevant auditory word presented with each visual digit. The irrelevant words reduced digit recall by 46%. In contrast, bursts of white noise produced a much smaller decrement of 19%. To achieve the suffix effect, a subject reads an irrelevant verbal item at the end of a string of digits or words in a span task.
The irrelevant verbal item invariably reduces recall of the last few items of the list. In one such experiment, Ayres, Jonides, Reitman, Egan, and Howard (1979) showed that accuracy of the last item dropped from 88% to 32% due to the addition of a word suffix. These effects illustrate that with a continuous verbal input stream, it would be beneficial to recode information spatially and to maintain it in a spatial buffer insulated from the input stream.

There are also alternatives to storing information in the form of active codes in buffers; codes may be associated to context vectors or other information vectors. These two types of associations may store information at different rates, show differential effects of proactive and retroactive interference, and may decay at different speeds. The speed with which the context vector can be changed is probably slow relative to the speed with which other vectors can be changed.
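Rule 1's allocation principle can be read as a small constraint-satisfaction problem: give the most constrained task its buffer first. The following sketch is our own illustration of that principle (the task and buffer names, and the greedy strategy, are assumptions for the example, not part of the simulation model):

```python
def allocate(tasks, buffers):
    """Greedy buffer allocation: assign the most constrained task first.

    tasks: dict mapping task name -> list of buffers it can run in.
    buffers: list of available buffer names.
    """
    assignment = {}
    # handle tasks with the fewest buffer options first
    for task, options in sorted(tasks.items(), key=lambda kv: len(kv[1])):
        free = [b for b in options if b in buffers and b not in assignment.values()]
        if free:
            assignment[task] = free[0]
    return assignment

# task X runs only in buffer A; task Y runs in A or B -> X gets A, Y gets B
allocate({"X": ["A"], "Y": ["A", "B"]}, ["A", "B"])  # {'X': 'A', 'Y': 'B'}
```

With task X restricted to buffer A and task Y able to use A or B, the greedy pass assigns X to A and Y to B, which is exactly the allocation the rule prescribes.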
Working Memory Architecture
Context storage provides a method for rapidly associating information to the current context. It has the potential of being an automatic mode (in the sense of Hasher & Zacks, 1979) of storing information. If the context vector is transmitted periodically, the connection weights can change such that transmitting the context vector can reload the vectors that were present in the network at the time of the last context transmission. If the context vector involves fast-changing weights, learning will be quick but proactive and retroactive interference will limit the usefulness of the storage.

RULE 2: Store codes in unique modules that will not be reused until the context changes. This tactic typically involves coding information elaboratively. Storage in the network occurs after a module receives and transmits a vector. To store information in the connection weights of the low-reuse module, the code from the low-reuse module must be transmitted. The benefit of elaborative rehearsal illustrates this type of storage. A subject could learn a word list by verbalizing the words of the list repeatedly. In this case the context weights would be altered for every word. The buildup of proactive and retroactive interference would eliminate any benefit from context-based recall after the first few trials. In contrast, if the subject were to code each word semantically, different modules would code different words. Remember, storage occurs after the transmission of a message. To associate the context vector to the semantic module, the network must transmit the semantic code. To semantically code the word cat, the subject might activate and transmit the concepts “a warm furry object that purrs.” Context would now be associated with that code in the semantic module coding animal-like features. If no other word were stored in that module with the same context vector, there would be no problems with proactive and retroactive interference.
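The context-storage mechanism described above — transmit the context vector periodically so that re-transmitting it later reloads the codes that were active — can be sketched with a fast-weight, outer-product association. This is an illustrative toy, not the authors' CAP1 simulation; the vector size and learning constant are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
context = rng.choice([-1.0, 1.0], n)   # slowly changing context vector
code = rng.choice([-1.0, 1.0], n)      # code currently active in a module

W = np.zeros((n, n))                   # fast connection weights
lr = 0.5
for _ in range(4):                     # periodic context transmissions
    err = code - W @ context / n       # delta-style error signal
    W += lr * np.outer(err, context)   # weights move toward the active code

# later, transmitting the context vector alone reloads the stored code
reloaded = np.sign(W @ context / n)
match = float(np.mean(reloaded == code))   # 1.0: perfect reload
```

With a single stored association the reload is exact; with many codes stored against highly similar contexts, the same fast weights would show the proactive and retroactive interference the text describes.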
Therefore, to use context memory skillfully, one should try to code each word in a unique module. If a second word were to evoke a code in a module that had already stored a code, then that vector should not be transmitted, and perhaps the second-most-activated semantic module should be transmitted instead. From the present perspective, elaborative encoding and release from PI illustrate the same effect. Simple verbal repetition of items is like Loess’s (1968) 4S condition (see Fig. 7A, Trials 2-6), in which the same module is reused for all the words. Elaborative encoding is like the 4A condition (see Fig. 7A, Trials 2-4), in which different modules are used on different trials. The differences between the 4A and 4S conditions (69% versus 34%) are comparable to the differences between elaborative and rote verbal rehearsal. Training may be necessary to establish strategies of the central control system to identify unique modules to transmit, and hence, to store context information.

To later retrieve context-stored information, the context vector would have to be transmitted, activating codes in a series. The context-activated
semantic vectors could then be transmitted to the speech region for verbalization of the words. Note that this system codes order information poorly. There is no inherent coding of order; the system simply has a list of codes associated to a context vector. However, if the context code changes in some continuous manner over time, the strength of connection to different contexts may provide a coarse time-stamping.

RULE 3: Develop retrieval cues that are clear, distinct, active, and related to the material to be retrieved. The problem of PI results from associating several output vectors to a single, or several highly related, input vectors. We have assumed that the context vector is a slowly changing vector requiring perhaps two minutes to change substantially; recall that most PI effects in STM procedures dissipate in less than two minutes (see Peterson & Gentile, 1965; Kincaid & Wickens, 1970). By switching attention among a list of well-known items, the subject could rapidly alter what vectors are active in the network. If these vectors were dissimilar, i.e., orthogonal, there would not be a buildup of proactive or retroactive interference.

Mnemonic techniques generally provide a list of well-known items to associate information to (see Bower, 1970; Bellezza, 1981, 1982). For example, in the peg-word system, the subject activates a series of images of concrete objects in a list, e.g., one-bun, two-shoe, three-tree, while the method of loci involves committing well-known places to memory. The subject then associates each new word or phrase to one of the images in the list. At recall, the subject sequences through the peg words or locations and retrieves the words associated with each retrieval cue. Using mnemonic strategies would result in better memory in the connectionist/control architecture. To associate a word to a peg requires transmitting the peg-word code, and then transmitting the to-be-remembered code.
If the subject repeats only the to-be-remembered words, then the only retrieval code would be the context and perhaps the previous word on the list. Since the context is a slowly changing code, multiple associations build up PI. This interference makes retrieval unlikely if more than a few words are associated to one context, e.g., learning more than three words every two minutes. The advantage of using a mnemonic is that learners can alter the code rapidly by changing the object they are attending to. To the extent that these cues provide orthogonal codes, PI should be greatly reduced. If learners use a well-learned sequence, as in the peg-word or loci mnemonics, they could retrieve the ordered set of retrieval cues. Then, the prelearned retrieval cues and the context could be used to retrieve the newly learned codes. Figure 8 illustrates the importance of using dissimilar retrieval cues in recall. In the simulation we associated a list of four input vectors to four output vectors. Then the model recalled the output vectors using the input
[Figure 8 appears here: two panels, A and B, plotting recall performance against TRIAL.]
Fig. 8. Similarity of retrieval cues on learning and retention. The numbers represent the sequential correlation between vectors during learning. The learning constant was .1 in all conditions. (See caption, Fig. 6.)
vectors as retrieval cues. Note the proactive interference effect (Fig. 8A, correlation .9) when the vectors are correlated. If context is a slowly changing vector, it would have a high correlation from one word to the next, and it would show interference effects similar to the curve with a correlation of .9 between vectors. In contrast, if the peg words were uncorrelated vectors, they would provide recall similar to the curve with a correlation of 0. Reducing the similarity of retrieval cues both reduces the buildup of PI (Fig. 8A) and increases retention (Fig. 8B). This suggests that the release from PI produced by spacing (Wickens, 1970) and the benefit of mnemonics share a common mechanism: increasing performance by providing more dissimilar retrieval cues.

The use of mnemonics for both intermediate and long-term retrieval suggests that the association of information messages involves both fast and slow weights. The ability to quickly associate new material to a loci retrieval structure or to a peg-word system and to have those associations decay over a period of hours suggests the presence of fast weights between the information vectors. Using the method of loci to remember long stories months later suggests the involvement of slow weights. The SAM model (Raaijmakers & Shiffrin, 1980, 1981) illustrates how rapidly modified associations in LTM might be used. In this model, every time a word is attended, the strength of association of the word context and other words active in STM is increased.

Retrieval cues that are related to the to-be-retrieved information allow easier recall of information than unrelated cues. For example, if a category name were used as a retrieval cue, the preexisting associations between the
category and the exemplar would greatly reduce the amount of learning that needed to occur. The category name would evoke most of the semantic features of the word and in so doing identify which module contained the associated information. The context input need only bias the module to resolve which member of the category to retrieve. The fact that words from a given category are clustered in free recall (Bousfield, 1953; Bousfield & Cohen, 1955, 1956) suggests that multiple words benefit from the same retrieval cue or that they reside in the same module. Humans can quickly associate a few exemplars to a number of categories with little evidence of interference (Mandler, 1967). They can also learn to retrieve lists of hierarchically organized material after short study times, e.g., learning up to 112 words after an average study time of 2 sec/word (Bower, Clark, Lesgold, & Winzenz, 1969).

RULE 4: Use multiple retrieval cues and distribute practice. Most connectionist models use some variant of an error-correction learning rule (see Hinton & Sejnowski, 1986; Rumelhart, Hinton, & Williams, 1986). In the present model, we use a delta learning rule which changes the strength of association in proportion to the error between the vector evoked by the input and the desired vector (generally the vector already in the module as a result of previous processing). If there is no error, there is nothing to correct and hence no learning. Repeated associations to the same vector will typically result in an exponential reduction in the amount of learning. The marginal utility of continued rehearsal of the same association decreases as a function of repetitions. However, if the subject switches to a new retrieval cue that is not associated to the output, the new cue will cause a large error, so the learning trial will produce more connection change. Associating an output to multiple input cues provides alternative retrieval paths for later recall.
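The diminishing returns of rehearsing one association, and the renewed learning produced by a fresh cue, can be shown directly with a delta rule. This is a toy illustration with our own parameters (vector size, learning rate), not the parameters of the reported simulation:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 64
cue_a = rng.choice([-1.0, 1.0], n)     # first retrieval cue
cue_b = rng.choice([-1.0, 1.0], n)     # second, dissimilar retrieval cue
target = rng.choice([-1.0, 1.0], n)    # to-be-remembered code

W = np.zeros((n, n))
lr = 0.3

def rehearse(cue):
    """One delta-rule learning trial; returns the mean absolute error."""
    global W
    err = target - W @ cue / n         # error between evoked and desired vector
    W += lr * np.outer(err, cue)       # change weights in proportion to the error
    return float(np.abs(err).mean())

errors_a = [rehearse(cue_a) for _ in range(5)]  # 1.0, 0.7, 0.49, ... exponential decay
err_b = rehearse(cue_b)                         # large again: new cue, fresh learning
```

Rehearsing the same cue drives the error down by a constant factor per trial, so each repetition teaches less; the unassociated cue evokes almost nothing and therefore produces a near-maximal error, and the target ends up retrievable by either path.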
Distributing practice enhances learning because of the nature of changing connection weights with an error-correction rule. Rosenberg and Sejnowski (1986) have shown that a connectionist learning model will learn a set of 1024 patterns with better retention under spaced practice (going through the entire set one at a time) than under massed practice. Massing practice is equivalent to learning with a large learning rate. As mentioned above, large learning rates are problematic because they produce greater retroactive interference (see Fig. 6). If practice is distributed, the network searches the connection space to find a set of connection weights that provide the minimum error for the total ensemble of patterns to be learned. Because the connection spaces generally involve a large number of connections, there are many possible sets of changes in the set of connections that will produce nearly the same output for a given input. By distributing practice, the error-correction rule moves the weight space to a more global minimum for the entire ensemble. In contrast, massing practice moves the
weight space toward a minimum for that one pattern (see Rosenberg & Sejnowski, 1986, for discussion).

The presence of context storage increases the importance of spacing practice and provides an interpretation for the generation effect (Cuddy & Jacoby, 1982). If the context vector involves fast-weight changes, repetitions of an item in the same context will result in a lower marginal utility for each repetition. Fast weights are valuable because they enable context-based recovery of information within the same temporal context (see above). Fast context weights might be potentially detrimental, in that the majority of learning occurs in these weights and context may not be a good retrieval cue, either because it changes or due to problems of retroactive interference.

The generation effect illustrates how context association can harm learning. Cuddy and Jacoby (1982) used a crossword puzzle task to investigate how memory for an earlier solution would influence subsequent puzzle solving. Subjects were presented combinations of reading and construction tasks. In the reading task, the subject read each of two related words, e.g., lawyer, court, while in the construction task the subject read the intact word and then solved the puzzle and reported the solution, e.g., lawyer c --rt. Using this procedure, Cuddy and Jacoby found that a subject’s memory for an earlier presentation of an item can influence subsequent problem solving at least a few minutes later. In addition, they found that when a problem was repeated so that its repetition resulted in greater processing, memory for an earlier presentation was less accessible. In the present model, presenting the word earlier would build an association between the context and the puzzle word. The prior presentation of the word would reduce the amount of attention and the amount of noncontext learning the word received, even if it were attended.
This type of learning effect produces overshadowing phenomena similar to the Rescorla and Wagner (1972) model.

RULE 5: Use well-learned codes in the receiving modules. Within each module we postulate an autoassociative matrix that associates each learned code to itself. As mentioned above, the autoassociative mechanism is important for cleaning up noisy input and categorizing the input (see J. A. Anderson, 1983; Schneider & Mumme, 1987). The autoassociative effect provides nonlinear feedback so that similar inputs can produce dissimilar outputs (see Schneider & Mumme, 1987). This feedback also helps to maintain information in buffers (see above).

This autoassociative effect is the basis of the interaction between long-term and short-term memory. In the simulation, the effect can be removed by setting the autoassociative feedback to zero, thereby simulating the absence of within-module long-term knowledge for the trace. The recall of four paired associates with a learning constant of .1 had an 18% vector match for a feedback of 0 and 42% for a feedback of .4 (see Fig. 6B).
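The cleanup role of autoassociative feedback can be illustrated with a toy matrix storing a few well-learned codes: feeding a corrupted trace back through the matrix snaps it onto the nearest stored code. This sketch substitutes a simple sign nonlinearity for the simulation's graded feedback parameter, and the sizes and 20% noise level are our own choices:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 100, 3
protos = rng.choice([-1.0, 1.0], (k, n))       # well-learned codes in the module
A = sum(np.outer(p, p) for p in protos) / n    # autoassociative matrix

noisy = protos[0].copy()
flip = rng.choice(n, size=20, replace=False)
noisy[flip] *= -1                              # corrupt 20% of the trace

v = noisy.copy()
for _ in range(3):                             # nonlinear feedback cleans the vector
    v = np.sign(A @ v)

match_before = float(np.mean(noisy == protos[0]))   # 0.8
match_after = float(np.mean(v == protos[0]))
```

Without the matrix (feedback of zero) the trace keeps its noise; with it, the corrupted vector is pulled toward the stored prototype, paralleling the 18% versus 42% match reported for the simulation.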
To learn arbitrary material, such as digit strings, it should be beneficial to recode the material in a representation that is already well learned. For example, in Smith’s classic experiment in which subjects recoded each sequence of three binary digits into one octal digit, immediate memory span increased from about 12 to 40 digits (see Miller, 1956). Similarly, Slak (1970) has shown that by acquiring a recoding scheme to translate strings of digits into groups of pronounceable CVCs, one can improve performance markedly on a wide range of digit-based tasks, including serial learning, free recall, recognition, and span tasks.

The research on practice effects in the development of skilled memory (Chase & Ericsson, 1981, 1982; Ericsson & Chase, 1981) illustrates the use of all five rules of skilled memory. Chase and Ericsson had their subject SF perform a digit-span task for 230 hours. SF was presented digits at a rate of 1/sec and then asked to recall the digits in serial order. Digit span was defined as the number of digits the person could repeat back correctly 50% of the time. Over the 230 hours of practice, SF’s digit span increased from 7 to 79 digits. Chase and Ericsson argued that this skill was accomplished as a result of (1) associating new material to the material in LTM, (2) storing information in a “retrieval structure,” and (3) increasing the speed of encoding and retrieving items with practice.

SF’s strategy was to buffer the input stream and to try to associate the information in groups of three or four digits. Digit-buffering illustrates Rule 1, storing information in multiple buffers and moving the information to a lower-activity buffer while trying to associate it to new information. SF would passively store a group of three or four digits and then encode the digit group into a well-learned code, e.g., track running times such as a world-record mile-running time set by a specific runner.
This recoding illustrates Rule 5, recoding new information into stable LTM codes. SF stored and retrieved information in an elaborate retrieval structure. He would recode digits into sets of three- or four-digit groups; these groups were organized in a hierarchical retrieval structure of groups and “supergroups” of groups of digits. This retrieval structure provided both differential locations at which to store information (Rule 2: store in unique buffers) and unique retrieval cues (Rule 3: use different associations to retrieve the information). For example, storing four-digit mile-running times would not interfere with storing three-digit times for half-mile runs. Observe that if the same buffer were not reused within a short period of time, retroactive and proactive interference would not be a problem. With extended practice, e.g., 230 hours, it may be possible to specialize additional buffers, e.g., mile-running times for the first part of the list, and thus provide more storage capacity. The retrieval structure also provides unique retrieval cues, e.g., associating in a hierarchical structure of groups and
supergroups. After a year of practice, these cues may have become very salient and recoded internally as more orthogonal vectors.[14]

The human working-memory system embodies subsystems that are capable of being deployed in a variety of strategies. In the current connectionist/control architecture, different strategies will exhibit a wide range of effective capacities. If the subject uses only one set of buffers, then capacity is limited to three to five codes. If, on the other hand, the subject uses multiple buffers, then capacity may be limited by the decay time of the buffers, or it may be limited to a capacity of four codes/buffer. If the subject uses context as a retrieval cue, the capacity may be limited to one code/module within the same context. If the subject attends to orthogonal retrieval cues, the capacity may be limited to one code/module for each orthogonal retrieval cue. To develop skilled use of working memory may require extensive training to utilize the best mixture of learning strategies in the face of task-specific conditions.
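The recoding advantage behind Rule 5 — e.g., the binary-to-octal scheme from Smith's experiment mentioned above — is easy to make concrete: three binary digits collapse into one octal digit, so a fixed span of well-learned codes covers three times as many bits. A minimal sketch (the function name is our own):

```python
def recode_binary_to_octal(bits: str) -> str:
    """Recode a binary string into octal digits, three bits per chunk."""
    assert len(bits) % 3 == 0, "pad to a multiple of three bits"
    return "".join(str(int(bits[i:i + 3], 2)) for i in range(0, len(bits), 3))

# twelve binary digits become four octal digits: same information, a third the chunks
recode_binary_to_octal("101001110100")  # -> "5164"
```

A subject who can hold about 12 well-learned digits can therefore report roughly 36-40 binary digits after learning the recoding, which is the span increase Miller (1956) describes.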
[14] An important feature of multilayered connection networks is that they build internal representations, such that codes similar in one representation can be very dissimilar at higher levels of representation (see Hinton, 1986; Ackley, Hinton, & Sejnowski, 1985).

VII. Serial Outputs and Chunking

A. SEQUENTIAL HIERARCHICAL OUTPUT

Lashley (1951) stressed that sequential output is a very fundamental and common form of human processing. In this section we provide an interpretation for sequential output, chunking, pause boundaries, and chunk-based information retrieval. Up to this point our discussion has focused on how information reaches the innerloop of processing. Now we discuss how the higher-level codes are converted into sequences of actions. The codes feeding into the innerloop are highly compressed codes that are buffered for transmission on the innerloop. The output of a code may involve sequentially outputting a code that is expanded at each level of processing.

Sequential hierarchical output involves one module activating a set of modules at the next stage of processing. A module at level N - 1 transmits a vector, loading three to five modules at level N; the modules in level N transmit sequentially, loading multiple modules at level N + 1. The architecture for sequential output is the same as that for input (see Fig. 2). However, to accomplish sequential output, the sequences of control signals between the level controller and the modules must be altered. For output, the system must load the buffers in parallel and output sequentially.

Figure 9, showing a simulation of sequential output, illustrates the output of a sequence of motor movements to write the word cat. Assume that a
[Figure 9 appears here: traces of the FEEDBACK, TRANSMIT, ACTIVITY, LOAD, and NEXT control signals for the Letter Level (N) and the Letter Movements Level (N+1), plotted against ITERATION.]
Fig. 9. Sequential output in CAP1 simulation. This diagram represents converting a code for the word cat to the individual motor sequences for Levels 4, 5, and 6 in Fig. 4. See Fig. 5 for detailed caption description. The "CAT" LOAD signal (line 1) causes parallel loading of the modules for each letter (lines 2, 3, 5, 6, 8, 10). These modules are then sequentially output by serially activating the TRANSMIT signals (lines 4, 7, 10) of the modules containing each letter. The sequential outputs load the next-level buffers, sequentially activating the letters c, a, and t (line 11). These messages are sequentially transmitted to the next level of processing (line 13). When the letter output module returns the third NEXT signal (line 14, iteration 52), the letter-sequence level clears its buffer and issues a NEXT signal to the previous level.
module in the lexical region transmits a code for the letter pattern of cat in the innerloop (see Fig. 3). In the motor system, the central controller first sets the feedback parameter to zero, thus clearing the contents of the buffer. Then the feedback is increased to latch the input for the pattern "CAT" in the module. Note that, since the module buffers the output code, other messages can be sent on the innerloop while the motor region is outputting
the “CAT” stimulus.[15] After the multiple buffers “C,” “A,” “T” are loaded in parallel, feedback is maintained at a high level to maintain the traces. Level N now begins to sequentially output the active modules to level N + 1. Since the modules at a given level of processing do not interconnect, the modules within a level of processing can transmit their messages without distorting the information of neighboring modules. Sequentially activating the TRANSMIT control signals will sequentially output the contents of buffers.

The order of output can be determined in the same ways that sequential input can be maintained (see above). Potential methods for doing this include location-specific coding, e.g., Module 1 of the stage would always be the first out; context-sensitive coding, e.g., the module with a code indicator at the front of the list, code “-Ca”, would be the first item out, and context would determine the next item, “cAt”, then “aT-”; or strength coding, e.g., the first module would have the highest strength and inhibit the gain control of all the other modules until it is output (see Rumelhart & Norman, 1982). Order could be determined by any of these methods within a level of processing. The module with the highest priority would inhibit the output of the other modules at its level and output its message, e.g., set the TRANSMIT signal to transmit the “C” code (Fig. 9, line 4), and the LOAD control signal to the N + 1 level of processing.

Level N + 1 begins the same sequence of events as in level N. At level N + 1 the code of the “C” would be converted into the sequence of motor movements to produce the line strokes for the “C.” When level N + 1 finished outputting all its active modules, it would send a NEXT signal (Fig. 9, line 14) to the level N controller requesting the next input. At level N, the next-highest priority module, e.g., “A,” would be transmitted (Fig. 9, line 7). This process would continue until all the active modules at N had been output.
[15] As with sequential input, latching input to a module by using feedback will block other dissimilar messages from distorting the code within the buffer. This implies that nonrelated messages can be transmitted on the innerloop. However, if related codes are transmitted, interference will result; e.g., in the Stroop task (see Dyer, 1973) both the print and color codes are transmitted; since these are similar codes, the feedback latching mechanism will be distorted by these multiple transmissions.

Then level N would send a NEXT signal to level N - 1. If the module sending the NEXT signal were on the innerloop, the NEXT signal would be routed through the central controller to the module originating the transmission to the innerloop.

This sequential output scheme provides a robust method of outputting information. Should an error occur at any level of processing, the previous stage would have sufficient information to reload the next stage. A module would not clear its contents until the next level had indicated, for all the codes from the previous level, that the information was received, decoded,
and successfully transmitted to the next level; e.g., in Fig. 9 the “C” is not cleared until after level N + 1 reports back that the “T” code was successfully transmitted. This system is asynchronous, meaning that each stage can operate at its own temporal scale, where information at a previous stage of processing is buffered until it is needed at the next stage. If a later stage were to alter its output, e.g., pressing the shift key to type certain characters, the later level could take more or less time for each of its sequential outputs.

An additional level of robustness is provided by the context-storage process of the innerloop modules. For example, assume an interrupt occurred, halting all output and flushing all the modules. Once the network resumed outputting sequentially, the context vector could be transmitted; this would allow the innerloop modules to be reloaded. The network could then begin to output by resuming activity at the point of the innerloop transmissions which preceded the last context-storage event.

The process of sequencing information is very similar throughout the system (see Fig. 4). In the input region, modules send a LOAD signal to the next level when information is ready for the next level of processing. The LOAD signal indicates to the next higher level that it should try to recognize (via the process of increasing feedback) a code incorporating the active input at the previous level. The higher input level sends back a NEXT control signal when it recognizes the total pattern of the previous level. The NEXT signal results in the next input to that level, flushing the information at the previous input so that new information can be loaded at that level of processing. In the output regions, each level sends a LOAD control signal to load a series of modules at the next level of processing. The next level returns a NEXT control signal when it has completed all the processing for the previous LOAD signal.
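The LOAD/NEXT protocol just described — load a level's buffers in parallel, transmit them sequentially, and signal NEXT upward when a level finishes — can be sketched as a recursive expansion. The expansion table and stroke names below are hypothetical stand-ins for the motor codes of Fig. 9:

```python
# hypothetical expansion table: word -> letters -> strokes
EXPANSIONS = {
    "CAT": ["C", "A", "T"],
    "C": ["c-stroke"],
    "A": ["a-stroke1", "a-stroke2"],
    "T": ["t-stroke1", "t-stroke2"],
}

def output_code(code, trace):
    """Expand one code: LOAD its sub-codes in parallel into a buffer,
    then TRANSMIT them sequentially, descending one level each time."""
    buffer = list(EXPANSIONS.get(code, []))  # parallel LOAD of the next level
    if not buffer:                           # bottom level: emit the action itself
        trace.append(code)
        return
    for sub in buffer:                       # sequential TRANSMIT signals
        output_code(sub, trace)              # returning here = the NEXT signal upward

trace = []
output_code("CAT", trace)
# trace: ['c-stroke', 'a-stroke1', 'a-stroke2', 't-stroke1', 't-stroke2']
```

Because each level holds its buffer until the level below has returned, an error below can be recovered by re-transmitting from the buffered level, which is the robustness property the text emphasizes.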
The processing in the innerloop is similar, except that the source and destination of the control signals are not limited to a single set of modules. Within the input and output regions, the control LOAD and NEXT signals come from the next level in the same region. In the innerloop, the motor region may get input from any of the regions on the innerloop. The control signals must be routed through the central-control structure. The working memory within the central-control structure must maintain information indicating where to route the NEXT signal when it is issued by a module on the innerloop.[16]

[16] A simple implementation of the central-control routing might involve having the central controller passively monitor the traffic on the innerloop by using changes in activation to specify the intended routing path. For example, if the visual system were to transmit the code on the innerloop, the central-control monitoring would be able to detect the sequential change in activity in the visual region and the region that responded to the visual transmission. Assuming the motor system were activated by the “CAT” transmission, the central controller could infer the modules to which the visual system was outputting. Then, if the motor system were to send a NEXT signal, the central controller would route the NEXT signal to the visual region.
B. CHUNKING

The proposed architecture produces many of the chunking effects that Johnson (1966a,b, 1970, 1972) has described. Four phenomena are of special interest.

First, subjects will naturally group input and output sequences in groups of three to four elements with longer pauses between groups. In the present model, codes for a given level of processing should not contain more information than the control level can handle, suggesting a need for grouping and increased delays when levels are reloaded.

Second, the probability of outputting the first element of a sequence is dependent on the number of chunks at each level of processing, but not on the size of chunks other than the first one at each level of processing. To output a 3, 3, 3 sequence requires decoding three elements at the top level and three at the next level, or six altogether. To output a 3, 2, 2, 2 sequence requires outputting four elements at the top level and three at the next level, for a total of seven. Human recall of the first items of a nine-element list is better for a 2, 4, 3 code than it is for a 3, 3, 3 or 3, 2, 2, 2 code. In the present architecture, the first elements of every chunk must be output before the first bottom-level code produces output. A failure at any level will terminate the output process. However, the number of elements in unexpanded chunks should not influence the probability of output of the elements of a chunk, i.e., whether the next chunk at a level to be output codes two or five chunks should not influence the probability of output of the elements of the present chunk.

The third chunking phenomenon centers on the fact that subjects tend to pause longer between chunks than within chunks (see Broadbent, 1975; McLean & Gregg, 1967; Reitman & Rueter, 1980). This is illustrated in skilled memory studies in which SF outputs digits while performing the digit-span task.
By analyzing SF’s verbal protocols, Chase and Ericsson (1981) determined that his speech output nearly always followed the same pattern. Digit groups were recalled at a rate of about three digits/sec, with pauses of about 2 sec between groups. The processes of LOAD and NEXT that occur when one level transmits to the next level will produce longer pauses in the outputs. This would be the case particularly when innerloop transmissions are involved, due to time added waiting for other innerloop traffic to be stopped and the NEXT signal to be routed.

The fourth chunking phenomenon involves Johnson’s (1970, 1972) characterization of a chunk as an “opaque container” that must be treated as a complete pattern at a given level of processing and not just as the concatenation of the codes of the previous level. According to Johnson (1970, p. 213), a container “is opaque in the sense that recovery from memory of the code container does not allow the S to evaluate the information he has recovered.” Johnson found that if a subject learns multiple strings and
repeats elements of a chunk, but not the full chunk, accuracy does not improve. If, however, the full string or the first chunk is repeated, performance does improve. For example, if one repeats the string 94 487 3587 and then 39 687 3932, repeating the 87 3 sequence on every other list produces no greater recall than random digits. In the present architecture, the higher-level codes are encapsulated codes containing a distributed representation of the total information that is not divided into individual elements until the next level decodes it. If most of the learning occurs in the innerloop, there is little benefit from repeating portions of the lower-level codes. In summary, the connectionist/control architecture can perform robust sequential output that exhibits many of the phenomena associated with serial output and chunking. Each level of processing buffers and encodes information. Control signals between levels, e.g., the NEXT and LOAD signals, provide a single mechanism for accounting for chunking effects in input, innerloop, and output processing.
VIII. Workload and Working Memory

The current architecture can perform multiple tasks concurrently. The system has a variety of resources that can be allocated in different ways to meet the demands of different task combinations. When multiple tasks compete for limited resources, processing will either be delayed or errors will result. This architecture includes many types of resources, e.g., buffers, regions, control structure, and connection weights, and contrasts sharply with Kahneman's (1973) proposal that attention is a single undifferentiated resource. The present architecture is consistent with Wickens's (1980) view that resources are differentiated by modalities. However, in addition to competition for specific regions, as in Wickens's model, the present architecture emphasizes the importance of competition for the control structure. This architecture and simulation model are also used to account for human attention phenomena and the acquisition of component skills (see Schneider & Mumme, 1987; Schneider & Detweiler, in press). In the present section we limit our discussion to how the connectionist/control architecture can account for workload effects in memory tasks. The connectionist/control architecture can employ five strategies to perform concurrent tasks. The first strategy is to buffer and delay messages for one of the tasks until the other task is completed. Recall that the system is asynchronous, with buffers at every level of processing. If two tasks require the same set of modules on the innerloop, the central controller can sequence the transmissions on the innerloop to time-share the use of critical modules. Since both the inputs and outputs are buffered, the time sharing generally results in longer reaction times, but not greater errors. Research
on the psychological refractory period (see Smith, 1967) illustrates such slowing. If the subject must respond to two stimuli presented successively, the response to the second stimulus is delayed by about the time required to complete the response to the first signal. The second strategy is to move a task into low-use buffers. For example, if a subject were to maintain three digits in auditory buffers while performing a visual task, both the innerloop and motor system would show little speed or accuracy deficit. Baddeley and Hitch (1974) found that increasing the short-term digit load from one to three digits results in no change in accuracy and little change in speed of processing, e.g., a 5% (0.07 sec) slowing in judging sentences such as Canaries have wings to be true. However, loads that exceed the capacity of the buffers result in substantial errors and increases in reaction time; e.g., with an eight-digit load, the Baddeley and Hitch (1974) sentence-judging task showed a substantial increase in errors (from 3% to 14%) and a slowing of the response (44%, 0.67 sec). The third strategy for dealing with high workload is to use context storage to temporarily associate information to the current context and then use the context to load modules. The ability of subjects to perform embedded tasks after a brief rehearsal period suggests this type of strategy. For example, Klapp et al. (1983) allowed subjects 5 sec to rehearse letter strings 0, 6, and 9 items long before they performed an embedded task such as visual scanning. In the connectionist/control architecture, the short rehearsal would associate the letters to the context, the search task could then be performed without rehearsing the letters, and finally the context could be used to retrieve the letters. This context-storage strategy can explain the use of brief review periods before performing critical events.
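A minimal sketch of this context-storage idea, assuming Hebbian outer-product fast weights and random ±1 vectors (our illustrative choices, not the chapter's simulation):

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 256
context = rng.choice([-1.0, 1.0], size=dim)   # the current context vector
letters = {k: rng.choice([-1.0, 1.0], size=dim) for k in "ABC"}

# Brief rehearsal: bind each letter to the context with fast weights.
fast_w = np.zeros((dim, dim))
for v in letters.values():
    fast_w += np.outer(v, context) / dim

# ... the embedded task runs, flushing the buffers ...

# Reevoking the context retrieves a composite of the stored letters:
retrieved = fast_w @ context
for v in letters.values():
    assert np.dot(retrieved, v) / dim > 0.5   # each letter is recoverable
```

Because the fast weights store the letter-context associations outside the buffers, the composite can be recovered after an interruption simply by transmitting the context vector again.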
For instance, in both athletic competition and military combat, individuals often review their intended actions just prior to entering the critical situation. This review could serve to associate the impending actions to the context. Attending to the context, i.e., transmitting the context vector, could then simultaneously load modules in many regions and initiate many concurrent processes. The fourth strategy is to develop automatic processes to reduce the load on the central and regional controllers. We assume that the control-processing system can control only a very small proportion of the modules in the network. The regional controllers generally buffer only three to four elements at a level of processing. To reduce the load on the control architecture, each module can gate information locally. A model for the development of local automatic gating is detailed in Schneider and Mumme (1987). Briefly, they assume that the autoassociation matrix within each module associates the message within the module with a priority tag. Transmissions from the module that result in a positive event (determined at the system
level) increase the priority tag; transmissions that result in a negative event decrease it. If a module receives a high-priority message, the module transmits the message in the absence of control input. If the system consistently responds to particular messages, those messages will be transmitted automatically, i.e., as a result of the local priority tag. The benefit of priority-based transmission is that it frees the limited control-processing resources to be used elsewhere in the system. The model of priority-tag learning (Schneider & Mumme, 1987) illustrates how consistent practice develops fast, parallel, and difficult-to-alter automatic processing. The fifth strategy for dealing with high workload is to reduce the message interference for concurrently transmitted messages (see Schneider & Detweiler, in press). Message interference is a limiting factor for communications on the innerloop. Each module on the innerloop has its own fibers, allowing multiple messages to be transmitted concurrently. However, if two incoming messages activate competing vectors in a receiving module, interference results. Assume a typist seeks to perform copy typing while concurrently comprehending conversation. Normally the visual transmission of text codes activates semantic processing (for comprehension, as in reading) and motor processing (for typing). The auditory transmission of speech codes normally activates semantic processing and articulatory codes. Initially, concurrent visual and auditory input causes interference, and the central control system allows only the transmission of visual codes during typing. As the subject practices typing, i.e., transmitting messages from the visual system and releasing messages in the motor system, the visual-to-motor connections are strengthened. The lack of releasing responses in the comprehension system weakens these connections.
With time, the visual-to-semantic connections weaken such that visual input no longer interferes (at least in a typing context) with the auditory input to semantic processing. If the visual transmissions become automatic, the central controller need not be involved in copy typing. At this stage, the typist could attend to the auditory input and comprehend speech while typing. Copy typists’ lack of memory for the material typed is suggestive of this kind of change of connection weights.
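The priority-tag gating described under the fourth strategy can be sketched as follows (a toy illustration under our own assumptions about the threshold and step size; the actual simulation is in Schneider & Mumme, 1987):

```python
class Module:
    """Toy module that learns to transmit a message automatically:
    transmissions followed by positive events raise the message's
    priority tag, and once the tag crosses a threshold the message is
    transmitted even without controlled-processing input."""

    def __init__(self, threshold=1.0, step=0.25):
        self.priority = {}          # message -> learned priority tag
        self.threshold = threshold  # tag needed for automatic transmission
        self.step = step            # tag change per positive/negative event

    def transmits(self, message, control_input):
        tag = self.priority.get(message, 0.0)
        return control_input or tag >= self.threshold

    def feedback(self, message, positive):
        delta = self.step if positive else -self.step
        self.priority[message] = self.priority.get(message, 0.0) + delta


m = Module()
assert not m.transmits("A", control_input=False)  # novel message: needs control
for _ in range(4):                                # consistent positive outcomes
    m.feedback("A", positive=True)
assert m.transmits("A", control_input=False)      # now transmitted automatically
```

Inconsistent outcomes would drive the tag back down, capturing why only consistently reinforced transmissions become automatic.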
IX. Working Memory in Learning and Skill Acquisition

Working memory plays a critical role in learning and acquiring knowledge. All LTM is stored in the connection weights of the network. The change in connection weights is determined by what is active in working memory. In the process of learning a task, controlled processing is generally used to compare the input pattern to a rule and to perform the appropriate response based on the match. One could view this as a process of acquiring productions (J. R. Anderson, 1983). However, since many patterns are stored in
any single connection matrix, there will be interactions among patterns, depending on the total set of productions to be acquired (see Rumelhart & McClelland, 1986a). Acquiring a skill necessitates keeping instructions and task-relevant information in working memory while performing at least some components of the task. For example, to learn to specify the output of an AND electronic gate, the system must store the verbal rule "if all the inputs are high, then the output is high," activate the input patterns, compare the input patterns to the pattern "all high," and respond "high output" if true and "low output" if false. The first step of skill acquisition is to rehearse the verbal rule to enable the context to load the buffers. The context would preload modules for the target state (e.g., a high on all inputs), the response on a match (e.g., a prediction of a high on the output), and the response on a nonmatch (e.g., a prediction of a low on the output). By associating these patterns to the context vector with fast weights, the context could be reevoked to reload the buffers. If the subject were distracted, the instructions could be reloaded by activating the context vector. When a problem (e.g., what is the output if the input is 1111?) is presented, a controlled comparison would occur between the input and the target state. On a match, the "yes" response would be released. As a result of these controlled processing operations, the input pattern, e.g., 1111, would be transmitted, followed by the output pattern, i.e., a high response. This would associate the input to the output. With sufficient training trials, the long-term connections between the input and output would be modified such that the input could directly activate the output (see Schneider & Mumme, 1987, for a simulation of such learning). When this occurs, context preloading and the controlled comparison process are no longer needed.
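The transition from controlled comparison to direct associative retrieval can be illustrated with a simple error-correcting associator (our sketch; the chapter's own simulation uses the connectionist modules described earlier):

```python
import itertools

def controlled_response(inputs):        # the rehearsed verbal rule:
    return 1 if all(inputs) else 0      # "if all the inputs are high,
                                        #  then the output is high"

w = [0.0] * 4                           # input-to-output connection weights
bias = 0.0
lr = 0.1

def associative_response(x):            # direct associative retrieval
    total = sum(wi * xi for wi, xi in zip(w, x)) + bias
    return 1 if total > 0 else 0

# Training: the controlled comparison supplies the answer on each trial,
# and each error strengthens or weakens the input-to-output connections.
for epoch in range(1000):
    errors = 0
    for x in itertools.product([0, 1], repeat=4):
        error = controlled_response(x) - associative_response(x)
        if error:
            errors += 1
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            bias += lr * error
    if errors == 0:                     # the input now evokes the output
        break                           # directly; the comparison can drop out

assert all(associative_response(x) == controlled_response(x)
           for x in itertools.product([0, 1], repeat=4))
```

Early trials depend entirely on the controlled rule; after enough consistent trials, the connections alone reproduce the gate's behavior, paralleling the point at which context preloading and controlled comparison are no longer needed.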
A. DISTRIBUTING PRACTICE
The importance of context storage for learning to perform a task raises serious issues concerning how problems should be sequenced and spaced. Initially it is beneficial to mass practice on a task. For example, in learning electronic troubleshooting, it is better to start with a block of trials on a single gate type before moving on to the next gate type. This is preferred because context storage maintains working memory. In procedural tasks, subjects learn to perform individual procedures quickly during massed practice of single tasks, but then show poor performance when trial types are intermixed. Due to PI between codes, context cannot be used to maintain or retrieve codes when training is distributed. Hence, more errors are expected during distributed training than during massed training. To be able to perform a variety of procedures in random order, training must progress to distributed practice. The marginal utility of massed
practice decreases with time. If the context vectors eliminate most of the error between the activated output and the desired output, there is little further learning (with a delta-type learning rule). Also, if the subject must execute the procedures at random times, context-based learning may show poor transfer. In sum, the advantage of massing practice early, to maintain information in working memory, trades off against the disadvantages of context learning, namely poor transfer and reduced long-term learning. Procedures that expand the distribution of practice with training are likely to be optimal (Landauer & Bjork, 1978).
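An expanding practice schedule of the kind Landauer and Bjork (1978) recommend can be sketched as follows (the initial gap and growth factor are illustrative assumptions, not values from the text):

```python
def expanding_schedule(n_repetitions, first_gap=1, factor=2):
    """Return the trial indices at which an item is re-practiced:
    nearly massed at first (short gaps, letting context storage carry
    working memory), then increasingly distributed (long gaps, forcing
    retrieval without the original context)."""
    trial, schedule = 0, []
    gap = first_gap
    for _ in range(n_repetitions):
        trial += gap
        schedule.append(trial)
        gap *= factor
    return schedule

print(expanding_schedule(5))  # → [1, 3, 7, 15, 31]
```

The early closely spaced repetitions exploit context storage; the later widely spaced ones reduce dependence on it, which is the trade-off the paragraph above describes.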
B. PHASES OF SKILL ACQUISITION
Within the present architecture, there are five identifiable phases of skill acquisition. The movement between these phases is a gradual, continuous transition. The use of working memory and controlled processing varies at each phase. The rate of movement between phases depends on the nature of the task to be learned. We illustrate the transitions using numbers based on subjective impressions of learning logic gates for electronic troubleshooting (Carlson & Schneider, 1987). These numbers are included only to give the reader an impression of the expected time course of these changes. Phase One of skill acquisition, e.g., Trials 1-4, involves loading all the information for performing the task into buffers. The task is performed by comparing information in the buffers to the incoming stimuli and releasing a response if a match occurs (see Schneider, 1985; Schneider & Mumme, 1987, for details). If the subject is interrupted, the buffer information may be lost, resulting in errors. We train our subjects with a mini-lecture on six gate types. In learning logic gates, our subjects' response times are between 2 and 3 sec on the first trial, with subjects requesting help about 40% of the time. Phase Two of skill acquisition, e.g., Trials 5-20, involves performing the same task as Phase One, but by Trial 5 the context-storage mechanism can maintain and reload working memory. Performance on blocked trials of the task is accurate and reasonably fast. By Trial 5, subjects' accuracy is near perfect and response times are down to 0.7 sec for massed trials. During massed practice in Phase Two, controlled processing resources are required to compare the input to the rules and to release output vectors, but they are not necessary to maintain the traces in the buffer. However, if alternative procedures are intermixed, accuracy decreases and responding slows considerably.
Whenever the task switches, subjects reevoke the verbal rule and context to reload the buffers in order to perform the task.
This is similar to J. R. Anderson's (1983) use of interpretive execution of productions.
On early intermixed trials, subjects' response times increase to 2-3 sec and they request help about 40% of the time. Phase Three of skill acquisition, e.g., Trials 21-100, occurs when the associations to the goal state are strong enough to load working memory without the use of context storage, such that attending to an AND gate loads the input pattern to be checked and the possible output responses. In this phase performance is accurate and rapid even if tasks are intermixed. However, the subject must still attend to the task and perform controlled-processing comparisons. In Phases Four and Five of skill acquisition, a substantial reduction occurs in the use of controlled processing resources in performing the task. Phase Four, e.g., Trials 101-200, is identified when the associations between the input, the goal state, and the output become strong enough for the input to evoke the output directly; e.g., with an input of 111 and a goal of AND, the input would evoke a 1 output via associative retrieval. In this phase, the controlled processing comparison drops out, thus reducing workload (see Schneider & Mumme, 1987, for a simulation). Note that controlled processing is still required to transmit messages on the innerloop and to route the NEXT and LOAD signals. In learning electronic troubleshooting, subjects show small speedups (e.g., 100 msec in predicting the output of single gates) between 100 and 200 trials of practice, but dramatic speedups (from 8 to 4 sec) in problem solving in circuit troubleshooting. This improved ability to use the rule in the problem-solving context suggests that the learning during Phase Four eliminates the controlled-processing comparisons used in Phases One through Three. Phase Five, e.g., after about 200 trials/rule, occurs when the modules develop local automatic processing so that the message is transmitted even in the absence of controlled processing input.
At this phase, controlled processing resources need not be allocated to the gate-identification task. The task can be performed reliably even if the subject uses controlled processing resources to perform other tasks. Some alternative tasks do interfere, due to message interference. In the connectionist/control architecture, the extent to which working memory is used varies depending on the task and the phase of skill acquisition. The combination of context storage and controlled-process comparison enables the network to perform novel tasks accurately after only a few trials. This contrasts with traditional connectionist learning systems, which typically require thousands of trials to acquire a novel set of associations (see Schneider, 1987). The first few trials of performing a task are very attention demanding, difficult to perform in mixed trials, and error prone under high workload. With practice, the system modifies the LTM associations such that automatic processing develops, enabling fast, accurate processing with a low resource load.
X. Final Comments
The connectionist/control architecture details a computational system that exhibits many of the phenomena of human working memory. The system level of the architecture (see Fig. 3) includes regions that specialize in different classes of processing. The activity of the regions is coordinated by a central control structure that routes control signals and sequences transmissions among regions to limit problems of message interference. One of the regions serves as a context-storage mechanism that can reevoke (via fast-weight connections) messages on the innerloop of processing. Each region is divided into a number of levels that sequentially or spatially input or output patterns to other levels (see Fig. 2). Each level has a control structure that monitors the activity of all the modules in its level and controls the feedback and transmission of that level. The level control structure sends and receives control signals to coordinate the sequential storage and processing of information. Each level includes multiple modules (see Figs. 1 and 2). Each of these modules involves a connectionist network that processes vectors of information. A module can store, categorize, maintain, and prioritize a received vector. This architecture is sufficiently detailed that it can simulate a wide variety of human learning and attentional phenomena. The architecture is physiologically plausible and shows some intriguing parallels to modular systems in the cortex (see Schneider & Mumme, 1987). Any model of human working memory must first be evaluated as to whether it provides a robust processing architecture that could survive in the complex and dynamic world in which humans have evolved. Buffers are needed because much of the processing must be sequential and asynchronous. Attention is needed to deal with resource competition and message interference.
A context-storage mechanism is needed to allow recovery from interruptions, to increase the effective size of working memory, and to permit acquisition of rudimentary skills after only a single stimulus presentation. We think that the traditional models of working memory, e.g., Atkinson and Shiffrin (1968) and Baddeley (1986), do not provide a robust processing architecture. These buffer-oriented systems do not provide mechanisms that allow information to be recovered after an interruption that flushes the buffers. They provide a limited model for a subset of working-memory phenomena. A system limited to only such buffer memories and a slowly changing LTM is likely to exhibit severely unstable processing, perhaps similar to that of the amnesic patient H.M. The buffer models do account for classic STM phenomena, e.g., interference effects. However, they do not account for many other important phenomena, e.g., the lack of STM decay on the first trial, PI effects, reliable processing despite severe loading, and the critical dependence on LTM for what can be stored in STM.
We have described an architectural class of models for working memory. There are many possible configurations of modules, levels, regions, and control structures. For example, the innerloop of processing might be a ring, as depicted in Fig. 3, or it could be some complex lattice of processing regions. A great deal of theoretical and simulation work needs to be performed to determine the computational capacities of this architecture. Human empirical research is required to (1) evaluate how well models within this architecture predict human data and (2) identify specific details of the architecture. The present architecture can account for a wide range of human working-memory phenomena as emergent properties of the system. Most of the predictions follow from the process of developing a robust processing system, rather than from trying to model specific phenomena. The proposed multileveled buffer scheme provides an interpretation of the magic number three or four, acoustic confusions, sequential processing, problems with digit canceling and reverse digit span, the difficulty of maintaining order information, and the nature of rehearsal. Context storage is included to enable the system to cope with interruptions and to expand working memory. This storage mechanism provides a way of interpreting the distinction between episodic and semantic memory, retroactive and proactive interference effects and trade-offs, the buildup of PI, the benefit of elaborative rehearsal over maintenance rehearsal, the release of practice interference either by time or by switching content, LTM recency effects, and the ability to continue processing information after traditional STM capacity is exceeded. The present processing architecture can be operated with different levels of effectiveness depending on how the resources in the system are utilized.
The skilled uses of working memory provide interpretations of the unattended speech effect, levels of processing, mnemonics, category clustering, distribution of practice, generation effects, and skilled memory. The control processing for sequential output of information makes predictions regarding chunking, chunk-based retrieval, and pause boundaries. The control processing management of information enables the system to deal with conditions of high workload and produces psychological refractory-period phenomena, sequential attending, and the use of context to facilitate priming. Context storage enables information to be acquired rapidly during massed practice of procedures and illustrates that using an expanding practice schedule results in better retention for later distributed testing. To reduce workload on the limited control processing system, the control of information is localized within modules. This localization takes place gradually and illustrates different phases of skill acquisition. This architecture represents a hybrid of many previous models and frameworks for memory. It includes buffers (Waugh & Norman, 1965;
Atkinson & Shiffrin, 1968), a system to perform automatic and controlled processing (Shiffrin & Schneider, 1977; Schneider, 1985; Schneider & Mumme, 1987), multiple processing regions (Baddeley, 1976; Wickens, 1970, 1972), a distributed connectionist approach to associative memory (McClelland & Rumelhart, 1986; Rumelhart & McClelland, 1986a), autoassociative categorization (J. A. Anderson, 1983), automatic context storage (Tulving, 1972, 1983, 1984; Raaijmakers & Shiffrin, 1980, 1981; Hasher & Zacks, 1979), and the use of fast connection weights (Hinton & Plaut, 1987). The understanding of working memory is critical to the understanding of human cognition. We must know its capacity, structure, strategies of use, and limitations. It is important to examine a variety of architectures that incorporate the complex diversity of working-memory phenomena seen in humans. The present connectionist/control architecture could be implemented in a physiologically feasible manner and predicts a variety of the phenomena and potential structure of human memory.
ACKNOWLEDGMENTS

This research was sponsored by the Army Research Institute under Contract No. MDA90386-C-0149, and by the Personnel and Training Research Programs, Psychological Sciences Division, Office of Naval Research, under Contract Nos. N-0014-86-K-0107 and N-0014-86-K-0678.
REFERENCES

Ackley, D. H., Hinton, G. E., & Sejnowski, T. J. (1985). A learning algorithm for Boltzmann machines. Cognitive Science, 9, 147-169.
Anders, T. R., & Lillyquist, T. D. (1971). Retrieval time in forward and backward recall. Psychonomic Science, 22, 205-206.
Anderson, J. A. (1983). Cognitive and psychological computation with neural models. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13, 799-815.
Anderson, J. A., & Mozer, M. C. (1981). Categorization and selective neurons. In G. E. Hinton & J. A. Anderson (Eds.), Parallel models of associative memory. Hillsdale, NJ: Erlbaum.
Anderson, J. R. (1983). The architecture of cognition. Cambridge, MA: Harvard University Press.
Atkinson, R. C., & Shiffrin, R. M. (1968). Human memory: A proposed system and its control processes. In K. W. Spence & J. T. Spence (Eds.), The psychology of learning and motivation. Vol. 2. New York: Academic Press.
Atkinson, R. C., & Shiffrin, R. M. (1971). The control of short-term memory. Scientific American, 224, 82-90.
Ayres, T. J., Jonides, J., Reitman, J. S., Egan, J. C., & Howard, D. A. (1979). Differing suffix effects for the same physical stimulus. Journal of Experimental Psychology: Human Learning and Memory, 5, 315-321.
Baddeley, A. D. (1966). Short-term memory for word sequences as a function of acoustic, semantic, and formal similarity. The Quarterly Journal of Experimental Psychology, 18, 362-365.
Baddeley, A. D. (1976). The psychology of memory. New York: Basic Books.
Baddeley, A. D. (1983). Working memory. Philosophical Transactions of the Royal Society of London, Series B, 302, 311-324.
Baddeley, A. D. (1986). Working memory. Oxford: Clarendon.
Baddeley, A. D., Grant, S., Wight, E., & Thomson, N. (1974). Imagery and visual working memory. In P. M. A. Rabbitt & S. Dornic (Eds.), Attention and performance VIII. Hillsdale, NJ: Erlbaum.
Baddeley, A. D., & Hitch, G. (1974). Working memory. In G. H. Bower (Ed.), The psychology of learning and motivation. Vol. 8. New York: Academic Press.
Baddeley, A. D., & Hitch, G. J. (1977). Recency reexamined. In S. Dornic (Ed.), Attention and performance VI. Hillsdale, NJ: Erlbaum.
Baddeley, A. D., & Scott, D. (1971). Short-term forgetting in the absence of proactive inhibition. The Quarterly Journal of Experimental Psychology, 23, 275-283.
Barnard, P. (1985). Interacting cognitive subsystems: A psycholinguistic approach to short-term memory. In A. W. Ellis (Ed.), Progress in the psychology of language. Vol. 2. Hillsdale, NJ: Erlbaum.
Barrett, E. F., & Mangleby, K. L. (1976). Physiology of cholinergic transmission. In A. M. Goldberg & I. Hanin (Eds.), Biology of cholinergic function (pp. 29-100). New York: Raven.
Bellezza, F. S. (1981). Mnemonic devices: Classification, characteristics, and criteria. Review of Educational Research, 51, 247-275.
Bellezza, F. S. (1982). Improve your memory skills. Englewood Cliffs, NJ: Prentice-Hall.
Bousfield, W. A. (1953). The occurrence of clustering in the recall of randomly arranged associates. Journal of General Psychology, 49, 229-240.
Bousfield, W. A., & Cohen, B. H. (1955). The occurrence of clustering in the recall of randomly arranged words of different frequencies of use. Journal of General Psychology, 52, 83-95.
Bousfield, W. A., & Cohen, B. H. (1956). Clustering in recall as a function of the number of word categories in stimulus word lists.
Journal of General Psychology, 54, 95-106.
Bower, G. H. (1970). Analysis of a mnemonic device. American Scientist, 58, 496-510.
Bower, G. H., Clark, M. C., Lesgold, A., & Winzenz, D. (1969). Hierarchical retrieval schemes in recall of categorized word lists. Journal of Verbal Learning and Verbal Behavior, 8, 323-343.
Broadbent, D. E. (1958). Perception and communication. Oxford: Pergamon.
Broadbent, D. E. (1975). The magic number seven after fifteen years. In A. Kennedy & A. Wilkes (Eds.), Studies in long-term memory. New York: Wiley.
Broadbent, D. E. (1984). The Maltese cross: A new simplistic model for memory. The Behavioral and Brain Sciences, 7, 55-94.
Brown, J. (1958). Some tests of the decay theory of immediate memory. The Quarterly Journal of Experimental Psychology, 10, 12-21.
Carlson, R. A., & Schneider, W. (1987). Learning and using causal rules. Unpublished manuscript.
Chase, W. G., & Ericsson, K. A. (1981). Skilled memory. In J. R. Anderson (Ed.), Cognitive skills and their acquisition. Hillsdale, NJ: Erlbaum.
Chase, W. G., & Ericsson, K. A. (1982). Skill and working memory. In G. H. Bower (Ed.), The psychology of learning and motivation. Vol. 16. New York: Academic Press.
Cohen, N. J., Eichenbaum, H., Decacedo, B. S., & Corkin, S. (1985). Different memory systems underlying acquisition of procedural and declarative knowledge. Annals of the New York Academy of Sciences, 444, 54-71.
Cohen, N. J., & Squire, L. R. (1980). Preserved learning and retention of pattern-analyzing skill in amnesia: Dissociation of knowing how and knowing that. Science, 210, 207-210.
Conrad, R. (1959). Errors of immediate memory. British Journal of Psychology, 50, 349-359.
Conrad, R. (1960). Serial order intrusions in immediate memory. British Journal of Psychology, 51, 45-48.
Conrad, R. (1964). Acoustic confusions in immediate memory. British Journal of Psychology, 55, 75-84.
Crannell, C. W., & Parrish, J. M. (1957). A comparison of immediate memory span for digits, letters, and words. Journal of Psychology, 44, 319-327.
Crowder, R. G. (1982). The demise of short-term memory. Acta Psychologica, 50, 291-323.
Crowder, R. G., & Morton, J. (1969). Precategorical acoustic storage (PAS). Perception & Psychophysics, 5, 365-373.
Cuddy, L. J., & Jacoby, L. L. (1982). When forgetting helps memory: An analysis of repetition effects. Journal of Verbal Learning and Verbal Behavior, 21, 451-467.
Desimone, R., Schein, S. J., Moran, J., & Ungerleider, L. G. (1985). Contour, color and shape analysis beyond the striate cortex. Vision Research, 25, 441-452.
Dirlam, D. K. (1972). Most efficient chunk sizes. Cognitive Psychology, 3, 355-359.
Dyer, F. H. (1973). The Stroop phenomenon and its use in the study of perceptual, cognitive and response processes. Memory & Cognition, 1, 106-120.
Ericsson, K. A., & Chase, W. G. (1981). Exceptional memory. American Scientist, 70, 607-615.
Ericsson, K. A., & Polson, P. G. (1987). A cognitive analysis of exceptional memory for restaurant orders. In M. T. H. Chi, R. Glaser, & M. J. Farr (Eds.), The nature of expertise. Hillsdale, NJ: Erlbaum, in press.
Estes, W. K. (1972). An associative basis for coding and organization in memory. In A. W.
Melton & E. Martin (Eds.), Coding processes in human memory. Washington, DC: Winston.
Fisk, A. D., & Schneider, W. (1984). Memory as a function of attention, level of processing, and automatization. Journal of Experimental Psychology: Learning, Memory and Cognition, 10, 181-197.
Frick, R. W. (1984). Using both an auditory and a visual short-term store to increase digit span. Memory & Cognition, 12, 507-514.
Goggin, J., & Wickens, D. D. (1971). Proactive interference and language change in short-term memory. Journal of Verbal Learning and Verbal Behavior, 10, 453-458.
Hasher, L., & Zacks, R. T. (1979). Automatic and effortful processes in memory. Journal of Experimental Psychology: General, 108, 356-388.
Healy, A. F. (1974). Separating item from order information in short-term memory. Journal of Verbal Learning and Verbal Behavior, 13, 644-655.
Healy, A. F. (1982). Short-term memory for order information. In G. H. Bower (Ed.), The psychology of learning and motivation. Vol. 16. New York: Academic Press.
Hinton, G. E. (1986). Learning distributed representations of concepts. Eighth Annual Conference of the Cognitive Science Society (pp. 1-12). Amherst, Massachusetts, August 1986.
Hinton, G. E., McClelland, J. L., & Rumelhart, D. E. (1986). Distributed representations. In D. E. Rumelhart, J. L. McClelland, & the PDP Research Group (Eds.), Parallel distributed processing. Vol. 1. Cambridge, MA: MIT Press. Hinton, G. E., & Plaut, D. C. (1987). Using fast weights to deblur old memories. Ninth Annual Conference of the Cognitive Science Society, Seattle, Washington, July, 1987. Hinton, G. E., & Sejnowski, T. J. (1986). Learning and relearning in Boltzmann Machines. In D. E. Rumelhart, J. L. McClelland, & the PDP Research Group (Eds.), Parallel distributed processing. Vol. 1. Cambridge, MA: MIT Press.
Working Memory Architecture
117
James, W. (1890/1983). The principles of psychology. Cambridge, MA: Harvard University Press. Jarvella, R. J. (1971). Syntactic processing of connected speech. Journal of Verbal Learning and Verbal Behavior, 10, 409-416. Johnson, N. F. (1966a). The influence of associations between elements of structured verbal responses. Journal of Verbal Learning and Verbal Behavior, 5, 369-374. Johnson, N. F. (1966b). On the relationship between sentence structure and the latency in generating the sentence. Journal of Verbal Learning and Verbal Behavior, 5, 375-380. Johnson, N. F. (1970). The role of chunking and organization in the process of recall. In G. H. Bower (Ed.), Thepsychology of learning and motivation. Vol. 4, New York: Academic Press. Johnson, N. F. (1972). Organization and the concept of a memory code. In A. W. Melton &E. Martin (Eds.), Coding processes in human memory. Washington, DC: Winston. Just, M. A., &Carpenter, P . A. (1987). The psychology of reading und language comprehension. Boston: Allyn and Bacon, Inc. Kahneman, D. (1973). Attention and effort. Englewood Cliffs, NJ: Prentice-Hall. Keppel, G., & Underwood, B. J. (1962). Proactive inhibition in short-term retention of single items. Journal of Verbal Learning and Verbal Behavior, 1, 153-161. Kincaid, J . P., & Wickens, D. D. (1970). Temporal gradient of release from proactive inhibition. Journal of Experimental Psychology, 86, 3i3-316. Kirchner, W. K. (1958). Age differences in short-term retention of rapidly changing information. Journal of Experimental Psychology, 55, 352-358. Klapp, S. T. (1987). Short-term memory limits in human performance. In P. Hancock (Ed.), Human factors psychology. Amsterdam: North-Holland. Klapp, S . T., Marshburn, E. A., &Lester, P . T. (1983). Short-term memory does not involve the “working memory” of information processing: The demise of a common assumption. Journal of Experimental Psychology: General, 112, 240-264. Klapp, S. T., &Phillipoff, A. (1983). 
Short-term memorylimits in performance. In A. T. Pope & L. D. Haugh (Eds.), Proceedings of the human factor society 27th annualmeeting. Santa Monica, CA: Human Factors Society. Kohonen, T . (1984). Self-organization and associative memory. New York: Springer-Verlag. Laird, J., Rosenbloom, P., & Newell, A. (1986). Universal subgoaling and chunking: The automatic generation and learning of goal hierarchies. Boston, MA: Kluwer. Landauer, T. K., & Bjork, R. A. (1978). Optimum rehearsal patterns and name learning. In M. M. Gruneberg, P. E. Morris, & R.N. Sykes (Eds.), Practiculaspectsof memory. London: Academic Press. Lashley, K. S. (1951). The problem of serial order in behavior. In L. A. Jeffress (Ed.), Cerebral mechanisms in behavior. New York: Wiley. Loess, H. (1968). Short-term memory and item similarity. Journal of Verbal Learning and Verbal Behavior, 1,87-92. Loess, H., & Waugh, N. C. (1967). Short-term memory and inter-trial interval. Journalof Verbal Learning and Verbal Behavior, 6 , 455-460. Logan, G. D. (1979). On the use of concurrent memory load to measure attention and automaticity. Journal of Experimental Psychology: Human Perception and Performance, 5, 189-207. Lyon, D. (1977). Individual differences in immediate serial recall: A matter of mnemonics? Cognitive Psychology, 9, 403-41 I . Mackworth, J. (1959). Paced memorizing in a continuous task. Journal of Experimental PSyChOlOgy, 58, 206-21 1. Mandler, G. (1967). Organization and memory. In K. W. Spence & J . T. Spence (Eds.), The psychology of learning and motivation. Vol. 1. New York: Academic Press.
118
Walter Schneider and Mark Detweiler
McClelland, J. L., & Rumelhart, D. E. (1986). A distributed model of human learning and memory. In J. L. McClelland, D. E. Rumelhart, & the PDP Research Group (Eds.), Parallel distributed processing, Vol. 2: Psychological and biological models. Cambridge, MA: MIT Press. McConkie, G. W., & Zola, D. (1979). Is visual information integrated across successive fixations in reading? Perception & Psychophysics, 25, 221-224. McLean, R. S., & Gregg, L. W. (1%7). Effects of induced chunking on temporal aspects of serial recitation. Journal of Experimental Psychology, 74, 455459. Melton, A. W. (1963). Implications of short-term memory for a general theory of memory. Journal of Verbal Learning and Verbal Behavior, 2, 1-21. Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81-97. Mishkin, M., & Appenzeller, T. (1987). Theanatomy of memory. ScientificAmerican,256,80-89. Mishkin, M., Malmut, B., & Bachevalier, J. (1984). Memories and habits: Two neural systems. In G. Lynch, L. McGaugh, & N. M. Weinberger (Eds.), Neurobiology of learning and memory. New York: The Guilford Press. Mountcastle, V. B. (1979). An organizing principle for cerebral function: The unit module and the distributed system. In F. 0. Schmitt & F. G. Worden (Eds.), The neurosciences. Cambridge, MA: MIT Press. Murdock, B. B., Jr. (1961). The retention of individual items. Journal of Experimental Psychology, 62, 618-625. Peterson, L. R., & Gentile, A. (1965). Proactive interference as a function of time between tests. Journal of Experimental Psychology, 70, 473-478. Peterson, L. R., & Peterson, J. J. (1959). Short-term retention of individual verbal items. Journal of Experimental Psychology, 58, 193-198. Postman, L., &Phillips, L. W. (1965). Short-term temporal changes in free recall. The Quarterly Journal of Experimental Psychology, 17, 132-138. Raaijmakers, J. G. W., & Shiffrin, R. M. (1980). 
SAM: A theory of probabilistic search of associative memory. In G. H. Bower, (Ed.), The psychology of learning and motivation. Vol. 14. New York: Academic Press. Raaijmakers, J. G. W., & Shiffrin, R. M. (1981). Search of associative memory. Psychological Review, 88, 93-134. Reisberg, D., Rappaport, I., & O’Shaughnessy, M. (1984). Limits of working memory: The digit digit-span. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 203-221.
Reitman, J. S . , & Rueter, H. H. (1980). Organization revealed by recall orders and confirmed by pauses. Cognitive Psychology, 12, 554-581. Rescoria, R. A., &Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectivenessof reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning ZI: Current theory and research. New York: Appleton. Rosenberg, C. R., & Sejnowski, T. J. (1986). The spacing effect on NETtalk, a massivelyparallel network. The 8th Annual Conference of the Cognitive Science Society pp. 72-89. Ross, B. H. (1984). Remindings and their effects in learning a cognitive skill. Cognitive Psychology, 16, 371-416. Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In D. E. Rumelhart & J. L. McClelland (Eds.) Parallel distributed processing. Vol. I . Cambridge, MA: MIT Press. Rumelhart, D. E., & McClelland, J. L. (1986a). On learning the past tense of English verbs. In J. L. McClelland & D. E. Rumelhart (Eds.), Parallel distributed processing. Vol. 2: fiychological and biological models. Cambridge, MA: MIT Press. Rumelhart, D. E., & McClelland, J. L. (Eds.). (1986b). Parallel distributedprocessing:fiplora tions in the microstructure of cognition. Vol. I : Foundations. Cambridge, MA: MIT Press. Rumelhart, D. E., & Norman, D. A. (1982). Simulating a skilled typist: A study of skilled cognitive-motor performance. Cognitive Science 6 , 1-36.
Working Memory Architecture
119
Salame, P., & Baddeley, A. (1982). Disruption of short-term memory by unattended speech: Implications for the structure of working memory. Journal of Verbal Learning and Verbal Behavior, 21, 150-164. Schank, R. C . (1982). Dynamic memory. New York: Cambridge University Press. Schneider, W. (1985). Toward a model of attention and the development of automatic processing. In M. I. Posner & 0. S. M. Marin (Eds.), Attention and performance XI. Hillsdale, NJ: Erlbaum. Schneider, W. (1987). Connectionism: Is it a paradigm shift for psychology? Behavior Research Methods, Instruments & Computers, 19, 73-83. Schneider, W. S., & Desimone, R. (1986). A combinedphysiological and theoretical approach to the mechanism of selective attention. Unpublished paper. Schneider, W., & Detweiler, M. (in press). The role of practice in dual-task performance: Workload modelling in a connectionist/control architecture. Human Factors. Schneider, W., & Mumme, D. (1 987). A connectionist/controI architecture for attention, automaticity and the capturing of knowledge. Manuscript for submission. Shiffrin, R. M., & Schneider, W. (1977). Controlled and automatic human information processing. 11. Perceptual learning, automatic attending and a general theory. Psychological Review, 84, 127-190. Siggins, G. R., & Gruol, D. L. (1986). Mechanisms of transmitter action in the vertebrate central nervous system. In V. B. Mountcastle, F. E. Bloom, & S. R. Gieger (Eds.), Handbook of physiology: The nervoussystem IV @p. 1-1 14). Bethesda, MD: American PhysiologicalSociety. Slak, S. (1970). Phonemic recoding of digital information. Journal of Experimental Psychology, 86, 398-406. Smith, M. C. (1967). Theories of the psychological refractory period. Psychological Bullelin, 67,202-213. Starr, A. S . (1929). The significance of the ratio maintained between the forward, reverse and rhythmic memory span as obtained in three-thousand individual examinations. Psychological Bulletin, 26, 172- 173. Sternberg, S. (1966). 
High speed scanning in human memory. Science, 153, 652-654. Szentagothai, J. (1979). Local neuron circuits of the neurocortex. In F. 0. Schmidt & F. G . Worden (Eds.). The neurosciences fourth study program. Cambridge, MA: MIT Press. Tulving, E. (1972). Episodic and semantic memory. In E. Tulving & W. Donaldson (Eds.), Organization of memory. New York: Academic Press. Tulving, E. (1983). Elements of episodic memory. Oxford: Clarendon. Tulving, E. (1984). Precis of Elements of episodic memory. The Behavioral and Brain Sciences, 7 , 223-268. Tzeng, 0. J . L. (1973). Positive recency effect in delayed free recall. Journal of Verbal Learning and Verbal Behavior, 12, 436-439. Van Essen, D. C. (1985). Functional organization of primate visual cortex. In A. Peters & E. G. Jones (Eds.), The cerebral cortex, Vol. 3, New York: Plenum. Waugh, N. C., &Norman, D. A . (1965). Primary memory. PsychologicalReview, 72.89-104. Welford, A. T. (1968). Fundamentals of skill. London: Methuen. Wickelgren, W. A. (1964). Size of rehearsal group and short-term memory. Journalof Experimental Psychology, 68, 413-419. Wickelgren, W. A. (1967). Rehearsal grouping and hierarchical organization of serial position cues in short-term memory. The Quarterly Journal of Experimental Psychology, 19, 97-102. Wickelgren, W. A. (1969). Context-sensitive coding, associative memory, and serial order in (speech) behavior. Psychological Review, 76, 1-1 5 . Wickens, C. D. (1980). The structure of attentional resources. In R. Nickerson (Ed.), Aflention and performance VIII. Hillsdale. NJ: Erlbaum. Wickens, D. D. (1970). Encoding categories of words: An empirical approach to meaning. Psychological Review, 77, 1-15. Wickens, D. D. (1972). Characteristics of word encoding. In A. W. Melton & E. Martin @ids.), Coding processes in human memory. Washington, DC: Winston.
This Page Intentionally Left Blank
THE INTELLIGENT HAND

Roberta L. Klatzky
UNIVERSITY OF CALIFORNIA, SANTA BARBARA, CALIFORNIA 93106

Susan J. Lederman
QUEEN'S UNIVERSITY, KINGSTON, ONTARIO, CANADA K7L 3N6

I. The Curious Discrepancy between Two Phenomena
   A. Haptic Apprehension of Two-Dimensional Spatial Layout
   B. Haptic Apprehension of Three-Dimensional Objects
II. Haptic Apprehension and Recognition: Theoretical Issues
   A. The Image-Mediated Model
   B. The Alternative to Visual Mediation: Direct Haptic Apprehension
   C. Questions To Be Addressed, and Some Answers
III. Conclusions and Applications
References
THE PSYCHOLOGY OF LEARNING AND MOTIVATION, VOL. 21
Copyright © 1987 by Academic Press, Inc. All rights of reproduction in any form reserved.

The purpose of this article is to provide a theoretical perspective on haptic processing, focusing primarily on haptic processing of objects. We use the term haptics as defined by Loomis and Lederman (1986): Haptics is a perceptual system incorporating inputs from multiple sensory systems. It includes a cutaneous system that senses pressure and vibration and, although rarely considered when discussing haptics, thermal sensing, which may be of considerable importance for the perception of objects. (For present purposes, we ignore the sensing of pain.) In addition to these tactile subsystems, haptics includes a kinesthetic system, which registers position and movement from receptors in the muscles and joints. In summary, the term haptics is an umbrella that includes all of the sensory subsystems derived from involvement of skin, muscles, and joints. We stress the nature of haptic processing during active, purposive exploration (see Gibson, 1966). By object processing, we mean both apprehension of the structural and substantive attributes of objects and categorization of the objects into previously established classes. Our general theme is that haptics can be very effective at many of these processes, and therefore it should be considered an encoding device in its own right, not just a poor substitute for vision.

The article begins with two phenomena that we have documented in both laboratory studies and informal demonstrations: (1) Haptics is very poor at apprehending spatial-layout information in a two-dimensional plane. (2) Haptics is very good at learning about and recognizing three-dimensional objects. From these beginnings, we outline a general theory of haptic apprehension and recognition, part of which is supported by our research program of the last several years, and part of which remains the stuff of intuition and conjecture.
I. The Curious Discrepancy between Two Phenomena

A. HAPTIC APPREHENSION OF TWO-DIMENSIONAL SPATIAL LAYOUT

The research described here began when one of us (SL) approached the other (RK) with an interesting phenomenon: People seem to be very poor at haptically recognizing raised-line drawings of common objects by following object contours with their fingers. A skeptic, the untutored author tried the task, only to discover that it was, indeed, virtually impossible. After about a minute of groping over a raised contour map of South America, she guessed "George Washington?" The display was instantly recognizable when she opened her eyes. We, and others, have informally verified this phenomenon many times over. Determined "viewers" often explore a simple drawing for minutes before giving up; their surprise when they see, and immediately recognize, the object is considerable. In more formal studies (in progress), we have found that success in this task can sometimes be obtained, particularly with simple, prototypical pictures. However, success usually follows an inferential, hypothesis-testing procedure; for example, the object is much longer than it is wide, it curves, it seems to have a pointed end, so perhaps it is a snake. In our experience, haptic observers have rarely claimed to "see" a wholistic mental image by means of haptic exploration; such a wholistic image occurs only with very simple objects.

Experimentally, we have pursued this phenomenon by studying the errors made during haptic encoding of unfamiliar two-dimensional patterns (Lederman, Klatzky, & Barber, 1985). The primary information in such patterns is available through the kinesthetic component of touch. A representation of the spatial layout of the pattern must be integrated, over time, from information about the position of the exploring digit as it follows the contours of the display. Our work has focused, then, on the possibility that movement per se is a basis for errors in pattern encoding.
More specifically, we investigated whether there might be specific movement-based "heuristics" that observers use to infer spatial-layout information from haptic exploration. To test this, we used raised plastic pathways like that shown in Fig. 1. Subjects traversed the pathway from end to end, as often as they desired, with the index finger, and then answered questions about it. The questions were intended to reveal observers' representations of the pathway configuration as a whole. In one case, observers were asked to indicate the length of the straight-line distance between the endpoints; unless the pathway was a straight line, they had never actually explored this distance. In a second case, they were asked to indicate, by setting a pointer, the direction of the start of the pathway from the end (relative to a horizontal reference line). In order to motivate processing of the pathway configuration as a whole, trials judging straight-line length or direction of starting point were mixed with trials demanding a judgment about the length of the pathway or the direction of its most-deviating lateral point (relative to a line connecting the endpoints), respectively. Our principal manipulation in these studies was the extent of movement along the actually explored pathway.

Fig. 1. Sample pathway configuration used in experiments on two-dimensional pattern apprehension. (From Lederman, Klatzky, & Barber, 1985. Copyright © 1985 by the American Psychological Association.)

In the experiments on length judgments, the length of the pathway was manipulated in multiples of the end-to-end distance. In experiments on direction judgments, the degree to which the pathway deviated from the (inferred) straight line between endpoints was manipulated. The results, shown in Fig. 2, indicate rather different effects of these two manipulations. The left panel shows the effects of actual pathway length on judgments of the end-to-end distance. There is a clear tendency to erroneously inflate this distance judgment as irrelevant movement along the pathway increases. In subsequent experiments (Lederman, Klatzky, Collins, & Wardell, 1987), in which observers were taught to move at different speeds, we have shown that this length-distortion effect is primarily spatial rather than temporal, although it does increase with the duration of movement as well. That is, inferred straight-line length is influenced more by how far the exploring limb moves than by how long it moves. When exploration moves through irrelevant areas of space, estimates of inferred distances in the pathway configuration increase. In contrast, the nature of exploratory movement had virtually no effect on the direction judgments, as shown in the right panel of Fig. 2. Instead, we observed a phenomenon similar to distortions associated with cognitive maps (e.g., Tversky, 1981): Subjects used natural reference axes in the plane when judging the direction of the start point of the pathway, and their judgments were pulled toward these axes. Thus, when the actual to-be-judged angle was aligned with a reference axis, as in the 90° condition shown in Fig. 2 (where the start point was directly above the end), there was zero error. But as the actual value moved away from 90°, judgments were pulled back toward 90°. The linear function indicates that this pullback effect could be almost 50% (as indicated by the slope).
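As a numerical illustration of the two heuristics just described, the sketch below encodes the direction judgments as a linear pullback toward the 90° reference axis (a slope near 0.5, approximating the "almost 50%" effect) and the length judgments as inflation with felt pathway distance. The functional forms and the coefficients (0.49, 0.1) are our own rough reading of the results for illustration, not parameters reported in the text:

```python
# Sketch of the two movement-based heuristics described above. The linear
# forms and the coefficients (0.49, 0.1) are illustrative approximations,
# not parameters reported by Lederman, Klatzky, & Barber (1985).

def judged_direction(actual_deg, pullback=0.49, axis=90.0):
    """Direction judgment pulled toward the 90-degree reference axis."""
    return actual_deg + pullback * (axis - actual_deg)

def direction_error(actual_deg, pullback=0.49, axis=90.0):
    """Signed judgment error (judged minus actual)."""
    return judged_direction(actual_deg, pullback, axis) - actual_deg

print(direction_error(90.0))    # 0.0 -- aligned with the reference axis
print(direction_error(140.0))   # about -24.5: pulled back toward 90 degrees

def judged_length(euclidean_cm, felt_path_cm, inflation=0.1):
    """End-to-end distance estimate, inflated by irrelevant path movement."""
    return euclidean_cm + inflation * (felt_path_cm - euclidean_cm)

# A 10-cm straight-line distance explored via a 60-cm winding path:
print(judged_length(10.0, 60.0))  # overestimated
```

The point of the sketch is the asymmetry noted above: the direction heuristic depends only on the angle's position relative to a reference axis, while the length heuristic depends on how far the exploring limb actually moved.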
Fig. 2. Left panel: Mean error in direct-line (Euclidean) judgments as a function of pathway distance. Right panel: Mean error in judging position of pathway origin, as a function of its actual position (direct angle) and detour distance (D1 = 2.5 cm, D2 = 6.7 cm, D3 = 11 cm, D4 = 17.9 cm). The upper graph illustrates performance when observers were allowed to hold a finger on the start as an anchor; the lower illustrates a no-anchor condition. (Adapted from Lederman, Klatzky, & Barber, 1985. Copyright © 1985 by the American Psychological Association.)

For present purposes, the most important point about these results is that observers used a heuristic that was invariant over the dynamics of exploration. Neither the direction in which the explored pathway deviated from the judged line nor the distance of such deviating movement altered the results. On the whole, our studies of two-dimensional haptic apprehension reveal substantial error and vulnerability to distortion. We assume that such error is a natural result of the use of cognitive heuristics to infer spatial information. We have found evidence for a variety of heuristic devices in this work, including using the extent of exploratory movement to infer distance, using spatial reference axes to judge positions of points, counting footsteps to determine distance in large spaces explored on foot, and inferring missing portions of a triangular configuration from spatial imagery of the traversed legs.

In comparison to vision, haptics appears to be particularly prone to use such heuristic devices (e.g., Lederman & Taylor, 1969). There are several potential reasons for this. First, the more impoverished the spatial information directly available from perception and memory, the greater the need for heuristics. In other words, heuristics are used to "pull up the slack" from more direct spatial processing. And as argued below, the kinesthetic component of haptics leaves a great deal of slack when providing information about the layout of points in space. Second, haptic exploration in these two-dimensional tasks extends considerably over time. This imposes demands on memory, which may concomitantly increase the influence of such processes (Tversky, 1981).

We are not alone, of course, in documenting poor performance with haptic perception. An extensive literature on such tasks as recognition, matching, and reconstruction of two-dimensional arrays as well as free-standing nonsense shapes makes the same point (e.g., Cashdan, 1968; Dodds, Howarth, & Carter, 1982; Lobb, 1965; Worchel, 1951). The case seems clear that, with such tasks, haptics is an impoverished perceptual medium.
B. HAPTIC APPREHENSION OF THREE-DIMENSIONAL OBJECTS

After concluding our first body of research on two-dimensional pattern apprehension, we felt that the haptic system had been inadequately tested. We initially considered three reasons for caution in generalizing results from studies like ours, which used artificial objects or raised graphics displays, to haptic performance overall: (1) the nature of the task, (2) modality-inappropriate stimuli, and (3) the ecological validity of the stimuli.

With respect to the first of these, studies with two-dimensional haptic displays often require pattern apprehension as opposed to categorization. Could this be the reason for poor performance? Simply changing the task to categorization does not produce a marked improvement in performance with two-dimensional pictures; as we have noted, categorization of such displays is poor. On the other hand, as we argue further below, categorization of real objects may be superior to detailed apprehension because there are multiple and often redundant cues to an object's identity. Crude apprehension of very few attributes might converge to produce accurate performance.

The second potential reason for discounting two-dimensional performance data relates particularly to vision/touch comparisons, which are often interpreted as evidence of poor haptic performance. Considerations of modality specificity suggest that these comparisons are often inappropriate. Many of the displays used in previous research do not adequately allow for fundamental differences between the visual and haptic sensory systems (Berla, 1982; Ikeda & Uchikawa, 1978; Lederman, 1979). For example, the resolving power of the fingertip is much less than that of the eye (Weinstein, 1968). Stimulus construction cannot easily compensate for this, because changing the size of a stimulus to accommodate the poor resolution of touch also changes the rate at which its contours can be explored, increasing the temporal-integration and memory demands of the task.
We find the third reason for questioning studies of two-dimensional performance to be the most compelling: The stimuli are not ecologically valid. One concern is the degree of practice, which has been found to improve haptic discrimination performance (Gibson, 1966; Simons & Locher, 1979). A lack of familiarity with artificial displays might be critical to the inferiority of haptics relative to vision. But over and above the familiarity issue, the stimuli are inadequate depictions of objects, because they generally fail to retain many of the properties of the objects themselves, such as thermal attributes, size, or texture. The cues that these displays do provide are usually dictated by the original visual master from which a raised replica was derived. It therefore becomes necessary to determine the shape of the stimulus, and perhaps even to form a visual image, in order to perform adequately. In contrast, real objects provide information about many nonstructural attributes. A kitchen sponge, for example, could be identified by its texture or elasticity, without regard for its shape or size. The foregoing reasoning suggested to us that, in order to determine the processing capabilities of the haptic system under optimal circumstances, one should test its adequacy for object recognition, i.e., categorization of real objects at the basic level (Rosch, 1978). The cues that real objects provide are ecologically determined, rather than based on a visual replica. Haptic manipulation of objects is commonplace and therefore familiar. Real objects maintain in full scale the attributes that contribute to haptic identification, and their proper orientation is determined by such intrinsic characteristics as principal axes, flat surfaces, and center of gravity. And as mentioned above, real objects provide redundant, multidimensional cues to their categorical identity.
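The idea that a few crude, redundant attributes can converge on an object's identity can be made concrete with a toy matcher. The inventory and attribute values below are entirely hypothetical, chosen only to illustrate how coarse haptic cues such as texture and elasticity can single out an object like the sponge without any contour analysis:

```python
# Toy matcher (hypothetical objects and attribute values): identification from
# a few coarse haptic cues, with no shape or contour processing at all.

OBJECTS = {
    "kitchen sponge": {"texture": "rough", "elasticity": "high", "thermal": "warm"},
    "metal spoon":    {"texture": "smooth", "elasticity": "none", "thermal": "cold"},
    "rubber ball":    {"texture": "smooth", "elasticity": "high", "thermal": "warm"},
}

def identify(cues):
    """Return the stored object whose attributes best match the sensed cues."""
    def score(name):
        attrs = OBJECTS[name]
        return sum(attrs.get(k) == v for k, v in cues.items())
    return max(OBJECTS, key=score)

# Texture plus elasticity already single out the sponge; shape is never consulted.
print(identify({"texture": "rough", "elasticity": "high"}))  # kitchen sponge
```

Each cue alone is ambiguous here, but two coarse cues together are already diagnostic, which is the sense in which redundant attributes "converge" on a category.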
We therefore asked adults to haptically identify hand-size common objects that were readily named with vision (Klatzky, Lederman, & Metzger, 1985). Our goal was to provide baseline measures of speed and accuracy. The stimuli were 100 common objects, of a size that could be held in the hands, roughly classifiable as personal articles, articles for entertainment, foods, clothing, tools, kitchen supplies, office supplies, and household articles. A name was considered correct if it was commonly applied to objects of the given type, was not commonly applied to distinctly different objects, and was not the name of a relatively abstract category. (Thus, for example, "thread" or "spool" would be acceptable for that object, but "dowel" or "sewing implement" would not.) A visual identification task served as a pretest for verifying the namability of the stimuli by sight. For the haptic identification task, subjects were blindfolded and wore sound-masking headphones. They were instructed to identify each object as quickly and accurately as possible. If they could not do so, they were to say, "I don't know." Response time, from first contact with the object to the vocal response, was recorded. In addition, following their vocalization of a name, subjects were asked to describe the properties that had been used to identify the object.
Of the 2000 responses, only 83 (4.2%) were errors. Only 4 errors were omissions, and only 14 were names that were not related to the correct response. The remaining errors were related to the correct response, e.g., a superordinate or categorically related name, and some were false starts that were then corrected. If related-name errors and such corrections were allowed, the accuracy rate was 99%. Moreover, responses were fast: For correct responses, the modal response latency was 1-2 sec, and 68% of responses occurred within 3 sec of contact. Only 6% of responses took longer than 5 sec of contact. Subjects' phenomenological reports of the object properties that led to their identifications were aggregated into general categories. The most frequently mentioned bases for identification were shape (e.g., of a whistle), texture (e.g., of sandpaper), identification of a component (e.g., the cap on a pen), and size. In short, the principal finding from this study was that haptic identification of a wide range of objects can be remarkably fast and accurate. These findings support our claim that studies with arbitrary configurations or two-dimensional simulations underestimate the general capacity for haptic object processing. Haptics can be a far more effective perceptual system than has previously been acknowledged.
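The two accuracy figures follow directly from the reported counts. As a quick check (the counts are from the study; the strict-versus-lenient scoring labels are ours):

```python
# Arithmetic behind the reported accuracy figures (counts from the study).
total = 2000
errors = 83        # all errors under strict scoring
omissions = 4
unrelated = 14     # names unrelated to the correct response

strict_error_pct = 100 * errors / total
print(strict_error_pct)  # 4.15, reported as 4.2% in the text

# Lenient scoring treats related names and corrected false starts as correct,
# leaving only omissions and unrelated names as errors.
lenient_accuracy_pct = 100 * (total - omissions - unrelated) / total
print(lenient_accuracy_pct)  # 99.1, i.e., the 99% accuracy rate reported
```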
II. Haptic Apprehension and Recognition: Theoretical Issues A. THEIMAGE-MEDIATED MODEL Many people are surprised that a simple raised-line drawing cannot readily be identified by touch, especially when the observer is visually experienced. That is because they hold implicitly to what we call the “image-mediated model” of haptics, as indicated in Fig. 3A. This initially plausible model states that kinesthetic information about spatial position is integrated over time to provide a representation of object contour. (Cutaneous information might also be integrated to provide a representation of texture, which is often portrayed on raised graphics for the blind.) The resulting representation is mentally converted to a visual image that is “reperceived” by an image interpreter (see Kerst & Howard, 1978), leading to identification. Though plausible, the image-mediated model fails, as we have seen. There are two potentially fallacious assumptions in this model. One is that the kinesthetic information provided by planar stimuli is sufficient to create a representation of spatial layout. The error level we found for apprehending simple pathways calls this assumption into doubt. The second assumption
The Intelligent Hand
129
[Fig. 3 appears here. Panel A (the image-mediated model): visual sensors feed a visual image directly; haptic sensors feed a visual-translation stage, whose visual image is read by an image interpreter to yield an object representation. Panel B (the present authors' assumptions): visual sensors feed a visual processor and haptic sensors a haptic processor; each derives properties and its own (visual or haptic) representation, with some derived properties entering a common representation.]
Fig. 3. Assumed processing by the visual and haptic systems. A, The image-mediated model; B, The present authors’ assumptions.
is that the representation of contour provided by haptic exploration, whether impoverished or not, is interpreted by a visual processor. We argue instead that the way in which the haptic system naturally interprets pressure variation over space is not analogous to an internal visual “reperception.” Let us not forget in this discussion that haptic recognition of real objects is excellent; what is needed is a model to account for what haptics can do well. The difference between the approach we advocate and the image-mediated model is clear when we compare Fig. 3A to Fig. 3B, which schematizes our fundamental assumption: Haptic apprehension and recognition are achieved by processes unique to that system. Although some representations achieved haptically and visually may be held in common, the two domains are likely to give differential weight to such codes. Even though we know that properties of objects such as form and texture are readily available through vision, our approach rejects the assumption that haptics requires the assistance of a visual mediator in order to apprehend this information. The necessity for image mediation is further questioned when we consider that some information available to haptics is not likely to be visually mediated. This information includes properties such as temperature, weight, and hardness. The lack of salience of such properties
Roberta L. Klatzky and Susan J. Lederman
to vision is indicated by the general tendency to exclude them from raised-line drawings (or from visual images, for that matter). In short, far from being subordinate to vision, haptic object processing should be considered in its own right. When we consider it so, we must expand our ideas of the properties of objects that are useful, if not critical, to perception and identification.
B. THE ALTERNATIVE TO VISUAL MEDIATION: DIRECT HAPTIC APPREHENSION

Our initial theorizing about a general model of object apprehension and categorical recognition takes the form of a set of basic assumptions that raise critical questions. These questions are the focus of our current and future research. Our model assumes that haptics is multidimensional, in the sense that the haptic system computes several distinct classes of information. These classes are related to an object’s structure, substance, and function. The structural properties include size, shape, and weight. Substance includes properties of the material from which an object is constructed, such as its hardness, elasticity, surface texture, and temperature. The functional information is for our purposes restricted to those functions that are directly indicated by the object, rather than inferred subsequent to categorization. How is this expanded set of properties related to the more basic sensory primitives of touch (tactile, such as pressure, vibration, and thermal; and kinesthetic, from joint and muscle input)? Our answer to this question derives not from our research with real common objects, but from our observation of individuals who are actively apprehending unfamiliar ones. In this circumstance, exploration is extended over time, and the importance of the hand movements made during object processing becomes manifest. These observations led us to propose that the apprehension of object properties is expanded by “piggybacking” the hand’s basic sensory capabilities onto its motor competencies. To develop this argument, consider the full capabilities of the human hand. The hand actually encompasses two systems that are at least conceptually distinct: a sensory system, with its tactile and kinesthetic sensors, and a motor system that actively manipulates objects. Work on two-dimensional displays addresses the capabilities of only one of these, the sensory system, and then only to a limited extent.
For this purpose, two-dimensional displays are actually ideal, because they severely restrict the nature of the information that can be obtained. In its simplest form, a two-dimensional haptic stimulus is a raised outline on a uniform medium (such
as a thin plastic). It offers a dichotomous pressure variable, which is observed over some two-dimensional space within reach of the hand. Thus the haptic system is provided with minimal pressure variation, no thermal or vibratory variation, and only planar kinesthetic variation. This information is generally not sufficient to identify objects, except for highly prototypical and simple instances. We hypothesize, however, that the motor output of the second system enables the hand to augment its perceptual capacity. In our view, the hand can usually take advantage of its motor properties to expand the range of perceived dimensions beyond pressure, temperature, and spatial position. The expanded range consists of just such attributes as are identified by phenomenological report during recognition of real objects, such as three-dimensional structure, texture, and hardness. The issue of how these object properties are derived is a central concern of our research. Specifically, we assume that the purposive movements that are made during object exploration provide direct cues about the object properties that are being processed by the perceptual and cognitive system. We call these movements exploratory procedures (EPs). They are stereotyped movement patterns, each having certain characteristics that are invariant and others that are typical. An EP need not correspond to a particular region or configuration of the hand or to a fixed pressure or rate of movement. In general, a procedure can be executed in a variety of ways, while still maintaining its invariant (and usually, its typical) properties. Our assumption is that there is a direct link between these EPs and the object primitives that are computed by the haptic system. Thus, by studying the procedures, we can investigate the underlying haptic representation of objects in human memory and the processes that achieve and utilize that representation (Lederman & Klatzky, 1987).
The object properties of interest are not assumed to be equally available to haptic processing. Computational models in the spirit of Marr’s (1982) now-classic work suggest that there may be a particular sequence for deriving representations of an input stimulus, starting with a primitive “primal sketch.” Below, we consider what might constitute that sequence for a haptically explored object. Note that we assume that many of these properties are achieved perceptually, i.e., early in haptic processing, prior to object identification. These and other preliminary assumptions have been implemented in a simple LISP program called HAND (Haptic Apprehension and Naming Device), outlined in Fig. 4 (Klatzky, Lederman, Roan, & Andre, 1986). The purpose of this program is not to model human exploratory behavior in detail, which would certainly be premature given existing data. Rather, it is intended as a heuristic device, which embodies our conceptualization of the factors that direct exploratory behavior and lead to object apprehension and recognition.
[Fig. 4 appears here. Knowledge structures: a data base of object knowledge, knowledge about the current object, and knowledge about exploratory procedures. Control flow: execute an exploratory procedure; compare the current object to the data base of object knowledge; make a decision as to whether the object can be identified (if yes, identify it and update object knowledge); make a decision as to whether to continue the current exploratory procedure or to change (based on the salience of the currently acquired value, the reliability of measurement, and the degree of confidence in the current value); if the decision is to change, choose the next exploratory procedure (based on missing data on the current object, diagnostic properties of the hypothesized object, and a preference ordering of properties).]

Fig. 4. HAND: Haptic Apprehension and Naming Device (from Klatzky et al., 1986).
HAND explores objects in order to learn about and identify them. Its knowledge about the universe of identifiable objects is contained in a data base, in which each object is represented by a series of values on dimensions. The dimensions are related to the object’s structural, substantive, and functional properties. There are three points to note about the object representation. First, it may be incomplete or fuzzy. Second, the dimensions vary in their distinctiveness or “diagnosticity” for an object. Third, the representation changes with experience. HAND also contains a repertoire of EPs, each specialized to provide
information about a particular dimension, within some measurement error. HAND learns about objects by executing these procedures. An important aspect of haptic processing (see below) is that the system is limited in how many EPs can be executed simultaneously. In HAND, therefore, the EPs compete with one another to be “selected” for execution. The process of selecting EPs for execution is like a “pandemonium” system, with all EPs competing. At any given time, which EP is actually executed depends on a number of factors, top down, bottom up, and intrinsic to the EPs themselves. As exploratory procedures are executed, their outputs lead to a representation of the explored object, which is compared to the data base about the universe of objects. When the comparison leads to a substantial match, a tentative identification of the current object is made. If correct, the data base is updated to incorporate values from the current object.
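This explore-compare-decide cycle can be sketched compactly. The Python fragment below is a toy illustration only, not the original LISP implementation: all object names, dimension values, the similarity measure, and the fixed-preference selection rule are our own hypothetical stand-ins for HAND's richer competitive ("pandemonium") selection and noisy measurement.

```python
# Toy sketch of the HAND control loop (hypothetical names and data; the
# original HAND was a LISP program with noisy measurement and competitive
# EP selection, both simplified away here).

DIMENSIONS = ["texture", "hardness", "temperature", "weight", "shape"]

# Toy data base of object knowledge: values on each dimension, scaled to [0, 1].
DATABASE = {
    "sponge":  {"texture": 0.7, "hardness": 0.1, "temperature": 0.5, "weight": 0.1, "shape": 0.3},
    "mug":     {"texture": 0.2, "hardness": 0.9, "temperature": 0.4, "weight": 0.5, "shape": 0.6},
    "sandbag": {"texture": 0.9, "hardness": 0.4, "temperature": 0.5, "weight": 0.9, "shape": 0.3},
}

def match_score(known, observed):
    """Similarity between a stored object and the values observed so far."""
    shared = [d for d in observed if d in known]
    if not shared:
        return 0.0
    return 1.0 - sum(abs(known[d] - observed[d]) for d in shared) / len(shared)

def recognize(true_object, criterion=0.9):
    """Execute EPs one per cycle until some data-base entry matches well enough.
    A fixed preference order stands in for HAND's pandemonium-style competition,
    and each "measurement" is noiseless."""
    observed = {}
    best = None
    for dim in DIMENSIONS:                      # one EP per cycle
        observed[dim] = DATABASE[true_object][dim]
        scores = {name: match_score(props, observed) for name, props in DATABASE.items()}
        best = max(scores, key=scores.get)
        if scores[best] >= criterion:           # identified: update knowledge here
            break
    return best, observed

print(recognize("mug")[0])  # prints "mug"
```

In HAND proper, selection weighs top-down, bottom-up, and procedure-intrinsic factors, and measurements carry error; only the skeleton of the loop is the same.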
C. QUESTIONS TO BE ADDRESSED - AND SOME ANSWERS

The conceptual model described by the HAND program gives rise to a number of questions to be addressed in our research: (1) What are the specific links between human EPs and haptic object primitives? (2) Why is a particular procedure used to derive a particular primitive? (3) What constrains the sequence of processing activities over time? (4) How does the haptic processing system correspond to visual object processing, in terms of the information that is computed, the order in which it is computed, and its importance to object identification? Our ability to answer these questions is at present mixed. Some we have addressed with empirical research; the answers to others remain more conjectural at this time. In the following sections, we describe our studies of haptic object processing and relate them to these critical questions. 1.
What Are the Links between Exploration and Object?
In addressing this question, we chose to study a set of properties that seemed, on both phenomenological and theoretical grounds, to be important attributes of objects. They are shown in Table I. The first four properties are related to the substance from which the object is made: its texture, hardness, temperature (which usually means rate of heat flow), and weight. The next properties are related to the object’s structure: its global shape, exact shape, volume, and, again, weight. Global shape refers to the regular form that would approximate the object envelope, whereas exact shape refers to the object’s precise contours. Weight is actually jointly determined by structure and substance. Finally, two properties relate to the object’s function: One is the nature of part motion (for example, rotary versus linear movement along an axis perpendicular to the rest of the object).
TABLE I
LINKS BETWEEN KNOWLEDGE ABOUT OBJECTS AND EXPLORATION

Knowledge about object      Exploratory procedure
Substance property
  Texture                   Lateral motion
  Hardness                  Pressure
  Temperature               Static contact
  Weight                    Unsupported holding
Structural property
  Global shape              Enclosure, contour following
  Exact shape               Contour following
  Volume                    Enclosure
  (Weight)                  (Unsupported holding)
Functional property
  Part motion               Part-motion test
  Specific function         Function test
The second is the property of potential function, as determined by form. We restricted our examination to four such functions, which could be readily apprehended even from unfamiliar objects: serving as a conduit, as a pincer, as a container, and making noise. As Table I indicates, each property is associated with one or two exploratory procedures that we found are the principal means of apprehending that property. Texture is associated with lateral motion, hardness with pressure, and temperature with static contact. Weight is associated with unsupported holding. Global shape and volume are associated with enclosure. Global shape is also paired with contour following, which is the principal procedure for determining exact shape. Finally, there are unique part-motion test and function test procedures. Figure 5 indicates a stereotyped version of each of these procedures, which are specified in more detail as follows: (1) Lateral motion is identified by motion between the skin and the textured surface, typically a repetitive rub over a small area at a fairly rapid rate. (2) Pressure involves applying a normal or torque force to one part of an object while another part is stabilized or opposing force is applied. This can be seen in movement, as in poking, or by explicit signs of force in the fingers and hand. (3) In static contact, the object is supported externally (by an external surface or the other hand) while one hand passively rests on it without molding. (4) In unsupported holding, the object is lifted away from any supporting surface and maintained in the hand without any effort to mold the fingers to the object; typically, there is hefting of the arm or wrist, which enhances weight judgment (Brodie & Ross, 1984). (5) In enclosure, the hand maintains simultaneous contact with as much of the
envelope of the object as possible. Often one can see an effort to mold the hand more finely to the object contours; this is usually under shape-assessment conditions, however. (6) Contour following is a dynamic procedure in which the hand maintains contact with a contour of the object. Typically, the movement is smooth and nonrepetitive within a segment of object contour, stops or shifts direction when a contour segment ends, and does not occur on a homogeneous surface. (7) Part-motion test is an exploratory procedure that we define only when there exists a moving part; it is the act of making the part move relative to the rest of the object. (8) Finally, a function test is a performative movement which actually executes the object's function: running the finger along a conduit, placing the hand or finger in a container, making noise with a noisemaker, or pinching the ends of a pincer together. Although these definitions may sound complicated, in practice they are generally easy to discern.
Fig. 5. Typical movement pattern for each of the EPs (from Lederman & Klatzky, 1987). [The figure panels include lateral motion, pressure, enclosure, contour following, function test, and part-motion test.]
The present partitioning of hand movements was constructed with several goals and constraints in mind. (1) It was intended to capture the nature of movement variation specifically during object apprehension and recognition. Clearly, the list of procedures could be expanded; for example, one could include pencil-sharpening and tape-dispensing movements. However, we excluded such object-specific movement, focusing instead on procedures that would be more generally observable and related to determining object properties relevant to categorization (as indicated by Klatzky et al., 1985). This is true even of the present function-test procedure, which examines functions that can be discerned from the structural and substantive properties of even unfamiliar objects. (2) The present set of EPs was also constructed with the goal of pooling movements that are functionally identical, rather than those that look identical. This represents a departure from previous analyses (e.g., Davidson, Abbott, & Gershenfeld, 1974; Davidson & Whitson, 1974). (3) Finally, each procedure is intended to be as unambiguous as possible, which limited the level of specificity of our description. For example, variations in pressure might be valuable to observe, but are difficult to agree upon from purely visual data. Lederman & Klatzky (1987, Experiment 1) attempted to investigate the links between desired knowledge about the properties of objects and the nature of haptic exploration. Our techniques involved videotaping the hands of participants as they explored an object freely with both hands during a match-to-sample task. The task required participants to select the best match for a sample object along some designated dimension, such as surface texture. The best-matching object was not necessarily an exact match on the target property, and it was constructed so as not to match the sample with respect to other, irrelevant attributes. 
The objects were designed, in fact, so that the variation along irrelevant dimensions violated common correlations between object properties, such as size and weight. On each trial, the participant was first told what property constituted the basis for the match, and was then given the sample object to explore. Next, the participant was presented with three comparison objects, one at a time, and allowed to explore each. A comfortable exploration period was set for each property, based on pilot testing. Finally, the participant indicated the best match. The comparison stimuli were selected so that accuracy was fairly high but not perfect. Our concern was with the nature of the hand movements during exploration of the sample object on each trial. Our intention was to partition the exploratory period into classes of movement. We made an initial distinction between what we call “task maintenance” procedures and the exploratory activities used for learning about object properties. Task maintenance includes those actions necessary to maintain the object in a stable position or to orient it for examination. Our more important distinctions were those
that divided the nonmaintenance activities into the eight exploratory procedures described above. A naive scorer classified each period during which the sample stimulus was examined, assigning each discernibly distinct activity to one of the exploratory procedures or, alternatively, to task maintenance. The scorer was given instructions enabling her to identify the eight exploratory procedures by their invariant and typical properties. Although it is time consuming, this classification appears to be reasonably reliable. The principal data from this study were duration profiles of exploration for each instruction condition. That is, for each object property that was to be used as the basis of the match, we can see how much exploration time was devoted to the various classes of movements. Table II shows these profiles, in the form of z scores computed over columns, which adjust for the fact that different procedures inherently take different times to execute. It excludes the part-motion and function-test procedures, which were scored only for selected trials. Each cell entry shows the z-score duration of the given exploratory procedure for the given criterial property, relative to the same procedure when other properties were specified.

TABLE II
DURATION OF EXPLORATORY PROCEDURES UNDER EACH INSTRUCTION (z SCORES NORMALIZED BY COLUMNS)

Instruction       Lateral  Pressure  Static   Unsupported  Enclosure  Contour
                  motion             contact  holding                 following
Texture             2.78    -0.22     -0.89      -0.38       -0.96      -0.60
Hardness            0.06     2.82     -0.46      -0.30       -1.00      -0.69
Temperature        -0.56    -0.31      1.43      -0.38        1.13      -0.67
Weight              -0.48    -0.38     -0.89       2.83       -0.73      -0.68
Volume              -0.38    -0.48      0.61      -0.28        1.80      -0.18
Shape (global)      -0.27    -0.30     -0.89      -0.38        0.05       0.48
Shape (exact)       -0.24    -0.49     -0.89      -0.34        1.05       2.63
Part motion         -0.58    -0.41      0.11      -0.38       -0.57      -0.19
Function            -0.34    -0.23      1.79      -0.38       -0.76      -0.10

We can see that these distributions are far from uniform. There tend to be clear cases where a procedure is executed, and cases where it is not. We had originally predicted links between exploration and object properties (see Lederman & Klatzky, 1987). The next question is whether the cases where a procedure is observed are those we predicted. The most striking departure from our original predictions was that in addition to the use of static contact to assess temperature, there was also a tendency to enclose an object
for this purpose. This makes good sense, in that enclosure would maximize the contacting skin surface for the relatively small objects that we used, and temperature assessment is enhanced by greater skin contact (Kenshalo, 1970). We have also informally noticed that this use of enclosure is less molded to the detailed contour of the object than is enclosure for the purpose of assessing shape. Our next question was how distinctive the EPs are from one another. We used the durations of exploration in a discriminant analysis, to see whether the instruction on each trial could be predicted from the movement profile, that is, from the duration of each procedure. Again, we eliminated the part-motion and function-test procedures as predictor variables, because they did not apply to all objects. However, we did include the part-motion and function-test instructions in the set to be classified. This analysis indicated that the profiles of movement were sufficiently different to classify a trial according to the property that was specified as the basis for the match. Classification was entirely accurate except for the part-motion and function-test trials, which tended to be confused with one another. (But recall that we were excluding the EPs that would be most diagnostic of those trials.) The classificatory discriminant analysis computes the Mahalanobis distance measure between classes (i.e., the normalized distance between instruction conditions with respect to the EP-duration variables). We used this measure in a clustering analysis to examine the similarity between the duration profiles under the various property-matching instructions. Figure 6 shows the results of this analysis by plotting instruction clusters against the similarity value at the point of formation. We can see that part-motion and function matches are maximally similar with respect to the durations of the six procedures included in this analysis.
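The column normalization behind Table II is worth making explicit. A minimal sketch (with toy durations, not the experiment's data): each EP's durations across the instruction conditions are converted to z scores, so that inherently slow procedures such as contour following are not overweighted.

```python
from statistics import mean, pstdev

def column_z_scores(durations):
    """durations maps each EP to its mean exploration times under the
    instruction conditions; each EP's column is standardized to mean 0,
    SD 1, removing differences in how long procedures inherently take."""
    z = {}
    for ep, col in durations.items():
        m, s = mean(col), pstdev(col)
        z[ep] = [(x - m) / s for x in col]
    return z

# Toy durations (sec) for one EP under three instructions: the long duration
# under the first instruction becomes a large positive z score.
toy = {"lateral_motion": [9.0, 2.0, 1.0]}
print([round(v, 2) for v in column_z_scores(toy)["lateral_motion"]])  # [1.4, -0.56, -0.84]
```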
Next to cluster are temperature- and volume-matching instructions, which tend to concentrate on enclosure and static contact, although, we assume, for different reasons. Global shape then clusters in with part-motion and function test, and so on up the tree. Exact-shape matching enters last because its duration profile is distinguished by long periods of contour following. The similarity between part-motion and function-matching trials reflects the fact that both involve substantial contour following and static contact. The first of these seems reasonable, since knowledge of motion and function is likely to follow from a structural analysis of the object that requires contour following. But static contact is the EP associated with temperature detection. Why should it be highly diagnostic of part motion or function? In answer, we take these periods of static contact to reflect cognitive analysis, during which purposive movement is temporarily arrested. To summarize, this study documents a relationship between exploratory movements of the human hand and desired knowledge about an object. EPs appear to be readily identifiable and reasonably specialized for computation of particular object attributes.

Fig. 6. Cluster analysis of matching tasks on the basis of movement profiles. The clusters formed are plotted as a function of the similarity values at the points of formation (from Lederman & Klatzky, 1987).

This brings us to our second question. 2.
Why Is a Procedure Used to Derive a Primitive?
In order to answer this question, we must consider each EP as a physical system, which takes certain exploration-related variables and converts them to representations of information about the object. Consider, for example, the procedure used to measure hardness, namely, application of pressure to a surface. One hypothesis is that the explorer uses knowledge of the force applied and the distance that the finger moves to derive an assessment of hardness or compliance. With a pressure-sensitive robot finger, Bajcsy and Hager (1983) used just such an algorithm to compute compliance, first calibrating the finger by determining the sensor output as a function of weights placed on the surface. This essentially corresponds to a psychophysical experiment with a robot subject.
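The force/displacement hypothesis amounts to estimating a compliance slope from calibration readings. A sketch under assumed units and values (the function name and all numbers are our own hypothetical illustrations; see Bajcsy & Hager, 1983, for the actual robot-finger procedure):

```python
def fit_compliance(forces, displacements):
    """Least-squares slope (through the origin) of displacement vs. applied
    force: an estimate of compliance in m/N. Calibration pairs would come
    from pressing with known forces and reading a displacement sensor."""
    num = sum(f * d for f, d in zip(forces, displacements))
    den = sum(f * f for f in forces)
    return num / den

# Hypothetical readings: a soft surface yields 2 mm per newton, a hard one 0.1 mm.
soft = fit_compliance([1.0, 2.0], [0.002, 0.004])
hard = fit_compliance([1.0, 2.0], [0.0001, 0.0002])
print(soft > hard)  # True: the softer surface has the larger compliance
```

A harder surface thus shows up directly as a smaller slope, which is the sense in which force and finger displacement suffice to assess hardness.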
We are undertaking a novel approach to such questions in collaboration with Ruzena Bajcsy of the University of Pennsylvania Computer Science Department and GRASP Robotics Lab. We are implementing exploratory procedures modeled after those seen with humans, but with a robot end effector equipped with sensors. Our approach is to develop effective algorithms that compute object dimensions, given the sensor and the prescribed mode of exploration. Of course, whether the exploration has any effect at all depends upon the sensing device being used. However, by determining which devices lead to advantages for particular exploratory procedures, we may begin to understand what type of information the procedure is augmenting. 3.
What Determines the Sequence of Haptic Processing?
There are two general answers to this question: First, there may be sequential constraints on the achievement of haptic representations due to the nature of the computations performed during perception. That is, the output of one representation might be used as the input for computing a subsequent representation; for example, contour information might be used to derive a representation of function. Our working hypothesis is based on a logical analysis of object dimensions, within the context of contemporary work on sensory processing (e.g., Treisman & Gelade, 1980). We propose that some object dimensions are processed in parallel by the haptic system early in the course of perception. Likely candidates for these object dimensions are temperature, texture, and hardness, the dimensions of the object’s material substance. However, there is also an intrinsic serial order to some object dimensions. Local surface information should be obtained before information about the object’s contours (assuming that contour requires a larger spatial sample); a global envelope is likely to be obtained prior to exact shape; and part motion should follow an analysis of local contour (since the part must be isolated). Function, being inferential, should be derived late in processing. Thus, we would roughly order the computations as producing surface and internal-substance primitives (in parallel), global volumetric primitives, more precise contour information, and finally, function. We are not assuming strict seriality in the computation of these latter representations, but rather assume that they are initiated in a particular order and then proceed in a processing cascade (McClelland, 1979). Some data relevant to these hypotheses are described below, but they are a major interest for future research. A second factor that could influence the haptic processing sequence is the nature of the EPs themselves, which may constrain the order in which they are executed.
For example, the first procedure to be executed might be that
which provides at least minimal information about the greatest number of object properties. This does not mean that the properties must be processed in parallel, of course. However, the selection of this procedure constrains what properties are available for processing, and to that extent it will influence the order in which object attributes are encoded. More specifically, the ordering of procedures may reflect their sufficiency, necessity, and optimality for generating information about object dimensions. A procedure is necessary to determine an object attribute if no other means of exploration would be adequate. A procedure is sufficient if it provides some information about the given attribute (i.e., allows above-chance performance) and optimal if it provides more accurate information (and, possibly, is faster to perform) than any other means of exploration. A procedure that is nonoptimal but sufficient for determining an object attribute might be applied if it were sufficient and/or optimal for other attributes. By this means, information about several object properties could be determined at one time. We might expect, then, that the first procedures implemented are those that are sufficient for many object dimensions, i.e., those that are not specialized. In order to begin addressing these issues, it is critical to determine the necessity, sufficiency, and optimality of the various exploratory procedures. We have done so, in a study (Lederman & Klatzky, 1987, Experiment 2) that was similar to the match-to-sample task initially used to determine the relationships between object dimensions and exploration. In this new study, we constrained the nature of exploration and then assessed the ability to match objects on a targeted dimension. The exploratory procedures that subjects were constrained to execute were lateral motion, static contact, unsupported holding, pressure, enclosure, and contour following.
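The three definitions just given reduce to a simple decision rule over accuracy scores. A sketch with hypothetical accuracies (chance here is 1/3, as in a three-alternative match; a speed criterion for optimality would additionally need duration data, which this sketch omits):

```python
def classify_procedures(accuracy, chance):
    """accuracy maps each EP to its proportion correct on one target dimension.
    Returns the sufficient EPs (above chance), whether the best EP is also
    necessary (no other EP is sufficient), and the optimal (most accurate) EP."""
    sufficient = [ep for ep, acc in accuracy.items() if acc > chance]
    optimal = max(accuracy, key=accuracy.get)
    necessary = sufficient == [optimal]
    return sufficient, necessary, optimal

# Hypothetical exact-contour matching: only contour following beats chance,
# so it is at once sufficient, necessary, and optimal for that dimension.
acc = {"contour_following": 0.85, "enclosure": 0.30, "lateral_motion": 0.31}
print(classify_procedures(acc, chance=1 / 3))
```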
Each procedure was used with each of the to-be-matched dimensions: texture, temperature, weight, hardness, size, global shape, and exact contour. For example, the objects designed for texture matching were explored and judged by each subject on six distinct trials, each with a different exploratory procedure. The time allowed for each type of exploration was based on the time for which the given procedure was spontaneously produced in the original match-to-sample study. Figure 7 shows the data for this study for each combination of exploratory procedure and target dimension. Each panel corresponds to a dimension. The exploratory procedures that did (and did not) result in above-chance accuracy are ordered (left to right) from best to worst. Any procedure producing above-chance accuracy is termed sufficient. As can be seen, in most cases there were several procedures that were sufficient to match on a given dimension. However, in the case of exact contour matching, only contour following was sufficient. Hence it is not only sufficient, but necessary. The procedure leading to the most accurate performance
[Fig. 7 appears here; its panels include texture, hardness, weight, volume, and temperature matching, with EPs abbreviated SC (static contact), CF (contour following), UH (unsupported holding), EN (enclosure), LM (lateral motion), and PR (pressure).]
can be termed optimal. Generally, the procedure found to be associated with a dimension in the original match-to-sample study (marked by an asterisk in the figure) was optimal in this sense, although other procedures might lead to statistically equivalent accuracy. The only surprise in these data was that enclosure did not produce the greatest accuracy in global-shape matching, although initially predicted to do so. In the case where some exploratory procedure is not a clear winner in accuracy, optimality can be defined as a speed advantage. We found in our initial study that contour following is a particularly slow procedure, for example. Thus we may conclude that lateral motion is optimal for texture matching, even though contour following is about as accurate, because lateral motion is much faster. A similar argument may be made for unsupported holding being optimal in weight matching, as our first match-to-sample task generally found it to be a faster procedure than either enclosure or contour following. Two considerations suggest that our indices of optimality are actually lower bounds. First, when one procedure is executed, another may inevitably be involved to some degree. For example, lateral motion over a surface is effected during the course of contour following. This means that the effectiveness of nonpredicted procedures may be overestimated, because their performance involves the optimal procedure to a certain extent. Second, in constraining the nature of exploration in this study, we may have reduced the effectiveness of some procedures. For example, subjects were instructed to enclose objects while keeping their hand on the table surface to avoid execution of unsupported holding at the same time. This may have reduced access to structural information in the third dimension and thus undermined global shape and size matching. Finally, in addition to necessity, sufficiency, and optimality, this study gives us indications of the specialization of EPs.
A procedure is highly specialized if the difference between its best performance (for example, exact-contour matching for the contour-following procedure) and its performance with other to-be-matched dimensions is great. Measuring specialization by calculating z scores over dimensions, then computing, for each EP, its highest z score minus the average of all others, indicated that pressure was the most highly specialized procedure. Perhaps more important, enclosure was considerably less specialized than any other procedure. It resulted in above-chance performance for most dimensions, but was not a clear winner for any. This means that the procedure effected during the most basic prehensile contact with an object, the grasp, is broadly informative with

Fig. 7. Experiment 2: Histograms of the accuracy level for each EP under each dimension-matching instruction. EPs are ordered left to right from highest to lowest accuracy. EPs that did not attain above-chance performance are shown to the right of the dashed vertical line. EPs to the left of the vertical line were all sufficient for performing the task (from Lederman & Klatzky, 1987).
Roberta L. Klatzky and Susan J. Lederman
respect to the object's structural and substantive properties. In our current research, we are determining whether the ordering of procedures during spontaneous exploration of objects reflects necessity, specialization, and optimality. We predict that for basic-level categorization, generality of function will usually dictate the response: the initial procedure will tend to be an enclosure, in the form of a grasp. But when more refined, subordinate-level discriminations must be made, e.g., between rough and smooth grades of sandpaper, necessity and optimality will be more potent predictors of exploratory activity. There are, of course, other bases for selecting an EP. One would be "top down": exploring to verify a hypothesis. We can contrast this with a "bottom up" basis for procedure selection, as when initial exploration reveals that the object is highly distinctive on some dimension, in comparison to the universe of objects (e.g., the softness of a cotton ball). This salient value on the dimension causes the procedure appropriate to the dimension to be executed for purposes of acquiring another reading. The HAND model incorporates such mechanisms.
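The specialization index described above (each EP's highest z score across dimensions, minus the average of its remaining z scores) can be sketched numerically. The accuracy matrix below is entirely hypothetical and is included only to illustrate the computation, not to reproduce the published data:

```python
import numpy as np

# Hypothetical accuracy matrix: rows are exploratory procedures (EPs),
# columns are to-be-matched dimensions. Values are illustrative only.
eps = ["lateral motion", "pressure", "contour following", "enclosure"]
dims = ["texture", "hardness", "exact contour", "global shape"]
acc = np.array([
    [0.95, 0.60, 0.40, 0.55],   # lateral motion: peaked on texture
    [0.50, 0.92, 0.30, 0.45],   # pressure: peaked on hardness
    [0.60, 0.45, 0.90, 0.65],   # contour following: peaked on exact contour
    [0.65, 0.62, 0.60, 0.66],   # enclosure: above chance on most, best on none
])

# z scores over dimensions, computed separately for each EP (each row).
z = (acc - acc.mean(axis=1, keepdims=True)) / acc.std(axis=1, keepdims=True)

# Specialization: highest z score minus the mean of the remaining z scores.
best = z.max(axis=1)
rest_mean = (z.sum(axis=1) - best) / (z.shape[1] - 1)
specialization = best - rest_mean

for ep, s in zip(eps, specialization):
    print(f"{ep:18s} {s:.2f}")
```

On these made-up numbers the index ranks pressure as most specialized and enclosure as least, mirroring the qualitative pattern reported in the text.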
Correspondence between Haptic and Visual Processing
It is interesting to note that the visual system is thought to compute at least some of the same primitives as the haptic system, including texture, global shape, contour, and even function. At some level of representation, then, it seems possible, and perhaps even likely, that vision and touch converge. This is not to say, however, that the two systems are entirely comparable beyond low levels of processing. The two modalities seem likely to specialize in rather different properties of objects. Of the haptic primitives we propose, hardness, weight, temperature, and part mobility would seem to be more available to touch. Precise representation of contour, on the other hand, is far more likely in vision, especially for two-dimensional displays (Easton & Moran, 1978; Lederman et al., 1985; Rock & Victor, 1964). Moreover, those primitives that are reasonably well computed by the two systems are likely to be of differential importance. There is substantial evidence, in particular, that volumetric primitives play an overriding role in visual object recognition (Biederman, 1987). In contrast, these primitives are likely to be costly to compute through touch, especially when an object comprises a composite of several volumetric components. Surface texture, on the other hand, is frequently sufficient to identify a common object, as we have found from our data on haptic object recognition (Klatzky et al., 1985). Thus we hypothesize that substance-related information will have priority in touch and structural information in vision. Texture is actually a multidimensional property that is likely to be shared by both modalities. Lederman, Thorne, and Jones (1986) have shown that both vision and touch process texture reasonably well, but they treat it in
The Intelligent Hand
different ways. A discrepancy paradigm was used, in which subjects made psychophysical judgments about textures they were seeing and feeling. Unbeknownst to the subjects, the two textures were different, enabling the contribution of the two modalities to the judgment to be separated. When instructed to treat texture as spatial density, vision dominated, whereas an instruction to treat it as roughness led to dominance by touch. When instructions referred merely to "texture," the two modalities weighed the inputs about equally (Lederman & Abbott, 1981). We have assessed our predictions about the salience of object dimensions to vision and touch in an experiment using a sorting task (Klatzky, Lederman, & Reed, 1987). The stimuli were hand-size objects varying along four dimensions: size (three levels, all graspable by the hand), shape (oval, hourglass, clover), surface roughness (satiny, ribbed, coarse), and hardness (foam rubber, plastic foam, balsa wood). The stimuli were scaled initially so that similarity ratings along the four dimensions were reasonably comparable, as were intervals between scale values. However, when dimensional discriminability was measured by having subjects sort the objects into levels on a dimension as quickly as possible, size proved less discriminable than shape, hardness, and texture, which were about the same. Subjects were asked to sort the stimuli three times, into two, three, or four bins. The three-bin sort was intended to reveal the most important dimension. For example, if hardness were salient, subjects might use one bin for soft objects, one for hard, and the third for medium soft. The two-bin sort was intended to reveal the most important cutpoint within a dimension. For example, subjects might sort the hardest objects into one bin and the rest into a second bin. And the four-bin sort was intended to bring in secondarily salient dimensions.
Most importantly, subjects were instructed to sort in one of four ways: (1) touch (without vision), unbiased: put things that seemed similar into the same bin; (2) touch, biased toward touch: put things that felt similar into the same bin; (3) touch, biased toward vision: put things into the same bin if the visual image of them was similar; (4) touch plus vision, unbiased: like (1), but subjects were also allowed to see the stimuli. The sorting data were converted to "cutpoint" scores, one for each pair of levels within each dimension (Levels 1 versus 2, 1 versus 3, and 2 versus 3). For each bin, the number of objects representing each of the two levels was determined and the difference between these numbers was computed. For example, if subjects sorted two small (Level 1) and five large (Level 3) objects into a common bin, the score for the 1-versus-3 cutpoint within the size dimension would be 3 (5 - 2). These numbers were summed over bins. Each cutpoint score indicates how well the two levels that constitute the cutpoint were discriminated. A score of zero means that objects representing both levels were aggregated within the same bin, that is, that no discrimination was made.

Fig. 8. Average cutpoint scores for each of four object dimensions, under four instructions (TVI, touch with visual imagery; TV, touch with vision; T, unbiased touch; IT, instructed touch). Adapted from Klatzky, Lederman, & Reed (1987). © 1987 American Psychological Association.

The data from this study were generally in accord with our predictions, as shown in Fig. 8, which presents the average cutpoint scores on each dimension for the three-bin sort. Within the substance dimensions (hardness and texture), the unbiased and touch-biased instructions show high scores. Texture is also used extensively by the touch-plus-vision group. Shape was used somewhat by all groups (due to its high discriminability), but most by the visual-imagery group and least by the two touch groups. Size was not used significantly by any group. To summarize, among those groups who were denied vision, contour dominated the judgments of those given visual-imagery instructions, whereas substance was particularly salient to those who had no such visual bias. The group who used vision as well as touch acted like a composite of the tactual groups, making use of both contour and substance dimensions. These observations were verified by correlations between groups, based on the 36 cutpoint scores. The two touch groups were strongly correlated
(.75), and the visual-imagery and vision groups were also strongly correlated (.69). The cross correlation between the touch-plus-vision and touch groups was moderate (about .50 in both cases), and substantially greater than that between the touch-plus-imagery and touch groups. We also conducted a stepwise discriminant analysis to determine if subjects could be classified into their groups from the cutpoint scores, and if so, which scores were relevant. Scores on the shape dimension were the most potent discriminators. Two shape scores did reasonably well (about 60% correct) at categorizing subjects from the touch and visual-imagery groups. However, this analysis did very poorly with the touch-plus-vision group: 65% error, with 8 of 20 subjects classified as touch and 5 as visual imagery. This group was clearly confusable with the others. Size was used very little in this study by any group, raising the possibility that the relatively low discriminability of size differences led to their being ignored. Accordingly, we replicated the study, using only the extreme levels of the size dimension, which had been shown to be highly discriminable by touch. Although this raised the scores for size to some extent, it did not dominate the judgments of haptic explorers, even in the two-bin sort (where the presence of only two size values should have motivated its use). In this condition, those given visual-imagery instructions still emphasized form far more than size; those without visual bias emphasized hardness slightly more than size.
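The conversion of sorting data into cutpoint scores, described earlier, can be sketched as follows. The objects and bins below are hypothetical; only the scoring rule (per-bin difference between level counts, summed over bins) comes from the procedure described in the text:

```python
from itertools import combinations

def cutpoint_scores(bins, level_of):
    """Compute cutpoint scores for one dimension.

    bins: list of bins, each a list of object labels.
    level_of: dict mapping each object label to its level (e.g., 1, 2, 3).
    Returns a dict mapping each level pair to its score: for every bin,
    the absolute difference between the counts of the two levels, summed
    over bins. Zero means the two levels were not discriminated at all.
    """
    levels = sorted(set(level_of.values()))
    scores = {}
    for a, b in combinations(levels, 2):
        scores[(a, b)] = sum(
            abs(sum(level_of[o] == a for o in bin_) -
                sum(level_of[o] == b for o in bin_))
            for bin_ in bins
        )
    return scores

# Worked example from the text: a bin holding two small (Level 1) and
# five large (Level 3) objects yields a 1-versus-3 score of 5 - 2 = 3.
level_of = {f"s{i}": 1 for i in range(2)}       # two small objects
level_of.update({f"l{i}": 3 for i in range(5)})  # five large objects
bins = [list(level_of)]          # all seven objects sorted into one bin
print(cutpoint_scores(bins, level_of)[(1, 3)])   # -> 3
```

By contrast, a bin holding one object of each level would contribute zero, reflecting a complete failure to discriminate the two levels.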
Visually guided touch is like a mixture of these two extremes. It is interesting to note that the properties that were found to be salient to touch are also properties that we assume, on a logical basis, to be computed relatively early. This raises the possibility that in the course of haptic object recognition, identification of the material from which an object is made may occur early, followed by object categorization per se at a later stage. The temporally extended and arduous nature of contour apprehension also suggests that it might be an attentional process in touch, whereas at least initial stages of contour processing appear to be automatic in vision (Biederman, 1987; Treisman & Gelade, 1980). These speculations demand further support, of course.

III. Conclusions and Applications
To summarize, our research program highlights certain fundamental aspects of haptic apprehension and recognition. We have shown that haptics
can be remarkably effective or it can be quite poor, depending upon the nature of the stimulus information available. We argue that planar stimuli made of a uniform material reduce the effectiveness of this system by denying it access to information that it encodes well and by forcing it to use a subset of its encoding processes. We have determined that the haptic system has developed specialized exploratory procedures for apprehending different object attributes. A procedure tends to be optimal for encoding only one particular property of objects, but it may be sufficient for encoding several other attributes. The degree of specialization varies over procedures, with the most broadly effective procedure, enclosure, being executed by a simple grasp. We have argued that haptics should not be dismissed as merely providing the input for a process of imaginal reperception; it has its own encoding sequence. The information that is most directly accessible to touch tends to be attributes of the material from which an object is made or global properties such as weight and gross shape. Contour and volumetric information, so critical to visual perception, are acquired relatively inefficiently through haptic exploration. These findings constitute a first step toward a model of haptic apprehension and object categorization. But even this early work has practical implications. One area of application pertains to the development of tangible graphics displays as aids for the blind. These displays are often based on the image-mediated model, which assumes that touch functions like an impoverished visual sense. In fact, raised two-dimensional line drawings for reading by touch are at best metaphorical, according to our theory. For example, the pictorial device of depicting three dimensions by projecting them onto a picture plane is inappropriate for haptic encoding, which is more likely to treat the lines as surface texture.
Why should it be otherwise for a system that encodes three dimensions by exploring in three dimensions rather than along a plane? Similarly, the interposing of object contours to indicate occlusion may be readily understood through vision, but with haptic exploration, an object may mistakenly be perceived as terminating when its contours are interrupted by another (Wake, Shimizu, & Wake, 1980). Of course, people might develop cognitive rules to interpret touched planar stimuli as they would visual images (Kennedy, 1983), but they are not then using haptics in a natural or direct way. We have examined many graphic displays which were clearly inappropriate for haptic encoding. In addition to the questionable benefit of using perspective, we have seen displays that present the figure as a smooth surface and the ground as rough, thus mistakenly highlighting the latter! We have seen attempts to portray complex contour (for example, in circuit diagrams, maps, or mathematical functions), although a high level of complexity is unlikely to be within the apprehension capabilities of the
kinesthetic sense. It is not surprising, then, to find that current tangible graphics displays are rarely useful to the blind. A rather different and very promising area of application for this research is the design of flexible, multipurpose robots. Sensory feedback becomes critical in this challenging task, and such devices are therefore likely to have a variety of tactile sensor devices and a relatively varied means of movement. Previously, we alluded to research with robots that may shed some light on the algorithms by which humans compute object properties. But conversely, study of the human may suggest ways to optimize robotic perception. The human model of haptics may indicate not only how to move the robot end effector, but also what features to extract and how to sequence the analysis of object properties for identification. Our research is only in its initial stages. We have much more to learn about haptics. What we do know now, however, is sufficient to indicate the complexities and importance of this marvelous vehicle for perception and categorization. Research on haptics has too frequently been ignored by those in the fields of visual perception and spatial cognition. Our research program points out that the study of haptics does not focus on an esoteric avenue of perception, but provides a general tool for studying fundamental issues related to attention, pattern recognition, and cognitive processing.
ACKNOWLEDGMENTS

The joint research program described here is supported by grant number BNS84-21340 from the National Science Foundation to R. L. Klatzky and grant number A9854 from the Natural Sciences & Engineering Research Council of Canada to S. J. Lederman, and by a contract with the Office of Naval Research to R. L. Klatzky, S. J. Lederman, and R. Bajcsy.
REFERENCES

Bajcsy, R., & Hager, G. (1983). Tactile information processing: The bottom up approach. Technical Report, Dept. of Computer and Information Science, University of Pennsylvania.
Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological Review, 94, 115-148.
Berla, E. P. (1982). Haptic perception of tangible graphic displays. In W. Schiff & E. Foulke (Eds.), Tactual perception: A sourcebook (pp. 364-386). Cambridge: Cambridge University Press.
Brodie, E., & Ross, H. (1984). Sensorimotor mechanisms in weight discrimination. Perception & Psychophysics, 36, 47-81.
Cashdan, S. (1968). Visual and haptic form discrimination under conditions of successive stimulation. Journal of Experimental Psychology Monograph, 76, Pt. 1.
Davidson, P. W., Abbot, S., & Gershenfeld, J. (1974). Influence of exploration time on haptic and visual matching of complex shape. Perception & Psychophysics, 15, 539-543.
Davidson, P. W., & Whitson, T. T. (1974). Haptic equivalence matching of curvature by blind and sighted humans. Journal of Experimental Psychology, 102, 687-690.
Dodds, A. G., Howarth, C. I., & Carter, D. C. (1982). The mental maps of the blind: The role of previous visual experience. Journal of Visual Impairment and Blindness, 76, 5-12.
Easton, R., & Moran, P. W. (1978). A quantitative confirmation of visual capture of curvature. The Journal of General Psychology, 98, 105-112.
Gibson, J. J. (1966). The senses considered as perceptual systems. Boston: Houghton Mifflin.
Ikeda, M., & Uchikawa, K. (1978). Integrating time for visual pattern perception and a comparison with the tactile model. Vision Research, 18, 1565-1571.
Kennedy, J. M. (1983). What can we learn about pictures from the blind? American Scientist, 71, 19-26.
Kenshalo, D. R. (1970). Psychophysical studies of temperature sensitivity. In W. D. Neff (Ed.), Contributions to sensory physiology, Vol. 4. New York: Academic Press.
Kerst, S. M., & Howard, J. H., Jr. (1978). Memory psychophysics for visual area and length. Memory & Cognition, 6, 327-335.
Klatzky, R. L., Lederman, S. J., & Metzger, V. A. (1985). Identifying objects by touch: An "expert system." Perception & Psychophysics, 37, 299-302.
Klatzky, R. L., Lederman, S. J., & Reed, C. (1987). There's more to touch than meets the eye: The salience of object attributes for haptics with and without vision. Journal of Experimental Psychology: General, in press.
Klatzky, R., Lederman, S., Roan, B., & Andre, K. (1986). Haptic apprehension and naming device. Cognitive Science Series, Technical Report 8601, Univ. of Calif. at Santa Barbara.
Lederman, S. J. (1979). Tactual mapping from a psychologist's perspective. Bulletin of the Assoc. of Canadian Map Libraries, 32, 21-25.
Lederman, S. J., & Abbott, S. G. (1981). Texture perception: Studies of intersensory organization using a discrepancy paradigm, and visual versus tactual psychophysics. Journal of Experimental Psychology: Human Perception and Performance, 7, 902-915.
Lederman, S. J., & Klatzky, R. L. (1987). Hand movements: A window into haptic object recognition. Cognitive Psychology, 19, 342-368.
Lederman, S., Klatzky, R. L., & Barber, P. (1985). Spatial and movement-based heuristics for encoding pattern information through touch. Journal of Experimental Psychology: General, 114, 33-49.
Lederman, S., Klatzky, R. L., Collins, A., & Wardell, J. (1987). Exploring environments by hand or foot: Time-based heuristics for encoding distance in movement space. Journal of Experimental Psychology: Learning, Memory, and Cognition, in press.
Lederman, S., & Taylor, M. (1969). Perception of interpolated position and orientation by vision and active touch. Perception & Psychophysics, 6, 153-159.
Lederman, S. J., Thorne, G., & Jones, B. (1986). Multidimensionality and intersensory integration. Journal of Experimental Psychology: Human Perception & Performance, 12, 169-180.
Lobb, H. (1965). Vision versus touch in form discrimination. Canadian Journal of Psychology, 19, 175-187.
Loomis, J., & Lederman, S. J. (1986). Tactual perception. In K. Boff, L. Kaufman, & J. Thomas (Eds.), Handbook of perception and human performance. New York: Wiley.
Marr, D. (1982). Vision. San Francisco: Freeman.
McClelland, J. L. (1979). On the time-relations of mental processes: An examination of systems of processes in cascade. Psychological Review, 86, 287-330.
Rock, I., & Victor, J. (1964). Vision and touch: An experimentally created conflict between the two senses. Science, 143, 594-596.
Rosch, E. (1978). Principles of categorization. In E. Rosch & B. Lloyd (Eds.), Cognition and categorization. Hillsdale, New Jersey: Erlbaum.
Simons, R. W., & Locher, P. J. (1979). Role of extended perceptual experience upon haptic perception of nonrepresentational shapes. Perceptual and Motor Skills, 48, 987-991.
Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97-136.
Tversky, B. (1981). Distortions in memory for maps. Cognitive Psychology, 13, 407-433.
Wake, T., Shimizu, Y., & Wake, H. (1980). Perception of tactile three dimensional information and visual aids for blind persons. Japanese Journal of Ergonomics, 16, 27-36.
Weinstein, S. (1968). Intensive and extensive aspects of tactile sensitivity as a function of body part, sex, and laterality. In D. R. Kenshalo (Ed.), The skin senses (pp. 195-218). Springfield, IL: Thomas.
Worchel, P. (1951). Space perception and orientation in the blind. Psychological Monographs, 65 (Whole No. 332).
APPROXIMATIONS TO A MODEL OF HUMAN MOTOR PROGRAMMING

David A. Rosenbaum
DEPARTMENT OF PSYCHOLOGY
UNIVERSITY OF MASSACHUSETTS
AMHERST, MASSACHUSETTS 01003
I. Introduction
II. Hierarchical Decisions in Sequence Choices
III. The Motor-Program Editor Model
    A. Studies Using Stimulus-Response Compatibility Effects
    B. The Parameter Remapping Effect
IV. Further Tests of the Hierarchical Decisions Model and Motor-Program Editor Model
V. The Hierarchical Editor Model
    A. The Hierarchical Nature of …
    B. The HED Model's Fit to Ea…
    C. Implications of the HED M…
VI. Parallel Editing and Execution
    A. Inverse Length Effects
    B. Scheduling
    C. Scheduling and the HED Model
VII. Conclusions
References
I. Introduction
Before one carries out a voluntary act, however mundane it may seem, a sophisticated planning process occurs. Understanding the nature of this process can have importance for cognitive psychology, since the action system is vital for perception and the realization of decisions. In addition, the cognitive operations underlying the planning of actions may be fundamental to other psychological functions since the impressive motor skills of "lower animals" may have set the evolutionary stage for the cognitive skills that humans possess, and the forms that our cognitions take must ultimately take account of the means by which they are enacted. Finally, cognitive theory has historically placed much stock in the importance of action. Witness the stress on learning by doing in Piagetian theory and the longstanding interest in motor theories of
perception, especially in the areas of speech (Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967) and visual form perception (Coren, 1986). Action planning can be studied in a variety of ways-by directly monitoring the activity of the nervous system, by scrutinizing patients with movement disorders, by observing performance and its errors in everyday life, and by studying the timing and accuracy of responses in controlled laboratory settings. This article is concerned with the latter type of research. It summarizes a series of experiments on the cognitive activity that immediately precedes and allows for the execution of voluntary actions. For convenience, I refer to this cognitive activity as motor programming and to its resultant representations as motor programs. The domain of motor programming that my colleagues and I have studied is the preparation of rapid finger sequences. We have been interested in this area for several reasons. The control of finger sequences is important in keyboard entry and musical performance. Finger sequences can be performed extremely rapidly and appear to be executed without direct conscious control. At the same time, they appear to be subject to a number of cognitive constraints, some of which are described here. Finally, it is easy to record individual finger presses with modern computer equipment. The goal of the research is to delineate the processes of motor programming and the structure of motor programs. In the experiments reviewed, subjects (all college students) were asked to perform one of two memorized finger sequences after the appearance of a choice reaction signal. At the start of each block of trials, the subjects were told what the two possible finger sequences were and what the corresponding reaction signals would be; usually the sequences were very short and could be learned immediately. 
On each trial, one of the signals appeared and the subject was supposed to produce the designated sequence as quickly and accurately as possible. The data of interest were the times and identities of produced responses. There were two empirical questions: (1) How do the times for individual finger responses depend on the types of sequences in which the responses are embedded, and (2) how does the timing of individual responses within a sequence depend on the other sequence that is possible? If the timing of responses within a sequence changes as a function of the alternative sequence, the changes can be attributed to the operations underlying programming of the sequence to be performed. 11. Hierarchical Decisions in Sequence Choices
Our first experiment (Rosenbaum, Saltzman, & Kingman, 1984b, Experiment 1) concerned an issue arising out of an influential set of results reported
A Model of Human Motor Programming
by Sternberg, Monsell, Knoll & Wright (1978). These investigators found that the simple reaction time to produce the first response in a highly prepared response sequence increased with the length of the sequence, up to an asymptote of about eight items (see Monsell, 1986). Similarly, the average interresponse time within the sequence increased with sequence length, often at the same rate as the time to initiate the sequence. These results suggest that a time-consuming programming process precedes the execution of response sequences. My colleagues and I wanted to investigate this process in choice situations. We asked subjects to choose between sequences of varying length: (1) i versus I, (2) ir versus IR, or (3) irm versus IRM, where i, r, and m denote button presses of the index, ring, and middle fingers of the left hand, respectively, and I, R, and M denote button presses of the index, ring, and middle fingers of the right hand, respectively. Subjects learned to associate one visual signal (O) with one sequence and another visual signal (X) with the other sequence. The instruction was to perform the required sequence accurately, minimizing the delay between appearance of the signal and completion of the required sequence (although simultaneous responses were prohibited). The main results of the experiment (see Fig. 1) were that T1, the mean time for the first response after the reaction signal, increased with the length n of the sequence to be performed; the mean time for the second response, T2, was longer when that response was embedded in a sequence
Fig. 1. Results of the first experiment of Rosenbaum et al. (1984b). The graph shows the mean latency, Tj, for response 1 ≤ j ≤ 3 in sequences of length 1 ≤ n ≤ 3 (abscissa: number of responses, n). Panel A shows mean T1. Panel B shows mean T2 and T3.
of length n = 3 than when it was embedded in a sequence of length n = 2. These results replicate the findings of Sternberg et al. It is natural to try to account for the results of Fig. 1 with a model that can also account for length effects when subjects know in advance what sequence will have to be produced. One such model (Rosenbaum, 1985), which is based on previous models of memory retrieval (Bower, Clark, Lesgold, & Winzenz, 1969; Johnson, 1970), assumes that response sequences are produced through the successive decomposition of motor subprograms into their constituents. Visualizing this process as a tree-traversal process (see Rosenbaum, Kenny, & Derr, 1983) and considering all possible binary trees¹ for sequences with varying numbers of terminal nodes (corresponding to individual responses), it can be shown that (1) on the average, the length of the node path from the root of the tree to the leftmost terminal node increases with sequence length (up to an asymptote), and (2) on the average, the length of the node path from one terminal node to the next increases with sequence length (see Rosenbaum, 1985, for details).² This model can be extended to the task of choosing between sequences. One merely needs to assume that subjects choose between trees corresponding to the sequences. Thus, after the choice signal is identified, the subject decides which tree to access, and then courses through the tree in the same manner as when the identity of the required sequence is known ahead of time. If one assumes that it takes extra time to traverse each extra node, the tree-traversal model accounts for length effects on initiation times and interresponse times, both in choice and simple reaction-time experiments.
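Claim (1) above can be checked by brute force. The sketch below, using a tree representation of our own choosing rather than anything from the original papers, enumerates every binary tree with n terminal nodes and averages the number of nodes on the path from the root to the leftmost terminal node:

```python
def binary_trees(n):
    """All binary trees with n terminal nodes; a leaf is 'L',
    an internal node is a (left, right) pair."""
    if n == 1:
        return ["L"]
    out = []
    for k in range(1, n):            # k leaves on the left, n - k on the right
        for left in binary_trees(k):
            for right in binary_trees(n - k):
                out.append((left, right))
    return out

def nodes_to_first_response(tree):
    """Number of nodes traversed from the root down to the leftmost leaf."""
    count = 1
    while tree != "L":
        tree = tree[0]
        count += 1
    return count

def mean_initial_path(n):
    trees = binary_trees(n)
    return sum(nodes_to_first_response(t) for t in trees) / len(trees)

for n in range(1, 5):
    print(n, mean_initial_path(n))   # grows with n: 1.0, 2.0, 2.5, 2.8
```

If each extra node costs a fixed increment of time, this growing average path length yields the predicted increase of initiation time with sequence length.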
Although this model provides a straightforward account of sequence choice, it runs into problems because it predicts that the time to perform one sequence should be unaffected by characteristics of the other possible sequence; that is, the structure of one tree should be unaffected by the structure of the other. Rosenbaum et al. (1984b) tested this prediction by pairing sequences of different lengths. For example, response I was paired with i in one condition, or with irm in another condition. The prediction was that the latency of I would remain the same regardless of the length of the other possible sequence. In fact, the latency of I was considerably longer when the alternative sequence was irm than when the alternative sequence was i.

¹The restriction to binary trees is guided less by prior theoretical commitments than by the fact that when other sorts of trees are allowed, the data are accounted for less effectively. It happens, however, that in linguistic theory, binary trees have been found to have special, preferred status in the description of syntax (Kayne, 1984).

²This model also predicts inverted U-shaped serial position curves for interresponse times, because breaks between major sections of the tree are often encountered around the middle of the sequence. Such effects have been obtained by Sternberg et al. (1978), although they are not uniquely predicted by their "buffer search" model. The tree-traversal model does uniquely predict this effect.
A Model of Human Motor Programming
By contrast, the latency for irm was only slightly longer when this sequence was paired with IRM than when it was paired with I. To account for these results, Rosenbaum et al. (1984b) proposed an elaboration of the model outlined above: the hierarchical decisions model (see Fig. 2). This model adopts all the assumptions of the original tree model but adds two assumptions to it. One is that choices are always made at the same functional level. The other is that choices are always made at the highest level possible. Thus, choosing I when the alternative is irm (Fig. 2C) requires an intermediate decision about a superordinate node, whereas the comparable decision is unnecessary when I is paired with i (Fig. 2B). If one counts the number of nodes to be traversed in order to reach the leftmost terminal node for each sequence in each choice situation, the T1 data can be accounted for with this model.

III. The Motor-Program Editor Model
Although the hierarchical decisions model can account for the choice-context effects obtained by Rosenbaum et al. (1984b), it does not account for choice-context effects that have been obtained in other studies. Consider, for example, the finding that subjects take longer to choose between oscillatory movements of the two hands when the movements proceed in different directions (horizontal and vertical) than when they proceed in the same direction (Heuer, 1982). Since the hierarchical decisions model does not assume that different numbers of decisions should be needed in these two kinds of choices, it fails to account for this result. To account for similarity effects of this kind (as well as the other results described above), Rosenbaum and Saltzman (1984) proposed a motor-program editor model. The model says that subjects choose between two possible movement sequences by specifying the motor features that are uncertain at each serial position of the sequence to be performed. Thus, for a choice between oscillatory movements of the two hands, if the two
Fig. 2. Hierarchical decisions model of Rosenbaum et al. (1984b). Panel A: Two sequences of length n = 3. Panel B: Two sequences of length n = 1. Panel C: One sequence of length n = 3 paired with a sequence of length n = 1.
David A. Rosenbaum
movements proceed in the same direction, only the hand feature distinguishing the two uncertain responses needs to be specified after the reaction signal is identified. However, if the two movements proceed in different directions, both the direction and hand of the required responses must be identified. Assuming that specification time increases with the number of features to be specified, choice reaction time should increase as the possible response sequences become less similar (i.e., as they share fewer features). The motor-program editor model accounts for the results of Rosenbaum et al. (1984b). First consider the length effects obtained in the experiment, where the choices were i versus I, ir versus IR, and irm versus IRM. With an increase in the number of responses between which choices must be made, the number of motor subprograms requiring feature assignments also increases. Thus, i versus I requires one hand assignment, ir versus IR requires two hand assignments, and irm versus IRM requires three hand assignments. The results of the second experiment of Rosenbaum et al. (1984b) can be accounted for in a similar way, with the added assumption that extra time is needed to cancel subprograms for particular serial positions (e.g., the subprograms for serial positions 2 and 3 when i is called for in the context of IRM). (See Rosenbaum and Saltzman (1984) for more details.) The motor-program editor model differs from the hierarchical decisions model in the decision units it postulates. Whereas the hierarchical decisions model assumes that the lowest-level decision units are subprograms corresponding to complete responses, the motor-program editor model assumes that choices can be made about more elementary aspects of motor responses. 
Consequently, the motor-program editor model allows that programs for two possible response sequences can be combined into one protoprogram, with only the features distinguishing the two subprograms at each serial position remaining to be specified after the reaction signal is identified. Another important difference between the hierarchical decisions model and the motor-program editor model is that the motor-program editor model provides a mechanism for programming long sequences of responses. It does so because it suggests that only the features distinguishing one motor sequence from the next need to be changed or edited, thereby eliminating the need for complete reprogramming each time another motor sequence is performed. In the sections that follow, I describe studies that support both of these general predictions.

A. STUDIES USING STIMULUS-RESPONSE COMPATIBILITY EFFECTS
It is well known that the choice reaction time for a response depends on the relation between the response and the signal with which it is paired.
Thus, if a person is asked to press a button with the left hand when a signal appears on the left, and to press a button with the right hand when a signal appears on the right, the choice reaction times are faster than when the stimulus-response (S-R) mappings are reversed (Craft & Simon, 1970). Inhoff, Rosenbaum, Gordon, and Campbell (1984) made use of this fact to investigate the selection of response sequences. They were particularly interested in the possibility that when subjects choose between sequences that can be distinguished by a single motor feature, subjects choose the designated sequence by simply selecting its distinguishing feature. The results of Inhoff et al. supported this hypothesis. In one experiment, subjects performed the same set of choices as in the first experiment of Rosenbaum et al. (1984b), that is, where the sequences had one, two, or three responses each. However, in the Inhoff et al. study, the reaction signals were spatially positioned so that in the high S-R compatibility conditions the stimulus on the left signaled the left-hand sequence and the stimulus on the right signaled the right-hand sequence. By contrast, in the low S-R compatibility conditions, the stimulus on the left signaled the right-hand sequence and the stimulus on the right signaled the left-hand sequence. Fig. 3 shows the three main results: (1) Mean T1 increased with sequence length, replicating the earlier findings; (2) Mean T1 was longer in the low S-R compatibility conditions than in the high S-R compatibility conditions, replicating Craft and Simon (1970); and (3) most importantly, the effects of sequence length and S-R compatibility were additive. These results are consistent with the hypothesis that subjects chose between the left- and right-hand sequences by merely choosing the hand needed to perform the sequence.
Suppose that S-R compatibility only influences the time to choose between the right and left hand, or more specifically, an abstract parameter designating the right or left hand. Since the choice between left and right hand occurs only once, the effect of S-R compatibility should affect only a single stage of processing. Therefore, the effect of S-R compatibility should be statistically additive with other effects, such as sequence length (Sternberg, 1969). There are other possible interpretations of the Fig. 3 data, however. One is that S-R compatibility effects extend only to the first response. For example, subjects might choose between the first response of one sequence and the first response of the other sequence and then allow the selected response to trigger the succeeding responses in its own sequence. The time to produce the first response could still increase with sequence length because of the demands of the tree-traversal process. To address this possibility, Inhoff et al. studied S-R compatibility effects for sequences that began with one hand and ended with the other. The reasoning was that if the results of their first experiment reflected a choice limited to the first response, the effects of S-R compatibility should be the same regardless of whether the sequence
Fig. 3. Results of the first experiment of Inhoff et al. (1984). Mean T1 (msec) is plotted against the number of responses, n.
begins with one hand and ends with the other (heterogeneous sequences) or is performed entirely with one hand (homogeneous sequences). In fact, the data supported the view that subjects applied a "hand" value to the entire set of responses before performing the first response. As seen in Fig. 4, the effect of S-R compatibility was smaller for the heterogeneous sequences than for the homogeneous sequences.

Can these results be explained in terms consistent with the motor-program editor model? Suppose that when subjects prepare to choose between two response sequences, they prepare motor subprograms common to the two sequences but leave unspecified (until the reaction signal is identified) those motor feature(s) distinguishing the ultimately required sequence from the sequence that is not required. When the reaction signal is presented, a choice is made between the two possible motor features and then the chosen feature is assigned in a serial fashion to the subprograms requiring feature assignment. Suppose the time to choose between motor features is affected by S-R compatibility but the time to assign features is not. The choice reaction time is then described by the following equation:

T1 = k + pc + qi + au     (1)

where k is a constant, c is the time to choose a parameter that is compatibly mapped to the reaction signal, p is the number of such parameters, i is the
Fig. 4. Results of the second experiment of Inhoff et al. (1984). Mean T1 (msec) is plotted against the compatibility between the reaction signal and the first response (compatible vs. incompatible).
time (i > c) to choose a parameter that is incompatibly mapped to the reaction signal, q is the number of such parameters, a is the parameter assignment time, and u is the number of subprograms requiring assignment. When the choice is between homogeneous sequences, the values of p and q are 2 and 0 respectively when S-R compatibility is high, and 0 and 2 respectively when S-R compatibility is low. By contrast, when the choice is between heterogeneous sequences, the values of p and q are both 1 whether S-R compatibility is high or low. Thus, the model can account for the fact that the S-R compatibility effect is smaller when the sequences are heterogeneous than when the sequences are homogeneous.³,⁴
³Notice that the model does not account for the fact that the S-R compatibility effect is larger for the first response than for the second. The result suggests that subjects postpone programming of later responses, in which case Equation (1) would have to be revised to reflect diminishing probabilities of feature assignment for responses that are farther from the beginning of the sequence.

⁴The same model has been shown to account for reaction time data from an experiment on speech choices (Rosenbaum, Gordon, Stillings, & Feinstein, 1987).
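Equation (1) can be played out numerically. In the sketch below (Python), the constants k, c, i, and a are hypothetical values of my own choosing (the source fits no specific numbers at this point); only the p, q, and u assignments follow the text:

```python
def choice_rt(p, q, u, k=250.0, c=30.0, i=60.0, a=40.0):
    # Equation (1): T1 = k + p*c + q*i + a*u, with i > c.
    return k + p * c + q * i + a * u

# Homogeneous two-response sequences: p = 2, q = 0 when S-R
# compatibility is high; p = 0, q = 2 when it is low (u = 2 either way).
homo_effect = choice_rt(0, 2, 2) - choice_rt(2, 0, 2)

# Heterogeneous sequences: p = q = 1 regardless of compatibility, so
# the model predicts no compatibility effect at all (the data in
# Fig. 4 show a reduced, but nonzero, effect).
hetero_effect = choice_rt(1, 1, 2) - choice_rt(1, 1, 2)

print(homo_effect, hetero_effect)
```

With these illustrative constants the homogeneous compatibility effect is 2(i − c) = 60 msec, while the heterogeneous effect is zero, reproducing the predicted ordering.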
The model can also account for the additive relation between sequence length and S-R compatibility, shown in Fig. 3. The effect of S-R compatibility is attributable to the fact that in the high S-R compatibility condition p = 1 and q = 0, whereas in the low S-R compatibility condition p = 0 and q = 1. The increase in T1 with sequence length is attributable to the corresponding increase in u, the number of subprograms requiring feature assignment. In sum, therefore, these results support the view that subjects can choose between response sequences by assigning motor features to previously readied subprograms. Note that according to the motor-program editor model, length effects are attributable to increases in the number of subprograms requiring feature assignment rather than to increases in the number of nodes in a tree-traversal process. Later in this article I discuss the relation between these two proposals. For now, it suffices to say that the evidence that has been reviewed so far favors the motor-program editor model over the hierarchical decisions model.
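The additivity also falls directly out of Equation (1): compatibility changes only the p and q terms, while length changes only u, so the compatibility effect is the constant i − c at every sequence length. A quick check (Python; the numeric constants are again hypothetical):

```python
def t1(n, compatible, k=250.0, c=30.0, i=60.0, a=40.0):
    # One hand parameter to choose (p + q = 1); the chosen feature is
    # then assigned to u = n subprograms.
    p, q = (1, 0) if compatible else (0, 1)
    return k + p * c + q * i + a * n

effects = [t1(n, False) - t1(n, True) for n in (1, 2, 3)]
print(effects)  # the same i - c = 30 msec effect at every length n
```

The compatibility effect is identical at n = 1, 2, and 3, which is exactly the additive pattern of Fig. 3.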
B. THE PARAMETER REMAPPING EFFECT

A corollary of the motor-program editor model is that successive response sequences can be programmed by changing features that distinguish one sequence from the next. This method of programming is more efficient than one in which each sequence must be programmed "from scratch" regardless of its relation to the sequence that has just been performed (Rosenbaum, 1980). The motor-program editor model therefore has the virtue that it not only addresses the question of how forthcoming response sequences are programmed; it also says what happens to motor programs after they have been executed. The model says that just-used motor programs are preserved so that the features distinguishing the just-performed sequence from the next one to be performed can be changed. If this view of programming is correct, sequences that share many features with sequences that have just been performed should be performed more quickly than sequences that share fewer such features. The evidence supports this prediction. In serial-choice reaction-time tasks, where signals for distinct responses are presented one after the other in rapid succession, reaction times are shorter for similar successive responses than for dissimilar successive responses (see Rosenbaum & Saltzman, 1984). Similarly, in tasks where subjects perform memorized response sequences repetitively and as quickly as possible, there are dramatic slowdowns when features of particular responses change from one cycle to the next (Rosenbaum, Weber, Hazelett, & Hindorff, 1986). For example, as seen in Fig. 5, when subjects perform finger-tapping sequences, their mean response rates are significantly lower when the number of taps with the same finger changes from one cycle to the next than when the number of taps with the same finger is constant
Fig. 5. Results of the "finger fumbler" experiment of Rosenbaum et al. (1986). Subjects performed two finger sequences from memory at three different rates (specified by a computer-generated click train). For one sequence, MMIiiMIIi, the mapping of number of consecutive taps to the same finger varied. For the other sequence, RRMIIimmr, the mapping of number of consecutive taps to the same finger was fixed. The graph shows the mean number of finger sequences completed before an error was made at each required number of responses per second (3, 4, or 6), for the fixed and variable mappings.
from one cycle to the next. Thus, when the mapping of parameters to responses is fixed, performance benefits, but when the mapping of parameters to responses varies, performance suffers. These results are consistent with the hypothesis that when the mapping (or assignment) of a parameter to a subprogram needs to be changed, it takes time to achieve this remapping. If previous parameter assignments were not preserved, such remapping would not be required.
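The economy that editing buys over wholesale reprogramming can be made concrete. In this sketch (Python; the feature strings and unit costs are illustrative, not from the original), each character stands for the feature assigned at one serial position:

```python
def edit_cost(previous, upcoming):
    # Motor-program editor: only serial positions whose feature
    # assignment differs from the just-executed program are remapped.
    return sum(1 for a, b in zip(previous, upcoming) if a != b)

def scratch_cost(previous, upcoming):
    # "From scratch": every position is reprogrammed regardless of
    # its relation to the preceding sequence.
    return len(upcoming)

print(edit_cost("irm", "irM"), scratch_cost("irm", "irM"))  # 1 vs 3
print(edit_cost("irm", "irm"), scratch_cost("irm", "irm"))  # 0 vs 3
```

A fixed cycle-to-cycle mapping incurs zero remapping cost, while a variable mapping incurs a cost on every cycle; this is the pattern behind the fixed-versus-variable difference in Fig. 5.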
IV. Further Tests of the Hierarchical Decisions Model and Motor-Program Editor Model

The preceding sections show that the motor-program editor model does a reasonably good job of accounting for sequence-choice data. Moreover, the model provides a means of conceptualizing the programming of extended series of responses. The results of several additional experiments (Rosenbaum, Inhoff, & Gordon, 1984a) shed light on more detailed aspects of the program editing process. These experiments also help resolve a question remaining from the hierarchical decisions model, namely, whether motor programs are represented and decoded hierarchically. The motor-program
editor model makes no explicit provision for hierarchical organization, although it certainly does not preclude it. Since the hierarchical decisions model provides an excellent fit to the timing data of Sternberg et al. (1978) and Rosenbaum et al. (1984b), it was important to determine what role the hierarchy actually plays.

The first experiment of Rosenbaum et al. (1984a) was designed to distinguish between the hierarchical decisions model and the motor-program editor model. Subjects chose between right- and left-hand sequences with varying numbers of responses. In different conditions, the choices were i versus IR, i versus IRM, ir versus IR, and ir versus IRM. The hierarchical decisions model made two clear predictions about performance in these choices: (1) The latencies of responses in any sequence should always increase with the length of the sequence; (2) As the number of responses n in the nonrequired sequence increases, T1 for the required sequence should either increase (if a more complex hierarchy is formed) or remain the same. The motor-program editor model predicted a more complex pattern of data, which was in fact obtained (see Table I). The pattern can be accounted for with the proposal that decision making is restricted to serial positions in which there are two possible responses, and the time to begin a sequence increases with the number of such decisions, unless the decision is to cancel responses (e.g., to produce i in the context of IR).

TABLE I
CHOICES, HYPOTHESIZED EDITING OPERATIONS, AND MEAN LATENCIES (IN MSEC), FIRST EXPERIMENT OF ROSENBAUM ET AL. (1984a)

Condition   Sequence   Editing operation           T1    T2    T3
1           i          Specify 1, cancel 2         378   —ᵃ    —
            IR         Specify 1                   380   81    —
2           i          Specify 1, cancel 2 & 3     380   —     —
            IRM        Specify 1                   381   71    75
3           ir, IR     Specify 1 & 2               401   78    —
4           ir         Specify 1 & 2, cancel 3     420   74    —
            IRM        Specify 1 & 2               413   63    70

ᵃA dash indicates that data were not applicable. T1, T2, and T3 are the mean latencies for the first, second, and third response respectively. Data in Condition 3 are means over ir and IR.

In the second experiment of Rosenbaum et al. (1984a), all of the sequences consisted of two responses, but the characteristics of the sequences were varied so that the first and second responses were either performed by
fingers of one hand (e.g., im or IM) or by fingers of two hands (e.g., iM or Im); see Table II. By pairing these sorts of sequences in different ways, it was possible to vary the serial positions of the uncertain response (position 1 or 2, or 1 and 2) as well as the complexity of any rules that might be used to choose between the sequences. Thus, for some choices (e.g., im versus IM), a simple rule could be used to decide between the sequences (viz., use the left or right hand), but for other choices with the same number of uncertain responses (e.g., iM versus Im) a more complex rule was needed. The hierarchical decisions model predicted that T1 would not depend on the serial position(s) of the uncertain response(s) nor on the complexity of the rules that might be used to select them. However, the motor-program editor model predicted that such effects would emerge. As seen in Table II, the data supported the motor-program editor model. When more complex rules were theoretically required, mean T1 increased.

In the third experiment of Rosenbaum et al. (1984a), we sought more detailed information about the editing process. Subjects chose between pairs of three-response sequences, with the uncertain response in serial position 1, 2, or 3 (see Table III). Representative choices for these three conditions were irm versus Irm, irm versus iRm, and irm versus irM. By varying the serial position of the uncertain response, we could determine whether selection of the uncertain response was delayed until after the first response was performed. We assumed that the likelihood of such delayed decision making would increase with the amount of delay possible. Thus, T1 was predicted to decrease with the distance of the uncertain response from the
TABLE II
CHOICES AND RESULTS, SECOND EXPERIMENT OF ROSENBAUM ET AL. (1984a)

Condition   Sequencesᵃ   Position of uncertain response   Transition      T1
1           im, Im       1                                Within-hand     402
                                                          Between-hand    387
2           im, iM       2                                Within-hand     367
                                                          Between-hand    368
3           im, IM       1 and 2                          Within-hand     380
                                                          Between-hand    427

ᵃConditions 1 and 2 also included the mirror-image sequences of those listed in the table. Thus in Condition 1 another choice was IM versus iM and in Condition 2 another choice was IM versus Im. The data are averaged over the two within-hand and two between-hand sequences tested in each condition.
TABLE III
CHOICES, THIRD EXPERIMENT OF ROSENBAUM ET AL. (1984a)

Condition   Sequences    Position of uncertain response   Distinguishing features
1           irm, Irm     1                                Hand
2           irm, Mrm     1                                Hand and finger
3           irm, iRm     2                                Hand
4           irm, iIm     2                                Hand and finger
5           irm, irM     3                                Hand
6           irm, irI     3                                Hand and finger
beginning of the sequence. The motor-program editor model did not predict such an outcome, however, for it assumed that all the uncertainties in the program for a forthcoming sequence were resolved in advance.

A second issue addressed in this experiment was whether T1 depends on the number of features that distinguish the alternative responses at a given serial position. Since the motor-program editor model assumes that decisions are made about individual motor features, it would be consistent with the model to find that T1 increases with the number of features distinguishing the responses between which choices must be made. To address this issue, three choices were added to the ones listed above: irm versus Mrm, irm versus iIm, and irm versus irI (see Table III). Note that for these sequences, as for the ones listed above, the serial position of the uncertain response varied. However, the alternative responses at each serial position differed with respect to hand and finger rather than hand alone. We predicted that if hand and finger are specified separately (i.e., serially) at each serial position, T1 should be longer for choices in which hand and finger are uncertain than for choices in which only hand is uncertain.

The T1 data for this experiment are shown in Fig. 6, where it is seen that the number of distinguishing features had little or no effect on the time to initiate the sequence. By contrast, the serial position of the uncertain response had a large effect. The farther the uncertain response was from the beginning of the sequence, the shorter was T1. This result is consistent with the delayed-decision view of programming and inconsistent with the total-preprogramming view of the motor-program editor model. Further support
Fig. 6. Results of the third experiment of Rosenbaum et al. (1984a). Mean T1 (msec) is plotted against the position of the uncertain response (1, 2, or 3). Filled points correspond to the conditions in which the alternative responses differed with respect to hand alone. Empty points correspond to the conditions in which the alternative responses differed with respect to hand and finger.
for the delayed-decision view comes from the interresponse times. As seen in Table IV, interresponse times were longer for responses that were initially uncertain than for responses that were initially certain. This is what one would expect if the selection of a forthcoming response were sometimes delayed relative to the completion of the immediately preceding response. Although these results are consistent with the delayed-decision hypothesis, there is a major problem with the hypothesis: The magnitude of the uncertainty effect was smaller for interresponse times than for initial response times. Consider the fact that T1 dropped 150 msec as the uncertain response moved from Serial Position 1 to Serial Position 3. If this drop occurred because selection of the third response could be delayed, then those 150 msec should have been seen in T2 and T3; that is, the net elevation in T2 and T3 when Response 3 was uncertain as compared to when Response 1 was uncertain should have been 150 msec. The fact that the net elevation in T2 and T3 was much smaller than 150 msec poses a problem for the delayed-decision view.
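The arithmetic behind this objection can be laid out directly, using the mean latencies for the uncertain response in Serial Positions 1 and 3 as tabulated for this experiment (Table V):

```python
# Mean latencies (msec) when the uncertain response occupied serial
# position 1 versus serial position 3: (T1, T2, T3), from Table V.
position_1 = (570, 177, 182)
position_3 = (433, 174, 224)

t1_drop = position_1[0] - position_3[0]                # ~150 msec drop in T1
net_elevation = sum(position_3[1:]) - sum(position_1[1:])
print(t1_drop, net_elevation)
```

The roughly 150-msec saving in T1 (137 msec with these tabulated means) reappears as only about 39 msec of extra interresponse time, far short of what a pure delayed-decision account would require.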
V. The Hierarchical Editor Model
To resolve this problem and to account for the other results that have been reviewed here, Rosenbaum et al. (1984a) developed a new model of
TABLE IV
MEAN INTERRESPONSE TIMES, THIRD EXPERIMENT OF ROSENBAUM ET AL. (1984a)

                          Position of uncertain response
                          1             2             3
Distinguishing feature    T2ᵃ   T3     T2    T3      T2    T3
Hand                      177   195    213   189     170   214
Hand and finger           176   170    193   163     177   233

ᵃT2 and T3 are the mean latencies for the second and third responses respectively.
motor programming: the hierarchical editor, or HED, model. The model says that subjects prepare for a choice between response sequences by establishing an abstract program with all the features common to the two possible sequences. The program is assumed to be hierarchically organized, which means that it can be represented as a tree or as a phrase structure grammar (i.e., a set of rewrite rules). If one uses the phrase structure representation, one can describe the workings of the model as follows. Before the reaction signal appears, translations are carried out successively from the top of the program (i.e., the first line) downward to the first point where there is an uncertain translation. After the reaction signal is identified, all remaining translations are performed, beginning with the first uncertain translation and proceeding to the end of the program. During this series of translations, none of the terminal elements is executed. This series of translations is called the Edit pass; its purpose is to ensure that all rewrite statements in the program are fully determined. Once the Edit pass has been completed, control returns to the top of the program and the translation process begins anew, this time with all the terminal elements being physically executed when they are encountered. This series of translations is called the Execution pass. One other assumption is needed, namely, that each translation step takes a finite and measurable amount of time. On the basis of these assumptions, one can count the number of steps preceding each response to see whether the time for that response depends on the number of steps that precede it. Consider how the HED model applies to the data of the last experiment described in the preceding section (see Fig. 7 and Table V).
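Before walking through the specific choices, the two-pass step counting can be sketched in code (Python; a minimal abstraction of my own, with each program line reduced to two flags):

```python
def steps_before_first_response(lines):
    # Each program line is (uncertain, emits_response). Before the
    # reaction signal, the Edit pass advances to the first uncertain
    # line. After the signal, editing runs from that line to the end
    # of the program, and then the Execution pass restarts at the top
    # and runs until the first terminal element is produced.
    first_uncertain = next(i for i, (u, _) in enumerate(lines) if u)
    edit_steps = len(lines) - first_uncertain
    exec_steps = 1 + next(i for i, (_, r) in enumerate(lines) if r)
    return edit_steps + exec_steps

# Programs for choices with the uncertain response in position 1, 2,
# or 3 (cf. Table V), e.g., Sequence->Xrm; X->i or I; X->#; r->#; m->#.
plan_pos1 = [(0, 0), (1, 0), (0, 1), (0, 1), (0, 1)]
plan_pos2 = [(0, 0), (0, 1), (1, 0), (0, 1), (0, 1)]
plan_pos3 = [(0, 0), (0, 1), (0, 1), (1, 0), (0, 1)]
counts = [steps_before_first_response(p) for p in
          (plan_pos1, plan_pos2, plan_pos3)]
print(counts)  # [7, 5, 4]
```

Combined with the fitted line T1 = 268 + 44s reported for this experiment, the counts of 7, 5, and 4 post-signal steps yield predicted initiation times of 576, 488, and 444 msec for the three uncertainty positions.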
For a choice in which the first response is uncertain, the Edit pass can proceed to the second line of the program before the reaction signal is presented; this is where the first uncertain translation (the rewriting of X, which denotes the uncertain response, as i or I) is located. After the reaction signal is presented, seven translation steps are
Fig. 7. The HED model account of the results of the fifth experiment of Rosenbaum et al. (1984a). Panel A: IiMm versus Mm. Panel B: IiMm versus iMm. Panel C: iMmI versus Mm. Panel D: iMmI versus iMm. The model assumes a tree-traversal process that begins (after identification of the reaction signal) at the leftmost point where there is uncertainty about whether to retain subprograms for responses or cancel those subprograms (denoted +).
needed before the first response is carried out. By contrast, when the second response is uncertain, the Edit pass can proceed up to the third line of the program, and only five translation steps are needed before the first response is performed. Finally, when the third response is uncertain, the Edit pass can proceed up to the fourth line of the program, and only four translation steps are needed during the initial reaction time. If T1 is plotted against the number of translation steps, s, that are theoretically required after the reaction signal appears, the correlation coefficient of the best-fitting straight line (T1 = 268 + 44s) is .98. Moreover, interresponse times that theoretically require two preceding translation steps are accurately predicted to take longer (214 msec) than interresponse times that theoretically require only one preceding translation step (177 msec). Thus the HED model accounts for the results of this experiment. It does so by saying that fewer rewrite steps are required after the reaction signal is presented as the position of the uncertain response recedes from the beginning of the sequence to the end.

A. THE HIERARCHICAL NATURE OF EDITING
Based on the material just presented, it should be clear why the word editor was included in the name of the HED model: the model assumes an editing process that ensures that all initially uncertain translations in a program become fully defined. I turn next to
TABLE V
HED MODEL ACCOUNT, THIRD EXPERIMENT OF ROSENBAUM ET AL. (1984a)

Planᵃ               Edit step   Execution step      Mean latency (msec)

Sequence → Xrm      —ᵇ          5                   —
X → i or I          1           6                   —
X → #               2           7 (Response 1)      570
r → #               3           8 (Response 2)      177
m → #               4           9 (Response 3)      182

Sequence → iXm      —           4                   —
i → #               —           5 (Response 1)      504
X → r or R          1           6                   —
X → #               2           7 (Response 2)      203
m → #               3           8 (Response 3)      176

Sequence → irX      —           3                   —
i → #               —           4 (Response 1)      433
r → #               —           5 (Response 2)      174
X → m or M          1           6                   —
X → #               2           7 (Response 3)      224

ᵃX denotes an uncertain response; # denotes a terminal element.
ᵇA dash indicates data not applicable.
the fourth experiment of Rosenbaum et al. (1984a), which shows why the word hierarchical was also included in the model's name. Subjects in this experiment performed in four choice conditions (see Table VI). In Conditions 1 and 2 the two possible response sequences were mirror images of one another; in Conditions 3 and 4 they were not. In Table VI, the data from the experiment are divided according to whether the performed sequence was an index-index-middle sequence or an index-middle-middle sequence. Two important results emerge. One is that when the alternative sequences were mirror images, T1 was shorter than when the alternative sequences were not mirror images. Second, the interresponse times, T2 and T3, exhibited a three-way interaction such that interresponse times were reduced for responses using the same finger as the immediately preceding response, but only in mirror-image choices. The HED model can account for this complex pattern of results, as seen in Table VII. When the alternative sequences were mirror images of one another, subjects could prepare a plan that capitalized on their shared structure. For example, when the choice was between the left- and right-hand versions of index-index-middle (top panel), editing could take advantage of the fact that there was a common index-finger doublet. Likewise, when the choice was between the left- and right-hand versions of index-middle-middle
TABLE VI
CHOICES AND MEAN LATENCIES, FOURTH EXPERIMENT OF ROSENBAUM ET AL. (1984a)

                  Response 1 repeated        Response 1 not repeated
Relationship      T1ᵃ   T2    T3             T1    T2    T3
Mirror            434   173   191            441   190   159
Nonmirror         492   198   196            491   197   190

ᵃT1, T2, and T3 are the mean latencies for the first, second, and third responses respectively.
(middle panel), editing could take advantage of the fact that there was a common middle-finger doublet. However, when the choice was between sequences that were not mirror images (bottom panel), there were no common doublets that the editor could exploit. If one counts the number of steps
TABLE VII
APPLICATION OF THE HED MODEL, FOURTH EXPERIMENT OF ROSENBAUM ET AL. (1984a)

Choice            Planᵃ             Edit step   Execution step     Latency (msec)

iim versus IIM    Sequence → XY     —ᵇ          6                  —
                  X → i or I        1           7                  —
                  X → #             2           8 (Response 1)     434
                  X → #             3           9 (Response 2)     173
                  Y → m or M        4           10                 —
                  Y → #             5           11 (Response 3)    191

imm versus IMM    Sequence → XY     —           6                  —
                  X → i or I        1           7                  —
                  X → #             2           8 (Response 1)     441
                  Y → m or M        3           9                  —
                  Y → #             4           10 (Response 2)    190
                  Y → #             5           11 (Response 3)    159

iim versus IMM    Sequence → XYZ    —           7                  —
                  X → i or I        1           8                  —
                  X → #             2           9 (Response 1)     492
                  Y → i or M        3           10                 —
                  Y → #             4           11 (Response 2)    198
                  Z → m or M        5           12                 —
                  Z → #             6           13 (Response 3)    193

ᵃX, Y, and Z denote uncertain responses; # denotes a terminal element.
ᵇA dash indicates data not applicable.
David A. Rosenbaum
172
preceding each response in each of the three choices of Table VII, one sees that there is a good match between the number of steps and the corresponding response times. For example, eight steps are assumed to precede the first response in mirror-image choices, but nine steps are assumed to precede the first response in nonmirror-image choices. Furthermore, for responses that use the same finger as the immediately preceding response, in mirror-image choices one step is assumed, but for all other noninitial responses two steps are assumed. This can explain the three-way interaction for interresponse times. The HED model therefore accounts for the data from this experiment through its assumption that editing takes advantage of hierarchical structure. Another experiment that illustrates the hierarchical nature of editing was reported by Rosenbaum et al. (1984a, exp. 5). Table VIII shows the choices used in this experiment and the obtained mean latencies of the corresponding responses. As seen in the table, the two sequences in each choice differed with respect to the presence or absence of particular responses. Moreover, the responses that distinguished the two sequences either formed or did not form natural hierarchical groupings. Thus Ii, which distinguished the two sequences in Choice 2, was a natural hierarchical group, whereas I, which distinguished the two sequences in Choice 1, was not. If editing respects hierarchical organization, T1 for Mm in Choice 2 should be shorter than T1 for iMm in Choice 1. The data confirm this prediction. Furthermore, the data from Choices 3 and 4 show that the superiority of Mm over iMm was not simply due to inherent differences in the ease of producing those two sequences or to the fact that one sequence occupied an earlier position in its "parent" sequence. The HED model can account for the data from this experiment, as seen in Fig. 8.
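The step counting that underlies these fits can be expressed compactly. The sketch below is hypothetical: the node labels, visit counts, and treatment of doublets are assumptions based on the layout of Table VII, not the authors' code.

```python
# Hypothetical sketch of the HED model's step counting (not the authors'
# implementation). Each plan node is (label, edit_cost, exec_visits,
# emits_response): the edit pass traverses every node once, then the
# execution pass traverses the tree again, with a doublet terminal
# visited twice (once per response it produces).

def response_steps(plan):
    """Return the global step number at which each response occurs."""
    step = sum(edit for _, edit, _, _ in plan)      # edit pass
    steps = []
    for _, _, visits, emits in plan:                # execution pass
        for _ in range(visits):
            step += 1
            if emits:
                steps.append(step)
    return steps

# Mirror-image choice iim versus IIM: a shared index-finger doublet (X).
mirror = [("Seq XY", 1, 1, False), ("X: i or I", 1, 1, False),
          ("X: #", 1, 2, True),                     # doublet -> responses 1, 2
          ("Y: m or M", 1, 1, False), ("Y: #", 1, 1, True)]

# Nonmirror choice iim versus IMM: no shared doublet, so an extra node.
nonmirror = [("Seq XYZ", 0, 1, False), ("X: i or I", 1, 1, False),
             ("X: #", 1, 1, True), ("Y: i or M", 1, 1, False),
             ("Y: #", 1, 1, True), ("Z: m or M", 1, 1, False),
             ("Z: #", 1, 1, True)]

print(response_steps(mirror))     # [8, 9, 11]
print(response_steps(nonmirror))  # [9, 11, 13]
```

The mirror-image plan yields Response 1 at step 8 and the nonmirror plan at step 9, paralleling the 434- versus 492-msec first-response latencies, and the repeated-finger response follows its predecessor after a single step only in the mirror plan.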
The general idea is that choices are made about whether to produce or not produce individual responses or groups of responses, and this

TABLE VIII
CHOICES AND MEAN LATENCIES, FIFTH EXPERIMENT OF ROSENBAUM ET AL. (1984a)
[Four choice conditions; in each condition the two alternative sequences were composed of the responses I, i, M, and m, and mean latencies are given for each response of each alternative.]
Fig. 8. Fit of the HED model to the results of the fifth experiment of Rosenbaum et al. (1984a). Fitting was achieved by pooling responses that theoretically required the same number of node traversals. (The figure plots mean latency, in msec, against number of operations.)
choice is always made at the highest level possible. As assumed in the HED model generally, the first choice is made immediately after the reaction signal is identified, and then the tree-traversal process continues from that point to the rightmost terminal node in order to check for any more uncertain mappings; this is the edit pass. After the edit has been completed, the entire tree is traversed again from the leftmost terminal node to the rightmost terminal node, and responses are executed when their corresponding terminal nodes are encountered; this is the execution pass. One can count the number of nodes preceding each response in each choice condition to see how well the model fits the data. As seen in Fig. 8, it does quite well.

B. THE HED MODEL'S FIT TO EARLIER DATA
As seen in the preceding discussion, the HED model does a reasonably good job of accounting for the data of the fourth and fifth experiments of Rosenbaum et al. (1984a). Before we consider what the HED model tells us about motor programming generally, it is important to ascertain whether the model also accounts for the data from the other experiments that have been reviewed here. Without reviewing in detail how the HED model has been fit to the data from the previous experiments, it suffices to say that the model accounts for over 90% of the variance of the mean latency data in all of the experiments.
C. IMPLICATIONS OF THE HED MODEL

What general principles about motor programming does the HED model suggest? First, economy of representation. The model assumes that the programs set up for editing have all the features common to the two possible response sequences. Insofar as the HED model is correct, it implies that people reduce the size or number of programs held in readiness at any one time, presumably because of limitations in the capacity of short-term memory. Second, as a result of economy of representation, editing decisions can be made at different levels, ranging from entire groups of responses to individual motor features (as was assumed in the motor-program editor model). Thus there is considerable flexibility in the types of decision units that can be used in motor programming. This principle makes sense from the perspective of skilled performance. A programming system that allows for decision making at many different levels presumably affords greater flexibility than a programming system that allows for decision making at only one level. Third, the hierarchical view of movement control is strengthened. The success of the HED model suggests that hierarchical organization plays a key role in the programming as well as the execution of movement sequences, as has been suggested elsewhere (Lashley, 1951; MacKay, 1982). Fourth, movement commands are retrieved from symbolic memory stores rather than being read from low-level buffers in a linear fashion. This view is consonant with emerging views of information intake which hypothesize immediate access to symbolic memory stores rather than transient storage in raw sensory form (Coltheart, 1980). Fifth, the HED model says that the edit pass involves exactly as many steps in toto as the execution pass.
A more general implication of this assumption is one reminiscent of Shepard's claim about the similarity between internal cognitive events and the external (perceptual) events to which they correspond (e.g., Shepard & Podgorny, 1978). The HED model is consistent with Shepard's view in that internal processes (in this case, those comprising the edit pass) are isomorphically related to their corresponding external motor events (the execution pass). In the same way that Shepard assumes that the time for internal cognitive processing should be positively related to the time for perceptual processing, the HED model assumes that the total time for editing a forthcoming motor sequence should be positively related to the time to execute that sequence. Sixth, although it is speculative at this time, the success of the HED model can be taken to suggest that fundamentally similar mechanisms are used in the control of manual activity and the control of speech activity. This view is also supported by comparisons of response timing in speech and typewriting (Sternberg et al., 1978).
VI. Parallel Editing and Execution

Despite the success of the HED model, it has (at least) one serious problem. Since editing is assumed to proceed from the first point of uncertainty to the end of the program, a choice between two lengthy response sequences could take an inordinately long time to complete. Rosenbaum, Hindorff, and Munro (1986, 1987) described experiments that were designed to address this problem. Based on these experiments, they proposed a modified version of the HED model. All the assumptions of the model are retained in the modified version except that execution of one part of a sequence can occur while a later part of the sequence is being edited; that is, execution and editing can go on in parallel. (In the HED model, execution cannot occur until editing has been completed.)

A. INVERSE LENGTH EFFECTS
One experiment that led to the modified version of the HED model is illustrated in Table IX. Subjects chose between sequences consisting of varying numbers of responses, with the first uncertain response located at two different distances from the end of the sequence. The sequences had 3, 4, or 6 responses, and the uncertain response was located either in the last serial position or in the next-to-last serial position. The HED model made two
TABLE IX
CHOICES, FIRST EXPERIMENT OF ROSENBAUM ET AL. (1986b)

Choicea                   Responses before       Responses after      Total
                          uncertain response     uncertain response   responses
Rri versus Rmi            1                      1                    3
Rri versus Rrm            2                      0                    3
RRri versus RRmi          2                      1                    4
RRri versus RRrm          3                      0                    4
RRRRri versus RRRRmi      4                      1                    6
RRRRri versus RRRRrm      5                      0                    6

aHalf the subjects had the sequences listed here and half had the left-right mirror image of these sequences (e.g., rRI versus rRM). Italics denote uncertain responses.
clear predictions about this situation. First, the latency of the first response should always be longer when the uncertain response is in the next-to-last serial position than when the uncertain response is in the last serial position. This prediction derives from the assumption that extra editing is needed when the uncertain response is one step removed from the end of the program. The second prediction is that the time for the first response should increase, or at least remain constant, with the length of the sequence to be performed. This prediction derives from the assumption that increasing the length of the sequence can increase the complexity of the execution pass, including the complexity of the translation steps preceding the first response. Figure 9 shows what actually happened. The latency of the first response decreased as the length of the required sequence increased! This result violates the prediction of the hierarchical editor model and also runs counter to the well-known result of Sternberg et al. (1978), as well as to our replications of that result. The second important result was that the latency of the first response was uniformly longer when the uncertain response was in the penultimate serial position (that is, when two edit steps were theoretically required) than when the uncertain response was in the final serial position (that is, when one edit step was theoretically required). The latter result supports the prediction of the hierarchical editor model, but the former result (the inverse length effect) does not.
B. SCHEDULING

How can these results be explained? Based on subjects' reports and on the results of two other experiments, Rosenbaum et al. suggested that subjects execute responses while editing later responses, but they do so in such a way that the means and variances of interresponse times are minimized. Consider what would happen if subjects executed responses while editing later responses but always began to execute responses as quickly as possible after the reaction signal appeared. Provided that editing takes much longer than execution (which is likely in view of the much greater time required for choice as compared to simple reactions), there would occasionally be very long delays before responses that were initially uncertain, since the editing process for the initially uncertain response might not be completed after execution of the immediately preceding response. To avoid this state of affairs, subjects determine (presumably through experience) how long editing takes and how long execution takes, and then they use this information to withhold the start of the execution pass until the moment when the produced train of responses is likely to come out "smoothly," that is, with interresponse times that have minimal mean and variance. Thus subjects schedule their early, certain responses so that the entire response sequence is performed without long delays midway through the sequence.
Fig. 9. Results of the first experiment of Rosenbaum et al. (1987). The two curves correspond to conditions in which the uncertain response occupied the last serial position (U = n) or the next-to-last serial position (U = n - 1). (The figure plots the latency of the first response, in msec, against the number of responses, n = 3, 4, or 6.)
Scheduling responses in this way is not uncommon in skilled performance. Batting an oncoming baseball entails timing the start of the swing so that bat and ball meet at an intended time and place. Similarly, the delivery of efferent commands to articulators with different masses must occur at different times if the articulators are to act simultaneously to produce a desired speech sound (Lenneberg, 1967, Chapter 3). Scheduling does not always work perfectly, however. Speakers misarticulate, and batters, in part through their inability to schedule perfectly, help pitchers earn high salaries. In the more mundane situation of choosing between finger sequences, if scheduling is prone to error, responses that are initially uncertain will occasionally be produced with a latency that is somewhat longer than normal, and so the mean interresponse times for those responses will be elevated as a result, as was seen in Table IV (and as was found in the experiments of Rosenbaum et al., 1986a,b). Note that if editing and execution can occur in parallel, the problem of the "missing" 150 msec discussed at the end of Section IV is solved. With a parallel system, there is no reason why decreases in T1 must lead to equal increases in T2 and T3. How does the scheduling version of the HED model explain the inverse length effect observed in Fig. 9? Since scheduling is achieved by estimating the time needed to select the uncertain response and by then "working backward" from this expected time, as more responses can fill this time interval, T1 will decrease and approach an asymptote.
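The working-backward logic can be made concrete with a toy calculation. Every numeric parameter below is invented purely for illustration (none comes from the experiments); the sketch only shows how withholding the start of the execution pass produces the observed qualitative pattern.

```python
# Toy sketch of the scheduling idea (all parameter values are invented,
# not fitted to data): editing of the uncertain response runs to the end
# of the program, and the start of execution is withheld so that the
# uncertain response lands just as editing finishes.

def first_response_latency(n_before, n_after,
                           reaction=300,        # signal identification, etc.
                           edit_per_node=120,   # editing time per response
                           irt=80,              # target interresponse time
                           floor=160):          # minimum possible latency
    # Editing covers the uncertain response plus everything after it.
    edit_done = reaction + edit_per_node * (1 + n_after)
    # Work backward: n_before certain responses fill the edit interval.
    return max(floor, edit_done - irt * n_before)

# Uncertain response next-to-last (n_after = 1) vs. last (n_after = 0):
for n_before, n_after in [(1, 1), (2, 0), (2, 1), (3, 0), (4, 1), (5, 0)]:
    print(n_before, n_after, first_response_latency(n_before, n_after))
```

Within each curve, the first-response latency falls toward a floor as responses are added before the uncertain response (the inverse length effect), and it is uniformly longer when the uncertain response is penultimate, since editing then extends one node further; this is the qualitative pattern shown in Fig. 9.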
How does the scheduling model account for the fact that in addition to decreasing with the number of responses before the first uncertain response, T1 also increases with the number of responses after the first uncertain response? (Another experiment of Rosenbaum et al., 1986a,b, established the robustness of this effect.) Since adding responses after the uncertain response increases the expected duration of the edit process, T1 must be delayed accordingly if uniformly short interresponse times are to be achieved.5

C. SCHEDULING AND THE HED MODEL
Although the preceding discussion has focused on the dynamics of scheduling, it should be emphasized that scheduling, along with the capacity for parallel editing and execution that scheduling presumes, is the only feature that distinguishes the scheduling version of the HED model from the HED model itself. Thus, the other assumptions of the HED model (a serial edit pass, a serial execution pass, hierarchical processing in both the edit and execution passes, and flexibility in the levels or units of decision making) are retained in the scheduling version of the HED model. Since the scheduling version of the HED model is the final model that we have been led to, it would be reassuring to know that it can account for all of the results we have obtained. While it would take too long to review each study individually, I can provide a thumbnail sketch of the power of the modified model. First, consider the positive length effects, reported in Section II and shown in Fig. 1. This result is accounted for with the scheduling version of HED, just as it was accounted for with the original HED model. When there are more responses for which subprograms require editing, the time needed for editing increases. Scheduling indicates when execution must await completion of the edit process, for otherwise there would be major interruptions in production of the sequence. If the length of the sequence continues to grow, scheduling indicates when it is possible to begin executing responses while editing later responses. The scheduling version of the HED model accounts in a similar way for the length effects of Inhoff et al. (1984), shown in Fig. 3. Because the model allows for decision making about abstract motor features, such as the hand to be assigned to previously readied finger subprograms, it can account for the additive relation between S-R compatibility and sequence length. The effect on T1 of the serial position of the uncertain response, shown in Fig. 6, is also explained with the scheduling version of the HED model. With greater distance from the beginning of the sequence to the first uncertain response, the edit interval can be filled with more responses.

5Beyond a certain point, however, adding responses should no longer affect T1, because when editing leads execution by a sufficient amount of time, execution need not be postponed to ensure that editing has been completed.
The patterns of interresponse times that have been reviewed here, which support the notion of a hierarchical decoding process, are also consistent with the scheduling version of the HED model, because this version assumes hierarchical decoding. At first, it might seem problematical to assume that subjects strive for uniformly short interresponse times, given that hierarchical decoding results in interresponse times that depend on the dynamics of decoding (or tree traversals). However, Rosenbaum et al. (1986b) suggested that subjects might simply use the number of responses before the uncertain response to schedule execution, and they reported data that confirmed this suggestion. They proposed that by merely counting responses, subjects could still achieve considerable minimization of the mean and variance of interresponse times, and T1 could change accordingly, even though the actual delays between responses could (and demonstrably do) follow the workings of the hierarchical decoding process.6

VII. Conclusions
This article has shown that choosing between response sequences is a rich experimental paradigm for investigating the structure of motor programs and the processes by which motor programs are prepared for execution. One principle that has emerged from these experiments is that in choosing between response sequences, people make use of remarkably skillful planning strategies. (My colleagues and I like to say that people really use their HEDs!) Subjects in sequence-choice experiments exhibit flexibility in the decision units they employ, and they anticipate how long different processes will take so they can decide when to carry out other, preceding processes. Partly because of the rich range of capacities that seem to be brought to bear in the sequence-choice situation, and partly because of the powerful choice-context effects that emerge as a result, the process of choosing between response sequences is far more complex than might have been imagined. On the other hand, the availability of complex choice mechanisms may be just what is needed to ensure flexible action planning in everyday life. It was to elucidate the mechanisms of everyday programming that these experiments were done. Programming actions in natural conditions can be thought of as a task in which one has to choose among an infinite number of possible actions. Thus, understanding how people choose between two sequences of previously designated responses is a step, if only a modest one,

6Another possibility is that subjects simply learn by trial and error to postpone initial responses by amounts of time that happen to result in relatively smooth response sequences; that is, they respond as if they were scheduling even if they actually are not.
toward understanding spontaneous motor programming. For purposes of getting somewhat closer to the ultimate task of understanding how actions are programmed in everyday life, the sequence-choice experiment can be elaborated in various ways. For example, the number of possible sequences can be increased, the dimensions to be selected can be varied (e.g., in terms of timing and rhythm), and the types of responses that are studied can be broadened. My colleagues and I have already done a number of speech-choice experiments, the results of which closely parallel the results presented here (Rosenbaum, Gordon, Stillings, & Feinstein, 1987). Finding similar results in speech and finger sequencing tasks suggests that common principles apply across response modalities, a conclusion that has been supported in other studies of the timing and kinematics of manual and vocal activities (Ostry, Feltham, & Munhall, 1984; Sternberg et al., 1978). The similarity of programming strategies across response modalities can also be taken to suggest that subjects rely on general-purpose cognitive mechanisms in choosing between response sequences. Indeed, the methods of sequence choice seem remarkably similar to mechanisms of information intake and recall. As we have seen, successive responses appear to be executed via hierarchical decoding processes similar to those that have been suggested for the recall of symbolic material and sentences (Bower et al., 1969; Johnson, 1966). Reliance on hierarchical decoding for the on-line control of responses also suggests that response commands are not simply read off low-level linear string representations of the sort that might be expected to characterize "motor output buffers." The rapid accessibility of relatively high-level codes for movement is reminiscent of the rapid accessibility of high-level semantic information for reading (Potter, Kroll, & Harris, 1980; Potter, Kroll, Yachzel, Carpenter, & Sherman, 1986).
Similarly, the arguments presented above against low-level buffers for movement are reminiscent of the arguments that have been raised against the importance of sensory buffers for perception (Coltheart, 1980). Discovering these similarities across processing domains points to the existence of converging solutions to information-processing problems and is therefore consistent with Anderson's (1983) claims along these lines. Movement control may not be an isolated information-processing system. In fact, solving the information-processing demands of movement in the course of evolution may have set the stage for the development of the "higher" mental faculties that are more typically studied in cognitive psychology.

ACKNOWLEDGMENTS

Supported in part by grants BNS-8120104, BNS-8408634, and BNS-8710933 from the National Science Foundation, and Research Career Development Award 1 K04 NS00942-01 from the National Institute of Neurological and Communicative Disorders and Stroke.
REFERENCES

Anderson, J. R. (1983). The architecture of cognition. Cambridge, MA: Harvard University Press.
Bower, G. H., Clark, M., Lesgold, A., & Winzenz, D. (1969). Hierarchical retrieval schemes in recall of categorized word lists. Journal of Verbal Learning and Verbal Behavior, 8, 323-343.
Coltheart, M. (1980). Iconic memory and visible persistence. Perception & Psychophysics, 27, 183-228.
Coren, S. (1986). An efferent component in the visual perception of direction and extent. Psychological Review, 93, 391-410.
Craft, J. L., & Simon, J. R. (1970). Processing symbolic information from a visual display: Interference from an irrelevant directional cue. Journal of Experimental Psychology, 83, 415-420.
Heuer, H. (1982). Binary choice reaction time as a criterion of motor equivalence. Acta Psychologica, 50, 35-47.
Inhoff, A. W., Rosenbaum, D. A., Gordon, A. M., & Campbell, J. A. (1984). Stimulus-response compatibility and motor programming of manual response sequences. Journal of Experimental Psychology: Human Perception and Performance, 10, 724-733.
Johnson, N. F. (1966). On the relationship between sentence structure and the latency in generating the sentence. Journal of Verbal Learning and Verbal Behavior, 5, 375-380.
Johnson, N. F. (1970). The role of chunking and organization in the process of recall. In G. H. Bower (Ed.), Psychology of learning and motivation, Vol. 4. New York: Academic Press.
Kayne, R. S. (1984). Connectedness and binary branching. Dordrecht: Foris.
Lashley, K. S. (1951). The problem of serial order in behavior. In L. A. Jeffress (Ed.), Cerebral mechanisms in behavior (pp. 112-131). New York: Wiley.
Lenneberg, E. H. (1967). Biological foundations of language. New York: Wiley.
Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. G. (1967). Perception of the speech code. Psychological Review, 74, 431-461.
MacKay, D. G. (1982). The problem of flexibility, fluency, and speed-accuracy trade-off in skilled behavior. Psychological Review, 89, 483-506.
Monsell, S. (1986). Programming of complex sequences: Evidence from the timing of rapid speech and other productions. In C. Fromm & H. Heuer (Eds.), Generation and modulation of action patterns (pp. 72-86). Berlin: Springer-Verlag.
Ostry, D. J., Feltham, R. F., & Munhall, K. G. (1984). Similarities in the control of speech articulators and the limbs: Kinematics of tongue dorsum movement in speech. Journal of Experimental Psychology: Human Perception and Performance, 9, 622-636.
Potter, M. C., Kroll, J. F., Yachzel, B., Carpenter, E., & Sherman, J. (1986). Pictures in sentences: Understanding without words. Journal of Experimental Psychology: General, 115.
Potter, M. C., Kroll, J. F., & Harris, C. (1980). Comprehension and memory in rapid sequential reading. In R. S. Nickerson (Ed.), Attention and performance VIII (pp. 395-418). Hillsdale, NJ: Erlbaum.
Rosenbaum, D. A. (1980). Human movement initiation: Specification of arm, direction, and extent. Journal of Experimental Psychology: General, 109, 444-474.
Rosenbaum, D. A. (1985). Motor programming: A review and scheduling theory. In H. Heuer, U. Kleinbeck, & K.-M. Schmidt (Eds.), Motor behavior: Programming, control, and acquisition (pp. 1-33). Berlin: Springer-Verlag.
Rosenbaum, D. A., Gordon, A. M., Stillings, N. A., & Feinstein, M. H. (1987). Stimulus-response compatibility in the programming of speech. Memory & Cognition, 15, 217-224.
Rosenbaum, D. A., Hindorff, V., & Munro, E. M. (1986). Programming of rapid finger sequences. In H. Heuer & C. Fromm (Eds.), Generation and modulation of action patterns (pp. 64-71). Berlin: Springer-Verlag.
Rosenbaum, D. A., Hindorff, V., & Munro, E. M. (1987). Scheduling and programming of rapid finger sequences: Tests and elaborations of the hierarchical editor model. Journal of Experimental Psychology: Human Perception and Performance, 13, 193-203. Rosenbaum, D. A., Inhoff, A. W., & Gordon, A. M. (1984a). Choosing between movement sequences: A hierarchical editor model. Journal of Experimental Psychology General, 113, 372-393.
Rosenbaum, D. A., Kenny, S., & Derr, M. A. (1983). Hierarchical control of rapid movement sequences. Journal of Experimental Psychology: Human Perception and Performance, 9, 86-102.
Rosenbaum, D. A., & Saltzman, E. (1984). A motor-program editor. In W. Prinz & A. F. Sanders (Eds.), Cognition and motor processes (pp. 51-61). Berlin: Springer-Verlag.
Rosenbaum, D. A., Saltzman, E., & Kingman, A. (1984b). Choosing between movement sequences. In S. Kornblum & J. Requin (Eds.), Preparatory states and processes (pp. 119-134). Hillsdale, NJ: Erlbaum.
Rosenbaum, D. A., Weber, R. J., Hazelett, W. M., & Hindorff, V. (1986). The parameter remapping effect in human performance: Evidence from tongue twisters and finger fumblers. Journal of Memory and Language, 25, 710-725.
Shepard, R. N., & Podgorny, P. (1978). Cognitive processes that resemble perceptual processes. In W. K. Estes (Ed.), Handbook of learning and cognitive processes, Vol. 5 (pp. 189-237). Hillsdale, NJ: Erlbaum.
Sternberg, S. (1969). The discovery of processing stages: Extensions of Donders' method. Acta Psychologica, 30, 276-315.
Sternberg, S., Monsell, S., Knoll, R. L., & Wright, C. E. (1978). The latency and duration of rapid movement sequences: Comparisons of speech and typewriting. In G. E. Stelmach (Ed.), Information processing in motor control and learning (pp. 117-152). New York: Academic Press.
MODULAR ANALYSIS OF TIMING IN MOTOR SKILL

Steven W. Keele
Richard I. Ivry

DEPARTMENT OF PSYCHOLOGY
UNIVERSITY OF OREGON
EUGENE, OREGON 97403
I. Introduction ............................................................ 183
II. Issues in the Study of Timing .......................................... 184
III. Individual Differences in Timing ...................................... 189
IV. Further Analysis of Force Control and Maximum Rate ..................... 192
    A. Force Control ....................................................... 192
    B. Maximum Rate of Repetitive ......................................... 193
V. Individual Differences in Skill ......................................... 197
    A. Critique of Earlier Approaches ...................................... 198
    B. Experimental Analysis of Individual Differences ..................... 200
VI. Neurological Analysis of Timing ........................................ 203
    A. Case Study 1 ........................................................ 204
    B. Case Study 2 ........................................................ 207
    C. Case Study 3 ........................................................ 209
VII. Other Approaches to Modularity ........................................ 214
    A. One Clock or Many Clocks? An Analysis of Time Sharing ............... 214
    B. Functional Similarity ............................................... 220
VIII. Conclusions .......................................................... 224
References ................................................................. 226

I. Introduction
In the 1940s, 1950s, and early 1960s, the study of motor learning dominated the study of skilled motor control. Much research was concerned with factors like the form of practice (massed versus distributed), whether it was best to learn all components of a task at once or to practice them separately, and the best ways to administer feedback. Except for pioneering studies by people like Craik, Vince, and Hick in England, Paillard in France, and Fitts in the United States, few investigators analyzed the nature of the processes themselves that underlie motor skill. Any investigation of underlying processes was usually devoted to general processes of learning (e.g., habit strength, inhibition, or consolidation) rather than descriptions of the components of skill. Recent years have witnessed a paradigmatic shift from the study of global strategies of learning to studying the underlying structure of motor control and skill.

THE PSYCHOLOGY OF LEARNING AND MOTIVATION, VOL. 21
183
Copyright © 1987 by Academic Press, Inc. All rights of reproduction in any form reserved.
184
Steven W. Keele and Richard I. Ivry
This, of course, is the same kind of shift that has occurred in all of cognitive psychology. Memory research, for example, has undergone a similar transformation from an emphasis on how to memorize to the conceptual nature of memory. Our own program of studies of motor control has been in the newer tradition of component isolation endemic to cognitive psychology. What we mean by components of skill is clarified by examples. In playing a musical instrument, the sequence of movements must be specified, the succession of movements must be timed to occur at just the right points, and their forces must be carefully regulated. Similarly, in a completely different motor task such as a gymnastics routine, movements must be sequenced, they must be timed, and their forces must be regulated. The primary issue with which we are concerned is whether the components of different skills such as piano playing and gymnastics draw on the same system. If so, we call the systems modules. Is it the case that the neural systems used in timing musical performance are the same as those used in timing gymnastics performance? Are the neural systems that regulate force the same across tasks? Is the sequencing system the same? Are these systems distinct from each other? In other words, we are asking whether the brain is organized by the functions that are computed, a modular organization, rather than by the tasks performed. We presume there are no music and gymnastics centers of the brain. What has happened to the study of learning with this change to a process analysis of skill? The process analysis basically is an analysis of what is learned, especially for sequencing. Perhaps when one understands what is learned, one is in a better position to comment on how to learn. 
Although we have just begun to investigate sequencing from a modular viewpoint, and hence do not comment much about it, our work on timing, force control, and rate, the primary focus of the article, has relevance to issues of learning. An old debate in the area of motor control concerns whether basic abilities predict success in skill or whether success is mostly a matter of learning. Our analyses of timing, force, and rate have made use of individual differences, and therefore we offer some comments on the learning-versus-abilities debate.

The major portion of this article is centered around the analysis of timing. We show that the system that controls time is rather general in its application, being used not only to control different muscles in the course of motor production but also to judge durations of perceived events. For that reason we call it a module. The discussion of force control and speed is conducted within the framework of analyzing timing, because a goal is to show that they involve modules separable from timing.

II. Issues in the Study of Timing
The study of timing has an old history in psychology. Much of the earlier work is reported in Woodrow (1951), and an excellent and more recent review is provided by Allan (1979). Other work is described in edited volumes by Gibbon and Allan (1984) and Michon and Jackson (1985). Despite the extensive work on timing, the bulk of it probably has little relevance to understanding motor control. Most studies have been concerned with variables that affect the passage of subjective time over intervals of seconds, minutes, or hours. Most motor tasks, such as playing a musical instrument, proceed at a fast pace, with actions following one another every couple hundred msec or so. It is this fast-paced timing that we seek to understand.

A seminal study by Michon (1967) provided an important method for the study of motor timing. A tone occurred at periodic intervals, and subjects synchronized key pressing with the tone. After a period, the pace signal disappeared and subjects continued to produce the target interval for as many as 200 taps. A primary interest of Michon's concerned the relation between the magnitude of the target interval being produced (t) and the variability of the intertap intervals (σ_t). He investigated target intervals ranging from 333 msec up to 3333 msec and found the following relation:

σ_t = a·t^1.1    (1)
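A power-law relation of this form can be recovered by regressing log variability on log interval. The data below are synthetic, generated purely to illustrate the fitting step; the scale constant a is an arbitrary assumption, not a value from Michon's study:

```python
import numpy as np

# Illustrative only: synthetic data following the form sd = a * t**1.1,
# with target intervals spanning Michon's 333-3333 msec range.
t = np.array([333.0, 667.0, 1000.0, 2000.0, 3333.0])
a = 0.03                    # arbitrary scale constant (an assumption)
sd = a * t**1.1

# Recover the exponent by least squares on log sd = log a + b * log t.
b, log_a = np.polyfit(np.log(t), np.log(sd), 1)
print(round(b, 2))          # -> 1.1
```

Because the synthetic data are noiseless, the log-log fit returns the generating exponent exactly; with real tapping data the same regression gives a least-squares estimate of the exponent.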
Michon found no evidence that there is an optimum pace. Variability increased monotonically with base interval.

Subsequently, Wing and Kristofferson (1973a,b; see especially Wing, 1980, for a review) adopted the Michon method, in which a period of synchronization with a pacing tone was followed by a self-timed series of taps. Within a range of times of greater interest from a motor-control viewpoint, 220 to 490 msec, the variance (σ²) of intertap intervals increased linearly with the base interval (Wing, 1980). The exact function is different from that found by Michon, perhaps reflecting a different timing mechanism for the shorter range of intervals tested by Wing and Kristofferson. Again, there was no hint of an optimum interval.

However, the more basic contribution of Wing and Kristofferson was to provide a theory and method for decomposing the total timing variance into two underlying and independent components. Here we present their model in some detail. Later we make use of both their general logic and their particular mathematical scheme.

The starting assumption of the model is that there is a central clock which meters out time and initiates a response process whenever the target time has transpired. At the same instant that the response process begins, the clock recycles to measure out the next interval. An implication of this assumption is that feedback from the movement does not influence the next clock interval, since the clock cycle leading to movement n begins as movement n − 1 is being implemented. One source of variance influencing the intertap intervals is variance in the durations of the successive clock-generated intervals. The response process itself, called motor implementation, also varies in duration from tap to tap. Since the two components are assumed to be independent, the total variance is the sum of the clock variance and the implementation variance (the latter counted twice), as shown in equation 2:

σ_I² = σ_C² + 2σ_M²    (2)

where I is the intertap interval, C is the clock interval, and M is the motor implementation time. A given intertap interval is dependent on the particular clock interval for that period, and on both the duration of the preceding motor process that begins the response interval and the duration of the motor process that closes the response interval. It is this double contribution of implementation time to each intertap interval that results in it being counted twice in equation 2. Thus, if a motor implementation was by chance fast, the response would appear relatively early, shortening the preceding intertap interval. Since the clock interval itself is independent of the preceding implementation time, the intertap interval following a fast implementation tends to be long. Similarly, a randomly slow implementation time tends to increase the preceding intertap interval and decrease the following one. Variation in implementation time thus induces a negative correlation between durations of adjacent intervals. This phenomenon is illustrated in Fig. 1. The magnitude of such a negative correlation increases with variability in the motor delay. In contrast, clock variation produces no such tendency for long and short alternation.

Thus, motor implementation variance has two effects. First, together with clock variance, it produces variance in the intertap intervals. Second, implementation variance induces negative covariation between adjacent intervals. The motor variance can be calculated directly from the negative covariation, and the clock variance can be estimated by subtracting the motor variance from the total variance of intertap intervals in accordance with equation 2. According to the Wing and Kristofferson model, an increase in the target interval being tapped out is accomplished by lengthening the duration between successive clock pulses. Thus, one would expect clock variance to systematically increase with the base interval.
Implementation time should not depend on the base interval, however, so a change in the target interval is not expected to influence implementation variance. Figure 2 shows the more detailed result of the study by Wing (1980) in which subjects produced intervals ranging from 220 to 490 msec. For each target interval, total variance was decomposed into the underlying constituents. As predicted, clock variance increases with interval duration while motor variance remains constant. This result provides strong support for the model.
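The decomposition can be sketched with a small simulation of the two-level model. All parameter values below (the 400-msec target, the clock and motor standard deviations, the mean motor delay) are illustrative assumptions, not figures from Wing's data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed parameters: 400-msec target, clock SD 20 msec, motor SD 10 msec.
target, sd_clock, sd_motor, n = 400.0, 20.0, 10.0, 200_000

C = rng.normal(target, sd_clock, n)     # clock-generated intervals
M = rng.normal(50.0, sd_motor, n + 1)   # motor implementation delays
I = C + M[1:] - M[:-1]                  # observed intertap intervals

# In the model, the lag-1 autocovariance of intervals equals -sigma_M^2.
lag1_cov = np.cov(I[:-1], I[1:])[0, 1]

motor_var = -lag1_cov                   # estimate of sigma_M^2 (true value: 100)
clock_var = I.var() - 2 * motor_var     # estimate of sigma_C^2 (true value: 400)
```

With 200,000 simulated taps the two estimates land close to the generating variances, and the negative lag-1 covariance vanishes if sd_motor is set to zero, mirroring the logic of Fig. 1.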
Fig. 1. Panel A: a situation in which a hypothetical clock puts out pulses at nonvarying intervals. Following the clock pulse, an implementation process results in a response after a delay. Note that an increase in the normal implementation time, I, by an extra delay, D, increases the duration of the preceding interresponse interval and decreases the following interresponse interval. Such a result assumes that variations in implementation time do not affect the clock process. Panel B: the clock emits pulses at varying intervals, and the implementation process is constant. An increase in clock interval C by amount D increases the current interresponse interval but has no effect on the subsequent interval. Thus, clock variance increases interresponse variability but, in contrast to implementation variance, does not also induce a negative correlation between adjacent intervals.
Fig. 2. The Wing and Kristofferson model is used to decompose intertap variability for various target intervals into clock and implementation variance (σ²). Based on Wing (1980).

An alternate account of the negative covariation between successive intervals is that subjects deliberately adjust the duration of each interval to compensate for error in the preceding interval. This seems on the surface unlikely, since it posits another timer that monitors the duration of the implementation. Nevertheless, Wing (1977) directly evaluated a feedback explanation. In his experiment, the subject heard a tone, indicating the finger had made contact, 15 msec after the finger touched the response apparatus in the course of tapping. Unknown to the subject, the tone was occasionally either delayed or advanced a slight amount. If subjects base the timing of each response on the time of receipt of the feedback, then there should be a linear relation between the duration of the following interval and the perturbation in feedback time. Such was not the case. Indeed, sometimes the tone seemed to be ignored altogether.

Perhaps the feedback on which timing is based is proprioceptive rather than auditory. To test that possibility, Conrad and Brooks (1974) trained monkeys to make regular back-and-forth arm movements between two mechanical stops. If proprioceptive feedback from hitting the stop is instrumental in triggering the return movement, the return movement should begin earlier when the stop is moved inward, resulting in an earlier contact than expected. However, the timing of the return movement was unaffected by the proprioceptive change. It would appear that unless there is a rather marked perturbation of feedback, each successive movement is triggered by a cycling clock and is uninfluenced by variations in the duration of implementation time, with its resulting variation in the time of feedback receipt.

The analysis by Wing and Kristofferson suggests, therefore, that a central component of timing, a clock, can be isolated from the motor system that implements movement. A primary question we raise in our own research is whether the clock is modular. Is the same timing system used for different effectors (finger, arm, and foot), or does each effector have its own clock? Is the same clock involved in judging the duration of perceptual events? We have employed four methods to investigate these questions: correlations of individual differences, analysis of neurological patient differences, dual-task performance, and functional similarities between perception and production. We describe our work involving each of these approaches.
III. Individual Differences in Timing

Suppose that different modules do indeed regulate timing and force. In that case, timing abilities of different people should correlate across different effectors. Thus, a person good at timing with the finger should also be good at forearm timing. Similarly, the ability to control force with the finger should correlate with the ability to control force with the forearm. However, given different modules for timing and force control, there is no reason to expect timing ability to correlate with force control even when the same effector is used. In other words, correlations should reflect common processes, not common effectors.

In one study (Keele, Ivry, & Pokorny, 1987a) designed to test the modular notion, 29 subjects synchronized key pressing by movement of either the finger or the forearm with an auditory signal that occurred regularly every 400 msec. After a 6-sec synchronizing period, the pacing tone was terminated and subjects continued to press an additional 31 times, attempting to maintain the target interval. The standard deviation of the intertap intervals from each bout was calculated. The measure of timing ability was the average of the standard deviation over many bouts. In a second task, subjects made isometric finger or forearm presses on a force transducer whenever a nonperiodic tone signaled that it was time to press. Target forces ranging from 3.0 to 10.8 N (310 to 1100 g) were indicated by horizontal lines on an oscilloscope screen. After a key press, a vertical line rose on the screen in proportion to the produced force. A subject's task was to attempt to produce forces which resulted in the vertical line terminating on the horizontal target. Six practice presses with feedback at a particular target force were followed by six additional presses with no feedback.
The standard deviation of peak force on the six presses without feedback was averaged over several bouts and over five different target forces to measure force-control ability.

The results in Table I show the correlation between timing ability with finger and timing ability with forearm to be very high at .90. Similarly, the correlation between force control with the finger and force control with the forearm is also high at .76. However, correlations between timing ability and force ability are low (and nonsignificant except for the .34) even when the same effector is used for both tasks. Such results are what one would expect if timing and force control emanate from separate modules.

TABLE I
CORRELATION BETWEEN TIMING AND FORCE CONTROL OF FINGER AND FOREARM

                    Timing variability     Force variability
                    Finger      Arm        Finger      Arm
Timing: finger       .91
Timing: arm          .90        .91
Force: finger        .30        .18         .87
Force: arm           .34        .21         .76        .76

Note. Diagonal values (italicized in the original) are reliabilities. All other correlations are uncorrected for attenuation.

Is it possible that the presumed timer that underlies finger and forearm timing also subserves perception as well as motor production? To test this possibility, a different group of 32 subjects was tested on three tasks (Keele, Pokorny, Corcos, & Ivry, 1985). One task involved timed tapping in a manner similar to the study just presented. After synchronizing with a pacing tone occurring every 400 msec, the tone terminated and subjects continued to tap with either index finger or foot. Again, the standard deviation of the intertap intervals served to indicate timing ability. Variability of finger correlated .51 with variability of the foot (when corrected for attenuation due to unreliability of the component tasks). Such a result is again consistent with the view of a common timer for different effectors. The second task involved tapping as rapidly as possible with finger, forefoot, or heel. Maximum rates on these tasks correlated with each other in amounts ranging from .52 to .64. The third task involved duration judgments of perceptual events. Subjects heard two clicks separated by 400 msec. One second later another pair of clicks with a variable interval was presented. Subjects judged whether the second interval was longer or shorter than the first interval. For each subject, difference thresholds were calculated as measures of acuity in judging the temporal differences.

The acuity scores from the perceptual task were then correlated with timing ability and maximum rate averaged over finger and foot. Accuracy of motor timing correlated significantly with the acuity of perceptual time judgments (r = .53; r = .60 when corrected for attenuation). In addition, motor timing correlated .46 with maximum rate. However, perceptual timing and maximum rate did not correlate significantly (r = .18). It appears that motor timing is composed of at least two parts. The part that correlates with maximum rate might be "motor noise." Presumably that factor produces variability in intertap intervals and also constrains the maximum rate at which the motor system can cycle. The part that correlates with the perceptual task presumably reflects a common timing mechanism. The only other report that we know of in the
literature that suggests a similar result is one by Smith (1957), which indicates a .45 correlation between the discrimination of intervals and the production of intervals.

One might argue that motor and perceptual timing correlate for reasons other than a common timer, such as individual differences in motivation. However, very low or nonexistent correlations between timing and force control and between perceptual timing and maximum rate of motor production suggest that the common requirement of timing, not motivation, is at the heart of the correlation between motor and perceptual timing.

The two studies that we have discussed so far found correlations of motor timing among three different effectors (finger, forearm, and foot), suggesting generality across diverse muscular systems. A similar issue concerns generality across different kinds of perceptual judgment. In an unpublished study (conducted by Bettina Debu and Diane Manchester in our lab), 29 subjects performed on two different tasks of perceptual time judgments. One task was as described above: Subjects compared the durations of two intervals, and individual acuity thresholds were calculated. The other task was similar except that subjects compared two steady tones. The first tone was 400 msec in duration. One second later the second tone, which varied in duration, was presented. The subjects indicated whether the second tone was longer or shorter and, as before, difference thresholds were calculated. The thresholds for the two different tasks correlated .68 (or .86 when corrected for attenuation).

The correlation of acuities on the two tasks of perceptual-duration judgment is important beyond simply showing generality of perceptual timing. With click pairs, subjects have an impression of beats. This raises the issue of whether the correlation between the perceptual task and the motor task arises because subjects are doing something like generating covert motor responses to the perceptual beats. However, steady tones provide no subjective impression of beats. The fact that abilities on the two perceptual tasks strongly correlate suggests that the timing mechanism does not require either overt or covert motor responses.

Overall, the correlational analyses are consistent with the view of a timekeeper common to different effectors and to perception. Moreover, the correlations of the timing tasks with the maximum rate of motor activity suggest that there is an additional aspect of timing variability on the motor task that is specific to motor implementation and not timing per se. Such a conclusion is reminiscent of the Wing and Kristofferson (1973a,b) decomposition of tapping variability into clock and implementation components. We have used the Wing and Kristofferson method to decompose the intertap variability for each individual's finger and foot production into the clock and implementation components (Keele et al., 1985). One would expect that if the clock component is common to finger and foot and the implementation components differ at least in part, the correlations of the clock
components would be even higher than correlations of overall intertap variability. However, in both this study and others we have failed to find any support for this prediction. We are still investigating the issue. Although we have not had success in applying the Wing and Kristofferson technique to the analysis of individual differences, we have been much more successful in applying it to the analysis of patients. That work is described in a later section.
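The "corrected for attenuation" values quoted throughout this section follow the standard Spearman correction, which divides the observed correlation by the geometric mean of the two tasks' reliabilities. A minimal sketch, in which the reliability values are assumptions chosen only to reproduce the chapter's .68 → .86 example:

```python
import math

def disattenuate(r_xy, r_xx, r_yy):
    """Spearman's correction for attenuation: estimate the correlation
    between true scores from the observed correlation r_xy and the two
    measures' reliabilities r_xx and r_yy."""
    return r_xy / math.sqrt(r_xx * r_yy)

# Observed correlation of .68 between the two perceptual tasks; the
# reliabilities of .79 are assumed (not reported values), picked so the
# corrected correlation matches the chapter's .86.
print(round(disattenuate(0.68, 0.79, 0.79), 2))  # -> 0.86
```

The correction can only be as trustworthy as the reliability estimates; when reliabilities are low, small errors in them inflate the corrected value considerably.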
IV. Further Analysis of Force Control and Maximum Rate

So far we have reported that substantive correlations among timing abilities cut across a number of task variations: motor production with finger, forearm, and foot, and perception of intervals produced by click pairs and steady tones. In this section we show that comparable generality occurs for force control and for the maximum rate at which reciprocal movements can be made.

A. FORCE CONTROL

In the study by Keele et al. (1987a), we found that variability of force production correlated between finger and forearm production (r = .76). In that particular case, the force ranges were the same for both effectors, with target forces between 3.0 and 10.8 N (these forces are equivalent to masses under gravity of 310 and 1100 g). Although the force ranges are the same, the forearm is of course much stronger than the finger, so that rather delicate control is required with the forearm compared to the finger. Nonetheless, force-control abilities correlated rather markedly across the two effectors.

In a second experiment, subjects repeatedly produced forces with the finger ranging from 5.1 to 7.8 N and forces with the foot ranging from 14.7 to 21.1 N. The standard deviation of the achieved peak forces was again the measure of ability to control force. The correlation across subjects between finger and foot was .73. Moreover, we (Keele et al., 1987a) have also found that force control, when it is the object of subjects' attention, correlates .43 with force variation when the subjects' main goal is to produce accurate times and they are uninstructed regarding force. Thus, it appears that the ability to control force correlates across diverse effectors even when the force ranges differ and whether or not accurate force production is a goal.

A dominant theme that guides our research is the modular one, in which we ask whether different components of motor control are not only general but also independent of each other. The correlational work certainly is consistent with the idea of generality of both timing and force-control components, but are they independent? We have already pointed out that the
correlations among force control, when it is the primary object of control, and timing variability are small and mostly nonsignificant, indicating a large degree of independence. However, other findings suggest at least a small dependence between the two factors.

One reason for a dependence can be understood in terms of the logic of the Wing and Kristofferson model. After a clock indicates the time to initiate a movement, other processes transpire to implement the movement. Very likely a more forceful movement would result in faster implementation. Thus, a movement that was more forceful than average, due to random variation, would tend to occur early, shortening the preceding interval and lengthening the following one. As seen in Fig. 3, the force of a press correlates negatively (about −.10) with the duration of the preceding interval and positively (about .18) with the duration of the following interval. A dependence between time and force due to peripheral features is of no consequence for the notion of separate modules for time and force control.

However, beyond the peripheral interaction, there may be a more central interaction between the two operations. In one experiment reported by Keele et al. (1987a), subjects tapped a series of responses in which every two short intervals were followed by an interval twice as long. One of the key presses in each cycle was to be accented by a more forceful press. When the accent was on the pulse that separated the long interval from the preceding short interval, the accent lengthened the preceding interval and shortened the following one. This pattern is the opposite of what would be predicted were force simply to speed up implementation and would suggest, therefore, that a mechanism that regulates force also alters in some degree the central generation of time.

The interaction between force and time perhaps should not be overblown. Certainly force control and time control are largely unrelated. There is very little correlation between abilities in the two domains, suggesting that they largely come from separable modules. It is possible that what interaction occurs actually comes from outside the modules responsible for force and time control. Such a possibility is explored when discussing the results of tests with neurological patients.
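The peripheral route for this force–time dependence can be sketched with a variant of the clock–implementation simulation in which a randomly more forceful press shortens its own implementation delay. Every parameter below (force scale, the gain k, the noise levels) is an illustrative assumption, tuned only to show the sign pattern, not the chapter's magnitudes:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

C = rng.normal(400.0, 20.0, n)      # clock intervals (msec)
F = rng.normal(7.0, 1.0, n + 1)     # peak forces (arbitrary units)
k = 3.0                             # assumed: msec of delay saved per unit force

# More forceful presses are implemented slightly faster.
M = 50.0 - k * (F - 7.0) + rng.normal(0.0, 8.0, n + 1)

I = C + M[1:] - M[:-1]              # I[j]: interval from tap j to tap j+1

taps = np.arange(1, n)              # interior taps
r_prec = np.corrcoef(F[taps], I[taps - 1])[0, 1]  # force vs preceding interval
r_foll = np.corrcoef(F[taps], I[taps])[0, 1]      # force vs following interval
print(r_prec < 0 < r_foll)          # prints True: forceful taps come early
```

A chance-forceful tap arrives early, so it shortens the interval that ends with it and lengthens the one that begins with it, reproducing the negative-then-positive correlation pattern of Fig. 3 without any central force–time coupling.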
B. MAXIMUM RATE OF REPETITIVE ACTIVITY

In several studies we have examined the maximum rate at which repetitive movements can be made for a variety of effectors. Questions about maximum rate differ from those of timing and force control in that it is not clear that one would postulate a brain module whose purpose is to regulate rate. Nonetheless, when considering timing and force control, it also has been of interest to investigate the operation of the motor system at its maximum rate.
Fig. 3. The correlation between the force of taps and the durations of intervals that precede and follow the taps. Lag 0 refers to the interval just before a tap. Negative lags refer to earlier intervals and positive lags to later intervals. Xs are for the finger, boxes for the foot.
Earlier in this article we reported that the maximum rates at which the finger, forefoot, and heel could be moved up and down correlated in a range of .52 to .64. Those correlations are similar to ones obtained in a study by Keele and Hawkins (1982) involving a wider variety of effectors: forefinger, thumb, wrist, forearm, and forefoot (see Table II). Table III shows the intervals at maximum rate to range from 158 to 205 msec/tap when averaged over the 15 subjects of the study.

What accounts for the rather sizable correlations among the maximum rates of different effectors? One general possibility is that the limit is set by peripheral properties of the effectors. A more interesting possibility is that central factors, including ones that affect perceptual tasks, might affect maximum speed. We have investigated several such possibilities, ruling out some and providing at least tentative support for others.

The various effectors have differing lengths and masses, and it is possible that these affect reciprocation rate much as in a pendulum. Such seems to be the case for maximum rate of reciprocation of the legs in running (e.g., Heglund, Taylor, & McMahon, 1974), which can only cycle about twice a second in humans (see review in Keele, 1986). However, Table III shows that the longer and more massive effectors are not the slowest: forearm and wrist are the fastest, and finger, thumb, and foot are the slowest. All are on the order of twice as fast as the legs in running. Moreover, we measured individual differences in the lengths and circumferences of feet and fingers;
TABLE II
CORRELATIONS IN SPEED BETWEEN SYSTEMS (FINGER, THUMB, WRIST, ARM, AND FOOT)

Note. Values in italics are reliabilities. Values below the major diagonal are uncorrected; those above the diagonal are corrected for attenuation.
TABLE III
TAPPING SPEED

          Msec/tap    Taps/sec
Finger      201         5.0
Thumb       205         4.9
Wrist       160         6.3
Arm         158         6.3
Foot        198         5.1
and these unpublished data show no relation between those measures and the maximum rates at which people could tap. Cycling of the legs in running may be largely controlled by subcortical mechanisms (Grillner, 1981) and designed to exploit pendular properties to preserve energy. The cyclical movements of the other effectors may emerge from different systems and may be less sensitive to energy considerations.

Another biomechanical feature that could limit reciprocation rate is minimum contraction time of muscles. Freund (1983) has suggested that the maximum rate of reciprocation is only slightly slower than that allowed by contraction time of muscles. By itself, this would not account for the correlations across different effectors unless it was also supposed that mechanical properties of the muscles differed among individuals and were correlated across effector systems. Such a possibility has not been evaluated.

Although contraction speed might account for some of the rate differences across individuals, it appears not to be the only cause. Another factor appears to be variability in the motor system itself. Earlier we reported that the regularity of motor timing correlates about .5 with maximum rate,
whereas perceptual timing fails to correlate with rate. Our earlier suggestion was that motor variability is composed of two portions, one that could be attributed to a timer and the other to motor noise. It appears that the latter component constrains speed by preventing a consistently optimum time of arrival of signals to the muscles.

Another closely related constraint on maximum speed may be whether, consistency aside, the optimum pattern of input to the muscles is provided. In an unpublished study (Corcos, Keele, Woollacott, & Pokorny, 1984), subjects made reciprocal forearm movements as rapidly as possible. Electromyographic analysis suggested that the slower subjects tended to have rather strict alternation of biceps and triceps, with nearly equal durations of the activation and deactivation times of each muscle; on average the activation period was 164 msec and the off period was 157 msec. For faster subjects, the average on period was 136 msec and the off period was 113 msec. The fact that for fast subjects the off period was shorter than the on period implies that there was partial coactivation of the biceps and triceps muscles, with one becoming activated slightly before the other one terminated. However, the different pattern for the fast and slow subjects was not reliable at standard levels of confidence (.05 < p < .10). Moreover, we could not determine whether the pattern of the faster subjects was intrinsic to the neural organization for those subjects or induced by the greater speed of reciprocation. It would be useful to know what would happen to the EMG patterns at speeds just below maximum. The issue of optimum pattern of muscle activation needs further investigation.

Is there any possibility that a constraint on maximum reciprocation rate might be of more central origin than muscular constraints?
In preliminary exploration of the issue (unpublished data), we examined two perceptual tasks with cyclic periods approximately that of the reciprocal motor movements, i.e., around 200 msec. In one task developed by Warren (e.g., Warren & Obusek, 1972), subjects listened to repeating sequences of four distinctive sounds: buzz, hiss, low tone, and high tone. After a series, subjects reported the order of the sounds. The onset-to-onset intervals of successive sounds in different series were either 75, 125, 175, 225, or 275 msec. The percentage of correct identification of the series orders was determined for each rate, and a 75% threshold was determined for each subject. The second task was a variant of one investigated by Cheatham and White (1954). Subjects listened to series of 6 to 10 clicks, and after each series indicated how many clicks they had heard. The interval between clicks was varied, and for each subject a threshold interval was estimated.

In addition to the two perceptual tasks, each of the 31 subjects performed a number of bouts of maximum-rate tapping with the finger, thumb, wrist, forearm, and forefoot. Correlations among the tapping rates of the different effectors were more or less the same as in previous studies, averaging
.54. Averaged over both effectors and subjects, the mean interval between taps was 173 msec. The mean threshold interval for the sound-order task was 219 msec, and the mean threshold interval on the click-counting task was 170 msec. The correlation across subjects between maximum tapping rate and performance on the Warren task was .53 (p < .005). Performance on the click-counting task correlated .32 with maximum rate (p < .05) and .66 with performance on the sound-order task.

Since maximum rate on the perceptual tasks correlates with maximum rate of repetitive motor activity, it is possible that some central rate-limiting factor lies behind both tasks. However, the result must be interpreted with caution, because the design does not encompass the ideal paradigm for correlational research. It would be useful to show not only a correlation between maximum motor rates and maximum perceptual rates, but also a lack of correlation between these and some other reasonable perceptual and motor tasks, to rule out some third factor such as motivation.

Our various investigations do show that the maximum rates at which effectors can be moved back and forth are correlated across individuals. The rates are not determined by the pendular properties of the effectors. Because the rates are only slightly slower than allowed by contraction rates of muscles (Freund, 1983), the contraction rate may set an absolute limit, but other more central factors also play a role in individual differences: Subjects more variable on motor timing are slower in maximum rate. Individual differences in the pattern of input to the muscles may affect maximum reciprocation rate, though this issue requires further investigation. There is some suggestion of an even more central constraint on maximum rate common to some perceptual as well as motor tasks. The elucidation of such possible constraints would be useful in future research.
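The 75% thresholds described above can be obtained by linear interpolation of the psychometric function relating percent correct to the onset-to-onset interval. The percent-correct values below are invented for illustration; only the interpolation step reflects the procedure described in the text:

```python
import numpy as np

# Hypothetical data (not from the chapter): percent correct report of
# sound order at each onset-to-onset interval in the Warren-style task.
intervals = np.array([75.0, 125.0, 175.0, 225.0, 275.0])   # msec
pct_correct = np.array([30.0, 45.0, 60.0, 78.0, 90.0])     # assumed values

# 75% threshold by linear interpolation; np.interp requires the x values
# (here percent correct) to be increasing, which they are for these data.
threshold = np.interp(75.0, pct_correct, intervals)
print(round(threshold, 1))   # -> 216.7
```

With real data the psychometric function is usually fit with a smooth curve (e.g., a logistic) before the threshold is read off, but linear interpolation between the two bracketing points gives a serviceable estimate.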
V. Individual Differences in Skill

A long-standing interest in psychology has been whether individual differences in motor-skill performance can be predicted from more elementary abilities. Early work on this problem has, by and large, not been encouraging: performance on a skill is not well predicted either by elementary abilities or by performance on another skill. Such lack of success has paved the way for another, now-dominant view of individual differences that stresses differences in acquired knowledge bases. We suggest that, while a knowledge-base view undoubtedly captures a major aspect of individual differences, a modular approach may motivate a reexamination of the ability notion.
Steven W. Keele and Richard I. Ivry
A. CRITIQUE OF EARLIER APPROACHES TO INDIVIDUAL DIFFERENCES

A thoughtful review by Marteniuk (1974) of much of the older literature indicates that performances on a variety of simple tasks, such as reaction time and movement time, that might be thought relevant to complex motor skills fail to correlate with each other. Moreover, skills on complex tasks, such as different sports, fail to correlate with each other. Such results led Marteniuk to endorse specificity theory, a notion earlier promulgated by Henry (1956, 1958, cited in Marteniuk, 1974). The view suggests that individual differences on one task are largely specific to that task. Moreover, individuals who are superior on several different tasks, such as different sports, are simply people who happen to be good on a very large number of independent abilities. General "athletic" ability does not exist. An examination of the studies surveyed by Marteniuk raises several issues, however. First, the selection of abilities expected to correlate was not based on theory or other research. Thus, several studies do fail to find a significant correlation across individuals between reaction times and movement times. However, work by Fitts and Peterson (1964) provides little reason to suppose that such abilities would be correlated. Reaction time is primarily influenced by number of choices and compatibility and hardly at all by the distance to a target and its width. The reverse is true for movement time. Second, many former studies of individual differences failed to make use of the central methodological technique of modern cognitive psychology, namely, the subtractive technique. In most studies of cognitive psychology, quantitative specification of a process is typically implemented by some form of subtraction of one condition from another.
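One concrete form of the subtractive technique fits a line to reaction time as a function of the information in the stimulus set (log2 of the number of choices, following Hick and Hyman) and takes the slope as the estimate of the decision stage. The mean reaction times below are invented for illustration; this is a minimal sketch, not data from any study cited here.

```python
import numpy as np

# Invented mean reaction times (msec) for 1, 2, 4, and 8 equally likely choices.
n_choices = np.array([1, 2, 4, 8])
rt = np.array([270, 310, 350, 390])

# Hick-Hyman law: RT = a + b * log2(N). The slope b isolates the decision
# stage; the intercept a lumps together sensory and motor time.
bits = np.log2(n_choices)
slope, intercept = np.polyfit(bits, rt, 1)
print(slope, intercept)  # slope ~ msec per bit of decision; intercept ~ residual stages
```

The slope, not the whole-task reaction time, is then the quantity one would correlate across individuals to ask whether "decision speed" is a common ability.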
For example, to isolate decision processes from visual and movement processes in reaction time, one might identify the decision process with the slope of the function that relates reaction time to number of choices (Hick, 1952; Hyman, 1953). Most past analyses have, unfortunately, been based on task performance per se rather than on the isolated processes. Another issue arising from Marteniuk's conclusions favoring specificity concerns whether one could reasonably expect a single ability, or even just a couple, to underlie diverse activities. Might not the conclusion that a gifted athlete is strong on many abilities be exactly what is expected from the view that any particular task depends on a number of underlying processes, and that the mix of processes might differ markedly from task to task? The research reviewed by Marteniuk was perhaps burdened by the vain hope that motor skill could be understood in terms of one or two abilities. Another prominent line of work on individual differences in motor control emerged from factor analysis. Excellent summaries appear in Fleishman (1966) and Jones (1966). Fleishman and colleagues tested large numbers of subjects, often around 200, on batteries of tasks, including complex tracking tasks, discrimination reaction time, hand steadiness, and
anticipation of visual coincidence. Performances on the various tasks were correlated and a factor analysis conducted in order to define factors common to different subsets of the tasks. Among the numerous abilities deduced in this manner were multilimb coordination, reaction time, speed of arm movement, manual dexterity, finger dexterity, arm-hand steadiness, static strength, gross body coordination, and stamina. Thus, as Marteniuk (1974) had supposed, many different abilities underlie skill. A notable aspect of Fleishman's work was the demonstration that the factor structure that underlies a particular task changes with practice. Figure 4 illustrates how the importance of the reaction time and rate of movement factors grows with practice on discrimination reaction time, while spatial relations diminish in importance, and a component specific to discrimination reaction time also grows. Similarly, Fleishman and Rich (1963) found the utility of spatial abilities in predicting performance to diminish with practice while that of kinesthetic sensitivity increased. Fleishman's conclusion that a relatively large amount of final performance cannot be predicted from other factors is quite congenial to conclusions arising from the analyses of chess and other cognitive skills (e.g., Chase & Simon, 1973; Ericsson & Chase, 1982). The expert advantage over novices appears primarily due to an extensive knowledge base containing a huge repertoire of patterns intrinsic to the skill and specific actions for each pattern. Similarly for basketball, Allard, Graham, and Paarsalu (1980) found that experienced players were much better than novices at recalling positions of players on a mock court after a brief exposure. As in comparable findings with chess, the expert advantage was found only for patterns that would appear in real games in a structured setting. Such results are surprising given a common intuition that expert performance is due to unusual abilities.
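Returning to the factor-analytic step described above — correlating scores across a battery and looking for a small number of common dimensions — the logic can be sketched with synthetic data. The two latent abilities and four task labels below are invented; a full factor analysis adds rotation and communality estimation, so this eigendecomposition of the correlation matrix is only a minimal sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200  # a Fleishman-sized sample

# Two invented latent abilities generate four observed task scores.
timing = rng.normal(size=n)
dexterity = rng.normal(size=n)

def noisy(latent):
    # Each observed score = latent ability + task-specific noise.
    return latent + 0.3 * rng.normal(size=n)

scores = np.column_stack([
    noisy(timing),      # tapping regularity
    noisy(timing),      # rhythm discrimination
    noisy(dexterity),   # finger dexterity
    noisy(dexterity),   # peg placement
])

# Eigendecomposition of the correlation matrix: two eigenvalues well above 1
# signal two common factors underlying the battery.
corr = np.corrcoef(scores, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]
print(np.round(eigvals, 2))
```

With the invented structure, two large eigenvalues (near 2) and two small ones emerge, mirroring the way Fleishman's batteries resolve into a handful of abilities.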
Newell and Rosenbloom (1981) have extended such a pattern-learning theory of expertise to a quantitative formulation of improvement on motor skill with practice. It appears clear that an increasing knowledge base intrinsic to the skill being learned is the primary factor leading to improved performance. Nonetheless, this does not necessarily mean that the highest levels of skill are not constrained by more elementary abilities. An important observation of Fleishman was that, even after extensive practice, certain abilities predict performance. One may suppose that, with practice, performance begins to approach a level that is limited by underlying capabilities. Perhaps the primary criticism of factor-analytic work is that it is not well motivated by the theory, research, and methods of cognitive psychology. The factors discovered by Fleishman bear little resemblance to concepts that have emerged from the study of motor control within the tradition of cognitive psychology. The performance scores correlated in the factor-analytic approach are typically whole-task scores rather than scores derived
[Figure 4 plots percentage of variance against trials, showing the growth of a component specific to discrimination reaction time and the variance left unaccounted for.]

Fig. 4. Changes in the factor structure with practice on a discrimination reaction time task. From Fleishman (1966).
by subtraction-like methods to isolate and quantify particular processes. The modular analysis of skill that we have adopted attempts to rectify these problems.

B. EXPERIMENTAL ANALYSIS OF INDIVIDUAL DIFFERENCES
Although most of the work reported in this article is concerned with timing, force, and speed, in an earlier series of studies (Keele & Hawkins, 1982) we investigated individual differences in attentional abilities: time sharing and attention switching. The idea was that, since many complex skills, such as piloting a plane or playing soccer, involve several simultaneous events, successful performers might be competent because of a superior ability either to time share the activities or to rapidly switch between them. Although we found no evidence for a general ability to time share (cf. Ackerman, Schneider, & Wickens, 1984), we did produce some evidence for a general attention-switching ability. To investigate attention switching, we devised a variety of situations in which an unexpected signal occasionally occurred, thereby requiring an attention switch from an expected one. Generally speaking, reaction times to unexpected signals are much slower than to expected ones. The paradigm and theory have been well developed by Posner and colleagues (see Posner, 1978). In one of our cases, a cue indicated whether or not to expect a red signal as opposed to one of three shapes. In a second case, the expectancy was induced by having a long series of successive shapes with only the rare occurrence of a red light. In a third case, the signals regularly alternated between colors and shapes, requiring a regular shift of attention. The efficiency of attention switching can be deduced by subtracting reaction times to expected signals from those to unexpected signals. Keele and Hawkins found individual differences in such measures of attentional flexibility to correlate around .45 across the diverse situations. Moreover, in a second study, subjects either performed with one set of signals or alternated between two signal sets, the latter requiring continuous switching of attention. Difficulties in switching are indicated by a slowed reaction time in the alternating case compared to the single task. Switching ability correlated .48 across situations involving markedly different kinds of signals and responses.
The attention-switching study suggests that some unitary mechanism might be involved in switching attention in diverse settings, constituting, by our criteria, a module. This conclusion is consistent with recent neuropsychological work that suggests that a high-level cortical system is involved in switching emphasis between tasks of different character (Posner, Inhoff, Friedrich, & Cohen, 1987). In turn, such an ability might be useful to help predict performance on real tasks that involve a considerable amount of switching between different demands. Altogether, then, we have developed over the last several years evidence for modules of timing, force control, and attention switching, and we also have found consistent individual differences in maximum rate of activity. Beyond the isolation of modules, the question can be raised whether individual differences in the functioning of the modules predict performance on nonlaboratory tasks. Work along this line has been limited, but we have explored piano playing, and an old study of typing by Book (1924) fits within the framework.
We assume that becoming a very good pianist requires extremely good timing and the ability to make fast reciprocal movements with the fingers. To test this proposition, we (Keele et al., 1985) compared 16 unusually skilled pianists to a control group of 32 subjects who were not highly skilled in piano playing (the controls were the same subjects reported earlier for whom motor timing, motor rate, and perceptual timing were correlated). Each subject was tested on the motor-timing task, in which timing proficiency was defined by the standard deviation of the intertap intervals; on the maximum-rate task, in which finger, forefoot, and heel were tapped as rapidly as possible; and on the perceptual-timing task, by the acuity of distinguishing the durations of the intervals between tones. Table IV shows that the pianists were significantly better on both motor and perceptual timing than the nonpianists. At maximum rates, pianists have shorter intertap intervals with the fingers than do nonpianists. A further decomposition of motor timing by means of the Wing and Kristofferson method indicated that pianists were significantly better than nonpianists on both the clock and the implementation components of timing variance. Conceivably, extensive piano practice improves timing and speed performance, rather than such abilities being prerequisites for expert-level performance. The danger of inferring cause and effect from correlations has often been stated. We are thus unable to differentiate the two possibilities, but at a minimum the study suggests a relation between the abilities tapped by our very simple tasks for timing and speed and one important real-life skill. Book (1924) examined the relation between typing speed and maximum rate of reciprocal activity. He made use of norms of maximum tapping rates of the forearm, upper arm, wrist, and index finger established by Nicholson (1925; see Keele, 1981, for a description of Nicholson's data). The norms

TABLE IV
SPEED AND TIMING (IN MSEC) FOR PIANISTS AND NONPIANISTS

                                   Pianists    Nonpianists
Speed: mean intertap interval
  Finger                              160          182
  Foot                                181          182
  Heel                                182          176
Motor timing: SD
  Finger                             15.3         20.1
  Foot                               17.7         21.3
Perception range                       25           36
were based on nearly 25 subjects of each sex for each age, ranging from 17 to about 50, and even included data for ages up to 81. More recently, Gentner (1981) has shown that the average interstroke interval for expert typists, when the same finger is used twice in succession, varies from typist to typist in the range of 164 to 225 msec. The similarity to intertap intervals at maximum tapping rates suggests that typing speed in experts is constrained by the maximum rates at which they can move effectors. When Book compared the tapping rates of national and international champion typists to the age- and sex-matched norms established by Nicholson, he found the rates of champions to be about 30% faster. Perhaps extensive typing practice improves tapping rate, rather than tapping rate limiting typing speed. However, Book found that champion typists were also faster in tapping with effectors (such as the upper arm) that are not used in typing. Second, in a study of college typists, he found a correlation between typing speed at the end of the course and maximum tapping rate assessed before the course began. Although our explorations of a modular approach to individual differences are limited, they may nonetheless serve as a useful model for a revitalization of the analysis of individual differences in skill. The first stage, we believe, is to establish the reality of hypothesized modules. Our work along this line has been most successful for timing, but we have also established strong evidence for a force-control module and moderate evidence for a module involved in attention switching. We have also found that maximum rate of reciprocal activity is limited across effectors and perhaps even correlates with rate on certain perceptual tasks. The second stage is to show that individual differences in the functioning of the modules are predictive of performance on complex tasks.
We have done little regarding this second stage, but have shown that skill on the piano is related to individual differences in timing and maximum rate. In addition, early work by Book suggests that maximum rate of reciprocal movement constrains ultimate speed in typing.
VI. Neurological Analysis of Timing

In the previous sections, we have hypothesized that there are task-independent operations such as timing or force control. The correlational work has yielded model tasks which assess the functioning of these separable components. Neuropsychological research provides a second way to investigate the validity of these hypothesized processes. Patients with lesions in different parts of the motor pathways can be tested in an effort to show dissociations between the specific components. To give a hypothetical
example, suppose it were found that patients who had damage in the basal ganglia had difficulty in the force-control task, whereas cortical patients and cerebellar patients did not. This would then imply that the basal ganglia play a primary role in the regulation of force output, or at least are part of a force-control pathway. It is important to recognize that this neuropsychological approach bolsters our research program in two distinct ways. First, the performance of neurologically impaired subjects offers an independent methodology which should converge with and extend the results we have observed in our other experiments. For instance, the Wing and Kristofferson model attributes tapping variability to two independent sources of variance. Thus, we should be able to predict how some patients will perform on the tapping task solely on the basis of their neurological diagnosis. Specifically, patients with peripheral nerve damage should only demonstrate an increase in their implementation estimate, since the clock is postulated to be one of the components of the central control system. Second, the neuropsychological approach represents an attempt to explicate a specific process at two distinct levels of description. That is, we wish to link behavioral phenomena such as tapping performance to the underlying neural systems which control specific aspects of that behavior. It is not, in principle, necessary that a specific behavioral process be supported by specialized neural tissue. The phenomenon may be the result of the dynamic interactions of distributed systems or may be the observable manifestation of transient control processes which are created for the completion of a specific task. Nonetheless, results such as the significant correlation between the production and perception of time are most easily accounted for by assuming that these tasks involve processing in a common neural system which constitutes an internal clock.
If this were so, we should expect to find patient groups who have difficulty in any task which involves the internal clock. In this section we present some selected case studies from our neuropsychological research. A more thorough discussion of this work can be found in Ivry (1986). We include a sketch of this aspect of our research in order to demonstrate how the study of cognitive processes can be supplemented by neuropsychological research.

A. CASE STUDY 1
As noted above, a strong test of the Wing and Kristofferson model can be made by testing patients in whom the neurological deficit is peripheral. To review, the model rests on the assumption that there are two processes involved in tapping and that these two processes operate independently of each other. A corollary of these assumptions is that the processes operate in
an open-loop (i.e., feedback-free) mode. Thus, any variability in the implementation process is predicted to have no effect on the variability of the timekeeper process. A peripheral neuropathy case study was undertaken to test the following predictions: (1) added variability in a finger-tapping task following peripheral nerve damage will lead to an increased estimate of implementation variability; (2) the clock estimate will be unaffected. If the results do not support both predictions, then the Wing and Kristofferson model will be of questionable value in patient research. The patient (WHI) had been involved in an automobile accident. He had suffered spinal injuries for which he was undergoing intensive rehabilitation throughout the test period. He was tested on two different occasions, approximately 4 and 5 months after the accident. At this time he was unable to use his lower extremities but had recovered complete control over his right hand and partial control over his left hand. As part of his hospitalization program, electromyographic and nerve-conduction tests were performed during the period between test sessions. These tests revealed some persistent minor denervation in the abductor muscles on the right side, although, as noted above, this did not produce any clinically detectable deficits. The same muscles, as well as the distal extensors, on the left side showed acute and chronic denervation. The asymmetry in hand recovery allowed for a within-patient comparison. This within-subject control is an essential ingredient for single-subject methodologies. In previous unpublished research with normal subjects, we found that subjects are equally proficient at tapping with either hand. Thus it is reasonable to assume that any differences demonstrated by WHI between left-handed and right-handed tapping can be attributed to the remaining neurological problems associated with his left arm and hand.
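The two predictions follow directly from the structure of the Wing and Kristofferson estimator, which recovers clock and implementation variance from the variance and lag-1 autocovariance of the intertap intervals. The simulation below is a minimal sketch with invented parameter values (not patient data): adding "peripheral" noise to the motor delays inflates only the implementation estimate, leaving the clock estimate untouched.

```python
import numpy as np

def wk_decompose(intervals):
    """Wing-Kristofferson decomposition: I_j = C_j + D_(j+1) - D_j.

    With clock intervals C and motor delays D independent,
    var(I) = var(C) + 2*var(D) and the lag-1 autocovariance of I
    equals -var(D).
    """
    x = np.asarray(intervals, dtype=float)
    dev = x - x.mean()
    lag1 = np.sum(dev[:-1] * dev[1:]) / (len(x) - 1)
    motor_var = max(-lag1, 0.0)                      # estimate of var(D)
    clock_var = max(x.var(ddof=1) - 2 * motor_var, 0.0)  # estimate of var(C)
    return clock_var, motor_var

def simulate_taps(rng, n, clock_sd, motor_sd):
    clock = rng.normal(400, clock_sd, n)      # central timekeeper intervals (msec)
    delays = rng.normal(50, motor_sd, n + 1)  # peripheral motor delays (msec)
    return clock + np.diff(delays)

rng = np.random.default_rng(7)
intact = simulate_taps(rng, 5000, clock_sd=20, motor_sd=8)
neuropathy = simulate_taps(rng, 5000, clock_sd=20, motor_sd=16)  # added peripheral noise

for label, taps in [("intact", intact), ("neuropathy", neuropathy)]:
    clock_var, motor_var = wk_decompose(taps)
    print(label, round(clock_var ** 0.5, 1), round(motor_var ** 0.5, 1))
```

Both conditions recover a clock SD near the simulated 20 msec, while the motor-delay SD roughly doubles in the "neuropathy" condition — the dissociation the case study was designed to test.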
Each of the two test sessions consisted of three blocks of tapping with each index finger. A block was completed when the subject had produced six error-free trials, each trial being composed of 12 responses with a pacing tone and 31 unpaced responses. "Error-free" is an arbitrary term, since it is based on a criterion that each interval produced by the subject be within 50% of the base interval (Keele et al., 1985). Three blocks of tapping on each of 2 days produced a total of 36 error-free trials with each hand. The test hand was alternated between blocks, starting with the unaffected right hand in the first session and the affected left hand in the second session. Each session began with two practice trials with each hand. WHI was more variable in tapping with the impaired, left hand. The overall SD of his intertap intervals with the left hand was 32 msec, whereas the comparable figure for the right hand was 26 msec. Figure 5 presents the clock and motor-delay estimates derived from the Wing and Kristofferson model for each block of trials. As can be seen, there is considerable overlap between the clock estimates for the two hands. In contrast, with the exception of the
[Figure 5 plots the clock and motor-implementation variance estimates by trial block, with separate curves for the impaired and unimpaired hands.]

Fig. 5. Clock and motor-implementation variances for each block of trials for patient WHI.
final block, the implementation estimate for the left hand is higher than for the right. Statistical analyses verified the reliability of this difference (t(5) = 2.68, p < .05). A similar analysis showed no difference in terms of the clock estimate (t(5) = 0.51). It is unclear why the final block deviated from the general pattern. It may partially reflect the fact that WHI had shown some additional recovery since the first test session: his overall standard deviation had dropped from 34 msec to 29 msec over the 1-month period. However, sampling error is a more likely factor, since this same block also showed a clock estimate as high as any other. The predicted dissociation observed in the patient with peripheral nerve damage demonstrates that the Wing and Kristofferson method can be useful in trying to identify the neural mechanisms involved in timing. The next step in our neuropsychological research program was to test patients with different types of lesions in subcortical and cortical structures. The question with each of these patient groups was whether increased variability in the tapping task could be attributed to the clock or the implementation process. Two neural systems of interest were the basal ganglia and the cerebellum. Many researchers have argued that these two subcortical systems are involved in the planning of movement (Allen & Tsukahara, 1974; DeLong & Georgopoulos, 1981; Brooks & Thach, 1981). The exact nature of their respective contributions is open to question. We wished to test whether timing functions may be controlled by one of these subcortical systems.
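The block-by-block statistical comparison used above can be reproduced in miniature with a paired t-test over the per-block estimates. The variance values below are invented for illustration (they are not WHI's data); the sketch only shows the form of the test against the t(5) critical value.

```python
import numpy as np

# Invented per-block motor-implementation variance estimates (msec^2)
# for the impaired and unimpaired hands: 6 paired blocks, so df = 5.
impaired = np.array([320.0, 300.0, 350.0, 310.0, 340.0, 180.0])
unimpaired = np.array([200.0, 210.0, 190.0, 220.0, 205.0, 215.0])

# Paired t: mean of the block-wise differences over its standard error.
d = impaired - unimpaired
t = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))
print(round(t, 2))  # compare against the t(5) critical value of 2.571 at p = .05
```

The same computation run on the clock estimates would, on the model's prediction, yield a t near zero for a purely peripheral deficit.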
B. CASE STUDY 2
This case report addresses one of the more problematic results from our neuropsychological studies. Parkinson patients are used in research concerned with lesions of the basal ganglia, since the disease is known to primarily affect the dopaminergic pathways originating in the substantia nigra, one of the nuclei of the basal ganglia. Wing et al. (1984) presented a case history of a hemiparkinsonian patient. In the initial phase of that study, the patient was found to have increased clock variability on the affected side in comparison to the unaffected hand. Moreover, the subject had even greater difficulty in tapping when tested a year later, and this added variability was entirely attributed to the clock process. However, other Parkinson subjects have not shown any deficit in the tapping task (Ivry, 1986). It thus remains unclear whether Parkinson's disease produces deficits in tapping performance. One explanation for the discrepant results may stem from the fact that almost all Parkinsonian patients are receiving some version of L-dopa to stimulate dopamine production. The differential performance may be due to the fact that the medication is more effective with some patients than others. In addition, there are side effects from L-dopa that can create new movement problems. This makes it difficult to determine whether it is the Parkinson's disease or the medication which may cause patients to have difficulty in the tapping task. To overcome the potential artifacts introduced by medication, we tested a patient shortly after it was determined that he had Parkinson's disease and before he received any L-dopa therapy. BAU, a 75-year-old man, was initially tested 2 weeks after having been diagnosed as Parkinsonian. He presented with a moderate resting tremor and was found to be mildly rigid and akinetic. After the first test session, BAU began treatment with Medopar, a variant of L-dopa. He ingested three 125-mg tablets daily, one before each meal.
The medication regimen did not change over the next 2 weeks. During this period, BAU participated in an additional five test sessions. Each session consisted of six blocks of tapping with the right hand. Six error-free trials constituted each block. Our within-subject comparison with this patient involved his performance at various stages of medication. Figure 6 shows the mean estimates of the clock and motor-delay processes as a function of test session. Since all six blocks are averaged together for each session, each data point represents the mean across 36 trials. It can clearly be seen that the motor-delay estimate remains fairly constant throughout the study. Most striking is the rapid decrease in the clock estimate over the first three sessions. When the subject was tested prior to any medication, the clock estimate was 44 msec. By the second session, at which time BAU had been receiving medication for 3 days, the estimate was
[Figure 6 plots the clock and motor-implementation estimates against test session.]

Fig. 6. Clock and motor-implementation variances for each test session for patient BAU.
33 msec. At subsequent sessions the clock estimate leveled out at approximately 26 msec. These trends were verified in a 6 (sessions) x 2 (variance source) repeated-measures ANOVA. Of most interest is the finding that the interaction was significant (F(5,25) = 4.10, p < .01). Post hoc analyses revealed that only the clock estimates varied across sessions. Although no control subjects have been tested as extensively over a comparable time period, we have never found there to be much benefit on the tapping task from practice after the first couple of trials. Given that BAU's dramatic improvement cannot be attributed to practice, the results strongly suggest that one effect of the medication was to correct a deficit in the timing process. These results imply that Wing et al. (1984) were correct in concluding that basal ganglia dysfunction can impair the timing process. The two case studies reviewed above present a double dissociation based on the two-process model of repetitive movements developed by Wing and Kristofferson. The peripheral neuropathy patient was found to have increased variability only in the implementation process, whereas only the clock process was impaired in the Parkinson patient. Nonetheless, it is not possible to conclude that the basal ganglia are responsible for timing functions. The only conclusion that can be drawn is that lesions in the basal ganglia can disrupt the normal functioning of the internal clock. Whether the effect is direct or indirect cannot be ascertained. Furthermore, it is necessary to examine other patients to determine whether lesions in different neural systems will also affect the timing or implementation process.
C. CASE STUDY 3

YOU, a 25-year-old female, had suffered a stroke centered in the left cerebellar hemisphere. She was tested on four different occasions over a 1-week period approximately 1 month after the accident. Her hand movements were marked by severe dysmetria, as assessed by her inability to accurately point to an object in space, and she displayed severe intention tremor during all forms of movement involving the left side. In addition, her gait was unstable and characterized by a wide stance to compensate for balance problems. She did not show any motor deficits on the right side. Her CT scans are shown in Fig. 7. A large lesion can be seen along the leftmost border of the cerebellum. YOU participated in a total of seven blocks of tapping with each hand over the 1-week period. She completed two blocks of tapping with the left hand on the first day of testing as part of a standard protocol used in a different experiment. During the next three sessions, testing alternated between hands, with an extra block added for the right hand in Sessions 2 and 3 to equate the number of blocks. YOU was unable to produce six error-free trials during three of the blocks with the left hand, and thus the total number of error-free trials for the left hand is 39 in comparison to 42 for the right hand. Figure 8 presents the clock and motor-delay estimates for each block of trials. The mean overall standard deviation for YOU was 71 msec for the left hand and 34 msec for the right. These overall scores were decomposed according to the Wing and Kristofferson model. The clock and motor-delay estimates were 54 msec and 29 msec, respectively, for the left hand, whereas comparable scores of 20 msec and 19 msec were obtained for the right hand. Correlated t-tests showed both differences to
Fig. 7. Cerebellar sections from CT scans of patient YOU. An arrow points to the damaged region.
[Figure 8 plots the clock and motor-implementation estimates by trial block, with separate curves for the impaired and unimpaired hands.]

Fig. 8. Clock and motor-implementation variances for each block of trials and for the impaired and unimpaired effector for patient YOU.
be significant (t(6) = 8.31, p < .001 for the clock estimate; t(6) = 2.36, p < .05 for the implementation estimate). Thus YOU exhibited inflation of both clock and implementation variability when tapping with her impaired hand. It is interesting to note that the motor-delay estimate for the right hand approaches more normal levels during the last few blocks. This trend may parallel the patient's improvement over the week of testing leading up to her discharge from the hospital. The results of the tapping task with the two subcortical patients may at first glance appear quite problematic. Lesions in either the basal ganglia or the cerebellum led to increased variability in the timing process. Moreover, the cerebellar patient also appeared to have a deficit in the implementation process. The issue may appear to become even more clouded, since our preliminary testing of cortical patients has yielded results which are similar to those of the cerebellar patient reported above (see Ivry, 1986). Lesions which extend into the precentral, motor areas of the frontal cortex appear to produce increases in both the clock and implementation estimates. Nonetheless, the present results become more cohesive in light of the present understanding of the neuroanatomical connections between the various motor systems of the brain. A simplified wiring diagram is provided in Fig. 9. The focus in this diagram is on the position of the basal ganglia and cerebellum in relation to the motor cortex and spinal systems.
Modular Analysis of Timing in Motor Skill
[Figure 9 diagram: the cerebral cortex and thalamus are linked to the effectors via the pyramidal tract and the extrapyramidal tracts (including the vestibular nuclei, red nucleus, and reticular formation).]
Fig. 9. A simplified wiring diagram of the neural structures involved in timed tapping. Variability in the loops which return to the cerebral cortex is hypothesized to constitute clock variance. Implementation variance is the result of descending pathways to the spinal cord.
The first point to be made is that the primary motor cortex and the cerebellum are critical components of the two primary pathways down the spinal cord. The pyramidal tract includes those neurons in the primary motor cortex which relay information to the motor neurons via a minimal number of synapses (i.e., 0 or 1). The other pathway is actually composed of a number of different extrapyramidal tracts. Most of these traverse the cerebellum at some point in their circuitry. Thus, both the motor cortex and the cerebellum have easy access to the spinal neurons. On the other hand, there appears to be little output from the basal ganglia which can have such relatively direct influence on the spinal neurons (e.g., DeLong & Georgopoulos, 1981). This arrangement meshes nicely with the finding that lesions in either the cerebellum or cerebral cortex can produce inflated motor implementation estimates. At least part of the lesioned tissue in these
patients is presumably outside of the timing system and is part of the implementation pathways. More interestingly, we have never tested any Parkinson patients who showed an increase in the motor implementation estimate.

To account for the clock results, it is necessary to propose that the three neural systems are either part of a timing loop or part of an integrated circuit in which one neural system plays a primary role in timing. A loop- or circuit-based hypothesis is necessary to account for the fact that damage in either the cerebellum, cerebral cortex, or the basal ganglia can increase the variability of the timing process. The independence assumption of the Wing and Kristofferson model implies that any system which was not contained within the timing process could only affect the motor implementation estimate.

There are at least two different types of timing circuits, each of which is sketched in Fig. 10. The first type (Fig. 10a) would implicate all of the different structures within the circuit in the control of timing. Thus, the timing of, say, 400 msec would involve setting up a path through the loop which takes 400 msec to traverse; 500-msec paths would presumably involve more synapses or slower conducting neurons in order to increase the amount of time it takes to complete each circuit. Damage at any point along the circuit would disrupt the normal functioning of this type of clock. A problem with this type of mechanism is that there is little overlap between the cortico-basal ganglia and the cortico-cerebellar loops despite the common relay of both subcortical structures in the ventral portion of the thalamus (e.g., Goldberg, 1985). Thus, the circuit cannot really be continuous. The second form of a timing circuit (Fig. 10b) involves a local operation for timing control. In this model, one of the loops is assumed to determine timing information. The other loop(s) are involved with other, unrelated computations.
The common site for integrating the different loops can yield the unintuitive prediction that timing deficits may be observed whether or not the lesion is in the timing loop. This would occur whenever the lesion is positioned prior to the site of integration. For the sake of argument, imagine that the motor cortex has just sent a signal down the pyramidal tract which triggers a key press. This signal simultaneously initiates the process needed to determine the next response. A number of different procedures or operations are then invoked. For instance, suppose the cerebellum is called upon to determine the time at which the response can be made and the basal ganglia provide some other (unknown) parameter input. When all of the computational outputs of the various procedures are returned to the motor cortex, the next response is triggered. Once this occurs, the cycle is repeated and each procedure is again executed. The important point to note from this simplified example is that deficits which affect any of the procedures or disrupt the system at any point prior to the triggering of a response will
Fig. 10. Two possible forms of a timing circuit: (a) a delay-line circuit; (b) a circuit with local clock operation.
contribute to increased timing variability. The affected structures may not play a direct role in timing control, but they can induce added noise in the timing circuit, since the different operations cannot be called until the preceding response has been initiated.

The case studies reported above do not allow us to differentiate between the two types of models depicted in Fig. 10. A critical test would involve testing the patients with the perception-of-time task as well as the tapping task. Since the correlational work suggests that the two tasks involve a common timing process, we should find that only a select group of patients is impaired on both tasks if the timing component is restricted to a single neural system. The perception task, however, does not lend itself to case-study methodology. For instance, auditory projections to cortical and subcortical regions are not well lateralized (and thus we cannot compare performance between an affected and unaffected side as we have with the tapping task).
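The argument that noise arising before the response trigger inflates the clock estimate can be checked with a small simulation. In the sketch below (our illustration, not the authors' code), "loop" noise added before each trigger simply adds to the produced interval, whereas implementation delays after the trigger are differenced between successive taps; a Wing-and-Kristofferson-style decomposition then assigns the loop noise to the clock term.

```python
import random

def wk_variances(ivals):
    """Clock and motor variance estimates from the Wing-Kristofferson
    relations: lag-1 autocovariance = -var_motor, Var(I) = var_clock + 2*var_motor."""
    n = len(ivals)
    m = sum(ivals) / n
    var = sum((x - m) ** 2 for x in ivals) / (n - 1)
    ac1 = sum((ivals[j] - m) * (ivals[j + 1] - m) for j in range(n - 1)) / (n - 1)
    return max(var + 2 * ac1, 0.0), max(-ac1, 0.0)

def simulate_taps(n, sd_timer, sd_loop, sd_impl, base=400.0):
    """Each interval = timer interval + pre-trigger loop noise, plus the
    difference of post-trigger implementation delays for the two taps."""
    impl = [random.gauss(0, sd_impl) for _ in range(n + 1)]
    return [base + random.gauss(0, sd_timer) + random.gauss(0, sd_loop)
            + impl[j + 1] - impl[j] for j in range(n)]

random.seed(3)
var_clock, var_motor = wk_variances(simulate_taps(5000, 30, 30, 15))
# The clock variance absorbs both the timer and the pre-trigger loop noise
# (roughly 30**2 + 30**2 = 1800), while the motor estimate tracks only the
# implementation delays (roughly 15**2 = 225).
```

The loop noise here is indistinguishable from timer noise because it is uncorrelated across intervals, which is exactly why a lesion outside the timer proper can still inflate the clock estimate.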
Therefore, we have switched to a group-study design for the perception task, and these results will be reported elsewhere (but see Ivry, 1986). Briefly, this research leads us to believe that the cerebellum is the critical neural structure in timing functions. The cerebellar patients consistently demonstrate a deficit in the perception-of-time task, whereas Parkinson and cortical patients do not differ from healthy control subjects. This perceptual deficit observed in the cerebellar patients is specific to time-based judgments, since these same patients are not impaired in a perceptual task in which the loudness of tone pairs is compared. Taken together, the tapping and perception results indicate that the operation of an internal clock can be localized within the cerebellum.
VII. Other Approaches to Modularity

In addition to the use of correlational methods and neurological analysis, we have made use of two other methods for studying a timing module. One method exploits time-sharing paradigms. The other exploits functional similarity. Each of these is described below.

A. ONE CLOCK OR MANY CLOCKS? AN ANALYSIS OF TIME-SHARING

The case studies reported in the previous section have demonstrated that certain patients show a deficit in the timing of repetitive movements. Most interesting is the case study of the cerebellar patient YOU. The Wing and Kristofferson (1973a) method of analysis primarily attributed her increased variability to a deficit in the central timekeeping process when tapping with her left hand in comparison to her right-handed performance. Other studies (Wing et al., 1984; Ivry, 1986) have also found large differences in the clock estimates in single-subject comparisons between an impaired and unimpaired hand. One implication of these results is that there appear to be separate clock processes involved in left- or right-hand movements. However, this inference stands in opposition to the very findings which motivated the patient research. The motivation for the neuropsychological branch of our research program came from the correlational work which supported the notion of a single, multipurpose clock. Not only was the correlation across effectors positive, but the correlation between the production and perception of time was also significant.

It is possible to propose models which could potentially serve as a basis for unifying these apparently discrepant results. For instance, there may be many clocks, but all with similar characteristics. People who have a good clock for finger movements could also have good clocks for foot or arm movements or for use in perceptual tasks. This hypothesis is a variant of the
general-module logic discussed earlier, but the module is limited to a single operation whose implementation is redundant. The multiple clocks could either be scattered throughout the nervous system or all located within a single neural structure. Another possibility is that there is only a small set of clocks, such as one which can be invoked for movements involving one side of the body and another for movements on the opposite side. The two may be jointly utilized in perceptual tasks in which the stimuli are presented bilaterally. The operating characteristics of this joint operation may be some sort of average of the two processes operating independently. Thus the perception and production correlations come about since the same timing system is at least partially involved in each task.

The questions raised in the preceding paragraphs have been addressed in studies from a number of laboratories over the last 10 years. In this section, we review some of this work as representative of a third approach to the study of timing. The paradigms used in this work rely on more traditional experimental methods as opposed to the correlational and patient work discussed above. These studies have been especially useful in illuminating some of the constraints underlying temporally based behavior.

Not all of the research on timing has favored the notion of a single timing system. For example, Kolers and Brewster (1985) report a series of experiments which compared subjects' ability to make repetitive finger taps as a function of stimulus modality. At the beginning of each trial, a pacing stimulus indicated the target response frequency. The pacing stimulus was either auditory, visual, or tactile. The results showed differences as a function of modality. Specifically, subjects were most accurate when the pacing stimulus was a tone.
The advantage of this stimulus could be seen in either the deviation of the produced frequency from the target frequency or in the variability of the interresponse intervals. However, it is unclear whether the results indicate that different clocks are being entrained by the separate modalities, as the authors wish to claim, or whether there are differences between the modalities in terms of afferent latencies or access to a common timing process.

A more direct assay of timing constraints is encompassed in the work of Klapp and his associates (1979, 1981; Klapp, Hill, Tyler, Martin, Jagacinski, & Jones, 1985). These investigators employed a dual-task methodology in a series of studies in which the simultaneous performance of two tasks was compared to performance on each of the tasks performed separately. The underlying premise of this paradigm is that if performance is impaired under the dual-task conditions, the two tasks must share some processing resources. Manipulating the relationship between the two tasks allows the experimenter to examine the nature of the shared resources. In Klapp's (1979) first study with this methodology, the subjects were instructed to press and release a response key in synchrony with a tone. In the
dual-task conditions, the second hand performed a similar task, but the duration of the release segment was varied. The crucial finding was that, when two responses were required, the relation between the two effectors largely determined the amount of dual-task interference. If the two responses were identical, the subjects demonstrated no interference. In contrast, large interference effects were observed if the two response cycles had a different period. Klapp (1979) further showed that the two responses could be out of phase and still produce no interference as long as the periods were identical. The fact that the temporal relation between the two responses is critical suggests a limitation in terms of the subjects' ability to simultaneously generate more than one temporal pattern.

The constraints observed in the dual-task situation appear to stem from a specific structural source of interference such as a shared timing process, rather than some other shared process. In the experiments discussed above, different tone frequencies were used to identify the two response periods. In the third experiment reported by Klapp (1979), the second stimulus was presented visually. If the interference between response streams differing in period stems from the demands created in differentiating between the two tones, the task should become easier when the pacing stimuli are processed through separate modalities. The results, however, showed that the two-modality condition was in fact harder than the single-modality condition.

Another possible source of interference might be the result of special constraints involved in using homologous muscle groups. Klapp (1981) tested this hypothesis by repeating the same stimulus conditions but requiring that one response be manual and the other vocal. For the vocal responses, the subjects were instructed to repeat the syllable la in synchrony with a second pacing stimulus.
The results were essentially identical to those in the two-handed experiments. Dual responses which had different periods produced large interference effects. In contrast, when the two responses cycled with the same period, no interference was found regardless of whether the two responses were synchronized or were out of phase. Moreover, neither of these conditions showed any interference when compared to the single-task control condition. Thus, the subjects were able to perform two tasks as well as a single task if the timing requirements were harmonically related between the two tasks.

An alternative tack was used in another experiment (Klapp et al., 1985, Experiment 2) for investigating whether the constraints arise as a consequence of a hypothetical linkage between any sets of muscles simultaneously employed for different tasks. This experiment was quite similar to the two-handed study (Klapp, 1979), but all of the responses were performed with the same effector. That is, the subject attempted to concurrently maintain two rhythms with one hand. Dual-task interference was evident in both the one-hand and two-hand conditions when the two rhythms were incompatible (i.e., different periods).
While it may intuitively sound more difficult to tap two rhythms with one hand than with two, the subjects were able to correctly produce the sequence of responses on 31% of the one-hand trials, whereas the comparable figure for two-handed tapping was only 5%.

Klapp et al. (1985) also examined the generality of these temporal constraints. Although the research reviewed above showed that the constraints were not limited to movements involving homologous muscles, the dual-task results might be interpreted as showing that interference occurs at some point in the motor-control system when two responses are to be simultaneously controlled. Klapp et al. (1985) examined whether similar temporal constraints would be apparent in dual tasks in which the response requirements are minimized and the subject must instead monitor two different stimulus patterns. Thus the research questions whether the effects are purely motoric, or whether analogous interference effects can be seen in perception tasks. If, as our correlational work has shown, a common timing process is used in both motor and perceptual tasks, then constraints which arise from this process should be evident in either type of assessment.

The task required the subject to monitor two sequences. One sequence was presented auditorily and the other visually. Each sequence alternated between stimulus-on and stimulus-off segments. The subjects' task was to respond as quickly as possible to the termination of either sequence. Only one sequence was terminated on a given trial. In two of the conditions, the periods of the two sequences were identical but the phase was either identical or shifted. In the third condition, the periods of the two sequences were different. Reaction time to indicate the termination of a sequence incurred a much larger dual-task decrement for the different-period sequences in comparison to either the identical or phase-shifted sequences.
In summary, Klapp has shown a number of conditions in which dual-task limitations arise because the two tasks require the processing of temporal patterns of different periodicity. Timing differences were shown to be a prerequisite for observing these limitations. Specifically, control conditions in which the response or stimulus conditions were similar did not produce comparable amounts of interference when the temporal patterns were of the same period. The constraints are not restricted to movements involving homologous muscle groups. Moreover, Klapp et al. (1985) demonstrated that the constraints are also evident in perceptual tasks. They appear to reflect a more general principle of organization which encompasses any aspect of cognition in which timing is needed.

In contrast to dual-task performance with different stimulus or response periods, Klapp found that phase shifts did not produce interference. Interestingly, the phase shifts need not be to points which might constitute beats of the total period, but can involve more arbitrary offsets as long as the two periods are the same. This observation has not been
entirely supported in other research. Yamanishi, Kawato, and Suzuki (1980) found that identical phases and 180° phase shifts were easier to produce than other phase shifts in two-handed tapping.

We have also utilized dual-task logic in our laboratory. Pokorny (1985) provided a more direct method for studying the interaction of timing in motor and perceptual tasks. Subjects were required to tap out a short sequence of responses. The sequence involved two continuous cycles in which the successive target intervals were 800, 400, 1200, and 800 msec. This sequence was first demonstrated by computer-generated tones, and the subject then attempted to reproduce it by tapping out the eight intervals. On experimental trials, a click pair was presented during the tapping of each of the four 800-msec intervals. One pair was louder than the other three, and one pair was separated by a longer interval. In one condition the subjects made duration judgments by indicating which click pair was separated by the longest interval. In a second condition they judged which pair was loudest, and a third condition required no perceptual decision. No click pairs were presented on control trials.

The results can be summarized quite easily. There was no difference in terms of the magnitude of the dual-task decrement between the two perception tasks. Tapping variability did increase in the dual-task condition, but equally across all three conditions. This increase was found to occur solely in those intervals in which a click pair was presented (the 800-msec intervals) and was accompanied by a systematic lengthening of these intervals. In fact, the effects were identical in the dual-stimulation conditions regardless of whether the subject was required to make a perceptual judgment. Nonetheless, other results of the study indicate that the source of this interference may be due to increased demands on a timing process.
Variations of nontemporal properties of the tones, such as intensity, had no influence on tapping. The mere presence of the tones was sufficient to increase the interval duration. In contrast to this lack of effect of intensity variations, tapping performance did vary as a function of the temporal position of the click pairs within the 800-msec interval. These results suggest that perceptual events are registered in time and do not disrupt the tapping merely because they are distracting. Instead, the degree of disruption depends on their temporal placement. Furthermore, the finding that the same changes were evident even when the tones were ignored emphasizes that this registration process is automatic.

In summary, the experiments reviewed in this section demonstrate some of the constraints which may limit behavior that involves the operation of an internal timing process. The experiments, especially the most recent work of Klapp et al. (1985), provide an alternative methodology for investigating shared characteristics of a timing process which is invoked in either motor or perceptual tasks. The most convincing finding which can be
garnered from these studies is that we are extremely poor at concurrently producing or monitoring two unrelated rhythms. This same principle was noted by Lashley (1951) when he pointed out that rhythm tends to spread out into almost every other concurrent activity. At first glance, this principle may seem to provide an answer to the question raised at the beginning of this section: Is there a single clock or many clocks? The all-encompassing nature of rhythm would appear to favor a single-clock model. However, it is possible that an independent process is the source of the constraint. This hypothesis is implicit in models which assume that timed behavior is the result of oscillators which are entrained in dual-task movements (e.g., Kelso & Scholz, 1985). Note that these models place the constraint within the motor system. The results cited above argue for a more abstract source which is involved in both perceptual and motor functions.

Although the results observed with dual tasks are consistent with a model in which temporal properties of each task interact within a timer, it is possible that the results can be accounted for in another way. Consider the loop model outlined in Fig. 10b. This model involves a local timer which is nested within a larger control system. The timer is activated by some process, and the timing instructions are then relayed to this control system. If the control process were constrained to handle single inputs from the computational subsystems (i.e., the clock), then the timing constraints would not necessarily reflect the operation of a single timing process, but rather a limited control process. The model in Fig. 10b would, of course, have to be modified to account for the perception results in the dual-task experiments. The control system could no longer be viewed as solely involved in movement control, but would require more general capabilities.
Additional research is needed to determine whether the dual-task results reflect interactions within a timer or at a higher level in the control system.

One final, speculative comment on the source of constraint: Suppose the constraint stems from some higher-level control process. It would seem reasonable to assume that such a process would be cortical in nature. If this were the case, one might predict that split-brain patients would show less interference than normal subjects in simultaneously generating two rhythms. In fact, Geschwind and Kaplan (1962) report a patient who had undergone surgery which resulted in extensive severing of the corpus callosum. Although they provide few details, the patient was able to perform repetitive, bimanual movements at different speeds. This result would suggest the operation of two different clocks which are no longer constrained to function with a common output by some higher-order process. When measured independently, the unilateral Parkinson patient reported by Wing et al. (1984) also showed large differences in the clock estimates between her two hands. However, when this subject was asked to tap bimanually, the
two estimates became identical. The bimanual clock estimate fell between those obtained when each hand was tested separately. In this case the process which imposes the temporal constraint is powerful enough to lead to an improvement in dual-task performance.
B. FUNCTIONAL SIMILARITY

One indication of whether a common system underlies performance on two different tasks is whether the tasks respond similarly to the same variables. We have used an analysis of functional similarity between perception and production to investigate timing processes (Keele, Nicoletti, Ivry, & Pokorny, 1987b). Besides providing additional support for a timing module common to perception and production, the analysis of functional relations addresses questions regarding the nature of the timing system. Specifically, is the timer best conceived as a persistent pacemaker or as an interval timer? By persistent pacemaker, we mean an internal timing device which emits a periodic beat in a manner similar to a metronome. Once the beat begins, it serves as a benchmark against which to judge the timing of other events. In contrast, interval timers, such as a stopwatch, contain a circuit that measures time. The circuit can start at arbitrary times to produce the desired signal. As with a stopwatch, an interval timer can be started at any point and can meter out time without being dependent on synchronization with a periodic beat.

Our neurological work favors the conception of an interval timer in which duration is determined by the length of a delay line, as opposed to a timer in which beats arise from an oscillator. To recapitulate, several neurological structures contribute to clock variance when timing variability is decomposed into its underlying constituents in accordance with the Wing and Kristofferson model. This implies that the diverse structures are all nested within a system which constitutes the total clock. If the cerebellum were the source of a periodic signal, then by the Wing and Kristofferson logic, everything after the signal pulse would appear as part of the implementation process. Since that does not occur, we have suggested that the cerebellum acts to vary the delay between input from one part of the circuit and output to another part.
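The pacemaker/interval-timer distinction can be made concrete with two toy schedulers (the function names and parameters are ours, purely illustrative): an interval timer can emit a signal any arbitrary duration after an arbitrary start, whereas a persistent pacemaker is locked to beats at multiples of its period.

```python
import math

def interval_timer_signal(start, duration):
    """Stopwatch-like: start counting at any moment; the signal arrives
    exactly `duration` later, with no reference to any ongoing beat."""
    return start + duration

def pacemaker_signal(start, duration, period):
    """Metronome-like: wait for the next beat (beats fall at multiples of
    `period`), then count off the duration in whole beats."""
    next_beat = math.ceil(start / period) * period
    return next_beat + round(duration / period) * period

# An interval timer started 130 msec into a trial produces a 400-msec
# interval ending at 530 msec; a 100-msec pacemaker must first wait for
# the beat at 200 msec and so does not finish until 600 msec.
```

The behavioral signature is the dead time: only the pacemaker's output depends on where the start time falls relative to the beat cycle, which is the property the experiments below exploit.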
A study by Schulze (1978) raised a serious concern about the delay-line conception and prompted our own studies of the issue. We describe his experiment in some detail, since two of our experiments were essentially replications. Subjects listened to a series of seven tones that formed six intervals. The task was to determine whether all of the intervals were equal or whether some differed. We denote the isochronous sequence t t t t t t, where t is 300 msec. There were three types of sequence in which at least one interval deviated:
1. t t t+ t+ t+ t+, where + means that the interval is longer by 15 msec (Schulze also examined a 10-msec increment and found similar results)
2. t t t+ t t t
3. t t t+ t- t t, where - means that the interval is shorter by 15 msec

The question is which of the three anisochronous sequences (1, 2, or 3) is the easiest to distinguish from the isochronous sequence. According to one theory, if judgments are based on a comparison of successive intervals, Sequence 3 should be the easiest, because there are three points where adjacent intervals differ from one another. Furthermore, one of the comparisons of Sequence 3 involves an especially large difference of 30 msec. Sequence 2 should be next in ease because it contains two points where adjacent intervals differ. Sequence 1 should be hardest because a difference in adjacent intervals occurs at only one place.

A second theory suggests that subjects store a record of the first two intervals, which are always standard, and then compare other intervals to that record. By this theory, Sequence 1 should be easiest and 2 the hardest, because four subsequent intervals differ from the standard in Sequence 1, whereas only one interval differs in Sequence 2. Sequence 3 should be intermediate in difficulty, since two intervals differ from the standard, one in the positive and one in the negative direction.

A third theory posits that the initial three tones, which define the first two intervals, initiate a periodic internal beat that continues. Subsequent tones are then judged by whether they are early or late compared to the internal beat. Note that this theory does not compare intervals as much as it compares the synchrony of events with beats. Like the second theory, it predicts that Sequence 1 should be easiest. The reason is that the first increment puts the fourth tone 15 msec out of synchrony with the internal beat. The second increment increases the asynchrony to 30 msec.
The third increment increases the asynchrony to 45 msec, and the final one to 60 msec. Note that this theory assumes that tones subsequent to the initial beat-setting tones do not influence the beat and instead are simply compared to the beat for synchrony.

In contrast to the second theory, which involves the establishment of a standard interval for comparison, the beat theory predicts Sequence 3 to be more difficult than Sequence 2. For Sequence 2, the first increment produces a 15-msec asynchrony with the internal beat, and that asynchrony is maintained with subsequent tones. For Sequence 3, the initial asynchrony introduced by the incremented interval is canceled by the subsequent decremented interval.

Schulze's results favored the third theory, the internal beat model, because Sequence 1 was the easiest to discriminate from all equal intervals, followed by Sequence 2 and then Sequence 3. However, the difference between Sequences 2 and 3 was marginal from a statistical viewpoint.
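The beat-theory predictions above are easy to tabulate. The sketch below (our illustration, not Schulze's analysis) accumulates tone-onset times for each sequence and compares them with internal beats every 300 msec; the predicted discriminability ordering follows from the asynchronies each sequence produces.

```python
def beat_asynchronies(intervals, beat=300):
    """Tone-onset times minus the corresponding scheduled internal beat,
    assuming the beat is fixed by the initial standard intervals and is
    not adjusted by later tones."""
    offsets, tone_time = [], 0
    for k, iv in enumerate(intervals, start=1):
        tone_time += iv
        offsets.append(tone_time - k * beat)
    return offsets

seq1 = [300, 300, 315, 315, 315, 315]  # t t t+ t+ t+ t+
seq2 = [300, 300, 315, 300, 300, 300]  # t t t+ t t t
seq3 = [300, 300, 315, 285, 300, 300]  # t t t+ t- t t
```

Sequence 1 drifts out to a 60-msec asynchrony, Sequence 2 holds at 15 msec, and Sequence 3's asynchrony is canceled after the fourth tone, which is why the beat theory ranks Sequence 1 easiest and Sequence 3 hardest.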
If the internal beat theory were correct, it would be consistent with the notion of a local circuit or system that oscillates, producing a periodic output. The only way in which the beat theory would be consistent with our notion of a timer involving multiple neural structures would be if the total circuit through those structures reverberated periodically.

Two experiments that we conducted to further investigate Schulze's findings involved his basic paradigm. The first experiment had one sequence with isochronous intervals of 300 msec. Three other sequences were as designated above for Sequences 1, 2, and 3, except that any increments or decrements were 20 msec. Again, subjects were asked to judge whether all the intervals were equal. This paradigm essentially replicates Schulze's. However, the results yielded a critical difference. In our experiment, Sequence 1 was the easiest, Sequence 3 was next easiest, and Sequence 2 was the hardest. All differences were statistically significant (recall that in Schulze's study the difference between Conditions 2 and 3 was not clearly reliable, although the data suggested that Sequence 2 was easier than Sequence 3). Our ordering of results is consistent, not with the beat theory, but with the theory that suggests that a standard time interval is stored and then compared to others.

It seems surprising that the contrast between the incremented and decremented intervals in Sequence 3 does not make discrimination easy. Nicoletti and Keele (1987) showed that if increments or decrements are in the intensity or pitch of the tones, then such a contrast is readily apparent, and Sequence 3 is indeed the easiest condition for appreciating differences. Adjacent differences in times, on the other hand, do not appear to have a special status.

Since our study had a different outcome from Schulze's, further replication was needed. Our second experiment, which incorporated one important change, was undertaken for this purpose.
In this experiment, as in the earlier one, the first three tones were separated by 300 msec, providing the opportunity to set up an internal beat. After the first three tones there was a pause before the remaining train of tones. On some trials the pause was 540 msec, so that the next tone would occur 60 msec before a hypothetical internal beat. On other trials the pause was 660 msec, so that the next tone would occur 60 msec after an internal beat. After the pause, all the remaining intervals were either 300 msec or altered by 25 msec. They were all incremented by 25 msec on Sequence 1 trials. Only the first interval was incremented on Sequence 2 trials, and the first was incremented and the second decremented for Sequence 3. Note that for Sequence 1, when the pause was 540 msec, the asynchrony between an internal beat and a tone would decrease with each interval until there would be a slight reversal in the direction of asynchrony on the final tone. In contrast, with the 660-msec pause the asynchrony should grow continuously. Thus, the internal beat
Modular Analysis of Timing in Motor Skill
theory predicts that Sequence 1 would be difficult to discriminate from equality with the pause of 540 msec but easy with the pause of 660 msec. Again note that the theory assumes that after the beat is established, subsequent tones do not alter it. If this assumption is not granted, the internal beat theory cannot generate predictions at all. The results of the study completely replicated those of our first study: Sequence 1 was easiest, followed by Sequence 3, and then Sequence 2. Again, all differences were statistically significant. Moreover, there was no difference in ease of the Sequence 1 conditions with the short or long pause. Clearly, our results are not in agreement with a theory that posits an internal beat that persists after an initial train of periodic events and serves as a basis for temporal judgments. Instead, they are consistent with the view, which grew out of our neurological studies, that the cerebellum inserts a delay line in a circuit linking several brain areas. We have also employed the concept of functional similarity to see whether conclusions that apply to pure perceptual timing also apply in instances involving motor production. Our two studies reported above involve only perceptual events with no timed motor activity. In a third experiment in the Keele et al. (1987b) study, subjects listened to a train of five tones occurring either every 400 msec on some trials or every 450 msec on other trials. Either 500 or 550 msec after the onset of the last tone the word TAP appeared on an oscilloscope screen. The subjects would then make two taps, attempting to reproduce the target interval. Although we studied a variety of conditions, the one of concern here required the subjects to begin the first tap as soon as possible after the signal to tap. Suppose that the initial tones set up an internal beat that was used to trigger the taps.
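The asynchrony argument for Sequence 1 in the pause experiment reduces to simple arithmetic, sketched below; the number of tones after the pause is an illustrative assumption:

```python
BASE = 300  # msec; beat period set up by the first three tones (onsets 0, 300, 600)

def beat_asynchronies(pause, interval=325, n_tones=4):
    """Tone onset minus nearest hypothetical beat, for the tones after the pause,
    assuming the internal beat persists unaltered at a 300-msec period."""
    origin = 600                 # onset of the third tone; beats continue from here
    t = origin + pause           # onset of the first tone after the pause
    tones = [t]
    for _ in range(n_tones - 1):
        t += interval            # Sequence 1: every interval incremented by 25 msec
        tones.append(t)
    nearest = lambda x: origin + round((x - origin) / BASE) * BASE
    return [tone - nearest(tone) for tone in tones]

print(beat_asynchronies(540))  # [-60, -35, -10, 15]: shrinks, then reverses at the end
print(beat_asynchronies(660))  # [60, 85, 110, 135]: grows continuously
```

Under the beat theory, the 540-msec pause should therefore make Sequence 1 hard to distinguish from equality while the 660-msec pause should make it easy; the absence of any such difference favors the interval-timer account.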
This would mean that the reaction time to begin the first tap would depend on the amount of time between the signal to tap and the onset of the next internal beat. Thus, reaction times would differ between the 400- and 450-msec paces, since with a 450-msec pace subjects would have to wait longer for the next internal beat. Also, the reaction times would differ between the 500- and 550-msec delays between the last tone and the tap signal, because with the longer delay, subjects would have less time before the next internal beat. Finally, one might expect to find a multimodal reaction-time distribution, because if there was not enough time to prepare a response by the time of the first internal beat, the subject would then have to wait for the next one. Under the instructions to begin tapping as soon as possible, none of the predictions that follow from the internal beat theory was borne out. Instead it appears that subjects can begin the timing of the response at arbitrary times after the end of a previous train of events. It is as though subjects extract the interval from the periodic train and can then reproduce that interval starting at arbitrary points. This is consistent with an interval timer rather
Steven W. Keele and Richard I. Ivry
than a persistent pacemaker. It does not appear, therefore, that periodic events entrain an oscillator which then persists after the periodic stimulus events. Such a conclusion is consistent with our analyses of neurological patients. With regard to the issue of modularity, the fact that we obtain functional similarity in the results involving purely perceptual decisions and those involving the translation of perceptual timing into motor production suggests that the same timing module is used in both perceptual and motor timing.
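The pacemaker account's reaction-time predictions for the tapping experiment can likewise be worked out directly; the sketch below computes the predicted waits only, not the data:

```python
import math

def pacemaker_wait(pace, delay):
    """Predicted wait (msec) from the tap signal to the next hypothetical beat,
    assuming beats continue at the tone pace after the last tone at t = 0."""
    return math.ceil(delay / pace) * pace - delay

# A persistent pacemaker predicts different reaction times across the
# 400- vs. 450-msec paces and across the 500- vs. 550-msec signal delays:
for pace in (400, 450):
    for delay in (500, 550):
        print(pace, delay, pacemaker_wait(pace, delay))
```

The predicted waits (300, 250, 400, and 350 msec) differ with both pace and delay, yet the observed reaction times showed neither difference, which is what rules out an entrained, persistent oscillator.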
VIII. Conclusions

Two themes have dominated this article, a general one and a more specific one. The general theme concerns whether the motor control system is composed of independent modules in which each module performs specialized computations. Although a given module is specialized, its domain of operation is quite broad in the sense that it can be drawn on by various effectors and is employed in nonmotor activities as well. The second, more specific, focus of this article is on the hypothesized existence of a timing module, with subsidiary discussion of force control and maximum rate of movement. On the whole, our data argue strongly for a timing module that is separable from a force module. Indirectly the data also support the utility of the general modular approach. The modular hypothesis presented in this article may appear obvious and perhaps naive. Indeed, is there really an alternative to a modular view of brain organization? Framed in the terms of our original examples of piano playing and gymnastics, would anyone seriously argue that one section of the brain becomes organized specifically for one task to the exclusion of the other? In fact, a task organization viewpoint is a common one, at least for speech. It often has been suggested that parts of the human brain have evolved for language and speech. Recent work by Poizner, Klima, and Bellugi (1987) has suggested, however, that localization of some language functions in humans cannot be attributed to the evolution of specialized apparatus for speech. They examined deaf patients who had been proficient in American Sign Language prior to cerebral damage. Following stroke, these patients showed disruptions of language analogous to Wernicke's aphasia (a decline of semantic content) and Broca's aphasia (a decline in syntactic content) in speaking patients.
Moreover, there is a rough correspondence of the locations of areas causing the two distinct pathologies of Broca’s and Wernicke’s aphasia in the speaking and deaf subjects. While the results of Poizner and colleagues do not argue against brain areas specialized for language, they do argue that such areas are not tied to a particular output modality. Thus, their work supports a conclusion at least consistent with a modular view.
In addition to speech, it sometimes has been suggested that other parts of the brain have become specialized for other specific domains of behavior. Rozin (1976) speculated that in initial evolutionary development, brain systems arose for very specific and prewired purposes. Visual properties that signal particular foods, for example, may become linked in the course of evolution to certain food-related behaviors. The idea of specialized adaptive systems is frequently championed by ethologists. However, Rozin proposes that the fundamental achievement of human evolution involves a liberation of various systems from their initial purposes so that they can become accessible to other systems. One such system, he argues, is a phonetic module, which has become accessible not only to auditory input in speech but also to visual input in the act of reading. His idea of systems accessible to various inputs and outputs is similar to our modular one, and we believe these ideas address a genuine issue. We have been primarily concerned in our research with a timing module. Although extensive research has concerned itself with the perception of long periods of time, such as minutes, hours, or days, relatively little work has been done on timing in a range relevant to the control of motor activity. Within this shorter time range, the issue of whether a modular timing system might underlie the control of different effectors and perception has received scant attention (but see Wing, 1980). Neuroscientists, however, have long recognized the possibility that specific neural systems may have evolved to control timing functions. In fact, the architecture of the cerebellum (Braitenberg, 1967), effects of cerebellar cooling (Conrad & Brooks, 1974), and the clinical signs of cerebellar dysfunction (Dichgans & Diener, 1984) have pointed to timing as one function of this neural structure. Nonetheless, experimental evaluation of the timing hypothesis has been notably missing.
One problem has been the lack of a suitable methodology for differentiating implementation and clock contributions to timing. The Wing and Kristofferson model provides such a methodology. In addition, no one concerned with the issue of brain systems involved in timing seems to have asked whether the same system used for motor timing also provides the mechanism for perceptual timing. We hope that our work implicating a variety of neural systems, and the cerebellum specifically, in timing will encourage new work at the physiological level. Finally, work on individual differences in skill appears to have reached a low ebb. We suspect that a reason for this state of affairs is that the study of individual differences has too often taken place outside the guiding framework of cognitive psychology, with its strong contributions of theory and method. In addition, a knowledge-base (i.e., task-specific) conception of individual differences has been very successful, and perhaps that success led to neglect of the analysis of abilities. A modular analysis of skill not only makes use of individual differences as a method for defining modules but may be instrumental in bolstering the study of individual differences itself.
ACKNOWLEDGMENTS

We wish to thank Daniel Corcos, Bettina Debu, Diane Manchester, Ik Soo Moon, Roberto Nicoletti, and Robert Pokorny for their extensive contributions to the research reported in this article. The generous help of Dr. Chris Diener and Dr. Robert Rafal in obtaining and evaluating patients was invaluable. In addition, the research funds provided by National Science Foundation Grant BNS8119274 and Office of Naval Research Contract N00014-83-K-0601 are gratefully acknowledged.
REFERENCES

Ackerman, P. L., Schneider, W., & Wickens, C. D. (1984). Deciding the existence of a time-sharing ability: A combined methodological and theoretical approach. Human Factors, 26, 71-82.
Allan, L. G. (1979). The perception of time. Perception and Psychophysics, 26, 340-354.
Allard, F., Graham, S., & Paarsalu, M. E. (1980). Perception in sport: Basketball. Journal of Sport Psychology, 2, 14-21.
Allen, G., & Tsukahara, N. (1974). Cerebrocerebellar communication systems. Physiological Reviews, 54, 957-1006.
Book, W. F. (1924). Voluntary motor ability of the world's champion typists. Journal of Applied Psychology, 8, 283-308.
Braitenberg, V. (1967). Is the cerebellar cortex a biological clock in the millisecond range? Progress in Brain Research, 25, 334-336.
Brooks, V., & Thach, W. (1982). Cerebellar control of posture and movement. In V. Brooks (Ed.), Handbook of physiology: Motor control. Washington, DC: American Physiological Society.
Chase, W. G., & Simon, H. A. (1973). The mind's eye in chess. In W. G. Chase (Ed.), Visual information processing. New York: Academic Press.
Cheatham, P. G., & White, C. T. (1954). Temporal numerosity: III. Auditory perception of numbers. Journal of Experimental Psychology, 47, 425-428.
Conrad, B., & Brooks, V. (1974). Effects of dentate cooling on rapid alternating arm movements. Journal of Neurophysiology, 37, 792-804.
Corcos, D. M., Keele, S. W., Woollacott, M. H., & Pokorny, R. (1984). EMG analysis of rapid reciprocal movements of the arm. Unpublished manuscript presented at the Olympic Scientific Congress, University of Oregon.
Delong, M., & Georgopoulos, A. (1981). Motor functions of the basal ganglia. In V. Brooks (Ed.), Handbook of physiology: Motor control. Washington, DC: American Physiological Society.
Dichgans, J., & Diener, H. (1984). Clinical evidence for functional compartmentalization of the cerebellum. In J. Bloedel, J. Dichgans, & W. Precht (Eds.), Cerebellar functions (pp. 126-147). Berlin: Springer-Verlag.
Ericsson, K. A., & Chase, W. G. (1982). Exceptional memory. American Scientist, 70, 607-615.
Fitts, P. M., & Peterson, J. R. (1964). Information capacity of discrete motor responses. Journal of Experimental Psychology, 67, 103-112.
Fleishman, E. A. (1966). Human abilities and the acquisition of skill. In E. A. Bilodeau (Ed.), Acquisition of skill. New York: Academic Press.
Fleishman, E. A., & Rich, S. (1963). Role of kinesthetic and spatial-visual abilities in perceptual-motor learning. Journal of Experimental Psychology, 66, 6-11.
Freund, H. J. (1983). Motor unit and muscle activity in voluntary motor control. Physiological Reviews, 63, 387-436.
Gentner, D. R. (1981). Skilled finger movements in typing. La Jolla, CA: University of California, San Diego, Center for Human Information Processing (CHIP Report 104).
Geschwind, N., & Kaplan, E. (1962). A human cerebral deconnection syndrome. Neurology, 12, 675-685.
Gibbon, J., & Allan, L. (Eds.). (1984). Timing and time perception. Annals of the New York Academy of Sciences, 423.
Goldberg, G. (1985). Supplementary motor area structure and function: Review and hypotheses. Behavioral and Brain Sciences, 8, 567-616.
Grillner, S. (1981). Control of locomotion in bipeds, tetrapods, and fish. In V. Brooks (Ed.), Handbook of physiology: Section 1: The nervous system, Vol. II: Motor control, Part 2. Baltimore: American Physiological Society.
Heglund, N. C., Taylor, C. R., & McMahon, T. A. (1974). Scaling stride frequency and gait to animal size: Mice to horses. Science, 186, 1112-1113.
Hick, W. E. (1952). On the rate of gain of information. Quarterly Journal of Experimental Psychology, 4, 11-26.
Hyman, R. (1953). Stimulus information as a determinant of reaction time. Journal of Experimental Psychology, 45, 188-196.
Ivry, R. (1986). Components of coordination: The cerebellum as an internal clock. Unpublished doctoral dissertation, University of Oregon, Eugene.
Jones, M. B. (1966). Individual differences. In E. A. Bilodeau (Ed.), Acquisition of skill (pp. 112-144). New York: Academic Press.
Keele, S. W. (1981). Behavioral analysis of movement. In V. Brooks (Ed.), Handbook of physiology: Section 1: The nervous system, Vol. II: Motor control, Part 2. Baltimore: American Physiological Society.
Keele, S. W. (1986). Motor control. In K. R. Boff, L. Kaufman, & J. P. Thomas (Eds.), Handbook of perception and human performance, Vol. II. New York: Wiley.
Keele, S. W., & Hawkins, H. L. (1982). Explorations of individual differences relevant to high level skill. Journal of Motor Behavior, 14, 3-23.
Keele, S. W., Ivry, R., & Pokorny, R. (1987a). Force control and its relation to timing. Journal of Motor Behavior, 19, 96-114.
Keele, S. W., Nicoletti, R., Ivry, R., & Pokorny, R. (1987b). The nature of human timing in perceptual judgments and motor production: Pacemaker or interval timer. Submitted.
Keele, S. W., Pokorny, R. A., Corcos, D. M., & Ivry, R. (1985). Do perception and motor production share common timing mechanisms: A correlational analysis. Acta Psychologica, 60, 173-191.
Kelso, J., & Scholz, J. (1985). Cooperative phenomena in biological motion. In H. Haken (Ed.), Synergetics of complex systems in physics, chemistry and biology. Berlin: Springer-Verlag.
Klapp, S. (1979). Doing two things at once: The role of temporal compatibility. Memory & Cognition, 7, 375-381.
Klapp, S. (1981). Temporal compatibility in dual motor tasks. II: Simultaneous articulation and hand movements. Memory & Cognition, 9, 398-401.
Klapp, S., Hill, M., Tyler, J., Martin, Z., Jagacinski, R., & Jones, M. (1985). On marching to two different drummers: Perceptual aspects of the difficulties. Journal of Experimental Psychology: Human Perception & Performance, 11, 814-828.
Kolers, P., & Brewster, J. (1985). Rhythms and responses. Journal of Experimental Psychology: Human Perception & Performance, 11, 150-167.
Lashley, K. (1951). The problem of serial order in behavior. In L. Jeffress (Ed.), Cerebral mechanisms in behavior. New York: Wiley.
Marteniuk, R. G. (1974). Individual differences in motor performance and learning. In J. H. Wilmore (Ed.), Exercise and sport sciences reviews, Vol. 2. New York: Academic Press.
Michon, J. A. (1967). Timing in temporal tracking. Soesterberg, The Netherlands: Institute for Perception-TNO.
Michon, J. A., & Jackson, J. L. (Eds.). (1985). Time, mind, and behavior. Berlin: Springer-Verlag.
Newell, A., & Rosenbloom, P. S. (1981). Mechanisms of skill acquisition and the law of practice. In J. R. Anderson (Ed.), Cognitive skills and their acquisition. Hillsdale, NJ: Erlbaum.
Nicholson, T. E. (1925). Increase and decline in speed and control of voluntary motor behavior. Unpublished doctoral dissertation, Indiana University, Bloomington.
Nicoletti, R., & Keele, S. W. (1987). Confronto di differenti modelli sul riconoscimento di sequenze di intervalli temporali. Submitted.
Poizner, H., Klima, E. S., & Bellugi, U. (1987). What the hands reveal about the brain. Cambridge, MA: Bradford/MIT Press.
Pokorny, R. (1985). Searching for interaction between timing of motor tasks and timing of perception tasks. Unpublished doctoral dissertation, University of Oregon, Eugene.
Posner, M. I. (1978). Chronometric explorations of mind. Hillsdale, NJ: Erlbaum.
Posner, M. I., Inhoff, A. W., Friedrich, F. J., & Cohen, A. (1987). Isolating attention systems: A cognitive-anatomical analysis. Psychobiology, 15, 107-121.
Rozin, P. (1976). The evolution of intelligence and access to the cognitive unconscious. In J. M. Sprague & A. N. Epstein (Eds.), Progress in psychobiology and physiological psychology (pp. 245-280). New York: Academic Press.
Schulze, H. H. (1978). The detectability of local and global displacements in regular rhythmic patterns. Psychological Research, 40, 173-181.
Smith, O. W. (1957). Relationship of rhythm discrimination to motor rhythm performance. Journal of Applied Psychology, 41, 365-369.
Warren, R. M., & Obusek, C. J. (1972). Identification of temporal order within auditory sequences. Perception & Psychophysics, 12, 86-90.
Wing, A. M. (1977). Perturbations of auditory feedback delay and the timing of movement. Journal of Experimental Psychology: Human Perception and Performance, 3, 175-186.
Wing, A. M. (1980). The long and short of timing in motor sequences. In G. E. Stelmach & J. Requin (Eds.), Tutorials in motor behavior. Amsterdam: North-Holland.
Wing, A. M., Keele, S. W., & Margolin, D. (1984). Motor disorder and the timing of repetitive movements. In J. Gibbon & L. Allan (Eds.), Timing and time perception. Annals of the New York Academy of Sciences, 423.
Wing, A. M., & Kristofferson, A. B. (1973a). Response delays and the timing of discrete motor responses. Perception and Psychophysics, 14, 5-12.
Wing, A. M., & Kristofferson, A. B. (1973b). The timing of interresponse intervals. Perception and Psychophysics, 13, 455-460.
Woodrow, H. (1951). Time perception. In S. S. Stevens (Ed.), Handbook of experimental psychology (pp. 1224-1236). New York: Wiley.
Yamanishi, J., Kawato, M., & Suzuki, R. (1980). Two coupled oscillators as a model for the coordinated finger tapping by both hands. Biological Cybernetics, 37, 219-225.
ASSOCIATIVE ACCOUNTS OF CAUSALITY JUDGMENT

David R. Shanks
MRC APPLIED PSYCHOLOGY UNIT
15 CHAUCER ROAD
CAMBRIDGE CB2 2EF, ENGLAND
Anthony Dickinson
DEPARTMENT OF EXPERIMENTAL PSYCHOLOGY
CAMBRIDGE UNIVERSITY
CAMBRIDGE CB2 3EB, ENGLAND

I. Introduction
II. Contiguity and Contingency
III. Acquisition of Causality Judgments
IV. Blocking by the Causal Background
V. Retrospective Evaluation
VI. The Comparator Theory
VII. Conclusion
References
I. Introduction

Few would dispute that the ability to detect and learn about what Tolman and Brunswik (1935) called the "causal texture of the environment" is a major prerequisite for adaptive behavior. Not only do prediction and control of important events depend upon sensitivity to this texture, but also our causal attributions representing its structure are claimed to be major determinants of affective experience (Abramson, Seligman, & Teasdale, 1978; Weiner, 1985). In spite of the ramifications of causality detection throughout psychology, there is, at present, little consensus about the mental faculties involved in such attributions. As a result of his classic studies of mechanical causation, Michotte (1963) argued that perceptual processes could mediate a direct phenomenological experience of causation. On the other hand, contemporary attribution theory appeals largely to the role of inferential and rule-governed judgments (e.g., Hilton & Slugoski, 1986; Kelley, 1973), whereas others have emphasized the importance of knowledge of the generative or productive processes involved in causal interactions (e.g., Shultz, 1982).
Copyright © 1987 by Academic Press, Inc. All rights of reproduction in any form reserved.
David R. Shanks and Anthony Dickinson
These perceptual and cognitive approaches to causal attribution can be contrasted with a more venerable tradition, that of associationism. This perspective may be traced to Hume's (1888) claim that "there is no relation, which produces a stronger connexion in the fancy, and makes one idea more readily recall another, than the relation of cause and effect betwixt their objects" (p. 11). According to Hume, a causal judgment is seen as reflecting no more than the strength of the relevant association between the mental representations of the cause and effect, with the principles governing such attributions being those of associative learning. There is no doubt that Hume's thoughts on this issue have had a major impact on both philosophy and psychology. His analysis acts as the starting point for most subsequent discussions of the ontological status of causation, while associationism in general represents a major theoretical perspective in the study of learning and memory. Even so, the possible contribution of associative processes to causal attribution has received relatively little attention, and our purpose in this article is to go some way toward remedying this neglect. To the best of our knowledge, the only area of psychology that has offered an associative account of a process sensitive to causality is that of conditioning. An instrumental or operant conditioning procedure presents the subject with a causal relationship between an action and an outcome, the reinforcer; performing the action either causes the reinforcer to occur under a positive contingency or prevents its occurrence under a negative one, and the subjects demonstrate a sensitivity to these causal relationships by adjusting their behavior appropriately.
In fact, not only can conditioning be seen as a causality detection task from a procedural point of view, it has also been argued that the mechanism controlling conditioning is sensitive specifically to causal relationships (e.g., Dickinson, 1980; Testa, 1974). Although certain theorists have interpreted conditioning in terms of the application of some concept of contingency (see Alloy & Tabachnik, 1984; Hammond & Paynter, 1983), the dominant theoretical approaches appeal to the role of associative learning (e.g., Gibbon & Balsam, 1981; Mackintosh, 1975; Pearce & Hall, 1980; Rescorla & Wagner, 1972; Sutherland & Mackintosh, 1971; Wagner, 1981). Most of these associative theories were developed to explain classical or Pavlovian conditioning rather than the instrumental or operant variety, but there are good reasons for assuming that the two types of conditioning are mediated by a common learning process (see Dickinson, 1980; Mackintosh, 1983). The specific question that we address is whether the same form of associative learning can also be extended to cover causality judgment.

II. Contiguity and Contingency
In order to extend these conditioning theories to causal attribution, all
one has to assume is that judgments of event relationships are also determined by associative processes. Although this assumption is bolstered by the correlation that is generally observed between conditioning and knowledge of the relevant event relationships (Brewer, 1974), it seems to us to be important to establish at the outset that causal judgments are affected by the factors critical for the type of associative learning seen in conditioning. Contemporary conditioning theories essentially follow Hume in identifying temporal succession and contiguity as being fundamental; conditioning reflects the accumulation of increments in the strength of the association between event representations, with each increment arising from a pairing in which the conditional stimulus or action precedes the reinforcer in time. It is true that, unlike classic learning theories (e.g., Hull, 1943), contemporary accounts are usually silent about the actual interevent interval over which an association can be formed, but all argue that the size of the increment in associative strength accruing from a pairing decreases as the contiguity is degraded. Although the importance of temporal relationships is well established in the detection of mechanical causation (e.g., Gruber, Fink, & Damm, 1957; Leslie, 1982; Michotte, 1963), there is only limited evidence (e.g., Siegler & Liebert, 1974; Wasserman & Neunaber, 1986) for their role in judgments of event interactions in which the causal or generative process is not directly apparent to the subject. This is the type of causal interaction that is usually implemented by an operant contingency and therefore the one for which associative conditioning theory may have most relevance. Consequently, it seemed to us to be of importance to establish at the outset that, like conditioning, causal judgments of simple instrumental relationships are sensitive to temporal contiguity.
To do this, we arranged a free-operant contingency between an action (A) of pressing the space bar on the computer keyboard and an outcome (O), the flashing of a triangle displayed on a video screen for 0.1 sec. We used a probabilistic schedule of a type known to sustain instrumental performance in animals (e.g., Hammond, 1980; Dickinson & Charnock, 1985). This schedule arranges that whenever the subject presses the space bar at least once during any 1-sec period, the outcome occurs with some fixed probability; in the absence of a press the outcome never occurs. Each subject received a randomly ordered series of conditions, each of which ended after 25 presses had occurred. In one condition, 75/0, the outcomes were presented at the end of any 1-sec interval in which a press had occurred, so that there was good contiguity between the action and outcome. In a second condition, 75/0(D), by contrast, the outcome was delayed for 2 sec after the end of any 1-sec period in which the subject pressed the space bar. These two conditions were similar, therefore, in terms of the actual causal effect of pressing; in both cases, performing the action caused the outcome with a probability of .75. What differed was the temporal delay between the action
and outcome, which was 1 sec or less in the first condition but at least 2 sec in the other. At the outset of the experiment the subjects were told that they would be required to judge the extent to which the action caused the outcome, and at the end of each condition they were asked to rate this on a scale from 0 to 100. Zero represented the case in which the action was totally ineffective, and 100 the case in which the action always caused the outcome. As Fig. 1 shows, introducing a delay between the action and the outcome in condition 75/0(D) significantly reduced the ratings relative to condition 75/0, in which they were contiguous (Newman-Keuls, p < .05). This sensitivity encouraged us to take seriously the possibility that an associative mechanism may mediate this type of causality judgment. There is, however, a major problem with causal attributions based simply
Fig. 1. Mean causality judgment and actual contingency under a positive contingency with either an immediate (75/0) or delayed (75/0(D)) outcome and under a noncontingent schedule (75/75).
upon contiguous pairings of cause and effect. A simple contiguity-sensitive mechanism would not distinguish pairings in which the putative cause is necessary for the effect from those in which the conjunction is fortuitous or accidental. As Hume himself pointed out, contiguity alone provides no evidence about the necessity of a cause, a criterion that is central to the contemporary counterfactual concept of causation (Mackie, 1974). This point can be illustrated by the schematic presentation in Fig. 2 of two sequences of a target cause and outcome in the presence of what we call the causal background. The significance of this causal background is discussed in a later section. As both sequences contain two contiguous pairings of the events, judgments based simply upon the number of such pairings should yield the same rating for the effectiveness of the target cause. And yet there is no reason to believe that the target cause is necessary for the outcome in Sequence b; in contrast to Sequence a, the outcome in Sequence b is just as likely to occur in the absence of the target cause as in its presence. What distinguishes the two sequences is, of course, the contingency, covariation, or temporal correlation between the events; in Sequence a there is a positive contingency between the target cause and outcome, whereas
Fig. 2. Sequence of events (causal background, target cause, and outcome) under a positive contingency (a) and under a noncontingent relationship (b).
these events are independent or noncontingent in Sequence b. There is, in fact, good evidence that judgments of the causal effectiveness of instrumental actions are sensitive to event contingencies (e.g., Allan & Jenkins, 1980, 1983; Alloy & Abramson, 1979; Chatlosh, Neunaber, & Wasserman, 1985; Neunaber & Wasserman, 1986; Wasserman, Chatlosh, & Neunaber, 1983), a point that can be illustrated by a third condition run in our free-operant experiment. In this 75/75 condition, presses were followed by contiguous outcomes with a probability of .75, just as in the 75/0 condition. The only difference was that in the 75/75 condition outcomes also occurred with the same probability at the end of each 1-sec interval without a press. As all subjects received the same number (25 × .75) of contiguous pairings of the action with the outcome in both of these conditions, a simple contiguity-sensitive process should have yielded identical judgments in the two cases even though pressing had no effect on the likelihood of the outcome in condition 75/75. In fact, as Fig. 1 shows, our subjects were sensitive to the instrumental contingency when the number of pairings was controlled: the 75/75 schedule produced significantly lower ratings than the 75/0 condition (Newman-Keuls test, p < .05). The standard way of analyzing the sensitivity of causality judgments to event contingency attempts to determine the metric that gives the best account of the ratings produced by varying the instrumental schedule.
Thus, for example, in a series of experiments employing a free-operant procedure upon which ours was modeled, Wasserman and his colleagues (Chatlosh et al., 1985; Neunaber & Wasserman, 1986; Wasserman et al., 1983) demonstrated that their subjects' judgments conformed to the contingency metric ΔP (Allan, 1980), given by the difference between the conditional probability of the outcome given the action, P(O/A), and the conditional probability of the outcome in the absence of the action, P(O/-A). As a result, Neunaber and Wasserman (1986) have concluded that "our prior free-operant work indicates that subjects' highly accurate and unbiased estimates are best attributed to their use of a proper judgment rule" (p. 164). This explanation is clearly in line with cognitive or rule-based accounts of causality judgments offered in other areas of attribution research and at variance with an associative account based upon conditioning theory. The way in which such a cognitive account would accommodate the effect of temporal contiguity is somewhat more problematic. Presumably delaying an outcome makes it possible that the subjects will classify this outcome as being one that occurred in the absence of the action rather than in conjunction with it. This classification serves to decrease the subjects' estimates of ΔP both by reducing their perceived value of P(O/A) and by enhancing that of P(O/-A). As an outcome must be classified as occurring in either conjunction or disjunction with the action, a change in one conditional probability implies a change in the other. This fact provided us with a way
Associative Accounts of Causality Judgment
of testing whether or not the effect of a change in temporal contiguity was in fact mediated by such an alteration in the perceived contingency. To do this, we asked subjects to judge the causal effectiveness of each of four actions, A1, A2, A3, and A4, in producing the same outcome, the flashing triangle. The actions were pressing different keys on the keyboard, with assignment of specific keys to the different actions being counterbalanced across subjects. In the control condition, the subjects were simply asked to alternate between A1 and A2, both of which produced the outcome immediately on a free-operant schedule under which P(O/A) was .75 and P(O/-A) was .25. As Fig. 3 shows, when the subjects were asked to judge the causal effectiveness of each action after 240-sec exposure to the schedule, they gave ratings very close to the actual ΔP for both A1 and A2. In the second experimental condition, the subjects were also asked to alternate between two actions, in this case A3 and A4, which were associated with the outcome on exactly the same schedule as A1 and A2, except that outcomes produced by A4 were delayed for 4 sec. This produced the expected drop in the ratings for A4 below those for the control actions, A1 and A2 (see Fig. 3). This replicates the effect of temporal contiguity seen in the first study. The interesting question, however, concerns the judgments for A3, which produced immediate outcomes in the context of delayed outcomes generated by A4. If the effect of contiguity is mediated by a change in the perceived contingency, we should expect to have observed a decrement in the judgments not only for A4 but also for A3. According to this account, delaying the outcome for A4 does not just decrease the perceived P(O/A4) but also correspondingly increases P(O/-A4).
As these delayed outcomes were unlikely to have occurred in close association with A3, they should also have served to increment P(O/ -A3), thus reducing the perceived contingency for A3 as well as A4. Figure 3 shows that this did not happen; the decremental effect of degrading the temporal contiguity between A4 and the outcome appeared to be confined to this action, with A3 yielding ratings at least as high as the control actions, A1 and A2. Thus, it appears that temporal contiguity can have a direct influence on causality judgment over and above any effect it has on the perceived contingency. This conclusion is clearly in greater accord with the operation of an associative process, directly sensitive to temporal factors, than with a cognitive approach in which such factors just affect the classification of outcomes. Moreover, our doubts about a rule-based account are not restricted to the effect of temporal contiguity alone for, as we shall see, it remains unclear that such an approach encompasses even the direct effects of event contingency.
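The arithmetic behind this reclassification argument can be sketched as follows (the counts are invented by us purely for illustration; only the direction of the predicted changes matters):

```python
# Hedged illustration: why reclassifying delayed outcomes as "unpaired"
# should, on the Delta-P account, lower the perceived contingency for
# A3 as well as A4. All counts are hypothetical.

def dp(paired, unpaired, n_action, n_no_action):
    """Delta-P = P(O|A) - P(O|~A) from raw counts."""
    return paired / n_action - unpaired / n_no_action

# Veridical classification: each action has 30 paired outcomes in 40
# action intervals and 10 unpaired outcomes in 40 no-action intervals.
print(dp(30, 10, 40, 40))            # 0.5 for both A3 and A4

# Suppose 12 of A4's delayed outcomes are reclassified as unpaired:
print(dp(30 - 12, 10 + 12, 40, 40))  # A4's perceived Delta-P falls to -0.1

# Those same 12 outcomes occur in the absence of A3, so they should also
# inflate P(O|~A3), dragging A3's perceived Delta-P down as well:
print(dp(30, 10 + 12, 40, 40))       # A3's perceived Delta-P falls to 0.2
```

The data in Fig. 3 contradict the third line: A3's ratings did not fall, which is the crux of the argument against a purely classificational account of the contiguity effect.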
David R. Shanks and Anthony Dickinson
Fig. 3. Mean causality judgment and actual contingency for Actions 1 to 4. Actions 1 and 2 were performed concurrently, as were Actions 3 and 4. Although all actions were related to the outcome by the same positive contingency, the occurrence of the outcome was delayed after Action 4 but occurred immediately following the other actions.
III. Acquisition of Causality Judgments

Although a cognitive account based upon the ΔP rule can account for the sensitivity to event contingency seen in the experiments of Wasserman and his colleagues, we have become dissatisfied with this approach as a result of some additional data we have gathered concerning not the terminal ratings, but rather the way in which such judgments develop with increasing experience of the instrumental contingency. These data came from a study that was similar in procedure to that used by Wasserman et al. (1983) and Chatlosh et al. (1985); the subjects were given 240-sec exposure to the free-operant probabilistic schedule used in our previous study under each of a variety of instrumental contingencies. There were in fact only three significant differences from the procedure used by Wasserman and his colleagues. First, we employed a press of the space bar on a computer keyboard rather than a telegraph key as the action and the flash of the triangle on the computer screen rather than of a light as the outcome. Second, we sampled only a selection of the contingencies used in their studies. These were 875/125, 875/500, 875/875, 125/125, 500/875, and 125/875, where the first term in each designation refers to P(O/A) × 1000 and the second to P(O/-A) × 1000. These contingencies were presented in a random order that varied
across subjects. In addition to including positive contingencies in which P(O/A) was greater than P(O/-A) and noncontingent relationships in which the two conditional probabilities were identical, we asked subjects to judge negative contingencies in which P(O/A) was less than P(O/-A), so that pressing actually decreased the likelihood of the outcome. As a result, the rating scale was extended from +100, which represented the judgment that pressing always caused the outcome, to -100, which corresponded to a judgment that pressing prevented the outcome. The final and critical difference was that rather than requesting a judgment just at the end of each 240-sec period, we divided this period into sampling intervals and asked for separate judgments after varying amounts of exposure to the schedule. This allowed us to track any changes in the rating with increasing experience of the contingency. These acquisition profiles, illustrated in Fig. 4, demonstrate that our subjects, like those of Wasserman and his colleagues, were sensitive to the instrumental contingencies. If we consider just the terminal judgments under each schedule, we can see that when P(O/A) was held constant at .875, judgments were systematically reduced as P(O/-A) was increased across schedules 875/125, 875/500, and 875/875. Similarly, holding P(O/-A) constant at .875 and increasing P(O/A) across schedules 125/875, 500/875, and 875/875 produced a corresponding increment in judgments. These effects of event contingency were not confounded with differences in the rate of pressing, for there were no significant differences in the number of times the subjects pressed under the various schedules. In spite of this general sensitivity to the instrumental contingency, the judgments did in fact deviate from an accurate assessment in a number of ways, even if we allow that the ratings have only ordinal status.
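For reference, the nominal ΔP of each schedule follows directly from its designation (a trivial sketch of our own; the parsing simply inverts the × 1000 notation described above):

```python
# Nominal Delta-P for each schedule designation used in the study.
# A designation encodes P(O|A) and P(O|~A), each multiplied by 1000.
schedules = ["875/125", "875/500", "875/875", "125/125", "500/875", "125/875"]
for s in schedules:
    p_oa, p_ona = (int(x) / 1000 for x in s.split("/"))
    print(f"{s}: dP = {p_oa - p_ona:+.3f}")
```

This yields +.750, +.375, and .000 for the first three schedules and .000, -.375, and -.750 for the last three, matching the ordering of the terminal judgments described in the text.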
For instance, the terminal ratings for the noncontingent 125/125 schedule were below zero and were significantly lower than those for the other, noncontingent 875/875 schedule, t(15) = 2.76, p < .05. In this respect, our data differ from those of Wasserman and his colleagues (Chatlosh et al., 1985; Neunaber & Wasserman, 1986; Wasserman et al., 1983), who generally found that high- and low-frequency noncontingent schedules both yielded ratings close to zero. At present we cannot account for this discrepancy, although we suspect that it is not due to the use of a multiple-judgment procedure. Chatlosh et al. (1985) did in fact detect a small but significant effect of outcome frequency in one of their studies, and a similar bias has been reported in other instrumental contingency-detection studies (e.g., Alloy & Abramson, 1979), although it is usually due to the assignment of positive ratings to the high-frequency schedule rather than negative ones in the low-frequency case. Whatever the reason for the occurrence of this frequency effect in our study, it does not really present a problem for the idea that subjects employ
Fig. 4. Mean causality judgments as a function of the duration of experience with the contingencies. The first number for each condition is P(O/A) and the second is P(O/-A).
an estimate of ΔP in arriving at their judgments if we focus simply upon the relative ratings under the different schedules. There is good evidence that subjects give varying weights to different sources of information (see Einhorn and Hogarth, 1986, for a recent review of evidence on differential weightings of contingency evidence in the context of causality judgment), and all one has to assume is that they attach more weight to their estimates of P(O/A) than to those of P(O/-A). Such differential weighting could also explain why subjects tended to give lower judgments for the noncontingent schedule 125/125 than for the negative contingency 500/875. A third feature of our results, however, is not so easily reconciled with the ΔP rule, even in its weighted form. This is the acquisition profiles we observed. The terminal judgments resulted from increments in ratings under the positive contingencies and decrements under the negative ones. There was a significant increase from the first to the last judgment under the 875/125 schedule, t(15) = 3.52, p < .01, and a significant decrease for the negative 125/875 schedule, t(17) = 3.71, p < .01. In contrast to the observed incremental and decremental acquisition profiles, the ΔP rule predicts that judgments should remain constant across experience of the contingency. Although the variance of ΔP should decrease with increments in the sample size on which its estimate is based, its mean should remain constant and
therefore so should the mean of judgments based upon this metric. Figure 5 shows that this was in fact true of the actual ΔP experienced by the subjects in our study. Although the actual contingencies were determined by a software random number generator and therefore could deviate from the nominal values for any particular subject, on average these deviations were very small and certainly were not statistically reliable. Given the importance of the acquisition functions for a rule-based account, it is important to determine that these profiles are not an artifact of the rating procedure which, for instance, might have encouraged the subjects to conflate their assessment of causality with their judgment about some other feature of the task in making the rating. An obvious candidate for such a confound is the confidence that they have in their judgments of causality. With increasing experience with a schedule, the subjects' confidence in their judgments of causal effectiveness should increase, so that if the actual ratings reflected a product of this judgment with confidence, incremental and decremental profiles would have been expected even if the subjects operated with a concept of causality based upon the ΔP rule. It is very difficult, of course, to evaluate such a claim in general, but we thought
Fig. 5. Mean actual contingency experienced by the subjects as a function of the duration of that experience.
that it was worthwhile seeing whether a conflation arose from the fact that our subjects were allowed to make only a single rating. Consequently, we repeated the acquisition experiment, but now every time the subjects made a causality judgment they were also asked to rate their confidence in that judgment using a scale on which 100 represented complete confidence and 0 no confidence. Only the positive, 875/125, and negative, 125/875, schedules were presented because these were the ones under which we anticipated the clearest incremental and decremental profiles. The results showed that the subjects were perfectly capable of making orderly judgments of confidence which increased progressively with experience under both schedules (see Fig. 6). However, the opportunity to make separate confidence ratings did not affect judgments of causality; these continued to show incremental and decremental functions under positive, t(15) = 3.34, p < .01, and negative, t(15) = 6.53, p < .01, contingencies, respectively (see Fig. 7). Once again, the actual contingencies did not change across trials. Faced with these data, we could still advocate a rule-based account by arguing that subjects base their judgments on a contingency metric other than ΔP. For instance, Allan and Jenkins (1983) reported that judgments of contingency in discrete-trial instrumental procedures were best accounted for by the ΔD measure, which is based upon the difference between the frequency of "confirming" instances, F(O/A) and F(-O/-A), and that of "disconfirming" instances, F(-O/A) and F(O/-A), a position that has been endorsed recently by Einhorn and Hogarth (1986). Because ΔD depends upon the frequencies of conjunctions and disjunctions of the action and outcome, any
Fig. 6. Mean rating of confidence in the causality judgments for a positive (875/125) and negative contingency (125/875) as a function of the duration of experience with the contingencies.
Fig. 7. Mean causality judgment and actual contingency as a function of duration of experience with the contingencies. The top panel illustrates the results for the positive contingency (875/125); those for the negative contingency (125/875) are displayed in the bottom panel.
judgments based upon this metric should show progressive increments or decrements as the number of these episodes accumulates with increasing exposure to the schedule. Even this rule has difficulties, however, when faced with the acquisition profiles for the noncontingent schedules. In order to account for the effect of outcome frequency, we should again have to assume that the frequency terms in the rule receive different weightings. The problem with this analysis is that it predicts that the pattern of judgments for the noncontingent schedules should also change systematically with experience of the contingency. If the weighting coefficients are such as to yield a higher ΔD value for the 875/875 schedule than for the 125/125
schedule after a certain number of actions, it can easily be shown that the judgments for both noncontingent conditions should increase progressively, with the rate of increase being greater the higher the outcome frequency.¹ This prediction is clearly at variance with the observed pattern (see Fig. 4); after a transitory increase under the 875/875 schedule, judgments, if anything, tended to show a slight decline with increasing exposure. In conclusion, the acquisition of causality judgments cannot be readily explained in terms of the application of simple linear rules for combining evidence about the conjunctions and disjunctions of action and outcome. Of course, we could entertain the possibility that the subjects use more complex, nonlinear metrics, but two observations dissuade us from doing so. First, the acquisition functions of causality judgments are just those anticipated by the incremental and decremental processes posited by conditioning theory for changing associative strength. Second, it is well established that the conditioning exhibited by the humble laboratory rat and pigeon is sensitive to variations in event contingencies, an observation that contemporary conditioning theories have had to face up to without appealing to the application of such rules. This has led us to pursue an associative account of the sensitivity of human causality judgment to event contingencies along the lines of the analysis suggested by such theories.
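The expected-value algebra for the weighted ΔD rule given in footnote 1 can be checked numerically (a sketch of our own; the weight values are arbitrary choices that satisfy the stated positivity condition):

```python
# Verify that the general expected weighted Delta-D reduces to the
# footnote's r = .5 special case, using arbitrary illustrative weights.

def expected_dD(a, b, c, d, p, r, N):
    """Expected weighted Delta-D after N trials, action probability r."""
    return (a*p*r + b*(1 - p)*(1 - r) - c*p*(1 - r) - d*(1 - p)*r) * N

def simplified(a, b, c, d, p, N):
    """The footnote's special case for r = .5."""
    return 0.5 * (p*(a + d - b - c) + b - d) * N

a, b, c, d = 1.0, 0.4, 0.5, 0.2   # chosen so a + d - b - c > 0
for p in (0.125, 0.875):
    for N in (10, 100):
        assert abs(expected_dD(a, b, c, d, p, 0.5, N)
                   - simplified(a, b, c, d, p, N)) < 1e-9

# The per-trial slope grows with outcome probability p, which is the
# problematic prediction for the noncontingent schedules:
print(simplified(a, b, c, d, 0.875, 1) > simplified(a, b, c, d, 0.125, 1))
```

So any weighting that ranks 875/875 above 125/125 forces both noncontingent judgments to grow with N, faster at the higher outcome frequency — the pattern the data contradict.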
IV. Blocking by the Causal Background

The similarity between the sensitivities of animal conditioning and human causality judgment to variations in event contingencies is striking (see Alloy & Tabachnik, 1984, for a review). Just as judgments under a positive
¹In a weighted ΔD rule with coefficients a, b, c, and d,

ΔD = aF(O/A) + bF(-O/-A) - cF(O/-A) - dF(-O/A)

Thus, under a noncontingent schedule the expected value of ΔD after N trials is

ΔD = aprN + b(1 - p)(1 - r)N - cp(1 - r)N - d(1 - p)rN

where p is the probability of the outcome and r is the probability of an action per second. If r = .5, as it did approximately in our studies, the expected ΔD is

ΔD = 0.5[p(a + d - b - c) + b - d]N

If the coefficients are such as to yield a positive ΔD, then ΔD is an increasing function of N with a slope that increases with p.
contingency with a fixed P(O/A) decrease as P(O/ -A) is raised, so the strength of conditioning under a fixed probability of reinforcement given the conditional stimulus declines with increments in the probability of the reinforcer in the absence of the conditional stimulus (e.g., Rescorla, 1968). A similar effect is also seen in instrumental conditioning under variations in the contingency between the instrumental action and the reinforcer using the type of probabilistic schedule that Wasserman and his collaborators and we ourselves employed in the judgment studies (e.g., Hammond, 1980). This parallel is not restricted to positive contingencies. Under a negative contingency, a stimulus is established as a conditional inhibitor capable of opposing the action of an excitatory stimulus. Like human causality judgment, inhibitory conditioning is sensitive to the strength of the negative contingency, increasing as the likelihood of the reinforcer in the absence of the conditional stimulus is raised (Rescorla, 1969) and the probability of their conjoint occurrence is lowered (Witcher & Ayres, 1980). A comparable sensitivity to event contingencies is also seen in the instrumental analog of Pavlovian inhibitory conditioning, namely, avoidance learning (e.g., Kadden, Schoenfeld, & Snapper, 1974). It should also be noted that under both positive and negative contingencies the strength of conditioning grows progressively toward its asymptote (e.g., Hearst & Franklin, 1977) in a way that parallels our observations of the acquisition of causality judgments. Finally, an effect of outcome frequency under a noncontingent schedule, comparable to that we observed for causality judgments, is also to be found in animal conditioning. Augmenting the reinforcer frequency enhances the degree of excitatory conditioning sustained by a noncontingent schedule, at least following limited exposure to the schedule (e.g., Kremer, 1971). 
Moreover, a noncontingent schedule with a relatively high reinforcement rate produces an initial increase in excitatory conditioning followed by a decline (Rescorla, 1972). We observed just such a profile for judgments under the 875/875 schedule. Taken together, all these similarities encourage the view that the general analysis of contingency sensitivity developed from studies of animal conditioning might be extended to human causality judgment. A critical process in this analysis (see Rescorla & Wagner, 1972; Dickinson, 1980) is that of blocking. The basic idea is derived from Kamin's (1969) original demonstration that conditioning to one element of a compound stimulus could be blocked by prior training to the other element. Kamin gave two groups of rats compound conditioning in which two simultaneously presented stimuli, A and B, were paired with a reinforcer for a number of trials (AB+ trials). The only difference between the groups
was that the blocking group had received prior training in which B alone was reinforced (B+ trials). This training schedule can be designated B+ AB+. When Kamin subsequently tested A by presenting it alone, less conditioning was observed in the blocking group than in the control group, which had received only the AB+ compound training but not prior conditioning to B alone. Thus, prior training to B blocked conditioning to A. In order to explain how this process of blocking might mediate sensitivity to event contingencies, we have to assume that there is always present a potential source of causal agents for the outcome other than the target cause under consideration. This source is represented by the causal background in Fig. 2, which stands for all the potential causal agents other than the target. We can now see that what distinguishes the two sequences is that the causal background alone is paired with the outcome in the noncontingent Sequence b but not in the contingent Sequence a. The analogy between these schedules and the blocking procedure is clear; under both contingencies the subjects receive episodes in which a compound of the target (A) and the causal background (B) is paired with the outcome (AB+ trials). What distinguishes the two schedules is that in the noncontingent case the subjects also receive trials in which the causal background is paired by itself with the outcome (B+). Thus, under both the blocking and noncontingent schedules, the subjects receive a B+ AB+ schedule, the only difference being that these two types of trial are presented successively in the former case but intermixed in the latter. This does not affect the occurrence of blocking in animal conditioning (Wagner, 1969). So it would appear that the reduction in conditioning and, we should argue, also in causal judgments under a noncontingent schedule is due to the fact that the causal background blocks attributions to the target cause.
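This account is easy to simulate with an error-correcting learning rule of the Rescorla-Wagner type (a minimal sketch of our own, with arbitrary parameters; it is not the authors' code):

```python
# Minimal Rescorla-Wagner-style sketch of context blocking: intermixed
# B+ trials reduce the strength acquired by the target A on AB+ trials.
ALPHA, LAM = 0.2, 1.0   # arbitrary learning rate and asymptote

def rw(trials):
    """trials: list of (cues_present, reinforced). Returns cue strengths."""
    v = {"A": 0.0, "B": 0.0}
    for cues, reinforced in trials:
        total = sum(v[c] for c in cues)               # summed prediction
        err = (LAM if reinforced else 0.0) - total    # prediction error
        for c in cues:
            v[c] += ALPHA * err                       # shared update
    return v

# Contingent analog: only AB+ trials (background never reinforced alone).
contingent = [(("A", "B"), True)] * 20
# Noncontingent analog: AB+ trials intermixed with B+ trials.
noncontingent = [(("A", "B"), True), (("B",), True)] * 20

print(rw(contingent)["A"] > rw(noncontingent)["A"])   # True: B blocks A
```

The intermixed B+ trials drive B's strength toward the asymptote, leaving little prediction error for A to absorb on AB+ trials — the associative analog of the lower ratings under the noncontingent schedule.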
According to this analysis, if we increase the number of times that the causal background is paired by itself with the outcome prior to training with the target cause, its ability to block attributions to the target should be enhanced, thereby reducing judgments of the causal effectiveness of this agent. The analogous form of context blocking, as it is called, is well established in animal conditioning (e.g., Baker & Mercier, 1982; Tomie, 1981). Consequently, our next study attempted to demonstrate a role for context blocking in causality judgment. We abandoned the free-operant procedure of the previous experiments in favor of a discrete-trial task in which the nature of the context or causal background was made explicit. In presentation of the free-operant task, no reference was made to alternative causal agents for the outcome, and the subject’s attributions of unpaired outcomes were not controlled experimentally. Since a context-blocking design essentially studies the interaction between the attribution of outcomes paired with the target and that of unpaired outcomes, we thought it desirable to bring the latter under experimental control, at least to the extent
of providing all subjects with the same explicit causal background to which outcomes could be attributed. The task we have developed for this purpose is a simple video game in which the subjects are asked to judge how effective different shells are in destroying tanks. The scenario states that these shells are of variable reliability and thus on some occasions may hit the tank without causing it to explode. The subjects are given a number of trials before being requested to judge the effectiveness of the shells on a scale similar to that used in the free-operant studies. On each trial a representation of a tank is shown moving across the video screen and, in doing so, passing through a gunsight before traversing a minefield. The subject is given the option of firing a shell on each trial by pressing the space bar on the computer keyboard. In all of our studies with this procedure, as in the free-operant case, there have never been any significant differences in the number of trials on which the subjects choose to perform the action across the different conditions. It is pointed out to the subject that if the tank explodes in the absence of a shell being fired, this must be due to the minefield, whereas an explosion following the firing of a shell could be due to either the shell or the minefield. However, no information about the cause of destruction on these trials is available from the temporal or spatial locus of the explosion in the tank's traverse of the screen. This point is varied randomly across trials. Thus, it can be seen that within this scenario the minefield plays the role of the causal background posited by an analysis in terms of context blocking. In spite of the major procedural differences between this video-game task and the free-operant procedure, the two tasks yield very similar patterns of judgment across variations in event contingencies.
Regarding the destruction of the tank as the outcome and the firing of the shell as the action, judgments are sensitive to both positive and negative contingencies and show the bias produced by changing outcome frequency on a noncontingent schedule (Dickinson, Shanks, & Evenden, 1984). Moreover, the acquisition profiles seen with this procedure (Shanks, 1985a) are very similar to those for the free-operant schedules, suggesting that the same process underlies sensitivity to event contingencies in the two cases. Given this procedure, we were in a position to investigate whether a blockinglike effect occurs in causality judgment when the number of pairings of the causal background, the minefield in this case, with the outcome is increased. We (Dickinson et al., 1984) did this by giving the subjects an initial observation stage in one set of trials. During this stage they could not perform the action, but simply observed the outcome occurring with a probability of .75 on each trial. This stage served to enhance the number of pairings of the causal background with the outcome and thus should have increased its ability to block attributions to the action during a second stage in which the subjects could perform the action on a noncontingent schedule
with an outcome probability of .75. In this procedure, such a noncontingent schedule with a high outcome frequency yields a positive rating (Dickinson et al., 1984) which, if blocking occurs, should be reduced by the prior observation period. To assess whether blocking did in fact occur, we compared judgments of the shells’ effectiveness in this blocking set to those for two control conditions. The first (Con30), consisting of 30 trials under the noncontingent schedule, matched the number of trials on which the action could be performed to that in the blocking condition, whereas in the second (Con60) the number of trials on which the action could be performed was 60, the same as the total number received in the blocking condition. The outcome of this study (Dickinson et al., 1984) is illustrated in Fig. 8. The first point to note is that the control conditions yielded positive ratings. In fact, these ratings are much higher than those produced by the same noncontingent schedule with a high outcome frequency under the free-operant procedure (see Fig. 1). There are a host of possible reasons for this discrepancy. As we have already seen, judgments change across experience with such a schedule, showing an initial increase followed by a decrease. In fact, a comparable decrement is apparent in the ratings for the Con30 and Con60 sets. This means that any comparison must take into account the amount of experience with the schedule, an operation that is not easy in a comparison of discrete-trial and free-operant procedures. Furthermore, if
Fig. 8. Mean causality judgment and actual contingency for the sets in which the subjects experienced either 30 (blocking and Con30) or 60 trials (Con60) under a noncontingent schedule, which in the case of the blocking set were preceded by 30 observation trials.
the blocking account is correct, the ratings for the target event will depend upon how effective the causal background is in blocking the target, a factor that may well vary from task to task. Finally, there are a number of procedural reasons why such across-experiment comparisons are inappropriate. For instance, the absolute ratings for a particular schedule could be affected by the values of the other schedules against which it is contrasted in a within-subjects procedure. For these reasons we restrict our analyses to the ordinal relationships between judgments within the same experiment. The major finding of the present study is the demonstration of a substantial blocking effect in that the judgments for the blocking condition were below those for both control sets. It is important to note that these differences occurred in the absence of any discrepancies in the actual contingency experienced in each condition; Fig. 8 also shows that the actual ΔP was very close to zero in all three conditions, so that the action in fact had no effect on the likelihood of the outcome under any condition. As a result, an account of this blocking effect in terms of the application of a rule based upon this metric is not available. The demonstration of context blocking encouraged us to pursue the applicability of conditioning theories to causality judgment. This form of blocking has been explained by associative theories of conditioning in two different ways. The first assumes that the process occurs during acquisition so as to restrict the growth of associative strength to the target event, whereas the second appeals to a comparator process operating either when the subject is tested for responding to the target in a conditioning experiment or at the time the judgment is made in our causality judgment studies. We consider each of these ideas in turn.
V. Retrospective Evaluation
The best known account of blocking in terms of the modulation of acquisition is that offered by Rescorla and Wagner (1972). They argued that a reinforcer or outcome will sustain only a limited amount of associative strength and that the increment in the strength of the target event, A, accruing from each pairing with an outcome is reduced if these pairings occur in the presence of another event, B. Moreover, the size of the reduction is proportional to the associative strength of this competing event. In the context of our studies, this means that independent pairings of the causal background with the outcome during the observation stage should have increased the associative strength of the background, which in turn should have reduced the increments in the strength of the action accruing from
subsequent pairings of the action with the outcome in the presence of the background. The Rescorla-Wagner model is not the only account of how a blocking process might modulate acquisition; several other theorists have appealed to the role of attentional-like mechanisms. The reduced increment in associative strength of the target event produced by the presence of an alternative cause on a particular trial may be due to a consequent failure to attend to the target on either the current trial (e.g., Sutherland & Mackintosh, 1971) or on subsequent trials (e.g., Mackintosh, 1975; Pearce & Hall, 1980). Whatever the differences between these various acquisition theories, they all place a common restriction upon the blocking process, namely that it can only operate prospectively. This point can be illustrated by considering the blocking procedure again. We have already seen that B+ training in the first stage reduces conditioning and attribution to A during AB+ compound training in the second stage. But what would happen if we reversed the two stages so that AB+ training preceded B+ training, to yield a backward as opposed to forward blocking procedure? By any objective criteria, the forward and backward procedures provide the same evidence about the relative causal efficacy of A and B, and so we might expect that B+ training should reduce attributions to A in both cases. This is not the result, however, anticipated by acquisition theories for, according to these accounts, once A has acquired associative strength during AB+ training, it should be impervious to subsequent changes in the strength of B brought about by B+ training. Thus, acquisition theories predict the occurrence of forward but not backward blocking. Shanks (1985b) evaluated this prediction using the video-game procedure. In one study the forward blocking (FB) and forward control (FC) conditions were identical to the blocking and Con30 sets, respectively, of the previous blocking experiment.
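The prospective character of the prediction just outlined can be demonstrated with a minimal Rescorla-Wagner simulation (our own sketch, arbitrary parameters): B+ trials update only B, so they cannot retroactively alter A's strength once AB+ training is over.

```python
# Sketch of why acquisition accounts predict forward but not backward
# blocking. With Rescorla-Wagner updating, only cues present on a trial
# change, so later B+ trials leave A's strength untouched.
ALPHA, LAM = 0.2, 1.0   # arbitrary learning rate and asymptote

def rw(trials):
    """trials: list of (cues_present, reinforced). Returns cue strengths."""
    v = {"A": 0.0, "B": 0.0}
    for cues, reinforced in trials:
        err = (LAM if reinforced else 0.0) - sum(v[c] for c in cues)
        for c in cues:
            v[c] += ALPHA * err
    return v

b_plus = [(("B",), True)] * 30
ab_plus = [(("A", "B"), True)] * 30

forward = rw(b_plus + ab_plus)    # B+ then AB+
backward = rw(ab_plus + b_plus)   # AB+ then B+
control = rw(ab_plus)             # AB+ only

print(forward["A"] < control["A"])               # True: forward blocking
print(abs(backward["A"] - control["A"]) < 1e-9)  # True: no backward blocking
```

The observed backward blocking in the judgment data is therefore exactly what this family of models cannot produce.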
The subjects received 30 trials under the noncontingent 75/75 schedule with an outcome probability of .75, preceded in the case of the FB set by 30 observation trials with the same outcome probability. These forward sets were intermixed with two backward conditions, a backward blocking (BB) and a backward control (BC) set. In the BB set, the order of the two stages was simply reversed so that the observation stage followed the stage in which the action could be performed. The BC set, like the FC condition, presented 30 trials on each of which the action could be performed, but in the backward case an interval equivalent to that occupied by the observation stage in the BB set elapsed between the end of the 30 trials and the request for a judgment. To check that the blocking effect in causal judgment is not peculiar to noncontingent schedules, Shanks (1985b) also ran subjects in these conditions under a positive, 75/50 contingency in which P(O/A) and P(O/-A) were .75 and .50, respectively. The probability of the outcome in the observation stage was also .50 for these subjects, so that the presence of this stage did not alter the actual contingency.
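Throughout these studies the actual contingency is summarized by ΔP = P(O/A) - P(O/-A). As a concrete illustration (the trial lists below are invented for the example, not data from the experiments), a 75/50 schedule yields ΔP = .25 and a 75/75 schedule yields ΔP = 0:

```python
def delta_p(trials):
    """Compute deltaP = P(O|A) - P(O|-A) from a list of (acted, outcome) booleans."""
    acted = [o for a, o in trials if a]          # outcomes on action trials
    unacted = [o for a, o in trials if not a]    # outcomes on no-action trials
    return sum(acted) / len(acted) - sum(unacted) / len(unacted)

# A 75/50 schedule: 3 of 4 action trials and 2 of 4 no-action trials end in an outcome
contingent = ([(True, True)] * 3 + [(True, False)]
              + [(False, True)] * 2 + [(False, False)] * 2)

# A noncontingent 75/75 schedule: the outcome is equally likely either way
noncontingent = ([(True, True)] * 3 + [(True, False)]
                 + [(False, True)] * 3 + [(False, False)])
```

Here `delta_p(contingent)` is .25 and `delta_p(noncontingent)` is 0, matching the schedule labels.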
Associative Accounts of Causality Judgment
Fig. 9. Mean causality judgment and actual contingency for the forward control (FC) and blocking (FB) sets and for the backward control (BC) and blocking (BB) sets. These data are shown separately for groups experiencing the positive 75/50 schedule (top panel) and the noncontingent 75/75 schedule (bottom panel).
Figure 9 shows that Shanks (1985b) was able to replicate the forward blocking effect under both the noncontingent (75/75) and contingent (75/50) schedules. More important, a backward blocking effect was also observed under both schedules. In both the forward and backward cases, the judgments for the blocking sets were lower than those for the corresponding control sets, with the magnitude of the differences being comparable in the two cases. Moreover, as Fig. 9 shows, these differences were not due to discrepancies in the actual contingencies experienced in the various sets; what fortuitous differences there were in the actual ΔP were in the opposite direction to those for the judgments. This is but one of a number of studies demonstrating that causality judgment is susceptible to a backward blocking process whereby the subjects appear to evaluate retrospectively the causal status of the target agent in the light of subsequent information about the causal background. These demonstrations clearly undermine the application of acquisition accounts of blocking to human causality judgment, for they challenge one of the basic assumptions of these theories, namely, that the blocking process acts to restrict the initial growth of associative strength.
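The prospective character of acquisition theories can be seen in a minimal simulation of the Rescorla-Wagner (1972) rule; the parameter values and trial numbers below are illustrative assumptions, not those of any reported experiment:

```python
def rescorla_wagner(trials, alpha=0.3, beta=1.0, lam=1.0):
    """Run Rescorla-Wagner updates over a list of trials.
    Each trial is a set of present cues, and every trial here is reinforced."""
    v = {"A": 0.0, "B": 0.0}
    for cues in trials:
        total = sum(v[c] for c in cues)           # summed strength of present cues
        for c in cues:
            v[c] += alpha * beta * (lam - total)  # shared prediction-error term
    return v

# Forward blocking: B+ training first, then AB+ compound training
forward = rescorla_wagner([{"B"}] * 30 + [{"A", "B"}] * 30)

# Backward procedure: AB+ compound training first, then B+ training
backward = rescorla_wagner([{"A", "B"}] * 30 + [{"B"}] * 30)

# Control: AB+ compound training only
control = rescorla_wagner([{"A", "B"}] * 30)
```

Forward training leaves A with essentially no associative strength, but in the backward procedure A retains exactly the strength it acquired during compound training, because the subsequent B+ trials never update it. This is precisely the asymmetry that the judgment data contradict.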
VI. Comparator Theories

In contrast to the acquisition theories, backward blocking, at least by contextual or background cues, follows directly from comparator accounts.
The most clearly formulated comparator model of conditioning is that offered by Gibbon and Balsam (1981) in the form of their Scalar Expectancy Theory, although a similar idea has also been espoused by Jenkins, Barnes, and Barrera (1981) and by Miller and Schachtman (1985a,b). What Gibbon and Balsam suggest is that the associative strengths of the target event and the context (causal background in our terms) grow independently with each pairing, with the reinforcer or outcome producing an increment and each nonreinforced presentation a decrement. Responding to the target event is then determined by some comparison of the current associative strength of the target to that of the causal background; the higher the strength of the target relative to that of the background, the greater the conditioned response to the target. If we assume that judgments of the causal effectiveness of the target are determined by a similar comparison, this theory provides a ready explanation of both forward and backward blocking by the causal background. The observation stage, whether presented before or after the firing stage, serves to increase the number of pairings of the causal background with the outcome, thus enhancing its associative strength. This in turn reduces the relative strength of the target, which governs judgments. Whether or not this comparator theory provides an adequate account of the sensitivity of animal conditioning to event contingencies is currently a matter of dispute, for it has had to face a number of empirical challenges. One of the most compelling comes from studies of the effect of signaling unpaired reinforcers. In the typical signaling experiment (e.g., Durlach, 1983), a contingency effect is demonstrated by presenting the animals with contingent and noncontingent schedules of the type illustrated in Fig. 2.
Another noncontingent condition is added, however, in which all the reinforcers that occur unpaired with the target are signaled by a quite different event, called the signal. According to Gibbon and Balsam's comparator model, this should have no effect on conditioning to the target. The signaling operation does not alter the number of times the background is paired with the reinforcer and therefore should leave its associative strength unchanged relative to the simple noncontingent schedule. There is no reason, therefore, to expect signaling to alter the determinant of conditioning, the relative associative strengths of the target and the background, and the animals should fail to show any conditioning under this signaled noncontingent schedule. In fact, Durlach (1983) found that signaling went a long way toward removing the deficit in conditioning produced by this type of noncontingent schedule. Although there have been failures to demonstrate this effect (e.g., Jenkins & Shattuck, 1981), it has been replicated in both Pavlovian (Rescorla, 1984) and instrumental procedures (Dickinson & Charnock, 1985; Hammond & Weinberg, 1984), giving little reason to doubt its reliability under the appropriate conditions.
Given these data, the signaling procedure provided us with an obvious test of the applicability of the Gibbon and Balsam comparator model to causality judgments. To implement this test, subjects performed our video game under three different schedules (Shanks, 1986). The first two were simple contingent and noncontingent schedules. Under the contingent 50/0 schedule, the outcome never occurred unless the subject performed the action, in which case the probability of the outcome was .50. The noncontingent, 50/50 schedule was identical except for the fact that the outcome also occurred on half the occasions when the subject chose not to respond. The third, signal, condition was the one of prime interest. This condition also employed the noncontingent schedule, but now a representation of a plane crossed the screen above the tank on all trials on which the subjects did not fire a shell on target and the tank was programmed to explode. The instructions explicitly informed the subjects that these planes were capable of destroying the tanks. This condition was designed to parallel the signaling schedule of the conditioning studies, in that all outcomes that occurred in the absence of the target event, the firing of the shell, were paired with another independent event, the presence of the plane. Requests for judgments of the causal effectiveness of the shell were made after 40 trials under each condition. The pattern of these judgments, illustrated in Fig. 10, replicated the basic sensitivity to event contingencies seen in the previous studies; the subjects gave significantly lower ratings under the noncontingent schedule than under the contingent one. More importantly, signaling the outcomes that occurred in the absence of the action elevated ratings of the causal effectiveness of the shell above those for the simple noncontingent condition, thus reversing to some extent the reduction in judgments produced by a degradation in contingency.
This finding parallels exactly the results of the animal conditioning studies and provides the same difficulty for any comparator theory, such as Gibbon and Balsam's (1981), which bases the associative strength of the causal background simply on pairings with the outcome. Once again, it should be emphasized that, as in the case of the blocking studies, the pattern of judgments cannot be explained in terms of the actual contingencies experienced by the subjects. Under both the signal and noncontingent conditions, firing had no effect on the likelihood of a tank's destruction; as Fig. 10 shows, the actual ΔP was close to zero for both schedules and did not differ significantly in the two cases. The demonstration of a signaling effect leaves us at something of a theoretical impasse. Both the sensitivity of causal judgments to temporal contiguity and their acquisition profiles under different contingencies suggest a role for the type of associative processes embodied within contemporary conditioning theories. The problem is that no single extant theory can encompass the effects of event contingencies on causal judgment. The
Fig. 10. Mean causality judgment and actual contingency for the various conditions in the signaling experiment.
demonstration of backward blocking by the causal background effectively rules out what we have referred to as acquisition theories, whereas the signaling result places restrictions upon the type of comparator process we could entertain. Faced with these conclusions, it seemed to us that the preferred course of theoretical development would be to retain the associative and comparator assumptions while modifying the latter to take account of the signaling results. Although the signaling result is at variance with any normative analysis of contingency judgment since, in reality, the action had no effect on the outcome in either the noncontingent or signal conditions, it does make intuitive sense. In the signaling condition, outcomes that occur in the presence of the signal, the plane in this case, are presumably more likely to be attributed to this event than to the causal background, the minefield. It is as though the presence of the signal allows the system to discount the outcomes that occur in its presence in assessing the effectiveness of the causal background. This, in turn, should reduce the ability of the minefield to block attributions to the action. One way of implementing such discounting within an associative theory is to abandon the common assumption that compounds of events, such as that comprising the presence of the signal and the causal background, can be
analyzed into elements for the assignment and alteration of associative strengths. Without this assumption, the compound of the signal and background can be treated as a single event with its own associative strength, which is independent of that attached to the causal background alone. This means that outcomes that occur in association with this compound will leave the strength of the causal background unchanged. To the extent that it is the strength of the causal background alone that makes a decremental contribution to the comparator process, signaling all outcomes that occur in the absence of the action should prevent the growth of this associative strength and hence increase judgments of the effectiveness of the action. This is the analysis that we have recently adopted in an associative comparator theory of causal judgment (Dickinson & Shanks, 1985). As in all comparator theories, the dependent variable, in this case the judgment of the effectiveness of the target event A, J_A, is determined by positive and negative factors. In accord with the assumption about compound events, the positive factor is the associative strength of the compound of the target event and the causal background, V_AB, whereas the negative factor is the associative strength of the causal background alone, V_B. For the sake of simplicity, we assume that the judgment is determined just by the difference between these two factors, so that

J_A = V_AB - V_B  (1)
Forward and backward blocking by the causal background, of course, follow directly from the fact that training of the background alone in the observation stage increases V_B, thus reducing J_A. In common with other associative learning theories, the changes in these associative strengths, ΔV_AB and ΔV_B, produced by a contiguous pairing of either the compound of the target and background AB or the background B alone with the outcome are determined by the standard linear operator equations:

ΔV_AB = α_AB β (λ - V_AB)  (2)

ΔV_B = α_B β (λ - V_B)  (3)
where α is the learning rate parameter associated with either the compound AB or B alone, β an equivalent parameter for the occurrence or nonoccurrence of the outcome, and λ the asymptote of associative strength. This asymptote is assumed to be 100 when the outcome occurs and zero when it does not. We have already seen that this model can accommodate both the blocking and signaling effects that we have observed, at least at a qualitative level.
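A minimal sketch of how equations 1-3 behave follows; the stage lengths and the fixed 75%-outcome trial sequence are our illustrative assumptions, not the schedules of the reported experiments:

```python
def update(v, outcome, alpha, beta_outcome=0.3, beta_no_outcome=0.8):
    # Equations 2 and 3: dV = alpha * beta * (lambda - V),
    # with lambda = 100 when the outcome occurs and 0 when it does not.
    lam, beta = (100.0, beta_outcome) if outcome else (0.0, beta_no_outcome)
    return v + alpha * beta * (lam - v)

def judgment(stages, alpha_ab=0.2, alpha_b=0.1):
    """stages: list of (cue, outcomes); cue 'AB' = action-plus-background
    trials, 'B' = background-alone (observation or no-action) trials."""
    v = {"AB": 0.0, "B": 0.0}
    for cue, outcomes in stages:
        alpha = alpha_ab if cue == "AB" else alpha_b
        for o in outcomes:
            v[cue] = update(v[cue], o, alpha)
    return v["AB"] - v["B"]              # J_A, equation 1

p75 = [True, True, True, False] * 8      # 32 trials with outcome probability .75

control  = judgment([("AB", p75), ("B", p75)])
forward  = judgment([("B", p75), ("AB", p75), ("B", p75)])   # observation first
backward = judgment([("AB", p75), ("B", p75), ("B", p75)])   # observation last
```

Because background-alone trials never touch V_AB, adding the observation stage before or after the action stage raises V_B by the same amount, so the sketch yields forward and backward blocking of equal magnitude: both judgments fall below the control. The signaling effect follows in the same way if outcomes paired with the signal are assigned to a separate signal-plus-background compound and so leave V_B unchanged.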
What remains is to show that it can also produce a set of acquisition functions with the main features of the empirical profiles that we observed in our free-operant study. This we attempted by running a simulation of the model under the conditions of the acquisition study, using the same contingency schedule and software random-probability generator as were used in the experiment to determine the sequence of events. The probability of an action was set at 0.44/sec, which was the mean rate actually produced by the subjects in that experiment. A nonexhaustive and nonsystematic search of the parameter space was performed in order to discover whether the model could produce an appropriate set of acquisition functions. A comparison of Figs. 4 and 11, which illustrate the mean judgments produced by the actual subjects and 1000 simulated subjects, respectively, shows that our model is capable of reproducing some of the main features of the empirical profiles. In general, the terminal simulated judgments reflected an appropriate sensitivity to event contingencies, as did the actual ratings. Moreover, these terminal values were reached by incremental and decremental changes under positive and negative contingencies, respectively, in a way that paralleled the observed functions. The model could also reproduce detailed features of these profiles; under the noncontingent schedule with a high outcome frequency, the 875/875 schedule, there was an initial increase followed by a decline to an asymptotic judgment near zero in both the actual and simulated functions. The main discrepancy related to the judgments under the noncontingent, 125/125 schedule. Whereas the model predicts that judgments should remain close to zero, we in fact observed substantial negative ratings under this schedule.
This meant that the observed effect of outcome frequency on judgments under noncontingent schedules was much more sustained in the empirical data than in the simulated ratings, where it is only a transient phenomenon due to the initial positive ratings under the 875/875 schedule. We are somewhat uncertain, however, what to make of this discrepancy for, as we have already pointed out, Wasserman and his colleagues (Chatlosh et al., 1985; Neunaber & Wasserman, 1986; Wasserman et al., 1983) reported terminal judgments that were much more in accord with our simulation in this respect than they were with our own data. Thus, it remains unclear, as yet, whether the discrepancy reflects a failure of the model or some undetected biasing factor in our procedure. It will probably not have escaped the reader's attention that our function for specifying the judgment J_A, equation 1, is similar to that determining ΔP: the associative strength of the compound of the action and causal background, V_AB, corresponds to P(O/A), and that of the causal background alone, V_B, to P(O/-A). This correspondence is reinforced by the fact that if the learning rate parameter β in equations 2 and 3 is the same whether or not the outcome occurs, then at asymptote V_AB and V_B will equal P(O/A) x 100 and P(O/-A)
Fig. 11. Mean causality judgments (J_A) as a function of the duration of experience with the contingency, resulting from a simulation of 1000 subjects on the associative comparator theory presented in equations 1, 2, and 3. The results of the simulation are shown for a variety of contingencies in which the first term specifies P(O/A) x 1000 and the second term P(O/-A) x 1000. The parameters employed in the simulation were: α_AB = 0.2; α_B = 0.1; β for the outcome = 0.3 and β for the nonoccurrence of the outcome = 0.8; λ for the outcome = 1.0 and λ for the nonoccurrence of the outcome = 0.
x 100, respectively. This means, of course, that any terminal judgment data that can be well described by the ΔP rule will be perfectly compatible with our model. This should not be taken to mean, however, that there are no substantive psychological differences between our model and a rule-governed account based upon ΔP. The J_A comparison embodied in equation 1, unlike the ΔP rule, makes explicit the fact that information about the occurrence of the outcome in the absence of the action is important to the subject because it provides evidence about the effectiveness of the causal background in which the action operates. A simple contingency metric, such as ΔP, has no such implication. This is why our model, unlike the ΔP rule, provides a principled way of discounting the signaled outcomes in arriving at a causality judgment. A second discriminating factor is, of course, the use of an
associative function in equations 2 and 3 to determine the values of V_AB and V_B. This means that J_A, unlike ΔP, does not necessarily reflect directly the strength of an event contingency but confounds this factor with the amount of training. A given value of J_A can arise either after a limited exposure to a strong contingency or a more extended exposure to a weaker one. The reason for preferring the J_A rule is, of course, the demonstration that judgments do in fact pass through such intermediate values in the acquisition studies. Moreover, without the notion that increased exposure to pairings with an outcome augments the strength of a causal agent, in this case the causal background, it would be difficult to explain the context blocking that we observed.
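The asymptotic claim above, that with equal β the strengths converge on the outcome probabilities x 100, is easy to check numerically. In the sketch below the parameter values are arbitrary illustrative choices, and a fixed repeating outcome pattern stands in for a probabilistic schedule:

```python
def asymptotic_v(pattern, alpha=0.1, beta=0.1, cycles=1000):
    """Repeat a fixed outcome pattern many times and return the terminal
    strength under dV = alpha * beta * (lambda - V), with equal beta for
    outcomes (lambda = 100) and non-outcomes (lambda = 0)."""
    v = 0.0
    for _ in range(cycles):
        for outcome in pattern:
            lam = 100.0 if outcome else 0.0
            v += alpha * beta * (lam - v)
    return v

v_action    = asymptotic_v([True, True, True, False])  # P(O/A)  = .75
v_no_action = asymptotic_v([True, False])              # P(O/-A) = .50
```

Here `v_action` settles near 75 and `v_no_action` near 50 (with a small ripple from the fixed trial order), so the terminal judgment V_AB - V_B approximates ΔP x 100 = 25 for a 75/50 schedule, as the text notes.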
VII. Conclusion
With a few notable exceptions (e.g., Anderson & Bower, 1973), human cognitive psychology abandoned associationism with the demise of behaviorism many years ago in favor of a theoretical perspective that emphasizes the role of information processing and computation. A casualty of the cognitive revolution was the study of animal conditioning, which until that time had been central to the study of learning and memory in general but which consigned itself to a backwater of psychology, due in part to the conservatism of its theoretical stance. Few (e.g., Rudy, 1974; Gluck & Bower, 1986) would now recognize any possible relevance of its associative analysis of learning to human psychology. In this article we have attempted to bridge the divide in one selected area by showing that the empirical and theoretical analyses of the impact of event contingencies developed within animal conditioning may well illuminate the processes underlying our judgments of causality. To the list of empirical similarities between animal conditioning and human causality judgment outlined previously, we can now add the form of the acquisition functions, contextual blocking, and the signaling effect. It might be thought, however, that our demonstration of backward blocking only serves to undermine the parallel, for it is true that this phenomenon lies outside the scope of the dominant acquisition theories of conditioning. This theoretical failure, though, must be distinguished from the empirical claim that animal conditioning does not exhibit the type of retrospective evaluation demonstrated in the backward-blocking studies (Shanks, 1985b). It is true that, to the best of our knowledge, all attempts to demonstrate a standard backward-blocking effect with animals have been unsuccessful (e.g., Durlach & Rescorla, 1980; Rescorla & Cunningham, 1978; Schweitzer & Green, 1982). There have been, however, a number of reports in the animal conditioning literature that appear to involve a process of retrospective
evaluation. For instance, both Kaufman and Bolles (1981) and Matzel, Schachtman, and Miller (1985) found that following Pavlovian compound training, extinguishing conditioning to one of the elements elevated responding to the other. It is findings such as these that have, at least in part, motivated the development of comparator theories in animal conditioning (Miller & Schachtman, 1985a,b). For the time being, it seems that we should leave open the question of whether animals are capable of retrospective evaluation. Finally, a couple of caveats must be made about the scope of our associative approach. Because it appeals to the growth of associative strength across a series of episodes involving pairings in real time, it is clearly not applicable to judgments based upon narrative or abstract representations and summaries of event co-occurrences, such as are widely used in studying attribution in social psychology (e.g., Kelley, 1973) or text comprehension (e.g., Trabasso & Sperry, 1985). Nor is it obvious that an associative account could be offered for the perceptual experience of mechanical causation of the type studied by Michotte (1963), which can be immediately apparent in a single episode. In addition, we have neglected a number of factors other than contiguity and contingency that are likely to affect causal judgment, only some of which can be easily integrated with a purely associative account. Factors such as the similarity of cause and effect, which was identified as important in Einhorn and Hogarth's (1986) review, provide no obvious difficulties for a conditioning-based approach; both the congruity of the temporal pattern of events (e.g., Testa, 1975) and their physical similarity (e.g., Rescorla & Furrow, 1977) are known to have an effect on simple conditioning. But the incorporation of other important factors is more problematic.
For example, it is clear that beliefs and prior knowledge about specific causal processes can interact with (Alloy & Tabachnik, 1984) and, in some cases, even counteract (e.g., Shultz, 1982) contiguity and contingency information, a conclusion that would seem to demand an inferential basis for causality judgments. If we wish to continue to defend a role for associative processes in the face of this evidence, we must recognize that such processes can interact with inferential ones. In fact, the comparator type of model in many ways provides a ready-made vehicle for such an interaction. If we were to accept that associative strength can give rise to a belief about the causal efficacy of an agent, a belief that is sensitive to temporal contiguity, then the comparator equation for J_A, equation 1, can be interpreted at the psychological level as an inference rule by which people derive a judgment sensitive to event contingency. Given this interpretation, there is no reason why other knowledge and beliefs should not enter into this inference and thus interact with event contingencies in determining the causal judgment. In this sense,
our comparator model is really a hybrid that incorporates both associative and inferential processes.
ACKNOWLEDGMENTS

The work reported in this article and its preparation were supported in part by grants from the Science and Engineering Research Council and the Medical Research Council. Among the many people we thank for helpful comments on this research are Paula Durlach, Phil Johnson-Laird, and Nicholas Mackintosh.
REFERENCES

Abramson, L. Y., Seligman, M. E. P., & Teasdale, J. (1978). Learned helplessness in humans: Critique and reformulation. Journal of Abnormal Psychology, 87, 49-74.
Allan, L. G. (1980). A note on measurement of contingency between two binary variables in judgment tasks. Bulletin of the Psychonomic Society, 15, 147-149.
Allan, L. G., & Jenkins, H. M. (1980). The judgment of contingency and the nature of the response alternatives. Canadian Journal of Psychology, 34, 1-11.
Allan, L. G., & Jenkins, H. M. (1983). The effect of representations of binary variables on judgment of influence. Learning and Motivation, 14, 381-405.
Alloy, L. B., & Abramson, L. Y. (1979). Judgment of contingency in depressed and nondepressed students: Sadder but wiser? Journal of Experimental Psychology: General, 108, 441-485.
Alloy, L. B., & Tabachnik, N. (1984). Assessment of covariation by humans and animals: The joint influence of prior expectations and current situational information. Psychological Review, 91, 112-149.
Anderson, J. R., & Bower, G. H. (1973). Human associative memory. Washington, D.C.: Winston.
Baker, A. G., & Mercier, P. (1982). Manipulation of the apparatus and response context may reduce the US pre-exposure interference effect. Quarterly Journal of Experimental Psychology, 34B, 221-234.
Brewer, W. F. (1974). There is no convincing evidence for operant or classical conditioning in adult humans. In W. B. Weimer & D. S. Palermo (Eds.), Cognition and the symbolic processes. Hillsdale, NJ: Erlbaum.
Chatlosh, D. L., Neunaber, D. J., & Wasserman, E. A. (1985). Response-outcome contingency: Behavioral and judgmental effects of appetitive and aversive outcomes with college students. Learning and Motivation, 16, 1-34.
Dickinson, A. (1980). Contemporary animal learning theory. Cambridge: Cambridge University Press.
Dickinson, A., & Charnock, D. J. (1985). Contingency effects with maintained instrumental reinforcement. Quarterly Journal of Experimental Psychology, 37B, 397-416.
Dickinson, A., & Shanks, D. R. (1985). Animal conditioning and human causality judgment. In L.-G. Nilsson & T. Archer (Eds.), Perspectives on learning and memory. Hillsdale, NJ: Erlbaum.
Dickinson, A., Shanks, D. R., & Evenden, J. L. (1984). Judgement of act-outcome contingency: The role of selective attribution. Quarterly Journal of Experimental Psychology, 36A, 29-50.
Durlach, P. J. (1983). Effect of signaling intertrial unconditioned stimuli in autoshaping. Journal of Experimental Psychology: Animal Behavior Processes, 9, 374-389.
Durlach, P. J., & Rescorla, R. A. (1980). Potentiation rather than overshadowing in flavor-aversion learning: An analysis in terms of within-compound associations. Journal of Experimental Psychology: Animal Behavior Processes, 6, 175-187.
Einhorn, H. J., & Hogarth, R. M. (1986). Judging probable cause. Psychological Bulletin, 99, 3-19.
Gibbon, J., & Balsam, P. D. (1981). Spreading association in time. In C. M. Locurto, H. S. Terrace, & J. Gibbon (Eds.), Autoshaping and conditioning theory. New York: Academic Press.
Gluck, M. A., & Bower, G. H. (1986). Conditioning and categorization: Some common effects of informational variables in animal and human learning. Proceedings of the Cognitive Science Society Conference, 1-15.
Gruber, H. E., Fink, C. D., & Damm, V. (1957). Effects of experience on perception of causality. Journal of Experimental Psychology, 53, 89-93.
Hammond, L. J. (1980). The effect of contingency upon the appetitive conditioning of free operant behavior. Journal of the Experimental Analysis of Behavior, 34, 297-304.
Hammond, L. J., & Paynter, W. E. (1983). Probabilistic contingency theories of animal conditioning: A critical analysis. Learning and Motivation, 14, 527-550.
Hammond, L. J., & Weinberg, M. (1984). Signaling unearned reinforcers removes the suppression produced by a zero correlation in an operant paradigm. Animal Learning and Behavior, 12, 371-377.
Hearst, E., & Franklin, S. R. (1977). Positive and negative relations between a signal and food: Approach-withdrawal behavior to the signal. Journal of Experimental Psychology: Animal Behavior Processes, 3, 37-52.
Hilton, D. J., & Slugoski, B. R. (1986). Knowledge-based causal attribution: The abnormal conditions focus model. Psychological Review, 93, 75-88.
Hull, C. L. (1943). Principles of behavior. New York: Appleton.
Hume, D. (1888). A treatise of human nature. L. A. Selby-Bigge (Ed.). Oxford: Clarendon.
Jenkins, H. M., Barnes, R. A., & Barrera, F. J. (1981). Why autoshaping depends on trial spacing. In C. M. Locurto, H. S. Terrace, & J. Gibbon (Eds.), Autoshaping and conditioning theory. New York: Academic Press.
Jenkins, H. M., & Shattuck, D. (1981). Contingency in fear conditioning: A reexamination. Bulletin of the Psychonomic Society, 17, 159-162.
Kadden, R. M., Schoenfeld, W. N., & Snapper, A. G. (1974).
Aversive schedules with independent probabilities of reinforcement for responding and not responding by rhesus monkeys: II. Without signal. Journal of Comparative and Physiological Psychology, 87, 1189-1197.
Kamin, L. J. (1969). Predictability, surprise, attention and conditioning. In B. A. Campbell & R. M. Church (Eds.), Punishment and aversive behavior. New York: Appleton.
Kaufman, M. A., & Bolles, R. C. (1981). A nonassociative aspect of overshadowing. Bulletin of the Psychonomic Society, 18, 318-320.
Kelley, H. H. (1973). The process of causal attribution. American Psychologist, 28, 107-128.
Kremer, E. F. (1971). Truly random and traditional control procedures in CER conditioning in the rat. Journal of Comparative and Physiological Psychology, 76, 441-448.
Leslie, A. M. (1982). The perception of causality in infants. Perception, 11, 173-186.
Mackie, J. L. (1974). The cement of the universe. Oxford: Oxford University Press.
Mackintosh, N. J. (1975). A theory of attention: Variations in the associability of stimuli with reinforcement. Psychological Review, 82, 276-298.
Mackintosh, N. J. (1983). Conditioning and associative learning. Oxford: Clarendon.
Matzel, L. D., Schachtman, T. R., & Miller, R. R. (1985). Recovery of an overshadowed association achieved by extinction of the overshadowing stimulus. Learning and Motivation, 16, 398-412.
Michotte, A. (1963). The perception of causality. London: Methuen.
Miller, R. R., & Schachtman, T. R. (1985a). The several roles of context at the time of retrieval. In P. D. Balsam & A. Tomie (Eds.), Context and learning. Hillsdale, NJ: Erlbaum.
Miller, R. R., & Schachtman, T. R. (1985b). Conditioning context as an associative baseline: Implications for response generation and the nature of conditioned inhibition. In R. R. Miller & N. E. Spear (Eds.), Information processing in animals: Conditioned inhibition. Hillsdale, NJ: Erlbaum.
Neunaber, D. J., & Wasserman, E. A. (1986). The effects of unidirectional versus bidirectional rating procedures on college students' judgments of response-outcome contingency. Learning and Motivation, 17, 162-179.
Pearce, J. M., & Hall, G. (1980). A model for Pavlovian learning: Variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychological Review, 87, 532-552.
Rescorla, R. A. (1968). Probability of shock in the presence and absence of CS in fear conditioning. Journal of Comparative and Physiological Psychology, 66, 1-5.
Rescorla, R. A. (1969). Conditioned inhibition of fear resulting from negative CS-US contingencies. Journal of Comparative and Physiological Psychology, 67, 504-509.
Rescorla, R. A. (1972). Informational variables in Pavlovian conditioning. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 6). New York: Academic Press.
Rescorla, R. A. (1984). Signaling intertrial shocks attenuates their negative effect on conditioned suppression. Bulletin of the Psychonomic Society, 22, 225-228.
Rescorla, R. A., & Cunningham, C. L. (1978). Within-compound flavor associations. Journal of Experimental Psychology: Animal Behavior Processes, 4, 267-275.
Rescorla, R. A., & Furrow, D. R. (1977).
Stimulus similarity as a determinant of Pavlovian conditioning. Journal of Experimental Psychology: Animal Behavior Processes, 3, 203-215. Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectivenessof reinforcement and nonreinforcement. In A. H. Black, & W. F. Prokasy (Eds.), Classical conditioning II: Current theory and research. New York: Appleton. Rudy, J. W. (1974). Stimulus selection in animal conditioning and paired-associate learning: Variations in the associative process. Journal of Verbal Learning and Verbal Behavior, 13, 282-296.
Schweitzer, L., & Green, L. (1982). Reevaluation of things past: A test of the ‘retrospection hypothesis’ using a CER procedure with rats. Pavlovian Journal of the Biological Sciences, 17, 62-68. Shanks, D. R. (1985a). Continuous monitoring of human contingency judgment across trials. Memory and Cognition, 13, 158-167. Shanks, D. R. (1985b). Forward and backward blocking in human contingency judgement. Quarterly Journal of Experimental Psychology, 3178, 1-21. Shanks, D. R. (1986). Selective attribution and the judgment of causality. Learning ondMotivation. 17, 31 1-334. Shultz, T. R. (1982). Rules of causal attribution. Monographs of the Society for Research in Child Development, 47, (1, Serial No. 194). Siegler, R. S., & Liebert, R. M. (1974). Effects of contiguity, regularity, and age on children’s causal inferences. Developmental Psychology, 10, 514-579. Sutherland, N. S. & Macintosh, N. J . (1971). Mechanisms of animal discrimination learning. New York: Academic Press. Testa, T. J. (1974). Causal relationships and the acquisition of avoidance responses. Psychological Review, 81, 491-505.
Associative Accounts of Causality Judgment
261
Testa, T. J. (1975). Effects of similarity of location and temporal intensity pattern of conditioned and unconditioned stimuli on the acquisition of conditioned suppression in rats. Journal of Experimental Psychology: Animal Behavior Processes, 1, 1 14-121. Tolman, E. C., & Brunswick, E. (1935). The organism and the causal texture of the environment. Psychological Review, 42, 43-77. Tomie, A. (1981). Effect of unpredictable food on the subsequent acquisition of autoshaping: Analysis of the context-blocking hypothesis. In C. M. Locurto, H. s. Terrace, & J. Gibbon (Eds.), Autoshaping and conditioning theory. New York: Academic Press. Trabasso, T., & Sperry, L. L. (1985). Causal relatedness and importance of story events. Journal of Memory and Language, 24, 595-61 1. Wagner, A. R. (1969). Stimulus selection and a ‘modified continuity theory.’ In G. H. Bower & J. T. Spence (Eds.), The psychology of leurning and motivation. Vol. 3. New York: Academic Press. Wagner, A. R. (1981). SOP: A model of automatic memory processing in animal behavior. In N. E. Spear & R. R. Miller (Eds.), Information processing in unimuis: Memory mechanisms. Hillsdale, NJ: Erlbaum. Wasserman, E. A., Chatlosh, D. L., & Neunaber, D. J. (1983). Perception of causal relations in humans: Factors affecting judgments of response-outcome contingencies under freeoperant procedures. Learning and Motivation, 14, 406-432. Wasserman, E. A., & Neunaber, D. J. (1986). College students’ responding to and rating of contingency relations: The role of temporal contiguity. Journal of the Experimental Analysis of Behavior, 46, 15-35. Weiner, B. (1985). An attributional theory of achievement motivation and emotion. Psychological Review, 92, 548-573. Witcher, E. S., & Ayres, J. J. B. (1980). Systematic manipulation of CS-US pairings in negative CS-US correlation procedures in rats. Animal Learning and Behavior, 8, 67-74.
This Page Intentionally Left Blank
ANXIETY AND THE AMYGDALA: PHARMACOLOGICAL AND ANATOMICAL ANALYSIS OF THE FEAR-POTENTIATED STARTLE PARADIGM

Michael Davis
Janice M. Hitchcock
Jeffrey B. Rosen

RIBICOFF RESEARCH FACILITIES OF THE CONNECTICUT MENTAL HEALTH CENTER
DEPARTMENT OF PSYCHIATRY
YALE UNIVERSITY SCHOOL OF MEDICINE
NEW HAVEN, CONNECTICUT 06508

I. Introduction
II. The Fear-Potentiated Startle Paradigm
III. The Pharmacology of Fear-Potentiated Startle
    A. Advantages of Fear-Potentiated Startle for Studying the Pharmacology of Fear or Anxiety
    B. Effects of Different Drugs on Fear-Potentiated Startle: Performance
    C. Effects of Different Drugs on Fear-Potentiated Startle: Acquisition
IV. Neural Systems Involved in Fear-Potentiated Startle
    A. The Acoustic Startle Pathway
    B. Fear-Potentiation of a Short-Latency Startle Response Measured Electromyographically
    C. Determining the Point within the Startle Pathway Where Fear Alters Neural Transmission
    D. Effects of Diazepam on Potentiation of Electrically Elicited Startle and Reversal by RO 15-1788
    E. The Role of the Central Nucleus of the Amygdala in Fear-Potentiated Startle
    F. Effects of Electrical Stimulation ...
    G. The Role of Amygdala Projection Areas in Fear-Potentiated Startle
    H. Relationship between Fear-Potentiated Startle and Startle Increased by Electrical Stimulation of the Amygdala
    I. Relationship of the Amygdala to the Visual Structures Involved in Fear-Potentiated Startle
V. Sensitization of Startle by Footshocks
    A. Effects of Footshocks on Acoustic Startle
    B. Possible Role of the Amygdala in Footshock Sensitization
VI. Anxiety and the Amygdala
    A. Anatomical Connections between ... Involved in Fear or Anxiety
    B. Elicitation of Fear by Electrical Stimulation of the Amygdala
    C. The Role of the Amygdala in Fear Elicited by a Conditioned Stimulus
    D. Conditioned Fear versus Anxiety
VII. Summary and Conclusions
References

THE PSYCHOLOGY OF LEARNING AND MOTIVATION, VOL. 21
Copyright © 1987 by Academic Press, Inc. All rights of reproduction in any form reserved.
I. Introduction

The way in which neural systems mediate changes in behavior represents one of the most intriguing challenges faced by neurobiologists. The most definitive work in this area has focused on relatively simple types of behavioral change such as habituation, sensitization, and classical conditioning in invertebrate nervous systems (Alkon, 1979; Carew, 1984; Castellucci, Pinsker, Kupferman, & Kandel, 1970; Crow & Alkon, 1980; Hawkins, Abrams, Carew, & Kandel, 1983; Walters & Byrne, 1985). A major advance in the analysis of these questions was to choose a simple reflex behavior that could be modified by experience and then determine the neural circuit that mediated the behavior being measured. Once this was done, it was possible to isolate where different types of plasticity occurred and then determine how these changes were brought about at the cellular level. Comparable levels of analysis have not yet been carried out in intact vertebrate nervous systems. Because of this, it would be highly desirable to study a behavior in an intact vertebrate that is complex enough to be modified by experience, yet simple enough to have a neural circuit amenable to experimental analysis. The short-latency acoustic startle reflex enhanced by prior classical fear conditioning satisfies these requirements. The purpose of this article is to describe the fear-potentiated startle paradigm and the advantages it provides for a pharmacological and neuroanatomical analysis of fear conditioning. Pharmacological treatments that block fear-potentiated startle are reviewed, the neural pathways involved in the startle reflex are described, and the role of the amygdala in fear-potentiated startle and its possible connections to the startle pathway and critical visual structures that carry information about the conditioned stimulus are reviewed.
Finally, the importance of the central nucleus of the amygdala and its efferent projections to several brainstem target areas for fear and anxiety are outlined.

II. The Fear-Potentiated Startle Paradigm
Capitalizing on anecdotal evidence that humans seem to startle more to a loud sound when they are afraid, Brown, Kalish, and Farber (1951) demonstrated that the amplitude of the acoustic startle reflex in the rat can
be augmented by presenting the eliciting auditory startle stimulus in the presence of a cue (e.g., a light) that has previously been paired with a shock. This phenomenon has been termed the fear-potentiated startle effect and has been replicated using either an auditory or a visual conditioned stimulus and when startle has been elicited by either a loud sound or an airpuff (Albert, Dempsey, & Sorenson, 1985; Anderson, Johnson, & Kempton, 1969a,b; Berg & Davis, 1984, 1985; Cassella & Davis, 1985, 1986; Cassella, Harty, & Davis, 1986; Chi, 1965; Davis, 1979a,b; Davis & Astrachan, 1978; Davis, Redmond, & Baraban, 1979; Galvani, 1970; Hitchcock & Davis, 1986a, 1987; Kurtz & Siegel, 1966; Leaton & Borszcz, 1985; Siegel, 1967; Tischler & Davis, 1983; Wagner, Siegel, & Fein, 1967). In this paradigm, a central state of fear is considered to be the conditioned response (see McAllister & McAllister, 1971). Conditioned fear is operationally defined by elevated startle amplitude in the presence of a cue previously paired with a shock (see Fig. 1). Thus, the conditioned stimulus does not elicit startle. Furthermore, the startle-eliciting stimulus is never paired with a shock. Instead, the conditioned stimulus is paired with a shock and startle is elicited by another stimulus either in the presence or absence of the conditioned stimulus. Fear-potentiated startle is said to occur if startle is greater when elicited in the presence of the conditioned stimulus. Potentiated startle only occurs following paired but not unpaired presentations of the conditioned stimulus and the shock, which indicates that it is a valid measure of classical conditioning (Davis & Astrachan, 1978). Discriminations between visual and auditory conditioned stimuli (Hitchcock & Davis, 1986b; see Fig. 2) or between auditory cues that differ in duration (Siegel, 1967) have also been demonstrated with potentiated startle.
Generalization decrements resulting from a change in the frequency of a pure tone conditioned stimulus between training and testing have also been reported (Siegel, 1967). Increased startle in the presence of the conditioned stimulus still occurs very reliably at least one month after original training, making it appropriate for the study of long-term memory as well (Cassella & Davis, 1985). It has been suggested, however, that potentiated startle may not reflect increased fear in the presence of a conditioned fear stimulus, but instead results from the animal making a postural adjustment (e.g., crouching) in anticipation of the impending footshock that is especially conducive to startle (Kurtz & Siegel, 1966). In support of this interpretation, these authors reported that startle was increased in the presence of a cue previously paired with a footshock, but not when the cue previously had been paired with a backshock, even though the shock levels at the two loci were adjusted to support equivalent avoidance responding in another situation. In examining the difference, we found that the magnitude of the potentiated startle effect
Fig. 1. The fear-potentiated startle paradigm. During training, a neutral stimulus (conditioned stimulus) such as a light is consistently paired with a footshock. During testing, startle is elicited by an auditory stimulus (e.g., a 100-db burst of white noise) in the presence (light-noise trial type) or absence (noise-alone trial type) of the conditioned stimulus. This is simply a cartoon; the positions and postures that are pictured may not mimic the actual behavior of the animals.
was nonmonotonically related to the shock intensity used in training when either footshocks or backshocks were employed (Davis & Astrachan, 1978). Hence moderate levels of footshock or backshock produced robust potentiated startle, whereas very low or high levels of shock did not. It is possible, therefore, that the failure to find potentiated startle using backshocks in the Kurtz and Siegel study resulted because the effective shock intensity was either too high or too low. Moreover, in spinally transected rats rigidly held in a modified stereotaxic instrument that prevented obvious postural adjustments, the pinna component of startle was found to be enhanced in the presence of a cue previously paired with a footshock (Cassella & Davis, 1986a). Potentiation of startle measured electromyographically in neck muscles also occurs in the absence of any obvious postural adjustment (Cassella et al., 1986). In addition, the magnitude of potentiated startle
Fig. 2. Discriminative conditioning measured by fear-potentiated startle. Mean amplitude of the startle response elicited by a noise burst in the absence of a conditioned stimulus (noise alone), in the presence of a stimulus that has been consistently paired with a footshock (CS+, noise), or not paired with a footshock (CS-, noise), using either a light as the CS+ and a 75-db, 4000-Hz tone as the CS- (left bars) or a 75-db, 4000-Hz tone as the CS+ and a light as the CS- (right bars).
correlates highly with the degree of freezing, a very common measure of fear (Leaton & Borszcz, 1985). Taken together, therefore, the data indicate that potentiated startle is a valid measure of classical fear conditioning.
III. The Pharmacology of Fear-Potentiated Startle

A. ADVANTAGES OF FEAR-POTENTIATED STARTLE FOR STUDYING THE PHARMACOLOGY OF FEAR OR ANXIETY
Fear-potentiated startle offers a number of advantages for analyzing how drugs affect fear or anxiety. First, potentiated startle is defined as a within-subjects difference in startle amplitude in the presence (light-noise trials) versus the absence (noise-alone trials) of the visual conditioned stimulus. This makes it a sensitive measure, because it reduces problems caused by between-subject variability in startle. Second, it allows an evaluation of specific effects (reduction of startle on light-noise trials) versus nonspecific effects (reduction of startle on noise-alone trials), so that qualitative as well as quantitative drug profiles can be compared. Third, different intensities of the auditory stimulus can be used within the same test session to elicit
startle. This allows potentiated startle to be assessed at several points on the measurement scale, circumventing problems of interpretation that can arise when markedly different parts of the measurement scale are involved (e.g., rate-dependent drug effect in operant paradigms; percentage figures used with very different baselines). Fourth, no shocks are given in testing. Thus drug effects observed in testing cannot be explained in terms of changes in sensitivity to shock. Fifth, the separation between training and testing sessions allows one to evaluate whether a drug alters original learning or performance, and tests for state-dependent learning can be easily evaluated (e.g., Davis, 1979a). Sixth, potentiated startle does not involve any obvious operant. Thus the animal is not required to make or withhold a voluntary response to indicate fear or lack of fear; consequently, drug-induced effects that might be expected to alter operant performance (e.g., rate-dependent, motivational, disinhibitory motor effects) are circumvented. In addition, most animal tests of fear or anxiety involve a suppression of ongoing behavior in the presence of a fear stimulus (e.g., suppression of bar-pressing in the conditioned emotional response test or the operant conflict test; suppression of licking in the lick-suppression test; suppression of movement measured by freezing; suppression of normal activity in the social interaction test). Hence, certain treatments (e.g., decreases in serotonin transmission) might appear anxiolytic in these tests if they interacted with neural systems common to each of these tests (e.g., response inhibition), even though they might not reduce anxiety clinically. Because fear in the potentiated startle paradigm is reflected by enhanced response output, it may provide an important alternative test with which to analyze potential anxiolytic compounds.
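The within-subjects potentiation score described above lends itself to a short worked example. The sketch below is illustrative only: the amplitude values and variable names are hypothetical, not data from these experiments; only the scoring logic follows the definition in the text (mean startle on light-noise trials minus mean startle on noise-alone trials).

```python
# Hypothetical startle amplitudes (arbitrary units) for one rat.
# The numbers are invented for illustration; the scoring follows the
# within-subjects definition used in the fear-potentiated startle paradigm.
noise_alone = [20, 24, 22, 18, 26]   # startle on noise-alone trials
light_noise = [41, 38, 45, 40, 36]   # startle on light-noise trials

def mean(xs):
    return sum(xs) / len(xs)

# Fear-potentiated startle: mean startle in the presence of the
# conditioned stimulus minus mean startle in its absence.
potentiation = mean(light_noise) - mean(noise_alone)
print(round(potentiation, 1))  # → 18.0
```

Because the score is a difference computed within each subject, between-subject differences in baseline startle drop out, which is the sensitivity advantage the text describes; a drug's specific effect (smaller difference) can then be separated from a nonspecific depressant effect (smaller noise-alone baseline).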
B. EFFECTS OF DIFFERENT DRUGS ON FEAR-POTENTIATED STARTLE: PERFORMANCE

Table I shows that a variety of drugs that reduce fear or anxiety in humans decrease potentiated startle in rats. Clonidine, morphine, diazepam, and buspirone, which differ considerably in their mechanism of action, all block potentiated startle. In most cases, these treatments do not depress startle levels on the noise-alone trials, although clonidine does have marked depressant effects on both types of trials. In addition, yohimbine and piperoxane, which induce anxiety in normal people and exaggerate it in anxious people (Charney, Heninger, & Breier, 1984; Goldenberg, Snyder, & Aranow, 1947; Holmberg & Gershon, 1961; Soffer, 1954), actually increase potentiated startle in rats. Thus, at very low doses, these drugs increase startle amplitude on the light-noise trials without having any effect on startle on the noise-alone trials, and this only occurs in rats
TABLE I

EFFECTS OF TREATMENTS THAT ALTER DIFFERENT NEUROTRANSMITTERS ON POTENTIATED STARTLE

Treatment                          Dose range (mg/kg)   Reference

Treatments that block potentiated startle
  Sodium amytal                    10-40                Chi (1965)
  Diazepam                         0.3-2.5              Davis (1979a); Berg and Davis (1984)
  Flurazepam                       2.5-20               Davis (1979a)
  Morphine                         2.5-10               Davis (1979b)
  Alcohol                          7.5-22.5 cc/kg       Williams (1960, in Miller & Barry, 1960)
  Nicotine^a                       0.4                  Sorenson and Wilkinson (1983)
  Buspirone                        0.6-10               Kehne et al. (1987)
  Gepirone                         0.6-10               Kehne et al. (1987)
  Clonidine                        0.01-0.04            Davis et al. (1979)

Treatments that do not block potentiated startle
  Cinanserin                       10                   Davis et al. (1987)
  Cyproheptadine                   5                    Davis et al. (1987)
  p-Chloroamphetamine              5                    Davis et al. (1987)
  p-Chlorophenylalanine            400 x 2              Davis et al. (1987)
  Naloxone                         2.0                  Davis et al. (1987)
  Raphe lesions                    -                    Davis et al. (1987)
  RO 15-1788                       1.0                  Davis et al. (1987)
  Imipramine (chronic or acute)    5-10                 Cassella & Davis (1985)
  WB-4101                          1.0                  Davis et al. (1979)
  Propranolol^a                    20                   Davis et al. (1979)

Treatments that increase potentiated startle
  Piperoxane                       0.25-1.0             Davis et al. (1979)
  Yohimbine                        0.125-0.25           Davis et al. (1979)

^a Partial blockade.
conditioned to fear the light (Davis et al., 1979). On the other hand, Table I shows that a variety of treatments which alter serotonin (5-HT) transmission do not affect potentiated startle. This is important because treatments that decrease 5-HT transmission have an anxiolytic profile in several animal
tests of fear or anxiety (e.g., operant-conflict test, lick-suppression test, social interaction test), perhaps by interfering with response inhibition (see Soubrie, 1986). These effects may represent false positives in these tests because treatments like p-chlorophenylalanine do not appear to be anxiolytic when tested clinically (e.g., Shopsin, Friedman, & Gershon, 1976). In addition, although the anxiolytic effects of buspirone have been suggested to be mediated through the serotonin system, the ability of buspirone to selectively decrease fear-potentiated startle does not seem to be attributable to its actions at either pre- or postsynaptic 5-HT receptors (Davis, Cassella, & Kehne, 1987).

C. EFFECTS OF DIFFERENT DRUGS ON FEAR-POTENTIATED STARTLE: ACQUISITION

We are beginning to investigate the effects of different drugs on the acquisition of fear-potentiated startle, because very little work has been done in this area and the paradigm provides an easy way to differentiate drug effects on acquisition versus performance of fear. We did find that diazepam only slightly attenuated the acquisition of potentiated startle at a dose (2.5 mg/kg) that completely blocked performance of potentiated startle, and that these effects could not be explained by state-dependent learning (Davis, 1979a). Buspirone (5 mg/kg) does partially attenuate the acquisition of potentiated startle but is much more potent in depressing performance, again without any apparent state-dependent learning taking place (Davis, 1987). Clearly a great deal of work will have to be done to elucidate the role of various neurotransmitters in the acquisition of potentiated startle.
IV. Neural Systems Involved in Fear-Potentiated Startle

Another advantage of the potentiated startle paradigm is that fear is being measured by a change in a simple reflex. Hence, with potentiated startle, fear is expressed through some neural circuit that is activated by the conditioned stimulus and ultimately impinges on the startle circuit. Our laboratory has begun to delineate these two neural pathways and to see how they interconnect to mediate potentiated startle.
A. THE ACOUSTIC STARTLE PATHWAY

In the rat, the latency of acoustic startle as recorded electromyographically in the foreleg is 6 msec and as recorded in the hindleg is 8 msec (Ison, McAdam, & Hammond, 1973). This is an extraordinarily short latency and indicates that only a few synapses can be involved. Figure 3
illustrates the nuclei and fiber tracts we believe might mediate the acoustic startle response in the rat (Davis, Gendelman, Tischler, & Gendelman, 1982). The posteroventral cochlear nucleus (VCN) appears to be the first synapse in the primary acoustic startle circuit. Bilateral lesions of the VCN, but not the neighboring dorsal cochlear nuclei, abolish acoustic startle. In the awake rat, bilateral, single-pulse stimulation of the VCN (1 msec pulse width, 10-25 µA) elicits startle-like responses with a latency of 7.0-7.5 msec (see Fig. 4). The next synapse appears to occur in the ventral nucleus of the lateral lemniscus (VLL), which is known to receive direct projections from the ventral cochlear nucleus. Bilateral lesions of the VLL eliminate acoustic startle, and electrical stimulation of this nucleus elicits startle-like responses with an average latency of about 6 msec (Fig. 4). The next synapse may occur in a ventromedial region of the nucleus reticularis pontis caudalis (RPC). Bilateral lesions of this area abolish acoustic startle. Electrical stimulation of points within the RPC elicits startle-like responses with an average latency of about 5 msec. Cell bodies in the RPC send their axons to all levels of the spinal cord by way of the reticulo-spinal tract. This tract courses near or through the medial longitudinal fasciculus (MLF) on the midline and then bifurcates to form the ventral funiculi in the spinal cord. Complete lesions of the MLF eliminate acoustic startle, and electrical stimulation on the midline through the MLF elicits leg movements with a latency of about 4.0-4.5 msec (Fig. 4). Fibers from the reticulo-spinal tract synapse in the spinal cord, forming the final synapse before the neuromuscular junction. Direct monosynaptic connections onto motorneurons, as well as indirect ones through an interneuron in the cord, are possible. To date we have not determined whether spinal interneurons are involved.
More recently, we have found that infusion of ibotenic acid, which destroys cell bodies without damaging fibers of passage (Schwarcz, Hokfelt, Fuxe, Jonsson, Goldstein, & Terenius, 1979; Zaczek & Coyle, 1982), into either the VCN, VLL, or RPC also eliminates acoustic startle (Cassella & Davis, 1986b).

B. FEAR-POTENTIATION OF A SHORT-LATENCY STARTLE RESPONSE MEASURED ELECTROMYOGRAPHICALLY

Similar to many measures of fear, such as the conditional emotional response, potentiated startle represents a case in which the conditioned stimulus modulates an ongoing behavior. Because startle can be measured electromyographically with a latency of only 8 msec, the light should potentiate this 8-msec response. Typically, however, startle is not measured
Fig. 3. Primary acoustic startle circuit consisting of the ventral cochlear nucleus (VCN), ventral nucleus of the lateral lemniscus (VLL), and the nucleus reticularis pontis caudalis (RPC). Other abbreviations used are: A, aqueduct; CNIC, central nucleus of the inferior colliculus; CU, cuneate nucleus; DCN, dorsal cochlear nucleus; DP, decussation of pyramids; DR, dorsal raphe nucleus; ENIC, external nucleus of the inferior colliculus; IO, inferior olive; LM, medial lemniscus; LV, lateral vestibular nucleus; MLF, medial longitudinal fasciculus; MTB, medial nucleus of the trapezoid body; MV, medial vestibular nucleus; nVII, nucleus of the seventh nerve; P, pyramids; RGI, nucleus reticularis gigantocellularis; RST, reticulospinal tract; SO, superior olive; TSV, spinal tract of the fifth nerve; VAS, ventral acoustic stria; VII, seventh nerve. (From Davis et al., 1982, with permission of the Williams & Wilkins Co.)

Fig. 4. Electromyographic recording from the quadriceps femoris muscle complex of the startle response elicited by electrical stimulation in the spinal cord (SC), medial longitudinal fasciculus (MLF), nucleus reticularis pontis caudalis (RPC), ventral nuclei of the lateral lemniscus (VLL), or ventral cochlear nucleus (VCN), or by a tone. (From Davis et al., 1982, with permission of the Williams & Wilkins Co.)

electromyographically, but instead it is measured as a movement of a cage over a relatively long interval after onset of the startle-eliciting stimulus (e.g., 200 msec). Hence, it is conceivable that the visual conditioned stimulus does not actually alter the very short, 8-msec startle, but instead might facilitate other auditory systems which could produce cage
movements at longer latencies. If so, this might mean that the visual conditioned stimulus would not actually alter transmission along the short-latency pathway outlined in Fig. 3. On the other hand, if the light did increase the very short-latency startle reflex, one would have to conclude that it alters transmission at some point in the short-latency pathway. To test this, rats were implanted with bipolar electrodes in the neck muscles (Cassella et al., 1986). Following recovery, these rats were trained and then tested for potentiated startle using both neck EMG and cage movement as measures of startle. In fact, the light markedly increased startle measured electromyographically in the neck, where the latency of startle is only 5 msec (Fig. 5). In addition, there were high correlations within each rat between the degree of potentiation of startle measured electromyographically and by cage movement. Based on these data, the visual conditioned stimulus must ultimately alter neural transmission somewhere along the short-latency pathway outlined in Fig. 3 instead of recruiting other longer-latency auditory pathways.

C. DETERMINING THE POINT WITHIN THE STARTLE PATHWAY WHERE FEAR ALTERS NEURAL TRANSMISSION
Having delineated the startle reflex circuit involved in fear-potentiated startle, the next task was to determine the point within the startle circuit where the visual conditioned stimulus modulates transmission following conditioning. To do this, startle-like responses have been elicited electrically from various points along the startle pathway before and after presentation of a light that was either paired or not paired with a shock in different groups of rats (Berg & Davis, 1985). Startle elicited by electrical stimulation at or before the point in the startle circuit where the light modulates transmission should show potentiation, whereas startle elicited beyond this point should not. In this experiment, at least 24 hr prior to training, separate groups of rats were implanted with bilateral monopolar electrodes (0.25 mm diameter, 0.5 mm uninsulated tip) in either the ventral cochlear nucleus, ventral nuclei of the lateral lemniscus, or nucleus reticularis pontis caudalis. In addition, other groups were implanted with electrodes aimed at lateral as well as medial aspects of the ventral acoustic stria (VAS), the fibers that connect the VCN to the VLL. Animals were trained with 20 light-shock pairs in two sessions separated by 24 to 48 hr, with testing taking place 24 to 72 hr after the second session. The results are shown in Fig. 6. Potentiation of electrically elicited startle was equivalent to potentiation of acoustic startle at all locations up through the VLL. In contrast, startle elicited from the RPC was not potentiated, despite undiminished and significant potentiation of startle elicited
Fig. 5. Oscilloscope tracings of startle elicited in the absence (upper panel) or presence (lower panel) of the light conditioned stimulus. The top trace in each panel displays the output of a sound-level meter which measures the startle stimulus. The middle trace shows the electromyographic response measured from the neck muscles, and the lower trace shows the startle cage accelerometer output. (From Cassella et al., 1986, with permission from Pergamon Press.)
acoustically from these same animals. The difference in potentiation of electrically and acoustically elicited startle at sites beyond the VLL cannot be attributed to the extent of effective conditioning, because potentiation of startle elicited acoustically was just as great in the RPC group as it was in the VCN, VAS, and VLL groups. Furthermore, the potentiation of electrically elicited startle was specific to an explicit pairing of light and shock during training. Animals with electrodes implanted in the VCN and trained with a random temporal relation between light and shock showed no sign of potentiation of either electrically or acoustically elicited startle. Taken together, these data indicate that the VLL or the RPC is the point in the startle pathway where a visual conditioned stimulus ultimately modulates transmission following conditioning so as to affect the startle reflex.
Fig. 6. Magnitude of the fear-potentiated startle effect when startle was elicited either acoustically or electrically in different groups of rats that had electrodes implanted in either the ventral cochlear nucleus (VCN), ventral acoustic stria (VAS), ventral nucleus of the lateral lemniscus (VLL), or nucleus reticularis pontis caudalis (RPC). The white bars represent the degree of potentiation of startle elicited acoustically in rats with electrodes implanted into different parts of the startle pathway. The black bars represent the degree of potentiation of startle elicited electrically through the different electrode placements. Degree of potentiation is expressed as the mean difference in startle amplitude in the presence versus absence of the visual conditioned stimulus. (From Berg & Davis, 1985, with permission from the American Psychological Association.)
D. EFFECTS OF DIAZEPAM ON POTENTIATION OF ELECTRICALLY ELICITED STARTLE AND REVERSAL BY RO 15-1788
A critical assumption in the interpretation of these data gathered using the electrical stimulation technique is that potentiation of electrically and acoustically elicited startle represents the same phenomenon. In support of this, significant correlations were found between the magnitude of potentiation of electrically and acoustically elicited startle: r(20) = .75, p < .01, for the VCN implants; r(4) = .85, p < .05, for the VLL implants (Berg & Davis, 1985). In addition, the drug diazepam was as effective in blocking potentiation of startle elicited from the VCN as it was in blocking potentiation of acoustic startle (Berg & Davis, 1984; see Fig. 7). Moreover, with both types of stimulation, the suppression by diazepam was reversed by the benzodiazepine antagonist RO 15-1788, at doses that had no effect on baseline or potentiated startle by themselves (Berg & Davis, 1984).
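The potentiation score and correlation used here can be sketched in a few lines of code. This is only an illustration of the measure as described in the text: the helper names and all per-animal trial amplitudes below are hypothetical, not the published data.

```python
from math import sqrt

def potentiation(light_trials, alone_trials):
    """Mean startle amplitude with the light CS minus mean amplitude without it."""
    return sum(light_trials) / len(light_trials) - sum(alone_trials) / len(alone_trials)

def pearson_r(x, y):
    """Pearson product-moment correlation across animals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical per-animal trial amplitudes (arbitrary units), one potentiation
# score per animal for each elicitation mode
acoustic = [potentiation([30, 34], [20, 22]), potentiation([45, 47], [25, 27]),
            potentiation([28, 30], [24, 24]), potentiation([60, 62], [30, 32])]
electric = [potentiation([22, 24], [16, 18]), potentiation([33, 35], [20, 22]),
            potentiation([20, 22], [17, 19]), potentiation([44, 46], [25, 27])]

r = pearson_r(acoustic, electric)
```

With animals whose electrical and acoustic potentiation scores covary, r is high, mirroring the reported agreement between the two elicitation methods.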
Anxiety and the Amygdala
Fig. 7. Mean amplitude of startle elicited by electrical stimulation of the ventral cochlear nucleus in the presence or absence of the light after administration of various doses of diazepam or its vehicle. (From Berg & Davis, 1984, with permission from Pergamon Press.)
E. THE ROLE OF THE CENTRAL NUCLEUS OF THE AMYGDALA IN FEAR-POTENTIATED STARTLE
Recently, several studies have indicated that lesions of the cerebellum or lesions of efferents from the cerebellum to the red nucleus eliminate classically conditioned motor responses such as the nictitating membrane response and conditioned leg flexion (see Thompson, Donegan, Clark, Lavond, Lincoln, Madden, Mamounas, Mauk, & McCormick, 1986). Interestingly, however, these lesions did not block heart-rate conditioning (Thompson et al., 1986), suggesting that the cerebellum may not be required for fear conditioning. On the other hand, lesions of the central nucleus of the amygdala block heart-rate conditioning (Cohen, 1975; Gentile, Jarrell, Teich, McCabe, & Schneiderman, 1986; Kapp, Frysinger, Gallagher, & Haselton, 1979a) as well as blocking measures of conditioned fear in several other experimental paradigms (see Hitchcock & Davis, 1986a). Since potentiated startle appears to be a measure of fear, we hypothesized that lesions of the amygdala should block potentiated startle, whereas lesions of the cerebellum or red nucleus might not.
To test this, rats were given 10 light-shock pairings on two successive days (Hitchcock & Davis, 1986a). At 24-48 hr following training, groups of rats received either bilateral transection of the cerebellar peduncles, bilateral lesions of the red nucleus (which receives most of the cerebellar efferents), or bilateral lesions of the central nucleus of the amygdala. Control animals were sham operated. At 3-4 days after surgery, the rats were tested for potentiated startle. Fear-potentiated startle was completely blocked by lesions of the central nucleus of the amygdala. In this same experiment, transection of the cerebellar peduncles or lesions of the red nucleus had no effect on potentiated startle (see Fig. 8). A second experiment using a visual prepulse test indicated that the blockade of potentiated startle observed in animals with lesions of the amygdala could not be attributed to visual impairment. A third experiment indicated that the absence of potentiation in the amygdala-lesioned animals did not simply result from a lowered startle ceiling, because the amygdala-lesioned animals could show increased startle with increased stimulus intensity and with administration of strychnine (see Fig. 9), a drug that reliably increases startle (Kehne, Gallager, & Davis, 1981). Finally, an additional study demonstrated that lesions of the central nucleus of the amygdala also blocked fear-potentiated startle using an auditory conditioned stimulus (Hitchcock & Davis, 1987). Taken together, the results of these experiments support the hypothesis that the amygdala is involved in fear conditioning, because potentiated startle is a measure of conditioned fear. The results are also consistent with the hypothesis that the cerebellum is involved in conditioned motor responding, rather than conditioned fear (Thompson et al., 1986).
It is still possible, however, that the cerebellum could modulate potentiated startle because electrical stimulation of the cerebellum has been reported to increase the magnitude of potentiated startle (Albert, Dempsey, & Sorenson, 1985), and recent studies indicate that lesions of the vermis may alter heart-rate conditioning in rats (Supple & Leaton, 1986).
F. EFFECTS OF ELECTRICAL STIMULATION OF THE AMYGDALA ON ACOUSTIC STARTLE
It is not clear how the amygdala participates in fear-potentiated startle. It is possible that the light, after being paired with shock, activates the amygdala, which would then increase startle. Short-latency visual-evoked potentials have been recorded in the amygdala (Pollock, Bock, Fuchs, & Lohaus, 1976; Sanghera, Rolls, & Roper-Hall, 1979), and electrical stimulation of the amygdala has been reported to produce fearlike behaviors in animals (Applegate, Kapp, Underwood, & McNall, 1983; Gloor, 1960), to mimic conditioned and unconditioned cardiac effects in rabbit heart-rate
Fig. 8. Mean amplitude startle on the light-noise or noise-alone trials in rats in which the cerebellum was surgically transected from the brainstem or in rats with electrolytic lesions of the red nucleus or the central nucleus of the amygdala. Sham animals for the cerebellar experiment had the transection knife inserted under the cerebellum, but the fiber pathways were not cut. Sham animals for the other experiments had the electrodes lowered into the brain, but no current was passed.
conditioning (Kapp, Gallagher, Underwood, McNall, & Whiteshorn, 1982), and to elicit feelings of anxiety in humans (Chapman, 1954). Consistent with this, we have found that low-level electrical stimulation of the amygdala (e.g., 40-400 μA, 25-msec trains of 0.1-msec square-wave cathodal pulses) markedly increases acoustic startle amplitude (Rosen & Davis, 1987). This excitatory effect has occurred in every rat that we have tested so far in which electrodes were found to be placed in the ventral, intercalated, or medial nucleus of the amygdala or in the area just medial to the amygdaloid complex (see Fig. 10). Stimulation of the area just medial to the amygdala had the lowest threshold for increasing acoustic startle. This area coincides with the initial part of the ventral amygdalofugal pathway as it begins its projection to the lower brainstem (Krettek & Price, 1978; Schwaber, Kapp, Higgins, & Rapp, 1982; Post & Mai, 1980). Low-level electrical stimulation at this site would be expected to activate a large number of fibers projecting to the brainstem, since they are highly concentrated at this part of the pathway. In contrast, stimulation of the amygdala nuclei themselves, where the neurons of these fibers originate, would require higher currents (i.e., more current spread) to activate the same number of brainstem projections because the neurons are dispersed throughout the nuclei. Further studies involving stimulation of the area just medial to the amygdala where the ventral amygdalofugal
Fig. 9. Mean amplitude startle on the light-noise and noise-alone trials in rats with lesions of the central nucleus of the amygdala or sham lesions when testing was carried out after injection of either strychnine (.075 mg/kg) or its vehicle, water. (From Hitchcock & Davis, 1986a, with permission from the American Psychological Association.)
pathway begins, in combination with lesions of cell bodies (e.g., ibotenic acid lesions) in the central, medial, intercalated, or basolateral nucleus, would help clarify whether enhancement of startle by stimulation in this pathway actually results from activation of the axons originating from these amygdaloid nuclei. Observations of the rats revealed no obvious signs of behavioral activation during stimulation of the amygdala at these stimulation currents and durations, indicating that startle is an extremely sensitive index of amygdala stimulation. Moreover, the duration of stimulation is well below that used to produce kindling in rats (Handforth, 1984), so that the effects on startle are not associated with convulsions. With electrical stimulation of the amygdala, the excitatory effect on startle appears to develop very rapidly (e.g., within a few milliseconds from the onset of amygdala stimulation). The rapidity of action means that the increase in startle is not secondary to autonomic or hormonal changes that might be produced by amygdala stimulation, because these would have a much longer latency. In addition, electrical stimulation of the amygdala alone does not elicit startle even at high currents. Finally, electrical stimulation of several other nearby brain areas such as the endopiriform nucleus, fundus striati, internal capsule, or some sites in the basolateral nucleus of the amygdala does not increase startle (see Fig. 10).
(Panels show coronal sections at Bregma -1.8, -2.3, -2.8, and -3.3 mm.)
Fig. 10. Effective and ineffective stimulation sites in the amygdala for enhancement of acoustic startle. Effective sites at currents below 100 μA, at 100-200 μA, at 201-300 μA, and at 301-400 μA are shown on the left. Ineffective sites are shown on the right. (From Rosen & Davis, 1987, with permission from the American Psychological Association.)
In addition to localizing brain areas which modulate startle, enhancement of startle by electrical stimulation also allows a characterization of the neural elements which mediate the facilitatory effect. By delivering paired pulses to the amygdala with various interpulse intervals, the poststimulation excitability cycle (i.e., the distribution of refractory periods) of the neural elements in the amygdala involved in startle facilitation can be determined (Gallistel, Shizgal, & Yeomans, 1981). This information may be used to estimate the fiber diameter and conduction velocity of the axons involved in startle facilitation, since these parameters are related to refractory period duration (Swadlow & Waxman, 1976). The experimental paradigm used in these studies is shown schematically in Fig. 11A. A pair of cathodal square-wave pulses of 0.1-msec duration was delivered with various intervals between the pulses. The intervals tested were 0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.6, 2.0, 4.0, 10.0, and 20.0 msec from the onset of the first pulse (conditioning or C pulse) to the onset of the second pulse (test or T pulse). The second pulse was always delivered 5 msec before the onset of the acoustic startle stimulus, since preliminary experiments showed that a single pulse delivered with this lead time could enhance startle. Single pulses delivered 5 msec before the onset of the startle stimulus and startle-stimuli-alone trials were given for comparing the effects of the paired-pulse trials. Ten trials of each condition with a 15-sec intertrial interval were given in a randomized Latin-square design. The current for a single pulse to elevate startle by about 50% was determined for each electrode placement and held constant throughout these experiments. The effects of paired pulses through the electrodes which enhanced startle are shown in Fig. 11B. A single pulse (T pulse) and paired pulses enhanced startle.
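The trial structure of this paradigm can be sketched as a schedule generator. The function and condition names are ours, and a simple seeded shuffle stands in for the randomized Latin-square order actually used; the interval values, trial counts, and timing constants are taken from the text.

```python
import random

# C-T intervals tested (msec), from the paradigm described in the text
CT_INTERVALS_MS = [0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.6, 2.0, 4.0, 10.0, 20.0]

# 12 paired-pulse conditions plus a T-pulse-alone and a startle-stimulus-alone control
CONDITIONS = [("paired", ct) for ct in CT_INTERVALS_MS] + [("T_alone", None), ("SS_alone", None)]

TRIALS_PER_CONDITION = 10   # ten trials of each condition
ITI_SEC = 15                # 15-sec intertrial interval
T_PULSE_LEAD_MS = 5         # T pulse always 5 msec before the startle stimulus

def build_schedule(seed=0):
    """Return a randomized list of (condition, C-T interval) trials."""
    trials = CONDITIONS * TRIALS_PER_CONDITION
    rng = random.Random(seed)
    rng.shuffle(trials)  # stand-in for the Latin-square randomization
    return trials

schedule = build_schedule()
```

The full session is 14 conditions x 10 trials = 140 startle stimuli, one every 15 sec.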
Paired pulses either enhanced or had no effect when compared to a single pulse. With conditioning-test pulse (C-T) intervals from 0.1 to 0.6 msec, startle was not enhanced compared to a T pulse delivered alone. However, between 0.8- and 2-msec C-T intervals, startle was enhanced compared to the T pulse alone condition. In addition, the 1.0-msec C-T interval significantly increased startle compared to the 0.8-msec C-T interval, and intervals above 1.0 msec did not enhance startle more than the 1.0-msec C-T interval. The results indicate that the population of stimulated axons of the amygdala which is responsible for enhancing startle is quite homogeneous. Most neurons recovered from refractoriness between 0.8 and 1.0 msec, and enhancement was not increased with longer C-T intervals. The refractory periods of the axons of the amygdala neurons which enhance acoustic startle are in the same range as those which subserve self-stimulation in the medial forebrain bundle in the rat (Yeomans, 1975) and callosal axons of the rabbit visual system (Swadlow & Waxman, 1976). The medial forebrain bundle and visual callosal axons are considered small and myelinated with conduction velocities estimated to be 1 to 8 m/sec (Shizgal, Bielajew, Corbett, Skelton,
Fig. 11. Refractory period and summation function for stimulated elements in the amygdala which modulate startle elicited by a 20-msec noise burst. A: Procedure for the paired-pulse experiments. The amygdala was stimulated with a 0.1-msec C pulse followed 0.1 to 20 msec later by a T pulse. The T pulses were always delivered 5 msec before the onset of the acoustic startle stimulus. The duration of the pulses in the amygdala was 0.1 msec. B: Results of the paired-pulse experiments. Along the abscissa, SS alone represents the startle stimulus given without prior amygdala stimulation, T is the T pulse followed by the startle stimulus without a preceding C pulse, and the C-T intervals are those of pairs of C and T pulses followed by startle stimuli. Startle amplitude in conditions with electrical stimulation of the amygdala was significantly greater than in the SS alone condition. * signifies that the paired pulses significantly increased startle compared to the T condition. ** denotes that the C-T interval significantly increased startle compared to the previous C-T interval (e.g., 1.0 msec was greater than 0.8 msec). (From Rosen & Davis, 1987, with permission from the American Psychological Association.)
& Yeomans, 1980; Swadlow & Waxman, 1976). This suggests that the axons
subserving the enhancement of startle are also small and myelinated with similar conduction velocities. Future neuroanatomical studies will examine this possibility.
G. THE ROLE OF AMYGDALA PROJECTION AREAS IN FEAR-POTENTIATED STARTLE
As discussed previously, lesions of the central nucleus of the amygdala block fear-potentiated startle (Fig. 12). Recently, we have been using the sensitive Phaseolus vulgaris-leucoagglutinin (PHA-L) anterograde tracing technique and lesion studies to delineate possible connections between the amygdala and the startle pathway in relation to fear-potentiated startle (Hitchcock, Rosen, & Davis, 1986a; Mondock & Davis, 1985). The central nucleus of the amygdala projects to a variety of brain regions via two major efferent pathways, the stria terminalis and the ventral amygdalofugal pathway. Lesions of the stria terminalis itself, or the bed nucleus of the stria terminalis, a major projection area of this pathway, do not block potentiated startle (see Fig. 13). Knife cuts of the rostral part of the ventral amygdalofugal pathway, which would interrupt its projections to the rostral lateral hypothalamus and substantia innominata, also fail to block potentiated startle (see Fig. 14). On the other hand, lesions of the caudal part of the ventral amygdalofugal pathway, at the point where it passes through the subthalamic area and cerebral peduncles, completely block potentiated startle (see Fig. 15). Interestingly, Jarrell, McCabe, Teich, Gentile, Van Dercar, and Schneiderman (1986) found that lesions of this area also block heart-rate conditioning. Finally, lesions of the substantia nigra, which receives central nucleus projections as well as fibers of passage from the central nucleus of the amygdala to more caudal brainstem regions, also block potentiated startle. This blockade does not seem to involve dopamine cells in the zona compacta, since infusion of the dopamine neurotoxin 6-hydroxydopamine into the substantia nigra did not block potentiated startle despite over a 90% depletion of dopamine in the caudate nucleus.
These lesion experiments indicate that the pathway from the central nucleus of the amygdala to the startle circuit travels through the caudal part of the ventral amygdalofugal pathway, which is known to project directly to many parts of the pons, medulla, and probably the spinal cord (Krettek & Price, 1978; Mizuno, Takahashi, Satoda, & Matsushima, 1985; Post & Mai, 1980; Price & Amaral, 1981; Sandrew, Edwards, Poletti, & Foote, 1986; Schwaber et al., 1982). In fact, Inagaki, Kawai, Matsuzak, Shiosaka, and Tohyama (1982) have reported direct connections between the central nucleus of the amygdala and the exact part of the nucleus reticularis pontis caudalis that is critical for startle (an area just dorsal to the superior olive). Presently we are attempting to verify this using both anterograde and retrograde tracing techniques. Because this pathway appears to contain the peptide somatostatin (Inagaki et al., 1982), future studies will examine the role of somatostatin in fear-potentiated startle.
H. RELATIONSHIP BETWEEN FEAR-POTENTIATED STARTLE AND STARTLE INCREASED BY ELECTRICAL STIMULATION OF THE AMYGDALA
It is not clear whether startle enhanced by electrical stimulation of the central nucleus of the amygdala is related to fear-potentiated startle. Several
Fig. 12. Lower panel: Mean amplitude startle response on noise-alone trials (black bars) and light-noise trials (white bars) in unoperated rats, rats given sham lesions, and rats given bilateral lesions of the central nucleus of the amygdala. Center panel: Histological reconstruction of a representative lesion of the central nucleus of the amygdala. The black area represents the cavity produced by the lesion and the striped area represents the surrounding gliosis. Upper panel: Schematic representation of a deposit of PHA-L into the left central nucleus of the amygdala and PHA-L labeled fibers from the amygdala. Actual density of labeling is not accurately represented in the schematics shown in Figs. 12-15.
experiments could be performed to evaluate this hypothesis. For example, lesions of points along the ventral amygdalofugal pathway that block fear-potentiated startle should also block startle enhanced by stimulation of the amygdala. In fact, in a preliminary experiment, lesions of the caudal
Fig. 13. Lower panel: Mean amplitude startle response on noise-alone trials (black bars) and light-noise trials (white bars) in unoperated rats, rats given sham lesions, and rats given bilateral lesions of the bed nucleus of the stria terminalis. Center panel: Histological reconstruction of a representative lesion of the bed nucleus of the stria terminalis. Upper panel: Schematic representation of terminals and fibers in the bed nucleus of the stria terminalis after a deposit of PHA-L into the left central nucleus of the amygdala, as shown in Fig. 12.
ventral amygdalofugal pathway at the level of the subthalamic nucleus and substantia nigra completely blocked facilitation of startle normally produced by electrical stimulation of the central nucleus of the amygdala (Rosen, Hitchcock, & Davis, 1986).
Fig. 14. Lower panel: Mean amplitude startle response on noise-alone trials (black bars) and light-noise trials (white bars) in unoperated rats, rats given sham transections, and rats given bilateral transections in the rostral division of the ventral amygdalofugal pathway that would interrupt the connection between the central nucleus of the amygdala and the rostral hypothalamus and substantia innominata. Center panel: Histological reconstruction of the transection. Upper panel: Schematic representation of terminals and fibers in the rostral division of the ventral amygdalofugal pathway after a deposit of PHA-L into the left central nucleus of the amygdala, as shown in Fig. 12.
In another experiment one might expect to see synergism between electrical stimulation of the amygdala and presentation of a visual stimulus previously paired with a shock if both a visual conditioned stimulus and electrical stimulation of the amygdala activate common neural pathways. Moreover,
Fig. 15. Lower panel: Mean amplitude startle response on noise-alone trials (black bars) and light-noise trials (white bars) in unoperated rats, rats given sham lesions, and rats given bilateral lesions of the caudal division of the ventral amygdalofugal pathway that would interrupt the connection between the central nucleus of the amygdala and the brainstem. Center panel: Histological reconstruction of a representative lesion of the caudal ventral amygdalofugal pathway. Upper panel: Schematic representation of terminals and fibers in the caudal division of the ventral amygdalofugal pathway after a deposit of PHA-L into the left central nucleus of the amygdala, as shown in Fig. 12.
one would expect the point along the startle pathway where stimulation of the amygdala ultimately alters transmission to be the same as the point where a conditioned stimulus alters transmission (i.e., one should get the same data as shown in Fig. 6 using amygdala stimulation instead of a light
previously paired with a shock). If these results do indicate that enhancement of startle by electrical stimulation of the amygdala is related to fear-potentiated startle, then this technique should be valuable in determining the locus of action of various drugs that are known to alter potentiated startle. Thus, drugs that act before and perhaps at the amygdala should block startle potentiated by a stimulus previously paired with a shock but not by stimulation of the amygdala. In contrast, drugs that act beyond the amygdala should block both fear-potentiated startle and startle enhanced by electrical stimulation of the amygdala.
I. RELATIONSHIP OF THE AMYGDALA TO THE VISUAL STRUCTURES INVOLVED IN FEAR-POTENTIATED STARTLE
Thus far we have no direct information linking the amygdala to any visual structures that appear critical for fear-potentiated startle. The central nucleus of the amygdala is known to receive visual input from the insular cortex (Turner & Zimmer, 1984), which probably receives visual information through a pathway involving the lateral geniculate nucleus, visual cortex, and various visual association areas. Thus it is possible that the visual conditioned stimulus would activate the central nucleus of the amygdala after being relayed through these structures. Previous experiments in our laboratory found that lesions of the lateral geniculate nucleus and the visual cortex blocked fear-potentiated startle when testing occurred 1-2 days after these lesions. In contrast, lesions of superficial layers of the superior colliculus, the pretectal area, parietal cortex, or the dorsal lateral lemniscus did not block potentiated startle (Tischler & Davis, 1983). It was also reported in that study that lesions of deep and intermediate layers of the superior colliculus blocked fear-potentiated startle. Recently, we have replicated this result by testing 6-7 days after lesions of deep and intermediate layers of the superior colliculus (Hitchcock & Davis, 1986c). At this time, however, baseline levels of startle (i.e., on the noise-alone trials) were markedly elevated after lesions of the superior colliculus. In fact, when these animals were retested using a very weak, 75-db, noise burst, they each showed increased startle in the presence of the light. Moreover, lesions of deep and intermediate layers of the superior colliculus did not prevent electrical stimulation of the amygdala from facilitating startle (Hitchcock, Rosen, & Davis, 1986b). Hence, we now believe that deep and intermediate layers of the superior colliculus, which project directly to the ventral nucleus of the lateral lemniscus (Henkel & Edwards, 1978; Henkel, 1981), tonically inhibit acoustic startle.
This effect may interfere with the measurement of fear-potentiated startle unless special test conditions are arranged. However, the superior colliculus does not appear to be either an obligatory visual relay in startle potentiated by a visual
conditioned fear stimulus or part of the pathway connecting the central nucleus of the amygdala to the startle circuit.
V. Sensitization of Startle by Footshocks
A. EFFECTS OF FOOTSHOCKS ON ACOUSTIC STARTLE
Fear-potentiated startle is defined by an increase in startle in the presence of a cue previously paired with shock. One might expect, therefore, that shock itself should increase startle amplitude, so that fear-potentiated startle would reflect the familiar finding that the conditioned response mimics the unconditioned response. Curiously, however, Brown et al. (1951) reported that acoustic startle was actually depressed when elicited from 15 to 60 sec after a footshock, although longer intervals were not tested. Recently, Fanselow and colleagues (e.g., Fanselow, 1981, 1982, 1984; Fanselow & Bolles, 1979) have shown that shock leads to an increase in freezing, a traditional measure of fear in rats. Interestingly, however, freezing does not occur immediately after the shock, but develops gradually over several minutes. Since the magnitude of fear-potentiated startle correlates highly with the amount of freezing measured in the same experimental situation (Leaton & Borszcz, 1985), we reasoned that startle should increase for several minutes following footshock with a time course similar to that of freezing reported by Fanselow. To test this idea, a group of 10 rats was presented with forty 105-db noise bursts once every 30 sec before and after a train of 10 shocks (0.6 mA, 0.5 sec in duration) presented at a rate of 1 shock/sec. Another group was treated identically except that no shocks were given. Figure 16 shows that a train of 10 shocks led to a progressive increase in startle amplitude that peaked in about 10 min, leading to a highly significant difference in startle amplitude between the shocked and nonshocked groups during the postshock series of startle stimuli, t(18) = 4.36, p < .001. Other studies showed that the magnitude of startle facilitation was directly related to the number or intensity of intervening shocks.
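The group comparison reported here (a two-sample t test with df = 18 for two groups of 10 rats) can be sketched as follows. The per-rat postshock startle amplitudes below are hypothetical illustrations, not the published data, and the pooled-variance formulation is our assumption about the test used.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(x, y):
    """Two-sample pooled-variance t statistic; df = len(x) + len(y) - 2."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))

# Hypothetical mean postshock startle amplitudes per rat (arbitrary units)
shocked = [82, 75, 90, 68, 88, 79, 95, 71, 84, 77]
nonshocked = [52, 47, 60, 44, 55, 49, 58, 46, 51, 50]

t = pooled_t(shocked, nonshocked)  # compare against the t distribution with df = 18
```

A t value exceeding the two-tailed critical value of about 2.10 at df = 18 would indicate a significant group difference at p < .05.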
Moreover, shocks given prior to presenting any startle stimuli also lead to an elevation of startle, indicating that enhanced startle results from sensitization rather than simply dishabituation (e.g., Groves & Thompson, 1970).
B. POSSIBLE ROLE OF THE AMYGDALA IN FOOTSHOCK-INDUCED SENSITIZATION
Since footshock is both aversive and fear-producing, it is quite possible that footshock activates the amygdala which then leads to an elevation of startle via
Fig. 16. Sensitization of startle by footshock: The mean amplitude startle response prior to and following a series of ten 0.6-mA, 500-msec footshocks (indicated by the arrow) presented once per second (solid) or no shock (open). Forty 105-db noise bursts were presented at a 30-sec interval both before and after the shocks.
connections between the amygdala and the startle pathway. In fact, preliminary evidence indicates that lesions of the central nucleus of the amygdala prevent footshocks from sensitizing startle (Hitchcock & Davis, 1986d). If footshock sensitizes startle by activating the central nucleus of the amygdala, then interruption of the neural pathway that mediates footshock-induced activation of the amygdala should block shock sensitization of startle. In addition, if activation of the amygdala by a footshock is critical for the acquisition of fear conditioning, then interruption of this pathway should also block the acquisition of fear-potentiated startle. Importantly, however, lesions made following acquisition should not block performance of fear-potentiated startle, since after acquisition the conditioned stimulus should now activate the amygdala thereby increasing startle, independent of the pathway connecting footshock to the amygdala.
The most direct way in which a footshock could activate the central nucleus of the amygdala would involve pain receptors in the footpads which would send action potentials through the lateral spinothalamic tract to the medial division of the medial geniculate nucleus (Lund & Webster, 1967; Mehler, 1969). This region of the medial geniculate nucleus is known to have heavy, direct projections to the central nucleus of the amygdala (LeDoux, Ruggiero, & Reis, 1985; Ottersen & Ben-Ari, 1979). A more complex pathway would involve projections of the spinothalamic tract to the ventral posterolateral nucleus of the thalamus to somatosensory cortex I and II, which then projects to the insular cortex, which projects to the central nucleus of the amygdala (see Turner & Zimmer, 1984). Finally, other inputs might involve spino-reticular pathways which would activate catecholamine-containing neurons of the lateral tegmentum or locus coeruleus (e.g., Guyenet & Byrum, 1985; McMahan & Wall, 1985) which would then project to the amygdala (see Moore & Card, 1984). Hence, future studies will evaluate the role of these various pathways in sensitization of startle by footshocks and their involvement in the acquisition of fear-potentiated startle.
VI. Anxiety and the Amygdala
A variety of animal models have been used to infer a central state of fear or anxiety. In some models fear is inferred when an animal freezes, thus interrupting some ongoing behavior such as pressing a bar or interacting socially with other animals. In other models fear is measured by changes in autonomic activity, such as heart rate, blood pressure, or respiration. Fear can also be measured by a change in simple reflexes or a change in facial expressions and mouth movements. Thus fear appears to produce a complex pattern of behaviors that are highly correlated with each other.
A. ANATOMICAL CONNECTIONS BETWEEN THE AMYGDALA AND BRAIN AREAS INVOLVED IN FEAR OR ANXIETY
Similar to suggestions of several previous reviews (Gloor, 1960; Kapp & Pascoe, 1984; Kapp, Pascoe, & Bixler, 1984; Sarter & Markowitsch, 1985), Fig. 17 summarizes work done in many different laboratories indicating that the central nucleus of the amygdala has direct projections to a variety of brainstem areas that might be expected to be involved in many of the symptoms of fear or anxiety. Thus the central nucleus of the amygdala projects to a region of the central grey (Beitz, 1982; Post & Mai, 1980) that has been implicated in fear in a number of behavioral tests (Iwata, LeDoux, & Reis, 1986; Liebman, Mayer, & Liebeskind, 1970). Direct projections to
(Schematic: fear-provoking conditioned or symbolic stimuli activate the central nucleus of the amygdala, whose projections produce the correlated signs of fear: to the central grey (arrest of ongoing behavior: freezing, conflict test, CER, social interaction); to the dorsal motor nucleus of the vagus and hypothalamus (changes in heart rate and blood pressure: cardiovascular conditioning); to the parabrachial nucleus (changes in respiration: panting, respiratory distress in panic); to the nucleus reticularis pontis caudalis (increased reflex excitability: startle increase); to the trigeminal and facial motor nuclei (mouth open, jaw movements: facial expressions of fear); and to the ventral tegmental area, frontal cortex, and locus coeruleus (increased dopamine and norepinephrine: increased vigilance).)
Fig. 17. Connections of the central nucleus of the amygdafa to a variety of target areas that are probably involved in the pattern of behaviors typically associated with fear.
the dorsal motor nucleus of the vagus (Hopkins & Holstege, 1978; Schwaber et al., 1982; Takeuchi, Matsushima, Matsushima, & Hopkins, 1983) may be involved in several autonomic measures of fear or anxiety, since the vagus nerve controls many different autonomic functions. Projections of the central nucleus of the amygdala to the parabrachial nucleus (Krettek & Price, 1978; Price & Amaral, 1981; Takeuchi, McLean, & Hopkins, 1982) may be involved in respiratory changes during fear, since electrical stimulation of the parabrachial nucleus is known to alter respiratory rate (Cohen, 1971; Bertrand & Hugelin, 1971). Direct projections to the trigeminal (Post & Mai, 1980) and perhaps the facial motor nuclei may mediate some of the facial expressions of fear. As outlined earlier, projections of the amygdala to the nucleus reticularis pontis caudalis (Inagaki et al., 1982) probably are involved in fear-potentiation of the startle reflex. Finally, projections from the central nucleus of the amygdala to the ventral tegmental area (Phillipson, 1979) may mediate stress-induced changes in dopamine turnover in the frontal cortex (e.g., Thierry, Tassin, Blanc, & Glowinski, 1976). Moreover, this projection might also link the central nucleus of the amygdala to the locus coeruleus, since stimulation of the ventral tegmental area can activate the locus coeruleus (Deutsch, Goldstein, & Roth, 1986), which itself has been implicated in fear and anxiety (Redmond, 1977), or increased vigilance and attention (e.g., Aston-Jones & Bloom, 1981).
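The projection-to-symptom mapping summarized in Fig. 17 and the paragraph above can be restated as a simple lookup table. This is only an illustrative restatement of the chapter's wiring diagram; the key and value strings are our own shorthand, not terminology from the original.

```python
# Illustrative restatement of Fig. 17: each efferent target of the
# central nucleus of the amygdala is paired with the fear/anxiety
# measure it is thought to mediate. Shorthand labels chosen here.
AMYGDALA_CENTRAL_NUCLEUS_TARGETS = {
    "central grey": "arrest of ongoing behavior (freezing, CER, conflict tests)",
    "dorsal motor nucleus of the vagus": "autonomic changes (heart rate, blood pressure)",
    "parabrachial nucleus": "respiratory changes (panting, distress in panic)",
    "trigeminal/facial motor nuclei": "facial expressions of fear (mouth, jaw movements)",
    "nucleus reticularis pontis caudalis": "increased startle reflex excitability",
    "ventral tegmental area": "increased dopamine/norepinephrine turnover, vigilance",
}

def symptoms_of_amygdala_activation():
    """Full symptom pattern predicted when the central nucleus fires,
    e.g., when driven by a conditioned fear stimulus."""
    return sorted(AMYGDALA_CENTRAL_NUCLEUS_TARGETS.values())
```

The table captures the chapter's central claim: activating one node (the central nucleus) is predicted to produce the whole correlated symptom pattern at once, because each symptom sits at the end of a separate efferent branch.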
Michael Davis et al.
B. ELICITATION OF FEAR BY ELECTRICAL STIMULATION OF THE AMYGDALA
Importantly, it has also been shown that electrical stimulation of the central nucleus of the amygdala can produce a complex pattern of behavioral and autonomic changes that, taken together, constitute a state closely resembling fear. Thus, electrical stimulation of the central nucleus of the amygdala produces a cessation of ongoing behavior (Applegate et al., 1983; Gloor, 1960). In fact, cessation of ongoing behavior is the critical measure of fear or anxiety in several animal models such as the operant conflict test (Geller & Seifter, 1960), the conditioned emotional response (CER) (Estes & Skinner, 1941), the social interaction test (File, 1980), and freezing itself (e.g., Fanselow & Bolles, 1979). Stimulation of the amygdala can also alter heart rate (Applegate et al., 1983; Kapp et al., 1982) and blood pressure (Morgenson & Calaresu, 1973), both measures used to study cardiovascular conditioning. Electrical stimulation of the central nucleus of the amygdala also alters respiration (Applegate et al., 1983; Harper, Frysinger, Trelease, & Marks, 1984), a prominent symptom of fear, especially in panic disorders. Electrical stimulation of the amygdala also elicits jaw movements (Applegate et al., 1983; Gloor, 1960; Ohta, 1984), which often accompany the fear response. Amygdala stimulation can also produce gastric ulceration (Henke, 1980a; Innes & Tansy, 1980; Sen & Anand, 1957), which may result from chronic fear or anxiety. As outlined earlier, electrical stimulation of specific parts of the amygdala increases the acoustic startle reflex, which is elevated during fear. Finally, it has been reported in humans that electrical stimulation of the amygdala elicits feelings of fear or anxiety as well as autonomic reactions indicative of fear (Chapman et al., 1954; Gloor, Olivier, & Quesney, 1981).
Viewed in this way, the highly correlated set of behaviors seen during fear may result from activation of a single area of the brain (the central nucleus of the amygdala) which then projects to a variety of target areas which themselves are critical for each of the specific symptoms of fear or anxiety as well as the perception of such a state. Moreover, it must be assumed that all of these connections are already formed in an adult organism, since electrical stimulation produces these effects in the absence of prior explicit fear conditioning. Given this innate wiring diagram, it would seem most parsimonious to assume that a neutral stimulus will elicit a state of fear when that stimulus comes to activate the amygdala after being paired with an aversive stimulus. Thus fear conditioning probably involves neural plasticity afferent to or in the amygdala rather than a change in its efferent target areas.
C. THE ROLE OF THE AMYGDALA IN FEAR ELICITED BY A CONDITIONED STIMULUS
Consistent with this interpretation, several studies have shown that a neutral stimulus paired with aversive stimulation will alter neural firing in the amygdala, especially the central nucleus of the amygdala (Henke, 1983;
Pascoe & Kapp, 1985). Moreover, lesions of the central nucleus are known to eliminate or attenuate conditioned changes measured by a cessation of ongoing behavior such as freezing (Iwata et al., 1986a), reduced bar pressing in the operant conflict test (Shibata, Kataoka, Yamashita, & Ueki, 1986), or the conditioned emotional response paradigm (Kellicut & Schwartzbaum, 1963; Spevack, Campbell, & Drake, 1975). Lesions of the central nucleus also block conditioned changes in heart rate (Cohen, 1975; Gentile et al., 1986; Kapp, Frysinger, Gallagher, & Haselton, 1979), blood pressure (Iwata, LeDoux, Meeley, Arneric, & Reis, 1986), or ulceration induced by immobilization stress (Henke, 1980b). Data outlined earlier indicate that lesions of the central nucleus of the amygdala block fear-potentiated startle (Hitchcock & Davis, 1986a). Lesions of the amygdala are known to block several measures of innate fear in different species (see Blanchard & Blanchard, 1972; Ursin, Jellestad, & Cabrera, 1981). This, along with a large literature implicating the amygdala in many other measures of fear such as active and passive avoidance (for reviews see Kaada, 1972; Sarter & Markowitsch, 1985; Ursin et al., 1981) and evaluation and memory of emotionally significant sensory stimuli (Bennett, Liang, & McGaugh, 1985; Bresnahan & Routtenberg, 1972; Ellis & Kesner, 1983; Gallagher & Kapp, 1978, 1981; Gold, Hankins, Edwards, Chester, & McGaugh, 1975; Handwerker, Gold, & McGaugh, 1974; Kesner, 1982; Liang, Bennett, & McGaugh, 1985; Liang, Juler, & McGaugh, 1986; Mishkin & Aggleton, 1981), compellingly indicates a crucial role of the amygdala in fear.
D. CONDITIONED FEAR VERSUS ANXIETY
Clinically, fear is regarded as more stimulus-specific than anxiety, despite very similar symptoms. Figure 17 suggests that spontaneous activation of the central nucleus of the amygdala would produce a state resembling fear in the absence of any obvious eliciting stimulus.
In fact, fear and anxiety often precede temporal lobe epileptic seizures (Gloor et al., 1981), which are usually associated with abnormal electrical activity of the amygdala (Crandall, Walter, & Dymond, 1971). An important implication of this distinction is that treatments that block conditioned fear might not necessarily block anxiety. For example, if a drug decreased transmission along a sensory pathway required for a conditioned stimulus to activate the amygdala, then that drug might be especially effective in blocking conditioned fear. However, if anxiety resulted from activation of the amygdala not involving that sensory pathway, then that drug might not be especially effective in reducing anxiety. On the other hand, drugs that act specifically in the amygdala should affect both conditioned fear and anxiety. Moreover, drugs that act at various target areas might be expected to provide selective actions on some but not all of the somatic symptoms associated with anxiety. It is noteworthy in this regard that the central nucleus of the amygdala is known to have a high density of opiate receptors (Goodman, Snyder, Kuhar, & Young, 1980) whereas the basolateral nucleus, which projects to the central nucleus (Smith & Millhouse, 1985), has a high density of
benzodiazepine receptors (Niehoff & Kuhar, 1983). In fact, local infusion of the opiate agonist levorphanol into the central nucleus of the amygdala blocks conditioned bradycardia in rabbits (Gallagher, Kapp, McNall, & Pascoe, 1981) and has anxiolytic effects in the social interaction test (File & Rodgers, 1979). Furthermore, local infusion of benzodiazepines into the amygdala has anxiolytic effects in the operant conflict test (Nagy, Zambo, & Decsi, 1979; Petersen & Scheel-Kruger, 1982; Scheel-Kruger & Petersen, 1982; Shibata, Kataoka, Gomita, & Ueki, 1982). Interestingly, anticonflict effects of benzodiazepines only seem to occur after local infusion into the basolateral nucleus (the nucleus of the amygdala that has a high density of benzodiazepine receptors) and not after local infusion into the central nucleus (Petersen & Scheel-Kruger, 1982). Taken together, therefore, these results suggest that drug actions in the amygdala may be sufficient to explain both fear-reducing and anxiety-reducing effects of different drugs. Future studies employing local infusion of benzodiazepine or opiate antagonists into the amygdala coupled with systemic administration of various agonists may be able to determine if local binding to receptors in the amygdala is necessary to explain their anxiolytic effects. Eventually, local infusion of various drugs into specific target areas may be used to evaluate whether highly specific anxiolytic actions are produced. These results could then serve as a guide for eventually producing more selective anxiolytic compounds.
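The site-of-action argument running through the last two paragraphs can be sketched as a small decision function: where a drug acts along the fear circuit predicts whether it should reduce conditioned fear, anxiety, or only individual symptoms. The three-way classification below is our own distillation of the text, not a model proposed in the chapter.

```python
# Hedged sketch of the text's site-of-action logic. The site labels and
# predicted-effect strings are informal shorthand chosen for this sketch.
def predicted_effects(site: str) -> dict:
    """Predict drug effects from the site of action along the fear circuit."""
    if site == "sensory afferent":
        # Blocks transmission of the conditioned stimulus to the amygdala,
        # so conditioned fear is blocked but anxiety may be untouched.
        return {"conditioned fear": "blocked", "anxiety": "largely spared"}
    if site == "amygdala":
        # E.g., opiate receptors in the central nucleus or benzodiazepine
        # receptors in the basolateral nucleus: both states affected.
        return {"conditioned fear": "blocked", "anxiety": "blocked"}
    if site == "efferent target":
        # Acting at a single brainstem projection area should remove only
        # the one symptom that target mediates, in both states.
        return {"conditioned fear": "one symptom blocked",
                "anxiety": "one symptom blocked"}
    raise ValueError(f"unknown site: {site}")
```

Cast this way, the proposed local-infusion experiments amount to empirically filling in this table: infusing antagonists into the amygdala tests the middle branch, and infusions into individual target areas test the last.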
VII. Summary and Conclusions The potentiated startle paradigm measures conditioned fear by an increase in the amplitude of a simple reflex (the acoustic startle reflex) in the presence of a cue previously paired with shock. This paradigm offers a number of advantages as an alternative to most animal tests of fear or anxiety since it involves no operant and is reflected by an enhancement rather than a suppression of ongoing behavior. Lesion and electrical stimulation studies on fear-potentiated startle and startle increased by electrical stimulation of the amygdala are being used to define the neural pathways necessary for a visual conditioned stimulus to alter the acoustic startle reflex. The current working hypothesis is that the conditioned stimulus activates the central nucleus of the amygdala through a pathway involving the lateral geniculate nucleus, visual cortex, visual association cortex, and insular cortex. The central nucleus of the amygdala may then project directly to the acoustic startle pathway so as to modulate the startle response. More work has to be done to define conclusively the relevant neural pathways involved in fear-potentiated startle. Nonetheless, we feel that by combining behavioral, anatomical, physiological, and pharmacological approaches, it will be possible to determine each step along the pathway which mediates the ability of a stimulus signaling fear to alter behavior. Once the exact
structures are delineated, it should be possible to determine the neurotransmitters that are released during a state of fear and how this chemical information is relayed along these pathways so as to affect behavior. Eventually, this approach should allow us to determine where plastic changes take place along these pathways to mediate the conditioned effects that are being measured and the biochemical processes that are involved. An important insight gained from work in several different laboratories is that the central nucleus of the amygdala appears to play a crucial role in conditioned fear and probably anxiety. Many of the amygdala’s projection areas are involved in reactions that are used to measure fear and anxiety. Moreover, electrical stimulation of the amygdala elicits a pattern of behaviors that mimic natural or conditioned states of fear, and lesions of the amygdala block innate or conditioned fear. Since drugs can potentially act on afferent or efferent systems of the amygdala or in the amygdala itself, different classes of anxiolytic drugs may block fear or anxiety by acting at different points along the neural pathways involved in fear. Therefore, it may not be necessary to ask whether all anxiolytic drugs block fear or anxiety by interacting with a single transmitter. It is more likely that some drugs act by blocking transmission of the conditioned stimulus to the amygdala while others might block transmission along its efferent pathways. Eventually such information should allow more specific drugs to be developed for treating clinical anxiety disorders. ACKNOWLEDGMENTS This research was supported by NSF Grant BNS-81-20476, NIMH Grant MH-25642, NINCDS Grant NS-18033, Research Scientist Development Award MH-00004, and the State of Connecticut. Some of the results in this article will be submitted to the Yale Graduate School in partial fulfillment of the requirement for a Ph.D. to Janice Hitchcock. 
Our sincere thanks are extended to Lee Schlesinger, who tested many of the animals used for these studies, to Bruce Kapp for helpful discussions about the brainstem projections of the amygdala, to Ariel Deutsch for collaboration on the PHA-L studies, to James Cassella and John Kehne for comments on the manuscript, and to Leslie Fields for help in typing this paper. Special thanks are extended to Don Weisz for alerting us some years ago to the importance of the amygdala in fear.
REFERENCES Albert, T. J., Dempsey, C. W., & Sorenson, C. A. (1985). Anterior cerebellar vermal stimulation: Effect on behavior and basal forebrain neurochemistry in rat. Biological Psychiatry, 20, 1267-1276. Alkon, D. L. (1979). Voltage-dependent calcium and potassium ion conductances: A contingency mechanism for an associative learning model. Science, 205, 810-816.
Anderson, D. C., Johnson, D., & Kempton, H. (1969a). Second-order fear conditioning as revealed through augmentation of a startle response: Part I. Psychonomic Science, 16, 5-7.
Anderson, D. C., Johnson, D., & Kempton, H. (1969b). Second-order fear conditioning as revealed through augmentation of a startle response: Part II. Psychonomic Science, 16, 7-9.
Applegate, C. D., Kapp, B. S., Underwood, M. D., & McNall, C. L. (1983). Autonomic and somatomotor effects of amygdala central n. stimulation in awake rabbits. Physiology & Behavior, 31, 353-360.
Aston-Jones, G., & Bloom, F. E. (1981). Norepinephrine-containing locus coeruleus neurons in behaving rats exhibit pronounced responses to non-noxious environmental stimuli. Journal of Neuroscience, 1, 887-900.
Beitz, A. J. (1982). The organization of afferent projections to the midbrain periaqueductal gray of the rat. Neuroscience, 7, 133-159.
Bennett, C., Liang, K. C., & McGaugh, J. L. (1985). Depletion of adrenal catecholamines alters the amnestic effect of amygdala stimulation. Behavioral Brain Research, 15, 83-91.
Berg, W. K., & Davis, M. (1984). Diazepam blocks fear-enhanced startle elicited electrically from the brainstem. Physiology & Behavior, 32, 333-336.
Berg, W. K., & Davis, M. (1985). Associative learning modifies startle reflexes at the lateral lemniscus. Behavioral Neuroscience, 99, 191-199.
Bertrand, S., & Hugelin, A. (1971). Respiratory synchronizing function of the nucleus parabrachialis medialis: Pneumotaxic mechanisms. Journal of Neurophysiology, 34, 180-207.
Blanchard, D. C., & Blanchard, R. J. (1972). Innate and conditioned reactions to threat in rats with amygdaloid lesions. Journal of Comparative and Physiological Psychology, 81, 281-290.
Bresnahan, E., & Routtenberg, A. (1972). Memory disruption by unilateral low level, subseizure stimulation of the medial amygdaloid nucleus. Physiology & Behavior, 9, 513-525.
Brown, J. S., Kalish, H. I., & Farber, I. E. (1951).
Conditioned fear as revealed by magnitude of startle response to an auditory stimulus. Journal of Experimental Psychology, 41, 317-328.
Carew, T. J. (1984). An introduction to cellular approaches used in the analysis of habituation and sensitization in Aplysia. In H. V. S. Peeke & L. Petrinovich (Eds.), Habituation, sensitization, and behavior (pp. 205-249). New York: Academic Press.
Cassella, J. V., & Davis, M. (1986a). Habituation, prepulse inhibition, fear conditioning, and drug modulation of the acoustically elicited pinna reflex in rats. Behavioral Neuroscience, 100, 39-44.
Cassella, J. V., & Davis, M. (1986b). Neural structures mediating acoustic and tactile startle reflexes and the acoustically elicited pinna response in rats: Electrolytic and ibotenic acid studies. Society for Neuroscience Abstracts, 12, 1273.
Cassella, J. V., Harty, P. T., & Davis, M. (1986). Fear conditioning, pre-pulse inhibition and drug modulation of a short latency startle response measured electromyographically from neck muscles in the rat. Physiology & Behavior, 36, 1187-1191.
Castellucci, V., Pinsker, H., Kupfermann, I., & Kandel, E. R. (1970). Neuronal mechanisms of habituation and dishabituation of the gill-withdrawal reflex in Aplysia. Science, 167, 1745-1748.
Chapman, W. P., Schroeder, H. R., Guyer, G., Brazier, M. A. B., Fager, C., Poppen, J. L., Solomon, H. C., & Yakovlev, P. I. (1954). Physiological evidence concerning the importance of the amygdaloid nuclear region in the integration of circulatory function and emotion in man. Science, 120, 949-950.
Charney, D. S., Heninger, G. R., & Breier, A. (1984). Noradrenergic function in panic anxiety. Archives of General Psychiatry, 41, 751-763.
Chi, C. C. (1965). The effect of amobarbital sodium on conditioned fear as measured by the potentiated startle response in rats. Psychopharmacologia, 7, 115-122.
Cohen, D. H. (1975). Involvement of the avian amygdalar homologue (archistriatum posterior and mediale) in defensively conditioned heart rate change. Journal of Comparative Neurology, 160, 13-36.
Cohen, M. I. (1971). Switching of the respiratory phases and evoked phrenic responses produced by rostral pontine electrical stimulation. Journal of Physiology (London), 217, 133-158.
Crandall, P. H., Walter, R. D., & Dymond, A. (1971). The ictal electroencephalographic signal identifying limbic system seizure foci. Proceedings of the American Association of Neurology & Surgery, 1, 1.
Crow, T. J., & Alkon, D. L. (1980). Associative behavior modification in Hermissenda: Cellular correlates. Science, 209, 412-414.
Davis, M. (1979a). Diazepam and flurazepam: Effects on conditioned fear as measured with the potentiated startle paradigm. Psychopharmacology, 62, 1-7.
Davis, M. (1979b). Morphine and naloxone: Effects on conditioned fear as measured with the potentiated startle paradigm. European Journal of Pharmacology, 54, 341-347.
Davis, M. (1987). [Effects of buspirone on acquisition of fear]. Unpublished raw data.
Davis, M., & Astrachan, D. I. (1978). Conditioned fear and startle magnitude: Effects of different footshock or backshock intensities used in training. Journal of Experimental Psychology: Animal Behavior Processes, 4, 95-103.
Davis, M., Cassella, J. V., & Kehne, J. H. (1987). Serotonin does not mediate anxiolytic effects of buspirone in the fear-potentiated startle paradigm: Comparison with 8-OH-DPAT and ipsapirone. Psychopharmacology, in press.
Davis, M., Gendelman, D. S., Tischler, M. D., & Gendelman, P. M. (1982). A primary acoustic startle circuit: Lesions and stimulation studies. Journal of Neuroscience, 2, 791-805.
Davis, M., Kehne, J. H., & Cassella, J. V. (1985).
Buspirone and MJ-13805 selectively attenuate fear as measured with the potentiated startle paradigm. Society for Neuroscience Abstracts, 11, 426.
Davis, M., Redmond, D. E., Jr., & Baraban, J. M. (1979). Noradrenergic agonists and antagonists: Effects on conditioned fear as measured by the potentiated startle paradigm. Psychopharmacology, 65, 111-118.
Deutsch, A. Y., Goldstein, M., & Roth, R. H. (1986). Activation of the locus coeruleus induced by selective stimulation of the ventral tegmental area. Brain Research, 363, 307-314.
Ellis, M. E., & Kesner, R. P. (1983). The noradrenergic system of the amygdala and aversive information processing. Behavioral Neuroscience, 97, 399-415.
Estes, W. K., & Skinner, B. F. (1941). Some quantitative properties of anxiety. Journal of Experimental Psychology, 29, 390-400.
Fanselow, M. S. (1981). Naloxone and Pavlovian fear conditioning. Learning and Motivation, 12, 398-419.
Fanselow, M. S. (1982). The postshock activity burst. Animal Learning and Behavior, 10, 448-454.
Fanselow, M. S. (1984). Shock-induced analgesia on the formalin test: Effect of shock severity, naloxone, hypophysectomy, and associative variables. Behavioral Neuroscience, 98, 79-95.
Fanselow, M. S., & Bolles, R. C. (1979). Naloxone and shock-elicited freezing in the rat. Journal of Comparative and Physiological Psychology, 93, 736-744.
File, S. E. (1980). The use of social interaction as a method for detecting anxiolytic activity of chlordiazepoxide-like drugs. Journal of Neuroscience Methods, 2, 219-238.
File, S. E., & Rodgers, R. J. (1979). Partial anxiolytic actions of morphine sulphate following
microinjection into the central nucleus of the amygdala in rats. Pharmacology, Biochemistry, and Behavior, 11, 313-318.
Gallagher, M., & Kapp, B. S. (1978). Manipulation of opiate activity in the amygdala alters memory processes. Life Sciences, 23, 1973-1978.
Gallagher, M., & Kapp, B. S. (1981). Effect of phentolamine administration into the amygdala complex of rats on time-dependent memory processes. Behavioral and Neural Biology, 31, 90-95.
Gallagher, M., Kapp, B. S., Frysinger, R. C., & Rapp, P. R. (1980). Beta-adrenergic manipulation in amygdala central n. alters rabbit heart rate conditioning. Pharmacology, Biochemistry, and Behavior, 12, 419-426.
Gallagher, M., Kapp, B. S., McNall, C. L., & Pascoe, J. P. (1981). Opiate effects in the amygdala central nucleus on heart rate conditioning in rabbits. Pharmacology, Biochemistry, and Behavior, 14, 497-505.
Gallistel, C. R., Shizgal, P., & Yeomans, J. S. (1981). Portrait of the substrate for self-stimulation. Psychological Review, 88, 228-273.
Galvani, P. F. (1970). Air-puff-elicited startle: Habituation over trials and measurement of a hypothetical emotional response. Behavioral Research Methods & Instrumentation, 2, 232-233.
Geller, I., & Seifter, J. (1960). The effects of meprobamate, barbiturates, d-amphetamine and promazine on experimentally induced conflict in the rat. Psychopharmacologia, 1, 482-492.
Gentile, C. G., Jarrell, T. W., Teich, A., McCabe, P. M., & Schneiderman, N. (1986). The role of amygdaloid central nucleus in the retention of differential Pavlovian conditioning of bradycardia in rabbits. Behavioral Brain Research, 20, 263-273.
Gerfen, C. R., & Sawchenko, P. E. (1984). An anterograde neuroanatomical tracing method that shows the detailed morphology of neurons, their axons and terminals: Immunohistochemical localization of an axonally transported plant lectin, Phaseolus vulgaris leucoagglutinin (PHA-L). Brain Research, 290, 219-238.
Glendenning, K. K., Brunso-Bechtold, J. K., Thompson, G. C., & Masterton, R. B. (1981). Ascending auditory afferents to the nuclei of the lateral lemniscus. Journal of Comparative Neurology, 191, 673-703.
Gloor, P. (1960). Amygdala. In J. Field (Ed.), Handbook of physiology: Sect. 1. Neurophysiology (Vol. 2, pp. 1395-1420). Washington, DC: American Physiological Society.
Gloor, P., Olivier, A., & Quesney, L. F. (1981). The role of the amygdala in the expression of psychic phenomena in temporal lobe seizures. In Y. Ben-Ari (Ed.), The amygdaloid complex. New York: Elsevier.
Gold, P. E., Hankins, L., Edwards, R. M., Chester, J., & McGaugh, J. L. (1975). Memory interference and facilitation with posttrial amygdala stimulation: Effect varies with footshock level. Brain Research, 86, 509-513.
Goldenberg, M., Snyder, C. H., & Aranow, H., Jr. (1947). New test for hypertension due to circulating epinephrine. Journal of the American Medical Association, 135, 971-976.
Goodman, R. R., Snyder, S. H., Kuhar, M. J., & Young, W. S., III. (1980). Differentiation of delta and mu opiate receptor localizations by light microscopic autoradiography. Proceedings of the National Academy of Sciences of the United States of America, 77, 2167-2174.
Groves, P. M., & Thompson, R. F. (1970). Habituation: A dual process theory. Psychological Review, 77, 419-450.
Guyenet, P. G., & Byrum, C. E. (1985). Comparative effects of sciatic nerve stimulation, blood pressure, and morphine on the activity of A5 and A6 pontine noradrenergic neurons. Brain Research, 321, 191-201.
Handforth, A. (1984). Implications of stimulus factors governing kindled seizure threshold. Experimental Neurology, 86, 33-39.
Handwerker, M. J., Gold, P. E., & McGaugh, J. L. (1974). Impairment of active avoidance learning with posttraining amygdala stimulation. Brain Research, 75, 324-327.
Harper, R. M., Frysinger, R. C., Trelease, R. B., & Marks, J. D. (1984). State-dependent alteration of respiratory cycle timing by stimulation of the central nucleus of the amygdala. Brain Research, 306, 1-8.
Hawkins, R. D., Abrams, T. W., Carew, T. J., & Kandel, E. R. (1983). A cellular mechanism of classical conditioning in Aplysia: Activity-dependent amplification of presynaptic facilitation. Science, 219, 400-405.
Henke, P. G. (1980a). The centromedial amygdala and gastric pathology in rats. Physiology & Behavior, 25, 107-112.
Henke, P. G. (1980b). The amygdala and restraint ulcers in rats. Journal of Comparative and Physiological Psychology, 94, 313-323.
Henke, P. G. (1983). Unit-activity in the central amygdala nucleus of rats in response to immobilization-stress. Brain Research Reviews, 10, 833-837.
Henkel, C. K. (1981). Afferent sources of a lateral midbrain tegmental zone associated with the pinnae in the cat as mapped by retrograde transport of horseradish peroxidase. Journal of Comparative Neurology, 203, 213-226.
Henkel, C. K., & Edwards, S. B. (1978). The superior colliculus control of pinna movements in the cat: Possible anatomical connections. Journal of Comparative Neurology, 182, 763-776.
Hitchcock, J. M., & Davis, M. (1986a). Lesions of the amygdala, but not of the cerebellum or red nucleus, block conditioned fear as measured with the potentiated startle paradigm. Behavioral Neuroscience, 100, 11-22.
Hitchcock, J. M., & Davis, M. (1986b). [Discrimination between visual and auditory stimuli demonstrated in the fear-potentiated startle paradigm]. Unpublished raw data.
Hitchcock, J. M., & Davis, M. (1986c). [Lesions of deep layers of the superior colliculus do not block fear-potentiated startle when very weak stimulus intensities are used to elicit acoustic startle]. Unpublished raw data.
Hitchcock, J. M., & Davis, M. (1986d). [Lesions of the central nucleus of the amygdala block enhancement of acoustic startle by footshock]. Unpublished raw data.
Hitchcock, J. M., & Davis, M. (1987). Fear-potentiated startle using an auditory conditioned stimulus: Effect of lesions of the amygdala. Physiology & Behavior, 39, 403-408.
Hitchcock, J. M., Rosen, J. B., & Davis, M. (1986a). [Efferent pathways of the amygdala involved in fear-potentiated startle: Lesion and PHA-L anterograde tracing studies]. Unpublished raw data.
Hitchcock, J. M., Rosen, J. B., & Davis, M. (1986b). [Lesions of deep layers of the superior colliculus do not prevent electrical stimulation of the amygdala from facilitating the startle reflex]. Unpublished raw data.
Hokfelt, T., Johansson, O., & Goldstein, M. (1984). Central catecholamine neurons as revealed by immunohistochemistry with special reference to adrenaline neurons. In A. Bjorklund & T. Hokfelt (Eds.), Handbook of chemical neuroanatomy (pp. 157-276). Amsterdam: Elsevier.
Holmberg, G., & Gershon, S. (1961). Autonomic and psychic effects of yohimbine hydrochloride. Psychopharmacologia (Berlin), 2, 93-106.
Hopkins, D. A., & Holstege, G. (1978). Amygdaloid projections to the mesencephalon, pons and medulla oblongata in the cat. Experimental Brain Research, 32, 529-547.
Inagaki, S., Kawai, Y., Matsuzaki, T., Shiosaka, S., & Tohyama, M. (1983). Precise terminal fields of the descending somatostatinergic neuron system from the amygdala complex of the rat. Journal für Hirnforschung, 24, 345-356.
Innes, D. L., & Tansy, M. F. (1980). Gastric mucosal ulceration associated with electrochemical stimulation of the limbic system. Brain Research Bulletin, 5, 33-36.
Ison, J. R., McAdam, D. W., & Hammond, G. R. (1973). Latency and amplitude changes in the acoustic startle reflex of the rat produced by variation in auditory prestimulation. Physiology & Behavior, 10, 1035-1039.
Iwata, J., LeDoux, J. E., Meeley, M. P., Arneric, S., & Reis, D. J. (1986a). Intrinsic neurons in the amygdala field projected to by the medial geniculate body mediate emotional responses conditioned to acoustic stimuli.
Brain Research, 383, 195-214.
Iwata, J., LeDoux, J. E., & Reis, D. J. (1986b). Different efferent projections of the geniculoamygdala pathway mediate autonomic and behavioral concomitants of conditioned fear: Possible completion of the conditioned response circuits. Society for Neuroscience Abstracts, 12, 977.
Jarrell, T. W., McCabe, P. M., Teich, A., Gentile, C. G., VanDercar, D. H., & Schneiderman, N. (1986). Lateral subthalamic area as mediator of classically conditioned bradycardia in rabbits. Behavioral Neuroscience, 100, 3-10.
Kaada, B. R. (1972). Stimulation and regional ablation of the amygdaloid complex with reference to functional representations. In B. E. Eleftheriou (Ed.), The neurobiology of the amygdala (pp. 205-281). New York: Plenum.
Kapp, B. S., Frysinger, R. C., Gallagher, M., & Haselton, J. R. (1979). Amygdala central nucleus lesions: Effects on heart rate conditioning in the rabbit. Physiology & Behavior, 23, 1109-1117.
Kapp, B. S., Gallagher, M., Underwood, M. D., McNall, C. L., & Whitehorn, D. (1982). Cardiovascular responses elicited by electrical stimulation of the amygdala central nucleus in the rabbit. Brain Research, 234, 251-262.
Kapp, B. S., & Pascoe, J. P. (1984). Correlational aspects of memory: Vertebrate model systems. In R. P. Kesner & J. L. Martinez (Eds.), Learning and memory: A biological view. New York: Academic Press.
Kapp, B. S., Pascoe, J. P., & Bixler, M. A. (1984). The amygdala: A neuroanatomical systems approach to its contribution to aversive conditioning. In L. S. Squire & N. Butters (Eds.), Neuropsychology of memory. New York: Guilford Press.
Kehne, J. H., Cassella, J. V., & Davis, M. (1987). Anxiolytic effects of buspirone and gepirone in the fear-potentiated startle effect. Psychopharmacology, in press.
Kehne, J. H., Gallager, D. W., & Davis, M. (1981). Strychnine: Brainstem and spinal mediation of excitatory effects on acoustic startle. European Journal of Pharmacology, 76, 177-186.
Kellicut, M. H., & Schwartzbaum, J. S. (1963). Formation of a conditioned emotional response (CER) following lesions of the amygdaloid complex in rats. Psychological Reports, 12, 351-358.
Kesner, R. P. (1982). Brain stimulation: Effects on memory. Behavioral and Neural Biology, 36, 315-358.
Krettek, J. E., & Price, J. L. (1978). Amygdaloid projections to subcortical structures within the basal forebrain and brainstem in the rat and cat. Journal of Comparative Neurology, 178, 225-254.
Kurtz, K. H., & Siegel, A. (1966). Conditioned fear and magnitude of startle response: A replication and extension. Journal of Comparative and Physiological Psychology, 62,8-14. Leaton, R. N., & Borszcz, G. S. (1985). Potentiated startle: Its relation to freezing and shock intensity in rats. Journal of Ekperimentai Psychology: Animal Behavior Processes, 11, 421-428.
LeDoux, J. E., Ruggiero, D. A., & Reis, D. J. (1985). Projections to the subcortical forebrain from anatomically defined regions of the medial geniculate body in the rat. Journal of Comparative Neurology, 242, 182-213. Liang, K. C., Bennett, C., & McGaugh, J. L. (1985). Peripheral epinephrine modulates the effects of post-training amaygdala stimulation on memory. Behavioral Brain Research, 15, 93-100.
Liang, K. C., Juler, R. G., & McGaugh, J. L. (1986). Modulating effects of posttraining epinephrine on memory: Involvement of the amygdala noradrenergic systems. Brain Research, 368, 125-133. Liebman, J. M., Mayer, D. J., & Liebeskind, J. C. (1970). Mesencephalic central gray lesions and fear-motivated behavior in rats. Brain Research, 23, 353-370. Lund, R. D., & Webster, K. E. (1%7). Thalamic afferents from the spinal cord and trigeminal nuclei. Journal of Comparative Neurology, 130, 313-328.
Anxiety and the Amygdala
303
McAllister, W. R., & McAllister, D. E. (1971). Behavioral measurement of conditioned fear. In F. R. Brush (Ed.), Aversive conditioning and learning (pp. 105-179). New York: Academic Press. McMahon, S. B., &Wall, P. D. (1985). Electrophysiological mapping of brainstem projections of spinal cord lamina I cells in the rat. Brain Research, 333, 19-26. Mehler, W. R. (1969). Some neurological species differences-a posteriori. Annals of the New York Academy of Sciences 167, 424-468. Miller, N. E., &Barry, H., 111. (1960). Motivational effects of drugs: Methods which illustrate some general problems in psychopharmacology. Psychopharmacologia, 1, 169-199. Mishkin, M., & Aggleton, J. (1981). Multiple functional contributions of the amygdala in the monkey. In Y. Ben-Ari (Ed.), The arnygdaloid complex. New York: Elsevier. Mizuno, N., Takahashi, O., Satoda, T., & Matsushima, R. (1985). Amygdalospinal projections in the macaque monkey. Neuroscience Letters, 53, 321-330. Mondlock, J. M., & Davis, M. (1985). The role of various amygdala projection areas (bed nucleus of stria terminalis, rostra1 lateral hypothalamus, substantia nigra) in fearenhanced acoustic startle. Society for Neuroscience Abstracts, 11, 331. Moore, R. Y.,& Card, J. P. (1984). Noradrenaline-containing neuron systems. In A. Bjorklund & T. Hokfelt (Eds.), Handbook of chemical neuroanatomy. Amsterdam: Elsevier. Morgenson, G. J., & Calaresu, F. R. (1973). Cardiovascular responses to electrical stimulation of the amygdala in the rat. Experimental Neurology, 39, 166-180. Nagy, J., Zambo, K., & Decsi, L. (1979). Anti-anxiety action of diazepam after intra-amygdaloid application in the rat. Neuropharmacology, 18, 573-576. Niehoff, D. L., &Kuhar, M. J. (1983). Benzodiazepine receptors: localization in rat amygdala. Journal of Neuroscience, 3 , 209 1-2097. Ohta, M. (1984). Amygdaloid and cortical facilitation or inhibition of trigeminal motoneurons in the rat. Brain Research, 291, 39-48. Ottersen, 0. P., & Ben-Ari, Y. 
(1979). Afferent connections to the amygdaloid complex of the rat and cat. I. Projections from the thalamus. Journal of Comparative Neurology, 187, 401-424.
Pascoe, J. P., & Kapp, B. S. (1985). Electrophysiological characteristics of amygdaloid central nucleus neurons during Pavlovian fear conditioning in the rabbit. Behavioral Brain Research, 16, 117-133. Petersen, E. N., & Scheel-Kruger, J. (1982). The GABAergic anticonflict effect of intraamygdaloid benzodiazepines demonstrated by a new water lick conflict paradigm. In M. Y . Spiegelstein & A. Levy (Eds.), Behavioral models and analysis of drug action. Amsterdam: Elsevier. Phillipson, 0. T. (1979). Afferent projections to the ventral tegmented area of Tsai and intrafascicular nucleus. A horseradish peroxidase study in the rat. Journal of Comparative Neurology, 187, 117-143. Pollock, B., Bock, P. R., Fuchs, A. M., & Lohaus, R. (1976). Visually evoked potentials in cortical and subcortical brain structures of conscious rabbits with chronically implanted electrodes. Arzneirnittelforschung (Drug Research), 26, 321-334. Post, S., & Mai, J. K. (1980). Contribution to the amygdaloid projection field in the rat. A quantitative autoradiographic study. Journal fur Hirnforschung, 21, 199-225. Price, J. L., & Amaral, D. G. (1981). An autoradiographic study of the projections of the central nucleus of the monkey amygdala. Journal of Neuroscience, 1, 1242-1259. Racine, R. J. (1978). Kindling: The first decade. Neurosurgery, 3, 234-252. Redmond, D. E., Jr. (1977). Alteration in the function of the nucleus locus coeruleus: A possible model for studies on anxiety. In 1. E. Hanin & E. Usdin Fds.), Animal models in psychiatry and neurology, pp. 293-304. Oxford: Pergamon. Rosen, J. B., & Davis, M. (1987). Enhancement of acoustic startle by electrical stimulation of the amygdala. Behavioral Neuroscience, in press. Rosen, J . B., Hitchcock, J. M., &Davis, M. (1986). [Lesions of the caudal part of the ventral amygdalofugal pathway block enhancement of startle by electrical stimulation of the amygdala]. Unpublished raw data.
Michael Davis et ul.
304
Sandrew, B. B., Edwards, D. L., Poletti, C. E.,& Foote, W. E. (1986). Amygdalospinal projections in the cat. Brain Research, 373, 235-239. Sanghera, M. K., Rolls, E. T., and Roper-Hall, A. (1979). Visual responses of neurons in the dorsolateral amygdala of the alert monkey. Experimental Neurology, 63, 610-626. Sarter, M., & Markowitsch, H. J. (1985). Involvement of the amygdala in learning and memory: A critical review, with emphasis on anatomical relations. Behavioral Neuroscience, 99, 342-380. Scheel-Kruger, J., & Petersen, E. N. (1982). Anticonflict effect of the benzodiazepines mediated by a GABAergic mechanism in the amygdala. European Journal of Pharmacology, 82, 115-116. Schwaber, J. S., Kapp, B. S., Higgins, G. A., & Rapp, P. R. (1982). Amygdaloid and basal forebrain direct connections with the nucleus of the solitary tract and the dorsal motor nucleus. Journal of Neuroscience, 2, 1424-1438. Schwarcz, M., Hokfelt, T., Fuxe, K., Jonsson, Goldstein, M., &Terenius, L. (1979). Ibotenic acid-induced neuronal degeneration: A morphological and neurochemical study. Experimental Brain Research, 31, 199-216. Sen, R. N., & Anand, B. K. (1957). Effect of electrical stimulation of the limbic system of brain (‘visceral brain’) on gastric secretory activity and ulceration. Indian Journal of Medical Research, 45, 5 15-521. Shibata, K., Kataoka, Y., Gomita, Y., & Ueki, S. (1982). Localization of the site of the anticonflict action of benzodiazepines in the amygdaloid nucleus of rats. Brain Research, 234, 442-446.
Shibata, K., Kataoka, Y., Yamashita, K., & Ueki, S. (1986). An important role of the central amygdaloid nucleus and mammillary body in the mediation of conflict behavior in rats. Brain Research, 372, 159-162. Shizgal, P., Bielajew, C., Corbett, D., Skelton, R., &Yeomans, J. (1980). Behavioral methods for inferring anatomical linkage between rewarding brain stimulation sites. Journal of Comparative and Physiological Psychology, 94, 227-237. Shopsin, B., Friedman, E., & Gershon, S. (1976). Parachlorophenylalanine reversal of tranylcypromine effects in depressed patients. Archives of General Psychiatry, 33, 81 1-819. Siegel, A. (1967). Stimulus generalization of a classically conditioned response along a temporal dimension. Journal of Comparative and Physiological Psychology, 64, 461-466. Smith, B. S., & Millhouse, 0. E. (1985). The connections between the basolateral and central amygdaloid nuclei. Neuroscience Letters, 56, 307-309. Soffer, A. (1954). Regitine and benodaine in the diagnosis of pheochromocytoma. The Medical Clinics of North America, 30, 375-384. Sorenson, C. A., & Wilkinson, L. 0. (1983). Behavioral evidence that nicotine adminstration has anxiolytic actions in rats. Society for Neuroscience Abstracts, 9, 137. Soubrie, P. (1986). Reconciling the role of central serotonin neurons in human and animal behavior. The Behavioral and Brain Sciences, 92, 319-364. Spevack, A. A., Campbell, C. T., & Drake, L. (1975). Effect of amygdalectomy on habituation and CER in rats. Physiology & Behavior, 15, 199-207. Supple, W. F., & Leaton, R. N. (1986). Cerebellar vermis: Essential for classically conditioned bradycardia in rats. Eastern Psychological Association, p. 65. Swadlow, H. A., & Waxman, S. G. (1976). Variations in conduction velocity and excitability following single and multiple impulses of visual callosal axons in the rabbit. Experimental Neurology, 53, 128-150. Takeuchi, Y., Matsushima, S., Matsushima, R., & Hopkins, D. A. (1983). 
Direct amygdaloid projections to the dorsal motor nucleus of the vagus nerve: A light and electron microscopic study in the rat. Brain Research, 280, 143-147. Takeuchi, Y., McLean, J. H., & Hopkins, D. A. (1982). Reciprocal connections between the amygdala and parabrachial nuclei: Ultrastructural demonstration by degeneration and axonal transport of HRP in the cat. Brain Research, 239, 583-588.
Anxiety and the Amygdala
305
Thompson, R. F., Donegan, N. H., Clark, G. A., Lavond, D. G., Lincoln, J. S.. Madden, J. IV, Mamounas, L. A., Mauk, M. D., &McCormick, D. A. (1987). Neuronal substrates of discrete, defensive conditioned reflexes, conditioned fear states, and their interactions in the rabbit. In I. Gormezano, W.F. Prokasy, & R. F. Thompson (Eds.), Classical conditioning III. Behavioral, neurophysiological, and neurochemical studies in the rabbit. Hillsdale, N.J.: Erlbaum. Tischler, M. D., & Davis, M. (1983). A visual pathway that mediates fear-conditioned enhancement of acoustic startle. Brain Research, 276, 55-71. Turner, B. H., & Zimmer, J . (1984). The architecture and some of the interconnections of the rat’s amygdala and lateral periallocortex. Journal of Comparative Neurology, 227, 540-557.
Ursin, H., Jellestad, F., & Cabrera, I. 0. (1981). The amygdala, exploration and fear. In Y. Ben-Ari (Ed.), The amygdaloid complex. Amsterdam: Elsevier. Veening, J . G., Swanson, L. W., & Sawchenko, P. E. (1984). The organization of projections from the central nucleus of the amygdala to brain stem sites involved in central autonomic regulation: A combined retrograde transport-immunohistochemical study. Brain Research, 303, 337-357. Wagner, A. R., Siegel, L. S., & Fein, G. 0 . (1967). Extinction of conditioned fear as a function of percentage of reinforcement. Journal of Comparative and Physiological PSyChOlOgy, 63, 160-164. Walters, E. T., & Byrne, J . H. (1985). Long-term enhancement produced by activity-dependent modulation of aplysia sensory neurons. Journal of Neuroscience, 5 , 662-672. Yeomans, J. S. (1975). Quantitative measurement of neural post-stimulation excitability with behavioral methods. Physiology & Behavior, 15, 593-602. Zaczek, R., & Coyle, J . T. (1982). Excitatory amino acid analogues: Neurotoxicity and seizures. Neuropharmacology, 21, 15-20.
This Page Intentionally Left Blank
Index

A

Acoustic startle, anxiety, amygdala and, 294, 296 neural systems, 270-272, 274-276, 278-283, 289 sensitization, 290 Acquisition anxiety, amygdala and, 270, 291, 292 causality judgment and, 236-242, 256 blocking, 247 comparator theories, 249, 252, 254, 256 retrospective evaluation, 247, 249 Aggression, stimulus-response compatibility and, 43 Amnesia, working memory and, 85, 112 Amygdala, anxiety and, see Anxiety, amygdala and Anxiety, amygdala and, 264-267, 296, 297 anatomical connections, 292, 293 conditioned fear, 295, 296 conditioned stimulus, 294, 295 electrical stimulation, 278-289, 294 neural systems, 270 acoustic startle pathway, 270-272 central nucleus, 277, 278 diazepam, 276, 277 electrical stimulation, 278-289 electromyographical measurement, 271, 273, 274 projection areas, 284
transmission, 274-276 visual structures, 289, 290 pharmacology, 267-270 sensitization, 290-292 Anxiolytic compounds, anxiety, amygdala and, 268, 269, 296, 297 Aphasia, motor skill, timing and, 224 Apprehension, haptic processing and, 121, 122, 147, 148 direct, 130-133 exploration, 133-140 image-mediated model, 128-130 sequence, 140-144 three-dimensional objects, 126-128 two-dimensional spatial layout, 122 visual processing, 144-147 Artificial intelligence, stimulus-response compatibility and, 7, 11-16, 46 Associationism, causality judgment and, 230, 256 Associative learning, causality judgment and, 230, 256-258 acquisition, 242 blocking, 247 comparator theories, 250-254 contiguity, 231 retrospective evaluation, 247 Associative memory, working memory and architecture, 58, 59 context, 91, 97
Asynchrony motor skill, timing and, 221, 222 working memory and, 79, 104 Attention anxiety, amygdala and, 293 causality judgment and, 248 haptic processing and, 147, 149 motor skill, timing and, 200, 201 working memory and, 56, 57, 112 architecture, 58, 59, 66 skill acquisition, 111 skilled memory, 93, 99 Attenuation, working memory and architecture, 59, 61, 69 literature, 75 Attribution, causality judgment and, 234, 248, 252 Auditory system, anxiety, amygdala and, 273 Autoassociation, working memory and, 114 architecture, 61, 69-71 literature, 74, 76, 82 skilled memory, 99, 107 Avoidance anxiety, amygdala and, 295 causality judgment and, 243
B Backward blocking, causality judgment and, 248, 249, 253, 256 Basal ganglia, motor skill, timing and, 204, 206-208, 210-212 Basolateral nucleus, anxiety, amygdala and, 295, 296 Benzodiazepine, anxiety, amygdala and, 276, 296 Blocking anxiety, amygdala and, 294, 296, 291 neural systems, 276, 277, 285, 286 sensitization, 291 causality judgment and, 242-247, 256 comparator theories, 249, 252, 253, 256 retrospective evaluation, 247-249 working memory and, 75 Blood pressure, anxiety, amygdala and, 292, 294, 295 Brainstem, anxiety, amygdala and, 264, 279, 284, 292 Broca’s aphasia, motor skill, timing and, 224
Buffers human motor programming and, 174, 180 working memory and, 112, 113 architecture, 59 context, 84, 91, 92 literature, 72-83 sequential hierarchal output, 101-104, 106 skill acquisition, 109, 110 skilled memory, 93, 94, 99, 101 workload, 107 Buspirone, anxiety, amygdala and, 268, 270
C
Callosal axons, anxiety and, 282 Categorization haptic processing and, 121, 126, 127, 148, 149 apprehension, 130, 136, 143, 147 working memory and, 55, 112-114 architecture, 59, 68 literature, 82 skilled memory, 93, 98, 99 Caudate nucleus, anxiety, amygdala and, 284 Causality judgment, 229, 230, 256-258 acquisition, 236-242 blocking, 242-247 comparator theories, 249-256 contiguity, 230-234 contingency, 233-236 retrospective evaluation, 247-249 Central amygdalofugal pathway, anxiety and, 279 Central nucleus, anxiety, amygdala and, 292-297 neural systems, 277-280, 284, 286, 289 sensitization, 291, 292 Cerebellum anxiety, amygdala and, 277, 278 motor skill, timing and, 225 neurological analysis, 204, 206, 209-212, 214 time sharing, 223 Chunking stimulus-response compatibility and, 2, 46-49 learning, 26, 29-38
performance, 17 working memory and, 54, 113 literature, 72, 81 serial output, 101, 105, 106 Classical conditioning, anxiety, amygdala and, 264 fear-potentiated startle, 265, 267 neural systems, 277 Clonidine, anxiety, amygdala and, 268 Coding, working memory and, 81, 82, 96 Cognitive processing haptic processing and, 149 working memory and, 55 Comparator theories, causality judgment and, 249-258 Competition, working memory and, 81 Conditional emotional response, anxiety, amygdala and, 271, 294, 295 Conditional stimulus, causality judgment and, 243 Conditioned fear, amygdala and, 295-297 fear-potentiated startle, 265 neural systems, 290 Conditioned response, anxiety, amygdala and, 265, 290 Conditioned stimulus, anxiety, amygdala and, 294-297 fear-potentiated startle, 265 neural systems, 270, 271, 273, 274, 278, 287, 289 pharmacology, 267 sensitization, 291 Conditioning anxiety, amygdala and, 275, 297 causality judgment and blocking, 242-244, 247 comparator theories, 230, 250, 251, 256, 257 contiguity, 231 retrospective evaluation, 248 Connectionist architecture, working memory and context storage, 65-68 macrolevel structure, 61-63 microlevel structure, 59-61 principles, 57-59 simulation methods, 68-71 system-level structure, 63-65 Context causality judgment and, 256
working memory and, 55, 84, 112-114 architecture, 65-68 episodic memory, 85 interference, 85-88 literature, 76, 77 overloading, 92 recency, 91, 92 release, 88-91 skill acquisition, 109, 111 skilled memory, 93, 99 Context blocking, causality judgment and, 256 Contiguity, causality judgment and, 230-236, 257 Contingency, causality judgment and, 230, 233-235, 256, 257 acquisition, 237-242 blocking, 242-245 comparator theories, 250-252, 254-256 retrospective evaluation, 248, 249 Corpus callosum, motor skill, timing and, 219 Cortex motor skill, timing and neurological analysis, 204, 206, 210-212, 214 time sharing, 219 working memory and, 57, 58, 61, 112 Cues anxiety, amygdala and, 265, 266, 290, 296 causality judgment and, 249 haptic processing and, 126, 127, 131 working memory and context, 91 skilled memory, 96, 97, 100, 101
D Deafness, motor skill, timing and, 224 Decay, working memory and, 55, 56, 112 architecture, 65-68, 70, 71 literature, 73, 74 skilled memory, 93, 97, 101 Decision making, human motor programming and, 178 Decoding human motor programming and, 179, 180 stimulus-response compatibility and, 30-33, 36
Degradation causality judgment and, 235 working memory and, 54 architecture, 65 context, 87 literature, 80, 81 Delay human motor programming and, 176 motor skill, timing and, 223 working memory and, 105, 106 Depression, anxiety, amygdala and, 268, 270 Diazepam, anxiety, amygdala and, 268, 270, 276, 277 Digit span, working memory and, 112 chunking, 105 literature, 73, 81-83 skilled memory, 93, 94, 100 Discrimination causality judgment and, 255 haptic processing and, 127, 144, 145, 147 motor skill, timing and, 191, 198, 221, 223 stimulus-response compatibility and, 44, 45 Displacement, working memory and, 72 Distraction, working memory and, 86 L-Dopa, motor skill, timing and, 207 Dopamine anxiety, amygdala and, 284, 293 motor skill, timing and, 207 Dorsal cochlear nuclei, anxiety, amygdala and, 271 Drugs, anxiety, amygdala and, 295-297 fear-potentiated startle, 267-270 neural systems, 289 Dual-task condition, motor skill, timing and, 218, 220 Dysmetria, motor skill, timing and, 209
E Elaboration, working memory and, 95, 113 Electrical stimulation, anxiety, amygdala and, 278-289, 294, 296 Electromyographic analysis, motor skill, timing and, 196 Electromyographical measurement, anxiety, amygdala and, 266, 270, 271, 273, 274
Encoding haptic processing and, 122, 141, 148 stimulus-response compatibility and, 30-34, 36 working memory and chunking, 106 context, 88 skilled memory, 93, 95, 100 Episodic memory, working memory and, 85, 92, 113 Error-correction learning rule, working memory and, 98 Exploration, haptic processing and object, 133-140 sequence, 143 two-dimensional spatial layout, 122-124 Exploratory apprehensions, haptic processing and, 131, 133, 138-140, 144 Extinguishing, causality judgment and, 257
F Facial expression, anxiety, amygdala and, 292, 293 Fear conditioning, anxiety, amygdala and, 264, 294 neural systems, 277 sensitization, 291 Fear-potentiated startle, amygdala and, 295, 296 neural systems, 270 acoustic startle pathway, 270-272 central nucleus, 277, 278 diazepam, 276, 277 electrical stimulation, 278-289 electromyographical measurement, 271, 273, 274 projection areas, 284 transmission, 274-276 visual structures, 289, 290 paradigm, 265-267 pharmacology, 267-270 sensitization, 290-292 Feedback haptic processing and, 149 motor skill, timing and, 183, 185, 187-189 working memory and, 112 architecture, 61, 63, 68-71 context, 87, 90, 99
Index literature, 74, 75, 77-791 sequential hierarchal output, 102, 103 Footshock, anxiety, amygdala and, 265, 290-292 Force control, motor skill, timing and, 192, 193, 200 Forgetting, working memory and, 54, 56 Free recall, working memory and context, 91, 92 literature, 72 skilled memory, 98, 100 Freezing, anxiety, amygdala and, 267, 290, 292
G Generalization, stimulus-response compatibility and, 31, 46 Goal, working memory and, 111 Goal hierarchies, stimulus-response compatibility and, 3, 4, 7, 44, 47 artificial intelligence, 11-16 data, 4-7 information processing, 9-11 learning, 32, 34, 37, 38, 40, 41 results, 16-26 theory, 8, 9 GOMS model, stimulus-response compatibility and, 9-11, 45
H Habituation, anxiety, amygdala and, 264 Haptic processing, 121, 122, 147-149 apprehension, 133 direct, 130-133 exploration, 133-140 image-mediated model, 128-130 sequence, 140-144 visual processing, 144-147 three-dimensional objects, 126-128 two-dimensional spatial layout, 122-126 Heart rate, anxiety, amygdala and, 277-279, 284, 292, 294, 295 Hick's law, stimulus-response compatibility and, 16 Hierarchal decisions, human motor programming and, 163-167, 179
motor-program editor model, 158, 162 sequence, 154-157 Hierarchal editor model, human motor programming and, 167-175 scheduling, 177-179 Hierarchy, working memory and, 101-104 literature, 91 skilled memory, 98 Hormones, anxiety, amygdala and, 280 Human motor programming, 153, 154, 163, 179, 180 hierarchal decisions, 154-157, 163-167 hierarchal editor model, 167-173 implications, 174 motor-program editor model, 157, 158, 163-167 remapping, 162, 163 stimulus-response, 158-162 parallel editing, 175 inverse length effects, 175, 176 scheduling, 176-179 Hypothalamus, anxiety, amygdala and, 284
I Information processing human motor programming and, 180 stimulus-response compatibility and, 7, 9-11 working memory and, 54-56, 112, 113 architecture, 59 context, 92 Inhibitory conditioning, causality judgment and, 243 Initiation, human motor programming and, 156 Instrumental conditioning, causality judgment and, 230, 243, 250 Insular cortex, anxiety, amygdala and, 289, 292, 296 Interference causality judgment and, 229, 257, 258 motor skill, timing and, 217, 218 working memory and architecture, 61, 64 context, 91 literature, 73 skilled memory, 97, 98 Intervals, motor skill, timing and, 191
L Latency, anxiety, amygdala and, 270, 271, 273, 274, 278 Learning, stimulus-response compatibility and, 3, 11, 26-29, 46 chunking, 29-38 results, 38-43 Lesions, anxiety, amygdala and, 284, 286, 296 Locus coeruleus, anxiety, amygdala and, 292, 293 Long-term memory anxiety, amygdala and, 265 working memory and, 55, 112, 113 architecture, 68 context, 84, 85, 91, 92 literature, 81 skill acquisition, 108, 110, 111 skilled memory, 97, 99, 100 Long-term store, working memory and, 56
M Maintenance haptic processing and, 136, 137 working memory and, 113 Mapping human motor programming and, 163, 173 stimulus-response compatibility and, 4, 5, 9, 22 working memory and, 84 Matching, haptic processing and, 126 Medial geniculate nucleus, anxiety, amygdala and, 292 Medial longitudinal fasciculus, anxiety, amygdala and, 271 Medial nucleus, anxiety, amygdala and, 279, 280 Medulla, anxiety, amygdala and, 284 Memory anxiety, amygdala and, 295 causality judgment and, 256 haptic processing and, 126 human motor programming and, 156, 174 motor skill, timing and, 184 stimulus-response compatibility and, 29, 38, 41 Message vector, working memory and, 59, 61-63, 66, 69, 70
Mnemonic strategies, working memory and, 96, 97, 113 Modeling, working memory and, 72 Modularity, motor skill, timing and, 214 functional similarity, 220-224 time sharing, 214-220 Modules motor skill, timing and, 184 working memory and architecture, 57-60, 63, 66 literature, 74, 75, 77, 78, 80 sequential hierarchal output, 101, 103 Morphine, anxiety, amygdala and, 268 Motivation, motor skill, timing and, 191 Motor implementation, motor skill, timing and, 186 Motor program editor model, human motor programming and, 157, 158, 163-167 remapping, 162, 163 stimulus-response, 158-162 Motor programming, human, see Human motor programming Motor skill, timing in, 183, 184, 197, 224, 225 earlier approaches, 198-200 experimental analysis, 200-203 force control, 192, 193 individual differences, 189-192 issues, 184-188 modularity, 214 functional similarity, 220-224 time sharing, 214-220 neurological analysis, 203-214 repetitive activity, 193-197 Movement disorders, human motor programming and, 154
N Necessity, haptic processing and, 144 Nerve damage, motor skill, timing and, 193 Neural transmission, anxiety, amygdala and, 274, 275 Neurobiology, anxiety, amygdala and, 264 Neurological analysis, motor skill, timing and, 188, 203-214 time sharing, 220, 223, 224 Neurons, anxiety, amygdala and, 279, 282, 292
Index Neurophysiology, working memory and, 58 Neuropsychology, motor skill, timing and, 214 Neurotransmitters, anxiety, amygdala and, 269, 270, 297
O Operant conditioning, causality judgment and, 230, 246 Operant conflict test, anxiety, amygdala and, 294-296 Operant performance, anxiety, amygdala and, 268 Opiates, anxiety, amygdala and, 295, 296 Optimality, haptic processing and, 144 Output, working memory and, 82, 83 Overload, working memory and, 54, 92
P Paired associates, working memory and, 91 Parallel editing, human motor programming and, 175 inverse length effects, 175, 176 scheduling, 176-179 Parkinson’s disease, motor skill, timing and, 207, 208, 212, 214, 219 Pattern learning theory, motor skill, timing and, 199 Pavlovian conditioning, causality judgment and, 230, 257 blocking, 243 comparator theories, 250 P-chlorophenylalanine, anxiety, amygdala and, 270 Perception, haptic processing and, 149 Performance anxiety, amygdala and, 268-270 stimulus-response compatibility and, 3, 4, 7, 44, 45, 47 artificial intelligence, 11-16 data, 4-7 information processing, 9-11 results, 16-26 theory, 8, 9 Peripheral neuropathy, motor skill, timing and, 208
Pharmacology, anxiety, amygdala and, 264, 267-270, 296 Phaseolus vulgaris-leucoagglutinin anterograde tracing, anxiety, amygdala and, 284 Piagetian theory, parallel editing and, 153 Piperoxane, anxiety, amygdala and, 268 Plasticity, anxiety, amygdala and, 264, 294 Posteroventral cochlear nucleus, anxiety, amygdala and, 271 Post-stimulation excitability cycle, anxiety, amygdala and, 282 Practice motor skill, timing and, 183, 202, 203 stimulus-response compatibility and, 26-29 working memory and, 113 skill acquisition, 109, 110 skilled memory, 98, 99, 101 Primacy, working memory and, 81, 87, 91 Priority, working memory and, 108 Proactive inhibition, working memory and, 54, 67, 68 Proactive interference, working memory and, 112 context, 87, 88, 92 skill acquisition, 109 skilled memory, 93-97, 100 Procedural knowledge, working memory and, 85
R Reaction time human motor programming and, 156, 162 motor skill, timing and, 198, 199, 223 stimulus-response compatibility and, 1, 2, 28, 43, 47, 49 working memory and, 68, 107 Recall causality judgment and, 230 working memory and architecture, 70 chunking, 105 context, 86, 90, 91 literature, 72, 73 skilled memory, 94-100 Recency, working memory and, 113 context, 87, 91, 92
Recency, working memory and (cont.) literature, 72, 73, 81 skilled memory, 97 Reciprocation, motor skill, timing and, 194, 195, 202, 203 Recoding, working memory and, 100 Recognition haptic processing and, 122, 126, 127, 147, 149 image-mediated model, 129 working memory and, 100 Red nucleus, anxiety, amygdala and, 277, 278 Reflex, anxiety, amygdala and, 270, 274, 275, 292, 296 Rehearsal, working memory and, 54, 56, 57, 113 context, 92 literature, 81, 83, 84, 89 skilled memory, 95 workload, 107, 109 Reinforcement, causality judgment and, 230 blocking, 243 comparator theories, 250, 254 contiguity, 231 Remapping, human motor programming and, 162, 163 Reperception, haptic processing and, 129 Repetition motor skill, timing and, 193-197, 208, 214 working memory and, 93, 98, 99 Respiration, anxiety, amygdala and, 292-294 Response anxiety, amygdala and, 268, 270 motor skill, timing and, 217 Retention, working memory and, 54, 113 architecture, 58, 71 context, 87 skilled memory, 97 Reticularis pontis caudalis, anxiety, amygdala and, 271, 274, 275, 284, 293 Retrieval human motor programming and, 156 working memory and, 55, 56, 113 architecture, 67, 70 context, 87, 88, 91 literature, 82 sequential hierarchal output, 101 skill acquisition, 109
skilled memory, 93, 96-101 Retroactive interference, working memory and, 113 architecture, 66, 67 context, 84, 85, 87, 92 skilled memory, 94-96, 99, 100
S Scalar Expectancy Theory, causality judgment and, 250 Scanning, working memory and, 59, 61 Scheduling, human motor programming and, 176-178 Search of associative memory model, working memory and, 91, 97 Semantic processing, working memory and, 113 architecture, 57, 62, 65, 67 context, 88-90, 92 literature, 75, 80 skilled memory, 95, 96, 98 workload, 108 Sensitization, anxiety, amygdala and, 264, 290-292 Sequence human motor programming and, 154, 164-166 hierarchal decisions, 154-157 hierarchal editor model, 170-172 motor-program editor model, 158-161 motor skill, timing and, 184 working memory and, 112, 113 architecture, 59, 63, 64 literature, 75, 79, 81-83 skilled memory, 96 Sequential hierarchal output, working memory and, 101-104 Serial-order intrusion, working memory and, 81 Serotonin, anxiety, amygdala and, 269, 270 Shock, anxiety, amygdala and, 296 fear-potentiated startle, 265, 266 neural systems, 274, 275, 278, 287, 289 pharmacology, 268 Short-term information, stimulus-response compatibility and, 14 Short-term memory human motor programming and, 174
working memory and, 54-57, 112, 113 architecture, 66, 68, 70 context, 85, 86, 91, 92 literature, 72-75, 81, 84 skilled memory, 94, 96, 97, 99 Short-term store, working memory and, 56 Sign language, motor skill, timing and, 224 Signaling, causality judgment and, 252, 255, 256 Similarity motor skill, timing and, 220-224 working memory and, 97 Social interaction test, anxiety, amygdala and, 294 Somatostatin, anxiety, amygdala and, 284 Span, working memory and, 73, 100 Specialization, haptic processing and, 148 Specificity theory, motor skill, timing and, 198 Spinal cord anxiety, amygdala and, 271, 284 motor skill, timing and, 211 Spinal neurons, motor skill, timing and, 211 Split brain, motor skill, timing and, 219 Startle, amygdala and, see Fear-potentiated startle, amygdala and Startle reflex, anxiety, amygdala and, 264, 271, 275 Stereotype, stimulus-response compatibility and, 48 Stimulus anxiety, amygdala and, 294, 296 neural systems, 273, 278, 287 pharmacology, 268 motor skill, timing and, 217 Stimulus-response compatibility, 1-3, 43-49 human motor programming and, 158-162 learning, 26-29 chunking, 29-38 results, 38-43 performance, goal hierarchies and, 3, 4, 7 artificial intelligence, 11-16 data, 4-7 information processing, 9-11 results, 16-26 theory, 8, 9 Stroke, motor skill, timing and, 209, 224 Subgoals, stimulus-response compatibility and, 47 Substantia nigra anxiety, amygdala and, 284, 286
motor skill, timing and, 207 Substitution, working memory and, 81 Subthalamic nucleus, anxiety, amygdala and, 286 Superior colliculus, anxiety, amygdala and, 289, 290 Synapse, anxiety, amygdala and, 270, 271 Synchrony, motor skill, timing and, 185, 189 neurological analysis, 216 time sharing, 220, 221 Synergism, anxiety, amygdala and, 287
T Thermal sensing, haptic processing and, 121 Time sharing, motor skill, timing and, 200, 201, 214-220 Timing, motor skill and, see Motor skill, timing and Transfer, working memory and, 110 Translation, human motor programming and, 168, 169 Transposition, working memory and, 81
V Vagus nerve, anxiety, amygdala and, 293 Ventral acoustic stria, anxiety, amygdala and, 274, 275 Ventral amygdalofugal pathway, anxiety and, 279, 280, 284-286 Ventral cochlear nucleus, anxiety, amygdala and, 271, 275, 276 Ventral nucleus of the lateral lemniscus, anxiety, amygdala and, 271, 274-276, 289 Verbalization, working memory and, 95, 96 Visual cortex, anxiety, amygdala and, 289, 296 Visual structures, anxiety, amygdala and, 289, 290
W
Wernicke’s aphasia, motor skill, timing and, 224
Working memory, 54-56, 112-114
  architecture
    context storage, 65-68
    macrolevel structure, 61-63
    microlevel structure, 59-61
    principles, 57-59
    simulation methods, 68-71
    system-level structure, 63-65
  chunking, 105, 106
  context, 84
    episodic memory, 85
    interference, 85-88
    overloading, 92
    recency, 91, 92
    release, 88-91
  literature, 71, 72
    buffer, 72-81
    coding, 81, 82
    output, 82, 83
    rehearsal loops, 83, 84
  sequential hierarchical output, 101-104
  skill acquisition, 108
    distributing practice, 109, 110
    phases, 110, 111
  skilled memory, 93
    rules for, 94-101
  stimulus-response compatibility and
    learning, 31-33, 36
    performance, 14, 17
  traditional views, 56, 57
  workload, 106-108
Workload, working memory and, 106-108
Y
Yohimbine, anxiety, amygdala and, 268
CONTENTS OF RECENT VOLUMES

Volume 11

Levels of Encoding and Retention of Prose
D. James Dooling and Robert E. Christiaansen
Mind Your p's and q's: The Role of Content and Context in Some Uses of And, Or, and If
Samuel Fillenbaum
Encoding and Processing of Symbolic Information in Comparative Judgments
William P. Banks
Memory for Problem Solutions
Stephen K. Reed and Jeffrey A. Johnson
Hybrid Theory of Classical Conditioning
Frank A. Logan
Internal Constructions of Spatial Patterns
Lloyd R. Peterson, Leslie Rawlings, and Carolyn Cohen
Attention and Preattention
Howard Egeth
Subject Index

Volume 12

Experimental Analysis of Imprinting and Its Behavioral Effects
Howard S. Hoffman
Memory, Temporal Discrimination, and Learned Structure in Behavior
Charles P. Shimp
The Relation between Stimulus Analyzability and Perceived Dimensional Structure
Barbara Burns, Bryan E. Shepp, Dorothy McDonough, and Willa K. Wiener-Ehrlich
Mental Comparison
Robert S. Moyer and Susan T. Dumais
The Simultaneous Acquisition of Multiple Memories
Benton J. Underwood and Robert A. Malmi
The Updating of Human Memory
Robert A. Bjork
Subject Index

Volume 13

Pavlovian Conditioning and the Mediation of Behavior
J. Bruce Overmier and Janice A. Lawry
A Conditioned Opponent Theory of Pavlovian Conditioning and Habituation
Jonathan Schull
Memory Storage Factors Leading to Infantile Amnesia
Norman E. Spear
Learned Helplessness: All of Us Were Right (and Wrong): Inescapable Shock Has Multiple Effects
Steven F. Maier and Raymond L. Jackson
On the Cognitive Component of Learned Helplessness and Depression
Lauren B. Alloy and Martin E. P. Seligman
A General Learning Theory and Its Application to Schema Abstraction
John R. Anderson, Paul J. Kline, and Charles M. Beasley, Jr.
Similarity and Order in Memory
Robert G. Crowder
Stimulus Classification: Partitioning Strategies and Use of Evidence
Patrick Rabbitt
Immediate Memory and Discourse Processing
Robert J. Jarvella
Subject Index
Volume 14

A Molar Equilibrium Theory of Learned Performance
William Timberlake
Fish as a Natural Category for People and Pigeons
R. J. Herrnstein and Peter A. de Villiers
Freedom of Choice: A Behavioral Analysis
A. Charles Catania
A Sketch of an Ecological Metatheory for Theories of Learning
Timothy D. Johnston and M. T. Turvey
SAM: A Theory of Probabilistic Search of Associative Memory
Jeroen G. W. Raaijmakers and Richard M. Shiffrin
Memory-Based Rehearsal
Ronald E. Johnson
Individual Differences in Free Recall: When Some People Remember Better Than Others
Marcia Ozier
Index
Volume 15

Conditioned Attention Theory
R. E. Lubow, I. Weiner, and Paul Schnur
A Classification and Analysis of Short-Term Retention Codes in Pigeons
Donald A. Riley, Robert G. Cook, and Marvin R. Lamb
Inferences in Information Processing
Richard J. Harris
Many Are Called but Few Are Chosen: The Influence of Context on the Effects of Category Size
Douglas L. Nelson
Frequency, Orthographic Regularity, and Lexical Status in Letter and Word Perception
Dominic W. Massaro, James E. Jastrzembski, and Peter A. Lucas
Self and Memory
Anthony G. Greenwald
Children's Knowledge of Events: A Causal Analysis of Story Structure
Tom Trabasso, Nancy L. Stein, and Lucie R. Johnson
Index
Volume 16

Skill and Working Memory
William G. Chase and K. Anders Ericsson
The Impact of a Schema on Comprehension and Memory
Arthur C. Graesser and Glenn V. Nakamura
Construction and Representation of Orderings in Memory
Kirk H. Smith and Barbee T. Mynatt
A Perspective on Rehearsal
Michael J. Watkins and Zehra F. Peynircioglu
Short-Term Memory for Order Information
Alice F. Healy
Retrospective and Prospective Processing in Animal Working Memory
Werner K. Honig and Roger K. R. Thompson
Index
Volume 17

The Structure of Human Memory
William F. Brewer and John R. Pani
A Simulation Model for the Comprehension of Technical Prose
David Kieras
A Multiple-Entry, Modular Memory System
Marcia K. Johnson
The Cognitive Map of a City: 50 Years of Learning and Memory
Harry P. Bahrick
Problem Solving Skill in the Social Sciences
James F. Voss, Terry R. Greene, Timothy A. Post, and Barbara C. Penner
Biological Constraints on Instrumental and Classical Conditioning: Implications for General Process Theory
Michael Domjan
Index

Volume 18

Nonanalytic Cognition: Memory, Perception, and Concept Learning
Larry L. Jacoby and Lee R. Brooks
On the Nature of Categories
Donald Homa
The Recovery of Unconscious (Inaccessible) Memories: Laboratory Studies of Hypermnesia
Matthew Erdelyi
Origins of Behavior in Pavlovian Conditioning
Peter C. Holland
Directed Forgetting in Context
Mark Rilling, Donald F. Kendrick, and Thomas B. Stonebraker
Effects of Isolation Rearing on Learning by Mammals
Robert Holson and Gene P. Sackett
Aristotle's Logic
Marilyn Jager Adams
Some Empirical Justification for a Theory of Natural Propositional Logic
Martin D. S. Braine, Brian J. Reiser, and Barbara Rumain
Index

Volume 19

Memory for Experience
Janet Kolodner
The Pragmatics of Analogical Transfer
Keith J. Holyoak
Learning in Complex Domains: A Cognitive Analysis of Computer Programming
Richard E. Mayer
Posthypnotic Amnesia and the Dissociation of Memory
John F. Kihlstrom
Unit Formation in Perception and Memory
John Ceraso
How Infants Form Categories
Barbara A. Younger and Leslie B. Cohen
Index

Volume 20

Recognition by Components: A Theory of Visual Pattern Recognition
Irving Biederman
Associative Structures in Instrumental Learning
Ruth M. Colwill and Robert A. Rescorla
The Structure of Subjective Time: How Time Flies
John Gibbon
The Computation of Contingency in Classical Conditioning
Richard H. Granger, Jr. and Jeffrey C. Schlimmer
Baseball: An Example of Knowledge-Directed Machine Learning
Elliot Soloway
Mental Cues and Verbal Reports in Learning
Francis S. Bellezza
Memory Mechanisms in Text Comprehension
Murray Glanzer and Suzanne Donnenwerth Nolan
Index