List of Contributors
M. Donk
Faculteit der Psychologie en Pedagogiek, Vrije Universiteit, De Boelelaan 1111, 1081 HV Amsterdam, The Netherlands
M. Eimer
MPI für Psychologische Forschung, Leopoldstr. 24, 80802 München, Germany
J. Everatt
Department of Psychology, University of Surrey, Guildford GU2 5XH, UK
Th. C. Gunter
Institute of Experimental Psychology, University of Groningen, Kerklaan 30, 9751 NN Haren, The Netherlands
H. Heuer
Institut für Arbeitsphysiologie an der Universität Dortmund, Ardeystr. 67, 44139 Dortmund, Germany
H. S. Koelega
Psychonomy Department, University of Utrecht, Heidelberglaan 2, 3584 CS Utrecht, The Netherlands
G. Mulder
Institute of Experimental Psychology, University of Groningen, Kerklaan 30, 9751 NN Haren, The Netherlands
D. Nattkemper
MPI für Psychologische Forschung, Leopoldstr. 24, 80802 München, Germany
O. Neumann
Abt. für Psychologie, Universität Bielefeld, Postfach 10 01 31, 33501 Bielefeld, Germany
W. Prinz
MPI für Psychologische Forschung, Leopoldstr. 24, 80802 München, Germany
A. F. Sanders
Faculteit der Psychologie en Pedagogiek, Vrije Universiteit, De Boelelaan 1111, 1081 HV Amsterdam, The Netherlands
E. Schröger
MPI für Psychologische Forschung, Leopoldstr. 24, 80802 München, Germany
H. G. O. M. Smid
Institute of Experimental Psychology, University of Groningen, Kerklaan 30, 9751 NN Haren, The Netherlands
G. ten Hoopen
Unit of Experimental and Theoretical Psychology, Leiden University, 2333 AK Leiden, The Netherlands
G. Underwood
Department of Psychology, University of Nottingham, Nottingham NG7 2RD, UK
A. H. C. van der Heijden
Unit of Experimental and Theoretical Psychology, Leiden University, 2333 AK Leiden, The Netherlands
M. W. van der Molen
Faculteit der Psychologie, Universiteit van Amsterdam, Roetersstraat 15, 1015 WB Amsterdam, The Netherlands
A. A. Wijers
Institute of Experimental Psychology, University of Groningen, Kerklaan 30, 9751 NN Haren, The Netherlands
Introduction
A. F. Sanders* and O. Neumann†
*Free University, Amsterdam, The Netherlands and †Bielefeld University, Germany
It is elementary historical knowledge in experimental psychology that the concept of attention had a high status in all classical theorizing, culminating in Titchener's (1908) statement that 'the doctrine of attention is the nerve of the whole psychological system, and that as men judge it, so they shall be judged before the general tribunal of Psychology'. It is equally elementary historical knowledge that the concept was rejected as 'mentalistic' by leading proponents of classical behaviorism and as 'non-existent' by Gestalt psychologists (e.g. Rubin, 1921). Attention research did not vanish during the dominance of these theoretical directions (Lovie, 1983), but attention lost the status of a central theoretical construct. It was brought back to respectability as a consequence of the cognitive shift during the 1950s but, yet, it was never fully restored. Thus, Johnston and Dark (1986) stated that 'at a time of despair and panic we turned to William James (1890) where we found new hope and inspiration'. Although it is intuitively quite clear that 'everyone knows what attention is' (James, 1890), it seems to evade systematic analysis. What is the problem with attention? In fact there seem to be several potential problems and pitfalls. A first problem is methodological and concerns inappropriate use as an ad hoc explanatory concept. It is indeed revealing to see the extent to which attention has been invoked as an 'explanation' of a variety of issues ranging from perceptual phenomena such as figure-ground and binocular rivalry to motor phenomena of programming and preparation. Again, attention has been used as an explanatory concept in the context of consciousness and of action planning, and it has even been used as a synonym for processing capacity (Kahneman, 1973). It should be realized that the uncritical application of the concept has caused, and actually is causing, a justified distrust. 
For instance, Rubin (1921) correctly objected to its ad hoc application to figure-ground perception, as did Köhler (1925) to Müller's (1923) theory of perceiving Gestalts in terms of attention. In short, attention has been viewed as an explanatory concept for almost any mental phenomenon. This may seem to confirm its central significance, but at the same time the concept becomes void of theoretical meaning. Attention cannot serve the role of a simple deus ex machina, but represents a field of inquiry requiring its own processing models. Trivial as this may sound, the issue remains a pitfall against which modern studies are certainly not safeguarded.
Handbook of Perception and Action, Volume 3 ISBN 0-12-516163-8
Copyright © 1996 Academic Press Ltd. All rights of reproduction in any form reserved
This leads to a second problem: it has been extremely difficult to define proper research questions about attention without confounding it with issues of either perceptual or motor or central processing as such. If, following James, attention is viewed as a state of selection 'of one out of what seem several simultaneously possible objects or trains of thought' (p. 403), research on attention should deal with the dynamics and limits of selective activity, but not with the ongoing processes themselves. This seemingly trivial point has still been one of the major weaknesses from which the field has suffered, perhaps because attention cannot be studied in isolation but always accompanies some other process. Thus, most topics listed under attention in Woodworth's (1938) classical text would not qualify today as typical for attentional control. In the context of his theoretical scheme, Titchener was fully aware of the danger of confounding when he defined attention as sensory clearness. This had the implication that attention serves only the role of modulating sensory processes. Unfortunately, his introspective experimental technique led into a deadlock and was definitively abandoned after Titchener's death in 1929. It was undoubtedly the merit of Donald Broadbent (1958) that attention regained respectability in the 1950s since he was the first to formulate a workable set of questions, a testable theory and a set of fruitful experimental paradigms. The filter theory encompassed a variety of issues which, also intuitively, belong to the attentional domain: selective listening, automatic detection, the orientation reaction, distraction, vigilance, and doing two things at once. Broadbent clearly saw that a theory of attention should deal with selective as well as with intensive attention. 
Again, the filter could be either strategically set toward certain signal sources, or automatically attracted by certain signals, thus addressing what James had called active and passive attention. Broadbent also realized the role of motivation in the decision of what to attend (witness the experiments on the effects on environmental stresses and human performance). In monotonous conditions the filter becomes 'satiated' in selecting the same type of information over and over again. Finally, the filter emphasized the control function of attention, since it decided which stimuli would have priority and which not. Filter theory could be shown to be wrong on many occasions but it should be realized that it was the first time an attentional theory was sufficiently predictive that it could be shown to be wrong! Some main problems were probably that it was tuned too much to sensory intake, that it was too much all-or-none, and that it was too exclusively concerned with focused attention. Further, it used Shannon and Weaver's (1949) concept of a limited-capacity channel without adopting the quantitative methods of information theory. Broadbent was the first to acknowledge some of these shortcomings in his 1971 summary by introducing 'pigeonholing' as a counterpart of filtering and by allowing for 'attenuation' instead of all-or-none filtering, both principles elegantly framed in the context of the signal detection model. Apart from the theoretical issues it was probably of importance that the original results on filtering suffered from limitations of the specific experimental paradigms. Thus the interpretation of results on dichotic listening in terms of constraints in shifting attention proved doubtful, while the results on shadowing were limited to the presentation of two auditory messages. The picture was further complicated by pronounced practice effects, apparently leading to automatic processing. 
The lack of generalization from results obtained in simple experimental paradigms obviously brings back the danger of confounding attentional control with the properties of the domain of processes under investigation.
These last points suggest a third main problem for studying attention, namely the enormous variety of mental phenomena with which attentional control may be concerned. It has been amply demonstrated that this variety precludes a common heading of 'processing capacity' (Kahneman, 1973), which proposes that subjects may carry out simultaneous tasks as long as capacity limits are not exceeded. As has been repeatedly pointed out, the consequence of assuming multiple processing resources (Navon and Gopher, 1979) implies giving up a unitary concept of attentional control. Perhaps, therefore, attention can be only used in a descriptive sense to indicate a complex field of study (Neumann, 1992), characterized by functionally different systems, such as orienting, sustained action and multiple task performance (Posner, 1978). If this is the case the future is to define specific paradigms, each allowing for the separation of processing as such and attentional control within the constraints of its domain, without any claim about a common mechanism across domains. Whether there is an ultimate unitary attentional coordination remains to be seen, but for the moment the emphasis is on diversity rather than on unity. It should be added that the presently popular connectionist language provides ample opportunities for modeling attention in terms of selective enhancement or inhibition of connections, and of treating problems of attentional control and systems separation, arising in cases of mutually interfering pathways. Accepting diversity is precisely what has happened during the last decades. The developments concern proliferation of various attentional domains such as orienting in space, object localization in visual search, central processing bottlenecks, attention for action, attentional control in dual-task performance, energetical models of effort, etc. 
Together with this proliferation a score of new experimental paradigms and measurement techniques have been developed and investigated, among which are cost-benefit analysis for separating controlled versus automatic processing (Posner, 1978), consistent mapping and the development of automatic processing (Shiffrin and Schneider, 1977), the analysis of brain potentials (Näätänen, 1982) and recent neurophysiological developments on regional cerebral blood flow (Posner and Petersen, 1990). In particular, the neurophysiological correlates of performance may allow a more precise distinction between processing proper and its attentional control, thus enhancing the theoretical status of the field. Many of these themes are examined by the contributors of this volume. Eight of the 10 chapters can be subdivided into three groups dealing with (1) information intake, (2) central control functions, and (3) the intensity aspect. With respect to information intake a division is made between attention accompanying auditory and visual information processing. With respect to visual processing, separate chapters are concerned with visual search and with stimulus identification and localization, largely reflecting the distinction between divided and focused attention. The central control function of attention is reflected in three chapters, one on dual-task performance, one on involuntary attention and one on automatic versus controlled information processing. There are also two chapters with an emphasis on the intensity aspect of attention, one relating to the energetics of reaction processes and one to sustained attention. The chapter on brain potentials reflects the prospects of electrophysiological techniques, while the final theoretical chapter overviews topics from the nine other chapters in a search for communalities and divergences. It is inevitable that the different chapters reflect equally many biases of individual authors concerning their area of discussion.
Some chapters consist of a critical review, others highlight specific developments; some are more theoretical and others more experimental; and some are more optimistic and others more pessimistic about the prospects of their area. Perhaps most conspicuously, they are rooted in widely different research traditions - from human performance to psycholinguistics, from memory research to the psychology of music, from the control of eye movements to the estimation of time and the effects of stress. The diversity of the approaches that can be found in this volume thus reflects the ubiquity of the selective and control mechanisms that make up 'attention'.
REFERENCES
Broadbent, D. E. (1958). Perception and Communication. London: Pergamon.
Broadbent, D. E. (1971). Decision and Stress. New York: Academic Press.
James, W. (1890). The Principles of Psychology. New York: Holt.
Johnston, W. A. and Dark, D. J. (1986). Selective attention. Annual Review of Psychology, 37, 43-75.
Kahneman, D. (1973). Attention and Effort. Englewood Cliffs, NJ: Prentice Hall.
Köhler, W. (1925). Gestalttheorie und Komplextheorie. Psychologische Forschung, 6, 458-416.
Lovie, A. D. (1983). Attention and behaviorism - fact and fiction. British Journal of Psychology, 74, 301-310.
Müller, G. E. (1923). Komplextheorie und Gestalttheorie: Ein Beitrag zur Wahrnehmungspsychologie. Göttingen.
Näätänen, R. (1982). Processing negativity: An evoked potential reflection of selective attention. Psychological Bulletin, 92, 605-640.
Navon, D. and Gopher, D. (1979). On the economy of human processing systems. Psychological Review, 86, 214-255.
Neumann, O. (1992). Theorien der Aufmerksamkeit: von Metaphern zu Mechanismen. Psychologische Rundschau, 43, 83-101.
Posner, M. I. (1978). Chronometric Explorations of Mind. Hillsdale, NJ: Erlbaum.
Posner, M. I. and Petersen, S. E. (1990). The attention system of the human brain. Annual Review of Neuroscience, 13, 25-42.
Posner, M. I. and Snyder, C. R. R. (1975). Facilitation and inhibition in the processing of signals. In P. M. A. Rabbitt and S. Dornic (Eds), Attention and Performance V. London: Academic Press.
Rubin, E. (1921). Die Nichtexistenz der Aufmerksamkeit. Bericht über den 9. Kongress für experimentelle Psychologie (Jena), pp. 211-212.
Shannon, C. and Weaver, W. (1949). The Mathematical Theory of Communication. Urbana, IL: The University of Illinois Press.
Shiffrin, R. M. and Schneider, W. (1977). Controlled and automatic information processing: Perceptual learning, automatic attending and a general theory. Psychological Review, 84, 127-189.
Titchener, E. B. (1908). Lectures on the Elementary Psychology of Emotion and Attention. New York: MacMillan.
Woodworth, R. S. (1938). Experimental Psychology. New York: Holt.
Chapter 1
Visual Attention
A. H. C. van der Heijden
Unit of Experimental and Theoretical Psychology, Leiden University, The Netherlands
'Modern' attention research - the research that contemporary attention research builds upon - started at the end of the 1950s (see Lovie, 1983, for a description of this beginning). The auditory attention branch had its beginning in the UK. It started with the introduction of the multi-channel tape recorder and used 'listening to speech' as its main task (see Moray, 1969, for an overview of this early research). The visual attention branch had its origin in the USA and Canada. It started with the rediscovery of an afterimage and had 'reading of letters' as its main task (see Neisser, 1967, for an overview). The outcomes and accomplishments of older, mainly German, research projects were nearly completely neglected (Neumann, in press). Apparently, the hope was that a fresh start, a new language (the information transmission and information processing language) and an improved methodology ('objective' measurements of behavior instead of 'subjective' introspective reports) could bring the progression that psychology looked for.

This chapter deals with modern visual attention research. It deals with it in an idiosyncratic way, with regard to the topics chosen as well as the way of presenting these topics. During the past 20 years, there has been an explosion of important empirical and theoretical work in visual attention. But attention still is an elusive concept and many interpretations are presently in vogue. So, selections are necessary and in selections of necessity a personal bias is involved. My bias consists of the view that attention is a selection mechanism dealing with regions in visual space (Van der Heijden, 1992). As a result, in this chapter important lines of research and ingenious contributions to theory will not receive the attention many readers might have wished. Moreover, a way of dealing with the selected topics has to be chosen. In this chapter I opted for a hybrid approach: partly historical, partly systematic and partly personal. In the historical part I try to do some justice to the tremendous formative influence of the initial experimentation and theorizing in modern visual attention research. In the systematic part I try to outline the important achievements of further ingenious experimentation and theorizing. In the personal part I present my view on visual information processing and the role of selective attention therein.
My wish is that these selections will induce sufficient frustration and inspiration that the coming 30 years of research will bring the same tremendous advances as the past 30 years have done.
1 THE START
The afterimage that visual attention research started with is a phenomenon that was already well documented in the German literature (see also Sperling, 1960, one of the few publications that recognized earlier contributions). Von Helmholtz (1871, 1894), for instance, described experiments in which an electric spark was used to illuminate a page with printed letters: 'The electric discharge illuminated the printed page for an indivisible instant during which its image became visible and remained for a very short while as a positive afterimage. Thus, the duration of the perceptibility of the picture was limited to the duration of the afterimage' (Warren and Warren, 1968, translators), and that duration was much longer than the duration of the spark ('an indivisible instant'). Also Wundt (1899), for instance, convincingly argued that it is blatantly false to assume that, in tachistoscopic experiments with short stimulus durations, the duration that a stimulus is seen exactly equals the physical stimulus duration. The duration of the visual sensation evoked by a short stimulus presentation appreciably exceeds the objective exposure duration. There is something like a positive afterimage (see Sperling, 1960, pp. 22-23, for further information and details). At the end of the 1950s and the beginning of the 1960s this positive afterimage gave rise to a series of investigations in Canada and the USA. This research was not really seen as visual attention research but as, what can be called, visual memory research. Except Sperling's (1960) 'The information available in brief visual presentations', all landmark papers had 'memory', in one form or another, in their title: 'Paced memorizing in a continuous task' (Mackworth, 1959); 'Short-term memory in vision' (Averbach and Coriell, 1961); 'Short-term storage of information in vision' (Averbach and Sperling, 1961); 'The visual image and the memory trace' (Mackworth, 1962); 'The relation between the visual image and post-perceptual immediate memory' (Mackworth, 1963); 'A model for visual memory tasks' (Sperling, 1963); 'Successive approximations to a model for short-term memory' (Sperling, 1967). Nevertheless, it was from this 'memory-in-vision' research that modern visual attention research gradually emerged.

The basic observation these investigators all started with is easily reproduced in a simple experiment, a whole-report experiment. A subject is briefly shown a visual stimulus containing many items, e.g. letters or digits, and is instructed to name as many of these items as possible. It then appears that there is a limit to the number of items that the subject can name. The subject can report only about six or seven items. In fact, this limit is a classic result (Cattell, 1885; Erdman and Dodge, 1898; Glanville and Dallenbach, 1920; all cited in Woodworth and Schlosberg, 1954). However, the very fact that, around 1960, this simple phenomenon was known under a great variety of names - 'span of perception', 'span of apprehension', 'span of attention', 'span of immediate memory' (Averbach and Sperling, 1961) - already indicates that it was far from clear what exactly determined this limit in performance. Mackworth, Sperling, and Averbach and Coriell were all concerned with elucidating aspects of this problem.
Figure 1.1. Two-store model for visual information processing. Store A: visual representation; A1, visual representation during stimulus presentation; A2, visual representation after stimulus presentation (positive afterimage). Store B: memory for identified items. The first arrow indicates a transition or translation stage between A and B, the second arrow the production of the response.
For a clear explanation of what aspects they were concerned with, it is necessary to make explicit the general framework with which these investigators started (Figure 1.1). One conviction was that, to explain the data obtained in whole-report tasks, at least two stores or memories had to be postulated. First, upon stimulus presentation, visual information enters the visual system and is stored during (A1), and also for a short while after (A2; the positive afterimage), stimulus presentation as a visual representation (store A). However, because responding can be delayed without severe consequences or simply because there is a limit to performance, a second store, containing representations of recognized items, must also be involved (store B). Of course, a transition or translation stage between A and B has to be postulated (the first arrow in Figure 1.1). Finally, the information appears in a (spoken or written) overt response (the second arrow). We are now in a position to have a closer look at the experiments, experimental results and theoretical views of Mackworth, Sperling, and Averbach and Coriell.
1.1 Mackworth: Whole-Report Task
In line with the general scheme just presented, Mackworth (1959) assumed that in the execution of the whole-report task two 'traces' are involved: a 'direct perceptual trace' (store A) and a more durable 'second trace' (store B). She presumed that '... the first trace may be a direct representation of the visual situation, while the second trace tends to be a verbal one. Thus the first direct trace must be translated by the brain into a verbal one to be stored in a more durable form' (Mackworth, 1959, p. 211). After a suggestion by Glanville and Dallenbach (1920), Mackworth (1962) assumed that, with very short stimulus presentations, this translation consists of a reading of the items from the afterimage, i.e. from A2 in our scheme. To further investigate this process, but also to assess the duration of the afterimage and the influence of possible limitations of store B on overt performance, Mackworth (1962, 1963) used a whole-report task and varied the exposure duration.

Mackworth (1962) used stimulus cards with digits as items. The exposure durations used ranged from 62 ms to 16 s. Subjects had to write down as many digits as possible and in the correct position (i.e. identification and localization was required). Figure 1.2 presents the, for present purposes, most interesting part of the results. For exposure durations between 62 ms and 1 s a linear relationship between number of elements named and stimulus exposure time was found. The equation for this relation is approximately: Digits correct = 4 + 3 × exposure time (in seconds). For exposure times longer than 1 s this linear relationship no longer holds. The increase in number of digits named is then much smaller.

Figure 1.2. The effect of exposure duration upon the number of digits recalled. (Horizontal axis: seconds per message duration.)

Mackworth (1962) regarded this outcome as a complete confirmation of her a priori theoretical views. The linear relationship is taken as evidence for a serial reading (or recognition) process. From the increase in the number of elements reported when the exposure time was increased from 62 ms to 1 s, it is concluded that the elements are read from the stimulus representation, i.e. when the stimulus is present (A1), at a rate of about three items per second (333 ms per digit). The additional four items that, according to the equation, are reported with a 0 ms exposure duration must, of course, have been read after stimulus presentation from the afterimage (i.e. from A2). Assuming a constant reading rate of three items per second, it follows that this image was readable during 4 × 333 ms, approximately 1350 ms (the intersection with the x-axis). Apparently, with exposure durations longer than 1 s, capacity limitations of the second store are reached. Therefore the linear relationship breaks down. Mackworth (1963) replicated and extended these results. Besides digits, now also letters, colors and shapes were used as items. Subjects had to name as many elements as possible in the correct order. For exposure times from 100 ms to 1 s, for
all four types of material, a linear relation between exposure time and number of elements reported was found. Of course, this outcome invited the same explanation as the digit results reported by Mackworth (1962): the slope of the linear function directly reflects the reading (or naming) rate and the intercept is an estimate of the number of items read from the afterimage. But now there was even more supportive evidence for this interpretation. Slope and intercept were largest for the function for digits, somewhat smaller for the function for letters, again somewhat smaller for the function for colors and smallest for the function for shapes. This detailed outcome pattern perfectly corresponded with the reading rates for these different types of material observed in an independent part of the experiment (3.6 items per second for digits, 3.1 for letters, 2.2 for colors and 1.9 for shapes). Multiplication of intercept and reading time produced an estimated usable afterimage duration of about 1.5 s for all types of material. Taken all together, according to Mackworth, items are literally read, and it is the exposure time plus afterimage duration and the reading rate that limit the amount reported in whole-report tasks. Only when exposure times are greater than 1 s do 'verbal-memory' limitations influence performance. For all exposure durations the number of items read from an afterimage with a duration of about 1.5 s considerably contributes to performance, i.e. there is an afterimage with a spectacularly long duration.
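As a rough numerical check on the reasoning above, the following Python sketch evaluates the linear fit quoted for Mackworth (1962) and the afterimage duration that a serial-reading account implies. The function names are illustrative and do not come from the source; the default intercept (4 items) and slope (3 items per second) are simply the approximate values given in the text, and the fit is only claimed for exposure durations between about 62 ms and 1 s.

```python
def predicted_digits(exposure_s, intercept=4.0, rate_per_s=3.0):
    """Approximate number of digits reported after `exposure_s` seconds.

    Defaults are the rounded values quoted in the text for Mackworth (1962).
    The linear relation is only claimed for roughly 0.062 s to 1 s.
    """
    return intercept + rate_per_s * exposure_s


def afterimage_duration_s(intercept_items, rate_per_s):
    """Usable afterimage duration implied by a serial-reading account.

    If `intercept_items` items are read from the afterimage at
    `rate_per_s` items per second, reading them takes
    intercept_items / rate_per_s seconds.
    """
    return intercept_items / rate_per_s


if __name__ == "__main__":
    for t in (0.062, 0.5, 1.0):
        print(f"{t:5.3f} s exposure -> {predicted_digits(t):.2f} digits")
    print(f"implied afterimage: {afterimage_duration_s(4.0, 3.0):.2f} s")
```

With the 1962 values the implied duration is 4 / 3, about 1.33 s, which is close to the roughly 1350 ms quoted in the text. The same division of intercept by reading rate underlies the estimate of about 1.5 s reported for the 1963 materials.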
1.2 Sperling: Partial-Report Task
Like Mackworth before him, Sperling (1960) started from the assumption that in the performance of whole-report tasks two 'stores' are involved: one subserving what the subject sees (store A) and one subserving what the subject knows (store B). But, contrary to Mackworth, Sperling (1960) was convinced that, to find out what is really going on in whole-report tasks, it is better to use exposure durations that do not allow useful eye movements, i.e. exposure durations shorter than 150 ms. Moreover, Sperling was convinced that in such a situation store B, immediate memory in his terminology, already imposes a limit on what can be reported. From the literature (he refers to Miller, 1956) and from two experiments (Sperling, 1960, experiments 1 and 2), he knew that the capacity of this immediate memory - the span of immediate memory - is about 4.5 items. Sperling's main interest, however, was not in the capacity of this immediate memory. What he wanted to know was: How much can be seen in a single brief exposure? His main interest was in the capacity of store A. But, just because store B imposes a low limit on overt performance, a whole-report procedure cannot be used as a measuring stick to determine this capacity. If a subject's introspective reports about 'what is seen' are regarded as inadequate and an objective measurement is required, a trick has to be invented - a trick that ensures the limit of store B plays no role in the performance of the task.

The technique for circumventing the limit of store B is the partial-report technique (Sperling, 1960). With this technique subjects are not required to name as many elements as they can as in a whole-report task; they have to report only a sample from the total number of items presented. Obviously, the maximum size of this sample is chosen in such a way that it is smaller than the span of immediate memory.
In his first partial-report experiment, Sperling implemented this technique in the following way. Stimuli containing rows of letters, e.g. three rows of three letters, were used. A stimulus was exposed for 50 ms. Immediately after the stimulus was turned off, one out of three different tones was presented. The pitch of the tone indicated which row had to be reported: high tone - upper row, middle tone - middle row, low tone - bottom row. Of course, the tone as a coded instruction can be given at any moment before, during and after stimulus presentation. In a second partial-report experiment, Sperling systematically varied the moment of presentation of the tone relative to the display. With the indicator just before stimulus presentation and at stimulus offset, the partial-report method resulted in nearly perfect performance. With increasing indicator delays, performance decreased to about 60% correct. By using the rule that the proportion of the sample equals the proportion seen of the whole stimulus, Sperling was able to calculate the number of items seen, the value he was interested in. Figure 1.3 depicts the results obtained. Sperling (1960, p. 20) concluded that with the indicator at stimulus offset '... between two and three times more information is available for partial-reports than for the whole reports.' The decreasing performance observed with increasing indicator delays indicated to him that 'This discrepancy between the two kinds of report is short-lived. Information in excess of that indicated by the whole-report was available to the Ss for only a fraction of a second following the exposure. At the end of this time, the accuracy of partial-reports is no longer very different from that of whole reports' (Sperling, 1960, p. 20). In summary, Sperling showed that many more items are seen than can be reported in a whole-report task.
Moreover, the results obtained with the delayed indicator, together with the subjects' phenomenological reports, the effects of visual masking and the known facts about the persistence of sensation, reinforced his a priori conviction that the items are still seen for a short interval of time after stimulus exposure, or that the 'information is initially stored as a visual image and that the Ss can effectively utilize this information in their partial reports' (Sperling, 1960, p. 21).
Figure 1.3. Decay of available information with a stimulus containing 3 x 3 letters. Horizontal axis: delay of instruction tone (sec). The duration of the light flash is shown on the same time scale at the lower left. The bar at the right indicates the whole-report performance for this material.
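Sperling's sampling rule (the proportion of the cued sample reported correctly is taken as the proportion of the whole stimulus that is available) amounts to a single multiplication. The following is a minimal sketch of that arithmetic, not Sperling's own analysis; the function name is hypothetical, and treating 'nearly perfect' performance as a proportion of 1.0 is an assumption made only for illustration.

```python
def estimate_items_available(proportion_correct: float, array_size: int) -> float:
    """Apply the sampling rule: proportion correct on the cued sample
    is taken as the proportion of the whole array that is available."""
    if not 0.0 <= proportion_correct <= 1.0:
        raise ValueError("proportion_correct must lie between 0 and 1")
    return proportion_correct * array_size


ARRAY_SIZE = 9           # three rows of three letters, as in the text
WHOLE_REPORT_SPAN = 4.5  # span of immediate memory reported in the text

conditions = [
    ("tone at stimulus offset (assumed 1.0)", 1.0),  # 'nearly perfect', rounded up
    ("long tone delay", 0.60),                       # 'about 60% correct'
]

for label, proportion in conditions:
    available = estimate_items_available(proportion, ARRAY_SIZE)
    ratio = available / WHOLE_REPORT_SPAN
    print(f"{label}: {available:.1f} items available, "
          f"{ratio:.1f} times the whole-report span")
```

Under these assumptions the estimate at stimulus offset is about twice the whole-report span, and at long delays it drops to a value close to that span, which is the pattern described in the quoted conclusions.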
1.3 Averbach and Coriell: Bar-Probe Task
Averbach and Coriell (1961) knew that a brief visual stimulation gives rise to a much longer-lasting 'short-term storage in the visual system', called 'positive afterimage', 'retinal persistence', 'persistence of vision', etc., by writers on perception (A2). They also seemed to be convinced that A2 limits the span of perception (Averbach and Coriell, 1961, pp. 310-311). In general, they wanted to learn about the functional properties of this storage, 'its decay, readout and erasure'. Specifically, they wanted to know whether it is the capacity or the storage duration of store A that limits performance in whole-report tasks. Like Sperling (1960), they knew that an eye-fixation is the basic temporal building block of the visual system and that it is of fundamental importance to study what is happening during one such eye-fixation. They therefore used short exposure durations that mimic a single eye-fixation. Averbach and Coriell's partial-report bar-probe task, designed to investigate the properties of this visual storage, is a visual variant of Sperling's partial-report task. Visual displays consisting of two horizontal strings of eight random letters are briefly (50 ms) exposed. A visual spatial cue - a black bar above or below one of the 16 letter positions - is used to indicate a single letter. The subjects have to report the letter indicated and to guess if unsure. As in Sperling's task, this visual indicator can be presented at various moments in time relative to the letter display. Averbach and Coriell used this possibility for investigating store A. As far as the temporal properties of store A are concerned, the results obtained essentially replicated Sperling's results. When the indicator preceded the display or was shown simultaneously with it, accuracy was rather high, about 75%. Accuracy gradually decreased when the cue was delayed. At a delay of about 200 ms accuracy reached an asymptotic level of about 35%.
From the finding that the decay curve did not fall to zero, Averbach and Coriell concluded that performance not only taps store A, but also contains a more permanent memory component, i.e. store B. Ultimately they arrived at an estimated store A duration of about 250 ms. The spatial properties of store A were really surprising. Figure 1.4 presents the percentages correct per indicated position. The distribution of correct responses over positions is strongly W-shaped. Averbach and Coriell remark that individual letters in all 16 positions were clearly legible. So 'The explanation seems to lie ... in the fact that letters in some positions, although perfectly legible by themselves, are not legible in the context of the array' (p. 315). 'Resolution of the storage - or ease of reading-out - is disturbed when too much data is put in' (p. 316), and that in a very peculiar and intriguing way. Averbach and Coriell (p. 326) summarized the findings relevant for our purposes by stating that 'The visual process involves a buffer storage whose read-in is very fast and read-out relatively slow... The storage time is of the order of one-quarter second. The storage capacity is more difficult to assess...'
Figure 1.4. Accuracy of report in Averbach and Coriell's (1961) partial-report bar-probe study as a function of target position (positions 1 to 8, with separate curves for the top row and the bottom row).
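The two-component reading of the bar-probe decay curve (a decaying store A contribution on top of a more permanent store B contribution) can be made concrete with a toy function. This is only an illustrative sketch: the exponential form and the time constant `tau_ms` are assumptions introduced here and are not taken from Averbach and Coriell; only the starting level (about 75%) and the asymptote (about 35%) come from the text.

```python
import math


def probe_accuracy(delay_ms: float,
                   initial: float = 0.75,    # accuracy with early/simultaneous cue (from the text)
                   asymptote: float = 0.35,  # level reached at long delays (from the text)
                   tau_ms: float = 70.0) -> float:  # ASSUMED decay constant, illustration only
    """Toy decay curve: constant store B component plus a decaying store A component."""
    if delay_ms <= 0:
        return initial
    store_a_component = (initial - asymptote) * math.exp(-delay_ms / tau_ms)
    return asymptote + store_a_component


for delay in (0, 50, 100, 200, 400):
    print(f"cue delay {delay:3d} ms -> predicted accuracy {probe_accuracy(delay):.2f}")
```

With the assumed time constant the curve is close to its asymptote by about 200 ms, as in the reported data. The point of the sketch is only that a non-zero asymptote is what motivates the additional store B component.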
1.4 Reading, Selection and Attention
The foregoing does not really do justice to the often highly ingenious initial experimental and theoretical work. Nevertheless, the information given suffices to indicate what was regarded as of real importance and where the bulk of modern research in visual attention started from (see Moray, 1969, and van der Heijden, 1992, for a description of some other important research traditions). As the titles of the initial papers, mentioned earlier, already suggested, and as is also readily apparent from the brief descriptions just given, the afterimage (the availability of usable visual information after stimulus exposure) and the possibility of assessing some of its properties with objective measurement techniques was regarded as the most important and exciting finding. And that excitement was infectious. Numerous investigators started to investigate this afterimage, or 'short-term store', 'visual image', 'persisting sensory trace', 'visual information storage', or, as Neisser (1967) called it, 'icon'. A diversity of methods for investigating its content and duration was developed. A kind of subdiscipline within experimental psychology arose (Haber and Standing, 1969; Haber and Hershenson, 1974), flowered (Coltheart, 1980; Long, 1980), was questioned (Haber, 1983) and seems to have largely disappeared by now. Most of this research is not of any importance for the topic of our concern: attention in vision. We return to a few exceptions later in this chapter.
With regard to the information processing operation that is going on in their different types of tasks the investigators were unanimous. The subjects read the items. The observer reads a visual image (Sperling, 1960, p. 27), the process of reading the indicated letter takes time (Averbach and Coriell, 1961, p. 314), and the linear function shows the amount that can be read from the stimulus itself and from the visual image (Mackworth, 1962, p. 58). That there is reading in a reading task is self-evident, so this 'analysis' is not of great interest. In an instruction 'that was well understood' by his subjects, Sperling (1960, p. 11) gives some indications of how reading was conceptualized. 'You will see letters illuminated by a flash that quickly fades out ... You will hear a tone during the flash or while it is fading which will indicate which letters you are to attempt to read. Do not read the card until you hear the tone, [etc.]'. So, the idea is that the subject is in complete control of his or her reading, even in a single eye-fixation and when reading is the task. With regard to the speed of this reading process, however, the investigators had different opinions. As we have already seen, Mackworth was convinced that the reading of a single item takes a considerable amount of time: between 300 and 500 ms, depending on the kind of item. By using a chain of assumptions, Averbach and Coriell (1961) were able to estimate the time needed for 'the process of detecting a marker and reading a letter'. Their estimate was that for maximum performance between 200 and 270 ms are needed. For Sperling, however, these common-sense reading rates were not that self-evident. According to Sperling (1960, p. 24), reading goes much faster. He referred to data reported by Baxt (1871) indicating that the time required to read a letter is about 10 ms. We return to this extremely fast reading further on in this chapter.
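To see how far apart these estimates are, the per-item times can be multiplied out for a fixed number of items. The sketch below assumes strictly serial reading at a constant time per item, which is a simplification made here for illustration; the dictionary labels and the function name are hypothetical, and the Averbach and Coriell figure refers to detecting a marker and reading a single letter rather than to a sustained reading rate.

```python
# Per-item time estimates quoted in the text, in milliseconds.
MS_PER_ITEM = {
    "Mackworth, fast items": 300,
    "Mackworth, slow items": 500,
    "Averbach and Coriell, lower bound": 200,  # marker detection plus one letter
    "Averbach and Coriell, upper bound": 270,
    "Baxt (1871), as cited by Sperling": 10,
}


def serial_reading_time_ms(n_items: float, ms_per_item: float) -> float:
    """Total time if items are read one after another at a constant rate."""
    return n_items * ms_per_item


SPAN = 4.5  # span of immediate memory mentioned earlier in the chapter

for source, rate in MS_PER_ITEM.items():
    total = serial_reading_time_ms(SPAN, rate)
    print(f"{source}: {total:.0f} ms to read {SPAN} items")
```

On these assumptions the estimates differ by more than an order of magnitude, which is why the choice between them matters for any account of what limits report.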
Sperling (1960) and Averbach and Coriell (1961) were well aware that in their respective tasks a 'choice' or a 'selection' (i.e. the act of choosing) is also involved. (Mackworth, 1963, p. 75, mentioned 'a selective process' but had no further use for it in her theoretical elaborations.) Sperling (1960, p. 23) stated 'If more information is available to him than he can remember, the S must "choose" a part of it to remember. In doing so, he has chosen the part to forget ... Ss exercised only locational choices... Locational choices are probably not the only effective choices that the S can make ... Usually, what he does, or attempts to do, is determined by the instructions.' Averbach and Coriell (1961, p. 316) concluded that the visual system stores information and that 'This storage can be tapped selectively on a signal given by the experimenter'. However, not the 'selectivity', but the 'storage' is what was regarded as of importance. Sperling (1960) and Averbach and Coriell (1961) also assumed that in their tasks attention was involved, and that attention had something to do with the 'reading' and with the 'selection' (to the best of my knowledge, Mackworth did not use the word attention, but I am not a word processor). Sperling (1960, p. 24) explained that the unobservable reaction to a high signal tone is 'looking up', where 'looking up' has to be described in terms of a shift of 'attention', where 'attention' has to be conceived as a preparation for, or sensitization to, the correct row. It is, however, far from clear what attention exactly does. Attending is not reading: 'Once his attention is directed to the appropriate row, the S still has to read the letters' (p. 24). Or possibly it is: 'The choice of what part of the stimulus to attend to or of which letters to read is the choice of what fraction of the stimulus information to utilize' (p. 25). To the best of my knowledge, Averbach and Coriell (1961, p. 324) used the
word attention only once, when they stated '... there is a selective process, which occurs only after the marker appears, when the subject has been cued to direct his attention to the single desired letter.' And, of course, that remark does not really elucidate the functional role of attention in their partial-report bar-probe task. It is not really surprising, and also not of real importance, that in the studies discussed up to now the notion of 'attention' remains rather vague and that its relation with 'reading' and 'selection' is not spelled out in detail. As described, the studies were not really concerned with these topics, but with the duration and capacity of memories, and especially of a memory in vision. That subjects can read, attend and select was taken for granted and gratefully used as a tool to investigate other issues. What is of importance, however, is that in these tasks something like reading, attending and selecting is involved in a nontrivial way. With regard to the bar-probe task Eriksen and Collins (1969, p. 254) remark: 'The perseverating image, or as Neisser (1967) termed it, icon, has generated considerable interest, but an equally if not more interesting characteristic of this experimental technique is the means by which attention can be selectively directed in a matter of milliseconds to the relevant stimulus item.' And, indeed, only a change in emphasis or perspective was needed to change the tasks from visual memory tasks into attention or selection tasks (see also Eriksen, 1990).
2 THE THEORY
The results described were incorporated in, and became supporting evidence for, two very influential general theories concerned with information processing and selection of information: the theories of Neisser (1967) and Broadbent (1971). These two theories had much in common. Moreover, they shared one fundamental, a priori assumption that purportedly made clear why was observed what was observed. The assumption is that the human information processing system has a central capacity limitation (see Kahneman, 1973, for further support). The theories can be regarded as 'one theory' because of their similarity and as 'the theory' because of their influence. On the basis of auditory research, Broadbent (1958, p. 297) concluded 'A nervous system acts to some extent as a single communication channel, so that it is meaningful to regard it as having a limited capacity.' This conclusion reappeared at the start of Broadbent's (1971) study, which also deals with visual information processing. 'Large though the brain is, any conceivable mechanism which could cope simultaneously with all possible states of the eye, the ear, and our other receptors, would probably be even larger' (Broadbent, 1971, p. 9). Neisser (1967, p. 87) agreed that 'To deal with the whole visual input at once, and make discriminations based on any combination of features in the field, would require too large a brain, or too much "previous experience" to be plausible.' 'If we allow several figures to appear at once, the number of possible input configurations is so very large that a wholly parallel mechanism, giving a different output for each of them, is inconceivable' (Neisser, 1967, p. 94). Brains are obviously too small. They have a limited capacity.
Of course, the problem then is how to make an appropriate use of this limited brain capacity. Broadbent's (1958, p. 297) second conclusion gave the answer: 'A selective operation is performed upon the input to this channel...'. Broadbent (1971, p. 9) repeated this conclusion when he states 'The workings of the nervous system then are likely to incorporate a good many devices aimed at economizing on the mechanism necessary...Broadbent (1958)...held that the limited capacity portion of the nervous system was preceded and protected by a selective device or filter, which would pass only some of the incoming information.' And again, Neisser (1967, p. 94) agreed: 'To cope with this difficulty [of limited capacity], even a mechanical recognition system must have some way to select portions of the incoming information for detailed analysis.' So, selection is the answer. Appropriate selection guarantees appropriate use of the limited capacity brain. But selection requires something, e.g. a display of possibilities, from which to select. Broadbent (1958, p. 298) had already introduced this provision. He stated that 'Incoming information may be held in a temporary store at a stage previous to the limited capacity channel: it will then pass through the channel when the class of events to which it belongs is next selected.' So, information is selected from a 'buffer or a temporary store' (Broadbent, 1971, p. 10). Also Neisser (1967, p. 94) was clearly aware that for selection a range of alternatives is required. He stated that selection '... implies the existence of two levels of analysis... '. One level of analysis provides the information to select from, the second level deals with the information selected in detail (see also Neisser, 1967, p. 90). And, of course, for the first level the icon comes in very handy. In the foregoing I stressed the correspondences between Neisser's (1967) and Broadbent's (1958, 1971) views, but, of course, there are also differences. 
For instance, the difference in terms used to characterize the information processing sequence just described is striking. Broadbent talks in terms of 'channels' containing representations of physical features, 'filter settings' for selecting this information and 'identification or categorization' for the operation of the limited capacity portion. He virtually refuses to use the word attention (the word does not appear in the subject index of his 1971 book). Neisser talks about 'preattentive analysers' which form segregated 'objects', about 'attention' as the allotment of analyzing mechanisms to a limited region of the segregated field, and about an 'act of focal attention' in which objects are identified. There are also substantial differences in opinion, for instance, about what 'identification' exactly is: a result of a passive analysis (Broadbent, 1958, 1971) or the outcome of an active construction (Neisser, 1967). For present purposes, however, all this is not of very much importance (except for the point that more languages were made available for expressing basically the same theoretical point of view; see also Kahneman, 1973). Of importance is how the detailed data, provided by Mackworth, Sperling, and Averbach and Coriell, are dealt with.
2.1 Selection for Processing
From the foregoing it will be clear that the evidence showing the existence of a positive afterimage was gratefully accepted. It appeared, under the terms 'buffer store' (Broadbent, 1971) and 'icon' (Neisser, 1967), as an essential component in the
visual information processing theories. However, because our main interest is not in positive afterimages but in visual attention, we can immediately turn to reading, selection and attention. With regard to the processing, i.e. the identification or categorization, of the information in the tasks discussed, both Neisser (1967) and Broadbent (1971) are extremely clear. Of course, the items are simply read. For both theorists the 'whole-report task' (the task Mackworth was mainly concerned with) provided the essential information. Even before he has given his theoretical analysis, Neisser (1967, p. 42) already knows that 'The fact that the span of apprehension averages only four or five... probably results from the high rate of encoding. In a tachistoscopic experiment the subject must read the fading icon as rapidly as possible... '. Later on this view is elaborated: 'The attentive synthesis of any particular letter or figure takes an appreciable time, of the order of 100 ms ... If a whole row of letters is to be identified, they must be synthesized one at a time... To "identify" generally means to name, and hence to synthesize not only a visual object but a linguistic-auditory one... Hence the span of apprehension is limited to what can be synthesized, and then verbally stored ...' (Neisser, 1967, p. 103). With regard to Mackworth's data (see also Figure 1.2) Broadbent (1971, pp. 173-174) remarks: 'We therefore have the evidence we wanted, that the selective process is limited in its rate of working ... we have a buffer store lasting a fixed length of time, and a subsequent serial process which can extract one item after another from the buffer until the limit of time is exceeded. The speed of the serial process is less when, for example, naming colours rather than digits; and it seems therefore to have to do with the allocation of responses to stimuli.
In our present terms, it is the elicitation of a category state rather than the mere transmission of evidence about the occurrence of a physical event.' According to Broadbent, the speed of the serial process is the speed specified by Mackworth. Of course, this serial identification also requires a serial selection of the information to be identified. This serial selection has not only to make clear how in Mackworth's tasks the responses appear in the correct order, but also how in Sperling's task responding is restricted to one row and in Averbach and Coriell's task to a single letter. These problems are not dealt with in very much detail; for instance neither Broadbent nor Neisser really explains Averbach and Coriell's results. In general, for Broadbent (1971) the relevant selective process is 'filtering' (or 'stimulus set') guided by instruction, with filtering conceived as a hierarchical process in which stimuli are examined for the presence or absence of some key features (e.g. being red, or on the upper row). Only if these features are present are all other features of the stimulus processed (e.g. the red one is an 'A', or on the upper row is 'X, G, T'). For Neisser (1967) the relevant processes are the processes that determine the priority of encoding. Two processes capable of doing this are distinguished: perceptual set, based on (reading) habits or induced by instruction (pp. 39 and 103), but also the preattentive processes or the outcome of preattentive processes e.g. motion is specified as an effective cue (p. 92). As already indicated, Broadbent (1971) had little use for the word attention. The word is used in the section heading 'Visual experiments in selective attention' but does not show up in the subject index. Following Mackworth's lead, Broadbent managed without attention. Neisser, on the contrary, knows what attention is and
is not: '...attention is not a mysterious concentration of psychic energy; it is simply an allotment of analyzing mechanisms to a limited region of the field. To pay attention to a figure is to make certain analyses of, or certain constructions in, the corresponding part of the icon' (Neisser, 1967, pp. 88-89). In fact, for the tasks we are concerned with, attending is (simply) reading after selection (and that is not too different from Sperling's and Averbach and Coriell's views). The foregoing does not really do justice to these two major modern information processing and attention theories. Not mentioned, for instance, has been that both theories recognize other forms of selection than we described. In Neisser's theory a second form of selection is hidden in 'the synthesizing of an object'. In his view 'Paying attention is not just analyzing carefully; rather, it is a constructive act... What we build has only the dimensions we have given it' (Neisser, 1967, p. 96), or '... the detailed properties and features we ordinarily see in an attended figure are, in a sense, "optional"' (Neisser, 1967, p. 94). Broadbent (1971; see also Treisman, 1964, 1969) is very explicit about a second form of selection: response set. This form of selection is induced by an instruction such as 'name the letters, not the digits', and operates, in one way or another, via semantic class (see Broadbent, 1971, pp. 177-180 for details). These alternatives are certainly real and of importance in visual information processing (Bundesen, 1990; Phaf, Van der Heijden and Hudson, 1990; Van der Heijden, 1992). However, because they are not central in this stream of theorizing, for this moment, we can safely set them aside. For present purposes, then, our exposition suffices, because it is now clear that two topics became central in modern theorizing: (1) One concerned with a limited capacity information processing system that identifies items one after another (identification). 
(2) Another concerned with (selection) mechanisms that regulate that this system is working on only a subset of the visual information at a time, i.e. performs 'selection for processing' (selection).
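The two central topics, selection for processing followed by serial identification from a time-limited buffer, can be put together in a small sketch. It is only a toy rendering of the verbal account given above, not a model proposed by Broadbent or Neisser. The class `Item`, the function `filter_then_identify` and the parameter values are hypothetical; the 250 ms buffer duration and the 100 ms per item are borrowed from figures mentioned earlier purely as illustrative settings.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Item:
    identity: str  # what identification would yield, e.g. "X"
    row: str       # a physical feature the filter can test


def filter_then_identify(display: List[Item], key_row: str,
                         buffer_ms: float, ms_per_item: float) -> List[str]:
    """Select items by a key physical feature, then identify them one by one
    until the buffer's lifetime is exceeded."""
    selected = [item for item in display if item.row == key_row]  # selection
    reported: List[str] = []
    elapsed = 0.0
    for item in selected:              # serial identification
        elapsed += ms_per_item
        if elapsed > buffer_ms:        # buffer has faded; nothing more to read
            break
        reported.append(item.identity)
    return reported


display = (
    [Item(letter, "upper") for letter in "XGT"]
    + [Item(letter, "middle") for letter in "RLK"]
    + [Item(letter, "lower") for letter in "MPB"]
)

print(filter_then_identify(display, "upper", buffer_ms=250, ms_per_item=100))
print(filter_then_identify(display, "upper", buffer_ms=250, ms_per_item=10))
```

With a slow identification rate only part of the selected row survives, whereas with a fast rate the whole row is reported. In this sketch the number of items reported is therefore set jointly by the buffer duration and the speed of the serial process.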
3 FURTHER DEVELOPMENTS
As stated, Broadbent's (1958, 1971) and Neisser's (1967) theories were very influential. Indeed, they were so influential that it is not difficult to characterize the bulk of theorizing since then and up to now. The nearly generally accepted view became, and still is, that the information processing system has indeed a limited central capacity for processing information and that a selective mechanism, early in a linear stream of information processing, has to protect this system against 'overload' by feeding it with appropriate bits and pieces of information to work upon. In this 'limited-capacity early-selection' view, selection precedes recognition and only what is selected can be recognized. Models expressing this view, with slight elaborations, modifications, variations, extensions and other niceties, were proposed by Kahneman (1973), Coltheart (1972, 1975), Rumelhart (1970), Eriksen and Rohrbaugh (1970), Francolini and Egeth (1980), Johnston and Dark (1985, 1986), Kahneman and Treisman (1984) and many others. Nowadays, a large part of experimentation and theorizing is dominated by Treisman's 'feature integration theory of attention', which is simply a direct descendant and further elaboration of Neisser's (1967) view (see Treisman, 1988, for an overview).
However, there were also dissidents. These theorists rejected the kernel assumption of limited central capacity. Either their a priori theoretical views or some experimental evidence forced them to defend one or another variant of the theoretical position that the central information processing system is not hindered by capacity limitations or that the capacity issue is simply irrelevant for theory. The first to propose such a view were Deutsch and Deutsch (1963). Related proposals came for example from Norman (1968), Morton (1969), Keele (1973), Shiffrin and Schneider (1977), Duncan (1980), Coltheart (1984), Pashler (1987), Posner (1978), Mewhort and Campbell (1981) and Van der Heijden (1981). In line with the 'unlimited capacity' assumption, most of these dissidents also suggested that selection of information takes place at a stage that contains identified information, i.e. that selection occurs relatively 'late' in a linear stream of information processing. Selection follows pattern recognition and can be based on recognized information. So, there was an alternative for the 'early-selection limited-capacity' view: the 'unlimited-capacity late-selection' view. Because of all kinds of major and minor similarities and differences between and within these two groups of views, and because there is also a substantial number of hybrids, it is impossible to present an intelligible overview. Fortunately, that is not really necessary. We are now in a position to have a more systematic look at some important 'further' developments. Obviously, two lines of further development are of critical importance: developments in the domain of information processing and developments in the domain of information selection.
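The structural difference between the two groups of views is the order of the two operations. The sketch below is a deliberately simplified illustration of that ordering only; the types and function names are hypothetical, and 'recognition' is reduced to classifying a glyph as a letter or a digit. It counts how many items pass through recognition in each arrangement.

```python
from typing import Callable, List, NamedTuple, Tuple


class Stimulus(NamedTuple):
    row: str    # physical feature available before recognition
    glyph: str  # the raw pattern


class Recognized(NamedTuple):
    row: str
    identity: str
    category: str  # available only after recognition


def recognize(stimulus: Stimulus) -> Recognized:
    """Stand-in for pattern recognition: assigns an identity and a category."""
    category = "digit" if stimulus.glyph.isdigit() else "letter"
    return Recognized(stimulus.row, stimulus.glyph, category)


def early_selection(display: List[Stimulus],
                    physical_test: Callable[[Stimulus], bool]
                    ) -> Tuple[List[Recognized], int]:
    """Limited-capacity view: select first, recognize only what was selected."""
    selected = [s for s in display if physical_test(s)]
    return [recognize(s) for s in selected], len(selected)


def late_selection(display: List[Stimulus],
                   semantic_test: Callable[[Recognized], bool]
                   ) -> Tuple[List[Recognized], int]:
    """Unlimited-capacity view: recognize everything, then select."""
    recognized = [recognize(s) for s in display]
    return [r for r in recognized if semantic_test(r)], len(recognized)


display = [Stimulus("upper", "A"), Stimulus("upper", "7"), Stimulus("lower", "B")]

early, n_early = early_selection(display, lambda s: s.row == "upper")
late, n_late = late_selection(display, lambda r: r.category == "letter")

print([r.identity for r in early], "items recognized:", n_early)
print([r.identity for r in late], "items recognized:", n_late)
```

In the early arrangement the selection criterion can only be a property that is available before recognition, such as the row. In the late arrangement it can be a recognized property, such as category membership, at the price of recognizing every item.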
3.1 Parallel Processing
3.1.1 Retinal Acuity and Eye Movements
The first further development is not really a further development. It concerns the delayed recognition within visual information processing psychology of an already long-established fact: the fact that the peripheral eye, or, in general, the peripheral visual system, has severe and peculiar capacity limitations that show up in performance (e.g. Hall and Von Kries, 1879; Poffenberger, 1912; both mentioned in Eriksen and Schultz, 1977). As stated, Broadbent (1971) imported his kernel assumption (limited central capacity for categorization or identification in vision) from auditory information processing, and, possibly, in audition this assumption reflects something real (Neumann, Van der Heijden and Allport, 1986). Neisser (1967) imported his basic assumption (limited central capacity for figural synthesis) from artificial intelligence ('... even a competent automaton would require processes of figure-formation and attention...' (p. 94)), and, of course, we now know that for artifacts this central limitation is certainly real. The important point, however, is whether for visual information processing this assumption also hits something real, and there are strong reasons to doubt that it does so. Until now, virtually all major theories of attention in vision have completely neglected the fact that the visual system has severe peripheral limitations. For instance, no implications whatsoever are attached to the astonishing information, provided by neurophysiology, that there are only two million (peripheral) retinal
ganglion cells connecting eye and cortex but that the number of (central) neurons in the cortex concerned with visual information processing is thousands of times larger (Barlow, 1981, 1985). The reason for this neglect seems obvious and even understandable. We 'see' a detailed and panoramic world. (But it is far from clear how the reasoning then proceeds and leads to limited central capacity. We 'see' so much, so much too much information is presented to the eyes, so there must be a limited capacity for really 'seeing'?) The amount of information 'seen in the world' or 'presented to the eyes' is of no relevance, however. Of relevance is only the information the eye presents to the brain. And eyes are very 'limited' in this respect. For instance, the human daylight visual system is, on average, very limited in its capacity for resolving detail. Moreover, the eye is markedly inhomogeneous with respect to this capacity. The highest visual acuity is found in a very small, central region of the retina: the fovea. Most retinal ganglion cells are devoted to the transmission of information from this region to the cortex. From the fovea towards the periphery, acuity falls off progressively and dramatically (Anstis, 1974; Haber and Hershenson, 1974). Appreciably fewer retinal ganglion cells are involved in transporting information from these parafoveal and peripheral regions. In short, the eye sends detailed information to the brain about only a very limited region of the visual world. Just because the eyes send only a limited amount of information to the brain, it seems a priori highly unlikely that the brain cannot deal with the information it receives from the eyes, i.e. has capacity limitations. It is much more likely that the brain is tailor-made for dealing with the information received from the eyes. 
Indeed, it is well known that most of the visual brain's computing power is devoted to information received from the fovea (see Cowey, 1981, and Levi, Klein and Aitsebaomo, 1985, for the evidence). So, instead of assuming that the brain has severe limitations in the identification or categorization of information, and therefore needs a central protective filter, it seems much better to assume that the eyes, as peripheral filters, prevent the brain from reaching its limits by passing on only a limited amount of information about the world. The eyes as interfaces between world and brain seem perfectly suited for solving Broadbent's capacity problem. Normally, the eyes point only in one direction at a time. Together with their limited capacity, this means that at any moment in time the eyes can sample only a small part of the visual world. Only through a series of time-consuming eye movements and fixations can a larger region of the visual world be effectively dealt with. (Indeed, the perception of the detailed, panoramic world is a process extended in time; a limited-capacity process with the eye as the bottleneck.) So, instead of assuming a central shifting around of processing mechanisms, it seems much better to assume that the eye movements are the mechanisms that allocate central analyzing mechanisms to a limited region of the visual field. By means of eye movements the peripheral visual system seems perfectly capable of solving Neisser's analyzer allocation problem. At least and in general, before central limited capacity is invoked as an explanatory construct, peripheral limitations and peripheral contributions have to be adequately acknowledged (but words such as 'retina', 'acuity', 'retinal acuity', etc., virtually never appear in the 'subject indexes' of major treatments of attention in vision). 
Of course, aspects of the operation of this peripheral limited-capacity system will show up in experiments and can be mistakenly taken as evidence for central capacity limitations. For instance, in experiments in which eye movements are
possible and the eyes can provide the central system with a series of new details, indicators for the operation of the central system will be hidden behind a massive, serial contribution of the peripheral sensory system. This was (almost) certainly the case in Mackworth's experiments in which exposure times were used that allowed more than a single eye-fixation (exposure times greater than 150-200 ms; Figure 1.2). Therefore Mackworth's, and consequently Broadbent's and Neisser's, serial reading at an extremely low rate has to be seriously doubted. To learn something about the properties, e.g. the capacity, of the central information processing system, short exposure times, mimicking what happens in one single eye-fixation, have to be used. But even when exposure times are used that do not allow useful refixations the same interpretation error can arise. For instance, the differential retinal acuity means that, with linear arrays as used in the research of Mackworth, Sperling, and Averbach and Coriell, the assumption that all information will be available simultaneously to some central processor is seriously in error (see Eriksen and Schultz, 1977, for relevant evidence). The differential central arrival times for and/or the differential central quality of information projected on different regions of the retina can induce the spurious impression of a limited central capacity and serial processing (an example follows later). If one wants to eliminate this artifact, circular arrays centered around the fixation point have to be used, as Eriksen did in his line of research (see Eriksen and Steffy, 1964, for the introduction of the circular arrays; see Eriksen, 1990, for an overview of this research).
3.1.2 Perceptual Independence and Lateral Interference
Within a single eye-fixation there is, besides retinal acuity, another peripheral (and possibly also central) factor that severely limits performance in multiple-item tasks such as whole-report, partial-report and bar-probe tasks. Averbach and Coriell (1961, p. 316) hinted at this factor when they wrote: '...letters in some positions, although perfectly legible by themselves, are not legible in the context of the array.' In their view this factor (let us call it 'lateral interference') is responsible for the peculiar W-shaped serial position curve for correct reports (see Figure 1.4). Later research produced abundant evidence for the importance of lateral interference, i.e. the adverse effect on identification of a letter caused by a (mask) letter in close spatial and temporal proximity. Eriksen and associates clearly saw that it is futile to ask questions about the capacity of the central information processing system when it is not known whether the peripheral sensory apparatus is capable of transmitting sufficient information to work upon. First it has to be ascertained whether there are enough 'independent sensory channels' to make the information available to the central system. In their research the concept of 'perceptual independence' is central. With perceptual independence, 'At any moment in time the varying sensitivities, or alternatively, error factors, in the visual perceptual system are uncorrelated for elements associated with foveal areas separated by some minimal distance, and further, there is a lack of interaction between forms, simultaneously presented on these separate foveal areas' (Collins and Eriksen, 1967, p. 369). From their research it appears that there is perceptual independence when the items are separated by more than 1 degree (Collins and Eriksen, 1967; Eriksen, 1966; Eriksen and Lappin, 1967), while
with a separation of less than 1 degree the retinal sensitivity or error factor is correlated (Collins and Eriksen, 1967; Eriksen, Munsinger and Greenspon, 1966). Subsequent research provided more evidence for the importance of this factor and substantially more detailed information about its ways of operation. It appeared that lateral interference, as Eriksen's research had already shown, is indeed stronger with close item-item spacings than with wide spacings (Bouma, 1970; Wolford and Chambers, 1983). It is more detrimental in peripheral regions of the retina than in the fovea (Bouma, 1970; Wolford and Hollingsworth, 1974). It is not only observed with short but also with extended exposure durations (Estes, Allmeyer and Reder, 1976; Taylor and Brown, 1972). The effect is asymmetric: a mask-item placed on the peripheral side of a target-item is more detrimental for performance than one placed on the foveal side (Bouma, 1970, 1973; Chastain, 1983; Chastain and Lawson, 1979). Finally, lateral interference is probably to some extent feature specific; the accuracy-reducing interaction appears larger for identical or similar letters than for dissimilar letters (La Heij and Van der Heijden, 1983; Shapiro and Krueger, 1983). Estes (1978, p. 188) argued that exactly this factor, lateral interference, in combination with retinal acuity can wholly account for the W-shaped serial position curve found with partial-report bar-probe tasks using linear arrays. Acuity is highest in the fovea and declines progressively towards the periphery. This explains the high level of performance observed with items in the middle of the array. Lateral interference is greatest in the interior of the array where each item has two neighbors and least at the ends of the array where the items have only one neighbor at the foveal side. This explains the high level of performance for the two end items. 
The two explanations combined can account for the shape of the position curve (see Van der Heijden, 1987, and Hagenzieker, Van der Heijden and Hagenaar, 1990, for numerical examples that illustrate the power of Estes' explanation). Nevertheless, as we will see later, this retinal acuity-lateral interference story cannot form the complete explanation of the W-shaped serial position curve. In partial-report bar-probe tasks, more than identification alone is involved and other factors contribute to the position curve. However, recent experiments and analyses have shown that at least a substantial W-shaped component of this curve has indeed to be attributed to the combination of retinal acuity and lateral interference (Hagenaar, 1990; Hagenzieker et al., 1990; Van der Heijden, 1992). Like retinal acuity, lateral interference as a peripheral effect can seriously hinder us in assessing the capacity of the central information processing system. A peripheral 'limitation' precedes the central system and imposes its characteristics on the total stream of information processing. The peripheral limitation is then easily mistaken as a central limitation and thereby as spurious evidence in favor of limited central capacity theories. If lateral interference is (also) a central effect (Cowey, 1979, 1981), then it is a sort of central 'capacity limitation' not intended by Broadbent's, Neisser's or others' limited capacity theories. Lateral interference has then to be regarded as a property of Broadbent's limited capacity channel and no filter can protect this channel against its own properties. Or, lateral interference is then an inherent property of Neisser's analyzing mechanisms and will accompany them to all limited regions where focal attention goes. In short, if lateral interference is also a central effect, then it is better not to regard it as a central limitation as intended by limited capacity theories. It is then much more likely that it has a positive function in visual
information processing (see Phaf et al., 1990, and Van der Heijden, 1990, for some suggestions).
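Estes' account of the W-shaped position curve, as summarized above, can be illustrated with a toy calculation. The following is only a sketch under stated assumptions: the multiplicative combination of acuity and interference, the linear fall-off of acuity, and every numerical constant are hypothetical choices made for illustration, and they are not the numerical examples of Van der Heijden (1987) or Hagenzieker et al. (1990). The sketch only encodes the qualitative properties listed in the text: acuity declines with eccentricity, interference grows with eccentricity, a flanker on the peripheral side is more harmful than one on the foveal side, and end items have a single, foveal-side neighbor.

```python
# Toy model (all constants are illustrative assumptions, not fitted values).
ACUITY_SLOPE = 0.1        # hypothetical loss of acuity per position of eccentricity
INTERFERENCE_SLOPE = 0.1  # hypothetical growth of interference with eccentricity
PERIPHERAL_WEIGHT = 1.0   # flanker on the peripheral side of the item
FOVEAL_WEIGHT = 0.3       # flanker on the foveal side of the item (less harmful)


def acuity(position):
    """Identification quality from acuity alone; position 0 is the fixation point."""
    return 1.0 - ACUITY_SLOPE * abs(position)


def interference(position, positions):
    """Summed lateral interference from the immediate neighbors of an item."""
    eccentricity = abs(position)
    strength = INTERFERENCE_SLOPE * eccentricity
    total = 0.0
    for neighbor in (position - 1, position + 1):
        if neighbor not in positions:
            continue  # end items have no neighbor on the peripheral side
        if abs(neighbor) > eccentricity:
            total += PERIPHERAL_WEIGHT * strength
        else:
            total += FOVEAL_WEIGHT * strength
    return total


def position_curve(n_items=7):
    """Predicted proportion correct per serial position for an odd-length linear array."""
    half = n_items // 2
    positions = range(-half, half + 1)
    return [acuity(p) * (1.0 - interference(p, positions)) for p in positions]


if __name__ == "__main__":
    for index, value in enumerate(position_curve(7), start=1):
        print(f"position {index}: {value:.3f}")
```

With these assumed constants the predicted curve is highest in the middle, dips at the positions next to the ends, and recovers at the two end positions, which is the W shape described in the text. Other constants can of course produce other shapes; the sketch shows only that the two peripheral factors are sufficient in principle.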
3.1.3 Successive Masking and Parallel Processing
To gain more insight into the real capacity and mode of operation of the central information processor, Sperling (Averbach and Sperling, 1961; Sperling, 1963, 1967) used a whole-report task. After Baxt (1871), he tried to get rid of the visual image, and therefore be able to control the duration of the visual sensation, by using a masking stimulus. The stimulus containing the information for whole-report was exposed from 5 up to 60 ms (Sperling, 1963) and was, at termination, immediately followed by a patterned mask which consisted of parts of letters spread randomly over the field. Sperling assumed that this visual noise field would erase the visual image, because 'The noise field would effectively mask the stimulus even if both fields were on simultaneously; therefore, it is assumed to stop any possible persistence of the stimulus' (Averbach and Sperling, 1961, p. 202). The results of these experiments were spectacular. With each 10 ms increase in stimulus exposure time, an additional letter could be named, up to a total of about four letters. Sperling (1963) took this finding as evidence for a very fast serial 'scanning' process (the word 'reading' could obviously no longer be used). However, after a further detailed analysis, Sperling (1967, p. 290) concluded that it must be a parallel scanning process. 'The observation that all locations begin to be reported at better than chance levels even at the briefest exposures, may be interpreted as evidence of an essentially parallel process for letter recognition. This process gives the illusion of being serial because the different locations mature at different rates...' In his model Sperling (1967) assumed that the scanned elements are stored in a 'recognition buffer memory' as a 'program of motor instructions'. 'The important idea ... is that the program of motor instructions for a rehearsal can be set up in a very short time (e.g.
50 ms for 3 letters) compared to the time necessary to execute it (e.g. 500 ms for 3 letters)' (Sperling, 1967, p. 291). Indeed, the important idea here is that 'reading' can be decomposed into a fast parallel recognition component and a slow, subsequent, selection component. This important idea was not accepted, however. Broadbent (1971, p. 17) simply left 'these very rapid processes' aside because '... the area is changing rapidly, is somewhat outside our main interests, and at the time of writing is rather confused...'. But Neisser (1967, p. 33) came up with an important argument against Sperling's experiments: 'In my opinion, this and other "erasure" experiments demonstrate only the effect of exposure time on legibility... The subject reads letters not only from the stimulus when it is on, but from the icon afterwards, despite the presence of the mask' (see also Kahneman, 1968, p. 417). And, indeed, because in Sperling's experiments the stimulus duration covaried with the interval between onset of stimulus and onset of mask (the SOA), it cannot be excluded that the quality of the visual image improved with increased exposure durations and therefore was better able to resist the interfering effects of the masking stimulus. Phrased in a different way, Sperling (1963, 1967) subscribed to an 'interruption theory' of backward masking. He assumed that the number of items recognized, X, equals

$$X = F_1(t) \tag{1}$$
where t is the exposure duration of the stimulus (or the moment of presentation of the mask) and $F_1$ stands for the recognition operation performed before interruption or erasure. There is, however, abundant evidence for an alternative theory of backward masking: the 'integration theory' (Eriksen, 1966; Eriksen and Collins, 1967; Kahneman, 1968, pp. 416-418; Kinsbourne and Warrington, 1962; Schultz and Eriksen, 1977). This theory holds that successive visual events are summed or integrated into a composite, something like a montage. This target-mask montage still allows items to be recognized. According to this view, what Sperling really measured equals

$$X = F_1(t) + F_2(t). \tag{2}$$
Here, $F_1(t)$ again represents the number of elements recognized before mask presentation and $F_2(t)$ represents the number recognized from the composite, i.e. the number recognized despite the presence of the mask. And because it is likely that $F_2(t)$ increases with increasing t (the quality of the stimulus contribution to the composite increases with increased exposure durations), it follows that Sperling's recognition rates, estimated with equation (1), overestimated the real recognition rates. Sperling mistakenly took $F_1 + F_2$ for $F_1$. However, this argument, valid in principle, can be defeated with its own weapons. Imagine a situation in which both target duration and mask duration are held constant and only the dark interval (the ISI) between target offset and mask onset is increased in length. In such a situation $F_1(t)$ in equation (2) can still be interpreted as the number of items recognized before presentation of the mask and $F_2(t)$ as the number read from the composite. In this situation, however, there is no reason at all to assume that $F_2(t)$ increases with t (i.e. with target duration + ISI). Because the visual image of the target fades during the ISI, it becomes increasingly susceptible to masking and the (undecayed) representation of the mask will therefore increasingly dominate in the composite. So, a decreasing contribution of $F_2(t)$ with increasing t is to be expected. In such a situation, Sperling's recognition rates estimated with equation (1) will underestimate the real recognition rates. Van der Heijden (1971) reported an experiment using the conditions specified. The results were essentially the same as those reported by Sperling (1963, 1967). So, it seems that there is not very much reason to doubt Sperling's (very fast) parallel 'scanning' process (see also Coltheart, 1972, for essentially the same conclusion). As stated, Sperling (1963) initially assumed serial processing of items.
Later, Sperling (1967) wrote that the assumption of parallel processing was more in accordance with the data. A similar transition is to be found in the theoretical work of Estes and associates in relation with their 'detection' method. Initially their theorizing was mainly concerned with serial scanning (Estes and Taylor, 1964, 1966). Later on, Wolford, Wessel and Estes (1968, p. 444) concluded '... the type of model favored by our results involves parallel rather than serial processing.' With regard to these serial-to-parallel transitions it is important to realize that the evidence for serial processing was obtained from experiments in which the number of elements presented simultaneously was varied. So it is not unlikely that most of this early evidence for serial processing has to be ascribed to the decreasing average retinal acuity and the increasing lateral interference with increasing number of items presented.
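The logic of the masking argument built on equations (1) and (2) can be restated compactly. What follows is a sketch only: it assumes, purely for convenience, that $F_1$ and $F_2$ are differentiable in t and that $F_1$ is the same function under both masking procedures; neither assumption is stated in the text. The symbol $\hat{r}$ for the recognition rate estimated from the data is likewise introduced here for illustration.

```latex
% Observed performance under the integration theory, equation (2):
%   X(t) = F_1(t) + F_2(t)
% Rate estimated from the data when equation (1) is (wrongly) assumed:
\hat{r}(t) = \frac{dX}{dt} = F_1'(t) + F_2'(t)

% Case A: exposure duration (and hence SOA) is varied, as in Sperling (1963).
% The stimulus contribution to the composite grows with t, so F_2'(t) \ge 0:
\hat{r}_A(t) \ge F_1'(t) \quad \text{(the true rate is overestimated)}

% Case B: target and mask durations fixed, only the ISI is varied.
% The target image fades during the ISI, so F_2'(t) \le 0:
\hat{r}_B(t) \le F_1'(t) \quad \text{(the true rate is underestimated)}

% If both procedures yield essentially the same estimate,
% \hat{r}_A(t) \approx \hat{r}_B(t), the true rate is bracketed:
\hat{r}_B(t) \le F_1'(t) \le \hat{r}_A(t)
\;\Longrightarrow\; F_1'(t) \approx \hat{r}_A(t) \approx \hat{r}_B(t)
```

Read this way, the agreement between Van der Heijden's (1971) results and Sperling's (1963, 1967) results suggests that the contribution of $F_2$ cannot have distorted the estimated rates very much.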
The (unlimited central capacity) parallel processing versus (limited central capacity) serial processing issue has proven to be one of the most unruly issues in visual information processing research. Not only is there the problem to converge upon experimental paradigms that eliminate spurious peripheral influences, resulting from, for example, saccadic eye movements, covarying retinal acuity and differential lateral interference (see Bouma, 1970, for the virtual omnipresence of the latter factor), there are also severe problems of interpretation because parallel and serial mechanisms can mimic each other's outcomes (Snodgrass and Townsend, 1980; Townsend, 1971, 1972, 1974, 1990). Last, but not least, there is the system whose properties are searched for, which most likely works with an intricate combination of parallel and serial processes (Allport, 1989; Van der Heijden, 1992). Nevertheless, despite all this, nowadays there is abundant evidence for various forms of parallel processing. Space does not allow me to deal with this evidence in detail, so I restrict myself to a listing of what I regard as the important paradigms: (1) visual search tasks (see Duncan and Humphreys, 1989; Egeth, Folk and Mullin, 1989, for recent overviews; see also Chapter 2); (2) varieties of the Stroop task (see Van der Heijden, 1992, for an overview); (3) the negative priming task (Driver and Tipper, 1989; see Allport, 1989, for a brief overview). Taken all together, the best bet seems to be that the brain has no problems in dealing with the information provided by the eyes, or that the central system has no capacity limitations in this respect (see also Van der Heijden, 1987, 1992; for related conclusions, mainly reached on the basis of theoretical considerations, see also Allport, 1987, 1989; Neumann, 1987, 1990; Van der Heijden, 1990).
3.2 Early Selection
3.2.1 Physical Cues and Unlimited Capacity
Of fundamental importance for the study of (selective) attention in vision is the research concerned with the problem of which stimulus features give a good partial-report performance in Sperling's (1960) partial-report task, or, more generally, the research concerned with which stimulus characteristics afford efficient selection. Sperling showed that when the tone directly indicated a position, efficient selection was observed and that result was often replicated. From subsequent research it appeared that efficient selection was also possible on the basis of color (Clark, 1969; Dick, 1972; Von Wright, 1968, 1972; see also Bundesen, Pedersen and Larsen, 1984; Bundesen, Shibuya and Larsen, 1985), brightness (Von Wright, 1968; see also Bundesen et al., 1984), shape (Turvey and Kravetz, 1970; Von Wright, 1968; see also Bundesen et al., 1984) and size (Von Wright, 1970). So, 'simple physical features' afford efficient selection. Sperling (1960, experiment 6) also investigated what happened when, instead of the members of one of the rows, the tone indicated the members of the derived category 'letters' or 'digits'. He found that, even when the tone preceded the letter-digit array, performance was not better than when half of the items were selected at random. Von Wright (1968, 1970) replicated this negative result. The
same outcome was obtained when the cue specified the members of a category characterized by a nonvisual property such as 'vowel' versus 'consonant' (Von Wright, 1970) or 'letters ending with the vowel /E/' versus 'letters ending with the vowel /I/' (Coltheart, Lea and Thompson, 1974). Subsequent research showed that under some circumstances efficient selection by a derived property, such as letter-digit, is possible (see Bundesen, 1987; Bundesen et al., 1984, 1985; Duncan, 1983; Merikle, 1980; Shibuya and Bundesen, 1988). However, because in these experiments not so much attention, but something like 'expectation', might have been the prime selective factor, we will not discuss these results further (see Bundesen, 1990; Van der Heijden, 1992). These results considerably helped in shaping and strengthening the conviction, already put forward by Mackworth, Sperling, and Averbach and Coriell, '... that the original display must produce some effect which is unselective, and which at first contains all the information from all parts of the visual field' and that then '... a selective process which picks out some parts of this information...' comes in (Broadbent, 1971, p. 171). Moreover, the outcome of the partial-report experiments seemed to provide the insight into the nature of the information initially registered in the 'icon' (Neisser, 1967) or the 'buffer' (Broadbent, 1971). The empirical generalization was that 'raw', 'elementary', 'precategorical' or 'physical' features such as position, color, brightness, size and shape were represented (and therefore afforded efficient selection) and 'processed', 'derived', 'categorical', or 'semantic' properties such as letter, digit, vowel or consonant were not (and therefore could not serve as the basis for selection) (Coltheart et al., 1974).
The observation that there is no efficient selection on the basis of derived properties was taken as evidence that identity information did not become available automatically. Identity information for the stimuli was not only not in the initial store, it was also not somewhere else in the information processing system. So, this pattern of results also strengthened the various variants of the view '...that selection takes place in order to protect a mechanism of limited capacity... (serial processing system, or categorizing mechanism)' (Broadbent, 1971, p. 178). However, neither the point of view that only raw physical features are represented nor the point of view that identity information is not represented are easily substantiated. The basic reason is that the reasoning, starting with data about selection and ending with conclusions about representation and capacity, is not valid. This is easily shown by means of experimental results. Von Wright (1970) showed that when the cue specified 'upright letters' versus 'letters rotated through 180 degrees' no efficient selection was found. Bundesen et al. (1984) found that selection of 'angular letters' and 'curved letters' was inferior to selection by category membership. However, nobody doubts that these subtle visual properties were represented somewhere in the visual system. So, from 'does not afford (efficient) selection' it does not follow 'was not represented in the visual system'. Mere presence is no warrant for efficient selection. Also something like 'conspicuity' (see Broadbent, 1971, p. 180), or 'discriminability' (Duncan, 1980; Nissen, 1985) is a factor of importance (see also Allport, 1989, p. 637), and the distinction between vowels and consonants, letters and digits, upright letters and rotated letters, or angular and curved letters is not very conspicuous. We return to this issue later in the chapter. 
That appreciably more, and very sophisticated, visual information processing than that resulting in representations of 'raw' and 'simple' physical features
precedes selection was already recognized by Neisser (1967) and Kahneman (1973). As stated, in Neisser's view, 'preattentive processes', working in parallel over the entire visual field, first structure that field and '... produce the objects which later mechanisms are to flesh out and interpret.' (p. 89). These, what must be, highly complex computational processes are responsible for the Gestalt factors of perceptual organization and perceptual grouping. There is indeed abundant evidence that these factors profoundly affect visual selection (Allport, 1989; Duncan, 1984; Fryklund, 1975; Kahneman and Henik, 1981; Merikle, 1980; Prinzmetal, 1981). Moreover, in visual search tasks not only simple physical features such as line orientation or color can be detected in parallel. There are also three-dimensional emergent features that stand out in visual search (Enns, 1990). As stated, the point of view that identity information does not become available automatically is not easily substantiated on the basis of the selection data. In general, '... behavioral evidence about the relative efficiency of "selection"... is simply irrelevant to questions about the level of processing accorded to the "unselected information"' (Allport, 1987, p. 409; see also Van der Heijden, 1981; 1992). Limited central capacity requires efficient selection, but efficient selection does not require limited central capacity. With the selection data as described it is perfectly possible that identity information becomes available automatically and in parallel but that it cannot serve as a starting point for efficient selection (Coltheart, 1984; Duncan, 1981; Pashler, 1984; Van der Heijden, 1992). In summary, the importance of these experiments is neither in the information they provide about the kind of information stored in 'iconic memory' nor in the insight they provide into the kind of information processing that is going on. 
The real importance is in the information they provide about the kinds of selection that can be done. It appears that especially conspicuous visual characteristics afford efficient selection. Only in this sense can it be said that the selective operation addresses a stage of precategorical representation or that selection is 'early' selection. However, it is better to say that visual selection is really selection in vision (Van der Heijden, 1992). There is no reason to reject the assumption that the central system suffers no capacity limitations.
3.2.2 Location Errors and Early Selection
Also of fundamental importance for the study of selective attention in vision is the research concerned with the problem of what type of errors are made in Averbach and Coriell's (1961) partial-report bar-probe task. Initially it was simply assumed that all errors are errors of identification. Later it was realized that in this task localization errors could also occur (Eriksen and Rohrbaugh, 1970). Therefore two types of errors were distinguished. An error response was counted as an intrusion error when the subject named a letter not present in the array and as a location error when the subject named a letter from the array but from a wrong position (Townsend, 1973). Intrusion errors were regarded as indicators of identification problems and location errors as indicators of problems of localization. The amazing outcome of the research using this analysis is that, with linear arrays and with the letters close together, overwhelmingly more location errors are found than are to be expected on the basis of the traditional identification-problem view (Butler, 1980; Chow, 1986; Hagenaar, 1990; Hagenzieker et al., 1990; Matsuda,
1988; Mewhort and Campbell, 1978; Mewhort, Marchetti and Campbell, 1982; Mewhort et al., 1981, 1984; Townsend, 1973). Most of the location errors consist of correct responses to an item adjacent to the item indicated by the probe, i.e. 'near' location errors (Campbell and Mewhort, 1980; Hagenaar, 1990; Hagenzieker et al., 1990; Mewhort and Campbell, 1978; Mewhort et al., 1981). Location errors are distributed in the form of an M over the serial positions and nearly completely complement the W-shape for correct reports, presented in Figure 1.4 (Campbell and Mewhort, 1980; Hagenaar, 1990; Hagenzieker et al., 1990; Mewhort and Campbell, 1978; Mewhort et al., 1981). The decrease in accuracy as a result of delaying the cue and of applying a backward mask is largely reflected by an increase in location errors (Butler, 1980; Butler, Mewhort and Tramer, 1987; Campbell and Mewhort, 1980; Hagenaar, 1990; Mewhort and Campbell, 1978; Mewhort et al., 1981, 1984). Intrusion errors show only minor variations with the different experimental conditions and manipulations. This pattern of results appears completely opposite to that predicted by the reading-of-a-fading-image analogy. The data strongly suggest that it is not so much problems of identification, but problems of localization that impose the limitations in bar-probe performance. To account for the results, Mewhort and associates proposed a 'dual-buffer' model. A representation of the stimulus is first registered in a 'feature buffer'. An 'identifier', working in parallel on all the information in this buffer, identifies the letters and stores its results, together with location information, in a 'character buffer'. An 'attentional mechanism' operates in this store, and selects and passes a code, suitable for output (see especially Mewhort and Campbell, 1981, for a detailed description). Coltheart (1980, 1984) presented a generalized version of this model. 
In his version of the character buffer, a system composed of 'lexical entries', there is not only identity and location information. The lexical entries are 'tagged' with information about the complete physical manifestation of the item that triggered the entry. It is of fundamental importance to see how in these models 'reading' is conceptualized. Contrary to the traditional views, but in line with the proposal we put forward earlier, it is assumed that all information present in a single eye-fixation is processed, i.e. identified, automatically and in parallel. However, also contrary to the received view, but now also contrary to the view we put forward in the previous section, it is assumed that selection takes place at a postcategorical level. Episodic information at this level (the tags in Coltheart's view or an abstract representation of relative spatial position in Mewhort's view) is used for distinguishing relevant information from irrelevant information, and the location errors are explained in terms of saliency of position information and loss of position information in the character buffer. So, in these views selection is not really early selection in vision, but is late selection in a memory containing 'abstract' information. While in these models the existence of an initial, high-capacity, fast-decaying analog representation, a sensory buffer or 'icon', is still recognized, it plays no role in the selection of information. However, from the observation that in bar-probe tasks most errors are location errors, and from the conclusion, based on this observation, that problems of localization rather than problems of identification impose the major limitation, it in no way follows that selection takes place in a store containing identified information and abstract position information. Localization problems can also be perceptual problems. Indeed, with short stimulus-probe intervals it is mainly visual factors,
such as small inter-item spacings (Eriksen and Rohrbaugh, 1970; Mewhort et al., 1982) and backward masking (see above), that induce the large number of location errors. More generally, it is not easily substantiated that the initial, high-capacity, analog representation, i.e. vision, plays no role in bar-probe performance. In a series of bar-probe studies, with relatively large inter-item spacings and therefore only few location errors, Van der Heijden and associates obtained clear evidence that with increasing probe delays a progressively decaying visual representation was addressed (Van der Heijden, 1984, 1986; Van der Heijden et al., 1987; Keele and Chase, 1967). With a modified bar-probe task, Pashler (1984) showed that a quality manipulation, which could affect the visual representation but not the derived representation, strongly influenced performance (see Mewhort, Johns and Coble, 1991, for a number of replications). Recent analyses strongly suggest that bar-probe performance is limited by two factors: identification and localization (Hagenaar, 1990; Hagenzieker et al., 1990; Mewhort et al., 1988; Van der Heijden, 1992). Identification is likely to be automatic parallel identification as suggested by Sperling, Coltheart, and Mewhort and associates, with retinal acuity and lateral interference determining the quality of this process. With short stimulus-probe intervals the localization problems are likely to be localization problems in vision, not problems in a higher-order memory. In general, it is not only true that '...behavioral evidence about the relative efficiency of "selection" ... is simply irrelevant to questions about the level of processing accorded to the "unselected information"'. Evidence suggesting highly efficient (unlimited capacity parallel) processing is also simply irrelevant to the question at what level the selective mechanism intervenes.
Unlimited-capacity parallel processing affords late selection, but unlimited-capacity parallel processing certainly does not command late selection (Van der Heijden, 1981, 1987, 1990, 1992).
3.2.3 Spatial Location and the Position of Objects
As stated, most errors observed in partial-report bar-probe tasks are location errors and most of these location errors are near-location errors. Instead of naming the target indicated by the bar marker, the subject names one of its immediate neighbors. The abundance of near-location errors is not only observed with post-cues but also with pre-cues (Hagenzieker et al., 1990). Of course, this pattern of results strongly indicates that subjects are dealing with a subregion in visual space, and, moreover, that there are limits to the precision with which this can be done. At present it is far from clear what is at the basis of this imprecision. It is possible that, with the short exposure durations used, the spatial resolving power of the visual system is not sufficient for perfect visual localization of letters or bar marker or both. As will become apparent at the end of this chapter, this topic deserves much more investigation. In the present context, however, the precise explanation is not of real importance. Of real importance is that an abundance of near-location errors is not only found when a bar directly indicates the position of the target; the same result is obtained when a property of the target, other than its position, signifies its relevance. Snyder (1972) asked his subjects to name a letter specified by color (e.g. name the red one among the black ones), by fragmentation or by inversion. Incorrect responses tended to come from positions adjacent to the target. Fryklund (1975) asked his
subjects to name the five red letters among the 20 black ones. He reports that '...intrusions come from positions immediately adjacent to target positions' (p. 380). Butler et al. (1987) showed their subjects a row of eight letters with a red (black) target embedded in seven black (red) letters and asked them to name the target. These authors stress that the results obtained do not deviate from those obtained with a position cue, so there must have been many near-location errors. In theory, in these experiments position is irrelevant. It is color, inversion or fragmentation, and not position, that distinguishes the target from irrelevant elements, and theories can be invented in which position plays no special role in the selection of information (Broadbent, 1958, 1971; Bundesen, 1990; Duncan, 1984; Phaf et al., 1990). In the reality of empirical results, however, the spatial imprecision indicates that space intermediates in the selection of the target. Color, inversion or fragmentation appears to indicate firstly a position in space, just as a bar marker, and the target appears to be selected via its position in visual space. In a recent study, Tsal and Lavie (1988) provided further evidence for this special role of position in visual selection tasks. In their first experiment a circular array with nine letters was briefly shown. Three letters were red, three green and three brown. The subjects were instructed to report first a letter of a given color (e.g. report first a red letter) and then, without restrictions, as many of the rest as possible. After adjusting the outcome for chance, it appeared that, among the letters reported in addition to the first letter, there were significantly more 'near-location reports', i.e. reports of letters from the two positions adjacent to the letter reported first, than letters with the relevant color or neutral letters. 
In two subsequent experiments in which first one (experiment 2) or two (experiment 3) curved letters, presented together with angular letters, had to be reported, essentially the same result was obtained. The authors conclude that '.... the direction of attention to a relevant spatial location seems to be a general and mandatory process that takes place irrespective of the dimension according to which the stimulus was initially selected...' (p. 19). They also wonder why spatial position functions as an intermediate because '... this proposed sequence of processing does not seem to be a particularly parsimonious operation of the attentional system' (p. 20). We return to these attentional issues in the last section. This space-as-a-not-avoidable-intermediate is compatible with the abundant evidence, produced in the last two decades, in favor of the view that attention can be conceived as a spotlight that can be directed at circumscribed regions in visual space. Space does not allow me to go into the details of this topic. Suffice it to say that benefits as a result of foreknowledge of the position at which a target is going to appear have been demonstrated with location cues (e.g. a short duration dot on the position of the impending target) and symbolic cues (e.g. an arrow in a neutral position pointing to the right and indicating that the target will appear at the right), in single-item tasks and in multiple-item tasks, with detection, localization, search and recognition as the task to be performed, and with latency and accuracy as the dependent variable (Egly and Homa, 1984; Eriksen, 1990; Eriksen and Hoffman, 1972, 1974; Eriksen and Rohrbaugh, 1970; Eriksen and St James, 1986; Jonides, 1980, 1981, 1983; Posner, 1980; Posner and Cohen, 1984; Posner, Nissen and Ogden, 1978; Van der Heijden and Eerland, 1973; Van der Heijden, Schreuder and Wolters, 1985; Van der Heijden et al., 1987; see Van der Heijden, 1992, for an overview). Indeed, as Duncan (1981, p. 
92) states '... it must be important that advance knowledge of position affords such an excellent (perhaps the best) selection cue.'
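The distinction between correct reports, near-location errors and other location errors in the bar-probe studies discussed above can be made concrete with a small scoring sketch. It is only an illustration: the function name, the representation of the display as a string, and the label 'other error' for reports of letters that were not in the display are assumptions introduced here, not the scoring procedure of any of the cited studies.

```python
def classify_response(display, target_pos, response):
    """Score one partial-report response (hypothetical scoring scheme).

    display    -- string of letters shown, one per position
    target_pos -- index of the position indicated by the cue
    response   -- the single letter the subject named
    """
    if response == display[target_pos]:
        return "correct"
    if response not in display:
        # Letter was not in the display at all (label is an assumption).
        return "other error"
    # Distance from the cued position to the nearest position holding the response.
    distance = min(abs(pos - target_pos)
                   for pos, letter in enumerate(display) if letter == response)
    return "near-location error" if distance == 1 else "far-location error"


row = "ABCDEFGH"  # a row of eight letters, cued position 3 holds 'D'
labels = [classify_response(row, 3, r) for r in ("D", "C", "A", "Z")]
```

In such a scheme an abundance of near-location errors would show up as many more responses in the 'near-location error' class than expected by chance.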
A. H. C. Van der Heijden
This space as an intermediate is also compatible with the evidence that suggests that, at least in a number of circumstances, selection in vision is object selection (Allport, 1987, 1989; Duncan, 1984; Kahneman, 1973; Kahneman and Chajczyk, 1983; Kahneman and Henik, 1981; Kahneman and Treisman, 1984; Neisser, 1967; Treisman, Kahneman and Burkell, 1983). Some thought reveals that '... a physical object may be thought of as a region of the sensory environment that can be separately acted upon in some way...' (Allport, 1987, p. 412; my italics). Clearly, in vision an object can be approached and selected via the exact visual location it occupies. So, object selection in vision is simply one specific case of location selection in vision. Given that position is the ultimate intermediate in selection in vision, the problem of the efficiency of other selection criteria, such as color, shape, inversion, letter, vowel, etc., reduces to the problem of the efficiency of these criteria in specifying a position in visual space. As Duncan (1981) argues for all selection criteria, given the theory as it stands, this is purely an empirical matter. Fortunately, empirical research strongly suggests that criteria closely related to position, such as movement of a target (Nakayama and Silverman, 1986; Neisser, 1967), the abrupt onset of a target (Jonides and Yantis, 1988; Yantis and Jonides, 1984) and (stereoscopic) depth (Enns, 1990; Nakayama and Silverman, 1986) are uniquely efficient. And, when in a multiple-item array all items are equivalent with regard to position or attributes related to position, the most likely complete empiric rule seems to be the one that also determines efficiency of visual search: efficiency of search increases continuously with (1) decreasing similarity between target and nontargets and (2) increasing similarity between the nontargets (Duncan, 1989; Duncan and Humphreys, 1989).
For the tasks discussed in this chapter only the first part of the rule is of relevance (for certain types of search tasks also the second part is of importance). Using this part of the rule it follows that, for example, the localization of a red item among black ones or a bright item among dim ones is very efficient. The target is the element in the visual field that is most unlike its neighbors, i.e. target-nontarget similarity is appreciably lower than nontarget-nontarget similarity. It also follows that, for example, the localization of a letter among digits or a vowel among consonants will be highly inefficient. There is no reason to assume that the target is most unlike its neighbors, or that target-nontarget similarity is lower than nontarget-nontarget similarity. So, it appears very likely that an attribute's efficiency in specifying a position in visual space corresponds with its efficiency in serving as a selection criterion in vision. The reader is referred to Duncan (1989) and Duncan and Humphreys (1989) for several intriguing suggestions about how this localization is performed.
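The comparison between target-nontarget similarity and nontarget-nontarget similarity can be illustrated with a toy calculation. Everything in it is an assumption made for the sake of the example: items are reduced to a single feature value between 0 and 1, similarity is taken as one minus the absolute difference, and the 'score' is simply the difference between the two mean similarities. It is not the quantitative rule of Duncan and Humphreys (1989).

```python
from itertools import combinations
from statistics import mean


def similarity(a, b):
    """Toy similarity of two items described by one feature value in [0, 1]."""
    return 1.0 - abs(a - b)


def localization_score(target, nontargets):
    """Mean nontarget-nontarget similarity minus mean target-nontarget similarity.

    Higher values mean the target is more unlike its neighbors than the
    neighbors are unlike each other. Needs at least two nontargets.
    """
    target_nontarget = mean(similarity(target, n) for n in nontargets)
    nontarget_nontarget = mean(similarity(a, b)
                               for a, b in combinations(nontargets, 2))
    return nontarget_nontarget - target_nontarget


# A red item among identical black ones: the target stands out maximally.
red_among_black = localization_score(1.0, [0.0, 0.0, 0.0])
# A letter among digits: no visual feature sets the target apart (values assumed).
letter_among_digits = localization_score(0.5, [0.4, 0.6, 0.5, 0.45])
```

Under these assumptions the first display yields the maximal score and the second a score close to zero, which mirrors the contrast drawn in the text between efficient and inefficient selection criteria.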
4 A MODEL
From the further developments reported in the preceding sections, three outcomes are essential because they strongly point at, and easily lead to, a parsimonious model for information processing, selection and attention for the aspects of task performance with which we were concerned (see Van der Heijden, 1992, for a more disciplined derivation and a more complete account). To see that, we have to keep
in mind that: (1) '...behavioral evidence about the relative efficiency of "selection"... is simply irrelevant to questions about the level of processing accorded to the "unselected information"' (Allport, 1987, p. 409). While limited central capacity requires early selection, early selection does not require limited central capacity. (2) '... evidence suggesting highly efficient (unlimited-capacity parallel) processing is simply irrelevant to the question at what level the selective mechanism intervenes' (Van der Heijden, 1981). While unlimited central capacity affords late selection, unlimited central capacity does not command late selection. In other words, we have to realize that the limited capacity versus unlimited capacity issue and the early selection versus late selection issue are largely independent issues (see also Allport, 1987, 1989; Kahneman and Treisman, 1984, p. 55; Van der Heijden, 1981, 1987, 1992). The three essential outcomes mentioned earlier are: (1) Within the limits set by retinal acuity and lateral masking there is, within a single eye-fixation, an unlimited central capacity for processing, i.e. identification or categorization, of visual information. (2) Selection of information is early selection, or selection is selection in vision (obviously, this selection is not selection for 'processing' as suggested by orthodox early selection theories). (3) This early selection, whether controlled by verbal instructions, position cues, criterion attributes or whatever else one can have (e.g. by 'the subject'), is ultimately accomplished via a region in visual space. That these three outcomes easily lead to a parsimonious model is most readily seen when we first briefly turn to some suggestions from contemporary neurophysiology that are of help for the interpretation of 'unlimited capacity for processing within a single eye-fixation'. 
Here we can only briefly indicate these suggestions (for more detailed, general, accounts see Allport, 1989; De Yoe and Van Essen, 1988; Livingstone and Hubel, 1988; Maunsell, 1987; Mishkin, Ungerleider and Macko, 1983; Neumann, 1990; Zeki and Shipp, 1988). The evidence indicates that there are basically two channels in the visual information processing system: a magno channel and a parvo channel. These channels diverge and are specialized in coding different stimulus properties. The magno channel, which leads to the parieto-occipital cortex, carries information about position (and possibly global form). The parvo system (which further subdivides) has the temporo-occipital region as its destination, and handles color and detailed form (but not position). In other words, the parvo system is thought to be concerned primarily with 'what', i.e. with identity processing, and the magno system primarily with 'where', i.e. with location processing. Obviously, given such an architecture, the translation of the unlimited-capacity processing assumption has to be approximately as depicted in Figure 1.5. Visual information enters the central processing system in an 'input map' (the striate cortex?). Here all information is represented, albeit not yet explicit. The information is sent on in an identity channel (the parvo channel) and a location channel (the magno channel). Ultimately the identities are calculated in an 'identity domain' (the
Figure 1.5. Unlimited-capacity identity and location processing model. Information enters the input map (IN) and is sent on to an identity domain (ID) and a location map (LO). [Adapted from Van der Heijden, 1992.]
inferior temporal cortex?) and the positions, or, in general, the spatial relations, are calculated in a 'location map' (the posterior parietal cortex?). The interesting, and in my view theoretically important, point is now that only a single, simple addition suffices to complete this scheme in such a way that it is also compatible with the remaining two essential outcomes mentioned at the beginning of this section: selection is early selection and selection is ultimately accomplished via a region in visual space. Figure 1.6 depicts this completed schematic representation of unlimited-capacity parallel processing. Added are 'feedback lines' from the location map back to the input map. If we assume that selection takes place through enhanced activation via subsets of these feedback or 're-entry' lines then selection is early selection (the lines re-enter the input map) and selection is accomplished via regions in space (the lines originate in the location map). In this scheme unlimited capacity and early selection are harmoniously combined because 'In a massively parallel, activation-coded system, selective enhancement provides a . . . mechanism of selective cueing that need not entail the exclusion from further processing of the noncued information, beyond the level of encoding at which selective enhancement first occurs' (Allport, 1987, p. 410). As in my earlier work (Van der Heijden, 1992), I propose to call the differential or selective operation of the system of feedback lines 'selective attention'. Selective attention consists of the enhanced activation of subsets of these feedback lines.
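The flow of information in the completed scheme can be sketched in a few lines of code. This is a deliberately minimal illustration under assumptions of my own: activations are plain numbers, the feedback lines act by multiplication, and the names `run_model` and `gain` are hypothetical. It is not an implementation of the model in Van der Heijden (1992).

```python
def run_model(stimuli, attended_position, gain=2.0):
    """Toy pass through the scheme: IN -> (ID, LO) -> feedback from LO to IN."""
    # Input map (IN): every position starts with the same baseline activation.
    input_map = [1.0 for _ in stimuli]
    # Identity domain (ID): all items are identified in parallel, attended or not.
    identity_domain = list(stimuli)
    # Location map (LO): enhanced activation only at the attended position.
    location_map = [gain if pos == attended_position else 1.0
                    for pos in range(len(stimuli))]
    # Feedback ('re-entry') lines: LO activation modulates IN, position by position.
    enhanced_input = [a * f for a, f in zip(input_map, location_map)]
    # Selection: the item at the most strongly activated position is singled out.
    selected_position = max(range(len(stimuli)),
                            key=lambda pos: enhanced_input[pos])
    return identity_domain, stimuli[selected_position]


identities, selected = run_model("XRKT", attended_position=2)
```

In this sketch all four letters reach the identity domain (unlimited capacity), while the feedback from the location map determines which one is selected (early selection via position). Noncued items are not excluded from processing; they merely receive no enhancement.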
Figure 1.6. Completed unlimited-capacity processing model. Position information is fed back from the location map to the input map. [Adapted from Van der Heijden, 1992.]
By combining early selection and unlimited capacity, this conceptualization is fundamentally at variance with all views that postulate there is early selection in order to protect a central system with a limited information processing capacity against confusion and overload. However, an alternative, and much more positive, function for early selection is easily found. Early selection has an important function in the direction and execution of a selected action (Van der Heijden, 1990, 1992). By means of early selection, attention solves the problem of which object to act upon by determining from what region in visual space the set of parameters is taken that is allowed to specify a selected action in detail (Allport, 1987; Neumann, 1987). In fact, by replacing 'attention directed at position' by 'position directed as attention' (Van der Heijden, 1992), the model is completely compatible with, and an implementation of, the action-based views of attention as proposed by Allport (1987, 1989) and Neumann (1987, 1990). As a theory, this conceptualization is also diametrically opposed to the 'feature integration' theory proposed by Treisman, after the suggestions by Neisser (1967; see Crick, 1984; Treisman, 1977, 1982; Treisman and Gelade, 1980; Treisman and Schmidt, 1982; for an overview see Treisman, 1988). In feature integration theory the starting point is that the modularity of the visual system introduces a problem. (How do we see integrated objects or a unified world?) That problem has to be solved by attention. (Focal attention integrates the information in the different modules.) In the view presented here, (selective) attention, as an animistic deus ex
machina, is the real problem. That problem is adequately taken care of by the
modularity of the visual system and appropriate (map-map) connectivity. With the scheme presented in Figure 1.6 as a starting point and with attention as defined, topics with regard to the mode of operation of visual attention, e.g. the topic of divided and focused attention, reduce to the topic of the shape of the distribution of (enhanced) activation in the location map. Topics with regard to what exactly is selected, e.g. the topic of whether attention selects an 'object' or an approximate region in visual space, reduce to the problem of the exact region of (enhanced) activation in the location map. Topics with regard to the control of selective attention, e.g. the topic of involuntary and voluntary attention, reduce to the topic of the mechanisms that are capable of modulating the pattern of (enhanced) activation in the location map. In my view, it is these areas with which contemporary research in visual attention and visual search is basically concerned. However, an appropriate interpretation of contemporary research in these terms needs divided attention. In the past, issues of visual information processing and selection were generally discussed in terms of unidimensional or linear information processing models (processing → more processing → still more processing; Neisser, 1976). In the present scheme, identification and localization are separated and assigned to two distinct, parallel information processing channels that, clearly, have their own properties and characteristics. Therefore the scheme also provides the possibility to reformulate and revisit interesting and long-standing research problems in visual information processing and attention. How fast is recognition and how slow is subsequent selection? (see Sperling, 1967, for the relevant research). What exactly is the effect of a pattern mask? (see Mewhort et al., 1981, 1984, for good starting assumptions). How to conceptualize lateral interference? (see Butler and Currie, 1986, for useful ideas).
Where do 'Gestalt properties' enter the selection process? (see Kahneman and Henik, 1981; Livingstone and Hubel, 1988, p. 748, for good starting assumptions). What is the exact difference between search tasks and recognition tasks? (see Duncan and Humphreys, 1989; Kahneman and Treisman, 1984, for good starting points). How does attention 'move' in the visual field? (see Eriksen and Murphy, 1987; Yantis, 1988, for good starting points). And why do we perceive a stable visual world despite (saccadic) eye movements? (see Bridgeman, Van der Heijden and Velichkovsky, 1990, for a start). However, space does not allow me to go into these intriguing issues in any detail and I leave them as exercises for the reader. Duncan (1981, p. 93) ended his paper on 'Directing attention in the visual field' with the question: 'Why is it that advance knowledge of position affords such an excellent cue for attentional selection?' I end this chapter with my answer to that question: 'Because "advance knowledge of position" is the attention doing the selection.'
REFERENCES
Allport, D. A. (1987). Selection for action: Some behavioral and neurophysiological considerations of attention and action. In H. Heuer and A. F. Sanders (Eds), Perspectives on Perception and Action. Hillsdale, NJ: Erlbaum.
Allport, D. A. (1989). Visual attention. In M. I. Posner (Ed.), Foundations of Cognitive Science. Cambridge, MA: MIT Press. Anstis, S. M. (1974). A chart demonstrating variations in acuity with retinal position. Vision Research, 14, 589-592. Averbach, E. and Coriell, A. S. (1961). Short-term memory in vision. Bell System Technical Journal, 40, 309-328. Averbach, E. and Sperling, G. (1961). Short-term storage of information in vision. In C. Cherry (Ed.), Information Theory. London: Butterworth. Barlow, H. B. (1981). Critical limiting factors in the design of the eye and visual cortex. The Ferrier Lecture. Proceedings of the Royal Society (London) B, 212, 1-34. Barlow, H. B. (1985). The twelfth Bartlett memorial lecture: The role of single neurons in the psychology of perception. Quarterly Journal of Experimental Psychology, 37A, 121-145. Baxt, N. (1871). Ueber die Zeit welche nötig ist, damit ein Gesichtseindruck zum Bewusstsein kommt und über die Grösse der bewussten Wahrnehmung bei einen Gesichtseindruck von gegebener Dauer. Pflügers Archiv für die Gesamte Physiologie des Menschen und der Tiere, 4, 325-336. Bouma, H. (1970). Interaction effects in parafoveal letter recognition. Nature, 226, 177-178. Bouma, H. (1973). Visual interference in the parafoveal recognition of initial and final letters of words. Vision Research, 13, 767-782. Bridgeman, B., Van der Heijden, A. H. C. and Velichkovsky, B. M. (1990). Visual stability and saccadic eye movements. Report of the research group "Mind and Brain", ZiF, Bielefeld, No. 58. Broadbent, D. E. (1958). Perception and Communication. London: Pergamon Press. Broadbent, D. E. (1971). Decision and Stress. London: Academic Press. Bundesen, C. (1987). Visual attention: Race models for selection from multielement displays. Psychological Research, 49, 113-121. Bundesen, C. (1990). A theory of visual attention. Psychological Review, 97, 523-547. Bundesen, C., Pedersen, L. F. and Larsen, A. (1984). 
Measuring efficiency of selection from briefly exposed visual displays: A model for partial report. Journal of Experimental Psychology: Human Perception and Performance, 10, 329-339. Bundesen, C., Shibuya, H. and Larsen, A. (1985). Visual selection from multielement displays: A model for partial report. In M. I. Posner and O. S. M. Marin (Eds), Attention and Performance XI. Hillsdale, NJ: Erlbaum. Butler, B. E. (1980). Selective attention and stimulus localization in visual perception. Canadian Journal of Psychology, 34, 119-133. Butler, B. E. and Currie, A. (1986). On the nature of perceptual limits in vision: A new look at lateral masking. Psychological Research, 48, 201-210. Butler, B. E., Mewhort, D. J. K. and Tramer, S. C. (1987). Location errors in tachistoscopic recognition: Guesses, probe errors, or spatial confusions? Canadian Journal of Psychology, 41, 339-350. Campbell, A. J. and Mewhort, D. J. K. (1980). On familiarity effects in visual information processing. Canadian Journal of Psychology, 34, 134-154. Cattell, J. McK. (1885). Ueber die Zeit der Erkennung und Benennung von Schriftzeichen, Bildern und Farben. Philosophische Studien, 2, 635-650. Chastain, G. (1983). Parafoveal identification asymmetry is a lateral masking effect. Journal of General Psychology, 109, 77-81. Chastain, G. and Lawson, L. (1979). Identification asymmetry of parafoveal stimulus pairs. Perception and Psychophysics, 26, 363-368. Chow, S. L. (1986). Iconic memory, location information, and partial report. Journal of Experimental Psychology: Human Perception and Performance, 12, 455-465. Clark, S. E. (1969). Retrieval of color information from preperceptual memory. Journal of Experimental Psychology, 82, 263-266. Collins, J. F. and Eriksen, C. W. (1967). The perception of multiple simultaneously presented forms as a function of foveal spacing. Perception and Psychophysics, 2, 369-373.
Coltheart, M. (1972). Visual information processing. In P. C. Dodwell (Ed.), New Horizons in Psychology 2. Harmondsworth: Penguin. Coltheart, M. (1975). Iconic memory: A reply to Professor Holding. Memory and Cognition, 3, 42-48. Coltheart, M. (1980). Iconic memory and visible persistence. Perception and Psychophysics, 27, 183-228. Coltheart, M. (1984). Sensory memory - A tutorial review. In H. Bouma and D. G. Bouwhuis (Eds), Attention and Performance X: Control of Language Processes. London: L.E.A. Coltheart, M., Lea, C. D. and Thompson, K. (1974). In defense of iconic memory. Quarterly Journal of Experimental Psychology, 26, 633-641. Cowey, A. (1979). Cortical maps and visual perception: The Grindley memorial lecture. Quarterly Journal of Experimental Psychology, 31, 1-17. Cowey, A. (1981). Why are there so many visual areas? In F. O. Schmitt, F. G. Worden, G. Adelman and S. G. Dennis (Eds), The Organisation of the Cerebral Cortex. Cambridge, MA: MIT Press. Crick, F. H. C. (1984). Function of the thalamic reticular complex: The searchlight hypothesis. Proceedings of the National Academy of Sciences USA, 81, 4586-4590. Deutsch, J. A. and Deutsch, D. (1963). Attention: Some theoretical considerations. Psychological Review, 70, 80-90. DeYoe, E. A. and Van Essen, D. C. (1988). Concurrent processing streams in monkey visual cortex. TINS, 11, 219-226. Dick, A. O. (1972). Parallel and serial processing in tachistoscopic recognition: Two mechanisms. Journal of Experimental Psychology, 96, 60-66. Driver, J. and Tipper, S. P. (1989). On the nonselectivity of 'selective' seeing: Contrasts between interference and priming in selective attention. Journal of Experimental Psychology: Human Perception and Performance, 15, 304-314. Duncan, J. (1980). The demonstration of capacity limitation. Cognitive Psychology, 12, 75-96. Duncan, J. (1981). Directing attention in the visual field. Perception and Psychophysics, 30, 90-93. Duncan, J. (1983). 
Perceptual selection based on alphanumeric class: Evidence from partial reports. Perception and Psychophysics, 33, 533-547. Duncan, J. (1984). Selective attention and the organization of visual information. Journal of Experimental Psychology: General, 113, 501-517. Duncan, J. (1989). Boundary conditions on parallel processing in human vision. Perception, 18, 457-469. Duncan, J. and Humphreys, G. W. (1989). Visual search and stimulus similarity. Psychological Review, 96, 433-458. Egeth, H. E., Folk, C. L. and Mullin, P. A. (1989). Spatial parallelism in the processing of lines, letters and lexicality. In B. E. Shepp and S. Ballesteros (Eds), Object Perception: Structure and Process. Hillsdale, NJ: Erlbaum. Egly, R. and Homa, D. (1984). Sensitization of the visual field. Journal of Experimental Psychology: Human Perception and Performance, 10, 778-793. Enns, J. T. (1990). Three-dimensional features that pop out in visual search. In D. Brogan (Ed.), Visual Search. London: Taylor and Francis. Erdmann, B. and Dodge, R. (1898). Psychologische Untersuchungen über das Lesen. Halle: M. Niemeyer. Eriksen, C. W. (1966). Independence of successive inputs and uncorrelated error in visual form perception. Journal of Experimental Psychology, 72, 26-35. Eriksen, C. W. (1990). Attentional search of the visual field. In D. Brogan (Ed.), Visual Search. London: Taylor and Francis. Eriksen, C. W. and Collins, J. F. (1967). Some temporal characteristics of visual pattern perception. Journal of Experimental Psychology, 74, 476-484. Eriksen, C. W. and Collins, J. F. (1969). Visual perceptual rate under two conditions of search. Journal of Experimental Psychology, 80, 489-492.
Eriksen, C. W. and Hoffman, J. E. (1972). Temporal and spatial characteristics of selective encoding from visual displays. Perception and Psychophysics, 12, 201-204. Eriksen, C. W. and Hoffman, J. E. (1974). Selective attention: Noise suppression or signal enhancement? Bulletin of the Psychonomic Society, 4, 587-589. Eriksen, C. W. and Lappin, J. S. (1967). Independence in the perception of simultaneously presented forms at brief durations. Journal of Experimental Psychology, 73, 468-472. Eriksen, C. W., Munsinger, H. L. and Greenspon, T. S. (1966). Identification versus same-different judgment: An interpretation in terms of uncorrelated perceptual error. Journal of Experimental Psychology, 72, 20-25. Eriksen, C. W. and Murphy, T. D. (1987). Movement of attentional focus across the visual field: A critical look at the evidence. Perception and Psychophysics, 42, 299-305. Eriksen, C. W. and Rohrbaugh, J. W. (1970). Some factors determining efficiency of selective attention. American Journal of Psychology, 83, 330-343. Eriksen, C. W. and Schultz, D. W. (1977). Retinal locus and acuity in visual information processing. Bulletin of the Psychonomic Society, 9, 81-84. Eriksen, C. W. and St James, J. D. (1986). Visual attention within and around the field of focal attention: A zoom lens model. Perception and Psychophysics, 40, 225-240. Eriksen, C. W. and Steffy, R. A. (1964). Short-term memory and retroactive interference in visual perception. Journal of Experimental Psychology, 68, 423-434. Estes, W. K. (1978). Perceptual processing in letter recognition and reading. In E. C. Carterette and M. P. Friedman (Eds), Handbook of Perception, Vol IX. New York: Academic Press. Estes, W. K., Allmeyer, D. H. and Reder, S. M. (1976). Serial position functions for letter identification at brief and extended exposure durations. Perception and Psychophysics, 19, 1-15. Estes, W. K. and Taylor, H. A. (1964). 
A detection method and probabilistic models for assessing information processing from brief visual displays. Proceedings of the National Academy of Sciences USA, 52, 46-54. Estes, W. K. and Taylor, H. A. (1966). Visual detection in relation to display size and redundancy of critical elements. Perception and Psychophysics, 1, 9-16. Francolini, C. N. and Egeth, H. E. (1980). On the non-automaticity of automatic activation: Evidence of selective seeing. Perception and Psychophysics, 27, 331-342. Fryklund, I. (1975). Effects of cued-set spatial arrangement and target-background similarity in the partial-report paradigm. Perception and Psychophysics, 17, 375-386. Glanville, A. D. and Dallenbach, K. M. (1920). The range of attention. American Journal of Psychology, 41, 387-393. Haber, R. N. (1983). The impending demise of the icon: A critique of the concept of iconic storage in visual information processing. Behavioral and Brain Sciences, 6, 154. Haber, R. N. and Hershenson, M. (1974). The Psychology of Visual Perception. London: Holt, Rinehart and Winston. Haber, R. N. and Standing, L. G. (1969). Direct measures of short-term visual storage. Quarterly Journal of Experimental Psychology, 21, 43-54. Hagenaar, R. (1990). Visual selection in letter naming. PhD Thesis, University of Leiden, The Netherlands. Hagenzieker, M. P., Van der Heijden, A. H. C. and Hagenaar, R. (1990). Time courses in visual information processing: Some empirical evidence for inhibition. Psychological Research, 52, 13-21. Hall, G. S. and Von Kries, J. (1879). Über die Abhängigkeit der Reaktionszeit vom Ort des Reizes. Archives of Anatomy and Physiology: Leipzig Supplement, 1-10. Johnston, W. A. and Dark, V. J. (1985). Dissociable domains of selective processing. In M. I. Posner and O. S. M. Marin (Eds), Attention and Performance, XI: Mechanisms of Attention. Hillsdale, NJ: Erlbaum.
Johnston, W. A. and Dark, V. J. (1986). Selective attention. Annual Review of Psychology, 37, 43-75. Jonides, J. (1980). Towards a model of the mind's eye's movement. Canadian Journal of Psychology, 34, 103-112. Jonides, J. (1981). Voluntary versus automatic control over the mind's eye's movement. In J. B. Long and A. D. Baddeley (Eds), Attention and Performance IX. Hillsdale, NJ: Erlbaum. Jonides, J. (1983). Further toward a model of the mind's eye's movements. Bulletin of the Psychonomic Society, 21, 247-250. Jonides, J. and Yantis, S. (1988). Uniqueness of abrupt visual onset in capturing attention. Perception and Psychophysics, 43, 346-354. Kahneman, D. (1968). Method, findings and theory in studies of visual masking. Psychological Bulletin, 70, 404-425. Kahneman, D. (1973). Attention and Effort. Englewood Cliffs, NJ: Prentice-Hall. Kahneman, D. and Chajczyk, D. (1983). Tests of the automaticity of reading: Dilution of Stroop effects by color-irrelevant stimuli. Journal of Experimental Psychology: Human Perception and Performance, 9, 497-509. Kahneman, D. and Henik, A. (1981). Perceptual organization and attention. In M. Kubovy and J. R. Pomerantz (Eds), Perceptual Organization. Hillsdale, NJ: Erlbaum. Kahneman, D. and Treisman, A. (1984). Changing views of attention and automaticity. In R. Parasuraman and P. R. Davies (Eds), Varieties of Attention. New York: Academic Press. Keele, S. W. (1973). Attention and Human Performance. Pacific Palisades, CA: Goodyear. Keele, S. W. and Chase, W. G. (1967). Short-term visual storage. Perception and Psychophysics, 2, 383-386. Kinsbourne, M. and Warrington, E. K. (1962). The effect of an aftercoming random pattern on the perception of brief visual stimuli. Quarterly Journal of Experimental Psychology, 14, 223-234. La Heij, W. and Van der Heijden, A. H. C. (1983). Feature specific interference in letter identification. Acta Psychologica, 53, 37-60. Levi, D. M., Klein, S. A. and Aitsebaomo, A. P. (1985). 
Vernier acuity, crowding and cortical magnification. Vision Research, 25, 963-977. Livingstone, M. and Hubel, D. (1988). Segregation of form, color, movement, and depth: Anatomy, physiology, and perception. Science, 240, 740-749. Long, G. M. (1980). Iconic memory: A review and critique of the study of short-term visual storage. Psychological Bulletin, 88, 785-820. Lovie, A. D. (1983). Attention and behaviourism-fact and fiction. British Journal of Psychology, 74, 301-310. Mackworth, J. F. (1959). Paced memorizing in a continuous task. Journal of Experimental Psychology, 58, 206-211. Mackworth, J. F. (1962). The visual image and the memory trace. Canadian Journal of Psychology, 16, 55-59. Mackworth, J. F. (1963). The relation between the visual image and post-perceptual immediate memory. Journal of Verbal Learning and Verbal Behavior, 2, 75-85. Matsuda, M. (1988). Identification and localization in tachistoscopic partial report. Japanese Psychological Research, 30, 33-41. Maunsell, J. H. R. (1987). Physiological evidence for two visual subsystems. In L. M. Vaina (Ed.), Matters of Intelligence: Conceptual Structures in Cognitive Neuroscience. Dordrecht: Reidel. Merikle, P. M. (1980). Selection from visual persistence by perceptual groups and category membership. Journal of Experimental Psychology: General, 109, 279-295. Mewhort, D. J. K., Butler, B. E., Feldman-Stewart, D. and Tramer, S. (1988). 'Iconic memory', location information, and the bar-probe task: A reply to Chow (1986). Journal of Experimental Psychology: Human Perception and Performance, 14, 729-737.
Mewhort, D. J. K. and Campbell, A. J. (1978). Processing spatial information and the selective-masking effect. Perception and Psychophysics, 24, 93-101. Mewhort, D. J. K. and Campbell, A. J. (1981). Toward a model of skilled reading: An analysis of performance in tachistoscopic tasks. Reading Research: Advances in Theory and Practice, 3, 39-118. Mewhort, D. J. K., Campbell, A. J., Marchetti, F. M. and Campbell, J. I. D. (1981). Identification, localisation, and 'iconic memory': An evaluation of the bar-probe task. Memory and Cognition, 9, 50-67. Mewhort, D. J. K., Johns, E. E. and Coble, S. (1991). Early and late selection in partial report: Evidence from degraded displays. Perception and Psychophysics, 50, 258-266. Mewhort, D. J. K., Marchetti, F. M. and Campbell, A. J. (1982). Blank characters in tachistoscopic recognition: Space has both a symbolic and a sensory role. Canadian Journal of Psychology, 36, 559-575. Mewhort, D. J. K., Marchetti, F. M., Gurnsey, R. and Campbell, A. J. (1984). Information persistence: A dual-buffer model for initial visual processing. In H. Bouma and D. G. Bouwhuis (Eds), Attention and Performance X: Control of Language Processes. London: Erlbaum. Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81-97. Mishkin, M., Ungerleider, L. G. and Macko, K. A. (1983). Object vision and spatial vision: Two cortical pathways? Trends in Neurosciences, 6, 414-417. Moray, N. (1969). Attention: Selective Processes in Vision and Hearing. London: Hutchinson Educational. Morton, J. (1969). Interaction of information in word recognition. Psychological Review, 76, 165-178. Nakayama, K. and Silverman, G. H. (1986). Serial and parallel processing of visual feature conjunctions. Nature, 320, 264-265. Neisser, U. (1967). Cognitive Psychology. New York: Appleton-Century-Crofts. Neisser, U. (1976). Cognition and Reality. San Francisco, CA: Freeman. Neumann, O. 
(1987). Beyond capacity: A functional view of attention. In H. Heuer and A. F. Sanders (Eds), Perspectives on Perception and Action. Hillsdale, NJ: Erlbaum. Neumann, O. (1990). Visual attention and action. In O. Neumann and W. Prinz (Eds), Relationships Between Perception and Action. Berlin: Springer. Neumann, O. (1996). Konzepte der Aufmerksamkeit. Göttingen: Hogrefe, in press. Neumann, O., Van der Heijden, A. H. C. and Allport, D. A. (1986). Visual selective attention: Introductory remarks. Psychological Research, 48, 185-188. Nissen, M. J. (1985). Accessing features and objects: Is location special? In M. I. Posner and O. S. M. Martin (Eds), Attention and Performance XI. Hillsdale, NJ: Erlbaum. Norman, D. A. (1968). Towards a theory of memory and attention. Psychological Review, 75, 522-536. Pashler, H. (1984). Evidence against late selection: Stimulus quality effects in previewed displays. Journal of Experimental Psychology: Human Perception and Performance, 10, 429-448. Pashler, H. (1987). Detecting conjunctions of color and form: Reassessing the serial search hypothesis. Perception and Psychophysics, 41, 191-201. Phaf, R. H., Van der Heijden, A. H. C. and Hudson, P. T. W. (1990). SLAM: A connectionist model for attention in visual selection tasks. Cognitive Psychology, 22, 273-341. Poffenberger, A. T. (1912). Reaction time to retinal stimulation with special reference to time lost in conduction through nerve centers. Archives of Psychology, 3, 1-73. Posner, M. I. (1978). Chronometric Exploration of the Mind. Hillsdale, NJ: Erlbaum. Posner, M. I. (1980). Orienting of attention. The VIIth Sir Frederic Bartlett Lecture. Quarterly Journal of Experimental Psychology, 32, 3-25.
Posner, M. I. and Cohen, Y. (1984). Components of visual orienting. In H. Bouma and D. G. Bouwhuis (Eds), Attention and Performance X. Hillsdale, NJ: Erlbaum. Posner, M. I., Nissen, M. J. and Ogden, W. C. (1978). Attended and unattended processing modes: The role of set for spatial location. In H. L. Pick and I. J. Saltzman (Eds), Modes of Perceiving and Processing Information. Hillsdale, NJ: Erlbaum. Prinzmetal, W. (1981). Principles of feature integration in visual perception. Perception and Psychophysics, 30, 330-340. Rumelhart, D. E. (1970). A multicomponent theory of the perception of briefly exposed visual displays. Journal of Mathematical Psychology, 7, 191-218. Schultz, D. W. and Eriksen, C. W. (1977). Do noise masks terminate target processing? Memory and Cognition, 5, 90-96. Shapiro, R. G. and Krueger, L. E. (1983). Effect of similarity of surround on target-letter processing. Journal of Experimental Psychology: Human Perception and Performance, 9, 547-559. Shibuya, H. and Bundesen, C. (1988). Visual selection from multielement displays: Measuring and modeling effects of exposure duration. Journal of Experimental Psychology: Human Perception and Performance, 14, 591-600. Shiffrin, R. M. and Schneider, W. (1977). Controlled and automatic human information processing: II. Perceptual learning, automatic attending, and a general theory. Psychological Review, 84, 127-190. Snodgrass, J. G. and Townsend, J. T. (1980). Comparing parallel and serial models: Theory and implementation. Journal of Experimental Psychology: Human Perception and Performance, 6, 330-354. Snyder, C. R. R. (1972). Selection, inspection and naming in visual search. Journal of Experimental Psychology, 92, 428-431. Sperling, G. (1960). The information available in brief visual presentations. Psychological Monograph, 74 (whole no. 498). Sperling, G. (1963). A model for visual memory tasks. Human Factors, 5, 19-31. Sperling, G. (1967). Successive approximations to a model for short-term memory. 
Acta Psychologica, 27, 285-292. Taylor, S. G. and Brown, D. R. (1972). Lateral visual masking: Supraretinal effects when viewing linear arrays with unlimited viewing time. Perception and Psychophysics, 12, 97-99. Townsend, J. T. (1971). A note on the identifiability of parallel and serial processes. Perception and Psychophysics, 10, 161-163. Townsend, J. T. (1972). Some results on the identifiability of parallel and serial processes. British Journal of Mathematical and Statistical Psychology, 25, 168-199. Townsend, J. T. (1974). Issues and models concerning the processing of a finite number of inputs. In B. H. Kantowitz (Ed.), Human Information Processing: Tutorials in Performance and Cognition. Hillsdale, NJ: Erlbaum. Townsend, J. T. (1990). Serial vs. parallel processing: Sometimes they look like Tweedledum and Tweedledee but they can (and should) be distinguished. Psychological Science, 1, 46-54. Townsend, V. M. (1973). Loss of spatial and identity information following a tachistoscopic exposure. Journal of Experimental Psychology, 98, 113-118. Treisman, A. M. (1964). Verbal cues, language, and meaning in selective attention. American Journal of Psychology, 77, 206-219. Treisman, A. M. (1969). Strategies and models of selective attention. Psychological Review, 76, 282-299. Treisman, A. M. (1977). Focussed attention in the perception and retrieval of multidimensional stimuli. Perception and Psychophysics, 22, 1-11. Treisman, A. M. (1982). Perceptual grouping and attention in visual search for features and for objects. Journal of Experimental Psychology: Human Perception and Performance, 8, 194-214. Treisman, A. M. (1988). Features and objects: The fourteenth Bartlett memorial lecture. Quarterly Journal of Experimental Psychology, 40A, 201-237.
Treisman, A. M. and Gelade, G. (1980). A feature integration theory of attention. Cognitive Psychology, 12, 97-136. Treisman, A. M., Kahneman, D. and Burkell, J. (1983). Perceptual objects and the cost of filtering. Perception and Psychophysics, 33, 527-532. Treisman, A. M. and Schmidt, H. (1982). Illusory conjunctions in the perception of objects. Cognitive Psychology, 14, 107-141. Tsal, Y. and Lavie, N. (1988). Attending to color and shape: The special role of location in selective visual processing. Perception and Psychophysics, 44, 15-21. Turvey, M. T. and Kravetz, S. (1970). Retrieval from iconic memory with shape as the selection criterion. Perception and Psychophysics, 8, 171-172. Van der Heijden, A. H. C. (1971). The processing of tachistoscopic displays as a function of effective stimulus duration. Acta Psychologica, 35, 233-242. Van der Heijden, A. H. C. (1981). Short-Term Visual Information Forgetting. London: Routledge and Kegan Paul. Van der Heijden, A. H. C. (1984). Postcategorical filtering in a bar-probe task. Memory and Cognition, 12, 446-457. Van der Heijden, A. H. C. (1986). On selection in vision. Psychological Research, 48, 211-219. Van der Heijden, A. H. C. (1987). Central selection in vision. In H. Heuer and A. F. Sanders (Eds), Perspectives on Perception and Action. Hillsdale, NJ: Erlbaum. Van der Heijden, A. H. C. (1990). Visual information processing and selection. In O. Neumann and W. Prinz (Eds), Relationships Between Perception and Action. Berlin: Springer. Van der Heijden, A. H. C. (1992). Selective Attention in Vision. London: Routledge. Van der Heijden, A. H. C. and Eerland, E. (1973). The effects of cueing in a visual signal detection task. Quarterly Journal of Experimental Psychology, 25, 496-503. Van der Heijden, A. H. C., Schreuder, R., De Loor, M. and Hagenzieker, M. (1987). Early and late selection: Visual letter confusions in a bar-probe task. Acta Psychologica, 65, 75-89. Van der Heijden, A. H. C., Schreuder, R. 
and Wolters, G. (1985). Enhancing single-item recognition accuracy by cueing spatial locations in vision. Quarterly Journal of Experimental Psychology, 37A, 427-434. Van der Heijden, A. H. C., Wolters, G., Groep, J. C. and Hagenaar, R. (1987). Single-letter recognition accuracy benefits from advance cueing of location. Perception and Psychophysics, 42, 503-509. Von Helmholtz, H. (1871). Ueber die Zeit welche nötig ist, damit ein Gesichtseindruck zum Bewusstsein kommt. Berliner Monatsberichte, 8 Juni, 333-337. Von Helmholtz, H. (1894). Handbuch der physiologischen Optik. Hamburg: L. Vos. Von Wright, J. M. (1968). Selection in visual immediate memory. Quarterly Journal of Experimental Psychology, 20, 62-68. Von Wright, J. M. (1970). On selection in visual immediate memory. Acta Psychologica, 33, 280-292. Von Wright, J. M. (1972). On the problem of selection in iconic memory. Scandinavian Journal of Psychology, 13, 159-171. Warren, R. M. and Warren, R. P. (1968). Helmholtz on Perception: Its Physiology and Development. New York: John Wiley. Wolford, G. and Chambers, L. (1983). Lateral masking as a function of spacing. Perception and Psychophysics, 33, 129-138. Wolford, G. and Hollingsworth, S. (1974). Lateral masking in visual information processing. Perception and Psychophysics, 16, 315-320. Wolford, G. L., Wessel, D. L. and Estes, W. K. (1968). Further evidence concerning scanning and sampling assumptions of visual detection models. Perception and Psychophysics, 3, 439-444. Woodworth, R. S. and Schlosberg, H. (1954). Experimental Psychology. London: Methuen. Wundt, W. (1899). Zur Kritik tachistoscopischer Versuche. Philosophische Studien, 15, 287-317.
Yantis, S. (1988). On analog movements of visual attention. Perception and Psychophysics, 43, 203-206. Yantis, S. and Jonides, J. (1984). Abrupt visual onsets and selective attention: Evidence from visual search. Journal of Experimental Psychology: Human Perception and Performance, 10, 601-621. Zeki, S. and Shipp, S. (1988). The functional logic of cortical connections. Nature, 335, 311-317.
Chapter 2
Visual Search
A. F. Sanders and M. Donk
Department of Psychology, Vrije Universiteit, Amsterdam, The Netherlands
In his review of the literature Monk (1984) correctly noted that the term visual search is generally used as a loose label for a variety of experimental paradigms which share the broad feature that there is a need for spatial uncertainty reduction. Spatial uncertainty refers to the unknown position of one or more targets, either predefined and known in advance of a trial or unknown but still sufficiently deviant from the other elements of the visual scene to enable detection. All search conditions have in common that they suffer from parameter overspecification and hence require selective attention for action (Neumann, 1987). Apart from this commonality, studies on visual search differ in many respects. One reason for this might be that the field has its roots in applied rather than in basic research, with the consequence that several classic experiments aimed at simulating a real-life task instead of addressing a theoretical issue. Theoretical questions evolved from the applied work, but they often remained tied to the original experimental settings, so that a general theory of visual search is lacking. Instead, there are various smaller, more or less isolated, frameworks, subsumed under the general search umbrella. Two aspects of visual search are often distinguished, one relating to structural constraints, and the other to functional strategic factors. They certainly cannot be strictly separated, but they still represent two traditions. The symposium on Search and the Human Observer (Clare and Sinclair, 1979) and Rabbitt's (1984) chapter in Varieties of Attention (Parasuraman and Davies, 1984) are prototypical. Each is characterized by its emphasis on one of two broad questions. Structural constraints are usually concerned with the question of what kind of information is extracted from fixations during visual search. It is not surprising that this tradition relies heavily on studies of single eye-fixations. 
Yet, structural constraints can be described on at least three levels: the sensory level of a single fixation, the anatomical level of eye and head movements and, finally, the information processing level. The other question is what determines the position of the next fixation: is the locus of control in search externally (bottom-up) or internally (top-down) guided? This second tradition is dominated by studies on free continuous search with the aim of describing strategies. The division is of course not absolute: a combination of constraints and strategies is thought to determine the final search pattern.
Handbook of Perception and Action, Volume 3
Copyright © 1996 Academic Press Ltd. All rights of reproduction in any form reserved
ISBN 0-12-516163-8
The study of visual search during a single fixation derives from the well-established evidence that there is no useful vision during saccadic eye movements (see Matin, 1974, for a review), which implies that search processes must occur during fixations. Studying a single fixation, then, has the advantage of allowing rigid experimental control of the stimulus conditions, and in particular of the limiting effects of peripheral vision. In other words, the experimenter decides what the subject is going to see, without worrying too much about scanning and strategy. Visual search during a single fixation constitutes a major paradigm in the study of visual attention. It is at the basis of many theoretical issues such as early versus late selection (Duncan, 1980; Van der Heijden, 1987), automatic versus controlled processing (Schneider, Dumais and Shiffrin, 1984), feature versus conjunction search (Treisman and Gelade, 1980; Treisman and Sato, 1990; Wolfe, Cave and Franzel, 1989) and spotlight versus zoom lens (Eriksen, 1990; Posner, Snyder and Davidson, 1980). A more detailed discussion of these issues is found in other chapters of this volume, since they are more concerned with mechanisms of attention than with search per se. Here, mechanisms of visual attention will be touched upon only insofar as the issues are relevant to free continuous visual search. The first section is concerned with perceptual limits during a single fixation as expressed in visibility and conspicuity (Engel, 1976). These sensory constraints are generally described on a psychophysical level. Visual search may extend to a large field and include eye and head movements (Sanders, 1970). Different search modes are involved when the display exceeds a certain limit (Sanders, 1963, 1970; Sanders and Houtmans, 1984). The second section deals with anatomical constraints when the size of the display requires eye and head movements (Sanders, 1963, 1970). 
The third section is concerned with information processing constraints. Here, the major interest is in attentional processing limits (Neisser, 1963, 1967; Prinz, 1987). The reported studies mainly deal with prescribed scanning, such as scanning lines, as opposed to free scanning in which any direction or order of fixation is permitted. In prescribed scanning, there is little space for strategy, since search is fully determined by the stimulus configuration. It should be noted in passing that the analysis of reading is beyond the scope of this chapter, since it is not really a search problem. The fourth section is concerned with relations between eye movements and conspicuity (Engel, 1976), followed by a discussion on the relations between eye movements and attention. Clearly, the two sections are related in that eye movements elicited by conspicuity may very well be accompanied or even preceded by attention shifts. The main reason for separating them is that the first is explicitly concerned with research on eye movements and conspicuity, while in the second the emphasis is on the interrelation between eye movements and attention. The chapter continues with a section entitled 'Patterns of eye movements in visual search', which outlines two alternative mathematical models describing visual search behavior in homogeneous noisy displays. The next section deals with cognitive models of visual search followed by a discussion about visual sampling strategies. The final section is concerned with effects of stresses and abnormal conditions on search. Obviously, the last three sections are more related to functional strategic factors in visual search.
1 SENSORY CONSTRAINTS
It is an almost trivial statement that the most effective search is no search at all (Corbin et al., 1958). Obviously, this refers to conditions where a target is sufficiently conspicuous to be immediately detected. Engel (1971, 1977) defined visual conspicuity as the degree of perceptual prominence of a visible object in its surroundings regardless of knowledge about the location of the object. According to Engel, an object's conspicuity is determined by simple features such as brightness, color, size, shape and motion. The visual lobe, or conspicuity area, represents the peripheral area around the central fixation point within which a target can be detected with a certain probability within a single glance. Basically, the conspicuity area is assumed to represent the area within which a target 'pops out' without the need for a serial attentional scan. The size of the visual lobe depends on variables such as display density and target-background similarity. The basic reason for measuring the visual lobe is that the number of fixations required to find a target is supposed to be a direct function of the size of the visual lobe for that target. Usually, the visual lobe has been measured with tachistoscopic techniques. The method is rather simple in that a target is briefly presented at different eccentricities from the central point of fixation. For each distance, the subject indicates whether, and if so where, a target has been seen. In this way, a series of contours around the fixation point can be constructed representing probabilities of target detection (Bellamy, 1984; Bellamy and Courtney, 1981; Engel, 1976; Hughes and Cole, 1986; Kraiss and Knaeeper, 1982). In view of the large number of readings required to map the lobe fully (Chalkin, Corbin and Volkmann, 1962), generally only one or two axes have been measured. 
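The tachistoscopic procedure just described yields, for each meridian, a detection proportion at every tested eccentricity. As a minimal sketch, and not the procedure of any particular study cited here, the extent of the lobe along one meridian can be estimated as the eccentricity at which the detection proportion falls to a criterion value. The function name `lobe_radius`, the default criterion of 0.5, the linear interpolation between tested points and the example numbers are all assumptions made for illustration.

```python
def lobe_radius(eccentricities, detection_rates, criterion=0.5):
    """Eccentricity (deg) at which detection probability drops to the criterion.

    Assumes the eccentricities are sorted in increasing order and interpolates
    linearly between adjacent tested points. Returns None when no pair of
    adjacent points brackets the criterion (for example, when detection never
    falls to the criterion within the tested range).
    """
    points = list(zip(eccentricities, detection_rates))
    for (e0, p0), (e1, p1) in zip(points, points[1:]):
        if p0 >= criterion > p1:
            # Linear interpolation between the two bracketing points.
            return e0 + (p0 - criterion) * (e1 - e0) / (p0 - p1)
    return None


# Hypothetical detection proportions along one meridian (not data from any study).
ecc = [0.0, 2.0, 4.0, 6.0, 8.0]
rate = [1.0, 0.9, 0.7, 0.3, 0.1]
radius = lobe_radius(ecc, rate)
print(radius)
```

Repeating the estimate for several meridians gives the points of one detection contour. Choosing a lower criterion gives a larger estimate, which is one reason why lobe sizes obtained with different criteria are hard to compare.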
Thus, some studies measured four meridians around the fixation point (Johnston, 1965) while others only used one or two (Engel, 1976; Erickson, 1964). The main advantage of the tachistoscopical technique is that it enables visual lobe measurement without the influence of eye movements, thus providing an objective measure of peripheral sensitivity. Although the visual lobe has been widely used in models of visual search, there is no agreed definition of the lobe. In fact, the definition depends on the method of measurement, which varies substantially among the different studies. A major difference between the various studies concerns the critical detection contour. Conspicuity is usually expressed as a distance associated with a certain detection probability. Thus, the 50% detection probability contour (hard-shelled lobe) is usually taken as the edge of the conspicuity area (Chalkin et al., 1962; Engel, 1976). It is assumed that a target occurring inside this area will be directly found. Sometimes another detection probability contour is taken (Johnston, 1965) and, more often, the area with a small probability of detection is included in the lobe (soft-shelled lobe). Another measure of conspicuity starts from the probability of detection at a certain fixed distance from the central point of fixation. For example, Boynton, Elsworth and Palmer (1958) measured acuity at 3° from the fixation point while Erickson (1964) measured at 6°. In a field study, Hughes and Cole (1986) simply measured the frequency of target detection as an indication of the conspicuity. A second main difference between the various studies is related to the way in which the lobe area is measured. In order to determine the area of the visual lobe,
two different dependent variables have been used. Bellamy and Courtney (1981) established an estimate of the lobe based on location errors committed by subjects when attempting to report the target position. In this procedure, subjects indicated the position of the target after each brief presentation of the display. After transformation of the location error scores, a linear relationship could be established between error scores and target eccentricity (see also Courtney, 1984; Courtney and Chan, 1985, 1986). Another method relies on the proportion of detected targets as a function of the eccentricity of the fixation point (Engel, 1976). This method merely requires detection and no location of the target. The subject has to indicate after each display presentation whether or not he or she had seen a target. A third issue concerns the presentation time of the stimulus display. There is recent evidence that presentation time affects the size of the lobe (Sanders and Brück, 1991). However, different studies have used different presentation times of the stimulus display. In some studies the display was presented for an extremely brief period of time (Engel, 1976) whereas in other studies the display was presented until the subject responded (Hughes and Cole, 1986). In most studies, however, the display was presented for about 250 ms (Bellamy, 1984; Courtney and Chan, 1986). This duration is supposed to be short enough to exclude saccadic eye movements during presentation and, at the same time, long enough to be comparable to fixation durations in free search. A final issue concerns the divergence in instructions and, consequently, the response criterion of the subjects. Courtney and Chan (1985) instructed their subjects to indicate a target position after each presentation, even if the target had not been seen (see also Bellamy and Courtney, 1981; Courtney, 1984; Courtney and Chan, 1986). 
On the other hand, Hughes and Cole (1986) required their subjects to report only clearly detected targets. Similarly, Engel's subjects indicated with a push button whether or not they had detected the target (Engel, 1976). Obviously, either instruction might result in a different response bias of the subject and, therefore, in a different visual lobe estimate. When comparing the results of the various studies, there are large differences in outcome with respect to either size, shape or variability of the visual lobe. For example, Courtney and Chan (1985) reported irregular lobe boundaries, while Bellamy and Courtney (1981) found regular elliptical lobe shapes. Some investigators reported no difference between horizontal and vertical axes while other studies showed a longer horizontal than vertical length (Bellamy and Courtney, 1981; Courtney and Chan, 1986; Chalkin et al., 1962). Furthermore, various visual lobe sizes have been reported, varying from rather small to relatively large (Bellamy and Courtney, 1981; Courtney and Chan, 1986; Engel, 1976; Hughes and Cole, 1986). Unfortunately, there are too many unsystematic methodological differences between the various studies to elucidate the divergences. In addition, there is a total lack of theoretical background with respect to the target-background configurations which have been used and their effect on detection. Yet, in its own specific target-background configuration, each individual visual lobe estimate might be a valuable predictor of search performance. Indeed, it has been repeatedly demonstrated that lobe area is inversely proportional to search time and proportional to the probability of detecting the target in a single eye-fixation during search (Bellamy and Courtney, 1981; Bloomfield and Howarth, 1969; Engel, 1976). However, it is surprising that most correlations between visual lobe parameters and search times are not as significant as might have been expected (Courtney and
Chan, 1986; Erickson, 1964; Johnston, 1965). The relatively low correlations might be caused by two factors. First, the use of tachistoscopical measurements in assessing the lobe area might be inappropriate. Klein and Farrell (1989) have shown that a short exposure duration is not a particularly good way to control eye-fixation because subjects adopt special strategies to deal with a rapidly decaying icon. This might obviously lead to a different performance level in comparison with situations in which eye movements are allowed. Various alternative lobe measurement techniques have been proposed in which the visual lobe is estimated from free search behavior. For example, Mackworth (1976) presented his subjects with displays consisting of horizontal lines separated by a variable vertical distance. Subjects were asked to search each strip consisting of several rows of circles with the aim of locating a square among the circles. They were further instructed to scan the lines from left to right without stopping or looking back. Mackworth (1976) theorized that the smallest vertical distance between the horizontal lines that caused the pattern of fixation to deviate significantly from a horizontal scan was equivalent to lobe width. Indeed, several studies (Mackworth, 1981) suggested effects of distance between the lines as well as of display density. However, the main drawback of Mackworth's studies is that a subject's scanning pattern might also be affected by speed-accuracy trade-off. Subjects were free to choose their preferred fixation duration and saccade length. These variables surely affect the size of the visual lobe and have, therefore, to be taken into account in order to properly estimate the lobe size (Sanders and Brück, 1991). 
Prinz and his colleagues (Nattkemper and Prinz, 1984, 1990; Prinz and Kehrer, 1982) developed a similar method in which subjects were instructed to scan a list of letters row by row, from top to bottom (as in reading), and to search for a target as quickly as possible. A trial ended when the target was detected or when the end of the list was reached. Eye movements were measured, to determine the vertical detection distance, that is the vertical distance from the actual fixation point to the target location at the moment of detection. On the basis of this vertical detection distance, the integration area, i.e. the field around the fixation point from which information is processed, could be estimated. Prinz and his colleagues used this method to study the effects of visual density (Nattkemper and Prinz, 1990), the difference between the upper and lower hemiretina (Prinz, 1984) and the effect of redundancy (Nattkemper and Prinz, 1984). A quite different method was used by Barbur, Forsyth and Wooding (1993). Basically, they presented their subjects with multiple stimuli equidistant to the central fixation point, i.e. stimuli were arranged in a circle. Subjects were required to make one single saccade to a predefined target in the circle. The visual lobe then was equal to the radius of the circle at which 50% of all searches required only one saccade. Widdel (1983) suggested still another technique for measuring the visual lobe. His basic idea was that a successful fixation implies that, during that fixation, the target item is detected in the periphery. The next saccade is determined by this peripheral stimulus so that the next fixation will attain the target. When the target is fixated at fixation n, fixation n - 1 is the successful fixation because the target was detected peripherally during that fixation. In a series of search trials, Widdel (1983) measured the distances of the fixations n - 1 to the targets and computed
their distribution, thus developing a method that certainly deserves attention. Although his method is rather complicated, its main advantage is the absence of tachistoscopic presentation techniques. Furthermore, he excluded the influence of individual scanning strategies by only using the last previous fixation distance to the target. Although promising, the dynamic techniques are usually more complicated than static lobe determinations. In addition, it has not yet been tested whether dynamic lobe measurements are really better correlated with search time than are static lobe measurements. A second major factor that might be responsible for the relatively low correlations between lobe measures and search time is the fact that the visual lobe is not the only determinant of visual search behavior. The strategy adopted in search may also have a considerable effect on search time (see the section on 'Patterns of eye movements in visual search'). In addition, attending to a certain location, the novelty of targets or the meaning of targets may exert influence on search time. Directed attention to a location generally decreases the peripheral sensitivity threshold at that location resulting in a higher detection probability. Engel (1976) recognized this and distinguished, therefore, between the conspicuity and the visibility area. The visibility area represents the peripheral area around the central fixation point within which a target can be detected with foreknowledge about its location. The distinction between the visibility area and the conspicuity area is useful in that it reflects the difference between attending and not attending a location (Posner and Snyder, 1975). However, the extent to which, and the conditions under which, the visibility area is a good predictor for search time is still unclear. Yet, it seems plausible that the combination of the conspicuity area and the visibility area might be a better predictor than either one in isolation.
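Widdel's (1983) procedure, described earlier in this section, reduces each search trial to one number: the distance between fixation n - 1 and the target that is fixated at fixation n. The following is a minimal sketch of that computation under stated assumptions. The data layout (a list of fixation coordinates in degrees per trial, with the last fixation taken to be on the target), the function name `penultimate_distances` and the example scanpaths are hypothetical and are not taken from Widdel's study.

```python
import math
import statistics


def penultimate_distances(trials):
    """Distance (deg) from fixation n - 1 to the target for each search trial.

    Each trial is a (fixations, target) pair, where fixations is a list of
    (x, y) positions in degrees in temporal order and the last fixation is
    assumed to be the one that lands on the target. Trials with fewer than
    two fixations have no fixation n - 1 and are skipped.
    """
    distances = []
    for fixations, target in trials:
        if len(fixations) < 2:
            continue
        x, y = fixations[-2]
        distances.append(math.hypot(x - target[0], y - target[1]))
    return distances


# Hypothetical scanpaths, invented for illustration only.
trials = [
    ([(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)], (6.0, 8.0)),
    ([(0.0, 0.0), (5.0, 12.0)], (5.0, 12.0)),
    ([(2.0, 2.0)], (2.0, 2.0)),  # target fixated at once: skipped
]
d = penultimate_distances(trials)
print(sorted(d), statistics.median(d))
```

The distribution of these distances, summarized here only by its median, is what would serve as the dynamic lobe estimate. Which summary statistic to use is left open in this sketch.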
2 ANATOMICAL CONSTRAINTS
Spatial uncertainty and the need for search increase with the size of the display for the simple reason that there are more demands on the visual periphery. One question is to what extent the same search principles apply to displays of a different size. It may be useful to distinguish here between the macrostructure and the microstructure of the visual field. The microstructure is characterized by local detail which is only visible in the fovea and parafovea whereas the macrostructure concerns large-sized stimuli with contours that are still visible at a wider visual angle. The microstructure was addressed in the previous section on the visual lobe. The macrostructure will be treated in the present section. Sanders (1963, 1970, 1993; Sanders and Houtmans, 1984) argues that different search modes are involved if a display exceeds the limits set by the need for eye and head movements. In a standard experiment, two widely separated signals are presented at about eye level and at equal distances from the meridian. At the start of a trial the left signal is fixated; this signal should be processed at presentation, followed by a saccade to the right signal whereupon a trial ends with a same-different response. The basic phenomenon, which has been confirmed in a score of experiments, is that the time elapsing from the completion of the saccade (the start of fixating the right signal) until the completion of the same-different response is fairly constant as long
as the total display angle is within the limits of the 'eye field', but increases by some 100 ms as soon as head movements are required to supplement the saccade, i.e. when the display is in the 'head field'. In the case of the stimuli that were commonly used in the original studies (columns of either four or five dots) the head field started at a binocular display angle of about 80-90° and at about 60° in the case of a more complex stimulus configuration. The original studies also showed another increase in processing time at a display angle at which a saccade was first needed, i.e. at about 25°. It was assumed that at smaller angles both signals could be identified by mere peripheral viewing. Together, these results suggested three processing modes: as long as no saccade is needed to arrive at the same-different response (the 'stationary field'), the stimuli are encoded as an integrated whole. In the eye field, the stimuli are independently encoded, but, while encoding the left signal, a hypothesis arises about the right signal which then speeds up the response time from the moment that the right signal is fixated. In the head field, finally, the left and right signals are processed as fully independent events. In other words, the results suggest that, whatever position is fixated, the visual scene can use a constant background schema in the eye field; search can consist of scanning relevant parts of the same schema. In contrast, changes of the background are required when the display angle is within the head field; establishing a new background adds to searching for the target. For the theory of search it would follow that, as long as the display is within the eye field, hypotheses about peripherally located signals are obtained, as far as allowed by peripheral vision, directing the gaze and enhancing the efficiency of subsequent identification. Thus, the eye field might be the limiting case of the lobe. 
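The three processing modes can be summarized as a simple classification of the display angle. The sketch below is illustrative only: the function name `processing_mode` and its parameter names are hypothetical, the default limit of 25° is the approximate angle at which a saccade was first needed, and the default head-field limit of 85° is an assumed midpoint of the reported range of about 80-90° for dot columns. In practice both limits are approximate and depend on the stimulus configuration.

```python
def processing_mode(display_angle_deg, eye_limit_deg=25.0, head_limit_deg=85.0):
    """Classify a binocular display angle (deg) into one of three fields.

    The default limits are rough values for dot-column stimuli and are
    assumptions of this sketch: below eye_limit_deg no saccade is needed,
    and from head_limit_deg onwards head movements supplement the saccade.
    """
    if display_angle_deg < 0:
        raise ValueError("display angle must be non-negative")
    if display_angle_deg < eye_limit_deg:
        return "stationary field"  # both signals identified peripherally
    if display_angle_deg < head_limit_deg:
        return "eye field"  # saccade needed, background schema retained
    return "head field"  # head movement needed, signals processed independently


for angle in (10, 40, 100):
    print(angle, processing_mode(angle))

# A more complex stimulus configuration: head field from about 60 degrees.
print(70, processing_mode(70, head_limit_deg=60.0))
```

Treating the limits as parameters rather than constants reflects the finding that the head field started at a smaller angle for the more complex stimulus configuration.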
The problem is of course that the experiments, as outlined above, are not proper search studies, so that the validity of the extrapolation remains dubious. However, Sanders (1963, 1970) also carried out a search experiment in which six columns of dots were presented in a horizontal row. All columns but one contained four dots, while one column had five dots. The task was to identify the position of the five-dot column and press a corresponding response key, after which a new display was presented without delay. Apart from a repetition effect in the case of successive five-dot columns at the same position, the results showed a perfectly linear relationship between response time and the visual angle between the target positions at successive presentations as long as the total display was within the eye field. The slope of this function could be largely ascribed to the time taken by a single saccade between the target position at successive trials. This suggests that a reliable hypothesis was obtained about the location of the five-dot column. When the total display exceeded the eye field, response times increased by 100 ms if the angle between the target positions at successive trials exceeded 35°. This suggests that, in the head field, the display is inspected in two successive percepts, one covering the area in the immediate vicinity of the target position of the previous trial, and a second one covering the remaining dot columns. The results suggested that the distinction between the processing modes in the eye and head field has some generality. Later studies have confirmed the generality of the distinction between the eye and the head field, but have also shown that the difference between the stationary field and the eye field depends on the specific stimulus configuration. Thus, Sanders and Houtmans (1985a) found evidence for the stationary field only if the stimuli were dot columns and not if they were digits. They concluded that two
adjacent stimuli can only be encoded as a single 'whole' if the configuration actually permits recoding. This is intuitively appealing when the signals consist of two columns containing four or five dots. Perhaps this allows recoding in terms of a 'missing dot'. The results on digits as stimuli do not suggest recoding since, in that case, the processing times in the stationary field and the eye field hardly differed. Sanders and Houtmans proposed to maintain the distinction between the stationary field and the eye field with the restriction that a stationary field is more efficient only if recoding occurs. In fact it might serve as a check that such a condition has been realized. Further studies have confirmed the view that, in the eye field, the shorter response latency following fixation of the right signal is due to a hypothesis about the right signal, which arises during the fixation of the left signal. The main evidence came from a study in which the presentation of the right signal was somewhat delayed so that it either still occurred during fixation of the left signal, during the saccade, or even after fixation of the position of the right signal. The shorter response latency in the eye field was found only in the first case, confirming that fixation of the left signal is a prerequisite. In the head field the moment of presentation did not affect response latency, which should be expected if the signals are processed as independent percepts. Together with additional data from Sanders and Reitsma (1982b), these results also excluded an alternative explanation that ascribes the shorter response latency following fixation of the right signal to the possibility that, in the eye field, subjects are capable of starting processing of the right signal immediately on completion of a saccade, whereas in the head field, this is not possible. 
Sanders and Reitsma found that, in comparison with a normal two-choice reaction time, the latency measured from the fixation of the right signal to the response was shorter in the eye field but about equally long in the head field. Research on the nature of the hypothesis about the right signal, obtained during fixation of the left signal, has so far failed to deliver a satisfactory picture. Houtmans and Sanders (1984) varied the quality of the right signal and found that, in the case of a degraded right signal, the shorter response latency in the eye field was reduced but not eliminated. They also found that when intact and degraded right signals were mixed in a block of trials, and when intact signals were relatively infrequent, the response latency was about equally reduced, irrespective of the quality of the right signal. These results go beyond the trivial conclusion that an intact right signal delivers more peripheral information than a degraded one. First, an intact right signal was advantageous only when it was expected to occur, suggesting the involvement of attentional control, and, second, there was still a reduction in latency in the case of a degraded right signal, in which condition subjects were incapable of identifying the right signal during fixation of the left one. Another issue of interest is the end-product of a single fixation. In the context of the functional visual field this can be studied by analyzing the properties of the fixation time of the left signal. Several interesting results have emerged: first, there is no evidence that acquisition of peripheral information about the right signal delays the fixation duration of the left signal. It is not uncommon to find a somewhat longer fixation duration in the head field, but that can also be ascribed to more complex motor programming of a combined eye-head movement.
Second, Sanders and Houtmans (1985b) found that, when the left signal is degraded, the fixation duration of the left signal increases by about the same amount as in a traditional choice reaction study. Thus, the eyes appear not to leave the left signal
before all perceptual problems have been solved. This is not related to intake of information, since presentation duration of the left signal did not appear to affect its fixation duration. Sanders and Rath (1991) found that, when under heavy time pressure, subjects may actually refrain from perceptual processing during the fixation of the left signal. However, this results in prolonged processing during fixation of the right signal. Together these results also suggest that a saccade blocks not only sensory processes, but also central perceptual processing of information, obtained during the previous fixation. Boer and Van de Weijgert (1988) found that, when the left signal had to be classified (target/no target), classification time was not reflected in the left fixation time, so that finishing a fixation and starting a saccade may be bounded by perceptual processing and not by further central processing. Boer and Van de Weijgert's results also suggest that target classification may indeed occur during the saccade. More recently, these results were further confirmed by Van Duren and Sanders (1992) in an extensive series of experiments which showed that target classification as well as response selection can occur during a saccade. A final issue concerns the output of a fixation. This was addressed in a study by Hansen and Sanders (1988) who varied the signal quality of both the left and right signals and found that correspondence of signal quality facilitated the required same/different response. Thus, if both signals were intact, processing time during fixation of the right signal was faster than when the left signal was degraded. However, if both signals were degraded there was also faster processing of the right signal in comparison with the condition where the left signal was intact and the right signal degraded. This result suggests that the stored code of the left signal still contains the features, both relevant and irrelevant, of the last fixated signal. 
Encoding may have the effect of taking inventory of features and making them individually accessible, but not of getting rid of irrelevant features. Alternatively, it could be that the observed result was due to a closer correspondence of processing demands of homogeneous stimuli, i.e. intact/intact or degraded/degraded, compared with heterogeneous stimuli, i.e. intact/degraded or degraded/intact (Los, 1994).
3 INFORMATION PROCESSING CONSTRAINTS
An account of information processing constraints on visual search may conveniently start with the work of Neisser (1963, 1967) who introduced a continuous visual search task in which structured letter matrices were presented. The instruction was to search either for the presence of a target, which occurred only once in the whole matrix, or for the absence of a target, which always occurred once in each row with the exception of the row that should be detected. Subjects scanned the rows of letters in a prescribed order so as to rule out influences of strategy. Search time, plotted as a function of the location of the target in the matrix, showed a perfectly linear function with a search rate of between 3 and 10 items per second, depending on the similarity between the features of the target and the background, i.e. the more similar the features, the slower the rate, and on whether subjects searched for presence or absence of the target, i.e. search for presence was much faster. 1
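The linear search function can be illustrated with a short sketch. It assumes that search time equals an intercept plus the number of items scanned up to the target divided by the search rate; the function name, the parameter names and the zero default intercept are assumptions introduced here and are not taken from Neisser's reports.

```python
def search_time_s(target_position, items_per_second, intercept_s=0.0):
    """Predicted search time for a target at a given serial position.

    target_position: number of items scanned up to and including the target.
    items_per_second: search rate (the text reports roughly 3 to 10).
    intercept_s: hypothetical constant for time not spent scanning.
    """
    if target_position < 1:
        raise ValueError("target_position must be at least 1")
    if items_per_second <= 0:
        raise ValueError("items_per_second must be positive")
    return intercept_s + target_position / items_per_second


# A target at the 30th item, at the slow and fast ends of the reported range.
slow = search_time_s(30, 3)    # dissimilar features absent, or search for absence
fast = search_time_s(30, 10)   # highly distinctive target, search for presence
```

Under these assumptions the slope of the function is the reciprocal of the search rate, so a rate of 3 items per second corresponds to about 333 ms per item and a rate of 10 items per second to 100 ms per item.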
In line with early selection theory, Neisser suggested that subjects do not identify each individual signal but start with a fast and parallel preattentive search, operating on general physical features rather than on local letters. A letter is identified by way of focal attention when all feature tests favor the target, according to a 'shouting demon' principle (Selfridge and Neisser, 1959). Neisser's two-process theory, i.e. preattentive search followed by focal attentive identification, assumes that subjects maintain an active template of the target. Non-targets play a role only in that, as target and non-targets are more similar, there are more critical features to be tested, so as to differentiate between targets and non-targets, which has the effect that search slows down. When searching for absence rather than for presence of a target, each row requires a time-consuming full identification of the target. The finding that search for absence is much slower poses obvious problems for a simple late selection theory which states that individual letters are always fully identified. Such a theory would require the additional hypothesis that deciding about the presence of a signal is particularly attention-demanding and time-consuming. It would then follow that a signal is more easily rejected than confirmed, which is clearly at odds with various bodies of research, among which same-different judgements (Farrell, 1985) and target classification are prime examples. Neisser's two-process theory can easily accommodate various well-established findings on effective search for targets with a distinctive physical feature such as color, size and form. Thus, in a free search task, Williams (1966) measured saccades and fixations on large displays, containing two-digit numbers on differently colored, shaped and sized fields. The target was always a two-digit number with a prescribed feature.
Numbers with a prescribed color were more frequently fixated, but, in contrast, shape hardly served as a distinctive feature in this study. In the same vein, Green and Anderson (1956) found that search rate depended on the number of non-targets with the same color as the target (see also Noble and Sanders, 1981), suggesting simple early selection with color as selection criterion. They also found that varying the number of irrelevant colors of non-targets hardly affected search speed. As another illustration, Willows and McKinnon (1973) found that when alternating lines had a different color subjects could limit themselves to reading the lines with the same color, without any distraction from the lines with the other color. Neisser's (1963) own studies had shown that search for a straight letter amid curved non-targets, or vice versa, increased search rate. In a situation where subjects had the task to count the total number of targets on the display, Gould and Dill (1969) found that targets were fixated longer than non-targets, which was taken as evidence that non-targets are less deeply analysed. The longer fixation of targets might as well reflect the extra time taken by counting, but Gould and Dill also found that similar non-targets were fixated longer than nonsimilar ones, suggesting at least a contribution of differential depth of analysis. Again, Corcoran and Jackson (1979) had subjects search for a ' ~ ' among straight or curved letters as non-targets and found that changes within either set of non-targets, or
1 Search rates in the absence of saccades (Sperling et al., 1971) are considerably faster, which can be only partially explained by the time taken by the saccades. It has been suggested that, in addition, search in the Neisser type of task slows down as a consequence of, first, lateral interference between letters (Bouma, 1978) and, second, the time taken by programming a saccade during each fixation. This last suggestion entails that search and motor programming cannot occur in parallel, which is more consistent with discrete than with continuous models of processing information (Miller, 1988; Sanders, 1990).
even a transition from straight to curved ones, did not affect search rate. Apparently subjects could either use the ' 9 or the ' / ' as criterion for their search. This is consistent with the further finding of a pronounced negative effect on search rate when non-targets consist of a combination of straight and curved shapes. The finding that shape has more pronounced effects in some studies than in others might well be related to line search or free search: the less powerful shape feature might be less effective in free search, in which it must determine the direction of the saccades, than in line search, in which it plays a role only in distinguishing incoming stimuli.
3.1 Criticisms of the Two-Process Theory
Despite the impressive set of supporting data, Neisser's two-process model has not gone undisputed. As in other paradigms on selective attention, the discussion centers around the nature of the criteria that can be used for separating targets and non-targets. In turn, this issue is connected to the question about the extent to which stimuli are fully encoded during search, which is at the basis of the late-selection view of attention. The main reason for this discussion in the area of visual search is that variables affecting non-targets appear more relevant contributors to search rate than originally envisaged by the early selection dual-process theory. For example, in contrast to Green and Anderson's (1956) results, Cahill and Carter (1976) found that heterogeneity of irrelevant colors did affect search for a specific color. Yet, this contradiction might still be explained by dual-process theory in terms of a limited set of feature maps. In that case more heterogeneity increases the probability that the critical feature map is also activated by non-targets. In line with this reasoning, Moraglia et al. (1989) distinguished between non-target heterogeneity and similarity, and found that search time was affected only by similarity and not by heterogeneity per se. However, a simple early selection model faces some problems, which concentrate around findings that composite features may be used as a global search criterion, the physical nature of which is rather hard to imagine, and that, in fact, such composite features can be learned. The observation that, without identification, a target letter can be detected among non-target digits, and vice versa (Brand, 1971), is a case in point. Searching for a digit among letters is relatively easy in that the digit also appears to pop out (Jonides and Gleitman, 1972), although this effect has been found to disappear when searching for various alternative digits (Francolini and Egeth, 1979).
The observation of the 'oh-zero' effect, i.e. that, depending on instruction, the 'O' may pop out as a letter among digits or a digit among letters, has been taken as evidence that the distinctive features may not invariably reflect physical features but can be semantic as well, which is in clear conflict with the dual-process notion. However, the oh-zero effect is not easily reproduced (Duncan, 1983) which renders the issue unsettled. There is additional evidence that, under certain conditions at least, non-targets are fully analyzed, despite the presence of a distinctive physical characteristic. For example, when searching for an 'A' amidst a, E, c and C as background, subjects appear to be distracted by the 'a', suggesting an effect on the name level, despite the presence of shape as a distinctive feature (Hendersen and Chard, 1971).
Other evidence stems from the well-known work of Shiffrin and Schneider (1977), Schneider and Shiffrin (1977) and Schneider et al. (1984) suggesting that 'automatic' detection of target letters among non-target letters is possible when subjects receive massive amounts of practice in which targets and non-targets are consistent in that they never change roles. Again, Rabbitt (1964, 1967) found that changing the non-target elements, which all consisted of straight-line letters, slowed down search for a straight-line letter from the target set. This suggests that subjects learn certain cues that are no longer valid when they have to look for the same straight-line targets among other straight-line non-targets. The effect proved to be stronger when searching for one out of eight target letters than when searching for one out of two alternative letters, which suggests a more refined set of distinctive features in the former case. The further observation that it matters whether subjects have to detect 'targets' or to respond differentially to different targets (Rabbitt, 1981) supports the idea that one deals with complex features, defining a learned category, which is almost certainly beyond physical properties. This is of course the critical deviation from early selection: effects of practice in learning how to categorize physical target and background features can be accounted for by early selection, but the features should always be on a global physical level. Conditions in which full identification of all items is observed are obviously in line with late selection theory. Yet, as noted before, full identification is not the usual finding in visual search, but is largely limited to studies on target detection during single fixations. In fact, there might be no proper physical criterion in those conditions, with the consequence that response set, i.e. number of alternative responses, rather than stimulus set, i.e. 
number of alternative stimuli, tends to dominate (Broadbent, 1971). Response set phenomena might still be of some relevance when search goes on in a systematic sequence of fixations, such as in line scanning, but largely disappear in free search where positional information is more relevant for deciding where to fixate next. The question remains whether phenomena of response set can be reconciled with a global parallel scan as implied by the two-process theory. According to this theory a global analysis always precedes a local one, so that an item cannot be identified unless the global level has been successfully passed. As an alternative, Prinz (1987) has suggested that selection may occur simultaneously at a sensory (early-global-nonattentive) level and at a semantic (late-local-attentive) level. This is consistent with recent views, which no longer view attentional selection in terms of rejection or attenuation so as to overcome a limited capacity bottleneck but as creating a functional difference between relevant and irrelevant information (Allport, 1987; Humphreys and Müller, 1993; Neumann, 1987; Van der Heijden, 1987, 1992; see also Chapter 1). Prinz (1987) has suggested three experimental results as evidence for simultaneous selection in search. The first concerned the size of the lobe in a Neisser type of line search, which turned out to be larger for round target letters among straight non-target letters than for straight target letters among straight non-target letters (Prinz, 1983). Yet, a dual-process theory might explain this result simply in terms of lateral inhibition, which has a stronger damaging effect on complex than on simple features. The second result concerned the effect of secondary task load: a semantically demanding second task slowed down search but increased the lobe for simple features. This was explained in terms of selective interference of the second task with semantic processing, leading to slower search. Since processing
the simple features was not affected, they might profit from slower search rate and, hence, have a greater lobe (Prinz and Nattkemper, 1986). Yet, this result could equally well be interpreted in terms of more interference as the conditions ask for more detailed processing, reflecting response set. The simultaneous sensory and semantic processing interpretation of Prinz and Nattkemper would require an additional experiment in which a second task differentially affected sensory selection, with the prediction that, in that case, semantic selection would be undisturbed. Finally, Prinz, Meinecke and Hielscher (1987) studied the category effect on the basis of semantic features. They found that, when degrading letters in a Neisser task, the category effect remained, while the difference between categories disappeared with respect to their effect on the lobe. Yet, a dual-process notion might explain this result by a sequential full analysis in the case of the degraded letters. Simultaneous selection at the featural and semantic level has also been proposed by Ellis and Chase (1971), but their study was concerned with memory search of a single stimulus. Target identification on the basis of a feature was found to be faster only in the case of a larger memory set but not when a small memory set was used. Along the lines of Posner and Boies (1971), they suggested a parallel search for physical and name features, a horse race deciding which process would be first completed. Yet, as cogently argued by Broadbent (1982) and by Kahneman and Treisman (1984), a single stimulus fully excludes a stimulus set, thus evoking a competitive analysis between the various stimulus dimensions, featural and semantic alike. A similar problem might have occurred in studies such as those of Cavanagh and Chase (1971), who found full processing when two stimuli (a target and a non-target) were briefly presented in close vicinity. 
All these situations seem quite different from real overload conditions where a target must be found in the presence of many competing visual signals during free search. A global-to-local search may largely apply to the latter case.
3.2 Target Versus Background Control
Prinz combined the simultaneous selection hypothesis with an additional principle that is also found in Chase (1986). The starting point of this view is the widespread assumption, found among others in the literature on the orientation response (Sokolov, 1963), that people have an internal model of the perceptual environment. Perception, then, is viewed as an exchange process between stimulus information and the internal model, serving the primary function of deciding whether reality fits the model rather than deciding what is actually there. When searching for a target, subjects have an internal model of the background and check this background for deviations. Targets are detected by default, in that they deviate from non-targets. This view differs from the traditional view which holds that people keep an active representation of the target, aiming at a fit of that representation with a particular element in the environment. Instead, search becomes a matter of analyzing the context, so as to detect deviations, rather than of checking target representations. Context control can occur both on the semantic and on the featural level and is not incompatible, therefore, with either the notion of simultaneous selection, as advocated by Prinz, or even of successive preattentive and attentive selection as implied by Treisman and Neisser.
Various general results on search are consistent with either a notion of target or context control. Thus, targets are usually incidental, which qualifies them as deviations from normal. This explains that searching for the presence of a target in a Neisser task is faster than searching for its absence. Again, the more a target differs from the context, the faster a deviation is noted. As specific support for the context theory, Prinz (1987) has summarized various experimental results. First, there is the phenomenon of pseudo-targets, referring to the observation that subjects spend relatively more time on new non-targets, i.e. non-targets that did not yet occur in the list (Prinz, Tweer and Feige, 1974). The possibility that new non-targets share more features with the target than with the existing non-targets was well controlled in this study, which makes the result hard to explain in terms of target control. A second result concerns the effect of non-target sequential redundancy. Autocorrelation of the non-targets speeds up search, which suggests that relations between non-targets are analyzed (Prinz, 1979). Nattkemper and Prinz (1984) found that, in a Neisser task, lists with predictable non-targets were searched slower in early practice, allowing for the build-up of an internal model, which then leads to a more efficient search once the model had been established. Again, the finding that, in the case of multiple targets, a target is sometimes detected prior to its identification follows easily from the view that a deviation from the context is detected. Here, the traditional view of target control faces the problem mentioned earlier of describing the specific distinctive features defining the target set. Clusters of non-targets, allowing rapid rejection, may be gradually learned. Yet, the evidence suggests that not all effects of non-targets, such as the pseudo-target effect, depend on practice. 
Prinz suggests that information about non-targets is retained in a short-term visual store, ready for immediate use at the feature level for evaluating the next trial; this predicts a considerable increase in search speed when the same non-targets are repeated, and this has been found. Autocorrelations between non-targets, on the other hand, would be processed on the semantic level, and therefore depend more on practice. Together these results are consistent with those of Rabbitt (1981), who found that when subjects search for specific targets, they adopt a fixed optimal scan path when the background is familiar, while the scan is unsystematic in the case of an unfamiliar display. Context encoding is certainly a viable alternative to target representation, in particular because it is also relevant to the interpretation of results on searching more structured environments. Yet, it may not be the only principle in search. For instance, Rabbitt, Cumming and Vyas (1977) found that search is fast when the same target is presented on successive displays, and even faster when appearing in the same spatial position. The variance in background is irrelevant in that case; it seems that subjects start the search with a rapid identity check, which is based on target representation. The identity check extends to irrelevant aspects of the target (Jordan and Rabbitt, 1977), suggesting that a complete signal, including irrelevant dimensions, is stored for comparison. Again, Rabbitt (1984) observed that negative transfer, induced by changing background items, ceases to exist after very prolonged practice. Presumably, a unique complex set of features has been acquired which promotes checking the target rather than the background. So far the discussion has been restricted to 'homogeneous' backgrounds, such as when scanning rows of letters, inspecting briefly presented displays with at most
four items, or in free search of homogeneous noisy fields containing a slightly deviant target. Background effects are much more pronounced in the case of a coherent meaningful or a structured meaningless background. The classic studies of Biederman (1972) are a case in point: coherent or jumbled scenes were presented for a brief period of time, preceded or followed by an arrow pointing to an object on the picture. In all cases it was easier to identify the object when subjects had seen the coherent picture. Biederman, Glass and Stacey (1973) also found that detection of the absence of an object was much easier on a coherent picture. A coherent background corresponds to internal schemata, i.e. abstract cognitive structures with both invariant and variable properties, which may be summarized by the label 'knowledge of the world'. When scanning a picture, a schema is initiated and parameter values are assigned to the various variables. This explains why 'the gaze selects informative details within pictures' (Mackworth and Morandi, 1967). The background also plays a dominant role in the case of a structured Gestalt, as abundantly shown in the Gestalt literature (Metzger, 1954). A classic example concerns the Gottschaldt hidden figures, where a target is very hard to detect when embedded in a well-structured background. More recent demonstrations stem, among others, from studies by Banks and Prinzmetal (1976) who showed that targets outside a cluster of background elements are more rapidly detected. Farmer and Taylor (1980) showed that backgrounds of different hues and brightness were more rapidly scanned when grouped than when scattered. The finding that similar effects occur with letter search (Marken and Patterson, 1979), i.e. AAAPPPQQQ faster than, say, APQAPQ, is in line with Prinz's results on background redundancy.
The effect of the background organization was also confirmed in a study by Moraglia (1989) who had subjects search for a horizontal line segment through displays containing varying numbers of elements, differing from the target and from each other in orientation. The detection latency was considerably longer when the line segments were randomly positioned than when they were arranged in concentric circles. Background organization directs an observer's search along regular paths, from which search may benefit (if the path leads to the target) or suffer (if the path leads away from the target). One application is the sweepline on radar displays. When subjects follow the sweep, they automatically cover the whole display so as to increase the probability of target detection (Teichner and Mocharnuk, 1979). Effects of background organization show that search uses redundancy in the perceptual environment and is guided by known environmental structures. The first element fits traditional early selection theory, which has always maintained that perceptual structure reduces overload, and hence reduces the need for selective attention (Broadbent, 1958). The second is harder for theories that conceive selective attention as an indirect process of comparison of features from a target and its neighbors (Treisman and Gelade, 1980). The point is that it reflects neither a successive nor a simultaneous featural and semantic guidance of search, but, instead, a top-down local guidance in search for deviations from a meaningful schema. This seems more in line with a context-related approach, which holds that the conspicuity of a target depends on whether it fits or disrupts a total stimulus pattern (Duncan and Humphreys, 1989; Humphreys, Quinlan and Riddoch, 1989; Humphreys, Riddoch and Quinlan, 1985). The evidence in favor of this theory
stems from studies in which targets with the same feature sets are used, with the difference that the target either fits or disrupts the background (Smets and Stappers, 1990).
3.3 Discussion
The foregoing discussion has made abundantly clear that there is little uniformity with respect to emphases, paradigms or methods used in the investigation of the structural constraints of visual search behavior. Sensory constraints are mainly described in terms of psychophysical limits while neglecting theoretical notions on either anatomical constraints or information processing. Anatomical constraints are probably important when inspecting large displays. However, neither conspicuity nor attentional limits were controlled for, which renders it doubtful whether the results on the functional visual field can be easily extrapolated to other stimulus/background configurations. Information processing constraints, finally, appear in the literature as a wide range of theoretical considerations about the relation between attentional notions on the one hand and display characteristics on the other hand. The main criticism probably concerns the lack of a uniform taxonomy. Paradigms stem from either an early selection view (Broadbent, 1958), a late selection tradition (Deutsch and Deutsch, 1963), or even from a functional view in which attention generally serves the function of creating a difference between relevant and irrelevant information. In addition, most experiments about information processing constraints fail to control for either target conspicuity, as psychophysically defined by Engel (1976), or display angle, which might be important in view of the findings of Sanders (1963, 1970). Thus, research on visual search might profit considerably from a more integrated approach in which all but one type of constraint is kept constant. Effects on search time might then be validly ascribed to the varying constraint.
4 EYE MOVEMENTS AND CONSPICUITY
The effects of peripheral vision on spontaneous eye movements have been systematically investigated by Findlay (1980), who developed a coherent psychophysics of saccade generation by using a paradigm developed by Levy-Schoen (1974). Two visual stimuli were simultaneously presented at opposite sides of a fixation point and subjects made a saccade to either one or the other stimulus. By varying the properties of the visual stimuli, he aimed at determining which details of a stimulus attracted saccades. The main conclusion was that the location on the retina (proximity to the fovea) and the amount of transient stimulation were important variables. Spatial detail seemed to be ineffective, at least for the generation of spontaneous eye movements. Although spontaneous eye movements might not be guided by some salient spatial detail, eye movements during visual search might be. In a task where
subjects searched a heterogeneous visual field, Williams (1966) found that color information could be used effectively to guide the eye to a target, while size and shape information were not effective. Williams (1967) suggested that, when certain characteristics of a target are specified, the searcher attends to objects having those characteristics and attempts to fixate those objects. Objects, then are better fixated on the basis of color than on the basis of either form or size. When the target is specified by all three dimensions, subjects generally select on the basis of only one dimension, e.g. color. There are, however, indications that shape information might also be used to guide the eyes. Gould (1967) showed that, in an array of nine well-spaced patterns, subjects looked more often and longer at non-target patterns as pattern similarity with the target increased. Gould and Dill (1969) replicated this result in that they found that the probability of fixating a pattern increased as the shape of the pattern was more similar to the target shape. Gould (1967) suggested that peripherally discriminable patterns do not require foveal fixation. Functionally, peripheral vision would not fundamentally differ from foveal vision. Only the quality of peripherally presented information is worse than foveal information. Both Gould (1967) and Williams (1967) emphasized the capability of the observer to decide on the basis of extrafoveal information about whether a stimulus is a target or not and, thus, whether or not it requires fixation. According to Engel (1976), conspicuity is the main determinant of an eye movement. He emphasized that parafoveal irregularities automatically attract the eye. He presented his subjects with a homogeneous display of many randomly located background disks among which were two deviant disks. One of the target disks was smaller and one was larger than the background disks. Subjects searched for either the small or the large one. 
During search for the target there appeared to be a tendency to fixate the irrelevant deviant disk, which appeared to be related to the conspicuity of the irrelevant disk. Thus, when deviating stimuli are present in the periphery, an eye movement will be initiated towards them. The decision as to whether a stimulus is a target is generally taken on the basis of foveal information while extrafoveal information serves to detect irregularities in the total stimulus pattern. In general, eye movement behavior seems to depend strongly on the search display. In homogeneous fields, peripheral vision may serve to select a point for the next fixation which shows at least some irregularity in comparison with the background (Engel, 1977). Peripheral vision in heterogeneous search fields may merely serve to select a point that shares common features with the target (Gould, 1967; Williams, 1967). However, when targets cannot be discriminated from distractors on the basis of parafoveal vision, the adopted search strategy might be the main determinant of search behavior.
5 EYE MOVEMENTS AND ATTENTION
The previous section was explicitly concerned with the question to what extent eye movements are evoked by stimulus characteristics. The main question in the present section is whether eye movements depend on attentional shifts. Logically, the relationship between eye movements and spatial attention might have several forms (Shepherd, Findlay and Hockey, 1986).
At one extreme, the processes involved in generating an eye movement and in attending to a location in space might be completely identical. However, this is unlikely because many experiments have shown that attention can be moved to different parts of the visual field in the absence of overt eye movements (Eriksen and Eriksen, 1974; Eriksen and Hoffman, 1972, 1973; Posner, 1980; Posner, Nissen and Ogden, 1978). At the other extreme, the processes involved in eye movements and attention might be fully independent of each other. This hypothesis is also improbable. For example, Crovitz and Davies (1962) found that preparation to make an eye movement towards a stimulus facilitates subsequent perception of that stimulus. Again, Nissen, Posner and Snyder (1978) found that, relative to other locations, reaction time to the onset of a stimulus at the saccade target position was faster prior to the start of the eye movement. This suggests that attention shifts and saccades may share a common movement mechanism. A less extreme possibility would be that, at some stage of their execution, the processes involved in generating eye movements and attention share a common element; execution of either process might then be facilitated or inhibited by the other depending on their respective spatial goals. Several studies appear to provide evidence for this position. Broadly, a distinction can be made between studies investigating the effect of eye movements on attending a location and studies emphasizing the influence of attending a location on eye movements. Effects of an eye movement on attention allocation are usually inferred from probe reaction times or from proportions of correctly reported probes, presented at different locations at various time intervals following a cue indicating the direction of the required saccade. The main question is what happens to the allocation of attention when a saccade is carried out. Thus, Nissen et al. 
(1978) performed an experiment in which they summoned saccades by peripheral cues. Reaction times to subsequent probe stimuli, occurring at the peripheral cue position or at the central fixation position, indicated that preparing a saccade necessarily involved allocation of attention to the target position in advance of the actual eye movement. At least under peripheral cueing conditions, there seemed to be a strong link between preparing a saccade and allocation of attention to the target position. Remington (1980) has suggested that attention allocation and eye movements are elicited by the same system but are separately controlled. In some of his conditions, he also used a peripheral cue to indicate the target position for an eye movement. Subjects were required to report the presence or the absence of a probe stimulus which occurred on 50% of trials at one out of four positions and at various time intervals following the cue. In another experiment, he used central cues to indicate the direction of the saccade. He concluded that a peripheral stimulus summoned both attention and an eye movement since detection of the target position steadily improved until it was overruled by saccadic suppression. When saccades were directed by a central cue, there was no evidence for prior allocation of attention to the target location of the eye movement. These results indicate a tighter connection between the peripheral cue and attention than between attention and the saccade. Shepherd et al. (1986), on the other hand, found evidence for a closer relation between attention and eye movements. In their experiments, spatial attention was manipulated by varying the probability that a peripheral probe stimulus would appear in different positions, while saccades were directed by a central arrow, enabling separation of the effects of attention and eye movements. When the
saccade was directed away from the most likely position of the probe, the effect of moving the eyes proved to be stronger than the effect of spatial cueing until well after the saccade had finished. This suggests that making a voluntary saccade necessarily involves allocation of attention to the target position, which competes for resources with the processes that underlie the allocation of attention to the other position. In line with these findings are the results of Henderson (1993) who found that peripheral preview of a letter string facilitated the response to an identical target string when a saccade had to be directed toward the position of that letter string. No facilitation was found when subjects had to remain fixated on the central fixation point. In summary, the evidence concerning attention shifts in relation to saccadic eye movements is not unequivocal. Although the results of Shepherd et al. (1986) and Henderson (1993) suggest that attention shifts necessarily accompany eye movements, Remington's (1980) results suggest otherwise. In Henderson's experiment subjects had to respond to letter strings whereas Remington and Shepherd et al. used probe reaction time tasks. This renders a direct comparison difficult. Also, the experiments of Remington and Shepherd et al. are quite different from each other: a major difference concerns the presentation time of the probe. While Remington (1980) presented probes in the form of a very brief (3 ms) near-threshold luminance increase, Shepherd et al. (1986) presented probes until the subject had responded. The probe in Remington's experiment was a fairly bright box subtending a visual angle of 1.37° whereas in the experiment of Shepherd et al. (1986) the probe was a dark square of about 0.3°. Finally, Remington (1980) determined the proportion of hits while Shepherd et al. (1986) measured probe reaction times. To what extent these differences are responsible for the differences in outcomes is unclear. 
Any general conclusion as to whether eye movements elicit attention shifts seems to be premature. The reverse question, i.e. whether attention shifts elicit eye movements, has also stimulated some research. For instance, Shepherd et al. (1986) investigated the effect of probe stimuli on saccade latencies: probe stimuli appearing before the saccade shortened saccade latencies if they appeared at the saccade's target and lengthened saccade latencies if they appeared on the opposite side of the saccade's target. Thus, attention seems to play an important role in the generation of voluntary eye movements. Saslow (1967) reported a study in which he had a temporal separation, i.e. temporal gap, between the offset of a central fixation point and the onset of a peripheral target light that elicited a saccade. Temporal separation decreased the saccadic reaction time to about 150 ms compared with 250 ms when the central fixation point remained visible. Fischer and Breitmeyer (1987) presented subjects with three different conditions. In one condition, the gap paradigm was used, i.e. the fixation point switched off 200 ms prior to the peripheral target. In another condition, the central fixation point remained on and subjects were instructed to direct attention to it. The third condition was similar to the second one except for the instruction, in which subjects were told to ignore the central fixation point. The results indicated that both the separation of the central fixation light and the peripheral cue and ignoring the central fixation light reduced the latencies of saccades. It should be added, however, that the effect of temporal separation was much more pronounced. In another experiment (Mayfrank, Mobashery, Kimmig and Fischer, 1986), subjects were instructed to direct their gaze to the middle of a
screen without a fixation point. Attention had to be directed to a peripheral light. The subsequent target appeared either 4° to the right or to the left of the center of the screen. In the separation condition, in which the attended peripheral light disappeared 200 ms prior to the presentation of the target for the saccade, saccade latencies were generally short. In other conditions, in which the peripheral light source remained on, saccade latencies were longer, even in the condition in which the peripheral light was located at the same position as the target for the saccade. These results were taken as evidence for the hypothesis that a disengagement of attention is necessary before the saccade can be initiated. This disengagement of attention is assumed to take some extra time. The results of Shepherd et al. (1986) suggest that eye movements are facilitated by prior allocation of attention to the target for the saccade while Fischer and Breitmeyer (1987) suggest that starting an eye movement requires a general disengagement of attention. However, the experiments vary with respect to the manipulation of attention. While Shepherd et al. (1986) manipulated attention by means of expectation, Fischer and Breitmeyer (1987) simply told their subjects where to direct their attention. More importantly, a temporal separation might function as a warning signal for the saccadic response. A decrease in saccadic latency might then not be related to disengagement of attention but merely to the constant period preceding the presentation of the target for the saccade. In general, there is some evidence that the prior allocation of attention to the saccade target shortens the saccade latency (Shepherd et al., 1986). Whether attention necessarily moves with the eyes is not clear. Additional studies are required.
6 PATTERNS OF EYE MOVEMENTS IN VISUAL SEARCH
Eye movements during visual search might be conceived of as including a stochastic process in which the observer samples portions of the display at each fixation (Monk, 1984). Sampling has been modeled as either a random (Krendel and Wodinsky, 1960) or a systematic process (Williams, 1966). Random search implies that the probability of finding a target remains constant over successive fixations and, furthermore, does not depend on events during earlier fixations. Thus, a random search model assumes that the observer is memoryless, so that the same item might be repeatedly fixated. Krendel and Wodinsky (1960) have suggested that a random search strategy can be inferred from the cumulative distribution of search times. A random search strategy expects an exponential cumulative distribution of search times. In contrast, systematic search (Williams, 1966) implies that the same item is never inspected again, so that the observer is supposed to have a perfect memory. Consequently, search times cannot exceed a certain maximum, i.e. the required time for one complete inspection of the display. Engel (1976) has shown that if, as assumed by the systematic search model, search is carried out without repeated fixations, the cumulative distribution function is linear. Mean search time with a systematic search strategy would take about half the time obtained with random search (Howard and Bloomfield, 1968).
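The shapes of the two distributions, and the factor of about one half, can be made explicit under a simple idealization. The assumptions here are illustrative and are not taken from the cited studies: the display is divided into N non-overlapping positions, one position is inspected per fixation, the target is equally likely to be at any position, and a fixated target is always detected. K denotes the number of fixations needed to find the target.

```latex
% Random (memoryless) search: positions are sampled with replacement.
\[
P_{\mathrm{random}}(K = k) = \frac{1}{N}\left(1 - \frac{1}{N}\right)^{k-1},
\qquad
P_{\mathrm{random}}(K \le k) = 1 - \left(1 - \frac{1}{N}\right)^{k} \approx 1 - e^{-k/N},
\qquad
E_{\mathrm{random}}[K] = N .
\]

% Systematic search: every position is inspected exactly once.
\[
P_{\mathrm{systematic}}(K = k) = \frac{1}{N} \quad (k = 1, \dots, N),
\qquad
P_{\mathrm{systematic}}(K \le k) = \frac{k}{N},
\qquad
E_{\mathrm{systematic}}[K] = \frac{N + 1}{2} .
\]

% Ratio of the two means.
\[
\frac{E_{\mathrm{systematic}}[K]}{E_{\mathrm{random}}[K]} = \frac{N + 1}{2N} \longrightarrow \frac{1}{2}
\quad \text{as } N \to \infty .
\]
```

With a constant fixation duration, search time is proportional to K, so the cumulative distribution is approximately exponential for random search and linear, with a fixed maximum, for systematic search.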
It has been commonly found that search times are exponentially distributed, with the result that the random search model has generally been accepted as the most appropriate predictor of search performance (Bloomfield, 1972; Engel, 1976; Howard and Bloomfield, 1969; Krendel and Wodinsky, 1960). Megaw and Richardson (1979) also found that the observed cumulative distributions of search times are best fitted by an exponential distribution. Yet, after a more thorough analysis, a better fit was obtained with a mixed model. Morawski, Drury and Karwan (1980) suggested that some combination of both models should be taken into account. They claim that real performance distributions usually lie between the two functions described by the random and the systematic model. A major problem with the interpretation of cumulative distributions of search times concerns the fact that sometimes observers look at a target and, yet, totally fail to see it (Mackworth, Kaplan and Metlay, 1964). This looking without seeing might result in an exponential distribution of search times even when a display is systematically scanned. Memory might be perfect while perception fails. In the same vein, Williams (1966) demonstrated that exponential distributions are obtained when subjects employ a systematic strategy with a detection probability of 0.6 instead of 1.0. According to Williams, a new systematic search starts once the whole display area has been searched without successful target detection. These objections suggest that the question of whether search is random or systematic cannot be simply inferred from cumulative search time distributions. A better approach may be the actual recording of eye movements and, even then, firm conclusions seem precluded as long as one does not know which lobe size subjects actually have in free search.
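The argument that imperfect detection blurs the difference between the two models can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not part of the cited studies: the display is a set of discrete positions, one position is inspected per fixation, and all function and parameter names (`random_search`, `systematic_search`, `n_positions`, `p_detect`) are hypothetical.

```python
import random


def random_search(n_positions, rng):
    """Memoryless search: every fixation lands on a random position,
    so already inspected positions may be fixated again."""
    target = rng.randrange(n_positions)
    fixations = 1
    while rng.randrange(n_positions) != target:
        fixations += 1
    return fixations


def systematic_search(n_positions, p_detect, rng):
    """Systematic search: each position is inspected once per sweep.
    A fixated target is detected with probability p_detect; after an
    unsuccessful sweep a new sweep starts."""
    target = rng.randrange(n_positions)
    fixations = 0
    while True:
        for position in range(n_positions):
            fixations += 1
            if position == target and rng.random() < p_detect:
                return fixations


def simulate(n_positions=50, trials=20000, seed=1):
    """Return the number of fixations per trial for three search models."""
    rng = random.Random(seed)
    return {
        "random": [random_search(n_positions, rng) for _ in range(trials)],
        "systematic_p1.0": [systematic_search(n_positions, 1.0, rng)
                            for _ in range(trials)],
        "systematic_p0.6": [systematic_search(n_positions, 0.6, rng)
                            for _ in range(trials)],
    }


def mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    for model, counts in simulate().items():
        print(model, "mean:", round(mean(counts), 1), "max:", max(counts))
```

With perfect detection the systematic model never needs more than one sweep, so its search times have a hard upper limit. With a detection probability of 0.6 the distribution acquires a long tail across repeated sweeps, which is why the cumulative distribution alone is a weak basis for deciding between the models.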
7 COGNITIVE MODELS
A more cognitive approach towards the notion of scanning strategies stems from Noton and Stark (1971) and Stark and Ellis (1981) (see also Stark, Yamashita, Tharp and Ngo, 1993). According to their scan path theory, eye movements are controlled by cognitive models that are already present in the brain. These models are perceptual hypotheses to be tested against the complex world of sensory experiences. In the checking phase of what appears to be a hierarchical pattern recognition process, scan paths or repetitious sequences of saccades are generated. Actually, most research on scan paths has been done with pictures. The few studies on homogeneous fields have led to only a few systematic observations. Thus, people tend to avoid the edges of a display and, more generally, more fixations occur in the center than in the periphery (Enoch, 1959). The edge effect can be easily overcome through biasing either by instruction or by display structure (Baker, 1958; Yarbus, 1967). It should be added that scan paths are usually hard to analyze and show considerable individual variation. Using pictures, Mackworth and Morandi (1967) found that regions rated as highly informative were more frequently fixated. As discussed previously, a meaningful display structure, providing a schema for search, constitutes a prime example of cognitive control. The effect of familiarity of the display structure on the
scan pattern (Rabbitt, 1981) is another demonstration of the operation of a cognitive model: confronted with a scene, an observer can recognize within a few milliseconds whether or not that scene has been met before. In another experiment, Friedman (1979; see also Antes and Penland, 1981) informed subjects verbally about the theme of a scene that would be presented. Following presentation, unexpected objects were fixated longer than expected objects. This result suggests that the abstract verbal information is a sufficient basis for generating a concrete set of objects on the basis of which a picture is scanned. The extra processing of unexpected objects was interpreted as evidence for bottom-up processing of elements that did not fit the schema. Loftus and Mackworth (1978) also found that unexpected objects tended to be fixated before expected ones. Expectations are also important in the determination of saccade length. Jacobs (1986, 1987) had subjects search for a target (C, c, k, x') inserted in lines of x's. Jacobs analyzed those lines where no target had occurred and found that saccade size depended on the size of the expected spatial visibility limits, i.e. the visual lobe or visual span in Jacobs's terminology, for a particular target. Jacobs (1987) suggested that there is a decisional process at each fixation concerning the presence of a target, ending with a saccade to that target or a shift that is sufficiently outside the visual span to enable an efficient next fixation.
8 OPTIMAL SCANNING AND MONITORING
The most elaborate theoretical analysis of scanning strategies stems undoubtedly from research on scanning discrete displays for relevant signals. The task is usually to detect either a drift of a signal from a predefined optimum or the presence of a signal in some danger zone. Common examples of real-life environments are the cockpit, where several displays inform the pilot about the current state of the aircraft, and the control room, where a human observer has the task of detecting significant deviations from the optimal positions on the display. Some of the earlier applied work in this area involved search experiments in which sequences of fixations, relative fixation frequencies and durations were measured during a real or a simulated flight. This research (Fitts, Jones and Milton, 1950) resulted in a score of recommendations about improving the arrangement of instruments, so as to enable more efficient control. These include the frequency-of-use principle, the sequence-of-use principle and the importance principle (Sanders and McCormick, 1987). The major theoretical contributions, the formulation of which started during the 1950s and 1960s, aimed at describing the operator's strategy in deciding where to look at each given moment. The emphasis was on mathematical modeling, usually derived from prevailing concepts. Several alternatives have been proposed but, unfortunately, empirical evidence is scant (Carbonell, 1966; Kvalseth, 1978; Senders, 1983; Sheridan, 1970; Stein and Wewerinke, 1983). The models have in common that they start from some a priori principle of optimal scanning, which subsequently leads to a prediction about what human operators do or should do, provided that they are rational processors of information along the lines of the model's prescription.
They also share the assumption that visual scanning strategies depend almost completely on variables related to the dynamics of the displays. Practice has the effect that sampling can be controlled by an internal representation of the process dynamics, which, in principle, should correspond closely to the actual values. The first model (Senders, 1955) proposed that the operator attempts a continuous reconstruction of the various displays. In accord with information theory, this requires that each display should be periodically sampled with a frequency of at least twice its bandwidth. To attain perfect reconstruction, a perfect internal model is required. According to Senders (1983), this might be established only after considerable practice. As a test of this model, Senders carried out experiments on scanning four or six instruments. In the four-instrument condition, the observed and predicted values were in reasonable agreement, but in the six-instrument condition the data deviated from the predicted values: low-bandwidth instruments, i.e. relatively slowly changing signals, were sampled more frequently and high-bandwidth instruments, i.e. relatively rapidly changing signals, were sampled less frequently than predicted by the model. The assumption of periodic sampling proved to be untenable. As a post-hoc explanation, Senders suggested that the introduction of two extra signals could have led to short-term forgetting of the signal readings with the consequence that especially low-bandwidth signals must have been inspected more frequently in order to maintain a representation of the actual situation. This oversampling of the low-bandwidth signals could have created an overload situation which resulted in a reduction of time to be allocated to the higher-bandwidth signals. As a main alternative to the information-based reconstruction model, decision-based conditional sampling models were developed during the 1960s. 
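The sampling rule of the reconstruction model, as described above, amounts to a simple calculation: each display needs at least two samples per cycle of its bandwidth. The following is a minimal sketch of that rule only. The function names and the bandwidth values are hypothetical, and the step from sampling rates to fixation shares rests on an added assumption, namely that every fixation takes the same amount of time.

```python
def minimum_sampling_rates(bandwidths_hz):
    """Minimum sampling rate per display: twice its bandwidth (samples/s)."""
    return [2.0 * bandwidth for bandwidth in bandwidths_hz]


def predicted_fixation_shares(bandwidths_hz):
    """Predicted proportion of fixations per display.

    Assumption (not from the source): all fixations last equally long, so
    the share of fixations is proportional to the required sampling rate.
    """
    rates = minimum_sampling_rates(bandwidths_hz)
    total = sum(rates)
    return [rate / total for rate in rates]


if __name__ == "__main__":
    # Hypothetical four-instrument panel, bandwidths in Hz.
    bandwidths = [0.05, 0.1, 0.2, 0.4]
    for bandwidth, share in zip(bandwidths,
                                predicted_fixation_shares(bandwidths)):
        print(f"{bandwidth:.2f} Hz -> share of fixations {share:.3f}")
```

Under this rule the predicted shares depend only on the bandwidths, so the oversampling of slow signals and undersampling of fast signals reported for the six-instrument condition show up as a flatter profile of observed shares than the rule predicts.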
These models suggest that subjects only sample when the risk of missing a critical signal exceeds a certain limit. This means that, in addition to considering bandwidth, observers sample more frequently as the previous reading of a display is closer to the critical zone. In the case of a 'safe' reading, the next reading can be postponed until the previous reading is fully dated, i.e. when the autocorrelation function approaches zero. A main advantage of conditional sampling models is their greater intuitive plausibility: in the reconstruction notion, subjects play simply the passive role of mechanical transmitters of information. In the conditional models, subjects maintain a decision criterion with respect to their strategy. This means that they behave on the basis of both motivational and cognitive factors. It is obvious that in this way both models beautifully reflect the dominant traditions of information and decision theory as formulated in the 1950s and 1960s. Senders (1983) has described several variants of the conditional sampling model, which are all concerned with setting a strategic risk criterion. In one case, subjects only sample when the probability that a signal has entered the critical zone is maximal. This means that the aim is not primarily to detect the moment of entering, but rather whether a signal has actually entered the danger zone. In another version the accepted risk is defined by a threshold, while in still another version the accepted risk varies as a function of the distance of the signal from the danger zone. The variants obviously affect the predicted frequencies of observing, which are smaller when subjects accept the possibility of detecting an error post factum. Yet, the predicted frequencies of observation are always more than in the reconstruction
model. This means that the observed undersampling is even more pronounced, and at odds, therefore, with the predictions of the model. Conditional sampling is not the only decision-theoretical prescription for scanning discrete displays. For example, Carbonell (1966) proposed a queueing model in which signal sources queue for inspection with the most urgent cases as first in line. Since displays are assumed to be only serially monitored, there is always a probability of missing a signal. This requires the observer to make a rational decision about where to sample next, so as to minimize risk. The optimal strategy is to sample at each moment the display with the highest expected costs of a failure. The model assumes a quick calculation of the costs, connected to each display, before deciding where to fixate next. Sheridan (1970) has also proposed a model with the cost-benefit matrix of observations and actions as main determinant. As is the case in most other models, Sheridan assumes that the controller calls upon internally represented distributions of the processes. In an experimental study, Sheridan and Rouse (1971) found smaller fixation intervals than predicted, and suggested two alternative post-hoc explanations, one cognitive and one motivational. First, subjects could have a deviant (suboptimal or incomplete) internal representation of the processes, which might be repaired by additional practice. Second, subjects might suffer from risk aversion, and not act in accordance with the presumed strategy of risk minimization. There are additional normative formulations along either decision-theoretical or information-theoretical lines (Gai and Curry, 1976; Kvalseth, 1978; Stein and Wewerinke, 1983). They differ from each other, and from the earlier discussed models, with respect to the emphasis on varying aspects of the task. However, all models have in common that they rely largely on external factors, or their internal representation, as the main principle governing search. 
Even with respect to decision criteria, the models start from the assumption that subjects adhere to externally prescribed costs and benefits. The determining factors include: (a) the rate at which a process generates external uncertainty; (b) the permissible error, or the accuracy with which the current value of a function should be read; (c) the limits which a process should not exceed; (d) the probability that a source shows a critical value, while the observer is attending another source; and, finally, (e) the pay-off matrix associated with missed critical events and the costs of making observations. The main problem for the normative models is that empirical evidence is not only scant, but also in conflict with the predictions of the models. This became clear with respect to Senders' models, but it also applies to Sheridan's model (general oversampling) and to Kvalseth (1979). The major reasons underlying these experimental failures may well relate to the assumptions of perfect memory, perfectly veridical and independent mental representations of the processes, perfect calculation of costs and benefits, and independent control of the displays. In an analogy to research on decision making (Hogarth, 1990), one might still defend the position that, in principle, the models provide a correct description of how humans search, but that the human elements are imperfect: they show forgetting (Moray, Richard and Low, 1980), have an imperfect representation, or fail to make the correct calculations (Sheridan and Rouse, 1971). The much more drastic alternative is that human search strategies depend on principles that basically differ from statistical prescriptions. For instance, van Delft (1987) has suggested that, rather than considering representations of the individual processes, humans might simply search displays on the basis of some fixed sequences, perhaps modulated by relations between
bandwidths. In support of this view, van Delft found that the frequency of fixating an individual display depended on whether the other displays were relatively slow or fast. The rate of search might depend on the highest bandwidth, so that a medium-bandwidth signal would be inspected more often when it is the slowest than when it is the fastest of the set. Recently, Donk (1994a, b) and Donk and Hagemeister (1994) carried out a series of studies aimed at testing some of the basic assumptions of conditional sampling. In one study (Donk and Hagemeister, 1994), subjects had a self-paced card sorting task in combination with monitoring a display for the occurrence of a critical reading. The rationale of this study was to test whether deviations from optimal inspection rate might be due to failures of memory, in the case of a slow signal, or of time estimation, in the case of a fast signal. This was done by varying selective memory interference through the similarity between the cards and the display, and by varying information load in card sorting, which is known to affect time estimates. The usual phenomenon of oversampling slow signals and undersampling fast signals was found. However, the variations in memory interference and information load did not affect the sampling rate, which casts some doubt on explanations in terms of failures of memory and time estimates. In the same study, it was found that for slow signals sampling was conditional on the distance from the signal to the critical zone at the previous reading, but this variable did not affect sampling fast signals. This was confirmed in another study (Donk, 1994b) where subjects monitored one display in combination with a tracking task. The results suggest that there is no homogeneous search strategy. Instead, sampling might be so frequent for the high-bandwidth signals that subjects no longer bother about the position of the last reading. 
In contrast, that position might be relevant for slow signals in view of the longer intervals between readings. The results demonstrated that signal dynamics play at least some role in shaping a subject's inspection strategy. On the other hand, when subjects monitored four or six displays at the same time, Donk (1994a) also found that, irrespective of bandwidth, horizontal saccades were more frequent than diagonal ones. A contribution of a perceptual variable relating to the arrangement of the displays may not be surprising, since it is consistent with reading habits. Yet, it is outside the realm of the normative models, which start from rationally controlled internal representations of the outside world rather than from immediate control by outside stimuli or from internal rules that are independent of the environment.
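The contrast between normative prescription and observed sampling can be made concrete with a small sketch. It assumes, purely for illustration, a Nyquist-type rule of the kind usually associated with Senders' sampling model, in which a signal of bandwidth W Hz should be sampled 2W times per second; the rule as coded here, the tolerance and all bandwidths and observed rates are hypothetical and are not data from the studies cited above.

```python
from dataclasses import dataclass


@dataclass
class Display:
    name: str
    bandwidth_hz: float      # signal bandwidth W (hypothetical value)
    observed_rate_hz: float  # observed fixations per second (hypothetical value)


def prescribed_rate(bandwidth_hz: float) -> float:
    """Assumed normative rule: sample a signal of bandwidth W at 2W per second."""
    return 2.0 * bandwidth_hz


def classify(display: Display, tolerance: float = 0.10) -> str:
    """Compare the observed sampling rate with the assumed normative rate."""
    ratio = display.observed_rate_hz / prescribed_rate(display.bandwidth_hz)
    if ratio > 1.0 + tolerance:
        return "oversampled"
    if ratio < 1.0 - tolerance:
        return "undersampled"
    return "near-normative"


# Hypothetical observer: samples the slow display far too often and the
# fast display too rarely, the pattern described in the text.
displays = [
    Display("slow", 0.05, 0.30),
    Display("medium", 0.20, 0.42),
    Display("fast", 0.60, 0.80),
]

for d in displays:
    print(f"{d.name}: prescribed {prescribed_rate(d.bandwidth_hz):.2f}/s, "
          f"observed {d.observed_rate_hz:.2f}/s -> {classify(d)}")
```

A sketch of this kind only labels deviations from the prescription; it says nothing about whether they stem from memory failures, time estimation, or arrangement-dependent scanning.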
9
SEARCH AND STRESS
It is a classic hypothesis that search strategies may change when people suffer from environmental or organic stresses such as prolonged performance, noise, drugs, sleep loss, or overload in dual-task performance. In particular, some stresses would lead to funneling and others to widening of attention. (Footnote 2: Mackworth's (1965) reference to tunnel vision as a consequence of visual noise is not included in this discussion, since these results can be readily explained by lateral interference.) This hypothesis was explicitly stated by Drew (1940), while funneling was also mentioned by Bartlett (1953) as a promising criterion for fatigue. On closer inspection, however, the literature is far from consistent, which is at least partly due again to wide variations
A. F. Sanders and M. Donk
in search conditions. For example, the edge effect (see the section on 'Cognitive models') did not appear to change as a function of time-on-task, which seems to argue against funneling caused by fatigue (Baker, 1958; Colquhoun and Edwards, 1970; Schoonaard, Gould and Miller, 1973). On the other hand, Broadbent (1950) found in his 'twenty dials' test a tendency towards a greater neglect of peripheral signals as time progressed, while Sanders (1963) also reported that subjects inspected the periphery more frequently in the beginning than toward the end of a long work spell. In both cases the display constituted a head field, so that these instances of funneling might be limited to cases where the task asks for frequent head movements. Sanders and Reitsma (1982a, b) also found more pronounced effects of sleep loss on processing extremely peripheral signals in a study on visual orienting and in one on the functional visual field. In the latter case, processing time increased as a function of time-on-task when the display constituted a head field but not when it constituted an eye field. However, these last results should be interpreted with care, since Sanders and Reitsma also found a stronger effect of sleep loss when the signals were successively presented without a need to shift the head. Hence, in this experiment, the effects of sleep loss may be on integrating successive percepts rather than on the size of the visual field. Effects of funneling have also been studied in dual-task studies, where subjects performed a combination of a central task (e.g. pursuit tracking or target classification) and a peripheral signal detection task. For instance, Michon and Kirk (1962) found that detection of peripheral signals depended on the load of the central task, and that eye fixations in the periphery were significantly less frequent towards the end of a work spell.
In a classic study by Bursill (1958) the percentage of missed peripheral signals increased as a function of time-on-task, and in particular under hot and humid conditions. Similarly, Bahrick, Fitts and Rankin (1952) found that incentives improved central tracking performance at the cost of peripheral detection. In further studies, Hockey (1970a, b) found that loud noise, supposedly producing high arousal, had the effect of narrowing attention to the central tracking task, while sleep loss, producing low arousal, had the effect of leveling attentional selectivity. This is of course the opposite of the early suggestion that funneling would occur when one is fatigued. Again, Hockey (1970c) found evidence that his earlier results should not be interpreted as evidence for visual funneling but rather for a stronger or weaker bias towards one task. Finally, it should be noted that the Hockey results are not easily replicable (Forster and Grierson, 1978; Loeb and Jones, 1978) and have found little follow-up in recent years. Indeed, Hockey and Hamilton's (1983) comment that replicability might be hard in view of the strategic nature of the changes, representing a delicate balance of task priorities, does not inspire a more detailed inquiry into the matter. In all these studies eye movements were allowed. This was not the case in the work of Abernethy and Leibowitz (1971) and of Leibowitz and Appelle (1969) who found raised peripheral luminance thresholds when subjects performed a central task. In the same way, Ikeda and Takeuchi (1975) reported a decrease in the accuracy of localizing thresholds as foveal load increased. However, Holmes, Cohen, Haith and Morrison (1977) warned against an interpretation of such results in terms of funneling. They could equally well be a matter of general interference in that allocating attention to the periphery is less possible, as the central task needs more attentional resources and has priority. 
Funneling would require a more pronounced effect of dual-task load on more peripheral signals, while general
interference would predict little dependence on visual angle, which is what the authors found. In contrast, Williams (1989) found a stronger effect on more peripheral stimuli with a combination of a central memory search task and identification of a parafoveal (maximally 5°) target during a brief presentation. However, the fact that he used only parafoveal stimuli casts some doubt on the interpretation, since funneling is commonly understood with respect to the further periphery. In a recent study, Van de Weijgert (1991, 1993) investigated the effect of foveal load on identifying peripheral stimuli with wider visual angles. She found an almost equal effect of foveal load on peripheral identification at different angles, suggesting suppression of allocating attention rather than funneling.
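The logic that separates funneling from general interference can be sketched as a comparison of load effects across eccentricities: funneling predicts that the cost of foveal load grows with visual angle, whereas general interference predicts a roughly constant cost. The sketch below is illustrative only; the accuracy values, the function names and the slope criterion `min_slope` are hypothetical, and a real analysis would test the load by eccentricity interaction statistically rather than against an arbitrary threshold.

```python
def load_effects(low_load, high_load):
    """Cost of foveal load (accuracy drop) at each eccentricity in degrees."""
    return {ecc: low_load[ecc] - high_load[ecc] for ecc in sorted(low_load)}


def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def interpret(low_load, high_load, min_slope=0.002):
    """Label the pattern; min_slope is an arbitrary illustrative criterion."""
    effects = load_effects(low_load, high_load)
    growth = slope(list(effects), list(effects.values()))
    return "funneling" if growth > min_slope else "general interference"


# Hypothetical proportions of correctly identified peripheral targets.
low = {10: 0.95, 20: 0.93, 40: 0.90}
high_funnel = {10: 0.90, 20: 0.82, 40: 0.70}   # load cost grows with angle
high_uniform = {10: 0.85, 20: 0.83, 40: 0.80}  # load cost is about constant

print(interpret(low, high_funnel))
print(interpret(low, high_uniform))
```

On this reading, the near-equal load effects across angles reported in the text would fall under the second pattern.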
10
DISCUSSION
Visual search behavior has been described at both a structural and a functional level. While the structural approach is primarily concerned with 'what is processed during visual search', the functional approach is mainly interested in 'the determinants of visual search behavior and the resulting visual search pattern'. Each approach has its prospects and limits. A structural approach might be characterized as atomistic: sensory, anatomic and information processing limits are described without too much emphasis on their mutual interplay. This means that various search limits have been investigated in relative isolation. On the other hand, each theory, within its own confined area, enables accurate predictions and provides a rather elaborate account of factors influencing visual search. A functional approach is more holistic. Each theory within this tradition is concerned with the search process as a whole. In contrast to the structural approach, functional theories might not be considered as mutually complementary. Generally, search behavior has been described on a cognitive and a mathematical level. The major limit of a cognitive approach (see section on 'Cognitive models') consists of a lack of methods to test its premises appropriately. Neither schema nor scan path theory predicts eye movement behavior. Each merely assumes a large variety of scan patterns, each of which reflects the operation of some cognitive model of the outside world. A cognitive model concerning scanning strategies certainly has a large external validity, but it lacks predictive power. A normative mathematical approach (see section on 'Optimal scanning and monitoring'), on the other hand, precisely predicts search or sampling behavior of the human observer. The main problem, however, is that empirical evidence deviates from the predictions of the models. Normative mathematical models neither explain nor describe human search behavior; they merely prescribe behavior.
Functional approaches toward visual search should start from the accumulating knowledge about structural limits in perception and information processing. A large amount of knowledge is already available about structural constraints in search. A major task for future research consists of developing strategic functional principles, which follow from the structural constraints. Admittedly, this sounds like adhering to a largely bottom-up approach. However, it remains to be seen to what extent strategic principles are top-down determined or, in other words, are related to structural constraints.
REFERENCES
Abernethy, C. N. and Leibowitz, H. W. (1971). The effect of feedback on luminance thresholds for peripherally presented stimuli. Perception and Psychophysics, 10, 172-174.
Allport, A. (1987). Selection for action: Some behavioral and neurophysiological considerations of attention and action. In H. Heuer and A. F. Sanders (Eds), Perspectives on Perception and Action. Hillsdale, NJ: Erlbaum.
Antes, J. R. and Penland, J. G. (1981). Picture context effects on eye movements. In D. F. Fisher, R. A. Monty and J. W. Senders (Eds), Eye Movements: Cognition and Visual Perception. Hillsdale, NJ: Erlbaum.
Bahrick, H. P., Fitts, P. M. and Rankin, R. E. (1952). Effect of incentives upon reactions to peripheral stimuli. Journal of Experimental Psychology, 44, 400-416.
Baker, C. H. (1958). Attention to visual displays during a vigilance task: Biasing attention. British Journal of Psychology, 49, 279-288.
Banks, W. P. and Prinzmetal, W. (1976). Configurational effects in visual information processing. Perception and Psychophysics, 19, 361-367.
Barbur, J. L., Forsyth, P. M. and Woodings, D. S. (1993). Eye movements and search performance. In D. Brogan, A. Gale and K. Carr (Eds), Visual Search II (pp. 253-264). London: Taylor and Francis.
Bartlett, F. C. (1953). Psychological criteria of fatigue. In W. F. Floyd and A. T. Welford (Eds), Symposium on Fatigue. London: H. K. Lewis.
Bellamy, L. J. (1984). The application of visual lobe measurement to visual perception. In A. G. Gale and F. Johnson (Eds), Theoretical and Applied Aspects of Eye Movement Research. Amsterdam: Elsevier.
Bellamy, L. J. and Courtney, A. J. (1981). Development of a search task for the measurement of visual acuity. Ergonomics, 24, 497-509.
Biederman, I. (1972). Perceiving real-world scenes. Science, 177, 77-80.
Biederman, I., Glass, A. L. and Stacey, E. W. (1973). Searching for objects in real-world scenes. Journal of Experimental Psychology, 97, 22-27.
Bloomfield, J. R. (1972). Visual search in complex fields: Size differences between target discs and surrounding disks. Human Factors, 14, 139-148.
Bloomfield, J. R. and Howarth, C. I. (1969). Testing visual search theory. In H. W. Leibowitz (Ed.), Image Evaluation. Proceedings of NATO Advisory Group on Human Factors.
Boer, L. C. and van de Weijgert, E. C. M. (1988). Eye movements and stages of processing. Acta Psychologica, 67, 3-18.
Bouma, H. (1978). Visual search in reading: Eye movements and the functional visual field. In J. Requin (Ed.), Attention and Performance, vol. 7. Hillsdale, NJ: Erlbaum.
Boynton, R. M., Elsworth, C. and Palmer, R. (1958). Laboratory studies pertaining to visual air reconnaissance. Part 3. Technical Report 55-304, AML-WADC, Wright-Patterson, Ohio.
Brand, J. (1971). Classification without identification in visual search. Quarterly Journal of Experimental Psychology, 23, 178-186.
Broadbent, D. E. (1950). The twenty-dials test under quiet conditions. APU Report.
Broadbent, D. E. (1958). Perception and Communication. London: Pergamon.
Broadbent, D. E. (1971). Decision and Stress. London: Academic Press.
Broadbent, D. E. (1982). Task combination and selective intake of information. Acta Psychologica, 50, 253-290.
Bursill, A. E. (1958). The restriction of peripheral vision during exposure to hot and humid conditions. Quarterly Journal of Experimental Psychology, 10, 113-130.
Cahill, M. C. and Carter, R. C. (1976). Color code size for searching displays of different density. Human Factors, 18, 273-280.
Carbonell, J. R. (1966). A queuing model of visual sampling: Experimental validation. IEEE Transactions on Man Machine Systems, MMS-9, 82-87.
Cavanagh, J. P. and Chase, W. G. (1971). The equivalence of target and non-target processing in visual search. Perception and Psychophysics, 9, 493-495.
Chaikin, J. D., Corbin, H. H. and Volkmann, J. (1962). Mapping a field of short-time visual search. Science, 138, 1327-1328.
Chase, W. G. (1986). Visual information processing. In K. R. Boff, L. Kaufman and J. P. Thomas (Eds), Handbook of Perception and Human Performance, Vol. 2, chapter 28. New York: Wiley.
Clare, J. N. and Sinclair, M. A. (1979). Search and the Human Observer. London: Taylor and Francis.
Colquhoun, P. and Edwards, J. A. (1970). Practice effects on a visual vigilance task with and without search. Human Factors, 12, 537-546.
Corbin, H., Carter, J., Reese, E. P. and Volkmann, J. (1958). Experiments on Visual Search 1956-1957. Psychological Research Unit, Mount Holyoke College.
Corcoran, D. W. and Jackson, A. (1979). Flexibility in the choice of distinctive features in visual search with random cue blocked designs. Perception, 6, 629-633.
Courtney, A. J. (1984). A search task to assess visual lobe size. Human Factors, 23, 289-298.
Courtney, A. J. and Chan, H. S. (1985). Simple measures of visual lobe size and search performance. Ergonomics, 28, 1319-1332.
Courtney, A. J. and Chan, H. S. (1986). Visual lobe dimensions and search performance for targets on a competing homogeneous background. Perception and Psychophysics, 40, 39-44.
Crovitz, H. F. and Davies, W. (1962). Tendencies to eye movement and perceptual accuracy. Journal of Experimental Psychology, 63, 495-498.
Deutsch, J. A. and Deutsch, D. (1963). Attention: Some theoretical considerations. Psychological Review, 70, 80-90.
Donk, M. (1994a). Human monitoring behavior in a multiple instrument setting: Independent sampling, sequential sampling or arrangement-dependent sampling. Acta Psychologica, 86, 31-55.
Donk, M. (1994b). The effect of secondary task load on visual sampling behavior. Ergonomics, 37, 1089-1096.
Donk, M. and Hagemeister, C. (1994). Visual instrument monitoring as affected by simultaneous self-paced card sorting. IEEE Transactions on Systems, Man and Cybernetics, 24, 926-931.
Drew, G. C. (1940). Mental Fatigue. Flying Personnel Research Committee, no. 227.
Duncan, J. (1980). The locus of interference in the perception of simultaneous stimuli. Psychological Review, 87(3), 272-300.
Duncan, J. (1983). Category effects in visual search: A failure to replicate the oh-zero phenomenon. Perception and Psychophysics, 34, 221-232.
Duncan, J. and Humphreys, G. W. (1989). Visual search and stimulus similarity. Psychological Review, 96(3), 433-458.
Ellis, S. H. and Chase, W. G. (1971). Parallel processing in item recognition. Perception and Psychophysics, 10, 379-384.
Engel, F. L. (1971). Visual conspicuity, directed attention, and retinal locus. Vision Research, 11, 563-576.
Engel, F. L. (1976). Visual conspicuity as an external determinant of eye movements and selective attention. Thesis. TH Eindhoven.
Engel, F. L. (1977). Visual conspicuity, visual search and fixation tendencies of the eye. Vision Research, 17, 95-108.
Enoch, J. M. (1959). Effect of the size of a complex display upon visual search. Journal of the Optical Society of America, 49, 280-286.
Erickson, R. A. (1964). Relation between visual search time and peripheral visual acuity. Human Factors, 6, 165-178.
Eriksen, B. A. and Eriksen, C. W. (1974). Effects of noise letters on the identification of a target letter in a non-search task. Perception and Psychophysics, 16, 143-149.
Eriksen, C. W. (1990). Attentional search of the visual field. In D. Brogan (Ed.), Visual Search (pp. 3-19). London: Taylor and Francis.
Eriksen, C. W. and Hoffman, J. E. (1972). Some characteristics of selective attention in visual perception determined by vocal reaction time. Perception and Psychophysics, 11, 169-171.
Eriksen, C. W. and Hoffman, J. E. (1973). The extent of processing of noise elements during encoding of visual displays. Perception and Psychophysics, 14, 155-160.
Farmer, E. W. and Taylor, R. M. (1980). Visual search through color displays: Effects of target-background similarity and background uniformity. Perception and Psychophysics, 27, 267-272.
Farrell, B. (1985). 'Same-different' judgments: A review of current controversies in perceptual comparisons. Psychological Bulletin, 98, 419-456.
Findlay, J. M. (1980). The visual stimulus for saccadic eye movements in human observers. Perception, 9, 7-21.
Fischer, B. and Breitmeyer, B. (1987). Mechanisms of visual attention revealed by saccadic eye movements. Neuropsychologia, 25, 73-83.
Fitts, P. M., Jones, R. E. and Milton, J. L. (1950). Eye movements of aircraft pilots during instrument landing approaches. Aeronautical Engineering Review, 9, 1-5.
Forster, P. M. and Grierson, A. T. (1978). Noise and attentional selectivity: A reproducible phenomenon? British Journal of Psychology, 69, 482-498.
Francolini, C. M. and Egeth, H. A. (1979). Perceptual selectivity is task-dependent: The pop-out effect poops out. Perception and Psychophysics, 25, 99-110.
Friedman, A. (1979). Framing pictures: The role of knowledge in automatized encoding and memory for gist. Journal of Experimental Psychology: General, 108, 316-355.
Gai, E. G. and Curry, R. E. (1976). A model of the human observer in failure detection tasks. IEEE Transactions on Systems, Man, and Cybernetics, SMC-6, 85-91.
Gould, J. D. (1967). Pattern recognition and eye movement parameters. Perception and Psychophysics, 6, 311-320.
Gould, J. D. and Dill, A. (1969). Eye movement patterns and pattern recognition. Perception and Psychophysics, 6, 311-320.
Green, B. F. and Anderson, L. K. (1956). Color coding in a visual search task. Journal of Experimental Psychology, 51, 19-24.
Hansen, W. and Sanders, A. F. (1988). On the output of encoding during stimulus fixation. Acta Psychologica, 69, 95-107.
Hendersen, L. and Chard, J. (1971). Semantic effects in visual word detection with visual similarity controlled. Perception and Psychophysics, 23, 290-298.
Henderson, J. M. (1993). Visual attention and saccadic eye movements. In G. d'Ydewalle and J. Van Rensbergen (Eds), Perception and Cognition (pp. 37-50). Amsterdam: Elsevier Science.
Hockey, G. R. J. (1970a). Effect of loud noise on attentional selectivity. Quarterly Journal of Experimental Psychology, 22, 28-36.
Hockey, G. R. J. (1970b). Signal probability and spatial location as possible bases for increased selectivity in noise. Quarterly Journal of Experimental Psychology, 22, 37-42.
Hockey, G. R. J. (1970c). Changes in attention allocation in a multicomponent task under loss of sleep. British Journal of Psychology, 61, 473-480.
Hockey, G. R. J. and Hamilton, P. (1983). The cognitive patterning of stress-states. In G. R. J. Hockey (Ed.), Stress and Fatigue in Human Performance. New York: Wiley.
Hogarth, R. M. (1990). Judgment and Choice. New York: Wiley.
Holmes, D. L., Cohen, K. M., Haith, M. M. and Morrison, F. J. (1977). Peripheral visual processing. Perception and Psychophysics, 22, 571-577.
Houtmans, M. J. M. and Sanders, A. F. (1984). Perception of signals presented in the periphery of the visual field. Acta Psychologica, 55, 143-155.
Howarth, C. I. and Bloomfield, J. R. (1968). Towards a theory of visual search. In AGARD Conference Proceedings (no. 41), Brussels.
Howarth, C. I. and Bloomfield, J. R. (1969). A rational equation for predicting search times in simple inspection tasks. Psychonomic Science, 17, 225-226.
Hughes, P. K. and Cole, B. L. (1986). Can the conspicuity of objects be predicted from laboratory experiments? Ergonomics, 29, 1097-1111.
Humphreys, G. W. and Müller, H. J. (1993). Search via Recursive Rejection (SERR): A connectionist model of visual search. Cognitive Psychology, 25, 43-110.
Humphreys, G. W., Quinlan, P. T. and Riddoch, M. J. (1989). Grouping processes in visual search: Effects with single- and combined-feature targets. Journal of Experimental Psychology: General, 118, 258-279.
Humphreys, G. W., Riddoch, M. J. and Quinlan, P. T. (1985). Interactive processes in perceptual organisation: Evidence from visual agnosia. In M. I. Posner and O. S. M. Marin (Eds), Attention and Performance 11. Hillsdale, NJ: Erlbaum.
Ikeda, M. and Takeuchi, T. (1975). Influence of foveal load on the functional visual field. Perception and Psychophysics, 18, 255-260.
Jacobs, A. M. (1986). Eye movement control in visual search: How direct is visual span control? Perception and Psychophysics, 39, 47-58.
Jacobs, A. M. (1987). Towards a model of eye movement control in visual search. In J. K. O'Regan and A. Levy-Schoen (Eds), Eye Movements: From Physiology to Cognition. Amsterdam: North Holland.
Johnston, D. M. (1965). Search performance as a function of peripheral acuity. Human Factors, 7, 528-535.
Jonides, J. and Gleitman, H. (1972). A conceptual category effect in visual search: O as a letter or a digit. Perception and Psychophysics, 12, 457-460.
Jordan, T. C. and Rabbitt, P. M. (1977). Response times to stimuli of increasing complexity as a function of ageing. British Journal of Psychology, 68, 289-301.
Kahneman, D. and Treisman, A. M. (1984). Changing views of attention and automaticity. In R. Parasuraman and R. Davies (Eds), Varieties of Attention. New York: Academic Press.
Klein, R. and Farrell, M. (1989). Search performance without eye movements. Perception and Psychophysics, 46, 476-482.
Kraiss, K. F. and Knaeeper, A. (1982). Using visual lobe area measurements to predict visual search performance. Human Factors, 24, 673-682.
Krendel, E. S. and Wodinsky, J. (1960). Visual search in an unstructured visual field. Journal of the Optical Society of America, 50, 562-568.
Kvalseth, T. (1978). Human and bayesian information processing during probabilistic inference tasks. IEEE Transactions on Systems, Man and Cybernetics, SMC-8, 224-229.
Kvalseth, T. (1979). A decision theoretic model of the sampling behavior of the human process monitor. Ergonomics, 21, 671-686.
Leibowitz, H. W. and Appelle, S. (1969). The effect of a central task on luminance thresholds for peripherally presented stimuli. Human Factors, 11, 387-392.
Levy-Schoen, A. (1974). Le champ d'activité du regard: Données expérimentales. L'Année Psychologique, 74, 43-66.
Loeb, M. and Jones, P. D. (1978). Noise exposure, monitoring and tracking performance as a function of signal bias and task priority. Ergonomics, 21, 265-272.
Loftus, G. R. and Mackworth, N. H. (1978). Cognitive determinants of fixation location during picture viewing. Journal of Experimental Psychology: Human Perception and Performance, 4, 565-572.
Los, S. A. (1994). Procedural differences in processing intact and degraded stimuli. Memory and Cognition, 22, 145-156.
Mackworth, N. H. (1965). Visual noise causes tunnel vision. Psychonomic Science, 3, 67-68.
Mackworth, N. H. (1976). Stimulus density limits in the useful field of view. In R. A. Monty and J. W. Senders (Eds), Eye Movements and Psychological Processes. Hillsdale, NJ: Erlbaum.
Mackworth, N. H. (1981). Stimulus density limits the useful field of view. In D. F. Fisher, R. A. Monty and J. W. Senders (Eds), Cognition and Visual Perception. Hillsdale, NJ: Erlbaum.
Mackworth, N. H., Kaplan, I. T. and Metlay, W. (1964). Eye movements during vigilance. Perceptual and Motor Skills, 20, 549-554.
Mackworth, N. H. and Morandi, A. J. (1967). The gaze selects informative details within pictures. Perception and Psychophysics, 2, 547-552.
Marken, R. and Patterson, J. (1979). Effects of sequence and variety of irrelevant items in visual search. Perceptual and Motor Skills, 49, 315-318.
Matin, E. (1974). Saccadic suppression: A review and an analysis. Psychological Bulletin, 81, 899-917.
Mayfrank, L., Mobashery, M., Kimmig, H. and Fischer, B. (1986). The role of fixation and visual attention on the occurrence of express saccades in man. European Journal of Psychiatry and Neurological Science, 235, 269-275.
Megaw, E. D. and Richardson, J. (1979). Target uncertainty and visual scanning strategies. Human Factors, 21, 303-315.
Metzger, W. (1954). Gesetze des Sehens. Darmstadt: Huber.
Michon, J. A. and Kirk, N. S. (1962). Eye movements in radar watchkeeping. IZF-report 17.
Miller, J. O. (1988). Discrete and continuous models of human information processing: Theoretical distinctions and empirical results. Acta Psychologica, 67, 191-257.
Monk, T. H. (1984). Search. In J. S. Warm (Ed.), Sustained Attention in Human Performance. New York: Wiley.
Moraglia, G. (1989). Display organisation and the detection of horizontal line segments. Perception and Psychophysics, 45, 265-272.
Moraglia, G., Maloney, K. P., Fekete, E. M. and Albasi, K. (1989). Visual search along the color dimension. Canadian Journal of Psychology, 43, 1-12.
Morawski, T. B., Drury, C. G. and Karwan, M. H. (1980). Predicting search performance for multiple targets. Human Factors, 22, 707-718.
Moray, N., Richards, M. and Low, J. (1980). The Behaviour of the Fighter Controllers, Technical Report. London: Ministry of Defence.
Nattkemper, D. and Prinz, W. (1984). Costs and benefits of redundancy in visual search. In A. G. Gale and F. Johnson (Eds), Theoretical and Applied Aspects of Eye Movement Research. Amsterdam: North Holland.
Nattkemper, D. and Prinz, W. (1990). Local and global amplitude and fixation duration in continuous visual search. In R. Groner, G. d'Ydewalle and R. Parham (Eds), From Eye to Mind: Information Acquisition in Perception, Search and Reading. Amsterdam: North Holland.
Neisser, U. (1963). Decision time without reaction time: Experiments on visual scanning. American Journal of Psychology, 76, 376-395.
Neisser, U. (1967). Cognitive Psychology. New York: Appleton-Century-Crofts.
Neumann, O. (1987). Beyond capacity: A functional view of attention. In H. Heuer and A. F. Sanders (Eds), Perspectives on Perception and Action. Hillsdale, NJ: Erlbaum.
Nissen, M. J., Posner, M. I. and Snyder, C. R. R. (1978). Relationships between attention shifts and saccadic eye movements. Paper to the Psychonomic Society.
Noble, M. E. and Sanders, A. F. (1981). Searching for traffic signals while engaged in compensatory tracking. Human Factors, 22, 89-102.
Noton, D. and Stark, L. (1971). Eye movements and visual perception. Scientific American, 224, 34-43.
Parasuraman, R. and Davies, D. R. (1984). Varieties of Attention. Orlando, FL: Academic Press.
Posner, M. I. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology, 32, 3-25.
Posner, M. I. and Boies, S. J. (1971). Components of attention. Psychological Review, 78, 391-408.
Posner, M. I., Nissen, M. J. and Ogden, W. C. (1978). Attended and unattended processing modes: The role of set for spatial location. In H. L. Pick and J. J. Salzman (Eds), Modes of Perceiving and Processing Information. Hillsdale, NJ: Erlbaum.
Posner, M. I. and Snyder, C. R. R. (1975). Facilitation and inhibition in the processing of signals. In P. M. A. Rabbitt and S. Dornic (Eds), Attention and Performance 5. New York: Academic Press.
Posner, M. I., Snyder, R. R. and Davidson, D. J. (1980). Attention and the detection of signals. Journal of Experimental Psychology: General, 109, 160-174.
Prinz, W. (1979). Integration of information in visual search. Quarterly Journal of Experimental Psychology, 31, 287-304.
Prinz, W. (1983). Asymmetrical control areas in continuous visual search. In R. Groner, C. Menz, D. F. Fisher and R. A. Monty (Eds), Eye Movements and Psychological Functions: International Views. Hillsdale, NJ: Erlbaum.
Prinz, W. (1984). Attention and sensitivity in visual search. Psychological Research, 45, 355-366.
Prinz, W. (1987). Continuous selection. Psychological Research, 48, 231-238.
Prinz, W. and Kehrer, L. (1982). Recording detection distances in continuous visual search. In R. Groner and P. Fraisse (Eds), Cognition and Eye Movements. Amsterdam: North Holland.
Prinz, W., Meinecke, C. and Hielscher, M. (1987). Effects of stimulus degradation on category search. Acta Psychologica, 64, 187-206.
Prinz, W. and Nattkemper, D. (1986). Effects of secondary task on search performance. Psychological Research, 48, 47-52.
Prinz, W., Tweer, R. and Feige, R. (1974). Context control of search behaviour: Evidence from a hurdling technique. Acta Psychologica, 38, 72-80.
Rabbitt, P. M. A. (1964). Ignoring irrelevant information. British Journal of Psychology, 55, 403-414.
Rabbitt, P. M. A. (1967). Learning to ignore irrelevant information. American Journal of Psychology, 80, 1-13.
Rabbitt, P. M. A. (1981). Visual selective attention. In C. R. Puff (Ed.), Handbook of Research Methods in Human Memory and Cognition. New York: Academic Press.
Rabbitt, P. M. A. (1984). The control of attention in visual search. In R. Parasuraman and D. R. Davies (Eds), Varieties of Attention. Orlando, FL: Academic Press.
Rabbitt, P. M. A., Cumming, G. and Vyas, S. M. (1977). Modulation of selective attention by sequential effects in visual search tasks. Quarterly Journal of Experimental Psychology, 31, 305-317.
Remington, R. W. (1980). Attention and saccadic eye movements. Journal of Experimental Psychology: Human Perception and Performance, 6, 726-744.
Sanders, A. F. (1963). The Selective Process in the Functional Visual Field. Assen: van Gorcum.
Sanders, A. F. (1970). Some aspects of the selective process in the functional visual field. Ergonomics, 13, 101-117.
Sanders, A. F. (1990). Issues and trends in the debate on discrete vs. continuous processing of information. Acta Psychologica, 74, 123-167.
Sanders, A. F. (1993). Processing information in the functional visual field. In G. d'Ydewalle and J. van Rensbergen (Eds), Perception and Cognition (pp. 3-22). Amsterdam: Elsevier Science.
Sanders, A. F. and Brück, R. (1991). The effect of presentation time on the size of the visual lobe. Bulletin of the Psychonomic Society, 29, 206-208.
Sanders, A. F. and Houtmans, M. J. M. (1984). The functional visual field revisited. In A. J. van Doorn, W. A. van der Grind and J. J. Koenderink (Eds), Limits in Perception. Utrecht: VNU Science.
Sanders, A. F. and Houtmans, M. J. M. (1985a). Perceptual processing modes in the functional visual field. Acta Psychologica, 58, 251-262.
Sanders, A. F. and Houtmans, M. J. M. (1985b). There is no central stimulus encoding during saccadic eye shifts: A case against general parallel processing models. Acta Psychologica, 60, 323-338.
Sanders, A. F. and Rath, A. (1991). Perceptual processing and speed-accuracy trade-off. Acta Psychologica, 77, 275-291.
Sanders, A. F. and Reitsma, W. D. (1982a). Lack of sleep and covert orienting of attention. Acta Psychologica, 52, 137-145.
Sanders, A. F. and Reitsma, W. D. (1982b). The effect of sleep-loss on processing information in the functional visual field. Acta Psychologica, 51, 149-162.
Sanders, M. S. and McCormick, E. J. (1987). Human Factors in Engineering and Design. New York: McGraw Hill.
Saslow, M. G. (1967). Effects of components of displacement-step stimuli upon latency for saccadic eye movements. Journal of the Optical Society of America, 57, 1024-1029.
Schneider, W., Dumais, S. T. and Shiffrin, R. M. (1984). Automatic and controlled processing and attention. In R. Parasuraman and D. R. Davies (Eds), Varieties of Attention. Orlando, FL: Academic Press.
Schneider, W. and Shiffrin, R. M. (1977). Controlled and automatic human information processing. I: Detection search and attention. Psychological Review, 84, 1-66.
Schoonaard, J. W., Gould, J. D. and Miller, L. A. (1973). Studies of visual inspection. Ergonomics, 16, 365-379.
Selfridge, O. G. and Neisser, U. (1959). Pattern recognition by machine. Scientific American, 203, 60-68.
Senders, J. W. (1955). Man's capacity to use information from complex displays. In H. Quastler (Ed.), Information Theory in Psychology. Glencoe: Free Press.
Senders, J. W. (1983). Visual Sampling Processes. Hillsdale, NJ: Erlbaum.
Shepherd, M., Findlay, J. M. and Hockey, R. J. (1986). The relationship between eye movements and spatial attention. Quarterly Journal of Experimental Psychology, 38, 475-491.
Sheridan, T. B. (1970). On how often the supervisor should sample. IEEE Transactions on Systems, Science and Cybernetics, SSC-6, 140-145.
Sheridan, T. B. and Rouse, W. B. (1971). Supervisory sampling and control: Sources of suboptimality in a prediction task. NASA Annual Conference on Manual Control.
Shiffrin, R. M. and Schneider, W. (1977). Controlled and automatic human information processing. II: Perceptual learning, automatic attending and a general theory. Psychological Review, 84, 127-190.
Smets, G. J. F. and Stappers, P. J. (1990). Do invariants of features determine the conspicuity of forms? In D. Brogan (Ed.), Visual Search. London: Taylor and Francis.
Sokolov, E. N. (1963). Perception and the Conditioned Reflex. New York: Pergamon.
Sperling, G., Budiansky, J., Spivak, J. G. and Johnson, M. C. (1971). Extremely rapid visual search: The maximum rate of scanning letters for the presence of a numeral. Science, 174, 307-311.
Stark, L. and Ellis, S. R. (1981). Scan path revisited: Cognitive models direct active looking. In D. F. Fisher, R. A. Monty and J. W. Senders (Eds), Eye Movements: Cognition and Visual Perception. Hillsdale, NJ: Erlbaum.
Stark, L., Yamashita, I., Tharp, G. and Ngo, H. X. (1993). Keynote lecture: Search patterns and search paths in human visual search. In D. Brogan, A. Gale and K. Carr (Eds), Visual Search II (pp. 37-58). London: Taylor and Francis.
Stein, W. and Wewerinke, P. W. (1983). Human display monitoring and failure detection: Control theoretic models and experiments. Automatica, 19, 711-718.
Teichner, W. H. and Mocharnuk, J. B. (1979). Visual search for complex targets. Human Factors, 21, 259-275.
Treisman, A. M. and Gelade, G. (1980). A feature integration theory of attention. Cognitive Psychology, 12, 97-136.
Treisman, A. M. and Sato, S. (1990). Conjunction search revisited. Journal of Experimental Psychology: Human Perception and Performance, 16(3), 459-478.
Van Delft, J. H. (1987). The development of a response sequence: A new description of human sampling behavior with multiple independent sources of information. Proceedings of the Human Factors Society, 31st Annual Meeting, pp. 151-155.
Visual search
77
Van der Heijden, A. H. C. (1987). Central selection in vision. In H. Heuer and A. F. Sanders (Eds), Perspectives on Perception and Action (pp. 421-446). Hillsdale, NJ: Erlbaum. Van der Heijden, A. H. C. (1992). Selective Attention in Vision. London: Routledge. Van de Weijgert, E. C. M. (1991). Foveal load and peripheral task performance: Tunnel vision or general interference. The Second International Conference on Visual Search, Durham, UK, Sept. 1990. Van de Weijgert, E. C. M. (1993). Foveal load and peripheral task performance: Tunnel vision or general interference. In D. Brogan, A. Gale and K. Carr (Eds), Visual search II (pp. 341-348). London: Taylor and Francis. Van Duren, L. L. and Sanders, A. F. (1992). The output code of a visual fixation. Bulletin of the Psychonomic Society, 30{4}, 305-308. Widdel, H. (1983). A method for measuring the visual lobe area. In R. Groner, C. Menz, D. F. Fisher and R. A. Monty (Eds), Eye Movements and Psychological Functions: International Views. Hillsdale, NJ: Erlbaum. Williams, L. G. (1966). The effect of target specification on objects fixated during visual search. Perception and Psychophysics, 1, 315-318. Williams, L. G. (1967). The effect of target specification on objects fixated during visual search. Acta Psychologica, 27, 355-360. Williams, L. G. (1989). Foveal load affects the functional field of view. Human Performance, 2, 1-28.
Willows, D. M. and McKinnon, G. E. (1973). Selective reading attention to the 'unattended' lines. Canadian Journal of Psychology, 27, 292-304. Wolfe, J. M., Cave, K. R. and Franzel, S. L. (1989). Guided search: An alternative to the feature integration model for visual attention. Journal of Experimental Psychology: Human Perception and Performance, 15(3), 419-433. Yarbus, A. L. (1967). Eye Movements and Vision. New York: Plenum Press.
Chapter 3
Auditory Attention
G. ten Hoopen
Leiden University, The Netherlands
In this chapter I shall present three topics, one old topic of auditory attention and two new ones. The old one, auditory selective attention, often treated under the heading 'cocktail party problem', will be surveyed globally. There are many texts available that deal extensively with the matter. Some relevant titles will be mentioned in this section and in the epilogue section. However, since I could not find overviews in which the more recent studies on auditory selection of Johnston and coworkers were included, I will treat their work in more detail. Furthermore, I will discuss doubts that could be cast upon some well-known studies that expressed much optimism about the fate of unattended messages. Moreover, I will argue that there is an inherent confounding of attentional selection features in the dichotic paradigm. Although it is mostly reported that location per se is a good selector, it is probably only so in combination with other features such as frequency or timbre. In the section on one of the new topics, 'Streaming, attention and auditory illusions', this latter suspicion will be discussed again. I decided to include the topic of streaming because it is often neglected in surveys of auditory attention. This section is therefore more specific and technical than are most sections in handbooks. Nevertheless one can get only a partial glimpse there of this fascinating and huge field of recent research, which is concerned with how the listener organizes the auditory environment. In the epilogue section of this chapter I shall mention the relevant references, since none of them carries the word attention in its title. Since I surmised that the classical student of attention might not be well acquainted with the streaming literature, I tried to sketch the commonalities between the old cocktail party studies and the more recent streaming studies as far as possible.
A far more dangerous endeavor was to write the middle section, embraced by the 'cocktail parties' and the 'streams'. Relatively few studies in the literature were devoted to the question of whether temporal allocation of attention to sound signals exists or not. Hence this topic is often missing in surveys on auditory attention. Still, 'set' is a legitimate candidate that should also be covered by the term attention. Moray (1969) already mentioned set in his list of six categories of attention: mental concentration, vigilance, selective attention, search, activation and set. The section on set ('attention in single auditory streams') is also a little bit more technical than could be expected from surveys.
Handbook of Perception and Action, Volume 3
Copyright © 1996 Academic Press Ltd. All rights of reproduction in any form reserved
ISBN 0-12-516163-8
Thus, this chapter presents three aspects of attention: (1) attention as selection (the cocktail party studies); (2) attention as set (the temporal allocation studies); and (3) attention as organizing the auditory environment (the streaming and auditory illusion studies).
1 AUDITORY ATTENTION: THE COCKTAIL PARTY SOLUTION
An important ability of the human listener is to follow one speaker in the midst of others. Although this faculty is often discussed under the heading 'cocktail party problem', it may be better to speak of 'cocktail party solution', since it is the machine rather than the person that is having a hard time disentangling concurrent speakers. Research on auditory attention started in the 1950s to get a better understanding of how humans, faced with a mixture of auditory (and visual) inputs, manage to pay attention to one message (focused attention) or to keep track of more messages at the same time (divided attention). The choice of the terms focused and divided attention seems a little unlucky, since it might erroneously be suggested that the listener can indeed process the auditory environment either way. The terms 'focused' and 'divided' only describe the instructions of the experimenter. Cherry's (1953) experiments are the classical example of studying the cocktail party solution. In his first set of experiments, two readings by the same speaker (messages A and B) were recorded on audiotape in a superimposed way, i.e. in 'mono'. The task for the listener was to separate message A from B and dictate A as well as possible to the experimenter who recorded the response. The task of verbally repeating a message word by word or phrase by phrase is called 'shadowing'. Since nearly all separation cues that are normally present at a real drinking party (binaural cues, timbre cues, loudness cues) were stripped off, the listener could only with great difficulty manage to shadow message A. Syntactic features such as syllable and word order, and suprasegmental features such as stress pattern, which still differed between the messages, enabled the listener to achieve separation, but only after listening to the record over and over again. 
The shadowing task became a lot easier when A and B were channeled to different ears (Cherry's second set of tests), even though the speaker of the messages was the same. This suggests that the 'ear' is a good separator. However, it should be emphasized that, although many studies used headphones, the same results are obtained when A and B are played back over different loudspeakers. In the latter case both ears receive both messages. Hence, perceived location, rather than the ear, is the selection principle, a fact already mentioned by Broadbent (1958). In the dichotic paradigm, which is of course more convenient in many noisy university laboratories, the lateralization of the messages is 180°. Spieth, Curtis and Webster (1954) demonstrated that selection of a message in the localization paradigm was already possible when the angular separation between the speakers was 10-20°. The same authors showed that listeners could also select between messages when one message was routed via a low-pass filter and the other one via a high-pass filter. Thus, frequency bands can also serve as a basis for selection. A very trivial cause of the pervading dichotic paradigm should not be overlooked: two-channel
tape recorders could, and still can, be afforded much more easily by university researchers than bandpass filters. The auditory environment, also in air traffic control towers, is however dimensioned not only by location but also by frequency and time. In the remainder of this chapter I will argue that these latter dimensions could be better selectors than location. A methodological problem in using concurrent speech messages in the dichotic shadowing task is that there is no adequate control over the 'dichotic quality' of the stimulus. According to Tartter (1988): 'The dichotic listening task involves the simultaneous or near-simultaneous presentation of two different stimuli, one to each ear.' (p. 283). It should be obvious that this criterion is often violated when using spoken messages. Relatively long silences in one message might occur at moments when the other one contains signals and vice versa, allowing the listener to make attention shifts. Better control over the dichotic situation is achieved by the split-span paradigm, in which different words are more or less synchronized in time, and routed to different ears. Broadbent (1954), for instance, presented three digits to one ear and three digits to the other ear in a dichotic manner (split-span paradigm) and found that listeners preferred to order the digits by ear rather than by time when recalling them. On the basis of these results and those of the shadowing studies, Broadbent proposed his now classical 'filter model'. Messages that are presented at the same time enter a sensory buffer in parallel. One of the messages is routed through a filter on the basis of physical features, while the other ones remain in the sensory buffer. A limited-capacity mechanism (the so-called P-system), beyond the filter, further processes the message. See Broadbent (1958) for the original description and textbooks such as those of Massaro (1975), Moray (1969) and Norman (1976).
As with many initial models, many amendments were made to Broadbent's filter model. At least two experiments convincingly demonstrated that 'channels' in Broadbent's buffer could be organized not only along physical stimulus characteristics, but also according to meaning. Treisman (1960) implemented a clever variation on the dichotic shadowing task. Her subjects had to shadow the right ear message, while ignoring the left ear message. At unexpected moments, however, the messages were reversed between ears, whereupon subjects continued shadowing the 'primary' message, only switching back to the right ear after a couple of seconds. This result is evidence that selection was based not solely on location but also on content. Also devastating to the notion of genuine physical separation of messages was the well-known experiment of two undergraduate students, Gray and Wedderburn (1960). Applying exactly the same split-span paradigm as Broadbent did, they presented three digits alternating between ears and three words forming a sentence alternating in antiphase. One ear received, for instance, the sequence 'who'-'6'-'there', and the other ear received '4'-'goes'-'1' in synchrony. The order of report in the recall task was not by ear, as predicted by physical separation, but by meaning: 'who goes there' and '4-6-1'. In line with Broadbent's model, the shadowing studies reported mostly that no meaning of the message to be ignored was processed. However, not being aware of meaning does not necessarily imply that a stimulus is not processed for meaning. Three kinds of studies seem to shed light on that problem: recall, conditioning and reaction time studies.
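The two competing orders of report in the Gray and Wedderburn design can be made concrete with a small sketch. It is only an illustration: the data structure, the assignment of sequences to the left and right ear, and the function names are assumptions introduced here, not part of the original procedure.

```python
# Illustrative sketch only: Item, TRIAL and both functions are hypothetical
# names, and the left/right assignment of the two sequences is assumed.
from typing import NamedTuple


class Item(NamedTuple):
    position: int  # serial position of the dichotic pair (0, 1, 2)
    ear: str       # "left" or "right"
    token: str     # the spoken item


# One trial as in the example in the text: words and digits alternate between ears.
TRIAL = [
    Item(0, "left", "who"), Item(0, "right", "4"),
    Item(1, "left", "6"), Item(1, "right", "goes"),
    Item(2, "left", "there"), Item(2, "right", "1"),
]


def report_by_ear(trial):
    """Report order expected if selection follows the physical channel (ear)."""
    ordered = sorted(trial, key=lambda it: it.position)
    return [[it.token for it in ordered if it.ear == ear]
            for ear in ("left", "right")]


def report_by_meaning(trial):
    """Report order grouped by class of item (words versus digits)."""
    ordered = sorted(trial, key=lambda it: it.position)
    words = [it.token for it in ordered if not it.token.isdigit()]
    digits = [it.token for it in ordered if it.token.isdigit()]
    return [words, digits]


print(report_by_ear(TRIAL))      # grouping by ear mixes words and digits
print(report_by_meaning(TRIAL))  # grouping by meaning restores the sentence
```

Grouping by ear yields mixed word and digit sequences, whereas grouping by class of item yields the sentence and the digit string that listeners actually reported.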
Norman (1969) tested the fate of the secondary message in the following way. During the shadowing task subjects were unexpectedly interrupted by the experimenter for an immediate recall test. It turned out that the last few words of the secondary message were also recognized. This can be taken as evidence that the secondary message also accessed the P-system, but that it was not stored in long-term memory. An alternative interpretation is that the filter might have switched quickly to the contents of the other physical channel in the buffer. In that view, Broadbent's model can cope with such results. I will, therefore, discuss other paradigms that attempted to trace the fate of the secondary message. Corteen and Wood (1972) shock-conditioned their subjects to names of cities before having them participate in a dichotic shadowing experiment. On the channel to be ignored, the conditioned city names as well as other city names appeared periodically. The authors reported that, although the subjects claimed not to be aware of city names, they showed a significant galvanic skin response (GSR), not only to the shock-conditioned cities but even to non-conditioned city names. Corteen and Wood stated that their results conform to the position held by Deutsch and Deutsch (1963), whose model also posits that non-target messages do achieve full perceptual analysis. Corteen and Dunn (1974) replicated these findings and concluded that there can be some semantic processing without awareness. Similar results have been reported by von Wright, Anderson and Stenman (1975). They found an appreciable GSR to conditioned words in the secondary message and a smaller but noticeable GSR to homonyms and synonyms. Although these conditioning studies seem to offer good convergent evidence refuting early selection notions, they should be taken with some caution: Wardlaw and Kroll (1976), in a meticulous replication of the Corteen studies, failed to find differential GSRs.
It is a salient fact that, although these authors communicated with Corteen, they were unable to obtain his original audiotapes for a replication in their own laboratory. Lewis (1970) performed two experiments in which subjects were presented with two synchronized lists of words, one to each ear, and were required to shadow one list. (The mean synchronization error between word pairs was 25 ms, therefore the messages obeyed the criterion 'dichotic' rather well.) Verbal reaction time (RT), that is, the time elapsing from the onset of the stimulus word to the onset of the pronunciation of it, was registered. Some of the words in the other ear were semantically related to their counterparts in the list to be shadowed. The remarkable result was that semantically related words in the ear to be ignored nevertheless affected the verbal RT to their counterparts in the shadowed list. Reaction time to words, which were accompanied by synonyms in the other ear, was increased from 699 ms (base rate of unrelated words) to 726 ms, whereas RT decreased to 643 ms when antonyms arrived at the ear to be rejected. Lewis' conclusion was that: 'unattended messages are processed at a semantic level.' (Lewis, 1970, p. 228). Unfortunately Lewis did not report an analysis of the interaction between semantic relationship and ear to be shadowed, although half of his subjects received the primary message in the left ear, and the other half in the right ear. Kimura (1961), who applied Broadbent's split-span procedure, reported a right ear advantage (REA) for verbal material. Since then, hundreds of studies in the field of neuropsychology have used the dichotic listening procedure as a non-invasive test for establishing language lateralization, and the REA appears to be a robust effect (Bryden, 1988). It is an intriguing question whether the semantic interfering effects
on the RT in Lewis' study differed depending on the ear to be ignored. One could hypothesize that, when the semantic counterparts (to be ignored) were presented to the right ear, they could affect shadowing RT much more than when presented to the left ear. Such an analysis would have refined the model of auditory selective attention. Lewis posited that his results supported the Deutsch and Deutsch (1963) model, rather than Treisman and Geffen's (1967) perceptual decrement model. It might not be surprising that, at the battlefield of models, Treisman, Squire and Green (1974) replicated Lewis' study. The authors reported that the Lewis effect dissipated in the course of the dichotic list, i.e. the semantic effect of a non-target word on its dichotic cohort was confined to the first few positions of the list. According to Treisman et al. the listener needs some time to build up full attention to the primary message. It is worthwhile to square this notion with a process operating in auditory stream formation. If tones are rapidly alternated between frequencies far apart, listeners hear two sound sequences (called 'streams'), instead of one. However, Bregman (1978a) demonstrated that the listener needs a few cycles of alternation to interpret the sequence as two streams. I shall return to the phenomenon of streaming extensively in the third section of this chapter. Johnston and Heinz (1979) found that the Lewis effect could be diminished if target and non-target words were made more discriminable. The aim of their study was to pit 'late selection' theory (Deutsch and Deutsch, 1963) against 'multi-loci' theory (Broadbent, 1970; Norman, 1969). Late selection theory posits that all input will automatically receive full sensory and semantic analysis to the extent that some late selection device can screen unimportant messages. It is a so-called 'late bottleneck' theory. In multiple loci theory, on the other hand, the bottle is malleable: the neck can be moved. 
According to Johnston and Heinz (1979), it is the very essence of attention that selective operations can be carried out anywhere along the continuum from early to late in perceptual processing. So the processing of targets and non-targets can also vary from superficial to deep. If this supposition is true, then it should be possible to manipulate the Lewis effect. If conditions are created in which the non-targets receive deeper processing, the Lewis effect should be magnified, but this effect should be attenuated when the non-targets are processed shallowly. In the latter case no semantic interference with the words to be shadowed (targets) should occur. Johnston and Heinz devised experiments in which sensory discriminability between the primary and secondary messages varied. In the 'low discriminable' condition the dichotic target and non-target words were spoken by the same male voice whereas in the 'high discriminable' condition these words were in a male and a female voice. The results showed that varying the sensory discriminability affected the Lewis effect in the hypothesized direction. According to the authors the 'late selection' theory can not predict such phenomena. Johnston and Wilson (1980) offered more evidence for 'multi-loci' theory. As targets in dichotic pairs ambiguous words were presented and the task of the subject was to identify them as belonging to certain categories (e.g. articles of clothing). The meaning of the non-target cohort could be appropriate in the sense that it could, if processed, disambiguate the target such that the latter belonged to the designated category (e.g. target: sock, non-target: smelly). The non-target could also be inappropriate and, if processed, disambiguate the target so that it did not fit the category (e.g. target: sock, non-target: punches). The non-target could also be
neutral. The second factor, which was covaried with semantic relationship, was the listening condition. In the focused listening condition the targets were presented to one ear only, while in the divided condition the target could arrive at either ear. The results showed a significant interaction between semantic relationship and listening condition. In the focused condition target identification was not affected by the semantic features of the non-target. In the divided condition, however, targets accompanied by non-targets that could appropriately disambiguate them were identified much better than targets accompanied by inappropriate non-targets. Consequently, the authors claimed that the depth of processing of the non-target is not fixed (neither shallow as in the early selection stance, nor deep as in the late selection stance), but variable instead. It should be mentioned that the Johnston stance (see also Johnston and Heinz, 1978) is quite similar to the compromise between early and late selection notions already proposed by Norman in 1968. In Norman's model, competing messages can be discarded anywhere in the processing sequence. Both the physical input and the so-called 'pertinence' of information (based on expectation and linguistic rules) determine what will be screened and what will be selected for further processing. As far as I know, no more important theoretical contributions to selective attention modeling were reported after 1980 by students of auditory attention. However, theorizing about selective attention was intensified in the field of visual perception. I refer to Neumann, van der Heijden and Allport (1986) for a creditable explanation of this theory shift, which clearly juxtaposes auditory and visual selection. Theorizing about attention also moved to the field of motor behavior. The latter shift might be due to the fact that many researchers located the bottleneck (limited-capacity system) far away toward the response site (see Chapter 4).
If selective attention is indeed response competition, then the field of motor behavior seems an appropriate lodging for theorists of attention too.
2 ATTENTION IN A SINGLE AUDITORY STREAM
In this section I shall discuss studies that share the supposition that the listener, guided by recurrent patterns in the ongoing sound stream (e.g. rhythm, a pitch pattern, an accent pattern), is set to foster certain moments in the flow of time. The idea is that more or less attention can be invested at temporal positions as a function of the pattern structure. A related notion is 'temporal priming': sounds occupying certain positions in a temporal pattern might be processed in favor of other sounds as a function of the priming pattern. The studies stem from various fields of research: speech perception, music perception and time perception.
2.1 Speech
A good example of a study that explicitly investigated the temporal allocation of attention in a single speech stream is that of Shields, McHugh and Martin (1974). The study was based on Martin's (1972) model for rhythmic structure in speech and other behavior. According to this model, speech elements are hierarchically organized, thus providing an underlying temporal structure within which elements are
related instead of being simply concatenated. The rhythmic rules derived from the hierarchy impose an accent structure (a prosodic structure) on the speech sequence. According to Shields et al., a smart perceptual system would make use of the prosodic pattern of the sequence. That is, if a rhythmic structure has been detected, predictions can be made about accents as they should unfold in time. Such a perceptual system could benefit from attentional focusing on elements in temporal positions still to come. This presupposes that listeners, after having locked on to the rhythm, are able to focus their attention in anticipation on accented syllables. Shields et al. tested their hypothesis by means of the classic phoneme monitoring task (this task consists of monitoring for some predesignated phoneme in a message and reacting to its occurrence, often manually; cf. Foss, 1969; Hakes and Foss, 1970). The prediction was that the monitoring reaction time (RT) to phonemes in accented syllables would be shorter than the RT to phonemes in unaccented syllables. Subjects were confronted with two kinds of material: (1) sentences containing the target phoneme (/b/) either in accented or unaccented nonsense syllables; and (2) strings of nonsense words, one of which comprised the accented or unaccented syllable, which had been spliced from the sentence material. By means of this splicing procedure, acoustic parameters were held constant across materials. The prediction was that the RTs to the /b/ target would be shorter when it appeared in accented syllables than when it was presented in unaccented ones, and that this effect would be larger in the sentence context than in the nonsense word context. (The nonsense words were scrambled in order to prevent a natural binary rhythm pattern.)
Table 3.1 displays the results which supported the hypothesis: the RT improvement for accented target positions was significant in the (rhythmic) sentence context (63 ms), whereas it was not significant (16 ms) in the nonrhythmic context. An interesting variable also included in this experiment was the position of the syllable that contained the target phoneme. This could be at the beginning, middle or end of the sentence or string. If the rhythm hypothesis is correct, then the allocation of temporal attention might improve as more instances of the beat structure (unaccented-accented) could be gathered. This leads to the prediction that RT to the accented target in the sentence will diminish as a function of the target's temporal position. This pattern could, indeed, be observed when comparing the initial position with the middle position. However, the pattern reversed
Table 3.1. Monitoring reaction times (ms) to the phoneme /b/ in unaccented and accented nonsense syllables embedded in a sentence or in a nonsense word string (data from Shields et al., 1974)

              Context
Accent        Sentence    Nonsense word string
Unaccented    612         687
Accented      549         671
Difference    63*         16 (N.S.)

*Significant difference. N.S., not significant.
when comparing the middle with the end position: RTs to accented targets were 10 ms longer in the end position than in the middle position, whereas there was a decrease of 78 ms for unaccented targets. Perhaps the classic 'ageing foreperiod effect' (Drazin, 1961)¹ interfered, but it is not clear why it should have interacted with accented and unaccented targets. Unfortunately, the authors mentioned positional RTs only for their sentence condition and not for their nonsense word string control condition. Therefore, this experiment offers only limited support for the interesting idea that rhythmic attention might build up as a function of time. Pitt and Samuel (1990) coined the term 'attentional bounce hypothesis' (ABH) for the notion that the listener's attention can lock in to the rhythmicity of the speaker. The authors criticized some aspects of the Shields et al. study discussed above, one criticism concerning the fact that the target phonemes were embedded in nonwords. Pitt and Samuel set up an improved replication study in which phonemes in two-syllable English words were used as targets. Moreover, the acoustic parameters were held constant by making the target words neutrally stressed. These neutral stress words were embedded in sentences predicting stress or non-stress on the first syllable (e.g. target /p/ in 'pérmit' as a noun) and also in sentences predicting stress or non-stress on the second syllable (e.g. target /m/ in 'permít' as a verb). The hypothesis was that RTs would be faster to target phonemes in predicted stress syllables as compared with non-stress syllables. The results of this first experiment were not very convincing. There was no significant effect on RT, although the error rate for detecting the target phonemes was much lower when they were at expected accented positions.
Therefore, the authors decided to manipulate the binary rhythm in a stronger way by having a string of two-syllable words precede the neutral stress words. In each such string of rhythm-inducing words, either the first or the second syllables were stressed. The data now turned out more favorable: RTs to targets in the stress positions were 24 ms faster than those to targets in non-stress positions, a difference that was significant. As Shields et al. had hypothesized, Pitt and Samuel found a speeding of monitoring RTs to targets that appeared later in the binary rhythmic sequence. This indicates that allocation of attention does indeed build up as the beat pattern gets longer. As already mentioned, such a positional RT pattern might have been caused by the 'ageing foreperiod effect', but this explanation was discounted by the third experiment in which the preceding rhythmic context was deliberately made unpredictive: the position effect disappeared. Though the studies of both Shields et al. (1974) and Pitt and Samuel (1990) seem to offer evidence for rhythmic attention operating in the listener's perception of a single speech stream, one might criticize the use of RT patterns as a direct reflection of attentional allocation. It is quite conceivable, for instance, that listening to the binary rhythm tuned the motor system to initiate responses more rapidly if the target was at a temporally accented position. Some evidence that may weaken this criticism can be found in a study by Robinson (1977), who did not use an RT paradigm but tested, among other things, whether recognition could be affected by manipulating the syllabic stress rhythm (iambic and trochaic stress patterns). It turned out that recognition for words improved if there was more stress similarity between the words in the original list and the test list. Although Robinson did not cast his study in attentional terms, the following quotation makes clear that his ideas are similar to those in the studies discussed earlier: '... the failure to recognize a word when its stress pattern is different from expectation' (Robinson, 1977, p. 85). Further evidence for rhythmic attention in single auditory streams, not based upon RT paradigms either, comes from experiments in the field of music perception.

¹ Drazin (1961) showed that, when no catch trials were used, reaction time decreases with increasing or 'ageing' foreperiod (the time between the warning and the imperative stimulus). Although there is no formal foreperiod in the phoneme monitoring task, the time elapsing between the beginning of the sentence or string and the phoneme occurrence can be functionally conceived of as a foreperiod: the listener becomes more and more expectant for the phoneme the longer it stays away.
2.2 Melodies
A study by Jones, Boltz and Kidd (1982), like the studies mentioned earlier, emphasized the temporal aspect of auditory attention. Jones et al. assumed that attending is a rhythmic activity that is guided by perceiving pattern invariances in the unfolding sequence. Notice the close correspondence of this approach to the basic assumptions of the speech rhythm studies. The authors quote one study (Dewar, Cuddy and Mewhort, 1977), in which it was suggested that a rhythm hierarchy underlying a melodic sequence might have given rise to a better recognition of tones, though rhythm was not manipulated per se. The main purpose of the Jones et al. study, then, was to test whether attention could be guided away from, or toward, melodic relationships by systematically varying the rhythmic context. Since the conceptual underpinnings and the experimental procedure were quite complex compared with the classical studies on auditory attention (selection between dichotic messages), it is worthwhile to treat the study in some detail. First, I will explain the kind of melodic rules that were applied to generate the tonal stimulus sequences. Then I shall exemplify the rhythmic rules and how they were superimposed on the melodic sequences. Finally, the experiment and its main outcome will be discussed.

Sequences of nine tones were constructed from a three-tone 'argument' by applying certain rules. A three-tone argument comprised, for instance, the increasing musical tones G4, A4 and B4 from the C major scale. These tones are related to each other by a so-called 'lower-order melodic rule': adjacent tones differ by +1 scale unit. A higher-order rule that could be applied to this argument is the transposition rule (TR). An example of applying this rule is: TR(G4A4B4) = C4D4E4. Other higher-order rules are the complement rule and the reflection rule. Figure 3.1 illustrates how these rules operate.
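The way such rules operate on scale steps can be sketched in Python. This is only an illustration under stated assumptions, not the procedure of Jones et al.: all function names are invented here, tones are treated as diatonic steps of the C major scale, and the reflection pivot (halfway between D4 and E4) is simply chosen so that the two Figure 3.1 examples, Tr(C4D4E4) = G4A4B4 and the reflection of C4D4E4 into F4E4D4, are reproduced. The complement rule is left out because the text does not define it.

```python
# Sketch of two higher-order melodic rules on the C major scale.
# Tones are handled as diatonic scale steps, not as semitones.

SCALE = ["C", "D", "E", "F", "G", "A", "B"]  # C major, one octave


def to_step(name):
    """Convert a note name such as 'G4' to an absolute diatonic step."""
    letter, octave = name[0], int(name[1:])
    return octave * len(SCALE) + SCALE.index(letter)


def to_name(step):
    """Convert an absolute diatonic step back to a note name."""
    octave, degree = divmod(step, len(SCALE))
    return f"{SCALE[degree]}{octave}"


def transpose(argument, shift):
    """Shift every tone by the same number of scale steps."""
    return [to_name(to_step(n) + shift) for n in argument]


def reflect(argument, pivot_twice):
    """Mirror every tone around a pivot on the scale.

    pivot_twice is twice the pivot position, so that a pivot lying
    between two scale steps can be passed as an integer (assumption).
    """
    return [to_name(pivot_twice - to_step(n)) for n in argument]


def obeys_lower_order_rule(argument, unit=1):
    """True if adjacent tones differ by `unit` scale steps."""
    steps = [to_step(n) for n in argument]
    return all(b - a == unit for a, b in zip(steps, steps[1:]))


argument = ["C4", "D4", "E4"]
transposed = transpose(argument, 4)  # up four scale steps
# Pivot halfway between D4 and E4, chosen to match Figure 3.1.
reflected = reflect(argument, to_step("D4") + to_step("E4"))

print(transposed)  # ['G4', 'A4', 'B4']
print(reflected)   # ['F4', 'E4', 'D4']
```

Working in scale steps rather than semitones keeps the 'adjacent tones differ by +1 scale unit' rule a plain integer comparison, which `obeys_lower_order_rule` checks.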
Figure 3.1 Two examples of rules that transform one musical note into another on any diatonic scale. Here the mapping is on the C major scale. [After Jones et al., 1982]. [Panel 'Transpose, Tr': Tr(C4D4E4) = G4A4B4. Panel 'Reflection, RI': RI(C4D4E4) = F4E4D4.]

The three-tone argument was preceded and succeeded by three-tone patterns resulting from one or two higher-order rules, which yielded melodic strings such as C4D4E4 G4A4B4 C4D4E4 (one rule, namely transpose) or C4D4E4 F4E4D4 G4A4B4 (two rules, namely reflection and complement). Melodic context in these kinds of nine-tone patterns can be described at two rule levels: relations between tones within a triplet by a lower-order rule, and relations between triplets by higher-order rules. Keep in mind that the fourth position in the sequence (SP4) realizes a higher-order rule, but the sixth position (SP6) completes a lower-order rule. While these melodic rules specify what
happens, rhythm rules describe when the tones occur. These latter rules specify the temporal structure: temporal accents are obtained by metrically lengthening the tones or the pauses between them. Two of the rhythmic contexts that were used are displayed in Figure 3.2: dactyl and anapest, both as 'surface sound level' and as 'deep structure' (the underlying hierarchical tree). If, in the same vein as the attentional bounce hypothesis predicted, rhythm directs attention towards certain temporal positions, the expectation is that the dactyl rhythm (AUU) induces a (subjective) temporal accent on the fourth tone whereas the anapest rhythm (UUA) induces such an accent on the sixth tone of the sequence. It should be stressed that these tones, both residing in the central triplet, do not have temporal accents by themselves and hence their acoustic properties are the same across different rhythmic conditions. Since the hypothesized rhythmic guiding of attention is either toward the 4th or 6th position in the nine-tone pattern, it can be tested whether recognition is facilitated selectively. Listeners had to judge the melodic equivalence of a standard and a comparison nine-tone pattern (a same-different judgement). Half of the comparison stimuli were the same as the standard, and in the other half the frequency of the 4th or 6th tone was deviant. Not surprisingly, one of the findings was that recognition performance was worse when the deviant pitches were at higher-order transitions than when they were at lower-order transitions. But, in addition, rhythm 'modulated' recognition performance in the hypothesized direction. At the 4th position (higher-order transition), performance was better in the AUU rhythmic context. This dactyl rhythm apparently guided attention towards this position. Conversely, recognition performance at the 6th position (lower-order rule transition) was better in the anapest (UUA) rhythmic condition. 
Apparently this condition encouraged the listener to attend to that position. The rhythm × position interaction effect was significant.

Figure 3.2. Three rhythmic contexts for a common central segment (filled rectangles). On top is the isochronous rhythm. The middle row represents the anapest (UUA) rhythmic context, and the bottom row the dactyl (AUU) rhythmic context. [After Jones et al., 1982].

In a subsequent study Jones (1984) enlarged the time scope of rhythmically guided attention. In the study discussed above, rhythm was a between-subjects variable. Other studies (Jones, Kidd and Wetzel, 1981) suggested that recognition performance was somewhat worse if rhythmic context in the experimental session
is a within-subject variable. Such a performance decrement could, according to Jones, be interpreted as follows: in the course of the experimental session, listeners make an abstraction of the rhythm. They build an internal representation of the rhythmic invariance that is used to guide attention. If successive recognition trials within a session vary with respect to rhythm, such attentional guidance could be worse than with a constant rhythm throughout the session. Jones (1984) put this supposition to the following test: listeners were first adapted by 24 recognition trials. Each trial comprised two melodies, a standard and a comparison, each containing 12 musical tones. Half of the comparisons were the same as the standard, and in the other half there was a deviant tone at the 4th, 7th or 11th position. The rhythm was either isochronous (/ / / / / /) or duple (// // //), and the duration of the quarter note was 200 ms. After the adaptation phase subjects immediately entered the test phase, in which two kinds of trial occurred: context trials, preserving the adaptation rhythm, and test trials, either also preserving that rhythm or having the other rhythmic pattern. The hypothesis was that recognition performance in test trials should drop when they were switched to a rhythm other than the context rhythm. The argument for this hypothesis went as follows: if the adaptation and context trials build up an attentional rhythmicity that carries over to following trials, that rhythmicity becomes inappropriate if a different rhythm is encountered. Attention is then guided to the wrong temporal positions and recognition performance should be hampered. In
Table 3.2 the results are presented, showing that recognition performance was better when context and test rhythm were the same than when they were different (the relevant interaction term was significant). Hence the data support the hypothesis that temporal patterning can guide attention in a larger context too.

Table 3.2. Mean recognition accuracy on context trials in the test phase. (The measure is Ag, a nonparametric ROC recognition measure) (data from Jones, 1984)

                    Test rhythm
Context rhythm      Isochronous    Duple
Isochronous         0.68           0.60
Duple               0.59           0.66

These and other studies by Jones and coworkers indicate that 'rhythmic attention' operates not only in speech perception but also in the processing of melodic patterns. An everyday example of this fact is that composers, without any knowledge of the literature on attention, often maximize melodic surprises by putting them in temporal locations at which they know that 'rhythmic attention' is high.

Monahan, Kendall and Carterette (1987), although not adopting Jones' attentional terminology, had an aim similar to that of the Jones et al. (1982) study, namely 'to explore the kinds of temporal patterning that foster pitch-difference discrimination' (Monahan et al., 1987, p. 576). However, Monahan et al. studied temporal influences on pitch recognition processes from a different point of view. They emphasized the role of concordance between 'temporal accenting' and 'pitch-level accenting'. The concept of temporal accenting means that (musical) time is thought to be periodically pulsed. The perceiver superimposes a slower periodic rate on this pulse train: the so-called 'beat' or 'metric' or 'clock rate' (Povel and Essens, 1985). 'Pitch-level accenting' is a function of the contour of a melody (the ups and downs of the pitches), also called melodic accent. This notion implies that the form of the melodic contour invokes a (subjective) accent pattern (see Thomassen, 1979, 1982, for evidence). The idea of Monahan et al. was that the listener is confronted with two clocks, one running at the speed of temporal accenting and the other at the speed of pitch-level accenting (see also Monahan and Carterette, 1985).
It should be stressed that the notion of pitch-level accenting, which can impose a grouping in melodies, differs from theories that describe melodic grouping by pitch rules derived from coding systems (cf. Jones, 1976; Deutsch and Feroe, 1981). Monahan et al. put forward the concept of rhythmic consonance, according to which both clocks are metrically in phase. Rhythmic dissonance implies that they have different metrical rates. Much of their experimenting was devoted to testing these concepts. In general it was found that rhythmic consonance promoted performance of pitch-difference recognition. Several rhythms were included in their study, including the anapest and dactyl as used by Jones et al. (1981). Monahan et al. could not replicate Jones' results, but as they stated themselves there were too many differences in experimental procedure. The main reason for the difference in results appears to be that rhythm was a within-subjects factor in the Monahan et al. study, whereas it was a between-subjects factor in Jones' study. It was precisely that difference which was the topic of the Jones (1984) study, which we discussed at some length above, but to which Monahan et al. did not refer. Despite the different stances of the Monahan group ('clocks') and the Jones group ('rhythmic attending'), there is at least one important commonality: both stress the importance of the temporal dimension. I shall now turn to experiments that offer evidence that temporal attention can also operate in sound patterns that are far simpler than speech and music.
2.3 Time
If an empty time interval, bounded by two short stimuli, includes one or more short stimuli, it is called a divided time interval. Hall and Jastrow (1886) were the first to report that the subjective duration of such a divided time interval was longer than that of its empty counterpart, though both had the same objective duration. Since then, this illusion has been replicated several times (Benussi, 1913; Fraisse, 1961; Grimm, 1934; Nakajima, 1987; Thomas and Brown, 1974). I will treat the most recent study at some length because, apart from giving a satisfying explanation of the illusion, it also provides a good example of the role of attention in auditory time perception. In his thorough article 'A model of empty duration perception', Nakajima (1987) advocated his 'supplement hypothesis', which states that the subjective duration of an empty time interval is directly proportional to its physical duration plus a constant of about 80 ms. In a formula:

$$r(t) = k(t + \alpha) \qquad (1)$$

where $r(t)$ is the subjective duration of the physical duration $t$, $k$ is a psychophysical scaling constant ($k > 0$), and $\alpha$ is the supplement constant of approximately 80 ms. Applying equation (1) to the case of an empty time interval, divided by one marker into two objective durations $t_1$ and $t_2$, yields a total subjective duration of $k(t_1 + \alpha) + k(t_2 + \alpha) = k(t_1 + t_2 + 2\alpha)$. However, the subjective duration of the corresponding undivided interval amounts to $k(t_1 + t_2 + \alpha)$. Hence the whole duration $(t_1 + t_2)$ of the divided time interval is overestimated by the supplement constant $\alpha$. According to Nakajima, this is the essence of the illusion of the divided time interval. He substantiated this claim by a series of convincing experiments.

But what is the role of attention in the 'filled duration illusion'? In one of Nakajima's previous experiments (Nakajima, 1979), the listeners were confronted with divided time intervals (standards) and had to adjust nondivided comparison time intervals to match them. The comparison could either precede or succeed the standard (see Figure 3.3a). Instead of simply averaging over these situations, Nakajima analyzed them separately and found a remarkable result. The adjustments of the comparison time intervals were bimodally distributed when the comparison came after the divided standard. One mode reflected about 100 ms overestimation of the standard, and the other mode only 20 ms. If, on the
other hand, the comparison preceded the divided standard, the distribution of Points of Subjective Equality (PSEs) was unimodal, reflecting an overestimation of only 20 ms (see Figure 3.3b). So it appeared that when the (nondivided) comparison time interval came first in the trial, the succeeding divided time interval was not much overestimated. This suggests that the interval was not interpreted as divided. The listeners apparently had been set by the temporal structure of the comparison to direct their attention only to the initial and final sound markers. They only processed the time elapsing between the initial and final sound markers and apparently neglected or 'filtered' the dividing sound marker. However, when the divided time interval came first, subjects either matched it in the 'illusory' way, i.e. by processing all sound markers (yielding the overestimation of about 100 ms), or they matched the total time 'veridically', again filtering out the dividing sound. The reason that 'set' worked here in only half the cases might be that the experimental trials were interspersed with control trials, comprising two nondivided time intervals (Y. Nakajima, personal communication). Notice that such an explanation squares nicely with the results of the two Jones studies just discussed: attentional guidance appeared to deteriorate when different temporal patterns were presented in a session. Finally, it should be emphasized that the spectral composition of the dividing sound marker in Nakajima's studies was precisely the same as that of the initial and final ones, so it could not be filtered out by the listener on the basis of physically distinctive features.

Adams (1977) offered evidence that other attentional factors can also counter the filled duration illusion (FDI). In his first experiment he presented durations of 0.8, 1 and 1.2 s which were marked by 20 ms, 500 Hz sound markers.
Each duration contained 0, 1 or 5 intervening 1000 Hz tones, also of 20 ms duration (so-called fillers), and the subjects had to judge the duration on a five-point short-to-long category judgement scale. Though the fillers had a different spectral composition (1000 Hz) than the delimiting sound markers (500 Hz), he found a clear FDI. Unfortunately there was no condition in which fillers had the same frequency as the markers. If subjects had been able to filter out fillers of a different frequency before duration judgements took place, then the amount of FDI should have become zero. Nevertheless, Adams had two conditions that are interesting with respect to the attention question. He presented the markers and fillers either 'mono' (both ears received both markers and fillers) or 'stereo' (markers to one ear and fillers to the other ear). The amount of FDI in the stereo condition was about half of the FDI in the mono condition, but there was still a small FDI (as compared with the nonfilled duration condition). Hence the ear difference between markers and fillers was not enough to eliminate the fillers totally from the duration judgement process. A far better filtering of the fillers intervening between the initial and final marker of the duration to be judged was obtained by a temporal condition that Adams included. In a so-called 'running background' condition he also presented 1000 Hz tones in the intertrial intervals. Over a total trial sequence of about 400 trials the temporal background pattern of 1000 Hz tones was three tones per second. Very interestingly, the FDI almost disappeared. Apparently the 1000 Hz tones between the two markers of 500 Hz were captured by their intertrial counterparts to form a common stream through which the physically filled duration became perceptually unfilled. Its duration judgement nearly equaled that of the unfilled control duration.
From these experiments it seems that temporal context was a far stronger selection criterion than location. Furthermore, it turned out that there was no interaction between location (mono versus stereo) and temporal context ('unitary', i.e. only fillers between markers, versus 'running background'). Both the Nakajima and Adams studies indicate that auditory signals can be excluded from further processing on the basis of temporal context. In the Nakajima study this was caused by set; in Adams' study by streaming. Auditory streaming will be the topic of the next section.
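The arithmetic behind the supplement hypothesis of equation (1) can be sketched in a few lines of Python. This is only an illustration, not Nakajima's procedure: the function names are invented here, k = 1 is assumed so that subjective and physical durations share a unit, and the 240 ms and 360 ms sub-intervals are hypothetical values.

```python
# Illustrative sketch of the supplement hypothesis, equation (1).
# Assumptions: k = 1 and alpha = 80 ms; the interval values are made up.

ALPHA_MS = 80.0  # supplement constant, approximately 80 ms
K = 1.0          # psychophysical scaling constant (assumed value)


def subjective_duration(t_ms, k=K, alpha=ALPHA_MS):
    """Subjective duration of one empty interval: k * (t + alpha)."""
    return k * (t_ms + alpha)


def subjective_divided(sub_intervals_ms, k=K, alpha=ALPHA_MS):
    """Sum the subjective durations of the parts of a divided interval."""
    return sum(subjective_duration(t, k, alpha) for t in sub_intervals_ms)


parts = [240.0, 360.0]                       # t1 and t2, hypothetical
divided = subjective_divided(parts)          # k * (t1 + t2 + 2 * alpha)
undivided = subjective_duration(sum(parts))  # k * (t1 + t2 + alpha)

print(divided - undivided)  # 80.0, i.e. k * alpha for one dividing marker
```

In this sketch, a listener who filters out the dividing marker treats the interval as undivided, so the predicted difference vanishes; that corresponds to the 'veridical' matching described above.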
3 STREAMING, ATTENTION AND AUDITORY ILLUSIONS
This section gives a sketch of the role that attention plays in organizing the auditory environment. It will be made clear that streaming and auditory illusions are not the result of preattentive processes only and that attentive processes can also affect the perceptual outcome. There are, for instance, many sound sequences that can be heard either as one coherent stream or as two segregated streams at will. In addition, evidence will be presented that a conflicting auditory situation, which gives rise to illusory percepts, can be interpreted in different ways depending on the experience of the listener. Every now and then, I will refer back to the cocktail party studies in order to confront them with the studies to be presented here.
3.1 Streaming and Attention
If consecutive tones, emanating from one physical source (e.g. a musical instrument or sine wave generator), are alternated rapidly between different frequencies, we often hear two melodic lines, as we may all experience when listening to pseudopolyphonic pieces by Baroque composers. Some research on this phenomenon was undertaken by Miller and Heise (1950). They alternated two pure tones regularly at a speed of 10 tones per second and established the so-called 'trill threshold', i.e. the frequency difference between the two tones (H and L) beyond which the sequence is no longer perceived as a single string. If the frequency difference was greater than 2-3 semitones, two sequences could be heard, -L-L-L- and -H-H-H-, and attention could be paid at will either to the high tones or to the low tones, the other ones being relegated to the background. So, although physically there is one sequence, the perceptual impression is that there are two sequences, each emanating from a different source. We then speak of 'streaming'. 'A "stream" is a psychological organization that mentally represents such a sequence and displays a certain internal consistency, or continuity that allows that sequence to be interpreted as a "whole"' (McAdams and Bregman, 1979, p. 26).

An alternating two-tone pattern is not the only kind of stimulus that can give rise to two streams. For instance, Heise and Miller (1951) continuously repeated a V-shaped melodic contour comprising 11 tones at a speed of 8 tones per second, wherein the middle (6th) tone could be adjusted in frequency. When a sufficient frequency difference was installed, a single recurring tone emerged, 'occupying its own perceptual source', the remainder of the sequence forming the other stream.
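To make the stimulus concrete, the following Python sketch lays out an alternating two-tone sequence of the kind Miller and Heise used: ten tones per second, with the high tone a chosen number of semitones above the low one. The 440 Hz low tone, the function name and the equal-tempered semitone ratio of 2^(1/12) are assumptions made for illustration; none of these details is taken from the original study.

```python
# Sketch: onset times and frequencies of an alternating L-H-L-H sequence.
# Assumed for illustration: 440 Hz low tone, equal-tempered semitones.

def alternating_sequence(low_hz, semitones, tones_per_sec, n_tones):
    """Return a list of (onset_seconds, frequency_hz, label) tuples."""
    high_hz = low_hz * 2 ** (semitones / 12)  # equal-tempered interval
    onset_step = 1.0 / tones_per_sec          # time between tone onsets
    sequence = []
    for i in range(n_tones):
        label = "L" if i % 2 == 0 else "H"    # strict alternation
        freq = low_hz if label == "L" else high_hz
        sequence.append((round(i * onset_step, 3), round(freq, 1), label))
    return sequence


# Ten tones per second, three semitones apart (hypothetical values).
for onset, freq, label in alternating_sequence(440.0, 3, 10, 6):
    print(f"{onset:5.1f} s  {freq:6.1f} Hz  {label}")
```

Splitting the resulting list by label gives the two potential streams, -L-L-L- and -H-H-H-, each running at half the overall tone rate.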
Whereas Miller et al. varied frequency to invoke streaming, Schouten (1962) investigated the role of tempo. He repeated the tones of an ascending major scale continuously. Up to 20 tones per second the sequence remained coherent, i.e. no streaming occurred. If, however, these tones were scrambled and again repeated cyclically, they streamed and the speed had to be reduced from 20 to about 10 tones per second before coherence was restored. Tempo is a very relevant variable in stream formation, as was demonstrated in a classical experiment by Warren et al. (1969). They presented a loop of four sounds repeatedly: a hiss, a tone, a buzz and the phoneme ee, each sound lasting 200 ms. Although the subjects could identify the sounds, they could not tell the correct temporal order of the looping elements. Only when the loop was slowed down considerably to between 450 and 670 ms per element could the order be established at a better than chance level. What typically seemed to have happened here was that (at the fast rate of the loop) four streams emerged, each stream characterized and cemented by the four different spectral compositions of the sounds. Bregman and Campbell (1971) also demonstrated that judging the temporal order of elements is far more difficult if they reside in different streams than when they belong to the same stream. It is evident from these studies that both frequency and tempo affect the way we perceptually organize such sequences. How tempo and frequency interact was nicely shown by van Noorden (1975) in his often cited thesis. Fortunately, van Noorden was very well aware of the influence of the observer's attentional set ('Einstellung'). He distinguished between two attentional sets: 'selective listening', in which the listener tries to hear either the stream of low tones or the stream of high tones, and 'comprehensive listening', in which the listener tries to hear all tones together in one stream. 
Instructions were given to his subjects accordingly. He defined the temporal coherence boundary (TCB) as the boundary between temporal coherence and fission (streaming) when the observer is trying to listen for coherence (one stream), and he defined the fission boundary (FB) as the boundary between temporal coherence and fission when the listener tries to hear fission (two streams). If there were no attentional set, these boundaries would coincide, which is suggested by the Miller and Heise (1950) data. However, they used only one (fast) tempo. As is clear from Figure 3.4, the boundaries, being very close to each other at fast tempos, diverge at slower tempos.

Figure 3.4. Redrawing of van Noorden's well-known Figure 2.9. In the area under the fission boundary (FB), the listener always hears one coherent stream. In the area above the temporal coherence boundary (TCB), two streams are always heard. In the middle area either one or two streams can be perceived. [Axis label: tempo (tones/sec). Region labels: 'always 1 stream'; 'either 1 or 2 streams (depending on the attentional set)'.]

Thus three different regions of perceptual organization can be distinguished. Rather independently of tempo, the FB is located at about 1-2 semitones, below which value listeners inevitably hear temporal coherence. Above the TCB the perceptual organization is also inevitable: listeners always hear streaming. The steep slope of the TCB shows that there is a trade-off between tempo and frequency: the slower the sequence, the more frequency difference between alternating tones has to be added in order to keep streaming inevitable. In the region between the two boundaries it depends on the attentional set whether coherence or streaming is perceived. So we have a rather curious situation here. For all combinations of frequency difference and tempo under the FB, the observer can hear only a single string despite his or her effort to listen selectively. Above the FB, but under the TCB, either one or two streams can be perceived depending on attentional set. That can only mean, of course, that attentional set guided the process of perceptual organization. However, the term 'auditory selection' as used in the present paradigm seems to
be quite different from the way it was used in the classical studies on auditory selective attention. In those paradigms two streams (often two speech messages) were presented 'cut and dried', and the listener had to disentangle (follow) one stream. In van Noorden's middle region, 'auditory selection' itself creates two streams. Probably, focusing on one frequency band yields one stream, whereby the other stream is formed automatically. After this accomplishment, attention can be focused on either stream, the other one forming the background. This latter process of 'auditory selection' is more akin to the classical usage of the term. However, it should immediately be added that the act of creating two streams in the middle region is not infallible: 'spontaneous' reversals to the percept of one stream often occur.
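The three regions separated by the FB and the TCB can be summarized as a simple decision rule. The Python sketch below is a hypothetical illustration, not van Noorden's model: the function name is invented, and both boundary values must be supplied by the caller, because the text gives only the approximate location of the FB (about 1-2 semitones) and no formula for the tempo-dependent TCB. The numbers in the usage lines are made up.

```python
# Sketch of the three perceptual regions separated by the FB and the TCB.
# Boundary values are caller-supplied assumptions, in semitones.

def perceptual_region(interval_semitones, fb_semitones, tcb_semitones):
    """Classify a tone interval at one tempo relative to FB and TCB."""
    if fb_semitones > tcb_semitones:
        raise ValueError("the FB cannot lie above the TCB")
    if interval_semitones < fb_semitones:
        return "always one stream"    # coherence is inevitable
    if interval_semitones > tcb_semitones:
        return "always two streams"   # fission is inevitable
    return "one or two streams, depending on attentional set"


# Hypothetical boundaries at some slow tempo: FB = 2, TCB = 10 semitones.
print(perceptual_region(1, 2, 10))   # always one stream
print(perceptual_region(5, 2, 10))   # one or two streams, depending on attentional set
print(perceptual_region(12, 2, 10))  # always two streams
```

Only the middle branch leaves room for attentional set; in the other two the percept is fixed by the stimulus, which is the point made about 'auditory selection' above.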
In the region above the TCB, where set did not affect the perceptual outcome (always two streams), the term 'selective auditory attention' in its classical sense seems most appropriate: listeners could, as in the cocktail party studies, be instructed to 'shadow/sing' either the high or the low stream. Such a shadowing/singing task would be a very dull one if the streams were monotonous. Shadowing either the upper stream or the lower stream, as depicted in Figure 3.5, would be more interesting.

Figure 3.5 (a) An excerpt from Telemann's 'Sonata in C major for recorder and basso continuo', which gives rise to the perception of two streams, a melody in the higher-frequency region against a rhythmic background stream of lower frequency. (b) An excerpt from Telemann's 'Capriccio for recorder and basso continuo', which also gives rise to the perception of two streams. The listener can direct his or her attention to either melody at will, or can be primed by the preceding structure of the piece to put one melody in the foreground. [After Deutsch, 1975]. [Both panels have TIME on the horizontal axis.]

However, if we apply this task to test the selective power of frequency, then we run into the same pitfall as did the classical dichotic studies. Stream and the kind of melody residing in that stream are inherently confounded. The dichotic studies claimed that location was a good separator, but the cues by which the 'filtering' could be done were not solely location but always 'location + something else' (pitch, timbre, loudness, syntax and/or meaning). A critical test for location separation would require two completely equal messages, one in each ear, of which ensemble only the left (or right) one should be shadowed. However, this is not feasible: even if the experimenter could invent an intelligent
means of checking whether the shadowed message really came from the instructed ear, the ensemble fuses and is perceived as one message in the median plane. In the remainder of this chapter it will become clear from less direct approaches that location is not so good a 'filter' at all.

There is still a further aspect of attention that can be discussed with respect to streaming: the notion of attention switching. Above the TCB the only possible perceptual organization is two streams. A simple and attractive explanation is that tones jump so rapidly through frequency space that some hypothetical attention-switching mechanism cannot keep track. According to this view, streaming is a deficit. Such an explanation is of course a functional one: nobody has ever been able to locate an attention-switching mechanism in the brain. There is also another kind of functional explanation, favored by Bregman: 'The other type of functional explanation explains auditory organization by showing how it solves practical problems for the listener' (Bregman, 1990, p. 209). In view of such a teleological explanation, streaming should be seen as an accomplishment and not as a deficit. In our natural environment, sounds do not jump rapidly between two different frequencies. If the auditory system is confronted with the technical achievement of artificially switching sounds, its wise conclusion is that there are different sources instead of one. The following quotation also makes the point: 'We are built to pay attention to auditory sources, not to acoustic components, and it is the decomposition heuristics, organized in ways described by Gestalt psychology, that build the auditory descriptions of sources out of the complex acoustic input and, in doing so, place strong constraints on the process of attention' (Bregman, 1978b, p. 74).

As we have seen, sequences of tones alternating between frequencies are organized depending on the rate of the sequence and the frequency interval.
For combinations of tempo and interval above the TCB, the best possible interpretation was that there were two sound sources in the auditory environment, although physically there is only one. A very obvious question then is to ask what happens if tones alternate between auditory locations. If they alternate relatively quickly, the auditory system is again exposed to an 'unecological' situation. An auditory source does not quickly move back and forth, alternating between two different positions in space. Does the auditory system solve this situation in the same way as it solves the frequency case? Are two segregated streams created if tones are switched rapidly between ears or locations in space? This would seem a logical solution since, contrary to the frequency case, there are two physical sources. Two studies have strongly suggested that 'streaming by ear' or 'streaming by locus' does exist. Blauert (1970) found that temporal coherence was lost if the time between ear-alternating sounds became shorter than 170 ms. However, it was not reported by Blauert whether a clear sense of streaming took over. Nevertheless he states: 'Der Begriff "Trägheit des Richtungshören" beschreibt die Tatsache, dass sich der Richtung, in der ein Hörereignis lokalisiert ist, nicht beliebig schnell ändert.' (The concept of inertia of listening for location holds that the direction of location perception cannot be changed infinitely fast.) (Blauert, 1970, p. 287.) Huggins (1974) was more explicit about the perceptual interpretation: 'In an informal listening test, it seemed that low alternation rates were heard as a sequence of intervals, and high rates were heard as two separate pulse trains, one at each ear, which could not be fused into a single perceptual image.' (Huggins, 1974, p. 939). He tested this informal observation and found that the percept of one
sequence (stream) disappeared when the intersound interval became shorter than 150 ms (thus close to Blauert's estimate), but that the percept of two separate streams was fully established only at about 60 ms.

Several experiments in the Cherry/Broadbent tradition (e.g. Treisman, 1971; Guzy and Axelrod, 1972) were concerned with the processing of stimuli alternating between ears (so-called interaural presentation). The speed at which the auditory sequences were alternated could not give rise to separate streams according to the criteria just mentioned. But, despite the fact that the sounds belonged to one (alternating) stream, processing deficits were substantial. Treisman (1971) reported that lists of ear-alternating words were recalled worse, Harvey and Treisman (1973) found slower monitoring RTs for target tones residing in interaural tone sequences, and Guzy and Axelrod (1972) found that interaural sequences of clicks were counted less well, a finding replicated by Massaro (1976) and ten Hoopen and Vos (1981). Deutsch (1979) found that ear-alternating melodies were recognized badly. Nakao and Axelrod (1976), van Noorden (1975) and ten Hoopen (1985) showed that the discrimination threshold between an isochronous (/ / / / / /) and a duple (// // //) rhythm was higher for ear-alternating sequences. (In all these experiments the control condition comprised nonalternating sound sequences.) An explanation in terms of attention switching between locations was often proposed to account for such decrements in performance: 'This attention shifting time would reduce the time available for perception and storage...' (Treisman, 1971, p. 164); '... that interaural attention shifts are processes which take time to perform...' (Guzy and Axelrod, 1972, p. 292); 'In the alternating case, switching time subtracts from processing producing a decrement in counting performance' (Massaro, 1976, p. 302).
Two fundamentally different concepts of auditory attention switching should be distinguished. If there are two segregated streams, it is a redirecting of attention from the foreground stream to the background stream (which thereby becomes the figure). If there is one stream, attention shifting could be conceived of as a kind of auditory tracking between changing locations or frequency bands. I never came across such a distinction. Nevertheless the latter interpretation of attention switching should be the one applicable to the experimental results just mentioned. However, attempts to explain the performance decrements with interaural material by attention switching (in the tracking sense) quantitatively turned out to be a failure (ten Hoopen and Vos, 1980; ten Hoopen, 1982). A perceptual rather than an attentional explanation gives a better account of the results. For some reason, still unknown, the auditory system prefers to slow down the rate of ear-alternating sequences: they have a slower subjective tempo than nonalternating sequences. This effect was first reported by Axelrod, Guzy and Diamond (1968), and work done at my laboratory established the quantitative difference between the subjective tempos of ear-alternating and nonalternating sequences (Akerboom et al., 1983; ten Hoopen, Vos and Dispa, 1982). We applied a so-called stop RT technique (for which we are indebted to Schaefer, 1979) to interaural and monaural click sequences. Listeners had to respond as fast as possible to the end of the sequences containing an unpredictable number of clicks. Although a monaural and an interaural interval are the same in the physical time domain, the interaural interval is perceptually longer. Thus, in order to detect that no further click is arriving, the listener has to monitor time longer in the interaural condition before he or she can initiate the button press
Figure 3.6. (a) Monaural and interaural sequences with the same stimulus onset asynchrony (SOA) between clicks in real time differ with regard to perceptual-onset asynchrony (POA). In mental time, the temporal positions of interaural click percepts are spaced more widely than those of monaural ones. (b) Diagram of the stop reaction time paradigm. See text for explanation.
(Figure 3.6). Given that this latter response-initiating component of the RT is invariant with listening conditions, the difference in stop RT to the interaural and monaural sequences reflects the subjective time difference. It turned out to be 26 ms and, interestingly, it was constant over a wide range of intervals. Whether the objective interval between consecutive sounds was as short as 40 ms or as long as 2130 ms, the interaural percepts were all spaced 26 ms farther apart than monaural sounds in auditory memory. Akerboom et al. (1983) argued that deteriorated performance resulting from ear-alternated presentation of the sounds can be far better explained by the 'interaural tempo illusion' than by attention-switching models. In addition it should be emphasized that, even when the ear-alternating sequence turns from the 'one-stream percept' to the 'two-stream percept' (which is claimed by Huggins to happen when the intersound interval becomes shorter than about 100 ms), the subjective dilation of the interaural interval still remains at 26 ms. This means, of course, that the temporal relation between sounds residing in different hypothesized 'locus streams' is preserved. This contrasts strongly with the situation in the case of frequency streams between which temporal relations can hardly be judged correctly, as discussed earlier. Two things should have become clear by now. First, there is a set of experimental results, stemming from the tradition of auditory selection studies, that might be better explained by mere perceptual factors than by attentional ones. Second, it empirically turned out that the ears are bad streamers. Bregman reached this point in a far more elaborate way: 'This may be why spectral organization does not depend too strongly on spatial cues. A person can do a creditable job at segregating concurrent sounds from one another even when listening to a monaural recording. 
Spatial evidence is just added up with all the other sorts of evidence in auditory scene analysis' (Bregman, 1990, p. 659).
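The subtraction logic behind the stop RT estimate described above can be made concrete with a minimal sketch. It assumes, as the text does, that the response-initiating component is the same in both listening conditions; the function name, the 180 ms response component, and the simplification that the monaural perceptual interval equals the physical interval are illustrative assumptions, not values or claims from the studies cited. Only the 26 ms dilation and the 40 to 2130 ms range come from the text.

```python
# Sketch of the stop RT decomposition (illustrative, not the authors' analysis code).
# Assumption: stop RT = perceived interval (POA) + response-initiation time.

INTERAURAL_DILATION_MS = 26.0   # subjective lengthening reported in the text
RESPONSE_MS = 180.0             # hypothetical response-initiation component


def stop_rt(soa_ms: float, dilation_ms: float, response_ms: float = RESPONSE_MS) -> float:
    """Time from the last click until the button press.

    The listener must wait one perceived interval (SOA plus any subjective
    dilation) before concluding that no further click will arrive.
    """
    perceived_interval = soa_ms + dilation_ms
    return perceived_interval + response_ms


def subjective_difference(soa_ms: float) -> float:
    """Interaural minus monaural stop RT; the response component cancels."""
    rt_monaural = stop_rt(soa_ms, dilation_ms=0.0)  # assumes no monaural dilation
    rt_interaural = stop_rt(soa_ms, dilation_ms=INTERAURAL_DILATION_MS)
    return rt_interaural - rt_monaural


for soa in (40.0, 500.0, 2130.0):
    print(soa, subjective_difference(soa))
```

Because the response component appears in both terms, its actual value does not affect the difference; under these assumptions the sketch returns the same 26 ms at every interval, which is the pattern of constancy the text reports.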
3.2
Auditory Illusions and Attention
Deutsch reported several interesting auditory illusions which further elucidate the way listeners organize a complex set of tonal stimuli into a percept. One of these illusions is the 'two-tone illusion' (Deutsch, 1974, 1975), also known as the 'octave illusion' (Deutsch, 1980, 1983). The illusion is heard when a dichotic chord of 400 and 800 Hz tones, lasting 250 ms, alternates repeatedly (Figure 3.7, top). What would a traditional theory of auditory selective attention predict, given the 'primary message' instruction to follow one ear? Probably that the listener should perceive an alternating low-high-low-high pitched sound sequence in one ear, or conversely a high-low-high-low pitched sequence in the other, if that ear was designated as the primary one. This prediction is false. In reality observers reported several percepts: some heard one pitch alternating between the ears, others heard more complex percepts such as two alternating pitches in one ear and a third pitch discontinuously in the other. However, most observers reported that the sequence they heard alternated simultaneously between pitches and locations (Figure 3.7, bottom) and nobody reported the veridical structure of the stimulus (Deutsch, 1974). It should be emphasized
Figure 3.7. Objective and subjective patterns in the two-tone or octave illusion. [After Deutsch, 1975].
that the illusion works equally well with headphones as with loudspeakers, hence the terms lateralization and localization will be used here interchangeably. Deutsch and Roll (1976) proposed the following rules to describe the genesis of this illusion. With regard to pitch, right ear dominant listeners (most of us) perceive the tone that stimulates the right ear. Thus a sequence of high and low pitches is heard, the tones in the contralateral ear being suppressed. Second, the lateralization of the tones follows the ear that receives the higher tone, irrespective of whether this tone is being suppressed or not. Thus, although the classical model for selective auditory attention was not built for, and upon, this kind of tricky artificial 'message', at least half of the prediction seems to be true: the pitch attribute of the percept indeed corresponds to the tones in one ear. But this ear is not selected deliberately! Moreover, the selection mechanism operating on the location attribute of the tone was subordinate: the selected location was, so to speak, prescribed by the frequency of the tone. The following quotation puts it better: 'The mechanism responsible for pitch perception chooses to follow the frequencies that are presented to one side of auditory space rather than the other, that is, the decision as to what is heard is determined by where the signals are coming from. Yet the localization mechanism chooses instead to follow the higher-frequency signal; that is, the decision as to where the stimulus is located is determined by what the signal frequencies are' (Deutsch, 1980, p. 578). The idea, then, is that different attributes are analyzed by different mechanisms and that the output from these analyses is integrated at a later stage. Zwicker (1984) could replicate these findings. 
When he used higher alternating frequencies, however (centering around 1 and 2 kHz), the lateralization of the perceived sound remained toward the ear receiving the high pitch for about half the subjects, but was toward the ear receiving the low pitch for the other subjects. Furthermore, he found that the illusion was optimal when the speed of alternation was 200 ms, while the illusion disappeared completely when the alternation speed was slowed down beyond 1 s. Then his subjects could describe a veridical percept.
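The two rules of Deutsch and Roll (1976) described above lend themselves to a small sketch. The code below builds the alternating dichotic chord and applies the rules as stated in the text: pitch follows the dominant ear, location follows the ear receiving the higher tone. The class and function names are hypothetical, and the sketch covers only the 400/800 Hz case, not Zwicker's findings at higher frequencies.

```python
# Sketch of the 'what' and 'where' rules for the octave illusion
# (an illustration of the rules as summarized in the text, not a perceptual model).
from dataclasses import dataclass

LOW_HZ, HIGH_HZ = 400, 800  # tone frequencies given in the text


@dataclass(frozen=True)
class DichoticChord:
    left_hz: int
    right_hz: int


def octave_stimulus(n_chords: int) -> list[DichoticChord]:
    """Alternate which ear receives the high tone on successive chords."""
    chords = []
    for i in range(n_chords):
        if i % 2 == 0:
            chords.append(DichoticChord(left_hz=LOW_HZ, right_hz=HIGH_HZ))
        else:
            chords.append(DichoticChord(left_hz=HIGH_HZ, right_hz=LOW_HZ))
    return chords


def predicted_percept(chord: DichoticChord, dominant_ear: str = "right") -> tuple[int, str]:
    """Return (perceived pitch in Hz, perceived side) for one chord."""
    # Rule 1 ('what'): the pitch heard is the tone delivered to the dominant ear.
    pitch = chord.right_hz if dominant_ear == "right" else chord.left_hz
    # Rule 2 ('where'): the sound is lateralized toward the ear receiving the
    # higher tone, whether or not that tone is the one being heard.
    location = "right" if chord.right_hz > chord.left_hz else "left"
    return pitch, location


percepts = [predicted_percept(chord) for chord in octave_stimulus(4)]
print(percepts)
# [(800, 'right'), (400, 'left'), (800, 'right'), (400, 'left')]
```

For a right-ear-dominant listener the sketch yields a high tone on the right alternating with a low tone on the left, that is, a sequence that alternates simultaneously between pitches and locations, which is the percept most observers reported.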
Figure 3.8. (a) The objective pattern that creates the scale illusion. (b-d) Alternative perceptual organizations that could solve the dilemma of competing grouping principles: (b) left ear/right ear; (c) high frequencies/low frequencies; (d) upward progression/downward progression. [After Handel, 1989].
Another interesting illusion reported by Deutsch (1975) is the 'scale illusion'. A rising and a falling C major scale were presented dichotically: the falling scale of eight tones alternated between the right and the left ear, and the rising scale alternated the other way round (Figure 3.8). This juxtaposition of frequency and location puts the listener into the dilemma of how to interpret the pattern, since
perceptual organization principles have been set in competition. There are at least two such principles, but, as Handel (1989) has mentioned, good continuation (frequency progression) might also be a candidate. If location is the most powerful principle, the two Gestalts shown in Figure 3.8b should emerge from the battle: one 'melody' at the left ear and another at the right ear. However, if frequency is a stronger organizer then two other Gestalts, shown in Figure 3.8c, should emerge. And finally, when good continuation overrules location and frequency, the perceptual pattern in Figure 3.8d should survive. It turned out that frequency won. Location was overruled so strongly that even the upper ∨-contoured melody seemed to originate from the right half of auditory space and the lower ∧-contoured melody from the left, despite the fact that half of the tones of each melody were presented to the other ear. Like the octave illusion, the scale illusion works not only with headphones but with loudspeakers as well (Butler, 1979). Those researchers who adhere to an attention-switching model would interpret the perceptual outcome as follows: since the tones constituting the ∨- and ∧-contoured melodies alternated relatively slowly (250 ms), attention could follow the tones in opposite ears and therefore no dissociation by ear occurred. A prediction would then be that ears become the organizing factor if the sequence is speeded up beyond some hypothetical switching time, say 150 ms, a rough estimate based on Blauert (1970), Cherry and Taylor (1954) and Huggins (1974). But anyone playing around with a computer that generates stereo sound can convince him/herself that, even with tones lasting only 80 ms or so, frequency remains the organizer. (I owe this reasoning partly to Bregman's (1990) book.) An important comment on the scale illusion was put forward by Judd (1979). 
He called organization by frequency pitch stream segregation (PSS) and organization by location spatial stream segregation (SSS). Deutsch had intended to put PSS and SSS into competition, but Judd questioned whether location got a fair chance to be the dominating principle. He carried out two experiments to test this contention. In the first experiment he showed that SSS of an ear-alternating tone sequence could be suppressed by adding broad band white noise in the contralateral ear. Although the experiment was set up cleverly, I skip its details since my main argument is that it did not properly support the criticism raised. The criticism of the scale illusion concerned a potential suppression of SSS by using dichotic tones, not by using dichotic tone-noise pairs. The experiment could have been done in a more appropriate way if Judd had been aware of Warren and Bashford's (1976) article, reporting a phenomenon called 'auditory contralateral induction'. When a dichotic chord, comprising a sine tone at one ear and a narrow noise band at the other ear, alternated every 500 ms, the sine tone could be delateralized toward the median plane if the intensity and the spectral composition of the noise band were such that it could have masked the sine tone, had it been in the same location. If Judd had used such narrow-band noises, resembling tones more than white noise, his experiment would have been more convincing. It is interesting to relate these contralateral effects to the classical speech-switching experiments (as was also done by Judd). Cherry and Taylor (1954) found minimal speech intelligibility when the speech signal alternated three to four times per second between the ears, a deterioration that they ascribed to limitations of a hypothetical attention-switching mechanism. Besides the fact that this explanation was discounted by Broadbent (1958) and Huggins (1964), Schubert and Parker
(1955) found that switched-speech intelligibility improved greatly when noise was added in the 'empty' ear (not receiving the speech signal). Very probably, contralateral induction took place through which the alternating speech delateralized toward the middle of the head. But let us return to the main story of the scale illusion. Judd (1979) devised a second experiment in which nondichotic (binaural) rather than dichotic patterns were presented, but were otherwise similar to those of the scale illusion. He found that he could reinstate SSS at the expense of PSS. Judd's conclusion was that the scale illusion follows from the laws of stream segregation (like PSS and SSS) and auditory localization, rather than solely from a competition between PSS and SSS. In addition, as Judd mentioned, there is the complicating factor of the old musical law of counterpoint theory, stating that melodic lines should not cross.
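The competition between grouping principles in the scale illusion can be illustrated with a short sketch that builds the dichotic pattern and derives the three candidate organizations of Figure 3.8 (by ear, by frequency range, by progression). Tones are coded as MIDI note numbers for the scale C4 to C5; this coding, the choice of octave, and all variable names are assumptions made for illustration only. The sketch shows what each organization would contain, not which one the auditory system selects.

```python
# Sketch of the scale illusion stimulus and its alternative organizations.
# Assumption: C major scale coded as MIDI note numbers, C4 = 60 ... C5 = 72.

RISING = [60, 62, 64, 65, 67, 69, 71, 72]   # C D E F G A B C
FALLING = list(reversed(RISING))

# Objective pattern: the falling scale starts in the right ear and switches
# ears on every tone; the rising scale alternates the other way round.
right_ear, left_ear = [], []
for i, (up, down) in enumerate(zip(RISING, FALLING)):
    if i % 2 == 0:
        right_ear.append(down)
        left_ear.append(up)
    else:
        right_ear.append(up)
        left_ear.append(down)

# (b) Organization by location: one 'melody' per ear.
by_ear = {"right": right_ear, "left": left_ear}

# (c) Organization by frequency range: the higher and the lower tone of each pair.
high_stream = [max(up, down) for up, down in zip(RISING, FALLING)]
low_stream = [min(up, down) for up, down in zip(RISING, FALLING)]

# (d) Organization by good continuation: the two complete scales.
by_progression = {"upward": RISING, "downward": FALLING}

print("right ear:  ", right_ear)    # [72, 62, 69, 65, 65, 69, 62, 72]
print("left ear:   ", left_ear)     # [60, 71, 64, 67, 67, 64, 71, 60]
print("high stream:", high_stream)  # [72, 71, 69, 67, 67, 69, 71, 72]
print("low stream: ", low_stream)   # [60, 62, 64, 65, 65, 64, 62, 60]
```

The per-ear sequences jump widely in pitch from tone to tone, whereas the high and low streams move in small steps and never cross, which may help explain why organization by frequency is the percept usually reported.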
Whatever the intricate preattentive processes may be that give rise to the scale illusion, there is no doubt that once the two pitch Gestalts have emerged, one can follow either the ∨-contour (the higher melody) or the ∧-contour (the lower melody) at will. An interesting question is whether the kind of illusory outcome can be affected by factors such as set, training or experience. Two studies from a group at Macquarie University, Australia, address such questions. Smith et al. (1982) could only reliably replicate Deutsch's scale illusion when their subjects could choose from a restricted set of response alternatives (figures representing the alternative melodic percepts). Subjects did less well when open-ended responses were required. The authors suggest that the scale illusion is not a very robust phenomenon (unlike, for instance, the Müller-Lyer illusion). They quoted Gillam (1980) who suggested that illusions should only be called so if they are genuinely perceptual, i.e. the illusion should not be weakened by conceptual knowledge. Smith et al. therefore preferred to call the 'scale illusion' an ambiguous figure (like the Necker cube). When I discussed the work of van Noorden on streaming, we had already met such an ambiguous type of pattern. All frequency alternating sequences between the FB and the TCB could be perceived either as a coherent sequence or as two segregated streams, and reversals between these percepts could occur spontaneously. Smith et al. used musicians as well as nonmusicians as subjects and found that these groups differed in their reports. Consequently, the main purpose of the second study of the Sydney group (Davidson, Power and Michie, 1987) was to investigate the effect of conceptual factors such as training and experience on the perception of Deutsch's ambiguous figure. An interesting variable that they included was 'priming'. 
Given that the ambiguity could be resolved in three different ways (organization by ear, by pitch, or by frequency progression), it is of course possible to attempt to prime or set the listener selectively. This is illustrated in Figure 3.9. The authors referred to several inspiring aspects of Baroque music such as the use of organization principles, e.g. pitch, timbre, dynamics, rhythm and speed in polyphonic and pseudopolyphonic pieces. They even remarked that: 'Baroque composers seem to have intuitively subscribed to a model of attention that states that we cannot attend to more than one sound at a time' (Davidson et al., 1987, p. 601). Priming can be observed very clearly in Baroque pieces: before presenting two tunes together they are often played individually to set the listener. Dowling (1973), in his well-known article on 'interleaved melodies', indeed found that target melodies could be unraveled when prespecified.
Figure 3.9. The stimuli presented in the experiments of Davidson et al. (1987). In panels A to D, the first two bars represent the priming stimuli, and the last bar the stimulus on which the subjects reported. In each pair of staves, the upper stave was presented to the right ear, the lower to the left. [After Davidson et al., 1987].
Davidson et al. (1987) ran their experiment with three groups of subjects (16 each): non-musicians, performers of chamber music and contemporary composers. They all heard the four conditions illustrated in Figure 3.9. Notice that conditions A, B and C contain both the prime patterns and the ambiguous figure, but D only the ambiguous figure (the control). Most non-musicians perceived the ambiguous figure in the way the prime 'prescribed'. Most performers of chamber music, however, perceived the ambiguous figure almost always as organized by pitch (the 'common' scale illusion impression), irrespective of the kind of prime. But the other group of musicians, the contemporary composers, perceived the ambiguous figure almost always organized by ear, also irrespective of the kind of priming. So it appeared that experience, be it in classical or contemporary music, was stronger than set in organizing the ambiguous dichotic stimulus.
4
EPILOGUE
Many texts on attention start by quoting William James (1890), who gave a good articulation of the concept of attention. I prefer to finish with a rather long quotation
from a letter of Seneca, since it fits the contents of this auditory chapter very nicely. Selection, set, experience, and even rhythmic attending: it is all there. His letter is so lucid that I will add only one comment: an aspect of selective attention that we dare not easily study in our laboratories is the main theme of the letter (culminating in the last sentence of the quotation): selecting no external stimuli at all. Although there is a reference section, I will set some landmarks under the heading 'significant literature' for those who would like to inspect better or more extensive accounts of the new topics on which I have tried to focus attention.
4.1
Seneca's Lodgings
'I cannot for the life of me see that quiet is as necessary to a person who has shut himself away to do some studying as it is usually thought to be. Here I am with a babel of noise going on all about me. I have lodgings right over a public bathhouse. Now imagine to yourself every kind of sound that can make one weary of one's years. When the strenuous types are doing their exercises, swinging weight-laden hands about, I hear the grunting as they toil away - or go through the motions of toiling away - at them, and the hissings and strident gasps every time they expel their pent up breath. When my attention turns to a less active fellow who is contenting himself with an ordinary inexpensive massage, I hear the smack of a hand pummeling his shoulders, the sound varying accordingly as it comes down flat or cupped. But if on top of this some ball player comes along and starts shouting out the score, that's the end! Then add someone starting up a brawl, and someone else caught thieving, and the man who likes the sound of his voice in the bath, and the people who leap into the pool with a tremendous splash. Apart from those whose voices are, if nothing else, natural, think of the hair remover, continually giving vent to his shrill and penetrating cry in order to advertise his presence, never silent unless it be while he is plucking someone's armpits and making the client yell for him! Then think of the various cries of the man selling drinks, and the one selling sausages and the other selling pastries, and all the ones hawking for the catering shops, each publicizing his wares with a distinctive cry of his own. '"You must be made of iron," you may say, "or else hard of hearing if your mind is unaffected by all this babel of discordant noises around you, when continual "good morning" greetings were enough to finish off the Stoic Chrysippus!" 
But I swear I no more notice all this roar of noise than I do the sound of waves of falling water - even if I am here told the story of a people on the Nile who moved their capital solely because they could not stand the thundering of a cataract! Voices, I think, are more inclined to distract one than general noise; noise merely fills one's ears, battering away at them while voices actually catch one's attention. Among the things which create a racket all around me without distracting me at all I include the carriages hurrying by in the street, the carpenter who works in the same block, a man in the neighbourhood who saws, and this fellow tuning horns and flutes at the Trickling Fountain and emitting blasts instead of music. I still find an intermittent noise more irritating than a continuous one. But by now I have so steeled myself against all these things that I can even put up with a coxswain's strident tones as he gives his oarsmen the rhythm. For I force my mind to become self-absorbed and not let outside things distract it...' (Seneca: Epistulae Morales ad Lucilium, translated by Robin Campbell, pp. 109-110).
4.2
Significant Literature
Bregman's (1990) monumental work Auditory Scene Analysis has no chapter on attention, but the whole book is pervaded by an air of attention which condenses in his last chapter. Deutsch wrote a fine tutorial chapter ('Auditory pattern recognition') in the Handbook of Perception and Human Performance, vol. 2 (1986); there is a subsection (1.4) titled 'Grouping and selective attention'. Handel (1989) wrote a nice textbook, Listening, concerned with speech and music perception. There is no chapter called 'Auditory attention', but chapter 7 ('Breaking the acoustic wave into events: stream segregation') is relevant to our topic. Less recent but relevant is also Warren's book Auditory Perception: A New Synthesis (1982). Two books on music psychology are also relevant. Sloboda's (1985) The Musical Mind: The Cognitive Psychology of Music contains a section 'Attention in music listening'. Dowling and Harwood's (1986) Music Cognition has a chapter 'Melody: attention and memory', but the next chapter, 'Melodic organization', is also concerned with the topic of attention. The impressive work of Jones is very pertinent to the topics presented. To recall only a few titles: 'Time, our lost dimension: Toward a new theory of perception, attention, and memory' (her 1976 article in Psychological Review), 'Dynamic attending and responses to time' (Jones and Boltz, 1989, in Psychological Review), and 'Musical events and models of musical time' (in Block, 1990). Those readers who favor a more traditional survey than the one I wrote, could consult Hawkins and Presson's (1986) chapter 'Auditory information processing' (also in the Handbook of Perception and Human Performance, vol. 2) containing a section on auditory attention. Though tutorially this section is fine, do not expect any references beyond the 1970s. 
Swets and Kristofferson's survey on 'Attention' (1970) and Neisser's Cognitive Psychology (1967) could also be inspected for a fuller description of the first decades of attention research.
ACKNOWLEDGEMENTS
I am grateful to the Canon Foundation in Europe, which selected me as a 1991 Canon Visiting Research Fellow. This chapter was completed during the term of the fellowship. Thanks are also due to the Department of Acoustic Design of the Kyushu Institute of Design (Fukuoka, Japan), for generously giving me space, time and support to finish this chapter. Special thanks to Yoshitaka Nakajima, who helped me a lot.
REFERENCES
Adams, R. D. (1977). Intervening stimulus effects on category judgments of duration. Perception and Psychophysics, 21, 527-534.
Akerboom, S., ten Hoopen, G., Olierook, P. and van der Schaaf, T. (1983). Auditory spatial alternation transforms auditory time. Journal of Experimental Psychology: Human Perception and Performance, 6, 882-897.
Axelrod, S., Guzy, L. T. and Diamond, I. T. (1968). Perceived rate of monotic and dichotically alternating clicks. Journal of the Acoustical Society of America, 43, 51-55.
Benussi, V. (1913). Psychologie der Zeitauffassung. Heidelberg: Carl Winter's Universitätsbuchhandlung.
Blauert, J. (1970). Zur Trägheit des Richtungshörens bei Laufzeit- und Intensitätsstereophonie. Acustica, 23, 287-293.
Block, R. A. (Ed.) (1990). Cognitive Models of Psychological Time. Hillsdale, NJ: LEA Publishers.
Bregman, A. S. (1978a). Auditory streaming is cumulative. Journal of Experimental Psychology: Human Perception and Performance, 4, 380-387.
Bregman, A. S. (1978b). The formation of auditory streams. In J. Requin (Ed.), Attention and Performance VII. Hillsdale, NJ: Erlbaum.
Bregman, A. S. (1990). Auditory Scene Analysis. Cambridge, MA: MIT Press.
Bregman, A. S. and Campbell, J. (1971). Primary auditory stream segregation and perception of order in rapid sequences of tones. Journal of Experimental Psychology, 89, 244-249.
Broadbent, D. E. (1954). The role of the auditory localisation in attention and memory span. Journal of Experimental Psychology, 47, 191-196.
Broadbent, D. E. (1958). Perception and Communication. London: Pergamon.
Broadbent, D. E. (1970). Stimulus set and response set: Two kinds of selective attention. In D. E. Mostofsky (Ed.), Attention: Contemporary Theories and Analysis. New York: Appleton-Century-Crofts.
Bryden, M. P. (1988). An overview of the dichotic listening procedure and its relation to cerebral organization. In K. Hugdahl (Ed.), Handbook of Dichotic Listening: Theory, Methods and Research. Chichester: John Wiley.
Butler, D. (1979). A further study of melodic channeling. Perception and Psychophysics, 25, 264-268.
Cherry, E. C. (1953). Some experiments on the recognition of speech, with one and with two ears. Journal of the Acoustical Society of America, 25, 975-979.
Cherry, E. C. and Taylor, W. K. (1954). Some further experiments upon the recognition of speech, with one and with two ears. Journal of the Acoustical Society of America, 26, 554-559.
Corteen, R. S. and Dunn, D. (1974). Shock-associated words in a nonattended message: A test for momentary awareness. Journal of Experimental Psychology, 102, 1143-1144.
Corteen, R. S. and Wood, B. (1972). Autonomic response to shock-associated words in an unattended channel. Journal of Experimental Psychology, 94, 308-313.
Davidson, B., Power, R. P. and Michie, P. T. (1987). The effects of familiarity and previous training on perception of an ambiguous musical figure. Perception and Psychophysics, 41, 601-608.
Deutsch, D. (1974). An auditory illusion. Nature, 251, 307-309.
Deutsch, D. (1975). Musical illusions. Scientific American, 233, 92-104.
Deutsch, D. (1979). Binaural integration of melodic patterns. Perception and Psychophysics, 25, 399-405.
Deutsch, D. (1980). The octave illusion and the what-where connection. In R. S. Nickerson and R. W. Pew (Eds.), Attention and Performance VIII. Hillsdale, NJ: Erlbaum.
Deutsch, D. (1983). Auditory illusions, handedness, and the spatial environment. Journal of the Audio Engineering Society, 31, 607-618.
Deutsch, D. (1986). Auditory pattern recognition. In K. R. Boff, L. Kaufman and J. P. Thomas (Eds.), Handbook of Perception and Human Performance, vol. 2 (pp. 32/1-32/49). New York: Wiley.
Deutsch, D. and Feroe, J. (1981). The internal representation of pitch sequences in tonal music. Psychological Review, 88, 502-522.
Deutsch, J. A. and Deutsch, D. (1963). Attention: Some theoretical considerations. Psychological Review, 70, 80-90.
Deutsch, D. and Roll, P. L. (1976). Separate 'what' and 'where' decision mechanisms in processing a dichotic tonal sequence. Journal of Experimental Psychology: Human Perception and Performance, 2, 23-29.
Dewar, K. M., Cuddy, L. L. and Mewhort, D. J. K. (1977). Recognition memory for single tones with and without context. Journal of Experimental Psychology: Human Learning and Memory, 3, 60-67.
Dowling, W. J. (1973). The perception of interleaved melodies. Cognitive Psychology, 5, 322-337.
Dowling, W. J. and Harwood, D. (1986). Music Cognition. Orlando, FL: Academic Press.
Drazin, D. H. (1961). Effects of foreperiod, foreperiod variability, and probability of stimulus occurrence on simple reaction time. Journal of Experimental Psychology, 62, 43-50.
Foss, D. J. (1969). Decision processes during sentence comprehension: Effects of lexical item difficulty and position upon decision times. Journal of Verbal Learning and Verbal Behavior, 8, 457-462.
Fraisse, P. (1961). Influence de la durée et de la fréquence des changements sur l'estimation du temps. Année Psychologique, 61, 325-339.
Gillam, B. (1980). Geometrical illusions. Scientific American, 242, 102-111.
Gray, J. A. and Wedderburn, A. A. (1960). Grouping strategies with simultaneous stimuli. Quarterly Journal of Experimental Psychology, 12, 180-184.
Grimm, K. (1934). Der Einfluss der Zeitform auf die Wahrnehmung der Zeitdauer. Zeitschrift für Psychologie, 132, 104-132.
Guzy, L. T. and Axelrod, S. (1972). Interaural attention shifting as response. Journal of Experimental Psychology, 95, 290-294.
Hakes, D. T. and Foss, D. J. (1970). Decision processes during sentence comprehension: Effects of surface structure reconsidered. Perception and Psychophysics, 8, 413-416.
Hall, G. S. and Jastrow, J. (1886). Studies of rhythm. Mind, 11, 55-62.
Handel, S. (1989). Listening: An Introduction to the Perception of Auditory Events. Cambridge, MA: MIT Press.
Harvey, N. and Treisman, A. (1973). Switching attention between the ears to monitor tones. Perception and Psychophysics, 14, 51-59.
Hawkins, H. and Presson, J. (1986). Auditory information processing. In K. R. Boff, L. Kaufman and J. P. Thomas (Eds.), Handbook of Perception and Human Performance, vol. 2 (pp. 26/1-26/64). New York: Wiley.
Heise, G. A. and Miller, G. A. (1951). An experimental study of auditory patterns. American Journal of Psychology, 64, 68-77.
Huggins, A. W. F. (1964). Distortion of the temporal pattern of speech: Interruption and alternation. Journal of the Acoustical Society of America, 36, 1055-1064.
Huggins, A. W. F. (1974). On perceptual integration of dichotically alternated pulse trains. Journal of the Acoustical Society of America, 56, 939-943.
James, W. (1890). Principles of Psychology. New York: Holt.
Johnston, W. A. and Heinz, S. P. (1978). Flexibility and capacity demands of attention. Journal of Experimental Psychology: General, 107, 420-435.
Johnston, W. A. and Heinz, S. P. (1979). Depth of non-target processing in an attention task. Journal of Experimental Psychology: Human Perception and Performance, 5, 168-175.
Johnston, W. A. and Wilson, J. (1980). Perceptual processing of non-targets in an attention task. Memory and Cognition, 8, 372-377.
Jones, M. R. (1976). Time, our lost dimension: Toward a new theory of perception, attention, and memory. Psychological Review, 83, 323-355.
Jones, M. R. (1984). The patterning of time and its effects on perceiving. Annals of the New York Academy of Sciences, 423, 158-167.
Jones, M. R. (1990). Musical events and models of musical time. In R. A. Block (Ed.), Cognitive Models of Psychological Time. Hillsdale, NJ: LEA Publishers.
Jones, M. R. and Boltz, M. (1989). Dynamic attending and responses to time. Psychological Review, 96, 459-491.
Jones, M. R., Boltz, M. and Kidd, G. (1982). Controlled attending as a function of melodic and temporal context. Perception and Psychophysics, 32, 211-218.
Jones, M. R., Kidd, G. and Wetzel, R. (1981). Evidence for rhythmic attention. Journal of Experimental Psychology: Human Perception and Performance, 7, 1059-1073.
Judd, T. (1979). Comments on Deutsch's musical scale illusion. Perception and Psychophysics, 26, 85-92.
Auditory attention
111
Kimura, D. (1961). Cerebral dominance and the perception of verbal stimuli. Canadian Journal of Psychology, 15, 166-171. Lewis, J. L. (1970). Semantic processing of unattended messages using dichotic listening. Journal of Experimental Psychology, 85, 225-228. Martin, J. (1972). Rhythmic (hierarchical) versus serial structure in speech and other behavior. Psychological Review, 79, 487-509. Massaro, D. W. (1975). Experimental Psychology and Information Processing. Chicago, IL: Rand McNally College Publishing Company. Massaro, D. W. (1976). Perceiving and counting sounds. Journal of Experimental Psychology: Human Perception and Performance, 2, 337-346. McAdams, S. and Bregman, A. (1979). Hearing musical streams. Computer Music Journal, 3, 26-43 (references continued on p. 60). Also published (1985) in C. Roads and J. Strawn (Eds.), Foundations of Computer Music. Cambridge, MA: MIT Press. Miller, G. A. and Heise, G. A. (1950). The trill threshold. Journal of the Acoustical Society of America, 22, 637-638. Monahan, C. B. and Carterette, E. C. (1985). Pitch and duration as determinants of musical space. Music Perception, 3, 1-32. Monahan, C. B., Kendall, R. A. and Carterette, E. C. (1987). The effect of melodic and temporal contour on recognition memory for pitch change. Perception and Psychophysics, 41, 576-600. Moray, N. (1969). Attention: Selective Processes in Vision and Hearing. New York: Academic Press. Nakajima, Y. (1979). A psychophysical investigation of divided time intervals shown by sound bursts. Journal of the Acoustical Society of Japan, 35,145-151 (in Japanese with English abstract and English figure captions). Nakajima, Y. (1987). A model of empty duration perception. Perception, 16, 485-520. Nakao, M. A. and Axelrod, S. (1976). Effects of bilateral alternation on perceived temporal uniformity of auditory and somesthetic pulse trains. Perception and Psychophysics, 20, 274-280. Neisser, U. (1967). Cognitive Psychology. New York: Appleton-Century-Crofts. 
Neumann, O., van der Heijden, A. H. C. and Allport, A. (1986). Visual selective attention: Introductory remarks. Psychological Research, 48, 185-188. Norman, D. A. (1968). Toward a theory of memory and attention. Psychological Review, 75, 522-536. Norman, D. A. (1969). Memory while shadowing. Quarterly Journal of Experimental Psychology, 21, 85-93. Norman, D. A. (1976). Memory and Attention, 2nd edn. New York: John Wiley. Pitt, M. A. and Samuel, A. G. (1990). The use of rhythm in attending to speech. Journal of Experimental Psychology: Human Perception and Performance, 16, 564-573. Povel, D.-J. and Essens, P. (1985). Perception of temporal patterns. Music Perception, 2, 411-440. Robinson, G. M. (1977). Rhythmic organization in speech processing. Journal of Experimental Psychology: Human Perception and Performance, 3, 83-91. Schaefer, F. (1979). Gerichtete und verteilte Aufmerksamkeit bei der Einsch/itzung richtungsalternierender Clicks. Paper presented at the 21st Tagung Experimentell arbeitender Psychologen, Heidelberg, Germany. Schouten, J. F. (1962). On the perception of sound and speech; subjective time analysis. Fourth International Congress on Acoustics, Copenhagen, Congress Report II, pp. 201-203. Schubert, E. and Parker, C. (1955). Additions to Cherry's findings on switching speech between the two ears. Journal of the Acoustical Society of America, 27, 792-794. Seneca (translation by Campbell, R. A., 1969). Letters from a Stoic. Harmondsworth: Penguin. Shields, J. L., McHugh, A. and Martin, J. G. (1974). Reaction time to phoneme targets as a function of rhythmic cues in continuous speech. Journal of Experimental Psychology, 102, 250-255.
112
G. ten Hoopen
Sloboda, J. A. (1985). The Musical Mind: The Cognitive Psychology of Music. Oxford: Oxford University Press. Smith, J., Hausfeld, S., Power, R. P. and Gorta, A. (1982). Ambiguous musical figures and auditory streaming. Perception and Psychophysics, 32, 454-464. Spieth, W., Curtis, J. F. and Webster, J. C. (1954). Responding to one of two simultaneous messages. Journal of the Acoustical Society of America, 26, 391-396. Swets, J. A. and Kristofferson, A. B. (1970). Attention. Annual Review of Psychology, 21. Tartter, V. C. (1988). Acoustic and phonetic feature effects in dichotic listening. In K. Hugdahl (Ed.), Handbook of Dichotic Listening: Theory, Methods and Research. Chichester: John Wiley. ten Hoopen, G. (1982). The perceptual organization of alternating tone sequences. Unpublished PhD Dissertation, Leiden University, The Netherlands. ten Hoopen, G. (1985). The detection of anisochrony in monaural and interaural sequences. In J. A. Michon and J. L. Jackson (Eds), Time, Mind, and Behavior. Berlin: Springer. ten Hoopen, G. and Vos, J. (1980). Attention switching is not a fatigable process: Methodological comments on Axelrod and Guzy (1972). Journal of Experimental Psychology: Human Perception and Performance, 6, 180-183. ten Hoopen, G. and Vos, J. (1981). Attention switching and patterns of sound locations in counting clicks. Journal of Experimental Psychology: Human Perception and Performance, 7, 342-355. ten Hoopen, G., Vos, J. and Dispa, J. (1982). Interaural and monaural clicks and clocks: Tempo difference versus attention switching. Journal of Experimental Psychology: Human Perception and Performance, 8, 422-434. Thomas, A. C. and Brown, I., Jr (1974). Time perception and the filled duration illusion. Perception and Psychophysics, 16, 449-458. Thomassen, J. M. (1979). Melodic accent in computer composed metrical tone sequences. IPO Annual Progress Report 14. Eindhoven, The Netherlands, pp. 43-50. Thomassen, J. M. (1982). 
Melodic accent: Experiments and a tentative model. Journal of the Acoustical Society of America, 71, 1596-1605. Treisman, A. M. (1960). Contextual cues in selective listening. Quarterly Journal of Experimental Psychology, 12, 242-248. Treisman, A. M. (1971). Shifting attention between the ears. Quarterly Journal of Experimental Psychology, 23, 157-167. Treisman, A. M. and Geffen, G. (1967). Selective attention: Perception or response? Quarterly Journal of Experimental Psychology, 19, 1-18. Treisman, A. M., Squire, R. and Green, J. (1974). Semantic processing in dichotic listening? A replication. Memory and Cognition, 2, 641-646. Van Noorden, L. P. A. S. (1975). Temporal coherence in the perception of tone sequences. Unpublished PhD Dissertation. Technische Hogeschool (Institute of Perception Research), Eindhoven, The Netherlands. Von Wright, J. M., Anderson, K. and Stenman, U. (1975). Generalization of conditioned GSRs in dichotic listening. In P. M. A. Rabbitt and S. Dornic (Eds), Attention and Performance V. New York: Academic Press. Wardlaw, K. A. and Kroll, N. E. A. (1976). Autonomic responses to shock-associated words in a nonattended message: A failure to replicate. Journal of Experimental Psychology: Human Perception and Performance, 2, 357-360. Warren, R. M. (1982). Auditory Perception: A New Synthesis. New York: Pergamon Press. Warren, R. M. and Bashford, J. A. (1976). Auditory contralateral induction: An early stage in binaural processing. Perception and Psychophysics, 20, 380-386. Warren, R. M., Obusek, C. J., Farmer, R. M. and Warren, R. P. (1969). Auditory sequence: Confusion of patterns other than speech or music. Science, 164, 586-587. Zwicker, T. (1984). Experimente zur dichotischen Oktav-T~iuschung. Acustica, 55, 128-136.
Chapter 4
Dual-Task Performance
H. Heuer
Institut für Arbeitsphysiologie an der Universität Dortmund, Germany
Dual-task performance is a field of study where basic and applied research merge. It is not only an object of study by itself, but it has also been used as a tool for the assessment of automaticity (Neumann, 1984) or mental load (Bornemann, 1942a, b), that is, as a means for the assessment of characteristics of individual tasks. However, it became recognized that the particular relation between tasks that were combined was of higher importance than had been previously thought. With this development, dual-tasks as a method for the study of general individual-task characteristics such as automaticity or mental load became obsolete. Instead, dual-tasks became a useful method for the study of structures or functions involved in the performance of individual tasks. A final step in the theoretical and methodological development was to apply a more fine-grained analysis to dual-task performance. It turned out that there is more than just a global performance decrement. This three-step development will be traced in the main body of this chapter; it will be framed by an initial section on physically incompatible tasks and a final section on practice.
1 PHYSICALLY INCOMPATIBLE TASKS
Some tasks are physically incompatible: a part of the body can be in only one location at a time; different parts of the body are connected with each other so that, if one part is in one location, other parts can be in only a limited set of locations; finally, some time is needed to change locations. Because of these constraints, the assumption that only one action can be performed at a time is justified for many practical purposes (Moray, 1986).
1.1 Speculations on the Control of Physically Constrained Output Devices
Functions of the central nervous system should be adapted to the peripheral physical constraints. First of all, there must be some means to make one action dominant among several different alternatives (Shallice, 1972, 1978), and to select the appropriate input from the numerous available stimuli (Allport, 1987; Neumann, 1987, 1990). Second, the control system has to be adapted to the mechanical linkages between output systems. For example, gross movements of body parts represent a threat to balance because they shift the location of the center of mass of the body. Therefore they are preceded by muscle activity that serves to prevent losing balance (Heuer, 1996; Woollacott and Jensen, 1996). Finally, adaptation of the control system to temporal peripheral constraints is required: human effectors are like low-pass filters, and they cannot be driven by high-frequency signals if the resulting activity is to serve some purpose. In short, an adapted control system should realize selectivity, coupling among different output devices, and protection of ongoing outputs. Such adaptation, of course, will turn out to be detrimental to performance in many dual-tasks. This, however, can now be understood as a side product of a fundamentally adapted, functional system rather than as a pure deficit. Although the functional perspective on dual-task performance is quite general and rather vague, it might be of some heuristic value. For example, the physical constraints on the output devices can be expected to be reflected more strongly in the output-related parts of the controlling system than in the input-related subsystems or intermediate ones. There is some support for this hypothesis: dual-task performance decrements appear to be particularly large when there is temporal overlap between response-related processes of both tasks (Johnston et al., 1970; McLeod, 1973; Trumbo and Milone, 1971).
Handbook of Perception and Action, Volume 3. Copyright © 1996 Academic Press Ltd. All rights of reproduction in any form reserved. ISBN 0-12-516163-8
1.2 Time-Sharing of Physically Incompatible Actions
Whenever dual-tasks require physically incompatible actions, the available time has to be split between the different tasks. In general, time-sharing of physically incompatible actions has not attracted very much research effort, but there are exceptions such as monitoring of several different displays. Figure 4.1 illustrates a general but instructive way to conceptualize the problem. In this figure, the vertical lines correspond to internal representations of the displayed variable at different times (from left to right) after reading the display at t = 0. In Figure 4.1a, the observation was within the tolerance range ±L and the observed value was stationary; as the dashed line indicates, the mean expected value remains constant after the observation. In Figure 4.1b, the observation was outside the tolerance range and the operator had taken some action to bring the variable back to its set point (designated as zero); as indicated by the dashed line, the expected value of the variable behaves correspondingly thereafter. Immediately after an observation the observer can be fairly certain about the value of the displayed variable, but with the passage of time thereafter uncertainty will increase. This is represented in Figure 4.1 by the increasing variance of the probability distributions defined on the internal representations.
Figure 4.1. Illustration of increasing uncertainty about the value of a variable sampled at t = 0. In (a) the reading was within the tolerance range ±L and no action was taken; in (b) the reading was outside the tolerance range and action was taken to restore the required value. Uncertainty is represented by the variance of a probability density function. [After Sheridan and Ferrell, 1974, Figure 20.8.]
Based on a scheme like that of Figure 4.1, one would expect that the selection of a certain display for inspection is based mainly on two factors. First, there is the uncertainty about the values of the various variables. To minimize total uncertainty, at any point in time the observer should sample that signal for which uncertainty is largest. When tolerance ranges are taken into account, it is not uncertainty per se that matters, but only whether or not tolerance ranges are exceeded. Therefore, at any point in time the observer should sample that signal for which the probability of lying outside the tolerance range is largest. If costs are different for different signals or above and below the tolerance range, probabilities would have to be weighted by costs, and these constitute the second factor.
The scheme of time allocation guided by probabilities and costs can be understood as normative; it allows one to determine optimal time-sharing strategies when objective estimates of probabilities and costs are available. When internal representations rather than objective estimates are considered, the scheme becomes descriptive. Although it has the touch of presupposing a rational observer, all kinds of irrationality can enter the internal representations, and the observer (or operator) need not behave at all rationally. Nevertheless, normative models invite the question to what extent people conform to optimality as specified by the model. I shall consider the optimality question for a restricted situation for which costs can be neglected because they are minimal and identical for the different displays.
The increase of uncertainty following the sampling of a signal will depend mainly on the frequency characteristics of the signal. Therefore frequency characteristics should enable one to predict sampling patterns. This goal has been pursued by Senders and coworkers (Senders, 1966, 1983; Senders et al., 1964). Eye movements of the subjects were recorded while they monitored four displays. The signals were quasirandom with different bandwidths. The initial work started with a straightforward technomorphic prediction for human sampling. The prediction was based on a well-known mathematical theorem, the so-called sampling theorem, according to which the highest frequency that can be discovered in a signal has a period of twice the sampling interval; in other words, veridical reconstruction of a signal requires that the sampling frequency is at least twice the highest frequency component of the signal. Senders et al. (1964) found that, as the bandwidth and thus the highest frequency component of the signal increased, sampling frequency increased as well, but not as much as predicted by the sampling theorem. What is remarkable about Senders' results is not so much that human sampling behavior is nonoptimal, using the sampling theorem as the criterion, but rather that it follows the sampling theorem as closely as it does. Sampling theorem predictions are much too strong for human behavior; although they represent a criterion for optimality, the underlying assumptions are unlikely to be correct for people.
Nevertheless the adjustment of sampling frequency to bandwidth, which requires extensive practice (Moray, 1986), indicates that observers develop internal models of the stochastic characteristics of signals which, in the long run, seem to be rather well adapted to the objective characteristics. Therefore the sampling pattern is likely to approach optimality as long as the internal model fits the outside world; it is also likely, however, to be led astray by the internal model when the objective stochastic characteristics change abruptly, as in a system failure.
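The normative scheme of this section can be made concrete with a small sketch. It is an illustration only and rests on assumptions that are not part of the text: the internal representation of each displayed variable is taken to be Gaussian, its variance is taken to grow linearly with the time since the last reading, and all names (`growth`, `last_seen`, `choose_display`, and so on) are hypothetical. Under these assumptions the display to be inspected next is the one with the largest cost-weighted probability of lying outside the tolerance range ±L; the last function simply states the sampling-theorem criterion.

```python
import math


def prob_outside_tolerance(mean, variance, limit):
    """P(|X| > limit) for X ~ Normal(mean, variance) (Gaussian is an assumption)."""
    if variance <= 0.0:
        # No uncertainty: the value is either inside or outside the range.
        return 0.0 if abs(mean) <= limit else 1.0
    sd = math.sqrt(variance)

    def cdf(z):
        # Standard normal cumulative distribution function.
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    inside = cdf((limit - mean) / sd) - cdf((-limit - mean) / sd)
    return 1.0 - inside


def choose_display(displays, now):
    """Name of the display with the largest cost-weighted exceedance probability."""
    def priority(d):
        elapsed = now - d["last_seen"]
        # ASSUMPTION: variance grows linearly with time since the last reading.
        variance = d["growth"] * elapsed
        return d["cost"] * prob_outside_tolerance(d["mean"], variance, d["limit"])

    return max(displays, key=priority)["name"]


def nyquist_sampling_rate(bandwidth_hz):
    """Sampling-theorem criterion: at least twice the highest frequency."""
    return 2.0 * bandwidth_hz


# Hypothetical example: two displays read at t = 0, equal costs and tolerances,
# but different rates at which uncertainty grows.
displays = [
    {"name": "slow", "mean": 0.0, "limit": 1.0, "growth": 0.1, "last_seen": 0.0, "cost": 1.0},
    {"name": "fast", "mean": 0.0, "limit": 1.0, "growth": 1.0, "last_seen": 0.0, "cost": 1.0},
]
next_display = choose_display(displays, now=2.0)
```

With equal costs the rule reduces to sampling the signal whose uncertainty has grown most relative to its tolerance range; unequal costs merely rescale the priorities.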
2 TIME-SHARING AND CAPACITY-SHARING
The approaches to dual-task performance that are considered in this section can be viewed as internalizations of competition for output devices; competition for physical devices such as the hand is replaced by competition for hypothetical central entities. These entities are called a 'single channel' or 'capacity' in the two sets of models that will be discussed. From a historical perspective, the two classes of models are rivals, but they are conceptually similar and hard to distinguish experimentally. What is common to both sets of models is the assumption that (almost) all tasks make demands on some central entity, and that competition for this entity is a major source of dual-task interference. (The models may permit that the central entity is bypassed by some tasks or admit that other sources contribute to dual-task interference, but these additions are not essential.)
According to single-channel models the central entity is committed to one or the other task in an all-or-none fashion, as shown in Figure 4.2a. This is true for each point in time, but the entity can be time-shared among tasks. For a certain time interval Δt (which represents the temporal resolution in the analysis of performance), the allocation of the central entity to both tasks can be characterized in terms of relative time, but also, and less obviously, in terms of the relative amount of the central entity (the values in both cases are, of course, the same). The average is indicated by the dashed line in Figure 4.2a. According to capacity models, the central entity is allocated to concurrent tasks in a graded manner, as illustrated in Figure 4.2b. What is shared among the tasks is thus capacity at any point in time. However, for a certain time interval Δt, which again represents the limited temporal resolution of observations, only an average can be estimated. This is shown by the dashed line.
Figure 4.2. All-or-none allocation (a) and graded allocation (b) of capacity resulting in the same average during the observation interval Δt.
Given an estimated average, it is impossible to decide whether it comes from a process of time-sharing as in Figure 4.2a or from a process of capacity-sharing as in Figure 4.2b. Therefore, the two classes of models cannot be distinguished in principle: given an integral over Δt of a function f(t), the shape of f(t) in the interval Δt cannot be recovered. A distinction between single-channel models and capacity models requires additional assumptions, in particular about minimal allocation durations for a single-channel model. When the temporal resolution Δt can be made smaller than the minimal allocation duration, single-channel models predict all-or-none allocation of the central entity within each interval of observation; any evidence on graded allocation would be inconsistent with the model. Therefore the study of single-channel models is very much based on tasks that permit a high temporal resolution in the data analysis. Capacity models, in contrast, generally refer to a more molar level of dual-task performance: the more molar analysis in terms of capacity-sharing can often be reformulated in terms of time-sharing, a fact that is not always recognized.
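The indistinguishability argument can be illustrated numerically. The following sketch uses made-up allocation values (a 60% share for one task) and hypothetical function names; it shows only that an all-or-none and a graded allocation can have the same average over Δt, and that the difference becomes visible when the allocation is inspected at a resolution finer than the minimal allocation duration.

```python
def interval_average(samples):
    """Average allocation over the observation interval (equally spaced samples)."""
    return sum(samples) / len(samples)


def is_all_or_none(samples, tol=1e-9):
    """True if every sample is (numerically) either 0 or 1."""
    return all(abs(s) < tol or abs(s - 1.0) < tol for s in samples)


# Hypothetical allocation of the central entity to one task within one
# interval, sampled at ten equally spaced points.
time_shared = [1.0] * 6 + [0.0] * 4   # all-or-none, committed 60% of the time
capacity_shared = [0.6] * 10          # graded, 60% of capacity throughout

avg_time = interval_average(time_shared)
avg_capacity = interval_average(capacity_shared)
# At the coarse resolution both averages agree up to rounding; only the
# fine-grained samples separate the two kinds of allocation.
```

The sketch presupposes that sub-interval samples are observable at all, which is exactly the additional assumption about temporal resolution discussed above.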
2.1 Single-Channel Models and the Psychological Refractory Period
Single-channel models can be traced back to two papers by Craik (1947, 1948) that dealt with the human operator in tracking tasks. Craik noted that, even though the track was continuous, the movements of the subjects appeared discontinuous. To account for this observation, he postulated a central intermittency of the human operator. Craik and his coworkers (Craik, 1948; Hick, 1948; Vince, 1948, 1949) noted the existence of an experimental methodology that was particularly well suited to put the central intermittency notion under more stringent experimental test. Telford (1931) had discovered that, when stimuli are presented in rapid succession, responses to the second of two stimuli are delayed. His study had been inspired by the known phenomenon of the physiological refractory period, and he dubbed his behavioral analog the 'psychological refractory period'. Although the interpretation in terms of reduced excitability of nervous tissue was abandoned by Craik and his coworkers, the label survived.
Figure 4.3 illustrates the psychological refractory period as observed in a single subject (Heuer, 1981). The subject had to respond to a visual signal and an auditory signal presented in rapid succession, using the four fingers of the right and left hand, respectively. The visual signal was a rectangle that appeared in one of four vertically arranged positions on a video screen, and the auditory signal was a tone with one of four different frequencies. As can be seen in Figure 4.3, reaction time to the second (auditory) signal (RT2) increases as the inter-stimulus interval (ISI) shrinks. This increase can be attributed to the requirement to respond to the first signal because it does not occur in the control condition where RT2 is essentially independent of ISI. (This independence is not a universal finding.) In addition, reaction time to the first (visual) signal (RT1) was essentially independent of ISI. The increase of RT2 as ISI declines is the major experimental phenomenon to which single-channel models have been applied.
Figure 4.3. Reaction time (RT2) to the second of two successive signals as a function of the inter-stimulus interval (ISI); RT1 is reaction time to the first of the successive signals, C-RT2 is reaction time to the second signal when the response to the first one is omitted. (a) ISI constant for blocks of trials; (b) ISI variable across trials. [After Heuer, 1981, Figure 1.]
2.1.1 The Basic Model
Single-channel models are specifications of Craik's more general notion of central intermittency. Welford (1952) suggested the hypothesis of a central mechanism (the single channel) that processes only one stimulus at a time. While data from one stimulus are processed, additional stimuli are held in store until they get access to the central mechanism. Later on (see Welford, 1980) the function of the hypothesized central mechanism was characterized as that of translating a stimulus into a response, or of choosing an appropriate response given a certain stimulus. Thus, single-channel operation became restricted to a particular function, while other functions such as sensory processing of the signal or control of the chosen response were thought to be performed concurrently with other processes. In contrast to Craik's original hypothesis, the central mechanism in the single-channel model is not assumed to operate with a fixed cycle duration. Rather the gate which controls the access to the translation stage is thought to be closed until processing of the current signal is finished or until some minimal feedback from the beginning response becomes available. Figure 4.4 illustrates the various time intervals that are relevant for the basic single-channel model when two successive stimuli are to be processed. From this figure the equation for RT2 as a function of ISI is quite obvious:

$$
\mathrm{RT}_2 =
\begin{cases}
\mathrm{RT}_1 + \mathrm{FT(min)} + \mathrm{DT}_2 - \mathrm{ISI} & \text{for } (\mathrm{ISI} + \mathrm{PT}_2) < (\mathrm{RT}_1 + \mathrm{FT(min)}) \\
\mathrm{DT}_2 + \mathrm{PT}_2 & \text{for } (\mathrm{ISI} + \mathrm{PT}_2) > (\mathrm{RT}_1 + \mathrm{FT(min)})
\end{cases}
\tag{1}
$$

In this equation, DT (decision time) is the time that the central mechanism is occupied by one or the other signal, FT(min) is the time that is required for minimal feedback to reopen the gate, and PT is the time for perceptual processing that is not subject to single-channel operation. There are two obvious predictions from equation (1): first, a linear relationship between RT2 and ISI with slope -1 for (ISI + PT2) < (RT1 + FT(min)) and slope 0 for longer ISIs; and, second, a linear relationship between RT2 and RT1 with slope +1 for (ISI + PT2) < (RT1 + FT(min)) and slope 0 for longer ISIs. The predictions from equation (1) are correct for a deterministic variant of the model. However, the terms in equation (1) should be understood as random variables (except for ISI). For a stochastic variant of the model the predictions are no longer accurate. Typically equation (1) is applied to means (or medians), and the main effect of random variations is assumed to be a gradual shift of the slopes from +1 toward 0 rather than an abrupt one (Welford, 1980, p. 217). This hunch about how random variations will modify the predictions of a deterministic model can be grossly incorrect, as has been demonstrated by Vorberg (1985) for a similar type of model.
Figure 4.4. The single-channel model. RT, reaction time; PT, perceptual time; DT, decision time; FT(min), time for processing minimal feedback; ISI, inter-stimulus interval.
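The deterministic variant of equation (1) is easily written down as a function. The sketch below is meant only to make the two slope predictions tangible; the function name and the parameter values (in milliseconds) are hypothetical and are not estimates from any experiment.

```python
def predicted_rt2(isi, rt1, ft_min, dt2, pt2):
    """RT2 according to the deterministic single-channel model, equation (1)."""
    if isi + pt2 < rt1 + ft_min:
        # The second signal has to wait until the gate reopens.
        return rt1 + ft_min + dt2 - isi
    # The central mechanism is already free when perceptual processing ends.
    # (At exact equality both branches give the same value.)
    return dt2 + pt2


# Hypothetical parameter values in ms, chosen for illustration only.
params = dict(rt1=400.0, ft_min=50.0, dt2=200.0, pt2=100.0)

rt2_short = predicted_rt2(60.0, **params)    # waiting branch
rt2_medium = predicted_rt2(220.0, **params)  # waiting branch
rt2_long = predicted_rt2(380.0, **params)    # channel free

# Slope of the RT2-ISI function within the waiting range.
slope_waiting = (rt2_medium - rt2_short) / (220.0 - 60.0)
```

Within the waiting range each millisecond added to the ISI removes one millisecond from RT2, and each millisecond added to RT1 adds one; beyond that range RT2 stays at DT2 + PT2.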
2.1.2 Extensions of the Model
The single-channel model has attracted considerable research efforts, and several modifications and extensions have been suggested to bring it in line with experimental data. In this section I shall briefly discuss three important extensions that, in my view, do not violate the general spirit of single-channel operation. First of all, in several experiments RT2 did not increase, and even decreased (Elithorn and Lawrence, 1955), when the ISI became smaller than about 100 ms. This finding has been attributed to grouping. With very short ISIs, the subject is assumed to produce a double response in a certain proportion of trials rather than two successive responses. This, in terms of the single-channel model, will happen when the second signal arrives at a time when the gate, which protects processing of the first signal, is not yet closed so that the second signal will enter the central mechanism as well. Welford (1980) estimates the closing time of the gate after the arrival of the first signal to be about 80 ms. The single-channel model has also to be adjusted to account for instances when a psychological refractory period is lacking. This is typically observed when a second signal does not require a new response, but rather an amendment of the first response, e.g. a change of its speed or radial direction (Massey, Schwartz and Georgopoulos, 1986; Vince and Welford, 1967), or its stop (de Jong et al., 1990). To make the single-channel model consistent with these data, it has to be assumed that the gate that protects the central mechanism is selective; there must be something like an open side entrance for signals that are relevant for the ongoing response. Finally, lengthening of RT2 has also been observed when S2 was presented after response initiation (Ells, 1973; Heuer, 1981). According to Welford (1980), this can be understood in terms of a single-channel model when the assumption is added that the central mechanism can be occupied by reafferent signals from an ongoing response; particular 'high points' of, for example, kinesthetic stimulation should be associated with single-channel processing. However, in simple aimed movements such as those studied by Ells (1973) or Heuer (1981), it is hard to discover such 'high points' at which, to fit the general characterization of the central mechanism's function as response selection, alternative routes of action should be possible.
2.1.3 Testing Single-Channel Models
The basic single-channel model is fairly simple and easy to attack experimentally. The attacks have been successful to the extent that the notion of single-channel processing is mainly history today; nevertheless the model receives some attention here because it may have been rejected for partly unjustified reasons. The first kind of evidence against single-channel models is the often too shallow slope of RT2-ISI functions (e.g. Figure 4.3). Dramatic examples have been reported by Greenwald and Shulman (1973) when the S-R relations were highly compatible (pointing in the direction of a visible arrow; pronouncing a heard letter). In their second experiment RT2 did not increase for smaller ISIs (in their first experiment it did, but this was accompanied by a decline of RT1 so that the sum (RT1 + RT2) was essentially independent of the ISI). It is likely that with a simple translation the central mechanism is occupied for a shorter period of time so that the increase of RT2 will be smaller at any ISI than when the stimulus-response translation is more difficult. Nevertheless, the slope of the RT2-ISI function should be -1 in the deterministic single-channel model. Unfortunately it is not clear what predictions could be generated by stochastic variants with different distributional assumptions. Essentially the same kind of reservation is warranted with respect to a second piece of evidence against single-channel models: the increase of RT2 with RT1 is not sufficiently strong. In addition, the slope of the RT2-ISI function was shown to depend on RT1; it becomes steeper when RT1 increases as a result of a larger number of response alternatives (Smith, 1969). This is equivalent to an increasing slope of the RT2-RT1 function for shorter ISIs. Again these data violate the predictions from a deterministic single-channel model, but they are possibly within the range of the predictions of stochastic variants.
Finally, there are several experiments in which not only RT2 depended on ISI, but RT1 as well. Such findings have given rise to the hypothesis that the presentation of the second signal can briefly interrupt processing of the first one (Tolkmitt, 1973). Often, however, there might be a less far-fetched interpretation. Subjects may choose strategies that are not consistent with the assumptions of single-channel models. In Figure 4.3 there is a weak trend for RT1 to increase with ISI, which is more pronounced in other experiments (Greenwald and Shulman, 1973, experiment 1). Such an increase will be observed whenever subjects, in a certain proportion of trials, wait for the presentation of the second signal before they start processing the first one. Strategies like this are beyond the scope of single-channel models.
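How random variation can flatten the mean RT2-ISI function may be illustrated by a small Monte Carlo sketch of the basic model. Everything specific in it is an assumption made for illustration: RT1, DT2 and PT2 are drawn independently from normal distributions with arbitrary means and standard deviations (in ms), FT(min) is fixed, and the ISI values are simply those shown on the abscissa of Figure 4.3. The sketch does not settle which predictions stochastic variants generate in general; it only shows one such variant in which mean slopes between -1 and 0 arise without any violation of single-channel operation.

```python
import random


def simulate_mean_rt2(isis, n_trials=2000, seed=1):
    """Mean RT2 per ISI for a stochastic variant of the basic single-channel model."""
    rng = random.Random(seed)
    ft_min = 50.0  # ASSUMPTION: fixed minimal feedback time (ms)
    # The same simulated trials are evaluated at every ISI (common random numbers).
    trials = []
    for _ in range(n_trials):
        rt1 = max(0.0, rng.gauss(400.0, 80.0))  # ASSUMED distribution of RT1
        dt2 = max(0.0, rng.gauss(200.0, 30.0))  # ASSUMED distribution of DT2
        pt2 = max(0.0, rng.gauss(100.0, 20.0))  # ASSUMED distribution of PT2
        trials.append((rt1, dt2, pt2))
    means = []
    for isi in isis:
        total = 0.0
        for rt1, dt2, pt2 in trials:
            if isi + pt2 < rt1 + ft_min:
                total += rt1 + ft_min + dt2 - isi  # second signal waits
            else:
                total += dt2 + pt2                 # channel already free
        means.append(total / n_trials)
    return means


def segment_slopes(xs, ys):
    """Slopes between successive points of a function given as two lists."""
    return [(ys[k + 1] - ys[k]) / (xs[k + 1] - xs[k]) for k in range(len(xs) - 1)]


isis = [60.0, 220.0, 380.0, 540.0, 700.0, 900.0]
mean_rt2 = simulate_mean_rt2(isis)
slopes = segment_slopes(isis, mean_rt2)
```

On every single trial the slope is either -1 or 0; the intermediate slopes of the mean function result from mixing trials of both kinds, and the mixture changes with ISI.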
2.1.4 A Broader Perspective on Single-Channel Models
Single-channel models appear outdated. This may not really be justified by experimental data: the data may falsify predictions derived from deterministic variants of the model that possibly cannot be upheld with stochastic variants; data that are inconsistent with the models could be caused by factors that are outside the scope of the single-channel notion. Beyond any experimental tests, however, single-channel models do also appear outdated because they are closely associated with an old-fashioned analogy, that of an information channel or the central processing unit of a simple computer. And finally, the heuristic value of the models may have been exhausted; strong formal predictions for phenomena other than the psychological refractory period have hardly ever been developed. Moving away from traditional analogies and methodologies, single-channel models can be viewed from a functional perspective, and they could be taken as major ingredients of broader models that might suggest new questions. Single-channel processing serves an important function in that it realizes selectivity and protection of ongoing actions. From this point of view, the important question is not only whether or not single-channel processing does exist, but also which rules govern the access of signals. Traditionally these rules are quite simple: the central mechanism is occupied by data which come first, and thereafter it takes a while to close the gate. This assumption is blind to phenomena such as orienting responses or task-related selections from sets of signals (Neumann, 1980). Single-channel models have the potential for revitalization in a broader context: elementary experimental tasks require adequate stochastic modeling, and in a broader framework questions about the involvement of single-channel processing in various tasks are relevant as well as questions about the rules that govern the time-sharing of the hypothetical central mechanism.
2.2
Limited Central Capacity and the Performance-Operating Characteristic
In single-channel models, the dual-task performance decrement is attributed to time-sharing of a single central mechanism. Capacity models, in contrast, assume a hypothetical central quantity that can be allocated to concurrent actions in a graded manner. Because all tasks compete for only a single kind of capacity, this is often called 'generalized capacity'.
2.2.1
The Basic Model
The basic conceptual tools for models of limited generalized central capacity have been anticipated by Bornemann (1942a, b) as a theoretical foundation for the assessment of mental workload. It took some 30 years before the concepts were described a second time, endowed with new labels and somewhat modified. 'Capacity' is a hypothetical variable with a certain relation to performance that is specified by the 'performance-resource function' (Norman and Bobrow, 1975); the term 'resource' is largely synonymous with 'capacity', except probably for subtle connotations as described below. Performance-resource functions (PRFs) are assumed to be nondecreasing. Thus, performance $x_i$ on task $i$ can be described as a function of capacity $c$, which can vary between 0 and the upper limit $c_L$:
\[ x_i(c) = e_i + f_i(c), \qquad 0 \le c \le c_L, \qquad f_i(0) = 0, \qquad f_i'(c) \ge 0 \tag{2} \]
Of course, equation (2) holds for $e_i + f_i(c) \ge 0$ only; for values below 0, performance should be zero, that is, $x_i(c) = 0$. This can happen with a capacity-free term $e_i < 0$. For $f_i(c) = 0$, performance is independent of capacity; using the terms of Norman and Bobrow (1975), it is data-limited, but not capacity-limited. As far as performance on a single task is concerned, capacity is characterized by two assumptions, the existence of an upper limit $c_L$ and the existence of a nondecreasing PRF. The analysis of dual-task performance requires additional assumptions which are nontrivial and can be violated, as pointed out by Navon and Gopher (1979). First, there is the capacity-sharing assumption: if $c_i$ is the capacity supplied for task $i$ and $c_p$ the capacity supplied for task $p$, then $c_i + c_p = c_L$. Second, task identity has to be assumed, i.e. the performance level on each task, given a certain amount of capacity supplied, is assumed to be independent of whether it is combined with a concurrent task or not. Third, the allocation of capacity to the two tasks is assumed to be at least partly under voluntary control. Given the assumptions one can predict the joint performance levels of two (or more) tasks from individual PRFs by plotting $x_p(c_p)$ as a function of $x_i(c_L - c_p)$ for the range $0 \le c_p \le c_L$. This kind of tradeoff function has been called 'performance-operating characteristic' (POC). In contrast to PRFs, POCs can be determined experimentally; however, they are not sufficient to recover individual PRFs. Only the ratio of the slopes of PRFs can in principle be recovered from the slope of the POC (Heuer, 1985b):
\[ \frac{d[x_i(x_p)]}{d[x_p]} = -\frac{f_i'(c_L - c_p)}{f_p'(c_p)}, \qquad 0 \le c_p \le c_L \tag{3} \]
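The construction of a POC from two PRFs can be sketched numerically. The following Python sketch assumes two purely hypothetical PRFs (a square-root function for task i, a linear function for task p) and an arbitrary upper limit of 1; none of these choices is taken from the studies cited here. It traces the POC by sweeping the capacity supplied to task p from 0 to the upper limit, and it compares a finite-difference estimate of the POC slope with the slope ratio of equation (3).

```python
import math

C_L = 1.0  # hypothetical upper limit of capacity (arbitrary units)


def prf_i(c):
    """Hypothetical PRF of task i: e_i = 0, f_i(c) = sqrt(c)."""
    return math.sqrt(c)


def prf_p(c):
    """Hypothetical PRF of task p: e_p = 0, f_p(c) = 2c."""
    return 2.0 * c


def poc(n_points=11):
    """Joint performance levels (x_p, x_i) under c_i + c_p = C_L."""
    points = []
    for k in range(n_points):
        c_p = C_L * k / (n_points - 1)
        points.append((prf_p(c_p), prf_i(C_L - c_p)))
    return points


def poc_slope_numeric(c_p, h=1e-6):
    """Central finite-difference estimate of dx_i/dx_p at allocation c_p."""
    dx_i = prf_i(C_L - (c_p + h)) - prf_i(C_L - (c_p - h))
    dx_p = prf_p(c_p + h) - prf_p(c_p - h)
    return dx_i / dx_p


def poc_slope_eq3(c_p):
    """Equation (3) for the PRFs above: -f_i'(C_L - c_p) / f_p'(c_p)."""
    d_f_i = 0.5 / math.sqrt(C_L - c_p)  # derivative of sqrt(c) at C_L - c_p
    d_f_p = 2.0                          # derivative of 2c
    return -d_f_i / d_f_p


if __name__ == "__main__":
    for x_p, x_i in poc():
        print(f"x_p = {x_p:.2f}   x_i = {x_i:.3f}")
    print(poc_slope_numeric(0.5), poc_slope_eq3(0.5))
```

The sketch also illustrates why the POC alone does not identify the PRFs: only the pairs of performance levels are observable, and any other pair of PRFs that yields the same slope ratio at corresponding allocations would trace the same curve.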
Figure 4.5. An empirical performance-operating characteristic. Concurrent tracking and simple reaction time (RT), higher performance goals being defined for tracking (left branch of curve) or reaction time (right branch); MSE is mean squared error in arbitrary units. [After Schmidt et al., 1984, Figures 2 and 3.]
Using a certain reference task $p$, PRFs of different tasks $i$ can thus be compared when $c_p$ is mapped on $x_p$ in an arbitrary manner. POCs have received considerable theoretical attention, but experimental studies have been rare. They require extensive experimentation, and there are problems in fitting theoretical curves to the observed data points. Navon (1984) has criticized several published POCs on the grounds that the instructions for the subjects implicitly or explicitly informed them about the expected performance tradeoff. To circumvent these difficulties he suggested a 'method of twin POCs', which had been used by Schmidt, Kleinbeck and Brockmann (1984) quite independently. Schmidt et al. did not present their data in the format of a POC; they have therefore been rearranged in Figure 4.5. The subjects of Schmidt et al. (1984) had to perform a two-dimensional pursuit-tracking task concurrently with rapid responses to an auditory stimulus presented every 4 s. In a baseline condition subjects were instructed to perform the two tasks with equal emphasis. Thereafter they were requested to improve performance on one or the other task by 20 or 40%, and these percentage improvements were clearly defined in terms of target performance scores that could be compared with knowledge of results. At the same time they were to maintain their performance level on the second task. As can be seen in Figure 4.5, subjects succeeded in improving their performance upon specific requests, but, contrary to instructions, only at the cost of performance decrements on the second task.
2.2.2
Extensions of the Model
The conceptual framework of limited generalized central capacity can be used to devise a variety of specific formal models (Schweickert and Boggs, 1984), and various less formal suggestions for modifications have been made. I shall concentrate on three rather broad extensions of the basic model. The first two have been proposed by Kahneman (1973) in his quite influential version of capacity theory. According to casual as well as formal (Hillgruber, 1912) observations, people can be driven by task demands, i.e. effort or 'Anspannung' depends on task difficulty. For models of generalized central capacity this implies that the upper limit $c_L$ is not an individual constant, but will increase as the demands increase. Kahneman assumes that the spare capacity, which is available for performance of a secondary task, decreases with increasing task demands in spite of the flexible upper limit, but less so than with a constant limit $c_L$. Most likely a constant upper limit $c_L$ is fictitious in general; nevertheless it may be correct for a set of different experimental conditions of a study and thus be a justified simplification for experimental tests of the models. Although it figures relatively little in his book, Kahneman (1973) recognizes that competition for limited generalized central capacity is not the only source of dual-task interference. In addition he acknowledges the existence of structural interference, which originates from incompatible demands of concurrent activities on identical structures. This kind of interference is quite obvious for peripheral structures such as the eyes or the hands, and according to Kahneman it is most likely to occur when performance of two tasks requires the same receptors or the same effectors. The contribution of structural interference to the dual-task performance decrement had already been recognized by Bornemann (1942a), and in the later theoretical development discussed in section 3, it figured prominently.
A third addition to the basic model is that of 'concurrence costs' and 'concurrence benefits' (Navon and Gopher, 1979). These are likely to arise whenever a dual task 'is more than the sum of two single tasks' in terms of capacity demands. For example, concurrence costs could arise from requirements to coordinate two simultaneous activities, and concurrence benefits would accrue when one task profits from processes involved in the second task. Concurrence costs and benefits result in a violation of the capacity-sharing assumption.
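One simple way to make this violation explicit, offered here only as an illustrative formalization and not as the notation of Navon and Gopher (1979), is to add a hypothetical term $k$ to the capacity-sharing assumption of the basic model:

\[ c_i + c_p = c_L - k \]

Under this assumed form, $k > 0$ would correspond to a concurrence cost (less capacity is left for the two tasks than the upper limit suggests), $k < 0$ to a concurrence benefit, and $k = 0$ to the basic model.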
2.2.3
Testing Models of Generalized Central Capacity
Models of generalized central capacity have been subjected to a variety of different tests. The outcomes of several of these procedures are suggestive, but not conclusive, because they rest on plausible assumptions which by strict criteria are not justified. Two classes can be distinguished: procedures based on performance decrements (i.e. single points on POCs), and procedures based on POCs. Of the first type I shall consider four different variants which all tell the same story about the critical role of structural interference in dual-task performance. The first approach to testing models of generalized central capacity is to search for tasks that can be combined without a dual-task performance decrement, although all that is known suggests performance of the tasks requires capacity or single-channel operations. For example, Allport, Antonis and Reynolds (1972) found that proficient piano players were able to play new examination pieces while shadowing English prose almost without a noticeable impairment as compared with single-task performance. Hirst et al. (1980) enabled their subjects by extensive practice to read a text and simultaneously write another dictated text. Shaffer (1975) found his subjects able to type one text while shadowing another. The nature of the tasks in these experiments suggests that the respective dual-task feats were possible because processing for the two concurrent tasks could be functionally separated. The second test procedure also rests on the demonstration of unexpectedly small dual-task performance decrements, but the reference condition on which the expectation is based is included in the experiment. Basically it is shown that a certain dual-task performance decrement vanishes or at least is reduced considerably upon a relatively minor change in one of the tasks, mainly a change in input or output modality which presumably has (almost) no effect on the PRF. Wickens (1980) has dubbed such effects 'structural alteration effects'. 
For example, when Trumbo and Milone (1971) combined vocal recall of auditorily presented digits with manual tracking, interference was less than with manual recall of the digits. McLeod (1977) reported less interference for a vocal reaction time task combined with manual tracking than for a manual reaction time task, and when reaction time tasks were combined with rapid aiming movements the interference that was apparent with manual responses was lacking when responses were vocal (McLeod, 1980); similar effects of manual versus vocal responses have also been found by McLeod (1978). More generally, McLeod and Mierop (1979) showed a systematic decline of dual-task performance decrements as the similarity between responses in the two tasks was reduced. A third approach that is also based on three tasks (which are combined in three rather than only two combinations) has been pursued by Bornemann (1942a).
Bornemann found only modest interference between mental arithmetic as the one task and either writing one's name and address or sorting balls (by size) with the feet as the second task. Using a somewhat dubious procedure to estimate numerically (relative) mental load, he predicted that sorting balls and writing one's name and address should go together with almost no dual-task performance decrement; contrary to expectation, however, it was almost impossible to perform these tasks concurrently. This result is again suggestive because the unexpectedly high level of interference showed up when both tasks required coordinated skeletomotor responses, i.e. when they were structurally or functionally similar. Heuer (1985b) argued that valid tests, according to strict criteria, require four different tasks i, j, p and q and four combinations ip, iq, jp and jq. The model predicts that the ranking of tasks p and q with respect to interference should be identical in combination with tasks i and j (Navon and Gopher, 1979). This prediction is fairly easy to appreciate intuitively because the ranking of p and q should always reflect their respective relative capacity demands, independent of the demands of a second task. There are several findings that do not satisfy the 'tetrad criterion' (Heuer, 1985b) of the basic central-capacity model. For example, Wickens and Sandry (1982) used memory search with verbal and visuospatial material as tasks i and j and tracking and memory of words with low imagery ratings as tasks p and q. The verbal memory-search task was faster when combined with tracking rather than with verbal memory, while the visuospatial memory-search task was faster when combined with verbal memory rather than with tracking. Corroborating these findings, Baddeley et al. 
(1975) showed interference of tracking with learning tasks that involved imagery, but not with rote learning; in contrast, rote learning interfered more with light discrimination than did imagery tasks (Baddeley and Lieberman, 1980), and the same ranking was found when light discrimination was replaced by simple reaction time (Griffith and Johnston, 1973). These and other violations of the tetrad criterion again point to the role of structural interference. For example, tracking seems to require structures underlying imagery as do certain verbal learning tasks, but not rote learning. Tasks that use verbal material appear to load other common structures. Whenever structural overlap does exist, interference seems to be higher than without (or with reduced) structural overlap of the two tasks. Of course, it is tempting to relate these findings to what is known about the functional specialization of the cerebral hemispheres (Springer and Deutsch, 1981). The four types of test for models of generalized central capacity that have been reviewed are tests of the basic model, which does not acknowledge the existence of structural interference. The outcomes of all procedures point to its importance. Thus, the question arises as to whether an extended model which includes structural interference effects can survive experimental tests. As far as I can see, this kind of model has not been rejected convincingly so far. The available test procedures assume that structural interference produces shifts of the POC along one or both axes, but no deformation of its shape (Heuer, 1985b; Navon and Gopher, 1979). This assumption may not be justified, but it appears to be necessary to make extended models testable at all. The available tests are concerned with whether or not performance tradeoffs between concurrent tasks can be attributed to competition for a single hypothetical quantity, independent of any constant decrements that are attributed to structural interference. Of course, when
offsets of the POCs are to be neglected in the analysis, it has to focus on the slopes of POCs. Navon and Gopher (1980) suggested a procedure that has been assigned a critical role for the study of models of central capacity by Gopher and Sanders (1984). The procedure rests on the assumption that PRFs for easier and harder variants of a certain task are different; in particular, the average slope should be steeper for the easier task (Navon and Gopher, 1979). As a consequence, when both task variants are combined with a third task, the slopes of the POCs should be different as well. Therefore, when there is no difference between slopes, or when a difference is found with one manipulation of task difficulty but not with another one, this is inconsistent with extended models of generalized central capacity. Figure 4.6 presents the results of a study by Gopher, Brickner and Navon (1982) in which this test procedure has been used. On the abscissa is shown the performance on a two-dimensional pursuit-tracking task; the relative root-mean-squared error is subtracted from 1 so that higher scores correspond to better performance. On the ordinate the mean latency for a serial response task is shown in which finger chords had to be produced in response to Hebrew letters as stimuli;
Figure 4.6. Performance-operating characteristics for tracking (root-mean-squared (RMS) error in arbitrary units) and three variants of a typing task that differed in difficulty; dashed lines connect dual-task data points with single-task performance levels. [After Gopher et al., 1982, Figure 4.]
smaller numbers correspond to higher performance levels. The difficulty of the serial response task was varied in two different ways. First, the difficulty of a four-letter set was varied, which roughly corresponds to a manipulation of serial response compatibility in more conventional choice-reaction-time tasks. Second, the number of alternatives was varied: the larger set of 16 letters included the easy and difficult letters in addition to eight letters of intermediate difficulty. All four tasks were performed in isolation, and when they were combined subjects were given three different priority instructions. As is evident from Figure 4.6, the POC was fairly flat for the simple serial response task. It was steepest when only the difficult set of four letters was used, and with 16 letters of mixed difficulty the slope was intermediate. Although in this study the POCs for task variants of different difficulty did not have exactly identical slopes, Gopher et al. (1982) took the fact that manipulations of different difficulty had different effects on the slope of the POC as evidence against models of generalized central capacity. Although the assumption is quite plausible that the average slope of the PRF is steeper for an easier task variant than for a harder one, it does not imply a corresponding difference in slope for all values of capacity. Strictly speaking, a priori assumptions about PRFs for different tasks or different task variants are not justified for a simple reason (Heuer, 1985b): performance measures in general are arbitrary, and the question of which measure one should use is beyond the scope of capacity models (this is different, for example, for serial stage models; cf. Gopher and Sanders, 1984). Therefore all kinds of monotonic transformations are permitted, and by using sufficiently awkward transformations one can generate any kind of non-decreasing PRF (given that it is non-decreasing for at least one measure).
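The argument about arbitrary performance measures can be illustrated with a small numerical sketch. It assumes two hypothetical linear PRFs for an 'easy' and a 'hard' task variant and one hypothetical monotonic rescaling of the performance measure; all names and functional forms are illustrative and are not taken from the cited work. On the original scale the easy variant has the steeper PRF at every value of capacity. After rescaling, both PRFs are still non-decreasing and the average slope is still steeper for the easy variant, but the ordering of the local slopes is reversed at some values of capacity.

```python
import math


def prf_easy(c):
    """Hypothetical PRF of the easy variant on the original scale."""
    return 2.0 * c


def prf_hard(c):
    """Hypothetical PRF of the hard variant on the original scale."""
    return 1.0 * c


def rescale(x):
    """Hypothetical monotonic (increasing, concave) rescaling of the measure."""
    return 1.0 - math.exp(-3.0 * x)


def easy_rescaled(c):
    return rescale(prf_easy(c))


def hard_rescaled(c):
    return rescale(prf_hard(c))


def slope(func, c, h=1e-6):
    """Central finite-difference estimate of the local slope of func at c."""
    return (func(c + h) - func(c - h)) / (2.0 * h)


if __name__ == "__main__":
    for c in (0.1, 0.5):
        original = slope(prf_easy, c) > slope(prf_hard, c)
        rescaled = slope(easy_rescaled, c) > slope(hard_rescaled, c)
        print(f"c = {c}: easy steeper on original scale: {original}, "
              f"after rescaling: {rescaled}")
```

With these assumed functions the easy variant is steeper at c = 0.1 on both scales, whereas at c = 0.5 it is steeper only on the original scale.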
A proper test for extended models of limited generalized central capacity has to avoid assumptions about differences between PRFs. However, the practical problems in applying a proper test procedure are such that it will probably never be used (Heuer, 1985b).
2.2.4
The Interpretation of 'Capacity'
So far I have discussed capacity models in a rather formal manner, defining 'capacity' by some formal characteristics. As a hypothetical variable that is assumed to underlie performance on all or at least a majority of tasks, generalized central capacity bears an obvious resemblance to the g-factor of models of human intelligence (see Heuer, 1985b). The resemblance goes beyond the formal characteristics. Spearman (1927) has interpreted the g-factor as representing some kind of generalized mental energy, and this kind of interpretation has also been given to generalized central capacity. The most prominent energetic variant of capacity theory is probably Kahneman's (1973) version. Kahneman uses the terms 'capacity' and 'effort' as synonyms. It is by this interpretation that the capacity notion obtains its obvious relevance for theories of mental workload. The energetic interpretation of capacity captures a fundamental fact of everyday life, namely that people can work harder or less hard on a task and thus may achieve better or worse performance. It produces at least two difficulties for the models as applied to dual-task performance. First, the assumption of a constant upper limit of capacity appears to be incompatible with an energetic interpretation, which more or less enforces the
assumption of a flexible limit as proposed by Kahneman (1973). Without this it would be hard to explain well-known paradoxical effects such as an improvement of tracking performance upon an increase of task difficulty (Poulton, 1966) or an improvement in mental arithmetic tasks upon small (but not too small) doses of alcohol (Düker, 1963). As already noted in section 2.2.2, the assumption of a flexible upper limit causes problems for tests of the model. A second difficulty concerns the assumption of non-decreasing PRFs. The effort invested in the performance of a certain task can be estimated not only from performance on another concurrent task, but also more directly from physiological indicators of activation (like heart rate or pupil diameter; cf. Kahneman, 1973) or simply from motivational factors that are in effect (e.g. incentives). In such experiments a nonmonotonic relation between activation/motivation and performance is typically observed, which is known as the Yerkes-Dodson law and, since its discovery (Yerkes and Dodson, 1908), has invaded the relevant textbooks. This empirical relation seems to be at variance with the assumed shape of PRFs. The energetic interpretation of capacity is not the only one that can be found. Often the notion of limited central capacity seems to be related to the 'narrowness of consciousness' ('Enge des Bewußtseins'), rephrasing the phenomenological fact in functional terms and without reference to subjective experience. Quite common are technomorphic interpretations which can take the form of the limited capacity of an information channel (Broadbent, 1958) or of a computer (Moray, 1967); such interpretations appear very much nourished by the desire to avoid explicit reference to subjective experience. Although the terms 'capacity' and 'resources' are generally used as synonyms, they have somewhat different connotations as far as their interpretation is concerned.
With the economic analogy that was introduced by Navon and Gopher (1979), resources can be thought of as various kinds of 'materials' that are needed to perform a task. Norman and Bobrow (1975) vaguely hint at different types of memories or communication channels. More generally one can take resources as a set of functions or structures that are relevant for performing a task. As pointed out by Allport (1980), with such an interpretation it is counterintuitive that the capacity-sharing assumption should hold. Therefore, as long as one talks about generalized capacity, the term 'capacity' should be preferred, while the term 'resources' should be reserved for models that postulate different types of mental entities or quantities, so that one type cannot be exchanged easily with another type to achieve a certain level of performance on a task.
2.3
An Evaluation of Single-Competition Models
Both single-channel models and models of limited generalized central capacity explain dual-task performance decrements in terms of competition for a single central quantity. The two types of models are partly competitors, and partly they are disjunct in that they refer to different phenomena. They are rivals whenever the temporal resolution of the data analysis is sufficiently high to permit a decision about whether the central quantity is allocated to tasks in an all-or-none or a graded fashion. More correctly, models of generalized central capacity include all-or-none allocation as a limiting case and, by formal criteria, embrace single-channel models as variants. There is considerable evidence
against the assumption of all-or-none allocation, even though the falsified predictions of single-channel models may not always be adequate. Therefore, one would be on the safe side to reject single-channel models in favor of capacity models. The cost, however, would be that one has a far less specific theory. Reaction time tasks, which provide a fairly high temporal resolution, have been used extensively to probe the capacity demands in the course of performing various kinds of tasks (Posner and Boies, 1971). The purpose of this research was to characterize mental processes by the amount of capacity needed to perform them (for a review, see Kerr, 1973). However, not only has the probe-reaction time technique been misused by ascribing to it a higher temporal resolution than it really has (Heuer, 1981; McLeod, 1980), but one might also doubt the meaning of the inferences. Are there any benefits that accrue from characterizing mental processes in terms of a purely formal variable which has no unambiguous interpretation? It appears that characterizations of mental processes in terms of capacity demands are essentially empty and do not serve any useful purpose. Tests of models of limited generalized central capacity are concerned with whether or not competition for a single central quantity is sufficient to account for patterns of dual-task performance decrements. As is evident from the data reviewed above, it is not. Of course, this outcome also applies to single-channel models. Overall the nature of the violations of the predictions derived, correctly or incorrectly, from the basic capacity model enforces the assumption of structural interference. As far as I can see, however, none of the data is inconsistent with an extended model of generalized central capacity that acknowledges the existence of structural interference; nevertheless, this kind of model does not figure prominently in current research.
3
MULTIPLE PROCESSORS AND MULTIPLE RESOURCES
The response to the failure of single-competition models to account for the existing data on dual-task performance decrements was the development of multiple-competition models. These are generalizations of single-channel models and models of generalized central capacity which posit the existence of various different channels or resource pools, respectively.
3.1
Multiprocessor Models
Multiprocessor models focus on structural interference. They have been proposed by Allport et al. (1972) to account for the astonishing dual-task performance of their subjects (see section 2.2.3). Instead of a single channel, a set of independent channels (or processors) is posited that work in parallel; each of them has to be time-shared. Multiprocessor models are models of structural interference in that the dual-task performance decrement should depend on the extent to which concurrent tasks access the same processors (or structures). Thus, interference will critically depend on the specific relation between the tasks; two tasks may not interfere with each other if they are subserved by disjunct sets of processors, although each of these
tasks may produce strong interference when combined with other tasks, the performance of which requires the employment of identical processors. Multiprocessor models have the additional ingredient of time-sharing, i.e. all-or-none dedication of each processor to one of the concurrent tasks at any moment of time (or during any sufficiently brief time interval).
3.1.1
Testing Multiprocessor Models
Probably the only experiment that provided support for both the major assumptions of multiprocessor models has been reported by McLeod (1977). He used a particular variant of a tracking task in which a cursor had to be centered on a display using some kind of bang-bang control; the cursor had a constant acceleration in one or the other direction depending on whether the control stick operated by the subject was to the left or right of the central position. Concurrent with tracking, subjects had to respond as quickly as possible to high or low tones presented in random intervals of 1.5-2.5s. Subjects in the vocal group had to respond by saying 'high' or 'low', while subjects in the manual group had to respond by pressing one or another key with the non-preferred hand. As already mentioned in section 2.2.3, the dual-task performance decrement was larger in the manual than in the vocal group. This represents weak support for the assumption that structural overlap between tasks affects interference. As a test for the assumed time-sharing of common processors, McLeod (1977) performed an analysis of the timing of the responses for the two tasks. In particular he analyzed the time intervals between each choice response and the preceding and following tracking response. The distributions of these time intervals under the assumption of independent responses in the two tasks were determined (and tested using pairs of subjects who worked concurrently but independently on the two tasks) and compared with the observed distributions. The predicted and observed distributions for the two groups are shown in Figure 4.7. In the vocal group, choice responses and tracking responses were independent, but not in the manual group. As can be seen in Figure 4.7, the probability of tracking responses was reduced during a time interval preceding a choice response, and at some later time the probability was increased. 
This is consistent with time-sharing of a processor involved in response production which, while one signal is processed, neglects other signals and gives priority to the signals that require rapid choice responses. McLeod's (1977) data provide evidence for a common processor for tracking and manual choice responses, but with vocal choice responses no such evidence was seen. Multiprocessor models encounter a difficulty in explaining the dual-task performance decrement that was also found in this condition. McLeod invoked an 'executive controller', a mechanism unique to dual-task performance that requires capacity. It seems, however, that no such device is really needed, but that McLeod's data nicely fit a capacity model that is supplemented with structural interference as predicted by multiprocessor models. There is reason to doubt the generality of McLeod's (1977) finding that sequences of manual and vocal responses are independent with respect to their timing. For example, Fisher (1975a, b) observed dependencies between responses in a manual serial five-choice task and a concurrent mental arithmetic task with vocal responses.
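The logic of this kind of timing analysis can be sketched as follows. The function below is a hypothetical illustration, not McLeod's actual procedure: it counts tracking responses in time bins before and after each choice response. Applied once to the two response streams of one subject and once to streams from two subjects working independently, it gives an observed histogram and an independence baseline that can be compared bin by bin. The window, the bin width and the example response times (all in milliseconds) are arbitrary assumptions.

```python
def peri_response_counts(choice_ms, tracking_ms, window_ms=1200, bin_ms=400):
    """Count tracking responses in bins around each choice response.

    Lags are tracking time minus choice time; bins cover the half-open
    interval [-window_ms, +window_ms) in steps of bin_ms.
    """
    n_bins = (2 * window_ms) // bin_ms
    counts = [0] * n_bins
    for t_choice in choice_ms:
        for t_track in tracking_ms:
            lag = t_track - t_choice
            if -window_ms <= lag < window_ms:
                counts[(lag + window_ms) // bin_ms] += 1
    return counts


def proportions(counts):
    """Convert bin counts to relative frequencies (all zeros if empty)."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


if __name__ == "__main__":
    # Hypothetical response times in ms, invented for illustration only.
    choice = [2000, 6000]
    tracking_same_subject = [900, 2700, 4900, 6700]
    tracking_other_subject = [1000, 1800, 2600, 5400, 6200, 7000]

    observed = peri_response_counts(choice, tracking_same_subject)
    baseline = peri_response_counts(choice, tracking_other_subject)
    print("observed:", proportions(observed))
    print("baseline:", proportions(baseline))
```

In the invented example the observed histogram has empty bins close to the choice response while the baseline is flat; a pattern of this kind in real data would be the sort of dependency the analysis is designed to detect.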
Figure 4.7. Estimated probabilities for tracking responses as a function of the time preceding and following a binary choice response; dotted curves give estimates for independent response streams. (a) Vocal choice responses; (b) manual choice responses. [After McLeod, 1977, Figure 3.]
Klapp (1979, 1981) observed no major differences when different rhythms had to be produced, supported by pacing signals, with the two hands or in a manual-vocal combination; in both task variants there were strong temporal interdependencies (Heuer, 1996). Thus, it may be premature to conclude that common processors are involved in manual-manual task combinations, but not in manual-vocal ones. In addition, there is reason to doubt that the temporal interdependencies between sequences of responses indeed come about through time-sharing of a common processor (cf. section 4).
3.1.2
Task Similarity and Dual-Task Performance Decrements
There is no conclusive evidence on time-sharing of individual processors as posited by multiprocessor models. There are also difficulties with the second major assumption, namely that the dual-task performance decrement should increase as tasks become more similar, i.e. as the structural overlap or the number of common
processors increases. In spite of the abundant evidence that supports this claim (cf. section 2.2.3), there are exceptions and there is a logical difficulty. Both point to the same kind of neglect of multiprocessor models. Consider two concurrent tasks that are made progressively more similar. By this the dual-task performance decrement should increase because of the increase of structural overlap. Therefore, when the tasks become identical, interference should reach a maximum. Obviously this argument is flawed. It is based on an increasing number of common processors or structures, but does not take into account what these processors or structures actually do. It presupposes that every common structure is a source for additional interference. However, if concurrent tasks require identical processes performed by a certain processor or identical functions subserved by a common structure, this might actually result in dual-task facilitation. A variety of facilitative effects of task similarity can be observed when movements with the two hands are to be performed concurrently; in general it is easier to perform identical movements with the two hands than different ones, and movements with different spatiotemporal characteristics such as circles and rectangles are essentially impossible to perform simultaneously except when specific coordinative patterns can be chosen (Heuer, 1996). Results of this type have not only been found when movements had to be performed simultaneously, but also when one movement had to be prepared while another one was executed. Heuer (1985a, experiment 1), for example, found shorter reaction times for the initiation of movements with the one hand when the response signal was presented during execution of the same movement with the other hand than when it was presented during execution of a different movement. 
Finally, facilitative effects of task similarity are also likely to occur in the simultaneous preparation of two movements or movement sequences, giving rise to so-called response-response compatibility effects (see Heuer, 1990, for a review). Facilitative effects of task similarity are not restricted to motor characteristics of tasks, but they have also been found for stimulus-response translations. Duncan (1979, experiment 2) found that, in a double-stimulation experiment, reaction times to the first and second signal depended on whether the serial response translation rules were the same or different. Essentially equivalent results have been obtained in tracking experiments. Chernikoff, Duey and Taylor (1960) studied performance in a two-dimensional compensatory tracking task in which the transformations for the two axes were varied. In particular, the controlled variable was obtained by integrating the position of the control element one, two or three times (position, velocity and acceleration control). For each transformation on one axis, performance was best when the same transformation was applied to the other axis. Wickens (1989) emphasized a particular condition for efficient dual-task performance that he called 'compatibility of similarity'. For example, identical transformations for the two axes in two-dimensional tracking enhance tracking performance more, as compared with different transformations, when a single two-dimensional joystick is used rather than two separate control sticks (Chernikoff and Lemay, 1963) or when there is a single display showing the two-dimensional error rather than two separate displays, one for each axis (Fracker and Wickens, 1989). More generally, tasks that are integrated in one respect appear to gain more from integration in other respects, where 'integration' roughly means that the tasks are in some way supported by identical or coordinated rather than competing processes.
The facilitative effects of task similarity suggest the conclusion that, in contrast to all models that have been discussed so far, there is more in dual-task performance than only competition; what is needed is a closer look at what the processors and structures are supposed to do or actually do. This perspective on dual-task performance thus overcomes established competition models and makes one look for interactions between processes that subserve the performance of concurrent tasks, which will be undertaken in section 4.
3.2
Multiple-Resource Models
Multiple-resource models share with multiprocessor models the concern for structural factors in dual-task performance. Instead of a single type of generalized central capacity, a set of different resource pools is posited; each of them can be (capacity-) shared by concurrent tasks. Thus, multiple-resource models subsume multiprocessor models as the limiting case of all-or-none allocation for each type of resource, and they are the most general competition-type model of dual-task performance. Multiple-resource models exist in two different variants: a more formal version that has very little concern for the question of what types of resources do exist, and a less formal version that is mainly concerned with the identification of types of resources and neglects the formal requirements of this type of model. I shall discuss both variants in turn.
3.2.1
Formal Aspects of Multiple-Resource Models
Although Navon and Gopher (1979) describe the formal characteristics of multiple-resource models in some detail, they have never been worked out sufficiently. At the heart of the models there is again a performance-resource function, but it is multidimensional rather than two-dimensional. For a set of n different types of resources $c_1, \ldots, c_r, \ldots, c_n$, performance $x_i$ on task i is described as:
$$x_i = e_i + f_i(c_1, \ldots, c_r, \ldots, c_n), \quad 0 \le c_r \le c_{rL} \ \text{for } r = 1 \ldots n; \quad f_i(0, \ldots, 0) = 0; \quad \partial f_i / \partial c_r \ge 0 \ \text{for } r = 1 \ldots n \tag{4}$$
Thus performance $x_i$ is represented as a surface in a multidimensional space. For some of the resources the partial derivatives $\partial f_i / \partial c_r$ may be zero across the full ranges from zero to the limits $c_{rL}$. These resources do not contribute to task performance; in Navon and Gopher's (1979) terms they are not included in the demand composition of the task. Navon and Gopher (1979) discuss a certain constraint on the performance surface of multiple-resource models. They distinguish between fixed-proportion and variable-proportion functions. With a fixed-proportion function, performance on a task is associated with a fixed proportion of the various types of resources such as 2:1 for resources A and B; except when resources are supplied in the proper proportion, performance is limited by only one type of resource. With variable-proportion functions, an available type of resource can replace one that is lacking to some extent. The assumption corresponds to the fact that many tasks can be performed in different ways, e.g. by using different strategies. In the limiting case the contribution of each type of resource to performance will no longer depend on the supply of other types of resources so that performance can be represented as the sum of independent contributions of various resource pools:
$$x_i = e_i + f_{i1}(c_1) + \cdots + f_{ir}(c_r) + \cdots + f_{in}(c_n) \tag{5}$$
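The contrast between fixed-proportion functions and the additive limiting case of equation (5) can be made concrete with a small numerical sketch. It is an illustration only: the minimum rule used for the fixed-proportion case, the square-root contributions used for the additive case, and all function names are assumptions introduced here, not forms specified by Navon and Gopher (1979).

```python
import math


def fixed_proportion(c_a, c_b, ratio=(2.0, 1.0), e=0.0):
    """Hypothetical fixed-proportion function.

    Resources A and B are needed in the proportion ratio[0]:ratio[1],
    so the relatively scarcer resource alone limits performance.
    """
    usable_units = min(c_a / ratio[0], c_b / ratio[1])
    return e + usable_units


def additive(c_a, c_b, e=0.0):
    """Hypothetical instance of the additive case of equation (5).

    Each pool contributes independently and monotonically; the
    square root is an arbitrary choice of monotone function.
    """
    return e + math.sqrt(c_a) + math.sqrt(c_b)


if __name__ == "__main__":
    # Doubling A beyond the 2:1 proportion leaves performance unchanged ...
    print(fixed_proportion(2.0, 1.0), fixed_proportion(4.0, 1.0))
    # ... whereas in the additive case the extra supply of A still helps.
    print(additive(2.0, 1.0), additive(4.0, 1.0))
```

Both toy functions are zero when no resources are supplied and never decrease when a resource is added, in line with the constraints attached to equation (4).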
It is instructive to simplify equation (5) by assuming linear functions $f_{ir}$ with zero intercept. The multiple-resource model then becomes similar to a factor-analytic model (Heuer, 1985b). Each kind of model is applied to only one type of data: correlations between performance scores and dual-task performance, respectively. The similarity of the models suggests that convergent conclusions could emerge from both types of data, and on a rather superficial level examples can be found (cf. section 5.2). However, the assumption of linear models in factor analysis can be taken to imply linear functions in equation (5), and this can be rejected, first, because no assumption about linear functions can be justified given that arbitrary transformations of performance scores are possible, and second, because linear PRFs imply linear POCs which do not generally exist. Thus, although the linear decomposition of performance scores is always possible, the hypothetical quantities are specific to particular performance measures and specific analytical procedures; they have no meaning whatsoever beyond this. This, in principle, is true for factor-analytic models as well as for multiple-resource models, at least when the focus is on their formal characteristics.
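The step from linear PRFs to linear POCs can be spelled out for the simplest case. The following is a sketch under assumptions that are not stated in the text: a single resource with limit $c_L$, fully divided between two tasks, no cost of concurrence, and slopes $a_1, a_2 > 0$ (the symbols $a_1$, $a_2$, $c$ and $c_L$ are introduced here for illustration only).

```latex
% Linear PRFs with zero intercept; task 1 receives c, task 2 the remainder.
\begin{align*}
  x_1 &= e_1 + a_1 c, \\
  x_2 &= e_2 + a_2 (c_L - c), \qquad 0 \le c \le c_L .
\end{align*}
% Eliminating c gives performance on task 2 as a function of task 1:
\begin{equation*}
  x_2 = e_2 + a_2 c_L - \frac{a_2}{a_1}\,(x_1 - e_1).
\end{equation*}
```

Under these assumptions the performance operating characteristic is a straight line with slope $-a_2/a_1$, which is the implication referred to above.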
3.2.2
The Nature of Multiple Resources
The qualitative predictions of multiple-resource models, mainly those regarding the effects of task similarity on dual-task performance decrements, have been used to formulate hypotheses on the nature of resources. The principle is that, if a certain aspect of task similarity results in increased interference, a new type of resource is postulated. As pointed out by Neumann (1985) and exemplified by Hirst and Kalmar (1987), this procedure is likely to produce an inflation of hypothesized types of resources. Nevertheless, the existing models are restricted in this respect and concentrate on a limited set of resource pools. Wickens (1980) gave an impressive overview of task similarity effects on dual-task performance. Based on these data, he suggested a classification of resources along three dimensions. As stressed by Wickens (1984), the resulting model which specifies the nature of resources should not be understood as a complete model of dual-task performance, but only as a usable and parsimonious scheme for gross predictions about whether the performance decrement for one dual-task will be smaller or larger than that for another one. The first dimension that Wickens suggested for the classification of resources is stages of processing. However, only a coarse subdivision of this dimension is used as compared with fully developed stage models (Sanders, 1980); the dimension is taken as binary with encoding and central processing defining one resource pool and responding defining the other one. Wickens' second dimension is modality, in
particular auditory versus visual, and the third dimension comprises verbal and spatial codes. The two types of codes bear an obvious relation to the two cerebral hemispheres in that processing of verbal material is closely associated with left hemisphere activity, while the right hemisphere seems to have a prominent role in the processing of spatial material. Wickens' scheme is somewhat ambiguous about how manual and vocal responses should be fitted in. Wickens (1980) treated them as the modality dimension for the output-related stage. In a later formulation (Wickens, 1984), they were taken as the code dimension on the output side, corresponding to spatial and verbal codes on the input side, respectively. Justification can be found for both these assignments, and arguments could be raised about both of them as well. The hypothesis that the two cerebral hemispheres constitute separate resource pools (Friedman and Polson, 1981) has received particular attention. There is, in fact, abundant evidence for structural interference based on common demands on the same cerebral hemisphere; in particular there is a wealth of data showing that speaking interferes more with right-hand than with left-hand activity (for a review, see Summers, 1990). However, such results are insufficient to support the assumption of separate resource pools; they are consistent with any other model of structural interference such as a multiprocessor model. In a search for different tradeoffs, Friedman, Polson and Dafoe (1988) combined tapping with either hand with a verbal memory task. The memory task was subdivided into three 5 s periods in which the screen was fixated, the material was read aloud and memorized; reproduction followed thereafter. Consistent with previous findings, memory performance suffered more from concurrent right-hand tapping, and right-hand tapping suffered more than left-hand tapping from the concurrent memory task.
When tapping rather than memory performance was emphasized, the percentage decrement in the memory task increased by the same amount in both dual-tasks. Tapping during memorizing did not improve upon getting priority, but tapping during reading aloud did. However, the relative effect of the priority manipulation was the same for both hands. Thus, there is no indication for different performance tradeoffs as would be expected on the assumption that the demand composition of the verbal memory task overlaps more with the demand composition of right-hand tapping than that of left-hand tapping.
3.3
An Evaluation of Multiple-Competition Models
The notion of competition for multiple processors or resources captures an important aspect of dual-task performance, namely the effect of task similarity, which in general is an increased performance decrement. The models embrace those data that provide the strongest evidence against single competition models. However, facilitative effects of task similarity pose a problem, and the reason for this is the very notion of competition. To accommodate facilitative effects of task similarity one needs additional concepts such as the concurrence benefits of Gopher and Navon (1979, cf. section 2.2.2) and these must be of a noncompetitive nature. While both multiprocessor and multiple resource models account for structural interference in terms of higher or lower overlap of structures (processors or resource pools), they differ in the way the tasks are assumed to compete. In
multiprocessor models the competition is for the time that a processor is dedicated to one or the other task; the evidence for multiple time-sharing of a set of independent processors is essentially nonexistent. In multiple-resource models, the competition is for different types of resources. Again there is no convincing evidence for the existence of several different sources of performance tradeoffs; as far as I can see, the available data on performance tradeoffs can all be accommodated by the assumption of a single source such as generalized central capacity. Regarding multiple-resource models, it is not only the lack of empirical support that might trigger criticism, but also the very nature of the models (Navon, 1984). First, consider formal variants. It seems that these models represent some kind of takeoff from the experimental ground. There is no way to reject the models by any strict criterion. In this they resemble factor-analytic models. However, there is also no way to estimate parameters and to describe data in terms of the hypothesized resources. Thus, as formal tools the models are useless, and one might ask whether they have any use at all. With respect to models that are more specific on the types of resources, it seems that they have some use, namely as tools for gross predictions about larger or smaller dual-task performance decrements. However, the same predictions could be made if the term 'resources' were replaced by 'structures', which would no longer imply an assumption about how the tasks compete for them. Here, it seems, the resource concept is really a 'soup stone' (Navon, 1984), something that does not add to the flavor of the soup, but also does no harm (with the possible exception of leaving less space for soup in the pot).
4
PROCESS INTERACTIONS
Competition models are general models of dual-task performance, and they can be general because they deal with a limited set of phenomena, mainly the dual-task performance decrement. Process interactions, in contrast, have a variety of manifestations, and none of the available models covers all of them. Thus, as soon as one turns to process interactions, there is no longer a general model on the conceptual level, but only a general interpretational framework and a set of specific models.
4.1
A Conceptual Framework
In section 3.1.2 it has been emphasized that competition models cannot fully account for the effects of task similarity because they are conceptually restricted to competition for common structures (processors or resource pools) and are blind to the processes that are subserved by them. A general conceptual framework for process interactions covers both of these aspects, which I shall call the 'structural relation' and the 'processing relation'. One way to elaborate the framework has been suggested by Kinsbourne and Hicks (1978). The basic concept of Kinsbourne and Hicks (1978) is that of a 'functional cerebral space', an abstract space in which mental functions or structures that carry them can be located. The distance in this space corresponds to the potential of simultaneous processes that realize the functions to interact; the smaller the distance, the
more likely interactions are to occur or the stronger they are. Distances in functional cerebral space should not be conceived as symmetric because interactions are not necessarily symmetric. Therefore the metaphor of a space which in everyday thinking is so closely associated with (symmetric) Euclidean distances might not be fully appropriate. Functional cerebral space is defined in terms of the likelihood or strength of interactions, but it is also related to the anatomic conditions. In fact, Kinsbourne and Hicks (1978) focus on the assumption that functions that are primarily subserved by the same cerebral hemisphere are closer in functional cerebral space than functions that are primarily subserved by different hemispheres. More generally, topographical studies, using different methods, have indicated that activity during different mental functions is distributed differently across the cerebral surface (Kolb and Whishaw, 1990). Such differences in localization of functions can be taken as one determinant of the distance in functional cerebral space; a second determinant is, of course, the strength of the connections between anatomical locations. Distance in functional cerebral space specifies the structural relation between concurrent tasks or, more accurately, processes involved in the performance of the tasks. The processing relation between tasks is phrased by Kinsbourne and Hicks (1978) in terms of identical or unrelated cerebral programs. With identical programs, a closer distance in functional cerebral space will enhance dual-task performance, while with unrelated programs a closer distance will produce a deterioration.
In other words, the major prediction is this: the higher the structural separation of two concurrent tasks, the weaker will be the interactions between simultaneous processes; the interactions again can impede or facilitate performance depending on whether they are between different or identical processes (or processes generating different or identical data).
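The qualitative shape of this prediction can be illustrated with a toy function. It is a sketch only: the exponential decay with separation, the parameter names and the symmetric treatment of the two tasks are assumptions made here for illustration, and the framework itself specifies neither a functional form nor symmetric distances.

```python
import math


def interaction_effect(separation, same_process, max_strength=1.0, scale=1.0):
    """Toy version of the framework's major prediction.

    The strength of the interaction shrinks as structural separation
    grows (exponential decay is an arbitrary, hypothetical choice).
    The sign is positive (facilitation) for identical processes and
    negative (interference) for different ones.
    """
    if separation < 0:
        raise ValueError("separation must be non-negative")
    strength = max_strength * math.exp(-separation / scale)
    return strength if same_process else -strength


if __name__ == "__main__":
    for separation in (0.0, 1.0, 3.0):
        print(separation,
              interaction_effect(separation, same_process=True),
              interaction_effect(separation, same_process=False))
```

Whatever form is chosen, the two properties that matter are that the magnitude falls with separation and that only the processing relation determines the sign.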
4.2
Manifestations of Process Interactions
Process interactions can manifest themselves in a variety of forms behaviorally, and I am not aware of a systematic classification except for a rather coarse one proposed by Navon (1985). Therefore, the classification that I propose in this section is rather tentative, but it may serve as a preliminary overview of the variety of phenomena. Processes can be characterized in terms of operations and parameters and in terms of their output signals or the data that they generate. Interactions appear to be possible on both these levels, that of operations and that of data, and their manifestations appear to be of a qualitative (all-or-none) or quantitative (graded) nature. Qualitative manifestations take the form of confusions and intrusions. An example of an intrusion of an inappropriate operation has been described by Duncan (1979, experiment 2; see section 3.1.2): in rapid responses to successive stimuli the serial response translation that is appropriate for the one set of signals can be erroneously used for the other set. Intrusions of inappropriate data appear to be fairly frequent. A couple of days ago, for example, I answered a question posed to me while I was signing some forms by giving my name (of course, the question was not what is my name). Shaffer (1975) reported that, when his subjects typed a visual text and shadowed an auditory one, there were several intrusions
from each text in the other which seemed rather unsystematic. In contrast to the unidirectional intrusions, confusions are bidirectional. Quantitative manifestations of interactions in general take the form of increased similarity of certain aspects of dual-task performance; although, in principle, process interactions could also result in contrast phenomena, I am not aware of any relevant data. Most of the quantitative manifestations can be described in terms of coupling, as a kind of mutual attraction between concurrent tasks. A simple and well-known example is the tendency of simultaneous aimed movements to be produced with almost identical durations, even though their durations in single-tasks are greatly different (Kelso, Southard and Goodman, 1979). Of course, the large set of data of this type is outside the scope of competition models. Intrusions, confusions and increased similarity of some performance characteristics are direct manifestations of process interactions; indirect manifestations are a result of measures taken to avoid the direct ones. Mainly the indirect manifestations are prolonged durations of processes which serve to generate accurate outputs in spite of transient disturbances by a concurrent task. Within limits, serial processing can be used to prevent the direct overt manifestations of process interactions. The terminology used to describe process interactions is somewhat variable. A term that is used fairly often is 'crosstalk'. This concept covers a large part of the phenomena; it mainly refers to the spread of some aspects of the signals or data related to performance of the one task into the signals or data related to performance of the other task, and thus it seems to be somewhat narrower than the term 'process interactions'. Navon and Miller (1987) used the term 'outcome conflict'; this seems to be almost a synonym for crosstalk, but somewhat biased toward the indirect effect of increased processing duration.
4.3
Domains of Process Interactions
Although process interactions have received relatively little attention in the study of dual-task performance, there is a wealth of relevant data from other fields of inquiry that can only be briefly touched upon in this chapter. The main purpose in doing so will be, first, to provide support for the general scheme of a structural and a processing relation between tasks, and second, to give some examples for the manifestations of process interactions. For the sake of organization, different domains of process interactions can be distinguished, based on whether the processes are more related to responding or to perceiving or to neither of these; the assignment of interactions to these categories, however, is not always unambiguous.
4.3.1
The Motor Domain
Simultaneous movements of different limbs are not independent. In part the interdependencies seem to be established temporarily in the service of a particular action, but in part they seem to be permanent. In general the interdependencies among simultaneous movements are 'soft'; they represent tendencies toward or preferences for certain relations, and they can be overcome to some extent by practice or particular effort. The tendencies toward certain relations can mostly be
described in terms of 'coupling' with respect to certain characteristics of the movements (for an overview, see Heuer, 1991, 1996). Coupling of, or mutual attraction between, concurrent movements seems to exist at least with respect to timing, phasing, forces and muscle groups involved. Most conspicuous is the tendency toward certain temporal relations between concurrent movements. For example, aimed movements do not only tend toward identical durations overall when performed simultaneously (Kelso et al., 1979), but they also share most of the spontaneous trial-to-trial variability (Schmidt et al., 1979). Discrete movements with different spatiotemporal patterns such as circles and rectangles are (almost) impossible to perform concurrently; probably for this reason they have received very little experimental study. Finally, simultaneous periodic movements become extremely difficult when the periods are different (Klapp, 1979, 1981). However, these interactions do not seem to be specifically motor but rather to embrace all kinds of periodic activities. The difference between temporal coupling and phase coupling is not always very clear; it is well defined, however, for periodic movements where period and phase can be distinguished. In general, phase coupling refers to a tendency toward certain relative placements of movements in real time. For example, discrete responses tend to be initiated simultaneously even though in single-tasks their reaction times are different (Haferkorn, 1933). Periodic movements with identical frequencies are biased toward certain phase relations, in particular toward relative phases of 0 or 0.5 so that, if responses are discrete, they are performed simultaneously or in alternation (Yamanishi, Kawato and Suzuki, 1980). For high frequencies the preferred phase of 0.5 vanishes and only the tendency toward phase 0 remains, at least for continuous oscillations (Kelso, 1984). 
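Relative phase, as used here, is the position of one movement within the cycle of the other, expressed as a fraction of the period. The following sketch shows one way to compute it from event times and to measure how far an observed phase lies from the preferred values 0 and 0.5. The function names and the use of event onset times are assumptions made for illustration; they are not taken from the studies cited.

```python
def relative_phase(t_a, t_b, period):
    """Phase of event b within the cycle started by event a.

    Returned as a fraction of the period in the range [0, 1):
    0 means simultaneous, 0.5 means strict alternation.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    return ((t_b - t_a) % period) / period


def distance_to_preferred(phase, preferred=(0.0, 0.5)):
    """Smallest circular distance from a phase to any preferred phase."""
    def circular(p, q):
        d = abs(p - q) % 1.0
        return min(d, 1.0 - d)
    return min(circular(phase, q) for q in preferred)


if __name__ == "__main__":
    # Two hands tapping with a 0.5 s period; the second lags by 0.25 s.
    phase = relative_phase(0.0, 0.25, 0.5)
    print(phase, distance_to_preferred(phase))
```

A bias toward phases 0 and 0.5 would show up as observed relative phases whose distance to the nearest preferred value is smaller than the instructed phase relation would produce.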
Force coupling refers to the tendency to perform concurrent movements with similar forces, and it seems not to exist in discrete movements. For example, the tendency to perform aimed movements of different amplitudes with similar movement times is accompanied by an increase in the peak force difference (Kelso et al., 1979); this is a necessary consequence of the fact that the amplitudes remain different in spite of the similar durations. In addition, the spontaneous variability of the amplitudes of bimanual movements is not shared by the two hands (Schmidt et al., 1979). In contrast to these observations on discrete movements, force coupling can be evidenced when two streams of periodic movements are combined. Instructed trial-to-trial variations of force for one sequence of movements tend to spread into the other sequence, even when the instruction requests that force levels remain constant (Chang and Hammond, 1987; Kelso, Tuller and Harris, 1983). Homologous coupling denotes the highly conspicuous tendency toward coactivation of homologous muscle groups on both sides of the body. This tendency is not universal. For oscillations of the arms in a forward-backward direction, for example, a tendency toward antagonistic coupling has been reported (Gunkel, 1962) that corresponds to the out-of-phase oscillations of the arms in locomotion. For simultaneous oscillations of the hands and feet, coactivation tendencies appear to be determined largely by identical spatial directions (Baldissera, Cavallari and Civaschi, 1982). According to the general conceptual framework outlined in section 4.1, process interactions should be reduced upon an increase in the structural separation of concurrent tasks. Some support for this general prediction has been reported by Fracker and Wickens (1989) who used a two-dimensional compensatory tracking
task with a single joystick or two different joysticks, one for each axis. They found that with a single joystick cross-coherence (a measure similar to a cross-correlation) between input on one axis and output on the other axis was higher than with two separate joysticks. This difference, it seems, could arise from a tighter coupling between lateral and forward-backward movements of a single hand than between lateral movements of the one hand and forward-backward movements of the other, or alternatively from more intensive crosstalk between the input signals channeled to less structurally separated motor control processes than between input signals that are channeled to processes which are structurally more separated. The results of Fracker and Wickens are complemented by the findings of Briggs and Kinsbourne (1978; reported in Kinsbourne and Hicks, 1978). Briggs and Kinsbourne used a discrete pursuit-tracking task in which a light had to be aligned with the position of a target light by operating a lever with one of the four limbs. In dual-tasks, pairs of limbs were used and the target light positions were uncorrelated. It turned out that the most efficient pairs were the diagonal ones, followed by limbs on the same side of the body, while performance was worst when the two hands or the two feet were used. Overall the findings suggest a ranking with respect to structural separation that closely follows the anatomical distance of the limbs; even larger than the structural separation of diagonal limbs is possibly that between skeletomotor and vocal responses. A closer examination of the data, however, reveals that such a generalization might not always be correct. Gunkel (1962), for example, reported that the highest degree of independence between simultaneous oscillations was found for the two hands. A similar degree of independence was observed only for ipsilateral arm and leg, while the two legs exhibited the smallest degree of independence. 
In addition, the quality of process interactions can change as different structures become involved. For example, the two hands exhibit a tendency toward homologous coupling, but for hands and feet the preferred relation between simultaneously activated muscle groups seems to be determined mainly by spatial rather than anatomical factors. Findings like these complicate the concept of structural separation in at least two ways. First, connections between different anatomical centers appear not only to affect the strength or likelihood of process interactions, but also their quality. Second, a higher density of neural connections between two centers does not always imply less structural separation; some of these connections might actually serve to reduce interactions and to achieve a higher degree of independence. The most direct evidence on this has been reported by Preilowski (1972): so-called split-brain patients in whom a large part of the fibers that connect the two cerebral hemispheres had been cut exhibited a reduced independence of bimanual movements rather than an increased independence.
4.3.2
The Perceptual Domain
Process interactions in the perceptual domain can be classified according to whether or not they affect the appearance of stimuli. Although there is a wealth of data on how the appearance of certain stimuli can be modified by the presence of other stimuli, these are of little relevance for dual-task research because stimuli for dual-tasks are generally designed to avoid this kind of interaction. Among the
obvious risks, however, is masking so that stimuli for the one task make the stimuli for the other task hard or even impossible to perceive. Another interaction effect that is relevant for dual-task performance is cross-task grouping of stimuli: stimuli for both tasks can be grouped so that they appear as a unitary Gestalt. As described in section 3.1.2, such cross-task grouping does not necessarily reduce dual-task performance, but can even enhance it, probably when the apparent dual-task is actually performed as a single task (Klapp et al., 1985). Other interactions in the perceptual domain do not affect subjective experience in obvious ways but can be evidenced from performance measures. For example, Navon and Miller (1987) found that the latency for classifying words (e.g. city name yes/no) depended on which words were presented at the same time for a second classification (e.g. boy's name yes/no). In particular the latency for a certain classification was increased when a word was presented for the second classification that required a 'no' response, but was a target or from a similar category (e.g. girl's name) for the first classification. Thus, the stimuli for both classifications were not processed independently. In this experiment the manifestation of outcome conflict, to use Navon and Miller's term, was of the more indirect kind in that the latency was increased, but not the error rate. Stimuli for concurrent tasks can be kept apart to a higher degree when different sensory modalities are used rather than a single modality. Even then, however, the stimuli are not perfectly separated, at least when they are both linguistic. Shaffer (1975), in addition to the intrusions described in section 4.2, reported an interesting observation on that matter which can be characterized as 'capture'. In one of his variations of concurrent typing and shadowing (cf. section 2.2.3) he used the same prose for visual and auditory presentations.
Because shadowing was faster than typing, the auditory presentation started at an earlier passage of the text. At some time it crossed the visual presentation so that both messages were identical for a while. Thereafter the subjects were unable to separate them again; instead they continued to type and speak the visual text. With respect to visual signals, these can be kept apart to a higher degree when they are presented at more distant locations. For example, Pomerantz and Schwaitzberg (1975) had their subjects sort sets of four simple stimuli: )), ((, () and )(. In one task classification was based on the right bracket only [)), () versus ((, )(] so that the left bracket had to be neglected; sorting time decreased with increasing lateral distance between the brackets of each pair. In another task variant the classification was based on the relation between the brackets of each pair [)), (( versus )(, ()] and in this case sorting time increased with lateral distance. More generally, process interactions should lose strength as the selectivity of attention increases. However, this does not necessarily result in an improvement of dual-task performance because efficient performance not only requires attentional separation of stimuli to minimize crosstalk, but also concurrent processing of them. Apparently there is some degree of incompatibility between these two requirements. Attempts to minimize perceptual crosstalk, and by this the dual-task performance decrement, will often be accompanied by serial processing of stimuli that are relevant for the two tasks, and this will have the effect of increasing the decrement again. An example for a reduction of crosstalk by way of spatial separation of visual stimuli that did not produce an improvement of dual-task performance has been reported by Fracker and Wickens (1989). In this experiment (cf. section 4.3.1) an integrated display was used that showed the errors on both axes, but also two
separate displays, one for each axis. Similar to the effects of integrated versus separate joysticks, the cross-coherence between the displayed error on one axis and the control movements for the other axis tended to be larger with the integrated display. In addition, with two joysticks the cross-coherence between errors on one axis and movements of the joystick for the other axis in the irrelevant dimension was increased, i.e. with an integrated display subjects tended to try to reduce an error by moving a joystick in the proper direction, but this happened to be the wrong joystick. Thus, there was more crosstalk with integrated than with separated displays; nevertheless performance was not worse.

4.3.3 The Central Processing Domain
Responses that are appropriate for one task can be channeled to another concurrent task where they are rather inadequate, and rules for serial response translations can be erroneously applied to the wrong task (cf. section 4.2). In this section I shall briefly describe two other sets of manifestations of process interactions of an apparently central origin. The first one is related to spatial imagery and the second to timing. In section 2.2.3 some results have been reviewed which suggest that tasks involving visuospatial imagery share common structures. The combination of such tasks, in principle, invites analyses in terms of process interactions because they can be described spatially, similar to movements. Nevertheless such analyses are rare. An example has been reported by Johnson (1982). In part, Johnson's study was a replication of a well-known effect, namely the bias that can be introduced by performing a movement of a certain amplitude while another movement of a different amplitude is memorized. In general, when the memorized movement is finally reproduced, its amplitude is biased toward that of the interpolated movement. What was new about Johnson's data is that he obtained identical biasing effects with real interpolated movements and with imagined movements; thus, movement imagery modified the memory trace in the same way as did movement execution. Among the most conspicuous manifestations of process interactions are the temporal constraints on dual-task performance. The available data strongly suggest that concurrent tasks rely on a common timing control (Heuer, 1992). Although temporal constraints have been studied mainly in bimanual rhythms, they are not restricted to the motor domain. For example, Bornemann's (1942a) subjects encountered difficulties in performing rapid mental arithmetic (adding pairs of numbers) concurrent with a slowly paced serial response task. 
Casual observations show that it is hard to dance a waltz when the marching band plays. Based on more serious research, Keele and Ivry (1990) have suggested that timing of various types of task depends on a particular timing module which is closely associated with cerebellar function.
4.4 Interactions and Competition
Attempts to account for dual-task performance in terms of process interactions can be seen as an alternative to competition models (Navon, 1985). When the two approaches are evaluated from such an either-or perspective the outcome is hardly
debatable: competition models are better developed and provide a more unified explanation of dual-task performance decrements, but they face serious problems when confronted with experimental data, leaving aside the difficulties encountered in running stringent tests for multiple competition models. First, they need additional assumptions that no longer adhere to the general principle of competition to handle facilitative effects of task similarity. Second, and more importantly, there is a wealth of data on dual-task performance that is simply outside their scope; these data constitute the main evidence for process interactions and result from more fine-grained analyses of dual-task performance than those using global performance scores.

Of course it is not really adequate to consider the two approaches to dual-task performance (competition and process interactions) as rivals because they are based to a large part on different levels of data analysis (Heuer and Wing, 1984); each approach could have its merits for its own type of observation. If this perspective were correct, two conditions should be fulfilled. Each approach should be able to handle its relevant data adequately; this condition seems not to be fulfilled for competition models. The second condition is that, in principle, phenomena on the higher level of data analysis should be explicable in terms of phenomena at the lower level. More specifically, dual-task performance decrements should be reducible to manifestations of process interactions. So far there is no compelling evidence that this is possible.

It is probably justified to view the two approaches to dual-task performance as complementary, at least for the time being. First, manifestations of process interactions cannot be explained in terms of competition models, and second, it appears unlikely that phenomena such as the performance tradeoff can be explained in terms of process interactions.
A useful hybrid model would probably be a model of generalized central capacity which not only acknowledges the existence of structural interference, but which is supplied with the concept of process interactions (and specific models for these).
5 PRACTICE
Practice serves to improve performance, and at the same time performance becomes progressively automatic. The term 'automatization' has many facets, in its everyday as well as in its scientific usage (Neumann, 1984). Probably the most important operational criterion for automatization is the reduction of dual-task performance decrements (Bahrick and Shelly, 1958; Brown and Poulton, 1961; Mohnkopf, 1933). In this section I shall consider some of the factors that underlie this practice effect.
5.1 Automatization
Automatization refers to a modification of task performance in the course of practice such that interference with all other kinds of concurrent tasks will be reduced. In addition other changes are frequently implied that are not directly related to dual-task performance such as a reduced involvement of consciousness and a shift from dependence on external (e.g. visual) stimuli to dependence on
internal (e.g. proprioceptive) stimuli (cf. Bahrick and Shelly, 1958). Models of generalized central capacity can be used to account for automatization in a simple and straightforward manner, at least as far as the reduction of the dual-task performance decrement is concerned (cf. Heuer, 1988): practice is assumed to modify the PRF such that, first, the capacity needed for any particular level of performance declines over the course of practice, which implies an upward shift of the PRF, and second, for any increase or decrease of performance a larger change of capacity is required, which corresponds to a reduced slope of the PRF (at least for higher levels of capacity). From the simple capacity account of automatization it follows, first, that the dual-task performance decrement after extensive practice is less than after only little practice, and second, that the performance tradeoff between two concurrent tasks is modified; in particular, when the available capacity decreases, performance on a well-practiced task should decline more slowly than on a less practiced task. While there is plenty of support for the first of these expectations, evidence on the second one seems to be lacking.
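The two assumed changes of the PRF and the two expectations derived from them can be illustrated with a small numerical sketch. Everything specific in it is an assumption made for illustration only: the saturating exponential form of the PRF, the parameter name `demand`, the equal split of capacity in the dual-task condition and the numerical values are hypothetical and are not taken from Heuer (1988) or from any data.

```python
import math

def prf(capacity, demand):
    """Hypothetical performance-resource function (PRF).

    Returns performance in [0, 1) for a capacity supply in [0, 1].
    A smaller `demand` stands for more practice: less capacity is
    needed to reach any given level of performance.
    """
    return 1.0 - math.exp(-capacity / demand)

def relative_decrement(demand, dual_share=0.5):
    """Relative performance loss when only `dual_share` of the
    capacity is available (assumed dual-task condition)."""
    single = prf(1.0, demand)
    dual = prf(dual_share, demand)
    return (single - dual) / single

# Illustrative values only: high demand early, low demand late in practice.
early = relative_decrement(demand=0.5)
late = relative_decrement(demand=0.1)

print(f"relative decrement, little practice:    {early:.3f}")
print(f"relative decrement, extensive practice: {late:.3f}")
```

Under these assumptions the practiced PRF lies above the unpracticed one at every positive capacity level and is flatter at higher capacity levels, so that withdrawing half of the capacity costs the well-practiced task less than the less practiced task.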
5.2 Structural Displacement

The concept of automatization encounters difficulties when confronted with experimental data. For example, in some experiments it has been found that after a period of practice the performance decrement upon introduction of a concurrent task was larger for those task variants that exhibited greater improvement in the course of practice than in those with less improvement. In particular, performance in tracking tasks with predictable tracks benefits more from practice than performance with unpredictable tracks, but it also suffers more from a concurrent task that is introduced late in practice (Pew, 1974; Trumbo, Noble and Quigley, 1968). More importantly, when concurrent tasks are introduced early and late in practice, the dual-task performance decrement will not invariably be reduced. For tracking tasks such results have been obtained by Noble, Trumbo and Fowler (1967) and McLeod (1973). Bornemann (1942b) found that third-year and second-year apprentices in precision mechanics exhibited a larger performance decrement when filing was combined with mental arithmetic than first-year apprentices.

Figure 4.8 shows how practicing one task can modify the dual-task interference with several different concurrent tasks in quite different ways. The practiced task in this experiment was a video game called 'space fortress' (Mané and Donchin, 1989). Logie et al. (1989, experiment 3) gave their subjects practice for 3 h. After 1 h of practice on the secondary tasks a first test session with various dual-task conditions followed. Another test was performed after an additional 5 h of practice. Figure 4.8 shows the relative dual-task performance decrement in both tasks plotted against each other; data points from the first and second test session are connected by arrows.
Arrows that point to the upper left or lower right indicate improvement on one task and deterioration on the other task; the interpretation of such changes is ambiguous because they might result from a simple change of the performance tradeoff over the course of practice. Results of this kind were obtained for the 'Brooks verbal task' (4), which essentially is a rote-learning task.
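The reading of arrow directions described here can be stated as a simple rule. The following sketch is only an illustration of that rule; the function name, the tuple layout and the numbers in the usage lines are hypothetical and are not values from Logie et al. (1989).

```python
def classify_change(initial, final):
    """Classify the change between two test sessions.

    `initial` and `final` are pairs (decrement on practiced task,
    decrement on secondary task), e.g. in percent.
    """
    d1 = final[0] - initial[0]
    d2 = final[1] - initial[1]
    if d1 == 0 and d2 == 0:
        return "no change"
    if d1 <= 0 and d2 <= 0:
        return "reduced interference"
    if d1 >= 0 and d2 >= 0:
        return "increased interference"
    # One decrement rose while the other fell: this could reflect
    # nothing more than a shift of the performance tradeoff.
    return "ambiguous"

# Hypothetical decrements, for illustration only.
print(classify_change((30, 40), (20, 25)))
print(classify_change((10, 20), (5, 35)))
print(classify_change((10, 10), (20, 15)))
```

Only the first and the third kind of change can be interpreted as a change in interference; the second kind leaves open whether interference changed at all.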
Figure 4.8. Changing dual-task performance decrements in the course of practicing the video game 'space fortress'; ΔL1 (%) is the relative decrement on the video game; ΔL2 (%) the relative decrement on the secondary tasks; arrows run from initial to final decrements. Secondary tasks 1-6 are explained in the text. The inset indicates ambiguous changes in terms of directions of arrows (hatched areas) in which the decrement on one task increases but on the other one declines. [After Logie et al., 1989, Tables 13 and 14.]
Interference with three of the six secondary tasks was reduced during practice: 'limerick task' (3), in which the subjects first learned a limerick and then had to answer questions about the relative positions of words; 'map task' (5), in which a map of an island with six locations was shown and then questions about the relative positions of the locations had to be answered; and 'Brooks spatial task' (6), which is essentially an imagery task. In contrast there were two secondary tasks for which the dual-task performance decrement increased: 'repeat-a-day task' (1), in which the subjects had to repeat auditorily presented days of the week as rapidly as possible; and 'rapid-tapping' (2) with the feet. Overall it seems that the dual-task performance decrement for most concurrent tasks is reduced when a task is practiced but that there are some exceptions. Heuer (1984) pointed out that this pattern of results closely resembles the findings on intertask correlations: when a task is practiced, most of its correlations with
performance on other tasks are reduced, but a minority of these correlations increase. Both sets of findings invite the same kind of interpretation that is based on the structural relation between tasks, more specifically on its changes that come about through a structural displacement of the practiced task. Such a displacement will have the effect that the structural separation between the practiced task and some concurrent tasks will increase, while it will decrease for other concurrent tasks. To the extent that structural separation reduces dual-task interference, the dual-task performance decrement should be reduced or increased, respectively. To the extent that structural separation reduces intertask correlations, these will be reduced or increased as well. It is likely that structural displacement is accompanied by another change, structural constriction (Heuer, 1984). During practice, tasks become structurally narrow and performance becomes progressively task-specific (Logan, 1988). Thus, from a structural interference perspective, data that seem to indicate automatization are of less interest than data on increasing dual-task performance decrements; the former reflect both structural constriction and displacement, but the latter tap the structures that are the targets of the structural displacement during practice. In the experiment of Logie et al. (1989), for example, the data indicate that in the course of practice progressively higher demands are made on response-related structures and progressively lower demands on structures that carry verbal and imaginal processes. Although the findings on dual-task performance and intertask correlations both suggest an interpretation in terms of structural relations, it is not known whether the two types of data would lead to convergent conclusions in specific instances. 
As far as I am aware there is no study in which both dual-task interference and intertask correlations were assessed for the same set of tasks that were combined with a certain task early and late in practice.
5.3 Time-Sharing Skills

So far I have discussed changes of dual-task performance decrements that result from single-task practice; in this section I shall consider dual-task rather than single-task practice. In this case it is likely that dual-task performance not only improves because of changes in each individual task, but also because of changes that pertain to concurrent task performance. In addition to the individual skills a time-sharing skill could be developed. (Here the term 'time-sharing' should be understood in a broad sense of a 'dual-task skill', not only in the narrow sense of a skill of allocating time slots to concurrent tasks.)

The explanation of dual-task feats such as those reported in section 2.2.3 usually invokes the structural separation of the tasks, and this might be an important facet of a time-sharing skill. For example, strategies in performing the tasks could be chosen so as to increase their structural separation, and, if that is impossible, temporal relations could be developed that minimize strictly simultaneous demands on common structures and at the same time reduce overall performance only a little. In spite of its plausibility, there is only little formal evidence on the development of time-sharing skills. An exception is the optimization of time-sharing in monitoring several displays, as described in section 1.2.
As an example of a study that purports to demonstrate the development of a time-sharing skill, consider that of Damos and Wickens (1980). On the first day of the experiment the subjects practiced two tasks, a classification task and a serial short-term memory task. In the classification task two digits (from the set 5-8) were presented which could be the same or different and be of the same or different size; subjects had to press as rapidly as possible one of three keys operated by the left hand depending on whether the digits differed on 0, 1 or 2 dimensions; 40 ms after a response the next pair of digits followed. In the memory task a random sequence of digits (1-4) was presented and the subjects had to respond by way of pressing one of four keys operated by the right hand, corresponding to the digit presented just before; again the response was immediately followed by the next stimulus. The first 10 trials were single-task; thereafter both tasks were performed concurrently, and occasional single-task trials were inserted. There was a practice effect in dual-task trials in spite of a constant performance level in single-task trials. Similar results were obtained on the second day of the experiment when subjects had to perform two unidimensional compensatory tracking tasks concurrently. Again there was considerable improvement in dual-task performance, but very little improvement in single-task trials. Although these data suggest the development of a time-sharing skill, they are in no way convincing. They are consistent with the assumption of a progressive automatization of single-task performance; when the PRF gradually changes, subjects might choose to keep single-task performance constant and to reduce their effort (or the capacity supplied). To prove the existence of a time-sharing skill one has to show that the dual-task performance decrement is smaller after dual-task practice than after single-task practice. 
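The response rules of the two discrete tasks described above can be written down compactly. The sketch below is an illustration under stated assumptions: the function names, the size labels 'small' and 'large', and the use of `None` for the first stimulus of the memory task (which has no predecessor) are hypothetical, and the assignment of the three left-hand keys to the counts 0, 1 and 2 is not given in the text.

```python
def classification_response(digit_a, size_a, digit_b, size_b):
    """Number of dimensions (identity, size) on which two digits
    differ: 0, 1 or 2. The count stands in for one of three keys."""
    return int(digit_a != digit_b) + int(size_a != size_b)

def memory_responses(sequence):
    """Correct responses in the serial short-term memory task: for
    each stimulus, the digit that was presented just before it."""
    return [None] + list(sequence[:-1])

# Usage with made-up stimuli.
print(classification_response(5, "small", 5, "small"))
print(classification_response(5, "small", 6, "large"))
print(memory_responses([3, 1, 4, 2]))
```

Written this way, it is apparent that the memory task requires holding one item while encoding the next, whereas the classification task requires no memory across stimuli.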
Although the criterion of different practice effects in dual-task and single-task trials is insufficient to prove the existence of time-sharing skills, Damos and Wickens (1980) found other evidence of it. On the second day dual-task tracking but not single-task tracking differed between groups who had dual-task practice on the first day or only single-task practice: dual-task practice on two discrete tasks thus produced some transfer to dual-task performance on two fairly different continuous tasks. In general, however, there is no strong support for the hypothesis that time-sharing skills developed for a certain combination of tasks are transferable to other task combinations (Polson et al., 1989).
ACKNOWLEDGEMENTS

This chapter has gained from the valuable comments of Angus Craig and an anonymous reviewer.
REFERENCES

Allport, D. A. (1980). Attention and performance. In G. Claxton (Ed.), Cognitive Psychology. New Directions (pp. 112-153). London: Routledge & Kegan Paul.
Allport, D. A. (1987). Selection for action. Some behavioral and neurophysiological considerations of attention and action. In H. Heuer and A. F. Sanders (Eds), Perspectives on Perception and Action (pp. 395-419). Hillsdale, NJ: Erlbaum.
Allport, D. A., Antonis, B. and Reynolds, P. (1972). On the division of attention: A disproof of the single-channel hypothesis. Quarterly Journal of Experimental Psychology, 24, 225-235.
Baddeley, A. D., Grant, S., Wight, E. and Thomson, N. (1975). Imagery and visual working memory. In P. M. A. Rabbitt and S. Dornic (Eds), Attention and Performance V. New York: Academic Press.
Baddeley, A. D. and Lieberman, K. (1980). Spatial working memory. In R. S. Nickerson (Ed.), Attention and Performance VIII (pp. 521-539). Hillsdale, NJ: Erlbaum.
Bahrick, H. P. and Shelly, C. (1958). Time sharing as an index of automatization. Journal of Experimental Psychology, 56, 288-293.
Baldissera, F., Cavallari, P. and Civaschi, P. (1982). Preferential coupling between voluntary movements of ipsilateral limbs. Neuroscience Letters, 34, 95-100.
Bornemann, E. (1942a). Untersuchungen über den Grad der geistigen Beanspruchung. I. Teil: Ausarbeitung der Methode. Arbeitsphysiologie, 12, 142-172.
Bornemann, E. (1942b). Untersuchungen über den Grad der geistigen Beanspruchung. II. Teil: Praktische Ergebnisse. Arbeitsphysiologie, 12, 173-191.
Broadbent, D. E. (1958). Perception and Communication. New York: Pergamon.
Brown, I. D. and Poulton, E. C. (1961). Measuring the spare 'mental capacity' of car drivers by a subsidiary task. Ergonomics, 4, 35-40.
Chang, P. and Hammond, G. R. (1987). Mutual interactions between speech and finger movements. Journal of Motor Behavior, 19, 265-274.
Chernikoff, R., Duey, J. W. and Taylor, F. V. (1960). Two-dimensional tracking with identical and different control dynamics in each coordinate. Journal of Experimental Psychology, 60, 318-322.
Chernikoff, R. and Lemay, M. (1963). Effect of various display-control configurations on tracking with identical and different coordinate dynamics. Journal of Experimental Psychology, 66, 95-99.
Craik, K. J. W. (1947). Theory of the human operator in control systems. I. The operator as an engineering system. British Journal of Psychology, 38, 56-61.
Craik, K. J. W. (1948). Theory of the human operator in control systems. II. Man as an element in a control system. British Journal of Psychology, 38, 142-148.
Damos, D. and Wickens, C. D. (1980). The acquisition and transfer of time-sharing skills. Acta Psychologica, 6, 569-577.
de Jong, R., Coles, M. G. H., Logan, G. D. and Gratton, G. (1990). In search of the point of no return: The control of response processes. Journal of Experimental Psychology: Human Perception and Performance, 16, 164-182.
Düker, H. (1963). Über reaktive Anspannungssteigerung. Zeitschrift für experimentelle und angewandte Psychologie, 10, 46-72.
Duncan, J. (1979). Divided attention: The whole is more than the sum of its parts. Journal of Experimental Psychology: Human Perception and Performance, 5, 216-228.
Elithorn, A. and Lawrence, C. (1955). Central inhibition - some refractory observations. Quarterly Journal of Experimental Psychology, 7, 116-127.
Ells, J. G. (1973). Analysis of temporal and attentional aspects of movement control. Journal of Experimental Psychology, 99, 10-21.
Fisher, S. (1975a). The microstructure of dual-task interaction. 1. The patterning of main-task responses within secondary-task intervals. Perception, 4, 267-290.
Fisher, S. (1975b). The microstructure of dual-task interaction. 2. The effect of task instructions on attentional allocation and a model of attention-switching. Perception, 4, 459-474.
Fracker, M. L. and Wickens, C. D. (1989). Resources, confusions and compatibility in dual-axis tracking: Displays, controls and dynamics. Journal of Experimental Psychology: Human Perception and Performance, 15, 80-96.
Friedman, A. and Polson, M. C. (1981). Hemispheres as independent resource systems: Limited-capacity processing and cerebral specialization. Journal of Experimental Psychology: Human Perception and Performance, 7, 1031-1058.
Friedman, A., Polson, M. C. and Dafoe, C. G. (1988). Dividing attention between the hands and the head: Performance trade-offs between rapid finger tapping and verbal memory. Journal of Experimental Psychology, 14, 60-68.
Gopher, D., Brickner, M. and Navon, D. (1982). Different difficulty manipulations interact differently with task emphasis: Evidence for multiple resources. Journal of Experimental Psychology: Human Perception and Performance, 8, 146-157.
Gopher, D. and Sanders, A. F. (1984). S-Oh-R: Oh stages! Oh resources! In W. Prinz and A. F. Sanders (Eds), Cognition and Motor Processes (pp. 231-253). Berlin: Springer.
Greenwald, A. G. and Shulman, H. G. (1973). On doing two things at once. II. Elimination of the psychological refractory period. Journal of Experimental Psychology, 101, 70-76.
Griffith, D. and Johnston, W. (1973). An information-processing analysis of visual imagery. Journal of Experimental Psychology, 100, 141-146.
Gunkel, M. (1962). Über relative Koordination bei willkürlichen menschlichen Gliederbewegungen. Pflügers Archiv für die gesamte Physiologie, 275, 472-477.
Haferkorn, W. (1933). Über die zeitliche Eingliederung von Willkürbewegungen. Neue psychologische Studien, 9, 37-63.
Heuer, H. (1981). Über Beanspruchungsänderungen im Verlauf schneller gezielter Bewegungen. Zeitschrift für experimentelle und angewandte Psychologie, 28, 255-280.
Heuer, H. (1984). Motor learning as a process of structural constriction and displacement. In W. Prinz and A. F. Sanders (Eds), Cognition and Motor Processes (pp. 295-305). Berlin: Springer.
Heuer, H. (1985a). Intermanual interactions during simultaneous execution and programming of finger movements. Journal of Motor Behavior, 17, 335-354.
Heuer, H. (1985b). Some points of contact between models of central capacity and factor-analytic models. Acta Psychologica, 60, 135-155.
Heuer, H. (1988). 'Pseudoautomatization' in manual control: A simulation study. Ergonomics, 31, 1729-1742.
Heuer, H. (1990). Rapid responses with the left or right hand: Response-response compatibility effects due to intermanual interactions. In R. W. Proctor and T. G. Reeve (Eds), Stimulus-Response Compatibility. An Integrated Perspective (pp. 311-342). Amsterdam: North-Holland.
Heuer, H. (1991). Motor constraints in dual-task performance. In D. Damos (Ed.), Multiple-Task Performance (pp. 173-204). London: Taylor and Francis.
Heuer, H. (1996). Coordination. In H. Heuer and S. W. Keele (Eds), Handbook of Perception and Action, Vol. 2: Motor skills (pp. 121-180). London: Academic Press.
Heuer, H. and Wing, A. M. (1984). Doing two things at once: Process limitations and interactions. In M. M. Smyth and A. M. Wing (Eds), The Psychology of Human Movement (pp. 183-213). London: Academic Press.
Hick, W. E. (1948). The discontinuous functioning of the human operator in pursuit tasks. Quarterly Journal of Experimental Psychology, 1, 36-51.
Hillgruber, A. (1912). Fortlaufende Arbeit und Willensbetätigung. Untersuchungen zur Psychologie und Philosophie, 1, Heft 6.
Hirst, W. and Kalmar, D. (1987). Characterizing attentional resources. Journal of Experimental Psychology: General, 116, 68-81.
Hirst, W., Spelke, E. S., Reaves, C. C., Caharack, G. and Neisser, U. (1980). Dividing attention without alternation or automaticity. Journal of Experimental Psychology: General, 109, 98-117.
Johnson, P. (1982). The functional equivalence of imagery and movement. Quarterly Journal of Experimental Psychology, 34A, 349-365.
Johnston, W. A., Greenberg, S. N., Fisher, R. P. and Martin, D. W. (1970). Divided attention: A vehicle for monitoring memory processes. Journal of Experimental Psychology, 83, 164-171.
Kahneman, D. (1973). Attention and Effort. Englewood Cliffs, NJ: Prentice-Hall.
Keele, S. W. and Ivry, R. (1990). Does the cerebellum provide a common computation for diverse tasks? A timing hypothesis. In A. Diamond (Ed.), The Development and Neural Bases of Higher Cognitive Function. Annals of the New York Academy of Sciences, Vol. 608.
Kelso, J. A. S. (1984). Phase transitions and critical behavior in human bimanual coordination. American Journal of Physiology: Regulatory, Integrative, and Comparative, 246, R1000-R1004.
Kelso, J. A. S., Southard, D. L. and Goodman, D. (1979). On the coordination of two-handed movements. Journal of Experimental Psychology: Human Perception and Performance, 5, 229-238.
Kelso, J. A. S., Tuller, B. and Harris, K. S. (1983). A 'dynamic pattern' perspective on the control and coordination of movement. In P. F. MacNeilage (Ed.), The Production of Speech (pp. 137-173). New York: Springer.
Kerr, B. (1973). Processing demands during mental operations. Memory and Cognition, 1, 401-412.
Kinsbourne, M. and Hicks, R. E. (1978). Functional cerebral space: A model for overflow, transfer and interference effects in human performance: A tutorial review. In J. Requin (Ed.), Attention and Performance VII (pp. 345-362). Hillsdale, NJ: Erlbaum.
Klapp, S. T. (1979). Doing two things at once: The role of temporal compatibility. Memory and Cognition, 7, 375-381.
Klapp, S. T. (1981). Temporal compatibility in dual motor tasks. II: Simultaneous articulation and hand movements. Memory and Cognition, 9, 398-401.
Klapp, S. T., Hill, M., Tyler, J., Martin, Z., Jagacinski, R. and Jones, M. (1985). On marching to two different drummers: Perceptual aspects of the difficulties. Journal of Experimental Psychology: Human Perception and Performance, 11, 814-828.
Kolb, B. and Whishaw, I. Q. (1990). Fundamentals of Human Neuropsychology, 3rd edn. New York: Freeman.
Logan, G. D. (1988). Toward an instance theory of automatization. Psychological Review, 95, 492-527.
Logie, R., Baddeley, A., Mané, A., Donchin, E. and Sheptak, R. (1989). Working memory in the acquisition of complex cognitive skills. Acta Psychologica, 71, 53-87.
Mané, A. and Donchin, E. (1989). The space fortress game. Acta Psychologica, 71, 17-22.
Massey, J. T., Schwartz, A. B. and Georgopoulos, A. P. (1986). On information processing and performing a movement sequence. In H. Heuer and C. Fromm (Eds), Generation and Modulation of Action Patterns (pp. 242-251). Berlin: Springer.
McLeod, P. M. (1973). Interference of 'attend to and learn' tasks with tracking. Journal of Experimental Psychology, 99, 330-333.
McLeod, P. (1977). A dual task response modality effect: Support for multiprocessor models of attention. Quarterly Journal of Experimental Psychology, 29, 651-667.
McLeod, P. (1978). Does probe RT measure central processing demand? Quarterly Journal of Experimental Psychology, 30, 83-89.
McLeod, P. (1980). What can probe RT tell us about the attentional demands of movement? In G. E. Stelmach and J. Requin (Eds), Tutorials in Motor Behavior (pp. 579-589). Amsterdam: North-Holland.
McLeod, P. and Mierop, J. (1979). How to reduce manual response interference in the multiple task environment. Ergonomics, 22, 469-475.
Mohnkopf, W. (1933). Zur Automatisierung willkürlicher Bewegungen (zugleich ein Beitrag zur Lehre von der Enge des Bewußtseins). Zeitschrift für Psychologie, 130, 235-299.
Moray, N. (1967). Where is capacity limited? A survey and a model. Acta Psychologica, 27, 84-92.
Moray, N. (1986). Monitoring behavior and supervisory control. In K. R. Boff, L. Kaufman and J. P. Thomas (Eds), Handbook of Perception and Human Performance. Vol. II: Cognitive Processes and Performance (pp. 40.1-40.51). New York: Wiley.
Navon, D. (1984). Resources - a theoretical soup stone? Psychological Review, 91, 216-234.
Navon, D. (1985). Attention division or attention sharing? In M. I. Posner and O. M. Marin (Eds), Attention and Performance XI (pp. 133-146). Hillsdale, NJ: Erlbaum.
Navon, D. and Gopher, D. (1979). On the economy of the human processing system. Psychological Review, 86, 214-255.
Navon, D. and Gopher, D. (1980). Task difficulty, resources and dual-task performance. In R. S. Nickerson (Ed.), Attention and Performance VIII (pp. 297-315). Hillsdale, NJ: Erlbaum.
Navon, D. and Miller, J. (1987). Role of outcome conflict in dual-task interference. Journal of Experimental Psychology: Human Perception and Performance, 13, 435-448.
Neumann, O. (1980). Informationsselektion und Handlungssteuerung. Bochum, unpublished dissertation.
Neumann, O. (1984). Automatic processing: A review of recent findings and a plea for an old theory. In W. Prinz and A. F. Sanders (Eds), Cognition and Motor Processes (pp. 255-293). Berlin: Springer.
Neumann, O. (1985). Die Hypothese begrenzter Kapazität und die Funktionen der Aufmerksamkeit. In O. Neumann (Ed.), Perspektiven der Kognitionspsychologie (pp. 185-229). Berlin: Springer.
Neumann, O. (1987). Beyond capacity: A functional view of attention. In H. Heuer and A. F. Sanders (Eds), Perspectives on Perception and Action (pp. 361-394). Hillsdale, NJ: Erlbaum.
Neumann, O. (1990). Visual attention and action. In O. Neumann and W. Prinz (Eds), Relationships between Perception and Action. Current Approaches (pp. 227-267). Berlin: Springer.
Noble, A., Trumbo, D. A. and Fowler, F. (1967). Further evidence on secondary task interference in tracking. Journal of Experimental Psychology, 73, 146-149.
Norman, D. A. and Bobrow, D. G. (1975). On data-limited and resource-limited processes. Cognitive Psychology, 7, 44-64.
Pew, R. W. (1974). Levels of analysis in motor control. Brain Research, 71, 393-400.
Polson, M. C., Wickens, C. D., Klapp, S. T. and Colle, H. A. (1989). Human interactive informational processes. In P. A. Hancock and M. H. Chignell (Eds), Intelligent Interfaces: Theory, Research and Design (pp. 129-164). Amsterdam: North-Holland.
Pomerantz, J. R. and Schwaitzberg, S. D. (1975). Grouping by proximity: Selective attention measures. Perception and Psychophysics, 18, 355-361.
Posner, M. I. and Boies, S. J. (1971). Components of attention. Psychological Review, 78, 391-408.
Poulton, E. C. (1966). Tracking behavior. In E. A. Bilodeau (Ed.), Acquisition of Skill (pp. 361-410). New York: Academic Press.
Preilowski, B. (1972). Possible contribution of the anterior forebrain commissures to bilateral motor coordination. Neuropsychologia, 10, 267-277.
Sanders, A. F. (1980). Stage analysis of reaction processes. In G. E. Stelmach and J. Requin (Eds), Tutorials in Motor Behavior (pp. 331-354). Amsterdam: North-Holland.
Schmidt, K. H., Kleinbeck, U. and Brockmann, W. (1984). Motivational control of motor performance by goal-setting in a dual-task situation. Psychological Research, 46, 129-141.
Schmidt, R. A., Zelaznik, H. N., Hawkins, B., Frank, J. S. and Quinn, J. T. (1979). Motor-output variability: A theory for the accuracy of rapid motor acts. Psychological Review, 86, 415-451.
Schweickert, R. and Boggs, G. J. (1984). Models of central capacity and concurrency. Journal of Mathematical Psychology, 28, 223-281.
Senders, J. W. (1966). A re-analysis of the pilot eye-movement data. IEEE Transactions on Human Factors in Electronics, HFE-7, 103-106.
Senders, J. W. (1983). Visual Scanning Processes. Tilburg: University of Tilburg Press.
Senders, J. W., Elkind, J. E., Grignetti, M. C. and Smallwood, R. P. (1964). An Investigation of the Visual Sampling Behavior of Human Observers. Cambridge, MA: Bolt, Beranek and Newman.
Shaffer, L. H. (1975). Multiple attention in continuous verbal tasks. In P. M. A. Rabbitt and S. Dornic (Eds), Attention and Performance V (pp. 157-167). London: Academic Press.
Shallice, T. (1972). Dual functions of consciousness. Psychological Review, 79, 303-393.
Shallice, T. (1978). The dominant action-system: An information-processing approach to consciousness. In K. S. Pope and J. L. Singer (Eds), The Stream of Consciousness. Psychological Investigations into the Flow of Private Experience. New York: Plenum.
Sheridan, T. B. and Ferrell, W. R. (1974). Man-Machine Systems. Information, Control and Decision Models of Human Performance. Cambridge, MA: MIT Press.
Smith, M. C. (1969). The effect of varying information on the psychological refractory period. Acta Psychologica, 30, 220-231.
Spearman, C. (1927). The Abilities of Man. New York: Macmillan.
Springer, S. P. and Deutsch, G. (1981). Left Brain, Right Brain. San Francisco, CA: Freeman.
Summers, J. J. (1990). Temporal constraints on concurrent task performance. In G. R. Hammond (Ed.), Cerebral Control of Speech and Limb Movements. Amsterdam: North-Holland.
Telford, C. W. (1931). The refractory phase of voluntary and associative responses. Journal of Experimental Psychology, 14, 1-36.
Tolkmitt, F. J. (1973). A revision of the psychological refractory period. Acta Psychologica, 37, 139-154.
Trumbo, D. and Milone, F. (1971). Primary task performance as a function of encoding, retention and recall in a secondary task. Journal of Experimental Psychology, 91, 273-279.
Trumbo, D., Noble, M. and Quigley, J. (1968). Sequential probabilities and the performance of serial tasks. Journal of Experimental Psychology, 76, 364-372.
Vince, M. A. (1948). The intermittency of control movements and the psychological refractory period. British Journal of Psychology, 38, 149-157.
Vince, M. A. (1949). Rapid response sequences and the psychological refractory period. British Journal of Psychology, 40, 23-40.
Vince, M. A. and Welford, A. T. (1967). Time taken to change the speed of a response. Nature, 213, 532-533.
Vorberg, D. (1985). Unerwartete Folgen von zufälliger Variabilität: Wettlauf-Modelle für den Stroop-Versuch. Zeitschrift für experimentelle und angewandte Psychologie, 32, 494-521.
Welford, A. T. (1952). The 'psychological refractory period' and the timing of high-speed performance - A review and a theory. British Journal of Psychology, 43, 2-19.
Welford, A. T. (1980). The single-channel hypothesis. In A. T. Welford (Ed.), Reaction Times. London: Academic Press.
Wickens, C. D. (1980). The structure of attentional resources. In R. S. Nickerson (Ed.), Attention and Performance VIII (pp. 239-257). Hillsdale, NJ: Erlbaum.
Wickens, C. D. (1984). Engineering Psychology and Human Performance. Columbus, OH: C. E. Merrill.
Wickens, C. D. (1989). Attention and skilled performance. In D. H. Holding (Ed.), Human Skills, 2nd edn (pp. 71-105). Chichester: Wiley.
Wickens, C. D. and Sandry, D. L. (1982). Task-hemispheric integrity in dual-task performance. Acta Psychologica, 52, 227-248.
Woollacott, M. and Jensen, J. (1996). Locomotion and stance. In H. Heuer and S. W. Keele (Eds), Handbook of Perception and Action. Vol. 2: Motor Skills (pp. 333-403). London: Academic Press.
Yamanishi, J., Kawato, M. and Suzuki, R. (1980). Two coupled oscillators as a model of the coordinated finger tapping by both hands. Biological Cybernetics, 37, 219-225.
Yerkes, R. M. and Dodson, J. D. (1908). The relation of strength of stimulus to rapidity of habit-formation. Journal of Comparative Neurology and Psychology, 18, 459-482.
Chapter 5 Involuntary Attention
M. Eimer, D. Nattkemper, E. Schröger and W. Prinz
Department of Psychology, University of Munich and Max-Planck-Institute for Psychological Research, Munich, Germany
The concept of attention refers to at least three major functions. The first is the function of mobilizing aspecific mental energy to enable certain information processing activities (Berlyne, 1974). The second is concerned with the integration of visual features into a localized and identifiable object (Treisman and Gelade, 1980). The third is the selection of information. Selection is said to occur when the cognitive system chooses a small portion of the currently present information for further processing while at the same time excluding the remaining information from further consideration. The act of selecting information is equivalent to directing attention to the events or stimuli containing the information.
Directing attention, or attending, can come about in various ways. First, it can be intentional. This occurs, for instance, when a person searches for a certain object, with the effect that all objects similar to the target receive more attention. Whenever attending is intentional, it is said to be voluntary. In contrast, attending is involuntary when it does not stem from intentions but is elicited by outside events. A simple example is a loud bang, which elicits attending in the absence of any intention to take notice of loud noises. While involuntary attention is elicited bottom-up, voluntary attention is top-down in the sense that attention is directed to outside events by inner intentions. This chapter is concerned with research on involuntary attention and in particular with the conditions in which it occurs.
The distinction between voluntary and involuntary attention is already common in the early literature (Dürr, 1907; Ebbinghaus, 1911; Elsenhans, 1912; James, 1890, p. 304; Kreibig, 1897).¹ More recent dichotomies, such as the distinction between automatic and controlled processing, or between exogenous and endogenous control of attention (Neumann, 1984; Öhman, 1992; Shiffrin and Schneider, 1977; Theeuwes, 1991), are related to the distinction between involuntary and voluntary attention, but should still be distinguished from it for two major reasons. First, involuntary attending refers only to the selection of information, while automaticity is concerned, in addition, with the subsequent processing of information.² Second, as will be shown, involuntary attending is not merely a matter of the occurrence of outside events, but is also tied to certain internal conditions. Although the elicitation of involuntary attention is 'exogenous', its realization requires at least an interaction between endogenous and exogenous factors. This is easily overlooked if involuntary attention is said to be externally controlled. In the next section, a brief taxonomy will be developed of the conditions in which attending is involuntarily elicited. This will be done on the basis of observations from everyday life. It will be shown that this attempt towards a systematic classification of the phenomena requires theoretical considerations about the functional basis of the processes involved.
¹ As an exception, Wundt (1903, vol. III, p. 303) rejected the notion of involuntary attention, since he considered all attentional phenomena as an expression of the will.
Handbook of Perception and Action, Volume 3. ISBN 0-12-516163-8. Copyright © 1996 Academic Press Ltd. All rights of reproduction in any form reserved.
1 A TAXONOMY OF THE PHENOMENA
What external events might elicit involuntary attending? It seems useful in this regard to distinguish between specific and aspecific involuntary selection (Prinz, 1983a). There is specific selection if features of an event or situation attract involuntary attention. Pictures of a pinup girl, or of a handsome man, are usually apt to attract the gaze of male, or female, observers. The same can be said of a crying baby with respect to its mother, or of a nicely displayed meal with respect to a hungry person. In such cases, attending results from specific events, which fit latent wishes, desires or interests of the observer. The structure of the antecedents of this type of selection may well be similar to that of voluntary selection in that attending is the consequence of a specific mental set. The critical difference from voluntary attention, which leads to the qualification 'involuntary' in the above-described examples, is the absence of an explicit intention to attend. Instead there is merely a latent disposition. Since in the case of specific selection the boundary between an explicit intention and a latent disposition is unclear, there is a smooth transition between voluntary and involuntary attention. This is one of the main reasons to focus the argument on processes relating to aspecific selection, which refers to conditions in which directing attention does not depend on specific but on relational features.
Relational features are characterized by deviations from what is common for a certain situation. A sudden noise in a silent environment attracts attention. Likewise, a single vertical bar, surrounded by many horizontal bars, pops out (Treisman, 1982). A false tone in a piece of music also elicits attending involuntarily. In such cases the specific content of the stimulus event is not responsible for attending: the vertical bar amidst many horizontal ones is not conspicuous because it is vertical but because it deviates from the spatial orientation of the other bars. If these were vertical, then a single horizontal bar would strike the eye. Hence, it is a matter of aspecific selection, which in principle can be elicited by an unlimited variety of completely different stimuli.
² Involuntary selection and unlimited capacity of processing are combined in the concept of 'automatic processing'. But one might well think of a combination of involuntary selection with other kinds of processing (see, for example, the models discussed in section 1). Not least for this reason, it is necessary to distinguish between a selection component and a processing component when describing attentional processes: the process of selection (of specific information for further processing) is not necessarily identical with the act of processing (of this selected information).
In cases of aspecific selection it is a discrepant stimulus that involuntarily attracts attention.³ The sudden noise stands out from the earlier, relatively lower noise level. The pop-out stimulus does not fit the homogeneous pattern of the remaining situational context, and the false tone does not fit the melody of the musical piece. The first and last of these examples are concerned with deviations from a structure developing in time, while the pop-out effect is a deviation from a spatial structure. Spatial deviations are usually dealt with in relation to problems of perceptual structuring and will therefore not be discussed in this chapter. Instead, this chapter will focus on the role of situational deviations occurring over time in eliciting involuntary attending.
Two types of deviations will be distinguished: level shifts and rule deviations. Level shifts occur whenever there is an obtrusive change in the stimulus situation, e.g. the appearance of a new stimulus or a change in the value of a feature of a stimulus that is already present. In contrast, rule deviations refer to changes in a rule-based sequence of events. This is the case, for instance, when a repetitive sequence is broken off. If, say, a dripping tap suddenly stops dripping, the absence of the sound of dripping constitutes a deviation from the formerly established repetitive sequence of events. Involuntary attending can also be elicited if the interrupted regularity relates to a more complex structure of events. This is the case in the example of the false tone. A false tone attracts attention because it represents a deviation from the rules according to which the sequence of tones is structured in that particular piece of music.
How can it be explained that situational deviations elicit attention? It is a prerequisite that the situational deviation is registered, and this in turn requires a comparison between a present and an earlier situational state. There could be different underlying mechanisms, depending on whether there is a level shift or a rule deviation. In the case of a level shift a simple sensor mechanism might be responsible, a mechanism that registers changes in elementary sensory features. In the case of a rule deviation, there must be a more complex mechanism, which enables registration of deviations from a previously existing event structure. Therefore, this mechanism must be capable of maintaining a representation of that prior event structure. Hence, these two mechanisms seem to differ in at least two distinct ways: first, with respect to encoding (elementary sensory features versus sequential structures), and second, with respect to the time window over which information should be integrated so as to notice a deviation. (The time window is clearly longer for rule deviations than for level shifts.)
In the following two sections we consider involuntary attending elicited by situational deviations. Some of the experimental evidence will be discussed with respect to level shifts (following section) and rule deviations. In the final section the distinction between voluntary and involuntary attention will be considered again, but this time with the emphasis on the possibility of distinguishing underlying processing mechanisms rather than categorizing empirical phenomena.
³ The difference between specific and aspecific selection can also be formulated this way. In specific selection (voluntary or involuntary), events are selected that exhibit specific (e.g. intentionally defined) features. In aspecific selection (involuntary), stimuli can be selected that do not show specific (context defined) features.
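The two hypothesized registration mechanisms can be contrasted in a minimal sketch. It is an illustration only, not a model proposed in the chapter: the function names, the threshold and the `context` parameter are assumptions introduced here. The level-shift detector compares each sample with its immediate predecessor only (a short time window), whereas the rule-deviation detector has to store which event followed which context earlier in the sequence (a longer time window).

```python
def detect_level_shifts(levels, threshold):
    """Return indices at which a level differs from the preceding one by more than threshold.

    Increases (onsets) and decreases (offsets) are treated alike.
    """
    return [
        i for i in range(1, len(levels))
        if abs(levels[i] - levels[i - 1]) > threshold
    ]


def detect_rule_deviations(events, context=2):
    """Return indices of events that break a regularity established earlier in the sequence.

    For every run of `context` preceding events, the detector remembers which
    event followed that run the first time; a later mismatch counts as a deviation.
    """
    expectations = {}  # maps a context (tuple of events) to the event that first followed it
    deviations = []
    for i in range(context, len(events)):
        key = tuple(events[i - context:i])
        expected = expectations.setdefault(key, events[i])
        if expected != events[i]:
            deviations.append(i)
    return deviations


# A sudden noise and its later offset are both level shifts.
noise_levels = [0.1, 0.1, 0.1, 0.9, 0.9, 0.2]
shifts = detect_level_shifts(noise_levels, threshold=0.5)

# A 'false tone' (A instead of G) in a repeating C-E-G figure is a rule deviation.
melody = list("CEGCEGCEACEG")
false_tones = detect_rule_deviations(melody, context=2)

# A dripping tap that stops: the silence breaks the established repetition.
tap = ["drip", "drip", "drip", "drip", "silence"]
tap_deviation = detect_rule_deviations(tap, context=1)
```

In this sketch the false tone is flagged although its level does not differ from that of its neighbours, which is the sense in which rule deviations need a stored representation of the prior event structure.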
2 LEVEL SHIFTS AND INVOLUNTARY ATTENTION
In this section experiments are discussed in which the influence of elementary situational changes on directing attention is studied. The main concern will be with sudden changes of single physical parameters, e.g. a change in the distribution of brightness on a cathode ray tube, which is brought about by the sudden appearance of a new stimulus. Level shifts will first be described in the context of the orienting reflex. This will be followed by a review of the effects of level shifts on evoked potentials in which context a model of a mechanism will be presented with regard to attending to auditory level shifts. Finally, reaction time studies will be discussed on attending in the visual domain, aiming at the question of the functional distinction among processes involved in directing attention.
2.1 Orienting Reflex
An orienting reflex (OR) is defined as the complex of physiological and behavioral reactions arising in the case of a change in the stimulus situation (Pribram and McGuinness, 1975).⁴ Among the responses are alpha blocking in the electroencephalogram (latency 150-250 ms), a decrease in skin resistance (latency 2-6 s), vasodilatation of blood vessels in the head, changes in pupil diameter as well as in pulse and breathing frequency and, finally, eye and head movements in the direction of the location of the change. Directing attention to a new or changing stimulus is usually considered to be a main function of the OR (Näätänen, 1992); in addition, the OR can also be viewed as an indicator that a change in situation has been noticed. The conditions under which an OR occurs, habituates and dishabituates are therefore of special interest for the present discussion.
⁴ The connections between the various measures will not be discussed, but see Barry (1984). Likewise, the relations between the OR and evoked potentials are not covered, but see Donchin (1981), Näätänen (1979), Schandry and Hofling (1979), and Rockstroh and Elbert (1990).
Level shifts typically elicit an OR, as in cases in which a tactile, auditory or visual stimulus is presented for the first time. In contrast, regularly repeated or permanently presented stimuli lead to habituation of the OR, which may dishabituate in the case of a subsequent change in the stimulus situation (Sokolov, 1963, p. 39, 1975). It is interesting that an OR is elicited not only by the onset of a stimulus, e.g. a loud bang, but also by its offset, e.g. a sudden darkness (Sokolov, 1963, p. 82, 1975). Any change in intensity, either increase or decrease, suffices to bring about the OR, the strength of which is correlated with the extent of the intensity change (Sokolov, 1963, p. 39). Sokolov (1963, 1975) explained both the habituation and the dishabituation of the OR in terms of the 'neural model of the stimulus'. His basic reasoning is as follows: the properties of repetitive as well as of permanent stimuli are extracted by the organism and stored for a certain lapse of time. The stored neural representation of the stimuli contains simple parameters such as intensity or color, but also more complex aspects such as the sequence of successively presented stimuli and the duration of the inter-stimulus intervals. An OR arises whenever the parameters of an actual stimulus situation do not correspond to the parameters as represented in the neural model. Following Sokolov, the strength of an OR depends, at least within certain limits, on the degree to which a parameter of the actual stimulus deviates from its representation in the neural model, and also on how many parameters change simultaneously.
Alternative attempts towards modeling habituation and dishabituation without invoking the concept of a neural model stem, among others, from Groves and Thompson (1970) and from Horn (1967). Thus, the dual-process theory of Groves and Thompson distinguishes between a habituation system and a state system. The habituation system is responsible for the decrease in response readiness of the effectors in the case of repeated stimulation. In turn, the state system is responsible for an initial increase in response readiness, which is also followed by a decrease in the case of repeated stimulation. Assuming that both systems contribute independently to the degree of response readiness, many phenomena relating to habituation and dishabituation can be reasonably explained. One may argue, though, as did Pribram and McGuinness (1975), that the alternative theories apply only to effects of simple stimulus changes. As pointed out in section 1, an OR can be elicited by either level shifts or rule deviations. To explain the latter type of phenomena it seems more plausible to postulate a storage system, the contents of which can be compared with the newly presented stimulus.
The meaningfulness of a stimulus is often considered as a necessary condition for the OR to occur (Maltzman, 1979). One could, indeed, argue that level shifts elicit an OR owing to their potential meaning for the organism. It is not denied that stimulus changes or newly arriving stimuli are potentially meaningful. Yet, directing attention cannot be ascribed to the meaning of the stimulus, since the stimulus is not selected on the basis of particular features.
Instead, as was already argued, the control of involuntary attention occurs bottom-up, merely on the basis of a change in situation (see first paragraphs and section 1). In support of this view, Gati and Ben-Shakhar (1990) have demonstrated in two studies that an OR can be elicited by mere newness. This could occur, in fact, on a fully abstract level. In their feature-matching model the authors postulated two comparison mechanisms to which newly arriving stimuli are subjected, one mechanism resulting in voluntary and the other in involuntary attending. On the one hand, a newly arriving stimulus is compared with the stimulus representation that is relevant in the context of the experiment, for instance the target stimulus as defined by the experimenter. On the other hand, the new stimulus is compared with the representation of the recently presented stimuli. In both cases incoming stimuli are compared with representations of stimuli. The result of the first comparison determines the relevance of the new stimulus, while the result of the second comparison concerns its newness. The degree of relevance and newness of the newly arriving stimulus shows, respectively, a positive and a negative correlation with the internal model of the situation.⁵ According to Gati and Ben-Shakhar, meaningfulness and newness have additive effects on the OR.
⁵ Similar considerations suggest two different selection mechanisms, one based on a 'match' and the other on a 'mismatch' between a newly arriving stimulus and the internal model of the situation (Öhman, 1979; Näätänen, Gaillard and Mäntysalo, 1980; Prinz, 1983b, 1990a,b).
The effect of newness on the OR, measured by the electrodermal response, was supported by the following experiment. In a variation on the so-called 'guilty knowledge technique' (Lykken, 1959) subjects were told to imagine that they were suspected of murder. They were either shown a picture or given a verbal description of the victim, and were further instructed to act in the experiment as if they were innocent. Before the relevant stimulus (the victim) was presented again, several other persons were introduced in the same way as the victim had been; these control persons had a variable number (zero to three) of characteristics in common with the victim. As an example, a characteristic could be 'the victim is an architect'. The results showed that the OR was stronger in the case of a characteristic that was seldom found for the control persons. In a further similar study the factors 'newness' and 'relevance' were simultaneously manipulated, with the result that both factors had a main effect but did not interact. These results are consistent with the authors' model in which, first, features of the control stimuli are continuously analyzed and integrated into representations, and second, newly arriving stimuli are compared with these just-established representations.
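The logic of two separate comparisons with additive effects can be made concrete in a small sketch. It is a toy illustration under stated assumptions, not the formal feature-matching model of the authors: stimuli are represented as sets of features, and the function names, the overlap measures and the weights are hypothetical choices made here.

```python
def relevance(stimulus, target):
    """Proportion of the stimulus features that match the task-relevant target."""
    return len(stimulus & target) / len(stimulus)


def novelty(stimulus, recent):
    """One minus the mean proportion of recent stimuli that contained each feature."""
    if not recent:
        return 1.0  # nothing has been stored yet, so everything is new
    familiarity = 0.0
    for feature in stimulus:
        familiarity += sum(feature in past for past in recent) / len(recent)
    return 1.0 - familiarity / len(stimulus)


def orienting_strength(stimulus, target, recent, w_relevance=1.0, w_novelty=1.0):
    """Additive combination: the two comparisons contribute without interacting."""
    return w_relevance * relevance(stimulus, target) + w_novelty * novelty(stimulus, recent)


# Hypothetical feature sets in the spirit of the 'victim' experiment.
victim = {"architect", "tall", "bearded"}
controls = [{"teacher", "tall", "bearded"}, {"teacher", "short", "bearded"}]

common_probe = {"teacher", "tall", "bearded"}    # features often seen among controls
rare_probe = {"architect", "tall", "bearded"}    # contains a feature never seen among controls

strength_common = orienting_strength(common_probe, victim, controls)
strength_rare = orienting_strength(rare_probe, victim, controls)
```

Under these assumptions the probe containing the rarely encountered characteristic yields the larger value, and because the two terms are simply summed, changing one weight shifts the result by the same amount at every level of the other factor, which corresponds to two main effects without an interaction.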
2.2 Evoked Potentials
An evoked potential (EP) is an event-related sequence of changes in electrical brain potentials. The changes are relatively small in comparison with spontaneous background activity, but can be isolated by way of adequate averaging techniques. An EP consists of several relative minima and maxima, which are indicated both by their direction (negative, positive) and by their number (e.g. N2 for the second negative wave following the stimulus) or by their latency (e.g. P300 for a positive wave that occurs about 300 ms after presentation of the stimulus).⁶,⁷
There is ample evidence that level shifts as well as rule deviations lead to characteristic changes in the EP. Following a sudden stimulus, either visual, auditory or tactile, an N1 followed by a P2 is commonly observed (Picton, 1980; Rösler, 1982, p. 18). This N1-P2 complex is also found when attention is not explicitly directed to the eliciting stimulus, which suggests involuntary attention.⁸ Thus, Näätänen, Gaillard and Mäntysalo (1980) presented frequent standard tones and infrequent deviations via earphones to one ear at a time. The task of the subjects was to count the deviant tones presented to either one of the ears, and to ignore those presented to both ears. The N1 occurred both at standard stimuli and at deviant stimuli irrespective of how they had been presented or, in other words, irrespective of whether or not attention had been directed to the deviant stimuli.
⁶ The various waves are labeled as 'components'; this term is also used to indicate the cortical processes that are assumed to underlie the occurrence of the EP waves (Näätänen and Picton, 1987).
⁷ See Chapter 9 for a discussion of technical details.
⁸ Yet, the amplitudes of the N1 and P2 are usually somewhat smaller than when attention is directed in advance to the eliciting stimulus, which suggests that the N1-P2 complex is also affected by voluntary attention (Hillyard and Hansen, 1986; Hillyard et al., 1973; Picton and Hillyard, 1988).
The neural processes involved in the generation of the N1 wave have been related to processes of directing attention (Näätänen, 1988; Verbaten et al., 1986). Various generator processes may contribute to the N1 as elicited by auditory stimuli (Näätänen, 1988, 1990; Näätänen and Picton, 1987; Scherg, Vajsar and Picton, 1989). Näätänen (1988, 1990) has argued that processes in the supratemporal auditory cortices are in particular responsible for eliciting attention in the case of auditory level shifts.⁹ The basic rationale of Näätänen's hypothesis is that certain stimulus properties are extracted on a subcortical level. Level shifts activate subcortical systems for detecting transients, which, in turn, send interrupt signals to central processing mechanisms. This has the effect that the processes involved in generating the N1 are activated. If a certain threshold is exceeded, attention is directed to the sensory processes at work and to the results of those earlier sensory processes that are still available in the iconic storage system.
According to Näätänen (1988, 1990), a variety of results supports the notion that attending is elicited by the N1 generators. The N1 is correlated with the presentation of a stimulus; its amplitude depends on the detection threshold of auditory stimuli. Again, the N1 amplitude depends on the presence rather than on the content of the stimulus. Thus, a dissociation is commonly observed between the amplitude of the N1 and the loudness of the stimulus. In addition, the N1 is quite similar for different stimuli, such as clicks, tones and speech-like noises. Attending as well as N1 amplitude are reduced during sleep, and more so as sleep is at a deeper stage. These results are consistent not only with the hypothesis that attending and N1 are related, but also that N1 is primarily related to stimulus detection. Results arguing against these assumptions include the association between N1 amplitude and subjective distractibility, and the dissociation between the N1 amplitude and the detection of suprathreshold signals: the N1 has a smaller amplitude as the inter-stimulus interval between stimuli is smaller (Davis et al., 1966). Again, the N1 is larger at the first than at later stimuli of a sequence (Verbaten et al., 1986; Woods and Elmasian, 1986). The subjective phenomenal appeal of the stimuli decreases in both cases, while the detectability of the stimuli remains the same.
A combination of N1 and attending appears not only after changes in stimulus energy (increases or decreases in the case of continuous signals) but also after changes in qualitative aspects of the stimulus, such as frequency. In a study by Speer, Zimmer and Odenthal (1969), a frequency change of a 1000 Hz tone led to differences between the N1-P2 amplitude in the order of 6-12 µV, the difference increasing with the amount of modulation. In the same vein, the N1 latency decreased as the frequency changes were larger. This mechanism of eliciting attention may be characterized as a detector of level shifts, either quantitative or qualitative, which simply reacts to an incoming stimulus. As long as the signal, which is triggered by the stimulus, is sufficiently strong, the earlier extracted and stored physical properties of the stimulus come into the focus of attention (Näätänen, 1988, p. 133). At that point, it is decided whether or not the stimulus will be subjected to further processing.
⁹ It is still an open question whether precisely this component is responsible for directing attention. According to Verbaten (1990), eliciting attention is more a matter of an aspecific component, which, in contrast to the modality-specific supratemporal component, also occurs in the case of visual stimuli, and which is correlated with the occurrence of an OR.
The same mechanism of attending to auditory stimuli may actually apply to visual and tactile stimuli. It is mentioned in passing that, when processing visual stimuli, differential effects have been observed on the EP, depending on whether there had been a peripheral visual onset, either as a cue or as distractor, in advance of the target stimulus (Hillyard, Münte and Neville, 1985; Luck et al., 1990). The next section is concerned with behavioral studies on effects of level shifts on reaction time in the visual domain. However, to our knowledge, there is no systematic research on the relation between level shifts in the visual modality on the one hand, and N1 generators and eliciting attention on the other hand.
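The statement at the start of this section that an EP is small relative to background activity but can be isolated by averaging may be illustrated with a minimal simulation. This is a sketch under simplifying assumptions: the waveform values, the noise level and the function names are hypothetical, and the background activity is modeled as independent Gaussian noise, which real electroencephalographic activity is not.

```python
import random


def average_epochs(epochs):
    """Average stimulus-locked epochs sample by sample."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]


def simulate_epochs(template, n_trials, noise_sd, seed=0):
    """Create epochs as a fixed evoked waveform plus independent Gaussian noise."""
    rng = random.Random(seed)
    return [
        [value + rng.gauss(0.0, noise_sd) for value in template]
        for _ in range(n_trials)
    ]


def max_error(waveform, template):
    """Largest absolute difference between a waveform and the true template."""
    return max(abs(w - t) for w, t in zip(waveform, template))


# Hypothetical evoked waveform: a negative deflection followed by a positive one.
template = [0.0, -1.0, -4.0, -1.0, 2.0, 3.0, 1.0, 0.0]

# Background noise that is several times larger than the evoked response.
epochs = simulate_epochs(template, n_trials=400, noise_sd=10.0)

single_trial_error = max_error(epochs[0], template)
averaged_error = max_error(average_epochs(epochs), template)
```

Because the evoked waveform is time-locked to the stimulus while the simulated noise is not, the noise in the average shrinks roughly with the square root of the number of trials, so the averaged waveform lies much closer to the template than any single epoch does.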
2.3 Reaction Time Studies
In recent years there has been much research on the relation between level shifts and attending in reaction time experiments, and in particular on processes involved in covert orienting (Posner, 1980). These processes cannot be directly observed, but are restricted to internal events. Covert orienting can occur voluntarily as well as involuntarily. The following discussion focuses on conditions in which covert attending is involuntarily elicited, and will be limited to visual situations only.¹⁰ The existence of phenomena of covert orienting presupposes that subjects are capable of dissociating the direction of attention from the direction of eye-fixation. In the experiments there are various techniques persuading subjects to concentrate on a particular region of the visual field, while at the same time maintaining their original eye-fixation. By measuring eye movements, through electro-oculography or otherwise, it is checked whether subjects really obey the instruction to keep their eyes fixated, and only those trials are subjected to further analysis. An example of a study on covert orienting stems from Posner, Nissen and Ogden (1978). At their eye-fixation point, subjects view an arrow that points either to the left or to the right, indicating the direction to which attention should be oriented. Following a variable interval a target stimulus is presented, either to the left or to the right of the fixation point, requiring a simple key-pressing response. In 80% of the trials, the target corresponds to the direction of the arrow. In comparison with the control condition, in which there is no advance information about the location of the target, the results of the experimental condition show significantly faster reaction times when the location of the target corresponds to the direction of the arrow, while reaction times increase if arrow and target do not correspond.
Posner and colleagues interpreted these results as evidence for the hypothesis that the central cue had elicited covert orienting. Subjects start attending to the cued direction prior to the presentation of the target, which has the effect that processing is facilitated if the target occurs at the indicated spot. The opposite effect occurs when cue and target do not correspond. Yet, involuntary attention seems not to be involved in this study, since it was the subjects' task to decode a symbolic stimulus (the centrally presented arrow), and to direct attention accordingly. Hence, the direct reason for attending concerns the subject's intentions. In contrast, involuntary covert orienting should be elicited irrespective of, and even contrary to, the intentions of a subject.
¹⁰ See Bernstein, Clark and Edelstein (1969a) and Bernstein and Edelstein (1971) with respect to the effect of auditory stimulus onsets on the direction of visual attention.
To evoke involuntary processes, Jonides (1981) investigated a variation on the earlier-described Posner paradigm. During a trial eight letters were presented on an imaginary circle, the midpoint of which was fixated by the subject. The task was to detect a target letter among the eight letters. Briefly (50-100 ms) before the letters, subjects saw a cue stimulus. This could be an arrow presented in the middle of the display, indicating the direction of the target with a certain predefined probability. Alternatively the cue could be presented in the periphery, pointing towards the position of the target with the same probability. Jonides (1981, experiment 2) found that, in the case of the peripheral cue, directing attention was hard to suppress voluntarily. In contrast to the case of the central cue, subjects were incapable of ignoring the peripheral cue even when they were explicitly instructed to do so, and when the validity of the peripheral cue was only 12.5% and, hence, at chance level. The few targets that were present at the indicated location were processed faster, while processing at all other target positions was slower. Jonides concluded from these results that peripheral cues attract attention involuntarily, while the attentional response to central cues is actually voluntary and dependent on instruction.¹¹
Which properties of the peripheral cue are responsible for involuntary attending? Yantis and Jonides (1984) suspected that a peripheral cue elicits attention in the case of an abrupt onset, and they proposed a mechanism that could be responsible for eliciting attention to abruptly appearing stimuli.¹² Yantis and Jonides tested their proposition in an experiment in which the effect on attending of an abrupt stimulus onset was compared with that of a gradually appearing stimulus. Subjects were asked again to react to a target letter amidst a configuration of other letters, which were all presented on a screen. Besides the target there could be one or three context letters. One of the presented letters appeared abruptly, while the remaining one(s) arose gradually. For the gradually arising letters, Yantis and Jonides applied a technique that was first used by Todd and van Gelder (1979).
A rectangular digit '8' was presented, separate segments of which were gradually extinguished within 80 ms. In this way the letters E, H, P, S and U (so-called offset stimuli) could be established, which were then used as either target or context letters. If the target letter was presented abruptly, reaction time was largely independent of the number of gradually arising context letters. However, if the target was gradually established, reaction time increased as a function of the total number of presented letters. Yantis and Jonides interpreted the effect by the hypothesis that an abruptly presented stimulus attracts attention involuntarily and, therefore, is processed with priority. Given a serial self-terminating search and a gradually presented target, subjects would first process the abruptly presented context letter, followed by a serial search until detecting the target. This would, indeed, have the effect that reaction time increased with the number of context letters. This did not occur if the target letter was the abrupt stimulus, since in that case the target was processed first, whereupon the search could be abandoned. On the basis of these starting points, Yantis and Jonides developed a quantitative model, the predictions of which were in accord with their experimental data.
11 Peripherally elicited attention fulfills Neumann's (1984) second criterion for automatic processing, namely that the processes are not subject to voluntary control. Jonides (1981, experiment 1) also found that attending following a peripheral cue obeys Neumann's first criterion for automaticity. Performance in the case of a peripheral cue appeared less affected by a secondary memory task than in the case of a central cue. As expected from automatic processes, this suggests that attending on the basis of the peripheral cue does not consume capacity and, hence, does not interfere with a secondary task.
12 The authors refer to the functional difference between two types of retinal ganglion cells as indication for a biological basis of such a mechanism (1984, p. 602). X cells are sensitive to continuous stimulation, while Y cells reach a maximum activation when the intensity of a stimulus suddenly changes. Moreover, their receptive fields are larger than those of X cells. In addition Y cells are regularly distributed over the whole retina while X cells are mainly found in the fovea.
In a further study, Jonides and Yantis (1988, experiment 2) varied the number of context letters (2, 4 or 6) and obtained similar results. In addition, they studied to what extent stimulus properties other than abrupt onset might elicit involuntary attending (1988, experiment 1). They did find an increase of reaction time as a function of the number of context letters, if either color or brightness was the distinctive feature between the target letter and the context letters, suggesting that these properties seem incapable of eliciting involuntary attending. There remains, of course, the question to what extent this result depends on the extent to which attention is focused during a trial.
Is abrupt stimulus onset really the only feature that elicits involuntary attending? Krumhansl (1982) found that abrupt changes in the form of a stimulus also facilitate its localization and identification.13 Miller (1989) investigated whether it is the abrupt presentation of a stimulus as such, or rather the more general change in the stimulus situation, that is decisive. In the experiments of Yantis and Jonides, there was clearly a more general change in stimulus situation in the case of abrupt than of gradual onset in which some segments gradually disappeared. Miller attempted to deconfound both factors by applying a variation on the technique of Todd and van Gelder. The originally presented pattern from which the offset stimuli emerged little by little (the '8' in the experiments of Yantis and Jonides) had several additional segments. In this way the total change in the visual situation was at least equal and probably more in the case of gradual than of abrupt presentation. In this condition, there were no indications for involuntary attending to the abrupt stimulus, since reaction time increased with the number of context stimuli.
Miller concluded therefore that not only onset, but also offset (although probably less so), can elicit involuntary covert attending. The decisive factor seems to be the net sum of the total changes in visual stimulation. The more the appearance of a stimulus is accompanied by a general situational change, the more it attracts involuntary attention.
The described experiments all studied elementary situational conditions for involuntary attending, which indeed appears to be elicited by simple level shifts. Elementary changes in situation occur frequently and simultaneously outside the laboratory, and it seems plausible that they do not all lead to involuntary attending. Abrupt onset cannot be a sufficient condition for involuntary attention, therefore. Thus, Yantis and Jonides (1984) proposed that involuntary attending occurs only when there is one single abrupt change.14 What are the more specific conditions under which abrupt stimulus onsets can be operative? Recently this question has been elaborated in two directions. One is concerned with the temporal dimension: is involuntary attending merely a transient phenomenon, which in subsequent processing is replaced or completed by voluntary attention? The other concerns the relation between involuntary attending and the direction of attention prior to the moment that involuntary attending was elicited. What is the effect of an abrupt event when attention has already been directed to a certain area?
13 This result has been interpreted in the framework of the model of Yantis and Jonides (1984).
14 'Apparently only when the visual field contains but one such event (as when a relatively static scene is viewed and a moving object appears in the visual periphery) can attention be engaged in the manner illustrated by these experiments.' (p. 617).
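The serial self-terminating search account discussed above can be made concrete in a small sketch. It is not the quantitative model of Yantis and Jonides (1984), whose details are not given here, but a simplified illustration under stated assumptions: the abruptly presented letter is always inspected first, the remaining letters are inspected in random order, and reaction time grows linearly with the number of inspected letters. The function names and the parameters base_ms and per_item_ms are hypothetical.

```python
from fractions import Fraction


def expected_inspections(n_items, target_is_abrupt):
    """Expected number of letters inspected before the target is found.

    Assumption: exactly one of the n_items letters has an abrupt onset and
    is inspected first; the rest are searched in random order.
    """
    if n_items < 2:
        raise ValueError("need a target and at least one context letter")
    if target_is_abrupt:
        return Fraction(1)  # target is inspected first, search stops
    remaining = n_items - 1  # letters left after the abrupt context letter
    # The target is equally likely to be 1st, 2nd, ... among the remaining letters.
    mean_position = Fraction(sum(range(1, remaining + 1)), remaining)
    return 1 + mean_position


def predicted_rt(n_items, target_is_abrupt, base_ms=400, per_item_ms=40):
    """Linear reaction time prediction; base_ms and per_item_ms are made-up values."""
    return float(base_ms + per_item_ms * expected_inspections(n_items, target_is_abrupt))


for n in (2, 4):
    print(n, predicted_rt(n, True), predicted_rt(n, False))
```

Under these assumptions the predicted reaction time for an abrupt target is flat across display sizes, whereas for a gradual target it rises with the number of letters, which is the qualitative pattern described above.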
The temporal course of involuntary attending as elicited by abrupt onset has been studied by Müller and Rabbitt (1989). In one of their studies (experiment 1), four peripherally localized squares were initially presented in which targets or context letters could be positioned later. Subjects were asked to direct attention to a certain square that was indicated by a cue. In one condition the cue was an arrow presented at the fixation point (central cue). In another condition, the sides of one of the squares lit up for 50 ms (peripheral cue). The variables were the validity of the cue and the temporal interval between presentation of the cue and of the target (between 100 and 725 ms), while the probability of correct localization and identification of the target was measured. Müller and Rabbitt observed that the temporal course of facilitation in the case of a valid cue, or of inhibition in the case of an invalid cue, strongly depended on the type of the cue. With peripheral cues, the maximum of correct responses (with valid cues) or erroneous responses (with invalid cues) occurred at a temporal interval between cue and target of 175 ms. In contrast, valid central cues had their maximal effect only after 400 ms. Müller and Rabbitt suggested, therefore, two independent mechanisms for attending. A fast mechanism, elicited by peripheral cues and reaching its maximal effect after 100-200 ms, causes the above-described early effect with peripheral cues. A second, slower mechanism is elicited by central cues and has its maximal effect only after 275-400 ms.
Does the fast mechanism represent involuntary attending? In a further study, Müller and Rabbitt (1989, experiment 2) investigated to what extent the presence of a peripheral cue affects the reaction to a simultaneous central cue. What happens if both cues deviate with respect to the indicated target location?
The results showed that, if a valid central cue was presented first, followed after a 500 ms interval by (the lighting up of) an invalid square, the proportion of correct reactions was significantly less than in the opposite case of an invalid central and a valid peripheral cue. In the latter case, the target appeared in the square that was lit up but not indicated by the central cue. It should be noted that this pattern of results was found only when the interval between peripheral cue and target was 100 ms at the most. If this interval was longer, say 400-700 ms, the effects changed in that now the proportion of correct responses was larger if the central cue had been valid. Müller and Rabbitt interpreted these data as suggesting that the slower attending as a result of the central cue is interrupted in the case of an abruptly presented peripheral cue. This causes fast attending to its location, which occurs against the subject's intention and despite the process of voluntary attending, as controlled by the central cue. This suggests indeed that the fast process is at the basis of involuntary attending. If there is a longer interval between the cue and the target, the effect of the fast mechanism gets lost. In this case attention can be voluntarily redirected to the position as indicated by the central cue.15
15 In a further experiment, Müller and Rabbitt (1989, experiment 3) investigated whether the occurrence of a light flash (noninformative for the target position) in temporal proximity of a valid peripheral cue leads to a reduced detection accuracy. The most interesting results were found in the condition in which target and context stimuli were presented 100 ms after the light flash. In that case, the proportion of correct detections was smaller as the temporal interval between peripheral cue and flash was longer. If the flash occurred as early as 100 ms after the peripheral cue, i.e. within the temporal span in which the operation of the fast orienting mechanism is presumed, the effect of the flash was smaller than at an interval of 300-500 ms at which involuntary attending has made way for voluntary attending.
The results of experiments by Nakayama and Mackeben (1989) also support the assumption of two different mechanisms for directing attention. Here, the task of the subjects was to detect a square-like target amidst 64 other figures. The target figure differed from the context elements by a specific combination of two features (e.g. the only horizontally located white rectangle). Subjects first received a cue, e.g. a square, marking the future position of the target stimulus. Following a variable temporal interval, the stimulus figures were briefly presented (33-117 ms). The question concerned the effect of the interval between cue and target on its correct identification. Nakayama and Mackeben established that identification gradually improved when the interval between cue and target was prolonged to 200 ms, but clearly declined at longer intervals. This effect was again interpreted as suggesting that the cue elicits fast involuntary attending, which accounts for the improvement in performance, but which reaches its optimal activity between 100 and 200 ms. This fast mechanism can therefore contribute little to directing attention when the target arrives at a longer interval. Like Müller and Rabbitt, Nakayama and Mackeben suspect that at longer intervals only the second, slower and voluntarily controlled mechanism remains operative. Hence single stimuli, appearing in an otherwise stable context, attract involuntary attention with a specific temporal course, irrespective of, or even contrary to, existing intentions.
Is involuntary attention independent of the momentary state of attending in which a subject is actually engaged? This question has been addressed in a recent series of studies, the results of which appear to argue against this assumption. Instead, involuntary attending seems to be elicited only if observers have not yet focused their attention elsewhere.
Yantis and Jonides (1990) investigated to what extent the validity of a central cue might affect the occurrence of involuntary attending. In their first experiment they presented an 80% valid central cue. In one half of the trials, the target letter appeared abruptly at a position on the screen which had been signaled by a dotted pattern, while the context letter was presented gradually by way of the earlier-described technique. This was reversed in the other half of the trials. Yantis and Jonides found that after presentation of a valid central cue the mode of target presentation (abruptly or gradually) did not play a role. Identification was about equally fast in both cases. However, on the 20% of trials with an invalid central cue the advantage of abrupt presentation on reaction time reappeared. Apparently abruptly appearing stimuli do not necessarily elicit involuntary attending.
In order to investigate further the limits of involuntary attending, Yantis and Jonides (1990, experiments 2 and 3) varied the temporal interval between cue and target as well as the cue validity. In experiment 2 a target letter (appearing abruptly in half of the trials) was presented together with three context letters. The central cue was 100% valid and was presented either 200 ms before, 200 ms after or simultaneously with the letters. In the first case, there was again no significant difference between reaction time to abrupt and to gradual target letters. However, in the other conditions the abrupt targets were detected significantly faster than the gradual ones. This was interpreted by Yantis and Jonides as follows. If a valid central cue can be used for voluntary attending, i.e. the condition in which the cue preceded the target, then there is no effect of abruptly appearing context letters on the speed of detecting a gradually emerging target letter.
The sudden appearance of context letters does not elicit involuntary attending when attention is already focused; it only does as long as attention is not yet focused on a prospective target.
To what extent was suppression of involuntary attention due to the fact that the central cue was fully valid? This question was addressed in experiment 3 of the Yantis and Jonides (1990) study, in which cue validity was varied between 25% and 100%. The results showed that, at a low cue validity (25%), abrupt targets were actually detected faster than gradually emerging targets.16 Thus, the general tenor of these results is that involuntary attending caused by sudden changes in the situation is not a matter of mere external control, independent of internal factors. Focused attention directed by a valid cue eliminated the effectiveness of abrupt stimuli. Abrupt stimuli appear to exert an effect only when attention is not yet focused, as in the conditions in which the central cue appeared together with or after the stimulus pattern, and in which the validity of the cue was at chance level.
The question whether a sudden event may elicit involuntary attending if attention has already been focused on a different spatial location was also studied by Theeuwes (1991). He varied the temporal interval between a 100% valid central cue and the presentation of target and context letters (-600 ms, -300 ms, +200 ms). In contrast to Yantis and Jonides, Theeuwes presented all letters gradually but, in addition, a bar abruptly appeared beside one of the letters between 160 ms before and 80 ms after presentation of the letters. If there was a sufficiently large interval between the central cue and the presentation of the letters (600 or 300 ms), then there was no inhibitory effect of the abrupt appearance of the peripheral bar at a non-target location on reaction time to the target. Hence, the abruptly appearing event did not cause interfering involuntary attending because attention was already directed to another location on the basis of the valid central cue.
However, Theeuwes did find that a peripheral bar beside the target letter had the effect of slowing down reaction time.17 If the central cue was not presented until 200 ms after the stimulus letters and, consequently, could not lead to voluntarily controlled attending, then the abrupt appearance of the peripheral bar beside the target letter had the effect of a significant decrease in reaction time. In this case, attention had not yet focused and therefore involuntary attending could be elicited, which is in line with the results of Yantis and Jonides.
16 If the central cue was valid at 75% the reaction time data were less consistent. Yantis and Jonides (1990, p. 130) suggested that the subjects used different strategies for the employment of the central target. But the abrupt onset of a target letter does not lead involuntarily to attentional shifts; the effects of such an onset are more dependent on the strategy that is followed by the subjects. If the central cue is valid at 100%, the onset modus of the targets plays no role, as already mentioned.
17 It should be noted, though, that this occurred only when presentation of the peripheral bar and the letters were separated by an interval of less than 160 ms. Theeuwes suggested that the slowing down might be ascribed to lateral masking of the letter by the bar or to lateral interference.
The question then arises as to whether abrupt appearance and abrupt disappearance of stimuli have the same kind of effect. Theeuwes (1991, experiment 2) investigated this by presenting four peripheral bars at the start of a trial, one of which (either close to the target or a context letter) abruptly disappeared at a certain interval before or after presentation of stimulus letters. No significant effects on reaction time were found in comparison with a control condition without abruptly disappearing peripheral bars. Yet there was one exception to this result: if the central cue was presented 200 ms after the stimulus letters, i.e. attention was unfocused at the time the letters were presented, then reaction time was less in the case of abrupt disappearance of the peripheral bar beside the target letter. Thus Theeuwes and Yantis and Jonides arrived at the same conclusion: a sudden appearance, and less pronouncedly a disappearance, of a stimulus evokes involuntary attending to the corresponding location if attention is not focused. However, if attention is focused because of a valid central cue, the sudden stimuli do not elicit involuntary attending, at least not as long as they are outside the focus of attention.
The results of the discussed experiments are partly contradictory. On the one hand, Müller and Rabbitt (1989) found that voluntary attending can be affected or even interrupted by a peripheral change in situation; on the other hand, Yantis and Jonides (1990) and Theeuwes (1991) arrived at the opposite conclusion: if attention is focused, sudden stimuli outside the focus of attention have no effect. How can these divergent results be reconciled? Yantis and Jonides (1990, p. 123) presume that a decisive difference between their experiments and those of Müller and Rabbitt (1989) relates to the relatively long interval between the presentation of the central cue and the letters in the Müller and Rabbitt experiments. In addition, Müller and Rabbitt's 50% cue validity was not particularly high. According to Yantis and Jonides, both factors might have added to incomplete focusing of attention. The more attention is divided, the larger is the probability that an abrupt event elicits involuntary attending. Plausible as this explanation may be, it still cannot satisfactorily account for the differences between the earlier experiments, suggesting early involuntary attending, and the more recent studies, which cast at least some doubt on this interpretation. The relations between single situational variables and the extent of attentional focusing remain to be investigated.
The temporal interval between central cues, distractors and targets, and the validity of either peripheral or central cues, are certainly among these variables. It should be scrutinized to what extent abrupt presentation of targets, distractors and cues leads to different consequences for involuntary attending. The dependence of involuntary attending on current orienting of attention should also be studied in greater detail.
On a still more basic level, the question should also be raised to what extent the described phenomena of voluntary and involuntary attention are indeed based upon separate mechanisms. An alternative view could be to conceptualize visual attending as a matter of a single mechanism, in which case the difference between voluntary and involuntary attending is not related to different mechanisms but solely a matter of the conditions of eliciting either one. Studies by Warner, Juola and Koshino (1990) have shown, contrary to, say, Müller and Rabbitt (1989), no systematic differences, either in temporal course or in cost-benefit pattern, between involuntarily and voluntarily elicited attending. They might be considered, therefore, as support for the existence of a single mechanism (see also Cheal and Lyon, 1991).
3
RULE DEVIATIONS AND INVOLUNTARY ATTENDING
In this section the discussion will center around the question to what extent rule deviations may elicit involuntary attending. First, it will be shown on the basis of studies about the orienting reflex that, indeed, deviations from a sequence of regular events may lead to attending. The data favor a mechanism that models rules occurring in sequences of stimuli, which registers deviations and which attends if a deviation is noticed. However, the problem with experiments in the tradition of the orienting reflex is that they tell little about the nature of the mechanisms involved. Differential effects of rule deviations on evoked potentials, as discussed below, appear to support more detailed hypotheses about involuntary attending. The final section deals with behavioral data from studies on visual search, which show that relatively complex rules can also be adequately modeled, and that violating these rules may lead to involuntary attending.
3.1
Orienting Reflex
An orienting reflex (OR) can be elicited by level shifts as well as by rule deviations. For example, there is a quite simple rule deviation, when a fixed order in which a combination of two stimuli always occurs is reversed. The rule deviation could also consist of omitting one of the stimuli of the combination. Thus, an initial OR habituates in the case of a recurring combination of a soft and a loud noise, but it dishabituates when the loud noise is presented without the soft noise (Sokolov, 1975). There is also an OR when the stimuli belong to different sensory modalities, as in the case of a tone and a light (Badia and Delfran, 1970; Siddle and Packer, 1987). In one of the conditions of the Badia and Delfran study, 15 tones were presented, each time followed by a light. The 16th presentation had only the tone, which led to an OR. In another condition there was only the light at the 16th presentation which again elicited an OR. Other simple rule deviations, which evoke an OR, are changes in stimulus duration or in inter-stimulus interval in the case of a sequence of repetitive stimuli (Sokolov, 1975). Thus, the results imply a mechanism that can detect deviations in the context of regularly occurring events. This mechanism should be capable of anticipating events on the basis of earlier processed information, and registering deviations due to a model of the sequential dependencies of the various events. In this way the mechanism can distinguish between appropriate and inappropriate events in sequences of stimuli. In the above-mentioned studies the rule deviations were simple changes in a repetitive sequence. This raises the further question as to whether deviations from more complex regularities will also be recognized. 
A more complex sequence might be, for instance, A-B-C-D-E-F-G-H-I-J-K-L-M-N-B, in which the letter 'B' appears twice, first at an appropriate and then at an inappropriate position, at least according to the alphabetical order, which would require an 'O'. Will such more complex rule interruptions be detected, and will they elicit an OR? In a study by Unger (1964), this has indeed been observed. Subjects received a sequence of numerically ascending numbers until the OR (operationalized in terms of vasoconstriction of the blood vessels of the finger) had habituated to the presentation.18 Following successful habituation, a number that did not fit in the sequence, e.g. 15 after 16, elicited a clear OR.
18 Subjects who did not habituate were excluded from the experiment.
A further experimental result in favor of a mechanism for modeling sequential dependencies was described by Zimny, Pawlick and Saur (1969). They investigated the effect of a test stimulus, i.e. the number 600, as a function of the structure of standard stimuli, i.e. the numbers 21 to 60. Subjects received once a numerically ascending sequence of stimuli (21, 22, 23, 24, ..., 59, 60). The structure of the sequence was violated, however, after the standards 28, 34, 44, 53 and 58: the test stimulus 600 was presented, whereupon the sequence continued in the normal way. The first and second presentation of the 600, after either 28 or 34, did not elicit an OR. After the third presentation of the test stimulus (following 44) the electrodermal activity, as an indicator of OR, increased. The last two presentations of the 600 did not elicit an OR. The authors suspect that a sufficiently specific neural model had been established after 24 presentations of the standard, i.e. after the number 44, so as to bring about dishabituation of the OR. At still later presentations the test stimulus was included in the neural model with the result that the OR was no longer evoked.
It should be noted that in the experiments of Unger and of Zimny et al., the OR was not elicited by a novel stimulus but by a stimulus that had already been presented during the sequence. Instead, the standard stimuli were 'new' in a formal sense. Thus, the decisive condition for the OR to occur was not the novelty of the stimulus, but the fact that a rule was violated. Hence, the data favor a mechanism that recognizes sequential dependencies in stimulus materials, anticipates future situations, registers discrepant events and, finally, elicits attending.
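A mechanism of this kind can be illustrated by a minimal sketch that learns a successor rule from a sequence, anticipates the next item and registers discrepant events. It is a toy illustration, not a model proposed by Sokolov, Unger or Zimny et al.; the function name, the restriction to constant-step sequences and the parameter min_confirmations (the number of confirmations needed before a rule counts as established) are assumptions made for the example. The sketch deliberately ignores habituation to a repeated deviant.

```python
def detect_deviations(sequence, min_confirmations=3):
    """Return the indices of items that violate an established successor rule.

    Assumption: the rule is a constant step between successive regular items.
    """
    deviations = []
    if len(sequence) < 2:
        return deviations
    step = sequence[1] - sequence[0]  # first guess at the rule
    last_regular = sequence[1]        # last item that fitted the rule
    confirmations = 1                 # how often the current rule has held
    for i in range(2, len(sequence)):
        predicted = last_regular + step  # anticipated next event
        item = sequence[i]
        if item == predicted:
            confirmations += 1
            last_regular = item
        elif confirmations >= min_confirmations:
            # An established rule is violated: register the deviation and
            # keep the model unchanged so the sequence can resume.
            deviations.append(i)
        else:
            # Rule not yet established: re-estimate it from the new item.
            step = item - last_regular
            last_regular = item
            confirmations = 1
    return deviations


# Ascending numbers interrupted by the test stimulus 600 after 28.
numbers = list(range(21, 29)) + [600] + list(range(29, 35))
print(detect_deviations(numbers))

# Alphabetical sequence ending in an out-of-place 'B'.
letters = [ord(c) for c in "ABCDEFGHIJKLMNB"]
print(detect_deviations(letters))
```

Note that the flagged item need not be novel: in the letter example the deviant 'B' has already occurred earlier in the sequence, and it is flagged only because it violates the anticipated continuation.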
3.2
Evoked Potentials
Studies on attending through evoked potentials frequently use the so-called 'oddball' paradigm. The subject receives a sequence of stimuli, many of which (the standards) have a high probability of occurrence while other stimuli (deviants) are relatively rare (Fabiani et al., 1987). The passive form of the oddball paradigm is particularly suitable for studying involuntary attending since it diverts attention from the stimuli in which the experimenter is really interested. This is realized by having subjects perform a task that occupies another channel or sensory modality, e.g. when investigating evoked potentials to auditory stimuli. Alternatively they may be instructed to process information presented at the right ear and to ignore information presented at the left ear, while the experimentally interesting sequence of standard and deviant stimuli actually occurs at the left ear.
According to Näätänen, there is, in addition to the earlier-discussed mechanism (section 2), a second mechanism that elicits involuntary attending in the case of auditory stimuli (Näätänen, 1988, 1990; Näätänen, Simpson and Loveless, 1982; Öhman, 1992), in which 'mismatch negativity' (MMN) fulfills a key role. The MMN is part of the N2 component and shows up in the time function of the potential as an additional negativity in the case of the deviant as compared with the standard. The maximum of the MMN is at about 100-200 ms after stimulus onset, and, in contrast to the effect on the N1, MMN is usually elicited in the passive oddball paradigm by deviants only and not by standards. MMN also clearly differentiates better between stimulus repetition and stimulus change than does the N1 (Sams et al., 1985). The amplitude of the MMN is largest in the frontal area and shows a slight dominance in the right hemisphere irrespective of whether the stimuli are presented contralaterally or ipsilaterally.
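The 'additional negativity' for deviants is commonly made visible by averaging the epochs for each stimulus type and subtracting the standard average from the deviant average. The sketch below shows this subtraction; the function names and amplitude values are hypothetical and serve only as illustration, not as data from any of the cited studies.

```python
def average_waveform(epochs):
    """Average equally long epochs sample by sample."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]


def difference_wave(standard_epochs, deviant_epochs):
    """Deviant average minus standard average at each time sample."""
    standard = average_waveform(standard_epochs)
    deviant = average_waveform(deviant_epochs)
    return [d - s for d, s in zip(deviant, standard)]


# Hypothetical amplitudes in microvolts at 0, 100, 200 and 300 ms after onset.
standards = [[0.0, -1.0, -0.5, 0.0], [0.0, -1.0, -0.5, 0.0]]
deviants = [[0.0, -1.5, -2.5, 0.0], [0.0, -1.5, -2.5, 0.0]]

mmn = difference_wave(standards, deviants)
print(mmn)
```

In this made-up example the difference wave is most negative at the third sample, i.e. at 200 ms, within the latency range given above for the MMN maximum.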
Various generator processes appear to be involved in establishing MMN and there seems to be at least one supratemporal generator in each of the auditory cortices (Hari et al., 1984; Scherg et al., 1989) and a further generator at a frontal location (Näätänen and Michie, 1979, p. 115; Giard et al., 1990). A deviant stimulus which elicits MMN may differ from a standard in intensity (Näätänen et al., 1987; 1993), in duration (Näätänen et al., 1989; Paavilainen et al., 1991), in the location of the stimulus source (Paavilainen et al., 1989), in interstimulus interval (ISI) (Ford and Hillyard, 1981; Nordby, Roth and Pfefferbaum, 1988) or in frequency (Nordby et al., 1988; Sams et al., 1985). For example, in the Sams et al. (1985) study the deviation concerned frequency. In 80% of the trials, subjects were given 1000 Hz tones, while in the remaining 20% of the trials the frequency was either 1002, 1004, 1008, 1016 or 1032 Hz (stimulus duration: 50 ms; ISI: 1 s; intensity: 80 dB (SPL)). Both 1016 and 1032 Hz stimuli showed a clear MMN in comparison with the standards.
MMN occurs not only when parameters of simple stimuli change but also with more complex auditory stimuli. Thus, in various experiments by Schröger, Näätänen and Paavilainen (1992) the standard was a combination of successive tones comprising eight components of different frequencies. The deviant differed from the standard only in the sixth component and was identical otherwise. Yet, an MMN appeared after sufficient presentations of the stimuli.
Following Näätänen (1988, 1990), the MMN results may be explained as follows: the physical parameters of the standard stimulus are encoded as neural representation of that stimulus. Each incoming auditory stimulus is compared with this memory representation. If the comparison process results in a mismatch between actual stimulus information and memory representation, MMN occurs. Thus, MMN represents the extent to which the representation of a given stimulus deviates from the neural representation of the standard stimulus.19 According to Näätänen and Michie (1979, p.
115), a deviation first causes activity in the MMN generator in the auditory cortex which, in turn, initiates activity in the frontal generator. Provided a variable threshold value is exceeded, this latter generator might elicit attending to the discrepant stimulus, which results, as indicated by an N2b-P3a complex (Näätänen and Gaillard, 1983), in an OR. The biological function of the mechanism is to store temporarily the repetitive properties of auditory stimuli and to elicit attending in the case of sufficiently large changes.
In a study by Winkler et al. (1990), MMN was also observed in a more ecologically valid situation, which supports the notion of the biological function of the above-mentioned mechanism. The authors showed that a preceding constant repetitive stimulus is no prerequisite for eliciting MMN to a deviant stimulus, but that the standard stimulus might just as well be variable. In their experiment subjects were instructed to read a book and to ignore auditory stimuli. The standard was a 600 Hz tone with an average intensity of 80 dB (SPL). The intensity of the standard was varied in such a way that the intensity difference between two successive standards amounted to 0, 0.2, 0.4, 0.8 or 1.6 dB, while the range of intensity variation within a block was 0, 0.8, 1.6, 3.2 or 6.4 dB. The standard had a probability of occurrence of 0.9. The deviant could differ either in intensity (600 Hz; 70 dB) or in frequency (650 Hz; 80 dB). MMN occurred both in the case of a deviant frequency and in the case of a deviant intensity. Separate analyses for each individual block of trials showed that MMN decreased as the range of variation of the standards increased.
19 The MMN generator process might also be viewed as the neural basis of the mechanism for detecting deviations, as proposed by Sokolov (1975) (see also Sams et al., 1985). There are, however, several differences between MMN and an OR, which show that an OR can be elicited by processes that do not elicit MMN. Thus, an OR, but not MMN, is commonly observed following the first of a sequence of stimuli, and also when the interstimulus interval exceeds 10 s. In addition, it is contested whether MMN also occurs with other than auditory stimuli (e.g. Näätänen, 1990; Ciesielski, 1990; Cammann, 1990).
Again, the conditions leading to MMN in the passive oddball paradigm often, although not always, elicit a positivity after 300 ms (P3) (Ritter, Vaughan and Costa, 1968; Sams et al., 1985; Snyder and Hillyard, 1976; Squires, Squires and Hillyard, 1975; Squires et al., 1977). The subjects in the Ritter et al. study (1968, experiment 5) were reading while a click was presented every 2 s to one ear. On about 4% of the trials the click appeared at the contralateral ear, which elicited a P3 component.20 In one condition of the Snyder and Hillyard experiment, subjects were instructed to read and to ignore the presented clicks, while in another condition deviant clicks were counted. The intensity of the clicks was 65 or 75 dB (SPL). Varied between blocks, 90% of the clicks were either loud or soft. In the 'reading' condition a frontal positive shift was observed with a maximum after 258 ms, i.e. a P3 component. In the counting condition a parietal positive shift was found after 378 ms, i.e. a P3b component. The authors interpreted the results as evidence for the notion that the N2-P3a complex indicates the activity of a detector for background events. Evidence for bimodal interference was obtained in an experiment by Squires et al. (1977), in which combined light-tone stimuli were presented (blue light flash/1000 Hz tone; orange light flash/1100 Hz tone; blue/1100 Hz and orange/1000 Hz). Visual as well as auditory deviants could occur. A deviant stimulus in one modality was independent of a deviant stimulus in the other modality, so that, besides light-tone pairs as defined above, there were light-tone pairs with a deviant in either or both modalities.
In one condition, the subjects' task was to count visual deviants and to ignore auditory stimuli; this instruction was reversed in another condition. In separate blocks of trials, EPs were not measured but, instead, subjects reacted to standard and to deviant signals in one of the two modalities by pressing an appropriate response key, while ignoring the stimuli from the other modality. As expected, the deviants of the attended modality elicited a P3, but in addition the deviants of the ignored modality also elicited a P3. Moreover, reaction times for the frequent (standard) stimuli of the attended modality were faster when the stimulus of the ignored modality was also a standard than when it was a deviant. This interference effect may be explained by assuming that the irrelevant deviant had still attracted attention; in turn the P3 component, connected to the irrelevant deviants, might indicate the extent to which attention was distracted from the primary task. Compatible with this interpretation are the results of the earlier outlined study of Sams et al. (1985, p. 445), in which a P3 component was found only if subjects became aware of a deviant stimulus. P3 effects to deviants, which also suggest involuntary attending, occur in the 'active' oddball paradigm in which subjects react to predefined target stimuli. Here, the typical effect is a pronounced P3 complex in the case of task-relevant targets, in comparison with the case of irrelevant standard stimuli. However, this effect can be readily related to voluntary attending (Picton and Hillyard, 1988) and is, therefore,
20 It is unclear from their data whether there was also an MMN.
irrelevant to the present argument. Yet a P3 component is evoked not only by target stimuli but also by task-irrelevant deviants (novel stimuli, non-targets) which clearly differ from the standard (Kemner and Verbaten, 1990, for the auditory, visual and somatosensory modalities).21 Admittedly, attending to such stimuli is not involuntary, because the instruction implies that each stimulus must be inspected to assess whether it is a relevant target stimulus. Only the reaction to the standard stimuli is involuntary in the sense of bottom-up control. Hence, there is no involuntary attending to the stimulus, but only to certain aspects of the stimulus. It is impossible to ascribe the 'registration' of deviants to intentional selection, because the specific features of the novel stimuli could not be anticipated. Therefore, attending could not be top-down controlled. The effect on the novel stimuli differs from that on the targets, among other things, in that it clearly habituates in the course of an experiment, which does not occur with target stimuli (Courchesne, Courchesne and Hillyard, 1978; Rösler, Hasselmann and Sojka, 1987). This is precisely what should be expected from an OR-based mechanism, because novel stimuli lose their deviant characteristics with an increasing number of repetitions. It would be dysfunctional not to include them in the standard model of the situation. Donchin (1979, 1981) and Donchin and Coles (1988) consider the P3 component in the case of rare events as reflecting elaboration of the context (but see the criticism of Verleger, 1988).
So far the rule deviations that led to the described EP effects all had a physical origin. What is the situation in cases of semantic rule interruptions? Nasman and Rosenfeld (1990) showed that stimuli from a deviant category had a larger P3 amplitude than nondeviant stimuli. Again, this applied not only to targets but also to non-targets.
In five different tasks, stimuli were presented to subjects from a set of nine possible alternatives. Eight stimuli consisted of two-digit numbers, while the ninth was a double letter (e.g. BB). The P3 amplitudes were larger for a non-target of the deviant category than for other non-targets. The categorically deviant non-targets evoked a P3 amplitude almost as large as that evoked by categorically nondeviant targets.
The relative effect of physical and semantic deviants on EP components has been repeatedly studied. In the three experiments by Kutas and Hillyard (1980), physical deviants showed a P3 (with P210, P360 and P560 as local maxima), while semantic deviants showed an N400 component. Seven-word sentences were presented visually, word by word (stimulus duration 1000 ms, ISI 900 ms). A total of 160 different sentences was used, so that no sentence was ever repeated. In the study that aimed at the effect of physical deviants, 25% of the sentences ended with a word in bold type (e.g. 'She put on her high-heeled shoes'). In the two studies in which the effect of semantic deviants was investigated, 25% of the sentences were incongruent in that the last word did not fit the context (e.g. 'He took a sip from the transmitter'), the incongruence being more pronounced in one of the two studies than in the other. The negativity with a maximum at 400 ms was stronger the more pronounced the semantic incongruence. In these studies by Kutas and Hillyard the variations related not only to the dimension 'physical-semantic', but also to the nature of the deviation. In the
21 Effects of N1 and MMN in the 'active' oddball paradigm are not discussed here (but see Chapter 9).
example the physical deviation implied a deviation from a repetitive event in that the typeface of the final word did not correspond to that of the other six. In the case of the semantic deviation, there was a deviation from a more complex regularity of events. The semantic deviation did not simply consist of a new event added to a sequence of repetitive events, since the seventh word was also new in the congruent sentences. The issue was rather that it was a word that did not belong to the set of words that could be expected on the basis of the first six words. Thus, the dimension 'deviation from a repetitive sequence' versus 'deviation from a complex rule' was not varied independently of the dimension 'physical versus semantic deviation'. Hence the two dimensions were confounded and, consequently, the experiments of Kutas and Hillyard do not allow conclusions about the dimension to which the differential effect of the change in situation on the EP should be ascribed.
The two dimensions were not confounded in an experiment by Besson and Macar (1987), the sole aim of which was to study effects of deviations from more complex rules. Stimulus sequences of seven items each (duration 750 ms, ISI 1000 ms) were presented in four different conditions. The stimuli were: (1) six- or seven-word sentences presented visually, word by word; (2) geometric patterns the size of which either increased or decreased in the course of a stimulus sequence; (3) tones with either an increasing or a decreasing frequency; and (4) tunes from well-known music. In each of the conditions, 25% of the stimulus sequences ended with an incongruent stimulus. In condition 1 it was a semantically incongruent word; in condition 2 it was a pattern that violated the implicit rule about the change in size; in condition 3 it was a wrong frequency with respect to the stimulus sequence; and in condition 4 it was a tone that did not belong to the melody.
In conditions 1, 2 and 4 a more pronounced N1 component showed up after incongruent stimuli than after congruent ones. In conditions 2, 3 and 4, incongruent stimuli evoked a P3 complex after 350-450 ms. Finally, in condition 1, i.e. in the case of sentences, incongruent stimuli had the effect that the positivity shifted in the negative direction (maximum at about 350 ms). Hence the P3 complex appeared to be superseded by the N400. It is of course essential that the more complex nonlinguistic deviants (wrong size, wrong frequency, wrong tone) did not evoke an N400. The authors suggest that the first words of each sentence may semantically prime later words, so that preactivated words are processed more easily. An N400 component is evoked if the later word does not belong to the primed semantic category.
In summary, it can be said that various investigators, studying evoked potentials, have developed functional notions about the internal registration of rule interruptions in which attending is elicited by a deviation of an actual event from an internal model.
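The memory-comparison account of MMN described earlier in this section (a stored representation of the standard, a comparison with each incoming stimulus, and an attention switch once a threshold is exceeded) can be made concrete with a small simulation. The sketch below is purely illustrative and is not a model proposed by any of the cited authors. The stimulus values follow the Sams et al. (1985) design as reported above; the exponential update of the trace, the learning rate, the use of the absolute frequency difference as the mismatch measure and the 10 Hz attention threshold are all assumptions introduced for the example.

```python
import random

# Stimulus values as reported for Sams et al. (1985) in the text above.
STANDARD_HZ = 1000.0
DEVIANTS_HZ = (1002.0, 1004.0, 1008.0, 1016.0, 1032.0)


def make_oddball_sequence(n_trials, p_deviant=0.2, seed=0):
    """Return a list of tone frequencies: mostly standards, occasional deviants."""
    rng = random.Random(seed)
    return [
        rng.choice(DEVIANTS_HZ) if rng.random() < p_deviant else STANDARD_HZ
        for _ in range(n_trials)
    ]


class MemoryTrace:
    """Running representation of the standard (update rule is an assumption)."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate  # hypothetical parameter
        self.value = None  # no trace exists before the first stimulus

    def compare_and_update(self, stimulus_hz):
        """Return the mismatch with the stored trace, then refresh the trace."""
        if self.value is None:
            self.value = stimulus_hz
            return 0.0  # nothing to compare the first stimulus against
        mismatch = abs(stimulus_hz - self.value)
        # Move the trace a little towards the stimulus just heard.
        self.value += self.learning_rate * (stimulus_hz - self.value)
        return mismatch


def run_sequence(sequence, attention_threshold_hz=10.0):
    """Return (index, frequency, mismatch, attention_switch) for each stimulus."""
    trace = MemoryTrace()
    events = []
    for index, hz in enumerate(sequence):
        mismatch = trace.compare_and_update(hz)
        events.append((index, hz, mismatch, mismatch > attention_threshold_hz))
    return events


if __name__ == "__main__":
    for event in run_sequence(make_oddball_sequence(20)):
        print(event)
```

In this sketch the mismatch value is graded, whereas the attention switch is all-or-none, which mirrors the distinction drawn above between the mismatch process itself and the variable threshold for eliciting an OR. Because the trace is a running average, it would also tolerate standards that vary slightly from trial to trial; whether it captures the reduction of MMN with more variable standards reported by Winkler et al. (1990) is not something this toy example can decide.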
3.3 Visual Search
Evidence for the effect of deviations from rules also stems from research on continuous selection processes, which, for example, take place in continuous search tasks (see also Chapter 2). In experiments of this type, which have been developed by Neisser (1963, 1967), subjects are instructed to scan, line by line, a list of letters
following each other either randomly or according to certain rules, and to look for one (or several) target letter(s).22 Each list consists of a large number of context symbols (typically 300-1000 letters or digits), but contains only a single target. Scanning ends as soon as the target is detected. Neisser's original description of the performance of the subjects was in terms of search speed, defined by the average search time per line or symbol. More differentiated descriptions are possible by recording eye movements during scanning (Jacobs, 1986; Nattkemper, 1990; Nattkemper and Prinz, 1984; Prinz and Nattkemper, 1986, 1987; Rayner and Fisher, 1987).
A variety of experiments, summarized by Prinz (1977, 1986), have led to the assumption that search activity is largely controlled by representational structures, which involve involuntary attending based on internal models of the environment. Three kinds of experiments are essential with respect to this view. The first is concerned with the phenomenon of detecting pseudo-targets (Prinz, Tweer and Feige, 1974). Subjects stop searching or hesitate when detecting certain symbols which they are not actually looking for. This occurs when, following some practice, lists are presented containing a new item that belongs neither to the target nor to the non-target list used so far. (The logic of these experiments is similar to that of the experiments discussed in the previous section, in that rarely occurring deviants are inserted in a sequence of events.) It is surprising that detecting pseudo-targets does not require extensive experience with the usual context events, but occurs at an early stage of practice (Prinz, 1979, experiment 1). This phenomenon means first of all that, while searching, subjects notice deviations from a regular context, which may potentially lead to an interruption of the search.
This cannot be explained by a theory that assumes that mechanisms of searching and finding are ultimately a matter of voluntary attending (Neisser, 1967). Under this view the instruction to search for a distinct target would have the effect of selectively activating a representation of that target. With this representation 'in mind', the list is scanned until stimulus information is met that corresponds with the memory representation of that target. According to this theory, detection would be the result of voluntary attending, in the sense that a distinct intention, directed to the to-be-detected set of stimuli, constitutes the functional basis for processing stimuli of that specific category with first priority. The very fact of pseudo-target detection means that interruption of search does not necessarily imply a match between stimulus information and an internal representation of the to-be-searched target object.
The phenomenon may be explained by reversing the logic of the Neisser theory: search may not be primarily controlled by representations of the targets but rather by an internal model of potential context events (Prinz, 1979; Prinz and Ataian, 1973). This internal model contains stored representations of context elements, and is established by the first context elements that occur in the list. All further repetitions of the context elements refresh and actualize the stored representations. To explain the genesis and maintenance of the internal model, it suffices to postulate that a memory representation which has been activated by given stimulus information primes itself. The prime decays over time unless it is refreshed by repeated activation. Thus, the activation of a memory representation alters its functional state in a way that facilitates its subsequent activation. Search is continued as long as the stimuli fit the internal model of potential context events. Any stimulus that does not fit the model, either a target or any deviant from the regular context elements, is subjected to a further analysis.
This mode of control becomes plausible when considering the essential aspects of the search task. Targets are rare events, while context elements are continuously and repeatedly met. Under these circumstances the generation of an internal model about potential context symbols is an economical way of controlling search. This is even more valid if the search task is considered in terms of decisions-to-act. While scanning the list, a decision is made at each fixation either to stop (in case of a deviation from the regular context) or to continue search with a saccade to another position of the list (in case of only regular context events). The decision to continue search is the common one and, hence, it is appropriate to specify the criterion for making these decisions in terms of the properties of the more frequent and not of the less frequent alternative (see also Heuer and Prinz, 1987).
22 The mechanisms used for searching and finding in such tasks are largely a matter of voluntary attention. In the present discussion only effects that are unrelated to voluntary attention will be considered.
In two additional types of experiment it was found that registration of deviants is not limited to novel stimuli, such as pseudo-targets, but also occurs in the case of a rule interruption. What happens when there is a deviation from an earlier established rule structure? The first line of evidence emerged from studies in which the complexity of the context was varied within search lists (Nattkemper, Ullmann and Prinz, 1991). The complexity of the context was defined in terms of the number of different context letters in a certain area of horizontally adjacent elements. Complexity was varied across two successive lines within search lists.
The critical issue was whether direct effects of such a variation are observed on scanning behavior, e.g. on saccadic eye movements. The results of these studies show two relevant features. The first fixation after the change in complexity was longer than the corresponding first fixation in lines where complexity had remained unchanged. This effect appeared to be aspecific, since it was observed with both increasing and decreasing complexity. A specific effect was found on the next saccade, the amplitude of which was smaller when context complexity increased and larger when it decreased. Thus, although the first fixation after the change in context complexity did not show a specific effect, the specific effect on the next saccade suggests that the direction of the change did not remain unnoticed. This saccade must be programmed, or adjusted to the new structure of the list, on the basis of information processed during the preceding fixation. Presumably, therefore, the direction of the change is noticed during the fixation and saccade amplitude is immediately adjusted to the new list structure.
Evidence concerning effects of a rule deviation on search is also obtained from search experiments in which the sequence of context elements is not random but has a certain regularity that is suddenly violated. One may, for instance, establish such regularities by introducing a nonrandom algorithm that generates specific letter bigrams or trigrams (Nattkemper and Prinz, 1991). In a critical block of trials, a violation may be realized by combining the same elements in a novel and thus unknown way. These experiments have aspects in common with research on implicit learning of regularities in event sequences (see Reber, 1989, for a review), which has demonstrated that subjects are capable of taking into account structural properties
of stimulus information in shaping their behavior, while being incapable of describing these properties. Consistent results stem from different paradigms. First, it has been shown that serial reproduction as well as prediction of a stimulus sequence governed by an experimental grammar is better than when the same stimulus sequence has a random order (Dulany, Carlson and Dewey, 1984; Reber, 1967; Reber and Millward, 1971). Second, a continually improving detection performance is observed when a certain rule determines the sequence of positions at which targets appear (Lewicki, Czyzewska and Hoffmann, 1987; Lewicki, Hill and Bizot, 1988). In search studies subjects do not usually notice the distinct regularities in the sequence of context elements, nor do they consciously recognize violations of the sequence. Yet the critical session, in which the regularity is violated, revealed a clear increase in regressive eye movements (Nattkemper and Prinz, 1991). Regressive eye movements may be viewed as an indication of difficulties in processing information (Prinz, Nattkemper and Ullmann, 1991). Hence, this result suggests that violations of hitherto existing event regularities lead to specific problems in processing that often require a retesting of previously analyzed areas of the context. The observation that deviations from the rule cause modifications in the pattern of eye movements also suggests that the regularities have become part of the representational structure that controls the search activities.
As a result of this excursion into the area of visual search, it may be concluded that aspecific selection processes, based on internal models, are also operative in continuous tasks. However, it should be noted that, so far, the data from search experiments merely demonstrate that continuous search is mainly controlled by internal models. The mechanisms determining the generation, maintenance and ongoing actualization of the internal models are largely unknown.
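The context-model account of search outlined in this section (representations of context elements that prime themselves, decay unless refreshed, and allow search to continue as long as each stimulus fits the model) can be illustrated with a minimal sketch. Since the text itself stresses that the mechanisms maintaining such models are largely unknown, everything quantitative here is an assumption made for the example: the multiplicative decay factor, the activation floor, the class and function names, and the use of a separate practice phase to establish the model.

```python
class ContextModel:
    """Self-priming representations of context symbols (decay rule assumed)."""

    def __init__(self, decay=0.9, floor=0.05):
        self.decay = decay  # hypothetical decay factor applied per observed symbol
        self.floor = floor  # hypothetical activation needed for a symbol to "fit"
        self.activation = {}

    def fits(self, symbol):
        """A symbol fits the model while its prime has not decayed away."""
        return self.activation.get(symbol, 0.0) > self.floor

    def observe(self, symbol):
        """Let all primes decay one step, then refresh the observed symbol."""
        for known in self.activation:
            self.activation[known] *= self.decay
        self.activation[symbol] = 1.0


def practise(model, symbols):
    """Establish the model from the first context elements encountered."""
    for symbol in symbols:
        model.observe(symbol)


def scan(model, symbols, target):
    """Scan until a symbol fails to fit the context model.

    Returns (position, kind), where kind is "target" or "pseudo-target",
    or None if every symbol fitted the model.
    """
    for position, symbol in enumerate(symbols):
        if model.fits(symbol):
            model.observe(symbol)  # regular context: refresh and move on
            continue
        # Any non-fitting symbol interrupts search for further analysis.
        kind = "target" if symbol == target else "pseudo-target"
        return position, kind
    return None


if __name__ == "__main__":
    model = ContextModel()
    practise(model, "ABCDABCD")
    print(scan(model, "ABDCAXBCDT", target="T"))
```

Note that the scan routine never consults a representation of the target in order to decide whether to stop; the target is checked only after the interruption. This is the reversal of the Neisser logic described above, and it is why a novel non-target symbol interrupts search in the same way as a target does.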
4 CONCLUDING REMARKS
Involuntary attention in this contribution refers to processes of attending that are elicited not by intentions but by certain outside events. The discussion has been limited entirely to aspecific processes of attending, elicited by the deviation of stimulus events from a given context. A distinction has been drawn between simpler and more complex deviations, i.e. level shifts and rule interruptions, although these cannot be considered as clearly separated categories but rather reflect a continuum. Suddenly appearing novel events (the beginning of section 2) are at one end, while more complex rule interruptions, as met in the discussion on visual search (section 3.3), are at the opposite end of the continuum.
This proposed differentiation notwithstanding, the phenomena of involuntary attending have been incompletely covered in several respects. First, there are probably various types of other outside events that can elicit processes of involuntary, i.e. unintentional, attending. Such processes, which might be lumped together under the label of 'specific selection' (compare section 1), have not been dealt with. A second restriction may be even more essential: when referring to 'involuntary attention' or 'aspecific selection' the implicit classification (voluntary/involuntary) has always been in terms of elicitors of attending. The relevant processes are involuntary to the extent to which they are determined by outside events, and aspecific in the sense that the events are defined in relation to the situational context, namely as deviants from that context. The obvious essential question is
whether distinct elicitors of attending are actually accompanied by distinct differences in the nature of the processes. Do involuntary and intentional attending merely differ with respect to their actual determinants, or are the processes involved related to essentially different mechanisms? More generally formulated, the issue is to what extent the various elicitors reflect functional differences. When the processes outlined in this chapter are considered from this perspective, there are reasons to doubt functional differences. As mentioned at the end of section 2, the recent research on attending, elicited by either central or peripheral cues, does not unequivocally suggest the operation of two mutually independent attention systems. The observed differences in the effects of central symbolic cues and peripheral onsets may, in fact, be explained by the assumption that a single common attentional mechanism is activated by these types of stimuli, albeit in a different way. In the context of the discussion about the relations between rule interruptions and involuntary attention (section 3), the central question of the research on visual search as well as that on event-related potentials concerned the type of mechanism through which rule interruptions can be identified. The question whether, following successful detection of a deviant, attending is qualitatively different from voluntarily determined attention has not been answered, nor has it become clear how one could experimentally prove a possible difference. One of the reasons for this state of affairs is that, thus far at least, electrophysiological indicators of attentional processes and their effects have been almost exclusively concerned with voluntary attending (see also Chapter 9). Thus, voluntary and involuntary attending are distinguished only by the way in which they are elicited. The question of whether involuntary attention exists in the sense of an independent attention system remains fully open.
ACKNOWLEDGEMENT
D. Nattkemper was supported by grant Pr 118/9 from the Deutsche Forschungsgemeinschaft (DFG). The final version of this manuscript was completed in November 1991. Therefore the review of the literature only takes into account publications available up to 1991.
REFERENCES
Badia, P. and Delfran, R. H. (1970). Orienting responses and GSR conditioning: A dilemma. Psychological Review, 77, 171-181.
Barry, R. J. (1984). Preliminary process in OR elicitations. Acta Psychologica, 55, 109-142.
Berlyne, D. E. (1974). Attention. In E. C. Carterette and M. P. Friedman (Eds), Handbook of Perception: Historical and Philosophical Roots of Perception, vol. I (pp. 123-147). New York: Harcourt Brace Jovanovich.
Bernstein, I. H., Clark, M. H. and Edelstein, B. A. (1969a). Effects of an auditory signal upon visual reaction time. Journal of Experimental Psychology, 80, 567-569.
Bernstein, I. H., Clark, M. H. and Edelstein, B. A. (1969b). Intermodal effects in choice reaction time. Journal of Experimental Psychology, 81, 405-407.
Bernstein, I. H. and Edelstein, B. A. (1971). Effects of some variations in auditory input upon visual choice reaction time. Journal of Experimental Psychology, 87, 241-247.
Besson, M. and Macar, F. (1987). An event-related potential analysis of incongruity in music and other non-linguistic contexts. Psychophysiology, 4, 351-360.
Cammann, R. (1990). Is there a mismatch negativity (MMN) in the visual modality? Behavioral and Brain Sciences, 13, 234-235.
Cheal, M. and Lyon, D. R. (1991). Importance of precue location in directing attention. Acta Psychologica, 76, 201-211.
Ciesielski, K. T. (1990). Variability, gnostic units and N2. Behavioral and Brain Sciences, 13, 236-237.
Courchesne, E., Courchesne, R. Y. and Hillyard, S. A. (1978). The effect of stimulus deviation on P3 waves to easily recognized stimuli. Neuropsychologia, 16, 189-199.
Davis, H., Mast, T., Yoshie, N. and Zerlin, S. (1966). The slow response of the human cortex to auditory stimuli: Recovery process. Electroencephalography and Clinical Neurophysiology, 21, 105-113.
Donchin, E. (1979). Event-related potentials: A tool in the study of human information processing. In H. Begleiter (Ed.), Evoked Potentials and Behavior (pp. 13-75). New York: Plenum Press.
Donchin, E. (1981). Surprise!...Surprise? Psychophysiology, 18, 493-513.
Donchin, E. and Coles, M. G. H. (1988). Is the P300 component a manifestation of context updating? Behavioral and Brain Sciences, 11, 357-374.
Dulany, D. E., Carlson, R. A. and Dewey, G. I. (1984). A case of syntactical learning and judgement. How conscious and how abstract? Journal of Experimental Psychology: General, 133, 541-555.
Dürr, E. (1907). Die Lehre von der Aufmerksamkeit. Leipzig: Quelle and Meyer.
Ebbinghaus, H. (1911). Grundzüge der Psychologie (Bd. 1). Leipzig: von Veit.
Eimer, M. (1990). Representational content and computation in the human visual system. Psychological Research, 52, 238-242.
Elsenhans, T. (1912). Lehrbuch der Psychologie. Tübingen: J.C.B. Mohr.
Fabiani, M., Gratton, G., Karis, D. and Donchin, E. (1987). Definition, identification and reliability of measurement of the P300 component of the event-related brain potential. Advances in Psychophysiology, 2, 1-78.
Ford, J. M. and Hillyard, S. A. (1981). Event-related potentials (ERPs) to interruptions of a steady rhythm. Psychophysiology, 18, 322-330.
Gati, I. and Ben-Shakar, G. (1990). Novelty and significance in orienting and habituation: A feature-matching approach. Journal of Experimental Psychology: General, 119, 251-263.
Giard, M. H., Perrin, F., Pernier, J. and Bouchet, P. (1990). Brain generators implicated in the processing of auditory stimulus deviants: A topographic event-related potential study. Psychophysiology, 27, 627-640.
Groves, P. M. and Thompson, R. F. (1970). Habituation: A dual-process theory. Psychological Review, 77, 419-450.
Hari, R., Hämäläinen, M., Ilmoniemi, R., Kaukoranta, E., Reinikainen, K., Salminen, J., Alho, K., Näätänen, R. and Sams, M. (1984). Responses of the primary auditory cortex to pitch changes. Neuromagnetic recordings in man. Neuroscience Letters, 50, 127-132.
Heuer, H. and Prinz, W. (1987). Initiierung und Steuerung von Handlungen und Bewegungen. In M. Amelung (Ed.), Bericht über den 35. Kongreß der Deutschen Gesellschaft für Psychologie in Heidelberg 1986 (Bd. 2, pp. 289-299). Göttingen: Hogrefe.
Hillyard, S. A. and Hansen, J. C. (1986). Attention: Electrophysiological approaches. In M. G. H. Coles, E. Donchin and S. W. Porges (Eds), Psychophysiology: Systems, Processes, and Applications (pp. 227-243). Amsterdam: Elsevier.
Hillyard, S. A., Hink, R. F., Schwent, V. L. and Picton, T. W. (1973). Electrical signs of selective attention in the human brain. Science, 182, 177-180.
Hillyard, S. A., Münte, T. F. and Neville, H. J. (1985). Visual spatial attention, orienting and brain physiology. In M. I. Posner and O. S. Marin (Eds), Mechanisms of Attention: Attention and Performance, vol. XI (pp. 63-84). Hillsdale, NJ: Erlbaum.
Horn, G. (1967). Neuronal mechanisms of habituation. Nature, 215, 707-711.
Jacobs, A. M. (1986). Eye movement control in visual search: How direct is visual span control? Perception and Psychophysics, 39, 47-58.
James, W. (1890). The Principles of Psychology. New York: Holt.
Jonides, J. (1981). Voluntary versus automatic control over the mind's eye's movement. In J. B. Long and A. D. Baddeley (Eds), Attention and Performance, vol. IX (pp. 187-203). Hillsdale, NJ: Erlbaum.
Jonides, J. and Yantis, S. (1988). Uniqueness of abrupt visual onset in capturing attention. Perception and Psychophysics, 43, 346-354.
Kemner, C. and Verbaten, M. N. (1990). P3 to targets and novels in different modalities. Supplement to Psychophysiology, S44, 27.
Kreibig, J. C. (1897). Die Aufmerksamkeit als Willenserscheinung. Wien: Hölder.
Krumhansl, C. L. (1982). Abrupt changes in visual stimulation enhance processing of form and location information. Perception and Psychophysics, 32, 511-523.
Kutas, M. and Hillyard, S. A. (1980). Reading senseless sentences: Brain potentials reflect semantic incongruity. Science, 207, 203-205.
Lewicki, P., Czyzewska, M. and Hoffmann, H. (1987). Unconscious acquisition of complex procedural knowledge. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13, 523-530.
Lewicki, P., Hill, T. and Bizot, E. (1988). Acquisition of procedural knowledge about a pattern of stimuli that cannot be articulated. Cognitive Psychology, 20, 24-37.
Luck, S. J., Heinze, H. J., Mangun, G. R. and Hillyard, S. A. (1990). Visual event-related potentials index focused attention within bilateral stimulus arrays. II. Functional dissociation of P1 and N1 components. Electroencephalography and Clinical Neurophysiology, 75, 528-542.
Lykken, D. T. (1959). The GSR in the detection of guilt. Journal of Applied Psychology, 43, 358-388.
Maltzman, I. (1979). Orienting reflexes and significance: A reply to O'Gorman. Psychophysiology, 16, 274-281.
Miller, J. (1989). The control of attention by abrupt visual onsets and offsets. Perception and Psychophysics, 45, 567-571.
Müller, H. J. and Rabbitt, P. M. A. (1989). Reflexive and voluntary orienting of visual attention: Time course of activation and resistance to interruption. Journal of Experimental Psychology: Human Perception and Performance, 15, 315-330.
Näätänen, R. (1979). Orienting and evoked potentials. In H. D. Kimmel (Ed.), The Orienting Reflex in Humans (pp. 61-75). Hillsdale, NJ: Erlbaum.
Näätänen, R. (1988). Implications of ERP data for psychological theories of attention. Biological Psychology, 26, 117-163.
Näätänen, R. (1990). The role of attention in auditory information processing as revealed by event-related potentials and other brain measures of cognitive function. Behavioral and Brain Sciences, 13, 201-288.
Näätänen, R. (1992). Attention and Brain Function. Hillsdale, NJ: Erlbaum.
Näätänen, R. and Gaillard, A. W. K. (1983). The orienting reflex and the N2 deflection of the event-related potential (EP). In A. W. K. Gaillard and W. Ritter (Eds), Tutorials in ERP Research: Endogenous Components (pp. 119-141). Amsterdam: North-Holland.
Näätänen, R., Gaillard, A. W. K. and Mäntysalo, S. (1980). Brain potential correlates of voluntary and involuntary attention. Progress in Brain Research, 54, 343-348.
Näätänen, R. and Michie, P. T. (1979). Early selective attention effects on the evoked potential. A critical review and interpretation. Biological Psychology, 8, 173-187.
Näätänen, R., Paavilainen, P., Alho, K., Reinikainen, K. and Sams, M. (1987). The mismatch negativity to intensity changes in an auditory stimulus sequence. Current Trends in Event-Related Brain Potentials Research (EEG Supplement 40).
Näätänen, R., Paavilainen, P. and Reinikainen, K. (1989). Do event-related potentials to infrequent decrements in duration of auditory stimuli demonstrate a memory trace in man? Neuroscience Letters, 107, 217-222.
Näätänen, R., Paavilainen, P., Tiitinen, H., Jiang, D. and Alho, K. (1993). Attention and mismatch negativity. Psychophysiology, 30, 436-450.
Näätänen, R. and Picton, T. (1987). The N1 wave of the human electric and magnetic response to sound: A review and an analysis of the component structure. Psychophysiology, 24, 375-425.
Näätänen, R., Simpson, M. and Loveless, N. E. (1982). Stimulus deviance and evoked potentials. Biological Psychology, 14, 53-98.
Nakayama, K. and Mackeben, M. (1989). Sustained and transient components of focal visual attention. Vision Research, 29, 1631-1647.
Nasman, V. T. and Rosenfeld, J. P. (1990). Parietal P3 response as an indicator of stimulus categorization: Increased P3 amplitude to categorically deviant target and nontarget stimuli. Psychophysiology, 27, 338-350.
Nattkemper, D. (1990). Mechanismen der Steuerung sakkadischer Augenbewegungen: Neue Funde beim Suchen. In C. Meinecke and L. Kehrer (Eds), Bielefelder Beiträge zur Kognitionspsychologie (pp. 1-26). Göttingen: Hogrefe.
Nattkemper, D. and Prinz, W. (1984). Costs and benefits of redundancy in visual research. In A. Gale and F. Johnson (Eds), Theoretical and Applied Aspects of Eye-Movement Research (pp. 343-351). Amsterdam: North-Holland.
Nattkemper, D. and Prinz, W. (1991). On Dynamic Pertinence Models: Further Evidence from Continuous Search. München: Max-Planck-Institut für Psychologische Forschung.
Nattkemper, D., Ullmann, T. and Prinz, W. (1991). Adjusting Saccadic Eye Movements to Variations of Stimulus Complexity. Evidence from Continuous Search. XIV European Conference on Visual Perception, Vilnius, Lithuania.
Neisser, U. (1963). Decision-time without reaction-time: Experiments in visual scanning. American Journal of Psychology, 76, 376-385.
Neisser, U. (1967). Cognitive Psychology. New York: Appleton-Century-Crofts.
Neumann, O. (1984). Automatic processing: A review of recent findings and a plea for an old theory. In W. Prinz and A. F. Sanders (Eds), Cognition and Motor Processes (pp. 255-293). Berlin: Springer.
Nordby, H., Roth, W. T. and Pfefferbaum, A. (1988). Event-related potentials to breaks in sequences of alternating pitches or interstimulus intervals. Psychophysiology, 25, 262-268.
Öhman, A. (1979). The orienting response, attention and learning: An information-processing perspective. In H. D. Kimmel, E. H. van Olst and J. F. Orlebeke (Eds), The Orienting Response in Humans (pp. 443-471). Hillsdale, NJ: Erlbaum.
Öhman, A. (1992). Preattentive processing and orienting. In B. A. Campbell, H. Hague and R. Richardson (Eds), Attention and Information Processing in Infants and Adults: Perspectives from Human and Animal Research (pp. 236-295). Hillsdale, NJ: Erlbaum.
Paavilainen, P., Alho, K., Reinikainen, K., Sams, M. and Näätänen, R. (1991). Right-hemisphere dominance of different mismatch negativities. Electroencephalography and Clinical Neurophysiology, 78, 466-479.
Paavilainen, P., Karlsson, M. L., Reinikainen, K. and Näätänen, R. (1989). Mismatch negativity to change in spatial location of an auditory stimulus. Electroencephalography and Clinical Neurophysiology, 73, 129-141.
Picton, T. W. (1980). The use of human event-related potentials in psychology. In I. Martin and P. H. Venables (Eds), Techniques in Psychophysiology. New York: Wiley.
Picton, T. W. and Hillyard, S. A. (1988). Endogenous event-related potentials. In T. W. Picton (Ed.), Human Event-Related Potentials. EEG Handbook (revised series, vol. 3) (pp. 361-426). Amsterdam: Elsevier.
Posner, M. I. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology, 32, 3-25.
Posner, M. I., Nissen, M. J. and Ogden, W. C. (1978). Attended and unattended processing modes: The role of set for spatial location. In H. L. Pick and E. J. Saltzman (Eds), Modes of Perceiving and Processing Information (pp. 137-157). Hillsdale, NJ: Erlbaum.
Pribram, K. H. and McGuinness, D. (1975). Arousal, activation and effort in the control of attention. Psychological Review, 82, 116-149.
Pribram, K. H. and McGuinness, D. (1992). Attention and para-attentional processing: Event-related brain potentials as tests of a model. In D. Friedman and G. E. Bruder (Eds), Psychophysiology and Experimental Psychopathology: A tribute to Samuel Sutton. Annals of the New York Academy of Sciences, vol. 658, pp. 65-92.
Prinz, W. (1977). Memory control of visual search. In S. Dornic (Ed.), Attention and Performance, vol. VI (pp. 441-462). Hillsdale, NJ: Erlbaum.
Prinz, W. (1979). Integration of information in visual search. Quarterly Journal of Experimental Psychology, 31, 287-304.
Prinz, W. (1983a). Redundanzausnutzung bei kontinuierlicher Suchtätigkeit. Psychologische Beiträge, 25, 12-56.
Prinz, W. (1983b). Wahrnehmung und Tätigkeitssteuerung. Heidelberg: Springer.
Prinz, W. (1986). Continuous selection. Psychological Research, 48, 231-238.
Prinz, W. (1990a). On dynamic pertinence models. In H. G. Geissler, M. Müller and W. Prinz (Eds), Psychophysical Explorations of Mental Structures (pp. 411-421). Toronto: Hogrefe and Hogrefe.
Prinz, W. (1990b). Unwillkürliche Aufmerksamkeit. In C. Meinecke and L. Kehrer (Eds), Bielefelder Beiträge zur Kognitionspsychologie (pp. 49-75). Göttingen: Hogrefe.
Prinz, W. and Ataian, D. (1973). Two components and two stages in search performance: A case study in visual search. Acta Psychologica, 37, 255-277.
Prinz, W. and Nattkemper, D. (1986). Effects of secondary tasks on search performance. Psychological Research, 48, 47-52.
Prinz, W. and Nattkemper, D. (1987). Integrating non-target information in continuous search. Perception and Action, Report No. 155. Bielefeld: ZiF.
Prinz, W., Nattkemper, D. and Ullmann, T. (1991). Moment-to-moment control of saccadic eye movements: Evidence from continuous search. In K. Rayner (Ed.), Eye Movements and Visual Cognition: Scene Perception and Reading. New York: Springer.
Prinz, W., Tweer, R. and Feige, R. (1974). Context control of search behavior: Evidence from a 'hurdling' technique. Acta Psychologica, 38, 72-80.
Rayner, K. and Fisher, D. L. (1987). Letter processing during eye fixations in visual search. Perception and Psychophysics, 42, 87-100.
Reber, A. S. (1967). Implicit learning of artificial grammars. Journal of Verbal Learning and Verbal Behavior, 77, 317-327.
Reber, A. S. (1989). Implicit learning and tacit knowledge. Journal of Experimental Psychology: General, 118, 219-235.
Reber, A. S. and Millward, R. B. (1971). Event tracking in probability learning. American Journal of Psychology, 84, 85-99.
Ritter, W., Vaughan, H. G. and Costa, L. D. (1968). Orienting and habituation to auditory stimuli: A study of short term changes in average evoked responses. Electroencephalography and Clinical Neurophysiology, 25, 550-556.
Rockstroh, B. and Elbert, T. (1990). On the relations between event-related and autonomic responses: Conceptualization within a feedback loop framework. In A. Rohrbough, D. Parasuraman and R. Johnson (Eds), Event-Related Brain Potentials. Oxford: Oxford University Press.
Rösler, F. (1982). Hirnelektrische Korrelate kognitiver Prozesse. Berlin: Springer.
Rösler, F., Hasselmann, D. and Sojka, B. (1987). Central and peripheral correlates of orienting and habituation. EEG Suppl. 40: Current Trends in Event-Related Potential Research, 366-372.
Sams, M., Paavilainen, P., Alho, K. and Näätänen, R. (1985). Auditory frequency discrimination and event-related potentials. Electroencephalography and Clinical Neurophysiology, 62, 437-448.
Schandry, R. and Höfling, S. (1979). Interstimulus interval length and habituation on the P300. In H. D. Kimmel (Ed.), The Orienting Reflex in Humans (pp. 129-134). Hillsdale, NJ: Erlbaum.
Scherg, M., Vajsar, J. and Picton, T. W. (1989). A source analysis of the late human auditory evoked potentials. Journal of Cognitive Neuroscience, 1, 336-355.
Schneider, W. and Shiffrin, R. M. (1977). Controlled and automatic human information processing: I. Detection, search and attention. Psychological Review, 84, 1-66.
Schröger, E., Näätänen, R. and Paavilainen, P. (1992). Event-related potentials reveal how non-attended complex sound patterns are represented by the human brain. Neuroscience Letters, 146, 183-186.
Shiffrin, R. M. and Schneider, W. (1977). Controlled and automatic human information processing: II. Perceptual learning, automatic attending and a general theory. Psychological Review, 84, 127-190.
Siddle, D. A. and Packer, J. S. (1987). Stimulus omission and dishabituation of the electrodermal orienting responses: The allocation of processing resources. Psychophysiology, 24, 181-190.
Snyder, E. and Hillyard, S. A. (1976). Long-latency evoked potentials to irrelevant, deviant stimuli. Behavioral Biology, 16, 319-331.
Sokolov, E. N. (1963). Perception and the Conditioned Reflex. New York: Pergamon Press.
Sokolov, E. N. (1975). The neuronal mechanisms of the orienting reflex. In E. N. Sokolov and O. S. Vinogradova (Eds), Neuronal Mechanisms of the Orienting Reflex (pp. 217-338). New York: Wiley.
Spoor, A., Timmer, F. and Odenthal, D. W. (1969). The evoked auditory response (EAR) to intensity modulated and frequency modulated tones and tone bursts. International Audiology, 8, 410-415.
Squires, N. K., Donchin, E., Squires, K. C. and Grossberg, S. (1977). Bisensory stimulation: Inferring decision-related processes from the P300 component. Journal of Experimental Psychology, 3, 299-315.
Squires, N. K., Squires, K. C. and Hillyard, S. A. (1975). Two varieties of long-latency positive waves evoked by unpredictable auditory stimuli in man. Electroencephalography and Clinical Neurophysiology, 38, 387-401.
Theeuwes, J. (1991). Exogenous and endogenous control of attention: The effect of visual onsets and offsets. Perception and Psychophysics, 50.
Todd, J. T. and van Gelder, P. (1979). Implications of a sustained-transient dichotomy for the measurement of human performance. Journal of Experimental Psychology: Human Perception and Performance, 5, 625-638.
Treisman, A. (1982). Perceptual grouping and attention in visual search for features and for objects. Journal of Experimental Psychology: Human Perception and Performance, 8, 194-214.
Treisman, A. M. and Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97-136.
Unger, S. M. (1964). Habituation of the vasoconstrictive orienting reaction. Journal of Experimental Psychology, 67, 11-18.
Verbaten, M. N. (1990). Näätänen's auditory model from a visual perspective. Behavioral and Brain Sciences, 13, 256-257.
Verbaten, M. N., Roelofs, J. W., Sjou, W. and Slangen, J. L. (1986). Habituation of early and late visual ERP components and the orienting reaction: The effect of stimulus information. International Journal of Psychophysiology, 3, 287-298.
Verleger, R. (1988). Event-related potentials and cognition: A critique of the context updating hypothesis and an alternative interpretation of P3. Behavioral and Brain Sciences, 11, 343-356.
Warner, B. C., Juola, J. F. and Koshino, H. (1990). Voluntary allocation versus automatic capture of attention. Perception and Psychophysics, 48, 243-251.
Winkler, I., Paavilainen, P., Alho, K., Reinikainen, K., Sams, M. and Näätänen, R. (1990). The effect of small variation of the frequent auditory stimulus on the event-related brain potential to the infrequent stimulus. Psychophysiology, 27, 228-235.
Woods, D. L. and Elmasian, A. (1986). The habituation of event-related potentials to speech sounds and tones. Electroencephalography and Clinical Neurophysiology, 65, 447-459.
Wundt, W. (1903). Grundzüge der Physiologischen Psychologie, 5th edn. Leipzig: Engelmann.
Yantis, S. and Jonides, J. (1984). Abrupt visual onsets and selective attention: Voluntary vs. automatic allocation. Journal of Experimental Psychology: Human Perception and Performance, 10, 601-621.
Yantis, S. and Jonides, J. (1990). Abrupt visual onsets and selective attention: Evidence from visual search. Journal of Experimental Psychology: Human Perception and Performance, 10, 601-621.
Zimny, G. H., Pawlick, G. F. and Saur, D. P. (1969). Effects of stimulus order and novelty on orienting responses. Psychophysiology, 6, 166-173.
Chapter 6
Automatic and Controlled Information Processing: The Role of Attention in the Processing of Novelty
G. Underwood¹ and J. Everatt²
¹University of Nottingham, England and ²University of Surrey, England

The words and sentences that I am writing at the moment are being written using a word-processor. I have the distinct impression that I must think about the meanings of these sentences, and about their grammatical construction, but that the act of producing the words can take care of itself. Typing the words, spelling them conventionally and even selecting the most appropriate word are all activities that do not need my attention. These activities might be described as having become automatized. They required attention at one time (when I was learning to spell and then again when I was learning to type) but no longer is my mind occupied with these low-level writing skills. My attention now focuses upon more general problems of composition: what to say next, and how to express it.

Although this declaration of claimed skill concerns the output of information, the idea that low-level processing is automatized is a suggestion that can also be applied to recognition. You may not need to attend to the form of each letter in each word, or even to each word in this sentence, but if you want to extract the underlying meaning then it is to the meaning that you must attend. If you do not, then the words may be recognized, but the meanings of sentences will be lost. If, when you get to the bottom of the page, you realize that you have been thinking about something other than the relationships that I am trying to describe, then a re-reading will be necessary. The words will then look familiar, but the text will not. One possible reason for this is that words have a largely invariant relationship with the meanings stored in the reader's internal lexicon, whereas the texts are novel.
Invariance allows the reader to learn the relationship between the input and the required cognitive action, and when learning is complete attention may 'drop out' of the processing sequence. When this happens we may say that the activity has become automatized. In this discussion of automatic behavior we shall consider input and output activities, with the overview of attention as being necessary for the processing of novel inputs and novel outputs. To do this it is first necessary to consider the relationship between attention and automatization, and to examine the effects of practice upon both.
Handbook of Perception and Action, Volume 3
Copyright © 1996 Academic Press Ltd. All rights of reproduction in any form reserved.
ISBN 0-12-516163-8
1 ATTENTION, AUTOMATIZATION AND THE ROLE OF PRACTICE
Automatic activities have been considered as those that possess one or more of a set of 'defining' characteristics. The sort of characteristics that have been proposed in the past are that automatic activities:
(1) develop with extensive practice;
(2) are performed smoothly and efficiently;
(3) are resistant to modification;
(4) are unaffected by other activities;
(5) do not interfere with other activities;
(6) are initiated without intention;
(7) are not under conscious control;
(8) do not require mental effort.
The first five of these characteristics can be derived mainly from laboratory observations, whereas the characteristics involving intention, conscious control and mental effort are either inferred or arise from subjective reports. The list is descriptive rather than definitive and is taken from a variety of sources including LaBerge (1981), Logan (1988), Posner and Snyder (1975), Schneider and Shiffrin (1977) and, of course, James (1890). It is intended to provide a general description of the characteristics that have been applied to something that is considered automatic. There is no agreed criterion for categorizing an activity as being automatic rather than volitional; however, proponents of the two-process view (Posner and Snyder, 1975; Shiffrin and Schneider, 1977) consider that processes can be split up into those that are automatic and those that are attentional, i.e. automaticity can be considered as all-or-none. More recent views have emphasized the notion of a continuum of automaticity, with new, unskilled activities at the conscious control end and familiar, highly practiced activities at the automatic end (Cohen, Dunbar and McClelland, 1990; Kahneman and Chajczyk, 1983; Logan, 1985). As more experience of an activity within a constant environment is encountered, so the activity moves from the controlled end toward the automatic end. The argument put forward by proponents of the continuum view is that automatic processes gain characteristics of automaticity with practice (MacLeod and Dunbar, 1988) and still show effects of attentional factors even when they are considered to be automatic (Francolini and Egeth, 1980; Kahneman and Henik, 1981). It does seem unlikely that all activities are either automatically or consciously controlled, and, although a whole activity may not be automatic, component processes within that skill may be (see the arguments of Jonides, Naveh-Benjamin and Palmer, 1985; Shiffrin, 1988; Shulman, 1990).
In analyzing the automatized components of an activity we have three options available. The first two options are 'positive' in that they continue to use the assumption that behavior can be under automatic or conscious control, but the third, negative, option simply says that the distinction has no value in that activities do not fall into one or other of these categories. The first positive option is to propose (as above) that there exists a continuum of automaticity, with new, unskilled activities at the 'conscious control' end and with familiar, highly practiced activities at the 'automatic control' end. One
problem with this description is that it gives a one-dimensional impression of the nature of skill. Another is that it assumes that skills are crystallized, whereas they change with practice. This change can be described in terms of the acquisition of automatization. The second positive option, then, is to propose that skills are organized hierarchically, and that the automatization of the low-level subskills will progress with practice. Skill acquisition is then seen as the increasing automatization of the component subskills. Attention may be directed initially at the control of the low-level components, but as practice is increased these components are automatized and attention can be released for higher-level components. In the case of typing, a low-level component would be hitting a required key, and a higher-level component would be transforming an idea into the surface form of a sentence. In the case of playing a game such as tennis, these levels would be the equivalent of gripping the racquet to meet the ball versus deciding where to place the ball to put your opponents at their greatest disadvantage. Even the low-level components are attention-demanding for the novice, but the Wimbledon finalist will be allocating little thought to details such as the angle of the racquet head. Instead, this highly skilled player will have the impression that these details can take care of themselves, and that thought should be given to decisions about game tactics such as whether to approach the net more often. The novice may not even be aware of having these thoughts, even if there was time for them. This model requires us to describe the mechanism whereby practice allows attention to move up the skill hierarchy, and this is part of the purpose of the present discussion. Practice on perceptual-motor tasks has several observable effects, including an increase in accuracy, an increase in speed, and an increase in the smoothness of performance.
For the set of activities at which we are personally adept (tying our shoelaces, making a gear change when driving, hitting an approaching tennis ball with a racquet, or typing at a keyboard, perhaps) we might also have the impression that we are not always aware of having initiated each motor response. The list of characteristics of automatized activities mentioned at the beginning of this section may seem to be a good description of what happens when the well-practiced motorist changes gear in response to a change in the car's engine speed. Upon reflection, this motorist may be unable to recall having initiated or executed the gear change even though it was completed perfectly adequately. The motorist may also have been engaged in conversation with a passenger, and neither the gear change nor the conversation would have interfered with each other. At the same time as the gear change, other perceptual-motor activities will have been performed in order to keep the vehicle traveling along its intended path. The gear change may therefore be described as being automatized. The novice driver would have very different experiences, starting with having to decide when to initiate the action. The overt indices of performance change with practice, but can we give any weight to the verbal reports of reflections upon our own skilled activities? If these reflections do not correlate with any externally observable change in performance, then they have little credibility. On the other hand, if the subjective reports of attention-free behavior coincide with changes in performance such as speed and accuracy improvements, then we are entitled to look at the reports in more detail. They may be emerging after the event, in which case they are not very interesting. For instance, it may simply be that the performer noticed an improvement in performance, and that change is now attributed to a particular state of awareness.
If this has happened, then we cannot know whether a change in performance has been accompanied by a change in attentional control. Did the change in attention accompany the change in performance or did it follow this change? To make matters worse, these are not mutually exclusive possibilities. The subjective reports of automatized behavior are obtained after the behavior has been produced, and so perhaps we simply have unreliable memories of skilled performances. Perhaps we think that the performance was attention-free for the reason that we did not record a memory during performance. In this case, attention may have been allocated, but when no memory is recorded we subsequently have the impression that we were behaving without attention. We must be very cautious of subjective reports from skilled performers, as it is not clear whether their impressions of changes in attention come as an accompaniment to changes in performance or as a result of changes in performance. Practice gives an action faster, more accurate and smoother performance, and the change can also be described in terms of a change in the cognitive structures that mediate performance. A relatively conventional account of the change starts with a categorization of motor performance according to the feedback necessary for successful execution. The model suggested by Adams (1976), Keele and Summers (1976), Reason (1979), Underwood (1982) and others describes novel activities as requiring closed-loop control (CLC) in that performance of the individual components of the activity requires individual checking. The closed loop here refers to feedback from execution of an individual action being used to check the match between intention and action. If there is a match, then the next individual action can be executed. Behavior under CLC is halting, slow and variable. Practice has the effect of eliminating the need to use feedback.
The skilled performer issues a command for action and does not check that the individual action matches the individual intention. The elimination of feedback from the sequences of actions is described as the change to open-loop control (OLC). In the OLC mode, feedback is not used to check the intention-action match and performance becomes smoother because there are no longer any interruptions to the flow of action. Performance becomes faster because the time taken to check the feedback is eliminated, and accuracy is improved because the performer is now able to issue instructions for action based upon over-learned associations. The evidence in favor of this view of a change from CLC to OLC is reviewed by Underwood (1982), where the use of feedback is identified with attention. The subjective impressions that accompany a highly skilled action under OLC result from the removal of attention from the production of the action sequence. The CLC/OLC model applies equally well to motor skills and to cognitive skills such as recalling familiar facts (simple multiplication calculations, the names of certain capital cities and world politicians, for example). Provided that the performer does not need to check the intention-action match, then attention will be unnecessary for the production of the answer to questions concerning these facts. An alternative view of the effects of practice is provided in Logan's (1988) instance-based theory of automatization, which will be described in some detail shortly. Briefly, this view suggests that practice has the effect of taking the performer from a reliance upon algorithm-based actions to a reliance upon memories. The algorithms must be calculated each time an unskilled action is performed, while the practiced action can rely upon a memory of the stimulus and its accompanying action. Calculation and operation of the algorithm requires mental
resources, while memory-based performance is free of attention. The two models are compatible, of course, if Logan's algorithm requires the use of feedback and can be described as running under CLC, while the reliance upon memories is free of feedback and runs under OLC. However, the use of feedback is not explicitly specified by Logan. The model of skills that suggests a hierarchical structure is a model best applied to complex activities that can be only loosely categorized as skills. Activities such as typing, riding a bicycle and playing a game such as tennis all clearly involve skill, but whether the whole activity is a single skill is a more debatable matter. They are assemblies of coordinated but individual perceptual-motor skills, defined as learned motor responses to distinctive perceptual inputs in order to achieve a specified goal. The integration of sensory information with muscular responses is usually considered to be an essential part of skilled behavior. Activities such as reading, remembering and performing mental arithmetic, in which the observable motor response is irrelevant, each share the critical features of a skill. Motor activities such as playing tennis are said to be skilled but it is just as easy to describe them as collections of smaller-scale skills. Being able to hit a ball with a tennis racquet, giving it varying amounts of momentum and sending it in an intended direction might in itself be considered a skill. Being able to serve the ball requires different coordinated actions to those required when returning the ball in a rally, and so we might also want to describe serving as an independent skill. Similarly, returning the ball from behind the baseline, while stationary, requires different perceptual-motor coordinations to the action involved in hitting a volley on the run. Being able to impart backspin to the ball might in itself be considered an independent skill, and so on. 
Any of these high-level activities may be described in terms of their components, which themselves have the characteristics of a skill. They differ from the high-level skill in the generality of the goal. We play tennis for one reason, but we hit the ball with a racquet specifically as a part of playing tennis. The lower down this hierarchy, the more constrained will be the skill. There are many possible tennis games that can be played, but a smaller number of ways of serving the ball, and an even smaller number of ways of gripping the racquet when hitting the ball. Descending the hierarchy can therefore be said to reduce the degrees of freedom associated with the action, and this in turn can be described as reducing the novelty of the action. One's grip on the racquet handle will be much the same as it was the last time a forehand shot was played, and so this part of the action is invariant. The trajectory of the ball, as it arrives to invite a return volley, will be slightly different from the last time the shot was played, and only at the lower levels of the skill hierarchy will the required action be invariant from instance to instance. The argument will suggest that as we consider higher levels of the hierarchy, this invariance is reduced, and so the task of automatization becomes more difficult. Not that automatization of a global activity is impossible. For example, well-practiced drivers can report that for whole sections of a route they have no recollection of having done anything. These are instances of what Reed (1972) called 'time-gaps', and they occur when a skilled operator can daydream while responding to changing perceptual inputs. The predictability of the input can be appreciated only by those who have learned the statistical constraints of the input, and once learned it can produce well-learned responses. The individual may return to their external reality only when an unpredictable event occurs or when attention is
G. Underwood and J. Everatt
called to some perceptual configuration sufficiently novel to be as yet unlearned. Although high-level activities may be considered as automated to the extent of occurring without awareness, it is the low-level components of a skill that are more readily performed with this form of control, and it is to these that we will direct most of the following discussion, though higher-level processes will be considered later.
1.1 Dual Tasks and Resource Limitations
As discussed above, a skill can be considered to be a learned motor response to a distinctive perceptual input in order to achieve a specific goal. For such a skill to be automatic, the processes of perceiving the perceptual input and retrieving and executing the learned motor response must be automatic. It is also possible that the whole skill is not automatic, but that one or more processes within that skill may be; for example, the processing of the stimulus may be automatic. (Jonides et al. (1985), argue that stimulus identification may be automatic in tasks that show features of nonautomaticity.) In investigating automatic versus attentional processing, various techniques have been used in the hope of finding tasks that show one or more of the characteristics mentioned above. The techniques have ranged from perceptual/attentional orienting responses to the problem of trying to perform two tasks at the same time. As discussed above, the act of changing gear could be considered as an automatic act in the skilled driver, because it can occur without attending to it (unconsciously), and while other acts, or processes, occur at the same time, e.g. talking to a passenger, listening to a play on the radio, reading a road sign, etc. The act of changing gear does not affect the other acts and so may be considered to be occurring without limiting resources to other tasks. Researchers have used this feature of well-practiced skills to investigate automatic skills. Types of experiments using this technique are usually referred to as dual-task experiments. In these a subject has to perform a well-practiced task while at the same time performing a secondary task. 
Examples of tasks in which a secondary task was used to assess the performance on the task are: typing a piece of text while at the same time shadowing (repeating verbally) an aurally presented message (Shaffer, 1975); playing the piano while performing the similar secondary shadowing task (Allport, Antonis and Reynolds, 1972); writing a message to dictation, while at the same time reading (Hirst et al., 1980; Spelke, Hirst and Neisser, 1976); tracking a stimulus with the hand, while verbally responding to a second stimulus (McLeod, 1977); and memorizing a set of figures for recall while performing a multiple-choice stimulus-response task (Logan, 1979). These experiments suggest that, with large amounts of practice on one of the tasks, performance of it and the secondary task can be relatively unaffected by having to perform them together. One explanation of these findings is that extensive practice on one task allows it to be performed automatically and so, as items (6) and (7) of the above characteristics of automatized behavior state, it will not interfere with, or be affected by, another task (LaBerge, 1981; Logan, 1988). The automatization of the one task will also allow more attentional processes to perform the other task. This view has become known as the limited resources viewpoint. It considers that a certain amount of attentional resources can be applied to a task(s)
Automatic and controlled information processing
and if these are exceeded then interference between tasks will occur, owing to the competition for those resources (Kahneman, 1973; LaBerge and Samuels, 1974; Posner, 1978; Schneider and Shiffrin, 1977). This is the basis of the view of automatization being useful. If there is a limited amount of attentional resources available, then performing simple, invariant tasks without the use of these resources (i.e. performing it automatically) will allow them to be used for other processes. For example, LaBerge and Samuels (1974) considered that as individuals learn to read then simple processes such as letter recognition and word recognition can become automatic, leaving attentional resources to be used to understand a piece of discourse. However, there are at least two other explanations for these effects. First, it is possible that both tasks require attentional processing, but that attention can be switched between the tasks (Hirst et al., 1980; Neisser, Hirst and Spelke, 1981). Second, it is possible that the different tasks have their own separate source of attention-like processes with which to perform the tasks. This latter point has become associated with the idea of modules within the human processing system (Fodor, 1983; Minsky, 1980). Modules are separable information processing systems, usually associated with a well-practiced task or skill, and are themselves considered to be at least partially under automatic control. Proponents of multiple resources views (Navon and Gopher, 1979; Wickens, 1984) consider that each module can make use of its own pool of limited-capacity resources and so interference between these tasks will occur only when the tasks use the same module (see also Allport, 1980). Such a view makes it difficult to distinguish between a totally automated process and one that is only partially automated and is using a separate source of attention. One recent theory of automaticity that has tried to circumvent this problem is that of Logan (1988).
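The contrast between the single-pool and multiple-resources views can be made concrete with a small sketch. It is an illustration only: the task names, module labels, demand values and capacities are hypothetical numbers chosen for the example, not estimates taken from any of the studies cited above. It shows why dual-task data alone may fail to separate the two accounts: once the practiced task demands little, both views predict no interference.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    module: str    # hypothetical label for the processing module the task draws on
    demand: float  # hypothetical attentional demand, in arbitrary units


def overload_single_pool(tasks, capacity=1.0):
    """Single-pool view: interference arises when the summed demand
    exceeds one shared capacity. Returns the excess demand (0.0 if none)."""
    return max(0.0, sum(t.demand for t in tasks) - capacity)


def overload_multiple_pools(tasks, capacity_per_module=1.0):
    """Multiple-resources view: demands compete only within a module.
    Returns the excess demand for each module."""
    totals = {}
    for t in tasks:
        totals[t.module] = totals.get(t.module, 0.0) + t.demand
    return {m: max(0.0, d - capacity_per_module) for m, d in totals.items()}


# Hypothetical dual-task pairings (all values are assumptions).
novice = [Task("typing", "manual", 0.8), Task("shadowing", "verbal", 0.6)]
expert = [Task("typing", "manual", 0.2), Task("shadowing", "verbal", 0.6)]
same_module = [Task("reading aloud", "verbal", 0.8), Task("shadowing", "verbal", 0.6)]

for label, pair in [("novice", novice), ("expert", expert), ("same module", same_module)]:
    print(label, overload_single_pool(pair), overload_multiple_pools(pair))
```

Under these assumed values the novice pairing overloads the single pool but not the separate pools, the expert pairing overloads neither, and the same-module pairing overloads both. Only the first and last cases discriminate between the accounts, which is the difficulty noted in the text.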
1.2 The Algorithm/Instance Theory of Automaticity
Logan (1988) attempts to side-step the above issue of resource limitations by considering automatization as simply memory retrieval. Automaticity in this view is a memory phenomenon, governed by factors that govern memory. There is no need to consider single versus multiple sources of resources. Automaticity will occur as a result of the processes of memory retrieval. Logan considers automatization as the acquisition of a domain-specific knowledge base formed from separate representations (instances) of each exposure to a task. A task becomes automated when it is based on 'single-step direct-access retrieval of past solutions from memory' (p. 493). (It is closely related to instance theories of memory and categorization; cf. Hintzman, 1986.) Novices, who do not possess such a knowledge base, perform a task via a general algorithm. As the novice gains experience of the task, specific solutions are learnt. When a large enough data/memory base is stored, algorithm-based performance can be abandoned for the direct retrieval of a solution to a given input. Thus a behavior is automatic when it is accomplished via the process of memory retrieval and nonautomatic when it is accomplished via the algorithm. Performance within a task can also be partly automatic in that some responses can be accomplished via memory retrieval whereas others are accomplished via the algorithm.
Practice improves performance in Logan's view by increasing the number of individual traces connecting a response to a specific stimulus in pursuit of a specific goal. When the stimulus is encountered again in the context of the same goal the trace will be retrieved and the subject can respond on the basis of the retrieved information. Logan considers that there is a race between algorithm-based performance and retrieval-based performance. If retrieval is quicker then it controls the response; if not, the algorithm controls the response. The more instances there are, the more likely that at least one of them will win the race and so more practiced tasks become more likely to be performed automatically. This view is the main difference between Logan's view of automaticity and many other theories. For example, MacKay (1982) considers that automatization is produced by a connection between perception and action, but that increased practice increases the strength of that connection. The more practice, the stronger the connection and so the faster the process from stimulus to response. Similar views have been proposed by Anderson (1982), Schneider and Shiffrin (1977) and Schneider (1985). The recent PDP model of Cohen et al. (1990) considers that increased performance is due to a strength factor which is related to the weightings placed on connections between units within a processing pathway. This model allows for the possibility that memory for an event is encoded in the strengths of a set of connected units within a distributed system which allows overlapping, but distinct, representations of individual stimulus-response pathways. This model may allow the amalgamation of a strength theory with the features of instance-based learning. As we have seen, the CLC/OLC model of automatization (Underwood, 1982), which identifies feedback with attention, may be compatible with Logan's instance model.
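The race between the algorithm and the stored instances can be illustrated with a short simulation. This is a minimal sketch under stated assumptions rather than Logan's own formulation: finishing times are drawn from Gaussian distributions purely for convenience, and the means, standard deviation, trial count and function names are hypothetical. On each trial the response time is the fastest of one algorithm time and one retrieval time per stored instance.

```python
import random
import statistics


def simulate_rt(n_instances, rng, algorithm_mean=1000.0, retrieval_mean=1200.0, sd=200.0):
    """One trial of the race: the response is controlled by whichever finishes
    first, the algorithm or any one of the stored instances.
    All distribution parameters are assumptions (arbitrary ms-like units)."""
    algorithm_time = rng.gauss(algorithm_mean, sd)
    retrieval_times = [rng.gauss(retrieval_mean, sd) for _ in range(n_instances)]
    return min([algorithm_time] + retrieval_times)


def summarize(n_instances, trials=2000, seed=1):
    """Mean and standard deviation of simulated response times
    for a performer holding n_instances stored traces."""
    rng = random.Random(seed)
    rts = [simulate_rt(n_instances, rng) for _ in range(trials)]
    return statistics.mean(rts), statistics.stdev(rts)


for n in (0, 1, 10, 100):
    mean_rt, sd_rt = summarize(n)
    print(f"instances={n:3d}  mean={mean_rt:7.1f}  sd={sd_rt:6.1f}")
```

In this sketch a single retrieval is slower on average than the algorithm, yet the minimum over many retrievals becomes faster and less variable as instances accumulate. No individual connection is strengthened, which is the point of contrast with the strength theories described above.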
The use of CLC, in which the performer attends to the feedback from each individual action, would be applied to an algorithm used for an unfamiliar action. The use of OLC, with feedback no longer being inspected, would correspond to a memory presenting a solution before the algorithm can generate an action. A memory can be evoked rapidly when there is an invariant calling-pattern. Automatic performance depends upon the development of a one-to-one relationship between a stimulus and its cognitive response, as when we hear a familiar word, for example, and know what it means without having to consider alternatives. The stimulus in this case can be regarded as a calling-pattern which evokes a specific 'condition → action rule' and which does not have to be decoded by the use of resource-limited algorithms. The two models diverge on the matter of memory-based performance. For Logan, automatic performance results from the availability of a large number of instances of previous encoding-retrieval episodes, whereas the CLC/OLC model suggests that these episodes compile into condition → action rules which can be called by familiar stimuli. The transition from CLC to OLC, which corresponds to the compilation process, results from the direction of attention away from algorithmic, CLC-based performance. Attention becomes unnecessary when the performer no longer needs to match the intention with the action. Since Logan's model assumes that instances of particular solutions to specific stimuli are encoded, generalization from performance on one stimulus to another should not occur. Logan (1988) tested this prediction by presenting subjects with word/nonword lexical decision tasks. Continued practice on the same set of words and nonwords was shown to reduce the mean and variability in decision times to those words and nonwords, but did not transfer to new words and nonwords. That
nonwords show similar effects suggests that the effect is not to be found in semantic memory, but in representations of specific episodes (Jacoby and Brooks, 1984). In another experiment, subjects performed a continuous task on a set of word/nonword stimuli, or varied between two tasks: lexical decision and pronounceability. A later frequency judgement task showed performance in the varied tasks condition was at the same level of performance as after the continuous task condition. These results are more easily explained from the point of view of Logan's instance-based theory. Process-based strength theories would have difficulty in explaining these findings unless by considering more individual traces between input and output. Logan's model explains the noninterference in dual-task experiments such as those mentioned above by considering that performance on the well-practiced task is memory retrieval based, while performance on the other task will be algorithm based. However, since we do not know what criteria make up memory retrieval and what make up algorithm-based retrieval (Logan suggests that there could be a great many such algorithms each with its own set of properties), deciding when a process is accomplished by memory retrieval (automatic) and when it is not will be very difficult. Also, since memory may be unlimited, there are no bounds to the degree of automaticity that can be achieved with this model. To explain why two automatic tasks can be performed without interference, while using the same process (memory retrieval), we are again left to conclude that it depends on whether they use the same memory retrieval resources or not. Although Logan's model may not provide a solution to the problems of studies of automaticity, it may, however, provide a useful null hypothesis from which to study automatic or nonautomatic processes.
By considering that automaticity is a function of an increase in memory and an already available retrieval process, rather than the occurrence of a new set of processes, or changes in underlying processes, studies of automaticity could almost be done away with and exchanged with already existing studies of the process of memory retrieval. Studies of skill acquisition would look at the functioning and properties of algorithm-based processing up to the point of automaticity, when studies of memory retrieval would take over. What is left to decide then is whether or not a particular automatic skill, set of skills or process can be reduced to the level of memory retrieval. In summary, whether the two-process viewpoint is a useful way of considering human functioning within various processes, or skills, remains to be seen. The role of practice suggests that a process, or skill, may not be considered as either automatic or attentional, but rather as varying along a continuum from strongly attention based to strongly automatic; or, in terms of Logan's model, from algorithm controlled to memory retrieval controlled. Another way of looking at this is to consider a skill as a set of subskills and that automatization of a skill is reflected in the attention-free performance of those subskills. This attention-free performance may spread up the hierarchy, producing an increasingly more autonomous skill. The role of attention is determined by the constraints of the task and its invariant condition → action relationships. The open-loop or closed-loop performance of these subskills may then be an important criterion for assessing automaticity. The more open-looped the performance, the more automatic the functioning. As suggested above, these viewpoints may not be mutually exclusive. It is possible that subskills have their own continuum of automaticity, as when a tennis player has a good forehand, but a poor backhand. The backhand may show
signs of variability and less smoothness than the forehand and may be more prone to external distraction. A final possibility is that the automatic/attentional distinction is not useful and investigations of human behavior should perhaps consider problem-solving processes and memory retrieval processes, since attention (and so automaticity) is a variable within these processes, not a defining characteristic. In the following section we will look at the evidence for automatic processing, by reviewing the relationship between attention and input processing.
2 AUTOMATIC INFORMATION PROCESSING
Logan's (1988) model used lexical decision tasks as a basis for assuming memory retrieval as a plausible basis for automatic functioning. Here the input is a string of letters which access a stored trace, which in turn initiates the response given the desired goal. In our initial discussion we looked at how the individual experience of reading a piece of text can take the form of recognition of written information without understanding of the underlying meaning. Within the bounds of automaticity theories, many have considered that for well-practiced readers written information is automatically recognized (most notably, LaBerge and Samuels, 1974). However, even for this subprocess of a well-learned skill there is evidence that word identification (Kellas, Ferraro and Simpson, 1988) and even individual letter recognition (Paap and Ogden, 1981) seem to use limited resources. For example, Kellas et al. (1988) provided subjects with words with multiple meanings and single meanings. In line with previous findings (Jastrzembski, 1981; Rubenstein, Garfield and Millikan, 1970), they found that multiple-meaning words were recognized more rapidly than single-meaning words and these words showed less interference with a secondary task than slower single-meaning words. Whatever the reason for this effect (Kellas et al. (1988) explain the findings along the lines of a cascading, interactive system as proposed by McClelland and Rumelhart (1981)) it suggests that when word identification is accomplished quickly, resources can be transferred to secondary tasks more quickly. (Becker (1976) found a similar effect for lexical decisions on high- versus low-frequency words in combination with a concurrent probe task.) This suggests that word identification, at least within the bounds of a lexical decision task, used the same resources as in the secondary probe task. 
This finding is difficult to accommodate into the view that word identification is automatic and does not use up limited resources. Past evidence, used to conclude that word identification is to some degree automatic, involved studies looking at automatic priming effects and Stroop-like interference effects. However, recent evidence suggests that these are also affected by attention and intention factors.
2.1 Stroop-Like Interference Effects
The classic Stroop effect is to present a subject with a word written in colored ink and to ask for the name of the ink color. If the word itself named a color that was different from the color of the ink, responses tend to be considerably slower than if the word name and color name are the same, or if the word is not the name of a
color (Dyer, 1973). Variations on this procedure have included identifying letters flanked by other letters (Eriksen and Eriksen, 1974; Eriksen and Schultz, 1979), classifying words flanked by words from other categories (Shaffer and LaBerge, 1979), naming pictures with related words positioned within the frame of reference (Lupker and Katz, 1981; Rosinski, Golinkoff and Kukish, 1975; Underwood and Briggs, 1984), naming visually presented digits while hearing different spoken digits (Greenwald, 1972), and repeating spoken presented words in one ear while related words are presented to the other ear (Lewis, 1970; Treisman, Squire and Green, 1974). The common characteristic of these tasks is that attention is required for the analysis and response to one stimulus and that an associated stimulus presented at the same time results in a slower response to the attended stimulus. An account of this effect in terms of automaticity would consider that the distracter word or letter is automatically recognized and interferes with the performance of the required response. Although most interpretations suggest that the interference occurs at the response stage (Dyer, 1973; Eriksen and Eriksen, 1974), Shaffer and LaBerge (1979) found that the usual interference effects were produced when subjects were given distracter stimuli assigned to the same response as the target but different categories. This suggests that interference may occur at a semantic categorization stage, presumably before response allocation. It also suggests that noninterference in these tasks occurs only up to the point of semantic categorization and that the only processes that possess this automatic feature are recognition processes (LaBerge, 1981). Evidence supporting the view that such interference is due to automatic processes comes from findings that the interference effects increase as reading skills increase (Rosinski et al., 1975; Schiller, 1966). 
Thus as reading becomes more automatic (and so more resistant to conscious intervention), so its interference with other tasks increases (Hasher and Zacks, 1979). However, evidence from MacLeod and Dunbar (1988) suggests that this is not an all-or-none phenomenon. It does not follow that once a process is considered automatic then it is free from interference effects. MacLeod and Dunbar gave subjects training on naming a group of novel shapes. The names given to the shapes were the names of four familiar colors. During training the shapes were presented in a neutral color. The naming times for the shapes were compared with the naming times for the four colors themselves in a neutral shape, the naming times for the shapes when they appeared in color and the naming times for the colors when they were presented in the form of the shapes. With up to 2 h training, colors were named faster than shapes and interference occurred only when naming the shapes in color. This was a typical Stroop-like interference effect and would be expected if the color-naming task were automatic to some extent and so interfered with the shape-naming task. However, with 5 h training, interference was found when naming colors in the form of the shapes as well as naming shapes when they appeared in color, even though shape naming was still slower than color naming (this is against simple speed of processing explanations of these interference effects; see Dyer, 1973). With 20 h training, shape and color naming were equivalent and interference occurred only when naming a color in the form of the shapes. This suggests either that the initial interference effect was not due to automaticity, or that automatic processes can show effects of interference from other well-practiced tasks. MacLeod and Dunbar (1988) interpret their findings as suggesting a continuum upon which tasks are positioned in terms
of how automatic they are. Interference between tasks in these terms depends on the relative position along that continuum. Logan (1988) also considers that two well-practiced tasks can show interference as training on one shows more and more use of the same memory retrieval processes as the first. However, if memory retrieval is obligatory and control of a process is due to some sort of race between ways of performing a task, it is difficult to see why after 5 h of training interference should be produced by a slower task. A second problem for the view that these interference effects are produced by an automatic process is that many studies have found that interference can depend on what the subject attends to within the presented stimuli. For example, Kahneman and Chajzyck (1983) found that if a second word was added to a display showing interference from a color word positioned below a color patch, interference was actually reduced, suggesting that attention-grabbing stimuli can reduce the effects of the interfering stimuli. Similarly, Kahneman and Henik (1981) gave subjects two words, a color word and a noncolor word. One was presented in colored ink, the other in black ink. The subjects' task was to name the colored ink. Kahneman and Henik found that if the color word was presented in colored ink then more interference took place than if the noncolor word was in colored ink. Even though in both cases the color word was presented to the subjects, different amounts of interference were produced depending on whether the subject was attending to the color word or not. Francolini and Egeth (1980) found the same effect using colored compared with neutral digits. Irrelevant digits showed more interference when they were colored the same as the target stimuli than when they were not. Other research suggests that interference reduces as the distance between target and distracter increases (Gatti and Egeth, 1978; Goolkasian, 1981). 
Thus the amount of interference produced seems to depend greatly on the amount of attention paid to the potentially interfering stimuli. This is difficult to explain if it is considered that identification of written stimuli is automatic. It seems more likely that some initial attentional process concentrates processing on particular features within the presented stimuli. This view is similar to the idea of early attentional selection views (Broadbent, 1971, 1982; Treisman, 1960; Treisman and Gelade, 1980) in which attentional/selection factors play an early role in the processing of stimuli, and thus determine what is perceived and the way it is perceived. For example, Treisman and Gelade's (1980) feature integration theory considers that simple features of the stimuli (e.g. brightness, orientation, color, movement) are initially processed automatically and in parallel, and these are identified as particular objects/percepts by focusing attention on the particular features that make up that percept. Thus what we see depends on what this focused attention, or filtering (Broadbent, 1982; Kahneman and Treisman, 1984), directs us to see. The opposing viewpoint, or late selection theories, suggest that what is perceived is determined much later in the processing of the stimuli, possibly after identification (Deutsch and Deutsch, 1963; Duncan, 1980; Keele, 1973; Norman, 1968, 1969). The above findings are more in line with the early selection views. The interference shown in the Stroop task poses problems for the early selection models of attention, and at the very least it demands a refinement in order to account for the perceptual interference of an unattended word upon the processing of a response to the color of ink. The refinement suggested by Treisman (1969) was that the involuntary processing of the word resulted from processing within a modular subsystem. Attention selects the perceptual analysers which form the
subsystem, but has more difficulty in selecting the analysers within the subsystem. Kahneman and Treisman (1984) consider Stroop interference to arise through the failure of 'filtering' - the process by which we select between two perceptual events. This failure is said to occur if selection requires the use of analysers within the same subsystem, and the appearance of interference is not considered to challenge the notion of early selection by these authors. Indeed, the evidence presented above suggests greater Stroop interference from attended words than from unattended words, and this can be interpreted as suggesting that early selection against a word does moderate its processing. A similar conclusion comes from the modified Stroop task using picture/word interference. Here the viewer names a picture in the presence of a conflicting word: the advantage of using pictures is that a greater variety of pictures and words can be used, and the word is now a perceptual event which is physically distinct from the picture. Experiments that have used the picture/word version of the Stroop task have either used sheets containing several pictures, with the total sheet time as the dependent measure (Rosinski et al., 1975), or have used single-trial designs (see below). There are several restrictions imposed by using the total sheet time, namely: lists do not separate recognition time from pronunciation time; search patterns and eye movements can confound the reading time; there may be distraction from other pictures on the page; and the blocking of stimuli within experimental conditions may induce readers to adopt strategies that rely more or less upon the word according to whether it will be helpful or harmful in processing the picture. 
A single-trials design which used tachistoscopic presentations, and which avoided the restrictions associated with sheets of pictures and words, has been the basis for a series of our experiments looking at influences of unattended words (Briggs and Underwood, 1982; Underwood, 1976, 1977, 1981; Underwood and Briggs, 1984; Underwood and Thwaites, 1982; Underwood and Whitfield, 1985). The interest in this series is not so much the processing of the pictures or other target stimuli as the influence of the unattended words that accompany them. By varying the relationship between picture and word, and observing the effects of different relationships, we have inferred the extent of processing of the word. This is the same process of inference by which Lewis (1970), Bradshaw (1974) and others have concluded that unattended words are analyzed for meaning. In the first of the picture-processing studies it was found that unattended words that were related in meaning to the picture adversely affected the naming response (Underwood, 1976). The pictures were presented in a predictable location, and were therefore fixated by the viewers, and the words were printed to one side. Pictures and words were printed on the same tachistoscope cards, which were displayed for 60 ms. The task was to name the picture, which was a line-drawing of a common object, as quickly as possible. In comparison with all other experimental conditions, including nonassociated words, nonwords and pictures without distracting words, the associated words slowed down the picture-naming response. The subjects in this experiment could focus their attention upon the appearance of the picture, and attempt to select against the word. Even so, the meaning of the word was influential, and the experiment indicates that focusing attention upon one stimulus is not always sufficient to exclude the analysis of a second stimulus. 
The next experiment in the series qualifies the conclusion that all unattended words are analysed (Underwood, 1976, experiment 2). If subjects cannot know in advance the location of the picture, then associated words inhibit the naming
response, as before, but nonassociated words provide even greater inhibition. The uncertainty over the location of the picture would require that the viewers divide their attention between the two possible locations until the moment of presentation. At this point they could select the picture and ignore the stimulus in the other location. Printed in the to-be-ignored location was the unattended word, however, and the delay in selecting against this location would allow more of this unwanted stimulus to be processed than in the experiment where selection was made prior to stimulus presentation. These two experiments are similar to Dallas and Merikle's (1976) precue and postcue conditions, and the conclusions are also similar: there is greater processing of the unattended word when attention is divided. The explanation of these effects of unattended words is crucial for the well-being of early selection theories: can they survive the appearance of so many demonstrations of the analysis of unattended words? The effects of selectivity give support, in that unattended words are more disruptive if attention is more divided. The early selection theories could present these effects as a demonstration of the power of the attenuation process, but the unattended words are effective even with focused attention and this poses a substantial problem. Attenuation is seen at its strongest in the focused attention experiments of Bradshaw (1974), Bryden (1972), Dallas and Merikle (1976), Lewis (1970) and Underwood (1976); these studies will be discussed in later sections. Even under these focused attention conditions, associates of the target were seen to influence processing. Other experiments lead to the same conclusion, and will be described presently. Attenuation alone does not prevent the analysis of meaning.
There are basically two accounts of these effects of unattended words, one which describes them as arising from the recognition of all unattended words, and the other considers that only associated words are recognized. We have previously described these two accounts as the 'nonselective access hypothesis' and the 'contextual facilitation hypothesis' (Underwood, 1981). The effect to be explained, and which is such a potential problem for early selection theories, is the influence of an unattended associate under conditions of focused attention. Nonassociates are rarely reported as having an effect under these conditions. The nonselective access hypothesis considers that all unattended words gain access to the lexicon, regardless of their relationship with the target stimulus. The selective effect of an associate may then arise during the processes after recognition, and candidate processes include selection of the recognized lexical token, and selection of the response. As two stimuli activate their lexical representations, say, a target word and an unattended word, one of them may be selected as the basis for the organization of the response. If the two sources of activation in the lexicon point to semantically distant words then their separation may pose no processing difficulty, and selection of the lexical token would continue unimpaired. However, if the two words are associates, then the selection of the target may be impeded by the activity caused by its near neighbor. In this way different effects will be caused by the presentation of unattended associates and nonassociates, even though both types of words have been recognized to the level of their semantic properties. In this hypothesis it is the semantic similarity between target and distracter that results in a difficulty in separating them. An alternative account of the effects of unattended words is provided by the contextual facilitation hypothesis. 
In this case only associated words are analyzed, and it is by virtue of their association with the target that the analysis can occur.
When this analysis has occurred, the further processing of the target can be impeded. The sequence of events would be as follows. First the target is recognized lexically, and at this time the distracter would gain only primitive analysis. Its existence would be noted, together with analysis of physical characteristics such as loudness, location, size or pitch of voice. At this stage there would be little or no analysis of meaning of the distracter. As the target is processed then contextual facilitation would become available, perhaps through the process of spreading activation within the lexicon (Anderson, 1983; Collins and Loftus, 1975; Meyer and Schvaneveldt, 1971; Warren, 1977). The process envisaged here is one whereby recognition thresholds are reduced whenever an associated stimulus is processed. Treisman (1960) appealed to a similar process in her experiment reporting that sometimes plausible unattended words are shadowed. In this case the context prior to an unattended word had reduced the recognition thresholds for these unattended words. Even though they had been attenuated, sufficient information had accessed their lexical representations for lexical recognition with the temporarily reduced threshold. And so it might be with the case of simultaneous distracters. The processing of the attended stimulus may act to reduce the recognition thresholds of all associated words, and leave unassociated words unaffected. If one of these associates is available in the environment, then even its attenuated form may be sufficient to exceed the reduced recognition threshold, as it did in Treisman's experiment. At this stage in the sequence, the target is fully recognized, the thresholds of target-associates have been reduced, and an unattended associate can activate its lexical representation. Nonassociates are unable to progress this far in the sequence, and have no effect upon the processing of the target.
An associate can be recognized, and once recognized it is available to influence future processing of the target by the same routes as suggested by the nonselective access hypothesis. The associate may impair selection of the lexical token, or selection of the response, but a nonassociate can have no effect whatsoever. The two hypotheses provide different accounts of the influence of unattended words that are associates of currently attended words. In several experiments, an inhibition effect has come from nonassociates (Dallas and Merikle, 1976, postcueing conditions; Underwood, 1976, experiment 2; Underwood, 1981), and this effect is informative. The contextual facilitation hypothesis cannot account for the appearance of inhibition from nonassociates, given that it suggests that unattended words are recognized only by associative facilitation. The nonselective access hypothesis assumes that all unattended words gain lexical access, associates and nonassociates alike, and can accommodate inhibition effects from nonassociates by suggesting that they have their effect at a processing stage different from that of associates. The effects of nonassociates are greatest under divided attention and when exposure durations are increased: this might indicate that the effects are most apparent when the nonassociate is available for verbal report and therefore able to produce response competition. In these circumstances the viewer might be expected to be aware of the identity of the unattended word. Associates and nonassociates would produce response competition, of course, and so the hypothesis is left to explain why associates should produce less inhibition than nonassociates. One account is to say that these two kinds of words produce the same amount of inhibition at the response selection stage, and that the difference between them arises earlier in the sequence of processing when associates inhibit targets less than nonassociates.
The hypothesis does not identify the stage at which we should observe reduced inhibition from associates. It identifies encoding and lexical access as a stage at which associative facilitation can occur, and it identifies response selection as the location of response competition, but to account for inhibition from associates a third stage must be implicated. One possibility is the stage at which the recognized word is selected from the lexicon in preparation for the response. If the target and its associated unattended word generate cross-facilitation, then both lexical representations will become more activated than if a nonassociate had been the unattended word. This enhanced activation might then facilitate the selection of the target from the lexicon, in the preparation of the response.
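The threshold logic of the contextual facilitation hypothesis described above can be illustrated with a toy sketch. It is not a model taken from the studies cited: the function name and all numerical values (baseline threshold, size of the threshold reduction, degree of attenuation) are assumptions chosen only to show how an attenuated associate could exceed a lowered threshold while an equally attenuated nonassociate could not.

```python
# Toy illustration of the contextual facilitation hypothesis.
# All constants are hypothetical values chosen for illustration only.
BASE_THRESHOLD = 1.0      # evidence needed for lexical recognition
PRIMING_REDUCTION = 0.5   # threshold drop for associates of the attended target
ATTENUATION = 0.6         # fraction of evidence surviving for an unattended word


def distracter_recognized(is_associate: bool, attenuation: float = ATTENUATION) -> bool:
    """Return True if an attenuated distracter exceeds its recognition threshold."""
    # Processing the target lowers the threshold of its associates only.
    threshold = BASE_THRESHOLD - (PRIMING_REDUCTION if is_associate else 0.0)
    # Full-strength evidence is taken as 1.0 and scaled by the attenuation.
    evidence = 1.0 * attenuation
    return evidence >= threshold


print(distracter_recognized(is_associate=True))   # True: lowered threshold is exceeded
print(distracter_recognized(is_associate=False))  # False: baseline threshold is not reached
```

Under these assumed values only the associate is recognized, which is the pattern the contextual facilitation hypothesis predicts. The nonselective access hypothesis would instead correspond to both words being recognized, with the difference between them arising at a later selection stage.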
2.2
Associative Facilitation or Associative Inhibition
These hypothetical accounts of the progress of targets and distracters both acknowledge the effects of divided attention while at the same time assuming that unattended words are recognized at some level of processing. The effects of dividing attention are, first, to allow greater inhibition from nonassociated distracters, and second, to delay the response to the target. With focused attention, the naming or categorization of the target is faster, and unattended words are less distracting. The pattern of distraction is also seen to change, with nonassociates becoming less effective or noneffective. The effect of an associate depends upon the specific task being performed, and upon the conditions of presentation. Associative facilitation or inhibition may be observed, and the direction of the effect appears to depend upon the difficulty of encoding of the target. When the target and distracter can be seen clearly, then associative inhibition has been observed in a variety of tasks (e.g. Underwood, 1976; Underwood and Briggs, 1984; Underwood and Thwaites, 1982; Underwood and Whitfield, 1985). In the picture categorization experiments reported with Alison Whitfield, associative inhibition changed to associative facilitation when we made the subject's task more difficult by masking the stimuli. In the word-naming experiments of Allport (1977) and Dallas and Merikle (1976), associative facilitation was found with masked presentations in an identification and a speeded response task, respectively. With difficult target viewing, the encoding of the target may be aided by presentation of an unattended stimulus which shares some semantic feature with the target. There may be cross-facilitation through spreading activation, and this would enhance the selection of the target from the lexicon. 
The stage of encoding would be less likely to be influenced under easy viewing conditions, because the target would be recognized before the attenuated distracter, and would have progressed to a later stage in the sequence. The inhibition would then be seen to occur during one of the stages involving the selection and organization of the response. The notion of two interference effects is supported by data recently reported by La Heij, Dirkx and Kramer (1990). While an influence upon the encoding stage may produce facilitation from an associated item, an influence of the same item upon a decision stage may produce inhibition. In these experiments subjects named briefly presented pictures, and printed words acted as primes. The related primes were members of the same semantic category, e.g. the picture of a chair would be
accompanied at different times by the word 'bed' or the word 'table' or by control words unrelated to the picture. La Heij et al. reported two effects, one involving differences between priming words and one involving the variable time interval between presentation of the word and presentation of the picture (the stimulus-onset asynchrony, or SOA). Words that were members of the same category as the picture, but which were weak associates (chair/'bed') only produced inhibition effects in the picture-naming task. The size of this inhibition effect varied according to the SOA, with the largest effect when the word was presented shortly after presentation of the picture. There was no reliable inhibition effect for a weak associate appearing before the picture. This leads to the suggestion that the inhibition effect occurs late in the sequence of picture recognition and name selection. The inhibition effect is considered to result from interactions between word representations in the output lexicon, and prevents the simultaneous activation of two phonological forms. This inhibition effect is the only influence that can be observed with weakly associated words. Although they will be recognized and will be active during picture name selection, any early activity will not be sufficiently strong to aid name selection. Facilitation can arise only when a strong associate is present. In contrast to the 'inhibition-only' effect found with a weak associate, La Heij et al. reported that a strong associate of the picture (chair/'table') produced a facilitation effect when presented before the picture and inhibition when presented after the picture. Not only must a strong associate be present for facilitation to be observed, but the associate must appear before the target picture. This facilitation effect may result from lexical priming, with identification of the word aiding selection of the picture name in the output lexicon.
An alternative interpretation would be to suppose that the facilitation effect could arise from identification of the word aiding the identification of the picture itself. The nature of the facilitation effect is considered in the following section, which focuses upon automatic word encoding.
2.3
Associative Facilitation by Priming
The classic demonstration of priming effects consists of a presentation of two stimuli that are semantically related. One item can be observed to aid the processing of the second (target) item, with the two presentations usually, but not necessarily, being asynchronous. When processing of a target item is facilitated by the prior presentation of a related item, then the target is said to have been primed. By looking at the effect of a related item upon a target, relative to the effect of a neutral item, the direction of the priming effect can be observed as the nature of the relationship is varied. Also, by looking at the effect of an unrelated item relative to the neutral item, we get a clear picture of the direction of facilitation and inhibition effects. Suppose that a related prime results in a faster response than does an unrelated prime. This alone does not tell us whether the related prime generates a facilitation effect or whether the unrelated prime generates an inhibition effect. The neutral prime serves an invaluable function here. It may have an effect that is similar to the unrelated word (in which case we can conclude that the related item results in facilitation), or it may be most like the related word (in which case the
related item has no effect but the unrelated item causes an inhibition effect). The importance of the neutral item is in providing a baseline against which two other conditions can be measured. Using this reasoning, Neely (1977) presented subjects with a letter string that was to be classified as a word or nonword, preceded by a word (prime) that was semantically related or unrelated to the word targets. The effects were compared with the effects of a neutral prime (a group of Xs). The results suggested an automatic priming effect, in that word targets were responded to faster following a related prime than following a neutral or unrelated prime. Explanations for these results, along the lines of an automatic effect, consider that activation from the lexical entry of the prime spreads to the lexical entry of the target and increases its activation (Collins and Loftus, 1975; Posner and Snyder, 1975). Thus the recognition of an item means less evidence is needed for a related item to exceed a recognition threshold; the first item can be said to be aiding the recognition of the second item. This priming effect has also been found in experiments where the prime and target were presented together (Meyer and Schvaneveldt, 1971), in experiments using a naming task (Seidenberg et al., 1984; Warren, 1977), in experiments using semantic categorization decisions (Guenther, Klatzky and Putnam, 1980), in experiments using sentence primes instead of single word primes (Stanovich and West, 1981, 1983a), in experiments using single-letter primes and targets (Posner and Snyder, 1975), and in experiments using pictures instead of words as primes/targets (Carr et al., 1982). The evidence presented in the previous section suggests that making a task harder (by masking the stimuli) produces facilitation by related items. A similar suggestion has been proposed by Carr et al. (1982) to explain the differing effects of picture/word priming in naming and categorization. Carr et al.
found in a naming task that pictures were more primable than words, whereas Guenther et al. (1980) found in a semantic categorization task that words were more primable than pictures. These findings led Carr et al. to conclude that words were more automatically named, while pictures were more automatically categorized, and that priming aided the more difficult task. Stanovich and West (1983a) considered a similar viewpoint to explain their findings that word frequency and priming interacted; low-frequency words showed greater priming effects. They considered that the harder it is to process a word, the more associative priming effects will aid the processing of that word; they put forward their views as an interactive-compensatory model of word recognition. A further source of facilitation effects is suggested by Neely (1977). Subjects were given category name primes that were followed by an exemplar of that category on most trials ('bird'-'robin'); this showed the normal facilitation from related primes soon after the prime. Other subjects though were given category names that were followed by an exemplar from another category on most trials ('body'-'door'). Here priming effects were found when a large enough gap between prime and target allowed subjects to switch their attention to the different category. This suggests another source of facilitation effects caused by consciously controlled strategies: strategies that can be described as being intentionally initiated and given attention during their execution. Similar distinctions between automatic and strategic facilitation effects were proposed by Posner and Snyder (1975) using letter stimuli. The occurrence of such strategic effects within tasks such as lexical decision (word/nonword decision), as used by Neely (1977), can produce problems in deciding the
source of effects within word recognition, so much so that some have questioned the assumption that such effects are due to the normal functioning of the word recognition system. For example, Balota and Chumbley (1984) proposed that lexical decisions could be made by a familiarity assessment, outside lexical processes. Other accounts of priming effects consider that the effects are due to some sort of a post-recognition coherence check (de Groot, Thomassen and Hudson, 1982), due to searches through context-based lists and to inhibition effects from integration processes after recognition (Forster, 1981), and due to intentions to respond in a particular way (Neumann, 1984). In conclusion, several stages of facilitation are suggested by the results from the studies discussed so far. First, facilitation can occur at the stage of accessing a lexical entry. This may be via a process of automatic spreading activation from a related lexical entry. The evidence for this is that facilitation is apparent early in the processing of the target (La Heij et al., 1990), suggesting initial encoding processes, and occurs soon after the appearance of the prime (Neely, 1977), in turn suggesting a fast-acting process. Items that are harder to process show larger priming effects (Stanovich and West, 1983a). A second source of facilitation may be at a stage of selecting a recognized item from the lexicon in preparation for a response. This is suggested by the differing effects of associated and nonassociated words on responses to other items (Underwood, 1976). Responses that are harder for a given stimulus show larger priming effects (Carr et al., 1982). A final stage at which facilitation may occur is one at which processing is under relatively more attentional control. Here subject strategies can show effects on responses to stimuli (Neely, 1977; Posner and Snyder, 1975).
A distinction between automatic priming and attentional priming is thus suggested, with the time between prime/target onset and the type of task performed being important variables within these effects. However, there are alternative viewpoints for the source of these effects, as mentioned. The distinction may be made clearer by the effects of unattended stimuli, as discussed in the previous two sections. However, the findings here are not entirely clear-cut, and in the following sections we will discuss these effects further.
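The neutral-baseline reasoning set out earlier in this section can be made concrete with a small sketch. The function name and the response times below are hypothetical, invented purely for illustration and not taken from any of the studies cited; the point is only that the related and unrelated conditions are each compared with the neutral prime, so that facilitation and inhibition can be separated.

```python
def priming_effects(rt_related: float, rt_neutral: float, rt_unrelated: float) -> dict:
    """Express the related and unrelated conditions relative to the neutral baseline.

    Positive values mean faster than neutral (facilitation);
    negative values mean slower than neutral (inhibition).
    """
    return {
        "related_vs_neutral": rt_neutral - rt_related,
        "unrelated_vs_neutral": rt_neutral - rt_unrelated,
    }


# Hypothetical mean response times in milliseconds (invented for illustration).
effects = priming_effects(rt_related=520.0, rt_neutral=560.0, rt_unrelated=560.0)
print(effects)  # {'related_vs_neutral': 40.0, 'unrelated_vs_neutral': 0.0}
```

With these invented values the neutral prime behaves like the unrelated prime, so the 40 ms difference would be read as facilitation from the related prime. Had the neutral prime matched the related prime instead, the same 40 ms difference would be read as inhibition from the unrelated prime.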
2.4
Attention in Simultaneous Presentations
The automatic effects of unattended messages have been investigated extensively with presentations of simultaneous speech and presentations of visual displays containing more than one element. When an unattended element, whether spoken or visual, influences the processing of an attended element, then we can conclude that recognition of the element does not require attention and may be automatic. Interactions between items can also be observed in dichotic listening tasks. Here subjects are presented with messages to both ears at the same time and the task is usually to attend to one message and repeat it out loud ('shadowing'). The time taken to shadow each word can be measured, and the effects on this of presenting related or unrelated words in the unattended message observed. Interference between related words was demonstrated in Lewis's (1970) experiment, in which listeners shadowed one message from a dichotic presentation of lists of words, and their shadowing responses were timed. The unattended words, to which no response was required, were not always unrelated to the words presented at the
same time in the attended message. In comparison with shadowing latencies when the two members of a dichotic pair were unrelated, which can be considered as the baseline control, unattended synonyms slowed down the shadowing response. Lewis reported other effects of unattended meanings and, for example, antonyms tended to speed up the shadowing. These results indicate that the meaning of an unattended word is recognized at some level of analysis. This is not to say that the listener had necessarily been aware of the meaning of the unattended word, but that activation had occurred in the part of the processing system that responds to lexical meaning. Recognition in this sense corresponds to activation which is selective to a specific feature of a word. In this case, the feature is lexical meaning. It is not entirely clear why antonyms of attended words should have exactly the opposite effect of synonyms in Lewis's experiment, and similar experiments have not confirmed the direction of the effect. The influence of unattended words upon shadowing has been confirmed, however, in experiments reported by Bryden (1972) and by Treisman et al. (1974). Bryden extended Lewis' result by demonstrating that an unattended antonym which is presented prior to a related shadowed word, rather than simultaneously with it, also speeded up the shadowing response. Synonyms also produced a facilitation effect, in contrast to Lewis' inhibition effect, and in contrast to Treisman et al., who also found inhibition with synonyms. A significant feature of the Treisman et al. result is that the inhibition effect appeared only for words near the beginning of a dichotic list. When they appeared a few seconds after the beginning, the difference between synonyms and the control words was eliminated. Treisman et al. interpreted their positive result as being an indication of serial processing of the two related messages at a time when capacity was not fully occupied by one message. 
At the beginning of the list both messages could be analyzed, and the synonym relationship recognized. Another way of looking at their result is to consider it as a function of the increased focusing of attention during the presentation of the dichotic messages. At the beginning of the lists the task for the listener is to select the message that is to be attended, and at this point there would be a small amount of sampling of the to-be-unattended message. This sampling might be necessary for the listener to confirm that the message does not possess the features of the to-be-attended message. It was during this period of selection, when attention was not fully focused, that unattended words were most effective in the Treisman study. By the time attention was focused, just a few seconds into the list, the synonyms were ineffective. This explanation would indicate an important relationship between the focusing of attention and the semantic analysis of unattended messages. Evidence consistent with the notion of semantic processing of unattended words comes from experiments using a slightly different approach, but one which again looks for an influence of a completely ignored unattended message. Smith and Groen (1974) and Traub and Geffen (1979) reported experiments in which subjects heard short dichotic lists, with instructions to attend to the words presented in one ear only. A test word was then presented, and the subjects required to indicate whether or not this word had been in the attended list. The test words of interest are those that were not in the attended list (i.e. negative probes), but were in the unattended list. Smith and Groen found that these particular test words gained slower response times and higher error rates than negative probes that had not been in the unattended list. Their result held only if the words in the unattended
list were members of the same semantic category as the attended words. Traub and Geffen confirmed this result, and also looked at the effects of increasing the focusing of attention. By precueing the attended words with a sequence of five digits, they increased the listener's ability to select the appropriate words, but this did not change the interference effect. Even with a good selection cue their listeners were influenced by the meaning of the unattended words, although Traub and Geffen attributed their results to acoustic rather than semantic analysis. Unattended lists were certainly not processed to the same categorical level of analysis as the attended lists, this conclusion coming from their second experiment. Listeners in this experiment heard lists of words taken from one category or from different categories. When the words came from one category, not only would they act to prime each other, but the shared category feature could be used by listeners as an organization aid. The homogeneity of the unattended list did not influence the response to negative probes taken from that list, but inhomogeneous attended lists produced slower and less accurate responses. The deficit associated with inattention was the failure to recognize the common category of the words in the list. If the meaning of an unattended message is recognized only when attention is poorly focused, then the early selection model of Treisman (1960) and Broadbent (1971, 1982) gives a satisfactory account of the findings that unattended information did not influence performance. Johnston and Wilson's (1980) comparison of divided and focused attention also supports this conclusion with a dichotic listening experiment. Their experiment showed that the detection of a target word was affected by the meaning of a simultaneous word only when attention was divided. When listeners could focus upon one message, the unattended words did not influence performance. 
Similarly, a study by Kidd and Greenwald (1988) found that recall of attended stimuli was not affected by presentation within an unattended list, suggesting that repetition effects require attention to become effective. The results from studies of visual presentations of words and pictures also suggest a role for attenuation during input processing, but that attenuation does not preclude processing of the unattended stimulus (Underwood, 1976). The discussion so far presents evidence that processes thought of as automatic can themselves be products of, and influenced by, attention, the very process that automaticity is supposed to spare. It also suggests that plausible automatic processes can show interference effects from other processes, and interfere with other processes. There is also evidence that well-practiced automatic skills are not free from subject control, e.g. findings that typing and speaking can be inhibited quickly when errors occur (Ladefoged, Silverstein and Papcun, 1973; Levelt, 1983; Logan, 1982; Rabbitt, 1966). Though not damning evidence, these findings suggest major problems with what can be considered an automatic process or not. The view that automaticity is simply memory retrieval (Logan, 1988) seems attractive under these circumstances. The findings that automaticity can depend on attention can be accounted for if it is considered that attention is needed for retrieval of information from memory. The findings of interference effects from and within automatic processes can be accounted for if it is considered that two tasks use the same memory retrieval processes, or the same algorithm procedure. Intentions and control factors can be accounted for by considering that different input information affects the type of output produced, and by considering that subjects can choose to respond to that output or not. Interference can then be accounted for by having to make choices between different outputs. The fact that Logan's model considers that
storage and retrieval of memory traces is an obligatory and unavoidable consequence of attention accounts for these findings quite well. Cohen et al. (1990), however, argue that the characteristic of obligatory retrieval in Logan's model suggests that interference would be expected if time is allowed for a slower process to be accomplished. Experiments suggest that interference on color word reading from colors does not occur if time is allowed for color processing to occur by presenting the color before the color word (Glaser and Glaser, 1982), or color word processing is slowed down by presenting the color word upside-down and backwards (Dunbar and MacLeod, 1984). It is also difficult to see why slower processes should interfere with faster processes, as in MacLeod and Dunbar's (1988) study. These findings suggest at least that simple processing time accounts of Stroop effects need considerable revising. It should also be noted that Cohen et al.'s simulations do not conform to all of these data. We have seen that so-called automatic effects can be affected by attention. This, however, does not necessarily suggest that automaticity depends on attention. The alternative account of where the cognitive system selects percepts (the late selection theories) suggests that selection occurs relatively late in the processing of stimuli. The model of Deutsch and Deutsch (1963, 1967) is a notable example of this viewpoint. They considered that all inputs are analyzed to a relatively high level of processing and the results are used to select stimuli for further processing. This model was extended by Norman (1968, 1969) who considered that inputs were relatively completely encoded and that the pertinence of a stimulus determines the order of further processing, pertinence being itself determined by the sensory encoding and analysis of previous inputs. 
These models gain support from demonstrations of the processing of unattended information, as with, for example, the dichotic listening studies of Lewis (1970), Bryden (1972) and Smith and Groen (1974). Dichotic listening tasks do not provide the only evidence for the possibility of processed unattended information; similar evidence has been found in investigations of simultaneous visual presentations. For example, Bradshaw (1974) demonstrated the potency of unattended words in brief visual displays with a task that required subjects to make judgements about the meanings of attended words. An attended word was presented to a predictable location in the visual field on each trial, and thereby gained the benefit of an eye-fixation during presentation (that is, it was presented to the fovea). This word was polysemous (e.g. 'bank') and was accompanied by a second word that offered disambiguation (e.g. 'water' or 'money'). The unattended second word was presented either to the right or left of the target. As the 125 ms display went off, the subject was presented with a forced choice task in which they had to select one of two meanings of the target word. Subjects tended to bias their interpretation of the target in favor of the meaning provided by the accompanying word, and this result held for those accompanying words that were reportable, and for those that were not. Bradshaw concluded that the parafoveal second word had received semantic processing in the absence of conscious identification. The finding that unattended information can affect the processing of attended information is a controversial one, but further support is provided by the studies of Dallas and Merikle (1976) using related and unrelated precued and not precued target and unattended words, and of Underwood (1976) using target pictures in predictable or unpredictable locations with or without related and unrelated words
and nonwords (see section 2.1 for more discussion of this problem). In both cases evidence was found that suggests the unattended word affected target naming, and in the Underwood (1976) study this effect varied depending on whether attention could be focused on a particular position because of the predictable location. However, other studies do not support these conclusions (Paap and Newsome, 1981; Rayner, Balota and Pollatsek, 1986; Stanovich and West, 1983b), finding that items presented to the parafovea did not affect the processing of a foveal stimulus. The problem of how much information is picked up in peripheral vision and to what level this is coded is the subject of the following section.
2.5
Parafoveal Processing: Automatic Orientation?
Experiments investigating processing of information within the parafovea during reading suggest that the further the information is from the center of fixation the less information can be picked up, or the less processing is performed upon that information. For example, McConkie and Rayner (1975) found, by blanking out letters to the right of fixation, that reading speed is optimal if 16 letters to the right of fixation are available for processing; with around 14 to 16 letters, if word length is kept constant, reading speed is not affected. Thus changing the word 'can' to a word of similar length like 'dog', during an eye movement, did not alter reading speed. This suggests word length, and little more, is acquired up to this point. Below 12 letters, changing the word during a saccade did affect the subsequent fixation. Along similar lines, Underwood and McConkie (1985) found that letter information up to 10 letters to the right of fixation affects eye movements, whereas McClelland and O'Regan (1981) found that previews of the target item (e.g. 'model'), presented five character spaces from fixation, speeded up naming compared with previews of a highly similar item (e.g. 'molel'). It seems detailed information about a word is picked up in this area. Similar evidence for a reduction in detail picked up in the parafovea as distance from center of fixation increases has been found in picture processing (Nelson and Loftus, 1980), and is suggested by the reduction in Stroop interference effects the further a distracter is from the center of fixation (Gatti and Egeth, 1978). Further findings within the reading literature suggest that information in the parafovea of vision can be processed to a large degree, if not completely.
This comes from findings that, in normal reading, words, particularly function words, can be skipped within sentences without detrimental effects to reading (Carpenter and Just, 1983; O'Regan, 1979; Rayner, 1977) and that skipped words are not inferred from the rest of the text (Fisher and Shebilske, 1985). This suggests that a great deal of processing can occur in the parafovea, but that foveal processing provides more detailed processing. This is supported by the fact that reading by parafoveal processing alone is very difficult (Rayner and Bertera, 1979). These findings suggest that some process can accept information before the eyes land on that information, and there is evidence to suggest that the intake of information from the parafovea shows characteristics associated with automatic processes. For example, Jonides (1981) presents evidence that information in the parafovea does not draw heavily on cognitive resources compared with central information. This is suggested by the finding that there is no effect of increasing
G. Underwood and J. Everatt
memory load on detecting peripheral information, whereas there is on detecting central information. Jonides also found that it was harder to suppress the cost-benefit effects of valid or invalid cues when they were in peripheral vision compared with when they were centrally located, and that expectancies show significant effects on central cues but not on peripheral ones. These findings led Jonides to conclude that automatic processes occur in peripheral vision, and that there is a relationship between attentional shift and eye movements. Similar views have been expressed by Eriksen, Webb and Fournier (1990) and Shepherd and Müller (1989). Using a letter discrimination task in which two letters had to be searched for and responded to in an array of other letters, Eriksen et al. found that the effect of changing the letter in the parafovea depended on whether or not it was a target letter. The two potential positions in which a target letter could appear were cued, one after the other. The interesting effects came when subjects were fixating the first cued position, and the letter in the second cued position was changed to a target letter. The extent to which this second letter was processed before being changed could then be studied by analyzing the effect of the change on responses to the target letters. Eriksen et al. found that up to 50 ms after the second position was cued there was no effect of changing the letter. At about 80 ms, though, an effect of changing the letter from one target letter to the other was found, but not of changing the letter from a non-target letter to a target letter. This effect was considered to be due to an automatic system processing the letter at the second position while an attentional system is still processing the letter at the first position.
If the letter in the second position is a target item, the automatic system produces a response bias such that, when the attentional system arrives and processes the word itself, a response conflict will occur between the automatic system and the attentional system. This will not occur in the situation where the letter is changed from a non-target to a target letter because here the automatic system will not have produced a response bias. If this is the case it suggests a fast-acting system that possibly processes information to output, and, if Eriksen et al.'s interpretation is correct, this precedes attentional processing. Shepherd and Müller found that presenting subjects with cues to fixation produced initial wide-ranging facilitation of stimulus detection which over time decreased for all locations except the location specifically cued. This they proposed was due to a focusing of an initially broad beam of attention onto a particular location. A more rapid focusing process is suggested when peripheral cues are used. Here facilitation for the specific location was found as early as 50 ms after cue onset. Shepherd and Müller suggest that these effects may be due to a process involved in programming saccadic movements to a particular location or stimulus. This process rapidly focuses in on a position so that a saccade to that position can be programmed, and more attentional, finer-detailed processes follow. This suggests that eye movements are associated with attentional shifts. Similar views have been expressed by Kennedy (1983), Nelson and Loftus (1980), and Henderson, Pollatsek and Rayner (1989). Although there is evidence that attentional shifts can occur without eye movements (Posner, 1980; Posner, Cohen and Rafal, 1982), the argument expressed by Kennedy (1983) and Henderson et al. (1989) is that it is more usual for attention and eye movements to be closely linked. There is, for example, evidence that when
Automatic and controlled information processing
shifts of attention are induced, eye movements are also induced. Cooper (1974) and Kahneman (1973) present findings that if subjects are provided with a concurrent auditory stimulus, or questions, about an object, eye movements are found toward those objects, even when fixation of those objects is not necessary for the performance of the experimental task. This suggests the possibility of an automatic triggering of inspection processes, or an orientation response induced by attending to a particular item, and may be related to Shiffrin and Schneider's (1977) view of an automatic attentional response. Whether these are the same processes or not remains to be seen. (An orientation response within the auditory system may be suggested by the attentional switch to unattended information that occurs when highly important information, perhaps highly learnt information such as the listener's own name, is presented in an unattended message. This may be related to the findings from dichotic listening experiments such as Moray (1959).) The value of parafoveal information within reading is suggested by the findings that reading is considerably slowed down when parafoveal information is removed (McConkie and Rayner, 1975; see discussion by Rayner and Pollatsek, 1987). Even the removal of word boundaries in the parafovea severely impairs eye movements (McConkie and Rayner, 1975; Pollatsek and Rayner, 1982). Parafoveal processing also seems to be valuable within picture processing. In the Henderson et al. (1989) study, pictures were presented at different locations on a screen and subjects inspected these in order to answer questions about them. In one condition subjects were given previews of the pictures, in another they were not; only the picture at fixation was presented to the subject. The results showed that picture preview produced about a 100 ms advantage in fixation duration on the picture, compared with the fixation durations when no preview was available. 
In a second experiment there was evidence that this effect was mainly due to the availability of previews of the picture that was to be fixated next. This advantage of preview in picture recognition is very similar to that referred to above in sentence reading. Previews of information allow for faster processing of that information. Henderson et al. propose that this is because covert visual attention shifts to a parafoveal location and processes that information to some extent. This processing can be complete (if, say, the information is well known, as in the skipping of function words mentioned above) or not complete, in which case foveal processing will take over. Since a certain amount of processing will have been accomplished parafoveally then less processing need be accomplished foveally. This would generate a preview advantage. Whether the covert attention mechanism proposed by Henderson et al. is the same as the automatic mechanism discussed above remains to be seen, but both are considered to have the same function: programming saccadic movements. Henderson et al. (1989) refer to the views of Morrison (1984), who considers that attention (or covert attention) moves to the next piece of information, and is used to trigger off an instruction to move the eyes to a new position. The information used in this decision may help us to decide what information is processed parafoveally. Evidence from studies of eye movements during sentence-reading tasks suggests that the initial fixation on a word is affected by the location of information within that word (Hyönä, Niemi and Underwood, 1989; Underwood, Clews and Everatt, 1990). Information in the sense used here relates to the novelty of the letter sequence within a word. The more novel a letter sequence within a word, the more
distinctive that word is, because of that particular letter sequence. The less novel a letter sequence is, the greater the number of other words that possess that sequence, and the less likely it is that the word can be recognized from that letter sequence. Letter sequences that occur in many other words are considered to be redundant, because processing of them alone will not be sufficient for the particular word to be identified. The finding that initial fixations are affected by where these more informative parts of a word occur leads to the suggestion that processing can move ahead of fixation. This processing results in the identification of some features of the parafoveal display, and triggers a saccade to the informative parts of the word. This suggests a process that is sensitive to the parafovea, which may be used to find distinctive features (informative parts) of to-be-fixated pieces of information to allow them to be processed more easily. A second possibility is that well-known pieces of information can be processed by the parafoveal processes and so foveal processes are directed away from such well-known words or parts of words. There is also evidence that within picture recognition eye movements may be attracted to informative parts of the picture, and that this process is vital for information processing. For example, Mackworth and Morandi (1967) and Loftus and Mackworth (1978) found that subjects quickly fixate on an informative area of a picture when that informative area is defined in terms of subject ratings, or the probability that a particular detail belongs in a particular scene: the less probable, the sooner the eyes land on that piece of information, suggesting that parafoveal processes are picking out unusual information within a scene.
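The notion of letter-sequence informativeness described above can be made concrete with a small illustration. The following is only a sketch under stated assumptions, not the procedure used in the studies cited: it treats a three-letter sequence as redundant in proportion to the number of words in a lexicon that contain it, and the six-word lexicon, the choice of trigrams and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy lexicon, for illustration only.
LEXICON = ["underline", "understand", "undertake", "undergo", "pipeline", "baseline"]


def trigram_counts(lexicon):
    """Count, for each trigram, how many words in the lexicon contain it."""
    counts = Counter()
    for word in lexicon:
        # A set ensures each trigram is counted once per word.
        for trigram in {word[i:i + 3] for i in range(len(word) - 2)}:
            counts[trigram] += 1
    return counts


def informativeness_profile(word, lexicon):
    """Return (trigram, number of words sharing it) for each position in the word."""
    counts = trigram_counts(lexicon)
    return [(word[i:i + 3], counts[word[i:i + 3]]) for i in range(len(word) - 2)]


def most_informative_position(word, lexicon):
    """Index of the first trigram shared with the fewest words in the lexicon."""
    profile = informativeness_profile(word, lexicon)
    return min(range(len(profile)), key=lambda i: profile[i][1])


if __name__ == "__main__":
    for trigram, shared in informativeness_profile("understand", LEXICON):
        print(trigram, shared)
    print(most_informative_position("understand", LEXICON))
```

In this toy lexicon the opening sequence of 'understand' is shared with three other words and is therefore redundant in the sense used above, whereas the sequences from the fourth letter position onwards are unique to the word. A realistic measure would require a large lexicon and probably frequency weighting, neither of which is attempted here.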
Loftus (1981) found that with various 50 ms tachistoscopic presentations of a picture, performance on a following recognition test did not improve with the number of presentations, unlike in the case where 100 ms presentations were used. Loftus suggested that this was because the smaller presentation duration was not long enough to allow peripheral scanning, which is necessary to determine where a subsequent fixation should occur. Subsequent fixations were thus at random locations instead of at informative locations. With longer presentations there was enough time to perform this peripheral scanning and determine where an informative part of the picture was located for future foveal analysis. The evidence reviewed here suggests that some information can be extracted from words in the parafovea. It suggests that parafoveal information can be used to program a saccade to the next point of regard. It also suggests that the important features of the information around fixation are chosen by this process as the points to be fixated next. Whether this process is controlled by redundancy of information (whereby well-known pieces of information would be processed parafoveally, and so foveal processes can move onto other areas of information), or by informative pieces of information (whereby unusual/informative areas would be recognized by parafoveal processes, and the analysis from this is used to attract foveal vision toward them) is yet to be decided. It is also possible that both processes occur to some extent. In either case parafoveal processing seems to result in deep analysis of the information dealt with by those processes. Let us return to the point of interest, in deciding whether unattended parafoveal information can affect attended foveal information. 
For parafoveal processing not to interfere with foveal information it would have to be concluded that all processing of the attended item is accomplished before parafoveal processes start, or that parafoveal processes and foveal processes do not interfere. The data presented here suggest that parafoveal processing seems to be fast, and those of Eriksen et al. (1990) suggest that parafoveal
displays can be processed to a response biasing stage where this bias can interfere with later attentional processes. The arguments of previous sections suggest this stage as the most likely for Stroop interference to occur (see section 2.1, but, as mentioned, Shaffer and LaBerge (1979) suggest this interference can occur sooner). The fast and biasing nature of this parafoveal process then seems a plausible candidate to produce Stroop-like interference in word-processing. This process of orienting attention towards a particular location suggests a process in use in much of what we do in everyday life (in visual recognition and perhaps auditory recognition) which may possess the features of an automatic process, and if this process seeks out unusual, informative parts of a stimulus it suggests an automatic process that is more than the memory retrieval processes suggested by Logan's (1988) model. The actual features of this process also may give us an idea of how to distinguish automaticity from more controlled processes. Thus, if we return to the idea of a continuum of automaticity, we may consider this orientation process as being at, or near, the automatic end of the continuum. In the introduction to this chapter we talked about reading a page of text. If attention is not paid to the meaning of the text then understanding will almost certainly be lost. This leads us to the question of processing near the other end of the continuum, when we attempt to understand complex stimuli which are themselves composed of familiar components. This is the problem of determining the role of attention in language comprehension.
3 FAILING TO INTEGRATE UNATTENDED MESSAGES
To understand the meaning of a sentence it is necessary to integrate the meanings of each of the words. This is to perform a comprehension calculation which selects the most appropriate shade of meaning of each of the words and combines them into an underlying meaning which reflects the relationships between the words. In Chomskian terms, this process involves the recognition of the deep structure of the sentence. The relationship made explicit by this comprehension calculation would include, for example, the assignment of a subject and an object to a verb, the recognition of anaphora, and the recognition of the propositional structure of the sentence. The product of the calculation is an interpretation of what the speaker or writer meant. What we will consider next is whether this process requires attention, or if comprehension can be independent of attention. Early selection theory is quite clear on this question: if attention is required for perception then there should be a comprehension deficit suffered by unattended sentences. Late selection theory makes a less straightforward prediction: if the attended and unattended messages compete for the same postperceptual processes (including storage; see Deutsch and Deutsch, 1967; Norman, 1968), then only one message will be understood. The comprehension calculation requires storage, and Baddeley (1979) has specifically discussed the cognitive resources of the 'working memory' requirements in reading comprehension. To integrate the words at the beginning of any sentence with the words appearing towards the end of it, all of the words must be stored, and if storage is a process that is available only to selected messages then we should see no evidence of the comprehension of
unattended messages. This was the case with Cherry's (1953) listeners, but they were questioned some time after presentation, and so the deficit may have occurred after recognition. To know whether unattended messages can be understood we must find a task in which storage is not required, otherwise late selection theorists are entitled to object that the comprehension deficit is a result of postrecognition competition. This approach satisfies the demands of the late selection theorists, but another aspect raises objections from early selection theorists. In the previous section several experiments were described as demonstrating the occurrence of cognitive effects of unattended messages. In many of these experiments subjects were presented with stimuli to which a single response was required, and the cognitive processing of an unattended word may be inferred from the nature of its effect upon that single response. A word might appear on a screen, for example, and a response key pressed according to whether or not the word was a member of a prespecified category. Kahneman and Treisman (1984) describe such single-trial tasks as being 'selective-set' experiments, in that subjects respond to one of several stimuli that might be presented. In contrast, many of the initial investigations of attention can be described as studies of 'filtering' in that the subject selected between two or more stimuli that were actually presented. Kahneman and Treisman are not satisfied that the conclusions drawn from selective-set experiments may be compared with those from filtering experiments. In addition to differences between selecting between possible versus actual stimuli, they emphasize the complexity of the filtering task in that the organization of a continuous response requires additional processing. The difficulty in organizing a continuous response such as shadowing is unquestionable, but their argument is not entirely satisfactory. 
If the selective-set experiments remove the load induced by response competition then the only difference between the processing of attended and unattended inputs must be a perceptual difference. Furthermore, several of the 'filtering' experiments have demonstrated cognitive effects of unattended messages: the semantic interference experiments of Lewis (1970), Bradshaw (1974) and Underwood (1976, 1977), for example. In such experiments the subjects were required to organize a response to an attended message, and there was evidence of the processing of unattended meanings. Kahneman and Treisman are dissatisfied with such results because the effects are typically small in magnitude, but they are prepared to admit the effects as demonstrations of the semantic processing of unattended messages. What they do insist is that we cannot conclude that perception does not require attention. We can agree that inattention produces a recognition deficit: there are numerous demonstrations of this deficit from studies of target detection. In order to demonstrate that attention does not affect perception with the single-trial experiments, it would be necessary to demonstrate similar interference effects with focused and divided attention. This is clearly not the case: the focusing of attention induced a different pattern of interference in word-naming and picture-naming experiments (Dallas and Merikle, 1976; Underwood, 1976). In the following discussion of the attentional demands of comprehension, evidence will be taken from 'filtering' experiments, not because the Kahneman and Treisman argument is irrefutable, but because it is not easy to imagine a suitable experiment. It would require the presentation of an unattended sentence requiring one of a fixed number of responses, and the absence of competition between the attended and unattended messages. The relevant evidence comes from tasks in which subjects make responses to sentences while their attention is diverted.
3.1 Unattended Ambiguous Sentences
To what extent can a listener recognize the underlying meaning of an unattended sentence? Three reports have attempted to answer this question by having the interpretation of an ambiguous sentence affected by an unattended message. The strongest claim comes from Lackner and Garrett (1972), who found that a number of varieties of ambiguity can be resolved. By saying that the ambiguities were resolved in what follows, what is meant is that an ambiguous sentence was interpreted in the suggested direction more often when the unattended message was present than when it was not present. The size of this bias shift is often quite small, less than 5% in some cases. Lackner and Garrett found that an unattended sentence could bias the interpretation of lexical, surface structural and deep structural ambiguity. For lexical ambiguity, the listener might have attended to: 'The plot occupied much of his time that month' in preparation for a paraphrasing response, and at the same time the unattended ear might have been presented with: 'The scheme was very good but they did not like it.' In this case the ambiguity resides in the word 'plot' and is resolved by the word 'scheme'; an alternative would have been to replace 'scheme' with 'soil'. For surface structural ambiguity, an attended sentence could be: 'They are eating apples' and the unattended sentence: 'They are making gloves.' The ambiguity in this famous sentence resides in the question of whether the word 'eating' is a verb attached to the noun 'they' (as suggested by the structure of the unattended sentence), or an adjective attached to the noun 'apples'. To obtain this adjectival interpretation the unattended sentence would have to have the structure of: 'They are evening gloves.'
The experiment actually distinguished between particle-preposition ambiguities such as: 'The boy looked over the stone wall' and surface structure or bracketing ambiguities such as: 'Jack left with a dog he found last Saturday' and found successful disambiguation in both cases. Finally, Lackner and Garrett reported that sentences with deep structural ambiguity such as: 'They knew that the shooting of the hunters was dreadful' were interpreted according to the reading suggested by unattended sentences such as: 'Tom said the sportsmen had been slain prematurely.' This last example is the one most relevant to the question of whether comprehension requires attention, for if the 'sportsmen' sentence is to be influential its deep structure must be recognized. In addition, this interpretation must be available to the process that is used in generating a paraphrase of the 'hunters' sentence. The meaning of the unattended sentence was effective in this experiment, clearly suggesting that comprehension proceeds independently of attention. Although this is a temptingly straightforward conclusion, it has not been supported by subsequent research and we cannot accept the suggestion of anything other than lexical analysis of unattended messages. Two further attempts to find disambiguating effects will be mentioned before considering the reasons for the lack of empirical support that they give for Lackner and Garrett's conclusions.
If the Lackner and Garrett (1972) result could be confirmed, then we would have evidence that the underlying meaning of a sentence can be recognized when the listener's attention is elsewhere. This follows from the influence of an unattended meaning, gainable only through the integration of the words in a sentence, upon the interpretation of an ambiguous attended sentence. MacKay (1973) and Newstead and Dennis (1979) attempted to replicate Lackner and Garrett's result, but with mixed fortune. MacKay was able to find effects of occasional unattended words upon the interpretation of lexically ambiguous sentences. Further, he found that ambiguity that depended upon surface structure could be resolved by an unattended phrase with a structure corresponding to one of the interpretations. Surface structure ambiguities were unaffected by the meanings of unattended words, and deep structural ambiguities were unaffected by the underlying meaning or lexical meaning of the words in the unattended message. Newstead and Dennis were even less successful in their experiments. They found no effects upon surface structural ambiguities, and effects upon lexical ambiguities only under specific conditions. These conditions included the inclusion of a long intertrial interval and the use of students rather than housewives as subjects in the experiments. Newstead and Dennis did not examine effects upon deep structural ambiguity. The collective conclusion from these three reports is that lexical ambiguity may be resolved by the presence of an unattended message, but that surface structures and deep structures are unaffected. When more than a word or two appears as the unattended message, then no effects are found, suggesting that the integration of words is a process for which attention is required. Why is the Lackner and Garrett (1972) result so difficult to replicate? One suggestion is that they used an inadequate method of controlling the direction of the listener's attention. 
They instructed their subjects to listen carefully to the attended sentence, and to paraphrase it immediately afterwards. The usual method of attention control, shadowing, was not used. The experiments reported by MacKay (1973) and Newstead and Dennis (1979) used shadowing, and found restricted effects of unattended messages. Lackner and Garrett were aware of this problem, and included informal tests of their listeners' knowledge of the unattended message. None of them could report the content of these messages, and most were unable to say that they had heard sentences. None of the subjects had noticed that the paraphrased sentences had been ambiguous and so, presumably, they had no reason to collect cues intentionally from the unattended message. The task does seem to have been difficult. Some subjects were rejected from the experiment because they were unable to listen to one sentence while ignoring the other, and an acceptable subject would 'sit with eyes closed and head cocked to one side, one hand pushing against the headphone carrying the message to be paraphrased, and immediately blurt out his paraphrase' (p. 366). This does suggest that they were attending quite carefully to the ambiguous sentence, rather than dividing their attention between messages. In view of the apparent selectivity demanded of listeners in Lackner and Garrett's (1972) experiment, it is difficult to understand the failure of subsequent attempts to find similar effects upon the full range of ambiguous sentences. One possibility lies with the structure of the sentences used in the three experiments, but none of the reports gives a list of materials and so this must be tentative. However, it is possible that the 'deep structural ambiguities' were formed in different ways in the different experiments, and that the disambiguating sentences
in Lackner and Garrett's experiment were able to influence a pivotal lexical ambiguity. The example quoted in their paper is the famous 'The corrupt police can't stop drinking' and, although this undoubtedly has deep structural ambiguity in that two underlying meanings are represented by one phrase structure, the word 'drinking' can be seen as being critically ambiguous. It is pivotal in the sense that the ambiguity rests upon its interpretation as a verb attached to 'police' or to a missing referent. Perhaps in the other two experiments the deep structural ambiguities were less dependent upon the attachment of a single word. However, the only consistent result from these experiments is that a lexical ambiguity may be resolved by the presence of an unattended message, and that the effect is best seen with an unattended message consisting of a single word. From these studies of linguistic ambiguity comes the single conclusion that a word in the unattended message may influence the interpretation of an ambiguous word in the attended message. All three of the studies found this result, and we must discount the early suggestion that attention is not required for the processing of the underlying meaning of a sentence. This result can be found only under circumstances where the direction of the listener's attention is undetermined. Again using dichotic presentations, Henley (1976) has provided support for the result that lexical ambiguity may be influenced by the content of the unattended message. Responses to ambiguous words were affected by words in the second message that were not only unattended but also presented at an intensity that was below the listener's individual threshold for awareness. The effect upon the interpretation of the homophone was not clear, but there was evidence of lexical processing of the unattended word. 
This influence was upon the delay in responding to the homophone with an associated word: faster responses were observed when the unattended/subliminal word matched the meaning of the word offered as an associate. Whereas we may be cautious about MacKay's (1973) demonstration of effects upon lexical ambiguity (because of the attention-attracting nature of a single unattended word), no such objection can be raised against Henley's result. Taken together these results are consistent with a wealth of reports of the lexical processing of single unattended words both in dichotic listening (Lewis, 1970; Smith and Groen, 1974) and in selective viewing (Bradshaw, 1974; Dallas and Merikle, 1976; Underwood, 1976). Individual words may gain lexical processing without being attended, but there is no evidence of sequences of words gaining the integration necessary for the recognition of their underlying meaning. The investigations of unattended phrases and sentences that accompany ambiguous attended sentences are one of the few sources of data pertaining to the relationship between attention and comprehension. Single unattended words are analyzed to the level of lexical meaning, but there are few experiments that have looked at the analysis of unattended sentence meanings. Traub and Geffen (1979) used lists of words to demonstrate that category effects in memory search are restricted to attended lists, and this result suggests that attention is necessary for the extraction of features common to the words in a sequence. Although Traub and Geffen used lists rather than sentences, their result is relevant because it shows that interword processing is impaired by inattention. The relationship between the words was appreciated and effective only when the list was attended.
3.2 Attention and Comprehension
The conclusion that attention is necessary for comprehension also comes from an investigation of the effects of cumulative context in dichotic listening. One of the better-established results in psychology is that objects and words gain easier responses when they are placed in a familiar context. This context allows the perceiver to anticipate their arrival, and to prepare the response in advance of the stimulus. Tulving and Gold (1963) showed this for the case of a word appearing at the end of a sentence, and Palmer (1975) showed it for a simple line-drawing appearing immediately after a context-setting scene. Is context useful at all times, or is it necessary to attend in order to make use of the features contained in the context? This was the question we asked in an experiment that measured shadowing latencies as a function of attended and unattended context (Underwood, 1977). The only word that was critical was the final word of a sentence, in that this was the only word whose shadowing response was timed. The subjects were not told this of course, and they were also told that the unattended message was a distraction. On some trials the critical final word was preceded by useful context, as with: 'The angler returned the fish to the trout stream.' Whenever the congruent context formed part of the attended message, the unattended message contained a list of unrelated words. On some trials the contextually congruent words were replaced by unrelated words, as with: 'Antelope cover income hat collect stream.' The amount of context replaced by unrelated words varied, and on other trials the listeners heard a few unrelated words, followed by a part of the congruent context and then the critical word. The shadowing response to the final word was faster when it was accompanied by context, and the more context there was, the greater was this facilitation effect. 
None of this was particularly surprising: it confirmed the result obtained by Tulving and Gold (1963) and many others. A more interesting result came from trials in which the context was presented in the unattended message. Listeners would then shadow unrelated words, with a variable amount of unattended context leading up to the attended critical word. In these cases there was still a shadowing advantage, but it was constant in size, regardless of the number of contextually congruent words. So, the benefits of attended context accumulated, but unattended context had a constant effect.
Why should an unattended context have a nonaccumulating effect upon shadowing latency? There are several possibilities consistent with the general picture of unattended lexical processing that is starting to emerge. Sentence context effects can accumulate only if the listener can calculate the relationships between successive words, and generate a sentence theme which is, so to speak, greater than the sum of the component words. This sentence theme is, of course, the underlying meaning of the whole sentence, and lexical processing alone is insufficient if the comprehension calculation is to be successful. If the meaning of each unattended word can be recognized, but not the relationships between those words, then we should expect that the effects of unattended context should be restricted to the last few words heard before the critical word. These words could be effective as associative primes, through the process of spreading activation suggested by Meyer and Schvaneveldt (1971), Collins and Loftus (1975) and others. Alternatively, the
most recent unattended words could be effective as predictors, by the same constructive calculation process that is able to use an accumulating attended context. By either of these processes the unattended context could have only a constant effect. Associative priming would be minimal from the words early in an unattended sentence, because the lexical activation of those earlier words would have dissipated by the time the critical word was heard. The calculation process, for its part, requires the resources of a working memory system not available without attention. All unattended words would gain lexical processing, but only those immediately prior to the critical word would have any effect.
To make use of context, we need to attend; to make use of the categorical relationships between the words in a list, we need to attend; and, a more disputable result, if we are to make use of the relationships between words when resolving the ambiguity of a sentence, we again need to attend. These are the conclusions from investigations of unattended context upon shadowing latencies (Underwood, 1977), unattended categories upon memory probes (Traub and Geffen, 1979), and unattended words upon the resolution of ambiguity (Lackner and Garrett, 1972; MacKay, 1973; Newstead and Dennis, 1979). Taken together they suggest that we cannot calculate the relationships between words unless we attend to those words and to their alternative relationships. Ambiguous sentences provide one of the best demonstrations of the need to attend when attempting to understand. Words may be recognized at the level of lexical processing, but the system that integrates individual words can be accessed only when attention is directed towards their relationships. Attention to the words may be insufficient, of course, but it is necessary. In addition to attending to the meaning of each word, successful comprehension will depend upon the selection of the appropriate referent of each word.
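The contrast between accumulating and constant context effects can be illustrated with a toy calculation. This is only a sketch under stated assumptions, not a model proposed in the studies cited: the function names, the gain per word, and the retention factor are all hypothetical values chosen for illustration. Attended context is treated as integrated, so each congruent word adds to the facilitation; unattended context is treated as a set of decaying lexical primes, so only the most recent words contribute.

```python
def attended_facilitation(n_words, gain_per_word=10.0):
    """Attended context is integrated into a sentence theme, so in this
    toy sketch every congruent word adds the same (hypothetical) gain."""
    return gain_per_word * n_words


def unattended_facilitation(n_words, prime_strength=10.0, retention=0.2):
    """Unattended words act only as associative primes whose activation
    decays with distance from the critical word. lag 0 is the word
    immediately before the critical word; retention is the (hypothetical)
    fraction of activation surviving each intervening word."""
    return sum(prime_strength * retention ** lag for lag in range(n_words))


if __name__ == "__main__":
    # Print facilitation (arbitrary units) for increasing amounts of context.
    for n in (1, 2, 4, 6):
        print(n, attended_facilitation(n), round(unattended_facilitation(n), 2))
```

With these assumed parameters the attended curve keeps rising with the number of context words, whereas the unattended curve flattens after the first one or two words, which is the qualitative pattern described above.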
If the listener attends without attempting to integrate, then a relatively shallow level of processing is achieved, a level perhaps equivalent to Craik and Lockhart's (1972) type I or maintenance processing. Successful retention is more easily achieved by type II or elaboration processing, and this is the deeper processing associated with attention being directed to the relationships between the words. The success of elaborative processing may depend in part upon the new associations that are created between the incoming stimulus and existing knowledge structures, and in part upon the creation of memories of the cognitive operations themselves.
A final demonstration in support of the general conclusion comes from a slightly different background. Kleiman's (1975) experiments were designed to determine whether speech recoding was necessary before, during or after lexical access, but his approach and his conclusions are related to the question of the role of attention in comprehension. Subjects made various judgements about words appearing on a screen, either while they were shadowing lists of digits or waiting quietly, and Kleiman observed the potentially disruptive effects of shadowing upon the speed of the judgements. For example, a graphemic judgement about a pair of words ('heard/beard' get a judgement of true) suffered a 125 ms decrement due to shadowing, and a phonemic judgement ('heard/beard' are now false) suffered a 372 ms decrement. This increase in the time required to make the judgement gives an indication that the processes required for shadowing are more disruptive towards the processes required for making phonemic comparisons than towards those required for graphemic comparisons. In one of Kleiman's experiments subjects judged whether a target word was the category
label for any of the words in a sentence. For example, the word/sentence pairing 'Games' / 'Everyone at home played monopoly' gets a true response. Both word and sentence appeared on the screen at the same time. The category decision suffered a 78 ms decrement if the subjects were shadowing while judging. The critical judgement, from the point of view of our general conclusion, was about the legality of a single sentence. Subjects judged whether a list of five words, written in a particular order, formed a semantically acceptable sentence. For example, the single sentence 'Noisy parties disturb sleeping neighbors' gets a true response, while the sentence 'Pizzas have been eating Jerry' is false. The sentence acceptability judgement suffered the greatest decrement of all, with an additional 394 ms being necessary if the subject was shadowing while reading. So, while judgements about the meanings of individual words could be performed without any great cost, subjects were heavily penalized by the requirement to shadow while integrating the meanings of those words. Given that shadowing is known to require attention, Kleiman could have described the experiment as a demonstration of category judgements during divided attention. Whereas individual words could be processed under conditions of divided attention, the integration of those words could not.
Kleiman's experiment suggests that divided attention impairs word integration but not word recognition. Although the result is consistent with the conclusions drawn from studies of focused attention during dichotic listening, we cannot be sure that the experiments are completely comparable. Kleiman's subjects divided their attention between spoken digits (to be shadowed) and printed words seen on a screen (to be judged), and were required to respond to both messages. The other experiments which have allowed us to draw the conclusion of 'recognition without integration' differ in two respects.
First, they required the subjects to focus their attention upon one of the messages (the second message was unattended), and second, they used two messages within the same modality. The division of attention between two modalities might be expected to involve different processes to those required for a similar task which requires just one of those modalities. Indeed, several experiments that have used bisensory presentations appear to have found evidence of effective sharing of attention between messages (Allport et al., 1972; Hirst et al., 1980; Shaffer, 1975; but see Broadbent, 1982, for some doubts about this evidence). It might be argued that these experiments have bypassed the processing bottleneck during input by using two input modalities. If some processes compete for common resources while other processes do not, then we are moving our model in the direction of a modular description of cognitive processing.
INATTENTION AND AUTOMATICITY: SOME CONCLUSIONS
The evidence reviewed here can be summarized as suggesting three main conclusions. First, individual words are recognized at the level of analysis of their
lexical meaning. This evidence comes from observations of the effects of unattended spoken words upon the shadowing latencies to associated attended words in dichotic listening tasks, and from observations of the effects of printed unattended words upon the responses to competing visual stimuli in Stroop and Stroop-like interference tasks. These unattended words are processed to a cognitive stage where they can influence other ongoing processes. Second, the words in the unattended message, although recognized themselves, cannot be interrelated. This evidence comes from the failure of listeners to recognize the deep structure of an unattended sentence, to recognize the common category of words in an unattended list, and to use the context of an unattended sentence as a contextual predictor of what attended word is coming next. Unattended words are not integrated with themselves or with the attended message. From these two conclusions it appears that unattended messages are not processed beyond the level of lexical recognition, and their effects probably manifest themselves through an automatic process of associative priming. The third conclusion is that the attention deficit is not only one of restricting the integration of individual words, but inattention also moderates the perceptibility and effectiveness of these words. This evidence comes from comparisons of divided attention and focused attention instructions, in studies that ask listeners to detect the presence of target words in dichotic messages, and in studies of the effectiveness of distracting printed words. This process of moderation, which Treisman termed attenuation, does not prevent the semantic processing of individual unattended words, however, but it can, in some cases, change the direction of interference upon the processing of the current attended message. The model suggested by these conclusions is one that emphasizes the role of attention in the processing of novel sequences. 
Whenever the input is familiar and well learned, in the sense that it has an invariant response associated with it, or in the sense that it has an invariant meaning, then that input may be processed without attention. The invariant response might be part of the shoe-lace tying operation in response to a particular state of the shoe-laces: having pulled on a shoe and taken hold of each lace-end, the next part of the operation is invariant and does not require attention. Using another terminology, it is a constant environmental calling-pattern to which a specific condition → action rule can be applied. In exactly the same way a drop in temperature is a calling-pattern to a thermostat that is set to operate a specific action when a specific condition is detected. Attention is not required for the detection of the condition of shoe-laces and for the operation of the next action, in the same way that selective attention is not a part of the equipment in the thermostat. It would be wasteful of our cognitive resources if invariant, regularly performed actions required the same selectivity that we reserve for solving crossword puzzles or deciding what to say to the bank manager. If an invariant environmental pattern requires an invariant response, then a little practice will ensure performance without attention.
What holds for shoe-laces also holds for word recognition. Individual words are invariant calling-patterns in that they call for specific condition → action rules. In this case the responses are the cognitive actions that result in a sensory pattern activating a lexical representation. The lexical meaning of a word is recognized whether or not attention is available for processing. As we have seen, inattention does have a detrimental effect, and so we must conclude that attention can vary the strength of the sensory signal admitted to the lexicon.
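The idea of a calling-pattern that triggers a condition → action rule can be sketched in a few lines of code. This is a minimal illustration under assumptions, not an implementation taken from the chapter: the `Rule` class, the `fire_matching_rules` function, and the two-word lexicon are hypothetical names and data invented for the example. The point it shows is that every rule whose condition is met fires directly, with no separate selection stage.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Rule:
    """A condition -> action rule: when the calling-pattern is present,
    the associated action is produced."""
    name: str
    condition: Callable[[dict], bool]
    action: Callable[[dict], str]


def fire_matching_rules(rules: List[Rule], state: dict) -> List[str]:
    """Return the actions of every rule whose condition matches the state.
    No rule is chosen over another: matching alone triggers the action."""
    return [rule.action(state) for rule in rules if rule.condition(state)]


# Hypothetical two-entry lexicon, used only for this illustration.
LEXICON = {"stream": "small river", "trout": "freshwater fish"}

RULES = [
    Rule(
        name="thermostat",
        # Fires only when a temperature below the setpoint is detected.
        condition=lambda s: s.get("temperature", float("inf"))
        < s.get("setpoint", float("-inf")),
        action=lambda s: "switch heating on",
    ),
    Rule(
        name="word recognition",
        # Fires whenever the sensory pattern matches a known word.
        condition=lambda s: s.get("word") in LEXICON,
        action=lambda s: "activate lexical entry: " + LEXICON[s["word"]],
    ),
]

if __name__ == "__main__":
    print(fire_matching_rules(RULES, {"temperature": 17.0, "setpoint": 20.0}))
    print(fire_matching_rules(RULES, {"word": "stream"}))
```

In this sketch an ambiguous or novel input would need something the rule table does not provide, namely a way of choosing among alternatives, which is where the text locates the role of attention.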
When a word is presented it accesses the internal lexicon, but if this follows the rule that invariant signals are processed without attention, how should we deal with the problem of ambiguous words? A word such as 'ball' or 'mint' will have any number of meanings, and so there is no single relationship between the environmental calling-pattern and the required cognitive response. This is a problem because we are concluding that attention will not be available to select the appropriate meaning of the homograph. There is, of course, a considerable literature concerning the recognition of words with multiple meanings, with a consensus view being that all meanings are accessed (Simpson, 1984). The context of presentation does influence the interpretation given to a word, as do the frequencies of the alternative meanings, but the evidence points to an ambiguous word activating all of its possible interpretations. A particularly strong demonstration of this nonselective access of lexically ambiguous words comes from a divided attention experiment reported by Swinney (1979). While subjects listened to prose, in preparation for a comprehension test, they also watched a screen in preparation for making a lexical decision response to a letter-string. The letter-string was sometimes an associate of a word heard at the same time, and when this happened the decision was facilitated. Critically, this facilitation effect was observed with ambiguous spoken words, and both meanings were helpful regardless of which one had been suggested by the prior context. For example, at the same time as hearing the word 'bugs' in: '... The man was not surprised when he found several spiders, roaches and other bugs in the corner of the room...' the letter-string 'ant' might be presented, and this would gain a faster lexical decision than a word such as 'sew'.
The nonsuggested meaning of 'bugs' was also accessed, however, because the letter-string 'spy' gained a faster response than the neutral 'sew'. The model requires that an invariant stimulus be processed without attention, but the variable meanings of words pose no problem here, because it is possible to demonstrate that their alternative meanings are processed nonselectively. This implies that, although words cannot be considered as invariant stimuli, their meanings can be. A word is a calling-pattern to each of its invariant meanings, with the strength of the resulting activation depending upon stimulus frequency. The meanings of individual words are invariant calling-patterns, and are admitted to the lexicon without the perceiver's attention. They call for very specific cognitive actions and do not require the generation of a new algorithm to perform these actions. This is performance without consideration of the match between intention and action, unlike the recognition of novel stimuli such as the novel combination of words in this sentence.
Recognizing the underlying meaning of a sentence is a process that requires attention, for the simple reason that the recombination of word meanings requires the selection of referents. In Swinney's sentence printed above, to take an example, the reader must decide who was not surprised, who did the finding, where the finding was done, what is the relationship between a room and a corner, what is the common feature of spiders, roaches and bugs, and so on. This amounts to a propositional analysis of the sentence. It requires a reconstruction of the underlying meaning which involves the selective attachment of each word to the other words within the proposition. This is a process of selection because sentences do not often share the same propositional structure. The process will be driven by a sentence-processing routine which will
produce a unique computation. If sentences did share a regular structure then sentences would possess an invariant feature and propositional attachment would not require attention. The comprehension calculation would then only require selective processing for the determination of relationships between sentences. Provided that there is some feature of the text that does not always produce the same output from this comprehension calculation, then the model suggests that it is to this feature which the reader must attend. I am occasionally aware of my eyes arriving sleepily at the bottom of a page of text, with the realization that I have not been attending to the underlying meanings intended by the writer. As I re-read I have a feeling of familiarity for the words, and even for the positions of the words on the page, and the model attributes this to their original, preattentive recognition. To understand the sentences, and to compute the relationships of the sentences into a schema of the text, we need to attend selectively to their specific and unique underlying meanings.
REFERENCES
Adams, J. A. (1976). Issues for a closed-loop theory of motor learning. In G. E. Stelmach (Ed.), Motor Control: Issues and Trends (pp. 87-107). London: Academic Press.
Allport, D. A. (1977). On knowing the meanings of words we are unable to report: The effects of visual masking. In S. Dornic (Ed.), Attention and Performance VI (pp. 505-533). Hillsdale, NJ: Erlbaum.
Allport, D. A., Antonis, B. and Reynolds, P. (1972). On the division of attention: A disproof of the single channel hypothesis. Quarterly Journal of Experimental Psychology, 24, 225-235.
Anderson, J. R. (1982). Acquisition of cognitive skill. Psychological Review, 89, 369-406.
Anderson, J. R. (1983). A spreading activation theory of memory. Journal of Verbal Learning and Verbal Behavior, 22, 261-295.
Baddeley, A. D. (1979). Working memory and reading. In P. A. Kolers, M. E. Wrolstad and H. Bouma (Eds), Processing of Visible Language, vol. 1 (pp. 355-370). New York: Plenum.
Balota, D. A. and Chumbley, J. I. (1984). Are lexical decisions a good measure of lexical access? The role of word frequency in the neglected decision stage. Journal of Experimental Psychology: Human Perception and Performance, 10, 340-357.
Becker, C. A. (1976). Allocation of attention during visual word recognition. Journal of Experimental Psychology: Human Perception and Performance, 2, 556-566.
Bradshaw, J. M. (1974). Peripherally presented and unreported words may bias the perceived meaning of centrally fixated homograph. Journal of Experimental Psychology, 103, 1200-1202.
Briggs, P. and Underwood, G. (1982). Phonological coding in good and poor readers. Journal of Experimental Child Psychology, 34, 93-112.
Broadbent, D. E. (1971). Decision and Stress. London: Academic Press.
Broadbent, D. E. (1982). Task combination and selective intake of information. Acta Psychologica, 50, 253-290.
Bryden, M. P. (1972). Perceptual strategies, attention, and memory in dichotic listening. Unpublished report, University of Waterloo.
Carpenter, P. A. and Just, M. (1983). What your eyes do while your mind is reading. In K. Rayner (Ed.), Eye Movements in Reading: Perceptual and Language Processes (pp. 275-307). New York: Academic Press.
Carr, T. H., McCauley, C., Sperber, R. D. and Parmalee, C. M. (1982). Words, pictures, and priming: On semantic activation, conscious identification, and the automaticity of information processing. Journal of Experimental Psychology: Human Perception and Performance, 8, 757-777.
Cherry, C. (1953). Some experiments on the recognition of speech with one and two ears. Journal of the Acoustical Society of America, 25, 975-979.
Cohen, J. D., Dunbar, K. and McClelland, J. L. (1990). On the control of automatic processes: A parallel distributed processing account of the Stroop effect. Psychological Review, 97, 332-361.
Collins, A. M. and Loftus, E. F. (1975). A spreading-activation theory of semantic processing. Psychological Review, 82, 407-428.
Cooper, R. M. (1974). The control of eye fixations by the meaning of spoken language. Cognitive Psychology, 6, 84-107.
Craik, F. I. M. and Lockhart, R. S. (1972). Levels of processing: A framework for memory research. Journal of Verbal Learning and Verbal Behavior, 11, 671-684.
Dallas, M. and Merikle, P. M. (1976). Semantic processing of non-attended visual information. Canadian Journal of Psychology, 30, 15-21.
DeGroot, A. M. B., Thomassen, A. J. W. M. and Hudson, P. T. W. (1982). Association facilitation of word recognition as measured from a neutral prime. Memory and Cognition, 10, 358-370.
Deutsch, J. A. and Deutsch, D. (1963). Attention: Some theoretical considerations. Psychological Review, 70, 80-90.
Deutsch, J. A. and Deutsch, D. (1967). Comments on 'Selective attention: perception or response?' Quarterly Journal of Experimental Psychology, 19, 362-363.
Dunbar, K. and MacLeod, C. M. (1984). A horse race of a different colour: Stroop interference patterns with transformed words. Journal of Experimental Psychology: Human Perception and Performance, 10, 622-639.
Duncan, J. (1980). The locus of interference in the perception of simultaneous stimuli. Psychological Review, 87, 272-300.
Dyer, F. N. (1973). The Stroop phenomenon and its use in the study of perceptual, cognitive and response processes. Memory and Cognition, 1, 106-120.
Eriksen, B. A. and Eriksen, C. W. (1974). Effects of noise letters upon the identification of a target letter in a non-search task. Perception and Psychophysics, 16, 143-149.
Eriksen, C. W. and Schultz, D. W. (1979). Information processing in visual search: A continuous flow model and experimental results. Perception and Psychophysics, 25, 249-263.
Eriksen, C. W., Webb, J. M. and Fournier, L. R. (1990). How much processing do nonattended stimuli receive? Apparently very little, but... Perception and Psychophysics, 47, 477-488.
Fisher, D. F. and Shebilske, W. L. (1985). There is more than meets the eye than the eye-mind assumption. In R. Groner, G. McConkie and C. Menz (Eds), Eye Movements and Human Information Processing (pp. 149-157). Amsterdam: North-Holland.
Fodor, J. A. (1983). The Modularity of Mind. Cambridge, MA: MIT Press.
Forster, K. I. (1981). Priming and the effects of sentence and lexical context on naming time: Evidence for autonomous lexical processing. Quarterly Journal of Experimental Psychology, 33A, 465-495.
Francolini, C. M. and Egeth, H. (1980). On the non-automaticity of 'automatic' activation: Evidence of selective seeing. Perception and Psychophysics, 27, 331-342.
Gatti, S. V. and Egeth, H. A. (1978). Failure of spatial selectivity in vision. Bulletin of the Psychonomic Society, 11, 181-184.
Glaser, M. O. and Glaser, W. R. (1982). Time course analysis of the Stroop phenomenon. Journal of Experimental Psychology: Human Perception and Performance, 8, 875-894.
Goolkasian, P. (1981). Retinal location and its effect on the processing of target and distracter information. Journal of Experimental Psychology: Human Perception and Performance, 7, 1247-1257.
Greenwald, A. G. (1972). Evidence of both perceptual filtering and response suppression for rejected messages in selective attention. Journal of Experimental Psychology, 94, 58-67.
Guenther, R. K., Klatzky, R. L. and Putnam, W. (1980). Commonalities and differences in semantic decisions about pictures and words. Journal of Verbal Learning and Verbal Behavior, 19, 54-74.
Hasher, L. and Zacks, R. T. (1979). Automatic and effortful processes in memory. Journal of Experimental Psychology: General, 108, 356-388.
Henderson, J. M., Pollatsek, A. and Rayner, K. (1989). Covert visual attention and parafoveal information use during object identification. Perception and Psychophysics, 45, 196-208.
Henley, S. H. A. (1976). Responses to homophones as a function of cue words on the unattended channel. British Journal of Psychology, 67, 529-536.
Hintzman, D. L. (1986). 'Schema abstraction' in a multiple-trace model. Psychological Review, 93, 411-428.
Hirst, W., Spelke, E. S., Reaves, C. C., Caharack, G. and Neisser, U. (1980). Dividing attention without alternation or automaticity. Journal of Experimental Psychology: General, 109, 98-117.
Hyönä, J., Niemi, P. and Underwood, G. (1989). Reading long words embedded in sentences: Informativeness of word parts affects eye movements. Journal of Experimental Psychology: Human Perception and Performance, 15, 142-152.
Jacoby, L. L. and Brooks, L. R. (1984). Nonanalytic cognition: Memory, perception, and concept learning. In G. H. Bower (Ed.), The Psychology of Learning and Motivation (pp. 1-47). New York: Academic Press.
James, W. (1890). The Principles of Psychology. New York: Holt.
Jastrzembski, J. E. (1981). Multiple meanings, number of related meanings, frequency of occurrence and the lexicon. Cognitive Psychology, 13, 278-305.
Johnston, W. A. and Wilson, J. (1980). Perceptual processing of non-targets in an attention task. Memory and Cognition, 8, 372-377.
Jonides, J. (1981). Voluntary versus automatic control over the mind's eye's movement. In J. Long and A. Baddeley (Eds), Attention and Performance IX (pp. 187-203). Hillsdale, NJ: Erlbaum.
Jonides, J., Naveh-Benjamin, M. and Palmer, J. (1985). Assessing automaticity. Acta Psychologica, 60, 157-171.
Kahneman, D. (1973). Attention and Effort. Englewood Cliffs, NJ: Prentice-Hall.
Kahneman, D. and Chajczyk, D. (1983). Tests of the automaticity of reading: Dilution of Stroop effects by colour-irrelevant stimuli. Journal of Experimental Psychology: Human Perception and Performance, 9, 497-509.
Kahneman, D. and Henik, A. (1981). Perceptual organisation and attention. In M. Kubovy and J. Pomerantz (Eds), Perceptual Organisation (pp. 181-211). Hillsdale, NJ: Erlbaum.
Kahneman, D. and Treisman, A. M. (1984). Changing views of attention and automaticity. In R. Parasuraman and R. Davies (Eds), Varieties of Attention (pp. 29-61). New York: Academic Press.
Keele, S. W. (1973). Attention and Human Performance. Pacific Palisades, CA: Goodyear.
Keele, S. W. and Summers, J. (1976). The structure of motor programs. In G. E. Stelmach (Ed.), Motor Control: Issues and Trends (pp. 109-142). London: Academic Press.
Kellas, G., Ferraro, F. R. and Simpson, G. B. (1988). Lexical ambiguity and the time-course of attentional allocation in word recognition. Journal of Experimental Psychology: Human Perception and Performance, 14, 601-609.
Kennedy, A. (1983). On looking into space. In K. Rayner (Ed.), Eye Movements in Reading: Perceptual and Language Processes (pp. 237-251). New York: Academic Press.
Kidd, G. R. and Greenwald, A. G. (1988). Attention, rehearsal and memory for serial order. American Journal of Psychology, 101, 259-279.
Kleiman, G. M. (1975). Speech recoding in reading. Journal of Verbal Learning and Verbal Behavior, 14, 323-339.
LaBerge, D. (1981). Automatic information processing: A review. In J. Long and A. Baddeley (Eds), Attention and Performance IX (pp. 173-186). Hillsdale, NJ: Erlbaum.
LaBerge, D. and Samuels, S. J. (1974). Toward a theory of automatic information processing in reading. Cognitive Psychology, 6, 293-323.
Lackner, J. R. and Garrett, M. F. (1972). Resolving ambiguity: Effects of biasing context in the unattended ear. Cognition, 1, 359-372.
Ladefoged, P., Silverstein, R. and Papcun, G. (1973). Interruptability of speech. Journal of the Acoustical Society of America, 54, 1105-1108.
La Heij, W., Dirkx, J. and Kramer, P. (1990). Categorical interference and associative priming in picture naming. British Journal of Psychology, 81, 511-525.
Levelt, W. J. M. (1983). Monitoring and self-repair in speech. Cognition, 14, 41-104.
Lewis, J. L. (1970). Semantic processing of unattended messages using dichotic listening. Journal of Experimental Psychology, 85, 225-228.
Loftus, G. R. (1981). Tachistoscopic simulations of eye fixations on pictures. Journal of Experimental Psychology: Human Learning and Memory, 5, 369-376.
Loftus, G. R. and Mackworth, N. H. (1978). Cognitive determinants of fixation location during picture viewing. Journal of Experimental Psychology: Human Perception and Performance, 4, 565-572.
Logan, G. D. (1979). On the use of a concurrent memory load to measure attention and automaticity. Journal of Experimental Psychology: Human Perception and Performance, 5, 189-207.
Logan, G. D. (1982). On the ability to inhibit complex movements: a stop-signal study of typewriting. Journal of Experimental Psychology: Human Perception and Performance, 8, 778-792.
Logan, G. D. (1985). Skill and automaticity: relations, implications and future directions. Canadian Journal of Psychology, 39, 367-386.
Logan, G. D. (1988). Toward an instance theory of automatisation. Psychological Review, 95, 492-527.
Lupker, S. J. and Katz, A. N. (1981). Input, decision and response factors in picture-word interference. Journal of Experimental Psychology: Human Learning and Memory, 7, 269-282.
MacKay, D. G. (1973). Aspects of the theory of comprehension, memory and attention. Quarterly Journal of Experimental Psychology, 25, 22-40.
MacKay, D. G. (1982). The problem of flexibility, fluency and speed-accuracy trade-off in skilled behaviour. Psychological Review, 89, 483-506.
Mackworth, N. H. and Morandi, A. J. (1967). The gaze selects information details within pictures. Perception and Psychophysics, 2, 547-552.
MacLeod, C. M. and Dunbar, K. (1988). Training and Stroop-like interference: Evidence for a continuum of automaticity. Journal of Experimental Psychology: Learning, Memory and Cognition, 14, 126-135.
McClelland, J. L. and O'Regan, J. K. (1981). Expectations increase the benefit derived from parafoveal visual information in reading words aloud. Journal of Experimental Psychology: Human Perception and Performance, 7, 634-644.
McClelland, J. L. and Rumelhart, D. E. (1981). An interactive-activation model of context effects in letter perception: Part 1. An account of basic beginnings. Psychological Review, 88, 375-407.
McConkie, G. W. and Rayner, K. (1975). The span of the effective stimulus during a fixation in reading. Perception and Psychophysics, 17, 578-586.
McLeod, P. (1977). A dual task response modality effect: Support for multiprocessor models of attention. Quarterly Journal of Experimental Psychology, 29, 651-667.
Meyer, D. E. and Schvaneveldt, R. W. (1971). Facilitation in recognising pairs of words: Evidence of a dependence in retrieval operations. Journal of Experimental Psychology, 90, 227-234.
Minsky, M. (1980). K-lines: A theory of memory. Cognitive Science, 4, 117-133.
Moray, N. (1959). Attention in dichotic listening: Affective cues and the influence of instruction. Quarterly Journal of Experimental Psychology, 9, 56-60.
Morrison, R. E. (1984). Manipulation of stimulus onset delay in reading: Evidence for parallel programming of saccades. Journal of Experimental Psychology: Human Perception and Performance, 10, 667-682.
Navon, D. and Gopher, D. (1979). On the economy of the human processing system. Psychological Review, 86, 214-255.
Neely, J. H. (1977). Semantic priming and retrieval from lexical memory: Roles of inhibitionless spreading activation and limited capacity attention. Journal of Experimental Psychology: General, 106, 226-254.
Neisser, U., Hirst, W. and Spelke, E. S. (1981). Limited capacity theories and the notion of automaticity: Reply to Lucas and Bub. Journal of Experimental Psychology: General, 110, 499-500.
Nelson, W. W. and Loftus, G. R. (1980). The functional visual field during picture viewing. Journal of Experimental Psychology: Human Learning and Memory, 6, 391-399.
Neumann, O. (1984). Automatic processing: A review of recent findings and a plea for an old theory. In W. Prinz and A. F. Sanders (Eds), Cognition and Motor Processes (pp. 255-293). Berlin: Springer.
Newstead, S. E. and Dennis, I. (1979). Lexical and grammatical processing of unshadowed messages: A re-examination of the MacKay effect. Quarterly Journal of Experimental Psychology, 31, 477-488.
Norman, D. A. (1968). Toward a theory of memory and attention. Psychological Review, 75, 522-536.
Norman, D. A. (1969). Memory while shadowing. Quarterly Journal of Experimental Psychology, 21, 85-93.
O'Regan, J. K. (1979). Saccade size in reading: Evidence for the linguistic control hypothesis. Perception and Psychophysics, 25, 501-509.
Paap, K. R. and Newsome, S. L. (1981). A perceptual-confusion account of the WSE in the target search paradigm. Perception and Psychophysics, 27, 444-456.
Paap, K. R. and Ogden, W. C. (1981). Letter encoding is an obligatory but capacity-demanding operation. Journal of Experimental Psychology: Human Perception and Performance, 7, 518-527.
Palmer, S. E. (1975). The effects of contextual scenes on the identification of objects. Memory and Cognition, 3, 519-526.
Pollatsek, A. and Rayner, K. (1982). Eye movement control in reading: The role of word boundaries. Journal of Experimental Psychology: Human Perception and Performance, 8, 817-833.
Posner, M. I. (1978). Chronometric Explorations of Mind. Hillsdale, NJ: Erlbaum.
Posner, M. I. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology, 32, 3-26.
Posner, M. I., Cohen, Y. and Rafal, R. D. (1982). Neural system control of spatial orienting. Philosophical Transactions of the Royal Society of London, B298, 187-198.
Posner, M. I. and Snyder, C. R. R. (1975). Attention and cognitive control. In R. L. Solso (Ed.), Information Processing and Cognition: The Loyola Symposium (pp. 55-85). Hillsdale, NJ: Erlbaum.
Rabbitt, P. M. A. (1966). Errors and error correction in choice response tasks. Journal of Experimental Psychology, 71, 264-272.
Rayner, K. (1977). Visual attention in reading: Eye movements reflect cognitive processes. Memory and Cognition, 4, 443-448.
Rayner, K., Balota, D. A. and Pollatsek, A. (1986). Against parafoveal semantic pre-processing during eye fixations in reading. Canadian Journal of Psychology, 41, 211-236.
Rayner, K. and Bertera, J. H. (1979). Reading without a fovea. Science, 206, 468-469.
Rayner, K. and Pollatsek, A. (1987). Eye movements in reading: A tutorial review. In M. Coltheart (Ed.), Attention and Performance XII: The Psychology of Reading (pp. 327-362). London: LEA.
226
G. Underwood and J. Everatt
Reason, J. (1979) Actions not as planned: The price of automatisation. In G. Underwood and R. Stevens (Eds), Aspects of Consciousness, vol. 1 (pp. 67-89). London: Academic Press. Reed, G. (1972). The Psychology of Anomalous Experience. London: Hutchinson. Rosinski, R. R., Golinkoff, R. M. and Kukish, K. S. (1975). Automatic semantic processing in a picture-word interference task. Child Development, 46, 247-253. Rubenstein, H., Garfield, L. and Millikan, J. A. (1970). Homographic entries in the mental lexicon. Journal of Verbal Learning and Verbal Behaviour, 9, 487-494. Schiller, P. H. (1966). Developmental study of colour-word interference. Journal of Experimental Psychology, 72, 105-108. Schneider, W. (1985). Toward a model of attention and the development of automatic processing. In M. I. Posner and O. S. Marin (Eds), Attention and Performance XI (pp. 475492). Hillsdale, NJ: Erlbaum. Schneider, W. and Shiffrin, R. M. (1977). Controlled and automatic human information processing: I. Detection, search and attention. Psychological Review, 84, 1-66. Seidenberg, M. S., Waters, G. S., Sanders, M. and Langer, P. (1984). Pre- and post-lexical loci of contextual effects on word recognition. Memory and Cognition, 12, 315-328. Shaffer, L. H. (1975). Multiple attention in continuous verbal tasks. In P. M. A. Rabbitt and S. Dornic (Eds), Attention and Performance V (pp. 157-167). London: Academic Press. Shaffer, W. O. and LaBerge, D. (1979). Automatic semantic processing of unattended words. Journal of Verbal Learning and Verbal Behaviour, 18, 413-426. Shepherd, M. and Miiller, H. J. (1989). Movement versus focusing of visual attention. Perception and Psychophysics, 46, 146-154. Shiffrin, R. M. (1988). Attention. In R. C. Atkinson, R. J. Herrnstein, G. Lindzey and R. D. Luce (Eds), Steven's Handbook of Experimental Psychology, vol. 2: Learning and Cognition (pp. 739-811). New York: Wiley. Shiffrin, R. M. and Schneider, W. (1977). 
Controlled and automatic human information processing: II. Perceptual learning, automatic attending and a general theory. Psychological Review, 84, 127-190. Shulman, G. L. (1990). Relating attention to visual mechanisms. Perception and Psychophysics, 47, 199-203. Simpson, G. B. (1984). Lexical ambiguity and its role in models of word recognition. Psychological Bulletin, 96, 316-340. Smith, M. C. and Groen, M. (1974). Evidence for semantic analysis of unattended verbal items. Journal of Experimental Psychology, 102, 595-603. Spelke, E. S., Hirst, W. C. and Neisser, U. (1976). Skills of divided attention. Cognition, 4, 215-230. Stanovich, K. E. and West, R. F. (1981). The effects of sentence context on ongoing word recognition: Tests of a two-process theory. Journal of Experimental Psychology: Human Perception and Performance, 7, 658-672. Stanovich, K. E. and West, R. F. (1983a). On priming by a sentence context. Journal of Experimental Psychology: General, 112, 1-36. Stanovich, K. E. and West, R. F. (1983b). The generalizability of context effects on word recognition: A reconsideration of the roles of parafoveal priming and sentence context. Memory and Cognition, 5, 84-89. Swinney, D. A. (1979). Lexical access during sentence comprehension: (Re)consideration of context effects. Journal of Verbal Learning and Verbal Behavior, 18, 645-659. Traub, E. and Geffen, G. (1979). Phonemic and category encoding of unattended words in dichotic listening. Memory and Cognition, 7, 56-65. Treisman, A. M. (1960). Contextual cues in selective listening. Quarterly Journal of Experimental Psychology, 12, 242-248. Treisman, A. M. (1969). Strategies and models of selective attention. Psychological Review, 76, 282-299.
Automatic and controlled information processing
227
Treisman, A. M. and Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97-136. Treisman, A. M., Squire, R. and Green, J. (1974). Semantic processing in dichotic listening? A replication. Memory and Cognition, 2, 641-646. Tulving, E. and Gold, C. (1963). Stimulus information and contextual information as determinants of tachistoscopic recognition of words. Journal of Experimental Psychology, 66, 319-327. Underwood, G. (1976). Semantic interference from unattended printed words. British Journal of Psychology, 67, 327-338. Underwood, G. (1977). Contextual facilitation from attended and unattended messages. Journal of Verbal Learning and Verbal Behavior, 16, 99-106. Underwood, G. (1981). Lexical recognition of embedded unattended words: Some implications for reading processes. Acta Psychologica, 47, 267-283. Underwood, G. (1982). Attention and awareness in cognitive and motor skills. In G. Underwood (Ed.), Aspects of Consciousness, vol. 3 (pp. 111-145). London: Academic Press. Underwood, G. and Briggs, P. (1984). The development of word recognition processes. British Journal of Psychology, 75, 243-255. Underwood, G., Clews, S. and Everatt, J. (1990). How do readers know where to look next? Local information distributions influence eye fixations. Quarterly Journal of Experimental Psychology, 42A, 39-65. Underwood, G. and Thwaites, S. (1982). Automatic phonological coding of unattended printed words. Memory and Cognition, 10, 434-442. Underwood, G. and Whitfield, A. (1985). Right hemisphere interactions in picture-word processing. Brain and Cognition, 4, 273-286. Underwood, N. R. and McConkie, G. W. (1985). Perceptual span for letter distinctions during reading. Reading Research Quarterly, 20, 153-162. Warren, R. E. (1977). Time and the spread of activation in memory. Journal of Experimental Psychology: Learning and Memory, 3, 458-466. Wickens, C. D. (1984). Processing resources in attention. In R. Parasuraman and R. 
Davies (Eds), Varieties of Attention (pp. 63-102). New York: Academic Press.
Chapter 7
Energetics and the Reaction Process: Running Threads Through Experimental Psychology¹
Maurits W. van der Molen
University of Amsterdam, The Netherlands
Contemporary theories of human information processing are concerned primarily with the architecture and temporal organization of computational mechanisms. Not surprisingly, information processing theorists have borrowed heavily from computer science for the modeling of human performance. Unfortunately, the computer metaphor may carry the danger of producing 'cognitive wheels'. The wheel image has been used by Dennett (1984) to denote characteristics of cognitive models which bear as little relation to the workings of the mind as the wheel bears to natural solutions to mammalian locomotion. The fear of producing cognitive wheels inspired Hockey, Gaillard and Coles (1986) to argue that information processing theories should account for: (i) variability resulting from imposed or natural changes of the state of the system; (ii) regulatory and strategic actions of the system under suboptimal conditions; (iii) the possibility that individuals may differ in the organization and intensity of cognitive operations; and (iv) cognitive operations and biological functioning within the same general model (Hockey et al., 1986, p. 4). This program would require the integration of energetic constructs within information processing models of cognition. Hockey et al. used the term 'energetics' to denote the 'motivational or intensive aspects of behavior as opposed to structural or directional effects' (p. ix).
The proposed marriage between energetic constructs and information processing models is basically a reversion to theories of mind developed by the pioneers in experimental psychology. In the late 19th century, the foundations were laid for an energetic view on attention. James (1890), for example, distinguished between two basic types of attention: adjustive attention and anticipatory preparation. 
The former involves a facilitation through orienting of a sense organ and inhibition of other sense organs, whereas the latter consists of a voluntary effort to build up and sustain a state of preparation. Obviously, both aspects of attention refer to variations of the state of the organism which may affect the speed and accuracy of the computational mechanisms involved in the analysis of the stimulus and the production of the response. The effects of fluctuating attentional states on the efficiency of performance were typically examined in standard reaction time (RT) tasks developed originally for the neurophysiological purpose of determining the speed of nervous transmission (Helmholtz, 1850) but soon adopted to measure the speed of thought (Donders, 1868) and the dynamics of performance (Wundt, 1910). The standard RT task will provide the running thread for the discussion in this chapter. In the early days of experimental psychology the RT task was a very popular tool to examine performance efficiency under a wide array of suboptimal and superoptimal conditions (e.g. see the review in Woodworth, 1938). The standard RT task still enjoys great popularity. A recent count indicated that up to 40% of the articles published in representative issues of the Journal of Experimental Psychology: Human Perception and Performance have used measures of RT to reach their conclusions (Meyer et al., 1988).
The remainder of the chapter consists of four sections. The first section will trace the origins of energetics and attention during the first stages of experimental psychology. It will illustrate: (i) the use of the concept of 'energy' in psychological theorizing; (ii) some speculations on the neural substrate of attentional states; and (iii) procedures developed to relate those states to performance efficiency. The second section will be concerned with the revival in the study of attention and physiological arousal and the rebirth of the study of the reaction process after its disappearance during the behavioristic era. The third section will focus on revisions during the 1970s in (i) experimental stress research, (ii) the study of attention and physiological arousal, (iii) the decomposition of the unitary arousal concept, and (iv) the stage analysis of the reaction process. A final section will present an example of a recent hybrid model of the reaction process in which energetic constructs have been integrated within a stage conceptualization of the reaction process. The chapter will close by concluding that the current renaissance of energetical thinking did not uncover new views on attention but rather stimulated the use of new RT methods and powerful brain technology in the interest of old ideas, the solidity of which has been proven during more than a century of experimental psychology.
¹This article was written for original publication in 1991.
Handbook of Perception and Action, Volume 3 ISBN 0-12-516163-8
Copyright © 1996 Academic Press Ltd All rights of reproduction in any form reserved
1 HISTORICAL ORIGINS
The 'energy' metaphor shaped the minds of the pioneers in experimental psychology. The law of energy conservation derived from thermodynamics suggested to Heymans (1927) several specific theses on consciousness and attention. Among these theses were: (i) each mental content contains distance energy ('Distanzenergie') enabling it to move to the attentional focus; (ii) when forcing its way to the attentional focus, its distance energy will be reduced while its position energy ('Niveauenergie') increases and can be transformed into potential association, conation, feeling or volition energies; and (iii) these energies may be transformed to invoke bodily changes and other mental contents or be distributed across similar and contingent mental contents (p. 358).
A less abstract view of the relation between attention and energy was entertained by McDougall (1911) who distinguished between two types of nervous processes. On the one hand are processes in well-organized systems occurring without any effort or attention because organized systems have a low internal resistance. On the other hand are processes that occur in elements not yet organized in fixed systems.
In contrast to fixed neural systems, plastic parts of the brain have a high degree of resistance to the current of nervous energy. McDougall considered the forcing of a passage across synapses of high resistance an essential feature of the organization of functional brain systems. He assumed further that the accumulation of energy to a high level and its discharge across synapses of high resistance are invariably and proportionally accompanied by clear consciousness and attention.
McDougall's energetic interpretation of attention has three important corollaries. First, it is consistent with a view suggesting that psychic forces may intervene in the course of brain processes without violating the law of energy conservation. Second, it anticipates the distinction between automatic and controlled processing and provides a neural basis for these two types of processes. Third, it suggests a developmental course of attentional systems in the brain. Early in development, nervous processes occur predominantly in brain structures that are genetically determined and that contribute only a little to the growth of attention, whereas in the course of development plastic brain tissue becomes organized under the touch of experience and the intervention of attention.
Ribot's (1919) monograph Psychologie de l'Attention presents a coherent account of the early experimental psychologists' view on the relation between attention and the brain. At the turn of the century, the predominant view distinguished between two types of attention: involuntary and voluntary attention (see also James, 1890). Involuntary attention is characteristic of animals and human infants. It has strong ties with affection and does not require effort. As soon as effort comes into play this is a sign that involuntary attention is shifting into voluntary attention. Ribot considered voluntary attention as the product of training. 
In this sense, voluntary attention is artificial and strongly related to motivation. One of the prime examples of involuntary attention is the 'surprise' reaction which is elicited by novel stimuli. The bodily changes that occur during involuntary attention consist of an increased responsivity of the sympathetic nervous system, which results in a greater blood flow to the brain. Involuntary attention is also associated with slower breathing or even respiratory arrest. The effect of involuntary attention on the respiratory system was one of the signs suggesting to Ribot that attention is an abnormal state of the organism which can persist only for a short time. Finally, involuntary attention has widespread effects on the musculature of the face and will generally result in the inhibition of ongoing behavior and a turning towards the source of stimulation. The quintessential feature of voluntary attention is inhibition. Ribot assumed that the seat of attentional inhibition was located in the frontal lobes. The hypothesis of the frontal lobes as the neural implementation of voluntary attention was based upon the following observations: (i) stimulation of the frontal lobes does not elicit motor responses; (ii) frontal lesions do not result in paralysis but rather in a loss of intellectual function resulting in attentional deficits; (iii) the frontal lobes are poorly developed in mental retardates and so is their artificial attention; (iv) phylogenetically, the development of the frontal lobes is associated with an increase in attentional skill; and (v) ontogenetically, intelligence is correlated with the development of attention and also with the maturation of the frontal lobes. In the early days of experimental psychology, 'expectant attention' was one of the most researched types of voluntary attention (Ribot, 1919, p. 109). Expectant attention has been induced by instructing a subject to respond as quickly and accurately as possible to a stimulus in an RT task. 
Ribot referred to the work of Wundt and Exner
to illustrate that, when subjects have prior knowledge of the nature of the stimulus, RT shows a decrease from 500 to 253 ms. When subjects can predict the onset of the stimulus, RT can even be reduced to 76 ms. Ribot also cited results obtained by Obersteiner (1879) to illustrate the sensitivity of the RT method to suboptimal conditions. This investigator observed that a headache lengthened RT to 171 ms in a subject who had shown an RT of 133 ms under optimal conditions. Fatigue and drowsiness prolonged RT to 183 ms. Illness was even more disastrous. During the first stages of his illness the RT of this subject was lengthened to 166 ms, and as the illness progressed into later stages it rose further, ranging from 281 to 755 ms.
The early enthusiasm for using the RT method to assess attentional states led many investigators to consider RT variability as the 'dynamometer of attention' (Salow, 1912). The method was soon used to examine the effects of practice, preparation, drugs, incentives and organic factors, all of which were assumed to affect the subject's attentional state and selectivity. Practice was observed to produce shorter RTs. The early experimenters assumed that during the course of the experiment attention shifted from the analysis of the signal (sensorial reactions) to the execution of a quick response (muscular reactions). This possibility raised the wider question of the effects of attentional fluctuations on the reaction process. Some investigators observed that during prolonged testing 'blocks' may occur: a series of quick responses followed by an extremely retarded one. These blocks have been interpreted to suggest temporary lapses of attention (Woodworth, 1938).
A warning signal has been used to facilitate a state of optimal attention. It has been observed that a warning signal shortens muscular reactions from 188 to 136 ms whereas sensorial reactions are only shortened from 305 to 279 ms (Wundt, 1910). 
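The size of the warning-signal benefit is easier to compare when the two gains are set side by side. The short Python sketch below only restates the arithmetic on the RT values quoted above; the dictionary layout and the function name `warning_benefit` are illustrative choices and not part of the original studies.

```python
# RT values in milliseconds, as quoted in the text for Wundt (1910).
REPORTED_RT_MS = {
    "muscular": {"no_warning": 188, "warning": 136},
    "sensorial": {"no_warning": 305, "warning": 279},
}


def warning_benefit(rt):
    """Return (gain in ms, gain as a fraction of the unwarned RT)."""
    gain = rt["no_warning"] - rt["warning"]
    return gain, gain / rt["no_warning"]


# Compute the benefit separately for each type of reaction.
benefits = {kind: warning_benefit(rt) for kind, rt in REPORTED_RT_MS.items()}

for kind, (gain_ms, fraction) in benefits.items():
    print(f"{kind}: {gain_ms} ms faster ({fraction:.1%} of the unwarned RT)")
```

On these figures the muscular reaction gains 52 ms and the sensorial reaction 26 ms, so the absolute benefit of the warning signal is twice as large under the muscular attitude.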
This differential effect of the warning signal was interpreted in terms of an attentional balance. When the subject allocates attention towards the preparation of the responding hand, the insertion of a warning signal allows response preparation to be timed to stimulus onset. In adopting a sensorial attitude, the subject is already optimally prepared to receive the stimulus, and thus the warning signal will be less effective. These findings seem to suggest a different time course for perceptual and motor preparation. Perceptual preparation can be maintained for relatively long time periods whereas the maintenance of motor preparation is much more strenuous. This interpretation was supported by the observation that short preparatory intervals (1.5 s) produce faster responses than longer waiting periods (3 or 6 s). Short intervals allow the adoption of a muscular attitude whereas longer intervals induce a shift from the more strenuous muscular reactions towards a sensorial attitude.
A second facilitating factor that received early attention is the use of incentives. Woodworth (1938) discussed an experiment performed by Johansson in which three subjects undertook an auditory reaction task under three conditions. In one condition, the subjects were informed of their speed of responding on the preceding trial. In a second condition, the subjects could escape electrical punishment if they responded more quickly than a deadline adjusted to their average speed of responding. The third condition served as a baseline. Knowledge of results and punishment were both very effective in speeding up the reaction process. Most likely, these task manipulations balanced attention towards a muscular reaction process at the cost of sensorial components.
Drugs and psychopathological conditions are also likely to alter the subject's attentional state. Wundt (1910) demonstrated how the RT method can be used to classify drugs into one of four categories. 
One category contained drugs, such as
alcohol, that produce an initial decrease in RT but have a detrimental effect on response speed as the task continues. The second class of drugs, e.g. small doses of chloroform, elicits an opposite effect. These drugs produce an initial decrease in response speed followed by an increase. The third category of drugs, e.g. high doses of alcohol or chloroform, results in a lengthening of RT at all stages of the task. Drugs in the fourth category, e.g. tea and coffee, have only a facilitating effect. In discussing these findings, Wundt (1910) warned against a blind application of the RT technology. Variations in response speed have been linked to a wide array of task and organismic variables but, as Wundt pointed out, the measurement of response speed is pointless if RT experiments are not guided by theory. Thus, it is not surprising that, in the absence of a theory, Woodworth (1938) was unable to account for the surprisingly long response times of the child 'in spite of his short nerve paths and general liveliness' (p. 336).
The lack of a theoretical framework and the disconcerting variability in RT obtained under seemingly identical conditions were major factors in the rapid decline of the RT method. It was not even mentioned in the most influential experimental psychology textbooks that appeared in the 1950s (Osgood, 1953; Stevens, 1951; Woodworth and Schlosberg, 1954). Similarly, as the term 'instinct' fell into disrepute, the question of attention was neglected in academic psychology. Thus, in a recently edited volume commemorating the first 100 years of scientific psychology, dating back to the founding of Wundt's laboratory, there is not a single reference to the concept of attention in the subject index (Koch and Leary, 1985). The study of attention and the use of the RT method did not reappear until the late 1950s.
2 REVIVED VIEWS ON ATTENTION AND THE REBIRTH OF THE RT METHOD
Academic psychology in the 1950s was dominated by behaviorism and Gestalt psychology. Both schools shared the conviction that simple laws govern the relation between stimulus and response. Behaviorism adopted a view in which the conditioned stimulus is mapped onto the response simply by its contingency with the unconditioned stimulus. Such a view leaves little space for intervening variables such as attention. The Gestaltists were primarily concerned with the question of how the integration of sensations into meaningful wholes is achieved by the organism. Their fundamental idea is the doctrine of a 'determinative whole'. The percept as a whole is a supersummative aggregate that is selecting, determining and shaping its parts according to a set of laws peculiar to itself (Allport, 1955). Attention plays only a minor role in the formation of Gestalten (Boring, 1970).
It is against this background that Broadbent (1958) presented his well-known filter model of selective attention and Berlyne (1960) opened up the study of collative stimulus properties as determinants of attention-getting. Both lines of investigation related psychological attention to physiological arousal. The significance of physiological arousal as a determinant of attentional selectivity is amplified in Easterbrook's (1959) hypothesis of cue utilization. The linkage between attention and arousal advanced the development of energetic perspectives on attention. In tracing the origins of the 'arousal' or 'activation' concept, Malmo
(1959) proposed activation as a unidimensional behavioral continuum linking performance to physiological arousal.
Broadbent's filter model was a breakaway from the stimulus-response psychology and Gestaltism of the 1950s. His model presents an attempt to describe the flow of information through the organism and was based upon an analogy with telephone and radio engineering. The model regards the whole nervous system as a single channel with a limit to the rate at which it can transmit information. Within a fixed period of time the system is able to handle only a certain number of signals. The single-channel system is preceded by a selective filter which passes only some of the incoming information to protect the single channel from overstimulation. The filter selects stimuli with a common feature. The basis of selection may change over time so that over a prolonged period the system is able to sample all available information. The information that is not selected is held temporarily in a buffer. This information will decay rapidly and become unusable when it is not selected before the time limit of the buffer mechanism expires. Broadbent assumed that a change in the filter setting will take an appreciable time. This feature of the filter model plays a crucial role in his interpretation of performance decrement in noise-aroused subjects.
The drop in performance under noise exposure was examined most frequently with reference to Leonard's serial RT task. This set-up consists of five separate lights and five contacts, each corresponding to one of the lights. When a lamp goes on the subject must touch the appropriate contact. As soon as the contact is made, another lamp lights. There are no pauses other than those inserted by the subject (blocks). When a subject performs this monotonous type of task for a prolonged period, momentary blocks are likely to occur associated with a loss of the opportunity to make a correct response. 
Such a performance decrement is compatible with the view that prolonged monotonous work is associated with a drop in arousal level reducing performance efficiency. The problem is, however, that a noise-induced increase in arousal level is associated with a similar deterioration of performance. Broadbent pointed out that an arousal interpretation must be complemented with filter theory in order to reconcile the discrepancies observed in vigilance and noise studies. His explanation of performance deterioration runs essentially as follows. The filter is set by task instructions to a particular channel to receive relevant information. Task-irrelevant information is rejected. Occasionally irrelevant information may pass the filter so that task-relevant information does not reach the limited-capacity channel. In paced tasks, i.e. tasks in which signals are presented at a fixed rate, this will delay the response, whereas in unpaced tasks, i.e. tasks in which the rate of stimuli depends on the subject's response, the occasional retarded response may be countered by an increase in activity between time-outs so that, on average, performance is not altered. Broadbent assumes that with repetitive stimulation intermittent failures in the intake of task-relevant information will occur more frequently. He also assumes that noise increases the sampling rate of irrelevant information. Thus, Broadbent entertains basically a 'blinking' theory. During the 'blink' the central nervous system is cut off from relevant information, producing errors. Blinking may occur during safe periods, however, and then performance will not be affected. Furthermore, the subject may exercise some influence by suspending the blink temporarily. All this suggests that tasks which allow a great deal of anticipation will be more resistant to performance decrement than tasks in which relevant information arrives continuously or irregularly.
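The logic of the 'blinking' account can be made concrete with a toy timeline. The Python sketch below is an illustrative assumption and not Broadbent's own formalism: signals are reduced to arrival times, blinks to time intervals during which the filter samples an irrelevant channel, and the names `missed_signals` and `suspend_blinks` as well as all numerical values are hypothetical. A suspended blink is simply cancelled here, whereas the text only says that it is suspended temporarily.

```python
def missed_signals(signal_times, blinks):
    """Count signals that arrive while the filter is away from the relevant channel.

    signal_times: arrival times of task-relevant signals (seconds).
    blinks: list of (start, end) intervals in which irrelevant input is sampled.
    """
    return sum(
        any(start <= t < end for start, end in blinks)
        for t in signal_times
    )


def suspend_blinks(blinks, anticipated_times):
    """Drop every blink that would overlap a signal the subject can anticipate.

    Simplifying assumption: a suspended blink is cancelled, not postponed.
    """
    return [
        (start, end)
        for start, end in blinks
        if not any(start <= t < end for t in anticipated_times)
    ]


# Hypothetical timeline: four signals and three blinks, one of which hits a signal.
signals = [1.0, 3.0, 5.0, 7.0]
blinks = [(0.2, 0.6), (2.9, 3.3), (5.5, 5.9)]

# Without anticipation, every blink takes place as scheduled.
unprepared = missed_signals(signals, blinks)
# With full anticipation, blinks that would coincide with a signal are suspended.
prepared = missed_signals(signals, suspend_blinks(blinks, signals))
print(unprepared, prepared)
```

In this example two of the three blinks fall in safe periods and cost nothing, one coincides with a signal and produces a miss, and anticipation removes that miss. This mirrors the claim that predictable tasks should resist performance decrement better than tasks with irregular signals.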
In discussing the effects of noise, Broadbent emphasized that the selective filter is biased to channels on which novel stimuli occur. Thus, the onset of the noise causes the filter to select auditory information, thereby preventing task-relevant information reaching the central nervous system. As the bias is only towards novel stimulation the filter is soon reset to the relevant channel and performance efficiency returns to normal. Novelty is one of the 'collative' or attention-getting characteristics of stimuli, which have been thoroughly examined by Berlyne (1960). In his experimental work, Berlyne sought to determine the stimulus properties that arouse a high degree of attention. He developed a task in which a stimulus display contained four windows in a horizontal line and a response board with a row of four keys, each corresponding to one of the windows, and a home key in the center. On each trial, one or more windows would display a light. When only one window lit up, the subject was to press the corresponding key. When two or more windows lit up, the subject was to press only one of the corresponding keys, the choice being left free. In one series of experiments, the stimuli consisted of large versus small circles, bright versus dim circles, or constantly lit versus flickering circles. The results showed that when two different stimuli were presented, the larger, brighter or flickering stimulus attracted significantly more responses than the smaller, dimmer or constantly lit stimuli. The effects of stimulus novelty were examined in a series of experiments using a similar task. In these experiments, trials consisted of single, pairs or triplets of stimuli. There were three conditions. In the first condition, similar stimuli were presented in all four windows (e.g. four white circles). In the second condition, one of the four stimuli was replaced by a stimulus differing either in shape (e.g. square) or color (e.g. red or green). 
This procedure was repeated in the third condition. The results showed that the novel stimulus attracted more responses than the background stimuli. Furthermore, this effect was more pronounced in the second compared with the third condition. The results of Berlyne's experiments indicate that novelty is a potent attention-getting stimulus property. At a neurophysiological level, Berlyne (1960) related stimulus novelty to Sokolov's (1963) model of the orientation reaction. This model assumes that incoming stimuli leave traces within the nervous system, especially in the cortex. These traces are nervous models which preserve information about a wide range of stimulus characteristics. Any stimulus will be matched against existing nervous models. The reception of a stimulus will produce a nonspecific activation of the reticular formation via collateral afferents. In the case of a match, the cortex will block the nonspecific effects from the afferent collaterals. In the case of a mismatch, the cortex will send down excitatory pulses to the reticular formation. The activation of the reticular formation, from the cortex and via the afferent collaterals, will then initiate the orientation reaction. Apart from a behavioral orientation towards the source of stimulation, the orientation reaction involves a variety of physiological changes including a lowering of sensory thresholds, the inhibition of ongoing reactions, electroencephalographic changes towards arousal, galvanic skin responses, vasodilatation in the head and vasoconstriction in the limbs, respiratory delay and heart rate changes. The purpose of these changes is to make the organism more sensitive to incoming and relevant stimuli. The discussion of novelty implicated the notion of phasic physiological arousal as a facilitator of stimulus intake. Physiological arousal as a moderator of selective attention is central to the theory presented by Easterbrook (1959). 
This author intended to explain how heightened arousal affects performance via attentional
mechanisms. He proposed that states of high emotionality, stress and anxiety produce an increase in arousal level which limits the range of cues that an organism uses. The reduction in the range of cue utilization affects performance in a direction depending on the complexity of the task at hand. Complexity is defined here in terms of the number of cues that must be used simultaneously to achieve the performance level required by the task. Task performance is facilitated when the use of irrelevant cues is reduced but it is disrupted when attention must be deployed over a wide range of cues. Easterbrook cited an experiment performed by Bahrick, Fitts and Rankin (1952) to illustrate the differential effects of an enhanced arousal state. In this experiment the subject performed two tasks simultaneously. The primary task was continuous tracking and the secondary task was to report the occurrence of occasional lights in the periphery and to respond to an occasional deflection of a needle on a peripheral dial. When subjects were rewarded to improve their performance on both tasks it was observed that they improved on the tracking task at the cost of the peripheral task. Thus, the effect of incentives represents a shrinkage of the attentional field rather than an overall shift in performance level. In a thoughtful paper, Callaway and Stone (1960) related the Easterbrook hypothesis to Broadbent's selective filter model. They stressed that the filter model is basically an attempt to conceptualize the strategies of the organism to protect itself against information overload. One mechanism is selective filtering. Other strategies consist of considering smaller ensembles for stimulus input and probabilistic coding. Callaway and Stone hypothesized that arousal will lead to a reduced ensemble of possible stimuli all considered more equally probable. 
The probabilistic coding hypothesis received support from an experiment in which pharmacologically aroused subjects performed the Stroop color-word test. The authors suggested that in the Stroop color-word test color names have a higher probability of eliciting a response than color words. Their probabilistic coding hypothesis would then suggest that pharmacologically induced arousal will act to reduce this difference in probability. Indeed, their results indicated that aroused subjects were better than unaroused subjects in naming the colors of inks used to print conflicting color names. The idea that aroused subjects tend to relinquish their probabilistic coding of stimuli was challenged, however, by the findings reported by Houston and Jones (1967). They examined the effects of noise on Stroop test performance. Their noise-aroused subjects were more efficient on the color-word test, as were the drug-aroused subjects in the experiment described by Callaway and Stone, but the noise did not improve the naming of color spots as would be predicted by the probabilistic coding hypothesis. These conflicting findings are difficult to assess in the absence of independent measures of physiological arousal. Moreover, the introduction of arousal as an explanatory mechanism requires an articulated view on arousal and nervous system functioning. This is exactly the task to which Malmo (1959) set himself in his review of findings from three different sources: (i) electroencephalography (EEG) and neurophysiology; (ii) behavioral energetics; and (iii) physiological measurements of drive. Students of EEG had discovered that distinctive wave patterns characterized the level of psychological functioning from deep sleep through to highly agitated states. The electroencephalogram shifts from a regular tracing consisting of large low-frequency waves to an irregular desynchronized tracing of reduced amplitude. Desynchronization in the electroencephalogram was
observed recurrently in conjunction with an increased state of alertness. Electroencephalographic desynchronization was typically found, for example, to occur during the foreperiod in a signaled RT task (e.g. Lansing, Schwartz and Lindsley, 1959). The consistent relation between the EEG and alertness suggested to many investigators the existence of brain mechanisms that mediate between the state of the organism and the proficiency of behavior. The discovery of the reticular formation by Moruzzi and Magoun (1949) provided strong empirical support for this idea. The reticular formation was found to be a nonspecific functional unit which seemed particularly well suited for the role of regulating the state of the organism. Long before the discovery of the reticular formation, the existence of an arousal system in the brain was anticipated by physiological studies of behavioral 'energetics'. This work suggested to Malmo in his contribution to the 1958 Nebraska Symposium on Motivation that all variations in behavior fall into two categories: direction and intensity. As long as direction and steering functions are excluded from the definition of drive, the study of drive or emotion is basically concerned with activating or energizing aspects of behavior. Thus, increased drive intensity is expected to be related to increased levels of physiological arousal. Malmo reported the results of a sleep deprivation study which provide support for the drive-arousal association. In this study, three males were deprived of sleep for some 60 h while engaged in a tracking task at regular intervals during the period of vigil. It appeared that palmar conductance and respiration rate rose throughout the vigil and EEG alpha amplitude fell progressively. Heart rate and muscle potential data revealed comparable changes. These findings corroborate the view that a graded increase in drive intensity is indexed by a rise in a generalized state of physiological arousal.
The attempts to use physiological measurements in gauging drive intensity indicated that response strength is not a monotonic function of drive. Malmo (1958) pointed to the classical study performed by Freeman (1940) who paired, over a period of many days, measures of palmar conductance and response speed from one subject. These pairings were taken at so many different hours during the day that the subject must have been drowsy on some occasions, highly alert on others, and at many stages in between. Freeman's results showed that low palmar conductance was associated with slow responses and that, when conductance increased, responses were progressively faster until a certain optimal conductance level beyond which responses slowed down again. The upward part of the curve was taken by Malmo as support for his contention that physiological recordings can replace the use of antecedent conditions (e.g. hours of sleep deprivation) in the measurement of drive intensity. The downward part was interpreted to suggest that intense drive weakens rather than strengthens correct responses. In his 1959 review, Malmo further developed his concept of 'activation' as a neuropsychological dimension. According to this view, activation refers to a continuum extending from deep sleep at the low activation end to excited states at the high activation end. This dimension is a function of the amount of cortical bombardment by the ascending part of the reticular formation; the greater the bombardment the higher the activation. The quality of performance is related to the level of activation by an inverted U-shaped function. Initially, there is a monotonic increase in performance with activation level but beyond an optimal activation level performance will deteriorate. Activation is a dimension that can be quantified by physiological indicants provided these recordings have
a sufficiently high intraindividual concordance (p. 385). Unfortunately, there are many studies indicating a dissociation between behavioral activation and somatic or electrocortical arousal. The interested reader is referred to Lacey's (1967) penetrating critique for a challenge to the idea of a single activation system.
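The inverted U-shaped relation between activation level and quality of performance described above can be made concrete with a small numerical sketch. The Gaussian shape, the 0 to 1 activation scale and the parameter names `optimum` and `width` are assumptions chosen for illustration; Malmo specified only the qualitative form of the function.

```python
import math


def performance(activation, optimum=0.5, width=0.2):
    """Hypothetical inverted-U: performance is best at `optimum` activation.

    The Gaussian form is an illustrative assumption, not Malmo's equation.
    """
    return math.exp(-((activation - optimum) ** 2) / (2 * width ** 2))


if __name__ == "__main__":
    # Performance rises toward the optimum and falls off beyond it.
    for level in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(f"activation={level:.1f}  performance={performance(level):.2f}")
```

Any single-peaked function would serve equally well here; the point is only that performance first rises and then falls as activation increases, as in Freeman's pairing of palmar conductance with response speed.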
REVISED VIEWS ON ATTENTION, AROUSAL AND THE REACTION PROCESS
After the initial period of a revived interest in the concept of attention and the RT method, rapid progress was being made to refine the notions of attention and arousal and to improve the power of the RT experiment. Again, the revisions leaned heavily on the seminal work performed by the pioneers in experimental psychology. One of the ideas that gained an increasing interest was the concept of 'set'; i.e. the notion that instruction and motivation determine importantly the speed and accuracy of the reaction process. The idea of 'set' is considered a precursor of many 'control' theories of the reaction process. Theorists of attention also began to realize that structural models of attention should be revised to incorporate the older notion of 'effort'. Following the footsteps of Ribot (1919), these theorists developed models in which 'effort' has been assigned a crucial role in attaining and maintaining a state of voluntary attention. The new theories confirmed the older ideas that attention has at least two features: adjustive attention and expectant attention. The refinements in the concept of attention were paralleled by a decomposition of the arousal concept. Arousal theorists began to consider the possibility of multiple activation systems as contrasted with the unidimensional activation dimension. A final important notion that re-entered experimental psychology is the assumption that the reaction process consists of a series of distinct components, typically labeled 'processing stages'. This assumption provided the foundation for the development of powerful research tools to identify the selective influence of stresses and other energetic variables on the reaction process.
3.1 From Set to Supervisory Control
The origin of the concept of 'set' can be traced to Lange's (1888) observation that the speed of the simple reaction process varied depending on whether the subject allocated attention to stimulus analysis or response preparation (sensorial versus muscular reactions; see above). The idea of a preparatory set had far-ranging implications beyond the setting of the RT experiment. The Würzburg psychologists extended the notion of 'set' to the study of thinking (Ach, 1905), and soon further extensions were developed in the fields of perception, memory and human conditioning (see Gibson, 1941, for a review). Broadbent (1970) reintroduced the concept of 'set' to denote two basic types of attentional selection: stimulus and response set. Subjects adopt a stimulus set when relevant and irrelevant stimuli can be discriminated by an obvious physical characteristic such as position or color. Thus, stimulus set defines relevant stimuli by a physical characteristic that allows them to be analyzed more fully. This type of early selection is what Broadbent
called 'filtering' in his 1958 version of the selective attention model. Response set is a second type of selection and refers to the selection of outputs of the perceptual analysis. Thus, this is a later kind of selective attention which restricts the range of possible responses. Broadbent (1970) used a pre-post instruction technique to distinguish between stimulus and response set. He reasoned that capacity limitations of the system would make it advantageous to analyze stimuli as little as necessary. With stimulus set this is possible, since irrelevant stimuli can be rejected after one binary choice, e.g. left versus right ear message. If the instruction as to which stimuli to select is given after stimulus presentation, all stimuli, including the irrelevant stimuli, require a complete perceptual analysis. Thus, response set requires a more complete analysis of the stimuli, even of the irrelevant ones. Stimulus set was expected to produce a larger pre-post instruction effect than response set. To verify this prediction Broadbent devised two tasks, each having a visual and an auditory format. One task allowed subjects to adopt a stimulus set whereas the other task involved response set. The visual format presented red and white digits and in the auditory format the digits were spoken by male and female voices. If subjects are told which digits are to be selected prior to their presentation they will be able to ignore the irrelevant stimuli after a quick analysis of color or voice. The other task presented digits and letters that require a good deal of analysis even if the subjects have advance knowledge about whether digits or letters are to be selected. The results conformed to expectations. Although errors were more pronounced in the post-test condition, this difference was significant only for subjects who were allowed to adopt a stimulus set.
In Decision and Stress, Broadbent (1971) revised his 1958 filter model to incorporate these two types of attentional selection. Although the contours of the early filter model were still thought to be valid, it was now assumed that the limited-capacity channel receives probabilistic rather than determinate information. Thus, the limited-capacity channel receives 'evidence' about a stimulus as an input making it more likely that one particular stimulus was present rather than other possible stimuli. A second important modification of the early filter model concerns the assumption that the system selects output states on the basis of the evidence presented to it. The output state is termed a 'category state' to stress the central nature of the output of the limited-capacity channel. Evidence from some stimulus sources may be given more weight than that from others. This is selection by filtering (Broadbent, 1958) and corresponds to the notion of 'stimulus set' (Broadbent, 1970). In addition, category states may have a bias that increases the probability of certain outputs at the expense of others. This is selection by 'pigeon-holing' and corresponds to the idea of 'response set' (Broadbent, 1970). One of the major goals of Decision and Stress was to distinguish the roles of filtering and pigeon-holing in various domains of experimental psychology. One of these domains was the reaction process. Previously, Broadbent (1958) had treated RT as a convenient tool to assess the workings of the selective filter mechanism. Now he wanted to take a much closer look at the nature of the reaction process to see which parameters of the process might change under abnormal conditions or stresses. In accord with his probabilistic approach, Broadbent viewed the reaction process as an accumulation of evidence in a central decision mechanism which ultimately selects a category state as an output.
He pointed to the well-documented fact that, when subjects are instructed to increase the speed of responding, more errors will be made, and he
reviewed the RT literature with a special focus on error latency to attain a more specific notion of the reaction process. This review indicated that when a response occurs as an error its speed is determined by the response rather than the stimulus. The apparent tendency to make as errors those responses that occur most rapidly suggested to Broadbent the conclusion that pigeon-holing rather than filtering is involved in the generation of errors. At this point, the important question to ask is how stresses affect the reaction process. In Perception and Communication, Broadbent (1958) interpreted the adverse effect of noise exposure in terms of filtering. In Decision and Stress, Broadbent (1971) considered the possibility that stresses may affect pigeon-holing rather than filtering. He adopted his usual strategy in distinguishing between filtering and pigeon-holing, i.e. a detailed examination of the pattern of correct and incorrect responses. If a stress would increase the probability of a correct response one might conclude that its effect consists of a change in filtering. In contrast, if a certain response would occur more frequently as an error this would suggest a response bias, and thus an effect on pigeon-holing. This strategy was applied to the results of an experiment in which noise-aroused subjects were presented common and clearly visible words together with uncommon and less visible words. Intense background noise slowed responses to the uncommon word while the common word was slightly easier to see than when it was quiet. Most importantly, noise did not change the proportion of misperceptions for common versus uncommon words. Thus, noise did not induce a bias in favor of one response rather than another. From these findings, Broadbent concluded that the difficulty in noise is one of stimulus selection (i.e. filtering) and not one of changes in response bias (i.e. pigeon-holing). Broadbent (1971) also considered stresses other than noise.
Previously, he observed that the effects of noise on serial reaction task performance consisted of an increase in extremely slow reactions and a higher incidence of errors (Broadbent, 1958). Other stresses were observed to exert somewhat different effects. The effect of sleep deprivation, for example, was also found to increase the number of slow reactions but, unlike noise, it did not increase error incidence. The drop in performance in sleep-deprived subjects is compensated by the effect of incentive but incentive exaggerates the adverse effect of alcohol. These and other findings are difficult to reconcile with an interpretation relating stress-induced shifts in performance to changes along a unitary arousal continuum. The counteracting effects of noise and sleep loss can be easily explained by assuming that noise is associated with an arousal increase whereas sleep loss results in a drop in arousal level. The noise and sleep loss compensating effects of incentive are more difficult to explain in terms of unitary arousal changes. In order to provide a unified account for the interactive effects between stresses, Broadbent (1971) speculated that two mechanisms are involved: an upper and a lower mechanism (Figure 7.1). The lower mechanism is concerned with the execution of well-established decision processes whereas the upper mechanism is, at a higher level, monitoring the performance of the lower mechanism and altering its parameters to ensure an acceptable performance level. As long as the upper mechanism is in an efficient state, the adverse effects on the lower mechanism exerted by noise or sleep loss do not need to show up in performance. A relaxation of the upper mechanism, however, may result in a drop in performance efficiency. Such a relaxation may occur at the end of the working period or under the influence of alcohol. In contrast, incentives may act to increase the efficiency of the upper mechanism.
[Figure 7.1 diagram omitted. Recoverable labels: input passes through a lower mechanism, 'for whose activity there is an optimum', to output; an upper mechanism, 'whose increasing activity reduces effects of sub or super optimal lower', acts on the lower mechanism. State variables shown with the upper mechanism: time on task, extraversion, alcohol. State variables shown with the lower mechanism: noise, sleeplessness.]
Figure 7.1. Broadbent's (1971) solution to the complications of the relationship between stresses and physiological arousal. Some stresses will affect the lower arousal mechanism and do not alter performance unless the upper mechanism is ineffective. [Reprinted with permission of the author and publisher.]
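The logic of the two-mechanism account can be illustrated with a toy calculation. This is a sketch under stated assumptions and not a formalization given by Broadbent: the function names, the 0 to 1 scales, and the rule that the upper mechanism linearly cancels deviations of the lower mechanism from its optimum are all hypothetical.

```python
def lower_state(noise=0.0, sleep_loss=0.0, optimum=0.5):
    """Hypothetical lower-mechanism level: noise raises it, sleep loss lowers it."""
    return optimum + noise - sleep_loss


def efficiency(lower, upper_activity, optimum=0.5):
    """Performance efficiency on an assumed 0-1 scale.

    The deviation of the lower mechanism from its optimum only shows up in
    performance to the extent that the upper mechanism (0 = relaxed,
    1 = fully effective) fails to compensate for it.
    """
    deviation = abs(lower - optimum)
    uncompensated = deviation * (1.0 - upper_activity)
    return max(0.0, 1.0 - uncompensated)


if __name__ == "__main__":
    noisy = lower_state(noise=0.4)
    # Effective upper mechanism (e.g. under incentive): noise does not show up.
    print(efficiency(noisy, upper_activity=1.0))
    # Relaxed upper mechanism (e.g. late in the work period): noise shows up.
    print(efficiency(noisy, upper_activity=0.25))
    # Noise and sleep loss push the lower mechanism in opposite directions.
    print(efficiency(lower_state(noise=0.4, sleep_loss=0.4), upper_activity=0.25))
```

In this sketch the stresses listed with the lower mechanism move its level away from the optimum, while the state variables listed with the upper mechanism would enter only through `upper_activity`.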
Neurophysiological data were marshaled as preliminary evidence for the biological plausibility of an upper and lower mechanism. Broadbent harked back to the distinction between paced and unpaced tasks. Unpaced tasks allow the subject to compensate performance failures whereas paced tasks do not permit compensation (see above). Unpaced tasks will show deterioration only when the upper mechanism is out of action. Pharmacological studies showed that barbiturates have their effect primarily on unpaced tasks, and tranquilizers on paced tasks. This finding was assumed to be consistent with the different loci of drug action, barbiturates having their effect on cortical centers while tranquilizers lower the activity of the brain stem. Furthermore, it was observed that sleep loss affects paced task rather than unpaced task performance suggesting that sleep deprivation and tranquilizers influence the same mechanism, i.e. the efficiency of the lower mechanism. A second line of evidence stemmed from neurotransmitter studies showing that one system based on adrenergic transmission is controlled by another system based on cholinergic transmission. The adrenergic system is involved in stabilized performance based upon more remote learning while the cholinergic system is in control in unstabilized tasks which require the prompt detection of departures from regularity. The revised filter model of selective attention represents an early attempt to integrate energetic aspects of behavior into a model describing the information flow
through the organism. The original model was derived from telephone and communication theory and did not consider the possibility that human performance is plagued by errors and unreliability. The new model was formulated adopting a probabilistic perspective and does allow for variability in the characteristics of the information processing system under suboptimal and superoptimal conditions. The new model also made an attempt to account for the regulatory aspects of the control of behavior by introducing an upper mechanism to supervise changes in the efficiency of performance. Another energetic feature of the model refers to the attempt to base the distinction between an upper and lower mechanism on a neurophysiological foundation. This integration of energetic aspects of behavior within a broad-ranging theory of the human information processing system has been difficult to improve upon by more recent attempts.
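The probabilistic reading of the reaction process in this section, with evidence accumulating toward category states, filtering as a weighting of evidence and pigeon-holing as a bias on category states, can be sketched as a small simulation. Everything specific here is an assumption made for illustration: the two category states 'A' and 'B', the Gaussian noise, the parameter values and the function names are hypothetical and are not taken from Decision and Stress.

```python
import random


def run_trial(rng, weight=1.0, bias=None, threshold=5.0, drift=0.3,
              max_steps=10_000):
    """One trial with stimulus 'A' present; returns (response, steps).

    weight : filtering, scales the evidence coming from the relevant source.
    bias   : pigeon-holing, a head start given to particular category states.
    """
    totals = {"A": 0.0, "B": 0.0}
    for name, head_start in (bias or {}).items():
        totals[name] += head_start
    steps = 0
    while max(totals.values()) < threshold and steps < max_steps:
        steps += 1
        # Only the category state matching the stimulus receives weighted
        # evidence; both states receive noise on every step.
        totals["A"] += weight * drift + rng.gauss(0.0, 1.0)
        totals["B"] += rng.gauss(0.0, 1.0)
    response = max(totals, key=totals.get)
    return response, steps


def summarize(n=400, seed=1, **kwargs):
    """Proportion of correct ('A') responses and mean steps over n trials."""
    rng = random.Random(seed)
    results = [run_trial(rng, **kwargs) for _ in range(n)]
    p_correct = sum(r == "A" for r, _ in results) / n
    mean_steps = sum(s for _, s in results) / n
    return p_correct, mean_steps


if __name__ == "__main__":
    print("weak filtering  ", summarize(weight=0.5))
    print("strong filtering", summarize(weight=2.0))
    print("bias toward 'B' ", summarize(bias={"B": 3.0}))
```

In this sketch a change in filtering alters the proportion of correct responses, whereas a bias toward one category state makes that particular response more frequent as an error, which is the contrast Broadbent used to tell the two mechanisms apart.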
FROM ATTENTIONAL BOTTLENECKS TO THE ALLOCATION OF EFFORT
In discussing the limits to human performance, Broadbent (1971) assumed the human information processing system to consist of hypothetical structures, each structure receiving an input from a predecessor or the environment and transforming this input into an output for other structures or for generating a response. This information processing system was bestowed with two structural bottlenecks to protect the limited-capacity processor from overload: early selection by filtering stimuli, and later selection by restricting the range of possible responses. In his influential book Attention and Effort, Kahneman (1973) emphasized the flexibility of attentional selection. Bottlenecks do not occur at predetermined loci in the information flow but may occur during any stage of information processing depending on the amount of processing capacity allocated to that structure by a central control system. Thus, Kahneman assumed that information processing structures require two types of input: an information input specific to that structure and a nonspecific input which may be dubbed 'capacity' or 'effort'. Figure 7.2 presents a diagram of the information flow through a series of processing structures assembled for the perceptual analysis of a stimulus. The diagram illustrates Kahneman's position by showing that the efficiency of perceptual analysis will depend on the stimulus presented to the system and the amount of effort allocated to its processing structures. The diagram describes perceptual analysis as a sequence of processing structures from sensory registration of the stimulus through the selection of a response. The initial process, dubbed 'unit formation', partitions the perceptual field into segments or units. Some units receive greater 'figural emphasis' than others. The figural emphasis controls the quality of the input to 'recognition units'. The activation of a unit is a matter of degree. Its level of activation will depend on the specific features to which the unit responds and the subject's perceptual readiness. The graded output of the recognition units is then fed into a structure which selects 'perceptual interpretations' for some of the perceived objects. If stimuli fail to activate their recognition units beyond a critical value they will be left uninterpreted. Uninterpreted stimuli will have little effect on subsequent processing. The process of 'response selection' completes the perceptual analysis of the stimulus. In many experimental situations, the selection of the appropriate response will be facilitated by a state of response readiness.
[Figure 7.2 diagram omitted. Recoverable labels: column headings Structures, Control and Capacity; sensory registration, unit formation, figural emphasis, activation of recognition units, selection of interpretations, selection of responses, performance measure; available capacity, physiological arousal, pupil dilation; annotations 'high arousal disrupts allocation' and 'low arousal prevents task set'.]
Figure 7.2. Kahneman's (1973) effort model. The diagram describes the information flow through the organism from sensory registration through response selection. The central information processor needs two types of input: information and capacity. The limited-capacity supply depends on variations of physiological arousal. The central element of the model is the policy of capacity allocation. The allocation policy is controlled by enduring dispositions, momentary intentions, evaluation of task demands and level of arousal. [After Kahneman, 1973.]
The central part of Kahneman's model consists of a control system that allocates effort to the information processing structures. The figure shows that the allocation policy of this control system is determined by four factors. (1) Enduring dispositions. These enduring dispositions reflect the rules of involuntary attention and refer to innate routines that are triggered by specific stimulus properties. Moving objects and stimuli with rich contours, for example, are especially favored in the allocation of attention. Enduring dispositions also refer to Berlyne's (1960) collative stimulus properties, such as stimulus novelty or complexity (see above). Thus, the preliminary analysis of a novel stimulus may cause the allocation of greater effort to elaborate the analysis of the stimulus via the recursive route of attention control. This path of attentional control will be involved
whenever the initial analysis of the stimulus does not yield a sufficiently detailed and complete perceptual interpretation. (2) Momentary intentions. The adjustments controlled by momentary intentions can be grouped under the collective labels of Broadbent's (1970) notions of stimulus and response set (see above). Stimulus set defines the relevant stimuli by a physical characteristic (e.g. left ear message), which permits these stimuli to be analyzed in more detail than other stimuli, whereas response set defines relevant items by a common set of responses rather than by a stimulus feature. (3) Evaluation of task demands. A major assumption of Kahneman's model is that the mobilization of effort is determined by the task demands rather than by the intentions of the performer. When a task allows passive monitoring, subjects may be able to distribute their attentional capacity across concurrent stimulus sources allowing parallel processing of simultaneous inputs with little interference. When a task requires immediate responses, however, the activation of a recognition unit corresponding to a target will result in the allocation of greater effort to a more detailed analysis of the target and a withdrawal of capacity from other sources. Thus, attention will be more unitary for tasks demanding greater amounts of effort. (4) Arousal. Kahneman suggested that failures of under-arousal are due to low motivation. Subjects may fail to adopt a task set or to evaluate the quality of their performance. He interpreted the failures of over-aroused subjects in terms of Easterbrook's (1959) hypothesis (see above). Performance decrement in over-aroused subjects may be due to attentional narrowing, impaired attentional discrimination, a heightened attentional lability, or a combination of these factors.
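The contrast drawn under factor (3), between capacity spread over concurrent sources during passive monitoring and capacity withdrawn from other sources when a target demands immediate processing, can be sketched as a simple allocation rule. The priority-ordered rule, the 0 to 1 capacity scale and the names `allocate`, `demands` and `priorities` are hypothetical; Kahneman's model does not specify a computational allocation rule of this kind.

```python
def allocate(available, demands, priorities):
    """Hand out `available` capacity in priority order, capped by each demand.

    All quantities are hypothetical values on an arbitrary 0-1 scale.
    """
    remaining = available
    allocation = {}
    for activity in sorted(demands, key=lambda a: priorities[a], reverse=True):
        share = min(demands[activity], remaining)
        allocation[activity] = share
        remaining -= share
    return allocation


if __name__ == "__main__":
    priorities = {"target": 2, "background": 1}
    # Passive monitoring: demands are low, so both sources are fully served.
    print(allocate(1.0, {"target": 0.3, "background": 0.3}, priorities))
    # Immediate response required: the target absorbs most of the capacity
    # and the background source is left with what remains.
    print(allocate(1.0, {"target": 0.9, "background": 0.3}, priorities))
```

In terms of the four factors, the priorities stand in for enduring dispositions and momentary intentions, the demands for the evaluation of task demands, and `available` for the capacity supply that varies with arousal.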
The observation that momentary fluctuations in task difficulty are frequently accompanied by changes in physiological responsivity suggested to Kahneman an intimate relation between mental effort and physiological arousal. The hypothetical relation between mental effort and physiological arousal is important because it allowed Kahneman to derive a metric for the momentary exertion of effort. Kahneman proposed pupil dilatation as the prime measure of mental effort. Pupil size is sensitive to between-tasks variations, i.e. it orders tasks by their difficulty. Pupil size is also sensitive to between-subjects differences, i.e. it reflects differences in the amount of effort people invest in a given task. Finally, pupil size is sensitive to within-task variations, i.e. it reflects the momentary involvement in the task. Furthermore, Kahneman was able to demonstrate that neither anxiety nor muscular strain could account for the pupillary responses observed during mental task performance. Validation studies of changes in pupil size that accompany information processing suggest that they reflect a common factor related to the processing demands of a wide array of tasks ranging from simple perceptual tasks to more complex problem-solving tasks (see Beatty's (1982) review of the available literature). In many respects, Kahneman's (1973) effort theory represents an extension of earlier versions of activation theory. The proposal of pupil dilatation as the most generally suitable and sensitive measure of effort is based upon the link between capacity and physiological arousal. The attractive feature of Kahneman's approach is the flexible and active allocation of energetic supply to replace the stimulus-driven mobilization of energetic processes as proposed by the early activation theorists. But his approach shares with the early activation theorists the untenable hypothesis of physiological arousal as a unidimensional construct.
In pointing out that mental capacity and physiological arousal covary, Kahneman did not specify
the type of arousal that is involved. It could be behavioral, electrocortical or autonomic arousal. In proposing an index derived from the sympathetic branch of the autonomic nervous system as a metric for effort, however, he seemed to retain the early activationists' idea that arousal states can be aligned along a continuum of sympathetic dominance. Another problem that Kahneman's effort theory shares with early activationism is that it is notoriously difficult to make accurate predictions of behavior. Hockey (1984) generalized this point to all theories that assume a relation between arousal level and performance efficiency. Hockey claimed that unless one is able to make an independent assessment of how different factors affect arousal, it will be impossible to make predictions about the combinatory effects of these factors on behavior. Previously, this problem led Näätänen (1973) to reject the notion of arousal as a unidimensional mechanism varying only in intensity. Näätänen emphasized arousal as a qualitative patterning of bodily state rather than a level of some general process. Performance of a task will depend on the patterning of arousal rather than on its intensity. This view has been elaborated by Hamilton, Hockey and Rejman (1977) who proposed the analysis of 'activation states' which relate to points in a multidimensional space rather than levels on a single dimension ranging from low to high activation. The 'state' approach represents an attempt to avoid the difficulties inherent in a quantitative arousal theory by a descriptive analysis based on the mapping of qualitative patterns of performance in different task environments. It should be noted, however, that strong predictions cannot be derived from a descriptive analysis of activation states. This broad-band approach may provide explanations only after the facts.
Finally, Kahneman's emphasis on general capacity limitations was criticized by Sanders (1986) who pointed to many situations in which the performance of a primary task (e.g. mental arithmetic) is not adversely affected by the concurrent performance of a secondary task (e.g. pursuit tracking) (for a review, see Gopher and Sanders, 1984). These observations led Sanders and other investigators (e.g. Wickens, 1984) to suggest the existence of multiple capacity sources. An early rejection of the single capacity hypothesis of attention can be found in a paper by Allport, Antonis and Reynolds (1972). These authors demonstrated that subjects are perfectly able to attend to continuous speech at the same time as taking in complex visual scenes or while sight-reading piano music. Thus, Allport et al. submitted a multicapacity hypothesis of attention. As Kahneman identified processing capacity with physiological arousal, the division of capacity immediately poses the question of how multiple processing resources are related to the unitary concept of physiological arousal underlying Kahneman's effort theory.
4.1 From Unitary Arousal to Multiple Brain Systems
In broad outline, recent advances in cognitive psychology and neuroscience reinforced the ideas already proposed by the pioneers in the experimental study of attention. Ribot (1919) distinguished between involuntary and voluntary attention (see above). The division of attention into involuntary and voluntary varieties has undoubtedly been the main impetus for Pribram's landmark contribution to the breakdown of the unitary arousal concept (McGuinness and Pribram, 1980; Pribram
and McGuinness, 1975). His theory of attentional controls distinguishes between two major regulatory systems: a system producing a phasic response to input, 'arousal', and a system maintaining readiness for action, 'activation'. Through the arousal system, the organism orients to novel input. The visceroautonomic responses to novel stimuli (e.g. changes in heart rate, respiration and galvanic skin conduction) but not the behavioral responses (e.g. turning of head and eyes) have been observed to disappear after amygdalectomy, which suggests that the amygdala is central to arousal. The finding that the behavioral responses continue after damage to the amygdala indicates that another system is involved in orienting. This system centers on the striatum of the basal ganglia and a lateral strip of the frontoparietal cortex. Orienting is abolished when these areas are damaged in addition to the amygdala (Pribram, 1990; Pribram and McGuinness, 1975). Pribram also specified the neurotransmitter pathways involved in the orienting reaction (McGuinness and Pribram, 1980). The arousal system is posited to be basically serotonergic with the norepinephrinergic fibers acting to modulate the serotonergic mechanism to produce an interrupt of an ongoing state. Through the activation system the organism maintains a tonic readiness for action. According to Pribram, the attention directed by the activation system is characteristic of a state of vigilance. It has been observed that the ability to maintain a vigilant state is greatly reduced in patients with basal ganglia lesions, which suggests that the activation system activates the basal ganglia. Pribram reviewed the neurotransmitter literature to suggest that the maintenance of response readiness depends on a balancing of dopamine and cholinergic systems (McGuinness and Pribram, 1980).
Pribram further assumed an effort system for the supervisory control of the arousal and activation systems thought to be necessary for motivated action. This system is centered on the hippocampal formation. The effort system exerts its control over arousal via frontocorticothalamic connections and over activation by way of brain stem linkages. When idling, the system reflects comfortable, unstressed exploratory behavior. When engaged, the system makes it possible for attention to be 'paid' (Pribram, 1990, p. xxxii). Tasks demanding the categorization of stimuli and the selection of responses will invoke the effort system. The uncoupling of arousal and activation to prevent reflex action is also necessary in reasoning tasks. Pribram proposed that adrenocorticotropic hormone (ACTH)-related peptides operate on the hippocampus and thus on the effort system to facilitate its influence on the two more basal arousal and activation systems (McGuinness and Pribram, 1980). Pribram's scheme of attentional controls has been supplemented by Tucker and Williamson's (1984) elaboration of noradrenergic arousal and dopamine activation. The noradrenergic fibers originate from a relatively small number of cell bodies in the locus coeruleus and innervate multiple brain areas including the neocortex, hippocampus, thalamus, cerebellum, and portions of the hypothalamus and limbic system. The multiplicity of innervations originating from a single source is consistent with the notion of arousal as a general regulatory system. Tucker and Williamson pointed out that this system does not increase neural activity directly but rather enables responsivity to the environment. Thus, it would be inappropriate to describe noradrenergic control in terms of conventional notions of an activation system. The primary role ascribed to arousal is to respond to novel external input and to habituate to repetitive stimuli.
According to Tucker and Williamson, the arousal system achieves its attentional control primarily through its bias toward
habituation. The inherently negative habituation bias has the important positive effect of augmenting the brain's response to changes in the environment. Noradrenergic arousal serves to minimize informational redundancy and opens up sensory channels for receptive input. Thus, it broadens the attentional scope and shifts control of the current information flow to the outside. In this latter respect, the Tucker and Williamson conceptualization of the arousal system is similar to the arousal state described by Lacey (1967) as a 'state of relaxed acceptance of external stimulation'. The dopamine pathways appear to target more specific brain structures compared with the noradrenergic system, which is omnipresent in the brain. The dopamine projections arise mainly from the ventral tegmental area and innervate the neostriatum of the basal ganglia and discrete areas of the frontal and entorhinal cortices, with a lack of projections to posterior parts of the cortex. Since the basal ganglia are implicated in motor behavior, Tucker and Williamson suggest that the attentional control exerted by the dopamine system consists of the maintenance of an active vigilant state closely tied to motor readiness. The dopaminergic regulation facilitates a serial organization of specific motor acts rather than simply increasing motor activity diffusely and randomly. Extremely high levels of dopamine activation are likely to result in highly stereotyped motor sequences. The activation system is assumed to exert its attentional control by increasing the redundancy of the information in the brain channels. The bias toward redundancy is needed to ensure continuity of behavior. Without it, the organism would be engaged in an enduring and ferocious state of novel self-stimulation.
In contrast to the broadening of attention resulting from the habituation bias of noradrenergic arousal, the increased redundancy with dopaminergic activation produces a focal attentional mode which facilitates the internal control of motor operations. An important feature of Tucker and Williamson's refinements of the Pribram and McGuinness theory of attentional controls refers to the lack of an attentional control system involved in discrimination and problem solving. Tucker and Williamson argued that the functions ascribed by Pribram to the effort system can be attributed more easily to their conceptualization of the dopaminergic activation system. The strong emphasis on activation as an attentional control promoting routinization led them to propose that the redundancy bias of activation provides the attentional substrate for developing practiced skills, for building a repertoire of efficient processing routines, and for establishing descriptive systems such as language or mathematics (cf. Tucker and Williamson, 1984, pp. 205-206). The interpretation of arousal in terms of a bias toward habituation and of activation in terms of a redundancy bias allows a reinterpretation of Easterbrook's (1959) notion of cue utilization (see above). Easterbrook assumed that increases in physiological arousal cause a narrowing of attention. This hypothesis considers only quantitative features of arousal; high levels restrict the range of environmental input. In contrast, Tucker and Williamson suggest that the selectivity of attention is controlled by different neural systems. An increase in the activity of the arousal system maximizes the number of unique events that receive attention, whereas activation promotes a funneling of attention, restricting the range of possible inputs and increasing the degree of processing for each input. 
Obviously, the Tucker and Williamson formulation qualifies Kahneman's (1973) suggestion that physiological arousal regulates the availability of mental capacity. Kahneman's unitary notion of physiological arousal is divided into two distinct neural control systems: one
system, arousal, allocates capacity to maintain a global representation of the environment and the other system, activation, allocates capacity to a serial mode of information processing so that each of the serial operations is assured sufficient capacity for thorough processing. In a series of studies, Robbins (1986) discussed the possibility of relating noradrenergic and dopaminergic attentional control to the 'upper' and 'lower' arousal systems postulated by Broadbent (1971) (see above). This discussion led Robbins to a conclusion that is rather different from the formulation advanced by Tucker and Williamson. The latter authors implicated the dopamine system in the internal control of attention. Conversely, Robbins suggested an intimate relation between the noradrenergic system and Broadbent's upper mechanism. The empirical foundation for this conclusion was derived primarily from studies in which lesioned or pharmacologically treated rats perform a serial five-choice task analogous to Leonard's task for humans. One such study showed that dopamine depletion reduces the speed and probability of responding without affecting discrimination accuracy (Robbins et al., 1982). By contrast, the indirect dopamine agonist D-amphetamine increases the number of premature responses. The effects of D-amphetamine were observed to be similar to the effects of white noise presented as a distractor in the serial choice task (Carli et al., 1983). The disrupting effect of white noise was more pronounced in noradrenergic-lesioned rats than in intact animals. This pattern of results suggested to Robbins and coworkers that stressors such as white noise or D-amphetamine increase the activity in at least part of the dopamine projections, resulting in response activation which is seen in the adoption of a more risky criterion. Accordingly, the dopamine system was thought to be the neural correlate of Broadbent's lower mechanism.
In the intact organism, however, the noradrenergic pathways protect the organism from disruptive levels of performance. Thus, the neural implementation of Broadbent's upper mechanism was assumed to reside in the prefrontal cortex, being one of the few neocortical regions able to influence the activity of noradrenergic cells in the locus coeruleus. To summarize briefly, the current cursory overview of attempts to break down the unitary notion of physiological arousal suggests that these attempts converge on the existence of at least two distinct attentional systems in the brain. One is a system involved in orienting and the other system participates in maintaining a readiness for responding. The orienting system is noradrenergic and the readiness system is dopaminergic. Of course, the neural control of attention is much more complex than this simplified scheme of a dual control of attention by orienting and readiness systems. One of the many unresolved issues in this area pertains to the alleged existence of a distinct neural system engaged in the monitoring of lower attentional brain systems. The existence of a supervisory mechanism is inferred from human performance literature on stress and attention (Broadbent, 1971; Kahneman, 1973), from animal studies examining neurotransmitter systems in the brain (Robbins, 1986), and from discussions of the neurophysiological literature on attentional controls (Pribram and McGuinness, 1975). One of the issues surrounding the neural implementation of supervisory control is whether such a mechanism must be reduced to noradrenergic (Robbins, 1986) or dopaminergic (Tucker and Williamson, 1984) systems, or whether it is a sovereign system modulating the neural state of the brain by ACTH-related peptides, as suggested by Pribram and McGuinness (1975). At this point, however, the precise neural implementation of supervisory control is less important than the observation that both the human
performance and neurophysiological literature agree in assuming the existence of supervisory control. The task of the experimental psychologist is now to detail further the interplay between lower and upper mechanisms in their response to suboptimal and superoptimal conditions. The next section will illustrate how this task may be facilitated by methods developed for the decomposition of the reaction process.
4.2 From Mean Reaction Time to Processing Stages
In the first half of the 19th century, physiologists were interested in the problem of variation in human nervous system transmission but they were bound to the view that its speed was instantaneous. It was not until the distinguished physiologist Helmholtz (1850) published his studies on nerve conduction velocity that this notion was challenged. He estimated nerve conduction velocity in the frog by stimulating a nerve in its leg, as far away from and as close as possible to the muscle. In each case, the time that elapsed between stimulation of the nerve and contraction of the muscle was measured. The difference in time between the two measurements was used to derive an estimate of nerve conduction velocity. His estimates were about 26 m s⁻¹, a surprisingly slow rate given the belief that transmission was so rapid it could probably never be measured. Helmholtz then conducted similar studies with humans, which represent the first systematic studies of human RT. The Dutch physiologist Donders grasped the psychological significance of the procedure. His work marked the beginning of the use of RT measures to infer the timing of mental processes. He and his student De Jaager assumed that only a small amount of the time taken by a mental action could be ascribed to nerve conduction velocity and that the durations of mental acts could be calculated by measuring increases in physiological times that were produced by the systematic introduction of more complex reactions (De Jaager, 1865; Donders, 1868). They assumed further that the physiological time taken from the stimulus input to the response output was the sum of the times taken at each level of processing in the nervous system. Thus, they reasoned that, by starting with the simple reaction of Helmholtz and progressively adding complexity to this reaction, they could infer the durations of the inserted mental processes from the increases in the resultant physiological times.
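The reasoning just described can be sketched numerically. The following is a minimal illustration only: the function name `inserted_durations` and the mean physiological times are hypothetical and are not taken from Donders or De Jaager. It assumes, as they did, that total time is the sum of the times taken at each level of processing.

```python
def inserted_durations(mean_times_ms):
    """Return the increase in mean physiological time (ms) at each step
    of a series of progressively more complex reactions.

    Under the additivity assumption, each increase is read as the
    duration of the mental process inserted at that step.
    """
    return [later - earlier
            for earlier, later in zip(mean_times_ms, mean_times_ms[1:])]


# Hypothetical mean times (ms): a simple reaction, followed by two
# reactions of increasing complexity. Illustrative numbers only.
mean_times_ms = [200, 250, 320]

steps = inserted_durations(mean_times_ms)
print(steps)  # [50, 70]
```

The estimates are only as good as the assumption that adding a process leaves the durations of the other processes unchanged.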
This experimental approach assumed therefore not only that processes were additive, but also that the addition of a new process to a reaction did not alter the durations of the other processes. The latter assumption, known as 'pure' insertion, in combination with the assumption of additivity, led to their development of the 'subtraction' method to infer the duration of mental processes. Donders and De Jaager assumed the existence of a three-level hierarchy of mental reactions, which they identified as the a-, b- and c-reactions. The 'a-reaction' is a simple reaction that requires no stimulus discrimination or response selection decisions. The 'b-reaction' is a choice reaction that requires both a stimulus discrimination and a response choice. The 'c-reaction' is elicited in a task in which the subject is presented with two or more stimuli and is required to make a fixed response to only one of them. The assumptions of additivity and pure insertion led Donders and De Jaager to assert that the
difference in physiological times between the a- and c-reactions provided a measure of stimulus discrimination time and the difference in times between the b- and c-reactions gave an estimate of response selection time. The 'pure insertion' assumption underlying the subtraction method was soon challenged by Lange's (1888) distinction between sensorial and muscular reactions (see above). In performing a Donders a-task most subjects will adopt a muscular attitude which will allow them to respond with a 'prepared reflex' upon stimulus detection. When the task is changed from an a- to a b- or c-task, the muscular attitude is likely to be replaced by a more sensorial attitude in order to avoid unacceptable error rates. Thus a-reactions are associated with a higher state of motor preparedness than b- or c-reactions. From this it follows that the change from an a-reaction to a c-reaction does not simply involve adding stimulus discrimination to stimulus perception and response actualization; it means that a fundamentally different and more complicated reaction process is substituted for the simple reaction process. The challenge to the 'pure insertion' assumption struck a near-fatal blow to the stage analysis of the reaction process. Much of the credit for its rebirth can be attributed to Sternberg (1969). Like Donders, Sternberg (1969) assumed that human information processing, from stimulus input to response output, was achieved via a series of nonoverlapping stages whose durations were stochastically independent. Unlike Donders, however, he did not assume that experimental tasks could be devised that would insert or delete stages of processing. Rather, Sternberg assumed that the existence of a stage could be inferred and the amount of information processed by it could be manipulated by judicious selection of experimental factors. That is, he assumed that factors could be identified that have a 'selective influence' on information processing. 
Further, he assumed that the pattern of statistical effects produced by these manipulations supported specific conclusions about the stage or stages of processing being influenced. Inferences could thus be drawn about the structure of mental processing, but not about its timing. Sternberg illustrated the basic idea underlying his new method with the following example. Suppose that the RT interval consists of three stages: a, b and c. Suppose further that the duration of stage a is influenced only by factor F, of stage b by factor G, and of stages b and c, but not a, by factor H. From Sternberg's assumptions that stages are temporally discrete and stochastically independent it follows that: (i) experimental factors that influence the durations of different stages will produce additive effects on RT; and (ii) experimental factors that influence the same stage will produce interactive effects on RT. Thus, in this example, factors F and G will have additive effects on RT (i.e. the effects of F will not vary as a function of the level of G or vice versa), but factors G and H will have interactive effects on RT. Sternberg applied the additive factors method (AFM) to test the stage model in a standard two-choice RT task. In this task the subject was required to make a verbal response to a visually presented numeral. Three factors were manipulated, each at two levels. Stimulus quality was varied by presenting the numeral intact or degraded; stimulus-response (S-R) compatibility was varied by requiring the subject to respond either by naming the numeral or by saying the name of the numeral that was one value larger; and the number of equally likely S-R alternatives was varied by having the subject respond to a numeral selected from a set of either two or eight numerals.
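The additive-factor logic of the three-stage example above can be made concrete with a small sketch. All stage durations below are hypothetical, and the multiplicative term inside stage b is an assumption introduced here to represent two factors acting jointly on one stage; Sternberg's method does not prescribe any particular form. The interaction contrast is the difference of differences across the two levels of a pair of factors.

```python
# Hypothetical stage durations (ms); each factor takes level 0 or 1.
def stage_a(f):
    return 100 + 40 * f                      # influenced by F only


def stage_b(g, h):
    return 150 + 30 * g + 20 * h + 25 * g * h  # influenced jointly by G and H


def stage_c(h):
    return 80 + 15 * h                       # influenced by H only


def mean_rt(f, g, h):
    """Mean RT as the sum of three non-overlapping stage durations."""
    return stage_a(f) + stage_b(g, h) + stage_c(h)


def interaction_contrast(rt):
    """Difference of differences for a function of two factor levels."""
    return (rt(1, 1) - rt(1, 0)) - (rt(0, 1) - rt(0, 0))


f_by_g = interaction_contrast(lambda f, g: mean_rt(f, g, 0))
g_by_h = interaction_contrast(lambda g, h: mean_rt(0, g, h))

print("F x G contrast:", f_by_g)  # 0  -> additive, different stages
print("G x H contrast:", g_by_h)  # 25 -> interactive, common stage b
```

In real data the contrasts are estimated with error, so additivity is judged statistically rather than by an exact zero.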
The effects of stimulus quality and S-R compatibility were perfectly additive, but both of these factors interacted with the third factor, number of alternatives. Sternberg interpreted this pattern of results as follows. The additive effects of stimulus quality and S-R compatibility imply that the task is mediated by at least two distinct stages: stimulus encoding and response translation and organization. The interaction of both these factors with the number of alternatives factor suggests that this latter factor influences both stages of processing. Recent work suggests the existence of at least six processing stages: three perceptual stages, a response selection stage, and two motor stages (for reviews see Frowein, 1981; Sanders, 1980, 1990). Figure 7.3 presents the information flow through the six proposed processing stages. Several studies have shown that stimulus intensity has additive effects with other variables that are likely to affect the perceptual side of the reaction process, such as stimulus quality and stimulus discriminability. From these findings, a stimulus pre-processing stage has been hypothesized in which sensory input undergoes a 'perceptual clean-up' in preparation for subsequent processing. Stimulus intensity has been assumed to affect the duration of this stage. Once this pre-processing is completed, the output is then fed into a second stage where feature analysis occurs. The duration of this stage is thought to be affected by stimulus quality. Output from this feature analysis stage is passed to a third stage in which the analysis of the entire stimulus takes place. The duration of this stage is determined by the ease with which the different stimuli can be discriminated into the various relevant classes. The response end of processing may include at least three distinct stages: response selection, motor programming and motor adjustment.
Support for the existence of a response selection stage that is distinct from the perceptual stages comes from the persistent observations of nonzero, but statistically nonsignificant, interactions between the effects of S-R compatibility and stimulus quality. Support for the existence of motor stages that are distinct not only from the perceptual stages but also from the response selection stage comes from numerous studies in which additive effects have been observed between stimulus quality, S-R compatibility and time uncertainty, usually manipulated by varying the interval between
a warning stimulus and the stimulus that calls for a response. In addition, time uncertainty and movement velocity have been reported not to interact. Thus, after the selection of a response, specific motor features of the response are elaborated during a stage which may be labeled as 'motor programming'. This stage is then distinguished from a stage denoted as 'motor adjustment' and which is influenced by a general process of response readiness. During the motor adjustment stage, the readiness to respond approaches the motor action limit and once this limit is exceeded a response is produced automatically.
[Figure 7.3: diagram of the processing stages, from input to output (preprocessing, feature analysis, identification, response choice, motor programming, program loading (?), motor adjustment), alongside the task variables (stimulus intensity, stimulus quality, mental rotation, S-R compatibility, movement velocity, spatial movement accuracy, foreperiod duration).]
Figure 7.3. The stages: preprocessing, feature analysis, response choice, and motor programming are well-documented in the literature. There is only preliminary evidence for the stages: identification, motor programming and program loading.
The subtraction method and the AFM may provide the means for a more detailed categorization of stress effects. Wundt (1910) classified drugs in terms of early versus late and facilitating versus debilitating effects on mean RT. Broadbent's (1971) analysis resulted in a taxonomy of stresses affecting either the upper or lower mechanism. The stage analysis of the choice reaction process will allow the investigator to specify the locus of the stress effect in the chain of information processing. The procedure of assessing stresses by means of the AFM is to incorporate stress as one of the factors in the experimental design. Suppose that three factors, F, G and H, were found to have additive effects on mean RT. Following additive factor logic, this data pattern would imply that the choice reaction process manipulated in this experiment consists of at least three independent stages. Suppose further that 'stress' interacts with G but shows additive effects with F and H. Stress would then be thought of as influencing a stage in common with G. The AFM has been adopted by Frowein (1981) in a series of studies to examine the effects of drugs on the choice reaction process. In one experiment, amphetamine (phentermine hydrochloride) and barbiturate (pentobarbital sodium) were factorially combined with stimulus degradation and S-R compatibility.
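The procedure of entering stress as a factor, as in the F, G and H example above, can be sketched as a search for the task factor whose effect changes under stress. The cell means, the function names and the 5 ms tolerance below are all hypothetical; in practice the decision rests on the significance of the interaction term in an analysis of variance, for which the fixed tolerance is only a stand-in.

```python
def stress_interaction(cells):
    """Interaction contrast (ms) between stress and one task factor.

    `cells` maps (stress_level, factor_level) to a mean RT, with both
    levels coded 0 or 1.
    """
    effect_under_stress = cells[(1, 1)] - cells[(1, 0)]
    effect_without_stress = cells[(0, 1)] - cells[(0, 0)]
    return effect_under_stress - effect_without_stress


def factors_sharing_a_stage(design, tolerance_ms=5.0):
    """Names of task factors whose effect is modified by stress."""
    return [name for name, cells in design.items()
            if abs(stress_interaction(cells)) > tolerance_ms]


# Hypothetical mean RTs (ms), keyed by (stress level, factor level).
design = {
    "F": {(0, 0): 400, (0, 1): 440, (1, 0): 430, (1, 1): 470},
    "G": {(0, 0): 400, (0, 1): 450, (1, 0): 430, (1, 1): 510},
    "H": {(0, 0): 400, (0, 1): 420, (1, 0): 430, (1, 1): 450},
}

print(factors_sharing_a_stage(design))  # ['G']
```

Here stress adds a constant 30 ms at both levels of F and H but enlarges the effect of G, so stress would be assigned to the stage that G influences.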
The subject was required to release a home button to press one of four target buttons indicated by an arrow stimulus. In the degraded stimulus condition, a visual noise pattern was superimposed on the arrow stimulus and in the incompatible condition the subject was to press the next button in a counterclockwise direction. The results showed the anticipated additive effects of stimulus degradation and S-R compatibility on mean RT, i.e. the latency between stimulus onset and home button release. These factors failed to influence movement time, i.e. the time elapsing between releasing the home button and depressing the target button. Most importantly, the effect of barbiturate on mean RT showed an interaction with stimulus degradation but not with S-R compatibility, whereas a converse pattern was observed for amphetamine. Movement time was affected by amphetamine but not barbiturate. These findings suggest that the adverse effect of barbiturate has an early perceptual locus whereas amphetamine affects response-related processes. Similar results were obtained by Logsdon et al. (1984). Another illustration of the application of the AFM to the study of stress concerns the question of how sleep loss may alter the duration of processing stages (Sanders, Wijnen and van Arkel, 1982). Sanders and colleagues reported an experiment in which subjects performed a four-choice reaction task in which signals were either intact or degraded, and in which the relation between stimulus and response was either compatible or incompatible. The subjects performed their task after either a good night's sleep or one night of sleep loss, and either during the morning or the afternoon. Finally, the results were analyzed for successive 5 min periods of time-on-task. The results showed the expected additive effect of stimulus quality
and S-R compatibility on mean RT. More interestingly, sleep deprivation increased the adverse effect of stimulus degradation but it did not alter the S-R compatibility effect. The overadditive interaction between sleep loss and stimulus quality was more pronounced for afternoon than for morning sessions but was observed to increase with time-on-task only for morning sessions and not afternoon sessions. A second experiment was performed that presented subjects with a simple reaction task with either auditory or visual stimuli which varied in intensity. The stimuli were presented after randomly varying and equiprobable intersignal periods. The major finding was the lack of significant interactions between sleep state and either stimulus modality or intensity. The results emerging from these two sleep deprivation experiments suggest that sleep deprivation affects the stimulus encoding stage (interaction with stimulus quality) and the response adjustment stage (interaction with time uncertainty) but not pre-processing (additive effect with stimulus intensity) or response selection (additive effect with S-R compatibility). The adverse effect of sleep loss seems to increase during the day. It should be noted that the results of the stage analysis of the effect of sleep deprivation on the reaction process are easy to reconcile with Broadbent's (1971) review of the effects of sleep loss on serial RT performance: a slowing of response speed and a more pronounced effect at the end of the working day. The AFM has been criticized for various reasons, two of which are especially relevant here. One critique is that the decomposition of the reaction process into a sequence of discrete stages provides only a taxonomy of task effects rather than a detailed insight into what is actually going on inside each of the processing stages and how these processes can be modeled.
Total RT is conceived of as the sum of stage durations but the relation with the underlying stochastic latency mechanism is left unspecified. One alternative route is to specify the stochastic latency mechanism involved in the reaction process from which predictions can be derived concerning relevant aspects of the process (Pieters, 1983). An example of this approach has been described by Laming (1988) who proposed a decomposition of the RT process in terms of boundary conditions rather than stages. In this context, the decision component of the choice reaction process is identified as the time taken by a random walk to reach one of the absorbing boundaries. If there are two stimulus alternatives, the process can be thought of as a system which takes the difference between evidence in favor of each of the two alternatives. Figure 7.4 presents a diagram of the random walk. The decision process will start from a central point and move in an irregular and oscillating manner between boundaries until the evidence in favor of one of the stimulus alternatives exceeds a critical value. The critical value need not be the same for each of the two alternatives and the location of these boundaries might be adjusted from trial to trial. For example, an error will lead to an increase of the corresponding response criterion. The epoch at which the decision process begins may be adjusted in a similar fashion. The subject cannot judge exactly when the stimulus will be presented; thus if the sampling begins after stimulus onset, RT will be prolonged by that delay. If sampling begins prior to stimulus onset, the reaction process will start with sampling noise, increasing the error probability of both responses. Many sequential effects, in particular the trial-to-trial effects following an error, can be modeled in terms of these readjustments of response criteria and timing of the sampling epoch. Laming suggested further that his approach to the analysis of sequential effects can
be used as a tool to track the effects of drugs and other stresses. He claimed that he had found experiments demonstrating statistically significant differences in RT and error proportion of less than 5 ms and 0.01, respectively. Thus, the internal parameters of the choice reaction process look to be very sensitive indicators of pharmacological effects and other stresses.
[Figure 7.4: diagram of a random walk between Criterion A and Criterion B, labeled with the pre-exposure field, stimulus, differential equation, response, RT, delay in RT, epoch of stimulus presentation, epoch of sampling onset, and feedback from RA and RB.]
Figure 7.4. Laming's (1988) random walk conceptualization of the choice reaction process. [Reprinted with permission of the author and publisher.]
The other objection against the AFM has been made most vigorously by Rabbitt in a series of papers (Rabbitt, 1979, 1986, 1988). He pointed to the well-documented fact that subjects use feedback from errors to adjust their response speed on subsequent trials. This observation is incompatible with one of the major assumptions of the additive factor logic, i.e. independent stages and no feedback between stages. Thus, it is reasoned that linear stage models are data-driven, bottom-up processors which cannot possibly describe self-regulatory processes (Rabbitt, 1979, 1986). In contrast to the AFM, which considers errors as a nuisance, Rabbitt's focus is on how subjects balance their performance between speed and errors. The ability to increase speed at the expense of accuracy, and vice versa, can be examined by computing speed-accuracy tradeoff functions (Pachella, 1974; Wickelgren, 1977), and the characteristics of these functions can be used as potential indicators of stresses. Jennings, Wood and Lawrence (1976) provide an example of how graded doses of alcohol affect the speed-accuracy tradeoff. Their subjects performed an auditory choice reaction task in which a response had to be made prior to the onset of a visual deadline signal. The deadline signal was presented with varying delays after the onset of the auditory respond signal between 175 and 375 ms in steps of 50 ms. Subjects performed their task three times with varying
Energetics and the reaction process
doses of alcohol. For each deadline condition, mean RT was calculated and an accuracy measure was derived from the proportion of correct responses. A linear regression analysis was then performed to produce an intercept and slope for each of the alcohol conditions. Alcohol did not affect the intercept of the speed-accuracy tradeoff function, but the gain in accuracy with RT decreased with increasing alcohol dose. This finding may suggest that alcohol impaired the perceptual analysis of the stimulus.
Rabbitt pointed out that the Jennings et al. speed-accuracy tradeoff analysis can be augmented by considering parameters derived from his 'tracking' model. This model assumes that subjects actively track their speed-accuracy tradeoff when they attempt to obey the instruction to 'respond as quickly and accurately as possible' typically given in RT experiments. Thus, subjects gradually increase their speed until an error is made. They then make an immediate adjustment to the amount of evidence required before responding, so that the next response is abnormally slow and mostly correct (Rabbitt, 1979). The tracking model requires a more detailed error analysis than just tabulating the number of errors. Such an analysis was performed in a study by Maylor and Rabbitt (1987), in which the effects of alcohol on the efficiency of performance were examined in a four-choice reaction task. Subjects performed their task twice; in one session they received alcohol (1 ml kg⁻¹) and in the other session they performed their task when sober. The results showed that responses were faster in the second session, and slower and less accurate with alcohol. A subsequent analysis of the speed-accuracy tradeoff function revealed that alcohol reduced the maximum speed at which subjects could respond without making errors. This finding was explained in terms of a decrease in the rate of accumulation of evidence.
Thus, if the accumulation rate is lower with alcohol, a response within a certain time (in this study less than 600 ms) is more likely to be an error with alcohol than without alcohol. This conclusion is similar to that of Jennings et al., but Maylor and Rabbitt supplemented the analysis of the speed-accuracy tradeoff by an examination of post-error responses. This examination indicated that the slowing of response speed following an error was more pronounced in alcohol sessions (+121 ms) than in no-alcohol sessions (+85 ms). This finding seems to suggest that alcohol affects not only the rate of information processing but also the efficiency of a higher-order control system invoked whenever an error is committed. Follow-up analyses revealed, however, that the post-error slowing was inversely related to overall response speed. Thus, the magnitude of post-error slowing does not discriminate between a slow sober individual and a fast individual slowed by alcohol. Moreover, in a subsequent study, Maylor and Rabbitt (1989) found that the post-error slowing also depends on overall error rate. Slow and accurate individuals respond more drastically to an error than fast and lenient subjects. The observation that post-error response speed depends on overall response speed and error rate suggested to Maylor and Rabbitt the possibility that these responses are affected by the same factors that control all other responses. If so, this would imply that the assumption of active control processes would be redundant, and that distributions of post-error responses can be generated by the same 'passive' stochastic models used to predict the distributions of all other responses (Maylor and Rabbitt, 1989, p. 60). Unfortunately, the alcohol data obtained by Maylor and Rabbitt did not enable a decision to be made between a 'passive' and an 'active' explanation of post-error response characteristics.
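The two analyses discussed above, the linear speed-accuracy tradeoff regression of Jennings et al. and the post-error slowing measure examined by Maylor and Rabbitt, can be illustrated with a minimal sketch. All numbers below are invented for illustration, and the function names (`fit_line`, `post_error_slowing`) are hypothetical; the original studies may have used different accuracy measures and different definitions of the post-error baseline.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def post_error_slowing(rts, correct):
    """Mean RT on trials following an error minus mean RT on trials
    following a correct response (assumes both kinds of trial occur)."""
    after_error = [rts[i] for i in range(1, len(rts)) if not correct[i - 1]]
    after_correct = [rts[i] for i in range(1, len(rts)) if correct[i - 1]]
    return mean(after_error) - mean(after_correct)


# Hypothetical data: one mean RT (ms) and one accuracy measure
# (arbitrary units) per deadline condition.
mean_rt = [200.0, 250.0, 300.0, 350.0, 400.0]
acc_sober = [0.60, 0.70, 0.80, 0.90, 1.00]
acc_alcohol = [0.40, 0.45, 0.50, 0.55, 0.60]

intercept_sober, slope_sober = fit_line(mean_rt, acc_sober)
intercept_alcohol, slope_alcohol = fit_line(mean_rt, acc_alcohol)

# Hypothetical trial sequence: RT (ms) and correctness of each response.
rts = [400.0, 380.0, 360.0, 340.0, 500.0, 390.0, 370.0, 350.0, 480.0, 400.0]
correct = [True, True, True, False, True, True, True, False, True, True]
slowing = post_error_slowing(rts, correct)
```

In this invented data set the two fitted lines share an intercept but the alcohol line has the shallower slope, which mirrors the qualitative pattern reported by Jennings et al.; `slowing` is the kind of difference score that the text reports as +121 ms and +85 ms.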
M. W. van der Molen
5 COGNITIVE-ENERGETIC RELATIONS: A CASE STUDY
In their contributions to Energetics and Human Information Processing (Hockey, Gaillard and Coles, 1986), both Sanders and Rabbitt made the point that it would be of limited use to assess the effects of suboptimal or superoptimal conditions by examining changes in overall performance associated with changes in a single task variable. Thus, it is not very enlightening to observe that D-amphetamine produces a speeding of the reaction process. From this point, opinions diverge. Sanders pointed out that such a finding would be difficult to interpret in the absence of knowledge concerning the constituents of the reaction process. Thus, he proposed using the AFM to decompose the reaction process and to examine selective interactive effects of suboptimal or superoptimal conditions with task variables known to affect particular processing components. However, according to Rabbitt, the step from single- to multi-task variables will not suffice. First, as noted previously, Rabbitt argued that the underlying assumption of the AFM that the reaction process can be decomposed into a sequence of mutually independent stages is very unlikely to hold (see above). Second, single-index paradigms ignore the fact that suboptimal or superoptimal conditions may influence error rate in addition to latency. Stressors having similar effects on response speed may affect accuracy in different ways and vice versa. Third, and most importantly, Rabbitt challenged the validity of the multitask variables approach. The calibration of stress effects on a processing component carefully isolated by task manipulation in the laboratory is unlikely to have predictive value for performance in everyday skills in which the same component is assembled in complex interactions with others. Sanders appreciated the criticism that the scope of the AFM is limited to the choice reaction paradigm and does not generalize to real-life tasks or even laboratory tasks in areas such as comprehension or problem solving.
The limited scope of the AFM, however, is associated with a predictive power that is far beyond alternative approaches emphasizing 'real-life' strategic aspects of performance. This claim will be evaluated in the sections that follow.
5.1 Processing Stages, Brain Systems and Supervisory Control
At this point, the foundations are laid for a reappraisal of linear stage and capacity thinking. In the linear stage literature, it is now recognized that the speed of a stage is assumed to be a joint function of the operations performed on stimulus input and the presetting of stages produced by expectancies, instructions and the state of the organism. It is further assumed that the a priori state of a stage is determined by the amount of capacity allocated to that stage prior to the occurrence of the reaction stimulus. Sternberg (1969) had pointed out already that information processing views assuming allocation of limited processing capacity are antithetical to the additive factor logic. The main reason for his denial of capacity notions was that, when stages have a capacity-sharing relation, interactions between two factors may be interpreted erroneously to suggest that they affect a common stage. Thus, a hybrid model must assume multiple capacity reservoirs such that stages cannot draw from the same capacity supply. In the capacity literature, the notion of a single capacity reservoir that is equally available to all mental operations has been abandoned. Current capacity theory
assumes the existence of multiple capacity reservoirs rather than the single capacity pool posited by Kahneman (1973). In several dual-task studies, it has been observed that increases in the demands of one task leave the performance on the other task unaffected. One possible explanation would be that the two tasks draw upon different pools of capacity. In his review of the available dual-task literature, Wickens (1980) arrived at the conclusion that there are at least two different capacity reservoirs: one associated with perception and the other with action (see also Wickens, 1984). In addition, psychophysiologists questioned the concept of a general arousal dimension. Lacey (1967) reported several instances of dissociations among central, autonomic and skeletal response measures. In order to account for a more differentiated pattern of arousal, neurophysiologists began to distinguish between distinct attentional systems of the brain. The partitioning of general processing capacity and the decomposition of unitary arousal requires a modular view of the reaction process. The idea of multiple capacity supplies and the fractionation of arousal are both incompatible with notions of a global reaction process that are entertained in single-task variable paradigms to study energetics. Thus, linear stage theory and recent developments in the capacity/arousal literature converge on hybrid models composed of independent processing operations linked to multiple capacity supplies/arousal systems. The building blocks for the cognitive-energetic edifice erected by Sanders (1983; Gopher and Sanders, 1984) have been derived from Sternberg's (1969) linear stage model of the choice reaction process, Pribram and McGuinness' (1975) fractionation of unitary arousal, and Broadbent's (1971) and Kahneman's (1973) notion of supervisory control. Figure 7.5 shows the position taken by Sanders (1983). 
It includes three aspects of information processing: (i) a cognitive level with stages of processing; (ii) an energetic level that allocates processing resources; and (iii) an evaluative level that corresponds to an executive process. The processing stages at
Figure 7.5. Sanders' (1983) cognitive-energetic model of human information processing. The diagram shows an evaluation mechanism; the energetic mechanisms effort, arousal and activation; the processing stages stimulus preprocessing, feature extraction, response choice and motor adjustment; and the experimental variables stimulus intensity, signal quality, S-R compatibility and time uncertainty. [Reprinted with permission of the author and publisher.]
the cognitive level are established for the most part using the AFM (Sternberg, 1969). At the energetic level, there are three supply mechanisms (Pribram and McGuinness, 1975), two of which are basal (arousal and activation) and coupled to input and output processing stages, respectively. The basal mechanisms are coordinated and supervised by an effort mechanism that is linked directly to the central stage of response choice. In addition, the effort mechanism serves the function of keeping the basal resources at an optimal level. In order to be informed about the state of arousal and activation, the functioning of these basal mechanisms must be monitored. The supervisory mechanism receives at least two types of feedback information to achieve this end. In one, direct feedback reflecting the physiological state of the system is provided that guarantees intervention by the effort mechanism if there is an imbalance. In the other, evaluative feedback on the adequacy of the performance is given on a cognitive level so that performance can be monitored (Broadbent, 1971; Kahneman, 1973). Evaluation of the state of the organism as well as of the adequacy of the performance were also important elements in Kahneman's (1973) capacity model of attention.
In Sanders' model, processing time is determined by the allocation of processing capacity as well as by the computations accomplished at each of the processing stages. It should be noted that the serial stage dimension of the model imposes an important constraint on its capacity dimension. Capacity cannot be distributed freely over stages so that allocation to one stage may lead to shortage of capacity for the next stage. Such a state of affairs would lead invariably to interactions that are incompatible with the assumptions underlying the AFM (Sternberg, 1969). Hence, specificity of cognitive-energetic relations is the essential assumption of Sanders' model.
In other words, energetic factors have a selective rather than a global influence on the reaction process. This assumption poses a difficult problem, however. It is difficult to determine how the effects of variations in energetic and cognitive task variables can be distinguished with respect to their relation to changes in the allocation of processing capacity and in the rate of computations. The complex arrangement of the components in the flow diagram does not reveal the different consequences of variations in the linked computational and energetic mechanisms.
In presenting his model, Sanders outlined two research strategies for discriminating between energetic and cognitive effects on the reaction process. The first is based on properties of the RT distribution. Sanders argued that cognitive variables would have their effect on all individual trials and, hence, on the entire RT distribution. In contrast, he argued that energetic effects would vary strongly across individual trials so that their effect would be seen primarily at the tails of the RT distribution. The second strategy concerns the distinction between effects on arousal, activation and/or effort. First, an interaction between a cognitive and an energetic variable is to be interpreted as an effect of that energetic variable on the energetic mechanism that supplies capacity to the processing stage affected by the cognitive variable. Thus, an interaction between an energetic variable and stimulus degradation, for example, is interpreted as indicating that the energetic variable exerted its influence on arousal, not on activation. Second, the principal way of deciding between effects on either arousal or activation as opposed to those on effort is to consider the effects of motivational variables. Such variables are strongly related to effort allocation and, hence, their effects provide the tools for distinguishing between the more basal mechanisms of arousal and activation and the more
voluntary mechanism of effort. Additional criteria for distinguishing these processes are that influences on the effort mechanism would (i) produce an equal effect on the supply of capacity to the arousal and the activation systems, and (ii) have an effect on response choice variables (Sanders, 1983).
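The interpretive rule described above (an interaction between an energetic variable and a cognitive task variable points to the energetic mechanism that supplies the stage affected by that task variable) can be sketched as a difference-of-differences check on a 2 x 2 table of mean RTs. This is only an illustration under stated assumptions: the cell means are invented, the tolerance `tolerance_ms` is an arbitrary stand-in for a proper statistical test of the interaction, and the mapping from task variables to mechanisms is a reading of the text rather than a table given by Sanders.

```python
def interaction_contrast(cell_means):
    """Difference of differences for a 2 x 2 table of mean RTs (ms).

    cell_means[i][j] is the mean RT at level i of the energetic variable
    and level j of the cognitive task variable. A value near zero is what
    additivity predicts; a positive value is an overadditive interaction.
    """
    effect_at_0 = cell_means[0][1] - cell_means[0][0]
    effect_at_1 = cell_means[1][1] - cell_means[1][0]
    return effect_at_1 - effect_at_0


# Assumed mapping, based on the description of the model in the text:
# arousal serves input stages, activation serves motor adjustment,
# and effort is linked to response choice.
MECHANISM_FOR_VARIABLE = {
    "stimulus quality": "arousal",
    "time uncertainty": "activation",
    "response choice": "effort",
}


def infer_mechanism(task_variable, cell_means, tolerance_ms=5.0):
    """Return the mechanism implicated by an interaction, or None if the
    contrast is within the (hypothetical) tolerance, i.e. additive."""
    if abs(interaction_contrast(cell_means)) <= tolerance_ms:
        return None
    return MECHANISM_FOR_VARIABLE[task_variable]


# Invented cell means: rows = energetic variable off/on,
# columns = easy/hard level of the task variable.
with_stimulus_quality = [[400.0, 450.0], [430.0, 520.0]]
with_time_uncertainty = [[400.0, 460.0], [430.0, 490.0]]

quality_result = infer_mechanism("stimulus quality", with_stimulus_quality)
uncertainty_result = infer_mechanism("time uncertainty", with_time_uncertainty)
```

With these invented numbers the energetic variable interacts with stimulus quality (contrast of 40 ms) but is additive with time uncertainty, so the rule would attribute its effect to arousal rather than activation.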
5.2 Formal Analysis and Numeric Simulation
Sanders' cognitive-energetic model involves a complex arrangement of components. The detailed effects of variations of some subsets of components cannot be read off directly from the corresponding flow diagram. Thus, Molenaar and van der Molen (1986) reasoned that a combined approach of formal analysis and numeric simulation may be used to examine complex issues that cannot be efficiently resolved in a purely experimental approach. They considered first a stochastic PERT (program evaluation and review technique) network approach to Sanders' model. In this type of analysis, the delay induced by each component is described by an exponentially distributed random variable, while the couplings between components are assumed to be of a multiplicative type (i.e. a component can start processing only if all its inputs fire). There is an impressive body of results demonstrating how PERT network analysis can serve as an efficient alternative to the AFM in the identification of multicomponent design from empirical evidence (Schweickert, 1983).
The PERT network approach does not allow a complete representation of Sanders' model. First, the feedback loops in Sanders' model cannot be incorporated in the PERT network. Second, the reciprocal interaction between arousal and activation in Sanders' model has to be replaced by a direct path from arousal to activation. Third, a component process cannot begin until all its predecessors have finished. Fourth, the regulatory effects of the effort and evaluation mechanisms cannot be represented in the PERT network. Thus, Sanders' model had to be trimmed to a linear stage sequence with an arousal mechanism receiving its input from pre-processing and providing an output to feature extraction and activation, the latter providing an output to motor adjustment. These disadvantages of the PERT network approach are, at least partially, compensated by the availability of closed-form expressions for the mean and variance of the network's latency.
These closed-form expressions describe population parameters and determine the true mean and variance of latency, unconfounded by sampling variability. Thus, analysis of variance can be replaced by simple plotting. Molenaar and van der Molen presented two illustrations of the formal evaluation of the PERT network representation of Sanders' restricted model. The first example referred to an independent variation of the rate parameters of the feature extraction and response choice components of the model. The second example related to the variation of the arousal and the activation components. The first simulation experiment, involving computational mechanisms, yielded a data pattern anticipated by the additive factor logic. The second simulation experiment, however, indicated that the independent variation of the two basal physiological mechanisms yielded an interaction. Thus, in contrast to the cognitive level, the additive factor logic is invalidated at the energetic level. It should be noted, however, that the PERT network representation required the physiological mechanisms
to be completed before the activation of the computational mechanisms. Obviously, this limitation presents severe constraints on cognitive-energetic modeling of the reaction process.
In view of the limitations of the PERT network approach, Molenaar and van der Molen resorted to a representation of Sanders' model as an instance of Grossberg's (1982) functional-differential network. This differential neural network approach allows a noncommittal mathematical description of the information flow. The components of Sanders' cognitive-energetic model are represented as input-output devices in which the input of each device, apart from optional external sources, originates from the other processing components in the model and the output is delayed because of the presence of a threshold. To obtain a stable network, the activation of each component must decay to zero when the input vanishes. Hence, each component in the network involves two basic types of parameters: thresholds and decay rates. The dynamics of each component in the system are represented by a differential equation in which the stochastic nature of the reaction process is incorporated by defining the rate of decay parameters as random variables with some time-independent distribution. The neural network representation involved a more complete representation of Sanders' model compared with the PERT approach. The network included the effort and evaluation mechanisms and allowed a reciprocal relation between arousal and effort. As in the PERT network, the regulatory effects of effort and evaluation were not incorporated. The neural network representation was used for four simulation experiments; two experiments considered a multiplicative network whereas the two other simulations related to an additive network representation. The results of the multiplicative network simulations agreed with the findings yielded by the PERT network analysis.
The independent variation of feature extraction and response choice resulted in significant main effects but no significant interaction, whereas the variation of arousal and activation yielded significant main effects and a significant interaction. Thus, in multiplicative networks, of either the PERT or neural network type, the additive factor logic may lead to invalid inferences when energetic mechanisms are involved. In contrast, additive network simulations confirmed the additive factor logic both at the level of computation and the level of capacity supply. This particular finding led Molenaar and van der Molen to conclude that a system of stochastic differential equations constitutes a reasonable representation of multicomponent models of RT such as Sanders' model. Although such a system is highly under-determined by the available (output) RT data, a combined approach involving simulation and experimentation may provide interesting and unambiguous results.
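Why exponentially distributed components can give additive effects in one arrangement and an interaction in another can be seen from closed-form means for two toy arrangements. The sketch below is not the network analyzed by Molenaar and van der Molen; it is an assumed two-component illustration. For independent exponential durations with rates a and b, two components in series have mean latency 1/a + 1/b, whereas two concurrent components that must both finish before the next one starts (a multiplicative, AND-type coupling) have mean latency 1/a + 1/b - 1/(a + b). The rate values are invented.

```python
def mean_serial(rate_a, rate_b):
    """Mean latency of two exponential components executed in series."""
    return 1.0 / rate_a + 1.0 / rate_b


def mean_and_gate(rate_a, rate_b):
    """Mean of max(X, Y) for independent exponentials X and Y: both
    concurrent components must finish before the next one can start."""
    return 1.0 / rate_a + 1.0 / rate_b - 1.0 / (rate_a + rate_b)


def interaction_contrast(mean_fn, a_levels, b_levels):
    """Difference of differences of mean latency over a 2 x 2 design in
    which the two rate parameters are varied independently."""
    (a0, a1), (b0, b1) = a_levels, b_levels
    return (mean_fn(a1, b1) - mean_fn(a1, b0)) - (mean_fn(a0, b1) - mean_fn(a0, b0))


# Hypothetical rate levels (per time unit): fast = 2.0, slow = 1.0.
levels = (2.0, 1.0)

serial_contrast = interaction_contrast(mean_serial, levels, levels)
and_gate_contrast = interaction_contrast(mean_and_gate, levels, levels)
```

Here `serial_contrast` is zero, the additive pattern expected from the additive factor logic, whereas `and_gate_contrast` is nonzero because of the 1/(a + b) term. Because the expressions are population means, no sampling or analysis of variance is needed to see the pattern, which is the advantage of closed-form expressions noted in the text.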
5.3 Performance Analysis
Frowein's (1981) additive factor analysis of drug effects (see above) was interpreted by Sanders (1983) as providing support for the specificity of cognitive-energetic relations. Recall that Frowein found that treatment with a barbiturate and variations in stimulus quality produced interactive effects on the reaction process, but that manipulations of S-R compatibility and foreperiod duration produced additive effects. Logsdon et al. (1984) reported similar findings in their examination of the
effects of secobarbital on perceptual processing (i.e. the drug effect interacted with the quality of the stimulus). According to Sanders, these findings suggest that barbiturates have a negative effect on the state of the arousal system, which in turn produces a detrimental effect on the stimulus-encoding stage. Amphetamine, on the other hand, seems to have its predominant effect on the activation mechanism. This drug has additive effects with perceptual or decisional variables, but interacts with time uncertainty. The effects of sleep loss were interpreted along similar lines. Recall that sleep deprivation has been found to interact with signal degradation (Sanders et al., 1982) and time uncertainty (Frowein, 1981). Thus, Sanders reasoned that sleep deprivation may have a negative effect on both the arousal and activation systems.
The coordinating role of the effort mechanism has been examined in a study by van der Molen et al. (1987). They performed two experiments in which the task variables used to manipulate processing stages were the quality of the stimulus (intact versus degraded), the number of response alternatives (two versus four), and the duration of the foreperiod (short (4 s) versus long (12 s) in the first experiment, and fixed (6 s) versus variable (6, 9 or 12 s) in the second experiment). As reviewed earlier, these task variables are commonly observed to have additive relations, suggesting that they influence different stages of processing: stimulus encoding, response choice and motor adjustment, respectively. In addition, both experiments included a task variable that van der Molen et al. labeled as 'task involvement'. This variable was not manipulated. It was inferred from a post hoc analysis of trial-to-trial fluctuations in the best (fastest) and the worst (slowest) reactions (defined as the first and fourth quartiles of the RT distribution).
In the RT literature, this comparison is often made in speed-accuracy tradeoff analyses (Wickelgren, 1977), and is referred to as a microanalysis, as opposed to a macroanalysis (which involves the use of response deadlines, response signals or variations in response speed instructions), of the speed-accuracy function. Van der Molen et al. assumed that this analysis would reveal effects on the effort mechanism as it coordinated the activity of the basal mechanisms, arousal and activation. To engage further the effort mechanism in its coordinating and supervising role, performance feedback was provided after each trial. In the first experiment, the threat of shock was used as an additional energetic task variable. Following Sanders (1983, p. 81), anticipation of a threatening stimulus was expected to induce a pattern of stress dominated by overactivation. The shock was presented randomly in lieu of the imperative stimulus on 0, 1, 2 or 3 trials in a block of 42 trials. The experimental factors were blocked in experiment 1, but mixed randomly within a block in experiment 2. The mixed presentation was designed to place a greater task load on the effort system. The performance data from both experiments revealed a pattern that is at the same time compatible and at variance with the additive factors literature. When the experimental factors were blocked and there was no threat of shock, the cognitive task variables contributed additively to mean RT. This finding is consistent with previous research and suggests that the three variables (stimulus quality, number of response choices and foreperiod duration) affect three different stages of processing (stimulus encoding, response choice and motor adjustment). The threat of shock prolonged RT considerably, as has been shown in previous reports (Jennings et al., 1971; Somsen, van der Molen and Orlebeke, 1983). In contrast to Sanders' prediction, however, the threat of shock interacted weakly with the duration of the
foreperiod but strongly with the quality of the stimulus. Thus, following Sanders' criterion, anticipation of the threatening stimulus seems to have its predominant effect on the state of the arousal mechanism. Finally, task involvement interacted with all of the cognitive variables, as well as with the threat of shock. By and large, the effects of these variables were half the size for the fastest quartile of RTs as compared with the slowest quartile. This pattern was interpreted as reflecting the working of a central mechanism that counteracts the detrimental effects of degraded stimuli, an increased load on the response choice stage, time uncertainty and the anticipation of a threatening stimulus. Thus, dichotomizing the RT distribution into its best and worst times may permit the compensatory activities of the effort mechanism to be revealed. The interpretation of the results was complicated, however, by the finding that the serial stage structure of the reaction process obtained under blocked conditions with no threat of shock was destroyed by the threat of shock. Thus, the entire pattern of results did not conform to the 'stage robustness' criterion of the additive factor logic. That is, the relation between two factors should not change when a third factor is added to the design. The results from the van der Molen et al. study showed, however, that the additive relation between the quality of the stimulus and the number of response choices changed into an overadditive interaction when subjects were faced with the threat of shock. Interestingly, a similar pattern emerged when RTs were dichotomized into fastest and slowest quartiles. An additive relation was found between the quality of the stimulus and the number of response choices for the response latencies in the fastest quartile of the RT distribution, but an interactive relation was found for the responses in the slowest quartile. This pattern was replicated in the second experiment. Van der Molen et al. 
suggested that the overadditive interaction (between stimulus quality and response choice) reflected a malfunctioning of the arousal system when task involvement was low and there was the threat of shock. More specifically, they assumed that (i) efficient stimulus encoding requires the preactivation of internal codes when the stimuli are degraded, and (ii) under suboptimal conditions the state of the arousal mechanism is lowered to prevent adequate preactivation, particularly when several stimulus alternatives are involved. Thus, it could well be possible that the output of the stimulus-encoding stage was not identical in the two- and four-choice conditions. An overadditive interaction would be expected if in the four-choice condition the response choice stage received a more distorted output than in the two-choice condition.
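The 'task involvement' analysis described in this section compares the effect of a task variable in the fastest and slowest quartiles of the RT distribution. A minimal sketch of that comparison is given below. The data are invented, the helper names are hypothetical, and the quartiles are taken simply as the lowest and highest fourth of the sorted RTs in each condition; the original study may have defined and aggregated them differently (for example, per subject).

```python
from statistics import mean


def quartile_means(rts):
    """Mean RT of the fastest and of the slowest quartile of a sample."""
    if len(rts) < 4:
        raise ValueError("need at least four RTs to form quartiles")
    ordered = sorted(rts)
    k = len(ordered) // 4  # number of trials in each extreme quartile
    return mean(ordered[:k]), mean(ordered[-k:])


def effect_by_quartile(rts_easy, rts_hard):
    """Effect of a task variable (hard minus easy) computed separately
    for the fastest and for the slowest quartile of each condition."""
    fast_easy, slow_easy = quartile_means(rts_easy)
    fast_hard, slow_hard = quartile_means(rts_hard)
    return fast_hard - fast_easy, slow_hard - slow_easy


# Invented RTs (ms) for intact and degraded stimuli.
rts_intact = [300.0, 310.0, 320.0, 330.0, 340.0, 350.0, 360.0, 370.0]
rts_degraded = [330.0, 340.0, 360.0, 375.0, 390.0, 405.0, 420.0, 430.0]

fast_effect, slow_effect = effect_by_quartile(rts_intact, rts_degraded)
```

In this invented example the degradation effect in the fastest quartile is half the size of the effect in the slowest quartile, the kind of pattern that van der Molen et al. interpreted as the compensatory working of the effort mechanism.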
5.4 Psychophysiological Analysis
Van der Molen and coworkers reasoned that psychophysiological analysis may be an additional method for testing a model like that of Sanders (van der Molen et al., 1987). They used components of the heart rate (HR) response to complement performance measures in distinguishing energetic from cognitive effects on the choice reaction process. To derive a set of specific predictions concerning the relation between HR and the state of energetic components in Sanders' model, they followed Sanders in resorting to Pribram and McGuinness' (1975) conceptualization of arousal, activation and effort, but they deviated from Pribram and McGuinness' views in several important respects. For Pribram and McGuinness, any information
input triggers a response from the arousal system which is reflected by changes of sympathetic nervous system activity, inducing HR acceleration. The activation system is involved when the subject maintains a set to respond to external events. Pribram and McGuinness suggested that a state of readiness will be reflected in HR deceleration. Conversely, when problem solving takes place, HR acceleration occurs that reflects the task demands on the effort system (see also McGuinness and Pribram, 1980).
Van der Molen and colleagues pointed out that the change in HR that Pribram and McGuinness associated with the three systems of attention may not be related directly to the typical pattern of HR change observed during performance of a signaled choice RT task (van der Molen et al., 1987, p. 255). In the time interval between the warning stimulus and the imperative stimulus, a triphasic HR response is usually observed that consists of an initial deceleration, then an acceleration, and, finally, a more pronounced deceleration that reaches its nadir at some point in time near the response to the imperative stimulus. Thus, van der Molen et al. reasoned that the deceleration preceding the imperative stimulus may be a manifestation of the coordinating and supervising role played in the choice reaction process by the effort mechanism. During focused temporal anticipation, as in the signaled choice RT task, the voluntary effort mechanism may maintain the basal mechanisms in a state of readiness to receive input and this, in turn, facilitates performance of the anticipated action. In contrast to Pribram and McGuinness, van der Molen et al. pointed out that the phasic response to input is cardiac deceleration, not acceleration.
This position is supported in the cardiac cycle time literature in which it has been demonstrated that information input elicits HR slowing, 'primary bradycardia', in association with the allocation of capacity to the early perceptual stages of information processing (Coles and Strayer, 1985). Finally, van der Molen et al. suggested an additional HR measure, one that can be used as a chronometric index of response initiation, namely, the transition from anticipatory HR deceleration to acceleration, 'vagal inhibition', when response execution mechanisms are engaged.
For the present discussion, it is important to determine whether the analysis of HR changes during the reaction process can contribute to a deeper understanding of the cognitive-energetic relations. Figure 7.6 illustrates the HR responses obtained by van der Molen et al. in the second experiment. Recall that in this experiment time uncertainty was varied by comparing responses in a fixed foreperiod reaction (6 s) with those in a variable foreperiod reaction (6, 9 or 12 s). The morphology of the HR response observed under these conditions indicates that the subject did not wait passively for the imperative stimulus to arrive, but made active attempts to predict the time of its occurrence. With a fixed foreperiod, the deceleratory response showed a steep trend that reached its nadir at the time at which the stimulus occurred. With variable foreperiods, HR responses did not differ between foreperiods for the first 6 s. In this condition, subjects seemed to set their response to the medium length of the foreperiod alternatives. At the shorter length, the response was interrupted by the early presentation of the imperative stimulus, whereas at the longer length it leveled off beyond the medium length until the stimulus was presented. The dynamic nature of the anticipatory HR response led van der Molen et al. to suggest that this response reflects voluntary tuning of the basal mechanisms by the effort system.
A second important finding was that there was a pronounced increase in the anticipatory deceleration just prior to or at the moment of stimulus occurrence
Figure 7.6. The effect of foreperiod on the heart rate response. Plotted conditions: fixed 6 s and variable 6, 9 and 12 s foreperiods; x-axis: time (0.5 s units). Data from van der Molen et al. (1987). [Reprinted with permission of the authors and publisher.]
when the foreperiod was fixed (i.e. time uncertainty was low). This cardiac response must be differentiated, however, from that seen when the foreperiod is variable but an external cue is presented that enables the subject to predict the exact time when the stimulus will occur (Jennings, van der Molen and Terezis, 1987). Under these conditions, the commonly observed triphasic HR response is absent and there is only a deceleration immediately prior to the stimulus occurrence. This pattern is consistent with the suggestion of van der Molen et al. that the deceleration they observed actually consists of two components, one associated with temporal prediction and the other with processing capacity becoming available just prior to the arrival of the stimulus. Also in accord with this speculation is their finding that the length of the foreperiod exerted an effect on primary bradycardia and vagal inhibition. Primary bradycardia was stronger when the length of the foreperiod was fixed than when it was variable. This finding suggests that, when the subject is able to predict the time of stimulus onset, processing capacity can be allocated to the perceptual stages of the reaction in a timely fashion. This interpretation requires that either the threat of shock or variations in the quality of the stimulus influence primary bradycardia as well. However, neither factor modified the magnitude of primary bradycardia. Finally, time uncertainty affected vagal inhibition. This effect was more pronounced when the length of the foreperiod was fixed (i.e. time uncertainty was low) than when it was variable (i.e. time uncertainty was high), suggesting that HR shifts from deceleration to acceleration when the response initiation stage is engaged. In discussing the relation between the reaction process and the cardiac responses, van der Molen et al. pointed out that Sanders' cognitive-energetic model emphasizes presetting mechanisms and their influence on processing stages in
anticipation of the occurrence of the imperative stimulus. Van der Molen et al. suggested that measures derived from the electrocardiograph are particularly useful in studying covert behavior during the preparatory interval. They summarized the changes in HR during the foreperiod as follows (pp. 284-286): (1) The first manifestation of preparation during the foreperiod is cardiac slowing. This response disappears when there is no need for temporal prediction (Jennings et al., 1988). (2) Subsequently, deceleration is added just prior to the stimulus. The amplitude of this response increases with the likelihood of a motor response (van der Molen et al., 1987). (3) The imperative stimulus elicits 'primary bradycardia'. The amplitude of this phase-dependent cardiac slowing increases with stimulus discriminability (Jennings et al., 1990) and decreases with increases in time uncertainty (van der Molen et al., 1987). (4) Finally, initiation of the motor response induces 'vagal inhibition'. That is, the timing of the cardiac shift from deceleration to acceleration depends on exactly where in the cardiac cycle the response to the reaction stimulus will be initiated (Somsen et al., 1985). To explain this pattern of results, van der Molen et al. proposed an integration of Niemi and Näätänen's (1981) conception of the foreperiod effect with Sanders' cognitive-energetic model. Niemi and Näätänen argued that the preparatory phase of a mental reaction that takes place during the foreperiod includes both perceptual and motor components. Perceptually, preparation is assumed to consist of a process whereby a mental image of the stimulus is retained and rehearsed, a process that facilitates its subsequent identification. Motor preparation, in their view, consists of a priming-like activation of the response system toward the 'motor action limit' that facilitates the attainment of this limit when a particular response is engaged.
Given this combination of intensive processes, it is not surprising that it is difficult and energy consuming to maintain a constant, high state of preparation (Gottsdanker, 1975). Niemi and Näätänen's conceptualization of the processes engaged during the foreperiod of a mental reaction and their effect on the speed of the reaction can be incorporated into Sanders' framework. Following Sanders, van der Molen et al. assumed that the two more central stages of the choice reaction process, stimulus encoding and identification, receive energetic support from the 'arousal' system, the temporary state of which might be altered by input from the early stimulus pre-processing stage (see diagram of Sanders' model in Figure 7.5). It was also assumed that the output of the stimulus identification stage might elicit a phasic response by the arousal system. Van der Molen et al. suggested that the effect of perceptual preparation on speeded RT tasks consists primarily of more effective processing of sensory input at the pre-processing stage. Thus, when the subject is prepared at the optimal level, stimulus pre-processing exerts a stronger effect on the arousal system. They pointed out that the active processes involved in perceptual preparation may be similar to the notion of selective attention offered by Posner and Boies (1971). Consider a subject who is instructed to attend to one set of stimuli and to ignore another. When a designated stimulus is presented, the output of the identification process may trigger a response from the arousal system,
the magnitude of which is proportional to the significance of the stimulus. Van der Molen et al. hypothesized that primary bradycardia is the autonomic component of this arousal response to significant stimuli. At the response end of the choice reaction process, the motor adjustment or initiation stage can be viewed as the stage at which the motor action limit is exceeded and motor readiness automatically engages response execution processes. During the foreperiod, the subject attempts to maintain an optimal level of response readiness to minimize the amount of activation required to attain the motor action limit. To do so, however, consumes energy. Therefore, the call for energetic support occurs only when the subject expects the stimulus to occur. Van der Molen et al. suggested that the sudden enhanced deceleration of HR immediately prior to the arrival of the imperative stimulus reflects energetic support provided by the activation system. When motor initiation processes are completed, response execution begins (i.e. the muscular processes that are necessary to execute the response). They suggested that the timing of these processes is reflected in the timing of the shift from HR deceleration to acceleration. An important element in Sanders' cognitive-energetic framework is the evaluation mechanism, a system that acts as a governor to determine the amount of energetic support needed to satisfy the demands of the task or to titrate the basal arousal and activation systems. Van der Molen et al. noted that this formulation bears a strong resemblance to the concept of expectancy that Niemi and Näätänen invoked to explain the effects of time uncertainty on response latency. Accordingly, they assumed that, during the preparatory period in an RT task, evaluation consists of an active prediction of the arrival time of the stimulus.
They assumed further that expectancy exerts its influence on preparation via a mechanism they called 'effort', which in turn coordinates engagement of the arousal and activation systems. Van der Molen et al. suggested that anticipatory HR deceleration associated with the need for temporal prediction reflects a specific pattern of coordination by the effort system, namely, holding available processing capacity in anticipation of subsequent action. The attentive reader should have noticed that in the van der Molen et al. conceptualization of the influence of presetting on processing stages and its cardiac concomitants, the processes related to response choice are strikingly absent. This omission might be a result of the fact that Niemi and Näätänen's discussion of the foreperiod effect is restricted to simple reactions, while most additive factor studies of the choice reaction process employ highly overlearned or natural S-R relations. In the van der Molen et al. study, for example, response choice was manipulated by varying the number of response choices while keeping the S-R mappings constant. According to Sanders, active processes involved in the presetting of response choice can profit from energetic supply from the effort system which is used for handling incompatible S-R mappings or, in a more general sense, is needed for an adequate functioning of the response choice stage. In Sanders' view, processing at this stage can be considered to be 'conscious', constituting therefore one of Posner's (1978) components of attention. On the basis of Pribram and McGuinness' (1975) neurophysiological review, one would predict that, whenever the effort system is invoked for handling incompatible S-R mappings, the direct connection between arousal and activation is uncoupled and this would be associated with HR acceleration.
Figure 7.7. Cardiac inter-beat interval responses as a function of type of reaction task: two-choice, four-choice with a compatible S-R mapping (simple) and four-choice with an incompatible mapping (complex). The inter-beat intervals starting with the sixth beat prior to the stimulus and ending with the second beat following the stimulus are depicted. [Reprinted with permission of the authors and publisher.]
The prediction that incompatible S-R mappings elicit HR acceleration was tested recently by Jennings et al. (1990). They manipulated compatibility by varying the spatial mapping between stimuli and responses in a four-choice RT task. For simple mapping, stimuli were mapped from left to right onto response keys that were aligned similarly. For complex mapping, the spatial relations between the stimuli and responses were randomized. HR responses differed from the triphasic pattern seen typically in choice RT tasks. Figure 7.7 shows that deceleration was maximal during the second cardiac inter-beat interval after the stimulus. For the complex, but not for the simple, mapping a brief secondary deceleration occurred during the cardiac inter-beat interval after the imperative stimulus. The secondary deceleration was stronger for short inter-stimulus intervals and slower responses than for long inter-stimulus intervals and faster responses. Jennings et al. concluded that these findings suggest that effortful mapping of a stimulus onto a response may induce transient deceleration. In terms of Sanders' model, this transient deceleration might reflect the uncoupling of the arousal and activation mechanisms or, in other words, the active inhibition of overlearned or natural S-R relations. In a subsequent study, Jennings and colleagues further examined the relation between inhibition of natural S-R connections and transient cardiac deceleration.
In this study, they used Logan's (1981) stop paradigm which has been frequently used to study the issue of voluntary control versus ballistic action. In this paradigm, subjects perform a choice reaction task and, occasionally and unpredictably, a signal is presented that requires them to withhold the motor response to the reaction stimulus. The stop signal may occur at several delays following the
presentation of the reaction stimulus. In the Jennings et al. study, stop signals were presented on 30% of the trials and with a delay of either 50 or 150 ms after the onset of the reaction stimulus. In addition to behavioral and cardiac responses, electromyographic activity was picked up from the forearm to provide a measure of peripheral response activation. Muscle activity must occur prior to the closure of the response key, but key closure is not an inevitable consequence of muscle activity. Thus, four trial types may be distinguished: 'respond' trials in which there is no stop signal; 'complete inhibition' trials in which the subject successfully refrains from responding; 'inhibition failure' trials in which the motor response to the reaction signal escapes from inhibition; and 'partial inhibition' trials in which there is electromyographic activity that is not followed by an overt motor response. Jennings et al. predicted HR deceleration if subjects withheld their motor response to the reaction stimulus. This had been observed in several previous studies in which subjects performed a c-task (Lacey, 1972; van der Molen, Somsen and Orlebeke, 1983; van der Molen et al., 1989). In those studies, HR deceleration at the time of the stimulus continued longer on NoGo compared with Go trials. Figure 7.8 illustrates the inter-beat interval response for the four trial types. The figure shows the morphology that is typically obtained in speeded motor tasks: anticipatory deceleration up to the inter-beat interval during which the reaction stimulus is expected, followed by acceleratory recovery. More importantly, Figure 7.8 shows that 'complete' and 'partial inhibition' trials induce more
Figure 7.8. Inter-beat interval responses for 'respond', 'partial inhibition', 'complete inhibition' and 'inhibition failure' trials. The inter-beat interval in which the stop signal occurred is labeled 0 on the x-axis. Sequential inter-beat intervals starting with the third beat prior to the stimulus and ending with the second beat following the stimulus are depicted. [Reprinted with permission of the authors and publisher.]
deceleration at the time of the reaction stimulus than 'inhibition failure' trials. 'Respond' trials seem to elicit an intermediate cardiac slowing but the difference with 'inhibition failure' trials was not statistically significant. Thus, a delay in cardiac timing seems to occur when inhibition successfully halts response processing. Jennings et al. suggested that the lengthening of the inter-beat interval reflects the midbrain inhibition of the central command for the button-closure response as implied by de Jong et al. (1990). Such an interpretation would be consistent with a two-process theory of speeded actions in which the process of motor programming is separated from the centrally organized energetic aspects of the motor act, the output of which has been coined as the 'Go' signal (Bullock and Grossberg, 1988). Thus, inhibition appears to be a powerful control mechanism which is able to produce fast interrupts at subordinate levels of processing. In this respect, inhibition may be one of the executive functions ascribed by Pribram and McGuinness (1975) to the effort system.
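The four trial types distinguished in the stop paradigm follow from three observations per trial: whether a stop signal was presented, whether electromyographic activity was recorded, and whether the response key was closed. The sketch below makes this classification explicit. The function name and the boolean coding are assumptions introduced for illustration; they are not taken from the Jennings et al. study.

```python
def classify_trial(stop_signal: bool, emg_activity: bool, key_press: bool) -> str:
    """Assign a stop-paradigm trial to one of the four trial types.

    Key closure presupposes muscle activity, so a key press without
    EMG activity is treated as a recording error.
    """
    if key_press and not emg_activity:
        raise ValueError("key press without preceding EMG activity")
    if not stop_signal:
        # By definition, every trial without a stop signal is a 'respond' trial.
        return "respond"
    if key_press:
        # The overt response escaped from inhibition.
        return "inhibition failure"
    if emg_activity:
        # Muscle activity started but did not lead to key closure.
        return "partial inhibition"
    return "complete inhibition"
```

Ordering the checks from overt response to muscle activity mirrors the remark in the text that muscle activity precedes key closure but does not inevitably lead to it.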
6 RECULER POUR MIEUX SAUTER
The goal of this chapter was to present a descriptive analysis of the revived interest in state-related changes in the reaction process. Three main themes have emerged from the present discussion. The first theme refers to ongoing attempts to specify the locus of energetic factors in the chain of information processing. A second thread running through the literature is concerned with the control that can be exerted over the reaction process in response to changes in organismic or environmental conditions. The third long-lived issue is centered on the biological manifestations of state changes in the reaction process. The attempts to specify the locus of energetic factors are based on the foundations of stage analysis formulated by Donders (1868) who assumed that the reaction process is composed of a sequence of independent processing stages. He proposed the subtraction method for measuring stage durations. This method can also be used for determining the selective influence of energetic factors. For example, if an energetic factor prolongs the duration of the choice reaction process but does not affect the speed of responding in a disjunctive RT task, then the subtraction method allows the investigator to conclude that the performance decrement is due to a selective effect on the response selection stage. The stage analysis of the reaction process has been refined by Sternberg (1969) who, like Donders, assumed that the reaction process consists of a unidimensional sequence of processing stages with discrete transmission between stages. He developed the additive factor logic to infer stages from the results of multifactorially designed experiments. The AFM has been used to determine what stage or stages are altered as a result of energetic manipulations. 
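The logic of both methods can be sketched with a few lines of arithmetic. The first function applies the subtraction method to mean RTs from a choice task and a disjunctive task; the second computes the interaction contrast of a 2 x 2 design, which is zero when two factors have additive effects (suggesting different stages) and non-zero when they interact (suggesting a common stage). All function names and RT values are hypothetical, and a real analysis would test the interaction statistically rather than inspect cell means.

```python
from typing import Sequence


def subtraction_estimate(choice_rt_ms: float, disjunctive_rt_ms: float) -> float:
    """Donders' subtraction: time attributed to the stage that is
    present in the choice task but absent in the disjunctive task."""
    return choice_rt_ms - disjunctive_rt_ms


def interaction_contrast(cell_means: Sequence[Sequence[float]]) -> float:
    """Interaction contrast for a 2 x 2 table of mean RTs.

    cell_means[i][j] is the mean RT at level i of the state factor
    (e.g. placebo, drug) and level j of the task factor
    (e.g. intact, degraded stimulus).
    """
    (a, b), (c, d) = cell_means
    # Effect of the task factor under state 2 minus its effect under state 1.
    return (d - c) - (b - a)


# Hypothetical means (ms): the state factor slows choice RT by 40 ms
# but leaves disjunctive RT unchanged.
baseline_selection = subtraction_estimate(420.0, 350.0)
stressed_selection = subtraction_estimate(460.0, 350.0)
selection_effect = stressed_selection - baseline_selection

additive_case = interaction_contrast([[400.0, 450.0], [430.0, 480.0]])
interactive_case = interaction_contrast([[400.0, 450.0], [430.0, 520.0]])
```

In the hypothetical data the whole 40 ms slowing is assigned to response selection, and only the second table would point to a common locus of the state and task factors.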
For example, if a drug is shown to interact exclusively with a task variable known to influence early perceptual processing, then the additive factor logic suggests that the locus of the drug effect is on perception not on action. Thus, the stage analysis of the reaction process seems to provide a convenient tool for tallying energetic factors. The stage analysis of the reaction process has been criticized from the moment of its inception. Research in Wundt's (1910) laboratory seemed to invalidate the
subtraction method in suggesting a profound difference between sensorial and muscular reactions. Instead of dissecting out of the reaction process the durations of successive stages, Wundt examined the effects of energetic factors on total RT. Thus, he provided a catalog of drug effects based on time-on-task fluctuations of processing speed. Rabbitt (1986) argued, however, that differences in mean RT can tell little about state-related changes of the reaction process. Only distributions of correct and incorrect responses are informative. The analysis of alcohol effects on the speed and accuracy of responding suggested to him a response control model of choice RT in which a central executive organizes the efficiency of successive operations. Previously, Broadbent (1971) arrived at a similar conclusion. He observed that under some conditions stresses fail to influence speed and accuracy of the choice reaction process. This finding suggested the notion of compensatory control. Thus, the potentially debilitating effects of state changes on performance may be compensated by the regulatory actions of a supervisory mechanism. Broadbent (1971) also addressed the issue of the neural implementation of the hierarchical model of the reaction process. The idea that higher brain mechanisms control lower ones is an old one in physiology. Ribot (1919), for example, suggested that the frontal lobes exercise inhibitory control over lower brain systems to facilitate a state of voluntary attention at the expense of involuntary attentional responses to novel stimuli. Many recent efforts have been directed to articulate further these two types of attentional control. Current notions of 'arousal' as a phasic response to novelty and 'activation' as a state of response readiness are not basically different from Ribot's attentional typology (Pribram and McGuinness, 1975; Tucker and Williamson, 1984). 
Moreover, recent results emerging from brain studies provide strong support for Ribot's speculation that the frontal lobes are implicated in the control of attention (see the review in Fuster, 1989). The selective review of energetic approaches to the reaction process was wrapped up in Sanders' (1983) attempt to play all three themes in concert. His cognitive-energetic model presents a daring synthesis of linear stage thinking, capacity theory and regulatory principles. In discussing the feasibility of a unifying cognitive-energetic theoretical framework, Hitch (1986) argued that energetics and information processing should be integrated within a purely psychological analysis. This is just what Sanders did. His model integrates capacity theory into linear stage thinking. Like Kahneman (1973), Sanders suggested that processing stages need two types of input: information and capacity. Within a stage, continuous capacity changes may occur and these graded changes may account for the variability of human performance. The communication between stages should be discrete, however, and stages are prohibited from having a capacity-sharing relation. Thus, a cognitive-energetic framework must assume the existence of multiple capacity supplies. Sanders' explicit reference to the Pribram and McGuinness neurophysiological control systems of attention does not imply, however, that he followed Kahneman in identifying psychological capacity with physiological arousal and in using physiological data as measures of the choice reaction process. In fact, Sanders (1990) is overly critical of current attempts to assess the temporal dynamics of processing stages by using psychophysiological measures (Coles et al., 1985).
Furthermore, although the labels 'arousal', 'activation' and 'effort' may create the impression that Sanders is referring to neurophysiological mechanisms, he is using these terms to denote capacity supplies in a purely psychological sense without alluding to neurophysiological underpinnings.
Where does Sanders' psychological synthesis bring us? In its present form, it provides a taxonomy of state and computational variables. It allows the investigator to determine what stage or stages are altered when a change in the state of the organism influences overall performance. In this respect, the results of Sanders' approach are not too different from Wundt's catalog of drug effects, Hockey's (1984) sketch map of stress effects and Rabbitt's (1986) psychometric classification of stresses. The important difference from the descriptive analyses of Hockey and Rabbitt is, however, that the cognitive-energetic model allows strong predictions. But the price paid for this advantage is high. Strong predictions may limit the scope of the cognitive-energetic model to artificial laboratory conditions which have little to do with real-life stress and performance in applied settings. The conclusion that a century of research effort amounted only to a somewhat more detailed sketch of energetic influences and a strong research tool for laboratory use seems rather depressing, but progress can be made. First, the performance analysis of cognitive-energetic aspects of the choice reaction process should be supplemented with mathematical modeling. Laming (1988) suggested how energetic constructs may be incorporated in the decision process as special parameters of a random walk. Another example has been provided by Wickens (1986) who suggested an integration of energetic concepts and information processing analysis by attaching gains to specific processing operations. Finally, the Molenaar and van der Molen (1986) neural network examination of Sanders' cognitive-energetic model provides a convincing case of how mathematical modeling can be married with a performance analysis of the choice reaction process.
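To give an impression of what treating an energetic construct as a parameter of a random walk might look like, the following sketch simulates a simple two-boundary random walk. It is not Laming's (1988) model: the assumption that an energetic state maps onto either the step probability (p_up) or the boundary height, as well as all names and parameter values, are hypothetical and serve only as illustration.

```python
import random
from statistics import mean
from typing import Tuple


def random_walk_trial(p_up: float, boundary: int, rng: random.Random) -> Tuple[int, bool]:
    """Simulate one decision as unit steps until +boundary or -boundary
    is reached. Returns (number of steps, True if +boundary was hit)."""
    position = 0
    steps = 0
    while abs(position) < boundary:
        position += 1 if rng.random() < p_up else -1
        steps += 1
    return steps, position > 0


def simulate(p_up: float, boundary: int, n_trials: int, seed: int = 0) -> Tuple[float, float]:
    """Return (mean number of steps, proportion of upper-boundary hits)."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    trials = [random_walk_trial(p_up, boundary, rng) for _ in range(n_trials)]
    mean_steps = mean(steps for steps, _ in trials)
    accuracy = mean(1.0 if correct else 0.0 for _, correct in trials)
    return mean_steps, accuracy


# Hypothetical states: a lower step probability stands for a less
# favorable energetic state, a lower boundary for a laxer criterion.
unfavorable = simulate(p_up=0.6, boundary=10, n_trials=500, seed=1)
favorable = simulate(p_up=0.8, boundary=10, n_trials=500, seed=1)
lax_criterion = simulate(p_up=0.6, boundary=4, n_trials=500, seed=1)
```

Under these assumptions, a higher step probability or a lower boundary shortens the mean decision time, and the boundary additionally trades speed for accuracy, which is the kind of parameter-specific prediction that a purely descriptive catalog of state effects cannot offer.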
Second, in contrast to Sanders but in accord with Kahneman, the performance analysis of the cognitive-energetic aspects of the reaction process should be augmented with measures derived from brain function. A combined approach may contribute to the analysis of the organizational and temporal aspects of information processing, not only by providing more detailed information about elementary mental operations and their implementation in the brain, but also by providing the possibility of examining variations in mental reactions under a wide range of conditions that have been referred to as 'energetic' states. A combined approach will unite cognitive and energetic views of the reaction process, and in so doing will revive 19th century experimental psychology. The basic ideas may not have changed but there has been such a considerable improvement in experimental sophistication, measurement techniques and mathematical analyses that we may expect to witness rapid progress in the study of attention in the decade of the brain. In this respect, the energetics of attention is to be understood as 'reculer pour mieux sauter'.
ACKNOWLEDGEMENTS
The preparation of this chapter and parts of the research described were supported by NWO grants no. 153209, no. 560-263-023 and no. 560-265-026 and NIMH grant no. 40418. Discussions with Dick Jennings, Peter Molenaar, Ted Bashore, Riek Somsen and Evert-Jan Stoffels are gratefully acknowledged. Special thanks are due to Atie Vogelenzang-de Jong for helping and putting the chapter together.
REFERENCES
Ach, N. (1905). Über die Willenstätigkeit und das Denken. Göttingen: Vandenhoeck.
Allport, D. A., Antonis, B. and Reynolds, P. (1972). On the division of attention: A disproof of the single channel hypothesis. Journal of Experimental Psychology, 24, 225-235.
Allport, F. H. (1955). Theories of Perception and the Concept of Structure. New York: Wiley.
Bahrick, H. P., Fitts, P. M. and Rankin, R. E. (1952). Effects of incentives upon reaction to peripheral stimuli. Journal of Experimental Psychology, 44, 400-406.
Beatty, J. (1982). Task-evoked pupillary responses, processing load and the structure of processing demands. Psychological Bulletin, 91, 276-292.
Berlyne, D. E. (1951). Attention to change. British Journal of Psychology, 42, 269-275.
Berlyne, D. E. (1960). Conflict, Arousal and Curiosity. New York: McGraw-Hill.
Boring (1970). A short historical perspective. In D. I. Mostofsky (Ed.), Attention: Contemporary Theory and Analysis (pp. 3-8). New York: Appleton-Century-Crofts.
Broadbent, D. E. (1958). Perception and Communication. London: Pergamon Press.
Broadbent, D. E. (1970). Stimulus set and response set: Two kinds of selective attention. In D. I. Mostofsky (Ed.), Attention: Contemporary Theory and Analysis (pp. 51-60). New York: Appleton-Century-Crofts.
Broadbent, D. E. (1971). Decision and Stress. London: Academic Press.
Bullock, D. and Grossberg, S. (1988). Neural dynamics of planned arm movements: Emerging invariants and speed-accuracy properties during trajectory formation. Psychological Review, 95, 49-90.
Callaway, E. and Stone, G. (1960). Re-evaluating the focus of attention. In L. Uhr and J. G. Miller (Eds), Drugs and Behavior (pp. 393-398). New York: Wiley.
Carli, M., Robbins, T. W., Evenden, J. and Everitt, J. (1983). Effects of lesions to ascending noradrenergic neurons on performance of a 5-choice serial reaction task in rats: Implications for theories of dorsal noradrenergic function based on selective attention and arousal. Behavioural Brain Research, 9, 361-380.
Coles, M. G. H., Gratton, G., Bashore, T. R., Eriksen, C. W. and Donchin, E. (1985). A psychophysiological investigation of the continuous flow model of human information processing. Journal of Experimental Psychology: Human Perception and Performance, 11, 529-553.
Coles, M. G. H. and Strayer, D. L. (1985). The psychophysiology of the cardiac cycle time effect. In J. F. Orlebeke, G. Mulder and L. J. P. van Doornen (Eds), Psychophysiology of Cardiovascular Control: Models, Methods and Data (pp. 517-534). New York: Plenum Press.
De Jaager, J. J. (1865). De Physiologische tijd bij psychische processen. In J. Brozek and M. S. Sibinga (Eds), Origins of Psychometry: Johan Jacob de Jaager, Student of F. C. Donders. Nieuwkoop: B. de Graaf.
De Jong, R., Coles, M. G. H., Logan, G. D. and Gratton, G. (1990). In search of the point of no return: The control of response processes. Journal of Experimental Psychology: Human Perception and Performance, 16, 164-182.
Dennett, D. (1984). Cognitive wheels: The frame problem of AI. In C. Hookway (Ed.), Minds, Machines and Evolution. Cambridge: Cambridge University Press.
Donders, F. C. (1868). On the speed of mental processes. In W. G. Koster (Ed.), Attention and Performance II (Acta Psychologica, 30, 1969). Amsterdam: North-Holland.
Easterbrook, J. A. (1959). The effect of emotion on cue utilization and the organization of behavior. Psychological Review, 66, 183-201.
Freeman, G. L. (1940). The relationship between performance level and bodily activity level. Journal of Experimental Psychology, 25, 602-608.
Frowein, H. (1981). Selective drug effects on information processing. Doctoral Thesis. University of Utrecht, The Netherlands.
Fuster, J. M. (1989). The Prefrontal Cortex. Anatomy, Physiology and Neuropsychology of the Frontal Lobe, 2nd edn. New York: Raven Press.
Gibson, J. J. (1941). A critical review of the concepts of set in contemporary experimental psychology. Psychological Bulletin, 38, 781-817.
Gopher, D. and Sanders, A. F. (1984). S-Oh-R: Oh stages! Oh resources! In W. Prinz and A. F. Sanders (Eds), Cognition and Motor Behavior. Heidelberg: Springer.
Gottsdanker, R. (1975). The attaining and maintaining of preparation. In P. M. A. Rabbitt and S. Dornic (Eds), Attention and Performance V (pp. 33-49). London: Academic Press.
Grossberg, S. (1982). Studies of Mind and Brain: Neural Principles of Learning, Perception, Development and Motor Control. Dordrecht, Holland: D. Reidel.
Hamilton, P., Hockey, G. R. J. and Rejman, M. (1977). The place of the concept of activation in human information processing theory: An integrative approach. In S. Dornic (Ed.), Attention and Performance VI. Hillsdale, NJ: Erlbaum.
Helmholtz von, H. (1850). Messungen über den zeitlichen Verlauf der Zuckung animalischer Muskeln und die Fortpflanzungsgeschwindigkeit der Reizung in den Nerven. Archiv für Anatomie, Physiologie und Wissenschaftliche Medicin, 276-364.
Heymans, G. (1927). Über die Anwendbarkeit des Energiebegriffes in der Psychologie. In Gesammelte Kleinere Schriften zur Philosophie und Psychologie (Zweiter Teil, pp. 319-359). Leipzig: Barth.
Hitch, G. J. (1986). Energetical aspects of information processing: Some pretheoretical issues. In G. R. J. Hockey, A. W. K. Gaillard and M. G. H. Coles (Eds), Energetics and Human Information Processing (pp. 425-434). Dordrecht, The Netherlands: Nijhoff.
Hockey, G. R. J. (1984). Varieties of attentional state. In R. Parasuraman and D. R. Davies (Eds), Varieties of Attention (pp. 449-483). New York: Academic Press.
Hockey, G. R. J., Gaillard, A. W. K. and Coles, M. G. H. (Eds) (1986). Energetics and Human Information Processing. Dordrecht, The Netherlands: Nijhoff.
Houston, B. K. and Jones, T. M. (1967). Distraction and Stroop color-word performance. Journal of Experimental Psychology, 74, 54-56.
James, W. (1890). The Principles of Psychology. London: Dover.
Jennings, J. J., Wood, C. C. and Lawrence, B. E. (1976). Effects of graded doses of alcohol on speed-accuracy tradeoff in choice reaction time. Perception and Psychophysics, 19, 85-91.
Jennings, J. R., Averill, J. R., Opton, E. M. and Lazarus, R. S. (1971). Some parameters of heart rate change: Perceptual versus motor requirements, noxiousness and uncertainty. Psychophysiology, 7, 194-212.
Jennings, J. R., van der Molen, M. W., Brock, K. and Somsen, R. J. M. (1992). On the synchrony of stopping responses and delaying heart beats. Journal of Experimental Psychology: Human Perception and Performance, 18, 422-436.
Jennings, J. R., van der Molen, M. W., Somsen, R. J. M. and Terezis, C. (1990). On the shift from anticipatory heart rate deceleration to acceleratory recovery: Revisiting the role of response factors. Psychophysiology, 27, 385-395.
Jennings, J. R., van der Molen, M. W. and Terezis, C. (1987). Primary bradycardia and vagal inhibition as two manifestations of the influence on the heart beat. Journal of Psychophysiology, 4, 361-374.
Kahneman, D. (1973). Attention and Effort. Englewood Cliffs, NJ: Prentice-Hall.
Koch, S. and Leary, D. E. (Eds) (1985). A Century of Psychology as Science. New York: McGraw-Hill.
Lacey, J. I. (1967). Somatic response patterning and stress: Some revisions of activation theory. In M. H. Appley and R. Trumbull (Eds), Psychological Stress: Issues in Research. New York: Appleton-Century-Crofts.
Lacey, J. I. (1972). Some cardiovascular correlates of sensorimotor behavior: Example of visceral afferent feedback? In C. H. Hockman (Ed.), Limbic Mechanisms and Autonomic Function (pp. 175-201). Springfield, IL: Thomas.
Laming, D. (1988). Some boundary conditions of choice reaction performance. In I. Hindmarch, B. Aufdembrinke and H. Ott (Eds), Psychopharmacology and Reaction Time (pp. 65-77). New York: Wiley.
274
M. W. van der Molen
Lange, L. (1888). Neue Experimente fiber den Vorgang der Einfachen Reaction auf Sinneseindriicke. Philosophische Studien, 4, 479-510. Lansing, R.W., Schwartz, E. and Lindsley, D. (1959). Reaction time and EEG activation under alerted and non-alerted conditions. Journal of Experimental Psychology, 58, 1-7. Logan, G.D. (1981). Attention, automaticity and the ability to stop a speeded choice response. In J. Long and A. D. Baddeley (Eds), Attention and Performance IX. Hillsdale, NJ: Erlbaum. Logsdon, R., Hochhaus, L., Williams, L., Rundell, H. L. and Maxwell, D. (1984). Secobarbital and perceptual processing. Acta Psychologica, 55, 179-193. Malmo, R. B. (1958). Measurement of drive: An unsolved problem in psychology. In M. R. Jones (Ed.), The Nebraska Symposium on Motivation V/(pp. 229-265). Lincoln, NB: University of Nebraska Press. Malmo, R. B. (1959). Activation: A neuropsychological dimension. Psychological Review, 66, 367-386. Maylor, E. A. and Rabbitt, P. M. A. (1987). Effects of alcohol and practice on choice reaction time. Perception and Psychophysics, 42, 465-475. Maylor, E. A. and Rabbitt, P. M. A. (1989). Relationship between rate of preparation for, and processing of, an event requiring a choice response. Quartely Journal of Experimental Psychology, 41A, 47-62. McDougall, W. (1911). Body and Mind, 8th edn, 1938. London: Methuen. McGuinness, D. and Pribram, K. (1980). The neuropsychology of attention: Emotional and motivational controls. In M. C. Wittrock (Ed.), The Brain and Psychology (pp. 95-140). New York: Academic Press. Meyer, D. E., Osman, A. M., Irwin, D. E. and Kounios, J. (1988). The dynamics of cognition and action: Mental processing inferred from speed-accuracy decomposition. Psychological Review, 95, 183-237. Molenaar, P. C. M. and van der Molen, M. W. (1986). Steps to a formal analysis of the cognitive-energetic model of stress and human performance. Acta Psychologica, 62, 237-261. Moruzzi, G. and Magoun, H. W. (1949). 
Brain stem reticular formation and activation of the EEG. Electroencephalography and Clinical Neurophysiology, 1, 455-473. N/i/it/inen, R. (1973). The inverted U-relationship between activation and performance: A critical review. In S. Kornblum (Ed.), Attention and Performance IV (pp. 155-174). London: Academic Press. Niemi, P. and N/i/it/inen, R. (1981). Foreperiod and simple reaction time. Psychological Bulletin, 89, 133-162. Osgood, C. E. (1953). Method and Theory in Experimental Psychology. New York: Oxford University Press. Pachella, R. G. (1974). The interpretation of reaction time in information processing research. In B. Kantowitz (Ed.), Human Information Processing: Tutorials in Performance and Cognition. Hillsdale, NJ: Erlbaum. Pieters, J. P. M. (1983). Sternberg's additive factor method and underlying psychological processes: Some theoretical considerations. Psychological Bulletin, 93, 411-426. Posner, M. I. (1978). Chronometric Explorations of Mind. Hillsdale, NJ: LEA. Posner, M. I. and Boies, S. J. (1971). Components of attention. Psychological Review, 78, 391-408. Pribram, K. H. (1990). Introduction: Brain and consciousness. A wealth of data. In R. John, T. Harmony, L. S. Pricep, M. Vald6s-Sosa and P. A. Vald6s-Sosa (Eds), Machinery of the Mind. Data, Theory and Speculations about Higher Brain Function (pp. xxi-xxxvi). Boston, MA: Birkh/iuser. Pribram, K. H. and McGuinness, D. (1975). Arousal, activation and effort in the control of attention. Psychological Review, 82, 116-149. Rabbit, P. M. A. (1979). Current paradigms and models in human information processing. In V. Hamilton and D. M. Warburton (Eds), Human Stress and Cognition (pp. 115-140). New York: Wiley.
Energetics and the reaction process
275
Rabbitt, P. M. A. (1986). Models and paradigms in the study of stress effects. In G. R. J. Hockey, A. W. K. Gaillard and M. G. H. Coles (Eds), Energetics and Human Information Processing (pp. 155-174). Dordrecht, The Netherlands: Nijhoff. Rabbitt, P. M. A. (1988). The faster the better? Some comments on the use of information processing rate as an index of change and individual differences in performance. In I. Hindmarch, B. Aufdembrinke and H. Ott (Eds), Psychopharmacology and Reaction Time (pp. 79-95). New York: Wiley. Ribot, Th. (1919). Psychologie de l'Attention, 4th edn. Paris: F61ix Alcan. Robbins, T. W. (1986). Psychophamacological and neurobiological aspects of the energetics of information processing. In G. R. J. Hockey, A. W. K. Gaillard and M. G. H. Coles (Eds), Energetics and Human Information Processing (pp. 71-90). Dordrecht, The Netherlands: Nijhoff. Robbins, T. W., Everitt, B. J., Fray, P. J., Gaskin, M., Carli, M. and de la Riva, C. (1982). The roles of the central catecholamines in attention and learning. In M. Y. Spiegelstein and A. Levy (Eds), Behavioral Models and the Analysis of Drug Action. Amsterdam: Elsevier. Salow, P. (1912). Untersuchungen zur uni- und bilateralen Reaktion. Psychologishe Studien, 7, 1-81.
Sanders, A. F. (1980). Stage analysis of reaction processes. In G. E. Stelmach and J. Requin (Eds), Tutorials in Motor Behaviour 20 (pp. 331-353). Amsterdam: North-Holland. Sanders, A. F. (1983). Towards a model of stress and human performance. Acta Psychologica, 53, 61-97. Sanders, A. F. (1986). Energetical states underlying task performance. In G. R. J. Hockey, A. W. K. Gaillard and M. G. H. Coles (Eds), Energetics and Human Information Processing (pp. 139-154). Dordrecht, The Netherlands: Nijhoff. Sanders, A. F. (1990). Some issues and trends in the debate on discrete vs. continuous processing of information. Acta Psychologica, 77, 123-167. Sanders, A. F., Wijnen, J. L. C. and van Arkel, A. E. (1982). An additive factor analysis of the effects of sleep-loss on reaction processes. Acta Psychologica, 51, 41-59. Schweickert, R. (1983). Latent network theory: Scheduling of processes in sentence verification and the Stroop effect. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9, 353-383. Sokolov, E. N. (1963). Perception and the Conditioned Reflex. Oxford: Pergamon. Somsen, R. J. M., van der Molen, M. W., Jennings, J. R. and Orlebeke, J. F. (1985). Response initiation not completion seems to alter cardiac cycle length. Psychophysiology, 22, 319-325. Somsen, R. J. M., van der Molen, M. W. and Orlebeke, J. F. (1983). Phasic heart rate changes in reaction time, shock avoidance and unavoidable shock tasks: Are hypothetical generalizations about different S1-$2 tasks justified? Psychophysiology, 20, 88-94. Sternberg, S. (1969). The discovery of processing stages: Extensions of Donders' method. In W. G. Koster (Ed.), Attention and Performance II (Acta Psychologica, 30, 276-315). Amsterdam: North-Holland. Stevens, S. S. (Ed.) (1951). Handbook of Experimental Psychology. New York: Wiley. Tucker, D. M. and Williamson, P. A. (1984). Asymmetric neural control systems in human regulation. Psychological Review, 91, 185-215. Van der Molen, M. W., Bashore, T. 
E., Halliday, R. and Callaway, E. (1991). Chronopsychophysiology: Mental chronometry augmented with psychophysiological time-markers. In J. R. Jennings and M. G. H. Coles (Eds), Handbook of Cognitive Psychophysiology: Central and Autonomic Nervous System Approaches (pp. 9-178). Chichester, England: John Wiley. Van der Molen, M. W., Boomsma, D. I., Jennings, J. R. and Nieuwboer, R. T. (1989). Does the heart know what the eye sees? Cardiac pupillometric analysis of motor preparation and response execution. Psychophysiology, 26, 70-80. Van der Molen, M. W., Somsen, R. J. M., Jennings, J. R., Nieuwboer, R. T. and Orlebeke, J. F. (1987). A psychophysiological investigation of cognitive-energetic relations in human information processing: A heart rate/additive factors approach. Acta Psychologica, 66, 251-289.
276
M. W. van der Molen
Van der Molen, M. W., Somsen, R. J. M. and Orlebeke, J. F. (1983). Phasic heart rate responses and cardiac cycle time in auditory choice reaction time. Biological Psychology, 16, 255-272. Van der Molen, M. W., Somsen, R. J. M. and Orlebeke, J. F. (1985). The rhythm of the heart beat in information processing. In P. Ackles, J. R. Jennings and M. G. H. Coles (Eds), Advances in Psychophysiology, vol. 1 (pp. 1-88). Greenwich, CT: JAI Press. Wickelgren, W. B. (1977). Speed-accuracy tradeoff and information processing dynamics. Acta Psychologica, 41, 67-85. Wickens, C. D. (1980). The structure of attentional resources. In R. S. Nickerson (Ed.), Attention and Performance VIII. Hillsdale, NJ: Erlbaum. Wickens, C. D. (1984). Processing resources in attention. In R. Parasuraman and D. R. Davies (Eds), Varieties of Attention (pp. 63-102). New York: Academic Press. Wickens, C. D. (1986). Gain and energetics in information processing. In G. R. J. Hockey, A. W. K. Gaillard and M. G. H. Coles (Eds), Energetics and Human Information Processing (pp. 373-390). Dordrecht, The Netherlands: Nijhoff. Woodworth, R. S. (1938). Experimental Psychology. New York: Holt. Woodworth, R. S. and Schlosberg, H. (1954). Experimental Psychology. New York: Holt. Wundt, W. (1910). Physiologische Psychologie (Sechste Auflage). Leipzig: W. Englemann.
Chapter 8
Sustained Attention
H. S. Koelega
University of Utrecht, The Netherlands
1 INTRODUCTION
1.1 Origins of Vigilance Research

Concern about time-related deterioration of performance can be traced to studies carried out in the 1930s (e.g. Wyatt and Langdon, 1932), but it was World War II that gave the impetus to systematic study of vigilance performance. The Royal Air Force had received reports that airborne radar operators on anti-submarine patrol over the Bay of Biscay missed a large number of potential U-boat contacts when they had been on watch for some time. The operator's task, signaling the presence of an enemy submarine on the surface of the sea, had to be performed in isolation under monotonous conditions and was often a matter of waiting for nothing to happen, since the target he was searching for was a relatively infrequent event. Often, the operator produced 'false alarms', for example Spanish fishing vessels in the Bay of Biscay, but more important was the rather startling discovery that the observer failed to detect perceptible targets soon after his watch was started. Warm (1984a), describing the historical background of vigilance research, characterized this situation as 'to observe and perceive not'.

Efforts to study the problem of missed contacts were initiated at about the same time (1943) in the USA, Canada and Great Britain, but the systematic, controlled laboratory experiments of Norman H. Mackworth (1948) are generally considered as the genesis of sustained attention or vigilance research. Useful summaries of Mackworth's experiments are provided by Davies and Parasuraman (1982) and Warm (1984a).

Briefly, Mackworth designed a laboratory task, the 'clock test', which simulated the essentials of the radar operator's job. A rotating black pointer, describing a circle against a white background without reference points, moved on in discrete steps once per second. Occasionally the pointer executed a 'double jump', and this was the critical signal observers were required to detect and report by pressing a key. Signals occurred 24 times per hour and a session lasted continuously for 2 h. Charting the course of performance over time, Mackworth showed that observers became more inefficient as time on watch progressed: correctly detected signals declined sharply within the first 15-20 min and then showed a more gradual decline. The within-session deterioration in performance has become known as the vigilance decrement or the decrement function.

Handbook of Perception and Action, Volume 3 ISBN 0-12-516163-8
Copyright © 1996 Academic Press Ltd. All rights of reproduction in any form reserved.
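The clock-test parameters given above (one pointer step per second, 24 signals per hour, a 2 h session) can be turned into a concrete event schedule. The following is only a minimal sketch: the text does not say how Mackworth spaced the double jumps, so drawing them uniformly at random within each hour is an assumption, and the function and constant names are hypothetical.

```python
import random

STEP_INTERVAL_S = 1      # pointer advances once per second
SIGNALS_PER_HOUR = 24    # double jumps (critical signals) per hour
SESSION_HOURS = 2        # continuous session length


def clock_test_schedule(seed=0):
    """Return the sorted step indices at which a double jump occurs.

    Assumption: within each hour the signal positions are drawn uniformly
    at random without replacement; the original spacing is not specified.
    """
    rng = random.Random(seed)
    steps_per_hour = 3600 // STEP_INTERVAL_S
    schedule = []
    for hour in range(SESSION_HOURS):
        start = hour * steps_per_hour
        hour_steps = range(start, start + steps_per_hour)
        schedule.extend(rng.sample(hour_steps, SIGNALS_PER_HOUR))
    return sorted(schedule)


schedule = clock_test_schedule()
total_steps = SESSION_HOURS * 3600 // STEP_INTERVAL_S
# Proportion of pointer steps that are critical signals.
signal_probability = len(schedule) / total_steps
```

Under these parameters only 48 of the 7200 pointer steps are signals, which illustrates how infrequent the critical event is relative to the nonsignal events.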
This, then, is the vigilance problem: tasks in which attention is directed to one or more sources of information over long, unbroken periods of time show a rapid decline in performance which seems to result merely from the necessity of looking or listening for an infrequent signal. The decrement, which can be assessed in terms of measures of performance such as percentage correct detections (hits), detection latency (or response time; RT), false alarms or the signal detection measures of sensitivity (d') and response criterion (beta), may become apparent very early in the session. Usually, analyses of the decline are rather coarse-grained: for example, hits are averaged over time periods of 20 min, but Jerison (1963), using a more fine-grained (signal by signal) analysis of detection efficiency, has suggested that performance may decline from the very beginning onward. So, the waning of performance may set in rapidly. This progressive decline in performance with time appears to result from central processes (inhibition, attention, arousal, motivation, etc.) rather than from changes in peripheral processes (sense organs).

One may well ask what the essential characteristics of the vigilance task are. Given the wide variation in vigilance tasks, it would be prudent to ask whether they really have features in common that are unique to the vigilance situation. Warm (1984a), discussing the vigilance 'paradigm', has also remarked that the absence of any common characteristics would make it difficult to define vigilance in operational terms and to draw meaningful conclusions about 'vigilant behavior'. An attempt to define human vigilance and to specify the essential task characteristics has been made by McGrath (1963). Noting that hundreds of reports on vigilance had been published with a profusion of research techniques, McGrath raised the question whether every investigator was studying the same phenomenon.
He reported having encountered several different notions of vigilance in the literature, the most important of which were (1) a central process or state of the organism ('attention') determining performance, or (2) the performance on certain tasks, or (3) a general area called human watchkeeping. In the first view, vigilance usually has meant the individual's readiness to respond to certain infrequent and unpredictable events as a form of attention, namely sustained attention, as an intervening variable. Since nothing in the environment is changing, a change in performance must result from changes in a central state (attention). However, this definition of vigilance is not very productive, since the state of vigilance is inferred from performance and then is used to explain performance. Entities such as vigilance, sustained attention, or arousal for that matter, give the appearance of explaining the observed phenomena but in reality explain nothing: to say that performance declined because the subject's vigilance declined is to follow a circular way of reasoning.

The second notion of vigilance, performance on certain tasks, is more attractive and useful. Vigilance is then described by measures of detection probability, etc. In the definition of vigilance in terms of performance, one has to define the task and to specify the measures of performance. McGrath (1963) has also specified the conditions under which we observe what is called vigilance performance, criteria by which to distinguish it from repetitive monotonous work, tracking performance, etc. The most important specifications are:

(1) The task should require detection, i.e. perceiving and reporting a change in the operating environment.
(2) The intensity of the signal should be close to the observer's detection threshold, but the signal should be clearly perceivable when the observer is alerted or directed to it.
(3) Signals should occur irregularly, infrequently and, if nonsignal stimuli are present, the ratio of nonsignals to signals should be high.
(4) The task should be prolonged and continuous.

Not all vigilance tasks conform to these characteristics in a hard and fast manner. The various interpretations of McGrath's specification with respect to the intensity and ease of discrimination of the signal may have caused confusion and contributed to disparate findings. Likewise, the requirement of a prolonged, continuous work period, without an indication where the cutoff should be made, has led to a great diversity of task lengths, ranging from less than 10 min to more than 6 h (in fact, most studies appear to have a session length of 40-60 min). However, the aforementioned task dimensions seem to capture the characteristics of most vigilance tasks, and thus are the boundary conditions of the vigilance paradigm.

How should the appropriate indices of performance, the measures, be defined? The probability of detection during some interval (percentage hits) and the latency of the hit response (RT) have been the principal means of gauging performance. Furthermore, false alarms are often used, sometimes combined with omissions (misses), yielding an unintelligible 'error' measure. The application of the measures of the theory of signal detection has been viewed as promising by some investigators but has been challenged by others (Warm, 1984a).
Koelega (1992) has suggested that the importance of these measures for vigilance may have been overstated in the past, and in a recent experiment (van Leeuwen et al., in press), in which signal detection measures such as 'sensitivity' were related to event-related brain potentials, we concluded that various aspects of processing are involved in these measures, which makes them quite meaningless. The question arises whether all these dependent variables are different manifestations of the same underlying phenomenon, and whether they are equivalent. They often do not correlate, so obviously may measure different processes. Further, a measure such as false alarms is difficult to interpret. Does it merely reflect the stability of the observer's criterion? It may also be questioned whether false alarms comprise a homogeneous, unitary measure. Recently, Halperin et al. (1988) reported that in a widely used vigilance task, the CPT (continuous performance task), different types of false alarms are identifiable, associated with different reaction times. The authors have suggested that the different types of errors assess different underlying psychological functions (impulsivity, inattention, etc.). All in all, the most appealing of all measures is probability of detection, i.e. percentage of hits.
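To make the performance measures discussed in this section concrete, the sketch below computes the percentage of hits, the false alarm rate, and the signal detection indices d' and beta for successive periods of the watch. It is an illustration only: it assumes the standard equal-variance Gaussian model, in which d' = z(H) - z(F) and beta = exp((z(F)^2 - z(H)^2) / 2), and it adds 0.5 to each cell count so that rates of exactly 0 or 1 do not give infinite z-scores. That correction, the 20 min default period, the trial-log format and the function names are assumptions, not something specified in the chapter.

```python
import math
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (hit_rate, fa_rate, d_prime, beta), equal-variance Gaussian model.

    Assumption: 0.5 is added to every cell count so that the z-transform
    stays finite when a raw rate would be exactly 0 or 1.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf            # inverse of the standard normal CDF
    z_hit, z_fa = z(hit_rate), z(fa_rate)
    d_prime = z_hit - z_fa              # sensitivity
    beta = math.exp((z_fa ** 2 - z_hit ** 2) / 2)   # response criterion
    return hit_rate, fa_rate, d_prime, beta


def measures_by_period(trials, period_s=1200):
    """Group a trial log into periods (default 20 min) and score each one.

    trials: iterable of (time_s, is_signal, responded) tuples (hypothetical format).
    """
    counts = {}
    for time_s, is_signal, responded in trials:
        period = int(time_s // period_s)
        cell = counts.setdefault(period, [0, 0, 0, 0])  # hits, misses, FAs, CRs
        if is_signal:
            cell[0 if responded else 1] += 1
        else:
            cell[2 if responded else 3] += 1
    return {p: sdt_measures(*c) for p, c in sorted(counts.items())}
```

A decrement would then show up as a fall in the hit rate (and possibly in d') from one period to the next, whereas a rise in beta alone would point to a more conservative criterion rather than reduced sensitivity.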
1.2 Theories of Vigilance Performance
In a personal communication to Jerison (Jerison, 1970, p. 130) Norman Mackworth remarked that 'the essential feature of the vigilance story was that its origins were without any theoretical background'. An RAF report that submarines were being unreported was the starting point for Mackworth's research and all subsequent research on vigilance. Emphasis in the early years has been on a straightforward empiricism, but research workers cannot long be satisfied with purely empiric
studies. According to Jerison, research on vigilance in the USA, in contrast to European research, initially remained purely empiric, without theoretical analysis. Not until it was discovered that the vigilance situation could be used for an analysis of attention, could vigilance data be entered into the mainstream of psychological theory-building. Jerison (1970) has even proposed that the vigilance situation be recognized as a major paradigm for research on attention: the vigilance task provides a fundamental paradigm for defining sustained attention as a behavioral category. In Jerison's view the task is an unusually good one for the study of orderly changes in attention over long periods of time, but this contribution has been partly obscured by the erroneous belief that the vigilance situation is a simulation of the real world. In order to explain the diversity of vigilance findings, a large number of theories of vigilance has been proposed, focusing upon many psychological processes. Norman Mackworth himself (1950) was the first author making a theoretical effort to account for his findings in appealing to the Hullian concept of response inhibition. A variety of theories have subsequently been proposed, ranging from physiological to cognitive notions. Teichner (1974), reviewing the literature published between 1950 and 1971, remarked that the vigilance function has had applied to it, in succession, all the major theoretical concepts of experimental psychology so that the vigilance literature stands almost as a historical guide of theorizing in psychology. The pros and cons of these theories need not be recounted here in detail; there are many summaries (Davies and Parasuraman, 1982; Dember and Warm, 1979; Loeb and Alluisi, 1977, 1984; Warm, 1977), so we may forego elaboration of the theories here. The large number of theoretical models is a testimony to how little is actually understood about what is going on during the time of watch. 
Each theory may explain certain facts of the findings, many theories can account for similar data, but Loeb and Alluisi (1984, p. 200) conclude their review by stating that there are serious difficulties in the application of all of the models. A failure to detect and respond may occur because of sleepiness, or by decreases in sensitivity (d'), or by failures to observe, or by habituation to a level below criterion, or by an increase in criterion based on expectancies, or may result from the interaction of these various processes, possibly with other processes as well. Further, there is a similarity between concepts such as inhibition, habituation and filter. It is difficult to set up a critical test of one theory against the other. Most theories have been frameworks for possible explanations of the available data and have been conspicuously delinquent in providing new hypotheses and predictions that will allow them to be evaluated empirically. It should be noted that several theories have not been developed primarily for application to vigilance; most have been developed for cognitive psychology, signal detection, selective attention, etc. Further, almost any theory is devoted to explaining the decrement to the exclusion of questioning what determines the overall level of performance. Future theorizing, if undertaken at all, would have to develop a synthesis of all the different points of view. Vigilance behavior probably is not explicable in terms of a single mechanism. Loeb and Alluisi (1984) have stated that there is every reason to believe that numerous qualitatively different processes are operating simultaneously. We are taught to be parsimonious, to explain as many phenomena as possible in terms of the operation of as few specific mechanisms as possible. But the 'single mechanism' explanation exceeds the limits of credibility for vigilance,
and possibly for other behavioral phenomena as well. Whereas Loeb and Alluisi (1984) believe it highly probable that a valid and credible theory of vigilance, incorporating a multitude of explanatory principles, will finally be developed, I am not convinced of the feasibility of further attempts to develop a synthesis of a dozen or so theories. Such attempts may not be profitable and may end up in a blind alley; so many factors have been shown to affect vigilance performance that to encompass them all within one theoretical framework seems to be an impossible mission. We have here a curious situation: what seems to be a relatively simple behavioral phenomenon, a gradual decrease in detection efficiency during watchkeeping, turns out to be a complex issue exceeding the explanatory power of all theories available to experimental psychology. Theorists could argue that impaired performance is typically an applied problem, which does not lend itself well to theoretical approaches, but on the other hand we have seen that authors concerned with practical applications (Mackie, 1984, 1987; Wiener, 1984, 1987) complain that vigilance research has done nothing to solve real-world problems. Their point of view is that theory rather than operational consideration has established the research priorities, and has been driving the choice of task conditions and the selection of variables over the past 40 years.

In the present chapter an attempt is made to assess the extent of recent advances in several selected areas of research on sustained attention in order to take stock after more than 40 years of experimentation: 'What do we really know?' and 'Is it useful, for the advancement of psychological knowledge or for the benefit of society, to pursue the lines of research?'.
1.3 The Scope of this Chapter

Of course, the present chapter cannot provide a complete picture of the state of the art of the whole paradigm. However, the reader may note that my choice of topics strongly resembles the table of contents of the two most important books on vigilance published during the last decade (Davies and Parasuraman, 1982; Warm, 1984b), i.e. classification of tasks, individual differences, a possible physiological basis of vigilance performance, as manifest in event-related brain potentials, and the effect of external stressors (noise). Relatively new is a discussion of the effects of drugs on sustained attention. The chapter is biased in the sense that I rely heavily on my own reviews produced during the last few years. Examples of areas that are not covered in this chapter are the effects of stimulus, display and task parameters. These topics have received attention in the two aforementioned books. Apart from these two landmarks, some other contributions provide a wealth of data: (1) the proceedings of the 1961 Santa Barbara Symposium, edited by Buckner and McGrath (1963); (2) the book by Davies and Tune (1970); (3) the book by Stroh (1971); (4) the proceedings of the 1976 St Vincent (Italy) Symposium, edited by Mackie (1977); and (5) the special issue of the journal Human Factors (vol. 29, no. 6, 1987).

Findings in the field of vigilance performance are heterogeneous and confusing. A main reason for the diversity and heterogeneity of findings may be found in the use of a profusion of vigilance tasks, differing in information processing demands. It would be quite helpful if a framework was available within which the data could be interpreted, some system of classifying tasks in terms of their processing
demands, such that empiric data might be organized efficiently and comparability of different studies enhanced. The development of such a classification or taxonomy should be a primary concern. Section 2 describes some attempts to arrive at a taxonomy of vigilance tasks.
2 TOWARD A TAXONOMY OF VIGILANCE TASKS?
The first step in organizing knowledge is the recognition of similarities and differences of things and events. These observations are then the basis for classifications, necessary for drawing generalizations. The process of classification dates back to primitive humans: among the earliest dimensions were probably male-female, young-old, long-short, hot-cold, etc. Scientific classification goes back to the ancient Greeks. The theoretical study of systematic classifications is called taxonomy, the science of how to classify and identify. In the behavioral sciences, due to an overwhelming cascade of unorganized facts, the need for some organizing systems to synthesize research information has become critical. Although in the present section the focus is on a taxonomy of vigilance tasks, it should be remembered that this is but a small piece of a taxonomy of human performance, which in turn is but a small piece of behavioral science.

The history of and need for a taxonomy in the field of human task performance, methodological issues, available classificatory systems and many other topics have been extensively dealt with in a 500-page book by Fleishman and Quaintance (1984). According to these authors, a taxonomy of performance is useful for theoretical purposes (classifications are heuristic, generate hypotheses), but also has applied practical benefits (job analysis, system design, personnel selection, training, etc.). A taxonomic system for classifying task variables could aid in bridging the gap between basic (laboratory) research and applications. Generalization from laboratory tasks to operational tasks is improved when a central role is ascribed to relevant common task dimensions.
Generalization of research results has been limited by the absence of a unifying set of task dimensions: commonly used categories of human functions such as cognitive, motor, perceptual, problem solving, discrimination, learning, motor skill and information processing have turned out to be too general, since there is a considerable diversity of functions within each of these broad areas. Classification of tasks is not an end in and of itself; rather, it should be viewed as a tool to increase the ability to interpret or predict some facet of human performance. Ultimately, a taxonomy should have predictive power at a sophisticated level: not only whether or not a particular variable affects performance but also in what direction and how much. In terms of the present chapter, eventually a taxonomy of vigilance tasks should spell out the task characteristics that do and do not produce a decrement.

Fleishman (1975) has generated a taxonomic framework for describing tasks, featuring an 'ability requirements' approach: tasks are categorized according to the abilities needed to perform them. Fleishman was able, with a factor-analytic approach, to isolate and identify the ability factors common to a wide range of perceptual-motor tasks. Performance on several hundred different tasks could be accounted for in terms of a relatively small number (11) of abilities. A study by Levine, Romashko and Fleishman (1973) is relevant in the present context. Countless variables have been reported to affect vigilance performance: Mackie (1987, p. 708) has provided a partial listing of all such factors. Noting the enormous diversity of vigilance data, Levine et al. made an attempt to make more effective use of these data by structuring the literature according to the abilities required for task performance. In their view, such a system would allow generalizations to operational settings, prediction of the effects of independent variables, and identification of gaps in existing knowledge. In 53 vigilance studies, selected from 195 articles, two abilities were found to be predominating: perceptual speed (the speed with which stimuli are compared) and flexibility of closure (the ability to isolate a target stimulus from an embedding, more complex field). In both the auditory and visual modality, perceptual speed tasks showed a severe and rapid performance decrement with time on task. For the auditory condition, flexibility of closure tasks showed, after an initially slight decline, an increment rather than a decrement, but only beyond 90 min, and for the visual condition, these tasks showed a decrement, but only after 120 min. The authors concluded that relationships between performance and time on task differ markedly as a function of the class of task imposed upon the subjects.

Parasuraman and Davies (1977) have used the Levine et al. (1973) data to develop their own taxonomy. They have suggested that perceptual speed be termed successive discrimination (signals and nonsignals being presented successively) and flexibility of closure would refer to simultaneous discrimination (detect or identify a signal which is part of a configuration of nonsignals). One could argue about the interpretation of perceptual speed as 'successive discrimination': Levine et al. (1973, p. 151) defined perceptual speed as comparisons between successively or simultaneously presented stimuli.
Whatever, the nature of the discrimination required for detection is, in the view of Parasuraman and Davies, the most important task dimension. Together with the event rate, the rate of presentation of the stimuli, as a second dimension, they have proposed a twofold classification system, which is nowadays the most important taxonomy in the vigilance literature. Vigilance decrements, especially those in sensitivity (d'), would occur only in tasks combining a high event rate (> 24 per min) with the requirement for successive discriminations, due to memory demands when a stimulus has to be compared with previously stored data. In our own laboratory several experiments have been carried out in an attempt to test the validity of this taxonomy, as well as that of another classification proposed by Teichner (1974), which for some reason has remained rather obscure in the literature. The first experiment (Koelega et al., 1989) compared four vigilance tasks differing in memory load and in stimuli employed (sensory or cognitive). The tasks shared the aforementioned critical dimensions for a sensitivity decrement to occur (high event rate and successive discrimination). The data suggested that neither of these two critical conditions is sufficient for the occurrence of a decline. Further, measures of detectability (hits) and response latency (RT) appeared to be relatively independent indices of performance, vigilance level and vigilance decrement were dissociated, and performance on one task had a limited predictive validity for other types of task. The conclusion was that the critical dimensions of Parasuraman's (1979) taxonomy do not cover all factors involved in the vigilance decline. Our finding that the taxonomy should incorporate the sensory-cognitive distinction has recently been confirmed in a meta-analysis of 42 vigilance studies (See et al., 1995).
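The twofold classification described above can be written down as a simple decision rule, which makes it easier to see what the taxonomy does and does not predict. This is a minimal sketch of the rule as summarized in the text (a sensitivity decrement is expected only when the event rate exceeds 24 per min and the discrimination is successive); the function name, the string labels and the treatment of exactly 24 events per min as 'low' are assumptions made for illustration.

```python
EVENT_RATE_CUTOFF_PER_MIN = 24  # cutoff between low and high event rates


def predicts_sensitivity_decrement(event_rate_per_min, discrimination):
    """Apply the two-dimensional taxonomy rule as summarized in the text.

    discrimination: 'successive' or 'simultaneous' (hypothetical labels).
    Returns True only for a high event rate combined with successive
    discrimination, the one cell in which a decline in d' is predicted.
    """
    if discrimination not in ("successive", "simultaneous"):
        raise ValueError("discrimination must be 'successive' or 'simultaneous'")
    high_event_rate = event_rate_per_min > EVENT_RATE_CUTOFF_PER_MIN
    return high_event_rate and discrimination == "successive"


# The four cells of the classification, with illustrative event rates.
for rate in (10, 30):
    for kind in ("successive", "simultaneous"):
        print(rate, kind, predicts_sensitivity_decrement(rate, kind))
```

Written this way, the rule contains only two task dimensions, so it cannot by itself express the sensitivity decrements that the experiments discussed in this section found to depend on other factors, such as the sensory or cognitive character of the stimuli.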
A second experiment (Koelega et al., 1990) tested the prediction, made by Teichner (1974), that on visual tasks dynamic stimuli result in greater performance decrements than do static stimuli. The prediction received support for the measure of speed only. If anything, the results of these two experiments show that we are far from arriving at a comprehensive taxonomy of vigilance tasks. The taxonomy proposed by Parasuraman (declines in sensitivity occur only when a high event rate is coupled with successive discrimination) failed to receive support, as did the taxonomy suggested by Teichner (decrements in visual tasks take place only with dynamic stimuli), although admittedly the stimuli employed by us were not very dynamic. Moreover, a decrease in d' has also been reported with a low event rate (Eilers et al., 1988; Mackworth, 1970, p. 42; Williges, 1971). Other investigators (Dittmar, Warm and Dember, 1985), noting a decrement in sensitivity in a task with a high event rate and a simultaneous type of discrimination, have also stated that the taxonomy proposed by Parasuraman (1979) does not completely describe the conditions under which the decrement is rooted in perceptual and nonperceptual factors. The discrepancy was partly resolved by Parasuraman and Mouloua (1987), who noted a decrement in simultaneous discrimination tasks with poorly discriminable stimuli. Warm et al. (1988) also noted a decrement with both types of discrimination under free-viewing conditions, but not under head restraint conditions, and concluded that memory load is not sufficient to produce a decrement when signal quality is maintained. Further, the two types of task were not differentially sensitive to increased capacity from long hours of work or heavy workloads (Warm et al., 1989). 
On the other hand, Gluckman, Dember and Warm (1988) have stated that, in a session where both types of task were performed, there is evidence that successive tasks consume more processing resources than simultaneous tasks. Lanzetta et al. (1985) reported that their data indicated that the two main dimensions of the taxonomy (event rate and task type) are not independent, and have a synergistic effect on performance. These authors have also suggested that the cutoff of slow and fast categories can be maintained at 24 events per min for successive tasks, but should be placed higher for simultaneous tasks: 48 per min would be a more appropriate value. Further, our finding that 'cognitive' stimuli have a different effect on vigilance performance as compared with 'sensory' stimuli does not imply that there is evidence for a unidimensional sensory-cognitive factor in a taxonomy. Even cognitive alphanumeric stimuli show differences among themselves. But, while the 'sensory' or 'cognitive' character of stimuli might be added as a dimension in a taxonomy of tasks, some aloofness should be kept concerning future attempts to develop a classification system. Not only the sensory-cognitive dimension, but several other factors might also turn out to affect the occurrence of performance declines, e.g. sense modality (note that the Teichner taxonomy is limited to visual tasks), personality, length of the session, search requirements, etc. Further, our experiments have shown that vigilance level and vigilance decrement may dissociate, i.e. different factors may account for overall level and decline, which suggests that at least two separate taxonomies are required to explain vigilance performance. Moreover, our data have led us to question the validity of a taxonomy based on measures of detectability, when measures of speed are considered.
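The two-dimensional classification discussed above can be written down as a small decision rule. The sketch below is illustrative only: the function names are hypothetical, the cutoffs of 24 and 48 events per min are those mentioned in the text (the latter being the value suggested by Lanzetta et al. for simultaneous tasks), and treating each cutoff as a strict 'greater than' is an assumption. As the experiments reviewed here indicate, the prediction produced by this rule is not reliably borne out by the data.

```python
# Cutoffs (events per min) separating low from high event rates.
ORIGINAL_CUTOFF = 24  # single cutoff of the original taxonomy
REVISED_CUTOFFS = {"successive": 24, "simultaneous": 48}  # Lanzetta et al. suggestion


def event_rate_category(task_type: str, events_per_min: float,
                        revised: bool = False) -> str:
    """Label an event rate 'high' or 'low' for the given discrimination type."""
    if task_type not in REVISED_CUTOFFS:
        raise ValueError("task_type must be 'successive' or 'simultaneous'")
    cutoff = REVISED_CUTOFFS[task_type] if revised else ORIGINAL_CUTOFF
    # Strict inequality is an assumption; the text only gives '> 24 per min'.
    return "high" if events_per_min > cutoff else "low"


def taxonomy_predicts_sensitivity_decrement(task_type: str,
                                            events_per_min: float) -> bool:
    """Prediction of the taxonomy: a decline in d' only when a high event
    rate is coupled with successive discrimination."""
    return (task_type == "successive"
            and event_rate_category(task_type, events_per_min) == "high")


print(taxonomy_predicts_sensitivity_decrement("successive", 30))
print(event_rate_category("simultaneous", 30, revised=True))
```

Under the revised cutoffs a simultaneous task at 30 events per min counts as a low event rate task, whereas under the single original cutoff it would count as a high one.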
There are, moreover, large differences between laboratory tasks (on which the taxonomy is based) and operational tasks (see section 7.3). For example, most operational tasks involve simultaneous discrimination, including scanning or searching requirements. Further, vigilance requirements in operational situations often cut across the classification of high or low event rate; there often is no simple physical dichotomy between signals and nonsignals and often complex multidimensional discriminations are required. One may question the attempt to build a useful taxonomy from a reference base of laboratory tasks and then to extrapolate to real tasks. The search for a single general taxonomy, the taxonomy, may have its limitations. One may well wonder whether further attempts to clarify the dimensions of a taxonomy are worthwhile. In pursuing a 'complete' taxonomy which might predict (group) performance on laboratory tasks, we may invest research efforts yielding a relatively small benefit. An effective taxonomy should not only predict operator performance, or integrate empirical data, but it should also be cost effective: its utility, either theoretical or practical, should outweigh the investment in time and money. Even if a complete taxonomy, predicting group performance, were available, nothing could be said about individual performance. Several experiments have shown that the particular distribution of 'good' and 'poor' performers in a sample may have a profound effect on the outcome of an experiment. The existence of large individual differences in vigilance performance is a fact figuring prominently in the literature. From an analysis of 141 papers in the journal Human Factors, Simon (1976) concluded that subjects accounted for more of the total variance than did the independent variables, implying that intersubject differences are greater than the effects of the variables being studied. 
Some authors have long since argued that it would be logical to attempt to explain the individual differences first. The next section addresses such an approach.
INDIVIDUAL DIFFERENCES: AN APPROACH TO EXPLAINING VIGILANCE PERFORMANCE
It is one of the more common findings of research on vigilance that considerable variation exists among the performance scores achieved by different individuals working at the same task. Some monitors suffer a considerable loss in proficiency, but others maintain a high performance level throughout the watch. Selection of only highly proficient monitors would increase group performance levels. The feasibility of predicting individual differences from other measures is dependent upon their reliability. Davies and Parasuraman (1982) and Davies, Jones and Taylor (1984) report that it is usually observed that variability in performance is consistent: individual differences have been found to be reliable not only within a session, but also between sessions. Since substantial individual differences in performance are routinely obtained, attempts should therefore be made to develop a selection test that would predict performance. Apart from this practical interest, individual differences are also beginning to be used for theoretical purposes: a theory of vigilance must explain why, under essentially identical environmental and task
conditions, overall level of performance of one subject will be superior to that of another, and why the performance of one will decline as a function of time on task but that of another will not. Some 30 years ago, Buckner (1963) pointed out that, since group curves represent the performances of subjects showing no decrements and of those showing marked decrements, it would seem more logical and productive to attempt to explain the individual differences first, since such an explanation would account also for average group performance. Experimental psychologists often do not take individual differences into account. By doing this they throw away a vital part of the total experimental variance and unduly enlarge the error variance, which in psychology is often already large. Eysenck (1981) has always emphasized that no experimental or applied psychology can flourish which does not incorporate individual differences. Psychology deals with people, and people, although as a class they share many things, are above all else individuals, i.e. they behave differently in identical situations. Hence laws based on regularities of behavior have to be modified by reference to those aspects of human nature that produce differences. With respect to vigilance performance, only individual behavioral characteristics and their interactions with the array of stimulus factors remain uncontrolled, because within a given monitoring session the external set of stimulus factors remains constant. It is the former partition of the total variance that becomes of primary interest to those engaged in an individual differences approach. According to Waag (1971), the subgroup of good performers usually represents a sizable proportion of the total subject pool, ranging anywhere from 25% to 50%. Of course, within the context of an individual differences approach, generalization across tasks and modalities is of critical importance. 
If performance under different tasks is found to be uncorrelated, then the assumption of an underlying common vigilance factor must be seriously questioned, and there would be little possibility of obtaining a set of predictors that were valid in a variety of situations. It would then be necessary to obtain a set of predictors for each particular task situation. In the literature there are a few interesting attempts to use performance on selective attention tests (SATs) as a predictor of performance. Gopher and Kahneman (1971), noting that the ability to divide and switch attention appropriately among concurrent signals, and to avoid interference from distracting sources of information, is a main feature of flying high-performance military aircraft, reported that pilots of such aircraft performed better on an auditory SAT than did pilots of transport planes and slower jet aircraft. They concluded that the predictive variance that the SAT contributes is essentially independent of other cognitive and psychomotor tests that are currently in use for the prediction of pilot aptitude. In a large-scale validation study, the SAT was subsequently administered to more than 2000 flight candidates. It appeared that success and failure rates were correlated with the attention measures, and especially switching errors correlated highly with the criterion (Gopher, 1982). Gopher concluded that individual differences in basic attention capabilities do exist and are correlated with a relevant external criterion. The systematic investigation of these individual differences is not only of practical merit, but may also lead to a better understanding of the structure of attention. In another study (Kahneman, Ben-Ishai and Lotan, 1973) the SAT was validated against a criterion of accident frequency in bus drivers, and here also the scores obtained on the SAT were clearly related to the accident criterion. 
The authors suggested that the SAT could be an aid to reject applicants who are most likely to
be accident prone. Avolio, Kroeck and Panek (1985) have also found a correlation between two measures of selective attention and accident involvement with motor vehicles. In another study, Avolio et al. (1981) developed a visual SAT and attempted to obtain evidence for the existence of a central processing mechanism by correlating the visual and auditory SATs. There was a significant but moderate (0.42) correlation. However, both tests also correlated with a third test, the embedded figures test (EFT), a measure once regarded as reflecting field dependence or cognitive-perceptual style, but now increasingly recognized as a measure of spatial ability, although the EFT also seems to correlate with nonverbal intelligence (general ability) and fluid ability (facility in reasoning, independent of previous knowledge). Sack and Rice (1974), who obtained evidence for three factors in attention (selectivity, resistance to distraction and shifting), reported that the EFT was especially associated with the selectivity factor, but Barroso (1983) also found a relationship with resistance to distractions. It has been suggested that the EFT is a good measure of individual differences in visual information processing, and there have been a few attempts to use this measure as a predictor of vigilance performance. At least four studies reported that field-independent subjects were superior in vigilance and inspection tasks (Cahoon, 1970; Gallwey, 1982; Moore and Gross, 1973; Moses, 1970), although Moses reported that they were superior for a complex task only, not for a simple vigilance task. At least one study (Sanders, 1973) found no relationship (this study did find a relationship between need for achievement and vigilance). Braune and Wickens (1986) reported that field (in)dependence is related to strategy differences (serial versus parallel processing) in dual-task performance. 
Forbes and Barrett (1978) used several other predictors besides the EFT: the RFT (rod and frame test, the well-known Witkin tilting chair in a dark room, another measure of field dependence but not an equivalent form because correlations between EFT and RFT are low), an auditory SAT, and measures of time-sharing, memory and intelligence. On a simple visual vigilance task, the EFT was the only variable predicting performance, albeit for RT only, not for hits. On a more complex task, all predictors were related to hits, but RT was related only to the RFT. In a later study (Barrett et al., 1980) the EFT appeared to predict performance on a simple task better than the RFT, but the reverse was true for the more complex task. These results are not unambiguous and in later studies by the same group (Barrett et al., 1983; Cellar et al., 1982) individual differences in measures of short-term memory (STM) search, visual search, array memory (memory for location) and sequential memory also appeared to be related to monitoring performance (sometimes differentially for hits and correct rejections!), but the finding of Forbes and Barrett (1978) that performance on a selective attention task was related to vigilance performance is an interesting finding. Lansman, Poltrock and Hunt (1983) have also provided evidence suggesting that, within a modality, performance in selective and divided attention tasks is related to vigilance performance. The significance of all these findings is that large individual differences in basic attention capabilities do exist and sometimes are correlated with a relevant external criterion, such as shown by the data on pilot and bus driver performance. For some other 'predictors', to be discussed now, more studies are available than for field dependence or selective attention. Using the former ones, I will attempt to assess whether the study of individual differences may contribute to an understanding of
vigilance performance, as claimed by some authors. Two questions should be explored, one of practical and one of theoretical interest. Does membership of a particular group (sex, age, personality, intelligence, etc.) account for a significant proportion of the variation in performance, thus enabling the development of a selection device that would discriminate between 'good' and 'bad' monitors? Further, following earlier suggestions to explain the individual differences first, can individual differences, anchored in constructs such as attention, arousal, etc., throw more light on the mechanism(s) of vigilance? Two potentially promising predictors are: (1) electrodermal measures, and (2) the personality dimension of extroversion-introversion. Two recent reviews are devoted to a discussion and evaluation of the findings of studies using (autonomous) physiological measures and personality questionnaires. Koelega (1990) reviewed studies using measures of electrodermal rate or speed of habituation of the orienting reaction, and spontaneous fluctuations. It appeared that slow habituators display a high overall level of performance; a difference in cautiousness of responding was not involved in this superiority. There was less evidence that fast habituators exhibit greater performance decrements with time. Electrodermal lability, in the sense of spontaneous activity, appeared to be useless as a predictor of overall vigilant efficiency. Basically, two interpretations of the relationship between speed of habituation and vigilance performance are available: an attentional model and an arousal model. Both positions claim support in the differential relationships between speed of habituation and signal detection measures, but the validity of the latter measures obtained in vigilance experiments may be questioned. Speed of habituation has also been linked to intelligence. 
A meta-analysis (Koelega, 1992) of studies investigating the personality dimension extroversion-introversion, using 53 sources, revealed a highly significant difference in detection performance in favor of introverts, but there were many inconsistencies. In a subset of the studies (extreme extroverts and introverts carrying out tasks with visual stimuli), introverts turned out to be superior in overall level of performance, but not in maintaining efficiency over time. The meta-analytic technique used was biased toward minimizing these relationships. Differences in false alarms and response speed were not found. The validity of some 'classic' findings in the vigilance literature (the effects of time of day and of caffeine) was questioned. It was suggested that some inconsistent findings with respect to performance decrements may have been caused by inappropriate use of univariate analysis of variance in repeated-measures designs. The two predictors considered above (speed of habituation and extroversion-introversion) appear to possess some predictive capacity (especially for overall level of performance) that is, to an extent, independent of task conditions, i.e. for simple monitoring tasks. Unfortunately, most 'real-life' (operational) monitoring tasks involve more than simply detecting and responding to infrequent critical events. Besides sustained attention, selective and divided attention may play a role in complex multidimensional discriminations. There are suggestions that on selective attention tasks and in time-sharing activities (dual tasks), fast habituators and extroverts may outperform slow habituators and introverts, whereas the latter two groups are superior in vigilance tasks (Davies et al., 1984; O'Gorman and Lloyd, 1988). This would, at the present stage, make attempts to characterize the 'ideal' monitor unrealistic. 
However, there is contradictory evidence suggesting a positive relationship between performance on vigilance, selective attention and divided
attention tasks (Forbes and Barrett, 1978; Lansman et al., 1983). Although the practical need for valid predictors remains, numerous investigators have failed to produce a global selection test with predictive capacity for operational task performance, and one would be hard put to justify any type of predictor. Before we despair of further research, however, more experiments should be carried out in which the same subjects perform selective, divided and sustained attention tasks. I have also argued for more research on preference for complexity of stimuli and of task performance (Koelega, 1992). Subjects preferring simple stimuli performed well on a simple task, but subjects preferring complex stimuli performed very poorly on simple, monotonous tasks. Strelau (1983) has also observed that people with a certain type of temperament (reactivity) and a need for simple stimuli (operators at an electric plant where a fake emergency situation was set up, as well as pilots in a stress situation during a simulated flight) were less efficient in a difficult, highly stimulating situation than were people with a need for complex stimuli. Preference for complexity may be related to the capacity to endure boring situations, to maintain alertness in the absence of appreciable external stimulation, and the tendency to persist when only minimal reinforcement is provided. The other question formulated in the introduction pertains to the theoretical significance of individual differences in vigilance performance. A theoretical process explaining individual differences may also provide a viable explanation of the decrement phenomenon and may contribute to achieving a complete understanding of sustained attention. Most research on individual differences in vigilance has centered about the personality dimension of extroversion-introversion. H. J. Eysenck (1967) postulated that this dimension measures differences in cortical arousal. 
His initial interest was not in prediction of performance but in validating an arousal theory of personality. Eysenck relied on vigilance data to support his theory: arousal would facilitate the maintenance of attention over time, and hence introverts do better than extroverts since introverts are hypothesized to be more cortically aroused than are extroverts during the boring, monotonous vigil. Factors changing level of arousal (noise, drugs, etc.) could reverse performance trends, however. More recent extensions of the theory have questioned the arousal interpretation and have suggested an explanation in terms of attentional capacity (e.g. M. W. Eysenck, 1988). Interpretations of individual differences in the other predictor considered in this section (speed of habituation) appear to have taken the same form. In fact, only two models have been proposed: a (physiological) arousal model, stating that slow habituators have a higher tonic level of arousal, and a (cognitive) attentional model, claiming that the individual differences reflect attentional or information processing capacities. So, we have two models, both claiming empirical support, explaining individual differences in temperament and in rate of electrodermal habituation. Since these differences appear to be related to differential vigilance performance, these models are obvious candidates for an explanation of sustained attention, both of overall level of performance and of changes in performance over time. The remainder of this chapter will be devoted to an evaluation of the two explanatory models. It is generally agreed that a sophisticated way to study attentional mechanisms is by means of event-related brain potentials. These time-locked indices of cortical activity are assumed to be a manifestation of attention rather than of arousal. In the following section their relationship to vigilance performance will be explored. Next,
the arousal model will be considered. There are several ways to manipulate level of arousal, but only two of these have provided a sufficient number of empirical findings: arousal induced by external stressors and environmental stimulation (noise), and arousal manipulated by depressant and stimulant drugs. The relevant studies are evaluated in sections 5 and 6, respectively.
ATTENTION AND INFORMATION PROCESSING: AN APPROACH TO EXPLAINING VIGILANCE PERFORMANCE USING EVENT-RELATED BRAIN POTENTIALS
4.1 The Construct of Attention
In the preceding section I have noted that work on vigilance has mainly relied on the use of either arousal or attention as the major theoretical constructs explaining performance. It may be questioned whether these two explanatory models are mutually exclusive. I shall take up this question in section 7. The present section addresses vigilance performance within an information processing framework, with a focus on the construct of attention, because 'sustained attention' is used as a synonym of vigilance. An insistent problem has always been the definition of attention. 'Attention' is a vague, catchall phrase, permitting a great variety of meanings to be associated with it: sustained, selective, divided and focused attention, inattention, lapses of attention, effort, consciousness and even arousal. Attention is an essential concept, representing a bridge between various schools and disciplines, but because of its vagueness it is evident that not everyone is talking about the same thing. Some theorists view attention as unitary: a single, central capacity or energy, supporting cognitive processing. Others view attention as only one of several resources. Several characteristics of attention may be conveniently described by one of two metaphors: attention as a searchlight, and attention as a resource that can be distributed to processing. Both metaphors have their limitations, however. Cognitive psychology is the study of human information processing, the determination of the internal operations, the mental processes underlying performance on a particular task. Attention should be viewed within an information processing framework, and is often defined in terms of consciousness or awareness. Processing encompasses a broader domain; virtually all models of information processing propose some preattentive processing to take place (operating without awareness and not requiring capacity) called 'iconic' in the visual system. 
An influential theory of attention (Posner, 1978) views attention essentially as consciousness: attention is a field of study of internal mechanisms relating to our awareness of events. Not everyone agrees with this definition, however. The enormous outpouring of books, chapters and articles devoted to the subject of attention precludes a discussion of the many contentious issues surrounding the use of this hypothetical construct within the context of this section. A cursory review of theoretical disagreements is provided by Lane (1982) and by Eysenck (1982), who considered 10 theories but added (p. 43) that this number could have been doubled without any difficulty.
There have been several attempts, by using different tasks, to assess which factors are involved in attention. I shall not discuss them all; the results depend somewhat on the particular tasks used, which may also tap other capabilities besides attention. Sack and Rice (1974) identified three attentional factors: (1) selectivity, associated with the EFT (see section 3), (2) resistance to distraction, and (3) shifting. Barroso (1983), noting that auditory tasks have played a key role in theories of attention, included both auditory (three different dichotic listening tasks and three different shadowing tasks) and visual (the Stroop color-word test, the EFT and anagram solving) tasks. His conclusion was that attention is a multicomponent system. The different auditory tasks did not engage related processes, and visual tasks shared attentional processes with auditory tasks. Four factors, common sources of variance, were uncovered: (1) resistance to (both external and inherent) distraction; (2) a specific auditory component involved in tracking semantic features; (3) an 'executive' function, controlling other aspects of attention, especially spatial locations; and (4) the breadth of attention. Finally, an analysis of 10 test scores (derived from neuropsychological tests used at the National Institute of Mental Health, Bethesda), commonly considered to be measures of attention, resulted in four factors: (1) focusing, intertwined with execution (speed), (2) sustaining or vigilance, (3) encoding numeric-mnemonic aspects, and (4) shifting or flexibility. These components of attention were assigned to differing brain regions (Mirsky, 1987). A major development in thinking about attention has been the notion of a limited-capacity system. There is an upper limit (both peripheral and central) placed on the capacity to process information; were it not for this limitation, the term 'attention' would not be necessary. 
Limited capacity, for Posner (1978) 'the key to understanding the nature of attention', cannot be observed directly, but must be inferred from performance, especially from the interference between various tasks (divided attention, dual-task performance). This technique of investigating attention was proposed some 100 years ago by Loeb, and tested empirically by his student Welch (1898) in several experiments with the aim of determining a construct of attention. Theories of attention, put forward in the 1950s and 1960s, had in common the notion of limited capacity or a 'bottleneck' at some point in the course of information processing. The theories differed primarily with respect to the putative location of the bottleneck: in an early stage of processing (in the sensory buffer) or late (response selection). The debate of early versus late theories of attention has now been rendered out of date. Wickens (1984, pp. 282-284) summarized the pros and cons and concluded that attention helps processing at all stages beyond short-term sensory store. Eysenck (1982, p. 13) considers the greatest inadequacy of the early and late theories that they all underestimated the flexibility of attentional mechanisms and processes, which can be used at several different stages of information processing. During the 1970s and 1980s there have been at least three significant shifts in thinking about attention. (1) Several theories of attention have emphasized the conceptual distinction between attention-demanding and automatic processes that do not require attention. There are many controversies and implications surrounding the nature of automaticity. According to the older view (whether single-capacity or multipleresource theory), automatic processing is a way to overcome resource limitations:
it takes place without attention, is fast, effortless, because no capacity is required, and is also unconscious. Attacks have been made on the idea of capacity or resources and the economic metaphor on which it is based. A recent view is that automaticity is a memory phenomenon: performance is considered automatic when it depends on single-step, direct-access retrieval from memory (Logan, 1988). Wickens (1984) has warned that attention cannot be represented as a dichotomous variable. The distinction between attention-demanding and automatic processing could suggest that a source of information is either fully attended or ignored. But this dichotomization fails to account for the continuous gradations in the degree to which stimuli may be processed; some stimuli may be partially processed, receiving some but not full processing. From dichotic listening experiments it is known that certain (meaningful) aspects of the nonattended channel appear to be processed, so the brain must be determining the meaning of the nonattended channel all along. (2) The original notion of attention as a bottleneck at a certain stage of processing has been replaced by a conceptualization in which attention is regarded as a limited power supply that can be flexibly allocated in many different ways in response to task demands. The conceptualization of older theories, postulating a bottleneck occurring at the same point in the stream of information processing in different tasks, was too rigid. Contemporary theories agree that the limited-capacity central processor (LCCP, a homunculus-like entity) is used in a flexible way and can be used to facilitate processing at virtually any stage of processing. The LCCP has a relatively limited capacity, as buttressed by interference effects obtained in the dual-task paradigm. 
A parallel development in thinking has been the notion that data-driven or 'bottom-up' models, in which processing is a passive process, should be supplemented by resource-driven or 'top-down' models where there is room for active control of the information flow, active selection of what data to process and whether or how to respond. It should be noted, however, that it is difficult to disentangle completely the passive and active nature of processing within any single task: performance is always an interaction (Posner, 1978). (3) A third shift in thinking is that, in addition to some nonspecific resource of limited capacity, there are multiple resources, a number of specific resources or mechanisms. Theories proposing global capacity, a single, undifferentiated pool of resources (Kahneman, 1973), cannot account for the absence of a tradeoff on dual-task performance. Apparently, not all tasks compete for the same undifferentiated pool of resources. Performance on any task requires different mental operations drawing on separate resources (the concept of resource describes the entity enabling task performance, but it is a hypothetical construct, which can be inferred but not observed). Wickens' (1984) theory of multiple resources states that separate resources may be defined in terms of three dichotomous dimensions: (1) stages of information processing (encoding, perception, memory, responding, etc.); (2) modalities of input; the auditory modality is superior to vision as a means of alerting, but there is a visual dominance over awareness, compensating for this handicap (Posner and Rothbart, 1980): when both auditory and visual stimuli are presented simultaneously, subjects are often unaware of the tone; and (3) codes of information processing: spatial versus verbal information; this dichotomy appears to correlate with the function of the two cerebral hemispheres. It should be noted that theories
proposing multiple, separate, structure-specific capacity reservoirs have sometimes been faulted for their unparsimonious nature. If a current version of the theory cannot account for the data at hand, it is always possible to define another resource, and the theory cannot be disconfirmed. Several theorists have expressed the view that the study of attention is productive only when performance data are intimately connected with physiological data, neural processes involved in attention. During the last two decades there have been several attempts to capture attention in terms of physiological mechanisms. These approaches are interesting because they refer to the data of vigilance experiments. Pribram and McGuinness (1975) identified, on the basis of a survey of approximately 200 experiments, three basic attentional control processes, separate but interacting neural systems: (1) one is phasic and controls arousal resulting from input, the registration of input in awareness ('What is it?'); this system is centering on the amygdala; (2) a second system is tonic and controls activation, a vigilant physiological readiness to respond ('What's to be done?'), centering on the forebrain basal ganglia; (3) a third system coordinates arousal and activation, which demands effort, centering on the hippocampus. According to Pribram and McGuinness, especially the activation system is involved in vigilance performance. Drawing from this formulation, Tucker and Williamson (1984) differentiated between the neurotransmitter substrates of the arousal and activation systems (they did not include the 'effort' construct because in their view it overlaps with activation features). The neurotransmitter substrates of arousal, the system producing a phasic response to input, are norepinephrine (NE) and serotonin (5-hydroxytryptamine; 5-HT) pathways, functioning in a reciprocally balancing relation. NE, associated with activity of the locus coeruleus, varies with the level of alertness, e.g. 
declines with stimulus repetition, and is closely linked to external input. The right hemisphere is specialized for arousal, has a primary role in perception, and relies more on NE and 5-HT than does the left hemisphere. In the activation system, maintaining a tonic readiness for action, dopamine (DA) pathways combine with cholinergic influences. The left hemisphere is specialized for activation, has a primary role in motor control (in most people the right hand is superior in manual control) and, according to the authors, the relevant neurotransmitter pathways are asymmetric in their distribution and function: DA is left-lateralized and DA terminals show a higher concentration on the left. A currently influential theory is that developed by Posner and his coworkers. In the early formulation (Posner and Boies, 1971) three components of attention were distinguished: (1) selectivity, the ability to select relevant information; (2) alertness, a general component, a widespread increase in cortical activation in order to maintain attention; this component in particular would be involved in vigilance performance; and (3) limited central processing capacity. Comparison of this model with that of Pribram and McGuinness (1975) is hampered by the loose use of the constructs 'arousal', 'activation' and 'alertness'. For Posner and Boies, 'alertness' is identical to 'arousal' (see Posner and Boies, 1971, footnote 3, p. 391), but their description of 'alertness' bears more similarity to the 'activation' system of Pribram and McGuinness than to these authors' notion of 'arousal'. It is beyond the scope of this chapter to sketch the development in Posner's thinking about attention: see Posner (1975, 1978) and Posner and Rothbart (1980).
Posner et al. (1987) presented a 'cognitive-anatomic analysis'. The authors investigated whether visuospatial attention shared cognitive operations with other senses of attention (a language task). They reported evidence for the existence of two distinct neural systems: visuospatial attention would be a specific, independent module, involving the parietal lobe, but there is also a more general command system, common to all attention tasks, related to the frontal lobe. A further point made by the authors is that the right hemisphere would be more closely involved in maintaining alertness, necessary for successful completion of vigilance tasks. In a more recent paper (Posner and Petersen, 1990), a functional anatomy of the human attentional system is outlined, based on three fundamental findings: (1) the attention system of the brain is anatomically separate from the systems that process specific data, although there are interactions; (2) attention is carried out by a network of anatomic areas rather than by a single center; and (3) these areas carry out different functions. Three major functions have been prominent, subsystems that perform different, albeit interrelated, functions: (1) Orienting to events, relying on the parietal lobe. (2) Detecting signals for conscious processing, involving the anterior cingulate. (3) Maintaining a vigilant or alert state, depending heavily upon the right hemisphere, especially the prefrontal cortex, and on the norepinephrine system arising in the locus coeruleus. (Note that Tucker and Williamson (1984) also viewed the latter system as crucial for level of alertness, but note also that Pribram and McGuinness considered the activation system, centering in the left hemisphere and drawing on dopamine pathways, as the most important in sustaining attention.) So, we now have a concept of an attentional system of the brain with specific operations allocated to distinct anatomic areas. 
Attention may be thought of as the operation of a separate set of neural areas, interacting with domain-specific systems (e.g. visual word form, semantic association, etc.).
4.2 Event-Related Brain Potentials
Many investigators of attention, especially those working in cognitive psychophysiology, have expressed the view that attentional mechanisms may be uniquely captured by the use of event-related brain potentials (ERPs). This time-locked manifestation of cortical activity is in fact a plot of amplitude of a wave over time, showing both negative and positive shifts in electrical potential, the waveforms or 'components' (according to some authors, the term 'component' may be used only if there is evidence for a separate generator). Early negative shifts, occurring after approximately 100 and 200 ms (N1 and N2, respectively), are relatively small and less variable than later components such as the positive deflection manifest after about 300 ms (P3), which are usually larger and more sensitive to experimental manipulations. Posner (1978) has suggested that the N1 and P3 reflect two different processing systems: orienting and detection, respectively. In his view the P3 is closely related to conscious processing, to conscious detection of a signal (which may also be a missing stimulus). The LCCP would be indexed by the P3. Most investigators agree that ERPs are at present a sophisticated means of studying attention, and many researchers have stated that ERPs may reveal the mechanism(s) of the vigilance decrement. Koelega and Verbaten (1991) have looked at the evidence. These authors noted that investigators have been interested in the vigilance paradigm from the earliest days of ERP research (1964), the general idea being that performance failures are failures of attention and ERPs can be used to study attention. But, although a plethora of ERP investigators have studied vigilance or oddball tasks, relatively few have reported on the hallmark of the sustained attention problem, i.e. changes with time on task. Inspection of 15 studies, summarized by Koelega and Verbaten, showed an erratic pattern in the time-course of behavioral and ERP data. When there was a decline in performance, there was often no change in ERP parameters and vice versa. Examples were given for the N1, P2, N2 and P3, together with recent notions about their possible functional significance in signal discrimination situations. The review showed that ERP studies of vigilance do not present a clear and consistent picture, at least with respect to changes with time. Several reasons were advanced for this state of affairs, namely the simultaneous occurrence of several, separate processes in ERP trends over time, shortcomings in technique and analysis, and the lack of a fine 'texture' in behavioral measures (the psychological constructs 'sensitivity' and 'criterion level' often correlate with the same ERP measures); in contrast to an overt response, ERPs may not reflect the final product but only aspects of processing. The conclusion was that experiments overcoming several shortcomings of earlier studies might indicate relationships between behavioral and electrophysiological trends. Koelega et al. (1992) claim to have carried out such an experiment, a visual vigilance study, in which single-trial ERPs and performance data of 40 males were obtained. The authors noted several significant correlations between overall mean scores, e.g. 
between P3 and hits, but the only significant relationship between time trends was that between P3 amplitude and RT, an inversely varying relation over time. Analysis of covariance showed that this was a 'true' relationship. The ERP results did not support the hypothesis that a decline in detection scores is caused by time-induced increasing difficulty in discriminating signals from nonsignals. A gradual decline in effort or resources allocated to the task might be an alternative explanation, but an effort hypothesis cannot easily be tested, because there is uncertainty over ERP indices of effort. The uncertain and disputed character of the SDT components of sustained attention (d' and beta) may also impede attempts to assess their relationship to particular ERP deflections. What does 'sensitivity' signify in vigilance experiments when d' or A' declines with time but the discrimination of targets from non-targets, as reflected in ERP amplitudes, remains unchanged? So, there is evidence for a relationship between trends in P3 amplitude and trends in motor processes (response latency). Further, there was a significant correlation between mean P3 amplitude and mean hit score, suggesting that the activity of which P3 is a manifestation may be related to overall level of performance rather than to changes with time on task. However, there was no evidence for a relation between ERP components and measures of detectability (hits, A') over time, the main issue of interest. In the 1991 review by Koelega and Verbaten, several reasons were advanced to account for this absence, but some remarks may be added. Several ERP investigators have indicated that statements such as 'ERPs reflect attention' are meaningless,
since all words are vaguely defined (Donchin and Isreal, 1980). According to these authors, the term 'attention' invokes a miasma of imponderables, and the meaning of the verb 'reflect' is obscure. Donchin (1979) has warned that the view of psychophysiology as the search for the 'physiological correlates of psychological processes' may lead to a fallacy. Attempts to validate the proposition that, for example, P3 is a correlate of attention, require a demonstration of a correlation between P3 and hits or RT, but there does not need to be a significant correlation coefficient. The behavioral measure (e.g. RT) is only an index of attention: RT is not attention. An ERP component is not a 'correlate' of some behavioral variable, but is a manifestation, at the scalp, of neural activity which plays a certain role in the informational transactions of the brain. Correlational statements about the 'relationship' between a psychological variable and an ERP component therefore are of little inherent value. The mere statement that 'P3 reflects, or manifests, or is associated with conscious attention' (Posner, 1978) or that 'the N1 indexes the allocation of processing resources' (Hillyard, 1981) leaves a psychologist who is interested in attention quite cool, because the statement provides no useful information: nothing in any psychological theory predicts either a reduction or an enhancement of N1 and P3 as a consequence of attention (Donchin and Isreal, 1980). Further, a major reason for the lack of clear-cut relationships probably also is that the field of ERP research is in flux: there is wide disagreement among psychophysiologists with respect to validity and interpretation of ERP components. The notion developed by Posner (1978) that N1 reflects automatic processing, and P3 indexes conscious attention or limited capacity, is an over-simplification. This view of the ERP as a relatively simple waveform with main deflections at N1, N2 and P3 has been challenged. 
It has been suggested (Näätänen and Picton, 1987) that at least six different cerebral processes can contribute to the auditory N1, that there are several different kinds of N2 (Näätänen, 1986), and that there are certainly multiple P3 waves, possibly with different intracranial generators. Friedman et al. (1981) identified, in two vigilance tasks, four distinct P3s, differing in timing, topography and their relationship to the discriminative response. Picton (1992), noting that the more we know about the P3 the less we understand, has suggested that the P3 wave is generated at multiple locations within the brain, and since these locations may vary from task to task and from individual to individual, it will be difficult to demonstrate the sources. Moreover, to look only at the P3 and not at its associations with the N1, N2 and P2 is to put on blinkers. Given the uncertain character of components and the sometimes contrasting opinions about their functional significance, we may hardly expect to find straightforward explanations of performance declines in terms of ERPs. An ERP approach has at the present stage turned out to be inadequate although, admittedly, ERPs are as yet one of the most sophisticated means available to cognitive psychophysiology of investigating attention and information processing. Hillyard and Picton (1979) have asked whether attempts to identify electrocortical correlates of attention are in fact a rather empty exercise: 'what can possibly be gained by linking up ill-defined, hypothetical events on the psychological level with diffuse electrophysiological phenomena having unknown brain origins?'. Their reply, with which I concur, is that ERPs are virtually the only means to evaluate the physiological events of the normal human brain as it performs its spectacular feats of information processing.
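For readers less familiar with the SDT measures referred to in this section (d', beta and A'), the following minimal sketch shows how they are conventionally computed from a hit rate and a false-alarm rate. The formulas are the textbook ones (equal-variance Gaussian d', likelihood-ratio beta, and the nonparametric A' in Grier's computational form); they are not given in this chapter and are assumed here. The function name `sdt_indices` and the early/late rates are hypothetical and purely illustrative.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def sdt_indices(hit_rate, fa_rate):
    """Return (d_prime, beta, a_prime) for one block of trials.

    Assumption: both rates lie strictly between 0 and 1. Corrections for
    extreme rates (0 or 1) are deliberately left out of this sketch.
    """
    if not (0.0 < hit_rate < 1.0 and 0.0 < fa_rate < 1.0):
        raise ValueError("rates must lie strictly between 0 and 1")

    # z-transform of the two rates (inverse of the normal CDF)
    z_hit = _STD_NORMAL.inv_cdf(hit_rate)
    z_fa = _STD_NORMAL.inv_cdf(fa_rate)

    # Sensitivity: distance between signal and noise distributions
    d_prime = z_hit - z_fa
    # Criterion: likelihood ratio at the observer's decision cutoff
    beta = math.exp((z_fa ** 2 - z_hit ** 2) / 2.0)

    # Nonparametric sensitivity; two branches keep the index symmetric
    h, f = hit_rate, fa_rate
    if h >= f:
        a_prime = 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    else:
        a_prime = 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))
    return d_prime, beta, a_prime


# Hypothetical rates for an early and a late block of a vigil.
for label, hits, false_alarms in [("early", 0.90, 0.10), ("late", 0.70, 0.05)]:
    d, b, a = sdt_indices(hits, false_alarms)
    print(f"{label}: d'={d:.2f}  beta={b:.2f}  A'={a:.3f}")
```

Computing the indices per time block in this way is what makes it possible to ask whether a decline in hits reflects a change in sensitivity, a change in criterion, or both, which is the distinction at issue when behavioral trends are compared with ERP trends.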
Let us now turn to the other construct often employed in explaining vigilance performance: arousal.
AROUSAL-AN INDIRECT APPROACH TO EXPLAINING VIGILANCE PERFORMANCE: THE EFFECTS OF NOISE
5.1 The Construct of Arousal
The concept of arousal has usually been thought of in both physiological and behavioral terms and has been applied to changes accompanying the transition from sleep to excitement. The concept has its origins in the 1930s when several investigators proposed that there was a general drive state, some internal energy, potentiating and driving all behavior. Duffy (cited in Eysenck, 1982) defined arousal or activation in terms of metabolic activity in the tissues. The level of arousal or energy mobilization '... may be defined as the extent of the release of potential energy, stored in the tissues of the organism, as this is shown in activity or response'. It was claimed that there is a continuum of arousal ranging from coma or deep sleep, through drowsiness and normal wakefulness, to panic-stricken terror or great excitement. Behavior can be regarded as varying along this continuum of intensity and these behavioral variations were hypothesized to parallel, and to be related to, physiological variations. Level of arousal as manifest in, for example, the desynchronization of the electroencephalogram (EEG) would be related to the degree of activation in, for example, behavioral performance. Research on the effects of various arousing agents (e.g. loud noise, heat) on task performance lent credibility to the same concept ('arousal') as used by research on neural systems (the ascending reticular activating system and the diffuse thalamic projection system) involved in the maintenance of wakefulness. Broadbent (1971) was one of the first investigators noting that there are intriguing similarities in the effects on performance of such apparently quite disparate factors as noise, time of day, introversion-extroversion, incentive, etc. He argued that all such factors increased arousal, whereas arousal was decreased by sleep deprivation, boredom, fatigue, etc., and he supported his argument with physiological and behavioral evidence. 
According to Eysenck (1982), this attempt to explain a great variety of findings by means of a fairly simple arousal model was surprisingly successful. Broadbent (1971, p. 413) acknowledged that the physiological concept of arousal is of ultimate relevance, but he emphasized especially a psychological concept of arousal: the common effects on behavior of factors such as drugs, noise, sleep loss, social interaction, anxiety, etc. His 'arousal theory of stress' made the assumption that the relationship between the level of arousal and the level of performance takes the form of an inverted U. This generalization, which had already been offered by Yerkes and Dodson more than 80 years ago (1908), states that there is a curvilinear relationship between arousal and performance: performance is best at a moderate level of arousal and suffers if arousal is either very low or very high. A further assumption of Yerkes and Dodson was that the optimal level of arousal was inversely related to task difficulty. The Yerkes-Dodson law, or inverted-U relationship, has become the cornerstone of many theories of performance.
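The two assumptions attributed to Yerkes and Dodson above are qualitative: performance peaks at a moderate level of arousal, and the optimal level falls as task difficulty rises. The sketch below is one way to make them concrete for illustration only. The Gaussian shape, the linear mapping from difficulty to optimum, and all constants and function names are assumptions introduced here, not part of the law or of any theory cited in this chapter.

```python
import math


def optimal_arousal(difficulty):
    """Hypothetical optimum on a 0..1 arousal scale.

    Assumption: the optimum falls linearly from 0.8 (easiest task,
    difficulty 0) to 0.3 (hardest task, difficulty 1).
    """
    return 0.8 - 0.5 * difficulty


def performance(arousal, difficulty, width=0.25):
    """Illustrative inverted-U: 1.0 at the optimum, lower on either side.

    Assumption: a Gaussian curve with an arbitrary width parameter.
    """
    opt = optimal_arousal(difficulty)
    return math.exp(-((arousal - opt) ** 2) / (2 * width ** 2))


# Compare an easy and a hard task across low, moderate and high arousal.
for difficulty in (0.1, 0.9):
    row = [round(performance(a, difficulty), 2) for a in (0.2, 0.5, 0.8)]
    print(f"difficulty={difficulty}: performance at low/mid/high arousal = {row}")
```

Under these assumptions a given arousing agent can help on an easy task and hurt on a difficult one, which is the kind of interaction that arousal explanations of stressor effects rely on.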
Eysenck (1982) discussed seven theories of arousal and performance (respectively, by Yerkes-Dodson, Easterbrook, Broadbent, Näätänen, Kahneman, Thayer, Hasher and Zacks) and concluded that almost all theories incorporate a unitary concept of arousal. All effects of arousing and de-arousing agents on behavior would be mediated by a single, global, nonspecific arousal system. This was justified by two discoveries: (1) different arousal manipulations often produced similar behavior patterns, which warrants adherence to a single arousal model; and (2) arousers frequently interact with one another in theoretically predictable ways; it is difficult to explain the presence of interaction, rather than additivity, if each arouser were affecting a separate system or mechanism. Revelle et al. (1980, p. 5) have stated that the assumption that all effects on performance can be subsumed under a common construct 'arousal' is theoretically fruitful and, in their theory of performance, Humphreys and Revelle (1984, p. 158) remarked that a single state of general arousal is theoretically parsimonious as a higher-order construct, which allows for a broad synthesis of research findings. Gale and Edwards (1986, p. 491), noting that the concept of arousal has emerged as a hypothetical construct or intervening variable intimately connected with the notion of energy, have even suggested that arousal, in this guise, can be seen to be one of the best candidates for the unification of psychology, as a major integrating concept for psychological theory. Arousal theory, as advanced to explain the topic of this chapter, vigilance performance, maintains that a progressive reduction in the level of arousal of the central nervous system (CNS) takes place during task performance, brought about by the monotonous nature of the vigilance situation, and, as a result, the brain becomes less responsive and less efficient at dealing with external stimulation (Davies and Parasuraman, 1982). 
Warm (1977, p. 641) cites several studies, e.g. by O'Hanlon and his coworkers, as impressive support for arousal as a factor in the deterioration of vigilance performance. Loeb and Alluisi (1984, p. 187) report that '... there appears to be considerable evidence that arousal is associated with both the level of vigilance and the vigilance decrement'. Davies and Parasuraman (1982, p. 18) state that '... there is little doubt that the level of arousal does tend to fall during the performance of a vigilance task, and that the vigilance decrement is therefore associated with a decline in arousal' and (p. 19) '... it is likely that high arousal levels increase, while low arousal levels decrease, the overall level of performance'. However, the authors have also stated that changes in performance are not always related to changes in physiological activity. A relatively straightforward method of studying the effects of arousal on performance is to manipulate the level of arousal by means of exogenous (external) stimulation, such as noise and heat, or by means of internal stimulation, such as drugs. A difference between the two types of stimulation is that with drugs the subject will sometimes be unaware that he or she has been aroused or de-aroused, whereas with noise the subject is always aware. Noise has been the most widely used treatment for increasing arousal in performance studies, and has been the most extensively studied (Hockey, 1979). Eysenck (1982), discussing the performance literature, has also stated that, in practice, noise has been far and away the most used exogenous determinant of arousal and that we have a clearer picture of the effects of noise on performance than of any other arouser. In the case of noise as an arouser we have to draw on indirect evidence; arousal must be inferred from experimenter definitions of stimulus conditions. Sanders (1986) terms noise a 'real'
environmental factor. In the case of internal stimulation by means of drugs, we can draw on direct evidence because the nervous system (and thus arousal) is directly manipulated; Sanders calls this a 'real' bodily factor. The next section addresses the latter, more direct, approach. The present section inspects the evidence for the indirect manipulation of arousal by means of noise.
5.2 Noise and Vigilance
The effects of noise on sustained attention have been reviewed by Koelega and Brinkman (1986). The authors noted that the literature on the effects of auditory noise on visual performance presents an outstanding example of confusion. Early investigators such as Cassel and Dallenbach (1918) recognized the inconsistent results of previous studies and of their own study. They found all kinds of results: facilitative, detrimental and no effect at all. Not only older reviews (Berrien, 1946; Broadbent, 1957; Plutchik, 1959), but also more recent ones (Broadbent, 1979; Coates, Adkins and Alluisi, 1975; Hancock, 1984; Kryter, 1970; Loeb, 1980), show a discouraging lack of agreement. Contradictions abound, and there is no basis for drawing conclusions. It is practically impossible to predict the effect of noise in a given situation. Even within a single study, performance may at any given time be seen to be decreased, increased or unaffected depending on a number of variables (Lysaght et al., 1984; Teichner, Arees and Reilly, 1963), sometimes depending on very specific, seemingly trivial, features of experimental conditions (Jones, Smith and Broadbent, 1979). The hypothesis of the review by Koelega and Brinkman was that task classification in terms of demands made on the observer should reconcile conflicting findings so that generalizations could be made. Therefore, a study was made of the effects of intermittent or variable noise on vigilance experiments with similar task demands. Twenty-one sensory vigilance studies, selected from 98 visual performance experiments, were analyzed in detail. It appeared that, even when studies possess similar task characteristics, they are hard to compare owing to the many types and varieties of the noise variables involved and the measures of performance used. Contradictory results remain. 
It was concluded that we know nothing about the effects of variable noise on sustained attention, despite the importance of this kind of noise for everyday life. An experiment by Koelega, Brinkman and Bergman (1986) showed that the coarse analyses usually employed in noise studies may not always have accurately reflected performance changes: the conventional way of analyzing the data showed no effect of noise, but a fine-grained analysis demonstrated a facilitatory effect on level of performance. The authors suggested that the usual way of analyzing may veil the effects of independent variables such as noise. It is a somewhat discouraging suggestion that many studies, reporting no effect of noise, or having remained unpublished as a consequence, might have revealed effects on an in-depth analysis. Likewise, the review of the literature cannot have left many investigators in high spirits: it was stated that it may be impossible ever to arrive at generalizable statements on the effects of noise on (vigilance) performance, in spite of a spectacular increase of reported studies during the 1960s and 1970s. This is not equivalent to stating that there are no well-designed experiments on noise and vigilance: there are numerous fine studies.
A main reason for the lack of consistency, of course, is that the study of noise is an extremely difficult and frustrating area of research, which is (again) highlighted in a recent paper by Dornic (1990). Noise is a relative construct: sounds may be unwanted (i.e. experienced as inappropriate in a given situation) and may therefore be considered as noises for some people but not for others, on some occasions but not on others. The notion of noise is subject-related: it is a function of the person (personality, noise sensitivity, attitudes, interests, momentary mental and physical state) and the situation in which it takes place. A person may be at home enjoying loud music from the stereo, but may get very annoyed at a neighbor's barking dog. Important factors are the suitability of the noise, the self-responsibility for the noise setting (Hockey, 1986a) or, more generally, the possibility to exercise control over the effects of the environment, which has also been shown to be a primary factor in the 'sick building' phenomenon (Vroon, 1990). A finding emerging from many studies is the presence of complex and often unpredictable interactions between a large number of variables: noise intensity, noise type, sex, time of day, extroversion-introversion, task length, task type, task difficulty and measure of performance (loud noise has been shown to improve speed but to impair accuracy). Results will be highly specific, which amounts to stating that the generalizability of a single experiment is nil. How to continue with noise research? I cannot see that more theorizing will bring a solution. Hancock and Warm (1989) have recently presented a dynamic model of stress and vigilance, and have remarked (p. 522) '...the frustration expressed by Koelega and Brinkman (1986) is an example of conditions that result when theory fails to assume a leading role in knowledge development'. 
However, in this area detailed and reliable predictions are virtually impossible, since these are a function of replicability of findings, which is low in noise research. Dornic (1990, p. 26) has also concluded that ambitions of generalizability and predictability, and thus of understanding, are unrealistic and should not be included in programs of noise research. Although, on the basis of these considerations, one might defend the position that further experimentation is rather pointless, it should also be acknowledged that the effects of noise are an important societal problem and therefore deserve researchers' attention. Traffic noise and industrial noise have been shown to be associated with chronic diastolic hypertension and cardiac impairment. It has been reported that children living in noise-impacted areas or noisy homes display attention dysfunctions and impaired performance on cognitive tasks, including concentration capacity, that sensorimotor development may be impaired in children as young as 1 and 2 years of age, and that sleep efficiency in young subjects may be reduced (Dejoy, 1984, p. 131; Evans and Tafalla, 1987; Saletu et al., 1989). Noise can affect people both directly and indirectly, can impair performance or can make unchanging, overt performance more costly in terms of increased expenditure of additional effort. But noise can also affect persons by annoying them. Two different target areas of noise research (annoyance and performance) have always been treated as separate and unrelated issues but annoyance may be an important mediator of performance (Koelega, 1987). The direct and indirect (irritation, increased effort) effects of noise may be caused by increases in arousal (Dornic, 1990), and this takes us back to the starting point of this section. I mentioned that authorities have expressed the view that noise has been the most widely used treatment for increasing arousal, and that we have a clearer
picture of its effects on performance than of any other arouser. I have not encountered studies demonstrating that deteriorating vigilance performance can be prevented by the introduction of noise. Several studies have shown facilitatory effects, and other studies have reported impairing effects, on overall level of performance. The complex nature of noise stimuli as arousers, discussed in the present section, may have contributed to this picture. Possibly, more can be learned from the studies manipulating arousal directly by means of drugs.
AROUSAL-A DIRECT APPROACH TO EXPLAINING VIGILANCE PERFORMANCE: THE EFFECTS OF DRUGS
A large number of experiments investigating the effect of drugs on performance, including vigilance, have been carried out, yet the subject is often absent in many books on attention and performance. Neither has the field of psychopharmacology received much recognition in psychological textbooks, although effects of drugs on behavior have been studied for over 100 years. Many investigators have stated that drugs provide a unique means to manipulate arousal. Callaway (1983) has remarked that 'psychopharmacology provides us with a vast array of tools for modifying human information processing; there are agonists and antagonists for almost all known neurotransmitters and neuromodulator systems...'. O'Connor (1983), who has been working with the stimulant nicotine for more than 15 years, has stated that pharmacological manipulation of arousal is more direct than that of any other arouser and has yielded consistent associations with behavior. Smith, Wilson and Davidson (1984), using caffeine, have pointed out that it is important to increase the range of arousal conditions systematically by means of drugs, particularly at the lower end. Hockey (1979, p. 165) has also emphasized that much more work is needed on reduced arousal states; unlike high arousal, low arousal is a 'normal' state for a large part of each day. A great advantage of sedative and stimulant drugs is the ability to grade effects by judicious choice of dosages and the reversibility of the effects. With respect to vigilance performance, there are several statements pertaining to drug effects. Davies and Parasuraman (1982) report that there seems little doubt that amphetamines reduce the amount of decrement (p. 154), that caffeine arrests the decline in performance (p. 156), and that nicotine also prevents the decrement by maintaining the level of arousal (p. 157). Further, they state that all drug effects are usually interpreted in terms of arousal (p. 158). 
Hink et al. (1978) have also stated that a well-replicated finding is that CNS stimulants (e.g. amphetamine, methylphenidate, nicotine) counter the detrimental effects of time on vigilance performance, and that CNS depressants (e.g. alcohol) precipitate the vigilance decrement or increase its depth. In their own experiment, the authors attributed the vigilance decrement as well as the effects of methylphenidate or secobarbital to changes in the intensity of behavior (arousal) rather than to changes in attention. Koelega (1989, 1991, 1993, 1995) has reviewed the psychopharmacological evidence and addressed the question of whether vigilance tasks are indeed sensitive instruments to monitor drug effects, since the literature on this issue is conflicting. Further, if Callaway (1983) is right in stating that drugs with highly specific actions are available, we might even get some clue as to the specific structures and
pathways in the nervous system involved in vigilance (cf. the contrasting opinions, reported earlier, regarding the roles played by the norepinephrine and dopamine systems in vigilance). Some biological theories of neuropsychiatric disorders have evolved from the neurochemical mode of action of drugs, e.g. the 'dopamine hypothesis of schizophrenia' and the 'catecholamine theory of depression'. We might find some support for a 'norepinephrine or dopamine hypothesis of attention dysfunction or deficit'. The reviews were limited to a discussion of the depressant drugs (benzodiazepines and alcohol) and the stimulant drugs (amphetamine, methylphenidate, caffeine and nicotine). The mechanisms of action of these drugs are different, but in the present chapter we may forego a discussion of the psychopharmacology of the agents and their neurotransmitters. The results of several experiments using drugs show that for the two depressant drugs, benzodiazepines and alcohol, the results are remarkably similar. Overall level of performance was impaired in about 50% of the cases, and there were practically no reports of an aggravated decline of performance under drug conditions, but one should bear in mind that very few studies have analyzed time-on-task effects. Only one study has reported a more rapid decline under alcohol. Some types of vigilance task, especially those using spatial stimuli, appeared to be sensitive to very low levels of blood alcohol concentration. Alcohol effects seem to be greatest during periods of sleepiness (the early afternoon and after midnight). With respect to benzodiazepines, some additional information from recent experiments in Utrecht can be provided. In an experiment with a crossover design (van Leeuwen et al., 1994), a dose-dependent deterioration of overall level of performance with two doses of oxazepam was found, whereas the decrement with time was aggravated by the drug, albeit non dose-dependently. 
In a further experiment (van Leeuwen et al., in press) a curious dissociation was noted: both the benzodiazepine and time-on-task impaired performance but alpha-1 activity (8-10 Hz) of the EEG was decreased after drug ingestion and increased with time-on-task. Alpha-1 power could 'explain' (established by analysis of covariance) the drug effects on level of performance, and on some ERP waves (N1 and P3) as well, but the effects of time on performance were not wholly explained by the effects of time on power bands, although increased delta activity contributed to the decline in performance. These data underline that different explanations are required for effects on absolute level of performance and the deterioration with time, and that explanations of electroencephalographic activity in terms of a unidimensional theory of arousal are untenable. The results for the stimulant drugs are more clear-cut (Koelega, 1993). Amphetamine has been reported to improve overall level of detection (hits) in five studies out of 12, and an additional three studies reported improvement under special conditions. In four out of six experiments that reported results on the time course, amphetamine prevented a decrement in hits. There were no effects on false alarms. Very few studies reported on RT, d' and beta; generally, no effects were noted on these measures. Only one study (O'Hanlon et al., 1978) provided a separate analysis for 'decrementers' and 'nondecrementers', arguing that when there is no decrement under the placebo condition, one cannot expect that amphetamine will prevent a decline; indeed, amphetamine arrested the deterioration of the decrementing subgroup.
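Since few of the reviewed studies report d' and beta alongside hits and false alarms, it may help to recall how these two indices are obtained from the raw rates. The following is a minimal sketch, assuming the standard equal-variance Gaussian signal detection model; the function name and the hit and false-alarm rates are hypothetical illustrations and are not data from any study cited here.

```python
import math
from statistics import NormalDist


def sdt_measures(hit_rate, fa_rate):
    """Return (d_prime, beta) under the equal-variance Gaussian model.

    d' = z(H) - z(FA)
    beta = exp((z(FA)^2 - z(H)^2) / 2), the likelihood ratio at the criterion.
    """
    if not (0.0 < hit_rate < 1.0 and 0.0 < fa_rate < 1.0):
        # z is undefined at exactly 0 or 1; a correction would be needed there
        raise ValueError("rates must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_hit, z_fa = z(hit_rate), z(fa_rate)
    d_prime = z_hit - z_fa
    beta = math.exp((z_fa ** 2 - z_hit ** 2) / 2.0)
    return d_prime, beta


# Hypothetical rates for an early and a late period of a watch (assumed values)
early = sdt_measures(hit_rate=0.90, fa_rate=0.10)
late = sdt_measures(hit_rate=0.70, fa_rate=0.02)

print("early: d' = %.2f, beta = %.2f" % early)
print("late:  d' = %.2f, beta = %.2f" % late)
```

With these assumed rates, hits fall markedly from the early to the late period, yet d' stays near 2.6 in both while beta rises from 1 to roughly 7. A drop in hits accompanied by a drop in false alarms can therefore reflect a stricter response criterion rather than a loss of sensitivity, which is why an analysis of hits alone leaves the nature of a drug effect open.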
There are only four studies reporting on the effects of methylphenidate in normal adults. The results on overall level are not consistent, not even with the same research group using identical tasks. Two studies reported on the decrement: Hink et al. (1978) reported no effect of methylphenidate on d', but this is not surprising as there was no decrement in the placebo condition. Strauss et al. (1984) found a significant interaction of drugs and period for d': methylphenidate prevented the decline in sensitivity; for hits, this interaction approached significance. For caffeine, 17 comparisons for overall level of hits are available (besides an additional six for the hybrid measure 'errors', i.e. misses plus false alarms). Six comparisons showed no improvement after caffeine, three only under special circumstances, and 14 comparisons showed an improvement in hits with the drug. Seven comparisons are reported for the decrement in hits: only one reported an effect of caffeine on the time course in a continuous clock task (not in another task), ameliorating with 250 mg but aggravating with 500 mg. For nicotine, 17 comparisons are available for hits or sensitivity, 11 of which show improvement of overall level as an effect of the drug, and two others showed improvement under special conditions. In five out of seven cases, the decrement was prevented. In conclusion, although statistical analyses have sometimes been less than adequate, for all four stimulants improvements in overall level of detection have been found, and especially for amphetamine and nicotine it has been reported that the decrement occurring in the placebo condition was prevented. There were practically no effects of the drugs on false alarms. Since improved performance was also noted in sessions with a duration of less than 10 min, there is no support for earlier suggestions in the literature that effects are noticed only in fatigued subjects. In some cases it has been reported that nicotine (i.e. 
smoking) produces absolute improvements in performance, above and beyond baseline levels (Wesnes and Warburton, 1983). Evidence from several studies does not support the hypothesis, advanced in the literature, that improvements with nicotine are only a recovery of withdrawal-induced impairment. Because vigilance performance is affected in a seemingly similar way by drugs with a different neurochemical mode of action, there is no clear support for either a noradrenergic, dopaminergic or cholinergic theory of sustained attention. On the basis of these results, Koelega (1993) has concluded that simple neurotransmitter theories of attention and information processing are untenable, as most drugs possess multitransmitter effects. Most authors have explained the results obtained with drugs in terms of 'arousal': depressant drugs would decrease the level of arousal and stimulant drugs would increase the level (note that the distinction between 'depressant' and 'stimulant' drugs is not absolute, as some depressants may sometimes have stimulatory effects and vice versa, e.g. depending on dose). In the next section I shall address the question of whether this settles the argument around the mechanism(s) of vigilant behavior in favor of arousal. There are also some experiments in which arousal was manipulated both indirectly, by means of noise, and directly, by means of drugs. Some investigators have argued that since noise raises arousal levels, and a depressant drug such as alcohol reduces arousal, the combined effects should be canceled out. Colquhoun and Edwards (1975) reported this to be the case in their experiment: there was an interaction between noise and alcohol in accuracy (errors increased with alcohol but
decreased under noise), but not in speed of responding (noise, but not alcohol, decreased speed, and there was no interaction). Hamilton and Copeman (1970) employed a dual-task paradigm (tracking and signal detection) and obtained somewhat more complex results: tracking performance improved under noise and fell under alcohol, but detection of lights in the periphery of vision was degraded by both noise and alcohol. Such results do not provide unequivocal evidence that in the case of noise and drugs the same arousal dimension is involved. Let us examine the arousal concept somewhat closer.
7 CONCLUDING REMARKS
7.1 The Constructs of Arousal and Attention Revisited
The preceding section provided evidence that drugs can change vigilance performance in a more straightforward and consistent way than noise. Almost unanimously the authors have used the arousal concept in explaining these effects. Does that help us in understanding the mechanism(s) of sustained attention? In this section I shall make an attempt to highlight some problems associated with the use of the concepts of arousal and attention.
7.1.1 Arousal
In the introductory part of section 5 we have seen that several authors have expressed the opinion that the use of a common construct 'arousal' is theoretically fruitful, and that arousal might be a major integrating concept for psychology. However, 'arousal' has, as yet, not fulfilled this promise. Rather than drawing together areas hitherto thought to be unrelated, many investigators now see it as a deceptive concept leading to confusion by creating an illusion of unity. Gale and Edwards (1986, p. 491) have acknowledged that its overuse as a portmanteau, all-explaining concept has tended to inflate, and therefore to devalue, its explanatory power. It is a chastening exercise merely to inspect the authors' listing of the broad range of contexts in which the concept has been used. From the areas of experimental psychology, drug research, individual differences, stress research, circadian studies and social-psychological theory, Gale and Edwards provide at least 40 different examples. Arousal has been used as a drive, a source of stimulation, a consequence of stimulation, a property of stimuli, a quality of trait, state and mood, a cyclic fluctuation, a property of tasks, etc. Moreover, the operational definition used ranges over a host of dependent variables, and the measurement of arousal includes some 20 different methods, from questionnaires to biochemical measures. The manner in which the proponents of arousal use the term, in a slipshod way, carries the danger that the concept ceases to have content. The concept seems able to act as an independent variable, dependent variable, intervening variable and hypothetical construct, at one and the same time. Within one design the concept is used to describe trait (extroversion), circadian, drug (caffeine) and interactional effects (Gale, 1987). The combination of such disparate
uses in one context implies commonality, and this is in fact how the proponents think of arousal: as a common property of different systems. We observe a continuum of alertness in ourselves and others: we can be drowsy, awake but relaxed, alert, absorbed, angry, agitated, or on the point of psychological collapse. Thus, the concept of an arousal continuum still seems to make sense. The advocates of the continued use of the arousal concept acknowledge that there are problems with the arousal conceptualization of the 1950s and 1960s, but they find it helpful to incorporate the arousal concept in their theorizing. For example, Humphreys and Revelle (1984) have preserved the concept of a unitary arousal system in their model of performance. They see arousal as a motivational construct, a state of the organism to be thought of as a conceptual dimension ranging from extreme drowsiness to extreme excitement. It is the common behavioral effects with which investigators should be concerned; in their view, arousal is that factor common to various indicants of alertness. General arousal, a single activational state, would be theoretically parsimonious as a higher-order construct. Many authors do not agree with these views. Hamilton, Hockey and Rejman (1977) have stated that psychology has been poorly served by the principle of parsimony in this area, and they present a complex scheme of arousal and information processing. Neiss (1990) has criticized the view of arousal as a conceptual dimension: '...arousal, long regarded as nearing its long-overdue retirement, has begun a second career as a conceptual dimension. The failure of its referents to intercorrelate is thus eliminated, as it need not be assessed in its new status. Widely differing patterns of results are taken as support for arousal. Although charged with betraying those seduced by it (Claridge, 1987, p. 134) arousal remains at large...'. 
Claridge (1981) has remarked that the term 'arousal' has, by itself, been outstripped as an explanatory concept of anything other than the fact that people sometimes seem to be rather sleepy and sometimes rather excited. Hockey, Coles and Gaillard (1986) have argued that the terms 'arousal' and 'activation' should be avoided, because they are not sufficiently broad and are associated with much recent ambiguity. The continued popularity of arousal may testify more to the need for a construct denoting the intensity aspect of behavior, and the lack of alternatives, than to the success of arousal per se. It simply seems to be the only construct currently available, in spite of more than two decades of criticism. Arousal has many of the qualities of a difficult but persuasive lover, whom reason tells one to abandon, yet who continues to satisfy an inescapable need (Claridge, 1967). My personal view is that we cannot yet do without the construct and that its further use may be justified if we keep bearing in mind that the earlier notion of some nonspecific, unitary state has been rendered out of date by later research data. Just as in the introductory part of section 4 with respect to the construct of attention, I shall make an attempt to sketch some developments which in my view represent shifts in thinking about arousal.
(1) There is more than one arousal or activation state. Evidence comes from three levels of measurement.
(a) Physiological. During the 1960s several lines of research indicated the existence of more than one arousal system (Lacey, 1967; Routtenberg, 1968; Vanderwolf and Robinson, 1981). Physiological work indicates that there are several types of arousal process and several distinct functional systems activating the neocortex.
(b) Behavioral. Physiologists involved in animal work usually see arousal as a system governing sleep-wakefulness cycles. For psychologists the fluctuation of arousal within the wakeful state is more interesting. Several studies have suggested that within this state there are several distinct types of arousal (Kahneman, 1973). Hamilton et al. (1977) have also proposed that we should conceive of the organism at any moment being in one of many possible arousal states, rather than as being at some level on a single dimension. The effects of low arousal after sleep loss (a decrease in hits and an increase in false alarms, so a general increase in errors) are not the same as those of low arousal produced by time-on-task (monitors usually exhibit a general response drop, a decrease in both hits and false alarms). The existence of more than one qualitatively different form of arousal is also suggested by the finding that different indices of arousal appear to have different circadian rhythms: body temperature, for example, peaks in the early evening but self-report measures of arousal, as well as catecholamines and performance, reach a peak at about noon. Eysenck (1982) reviewed several theories of arousal and performance and proposed the existence of a common arousal mechanism besides more specific mechanisms.
(c) Subjective. Self-report arousal is said to reflect the integration of several bodily systems, and has been reported to correlate better with individual physiological measures than these measures correlate with each other. Thayer, who has been working with subjective measures of arousal for more than 25 years (e.g. Thayer, 1989), has identified two distinct kinds of arousal: tense arousal and energetic arousal.
(2) Arousal does not uniformly affect all resources.
If we have to distinguish between qualitatively different states of arousal, rather than different levels varying quantitatively along a single dimension, a consequence is that a certain arouser may affect one type of process, but not another type. Information processing performance is not a unitary activity; arousal will probably differentially affect different processing functions. Hockey (1979) and Eysenck (1982) have reported evidence that increasing arousal changes the balance of effectiveness: work rate and the selectivity of attention are increased, but accuracy and primary memory capacity are decreased. So, arousal has differential effects on the various component processes involved in cognitive functioning.
(3) The Yerkes-Dodson law has outlived its usefulness for psychology. The assumed inverted-U relationship between arousal and performance, still called upon by many researchers to 'explain' their data, has been under heavy attack because it effectively contributes nothing to theorizing or explanation. Hockey et al. (1986) have stated that the venerable law is, at best, an unhelpful oversimplification. Matthews (1985) could find no evidence for the applicability of the law and concluded that there is no reason to retain the notion of a curvilinear arousal-performance curve within future theorization. The Yerkes-Dodson law is too open-ended, allowing the experimenter to explain away almost any combination of results. The law is descriptive rather than explanatory, and most possible outcomes can be fitted to the law post hoc. It is difficult to obtain an experimental result that is beyond its ken. Further, Hockey (1979) has challenged the (implicit) assumption of the law that the effects of low arousal and high arousal are in some way equivalent. The changes are not symmetric: high arousal may increase errors, but low arousal may produce slower responding.
(4) Arousal has moved away from a passive, static conceptualization to an active, dynamic one.
The notion that the level of arousal may reflect the result
of activity much more than being the cause of it has led to the view that arousal does not just indicate a (static) basic physiological state, but rather, more dynamically, a person's attempts to direct cognitive resources more effectively. Theoretical approaches suggested by Broadbent (1971), Eysenck (1982) and Hockey (1986b) all assume that there is a relatively passive arousal system, in the sense of what is done to the individual (manipulations with noise, drugs, etc.), as well as a system that reflects the result, the byproduct, of active processing effort in performing a task. The second, central cognitive, control system monitors the operations of the first arousal system, and will often instigate compensatory, effortful activities to counteract any performance deficiencies resulting from the manipulations with the first arousal system. Eysenck (1982) has pointed out that a crucial assumption is that the consequences of inefficiency of the lower mechanism will not become manifest in performance provided that the upper mechanism remains in an efficient state. The implication is that behavioral measures can be inadequate. Precisely the same level of performance can be achieved in two very different ways: efficient functioning of the lower mechanism combined with modest use of the upper mechanism, or inefficient functioning of the first system which is compensated for by involvement of the second one. Eysenck emphasizes that it is fatally easy to assume that, if an arouser does not produce a significant effect on task performance, this means that arousal has had no effect on internal processing of task information. On the contrary, comparable performance efficiency is often achieved at greater 'subjective cost' to the subjects who are willing to use the compensatory effort system, the executive control system. 
A further implication is that, if behavioral measures can be inadequate, it is practically impossible to find behavioral evidence for the two systems proposed by these authors. They claim that their theoretical position is supported by the striking empirical finding that manipulations designed to produce substantial increases or decreases in the level of arousal typically have rather modest (or even nonexistent) effects on performance, where behavioral consequences would be expected. Just as with the Yerkes-Dodson law, such hypotheses are too open-ended in their explanatory power, and any combination of results can be explained away. When there is no effect of an arouser (e.g. a depressant drug) on performance, the upper, active control system may have been involved, and when there is a substantial effect, the system has been unable to compensate for the impairment. However, most authors agree that the apparently great flexibility with which the human processing system copes with very low or very high levels of arousal is rather perplexing for advocates of a unidimensional arousal theory. The reader of this chapter may, in the discussion of arousal, by now have noted some familiar terms, encountered in discussing the construct of attention: flexible allocation, a common system besides specific systems, a top-down (active control) approach, a passive versus active mechanism, etc. Can, in fact, arousal be distinguished from attention?
7.1.2 Arousal and Attention
For some authors, arousal is a component of attention (Posner and Boies, 1971); for others, attention is a component of arousal, viz. the phasic, directional component
in contrast to the intensity component (Claridge, 1981). For some, attention is a construct that can be clearly distinguished from arousal (Hillyard and Picton, 1979, p. 1); for others (Picton et al., 1978, pp. 430, 434) attention is intimately related to arousal: '...studies that have attempted to manipulate general arousal independently of attention have found that attention and arousal have similar effects on ERPs. However, it is probably impossible to change levels of arousal in the waking state independently of any attentional change...'. Whereas for many authors arousal is inextricably related to attention, the status of the relationship between the two constructs becomes particularly strained for other authors. For example, Beh (1989) and Smith, Rockwell-Tischer and Davidson (1986) claim that it is possible to maintain a clear distinction between arousal and attention within one design. Such statements strongly suggest that arousal and attention represent independent functions, related to different parts of the brain or regulated by separate processes. However, the distinctions implied by the terms are to a large extent artificial and therefore misleading; the constructs are merely semantic abstractions that overlap to a large extent. The results from experiments with drugs make it improbable that such concepts map onto different, particular, physiological structures or neurotransmitter systems (Koelega, 1993). We are confronted here with one of the classic problems of psychology, viz. accounting for motivational or emotional (or intensive) aspects of behavior, as opposed to structural or cognitive (or directional) aspects. There is a (usually implicit) general hypothesis underlying all theorizing in psychology, called the 'aspecificity hypothesis' by Sanders (1986), claiming that energetic state and processing of information (emotion and cognition, or arousal and attention) are basically distinct realms, allowing a distinct treatment of each. 
Research on the one has proceeded independently of research on the other, largely due to the dominance of the computer metaphor in cognitive psychology and its incompatibility with the energy metaphor. The idea of the brain as a computer permitted investigators to specify the computations necessary to perform a cognitive act in terms of elementary operations, but energetic processes could not be captured in this approach to psychology. During the last decade several authorities have stated that theorizing in psychology is heading in the wrong direction, and that the emphasis is too much on theoretical constructs of a cognitive nature, bypassing the energetic (physiological) state of the organism. One of the major contentions of the book Attention and Arousal by Eysenck (1982) is that there is an intimate relationship between arousal and attention, between motivation/emotion and cognition. Information processing, and cognition generally, are profoundly affected by the prevailing state of arousal, there are bidirectional influences, and the attempt to decouple cognition (or attention) from other systems (e.g. arousal) is fundamentally ill-judged. Likewise, the central theme of the book Energetics and Human Information Processing, edited by Hockey, Gaillard and Coles (1986), is the role of motivational (energetic) factors in the regulation of information processing activity. Most contributors in the book have criticized the dominant way of viewing the nature of mental activity: the computer metaphor, that is, the models of human behavior based on the operation of the digital computer. These are 'dry' models, concerned primarily with the structural relationship between computational systems, which are assumed to have 100% reliability and zero variability in their information
processing characteristics, each individual system (computer) operating in exactly the same way. What the computer metaphor does not allow is variability under different environmental or internal states, e.g. stress and emotion, effects of drugs and fatigue, or changes originating in natural cyclic (e.g. circadian) bodily processes. All these affect 'intensive' aspects of behavior. They are 'wet' processes, intrinsically tied to the organic tradition of psychology. The 'dry' approach cannot account for the flexibility and variability intrinsic to human behavior, which are outside the scope of the formal information processing models. Apart from the variability resulting from imposed or natural changes in state (fatigue, circadian rhythms, etc.), the models also cannot account for regulatory and strategic aspects of the control of behavior, relationships between information processing operations and underlying patterning of biological activity, and, above all, individual differences in all these areas, a characteristic notably absent in computers. The intrinsic variability in human behavior is almost completely missing from all theories of human information processing. The authors of the aforementioned book have used the term 'energetics' to provide a link between the various manifestations of behavioral intensity. 'Energetics' refers collectively to what have been called the 'intensive' aspects of behavior, its energy, or degree of vigor. Apparently we cannot dispense with energetical considerations altogether. The main point of all the aforementioned authors is that energetic and information processing concepts, such as arousal and attention, should be integrated and cannot in fact be studied separately. However, there are no guidelines or indications as to how we should proceed. How should we define such an integrative concept and can we measure it? 
The contribution by Parasuraman and Nestor (1986) in the book on energetics gives an example of integration with respect to the vigilance decrement. In their Table 1 it is shown that the earliest theories of the vigilance decline were energetic theories (arousal, motivation). Subsequent theories explained the phenomenon in terms of various information processing constructs (filtering, expectancy matching, probability matching, automatic and controlled processing). The newest theories have combined both energetic and information processing concepts: habituation/expectancy matching (Mackworth); alertness/pathway activation (Posner); arousal/expectancy and memory load (Parasuraman). The important message is that energetic (arousal) or information processing (attention) theories alone are insufficient to explain all the data. However, the examples given by Parasuraman and Nestor are combined theories; both types of concepts are left intact and are not really integrated. For example, Parasuraman explains the overall level of vigilance performance with arousal, but the decrement with memory load. Some authors have suggested that arousal and attention might be separated with certain lesions and in psychotic states. In the same chapter in which Posner and Rothbart (1980) have stated that the intensity (arousal) and direction (attention) aspects of behavior should be integrated, they report that a neat dissociation of the two has been found in patients with damage to the parietal lobe (p. 30; see also Posner et al., 1987). Claridge (1981, p. 136) has suggested that the distinction between arousal and attention, blurred in the normal individual, may become manifest in schizophrenia. Mirsky and Orren (1977, p. 234) have also indicated that the constructs might be considered to be independent by examining their disorders. So, we have a situation that the influences of arousal and attention may be
theoretically separable, may be separable in pathological circumstances, but are inextricably related in normally functioning individuals, an assertion that is not easy to translate into an empirically testable form. In our search for an explanation of vigilance performance, we have ended up in the situation that the distinction between the two commonly used explanatory constructs, arousal and attention, is blurred. Neither construct can by itself be precisely defined or measured, nor can the state or process resulting from their 'integration'. Other explanatory constructs can be invoked, such as 'effort' and the 'allocation of resources': a case could be made that a subject is investing effort only on the first few trials, i.e. a novelty effect. But does it really make a difference that we say 'effort is only initially invested' rather than 'arousal is initially high' or 'paid attention is initially high'? Any model of performance, not only of sustained attention, testifies to a major problem, namely the proliferation of terms and the varying usage by different investigators: arousal, alertness, activation, attention, effort, capacity, resource (availability or allocation), etc. (in their Table 1, Hockey et al. (1986) present 12 energetical concepts with their principal locus of action). None of those terms can be defined properly; they are sometimes used in a general (nonspecific) sense and sometimes in a specific sense; they have been used differently by various investigators, and sometimes interchangeably, e.g. Kahneman's (1973) key concept of 'effort' refers to 'a nonspecific input, which may be variously labeled effort, capacity, or attention'. Such constructs should be defined in terms that are independent of the way in which they are measured. All constructs are inferential terms, hypothetical intervening psychological states or processes whose existence is inferred from performance, brain waves, skin conductance, etc.
We find excellent examples of circularity (circulus vitiosus) in this area: first, the state of sustained attention ('vigilance') is inferred from performance, then attention is used to explain performance; lapses of attention are used to explain performance decrements, and lapses are in effect defined as performance decrements. We use concepts such as attention or arousal as if they have an independent existence and therefore could be defined beforehand, which they cannot be, because they have no function without the phenomena they describe. No solution can be offered here to the problem of proliferation of terms and the inability to define them properly, but see, for example, Roskam (1990). I have made an attempt to come to grips with the constructs of arousal and attention, since, from the early days of research, they have been used to explain the vigilance decrement. What we observe is a decline in detection performance right from the beginning, and apparently we cannot do without hypothetical, intervening constructs such as attention, arousal, activation, alertness, etc. to explain the decline. But we encounter difficulties if we try to attribute the decrement to either arousal or attention, as is testified by the present exercise.
7.2 Aims and Outcome of this Chapter
In the introductory part of this contribution, its major aim was formulated in terms of an evaluation of the state of the art in several main topics of the area of vigilance research, in order to make up the balance after some 50 years of experimentation.
With respect to the themes of vigilance, covered in the present chapter, it must be concluded that we still have more questions than clear-cut answers after so many years of research and possibly several thousand publications. This is not peculiar to vigilance research, but is typical of many areas in experimental psychology. There is an overwhelming cascade of unorganized facts without systems to synthesize the research information. Fundamental to a synthesis of information in vigilance is the attempt to arrive at a system of classifying tasks, differing in information processing demands. I have discussed the twofold classification system, which is nowadays the dominant taxonomy in vigilance. The conclusion was that we are far from arriving at a complete taxonomy. The feasibility of a selection device, based on individual differences in performance, that will predict performance was assessed in section 3. No hard evidence could be found for the frequently encountered statement in the literature that introverts and electrodermal-labile subjects are superior in maintaining their performance over time. A meta-analysis showed that extreme introverts, as well as slow habituators, had a higher overall level of performance, but were not superior in sustained performance. It was concluded that we would be hard put to justify any type of predictor. Since the constructs of attention and arousal turned out to be key concepts in the many theories of vigilance performance, it was decided to focus on these two in the remainder of the chapter. ERPs have generally been considered to reflect attentional processes, and authorities have stated that ERP studies of vigilance present a reasonably clear and consistent picture. A review of the literature showed this to be misleading, and an experiment indicated that ERPs, for several reasons, have as yet not contributed much to an understanding of deteriorating performance.
The construct of arousal was approached in two ways: as a more psychological concept with exogenous stimulation by means of noise, an indirect manipulation; and as a physiological concept with internal stimulation by means of drugs, a direct manipulation of the CNS. Authorities have stated that noise is the most widely used treatment for increasing arousal, and that we have a clearer picture of its effect on (vigilance) performance than of any other arouser. Our review has made such statements rather puzzling. In spite of an enormous number of studies we had to conclude that we know nothing, in the sense of generalizable statements. Noise as a means to manipulate arousal has an extremely complex nature, and it is practically impossible to predict the outcome of experiments. The mechanism(s) of vigilance performance are not revealed by this use of the arousal construct. The approach using a direct manipulation of the CNS by means of drugs has produced several straightforward effects. Vigilance performance appears to be affected by drugs, albeit not always in a hard and fast manner; in the field of psychopharmacology there are also often failures to replicate. Most authors using drugs have explained their results in terms of arousal. This chapter may have provided the basis for somewhat more well-founded statements regarding vigilance, at least with respect to the areas covered. Throughout this contribution it has been emphasized that vigilance performance, and especially the decline in performance over time, is a complex phenomenon, affected by many variables. The impetus to systematic study of sustained attention came from observations of radar operators' task performance during World War II. What has laboratory research contributed to a solution of practical problems?
7.3 Laboratory Experiments and Operational Relevance
The vigilance paradigm is one of the most widely studied paradigms of experimental psychology: Egeth and Bevan (1973, p. 408) estimated that nearly a thousand papers had appeared during the 1960s alone. More recently, Mackie (1987) considered vigilance to belong to one of the most prolific programs of scientific output in any area of behavioral science. One might well wonder why so much interest has been devoted to a practical problem that surfaced in the performance of radar operators during World War II. Warm (1984a) has stated that the answer lies in the fact that the problem of vigilance occupies a unique niche in psychology: it is the sort of problem that accommodates both basic research and applications. However, it should be noted that for more than 30 years there has been controversy over the operational relevance of vigilance research. No attempt will be made here to summarize all the arguments of dozens of authors involved in this discussion, but a few points may be noted. The attack on laboratory research focuses essentially on two issues. First, it has been questioned whether the vigilance decrement, the sine qua non of vigilance research and often found in the laboratory, exists at all in real-world monitoring tasks. Reports that the decrement was not found in complex tasks have appeared from the early days of research (e.g. Veniar, 1953; cited in Smith and Lucaccini, 1969). Elliott (1960) was probably the first author charging that the decrement is not found in protracted real-world tasks where the effects of time are more nebulous. Buckner (1963) has proposed that, in the laboratory, subjects show an artificially high performance in the beginning, a task novelty effect, i.e.
there is a temporary investment of effort which wanes as time proceeds, whereas workers in everyday work settings avoid these high beginning levels, finding no reason to invest extra effort in the beginning of a work period, which may explain the absence of a decrement. But, in contrast to the suggestions that the decrement phenomenon is demonstrable only in the laboratory, Fox (1975) stated that in industrial inspection it is commonly observed that (fault) detection performance deteriorates as a function of time: decreases of 40% in 30 min have been noted. According to Fox there is no doubt that a decrement will appear in applied settings, although decrements do not show up in industry as frequently as might be predicted from laboratory studies, e.g. because frequent breaks occur in real-life jobs. Wiener (1984), in a discussion of inspection-like tasks such as quality control, acknowledged that we are lacking sufficient examples from real-world applications that exhibit a true decrement. Wiener provides a variety of reasons to explain this state of affairs: for example, operators are usually performing secondary activities at the same time, uninterrupted watchkeeping periods rarely exceed 15 min, and the social and physical atmosphere is not as severe as in the confining laboratory environment (on the shop floor, operators can talk to other workers, have some freedom, observe other activities, examine someone else's work or are being examined themselves). Such factors may allay time decrements, though perhaps at the cost of overall performance. Further, in the laboratory the tasks are often selected to produce a decrement rather than on the basis of an analysis of applied task characteristics: vigilance researchers have learned by now how to produce decrements. From an applied point of view, many laboratory studies may have been focusing on a straw person, as Kibler (1965) suggested.
But, since the appropriate studies have yet to be done, we simply don't know whether declines
of the form found in the laboratory frequently occur in the operational context. Careful and very costly experiments, whose design would tax the ingenuity of researchers and the patience of operators, would be required to answer this question. Wiener (1984) suggests that, if such experiments appear in the literature some day, we will probably end up exactly where we are today with the laboratory approach: some will show time decrements, some will not. A second form of criticism is more general and states, in essence, that vigilance research does not contribute to a solution of practical problems (e.g. Chapanis, 1967; Flach, 1990; Kibler, 1965; Mackie, 1984). These authors level their charges at all laboratory experiments carried out in any area of psychology, including learning, perception, decision making, etc. Their main objection to the laboratory approach is that experimental psychologists are busily engaged in the production of prodigious amounts of information which has impact on only very small groups of other experimental psychologists, although not much comfort can be taken from the prospect that many peers will appreciate the research efforts. Mackie cites studies reporting that only about 50% of the research articles in the main psychological journals are read or scanned by as many as 1% of a random sample of professional peers. And, if we would take solace in the notion that what we are doing today will produce still undefined benefits tomorrow, 10 or 20 years downstream, Mackie attacks this 'rationale' by stating that this trend may be accepted as far as the natural sciences are concerned but not in the behavioral and social sciences. Nothing happens to all the facts, and there is not even an attempt to integrate the many findings into a single, cohesive quantitative database or theory: the only thing that has changed, 20 years later, is that the sheer quantity of scientific output is greater than ever. 
Kibler (1965) noted more than 25 years ago that 'a fairly common introductory rationale for papers on vigilance is that technological advancements are placing increasing emphasis on the use of man as a monitor, that man is a poor monitor, and that the study to be reported is an experimental attack on the problem'. Kibler expressed his doubts with respect to this. The situation has not changed: authors still make at least a token obeisance in the direction of some real-world monitoring job when they write about their work, there is a nearly universal claim that their research is important for understanding real-world problems, and, of course, this kind of research has some face validity. However, Flach (1990) has recently stated that the domination of information processing approaches to human performance, as investigated in the laboratories, has certainly not been due to the overwhelming success of these approaches. Flach cites Simon's (1987) remark that such approaches have been about as 'useful as teaching your grandmother to suck eggs'. Simon observed that thousands of psychologists perform and publish rigorous experiments each year and, probably, they truly believe that their work has some social significance. Yet, unless they are totally isolated from the 'real world', they cannot fail to realize that most of the data being generated are seldom read or used and, in fact, are often utterly useless. Flach (1990) states that, if the results have no value outside of the laboratory context, then the researcher is only puzzle-solving, i.e. playing an intellectual game that is an end in and of itself; the investigator is not even 'doing' basic science. Meister (1984) has also wondered about behavioral research: if one cannot apply research, if it leads absolutely nowhere, is not that research merely a type of game?
Chapanis (1967) called it a curious paradox that the more successfully the precision of a laboratory experiment is increased, the more likely it is that a statistically
significant effect will be found that is trivial and has no practical significance; a psychological theory based exclusively on the findings of laboratory experiments may be nothing more than a theory of the unimportant (p. 572). The misgivings of these critics should be well taken, taking into consideration that these authors still hold these opinions, thus merely echoing and paraphrasing similar views expressed some 25 years ago. On the one hand the criticism of irrelevant laboratory experiments in the investigation of human performance is fully justified; on the other hand, there have been attempts to accommodate the critics, but, apparently, without success. The purported differences with real-life tasks have been described by Mackie (1984), who analyzed 86 vigilance studies and concluded that the characteristics (background event rate, signal probability and complexity, length of the session, etc.) had absolutely no relevance to operational applications. It is certainly the case that signal rates are unrealistically high in the laboratory where about 20-60 signals are presented during 1 h. In real-time process monitoring, the rate of signal appearance would be virtually zero in an hour's vigil and might even be zero over much longer periods, because real-world monitoring often involves keeping watch for extremely rare events such as engine failures in aircraft, nuclear power plant breakdowns, heart failures, security intrusions, etc. The experimenter in the laboratory is torn between scheduling frequent signals so as to provide ample data, and a very low signal rate which is more representative of the real world. Likewise, laboratory experiments may be flawed by employing discrimination along only a single dimension, e.g. a change in brightness or loudness of a simple stimulus. Virtually all operational tasks involve complex multidimensional discriminations, multiple signals that can occur in multiple locations requiring searching.
Subjects in the laboratory are rarely asked to interpret the meaning and implications of one of several possible critical signals and to select the most appropriate one of several possible responses. Detection and identification in operational work often involve integrative, interpretive processes. Also, many operational tasks involve simultaneous discrimination, in which the signal is some change in the total pattern. Further, stimuli in the field do not always appear at random; their occurrence can or must in part be inferred from other information. Vigilance requirements in operational situations often cut across the classification of high or low event rate, and the neutral background events (the nonsignals) will usually not be regularly spaced temporally, as in laboratory tasks. In real-life inspection, defects or faults may be of various types, varying in degree and differing in importance; there is no simple physical dichotomy between defective and nondefective articles, the task is not simply to decide on the presence or absence of a fault. In industrial and military watchkeeping a major source of confusion lies in the occurrence of 'target-like' non-targets; there is a dimension of 'signalness' (Craig, 1984). Wiener (1984) also reports that, in the inspection of manufactured goods, the types of defects to be detected are diverse and numerous, so here inspection seems to part company with laboratory experiments where only one type of signal is used. Another criticism has been that results of vigilance research are not only very task-specific but very time-specific as well: experimental sessions are far too short. Mackie (1987) has argued that issues of practical importance may not be correctly identified unless the duration and repetitiveness of real-world operations are closely approximated. He underscored his criticism of the limited task duration with an example of his own research with extended truck-driving operations.
Professional truck and bus drivers drove highways each day for 9-10 h, and this for 6 successive days. Dependent variables were traffic lane drifts and steering variability. Significant degradations of performance were observed but only beginning with the fifth day. Researchers need not be completely discouraged by this rather gloomy scene of the differences between laboratory and real life, for two reasons mainly: (1) there are widely differing opinions about the requirements of operational tasks, and (2) a large number of investigators have made an attempt to incorporate such requirements by using 'complex' tasks, especially during the 1960s. (1) Moray (1984) has stated that in process control there are immensely complex systems prevalent, with sometimes a single operator monitoring several hundred or more displays and controls; moreover, the tasks are dynamic, with dependencies among changing variables. Such tasks can never be simulated in the laboratory. However, Mirabella and Goldstein (1967) described the actual task of sonar operators in practice and concluded that many laboratory studies use tasks imposing division of attention and burden on memory in excess of what might be reasonably expected in practice. Kitchin and Graham (1961) have also stated that the mental load of process operators is much less than is popularly believed. Such contrasting opinions point, of course, to the great diversity of tasks subsumed under 'monitoring'. Monitoring has been loosely defined as the process of attending to an aggregate of (potential) information concerning an activity for the purpose of identifying, sustaining, modifying or interpreting that activity; involved are detection, decision processes, and executive or control behaviors (Kibler, 1965). We can conceive of monitoring tasks as being arrayed along continua of display complexity, degree of motor involvement, decision complexity, etc. 
The commonly used laboratory vigilance tasks, which in themselves are not a homogeneous class, circumscribe but a small segment of the total spectrum of monitoring activities. Monitoring covers a wide variety of applications ranging from tasks that may overload the operator with stimulus inputs (such as those probably meant by Moray) to tasks characterized by their stimulus underload qualities, such as those of Royal Navy sonar operators (Wylie, Mackie and Smith, 1985). Smith, Mackie and Wylie (1985), carrying out contract research for the Admiralty Research Establishment, observed that the problem of real-world sustained attention in Royal Navy sonar operators is nowadays almost identical to the radar problem that stimulated Mackworth's original experiments, despite progress in automation. Contemporary operators still 'observe' signals ('monitor' their display) but do not always perceive or report them. Although much has been computerized, current overall performance of sonar systems is still critically dependent on human operators' ability to monitor, select and use incoming information. The study by Smith et al. also deals with the assumption that microprocessors will obviate the need for human monitoring functions. Mackworth (1957) has also provided data to show that laboratory parameters may to some extent resemble real-life characteristics. The fastest inspection task noted by Mackworth involved 300 000 objects in a day, i.e. about eight per second; at the other end of the scale, a skilled operator considered 3000 objects in 1 day, or about one every 12 s. As to the nonsignal/signal ratio, i.e. signal probability, an example given by Mackworth from the wartime radar screen in the Bay of Biscay involved 30 echoes from fishing vessels for every one from a submarine.
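The inspection rates quoted from Mackworth (1957) can be checked with a little arithmetic. The sketch below is illustrative only. It assumes a 10-hour working day, which the text does not state but which is the shift length under which both quoted rates agree, and the function names are hypothetical. It also converts the 30:1 nonsignal/signal ratio into a signal probability.

```python
# Assumption (not stated in the text): one working "day" lasts 10 hours.
SHIFT_HOURS = 10
SHIFT_SECONDS = SHIFT_HOURS * 3600


def objects_per_second(objects_per_day, shift_seconds=SHIFT_SECONDS):
    """Average inspection rate in objects per second over one shift."""
    return objects_per_day / shift_seconds


def seconds_per_object(objects_per_day, shift_seconds=SHIFT_SECONDS):
    """Average time available per object, in seconds, over one shift."""
    return shift_seconds / objects_per_day


def signal_probability(nonsignals_per_signal):
    """Probability that an event is a signal, given N nonsignals per signal."""
    return 1 / (nonsignals_per_signal + 1)


fast_rate = objects_per_second(300_000)    # fastest task: 300 000 objects a day
slow_interval = seconds_per_object(3_000)  # skilled operator: 3000 objects a day
p_signal = signal_probability(30)          # 30 fishing-vessel echoes per submarine

print(f"fast task: {fast_rate:.1f} objects per second")
print(f"slow task: one object every {slow_interval:.0f} s")
print(f"signal probability: {p_signal:.3f}")
```

Under this assumed shift length the fast task comes to roughly eight objects per second and the slow task to one object every 12 s, in line with the figures in the text, and the 30:1 ratio corresponds to a signal probability of 1/31, or about 0.03.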
The same confusion exists in the literature with respect to motivational differences. Wiener et al. (1984) think that laboratory subjects can never experience the demotivating long-term effects of real-world exposure to the day-to-day monotony of understimulating watchkeeping tasks. Smith and Lucaccini (1969) also considered laboratory data to be a result of inadequate motivation. Nachreiner (1977) observed that his subjects performed better when they perceived the experiment as a selection test, from which he concluded that subjects in the laboratory lack motivation and perceive vigilance tasks as dull, monotonous, boring and uninteresting; the tasks seem less real, make less sense, there are no behaviorally relevant consequences, and therefore motivation fades. There is a cost associated with constantly observing, and rational creatures feel that they have better things to do with their time. On the other hand, Vervaeck et al. (1982) are of the opinion that the experimental situation in the laboratory motivates subjects too strongly, and others have suggested that the decrement is a product of extremely high motivation and concentration at the beginning of the task. (2) The vigilance literature is not without complex laboratory tasks where operational requirements are simulated to a certain extent. Adams and his coworkers (e.g. Adams and Boulter, 1964), during the early 1960s, carried out a large number of experiments with complex tasks and many authors have later followed this line of research, one of the latest examples being Thackray and Touchstone (1989). The latter authors used a simulated air traffic control task involving much more than simply detecting and responding, e.g. complex discriminations, interpretations, decisions, memory, etc. Subjects had to detect two aircraft at the same altitude on the same flight path and their performance appeared to decline significantly over the 2 h vigil. 
The authors concluded that the deterioration was due to the inability to sustain attention, since the other behaviors (decision making, short-term memory, motor movement, etc.) showed no evidence of change, and neither did scanning activity: 97% of the missed events occurred during periods in which subjects had their eyes open and were actively searching the display. I am aware of at least 50 studies carried out during the 1960s and 1970s that used 'complex' tasks. However, the results present an extremely incoherent picture, largely due, of course, to the nebulous concept of 'complexity', which can mean anything and can be manipulated in numerous ways: the amount and type of information presented, the amount of processing required including scanning, memory and decision-making demands, the amount of responding involved, etc. Some interesting lines of research from the 1960s do not seem to have been followed up after that decade, e.g. the different processing of appearing ('add') versus disappearing ('omit') signals, the latter type loading STM more (Goldstein, Johnston and Howell, 1969; Howell, Johnston and Goldstein, 1966; Johnston, Howell and Williges, 1969), or the combination of spatial and temporal uncertainty (Adams and Boulter, 1964). However, it is far beyond the scope of this chapter to discuss the literature on complexity. Warm et al. (1985) have stated that over a century of study of perception has taught experimental psychologists that the perceptual system is designed for complex information processing. Simple monitoring tasks may represent tasks for which the perceptual system is not ideally suited, but extremely complex tasks can yield a similar result. There seems to be a 'window' of complexity within which the system functions optimally in terms of temporal stability. On the other hand, Fisk (1985) has stated that performance is determined more by processing mode than by
task complexity, stage or level of processing, etc. Complex tasks and simple tasks can be performed via automatic processing, via controlled processing, or through a combination of both processes. The type of processing is not determined by the complexity or simplicity of a task, but by the consistency of the task and the amount of practice. The results of 'complex' tasks are rather variable, but such tasks have one finding in common: overall (absolute) level of performance is usually low, often much lower than would be tolerable in operational settings. Early critics of the presence of a temporal decay in real-life task performance have acknowledged this: there is another vigilance problem that may be more serious than the decrement, i.e. detection efficiency is generally poor, suboptimal, throughout a task. There are endless examples of the fruits of inadequate monitoring performance. Miller et al. (1985) have argued that research in transportation safety strongly suggests that a general loss of vigilance is a common phenomenon in complex real-world operations. Wiener (1984) has for many years been involved in investigating aircraft accidents, and other human-machine failures, and reports that the story never changes very much. The accident reports, eye-witness accounts, stories of survivors, etc. are replete with language such as 'I never saw it'; 'The flight crew failed to detect'; 'The operator overlooked indications that he should have detected'. And so it goes, with a remarkable sameness, whether one is dealing with aircraft or automobiles, faulty goods, the performance of medical practitioners, air traffic controllers or security guards. In a large number of publications (e.g. 1977, 1980, 1987, 1989) Wiener has presented a detailed analysis of some aircraft crashes and midair collisions, and has concluded that lowered vigilance was one of the salient features of these accidents. 
Crew members watched their scopes and did not 'see' what they were supposed to see or, more generally, failed to monitor properly. It would be instructive to critics of vigilance research to take a close look at what a number of accidents had in common (Wiener, 1987, p. 734): (1) they occurred at the end of long flights; (2) highly qualified crews were at their duty positions; (3) there were clear indications of system disturbance; and (4) human operators failed to notice the extreme out-of-limits conditions. Many other examples of imperfect monitoring could be given, such as failures of highly automated systems whose human monitors reacted either too late, incorrectly or not at all. Aircraft piloting, air traffic control or nuclear power plant control room operation are critical tasks. However, there are also less spectacular areas depending on vigilance of the human monitor: assembly line inspection, driving cars, security watch systems in buildings, security monitoring of luggage at airports, monitoring of instruments by nurses and anesthetists in hospitals, etc. The costs of failure to notice small but vital symptoms range from the annoying to the catastrophic. Microprocessors and computers are not going to make the problems go away; on the contrary, they contribute some of their own. Automation eliminates small errors, but may create large ones: pilots flying under autopilot guidance made more large blunders than with manual operation (Wiener, 1989). There is a growing awareness on the part of human error and accident investigators of the importance of human vigilance. Wiener's (1987) opinion is to continue laboratory experimentation by all means, with all its limitations, the critics notwithstanding. One can find little to debate in what critics such as Mackie (1984)
have to say: the situation with respect to application is rather bleak, but that does not mean that research to date should be considered wasted effort. The results of vigilance research may at present not be particularly germane to real-life problems. There should be more concerted effort to apply laboratory findings to real-world affairs, but we cannot replicate real-world situations in the laboratory: who is willing to run, finance or participate in a 'realistic' experiment in which subjects monitor a display for 6 months, 8 h a day, and at the end of it are required to report whether the signal was detected? (This hypothetical experiment is taken from Wiener, 1987.) Stopping further laboratory research does not guarantee that some day a deus ex machina will turn up to show how to deal with monitoring problems in the real world. In the present chapter much space has been devoted to the relevance and application of vigilance research since the subject has turned up persistently in the literature over the past 30 years, and remains a cogent question from society's point of view.
7.4 Whither Vigilance Research?
In the preceding section the misgivings of critics of laboratory vigilance research have been noted. However, there is general agreement that performance in real-life tasks is often suboptimal, sometimes even poor, whether one is dealing with aircraft or automobiles, faulty goods, monitoring of instruments in hospitals, etc. People fail to detect critical signals, make errors or react where they should not have reacted. Vigilance research is part of the more general area of human performance investigation called 'human error'. Even the decrement phenomenon, to which vigilance research owes its popularity, is not unique to vigilance tasks, but may become apparent in many other types of task under suitable conditions (repetitive monotonous stimulation). Mackworth and Taylor had already suggested in 1963 that the 'vigilance' decrement is associated not only with infrequent and randomly timed signals, but is a more general effect, occurring with prolonged observation, in short in tasks that require staying prepared to process incoming stimuli and to execute responses. In particular, the combination of fatigue, monotony and boredom is highly detrimental. The vigilance task lends itself exceptionally well to the investigation of performance declines, but I have come across decrements in other paradigms as well: simple RT (Gustafson, 1986), RT in a complex driving simulator (van Laar, 1988), tracking and scanning (Payne and Hauty, 1954), recognition memory (Bowyer, Humphreys and Revelle, 1983), immediate recall in a 20-min task (Hockey, 1986b, p. 286) and continuous memory (Gillespie and Eysenck, 1980). Hockey (1986b) has suggested that the distinction between vigilance and other cognitive tasks may not be a particularly useful one. 
The main problem with sustained attention may not be the particular information processing demands of external monitoring or continuous output involvement, but the requirement to maintain a specific task set, which may be somewhat more difficult in some vigilance tasks than in other tasks. Ideas such as those put forward by Hockey, and the presence of decremental phenomena in other types of task, show that vigilance research is not condemned to progress on a research island, is not isolated from the mainstream of experimental psychology. Further, the field of attention disorders is unthinkable without vigilance tasks such as the CPT; poor performance on a
vigilance task is part of the diagnosis and definition of ADHD (attention-deficit hyperactivity disorder) in children. ERP research is unthinkable without a vigilance task, which psychophysiologists call an 'oddball' task, probably because they are more interested in the ERP data than in the performance data and time effects. However, I cannot say that vigilance research is alive and well, and that the present chapter is but a testimony to its continuing vitality. Interest in vigilance research has diminished considerably since the explosive growth of the 1960s and 1970s, even though the problems have not been solved. One could suggest that the paradigm is obsolete, but I do not agree, although some specific problems might have become obsolete. I remarked earlier (Koelega, 1992) that reduced interest after a while is a pervasive phenomenon in psychology where 'old problems never die, they just fade away'. There is a constant emergence of new areas of interest (cognitive artificial intelligence, neural networks, etc.), determined by changes in the Zeitgeist, but the prevalent view is that psychologists never really solve anything; after a while they get tired of pondering one problem area and move on to another. Or the problems re-emerge after some time; there are numerous examples from the area of attention and performance: the first experiments on the effect of alcohol on simple reaction time were carried out more than 100 years ago, as were the first experiments on dual-task performance and divided attention; the inconsistent effects of noise on performance were reported more than 80 years ago. In all these areas, papers are still being published, but we still lack answers. The problems, once formulated and attracting many researchers' interest, have not been solved. My position is that we should not go on indefinitely with the hunt for factors affecting the vigilance decrement. 
Mackie (1987) has provided a partial listing of 50 such factors. Countless variables have been identified that may influence the decrement or level of performance. I don't think our understanding of vigilance is necessarily enhanced by identifying how many variables are related to vigilance. If we attempted to control every factor that may affect vigilance, we would create diffuse models that predict nothing. Occam's Razor tells us to be parsimonious, but the lowest level of explanation, a one-mechanism explanation of vigilance, does not seem to be feasible, and I don't see how to proceed to develop a multi-mechanism view. Laboratory vigilance research, some 50 years old now, could easily keep many researchers occupied for decades to come in order to obtain a more complete picture of sustained attention, and new, or as yet neglected, lines of research could be added. For example, visual search, hunting among displays or within a display for critical signals, is an important feature of many real-life monitoring tasks, but the variable is completely absent in most vigilance studies. During the 1960s, O'Hanlon (1965) proposed a hormonal approach to vigilance, with endocrine measures. Such measures could be useful for theories with a renewed interest in the 'effort' construct, but as far as I know this approach has been abandoned in vigilance research during the past decade. In the book edited by Warm (1984b) and in Davies and Parasuraman (1982), many unresolved issues are to be found and in my own reviews I have always formulated hypotheses and have indicated lines of future research. Although vigilance tasks probably resemble 'real-life' task behavior more than do some other tasks used in experimental psychology, this kind of research would not satisfy the critics of laboratory experiments, but some research issues in sustained attention might muffle their critical sounds somewhat.
7.5 Attention Disorders and Deficits
O'Hanlon (1991) has recently argued for a greater role of human factors specialists in the field of human psychopharmacology. Their knowledge of performance-testing paradigms that closely resemble real-life activities is necessary to supplement the data obtained with animal models of anxiety, depression, aggression, etc.; they could assist in developing standard laboratory performance batteries. There is another challenge confronting investigators of attention and performance, and that is to contribute to a further understanding of the nature of attention deficits in specific clinical populations, for example in ADHD children. From the early days of vigilance (=sustained attention) research, the decrement has been considered to be an attentional phenomenon, and in ADHD children the behavior problems are believed to be secondary to a primary attentional defect or dysfunction. The model emerged that the clinical efficacy of stimulant drugs in the treatment of behavioral disorders such as ADHD is secondary to an improvement in the children's attentional level, i.e. to a normalization of attentional mechanisms (Gittelman, 1983). The complex nature of the attention disorder in ADHD children precludes generalization of the results of drug studies to attentional phenomena such as the vigilance decrement in the population of normal adults. But performance patterns in several groups of otherwise 'normal' subjects, such as children of alcoholics, children of mothers taking alcohol or smoking cigarettes during pregnancy, and DWI (driving while intoxicated) drivers, bear some similarity to the performance of ADHD children on vigilance tasks, especially with respect to the production of false alarms. Gualtieri and Hicks (1985) have pointed to the striking similarity between ADHD children and patients with damage to the frontal lobes: lack of impulse control and diminished persistence of attending. 
In fact, normal students selected for poor performance on a vigilance task appeared to do especially poorly on neuropsychological tests of frontal lobe dysfunction. However, performance impairment in other normal groups (e.g. extroverts and fast habituators) may involve a different mechanism from that of ADHD children. Society has much to gain by intensified efforts to investigate childhood attention deficit disorder, one of the most frequent and serious neurobehavioral disorders of childhood, affecting children from their earliest infancy into adult life. There is now overwhelming evidence that a substantial proportion of ADHD children are predisposed to develop affective disorders (depression) and antisocial personalities, and are characterized by drug abuse, associated with criminality; hyperactivity is a particularly relevant predictor of later sociopathy and criminality (Klein and Mannuzza, 1991; Mannuzza et al., 1991). The ADHD syndrome thus constitutes a major public health concern. However, there is also evidence (Satterfield, Satterfield and Schell, 1987) that early identification and therapeutic interventions may decrease the risk for later delinquency. In the identification stage, vigilance (and other attention) tasks may be useful. Nuechterlein and Dawson (1984) have pointed to the clinical predictive value of a vigilance task, the CPT, within high-risk samples of clinical deviance. In several studies they have shown that this task was the most effective condition for isolating a disproportionately large group of high-risk children (from 5 years old on) for schizophrenia. They have stated that, in adults also, the CPT taps a deficit with particular relevance to schizophrenia (1984, p. 183) and may serve as a promising
indicator of vulnerability to schizophrenic disorder. CPT performance in ADHD children may manifest a different deficit (p. 191). More recently, Cornblatt et al. (1991) have also emphasized that attentional dysfunctions, as for example revealed by the CPT, have considerable potential as phenotypic markers of a genetic vulnerability to psychiatric disease: attention deficits, apparent when tested at an average age of 9 years, have been the most successful predictors of later behavioral disturbances such as schizophrenia, according to these authors. In clinical populations, vigilance tasks have been used for more than 30 years now as tools to assess attentional deficits. There is a more developed database of the CPT than of any other type of task in ADHD children, and authorities agree that the CPT (three versions of which are widely used) is very useful for diagnostic purposes. I do not share this opinion, because other clinical groups show the same deficits and for diagnosis we need specific deficits (Koelega, 1995b), but much more could be done. It is often unclear whether we are dealing with an attentional deficit, or with an inability to distinguish signals, an inability to focus on the task, or an unwillingness to participate. Note that ADHD children do not always show impaired sustained attention, i.e. in maintaining performance over time, but they often show a deficit in overall, absolute, level of performance, just as was the case with extroverts and fast habituators. 
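The distinction drawn above between a decrement over time and a deficit in overall level can be illustrated numerically. The following is a minimal sketch, assuming hit rates are available for successive time blocks of a task; the group labels, the hit rates and the choice of a least-squares slope as the index of decrement are hypothetical and serve only as illustration, not as data or methods from the studies cited.

```python
from typing import List, Tuple


def level_and_decrement(rates: List[float]) -> Tuple[float, float]:
    """Return (overall level, decrement) for hit rates per time block.

    Overall level is the mean hit rate across blocks; decrement is the
    least-squares slope of hit rate on block index (negative = decline).
    Requires at least two blocks.
    """
    n = len(rates)
    mean_rate = sum(rates) / n
    mean_block = (n - 1) / 2
    sxy = sum((i - mean_block) * (r - mean_rate) for i, r in enumerate(rates))
    sxx = sum((i - mean_block) ** 2 for i in range(n))
    return mean_rate, sxy / sxx


# Hypothetical hit rates over four successive blocks (not real data).
declining_group = [0.9, 0.8, 0.7, 0.6]   # higher level, clear decrement
low_level_group = [0.6, 0.6, 0.6, 0.6]   # lower level, no decrement

for label, rates in [("declining", declining_group), ("low level", low_level_group)]:
    level, slope = level_and_decrement(rates)
    print(label, round(level, 3), round(slope, 3))
```

In this sketch the second group shows a lower overall level without any decline over blocks, which is the pattern described in the text for groups with an impaired absolute level but intact maintenance of performance over time.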
There have as yet been no attempts to compare different vigilance tasks, imposing different processing demands, in these groups; for example, 'sensory' versus 'cognitive' (spatial versus verbal) stimuli, tasks of the Bakan type with more cognitive involvement (transformations, applying some constant mathematical rule or operation, such as 'three successive odd but unequal digits'), tasks of the Nuechterlein type, in which highly degraded (data-limited) visual stimuli are used, tasks with a dynamic component, involving eye movements, versus tasks with static stimuli, etc. Differential performance on these tasks could reveal more specific deficits in information processing, and the use of ERPs and specifically acting drugs could contribute to a further understanding of these impairments. Further, there are very few studies addressing the question of whether ADHD children are impaired in selective attention and flexibility of attention (the capability to switch), both important aspects of 'attention'. Designs in which the same ADHD subjects perform sustained, selective and divided attention tasks, as well as tasks tapping the capacity to switch attention, could also reveal more specific deficits in information processing. The groups should be matched with respect to age and IQ, or these factors should be included as covariates, since there is evidence that they are related to performance on attention tasks. Tasks loading memory, e.g. of the successive discrimination type or so-called 'missing stimulus' vigilance tasks, might be especially prone to differences between normal and ADHD children, when coupled with a high (externally paced) event rate. 
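The Bakan-type rule mentioned above ('three successive odd but unequal digits') can be made concrete with a small sketch. The Python function below is an illustration only and does not describe any published task software; the function name, the example digit stream and the reading of 'unequal' as 'all three digits differ from one another' are assumptions.

```python
from typing import List


def bakan_targets(digits: List[int]) -> List[int]:
    """Return the indices at which a critical signal is completed.

    A critical signal is assumed to be three successive digits that are
    all odd and all different from one another.
    """
    hits = []
    for i in range(2, len(digits)):
        a, b, c = digits[i - 2], digits[i - 1], digits[i]
        all_odd = a % 2 == 1 and b % 2 == 1 and c % 2 == 1
        all_different = len({a, b, c}) == 3
        if all_odd and all_different:
            hits.append(i)
    return hits


# Hypothetical digit stream; targets end at positions 3 (3, 5, 7) and 6 (7, 1, 9).
stream = [2, 3, 5, 7, 7, 1, 9, 4, 1, 3, 3]
print(bakan_targets(stream))  # prints [3, 6]
```

The sketch shows why such a task involves more than sensory discrimination: every decision depends on the two preceding digits, so the observer has to hold and update a short sequence continuously.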
There is a promising future for researchers in this area: the combination of attention tasks, ERP recordings and other techniques such as positron emission tomography (the ability to visualize changes in brain metabolism) and magnetic resonance imaging may allow localization of areas involved in 'attention', clarifying mechanisms of dysfunction. This combination has revealed a dysfunction of the mid-prefrontal region in the impaired vigilance of schizophrenics (Cohen et al., 1987), and also that part of the dysfunction was attenuated by neuroleptics (Cohen et al., 1988). The new brain imaging techniques have revealed a startling degree of region-specific activity (Posner, 1993). Studies of drugs are a useful supplement,
since stimulant drugs have been shown to improve behavior disorders markedly, even in adult ADHD, where marked improvement has been reported in a large proportion (up to about 60%) of the individuals (Zametkin and Borcherding, 1989). Clarification and identification of attentional dysfunction is one area of research that could convince the critics of laboratory vigilance research that the study of sustained attention may be relevant to society's problems.
REFERENCES
Adams, J. A. and Boulter, L. R. (1964). Spatial and temporal uncertainty as determinants of vigilance behavior. Journal of Experimental Psychology, 67, 127-131. Avolio, B. J., Alexander, R. A., Barrett, G. V. and Sterns, H. L. (1981). Designing a measure of visual selective attention to assess individual differences in information processing. Applied Psychological Measurement, 5, 29-42. Avolio, B. J., Kroeck, K. G. and Panek, D. E. (1985). Individual differences in information-processing ability as a predictor of motor vehicle accidents. Human Factors, 27, 577-587. Barrett, G. V., Alexander, R. A., Cellar, D., Doverspike, D. and Thomas, J. C. (1983). Use of an information-processing based test battery in an applied setting: Prediction of monitoring performance. Perceptual and Motor Skills, 56, 939-945. Barrett, G. V., Forbes, J. B., O'Connor, E. J. and Alexander, R. A. (1980). Ability-satisfaction relationships: Field and laboratory studies. Academy of Management Journal, 23, 550-555. Barroso, F. (1983). An approach to the study of attentional components in auditory tasks. Journal of Auditory Research, 23, 157-180. Beh, H. C. (1989). The effect of passive smoking on vigilance performance. Ergonomics, 32, 1227-1236. Berrien, F. K. (1946). The effects of noise. Psychological Bulletin, 43, 141-161. Bowyer, P. A., Humphreys, M. S. and Revelle, W. (1983). Arousal and recognition memory: The effects of impulsivity, caffeine and time on task. Personality and Individual Differences, 4, 41-49. Braune, R. and Wickens, C. D. (1986). Time-sharing revisited: Test of a componential model for the assessment of individual differences. Ergonomics, 29, 1399-1414. Broadbent, D. E. (1957). Effects of noise on behaviour. In C. M. Harris (Ed.), Handbook of Noise Control (pp. 1-23). New York: McGraw-Hill. Broadbent, D. E. (1971). Decision and Stress. London: Academic Press. Broadbent, D. E. (1979). Human performance and noise. In C. M. 
Harris (Ed.), Handbook of Noise Control (pp. 17.1-17.20). New York: McGraw-Hill. Buckner, D. N. (1963). An individual-difference approach to explaining vigilance performance. In D. N. Buckner and J. J. McGrath (Eds), Vigilance: A Symposium (pp. 171-183). New York: McGraw-Hill. Buckner, D. N. and McGrath, J. J. (Eds) (1963). Vigilance: A Symposium. New York: McGraw-Hill. Cahoon, R. L. (1970). Vigilance performance under hypoxia. Journal of Applied Psychology, 54, 479-483. Callaway, E. (1983). The pharmacology of human information processing. Psychophysiology, 20, 359-370. Cassel, E. E. and Dallenbach, K. M. (1918). The effect of auditory distraction upon the sensory reaction. American Journal of Psychology, 29, 129-143. Cellar, D., Barrett, G. V., Alexander, R., Doverspike, D., Thomas, J. C., Binning, J. F. and Kroeck, G. (1982). Cognitive information processing measures as predictors of monitoring performance. Perceptual and Motor Skills, 54, 1299-1302.
Chapanis, A. (1967). The relevance of laboratory studies to practical situations. Ergonomics, 10, 557-577. Claridge, G. (1967). Personality and Arousal. Oxford: Pergamon Press. Claridge, G. (1981). Arousal. In G. Underwood and R. Stevens (Eds), Aspects of Consciousness, vol. 2 (pp. 119-148). London: Academic Press. Claridge, G. (1987). Psychoticism and arousal. In J. Strelau and H. J. Eysenck (Eds), Personality Dimensions and Arousal (pp. 133-150). New York: Plenum. Coates, G. D., Adkins, C. J., Jr and Alluisi, E. A. (1975). Human performance and aircraft-type noise interactions. Journal of Auditory Research, 15, 197-207. Cohen, R. M., Semple, W. E., Gross, M., Nordahl, T. E., DeLisi, L. E., Holcomb, H. H., King, A. C., Morihisa, J. M. and Pickar, D. (1987). Dysfunction in a prefrontal substrate of sustained attention in schizophrenia. Life Sciences, 40, 2031-2039. Cohen, R. M., Semple, W. E., Gross, M., Nordahl, T. E., Holcomb, H. H., Dowling, M. S. and Pickar, D. (1988). The effect of neuroleptics on dysfunction in a prefrontal substrate of sustained attention in schizophrenia. Life Sciences, 43, 1141-1150. Colquhoun, W. P. and Edwards, R. S. (1975). Interaction of noise with alcohol on a task of sustained attention. Ergonomics, 18, 81-87. Cornblatt, B. A., Obuchowski, M. and Erlenmeyer-Kimling, M. A. L. (1991). Childhood attentional problems and adult psychopathology. Biological Psychiatry, 29, 165A. Craig, A. (1984). Human engineering: The control of vigilance. In J. S. Warm (Ed.), Sustained Attention in Human Performance (pp. 247-291). Chichester: Wiley. Davies, D. R., Jones, D. M. and Taylor, A. (1984). Selective- and sustained-attention tasks: Individual and group differences. In R. Parasuraman and D. R. Davies (Eds.), Varieties of Attention (pp. 395-447). Orlando, FL: Academic Press. Davies, D. R. and Parasuraman, R. (1982). The Psychology of Vigilance. London: Academic Press. Davies, D. R. and Tune, G. S. (1970). Human Vigilance Performance. 
London: Staples Press. Dejoy, D. M. (1984). The nonauditory effects of noise: Review and perspectives for research. Journal of Auditory Research, 24, 123-150. Dember, W. N. and Warm, J. S. (1979). Psychology of Perception. New York: Holt, Rinehart and Winston. Dittmar, M. L., Warm, J. S. and Dember, W. N. (1985). Effects of knowledge of results on performance in successive and simultaneous vigilance tasks: A signal detection analysis. In R. E. Eberts and C. G. Eberts (Eds), Trends in Ergonomics/Human Factors II (pp. 195-202). Amsterdam: North-Holland. Donchin, E. (1979). Event-related brain potentials: A tool in the study of human information processing. In H. Begleiter (Ed.), Evoked Brain Potentials and Behavior (pp. 13-88). New York: Plenum. Donchin, E. and Isreal, J. B. (1980). Event-related potentials and psychological theory. Progress in Brain Research, 54, 697-715. Dornic, S. (1990). Noise and Information Processing: Findings, Trends and Issues (Rep. 715). Stockholm University: Department of Psychology. Egeth, H. and Bevan, W. (1973). Attention. In B. Wolman (Ed.), Handbook of General Psychology (pp. 395-418). Englewood Cliffs, NJ: Prentice-Hall. Eilers, K., Hänecke, K., Peper, J. and Nachreiner, F. (1988). Time of day effects in vigilance performance. In J. P. Leonard (Ed.), Vigilance: Methods, Models and Regulation (pp. 181-190). Frankfurt: Peter Lang. Elliott, E. (1960). Perception and alertness. Ergonomics, 3, 357-364. Evans, G. W. and Tafalla, R. (1987). Measurement of environmental annoyance. In H. S. Koelega (Ed.), Environmental Annoyance: Characterization, Measurement and Control (pp. 11-27). Amsterdam: Elsevier. Eysenck, H. J. (1967). The Biological Basis of Personality. Springfield, IL: Charles C. Thomas. Eysenck, H. J. (1981). Introduction. In H. J. Eysenck (Ed.), A Model for Personality. Berlin: Springer.
Eysenck, M. W. (1982). Attention and Arousal. Berlin: Springer. Eysenck, M. W. (1988). Individual differences, arousal and monotonous work. In J. P. Leonard (Ed.), Vigilance: Methods, Models and Regulation (pp. 111-118). Frankfurt: Peter Lang. Fisk, A. D. (1985). Automatic and controlled processing approach to interpreting vigilance performance. Proceedings of the 29th Annual Meeting of the Human Factors Society, 14-18. Flach, J. M. (1990). The ecology of human-machine systems. I: Introduction. Ecological Psychology, 2, 191-205. Fleishman, E. A. (1975). Toward a taxonomy of human performance. American Psychologist, 30, 1127-1149. Fleishman, E. A. and Quaintance, M. K. (1984). Taxonomies of Human Performance. Orlando, FL: Academic Press. Forbes, J. B. and Barrett, G. V. (1978). Individual abilities and task demands in relation to performance and satisfaction on two repetitive monitoring tasks. Journal of Applied Psychology, 63, 188-196. Fox, J. G. (1975). Vigilance and arousal: A key to maintaining inspectors' performance. In C.G. Drury and J. G. Fox (Eds), Human Reliability in Quality Control (pp. 89-96). London: Taylor and Francis. Friedman, D., Vaughan, H. G. and Erlenmeyer-Kimling, L. (1981). Multiple late positive potentials in two visual discrimination tasks. Psychophysiology, 18, 635-649. Gale, A. (1987). Arousal, control, energetics and values. In J. Strelau and H. J. Eysenck (Eds), Personality Dimensions and Arousal (pp. 287-316). New York: Plenum. Gale, A. and Edwards, J. A. (1986). Individual differences. In M. G. H. Coles, E. Donchin and S. W. Porges (Eds), Psychophysiology: Systems, Processes and Applications (pp. 431-507). New York: Guilford Press. Gallwet, T. J. (1982). Selection tests for visual inspectors in a multiple fault type task. Proceedings of the Annual Conference of the Ergonomic Society (Abstract). Gillespie, C. R. and Eysenck, M. W. (1980). Effects of introversion-extraversion on continuous recognition memory. 
Bulletin of the Psychonomic Society, 15, 233-235. Gittelman, R. (1983). Experimental and clinical studies of stimulant use in hyperactive children and children with other behavioral disorders. In I. Creese (Ed.), Stimulants: Neurochemical, Behavioral and Clinical Perspectives (pp. 205-226). New York: Raven Press. Gluckman, J. P., Dember, W. N. and Warm, J. S. (1988). Capacity demand in dual-task monitoring of simultaneous and successive vigilance tasks. Proceedings of the 32nd Annual Meeting of the Human Factors Society, 1463-1465. Goldstein, I. L., Johnston, W. A. and Howell, W. C. (1969). Complex vigilance: Relevant and irrelevant signals. Journal of Applied Psychology, 53, 45-48. Gopher, D. (1982). A selective attention test as a predictor of success in flight training. Human Factors, 24, 173-183. Gopher, D. and Kahneman, D. (1971). Individual differences in attention and the prediction of flight criteria. Perceptual and Motor Skills, 33, 1335-1342. Gualtieri, C. T. and Hicks, R. E. (1985). Neuropharmacology of methylphenidate and a neural substrate for childhood hyperactivity. Psychiatric Clinics of North America, 8, 875-892. Gustafson, R. (1986). Alcohol and vigilance performance: The effect of small doses of alcohol on simple visual reaction time. Perceptual and Motor Skills, 62, 951-955. Halperin, J. M., Wolf, L. E., Pascualvaca, D. M., Newcorn, J. H., Healey, J. M., O'Brien, J. D., Morganstein, A. and Young, J. G. (1988). Differential assessment of attention and impulsivity in children. Journal of the American Academy of Child and Adolescent Psychiatry, 27, 326-329. Hamilton, P. and Copeman, A. (1970). The effect of alcohol and noise on components of a tracking and monitoring task. British Journal of Psychology, 61, 149-156. Hamilton, P., Hockey, B. and Rejman, M. (1977). The place of the concept of activation in human information processing theory: An integrative approach. In S. Dornič (Ed.), Attention and Performance VI (pp. 463-486). 
Hillsdale, NJ: Erlbaum.
Hancock, P. A. (1984). Environmental stressors. In J. S. Warm (Ed.), Sustained Attention in Human Performance (pp. 103-142). Chichester: Wiley. Hancock, P. A. and Warm, J. S. (1989). A dynamic model of stress and sustained attention. Human Factors, 31, 519-537. Hillyard, S. A. (1981). Selective auditory attention and early event-related potentials: A rejoinder. Canadian Journal of Psychology, 35, 159-174. Hillyard, S. A. and Picton, T. W. (1979). Event-related brain potentials and selective information processing in man. In J. E. Desmedt (Ed.), Cognitive Components in Cerebral Event-Related Potentials and Selective Attention (pp. 1-52). Basel: Karger. Hink, R. F., Fenton, W. H., Tinklenberg, J. R., Pfefferbaum, A. and Kopell, B. S. (1978). Vigilance and human attention under conditions of methylphenidate and secobarbital intoxication: An assessment using brain potentials. Psychophysiology, 15, 116-125. Hockey, G. R. J. (1986a). Temperament differences in vigilance performance as a function of variations in the suitability of ambient noise level. In J. Strelau, F. H. Farley and A. Gale (Eds), The Biological Bases of Personality and Behavior, vol. 2 (pp. 163-171). London: McGraw-Hill. Hockey, G. R. J. (1986b). A state control theory of adaptation to stress and individual differences in stress management. In G. R. J. Hockey, A. W. K. Gaillard and M. G. H. Coles (Eds), Energetics and Human Information Processing (pp. 285-298). Dordrecht: Martinus Nijhoff. Hockey, G. R. J., Coles, M. G. H. and Gaillard, A. W. K. (1986). Energetical issues in research on human information processing. In G. R. J. Hockey, A. W. K. Gaillard and M. G. H. Coles (Eds), Energetics and Human Information Processing (pp. 3-21). Dordrecht: Martinus Nijhoff. Hockey, R. (1979). Stress and the cognitive components of skilled performance. In V. Hamilton and D. M. Warburton (Eds), Human Stress and Cognition (pp. 141-177). Chichester: Wiley. Howell, W. C., Johnston, W. A. and Goldstein, I. L. (1966). 
Complex monitoring and its relation to the classical problem of vigilance. Organizational Behavior and Human Performance, 1, 129-150. Humphreys, M. S. and Revelle, W. (1984). Personality, motivation and performance: A theory of the relationship between individual differences and information processing. Psychological Review, 91, 153-184. Jerison, H. J. (1963). On the decrement function in human vigilance. In D. N. Buckner and J. J. McGrath (Eds), Vigilance: A Symposium (pp. 199-216). New York: McGraw-Hill. Jerison, H. J. (1970). Vigilance, discrimination and attention. In D. I. Mostofsky (Ed.), Attention: Contemporary Theory and Analysis (pp. 127-147). New York: Appleton-Century-Crofts. Johnston, W. A., Howell, W. C. and Williges, R. C. (1969). The components of complex monitoring. Organizational Behavior and Human Performance, 4, 112-124. Jones, D. M., Smith, A. P. and Broadbent, D. E. (1979). Effects of moderate intensity noise on the Bakan vigilance task. Journal of Applied Psychology, 64, 627-634. Kahneman, D. (1973). Attention and Effort. Englewood Cliffs, NJ: Prentice-Hall. Kahneman, D., Ben-Ishai, R. and Lotan, M. (1973). Relation of a test of attention to road accidents. Journal of Applied Psychology, 58, 113-115. Kibler, A. W. (1965). The relevance of vigilance research to aerospace monitoring tasks. Human Factors, 7, 93-99. Kitchin, J. B. and Graham, A. (1961). Mental loading of process operators: An attempt to devise a method of analysis and assessment. Ergonomics, 4, 1-15. Klein, R. G. and Mannuzza, S. (1991). Long-term outcome of hyperactive children: A review. Journal of the American Academy of Child and Adolescent Psychiatry, 30, 383-387. Koelega, H. S. (1987). Introduction: Environmental annoyance. In H. S. Koelega (Ed.), Environmental Annoyance: Characterization, Measurement and Control (pp. 1-7). Amsterdam: Elsevier.
Koelega, H. S. (1989). Benzodiazepines and vigilance performance: A review. Psychopharmacology, 98, 145-156. Koelega, H. S. (1990). Vigilance performance: A review of electrodermal predictors. Perceptual and Motor Skills, 70, 1011-1029. Koelega, H. S. (1991). The Effects of Alcohol and Stimulant Drugs on Vigilance Performance. University of Utrecht. Koelega, H. S. (1992). Extraversion and vigilance performance: 30 years of inconsistencies. Psychological Bulletin, 112, 239-258. Koelega, H. S. (1993). Stimulant drugs and vigilance performance: A review. Psychopharmacology, 111, 1-16. Koelega, H. S. (1995a). Alcohol and vigilance performance: A review. Psychopharmacology, 118, 233-249. Koelega, H. S. (1995b). Is the continuous performance task useful in research with ADHD children? Comments on a review. Journal of Child Psychology and Psychiatry, 36, 1477-1485. Koelega, H. S. and Brinkman, J. A. (1986). Noise and vigilance: An evaluative review. Human Factors, 28, 465-481. Koelega, H. S., Brinkman, J. A. and Bergman, H. (1986). No effect of noise on vigilance performance? Human Factors, 28, 581-593. Koelega, H. S., Brinkman, J. A., Hendriks, L. and Verbaten, M. N. (1989). Processing demands, effort and individual differences in four different vigilance tasks. Human Factors, 31, 45-62. Koelega, H. S., Brinkman, J. A., Zwep, B. and Verbaten, M. N. (1990). Dynamic vs. static stimuli in their effect on visual vigilance performance. Perceptual and Motor Skills, 70, 823-831. Koelega, H. S. and Verbaten, M. N. (1991). Event-related brain potentials and vigilance performance: Dissociations abound. A review. Perceptual and Motor Skills, 72, 971-982. Koelega, H. S., Verbaten, M. N., van Leeuwen, Th. H., Kenemans, J. L., Kemner, C. and Sjouw, W. (1992). Time effects on event-related brain potentials and vigilance performance. Biological Psychology, 34, 59-86. Kryter, K. D. (1970). The Effects of Noise on Man. New York: Academic Press. Lacey, J. I. (1967). 
Somatic response patterning and stress: Some revisions of activation theory. In M. H. Appley and R. Trumbull (Eds), Psychological Stress. New York: Appleton-Century-Crofts. Lane, D. M. (1982). Limited capacity, attention allocation and productivity. In E. A. Fleishman (Ed.), Human Performance and Productivity, vol. 2 (pp. 121-156). Hillsdale, NJ: Erlbaum. Lansman, M., Poltrock, S. E. and Hunt, E. (1983). Individual differences in the ability to focus and divide attention. Intelligence, 7, 299-312. Lanzetta, T. M., Warm, J. S., Dember, W. N. and Berch, D. B. (1985). Information processing load and the event rate function in sustained attention. Proceedings of the 29th Annual Meeting of the Human Factors Society, 1084-1088. Levine, J. M., Romashko, T. and Fleishman, E. A. (1973). Evaluation of an abilities classification system for integrating and generalizing human performance research findings: An application to vigilance tasks. Journal of Applied Psychology, 58, 149-157. Loeb, M. (1980). Noise and performance: Do we know more now? In J. V. Tobias, G. Jansen and W. D. Ward (Eds), Proceedings of the Third International Congress on Noise as a Public Health Problem (pp. 303-321). Rockville, MD: The American SLH Association. Loeb, M. and Alluisi, E. A. (1977). An update of findings regarding vigilance and a reconsideration of underlying mechanisms. In R. R. Mackie (Ed.), Vigilance: Theory, Operational Performance and Physiological Mechanisms (pp. 719-749). New York: Plenum. Loeb, M. and Alluisi, E. A. (1984). Theories of vigilance. In J. S. Warm (Ed.), Sustained Attention in Human Performance (pp. 179-205). Chichester: Wiley.
Logan, G. D. (1988). Automaticity, resources and memory: Theoretical controversies and practical implications. Human Factors, 30, 583-598. Lysaght, R. J., Warm, J. S., Dember, W. N. and Loeb, M. (1984). Effects of noise and information-processing demand on vigilance performance in men and women. In A. Mital (Ed.), Trends in Ergonomics/Human Factors I (pp. 27-32). Amsterdam: North-Holland. Mackie, R. R. (Ed.) (1977). Vigilance: Theory, Operational Performance and Physiological Correlates. New York: Plenum. Mackie, R. R. (1984). Research relevance and the information glut. In F. A. Muckler (Ed.), Human Factors Review: 1984 (pp. 1-11). Santa Monica, CA: Human Factors Society. Mackie, R. R. (1987). Vigilance research. Are we ready for countermeasures? Human Factors, 29, 707-723. Mackworth, J. F. (1970). Vigilance and Attention. A Signal Detection Approach. Harmondsworth: Penguin. Mackworth, J. F. and Taylor, M. M. (1963). The d' measure of signal detectability in vigilance-like situations. Canadian Journal of Psychology, 17, 302-325. Mackworth, N. H. (1948). The breakdown of vigilance during prolonged visual search. Quarterly Journal of Experimental Psychology, 1, 6-21. Mackworth, N. H. (1950). Researches on the measurement of human performance. MRC Special Report No. 268. Reprinted in H. W. Sinaiko (Ed.), Selected Papers on Human Factors in the Design and Use of Control Systems. New York: Dover, 1961. Mackworth, N. H. (1957). Some factors affecting vigilance. The Advancement of Science, 53, 389-393. Mannuzza, S., Klein, R. G., Bonagura, N., Malloy, P., Giampino, T. L. and Addalli, K. A. (1991). Hyperactive boys almost grown up. Archives of General Psychiatry, 48, 77-83. Matthews, G. (1985). The effects of extraversion and arousal on intelligence test performance. British Journal of Psychology, 76, 479-493. McGrath, J. J. (1963). Some problems of definition and criteria in the study of vigilance performance. In D. N. Buckner and J. J. 
McGrath (Eds), Vigilance: A Symposium (pp. 227246). New York: McGraw-Hill. Meister, D. (1984). Letter. Human Factors Society Bulletin, 27, 2. Miller, J. C., Takamoto, N. Y., Bartel, G. M. and Brown, M. D. (1985). Psychophysiological correlates of long-term attention to complex tasks. Behavior Research Methods, Instruments and Computers, 17, 186-190. Mirabella, A. and Goldstein, D. A. (1967). The effects of ambient noise upon signal detection. Human Factors, 9, 277-284. Mirsky, A. F. (1987). Behavioral and psychophysiological markers of disordered attention. Environmental Health Perspectives, 74, 191-199. Mirsky, A. F. and Orren, M. M. (1977). Attention. Advances in Biochemical Psychopharmacology, 17, 233-267. Moore, S. F. and Gross, S. J. (1973). Influence of critical signal regularity, stimulus event matrix and cognitive style on vigilance performance. Journal of Experimental Psychology, 99, 137-139. Moray, N. (1984). Attention to dynamic visual displays in man-machine systems. In R. Parasuraman and D. R. Davies (Eds), Varieties of Attention (pp. 485-513). London: Academic Press. Moses, J. L. (1970). Selecting vigilant types: Predicting vigilance performance by means of a field dependence test. Experimental Publication Systems, 4, Ms. no. 151 B. N~i~it~inen, R. (1986). A classification of N2 kinds of ERP components. In W. C. McCallum, R. Zappoli and F. Denoth (Eds), Cerebral Psychophysiology: Studies in Event-Related Potentials (EEG Suppl. 38) (pp. 169-172). Amsterdam: Elsevier. N~i~it~inen, R. and Picton, T. (1987). The N1 wave of the human electric and magnetic response to sound: A review and an analysis of the component structure. Psychophysiology, 24, 375-425.
Nachreiner, F. (1977). Experiments on the validity of vigilance experiments. In R. R. Mackie (Ed.), Vigilance: Theory, Operational Performance and Physiological Mechanisms (pp. 665-678). New York: Plenum. Neiss, R. (1990). Ending arousal's reign of error: A reply to Anderson. Psychological Bulletin, 107, 101-105. Nuechterlein, K. H. and Dawson, M. E. (1984). Information processing and attentional functioning in the development course of schizophrenic disorders. Schizophrenia Bulletin, 10, 160-203. O'Connor, K. (1983). Individual differences in components of slow cortical potentials: Implications for models of information processing. Personality and Individual Differences, 4, 403-410. O'Gorman, J. G. and Lloyd, E. M. (1988). Electrodermal lability and dichotic listening. Psychophysiology, 25, 538-546. O'Hanlon, J. F. (1965). Adrenaline and noradrenaline: Relation to performance in a visual vigilance task. Science, 150, 507-509. O'Hanlon, J. F. (1991). Human factors in psychoactive drug development: A new challenge and opportunity. Human Factors Society Bulletin, 34(4), 1-3. O'Hanlon, J. F, Fussler, C., Sancin, E. and Grandjean, E. P. (1978). Efficacy of an ACTH 4-9 analog, relative to that of a standard drug (d-amphetamine), for blocking the 'vigilance decrement' in men. Reports Organon International Netherlands. Parasuraman, R. (1979). Memory load and event rate control sensitivity decrements in sustained attention. Science, 205, 924-927. Parasuraman, R. and Davies, D. R. (1977). A taxonomic analysis of vigilance performance. In R. R. Mackie (Ed.) Vigilance: Theory, Operational Performance and Physiological Correlates (pp. 559-574). New York: Plenum. Parasuraman, R. and Mouloua, M. (1987). Interaction of signal discriminability and task type in vigilance decrement. Perception and Psychophysics, 41, 17-22. Parasuraman, R. and Nestor, P. (1986). Energetics of attention and Alzheimer's disease. In G. R. J. Hockey, A. W. K. Gaillard and M. G. H. 
Coles (Eds), Energetics and Human Information Processing (pp. 395-407). Dordrecht: Martinus Nijhoff. Payne, R. B. and Hauty, G. T. (1954). The effects of experimentally induced attitudes upon task proficiency. Journal of Experimental Psychology, 47, 267-273. Picton, T. W. (1992). The P300 wave of the human event-related potential. Journal of Clinical Neurophysiology, 9, 456-479. Picton, T. W., Campbell, K. B., Baribeau-Braun, J. and Proulx, G. B. (1978). The neurophysiology of human attention: A tutorial review. In J. Requin (Ed.), Attention and Performance VII (pp. 429-467). Hillsdale, NJ: Erlbaum. Plutchik, R. (1959). The effects of high-intensity intermittent sound on performance, feeling, and physiology. Psychological Bulletin, 56, 133-151. Posner, M. I. (1975). Psychobiology of attention. In M. S. Gazzaniga and C. Blakemore (Eds), Handbook of Psychobiology (pp. 441-480). New York: Academic Press. Posner, M. I. (1978). Chronometric Explorations of Mind. Hillsdale, NJ: Erlbaum. Posner, M. I. (1993). Seeing the mind. Science, 262, 673-674. Posner, M. I. and Boies, S. J. (1971). Components of attention. Psychological Review, 78, 391-408. Posner, M. I., Inhoff, A. W., Friedrich, F. J. and Cohen, A. (1987). Isolating attentional systems: A cognitive-anatomical analysis. Psychobiology, 15, 107-121. Posner, M. I. and Petersen, S. E. (1990). The attention system of the human brain. Annual Review of Neuroscience, 13, 25-42. Posner, M. I. and Rothbart, M. K. (1980). Attentional mechanisms. Nebraska Symposium on Motivation, 28, 1-52.
Pribram, K. H. and McGuinness, D. (1975). Arousal, activation, and effort in the control of attention. Psychological Review, 82, 116-149. Revelle, W., Humphreys, M. S., Simon, L. and Gilliland, K. (1980). The interactive effect of personality, time of day and caffeine: A test of the arousal model. Journal of Experimental Psychology: General, 109, 1-31. Roskam, E. E. (1990). Formalized theory and the explanation of empirical phenomena. In J. J. Hox and J. de Jong-Gierveld (Eds), Operationalization and Research Strategy (pp. 179-198). Amsterdam: Swets and Zeitlinger. Routtenberg, A. (1968). The two-arousal hypothesis: Reticular formation and limbic system. Psychological Review, 75, 51-80. Sack, S. A. and Rice, C. E. (1974). Selectivity, resistance to distraction and shifting as three attentional factors. Psychological Reports, 34, 1003-1012. Saletu, B., Frey, R. and Grünberger, J. (1989). [Traffic noise and sleep]. Wiener Medizinische Wochenschrift, 139, 257-263 (English abstract). Sanders, A. F. (1986). Energetical states underlying task performance. In G. R. J. Hockey, A. W. K. Gaillard and M. G. H. Coles (Eds), Energetics and Human Information Processing (pp. 139-154). Dordrecht: Martinus Nijhoff. Sanders, M. G. (1973). Personality variables as predictors of performance on a prolonged monitoring task. Unpublished PhD thesis, Texas Tech. University. Satterfield, J. H., Satterfield, B. T. and Schell, A. M. (1987). Therapeutic interventions to prevent delinquency in hyperactive boys. Journal of the American Academy of Child and Adolescent Psychiatry, 26, 56-64. See, J. E., Howe, S. R., Warm, J. S. and Dember, W. N. (1995). Meta-analysis of the sensitivity decrement in vigilance. Psychological Bulletin, 117, 230-249. Simon, C. H. (1976). Analysis of Human Factors Engineering Experiments. Canyon Research Group: Technical Report CWS-02-76. Simon, C. W. (1987). Will egg-sucking ever become a science? Human Factors Society Bulletin, 30, 1-4. Smith, B.
D., Rockwell-Tischer, S. and Davidson, R. (1986). Extraversion and arousal: Effects of attentional conditions on electrodermal activity. Personality and Individual Differences, 7, 293-303. Smith, B. D., Wilson, R. J. and Davidson, R. (1984). Electrodermal activity and extraversion: Caffeine, preparatory signal and stimulus intensity effects. Personality and Individual Differences, 5, 59-65. Smith, M. J., Mackie, R. R. and Wylie, C. D. (1985). Stress and sonar operations: Concerns and research methodology. Proceedings of the 29th Annual Meeting of the Human Factors Society, 452-456. Smith, R. L. and Lucaccini, L. F. (1969). Vigilance research: Its application to industrial problems. Human Factors, 11, 149-156. Strauss, J., Lewis, J. L., Klorman, R., Peloquin, L. J., Perlmutter, R. A. and Salzman, L. F. (1984). Effects of methylphenidate on young adults' performance and event-related potentials in a vigilance and paired-associates learning test. Psychophysiology, 21, 609-621. Strelau, J. (1983). A regulative theory of temperament. Australian Journal of Psychology, 35, 305-317. Stroh, C. M. (1971). Vigilance: The Problem of Sustained Attention. Oxford: Pergamon Press. Teichner, W. H. (1974). The detection of a simple visual signal as a function of time of watch. Human Factors, 16, 339-353. Teichner, W. H., Arees, E. and Reilly, R. (1963). Noise and human performance, a psychophysiological approach. Ergonomics, 6, 83-97. Thackray, R. I. and Touchstone, R. M. (1989). Effects of high visual taskload on the behaviours involved in complex monitoring. Ergonomics, 32, 27-38.
Thayer, R. E. (1989). The Biopsychology of Mood and Arousal. Oxford: Oxford University Press. Tucker, D. M. and Williamson, P. A. (1984). Asymmetric neural control systems in human self-regulation. Psychological Review, 91, 185-215. Vanderwolf, C. H. and Robinson, T. E. (1981). Reticulo-cortical activity and behavior: A critique of the arousal theory and a new synthesis. Behavioral and Brain Sciences, 4, 459-514. van Laar, M. W. (1988). A Methodical Comparison of the Residual Effects of Lormetazepam 1 mg and Oxazepam 50 mg upon Simulated and On-the-road Driving Performance. Report. Utrecht: Netherlands Institute for Drugs and Doping Research. van Leeuwen, T. H., Verbaten, M. N., Koelega, H. S., Camfferman, G., van der Gugten, J. and Slangen, J. L. (1994). Effects of oxazepam on eye movements and performance in vigilance tasks with static and dynamic stimuli. Psychopharmacology, 114, 109-118. van Leeuwen, T. H., Verbaten, M. N., Koelega, H. S., Camfferman, G., van der Gugten, J. and Slangen, J. L. (1996). Effects of oxazepam on EEG frequency bands, event-related brain potentials, and vigilance performance. Psychopharmacology (in press). Vervaeck, K., Deboeck, M., Hueting, J. and Soetens, E. (1982). Traces of fatigue in an attention dual task. Bulletin of the Psychonomic Society, 19, 151-154. Vroon, P. A. (1990). [Psychological aspects of sick buildings]. Utrecht: ISOR. Waag, W. L. (1971). The prediction of individual differences in monitoring performance. Unpublished PhD thesis, Texas Tech. University. Warm, J. S. (1977). Psychological processes in sustained attention. In R. R. Mackie (Ed.), Vigilance: Theory, Operational Performance and Physiological Correlates (pp. 623-644). New York: Plenum. Warm, J. S. (1984a). An introduction to vigilance. In J. S. Warm (Ed.), Sustained Attention in Human Performance (pp. 1-14). Chichester: Wiley. Warm, J. S. (Ed.) (1984b). Sustained Attention in Human Performance. Chichester: Wiley. Warm, J. S., Chin, K., Dittmar, M. L. and Dember, W. N. (1988). Effects of head restraint on signal detectability in simultaneous and successive vigilance tasks. Journal of General Psychology, 114, 423-431. Warm, J. S., Dember, W. N., Lanzetta, T. M., Bowers, J. C. and Lysaght, R. J. (1985). Information processing in vigilance performance: Complexity revisited. Proceedings of the 29th Annual Meeting of the Human Factors Society, 19-23. Warm, J. S., Rosa, R. R. and Colligan, M. J. (1989). Effects of auxiliary load on vigilance performance in a simulated work environment. Proceedings of the 33rd Annual Meeting of The Human Factors Society, 1419-1421. Welch, J. (1898). On the measurement of mental activity through muscular activity and the determination of a construct of attention. American Journal of Physiology, 1, 288-396. Wesnes, K. and Warburton, D. M. (1983). Effects of smoking on rapid information processing performance. Neuropsychobiology, 9, 223-229. Wickens, C. D. (1984). Engineering Psychology and Human Performance. Columbus, OH: Merrill. Wiener, E. L. (1977). Controlled flight into terrain accidents: System-induced errors. Human Factors, 19, 171-181. Wiener, E. L. (1980). Midair collisions: The accidents, the systems and the Realpolitik. Human Factors, 22, 521-533. Wiener, E. L. (1984). Vigilance and inspection. In J. S. Warm (Ed.), Sustained Attention in Human Performance (pp. 207-246). Chichester: Wiley. Wiener, E. L. (1987). Application of vigilance research: Rare, medium, or well done? Human Factors, 29, 725-736. Wiener, E. L. (1989). Reflections on human error: Matters of life and death. Proceedings of the 33rd Annual Meeting of the Human Factors Society, 1-7. Wiener, E. L., Curry, R. E. and Faustina, M. L. (1984). Vigilance and task load: In search of the inverted U. Human Factors, 26, 215-222. Williges, R. C. (1971). The role of payoffs and signal ratios in criterion changes during a monitoring task. Human Factors, 13, 261-267.
Wyatt, S. and Langdon, J. N. (1932). Inspection Processes in Industry. Industry Health Research Board Report No. 63. London: HMSO. Wylie, C. D., Mackie, R. R. and Smith, M. J. (1985). Comparative effects of 19 stressors on task performance: Major results of the operator survey. Proceedings of the 29th Annual Meeting of the Human Factors Society, 457-461. Yerkes, R. M. and Dodson, J. D. (1908). The relation of strength of stimulus to rapidity of habit-formation. Journal of Comparative Neurology and Psychology, 18, 459-482. Zametkin, A. J. and Borcherding, B. G. (1989). The neuropharmacology of attention-deficit hyperactivity disorder. Annual Review of Medicine, 40, 447-451.
Chapter 9
Brain Potential Analysis of Selective Attention
A. A. Wijers, G. Mulder, Th. C. Gunter and H. G. O. M. Smid
University of Groningen, The Netherlands
Selective mechanisms regulate which information has most impact on the behavior of the organism. Selectivity prevents the organism from responding reflexively to its environment: it enables flexibility of behavior. Given that there is selectivity, one has to address the following questions. First, what constitutes an 'attentional channel'? This question is concerned with a characterization of the range of stimuli that is selectively processed. The second question is by which mechanisms attentional channels are set up, maintained and changed. These mechanisms are sometimes under active voluntary control of the subject, but they can also be driven externally and automatically (Jonides, 1980). The third question is in which respects attention influences the processing of a source of information once this source has been selected. Selective attention can be investigated by studying performance in simple tasks as a function of manipulations of task variables. In this 'mental chronometric' approach (Posner, 1978), one attempts to deduce the nature and timing of covert information processing from the discrete end-products of this processing (i.e. the overt behavioral responses, e.g. reaction times). Several different approaches for such inferential procedures have been developed (Coles, 1989; Meyer et al., 1988). However, these approaches critically depend on the validity of several basic assumptions, most of which have been challenged (e.g. the assumption of pure insertion for the subtraction method and the assumption of constant stage output for the additive factors method). In this chapter we will review research in which selective attention is investigated with the aid of event-related potentials of the brain (ERPs). In contrast to behavioral measures, ERPs are continuous, millisecond-by-millisecond, online reflections of information processing in the brain.
By combining mental chronometry with ERP analysis, an attempt has been made to identify particular ERP components as reliable and valid markers of specific aspects or stages of information processing (Hillyard and Kutas, 1983). Once this has been established, ERPs can serve as 'windows on cognition' (Coles, 1989), enabling a more direct view on the sensory, cognitive and motor processes underlying behavior. ERPs can clarify the timing, ordering and interactions of the intermediate processes that are engaged in specific cognitive activities (Hillyard and Kutas, 1983). Importantly, ERPs can be obtained to stimuli that have not been responded to overtly. Another aspect of ERPs is that they can also serve as 'windows on the brain' (Coles, 1989). With source localization techniques the generators of ERP components can be localized in the brain; in this way insight can be obtained into how the brain performs cognitive functions. In our opinion these two aspects, ERPs as windows on cognition and as windows on the brain, are intimately intertwined. If the manipulation of a task variable is associated with a particular ERP component, this component is not necessarily a direct manifestation of the manipulated process. Alternatively, the ERP could reflect the consequence of the manipulated process (e.g. awareness of the outcome of the process) or a separate mechanism only indirectly related to the process (e.g. arousal) (Gaillard, 1988). In addition, exactly which mental processes are manipulated by particular task variables may not be clear at all. Therefore, the interpretation of ERP components may be influenced profoundly by knowledge about the underlying brain mechanisms. In the remaining sections we will see that ERP components reliably reflect the differential processing of attended and unattended information. ERP markers of selective processing have contributed important evidence with regard to central questions in the study of selective attention. A traditional issue is the question whether attention influences early or late stages of processing. With ERPs we can determine the exact latency after stimulus presentation at which processing starts to be modulated by attention, and we can investigate how this latency varies as a function of input modality and stimulus discriminability. ERP research has disclosed important characteristics of attentional channels, such as their bandwidths, and how attentional resources are allocated within attentional channels and between multiple attentional channels in divided attention.

Handbook of Perception and Action, Volume 3
Copyright © 1996 Academic Press Ltd. All rights of reproduction in any form reserved.
ISBN 0-12-516163-8
An important organizational aspect of selective attention concerns whether the different stimulus attributes of perceptual objects are attended separately (analytic processing) or whether the combination of attributes is attended as a whole (holistic processing). It appears that such different modes of processing can be more clearly distinguished on the basis of ERP analysis. By recording ERPs to unattended stimuli, direct evidence can be obtained about the level of processing attained by these stimuli. Approaches have been developed to determine whether particular mental operations (memory search, mental rotation, language processing) are under voluntary control and can selectively be confined to attended stimuli. Voluntary control is an important aspect in the distinction between automatic and controlled modes of processing. Another aspect is that some stimuli may automatically capture attention. This phenomenon can also be studied by recording ERPs to irrelevant stimuli. The available evidence will be reviewed in sections 3 and 8. The lateralized readiness potential (LRP) has proved to be an extremely useful marker of response activation. A central element of the 'continuous flow conception' of information processing (Eriksen and Schultz, 1979) is that irrelevant stimulus aspects may fail to be ignored and activate inappropriate responses. Such response competition conflicts may hamper task performance. With LRPs it has become possible to determine in which situations these limitations of selective attention occur, and to follow closely the time-course of response competition conflicts.
1 ERPs AND THEIR NEURAL BASES
ERPs are small phasic brain potentials that are time-locked to the occurrence of concrete events (e.g. sensory, cognitive or motor events), and which are superimposed on the spontaneous, ongoing background electroencephalogram (EEG). The ERP is separated from this (often larger) background EEG by averaging stimulus-synchronized EEG epochs over several repetitions of the same event. ERPs consist of a series of peaks characterized by their polarity (positive or negative) and latency. The ERP is a far-field reflection of patterned neural activity associated with informational transactions in the brain (Hillyard and Kutas, 1983). It is generally accepted that ERPs are generated by the currents that result from neuronal action potentials and from synaptic contacts between neurons in the nervous system. For both of these phenomena there is intracellular current flow along the axis of the cell prolongations, which is balanced by an equal extracellular current flow in the opposite direction. Postsynaptic currents are diffusive responses extending over relatively long time-scales (several hundred milliseconds); the currents of action potentials are propagating responses with a time-scale in the order of 10 ms (Williamson and Kaufman, 1990). Because of these different time courses, it is generally believed that the middle and long-latency ERP components reflect mainly synaptic cortical activity; early ERP components, on the other hand, may represent synchronous volleys of action potentials in the peripheral sensory pathways and in subcortical structures. For example, the waves in the auditory evoked potential occurring within the first 10 ms after stimulus delivery (waves I-VI) are known to be related to activity in the auditory nerve and relay nuclei in the auditory brain stem.
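The averaging step described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the epoch length, the Gaussian-shaped deflection, the noise level and the number of repetitions are hypothetical values chosen for illustration, and the function names `average_epochs` and `simulate_epoch` are not taken from the chapter.

```python
import math
import random


def average_epochs(epochs):
    """Average stimulus-locked epochs sample by sample."""
    n_epochs = len(epochs)
    n_samples = len(epochs[0])
    return [sum(epoch[i] for epoch in epochs) / n_epochs for i in range(n_samples)]


def simulate_epoch(rng, n_samples=100, peak_sample=40, peak_uv=5.0, noise_uv=10.0):
    """One hypothetical epoch: a small time-locked deflection plus larger random noise.

    The noise stands in for the background EEG; all values are assumptions.
    """
    epoch = []
    for i in range(n_samples):
        signal = peak_uv * math.exp(-((i - peak_sample) ** 2) / (2 * 5.0 ** 2))
        epoch.append(signal + rng.gauss(0.0, noise_uv))
    return epoch


rng = random.Random(0)  # fixed seed so the sketch is repeatable
epochs = [simulate_epoch(rng) for _ in range(400)]
erp = average_epochs(epochs)

# Locate the largest positive deflection in the averaged waveform.
peak_index = max(range(len(erp)), key=lambda i: erp[i])
print("peak sample in the averaged waveform:", peak_index)
```

Because the simulated noise is not time-locked to the event, it shrinks roughly with the square root of the number of averaged epochs, whereas the time-locked deflection is preserved. Real background EEG is not white noise, so the improvement in practice may be smaller than in this sketch.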
The scalp-recorded ERPs (and also EEGs) are the summation of the individual electrical fields generated by large groups of active neurons (5000 or more; Lopes da Silva and van Rotterdam, 1987, cited by Peters and de Munck, 1990). The activity of groups of neurons is only measurable at a distance when the individual elements are activated synchronously. In addition, the electrical field is dependent on the spatial arrangement of the neurons. A 'closed field' cell geometry produces only local currents, whereas an 'open field' geometry produces current flow extending beyond the activated area. An example of an open field geometry is the arrangement of pyramidal cells in primary sensory cortical regions, which receive input from thalamocortical neurons in layer IV and are aligned with their dendrites oriented toward the surface of the cortex. The distribution of the electrical field across the scalp is completely determined by the properties of the electrical sources in the brain (i.e. the type of current sources; their locations, orientations and strengths), and the geometries and conductivities of the various structures of the head (the brain, brain fluid, cerebral membrane, skull and scalp). Given that these properties are known, it is possible to compute the scalp distribution with 'forward calculations'. Useful information can be obtained by recording ERPs simultaneously from several electrodes at different positions at the scalp (as is usually done). From the considerations above, it follows that when two ERP components show different scalp distributions, they necessarily reflect the activity of different generator sources. However, when comparing scalp distributions, care should be taken to account for possible differences in the source strengths (McCarthy and Wood, 1985).
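A forward calculation can be sketched for the simplest possible case. The example below assumes a single current dipole in an infinite homogeneous conductor, for which the potential is V = p·d / (4πσ|d|³), with d the vector from dipole to electrode. This is a strong simplification of the head models discussed in this chapter, and the conductivity value, dipole strength and electrode positions are hypothetical.

```python
import math


def dipole_potential(electrode, dipole_pos, dipole_moment, conductivity=0.33):
    """Potential (volts) at `electrode` from a current dipole.

    Assumes an infinite homogeneous conductor (not a realistic head model).
    Positions are in metres, the moment in ampere-metres, and the
    conductivity (an assumed value) in siemens per metre.
    """
    d = [e - p for e, p in zip(electrode, dipole_pos)]
    dist = math.sqrt(sum(c * c for c in d))
    dot = sum(m * c for m, c in zip(dipole_moment, d))
    return dot / (4.0 * math.pi * conductivity * dist ** 3)


# A row of 13 hypothetical electrodes, 3 cm above a dipole at the origin.
xs = [i * 0.01 for i in range(-6, 7)]
electrodes = [(x, 0.0, 0.03) for x in xs]

# Same location and strength, different orientation.
radial = [dipole_potential(e, (0.0, 0.0, 0.0), (0.0, 0.0, 1e-8)) for e in electrodes]
tangential = [dipole_potential(e, (0.0, 0.0, 0.0), (1e-8, 0.0, 0.0)) for e in electrodes]

print("radial peak at x =", xs[radial.index(max(radial))])
print("tangential peak at x =", xs[tangential.index(max(tangential))])
```

In this sketch the dipole oriented toward the electrode row peaks directly above the source, while the dipole oriented parallel to the row produces zero potential directly above it and opposite polarities on either side. The two orientations thus yield different distributions from the same location, which is one reason why source orientation has to be part of any forward model.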
With 'inverse calculations', one attempts to localize the active electrical source(s) in the brain on the basis of a measured potential distribution on the scalp. This requires that ERPs be measured from a substantial number of electrodes distributed across the scalp. Inverse calculations do not yield unique solutions (Nunez, 1981). That is, the solution depends upon assumptions regarding the nature of the electrical sources and the conducting medium. Although different potential distributions cannot be generated by the same pattern of electrical activity in the brain, vice versa, the same potential distribution can be generated by different source configurations. It is an important principle that the maximal amplitude of an ERP is not necessarily recorded at an electrode closest to the activated brain area. For visual evoked responses, for example, it has been observed that the responses can be larger over the nonactivated than over the activated hemisphere (Barrett et al., 1976). In the inverse approach, the electrical sources in the brain are mostly modeled as current dipoles (either one or a small number); this model is most appropriate if an ERP component reflects the activity from a (small number of) spatially restricted, localized brain area(s). The head is usually modeled with a homogeneous and isotropic (i.e. the conductivity is the same in all directions) sphere or with a compartment model (i.e. nested, homogeneous, isotropic shells). Recent work has compared sphere models with realistically shaped models of the head (Meijs, 1988) and has investigated the effects of anisotropy (de Munck, 1989). In general, the influence of the head model is more important the deeper the electrical source is localized in the brain. Since the advent of advanced low-temperature technology, it has become feasible to measure the very weak magnetic fields that accompany electrical phenomena in the brain. 
Whereas the EEG measures extracellular volume currents, the magneto-encephalogram (MEG) is thought to reflect primarily intracellular activity (at least as far as the magnetic field is measured in the direction perpendicular to the scalp). The magnetic field is relatively independent of the conductivities of the tissues interposed between the sources in the brain and the detectors. The magnetic field is also less dependent on the exact shape of the head than the electrical field (Meijs, 1988). Because of these advantages it is believed that electrical sources in the brain may be localized with higher spatial accuracy using magnetic measurements than using electrical measurements. The magnetic field only reflects current flow in directions parallel to the skull (tangential dipoles). Therefore, the magnetic method is probably most sensitive to sources in cortical sulci. The electrical field, on the other hand, reflects current flow both parallel to the skull and perpendicular to the skull (tangential and radial dipoles). Since the MEG and EEG reflect somewhat different aspects of neuronal function, one may think of these methods as providing complementary images of central nervous system activity (Beatty et al., 1986).
2 THE SELECTIVE ATTENTION PARADIGM
ERPs are ideally suited for an investigation of attentional phenomena, since, in contrast to behavioral measures, ERPs can be obtained to unattended, ignored stimuli, which are not responded to overtly. By comparing the brain responses to attended and unattended stimuli, we can directly determine the moment at which selective attention starts to modulate brain activity. Additionally, we can gain insight into the level of analysis received by unattended stimuli. A useful distinction is that between exogenous and endogenous ERP components. An exogenous component is stimulus-bound: it varies mainly as a function of stimulus parameters, and is relatively insensitive to information processing demands. An endogenous component, on the contrary, is mainly sensitive to psychological factors, that is, to the variations in the tasks assigned to the subject. The ERP effects of selective attention can be disclosed by subtracting the ERPs evoked by unattended stimuli from the ERPs evoked by attended stimuli. These difference potentials directly reflect the onset and duration of endogenous attention-related ERP components, independent of underlying exogenous components. The onset of such a difference potential constitutes the upper boundary for the time point at which selective processing must have started (the selection process could have started earlier without measurable ERP activity). In order to provide valid demonstrations of ERP effects of selective attention, several criteria should be met (see Näätänen, 1975). First, it should be precluded that differences between relevant (to-be-attended or 'attended') and irrelevant (to-be-ignored or 'unattended') stimuli can be attributed to peripheral receptor conditions (e.g. differences in head position, direction of gaze or pupil diameter). Second, the physical stimulus parameters should be identical for attended and unattended stimuli. In practice, typical experiments use two classes of stimuli, with different values with respect to a simple physical feature (e.g. red and blue stimuli); each of these two classes is attended in different successive conditions. In this way effects of attention can be determined independently from effects of feature value, by comparing for identical stimuli (e.g.
either red or blue) the conditions in which this value was attended with a condition in which the other value was attended. Third, effects of attention should be established independent of differences in the general, nonspecific state of the organism (e.g. level of arousal). Consequently, attention should not be varied between conditions (e.g. by comparing a condition in which a stimulus series is attended with a different condition in which the same series is ignored). Therefore, relevant and irrelevant stimuli should be randomly intermixed, and the occurrence of relevant and irrelevant stimuli should not be predictable above the chance level. A study of Hillyard et al. (1973) is often cited as providing the first valid demonstration of the effects of attention on early ERP components. The critical characteristics of this paradigm have persisted in most of the selective attention studies since then. These include a rapid presentation of stimuli (usually faster than 1 per second) in order to increase selectivity by task load, and a random, successive presentation of relevant and irrelevant stimuli. In general, in order to obtain early ERP effects of attention, it is necessary that the subject is forced to maintain an attentive state in which he or she has to attend carefully to one input channel in order to deal with the specified task and is unable to deal with both input channels at the same time (Näätänen, 1982). The most important factors that have been manipulated in subsequent research are:
(1) Input modality (visual, auditory, somatosensory).
(2) Selection cues: the features on the basis of which relevant and irrelevant categories ('channels') can be discriminated. The number of different input channels and their discriminability.
(3) Duration of the inter-stimulus interval (ISI) and random versus constant ISIs.
(4) One-selection tasks versus two-selection tasks. In one-selection tasks all stimuli in the attended category have to be responded to overtly. In two-selection tasks overt responding has to be restricted to a small proportion of the attended stimuli, the 'target' trials.
(5) The target-defining feature in two-selection tasks and the probability and discriminability of targets.
(6) The type of the target response (e.g. counting, button pressing).
(7) Focusing of attention to one input channel versus dividing of attention over multiple channels.
A principal finding is that the ERP manifestations of selective attention are in several respects highly modality and task specific. That is, input modality and task parameters have prominent effects on (1) the point of time in the chain of information processing at which brain activity starts to be modulated by selective attention and (2) the brain areas that participate in the selection processes. The earliest ERP effects of selective attention begin at about 50 ms poststimulus for the auditory modality and at about 100 ms for the visual modality. Earlier effects have been reported for all these modalities, but as yet there is little consensus about the replicability and validity of these effects (Hillyard and Picton, 1987).
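The subtraction logic described in this section (attended minus unattended ERPs to physically identical stimuli) can be sketched as follows. The waveforms, the 4 ms sampling interval and the onset criterion (the difference must stay beyond a threshold for several consecutive samples) are hypothetical choices made for illustration; the chapter does not prescribe a particular onset criterion.

```python
import math


def difference_wave(attended, unattended):
    """Attended minus unattended ERP, sample by sample (microvolts)."""
    return [a - u for a, u in zip(attended, unattended)]


def onset_latency_ms(diff, threshold_uv, sample_interval_ms, min_run=3):
    """Latency at which a negative difference first stays beyond the threshold.

    Hypothetical criterion: `min_run` consecutive samples at or below
    -threshold_uv. Returns None if the criterion is never met.
    """
    run = 0
    for i, value in enumerate(diff):
        if value <= -threshold_uv:
            run += 1
            if run >= min_run:
                return (i - min_run + 1) * sample_interval_ms
        else:
            run = 0
    return None


sample_interval_ms = 4.0
times = [i * sample_interval_ms for i in range(75)]  # 0 to 296 ms

# Assumed exogenous component: identical for attended and unattended stimuli.
exogenous = [-2.0 * math.exp(-((t - 100.0) ** 2) / (2 * 20.0 ** 2)) for t in times]

# Assumed endogenous negativity: present only when the stimulus is attended.
endogenous = [-1.5 if 60.0 <= t <= 200.0 else 0.0 for t in times]

unattended = exogenous
attended = [x + n for x, n in zip(exogenous, endogenous)]

nd = difference_wave(attended, unattended)
onset = onset_latency_ms(nd, threshold_uv=0.5, sample_interval_ms=sample_interval_ms)
print("onset of the difference potential (ms):", onset)
```

The exogenous component cancels in the subtraction, so the difference wave isolates the attention-related negativity. As noted above, the measured onset is only an upper bound on when selection began, and in real data the estimate also depends on the noise level and on the threshold chosen.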
3 AUDITORY SELECTIVE ATTENTION
3.1 Basic Findings
In the auditory modality, the effect of attention was initially thought to consist of the enhancement of the exogenously evoked N1 component (a component recorded maximally from midline frontocentral electrodes, with a peak latency at about 100 ms) (Hillyard et al., 1973). Later on it was suggested that the effect might consist of a separate endogenous attention-related negativity, termed 'processing negativity' (Näätänen, Gaillard and Mäntysalo, 1978), 'Nd' or 'selection negativity'. In the remainder of this chapter, the term Nd will be reserved for the difference potential obtained by subtracting the ERPs to unattended stimuli from those to attended stimuli. The term 'processing negativity' is used to denote the negativity in the nonsubtracted ERPs. Both the ERPs to attended stimuli and those to unattended stimuli may contain processing negativities (see below). The processing negativity may in some situations (dependent on task characteristics) overlap with the N1 component, seemingly increasing its amplitude. The processing negativity may extend well beyond the N1 component (Näätänen et al., 1978). In the subsequent section we return to the question of whether the N1 component itself can also be modified by attention. In general, all stimuli within the attended input channel (both targets and non-targets) elicit comparable processing negativities. Furthermore, similar effects are obtained in one-selection tasks (all attended stimuli are responded to) and two-selection tasks (only a subset of the attended stimuli, the targets, are responded to). However, target stimuli may elicit additional ERP components. Hillyard et al. (1973) observed that attended target stimuli evoked a late positive component (P3), which was not evoked by the attended non-targets or by the unattended targets.
This finding has been replicated many times since then. Hillyard et al. (1973) associated the N1 and P3 with, respectively, early stimulus-set selection and late response-set selection (Broadbent, 1970). In some conditions target stimuli may also elicit other ERP components, namely 'mismatch negativity' (MMN) and 'N2b'. These ERP components will be discussed in section 3.3. In the auditory modality, qualitatively similar ERP effects of attention have been demonstrated for different selection cues: pitch, location, intensity (Hillyard and Hansen, 1986), speech sounds (Hansen et al., 1983), a combination of the pattern of change in pitch and direction of movement in space (Okita, 1979, 1987), and even complex acoustic cues (Woods, Hillyard and Hansen, 1984). However, the onset latency of the Nd depends critically on the discriminability of the attended and unattended stimuli. The more easily the attended and unattended channels can be discriminated, the earlier the onset latency of the Nd (Alho et al., 1986b, 1987; Hansen and Hillyard, 1980; Hansen et al., 1983). Typically, onsets of about 50 ms may be obtained when the channels are highly discriminable. Thus the Nd gives a precise indication of the speed with which the input channels can be discriminated (Hillyard and Picton, 1987). This may be taken as evidence that the Nd is a direct manifestation of selection processes. In general, the amplitude and duration of the Nd also increase with better channel discriminability. However, an investigation of the relations between discriminability, amplitude and duration of Nd is complicated by the fact that it is now agreed that the Nd contains two separate subcomponents. Näätänen and Michie (1979) suggested that an early phase of the Nd (Nde) with a frontocentral distribution can be discriminated from a later phase (Ndl) with a more anterior (frontal) distribution.
There is now ample evidence for this idea (Alho et al., 1987b; Hansen and Hillyard, 1980, 1984; Hansen et al., 1983; Michie et al., 1990; Näätänen, Gaillard and Varey, 1981; Okita, 1987; Okita, Konishi and Inamori, 1982; Solowij et al., 1990; Woods et al., 1984). According to Näätänen (1982) the Nde is related to the selection between the attended and unattended channels, whereas the Ndl may reflect the further processing of selected information or the active rehearsal of the channel-defining properties. The effect of channel discriminability can be attributed to an increase of processing negativity in the ERPs to unattended stimuli in the less discriminable conditions (Alho et al., 1986b, 1987a, b). That is, conditions in which attended and unattended stimuli were less discriminable mainly affected the ERPs to unattended stimuli, which showed increased negativity compared with conditions in which the stimuli were more discriminable. This suggests that, when relevant and irrelevant stimuli are made less discriminable, this mainly influences the processing of the irrelevant stimuli. Apparently, the irrelevant stimuli have to be processed longer before they can be rejected. An unexpected finding, however, was that for larger separations between attended and unattended stimuli, the ERPs to unattended stimuli in addition showed late positivity compared with control stimuli (Michie et al., 1990; Solowij et al., 1990). Alho et al. (1987b) interpreted similar findings as reflecting active suppression of the processing of irrelevant stimuli or temporary relaxation. The onset latency of Nd also depends on the ISI. The onset of Nd is delayed with longer ISIs (Hansen and Hillyard, 1984; Parasuraman, 1980), but this appears to hold true only for conditions using random variable ISIs and not for conditions with constant ISIs (Näätänen et al., 1981). These results have been interpreted as
suggesting that rapid stimulus presentation with unpredictable ISIs results in a high task load requiring a narrowly focused attentional state. With larger random ISIs, however, the irregular presentation of relevant stimuli prevents adequate locking of attention and attention is also caught by the unpredictable irrelevant stimuli. With constant ISIs, because of a reduced temporal uncertainty about the moment of arrival of the stimuli, attention is properly tuned at the moment of stimulus delivery, even at longer ISIs (Näätänen, 1982). The onset latency of Nd increases and its amplitude decreases with lower probabilities of the to-be-attended stimuli (Alho et al., 1990). The effects of ISI and probability suggest that, in order to maintain selectivity, a frequent afferent reinforcement of an internal model of the to-be-attended stimuli is required. A central question is how fast an attentional set can be set up and how long it can be maintained. The most systematic investigation of this issue has been undertaken by Hansen and Hillyard (1988). These authors used short trains of high- and low-pitched tones. Before each run, the high or low pitch was randomly designated as to-be-attended. The authors found that Nd amplitude increased and its onset latency decreased over the first few trials, approximating its eventual steady-state level at the third stimulus in each run. These results were interpreted as indicating that selectivity is not preset in advance in a flexible manner. Apparently the development of selectivity needs repeated presentation of exemplars of the to-be-attended stimuli. The Nd effect showed no prominent long-term changes over a period of 60 min. This contrasts with an experiment of Donald and Young (1982), who showed a long-term decline of Nd; as suggested by Hansen and Hillyard (1988), the critical difference may be that Donald and Young (1982) used longer stimulus runs.
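The Nd used throughout this section is a difference potential (ERP to attended stimuli minus ERP to unattended stimuli), and its onset latency is the measure that varies with discriminability, ISI and probability. The following is a minimal Python sketch of how such a difference wave and a simple onset estimate might be computed. The function names, the threshold-and-run onset criterion, the 10 ms sampling interval and all amplitude values are illustrative assumptions, not the procedure of any study cited here.

```python
from statistics import mean

def average_erp(trials):
    """Average single-trial epochs (equal-length lists) sample by sample."""
    return [mean(samples) for samples in zip(*trials)]

def difference_wave(attended_erp, unattended_erp):
    """Nd: ERP to attended stimuli minus ERP to unattended stimuli."""
    return [a - u for a, u in zip(attended_erp, unattended_erp)]

def onset_latency_ms(nd, sample_interval_ms, threshold_uv, min_run):
    """Hypothetical onset criterion: start of the first run of `min_run`
    consecutive samples at or below -threshold_uv. Returns None if absent."""
    run = 0
    for i, value in enumerate(nd):
        run = run + 1 if value <= -threshold_uv else 0
        if run == min_run:
            return (i - min_run + 1) * sample_interval_ms
    return None

# Invented toy epochs: one sample per 10 ms from stimulus onset, in microvolts.
attended_trials = [
    [0.0, -0.2, -1.0, -3.0, -4.0, -3.5, -2.0, -1.0],
    [0.0,  0.2, -1.0, -3.0, -4.0, -3.5, -2.0, -1.0],
]
unattended_trials = [
    [0.0, 0.0, -1.0, -2.0, -2.0, -1.5, -0.5, 0.0],
    [0.0, 0.0, -1.0, -2.0, -2.0, -1.5, -0.5, 0.0],
]

nd = difference_wave(average_erp(attended_trials),
                     average_erp(unattended_trials))
onset = onset_latency_ms(nd, sample_interval_ms=10,
                         threshold_uv=0.5, min_run=2)
print(nd, onset)
```

Because the subtraction removes activity common to both conditions, the N1-like deflection present in both toy averages cancels and only the attention-related negativity remains in `nd`.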
3.2 Does Attention Modulate the Activity of the N1 Generators?
Whereas the late phase of the Nd effect can easily be discriminated from the exogenous N1 component on the basis of its later latency and its more frontal scalp distribution, both the N1 and the early portion of the Nd effect have a very similar frontocentral distribution. Therefore, the possibility that there is an enhancement of the exogenous N1 component, in addition to the later endogenous Nd, is hard to rule out (Hillyard and Picton, 1987). Theoretically it is important to settle this issue. If selective attention modulates the activity of the brain areas in the primary auditory cortex that generate the N1 component (i.e. the same sources of brain activity underlie the ERPs to both attended and unattended stimuli), this suggests an efferently mediated 'gating' of afferent input at or before the level of the auditory cortex (Arthur et al., 1989). Thus, this would be strong evidence of an early, intraperceptual selection mechanism. If, on the other hand, selective attention only results in separate, endogenous sources of brain activity, this is more supportive of the idea that attention involves an independent, extraperceptual process. This is a clear example of how the interpretation of an ERP component may depend on knowledge of the underlying brain mechanisms. It has been suggested that the N1 reflects the activity from several distinct sources overlapping in time (Näätänen and Picton, 1987). If so, this may hinder a
straightforward comparison between the N1 and Nd effect (Arthur et al., 1989). If attention modulates only one (or a subset) of these overlapping components, then a dissociation may be observed between the field distribution of the entire unattended N1 component and the field distribution of the Nd effect of attention. This problem could be solved if the individual generators of the N1 subcomponents could separately be localized. Nevertheless, on the basis of subtle differences between the scalp distributions of the N1 and Nd components, it has been concluded that the components reflect different sources of neural activity (Alho et al., 1986a; Teder et al., 1990). In addition, the N1 component was more strongly reduced as a function of decreasing ISI than the early Nd (Teder et al., 1990). Recent research has attempted to localize the sources of the N1 and Nd components with the aid of magnetic responses of the brain (ERFs). Hari et al. (1989) established the existence of the magnetic equivalent of the processing negativity ('Nd-m'), a slow protracted displacement of the ERF as a function of attention. This Nd-m appeared to be located on the supratemporal auditory cortex. The late phase of the processing negativity could be shown to be localized in a different area of the auditory cortex than the N1 component (Arthur et al., 1989, 1991). The sources of the early phase of the processing negativity, however, could not be distinguished from those of the N1 (Arthur et al., 1989; Hari et al., 1989; Woldorff et al., 1991).
3.3 Other ERP Components Elicited by Target Stimuli in Selective Attention Tasks
Although most of this section pertains to findings with auditory stimuli, results that appear to generalize across the auditory and visual modalities will also be discussed. In two-selection selective attention tasks it is usually found that attended target stimuli elicit a centroparietal P3 component, whereas unattended targets do not (Hillyard et al., 1973). Hillyard et al. (1973) associated the N1 and P3 effects of attention with stimulus-set selection and response-set selection (Broadbent, 1970), respectively. Interestingly, relevant targets elicited a P300 for the very first trial of short, cued stimulus series, even though the ERPs evoked by these stimuli did not show processing negativity (Hansen and Hillyard, 1988). Apparently, stimuli can be selected at different stages of analysis, although the early selection mechanism enables more efficient task performance (as evidenced by the finding that reaction time (RT) and P3 latency decreased for the later trials of the series). There is some evidence that under special circumstances (a small separation between attended and unattended stimuli) the P3 may generalize to unattended stimuli (Näätänen, 1982). In general, the P3 is evoked by attended, task-relevant stimuli. However, infrequent physical deviations (e.g. a shift in intensity or pitch) of an irrelevant tone sequence may elicit a positive wave, which is somewhat earlier and more anteriorly distributed than the relevant target P3. This 'P3a' (to be distinguished from the centroparietal 'P3b') is usually preceded by an N2 wave. The 'N2b-P3a' complex is regarded as an index of momentary shifts of attention from the attended into the unattended channel (Näätänen, 1990; Näätänen and
Gaillard, 1983). Such shifts of attention probably occur when the physically deviant stimuli are obtrusive, novel or important. Large N2b-P3as were elicited by novel sounds (e.g. environmental noises), occurring only once in an experiment (Näätänen, 1990). Also in the visual modality, irrelevant deviant or novel stimuli may give rise to late positive components (Hillyard and Picton, 1987). In divided attention conditions (i.e. conditions in which all stimuli are designated as equally relevant), the P3 (i.e. P3b) component is sensitive to a variety of experimental manipulations (for a review see Hillyard and Picton, 1987; Pritchard, 1981; Verleger, 1988). Large P3s are evoked by low-probability, task-relevant stimuli which are designated as targets that have to be responded to. P3 amplitude is larger the more information is provided by the stimulus and also the more subjective value is assigned to the stimulus (e.g. in terms of award). In dual-task paradigms it has been found that P3 amplitude reflects the allocation of limited-capacity resources: the P3 to secondary task stimuli is smaller the more difficult the primary task. Moreover, P3 amplitude reflected only the perceptual demands of the primary task but not the response/motor demands (Isreal et al., 1980a, b). Later studies have shown that P3 amplitude to primary and secondary task stimuli reflects the tradeoff in resource allocation between both tasks. If the relative emphasis is shifted from one task to another (e.g. by reward structure), P3 amplitude to stimuli from the first task decreases and P3 amplitude to stimuli from the second task increases (Kramer, Sirevaag and Braun, 1987). The sensitivity of P3 to perceptual task demands is also evident from research on P3 latency (Magliero et al., 1984; McCarthy and Donchin, 1981). This research has led to the stimulus evaluation hypothesis of P3, i.e. 
the idea that the latency of the P3 component is related to the duration of perceptual stimulus analysis, independent of motor processes (response selection, preparation and execution). In visual and memory search paradigms it has been found that P3 latency increases and its amplitude decreases as a function of both display and memory load (Brookhuis et al., 1981, 1983; Hoffman, Simons and Houck, 1983; Mulder et al., 1984; Van Dellen et al., 1985). P3 latency, however, increases less as a function of load than does RT, and whereas yes- and no-RT responses show a self-terminating pattern for higher loads (the slope of the function relating RT to load is about half as large for yes-responses as for no-responses), P3 latency shows an exhaustive pattern (P3 latency increases equally as a function of load for yes- and no-responses; Brookhuis et al., 1981). It has been concluded from these findings that the effects of task load on RT data may in large part be attributed to motor processes (Mulder et al., 1984). After extensive consistent mapping (CM) training, leading to a transition from controlled to automatic processing according to Shiffrin and Schneider's (1977) and Schneider and Shiffrin's (1977) theoretical framework, P300 latency becomes less dependent on task load (Hoffman et al., 1983). As compared with varied mapping (VM) conditions, P300 amplitude is comparable or even larger (Hoffman et al., 1983; Van Dellen et al., 1985). The latter finding suggests that even after considerable training the stimulus evaluation system is still involved in the processing of information. P300 latency indicates, however, that the build-up of evidence within the stimulus evaluation system may be considerably faster after CM training, probably enabling the response activation system to respond earlier. Thus, although there are large differences between the controlled and automatic processing modes, there is no evidence for a qualitative difference in the sense that
automatic processing is capacity unlimited whereas controlled processing is capacity limited. Näätänen and Gaillard (1983) have distinguished (mainly on the basis of research in the auditory modality) two varieties of N2 components, both of which occur in response to low-probability deviant stimuli. These are the N2b (see above) and the N2a or 'mismatch negativity' (MMN). MMN is elicited whenever there is a change in repetitive aspects of auditory stimulation. MMN is a slow, frontally maximal negativity overlapping the N1 and P2 components. It may have an onset latency of as early as about 50 ms. The onset of MMN is earlier, its duration is shorter and generally its amplitude is larger, the more the low-probability deviant stimulus differs from the high-probability standard. MMN is strongly probability-dependent: the smaller the probability of the deviant the larger the MMN. In a two-selection focused attention paradigm, MMN is elicited by both the attended and unattended deviants (Näätänen et al., 1978, 1980). For attended stimuli, MMN is overlapped by the N2b. For unattended stimuli, only MMN is elicited, but in some conditions an N2b is found as well (Näätänen, Simpson and Loveless, 1982). MMN is also generated in control conditions in which the subjects ignore the stimulus sequence (e.g. read a book). There is even evidence that MMN can be observed during sleep (Näätänen, 1990). It is assumed that the neuronal representation that generates MMN has a duration in the order of 5-10 s, since MMN is not observed when ISIs longer than this critical value are used (Mäntysalo and Näätänen, 1987). The generation of MMN is entirely dependent on the sequence of stimuli that preceded in the last few seconds: even a high-probability event produces MMN when it is preceded by a deviant. Furthermore, MMN to the second deviant in a row is smaller than to the first deviant (Sams, Alho and Näätänen, 1983).
There is evidence that MMN is generated in the auditory cortex. MMN reverses its polarity below the Sylvian fissure (Alho et al., 1986a). Also on the basis of magneto-encephalographic research the generator of MMN could be localized in the auditory cortex (Hari et al., 1984). The equivalent dipole for MMN was found to be located in a different area of the auditory cortex than the equivalent dipole of the magnetic N1 component (Rif, Hari and Tiihonen, 1989; Sams et al., 1989). It has been concluded that MMN reflects a preattentive automatic cerebral mismatch process, the mismatch of a perceptual input with a short-term echoic memory representation residing in auditory cortex (Näätänen, 1990). The preattentive nature of this process is evidenced by the finding that attended and unattended deviants equally elicit MMN. The same finding has also been interpreted as demonstrating that physical features of stimuli are fully processed whether or not they are attended (Näätänen, 1990). In some circumstances, the occurrence of irrelevant deviant events may trigger brief shifts of attention from the relevant channel into the irrelevant channel. These shifts of attention are thought to be reflected by the N2b-P3a complex (see above). In the visual modality, no MMN appears to occur (Nyman et al., 1990). Näätänen (1990) suggested that this might be related to the fundamental difference between the mainly sequential nature of auditory processing and the largely parallel-processing visual system. The fact that the duration of the auditory echoic memory is several seconds, whereas it is less than half a second for visual iconic memory, may reflect this difference, and may be responsible for the fact that no visual MMN has yet been reported.
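Returning to the memory search findings discussed earlier in this section, the contrast between a self-terminating and an exhaustive load function can be made concrete with a small worked example. The Python sketch below assumes a strictly serial search in which each comparison takes the same time and a present target is equally likely to occupy any position; these are textbook simplifications introduced here for illustration, and the function names are hypothetical.

```python
def expected_comparisons(load, target_present, self_terminating):
    """Expected number of serial comparisons for a set of `load` items."""
    if self_terminating and target_present:
        # The search stops at the target, which is on average found halfway.
        return (load + 1) / 2
    # Exhaustive search, or a target-absent trial: every item is checked.
    return float(load)

def load_slope(load_a, load_b, target_present, self_terminating):
    """Increase in expected comparisons per additional item."""
    a = expected_comparisons(load_a, target_present, self_terminating)
    b = expected_comparisons(load_b, target_present, self_terminating)
    return (b - a) / (load_b - load_a)

# Self-terminating pattern: the yes-slope is half the no-slope.
yes_st = load_slope(2, 4, target_present=True, self_terminating=True)
no_st = load_slope(2, 4, target_present=False, self_terminating=True)

# Exhaustive pattern: yes- and no-slopes are equal.
yes_ex = load_slope(2, 4, target_present=True, self_terminating=False)
no_ex = load_slope(2, 4, target_present=False, self_terminating=False)

print(yes_st / no_st, yes_ex / no_ex)
```

Under these assumptions the yes/no slope ratio is 0.5 for the self-terminating case and 1.0 for the exhaustive case, which corresponds to the patterns described above for RT and for P3 latency, respectively.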
3.4 Attentional Trace Theory
An influential theory on the relation between selective attention mechanisms and their reflections in ERP phenomena is Näätänen's (1982) 'attentional trace theory'. This theory, mainly based on data in the auditory modality, is concerned with voluntary aspects of attention. In a subsequent paper, a more general framework was sketched, also taking into account the automatic aspects of attention (Näätänen, 1990). The theory distinguishes task-unrelated and task-related sensory analysis. Preconscious, task-unrelated sensory analysis consists of two subsystems, a permanent feature-detector system and a transient-detector system. The permanent feature-detector system (partially consisting of subcortical neuronal mechanisms) extracts physical stimulus information for percepts and sensory memory; this information is, however, not necessarily consciously perceived. In sensory memory, precise stimulus representations are encoded; these memory traces are strengthened by repetition of identical stimuli. The transient-detector system is activated by stimulus onsets and offsets of long-duration stimuli, by increases or decreases of stimulus energy, or by qualitative changes (e.g. in frequency) of continuous stimuli. The neuronal mechanisms involved in the transient-detector system are thought to generate the N1 component (actually one of the N1 subcomponents as distinguished by Näätänen and Picton, 1987). When the activation of the transient-detector system exceeds a certain, momentarily varying threshold, this leads to the conscious detection of the stimulus (but not to the perception of its qualitative aspects). The transient-detector system generates interrupt signals for limited-capacity, central executive mechanisms; if these signals exceed a certain threshold this leads to an attentional switch to the ongoing sensory processes (conscious perception) and to the contents of the sensory memory.
Additional interrupt signals may arise from mismatches of the present sensory input with solid sensory memory traces (established by repetition). Again, when a threshold is exceeded this leads to a switch of attention toward the stimulus generating the interrupt. It is this mismatch process that is reflected in the MMN of the ERP. The central element of task-related sensory analysis is the 'attentional trace', which is a voluntarily maintained sensory memory representation on the level of a physical feature (or features) defining to-be-attended stimuli. The maintenance of the attentional trace depends both on afferent input (frequent presentation of relevant stimuli) and on efferent input from the executive mechanisms (selective, effortful rehearsing of the relevant stimulus). Each sensory event initiates a self-terminating matching process in which the encoded stimulus attributes are compared with the attentional trace. This matching process lasts longer the more similar the eliciting stimulus is to the stimulus represented by the attentional trace. The matching process is reflected in processing negativity in the ERP; the longer the matching process takes, the more negativity develops. The stimulus attributes are matched as soon as they are encoded. Soon after stimulus presentation, only crude stimulus characteristics are available, and most stimuli will still match the attentional trace. As processing proceeds, and more and more stimulus details become available, the range of stimuli that still 'hit' the attentional trace will gradually diminish.
These mechanisms are assumed to pertain to the early phase of the processing negativity; the later phase has been related to postselection further processing or to active rehearsal (Näätänen, 1982). When a stimulus completely matches the attentional trace, then it gains access to the limited-capacity executive processes. Attentional trace theory may be considered as an early selection theory: stimuli are selected on the basis of fast encoded stimulus properties in an early stage of processing. However, the basic sensory analysis proceeds independently of attention. The initial registration of stimulus attributes cannot be influenced. Therefore, attentional trace theory is an extraperceptual theory of selective attention, in contrast to the intraperceptual neural specificity theory (Harter and Aine, 1984). The attentional trace theory takes account of several aspects of the experimental data considered in the preceding sections. If relevant and irrelevant stimuli are more similar, then more of the encoded attributes of the irrelevant stimulus will correspond to the attentional trace, and more negativity will develop in the ERPs to irrelevant stimuli. This explains why the onset latency of Nd is later if relevant and irrelevant stimuli are more similar (Alho et al., 1986b, 1987a; Hansen and Hillyard, 1980). Attentional trace theory also predicts that decreased discriminability of relevant and irrelevant stimuli results in extra processing negativity in the ERPs to irrelevant stimuli (Alho et al., 1986b, 1987a). This mechanism could also account for 'gradients of attention'. The need for afferent reinforcement of the attentional trace is evidenced by the smaller Nd with longer ISI (Hansen and Hillyard, 1984; Parasuraman, 1980), by a decrease of Nd as a function of a smaller probability of the relevant stimulus category (Alho et al., 1990), and by the finding that there is no processing negativity for the first stimulus in a series (Hansen and Hillyard, 1988).
Näätänen (1990) proposes that the two processing modes (task-related and task-unrelated sensory analysis) may be in part parallel and based on at least partially different sensory mechanisms. Task-unrelated sensory analysis is thought to be related to subcortical processing and processing at the level of the primary cortex, whereas task-related, selective processing may occur at secondary sensory brain areas. In previous sections we have already reviewed evidence that the N1 and MMN components indeed reflect activity from (different) areas in primary auditory cortex. The evidence with regard to the cerebral localization of the processing negativity seems to be inconclusive.
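The self-terminating matching process at the core of attentional trace theory can be illustrated with a toy simulation. The Python sketch below is not a formalization proposed in the literature reviewed here: it assumes, purely for illustration, that stimulus attributes become available one at a time from crude to fine, that matching stops at the first attribute that fails to match the trace, and that the number of comparisons made stands in for the amount of processing negativity. The feature names and values are hypothetical.

```python
def match_against_trace(stimulus, trace):
    """Compare encoded attributes with the attentional trace, in order.

    Returns (steps, full_match): how many attributes were compared before
    the process terminated, and whether the stimulus matched completely.
    """
    steps = 0
    for encoded, expected in zip(stimulus, trace):
        steps += 1
        if encoded != expected:
            # First mismatch terminates the matching process.
            return steps, False
    return steps, True

# Hypothetical trace: attributes ordered from crude to fine (an assumption).
trace = ("left ear", "high pitch", "1000 Hz")

relevant = ("left ear", "high pitch", "1000 Hz")
similar_irrelevant = ("left ear", "high pitch", "1050 Hz")
dissimilar_irrelevant = ("right ear", "low pitch", "500 Hz")

for label, stimulus in [("relevant", relevant),
                        ("similar irrelevant", similar_irrelevant),
                        ("dissimilar irrelevant", dissimilar_irrelevant)]:
    steps, full_match = match_against_trace(stimulus, trace)
    print(label, steps, full_match)
```

In this sketch the dissimilar irrelevant stimulus is rejected after a single comparison, whereas the similar irrelevant stimulus is matched for as long as the relevant one before being rejected. This mirrors the account given above of why less discriminable irrelevant stimuli show extra processing negativity and why the Nd then has a later onset.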
4 VISUAL SELECTIVE ATTENTION
4.1 Comparison Between the Effects of Selective Attention in the Auditory and Visual Modalities
ERP effects of selective attention in the visual modality differ in several respects from those in the auditory modality. Generally, the effects in the visual modality have later onset latencies (100 ms or more) than in the auditory modality (earliest effects at about 50 ms). Furthermore, in the auditory modality different types of
selection cues result in qualitatively very similar ERP effects, varying only quantitatively (i.e. with respect to onset latency, amplitude and duration), primarily as a function of selection cue discriminability. In the visual modality, on the other hand, qualitatively different ERP effects are found for different selection cues; the available evidence especially points to a special status of selection by visual location (see next section). The onset latencies of the selective attention effects appear to be more strongly dependent on the type of selection cue than on its discriminability (although direct evidence for the influence of discriminability is lacking). In several other respects the visual and auditory modalities are comparable. First, ERP effects of visual selective attention are similar for one- and two-selection tasks, and are also similar for different types of responding (counting versus manual responding). Second, in two-selection tasks the early effects of attention are much the same for target and non-target stimuli, but attended target stimuli elicit additional late positivity (P3). To our knowledge there is no research directly demonstrating that ERP effects of visual selective attention are sensitive to manipulations of the ISI or to the probability of the relevant stimulus. Nevertheless, it appears that the ERP effects of visual spatial attention are less dependent on the length of the ISI than the effects in the auditory modality (Wijers, 1989). For nonspatial selections, on the other hand, early ERP effects of attention were found to be absent with long ISIs (Kenemans and Verbaten, 1990), but present for the same stimuli presented with a short ISI (Kenemans, Kok and Smulders, 1993).
4.2 Basic Findings
In visual selective attention a very robust finding is that attention to a spatial location and attention to nonspatial stimulus attributes are associated with qualitatively different patterns of ERP effects. The current view is that, whereas nonspatial attention results in a slow endogenous negativity (much like the 'processing negativity' in the auditory modality), spatial attention results in an enhancement of a series of positive and negative deflections, being most prominent at posterior electrodes (Hillyard and Mangun, 1986; Näätänen, 1986). This has been interpreted as suggesting that spatial attention may involve a modulation of perceptual processing, whereas nonspatial attention involves postperceptual selective processes (e.g. attentional trace matching; Näätänen, 1986). Research on spatial attention has typically used tasks in which stimuli are randomly presented to the right or left of fixation, with a substantial separation between fixation and the stimulus locations (at least 3°, but mostly much more). The most consistent observation is the enhancement of the posterior P1 (peaking between 100 and 160 ms) and N1 (160-210 ms) components (Eason, 1981; Eason, Harter and White, 1969; Eason, Oakley and Flowers, 1983; Hillyard and Münte, 1984; Hillyard, Münte and Neville, 1985; Hillyard et al., 1984; Mangun and Hillyard, 1987, 1988; Mangun, Hansen and Hillyard, 1986; Neville and Lawson, 1987; Rugg et al., 1987; Van Voorhis and Hillyard, 1977; Wijers et al., 1989d). Spatial attention also influences the ERPs in latency ranges following the N1 component. However, a variety of different results has been observed in different experiments (Wijers, 1989). Therefore, these later effects appear to be quite task-specific (Hillyard et al., 1985).
In research on nonspatial attention the following selection cues have been investigated: color (Harter, Aine and Schroeder, 1982; Harter and Salmon, 1972; Hillyard and Münte, 1984; Wijers et al., 1989a, b, c), spatial frequency of checkerboard patterns (Harter and Previc, 1978) and gratings (Kenemans et al., 1993; Previc and Harter, 1982), orientation (horizontal versus vertical gratings) (Harter and Guido, 1980; Kenemans et al., 1993; Previc and Harter, 1982; Rugg et al., 1987), letter size (Wijers et al., 1989c), contour (horizontal and vertical bars versus diffuse flashes) (Harter and Guido, 1980) and diagonal of a square display (Okita et al., 1985). All of these nonspatial selections are associated with negativities in the ERP, with onset latencies of 150 ms or more. The general consensus appears to be that all these effects are similar, processing-negativity-like, occipitally maximal, endogenous negativities (Harter and Aine, 1984; Hillyard and Mangun, 1986; Hillyard et al., 1990; Näätänen, 1986). However, in many of these experiments too few electrodes were measured to determine the scalp distribution of the effects. Wijers (1989) has argued that the ERP effects of nonspatial attention are actually composed of two functionally and topographically distinct ERP components. The early phase of the effect (onset 150 ms) satisfies the above-mentioned, processing-negativity-like characteristics. The effect is negative at Oz and positive at anterior electrodes (Fz, Cz and Pz). Second, starting at about 200 ms, there is a biphasic negative-positive Cz maximum effect of attention. Wijers (1989) suggested that, whereas the early negativity reflects the selection process itself, the later effect is an N2b-P3a complex, which reflects a postselection process in which the stimuli gain access to the 'limited-capacity channel'.
The N2b-P3a appears to be feature nonspecific (a similar effect was found for color, letter size, diagonal and location; Gunter et al., 1994b; Wijers, 1989). The early occipital negativity, on the other hand, was argued to be specific for attention directed to the color of stimuli and to be absent when attention is directed to other nonspatial attributes (letter size and diagonal; see Wijers, 1989). More evidence is needed on this issue, however. Thus, the early ERP effects of visual selective attention are feature-specific (at least for selections on the basis of color and location). Such feature-specific effects indicate that different brain mechanisms are involved in selecting different stimulus attributes. The existence of feature-specific selection mechanisms is in accordance with the idea that attention involves the modulation of feature-specific perceptual neural pathways (i.e. an intraperceptual theory of attention; e.g. Harter and Aine, 1984).

It has been claimed that different stimulus attributes are selected in a fixed order, as evidenced by the increasing onset latencies of the ERP effects of attending respectively to location, contour, color, spatial frequency, orientation and conjunctions of features (Harter and Aine, 1984). However, there is no conclusive evidence regarding the question whether different onsets could be due to differences in the discriminability of the different feature levels (Hillyard and Mangun, 1986; Näätänen, 1986); as we saw, discriminability is of major importance in the auditory modality. Hillyard and Münte (1984) investigated two different levels of discriminability in a spatial attention task. However, in the difficult condition the effects of spatial attention disappeared altogether, suggesting that in this condition both locations effectively fell within the 'spotlight of attention'. Wijers et al. (1989d), on the other hand, found similar P1 effects of spatial attention for locations that were easy and difficult to discriminate.
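One way to make the P1 and N1 enhancements discussed in this section concrete is to express each as a difference in mean amplitude between attended and unattended waveforms within the latency ranges quoted above (100-160 ms for P1, 160-210 ms for N1). The following Python sketch is only an illustration under stated assumptions: the function names are hypothetical, the waveforms are invented toy values, and mean amplitude in a fixed window is just one of several possible measures, not necessarily the one used in the studies cited.

```python
from statistics import mean

# Latency windows (ms) taken from the ranges quoted in the text.
P1_WINDOW = (100, 160)
N1_WINDOW = (160, 210)


def mean_amplitude(times_ms, erp_uv, window):
    """Mean amplitude (microvolts) of samples with start <= latency < end."""
    start, end = window
    samples = [v for t, v in zip(times_ms, erp_uv) if start <= t < end]
    if not samples:
        raise ValueError("no samples fall inside the window")
    return mean(samples)


def attention_effect(times_ms, attended_uv, unattended_uv, window):
    """Attended minus unattended mean amplitude within the window."""
    return (mean_amplitude(times_ms, attended_uv, window)
            - mean_amplitude(times_ms, unattended_uv, window))


# Hypothetical toy waveforms sampled every 20 ms (values invented for illustration).
times = list(range(0, 300, 20))
unattended = [0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 1.5, 1.0,
              -1.0, -2.0, -1.0, 0.0, 0.5, 0.5, 0.0]
attended = [0.0, 0.0, 0.0, 0.0, 0.5, 1.5, 2.25, 1.5,
            -1.5, -3.0, -1.5, 0.0, 0.5, 0.5, 0.0]

p1_effect = attention_effect(times, attended, unattended, P1_WINDOW)
n1_effect = attention_effect(times, attended, unattended, N1_WINDOW)
```

In this toy case the P1 effect is positive and the N1 effect is negative, i.e. both components are larger in absolute amplitude for the attended waveform, which is the pattern the text describes as an enhancement.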
Electrophysiological selective attention research usually employs paradigms in which the attended stimulus category remains the same throughout a series of stimuli. This contrasts with the vast majority of behavioral studies on spatial attention. These experiments employ tasks in which a different location is randomly cued on a trial-by-trial basis (the 'Posner task'). The cue indicates the location where the subsequent stimulus is most likely to be presented; subjects respond to all stimuli, both at the cued (valid) and noncued (invalid) locations. In such tasks, performance is more efficient for valid than for invalid trials, but the attentional benefits virtually disappear when the same location is cued on each trial (Posner, Snyder and Davidson, 1980; Posner et al., 1984). This suggests that active attentional orienting is required for performance benefits to occur. In ERP experiments, on the other hand, very similar results have been obtained in trial-by-trial cuing paradigms as in the standard (constant focus) situation (Aine and Harter, 1986; Harter and Anllo-Vento, 1991; Mangun, Hillyard and Luck, 1993; Wijers, 1989). Therefore, performance in Posner tasks may be governed by mechanisms such as expectancy and/or surprise (Hillyard et al., 1985), in addition to the early selection mechanisms as reflected by ERP effects. Harter's work (Harter and Anllo-Vento, 1991) also revealed slow ERP shifts in the cue-target interval. These effects were thought to be associated with the control, directing and sustaining of spatial attention.

Hillyard et al. (1985) investigated the time-course of spatial attention effects within short series of stimuli. Comparable to the results in the auditory modality, it took about four to six trials before the effect of attention on the P1 component was maximally developed. The enhancement of the N1 component, on the other hand, was present for the very first stimulus at the attended location, and remained constant over the series.
Note that these findings are hard to reconcile with the results from trial-by-trial cuing paradigms (see above), which showed undiminished ERP effects in conditions in which attention had to be aligned anew on each trial.
4.3 Neurophysiological Mechanisms of Spatial Attention

Since the exogenous P1 and N1 components are among the earliest observable deflections in the scalp-recorded ERP, the modulation of their amplitudes by spatial attention, without any change in latency or wave shape, has been interpreted as a sign of facilitated sensory processing within the visual system (Hillyard and Mangun, 1986; Hillyard et al., 1985). If the P1 component reflects the aggregate neuronal response to visual information arriving at the primary visual cortex (V1), then its modulation could even reflect a precortical filtering mechanism (Eason, 1981; Eason et al., 1983; Hillyard et al., 1985). In this view, the principal effect of attention is to increase or decrease the level of activation of the neural elements that are involved in the visual encoding of the stimulus. The existence of such 'sensory gating' or 'sensory gain control' (Hillyard et al., 1990) mechanisms could be demonstrated if it were found that the ERPs evoked by attended and unattended stimuli were generated by the same visual brain areas, differing only in activation strengths.
The first piece of evidence in this direction can be derived from an investigation of the hemispheric lateralizations of the P1 and N1 components and their modulations by spatial attention. If the effect of attention is largest over the hemispheres showing the largest P1 and N1 components, then it is not implausible that attention modulates the activity of their underlying generators. Indeed, the lateralizations of the P1 and N1 components and their modulations by attention have often been found to correspond. The N1 component is usually larger over the hemisphere contralateral to the visual field of the evoking stimulus; for the P1 both ipsilateral lateralizations (Rugg, Lines and Milner, 1985; Rugg, Milner and Lines, 1985; Rugg et al., 1987; Wijers et al., 1989d) and contralateral lateralizations (Mangun and Hillyard, 1987, 1988, 1990a; Hillyard et al., 1990) have been reported. In the traditional spatial attention paradigm, stimuli are presented randomly one by one to either the right or left visual half-fields; the effect of attention is established by comparing the ERPs evoked by left (or right) visual field stimuli in a condition in which this location was attended with a condition in which the opposite location was attended. One might argue that such conditions may not be optimal for attentional selectivity since the visual system is not heavily loaded and since the appearance of an irrelevant stimulus in an otherwise empty field may automatically draw attention. Recently an approach was developed, exploiting hemispheric lateralization of attention effects, for studying selective attention to multi-element stimulus displays. In this approach stimuli are presented simultaneously to the left and right visual half-fields, and in different stimulus runs subjects are instructed to attend to one half-field only. 
The effect of attention is established by comparing the ERPs at electrodes contralateral to the attended visual half-field with the ERPs at electrodes ipsilateral to the attended visual half-field. In this way it could be demonstrated that attention enhanced the P1 component contralateral to the attended visual half-field as compared with the ipsilateral P1; this difference emerged even somewhat earlier than 100 ms (Heinze et al., 1990; Hillyard et al., 1990; Luck et al., 1990). The increased contralateral positivity persisted throughout the P1, N1 and P2 latency ranges; thus the usual enhancement of the N1 component was not obtained. It was argued that the usually obtained N1 enhancement results from a reorienting of attention towards the relevant location. This reorienting occurs because the abrupt onsets of unilateral, irrelevant stimuli automatically attract attention. In bilateral displays this does not occur because there is a stimulus in both half-fields (Luck et al., 1990). This hypothesis was supported by analyzing sequence effects on the N1 component: it appeared that the N1 attention effect is mainly present in trials in which a relevant stimulus is preceded by an irrelevant stimulus, necessitating a reorienting of attention. On the basis of multi-channel ERP recordings, the scalp topographies of the P1 and N1 components evoked by attended and unattended stimuli can be compared more precisely. Similar scalp topographies provide evidence that attended and unattended stimuli may have activated the same cerebral generators. Mangun and Hillyard (1988) showed that the scalp topography of the P1 component showed maximal positivity contralateral to the stimulated visual field; the location of the maximal positivity was very similar for attended and unattended stimuli. Similar results were presented by Hillyard et al. (1990) and Mangun et al. (1993). 
The latter authors concluded on the basis of current source density analyses and magnetic resonance imaging that the maximal P1 focus was overlying the lateral
prestriate cortex (areas 18 and 19), for both attended and unattended stimuli. The P1 maximum was situated ventrolaterally in relation to that of the N1, which lay near the border of the occipital and posterior parietal lobes. It was suggested that this could reflect that the P1 attention effect reflects modulation of information flow along the ventral, prestriate stream of visual processing, whereas the N1 enhancement is a sign of attentional control over the dorsal projection route. The ventral stream projects from area V1 through prestriate areas V2, V3 and V4 to the inferior temporal cortex. This visual system is concerned with object recognition. The dorsal stream projects from V1 through prestriate areas V2 and MT to the posterior parietal lobe and is important for encoding the spatial aspects of visual inputs and for guiding visuomotor performance (Desimone and Ungerleider, 1989). Thus, although attention does not modulate activity at the level of the primary visual cortex, it could involve sensory gain mechanisms in extrastriate cortical areas. Preliminary results from research on the localization of spatial attention effects on the basis of magnetic brain responses support the idea that these effects are localized in prestriate brain areas (Wijers et al., 1992). Thus, significant progress is being made in discovering the neurophysiological mechanisms underlying visuospatial attention. The theory of Mangun et al. (1993) represents an important attempt to relate ERP data to neurophysiological schemes based on single-cell recordings in monkeys. As we saw, when a stimulus belongs to an attended input channel, this results in a modulation of the activity of the brain areas known to be involved in visual perception. Much less ERP research is done on the brain mechanisms involved in directing and maintaining attentional channels. 
Neurophysiological evidence suggests that several brain structures are important in establishing the sensory gain in occipital brain areas. These include the lateral pulvinar nucleus in the thalamus, the superior colliculus (Desimone et al., 1990) and the posterior parietal cortex (Posner and Petersen, 1990). These three brain systems are thought to be involved in different aspects of attentional control. The parietal cortex first disengages attention from its present focus, the superior colliculus acts to move the focus of attention to a different location, and the pulvinar engages attention at the new location (Posner and Petersen, 1990). Unfortunately, not much is known yet about the ERP reflections of such attention-directing mechanisms. Furthermore, there is a puzzling dissociation between the paradigms used to demonstrate effects of attention on single-cell activity and the paradigms used in ERP research. Whereas stimuli are presented one by one at an attended or unattended location in the typical ERP paradigm, single-cell activity in the prestriate visual areas is found to be modulated primarily when to-be-attended and to-be-ignored stimuli are presented simultaneously; in this case attention acts to inhibit the neuronal responses to the irrelevant stimulus (Desimone et al., 1990). Also the effects of deactivating the pulvinar or the superior colliculus could be demonstrated only when attention was directed to a target in the presence of a distractor.
4.4 Neural Specificity Theory

The neural specificity theory (Harter and Aine, 1984) specifically addresses the brain mechanisms thought to be involved in visual selective attention, and their reflections in ERP phenomena. A central concept in this theory is the neural
channel, which is a group of neurons with identical receptive field properties, all encoding the same stimulus feature. Selective attention is thought to be mediated by the efferent modulation of neural channels; this is manifested by attention-related negativities (or positivities) in the ERPs. The modulation of such neural channels can occur at both cortical and subcortical levels, and it involves both facilitatory and inhibitory mechanisms. The facilitation/inhibition of neural channels may take place prior to stimulus presentation (by instructions), or after stimulation as a consequence of information derived from the stimulus itself. The onset latencies and timing of 'selection negativities' reflect that the neural channels for encoding different stimulus features are activated in series but continue in parallel once activated. The selection of different attributes may be started in a fixed order: first location, then contour, color, spatial frequency, orientation, and finally conjunctions of features. The level of the nervous system at which neurons are selectively sensitive to a particular stimulus attribute could be a major determinant of the speed and efficiency with which that attribute can be selected (Harter and Aine, 1986), so that certain features have inherent advantages over others in the selection process. Harter and Aine (1984) distinguish two different visual projection systems. The geniculostriate system (retina, lateral geniculate nucleus of the thalamus, striate cortex (area 17), prestriate cortex (areas 18 and 19) and inferotemporal cortex (areas 20 and 21)) is thought to be involved in the (selective) processing of stimulus pattern (i.e. the 'what' of the stimulus).
The tectopulvinar system (retina, superior colliculus, pulvinar and lateral posterior nucleus of the thalamus, parietal cortex (area 7) and prestriate cortex (areas 18 and 19)) responds more quickly than the geniculostriate system, and is assumed to process primarily location and movement (i.e. the 'where' of the stimulus). It should be mentioned that Harter and Aine's (1984) view regarding the importance of the tectopulvinar system in spatial vision conflicts with the current view that both the pattern recognition functions of the inferotemporal cortex and the visuospatial functions of the posterior parietal cortex depend on input from the striate cortex (Desimone and Ungerleider, 1989). According to this view, there are two major processing systems, both originating from the striate cortex: the ventral projection route for pattern vision (object recognition) and the dorsal projection route for spatial perception and visuomotor performance. The ventral stream projects from area V1 through prestriate areas V2, V3 and V4 to the inferior temporal cortex. The dorsal stream projects from V1 through prestriate areas V2 and MT to the posterior parietal lobe.

The critical feature of the neural specificity theory is that it is an early selection, intraperceptual theory of attention, assuming that selective attention acts at the level of the same neuronal mechanisms that are also involved in the perceptual encoding of the visual world. This is in contrast to Näätänen's (1982) attentional trace theory, in which a separate, extraperceptual mechanism (the attentional trace) is postulated to account for selective attention phenomena. Whereas neural specificity theory predicts that the same neural pathways are activated for attended and unattended stimuli, attentional trace theory predicts that the process of attentional trace matching will add a separate, endogenous source of brain activity.
Thus neural specificity theory assumes that the ERPs evoked by attended and unattended stimuli will be generated in the same brain areas. In addition, since different neural channels may be facilitated or inhibited dependent on which feature was attended, qualitatively different ERP signs of attention could be expected for different
selection cues. According to attentional trace theory, on the other hand, we would expect that the effect of attention (as determined by the difference potential obtained by subtracting the ERP to unattended stimuli from the ERP to attended stimuli) is generated in different brain areas than the nonsubtracted ERPs to unattended and attended stimuli. Moreover, since attention to different features is mediated by one common mechanism (the attentional trace), it seems more obvious to expect qualitatively similar ERP effects for different selection cues (although one could postulate that the attentional trace is not localized in one specific brain region, but is spread over different brain areas representing different selection features). As we saw, the question of whether or not auditory selective attention involves the modulation of exogenous components has not been settled definitively. In the visual modality, however, it is clear that qualitatively different ERP effects are found for spatial and nonspatial selections. In addition, spatial attention appeared to act upon the same brain areas that were activated by unattended information. On the basis of such considerations, Näätänen (1986) suggests that the attentional trace mechanism pertains to nonspatial attention only, whereas neural-specificity-like mechanisms may be involved in spatial attention. However, as we saw, it is not yet certain whether different nonspatial selections all result in qualitatively similar ERP effects. Furthermore, little is known about the brain mechanisms underlying nonspatial selections. A second difference between attentional trace theory and neural specificity theory is that the former postulates exclusively facilitatory mechanisms, whereas the latter explicitly assumes the possibility of both inhibition and facilitation.
For this reason, when ERP effects were indicative of such inhibitions, processes additional to the attentional trace matching process (active suppression, relaxation), had to be postulated (Alho et al., 1987b). Attentional trace theory and neural specificity theory also predict different patterns of results in conditions in which subjects attend to combinations of features ('multi-feature selections'). According to neural specificity theory, the neural channels responsible for the perception of different visual attributes will be facilitated or inhibited independently. Thus, attending different features will give rise to independent ERP effects, i.e. the ERP effect of the relevance of a feature will not depend on the relevance of other features. Attentional trace theory assumes that different features are matched to the attentional trace in a self-terminating fashion. Therefore, if the stimulus is found to be irrelevant along an easily discriminable stimulus dimension, the further matching of the stimulus along the other stimulus dimensions may be terminated. For such a situation we may find that the ERP effect of one stimulus feature being relevant may depend on the relevance of another feature. According to attentional trace theory, in order to maintain an attentional trace, frequent afferent reinforcement is needed. This idea accounts well for the observed effects in the auditory modality of the ISI, probability of the relevant stimulus and the time-course of ERP effects of attention within stimulus series. Furthermore, a self-terminating matching process explains the effects of stimulus discriminability and the occurrence of processing negativity in the ERPs to irrelevant stimuli. The original formulation of neural specificity (Harter and Aine, 1984) did not explicitly account for such findings, probably because the effects of these task variables are less well established in the visual modality.
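The contrasting predictions for multi-feature selections can be restated as a statement about an interaction contrast in a 2 x 2 design (location relevant or not, color relevant or not). If the effect of one feature's relevance does not depend on the other feature, the interaction contrast is zero; if matching terminates once the stimulus is found irrelevant on an easily discriminable dimension, the contrast differs from zero. The Python sketch below illustrates this arithmetic only; the function name and all amplitude values are hypothetical and are not taken from any of the studies cited.

```python
def color_effects(amp):
    """Color-relevance effects from a 2 x 2 table of mean amplitudes.

    amp maps (location_relevant, color_relevant) -> mean amplitude in
    microvolts. Returns the color effect at the relevant location, the
    color effect at the irrelevant location, and their difference
    (the interaction contrast).
    """
    at_relevant = amp[(True, True)] - amp[(True, False)]
    at_irrelevant = amp[(False, True)] - amp[(False, False)]
    return at_relevant, at_irrelevant, at_relevant - at_irrelevant


# Hypothetical pattern with independent effects: the color effect is the
# same size whether or not the location is relevant.
independent = {
    (True, True): -3.0, (True, False): -1.0,
    (False, True): -2.0, (False, False): 0.0,
}

# Hypothetical pattern with contingent effects: the color effect is
# present only when the location is relevant.
contingent = {
    (True, True): -3.0, (True, False): -1.0,
    (False, True): 0.0, (False, False): 0.0,
}

independent_result = color_effects(independent)
contingent_result = color_effects(contingent)
```

With these invented values the first table yields an interaction contrast of zero and the second a nonzero contrast. In real data the contrast would of course have to be evaluated statistically across subjects rather than read off a single table.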
BANDWIDTH OF ATTENTIONAL CHANNELS
An attentional channel may be defined '... by the set of stimuli that are processed more effectively as a result of the attentional act; this includes the attended locus in sensory space and a range of loci adjacent to it along the relevant dimension'. The bandwidth of such a channel describes the size of the attended zone along the chosen sensory dimension. Although bandwidth is a meaningful concept for any attended sensory dimension, in the performance literature it is almost exclusively found in relation to research on visuospatial attention. This research has led to the hypothesis that visuospatial attention can be conceived of as a mental spotlight, which moves independently of the direction of gaze, and facilitates the information processing within its range (Posner et al., 1980). In such a conception one of the objectives becomes to determine the size (i.e. bandwidth) of the spotlight. Research on response competition had demonstrated that the interfering effects of flankers on the identification of a central target letter were restricted to situations in which targets and flankers were presented within about 1° of visual angle (Eriksen and Eriksen, 1974; Eriksen and Hoffman, 1972). A later formulation is the zoomlens metaphor of spatial attention (Eriksen and St James, 1986; Eriksen and Yeh, 1985), in which the size of the attended region (i.e. the bandwidth) is variable.

The bandwidth of an attentional channel can be determined by recording ERPs in response to a range of stimuli varying with regard to their values on a particular sensory feature. Only one of the feature values is instructed as to-be-attended, and the bandwidth of the attentional channel can be determined as the range of stimuli sharing attention-sensitive ERP components with the to-be-attended stimulus. This approach was followed by Harter and Previc (1978) for size-specific attentional channels.
They presented random series of checkerboards with seven different check sizes and diffuse flashes, while subjects attended and responded to one of the check sizes or to the diffuse flashes. The effect of attending check size consisted of a broad negativity. This effect was largest for the checkerboards with attended check size, and became gradually smaller for stimuli that were more dissimilar from the attended check size. Interestingly, at an early phase of the attention effect (at 160ms), there was a broader range of stimuli eliciting the attention-sensitive component than at a later phase (260 ms). This suggests that the bandwidth of the attentional channel became progressively more narrowly tuned in the course of processing. The bandwidth of the attentional channel corresponded to the bandwidth of size-specific channels as determined by interocular interference experiments. This was taken as evidence for the central assumption of neural specificity theory, namely that the same neural mechanisms are at the basis of both perception and attention. Wijers et al. (1989d) investigated the bandwidth of the spatial attention channel by presenting colored bars (either red or blue) one at a time at one out of eight possible locations on a hemi-circle around fixation (Figure 9.1). In one of the conditions, subjects were instructed to attend to one stimulus location (the relevant location) only and to respond to bars in a particular (target) color presented at this location. It appeared that not only were P1 and N1 amplitudes enhanced when the bars were presented at the relevant location, but also when the bars were presented at the two adjacent locations (Figure 9.2).
[Figure 9.1: schematic of the stimulus display around the central fixation box. Panel 'Focused Condition': one relevant location, all other locations irrelevant. Panel 'Divided Condition': relevant and irrelevant locations.]

Figure 9.1. Stimulus presentation of Wijers et al. (1989d). Subjects fixated the central fixation box while red or blue colored stimulus bars were presented one at a time at one of the indicated spatial positions (1-8). In the focused attention condition, location 1 is relevant (to-be-attended). In the divided attention condition locations 1-4 were all relevant (to-be-attended). Subjects responded upon the detection of bars in the target color presented at the attended location(s). The separation between fixation and the stimulus bars was 3.1° for all stimulus positions. Bars presented at locations 1 and 2 were separated by 0.15° (shortest distance between the bars; distance between the most distant corners was 1°). Locations 1 and 3 were separated by 1.23° (most distant corners 2.1°), 1 and 4 by 3.83° (4.92°), and 4 and 5 by 0.17° (0.87°).
[Figure 9.2: plot for the focused condition at P3/P4. Vertical axis: amplitude of the P1 and N1 components in microvolts. Horizontal axis: location (1/8, 2/7, 3/6, 4/5). Separate curves for the relevant field and the irrelevant field.]

Figure 9.2. Focused attention condition. Mean (n = 6) amplitudes of the P1 and N1 components at the parietal electrodes for each of the different spatial positions, separately for stimuli at the relevant location (relevant field, location 1) and locations in the same visual half-field as the relevant location (relevant field, locations 2-4), and for stimuli in the opposite half-field (irrelevant field, locations 5-8). The values were averaged over attend-right and attend-left conditions and over ipsilateral and contralateral electrodes.

At this stage of processing, the 'attentional spotlight' spanned at least about 2° of visual angle. The target-non-target color difference effect showed three successive phases with different spatial distributions (Figure 9.3). The early phase of the color selection effect (160-240 ms) showed a distribution similar to that of the effects of spatial attention per se on the P1 and N1 components; this suggested that, for this phase of processing, color selection was dependent on spatial attention. The later target color selection effect (240-400 ms: N2b) was observed for the relevant location and the adjacent location. P300 (400-700 ms) was evoked only by bars in the target color at the relevant location. These results suggest that successive stages of selective processing for color become more and more spatially tuned, i.e. show a decreasing bandwidth.

A fundamental issue concerns the distribution of attentional resources over the range of stimuli within the attentional channel. One possibility is that resources are evenly distributed over the range of the attentional channel. In the spotlight metaphor of spatial attention this implies an equal 'illumination' for all parts of the attended visual region. Also in the zoomlens metaphor the attentional resources are evenly distributed over the attended region, but since there are only limited resources, as the size of the zoomlens increases, each part within the attended region receives 'less illumination'. An alternative is that attentional resources are unevenly distributed, for example with fewer resources allocated to stimuli further removed from the attended locus. For the attentional spotlight this would imply that areas further away from the center of the spotlight would receive less 'illumination'. The idea of such a 'gradient' structure of spatial attention has received some support from the results of reaction time experiments (Shulman, Sheehy and Wilson, 1986; Shulman, Wilson and Sheehy, 1985).

In ERP research, evidence for a gradient structure of attention can be obtained if there is a gradual decline of the attentional response elicited by stimuli as they are further removed from the 'center of attention'. If the stimuli most similar to the attended stimulus elicited a full attentional response, whereas other, more dissimilar stimuli failed to elicit any attentional response, this would count as evidence against a gradient of attention. Again, note that, although the idea of a gradient of attention has evolved from research on spatial attention, in principle it applies to the allocation of attentional resources along other feature dimensions as well. Thus, the results of Harter and Previc (1978) are consistent with the idea of an attentional gradient: the attentional response gradually declined as the spatial frequency of the stimuli became less similar to the spatial frequency of the attended stimulus. Note also that such results are directly predicted by attentional trace theory (Näätänen, 1982), without the explicit assumption of uneven distributions of attentional resources. In this theory, the more similar a stimulus is to the attended stimulus, the longer it will 'hit' the attentional trace, and the more negativity will develop in the ERP.
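The two notions used here, bandwidth (the range of stimuli that share the attentional response) and gradient (a gradual rather than all-or-none decline of that response), can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the function names, the criterion and tolerance values, and the effect profiles, which are invented and only loosely modeled on the qualitative pattern reported by Harter and Previc (1978), namely a broader profile at an early latency than at a later one.

```python
def bandwidth_steps(magnitudes, criterion):
    """Number of consecutive steps, counted from the attended value,
    whose attention-effect magnitude reaches the criterion.

    magnitudes[0] belongs to the attended stimulus value, magnitudes[i]
    to a stimulus i steps away along the feature dimension.
    """
    count = 0
    for magnitude in magnitudes:
        if magnitude < criterion:
            break
        count += 1
    return count


def classify_profile(magnitudes, tol=0.1):
    """Return 'gradient' if any non-attended stimulus shows a partial
    effect (clearly above zero, clearly below the full effect), else
    'all-or-none'. The tolerance is an arbitrary illustrative choice.
    """
    full = magnitudes[0]
    partial = [m for m in magnitudes[1:] if tol * full < m < (1 - tol) * full]
    return "gradient" if partial else "all-or-none"


# Hypothetical effect magnitudes (microvolts) by distance from the attended value.
early_profile = [2.0, 1.6, 1.0, 0.5, 0.1]   # e.g. an early latency
late_profile = [2.0, 0.8, 0.1, 0.0, 0.0]    # e.g. a later latency
step_profile = [2.0, 2.0, 0.0, 0.0, 0.0]    # all-or-none pattern

early_bw = bandwidth_steps(early_profile, criterion=0.4)
late_bw = bandwidth_steps(late_profile, criterion=0.4)
```

With these invented numbers the early profile spans four steps and the late profile two, which corresponds to a channel that becomes more narrowly tuned over time, and only the step-shaped profile is classified as all-or-none.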
[Figure 9.3: plots of the target minus nontarget difference in the focused condition. Panels: Fz, 160-240 ms; Cz, 240-400 ms; P3/P4, 400-700 ms. Vertical axis: amplitude in microvolts. Horizontal axis: location (1/8, 2/7, 3/6, 4/5). Separate curves for the relevant field and the irrelevant field.]

Figure 9.3. Color selection effects in the focused attention conditions. The depicted values were obtained by subtracting the mean amplitudes in the specified latency ranges for irrelevant color stimuli from the mean amplitudes for relevant color stimuli. These values were averaged over attend-right and attend-left conditions and also over attend-blue and attend-red conditions.
Mangun and Hillyard (1987, 1988) presented stimuli randomly to three different locations, 5° to the right or left of the vertical meridian or at the vertical meridian. Gradients of attention were assessed by comparing the ERPs evoked by the lateral (right or left) stimuli in three different conditions (attend the location of the stimulus, attend the central location, and attend the opposite location). It was found that the occipital P1 and N1 components systematically decreased as attention was directed further away from the location of stimulus presentation (Mangun and Hillyard, 1987). A parallel systematic decrease was also obtained for signal detection performance (d'; Mangun and Hillyard, 1988). In another experiment (Mangun and Hillyard, 1990a) stimuli could be presented at four different stimulus locations, two near lateral locations (2° to the right or left of the vertical meridian), and two far lateral locations (6° to the right or left). Each of these locations was attended in different runs. For the stimuli presented at far lateral locations, P1 amplitude decreased progressively as attention was focused at locations further away. Wijers et al. (1989d) failed to obtain evidence for a gradient of attention, although all four stimulus locations in each visual half-field were within the 5° range over which a gradient was reported to occur by Mangun and Hillyard (1987, 1988, 1990a).
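For readers less familiar with the signal detection measure d' mentioned above, it is conventionally computed as the difference between the z-transformed hit rate and the z-transformed false alarm rate. The Python sketch below shows this standard formula using only the standard library. The hit and false alarm rates are hypothetical values chosen to mimic a decline across the three attention conditions; they are not data from Mangun and Hillyard (1988).

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index d' = z(hit rate) - z(false alarm rate).

    Both rates must lie strictly between 0 and 1, because the inverse
    normal CDF is undefined at the extremes.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical (hit rate, false alarm rate) pairs for a lateral stimulus
# under three attention conditions; values invented for illustration.
conditions = {
    "attend stimulus location": (0.90, 0.10),
    "attend central location": (0.80, 0.15),
    "attend opposite location": (0.70, 0.20),
}

sensitivities = {name: d_prime(h, fa) for name, (h, fa) in conditions.items()}
```

With these invented rates d' falls from the first condition to the third, which is the kind of systematic decrease in sensitivity that the text describes as paralleling the P1 and N1 amplitude changes.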
In the performance literature there is some evidence that the size of the mental spotlight increases as it is directed to positions further from fixation (Shulman et al., 1985, 1986). To our knowledge there is no ERP research addressing this issue. One of the hidden assumptions of bandwidth studies is that the bandwidth is independent of the number and range of irrelevant stimuli. Attentional trace theory assumes that frequent afferent reinforcement of the attentional trace is required in order to maintain an accurate representation of the relevant stimulus. Indeed, it has been found that effects of attention decrease with a lower probability of the relevant stimulus class (Alho et al., 1990). By increasing the number of irrelevant stimulus classes, the probability of the relevant stimulus class usually decreases; therefore, in bandwidth studies the focusing of attention may be less optimal than could have been obtained in other task situations.

An important accomplishment of this type of ERP research is the insight that the bandwidth of the attentional channel may change as processing proceeds (Harter and Previc, 1978). For example, the size of the mental spotlight may be different at different latencies after stimulus presentation. Indeed, the central core of Näätänen's (1982) 'attentional trace theory' is that the range of stimuli matching the attentional trace progressively narrows down during perceptual analysis. This corresponds to a gradual decrease of the bandwidth of the attentional channel over time. The concept of bandwidth can be applied, not only to early selection/matching processes, but to later stages of processing as well. Information processing may be conceived of as a series of component processes (stages of processing), each of which receives an input representation, performs some transformation on that representation, and produces an output representation for another process (Miller, 1988; Sanders, 1983).
In principle, selectivity could occur at any stage of processing, whenever a process restricts its transformations to a subrange of the representations received by the preceding process. The bandwidth of each process is determined by the characteristics of this subrange of mental representations. In this conception, selectivity of processing is not necessarily restricted to any particular (early or late) stage of processing. Furthermore, what is selected does not necessarily bear a one-to-one relationship to objects in the external world; what is selected is whatever may characterize a subrange of mental representations. It is also conceivable that there exist internal representations that are accessible to some information processing modules but not to others (Kahneman and Treisman, 1984); for example, the meaning of an unattended message may effectively trigger physiological reactions (Corteen and Wood, 1972), yet be unavailable for memory or consciousness. Finally, in principle a stage of processing could act upon representations that were ignored by a preceding stage; the bandwidth of a later process could be wider than that of an earlier process. For example, even though the encoding of (aspects of) irrelevant stimuli could be attenuated at an early stage of processing, (aspects of) these stimuli could still enter later stages of processing. LaBerge (1983) demonstrated that the size of the spatial spotlight of attention depends on task demands. LaBerge et al. (1991) showed that the bandwidth of the spatial attention channel may also show temporal variations. On the basis of response competition effects it could be demonstrated that the effective size of the attended area was narrowed in order to identify a target symbol, but its size was increased again after this identification had been finished, i.e. attention 'zoomed in' for target identification and 'zoomed out' thereafter.
Spotlight or zoomlens analogies of spatial attention presuppose that attention is distributed over a coherent visual area and cannot be divided over two separate locations (Posner et al., 1980). Evidence in favor of this idea can be derived from an experiment of Okita et al. (1985), in which subjects focused attention to a diagonal of a display, i.e. they divided attention over two separate locations, one in each visual half-field. Although the relevant stimuli evoked a similar late negativity (N2b) as in a more typical spatial attention task (Wijers, 1989), the earlier P1 and N1 enhancement was not observed. Therefore, the spotlight conception apparently applies to an early stage of processing only (reflected by P1 and N1 effects), since later effects (N2b) were very similar for the different paradigms.
6 DIVIDED ATTENTION
In divided attention conditions, multiple input channels have to be monitored at the same time. In studying bandwidths of attentional channels, or gradients of attention, the main idea is that, as a principle of structural organization of the information processing system, some attentional resources spread from the attended locus into a range of loci adjacent to it. In divided attention it is investigated whether and how attentional resources can, as an act of voluntary control, be shared among multiple loci in sensory space. In the auditory modality, when attention is divided over two channels, the N1 amplitude is intermediate between the amplitudes of the attended and unattended stimuli in focused attention conditions (Hink et al., 1977; Okita, 1979; Parasuraman, 1978, 1980). Similar results have been obtained for divided visuospatial attention conditions, in which subjects attended to both locations, one to the left and one to the right of fixation, at the same time. In such conditions the amplitudes of the attention-sensitive components were found to be intermediate to the values of the attended and unattended locations in focused attention conditions (Eason, 1981; Van Voorhis and Hillyard, 1977). Wijers et al. (1989d) presented colored stimulus bars randomly to one out of eight different locations arranged in a hemi-circle (Figure 9.1). In one condition subjects were instructed to divide their attention over all four locations within one visual half-field. Subjects had to respond when bars in a target color were presented at any of the attended locations. For all four attended locations the P1 component was enhanced compared with the four locations in the opposite half-field. This effect was comparable to the effect in a focused attention condition in which only one location was attended. The N1 effect, on the other hand, was largely diminished in the divided attention condition. 
This may suggest that the N1 is sensitive to the division of attention within visual half-fields, whereas the P1 only reflects the allocation of resources between the visual half-fields. Mangun and Hillyard (1990b) investigated ERPs as a function of the systematic manipulation of the relative allocation of attention between left and right stimulus arrays. The stimuli consisted of four letters, which were randomly presented to the right or left visual fields. In different conditions, subjects were instructed to direct attention either exclusively to the left or right stimulus arrays, or to divide attention between the two sides in specified proportions (75% left/25% right, 50% left/50% right and 25% left/75% right). Subjects were required to detect the occurrence of a
target letter in the designated visual field(s), pressing a right-hand button for right visual field targets and a left-hand button for left visual field targets. As more attention was allocated to stimuli flashed in one visual field, ERPs elicited by those stimuli increased in amplitude, while ERPs to the opposite field stimuli progressively decreased. This was true for the P1 and N1 components and also for the P300 evoked by target stimuli. In addition, measures of target detectability and RT showed systematic tradeoffs as a function of attentional allocation. AOC (attention operating characteristics) curves were constructed by plotting performance and ERP components for stimuli in the left visual field as a function of performance for the stimuli in the right visual field in the different attention allocation conditions. The P1 and N1 components displayed almost linear tradeoff functions. Thus, as more attention was allocated to a visual field, this resulted in an increase in P1 and N1 for that visual field, which was balanced by an approximately equal decrease in P1 and N1 for the other visual field. Such a pattern of results implies that a single, limited resource was being shared between the two half-fields. Performance and P300, on the other hand, showed rather efficient dual-task performance, in that there were relatively small performance decrements from the focused to the divided attention conditions. This indicates an uncoupling of sensory gain (P1/N1) and perceptual performance (and P300), in which higher-level perceptual processes were still able to extract and analyze the information from the progressively diminishing sensory signal. To summarize: these data show that ERP components sensitively reflect the allocation of attentional resources to multiple input channels. 
The finding that different ERP components are associated with distinctive tradeoff functions supports the idea that separate resources exist for different stages of processing (Sanders, 1983). A multiple resource view also emanates from research on P300 amplitude in dual-task situations. This research demonstrated that P300 amplitude is sensitive to perceptual-central task load but not to motor task load. Apparently, separate resources exist for the perceptual-central and for the motor stages of information processing.
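The construction of AOC curves described above can be illustrated with a short sketch. The amplitude and detectability values below are invented for illustration and are not the data of Mangun and Hillyard (1990b); the function names `fit_line` and `relative_divided_level` are hypothetical. Assuming five allocation conditions ordered from 0% to 100% of attention directed to the left visual field, a slope near -1 between the two fields indicates a linear tradeoff, whereas a divided-attention level well above 0.5 indicates relatively efficient dual-task performance.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of ys = intercept + slope * xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def relative_divided_level(values):
    """Position of the 50/50 condition between the unattended (first) and
    fully attended (last) values; 0.5 means a strictly linear tradeoff."""
    unattended, attended = values[0], values[-1]
    divided = values[len(values) // 2]
    return (divided - unattended) / (attended - unattended)


# Hypothetical values, ordered by attention to the LEFT field: 0, 25, 50, 75, 100%.
p1_left = [1.0, 1.25, 1.5, 1.75, 2.0]     # P1 amplitude (microvolt), left-field stimuli
p1_right = [2.0, 1.75, 1.5, 1.25, 1.0]    # P1 amplitude (microvolt), right-field stimuli
dprime_left = [1.0, 2.6, 3.0, 3.2, 3.3]   # target detectability, left-field targets

# One AOC point per condition: right-field value on x, left-field value on y.
p1_slope, p1_intercept = fit_line(p1_right, p1_left)

print(f"P1 tradeoff slope: {p1_slope:.2f}")
print(f"P1 level at 50/50: {relative_divided_level(p1_left):.2f}")
print(f"d' level at 50/50: {relative_divided_level(dprime_left):.2f}")
```

In this toy data set the P1 values trade off one-for-one between the fields, while detectability in the divided condition stays close to its focused-attention level, which is the kind of dissociation between sensory gain and performance discussed above.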
7 MULTIDIMENSIONAL SELECTIONS
In multidimensional stimulus selections, the input channels are defined by orthogonal combinations of several feature values along a number of different stimulus attributes. In the following we will deal only with the simplest situation, in which there are four input channels defined by combinations of two values on two different attributes (e.g. for the auditory attributes pitch and location the different stimulus categories are: high pitch/left ear, high pitch/right ear, low pitch/left ear, low pitch/right ear). The subjects are instructed to attend to one of these different categories only. Hansen and Hillyard (1983) and Hillyard and Hansen (1986) set forth how different patterns of ERP effects in such conditions may disclose the validity of different selective attention models. A distinction can be drawn between analytic and synthetic models. In analytic models, individual stimulus features are selected independently, and the results of these independent analyses are then combined into a unified perception. In synthetic models, attention is first focused on the entire
configuration of attributes (i.e. the object as a whole), and the individual features can only be retrieved indirectly. In multidimensional stimulus selection experiments, three different patterns of ERP effects may be observed. Let us assume that attention is directed to a combination of pitch (high versus low pitch) and ear of presentation (left versus right ear). First, it could be observed that only those stimuli sharing both relevant attribute values (i.e. the relevant pitch presented to the relevant ear) elicit a distinctive effect on the ERP (e.g. elicit a negativity). Such a conjunction-specific effect would indicate that selective attention is specific to the conjunction of attribute values (in accordance with a synthetic model of attention). Second, it could be observed that all stimuli sharing one of the relevant attribute values (being either of the relevant pitch or presented in the relevant ear) elicit an attentional ERP response independent of the value on the other attribute. Thus, in this case the ERP effect of pitch being relevant is similar for stimuli in the right and left ears, and vice versa, the ERP effect of ear being relevant is similar for high- and low-pitched tones. Such a pattern of results would be in agreement with an analytic model of attention, in which both attributes are selected independently of one another. Finally, the ERP effect of one of the attributes (e.g. pitch) being relevant could depend on the relevance of the other attribute. In this example, pitch being relevant would result in an ERP effect only when the ear was also relevant. Such results would suggest an analytic model of attention, in which the different attribute selections are hierarchically organized, i.e. the outcome of the selection process of one attribute (ear) acts upon the selection process of the other attribute (pitch). Such a hierarchical organization could be realized by both serial and parallel processing contingencies. 
That is, the selection process of a feature could await the completion of an earlier selection process, or both selection processes could start in parallel, whereby the outcome of one of the selection processes terminates the other selection process before it has been finished. The first to use a multidimensional selection paradigm in ERP research were Previc and Harter (1982), who studied attention to grating stimuli varying in orientation and spatial frequency. For early parts of the ERPs the effects of attention to spatial frequency and orientation were independent, whereas the effects were conjunction-specific for later latency ranges. Hansen and Hillyard (1983) investigated multidimensional selections in conditions in which both auditory stimulus attributes (pitch and ear of presentation) were easily discriminable, and in other conditions in which one of the attributes was made difficult to discriminate. When both attributes were easy, the results were most in accordance with conjunction-specific selections, although the fit was not perfect: there were also small ERP effects when only one of the stimuli was relevant. When one attribute was difficult to discriminate, the relevance of this attribute caused ERP effects only when the other, easy attribute was also relevant. Thus, the selection of the difficult attribute was contingent on the selection of the easy attribute. On the basis of P3 latency and RT data, the authors concluded that the analyses of the stimulus attributes start and proceed in parallel. At the moment that the analysis of the easy attribute indicates that the stimulus is irrelevant, all processing of the stimulus is terminated. In a similar experiment in the visual modality, Hillyard and Münte (1984) studied attention to conjunctions of color (blue versus red) and location (left versus right of fixation). In separate experiments, the location discrimination was made easy or difficult. 
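The three patterns of ERP effects distinguished above (conjunction-specific, independent and hierarchical) can be made concrete with a small sketch. It takes the mean amplitudes for the four stimulus categories of a two-attribute design and labels the pattern. The function name `classify_pattern`, the tolerance `tol` and the example amplitudes are all hypothetical; actual studies rely on statistical tests of the relevant main effects and interactions rather than on a fixed cut-off.

```python
def classify_pattern(both, a_only, b_only, neither, tol=0.5):
    """Label the pattern of attention effects for two attributes A and B.

    Arguments are mean ERP amplitudes (microvolt) for stimuli sharing both
    relevant values, only the relevant A value, only the relevant B value,
    or neither. `tol` is an arbitrary cut-off for "no effect".
    """
    a_alone = a_only - neither          # effect of A when B is irrelevant
    b_alone = b_only - neither          # effect of B when A is irrelevant
    a_given_b = both - b_only           # effect of A when B is relevant
    interaction = a_given_b - a_alone   # zero if the two effects are additive

    if abs(a_alone) <= tol and abs(b_alone) <= tol:
        if abs(both - neither) > tol:
            return "conjunction-specific"
        return "no effect"
    if abs(interaction) <= tol:
        return "independent"
    if abs(a_alone) <= tol:
        return "hierarchical: A contingent on B"
    if abs(b_alone) <= tol:
        return "hierarchical: B contingent on A"
    return "mixed"


# Hypothetical amplitudes, with A = pitch and B = ear of presentation.
print(classify_pattern(both=-3.0, a_only=0.0, b_only=0.0, neither=0.0))
print(classify_pattern(both=-4.0, a_only=-2.0, b_only=-2.0, neither=0.0))
print(classify_pattern(both=-4.0, a_only=0.0, b_only=-2.0, neither=0.0))
```

The third call corresponds to the hierarchical case in the text: pitch relevance produces an effect only for stimuli in the relevant ear.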
Figure 9.4. Mean amplitudes (in microvolts) of the grand-average ERPs (n = 12) in the 100-240 ms latency range at Fz and in the 250-400 ms latency range at Cz. The values are shown separately for the attended stimulus category (COLOR+ SIZE+), for stimuli having only the attended color (COLOR+ SIZE-), for stimuli having only the attended size (COLOR- SIZE+), and for stimuli with neither the attended color nor the attended size (COLOR- SIZE-).
In the easy condition the standard spatial attention effects on the
ERPs were obtained, irrespective of the color of the stimulus. The typical effect of color relevance was also obtained, but only for stimuli presented at the relevant location. Consequently, these data suggested that color selection was hierarchically dependent on selection by location. In the experiment with difficult location discriminations, the effects of spatial attention virtually disappeared and the color relevance effect was found for stimuli at both locations. For this experiment, the results excluded the possibility of conjunction-specific attention, but were inconclusive with regard to the other possibilities. Wijers et al. (1989b) presented stimulus letters in two different colors (red or blue) and in two different letter sizes. Both stimulus attributes were easily discriminable. Figure 9.4 shows the mean ERP amplitudes in the 100-240 ms range and in the 250-400 ms range (N2b). In the early latency range, the relevant letter size had an effect only when color was also relevant. This implies that letter size selection is hierarchically dependent on color selection. However, in the later latency range, the relevance of the letter size was reflected in the ERPs even if color was irrelevant. Thus, although the early effects suggested a hierarchical contingency in which stimulus processing is terminated for stimuli in the irrelevant color, these stimuli could still activate the later attentional orienting process as reflected by N2b. This shows that a later stage of processing may use stimuli that appeared to be ignored in an earlier stage. Taken together, most findings on multidimensional selections are consistent with the view that individual features are attended before conjunctions of features. These data support the view of early selection theories of attention, that stimuli are
selected before perceptual processing is complete. In many instances hierarchical processing contingencies could be observed, in disagreement with neural specificity theory which predicts independent feature selections. In a discussion on neural specificity theory (Harter and Aine, 1986; Hillyard and Mangun, 1986; Näätänen, 1986), Harter and Aine objected to this point that hierarchical selections have most consistently been demonstrated in the auditory modality, to which neural specificity does not directly apply. In the visual modality, attention to nonspatial cues is regarded as identical to attending a particular cue at a specific location, so that nonspatial selections are necessarily hierarchically dependent on spatial selection (Hillyard and Münte, 1984). For nonspatial selections, independent selections have indeed been demonstrated (Previc and Harter, 1982; but see Wijers et al., 1989b). This raises the fundamental issue of whether, in visual perception, the location and identity of stimuli are processed independently, or whether, alternatively, the perception of stimulus identity is contingent on the perception of its location (Johnston and Pashler, 1990). In research on visual selective attention, the vast majority of experiments are on spatial selections. In several theories, selective attention is even equated with spatial selectivity (Treisman and Gelade, 1980; van der Heijden, 1991). Such a philosophy is also implicit in spotlight or zoomlens conceptions of selective attention. As we argued, selectivity of processing could in principle occur at any stage of processing, whenever a particular mental operation is confined to a subrange of the available internal representations. There is no a priori reason why subranges of representations should necessarily be determined by spatial properties of stimuli in visual space. 
The emphasis on the spatial selectivity of attention may (sometimes explicitly) be driven by considerations about the physiological anatomy of the visual information processing system. Visual information is inherently spatially encoded. This spatial encoding persists throughout different levels of processing within the nervous system. At the cortical level, it has been argued that different visual features are represented in parallel in separate, extrastriate visual brain areas, which are all retinotopically organized (Cowey, 1979, 1985). According to feature integration theory (Treisman and Gelade, 1980), there is an early, preattentive, parallel registration of elementary features. Without attention these features cannot be related to one another and there is no information available concerning the spatial localization of these features. A serial, attentional spatial scanning mechanism is required to 'glue' the individual features for the perception of objects. Thus, the registration of the individual spatio-topic 'feature maps' is a preattentive process. Focused spatial attention acts by activating entries in different maps, which makes this information available for higher-level processing (Nissen, 1985; Treisman, 1988). ERP research has shown that spatial selectivity is faster than nonspatial selections and involves distinct patterns of brain activity, which confirms the special status of spatial attention. The finding of feature-specific brain mechanisms for nonspatial selections, on the other hand, is hard to reconcile with the idea that these features are preattentively registered. Additionally, on the basis of behavioral research it was concluded that spatial attention not only facilitates feature integration, but also the encoding of the individual features themselves (Prinzmetal, Presti and Posner, 1986). Accordingly, Johnston and Pashler (1990) identified several elementary confounding factors in behavioral research on location-identity binding. 
When these factors were eliminated, the perception of identity on the basis of elementary features was closely dependent on the perception of their location.
This is in agreement with the ERP evidence that selection of color is dependent on spatial attention (Hillyard and Münte, 1984; Wijers et al., 1989d). However, in these situations both location and color were task relevant. To our knowledge it is unknown whether spatial attention is a prerequisite for nonspatial selections when the spatial location itself is task irrelevant.
8 AUTOMATIC VERSUS CONTROLLED PROCESSING
An important distinction is that between automatic and controlled modes of information processing (James, 1890; LaBerge, 1975; Posner and Snyder, 1975; Schneider and Shiffrin, 1977; Shiffrin and Schneider, 1977). Automatic processing is fast (parallel), effortless and occurs without voluntary control. Controlled processing, on the other hand, is slow (serial), effortful and under voluntary control. After sufficient learning in consistent mapping (CM) situations, in which stimuli are consistently mapped onto responses, automatization occurs, i.e. controlled processes become automatic, and performance becomes less dependent on task load. In the study of selective attention, we may distinguish two different aspects of automatic processing. First, processing to stimuli not belonging to the attended input channel may be considered automatic processing, since these processes are not under voluntary control. For example, processing to stimuli outside of the mental spotlight would be considered automatic. As another example, the notion that unattended stimuli automatically activate their semantic representations is an important characteristic of a late selection conception of attention. The second aspect concerns automatic attention responses. According to automatic control theory (Schneider and Shiffrin, 1977; Shiffrin and Schneider, 1977), automatic target stimuli may cause obligatory shifts of the attentional focus. For example, certain stimulus events outside the spotlight of attention cause obligatory switches of the spotlight toward their position.
8.1 Voluntary Control Over Working Memory Operations
By combining the standard selective attention paradigm with other information processing paradigms, the degree of voluntary control over various mental processes can be established. For example, visual selective search tasks combine a selective attention task with a memory or display search task. A series of letters is presented, which, as in the usual selective attention paradigm, vary randomly with respect to a simple physical feature. The subject is instructed to attend to one of these stimulus categories and to detect the occurrence of target letters within the attended category. Target letters are letters belonging to a set of letters, memorized before the start of the stimulus series (the memory set). By varying the number of letters in the memory set, one is able to manipulate the duration of a memory comparison (search) process (Sternberg, 1969). Subjects respond overtly, with a finger-lift response, only upon detecting a target letter within the attended category. In such tasks, the ERP effects of attending to the physical characteristics of the stimuli (color, location, letter size, diagonal, a conjunction of color and letter size) have been described in previous sections. Importantly, the onset latencies and
morphologies of these effects are the same as those obtained in standard selective attention paradigms, and are independent of the size of the memory set. Memory set size affects the ERPs only in a much later latency range. A more extensive search process was found to be reflected by increased, prolonged, long latency shifts (in the range of about 300-600 ms poststimulus), with a maximum at the Cz electrode. Wijers (1989) has argued that this 'search-related negativity' is a direct reflection of a controlled memory search process; this negativity overlaps with the P3b component and accounts (at least in part) for the well-known reduction of P3b amplitude as a function of task load in search paradigms. The negativity is in most cases completely confined to letters in the to-be-attended category. In selective search tasks, attended target stimuli elicit parietally maximal P3b components. The difference between targets and non-targets, due to the target-evoked P3b, usually has an onset latency of about 400 ms. P3b is not elicited by unattended targets: the ERPs to unattended targets and non-targets are mostly similar. These data suggest that the memory search process is under voluntary control, and serially dependent upon an earlier selection, and that target detection in turn is serially dependent on the preceding controlled search process. Similarly, Wijers et al. (1987) demonstrated that a display search process could selectively be restricted to the relevant diagonal of a square display. In this experiment subjects searched for the occurrence of a single target letter in three different conditions, one condition with a display load of two letters and two conditions with a display load of four letters. In the critical condition four display items were presented, but subjects were instructed to attend to the two letters at a cued diagonal only. In the other conditions they attended to all presented display items. 
The four-letter condition showed increased late negativity compared with the two-letter condition. Most importantly, the condition in which four letters were presented but two had to be attended was very similar to the two-letter condition. Apparently, the subjects effectively confined the display search process to the relevant diagonal, resulting in an effective display load of two. Wijers et al. (1989c) combined the selective search task with a mental rotation task by presenting colored stimulus letters rotated over various angles from their normal upright orientation. According to Cooper and Shepard (1973), the orientation of the letters affects the duration of a mental rotation process, in which the letters are imagined to rotate toward their normal upright position. Again, the effect of attending the color of the letter (occipital negativity with an onset latency of about 150 ms) was unaffected by memory load and letter orientation, even though the manipulations of these variables resulted in reaction times ranging from 450 to 1100 ms. The mental rotation process was reflected in a late negativity (in the 400-700 ms latency range). This negativity was independent of memory load, but it was obtained only in the ERPs to attended letters. Therefore, the process of mental rotation also appears to be under voluntary control. The 'rotation-related negativity' showed a somewhat different scalp distribution than the 'search-related negativity', which suggests that different types of working memory operations (e.g. symbolic versus analog visual) involve the activation of different brain areas. In previous research, slow negativity in the ERP in a mental rotation paradigm had been observed by Stuss et al. (1983) and Peronnet and Farah (1989). It seems that other types of controlled processing are also associated with slow negativities.
Ruchkin et al. (1988) found an increase of late negativity related to more difficult mental arithmetic. Mecklinger, Kramer and Strayer (1992) obtained increasing negativity as a function of memory load in a category search paradigm, in which subjects had to decide whether stimulus words belonged to one of the (variable number of) prememorized categories. Long duration DC-recorded negative shifts also appear to be related to effortful processing. Rösler and Heil (1993) reported slow negative shifts (prevailing for as long as 14 s) in semantic and nonsemantic long-term memory retrieval tasks. These shifts showed systematic topographic differences as a function of the type of stored and retrieved material. More research is needed to determine whether these slow negativities reflect common brain mechanisms, or whether specific brain areas are involved in different types of controlled processing. According to Rösler and Heil (1993), slow negativities are the result of the increased excitability of the particular brain areas involved in effort-demanding tasks.
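The memory-set manipulation used in the selective search tasks of this section rests on the idea that a larger memory set prolongs the comparison process. A minimal sketch of that logic follows, under the assumption (not stated in this chapter) that the probe is compared serially and exhaustively with every memory-set item at a fixed cost per comparison. The function names and the parameter values `base_ms` and `per_item_ms` are hypothetical and chosen only for illustration.

```python
def predicted_rt(set_size, base_ms=400.0, per_item_ms=40.0):
    """Reaction time if every memory-set item costs one fixed comparison.

    base_ms stands for all processes that do not depend on set size
    (encoding, decision, response); both defaults are arbitrary.
    """
    return base_ms + per_item_ms * set_size


def search_slope(set_sizes, mean_rts):
    """Least-squares slope (ms per item) of mean RT against memory set size."""
    n = len(set_sizes)
    mx = sum(set_sizes) / n
    my = sum(mean_rts) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(set_sizes, mean_rts))
    sxx = sum((x - mx) ** 2 for x in set_sizes)
    return sxy / sxx


set_sizes = [1, 2, 4]
rts = [predicted_rt(n) for n in set_sizes]

print(rts)
print(round(search_slope(set_sizes, rts), 1))
```

Under these assumptions the slope recovered from the reaction times equals the per-item comparison time, which is why set size can serve as a handle on the duration of the search process.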
8.2 Voluntary Control Over Semantic Processing
In this section we turn to an investigation of the role of selective attention in semantic processing. This is an important issue in the early versus late selection discussion. If semantic processing were fully automatic, this would strongly favour late selection theories of attention. The sensitivity of ERPs to semantic processes has now been well established (see Kutas and Van Petten, 1988, for a review). The N400, a negative ERP component elicited approximately 400 ms after stimulus presentation, has proved to be a valuable index of various aspects of semantic processing. The N400 was first demonstrated as a response to violations of semantic expectancies; a large N400 was elicited by words that incongruently ended the preceding sentences (Kutas and Hillyard, 1980). Thus, the amplitude of the N400 depends on the preceding semantic context, which can consist of a sentence (or sentence fragment), a phrase or a single word (Bentin, McCarthy and Wood, 1985; Kutas and Hillyard, 1980; Neville et al., 1986). In a recent experiment, Van Petten and Kutas (1990) investigated N400 amplitude as a function of sentence position. It was found that the N400 was smaller the later the word occurred in the sentence. This was interpreted as reflecting the gradual build-up of semantic constraints as sentence reading proceeds. Selective attention is particularly important in the process of natural reading, since the momentary point of fixation (the target word) is surrounded in both the horizontal and vertical directions by other words. McCarthy, Nobre and Wood (1989) investigated the modulation of single word processing by visuospatial attention, by presenting vertically arranged words to the left or to the right of fixation, while subjects attended to one visual field only. Whereas attended words evoked an N400-like ERP component, unattended words did not. 
Thus, there was no evidence that the unattended words were semantically processed. However, it could be argued that these results are not representative of a natural reading situation since the words were presented vertically. Gunter et al. (1994a) investigated spatial attention in a more natural reading situation, in which relevant and irrelevant texts were presented simultaneously.
Congruently and incongruently ended sentences were presented word-by-word, flanked by irrelevant words in the lower visual field. The irrelevant flanker words could be congruent or incongruent with the expected sentence ending. If both the relevant sentence ending and the irrelevant flanker were congruent or incongruent, they were the same words. The distance between sentence material and flankers was manipulated as a between-subjects variable (0.57 ~ 0.97 ~ 1.37~ Afterwards, the subjects received a recognition memory test in which they were presented with the words in the reading task (both the attended words and the irrelevant flanker words) or with new words. The subjects were asked to indicate whether or not the presented words had been part of the attended sentence in the reading task. In the recognition test, flanker words were more often (incorrectly) recognized as belonging to the sentences in the reading task than new words. This effect was only observed for the two smallest sentence-flanker distances. The ERPs recorded during sentence reading showed the usual N400 enhancement when the attended sentences ended incongruently compared with when they ended congruently. The type of flanker had an effect only when the attended sentence was ended congruently. In this case, when the flanker item was incongruent, the ERP showed an increased negative shift compared with when the flanker was congruent (Figure 9.5). Importantly, this effect was absent in the condition with the largest sentenceflanker separation (1.37~ Furthermore, in the remaining two conditions, the onset
[Figure 9.5: ERP waveforms at Fz, Cz, Pz, Oz and EOG for the 0.5, 0.9 and 1.3 degree separation groups; horizontal axis: time in msec, -200 to 1000.]
Figure 9.5. Grand-average (n = 24) ERPs to the last words of congruently ended and incongruently ended sentences. The ERPs are superimposed for end words with identical flankers and for end words with a different flanker (thus a congruent flanker for the incongruent endings and an incongruent flanker for the congruent endings). The three columns depict different groups of subjects with different spatial separations between the end words and flankers.
latency of the negativity was delayed for the larger separation (0.9°, onset 490 ms) compared with the smaller separation (0.5°, onset 280 ms). These results suggest that irrelevant text intrudes on the reading process only if it is presented within the attended visual area. If the irrelevant information is presented outside the attended area, semantic processing of irrelevant information can be prevented by voluntary control. If the irrelevant information is presented within the attentional focus and the processing time of the relevant information is relatively short (i.e. for congruent sentence endings), then there appears to be additional, delayed processing of the irrelevant information (see also Gathercole and Broadbent, 1987). To summarize, on the basis of ERP research there is no evidence that irrelevant information presented outside the focus of attention is semantically processed. However, much more research is needed on this topic.
8.3 Automatic Attention Responses, Automatic Target Classification and Learning

According to attentional trace theory, some stimuli have an inherent attention-calling property, generating attentional interrupts, upon which attention may switch from its current focus to the interrupt-generating stimulus. Attention-calling stimuli are, for example, abrupt onsets and changes in repetitive aspects of (auditory) stimulation. Evidence for involuntary, externally driven orienting of attention to peripheral stimulus onsets has been provided in the performance literature (Jonides, 1980; Maylor, 1985; Posner and Cohen, 1984). In most selective attention research, both to-be-attended and to-be-ignored stimuli are presented in the form of abrupt stimulus onsets, which would then imply that all of these stimuli initially would automatically attract attention. However, according to attentional trace theory, the interrupts to the attention system have to exceed a certain variable threshold before the attention switch actually takes place. It seems plausible that this threshold is heightened when irrelevant, abrupt onsets occur frequently.

Recently, an interesting method was used to investigate differences in the processing of onset and no-onset stimuli. In these experiments, the no-onset stimuli are presented by elimination of line segments that were present prior to stimulus presentation (Yantis and Jonides, 1984, 1990). Indeed, these authors demonstrated that selective attention may overrule the interrupts of irrelevant onset stimuli (Yantis and Jonides, 1990). It seems promising to use this technique in ERP research as well.

Changes in repetitive aspects of (auditory) stimulation are automatically registered, as evidenced by the mismatch negativity evoked similarly by both attended and unattended deviant stimuli.
A subsequent switch of attention to the automatically registered irrelevant deviant stimuli is thought to be reflected in a separate ERP component, the N2b-P3a complex. This complex is in some conditions evoked by both relevant and irrelevant deviants (Näätänen et al., 1982), whereas in other conditions it is confined to deviants in the attended input channel (Näätänen et al., 1978, 1980). This is another demonstration that the call for focal attention can in some conditions be prevented by selective attention.
If irrelevant deviant stimuli actually evoke a switch of attention, this could be disclosed by studying the processing of stimuli presented subsequently to the switch-inducing stimuli. Wagner et al. (1991) examined performance and ERPs to the stimuli following irrelevant 'funny' (frequency modulated) stimuli. The ERPs to the funny stimuli showed an N2 component, suggesting that these did indeed cause attentional switching. When subsequently attended target stimuli were presented, these were responded to less efficiently. This could indicate that attention was still directed to the irrelevant input channel. However, the ERP data were at variance with this interpretation. These showed that relevant stimuli following the funny stimuli showed Nds comparable to regular relevant stimuli; the inter-stimulus interval that was used may have allowed attention to switch back before the next stimulus.

Whereas certain stimuli, such as abrupt onsets, inherently possess attention-calling properties, it is a central element in the automatic control theory (Schneider and Shiffrin, 1977; Shiffrin and Schneider, 1977) that the potential of stimuli to attract attention may be acquired by consistent mapping training. Wijers et al. (1987) compared the ERPs evoked by irrelevant consistently mapped (CM) targets, presented at the unattended diagonal of the display, with the ERPs to irrelevant variedly mapped (VM) targets. Whereas the CM targets should automatically attract attention, VM targets should not. However, these authors failed to demonstrate any differences in the ERPs to irrelevant CM and VM targets, even in conditions in which the spatial separation between the relevant and irrelevant items was very small. In the CM conditions of this experiment subjects searched for the occurrence of digit targets among letter distractors.
Although in such conditions the display items may automatically activate nodes representing their category memberships, this may not be enough for automatic attention responses to occur (Schneider and Shiffrin, 1977). Such a response may occur solely when a particular letter (or digit) has consistently been used as a target for thousands of trials. Therefore, more research is needed on this issue. If stimuli can be learned to attract attention, it is an important question whether there is any limitation in which aspects of stimuli can be learned to attract attention: are these only simple physical features, or also more complex combinations of stimulus features, or even semantic stimulus properties? Although ERPs could in principle be extremely useful to address this issue, to our knowledge any evidence is lacking. Although learned automatic attention responses have not yet been demonstrated, evidence has been obtained that both VM and CM target letters are, in some conditions, automatically classified. In several experiments both relevant and irrelevant target letters elicited a negativity in the 200-300 ms range (Wijers et al., 1987, 1989b, c). This effect also occurred in conditions with more than one item in the memory set. Since these effects precede the increased negativity as a function of memory load, it was argued that the early negativity reflects a preattentive, automatic target classification mechanism. The conditions under which such a mechanism can be demonstrated are possibly those favoring a high signal-noise ratio in an early template matching process. The later negativity is considered a reflection of a later, attentive controlled search process, by means of which the noisy information of the first, preattentive stage of processing is verified. Since the onset latency of the early target classification negativity was somewhat later than, for example, the color selection effect, this indicates that, although the processing of the
irrelevant stimuli was attenuated, they still activated their letter identities. This suggests that the effect depends on the task-dependent preactivation of the memory nodes representing the target letters, which then need little evidence in order to become 'triggered'. Therefore, even the attenuated information from the unattended targets may be sufficient to raise the activation of these nodes above the critical activation threshold for their automatic classification.
LIMITATIONS OF SELECTIVE ATTENTION AS REVEALED BY MEASURES OF MOTOR ACTIVATION
It is an essential question whether information processing is continuous or discrete (for overviews see Meyer et al., 1988; Miller, 1988). Information processing can be conceived of as a series of stages that receive an input representation, transform the input, and transmit an output representation to the next stage. If a stage has to be completed before it can transmit its output to the next stage, processing is said to be discrete. On the other hand, if a stage transmits preliminary results of its transformations to the next stage before it has finished, processing is more continuous. Pure discrete and pure continuous models are end-points on a continuum (Miller, 1988). According to the 'continuous flow model' (Eriksen and Schultz, 1979), the preliminary results of perceptual analysis are continuously fed forward to the motor system. If there is conflicting perceptual information, competing responses may receive activation from the perceptual system, resulting in less efficient performance (response competition). Response competition was first demonstrated in the 'Eriksen' paradigm, in which subjects base two-choice reactions on a centrally presented target letter (e.g. 'H' for a right-hand response and 'D' for a left-hand response). The central target letters are flanked by irrelevant, to-be-ignored letters. Compared with a condition in which the targets are flanked by response-neutral letters (e.g. XXXDXXX), reaction times are shorter when the targets are flanked by response-compatible letters (e.g. DDDDDDD), and delayed when they are flanked by response-incompatible letters (e.g. HHHDHHH). Similar results are obtained when a set of more than one letter is assigned to the same response hand, and the flankers are different from the target, belonging either to the letter set for the same response hand (response compatible), or to the set associated with the opposite response (response incompatible).
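The three display types of the 'Eriksen' paradigm described above can be sketched as a small classification routine. This is only an illustration under stated assumptions: the letter-to-hand mapping is the example from the text ('H' right, 'D' left), the target is assumed to be the central character of an odd-length string, and the names `RESPONSE_HAND` and `classify_display` are hypothetical, not taken from any of the cited studies.

```python
# Hypothetical mapping based on the example in the text:
# 'H' -> right hand, 'D' -> left hand. Unlisted letters are response-neutral.
RESPONSE_HAND = {"H": "right", "D": "left"}


def classify_display(display, mapping=RESPONSE_HAND):
    """Label a flanker display by comparing flanker and target response hands.

    Assumption: the target is the central character of an odd-length string.
    """
    if len(display) % 2 == 0:
        raise ValueError("display must contain an odd number of letters")
    centre = len(display) // 2
    target_hand = mapping.get(display[centre])
    if target_hand is None:
        raise ValueError("central letter is not assigned to a response hand")
    flankers = display[:centre] + display[centre + 1:]
    if not flankers:
        return "target alone"
    # Set of hands signalled by the flankers (None marks a neutral letter).
    hands = {mapping.get(letter) for letter in flankers}
    if hands == {None}:
        return "neutral"
    if hands == {target_hand}:
        return "compatible"
    if None not in hands and target_hand not in hands:
        return "incompatible"
    return "mixed"


for example in ("XXXDXXX", "DDDDDDD", "HHHDHHH"):
    print(example, classify_display(example))
```

Because the comparison is made on response hands rather than on letter identity, the same routine covers the variant in which several letters are assigned to each hand: passing a larger mapping makes a flanker that differs from the target but shares its hand count as compatible.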
Response competition effects are maximal when target and flankers are presented within a small visual area, and diminish with larger target-flanker separations. Eriksen and Eriksen (1974) concluded from these results that visuospatial attention cannot be restricted to areas smaller than about 1° of visual angle. In the same vein, LaBerge et al. (1991) showed that response competition effects remained restricted to the attended visual area; these authors used response competition effects to trace changes of the size of the attended area over time. Therefore, processing of irrelevant information up to the level of response activation, leading to competition among multiple responses, can be considered as an important index to study limitations of selective attention. If response competition occurs, selectivity in early stages of processing has failed; since subjects will eventually execute the appropriate response even in response competition conditions, selectivity at the central or motor stage apparently solves the conflict.
In the present section we will discuss data illustrating limitations of selective attention due to (1) imperfect spatial resolution of visual attention and (2) imperfect object selections. Imperfect object selections may occur because objects can be selected only on the basis of separate attributes and not on the basis of conjunctions of attributes (Treisman and Gelade, 1980). We will discuss several studies using the lateralized readiness potential (LRP) and electromyographic (EMG) responses as psychophysiological measures. First we will describe how these measures allow for an online monitoring of the time-course of response activation processes. Preceding voluntary movements, there is a gradually increasing negative shift, beginning at about 1 s prior to movement onset. The later phase of this 'readiness potential' (Deecke, Grözinger and Kornhuber, 1976) is lateralized, with, for hand movements, larger amplitudes above the hemisphere contralateral to the side of the intended movement (Kutas and Donchin, 1980). If this lateralization is considered for the left- and right-hand responses separately, it may be confounded by other processing and structural asymmetries; however, these are constant for left- and right-hand responses. Therefore, by subtracting right-hand asymmetries from left-hand asymmetries, an unconfounded measure of the LRP is obtained. This LRP, because of the logic of its derivation, is a pure, online measure of motor-related activity. It can be used to study the relative activation of response side, e.g. left versus right hand or left versus right foot (De Jong et al., 1988; Gratton et al., 1988). Recording the electromyogram, as a measure of overt muscle activation, simultaneously with the LRP is important for several reasons. In choice RT tasks, the incorrect response muscle is activated on a substantial proportion of trials (i.e. 'EMG errors'), even though the correct response is subsequently executed (Coles et al., 1985; Eriksen et al., 1985; Gratton et al., 1988; Smid, Mulder and Mulder, 1990; Smid et al., 1991). Therefore, the EMG measure provides a more sensitive measure of incorrect response execution than button-press errors. The combination of LRP and electromyography thus allows for a graded analysis of response activation mechanisms. In this analysis we can distinguish (1) trials in which incorrect response activation leads to an incorrect response (push-button errors), (2) trials in which the incorrect response had been initiated but the correct response was executed (i.e. EMG response error with correct response), and (3) trials in which the correct response was given without any EMG response in the incorrect response channel. The LRP for the latter type of trials probably does not contain components related to neuromuscular activation, and therefore can be used as an index of response activation at the central motor level. Smid et al. (1990) used these measures in an 'Eriksen' response competition task (see above). The reaction data showed the usual facilitation for compatible arrays and inhibition for incompatible arrays compared with the neutral stimuli. This was also mirrored by the number of EMG error trials, which was larger for incompatible than for neutral trials and larger for neutral than for compatible trials. Figure 9.6 shows the LRPs for the three types of display, for the trials without incorrect EMG. Lateralization of the LRP towards the side of the correct response corresponds to a negative, downward deflection. The onset latency of the LRP was about 260 ms for both compatible and incompatible trials, but importantly, on incompatible trials this concerned lateralization towards the incorrect response side (as evidenced by the initially positive LRP wave). This incorrect lateralization was followed by a late lateralization to the correct side (350 ms).
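The subtraction logic behind the LRP and the three-way trial classification described above can be sketched as follows. This is a minimal illustration, not the procedure of any cited study. It assumes that the asymmetry is computed as right-hemisphere site minus left-hemisphere site, so that activation of the correct hand comes out negative as in Figure 9.6; the text does not name the recording sites or the sign convention, and all function names and the toy waveforms are hypothetical.

```python
def asymmetry(left_site, right_site):
    """Right-minus-left hemisphere difference per time point (assumed convention)."""
    return [r - l for l, r in zip(left_site, right_site)]


def lrp(left_hand_avg, right_hand_avg):
    """Subtract the right-hand asymmetry from the left-hand asymmetry.

    Each argument is a (left_site, right_site) pair of averaged waveforms.
    Asymmetries that are identical for both hands cancel in the subtraction.
    """
    left_asym = asymmetry(*left_hand_avg)
    right_asym = asymmetry(*right_hand_avg)
    return [a - b for a, b in zip(left_asym, right_asym)]


def classify_trial(button_correct, incorrect_emg):
    """Sort a trial into the three categories of the graded analysis."""
    if not button_correct:
        return "button-press error"
    if incorrect_emg:
        return "EMG error with correct response"
    return "correct response without incorrect EMG"


# Toy data: a contralateral negativity plus a constant structural offset
# at the right-hemisphere site that is unrelated to the response hand.
offset = 0.5
motor = [0.0, -1.0, -2.0]
left_hand_avg = ([0.0, 0.0, 0.0], [offset + m for m in motor])
right_hand_avg = (motor, [offset, offset, offset])

demo_lrp = lrp(left_hand_avg, right_hand_avg)
print(demo_lrp)  # the constant offset no longer appears in the result
```

In the toy data the offset is present in both single-hand asymmetries and drops out of their difference, which is the sense in which the derived measure is unconfounded. Note that this form of the subtraction doubles the motor-related amplitude; whether to halve the result is a choice the text does not specify.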
In neutral trials only correct lateralization with an onset latency of 320 ms was observed. These results suggest that response
[Figure 9.6: LRP waveforms for the NO NOISE, NEUTRAL, COMPATIBLE and INCOMPATIBLE displays; horizontal axis: time in msec, -200 to 1000.]
Figure 9.6. Lateralized readiness potentials (LRPs) in an 'Eriksen paradigm', superimposed for target letters alone and for central target letters flanked by neutral, response compatible and response incompatible letters. A negative, downward deflection indicates a lateralization contralateral to the correct response hand. An upward deflection indicates lateralization contralateral to the incorrect response hand.
activation is first driven by the identity of the flanker letters, and is initiated before the perceptual analysis of the display is completed. Only after additional perceptual analysis does the motor system receive information about the correct, central target letter. Thus, subjects are initially unable to restrict their attention to the central target letter without additional processing. This process can be conceived of as the time taken by the attentional system to 'zoom in' on the central target location (Eriksen and St James, 1986). In a second experiment response activation mechanisms were investigated in a visual search paradigm (Smid et al., 1991). Two different letters were used as the targets for left- and right-hand responses (e.g. 'Q' for the left hand and 'H' for the right hand). The target letters were randomly presented at one of the four corners of an imaginary square. On some trials a target letter was presented alone, while it was presented together with three distractor letters on other trials. These distractors could be either response compatible or incompatible. On response-compatible trials the distractors had features in common with the target letter in the display (e.g. a 'Q' among 'D', 'G' and 'C'). On response-incompatible displays the distractors had features in common with the opposite, not-presented target letter (e.g. 'Q' among 'N', 'W' and 'M'). Note that in this experiment the distractors were response neutral at the letter level, but response compatible or incompatible at the feature level. The performance data showed a clear response competition effect: RT was slower for the incompatible than for the compatible displays. More incorrect button presses and more incorrect EMG trials were found for incompatible than for compatible arrays. The LRP waveforms showed that lateralization began at about the same
time for all trial types, but between 200 and 320 ms the incorrect response was activated on the incompatible trials. In this experiment the features of the distractor letters in the display initially activated a response. For the single-element displays the correct response was activated by the target, so that this response could be executed without further display search. In the multi-element displays, on the other hand, it was the distractors that initially activated (correct or incorrect) responses, so that search had to continue until a target was found and a response could be executed. Thus preliminary encoded feature information was transmitted to the motor system. These data suggest that initially selective attention was restricted to the global features of the display letters; these were the features that distractors and targets share. Only after additional processing were the subjects able to attend to the more finely grained features that discriminated between targets and distractors. Up to now we considered preliminary response activation by information about to-be-ignored letters. Smid et al. (1992) investigated response activation by different attributes of the same letter. The displays consisted of single letters, presented in different colors. A subset of the colored letters was assigned to four different response alternatives consisting of the middle and index fingers of both hands (GO stimuli). To another subset of the stimuli no responses should be made (NOGO stimuli). The assignment of stimulus attributes to response alternatives was varied in several conditions. We focus here on two of these conditions, which are schematically presented in Table 9.1. In the 'hand:letter' (H:L) condition identical letters in different colors were assigned to the two fingers of the same hand. In the 'hand:color' (H:C) condition different letters in the same color were assigned to the two fingers of the same hand. 
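The two mappings can be written out as a lookup structure to make explicit which attribute carries which information. The stimulus codes (letter plus color number) are those listed in Table 9.1; everything else is an assumption made for illustration, including the names `GO`, `NOGO`, `HAND_ATTRIBUTE`, `preliminary_hand` and `required_response`. In particular, `preliminary_hand` only expresses the hypothesis under test, namely that the hand-defining attribute alone could prepare a hand even on NOGO trials; it is not a claim about how subjects actually process the stimuli.

```python
# GO stimuli per condition: stimulus code -> (hand, finger), after Table 9.1.
GO = {
    "H:L": {
        "G1": ("left", "middle"), "G2": ("left", "index"),
        "N2": ("right", "index"), "N1": ("right", "middle"),
    },
    "H:C": {
        "G4": ("left", "middle"), "C4": ("left", "index"),
        "C3": ("right", "index"), "G3": ("right", "middle"),
    },
}

# NOGO stimuli per condition, after Table 9.1.
NOGO = {
    "H:L": {"G3", "G4", "N4", "N3"},
    "H:C": {"D4", "Q4", "Q3", "D3"},
}

# Position in the stimulus code of the attribute that defines the hand:
# 0 = letter (H:L condition), 1 = color number (H:C condition).
HAND_ATTRIBUTE = {"H:L": 0, "H:C": 1}


def preliminary_hand(condition, stimulus):
    """Hand signalled by the hand-defining attribute alone (GO or NOGO trial)."""
    idx = HAND_ATTRIBUTE[condition]
    cues = {code[idx]: hand for code, (hand, _finger) in GO[condition].items()}
    return cues[stimulus[idx]]


def required_response(condition, stimulus):
    """Full response for a GO stimulus, or None for a NOGO stimulus."""
    if stimulus in NOGO[condition]:
        return None
    return GO[condition][stimulus]


for cond, stim in (("H:L", "G3"), ("H:L", "N2"), ("H:C", "Q3"), ("H:C", "G4")):
    print(cond, stim, preliminary_hand(cond, stim), required_response(cond, stim))
```

For a NOGO stimulus such as G3 in the H:L condition, the letter alone points to the left hand although no response is required, which is the situation in which an LRP on NOGO trials would indicate preliminary response preparation.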
In previous research (Miller, 1982) comparable conditions were investigated (with stimulus attributes letter and letter size), without the inclusion of the NOGO stimuli, but in addition with a 'hand:neither' (H:N) condition, in which there was no consistent mapping of a stimulus attribute on response hand. This research has demonstrated a reaction time benefit of the H:L condition over the H:N condition. This suggested that in the H:L condition preliminary letter information is transmitted to the motor system for advance preparation of response hand, before the letter size information, specifying response finger within a hand, is available. In the H:N condition no such advance preparation is possible. This explanation suggests that
Table 9.1. Stimulus-response mappings in two different conditions

                         Response
Condition   Hand:    GO-LH         NOGO            GO-RH
            Finger:  M     I                       I     M
                                   Stimuli
H:L                  G1    G2      G3 G4 N4 N3     N2    N1
H:C                  G4    C4      D4 Q4 Q3 D3     C3    G3

Numbered letters denote colored letter stimuli. The colors are denoted by the numbers (e.g. 1 = red, 2 = green, 3 = white and 4 = blue). Abbreviations: M, middle finger; I, index finger; LH, left hand; RH, right hand; GO, GO stimuli; NOGO, NOGO stimuli; H:L, hand:letter condition; H:C, hand:color condition.
letter identity is selectively attended first, and used for advance response activation, whereas letter size is selectively attended later. Only after this second selection step is the conjunction of letter identity and size available for the final response selection and execution. With the H:L and H:C conditions, Smid et al. (1992) tried to determine whether subjects have strategic control over the order of selection steps. If so, the order of selection of color and letter would depend on the assignment of attributes to response hands. If there is no such control, color and letter identity would be selected in a fixed order in both conditions. In the H:L condition the identity of the letter indicates which hand to respond with on GO trials, while color indicates whether it is a GO or NOGO trial, and if it is a GO trial, which finger to respond with. If letter identity can be selected and used to prepare a response hand before color information is selectively attended we should obtain an LRP both on GO and NOGO trials. Similarly, in the H:C condition color indicates response hand (on GO trials), while letter identity signals whether a response should be made and, if so, with which finger. Thus, if color can be selected and used to prepare a response hand before letter identity is selectively attended, we would obtain an LRP both on GO and NOGO trials of this condition too. In Figure 9.7a the LRPs obtained in the GO and NOGO trials of the H:L condition are shown. They were obtained by subtracting the ERPs evoked by the letter indicating a right-hand response ('N') from those evoked by the letter indicating a left-hand response ('G'; see Table 9.1). In the NOGO trials the LRP reached significant amplitudes, but it was clearly smaller and returned to baseline sooner than the GO LRP.
The development of an LRP in these NOGO trials suggests that letter identity was selectively attended and used to select response hand before color information, signaling that no response should be made, was selected. When color was attended later, and it indicated a NOGO trial, this information was apparently used to cancel the initial activation of the selected hand. If the color indicated a GO trial, the large LRP indicates that it was used to select one of the two fingers of the hand already selected, which subsequently executed the required response. In Figure 9.7b, the LRPs obtained in the H:C condition are shown. Again, in the NOGO trials the LRP reached significant amplitudes but it was smaller and returned to baseline earlier than its GO counterpart. Apparently, subjects could attend color first, and activate the associated hand. Next, they attended to letter identity and used it for canceling the activated hand on NOGO trials or selecting the appropriate finger on GO trials. The onset latencies of the LRPs in response to color (200 ms) and in response to letter form (190 ms) were about the same. Taken together these results suggest that subjects have considerable voluntary control over the order in which they attend to the different attributes of the same object. The differential saliency of the relevant response dimensions seems to determine which attribute will be selected first. Thus, when color discriminates between response hands it is selected first, but if it discriminates between response fingers it is selected second. These data illustrate the inability of selective attention to select conjunctions of attributes as a single whole. Identification of a conjunction of attributes first requires selective attention to each of the attributes separately. In the preceding paragraphs, we saw how measures of response activation could be used to investigate limitations of selective attention.
Information that cannot be ignored by the attentional system may be transmitted to the motor system, and in
[Figure 9.7: LRP waveforms for GO and NOGO trials in (a) the H:L condition and (b) the H:C condition; vertical axis: µV, 0 to -4.32; horizontal axis: time in msec, -480 to 960.]
Figure 9.7. LRPs to GO and NOGO stimuli in the 'Miller paradigm'. (a) Results in the 'hand:letter' condition; (b) results in the 'hand:color' condition. For an explanation of conditions and stimuli, see text.
some conditions activate an (incorrect) response. We wonder whether attended information is automatically transmitted to the motor system. Wijers et al. (1989a) measured the LRP in a selective search task, in which subjects searched for memory set letters in a to-be-attended color. A response button had to be pressed only upon detection of relevant target letters. Thus, there was only one response alternative
and a low probability of responding. The LRP was derived by varying the assignment of the response button to the left or right hands in different stimulus series. Color relevance was shown to be available long before the search process was finished. Therefore, color could in principle be used for preliminary activation of response hand. In this case an LRP should be obtained both for target and non-target stimuli in the attended color. However, this was not what was found. The LRP was strictly confined to attended targets. Therefore, not all attended information is automatically fed forward to the motor system. Whether or not preliminary response activation occurs may strongly depend on the exact task conditions. Important factors might be the number of response alternatives, the probability of GO responses (see Smid et al., 1992, for a demonstration of such an effect), and the consistency of the mapping between stimuli and responses. Furthermore, it could be that subjects have strategic control over preliminary response activation. In this case, subjects could opt for preliminary response activation only in those conditions in which this mode of processing is beneficial overall. In addition, it is conceivable that preliminary response activation occurs in conditions stressing speeded performance, but not in conditions stressing accurate performance.
10 DISCUSSION AND CONCLUSIONS
ERP research has provided unique information regarding the temporal structure of mental processing and its implementation in the brain. From this research, evidence has accrued that proved not to be readily attainable by performance analysis. Therefore, ERP research has yielded data carrying much weight in settling longstanding controversies. One of these controversies is the early versus late selection discussion. In our opinion, ERP research has made it convincingly clear that selective attention may act upon early stages of information processing. The onset latencies of ERP effects can be quite early under the proper conditions (about 50 ms in the auditory modality and 100 ms in vision). Furthermore, the onset latencies and morphologies of these effects were found to be independent of the exact response requirements and of task complexity variables manipulating the duration of working memory operations. The effects of such task complexity factors consistently showed later onset latencies than the effects of attention per se; these effects were often selectively confined to stimuli of the attended stimulus category, suggesting that working memory operations are hierarchically and serially contingent upon an earlier selection. All these findings are consistent with the idea that selective attention starts to modulate information processing at an early stage, independent of the nature and duration of later stages of processing. Another argument for early selection is that there is now growing evidence that the generators of attention effects in the auditory and visual modalities are located in the (primary and/or secondary) sensory brain areas. It has been argued that, although theories of selective attention are often formulated to apply universally to different sensory modalities, more emphasis should be put on the differences that exist between the modalities (Neumann, van der Heijden and Allport, 1986; van der Heijden, 1991).
Whereas visual information transfer is inherently spatially encoded, spatial direction is just one
of the computable properties of auditory information. Indeed, ERP research has clearly shown that selection by location has a special status in vision but not in audition. For nonspatial visual selections, it remains to be determined whether the ERP effects are all similar for different attributes, or feature-specific. Only a small number of different nonspatial visual attributes has been investigated. In addition, little is known about the brain generators underlying the ERP effects of nonspatial selections. By systematically investigating a wide variety of selection attributes, it could be possible to determine the elementary features of vision. Another gap in our knowledge is that ERP research has dealt almost exclusively with selections within rather than across different modalities (with the exception of Hillyard et al., 1984). Important questions concern whether within-modality selection is subordinate to selection by modality and whether attention acts on modality-specific or modality-independent representations. Are there, besides the modality-specific brain regions that have been found for within-modality selections, also modality-independent brain regions involved in selective attention? An interesting phenomenon in the processing of multi-modal stimuli is the 'Garner interference effect' (Melara, 1989). This is the finding that mutual interference may occur in the processing of stimulus dimensions in different input modalities. This occurs, for example, when a high- or low-pitched tone is presented together with a visual pattern at a position high or low in the visual field. When the auditory and visual attributes are incompatible, tone discrimination is delayed. ERP research could reveal the level of processing (perceptual?) at which this phenomenon occurs, whether the interference is governed by selective attention mechanisms, and the brain areas that are involved.
Results from multi-feature selection tasks showed that mostly individual features are selected before combinations of features. In many cases evidence for hierarchically organized selections was obtained. For example, the analysis of color appeared to depend on spatial attention; such findings provide support for early selection. Thus, features appear to be selected before objects. However, this does not preclude the possibility that top-down mechanisms may facilitate the perception of features. It could well be that features are selected faster if they belong to an object, especially if it is a well-known object. Also in the perception of feature conjunctions, top-down mechanisms appear to be involved. On the basis of performance measures, it has been argued that features have a higher probability of being conjoined within objects than across objects (Duncan, 1980; Prinzmetal, 1981; Prinzmetal and Millis-Wright, 1984). ERPs could disclose whether such phenomena indeed involve the early perceptual stages of processing. An important issue in the early versus late selection discussion is the selectivity of semantic processing. Although there is little ERP research on this matter, the prevailing evidence seems to indicate that semantic processing of irrelevant information occurs only when this information is presented close to the focus of attention. This suggests that semantic processing is restricted to information falling within the effectively attended area (e.g. within the spotlight of attention). Although ERPs indicate that information processing is modulated by selective attention at early stages, such effects do not necessarily indicate that the processing of irrelevant information is terminated (Näätänen, 1990; Rugg, 1991). The available evidence seems to indicate that early selectivity probably acts more like an attenuation filter (Treisman, 1960) than as an all-or-none filter as in Broadbent's (1958) original formulation.
Thus, irrelevant information is probably not completely
rejected; we mentioned several instances where irrelevant stimuli, though exhibiting ERP signs of early selectivity, still appeared to activate later stages of processing. Also, it seems conceivable that early selection mechanisms are operative only in specific environments. At this point we should stress that early selective attention effects in the ERP are obtained only in highly specific task situations, characterized by, for example, a high information load, high discriminability between relevant and irrelevant information, etc. Also, there are many situations in which relevant and irrelevant information cannot be distinguished on the basis of simple physical features (e.g. response set conditions; Broadbent, 1970). In such conditions, selective control has to be transferred to later stages of processing. This was illustrated by a previous section on ERP measures of response activation, where we could establish several limitations of selective attention. In certain conditions, irrelevant stimulus aspects were shown to 'penetrate' up to the level of the motor system. Although this led to the activation of incorrect responses, even as far peripherally as to activate the incorrect muscle, late selectivity could resolve this conflict, leading to the eventual execution of the correct response. It appears, however, that such a continuous transmission of irrelevant stimulus aspects to the motor system is limited to conditions in which early selectivity fails. This occurs when relevant and irrelevant stimulus information belong to the same object or are presented in close proximity. As a working hypothesis we could postulate that continuous transmission of information occurs only for information presented within the attended visual area (spotlight of attention). It seems interesting to speculate about relationships between different modes of multidimensional stimulus selections and preliminary response activation. 
Preliminary response activation on the basis of individual stimulus features will probably not occur when attention selects the conjunctions of features as a whole, but only in conditions in which the features are analyzed separately. In those conditions in which the stimulus features appear to be selected independently, the feature leading to preliminary response activation is probably under subject control and depends on stimulus-response assignments. If there are hierarchical selection dependencies, on the other hand, it could well be that preliminary response activation is possible only on the basis of the feature that is attended first.
In the performance literature the idea of an attentional channel occurs mainly in the context of spotlight or zoom-lens models of spatial attention. ERP research has in several important respects enriched our conception of attentional channels. First, ERP research has demonstrated that when attention is directed to nonspatial stimulus attributes, it can be conceived of as an attentional channel with a particular bandwidth, which determines the range of stimuli being processed more efficiently. More research is needed to determine which features of attentional channels are general and which are specific to spatial selections. For instance, do nonspatial channels share with spatial channels the property that attention cannot be divided over nonadjacent regions of sensory space? Is the bandwidth of nonspatial channels variable? Can we describe changes in the direction of nonspatial attention as movements, just as we can for spatial attention? A second important principle derived from ERP research is that the bandwidth of attentional channels may change during the course of information processing. Also for later stages of processing we could in principle think in terms of bandwidths. For example, the bandwidth of the response preparation stage of processing could be defined as the range of motor responses receiving activation.
A. A. Wijers et al.
One of the neglected aspects of selective attention is, in our opinion, the active anticipation and prediction of future events. Not only do people select a range of stimuli across sensory dimensions, but they also expect a range of stimuli to occur within certain time windows. Thus the diminution of selective attention effects with longer inter-stimulus intervals may reflect not so much the need for reinforcement of an internal model, but instead an increased temporal uncertainty. Furthermore, as the research of LaBerge et al. (1991) has shown, the bandwidth of attentional channels may not be fixed, but may change dynamically over relatively short time intervals.
Perhaps the most intricate aspect of selective attention concerns the mechanisms that establish, control and direct attentional channels. Little is known about ERP reflections of voluntary attention-directing mechanisms. With regard to automatic attention-directing mechanisms, more has to be learned about which aspects of irrelevant stimuli have attention-calling properties, and to what degree these attention-calling properties can be learned. ERP reflections of several aspects of attention switching should be delineated: (1) the automatic processing of the characteristics of unattended stimuli, (2) the attention-interrupt generated by these stimuli, and (3) the actual switch of attention to a new focus. ERPs are in principle also ideally suited to study the speed of movements of attention, the nature of the attentional channel during the movement, and the time-course of the engagement of the attentional channel to the new locus. Finally, recent performance data have shown that attentional mechanisms depend on which information was previously ignored and attended.
Thus, there is a mechanism that inhibits attention from moving to locations that were previously attended (Maylor, 1985; Maylor and Hockey, 1985; Posner and Cohen, 1980, 1984), and there is an inhibition of processing of information that was previously ignored (Allport, Tipper and Chmiel, 1985; Tipper, 1985). ERP research could disclose the brain mechanisms underlying such important phenomena (Rugg, 1991).
REFERENCES
Aine, C. J. and Harter, M. R. (1986). Visual event-related potentials to colored patterns and color names: Attention to features and dimension. Electroencephalography and Clinical Neurophysiology, 64, 228-245.
Alho, K., Donauer, N., Paavilainen, P., Reinikainen, K., Sams, M. and Näätänen, R. (1987a). Stimulus selection during auditory spatial attention as expressed by event-related potentials. Biological Psychology, 24, 153-162.
Alho, K., Lavikainen, J., Reinikainen, K., Sams, M. and Näätänen, R. (1990). Event-related brain potentials in selective listening to frequent and rare stimuli. Psychophysiology, 27, 73-86.
Alho, K., Paavilainen, P., Reinikainen, K., Sams, M. and Näätänen, R. (1986a). Separability of different negative components of the event-related potential associated with auditory stimulus processing. Psychophysiology, 23, 613-623.
Alho, K., Sams, M., Paavilainen, P. and Näätänen, R. (1986b). Small pitch separation and the selective-attention effect on the ERP. Psychophysiology, 23, 189-197.
Alho, K., Töttölä, K., Reinikainen, K., Sams, M. and Näätänen, R. (1987b). Brain mechanisms of selective listening reflected by event-related potentials. Electroencephalography and Clinical Neurophysiology, 68, 458-470.
Allport, A., Tipper, S. P. and Chmiel, N. (1985). Perceptual integration and post-categorical filtering. In M. I. Posner and O. S. M. Marin (Eds), Attention and Performance XI. Hillsdale, NJ: Erlbaum.
Arthur, D., Hillyard, S. A., Flynn, E. and Schmidt, A. (1989). Neural mechanisms of selective auditory attention. In S. J. Williamson, M. Hoke, G. Stroink and M. Kotani (Eds), Advances in Biomagnetism. Proceedings of the Seventh International Conference on Biomagnetism (pp. 113-116). New York: Plenum Press.
Arthur, D., Lewis, P. S., Medvick, P. A. and Flynn, E. R. (1991). A neuromagnetic study of selective auditory attention. Electroencephalography and Clinical Neurophysiology, 78, 348-360.
Barrett, G., Blumhardt, L., Halliday, A. M., Halliday, E. and Kriss, A. (1976). A paradox in the lateralisation of the visual evoked response. Nature, 261, 253-255.
Beatty, J., Barth, D. S., Richer, F. and Johnson, R. A. (1986). Neuromagnetometry. In M. G. H. Coles, E. Donchin and S. W. Porges (Eds), Psychophysiology: Systems, Processes and Applications (pp. 26-40). New York: Guilford Press.
Bentin, S., McCarthy, G. and Wood, C. C. (1985). Event-related potentials associated with semantic priming. Electroencephalography and Clinical Neurophysiology, 60, 343-355.
Broadbent, D. E. (1958). Perception and Communication. London: Pergamon Press.
Broadbent, D. E. (1970). Stimulus set and response set: Two kinds of selective attention. In D. I. Mostofsky (Ed.), Attention: Contemporary Theory and Analysis. New York: Appleton-Century-Crofts.
Brookhuis, K. A., Mulder, G., Mulder, L. J. M. and Gloerich, A. B. M. (1983). The P3 complex as an index of information processing: The effects of response probability. Biological Psychology, 17, 277-296.
Brookhuis, K. A., Mulder, G., Mulder, L. J. M., Gloerich, A. B. M., van Dellen, H. J., van der Meere, J. J. and Ellerman, H. H. (1981). Late positive components and stimulus evaluation time. Biological Psychology, 13, 107-123.
Coles, M. G. H. (1989). Modern mind-brain reading: Psychophysiology, physiology, and cognition. Psychophysiology, 26, 251-269.
Coles, M. G. H., Gratton, G., Bashore, T. R., Eriksen, C. W. and Donchin, E. (1985). A psychophysiological investigation of the continuous flow model of human information processing. Journal of Experimental Psychology: Human Perception and Performance, 11, 529-553.
Cooper, L. A. and Shepard, R. N. (1973). Chronometric studies of the rotation of mental images. In W. G. Chase (Ed.), Visual Information Processing (pp. 75-176). New York: Academic Press.
Corteen, R. S. and Wood, B. (1972). Autonomic responses to shock-associated words in an unattended channel. Journal of Experimental Psychology, 94, 308-313.
Cowey, A. (1979). Cortical maps and visual perception. Quarterly Journal of Experimental Psychology, 31, 1-17.
Cowey, A. (1985). Aspects of cortical organization related to selective attention and selective impairments of visual perception: A tutorial review. In M. I. Posner and O. Marin (Eds), Attention and Performance XI (pp. 41-62). Hillsdale, NJ: Erlbaum.
Deecke, L., Grözinger, B. and Kornhuber, H. H. (1976). Voluntary finger movements in man: Cerebral potentials and theory. Biological Cybernetics, 23, 99-119.
De Jong, R., Wierda, M., Mulder, G. and Mulder, L. J. M. (1988). The use of partial information in response preparation. Journal of Experimental Psychology: Human Perception and Performance, 14, 682-692.
De Munck, J. C. (1989). A mathematical and physical interpretation of the electromagnetic field of the brain. Doctoral Thesis, University of Amsterdam.
Desimone, R. and Ungerleider, L. G. (1989). Neural mechanisms of visual processing in monkeys. In F. Boller and J. Grafman (Eds), Handbook of Neuropsychology, vol. 2 (pp. 267-298). Amsterdam: Elsevier.
Desimone, R., Wessinger, M., Thomas, L. and Schneider, W. (1990). Attentional control of visual perception: Cortical and subcortical mechanisms. Cold Spring Harbor Symposium on Quantitative Biology, vol. 55: The Brain, 963-971.
Donald, M. W. and Young, M. J. (1982). The time course of selective neural tuning in auditory attention. Experimental Brain Research, 46, 357-367.
Duncan, J. (1980). The locus of interference in the perception of simultaneous stimuli. Psychological Review, 87, 272-300.
Eason, R. G. (1981). Visual evoked potential correlates of early neural filtering during selective attention. Bulletin of the Psychonomic Society, 18, 203-206.
Eason, R., Harter, M. and White, C. (1969). Effects of attention and arousal on visually evoked cortical potentials and reaction time in man. Physiology and Behavior, 4, 283-289.
Eason, R. G., Oakley, M. and Flowers, L. (1983). Central neural influences on the human retina during selective attention. Physiological Psychology, 11, 18-28.
Eriksen, B. and Eriksen, C. W. (1974). Effects of noise letters upon the identification of a target letter in a nonsearch task. Perception and Psychophysics, 16, 143-149.
Eriksen, C. W., Coles, M. G. H., Morris, L. R. and O'Hara, W. P. (1985). An electromyographic examination of response competition. Bulletin of the Psychonomic Society, 23, 165-168.
Eriksen, C. W. and Hoffman, J. E. (1972). Some characteristics of selective attention in visual perception determined by vocal reaction time. Perception and Psychophysics, 11, 169-171.
Eriksen, C. W. and Schultz, D. W. (1979). Information processing in visual search: A continuous flow conception and experimental results. Perception and Psychophysics, 25, 249-263.
Eriksen, C. W. and St James, J. D. (1986). Visual attention within and around the field of focal attention: A zoom lens model. Perception and Psychophysics, 40, 225-240.
Eriksen, C. W. and Yeh, Y. Y. (1985). Allocation of attention in the visual field. Journal of Experimental Psychology: Human Perception and Performance, 11, 583-597.
Gaillard, A. W. K. (1988). Problems and paradigms in ERP research. Biological Psychology, 26, 91-109.
Gathercole, S. E. and Broadbent, D. E. (1987). Spatial factors in visual attention: Some compensatory effects of location and time of arrival of nontargets. Perception, 16, 433-443.
Gratton, G., Coles, M. G. H., Sirevaag, E., Eriksen, C. W. and Donchin, E. (1988). Pre- and poststimulus activation of response channels: A psychophysiological analysis. Journal of Experimental Psychology: Human Perception and Performance, 14, 331-344.
Gunter, T. C., Jackson, J. L., Kutas, M., Mulder, G. and Bujink, B. M. (1994a). Focusing on the N400. An exploration of selective attention during reading. Psychophysiology, 24, 375-425.
Gunter, T. C., Wijers, A. A., Jackson, J. L. and Mulder, G. (1994b). Visual spatial attention to stimuli presented on the vertical and horizontal meridian: An ERP-study. Psychophysiology, 31, 347-358.
Hansen, J. C., Dickstein, P. W., Berka, C. and Hillyard, S. A. (1983). Event-related potentials during selective attention to speech sounds. Biological Psychology, 16, 211-224.
Hansen, J. C. and Hillyard, S. A. (1980). Endogenous brain potentials associated with selective auditory attention. Electroencephalography and Clinical Neurophysiology, 49, 277-290.
Hansen, J. C. and Hillyard, S. A. (1983). Selective attention to multidimensional auditory stimuli. Journal of Experimental Psychology: Human Perception and Performance, 9, 1-19.
Hansen, J. C. and Hillyard, S. A. (1984). Effects of stimulation rate and attribute cueing on event-related potentials during selective auditory attention. Psychophysiology, 21, 394-405.
Hansen, J. C. and Hillyard, S. A. (1988). Temporal dynamics of human auditory selective attention. Psychophysiology, 25, 316-329.
Hari, R., Hämäläinen, M., Ilmoniemi, R., Kaukoranta, E., Reinikainen, K., Salminen, J., Alho, K., Näätänen, R. and Sams, M. (1984). Responses of the primary auditory cortex to pitch changes in a sequence of tone pips: Neuromagnetic recordings in man. Neuroscience Letters, 50, 127-132.
Hari, R., Hämäläinen, M., Kaukoranta, E., Mäkelä, J., Joutsiniemi, S. L. and Tiihonen, J. (1989). Selective listening modifies activity of the human auditory cortex. Experimental Brain Research, 74, 463-470.
Harter, M. R. and Aine, C. J. (1984). Brain mechanisms of visual selective attention. In R. Parasuraman and D. R. Davies (Eds), Varieties of Attention (pp. 293-321). Orlando, FL: Academic Press.
Harter, M. R. and Aine, C. J. (1986). Discussion of neural specificity model of selective attention: A response to Hillyard and Mangun and to Näätänen. Biological Psychology, 23, 297-311.
Harter, M. R., Aine, C. J. and Schroeder, C. (1982). Hemispheric differences in the neural processing of stimulus location and type: Effects of selective attention on visual evoked potentials. Neuropsychologia, 20, 412-438.
Harter, M. R. and Anllo-Vento (1991). Visual spatial attention: Preparation and selection in children and adults. In C. H. M. Brunia, G. Mulder and M. Verbaten (Eds), Event-Related Brain Research (EEG Suppl. 42) (pp. 183-194). Amsterdam: Elsevier.
Harter, M. R. and Guido, W. (1980). Attention to pattern orientation: Negative cortical potentials, reaction time and the selection process. Electroencephalography and Clinical Neurophysiology, 49, 461-475.
Harter, M. R. and Previc, F. H. (1978). Size-specific information channels and selective attention: Visual evoked potential and behavioral measures. Electroencephalography and Clinical Neurophysiology, 45, 628-640.
Harter, M. R. and Salmon, L. E. (1972). Intra-modality selective attention and evoked cortical potentials to randomly presented patterns. Electroencephalography and Clinical Neurophysiology, 32, 605-613.
Heinze, H. J., Luck, S. J., Mangun, G. R. and Hillyard, S. A. (1990). Visual event-related potentials index focused attention within bilateral stimulus arrays. I. Evidence for early selection. Electroencephalography and Clinical Neurophysiology, 75, 511-527.
Hillyard, S. A. and Hansen, J. C. (1986). Attention: Electrophysiological approaches. In M. G. H. Coles, E. Donchin and S. W. Porges (Eds), Psychophysiology: Systems, Processes and Applications (pp. 227-243). New York: Guilford Press.
Hillyard, S. A., Hink, R. F., Schwent, V. L. and Picton, T. W. (1973). Electrical signs of selective attention in the human brain. Science, 182, 177-180.
Hillyard, S. A. and Kutas, M. (1983). Electrophysiology of cognitive processing. Annual Review of Psychology, 34, 33-61.
Hillyard, S. A. and Mangun, G. R. (1986). The neural basis of visual selective attention: A commentary on Harter and Aine. Biological Psychology, 23, 266-279.
Hillyard, S. A., Mangun, G. R., Luck, S. J. and Heinze, H. J. (1990). Electrophysiology of visual attention. In E. R. John, T. Harmony, L. S. Prichep, M. Valdes-Sosa and P. A. Valdes-Sosa (Eds), Machinery of Mind. Boston, MA: Birkhauser.
Hillyard, S. A. and Münte, T. F. (1984). Selective attention to color and location: An analysis with event-related brain potential. Perception and Psychophysics, 36, 185-198.
Hillyard, S. A., Münte, T. F. and Neville, H. J. (1985). Visual-spatial attention, orienting and brain physiology. In M. I. Posner and O. S. M. Marin (Eds), Attention and Performance XI (pp. 63-84). Hillsdale, NJ: Erlbaum.
Hillyard, S. A. and Picton, T. W. (1987). Electrophysiology of cognition. In V. B. Mountcastle, F. Plum and S. R. Geiger (Eds), Handbook of Physiology, vol. V. Higher Functions of the Brain, Part 2 (pp. 519-584). Bethesda, MD: American Physiological Society.
Hillyard, S. A., Simpson, G. V., Woods, D. L., Van Voorhis, S. and Münte, T. (1984). Event-related brain potentials and selective attention to different modalities. In F. Reinoso-Suarez and C. Ajmone-Marsan (Eds), Cortical Integration (pp. 395-414). New York: Raven Press.
Hink, R. F., Van Voorhis, S. T., Hillyard, S. A. and Smith, T. (1977). The division of attention and the human auditory evoked potentials. Neuropsychologia, 15, 597-605.
Hoffman, J. E., Simons, R. F. and Houck, M. R. (1983). Event-related potentials during controlled and automatic target detection. Psychophysiology, 20, 625-632.
Isreal, J. B., Chesney, G. L., Wickens, C. D. and Donchin, E. (1980a). P300 and tracking difficulty: Evidence for multiple resources in dual-task performance. Psychophysiology, 17, 259-273.
Isreal, J. B., Wickens, C. D., Chesney, G. L. and Donchin, E. (1980b). The event-related brain potential as an index of display-monitoring workload. Human Factors, 22, 211-224.
James, W. (1890). The Principles of Psychology. New York: Dover.
Johnston, J. C. and Pashler, H. (1990). Close binding of identity and location in visual feature perception. Journal of Experimental Psychology: Human Perception and Performance, 16, 843-856.
Jonides, J. (1980). Voluntary versus automatic control over the mind's eye's movement. Canadian Journal of Psychology, 34, 103-112.
Kahneman, D. and Treisman, A. (1984). Changing views of attention and automaticity. In R. Parasuraman and D. R. Davies (Eds), Varieties of Attention (pp. 29-61). London: Academic Press.
Kenemans, L., Kok, A. and Smulders, F. T. Y. (1993). Event-related potentials to conjunctions of spatial frequency and orientation as a function of stimulus parameters and response requirements. Electroencephalography and Clinical Neurophysiology, 88, 51-63.
Kenemans, L. and Verbaten, M. N. (1990). Effects of unpredictable stimulus relevance on the SCR and late positive waves of the ERP. In C. H. M. Brunia, A. W. K. Gaillard and A. Kok (Eds), Psychophysiological Brain Research (pp. 203-207). Tilburg: Tilburg University Press.
Kramer, A. F., Sirevaag, E. J. and Braune, R. (1987). A psychophysiological assessment of operator workload during simulated flight missions. Human Factors, 29, 145-160.
Kutas, M. and Donchin, E. (1980). Preparation to respond as manifested by movement-related brain potentials. Brain Research, 202, 95-115.
Kutas, M. and Hillyard, S. A. (1980). Event-related brain potentials to semantically inappropriate and surprisingly large words. Biological Psychology, 11, 99-116.
Kutas, M. and Van Petten, C. (1988). Event-related brain potential studies of language. In P. K. Ackles, J. R. Jennings and M. G. H. Coles (Eds), Advances in Psychophysiology, vol. 3 (pp. 139-187). Greenwich, CT: JAI Press.
LaBerge, D. (1975). Acquisition of automatic processing in perceptual and associative learning. In P. M. A. Rabbitt and S. Dornic (Eds), Attention and Performance V (pp. 50-64). New York: Academic Press.
LaBerge, D. (1983). Spatial extent of attention to letters and words. Journal of Experimental Psychology: Human Perception and Performance, 9, 371-379.
LaBerge, D., Brown, V., Carter, M. and Bash, D. (1991). Reducing the effects of adjacent distractors by narrowing attention. Journal of Experimental Psychology: Human Perception and Performance, 17, 65-76.
Lopes da Silva, F. and van Rotterdam, A. (1987). Biophysical aspects of EEG and magnetoencephalogram generation. In E. Niedermeyer and F. Lopes da Silva (Eds), Electroencephalography: Basic Principles, Clinical Applications and Related Fields, 2nd edn (pp. 29-41). München: Urban and Schwarzenberg.
Luck, S. J., Heinze, H. J., Mangun, G. R. and Hillyard, S. A. (1990). Visual event-related potentials index focused attention within bilateral stimulus arrays. II. Functional dissociation of P1 and N1 components. Electroencephalography and Clinical Neurophysiology, 75, 528-542.
Magliero, A., Bashore, T. R., Coles, M. G. H. and Donchin, E. (1984). On the dependence of P300 latency on stimulus evaluation processes. Psychophysiology, 21, 171-186.
Mangun, G. R., Hansen, J. C. and Hillyard, S. A. (1986). Electroretinograms reveal no evidence for centrifugal modulation of retinal input during selective attention in man. Psychophysiology, 23, 156-165.
Mangun, G. R. and Hillyard, S. A. (1987). The spatial allocation of visual attention as indexed by event-related brain potentials. Human Factors, 29, 195-211.
Mangun, G. R. and Hillyard, S. A. (1988). Spatial gradients of visual attention: Behavioral and electrophysiological evidence. Electroencephalography and Clinical Neurophysiology, 70, 417-428.
Mangun, G. R. and Hillyard, S. A. (1990a). Electrophysiological studies of visual selective attention in humans. In A. B. Scheibel and A. F. Wechsler (Eds), Neurobiology of Higher Cognitive Function (pp. 271-295). New York: Guilford Press.
Mangun, G. R. and Hillyard, S. A. (1990b). Allocation of visual attention to spatial locations: Tradeoff functions for event-related brain potentials and detection performance. Perception and Psychophysics, 47, 532-550.
Mangun, G. R., Hillyard, S. A. and Luck, S. J. (1993). Electrocortical substrates of visual selective attention. In S. Kornblum and D. E. Meyer (Eds), Attention and Performance XIV (pp. 219-243). Hillsdale, NJ: Erlbaum.
Mäntysalo, S. and Näätänen, R. (1987). The duration of a neuronal trace of an auditory stimulus as indicated by event-related potentials. Biological Psychology, 24, 183-195.
Maylor, E. A. (1985). Facilitatory and inhibitory components of orienting in visual space. In M. I. Posner and O. Marin (Eds), Attention and Performance XI (pp. 189-204). Hillsdale, NJ: Erlbaum.
Maylor, E. A. and Hockey, R. (1985). Inhibitory component of externally controlled covert orienting in visual space. Journal of Experimental Psychology: Human Perception and Performance, 11, 777-787.
McCarthy, G. and Donchin, E. (1981). A metric for thought: A comparison of P300 latency and reaction time. Science, 211, 77-80.
McCarthy, G., Nobre, A. C. and Wood, C. C. (1989). Visual spatial selective attention to words: An analysis of event-related potentials. Poster presented at the Ninth International Conference on Event-Related Potentials, May 1989, Noordwijk, The Netherlands.
McCarthy, G. and Wood, C. C. (1985). Scalp distributions of event-related potentials: An ambiguity associated with analysis of variance models. Electroencephalography and Clinical Neurophysiology, 62, 203-208.
Mecklinger, A., Kramer, A. F. and Strayer, D. L. (1992). Event related potentials and EEG components in a semantic memory search task. Psychophysiology, 29, 104-119.
Meijs, J. W. H. (1988). The influence of head geometries on electro- and magnetoencephalograms. Doctoral Thesis, University of Twente.
Melara, R. D. (1989). Dimensional interaction between color and pitch. Journal of Experimental Psychology: Human Perception and Performance, 15, 69-79.
Meyer, D. E., Osman, A. M., Irwin, D. E. and Yantis, S. (1988). Modern mental chronometry. Biological Psychology, 26, 3-67.
Michie, P., Solowij, N., Crawford, J. and Glue, L. (1990). Auditory ERPs during auditory attention and a visual control task: Effects of auditory task difficulty. In C. H. M. Brunia, A. W. K. Gaillard and A. Kok (Eds), Psychophysiological Brain Research (pp. 208-211). Tilburg: Tilburg University Press.
Miller, J. (1982). Discrete versus continuous stage models of human information processing: In search of partial output. Journal of Experimental Psychology: Human Perception and Performance, 8, 273-296.
Miller, J. (1988). Discrete and continuous models of human information processing: Theoretical distinctions and empirical results. Acta Psychologica, 67, 191-257.
Mulder, G., Gloerich, A. B. M., Brookhuis, K. A., van Dellen, H. J. and Mulder, L. J. M. (1984). Stage analysis of the reaction process using brain-evoked potentials and reaction time. Psychological Research, 46, 15-32.
Näätänen, R. (1975). Selective attention and evoked potentials in humans: A critical review. Biological Psychology, 2, 237-307.
Näätänen, R. (1982). Processing negativity: An evoked-potential reflection of selective attention. Psychological Bulletin, 92, 605-640.
Näätänen, R. (1986). The neural-specificity theory of visual selective attention evaluated: A commentary on Harter and Aine. Biological Psychology, 23, 281-295.
Näätänen, R. (1990). The role of attention in auditory information processing as revealed by event-related potentials and other brain measures of cognitive function. Behavioral and Brain Sciences, 13, 201-288.
Näätänen, R. and Gaillard, A. W. K. (1983). The orienting reflex and the N2 deflection of the event-related potential (ERP). In A. W. K. Gaillard and W. Ritter (Eds), Tutorials in Event-Related Potential Research: Endogenous Components (pp. 119-141). Amsterdam: North-Holland.
Näätänen, R., Gaillard, A. W. K. and Mäntysalo, S. (1978). Early selective attention effect on evoked potential reinterpreted. Acta Psychologica, 42, 313-329.
Näätänen, R., Gaillard, A. W. K. and Mäntysalo, S. (1980). Brain potential correlates of voluntary and involuntary attention. In H. H. Kornhuber and L. Deecke (Eds), Motivation, Motor and Sensory Processes of the Brain: Electrical Potentials, Behaviour and Clinical Use. Progress in Brain Research. Amsterdam: Elsevier.
Näätänen, R., Gaillard, A. W. K. and Varey, C. A. (1981). Attention effects on auditory EPs as a function of interstimulus interval. Biological Psychology, 13, 173-187.
Näätänen, R. and Michie, P. T. (1979). Early selective attention effects on the evoked potential: A critical review and reinterpretation. Biological Psychology, 8, 81-136.
Näätänen, R. and Picton, T. W. (1987). The N1 wave of the human electric and magnetic response to sound: A review and an analysis of the component structure. Psychophysiology, 24, 375-425.
Näätänen, R., Simpson, M. and Loveless, N. E. (1982). Stimulus deviance and evoked potentials. Biological Psychology, 14, 53-98.
Neumann, O., van der Heijden, A. H. C. and Allport, D. A. (1986). Visual selective attention: Introductory remarks. Psychological Research, 48, 185-188.
Neville, H. J., Kutas, M., Chesney, G. and Schmidt, A. L. (1986). Event-related brain potentials during initial encoding and recognition memory of congruous and incongruous sentences. Journal of Memory and Language, 25, 75-92.
Neville, H. J. and Lawson, D. (1987). Attention to central and peripheral visual space in a movement detection task: An event-related potential and behavioral study. I. Normal hearing adults. Brain Research, 405, 253-267.
Nissen, M. J. (1985). Accessing features and objects: Is location special? In M. I. Posner and O. Marin (Eds), Attention and Performance XI (pp. 205-220). Hillsdale, NJ: Erlbaum.
Nunez, P. L. (1981). Electrical Fields of the Brain. New York: Oxford University Press.
Nyman, G., Alho, K., Laurinen, P., Paavilainen, P., Radil, T., Reinikainen, K., Sams, M. and Näätänen, R. (1990). Mismatch negativity (MMN) for sequences of auditory and visual stimuli: Evidence for a mechanism specific to the auditory modality. Electroencephalography and Clinical Neurophysiology, 77, 436-444.
Okita, T. (1979). Event-related potentials and selective attention to auditory stimuli varying in pitch and localization. Biological Psychology, 9, 271-284.
Okita, T. (1987). Event-related potentials and selective attention to tones moving in location and pitch: An examination of movement velocity. Biological Psychology, 24, 225-236.
Okita, T., Konishi, K. and Inamori, R. (1982). Attention-related negative brain potential for speech words and pure tones. Biological Psychology, 16, 29-47.
Okita, T., Wijers, A. A., Mulder, G. and Mulder, L. J. M. (1985). Memory search and visual spatial attention: An event-related brain potential analysis. Acta Psychologica, 60, 263-292.
Parasuraman, R. (1978). Auditory evoked potentials and divided attention. Psychophysiology, 15, 460-465.
Parasuraman, R. (1980). Effects of information processing demands on slow negative-shift latencies and N100 amplitude in selective and divided attention. Biological Psychology, 11, 217-233.
Peronnet, F. and Farah, M. J. (1989). Mental rotation: An event-related potential study with a validated mental rotation task. Brain and Cognition, 9, 279-288.
Peters, M. and de Munck, J. (1990). On the forward and the inverse problem for EEG and MEG. In F. Grandori, M. Hoke and G. L. Romani (Eds), Auditory Evoked Magnetic Fields and Electric Potentials. Advances in Audiology, vol. 6 (pp. 71-102). Basel: Karger.
Posner, M. I. (1978). Chronometric Explorations of Mind. Hillsdale, NJ: Erlbaum.
Posner, M. I. and Cohen, Y. (1980). Attention and the control of movements. In G. E. Stelmach and J. Requin (Eds), Tutorials in Motor Behavior. Amsterdam: North-Holland.
Posner, M. I. and Cohen, Y. (1984). Components of visual orienting. In H. Bouma and D. Bouwhuis (Eds), Attention and Performance X (pp. 531-556). Hillsdale, NJ: Erlbaum.
Posner, M. I., Cohen, Y., Choate, L. S., Hockey, R. and Maylor, E. A. (1984). Sustained concentration: Passive filtering or active orienting? In S. Kornblum and J. Requin (Eds), Preparatory States and Processes (pp. 49-65). Hillsdale, NJ: Erlbaum.
Posner, M. I. and Petersen, S. E. (1990). The attention system of the human brain. Annual Review of Neuroscience, 13, 25-42.
Posner, M. I. and Snyder, C. R. R. (1975). Attention and cognitive control. In R. L. Solso (Ed.), Information Processing and Cognition: The Loyola Symposium. Hillsdale, NJ: Erlbaum.
Posner, M. I., Snyder, C. R. R. and Davidson, B. J. (1980). Attention and the detection of signals. Journal of Experimental Psychology: General, 109, 160-174.
Previc, F. H. and Harter, M. F. (1982). Electrophysiological and behavioral indicants of selective attention to multifeature gratings. Perception and Psychophysics, 32, 465-472.
Prinzmetal, W. (1981). Principles of feature integration in visual perception. Perception and Psychophysics, 30, 330-340.
Prinzmetal, W. and Millis-Wright, M. (1984). Cognitive and linguistic factors affect visual feature integration. Cognitive Psychology, 16, 305-340.
Prinzmetal, W., Presti, D. and Posner, M. I. (1986). Does attention affect feature integration? Journal of Experimental Psychology: Human Perception and Performance, 12, 361-369.
Pritchard, W. S. (1981). Psychophysiology of P300. Psychological Bulletin, 89, 506-540.
Rif, J., Hari, R. and Tiihonen, J. (1989). Paired tone presentation enhances responses of human auditory cortex to rare frequency changes. In S. J. Williamson, M. Hoke, G. Stroink and M. Kotani (Eds), Advances in Biomagnetism. Proceedings of the Seventh International Conference on Biomagnetism (pp. 121-124). New York: Plenum Press.
Rösler, F. and Heil, M. (1993). Monitoring retrieval from long-term memory by slow event-related brain potentials. Psychophysiology, 30, 170-182.
Ruchkin, D. S., Johnson, R., Jr, Mahaffey, D. and Sutton, S. (1988). Toward a functional categorization of slow waves. Psychophysiology, 25, 339-353.
Rugg, M. D. (1991). ERPs and selective attention: Commentary. In C. H. M. Brunia, G. Mulder and M. N. Verbaten (Eds), Event-Related Brain Research (EEG Suppl. 42) (pp. 222-227). Amsterdam: Elsevier.
Rugg, M. D., Lines, C. R. and Milner, A. D. (1985). Further investigation of visual evoked potentials elicited by lateralized stimuli: Effects of stimulus eccentricity and reference site. Electroencephalography and Clinical Neurophysiology, 62, 81-87.
Rugg, M. D., Milner, A. D. and Lines, C. R. (1985). Visual evoked potentials to lateralised stimuli in two cases of callosal agenesis. Journal of Neurology, Neurosurgery and Psychiatry, 48, 367-373.
Rugg, M. D., Milner, A. D., Lines, C. R. and Phalp, R. (1987). Modulation of visual event-related potentials by spatial and non-spatial visual selective attention. Neuropsychologia, 25, 85-96.
Sams, M., Alho, K. and Näätänen, R. (1983). Sequential effects in the ERP in discriminating two stimuli. Biological Psychology, 17, 41-58.
Sams, M., Kaukoranta, E., Hämäläinen, M. and Näätänen, R. (1989). Neuromagnetic responses of the human auditory cortex to different types of infrequent deviant stimuli. In S. J. Williamson, M. Hoke, G. Stroink and M. Kotani (Eds), Advances in Biomagnetism. Proceedings of the Seventh International Conference on Biomagnetism (pp. 125-128). New York: Plenum Press.
Sanders, A. F. (1983). Towards a model of stress and human performance. Acta Psychologica, 53, 61-97.
Schneider, W. and Shiffrin, R. M. (1977). Controlled and automatic human information processing: I. Detection, search and attention. Psychological Review, 84, 1-66.
Shiffrin, R. M. and Schneider, W. (1977). Controlled and automatic human information processing: II. Perceptual learning, automatic attending and a general theory. Psychological Review, 84, 127-190.
Shulman, G. L., Wilson, J. and Sheehy, J. B. (1985). Spatial determinants of the distribution of attention. Perception and Psychophysics, 37, 59-65.
Shulman, J. E., Sheehy, J. B. and Wilson, J. (1986). Gradients of spatial attention. Acta Psychologica, 61, 167-181.
Smid, H. G. O. M., Lamain, W., Hogeboom, M. M., Mulder, G. and Mulder, L. J. M. (1991). Psychophysiological evidence for continuous information transmission between visual search and response processes. Journal of Experimental Psychology: Human Perception and Performance, 17, 696-714.
Smid, H. G. O. M., Mulder, G. and Mulder, L. J. M. (1990). Selective response activation can begin before stimulus recognition is complete: A psychophysiological and error analysis of continuous flow. Acta Psychologica, 74, 169-201.
Smid, H. G. O. M., Mulder, G., Mulder, L. J. M. and Brands, G. J. (1992). A psychophysiological study of the use of partial information in stimulus-response translation. Journal of Experimental Psychology: Human Perception and Performance, 18, 1101-1119.
Solowij, N., Michie, P., Crawford, J. and Glue, L. (1990). Auditory ERPs during auditory attention and a visual control task: Effects of visual task difficulty. In C. H. M. Brunia, A. W. K. Gaillard and A. Kok (Eds), Psychophysiological Brain Research (pp. 217-220). Tilburg: Tilburg University Press.
Sternberg, S. (1969). On the discovery of processing stages. Acta Psychologica, 30, 276-315.
Stuss, D. T., Sarazin, F. F., Leech, E. E. and Picton, T. W. (1983). Event-related potentials during naming and mental rotation. Electroencephalography and Clinical Neurophysiology, 56, 133-146.
Teder, W., Alho, K., Reinikainen, K. and Näätänen, R. (1990). Stimulus rate and ERPs during selective auditory attention. In C. H. M. Brunia, A. W. K. Gaillard and A. Kok (Eds), Psychophysiological Brain Research (pp. 221-224). Tilburg: Tilburg University Press.
Tipper, S. P. (1985). The negative priming effect: Inhibitory effects of ignored primes. Quarterly Journal of Experimental Psychology, 37A, 571-590.
Treisman, A. (1960). Contextual cues in selective listening. Quarterly Journal of Experimental Psychology, 12, 242-248.
Treisman, A. (1988). Features and objects: The fourteenth Bartlett memorial lecture. Quarterly Journal of Experimental Psychology, 40A, 201-237.
Treisman, A. M. and Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97-136.
Van Dellen, H. J., Brookhuis, K. A., Mulder, G., Okita, T. and Mulder, L. J. M. (1985). Evoked potential correlates of practice in a visual search task. In D. Papakoustopoulos, S. Butler and I. Martin (Eds), Clinical and Experimental Neurophysiology (pp. 132-155). Beckenham: Croom Helm.
Van der Heijden, A. H. C. (1991). Selective Attention in Vision. London: Routledge.
Van Petten, C. and Kutas, M. (1990). Interaction between sentence context and word frequency in event-related brain potentials. Memory and Cognition, 18, 380-393.
Van Voorhis, S. and Hillyard, S. A. (1977). Visual evoked potentials and selective attention to points in space. Perception and Psychophysics, 22, 54-62.
Verleger, R. (1988). Event-related potentials and cognition: A critique of the context updating hypothesis and an alternative interpretation of P3. Behavioral and Brain Sciences, 11, 343-427.
Wagner, M., Alho, K., Lavikainen, J., Reinikainen, K., Teder, W. and Näätänen, R. (1991). Sequential analysis of auditory selective attention: Evidence for facilitation and inhibition of pitch discrimination. Poster presentation at the symposium 'New Developments in Event-Related Potentials', Hannover, Germany.
Wijers, A. A. (1989). Visual selective attention. An electrophysiological approach. Doctoral Thesis, Groningen.
Wijers, A. A., Dunajski, Z., Peters, M. and Mulder, G. (1992). Magnetic brain responses in visual selective attention. In M. Hoke, S. N. Erné, Y. C. Okada and G. L. Romani (Eds), Biomagnetism: Clinical Aspects (pp. 213-216). Amsterdam: Elsevier.
Wijers, A. A., Lamain, W., Slopsema, S., Mulder, G. and Mulder, L. J. M. (1989d). An electrophysiological investigation of the spatial distribution of attention to colored stimuli in focused and divided attention conditions. Biological Psychology, 29, 213-245.
Wijers, A. A., Mulder, G., Okita, T., Mulder, L. J. M. and Scheffers, M. K. (1989a). Attention to colour: An ERP-analysis of selection, controlled search and motor activation. Psychophysiology, 26, 89-109.
Wijers, A. A., Mulder, G., Okita, T. and Mulder, L. J. M. (1989b). An ERP-study on memory search and selective attention to lettersize and conjunctions of lettersize and color. Psychophysiology, 26, 529-547.
Wijers, A. A., Mulder, G., Otten, L., Feenstra, S. and Mulder, L. J. M. (1989c). Brain potentials during selective attention, memory search and mental rotation. Psychophysiology, 26, 452-467.
Wijers, A. A., Okita, T., Mulder, G., Mulder, L. J. M., Lorist, M. M., Poiesz, R. and Scheffers, M. K. (1987). Visual search and spatial attention: ERPs in focussed and divided attention conditions. Biological Psychology, 25, 33-60.
Williamson, S. J. and Kaufman, L. (1990). Theory of neuroelectric and neuromagnetic fields. In F. Grandori, M. Hoke and G. L. Romani (Eds), Auditory Evoked Magnetic Fields and Electric Potentials. Advances in Audiology, vol. 6 (pp. 1-39). Basel: Karger.
Woldorff, M., Gallen, C., Hampson, S., Pantev, C. and Hillyard, S. A. (1991). Modulation of early sensory processing in human auditory cortex during selective listening. Oral presentation at the symposium 'New Developments in Event-Related Potentials', Hannover, Germany.
Woods, D. L., Hillyard, S. A. and Hansen, J. C. (1984). Event-related brain potentials reveal similar attentional mechanisms during selective listening and shadowing. Journal of Experimental Psychology: Human Perception and Performance, 10, 761-777.
Yantis, S. and Jonides, J. (1984). Abrupt visual onsets and selective attention: Evidence from visual search. Journal of Experimental Psychology: Human Perception and Performance, 10, 601-621.
Yantis, S. and Jonides, J. (1990). Abrupt visual onsets and selective attention: Voluntary versus automatic allocation. Journal of Experimental Psychology: Human Perception and Performance, 16, 121-134.
Chapter 10
Theories of Attention
Odmar Neumann
Department of Psychology, Bielefeld University, Germany
There has been both continuity and discontinuity in the history of theories of attention. Continuity has existed in the sense that the central questions, and some of the basic theoretical alternatives (e.g. the perennial issue of early versus late selection and the problem of limited capacity), were already formulated in the psychology of the late 19th century; their roots range back into 18th-century philosophy (Neumann, 1995; Posner, 1982). However, there has not been a continuous research tradition in which the inquiry into these questions has systematically unfolded. Although attention research had not disappeared in the behaviorist period (Lovie, 1983), the modern research that began in the 1950s started anew, with very little reference to the earlier work. With some notable exceptions (Sanders, 1971), most of the papers and monographs that initially shaped modern attention research (Broadbent, 1958; Neisser, 1967; Treisman, 1964) give the impression that this was a new research field that came into existence after World War II. The models were mainly taken from the fields of information transmission and information processing, the terminology was new, and so were many of the methods. In retrospect, though, it is easy to see that this research was much more in the spirit of earlier traditions than it was itself aware at the time; one illustrious example of such a look back is van der Molen's contribution to this volume (Chapter 7). The surface discontinuity in the development of attention research after the second world war makes it easy to demarcate the confines of the present chapter. It will be concerned with theories of attention that are 'modern' in the sense that they belong to the research tradition that started afresh in the early 1950s, and that found its first theoretical expression in Broadbent's (1958) seminal book Perception and Communication. 
This book set the stage for practically all theorizing about attention until the present, both by providing basic ideas and by provoking criticism and alternative views. During the last half of the century, there has thus undoubtedly been a 'cumulative development of attentional theory', to use Posner's (1982) expression. Because of this high degree of coherence within modern theorizing about attention, the main part of this chapter will be organized around common issues rather than around individual theories. The principal sections will not be devoted to particular theoretical approaches, but to specific theoretical questions. To provide a frame of reference for these content-oriented sections, the chapter begins with a historical sketch that describes the major theoretical developments at a more global level.
Handbook of Perception and Action, Volume 3. ISBN 0-12-516163-8. Copyright © 1996 Academic Press Ltd. All rights of reproduction in any form reserved.
This chronological overview is followed by three main sections. The first is concerned with the functional basis of limited capacity, which has been conceptualized in different ways by capacity theories, but about which noncapacity theories have also had something to say. The central issues will be why capacity is limited, and how it is limited. By contrast, the classical question of where capacity is limited will be deferred to the second main section, whose topic is the locus and mechanisms of attentional selection. The reason is that, as we shall see, capacity theories have tended to confound the issue of where capacity is limited with the issue of where selection takes place (see, for example, van der Heijden's excellent analysis in Chapter 1 of this volume). Like the section on capacity, the section on selection will be in part devoted to filter theory and capacity and resource theories. However, the emphasis will be on the selection mechanisms that have been proposed by the more recent approaches since the 1980s, which have mainly studied visual selection. The final, shorter section will be about a central question that has, however, been given short shrift by many theories of attention. It concerns the function(s) of attention. As we shall see, widely different answers have been proposed or implied, ranging from figural synthesis and feature integration to action control. One may ask, though, whether all of these answers are mutually exclusive. Indeed, I will argue that the label 'attention' does not refer to a unitary component of the human processing system, but to a variety of mechanisms; mechanisms that are, however, united by their common functions of controlling information processing and behavior whenever (i.e. in the cases when, and to the degree that) existing skills and routines alone cannot do this. In this sense, attention is intimately related to 'the processing of novelty', to use Underwood and Everatt's (Chapter 6) formulation.
1 A BRIEF HISTORICAL SURVEY
Modern attention research has produced relatively few attempts to integrate the whole range of empiric findings within a common theoretical framework. Perhaps the most influential theories with such a broad scope have been those of Broadbent (1958, 1971), Neisser (1967), Kahneman (1973) and Wickens (1980, 1984). However, most of the theorizing, especially in the last two decades, has been more restricted. Individual theories have focused on fields such as automatic versus controlled processing (Posner, 1978), visual attention (van der Heijden, 1992) or energetic resources (Sanders, 1983). Subdividing these various theoretical contributions, aimed at different aspects of attention and treating them at different levels of generality, into 'approaches' must by necessity be to some degree arbitrary. The following account is based on a tentative distinction between four, partially overlapping, phases of attention research. The first was dominated by Broadbent's (1958) filter theory and the theoretical discussions that revolved around it. Capacity was conceptualized as the transmission capacity of a channel, and selection was performed by a filter that blocked or attenuated the flow of information. In the second phase, the concept of a general, unspecific processing capacity, as proposed by Kahneman (1973), became the dominant idea. In one influential variant of this approach, effortful, capacity-demanding processes were contrasted with automatic, capacity-free processes (Kerr, 1973; Posner and Snyder, 1975; Shiffrin and
Schneider, 1977). In this phase, the metaphor for capacity became that of a 'supply'; selection was its allocation. In the third phase, the notion of unspecific capacity dissolved in favor of the idea of multiple, specific resources. This idea was first proposed by Allport, Antonis and Reynolds (1972) and Sanders (1979), and became prominent in the wake of influential papers by Wickens (1980) and Navon and Gopher (1979). In this phase, the selection problem receded into the background relative to the capacity problem. The topics were the nature and measurement of resources, while the mechanisms that supposedly allocated these resources remained largely unspecified. The fourth, most recent, phase is characterized by the gradual abandonment of the notion that limited capacity is the central functional characteristic of attention. Instead, theorizing has once again focused on the selective (plus the integrative) functions of attention, with a strong emphasis on visual attention (Allport, 1989; Neumann, 1990, 1992; Posner, 1980; Posner and Petersen, 1990; Treisman, 1988; van der Heijden, 1992). The next sections contain a review of these four phases.
1.1 Filter Theory
Filter theory (Broadbent, 1958, 1971, 1982) emerged from experimental work that was done in an applied context. One impetus for these experiments came from analyses of the working situation of flight controllers in a control tower in which there was vocal communication with several airplanes at the same time (cf. Broadbent, 1982; Neumann, 1985). Some of the early experiments directly mirrored this situation. In listen-and-answer experiments (Broadbent, 1952a, b, c), the subjects were presented with verbal questions that they had to answer on the basis of visually presented information. The main findings were that a temporal overlap between two different messages hampered performance, but that subjects could in part prevent this interference if they knew that one of the messages was irrelevant. These empiric observations found their counterpart in two main components of filter theory. Interference was, according to this theory, due to a limited-capacity, central channel (P system, for 'perception') that could handle only a limited amount of information. The impact of an unwanted message could be reduced by the subject because there was a filter that prevented this irrelevant information from reaching the central channel. The third major constituent of the model was a short-term memory (STM, or S system) component that was located prior to the filter. Its chief empiric basis was the observation from 'split-span' experiments (Broadbent, 1954) that subjects who were presented with two simultaneous three-digit sequences, one presented to each ear, were able to reproduce first one and then the other sequence, apparently because the latter had been held in a short-term storage system (Broadbent, 1954, 1958; Chapter 9). According to the model, information prior to the P system, and therefore presumably information in STM, was coded in terms of 'pitch, localization, or other similar qualities' (Broadbent, 1958, p. 42).
Thus, the filter theory's 'STM' was in some respect similar to what was later (Atkinson and Shiffrin, 1968) called 'sensory memory'. On the other hand, Broadbent (1958, p. 226f) suggested that information from the P system could be fed back into the STM component, apparently implying that STM could also contain processed, (phonetically)
identified information, similarly to Atkinson and Shiffrin's (1968) STM component. The status of the STM component was therefore somewhat unclear in the 1958 version of the theory. Later accounts (Broadbent, 1971, 1982) clarified that the S system was a buffer that contained uncategorized information, and that it was different from 'primary memory', to which categorized information can be recycled. Filter theory had an enormous impact and shaped more than a decade of attention research. In part, this was probably due to its being directly anchored in empiric observations. A further factor may have been its high internal consistency and plausibility. The limited-channel idea was taken from Shannon and Weaver's (1949) information theory and seemed to follow directly from the simple fact that the brain is a physically limited system. The filter was a logical way to protect the central channel from being overloaded, and the STM component had the plausible function of a buffer that enabled the system to cope with momentary input peaks. The system was constructed as if it had been conceived by a clever engineer. A third probable reason for the great influence of filter theory was that it encompassed several simple assumptions to which equally simple, testable alternatives could be formulated. By far the most controversial assumption was the postulate that the obligatory, bottom-up processing of all information, attended or nonattended (i.e. processing prior to the filter), does not lead to an identification or categorization of the stimulus, but only to its representation in terms of simple sensory features. This assumption, later called the 'early selection' view, was challenged by the 'late selection' view, usually attributed to Deutsch and Deutsch (1963), according to which all stimuli are fully identified, but only the attended stimuli are given access to further stages, such as long-term storage and motor control. 
This 'early versus late selection' dichotomy has been one of the major issues that have driven experimental research since the 1960s. Though the discussion was often vehement, the opposing camps were actually not too far apart. Both believed that there was some 'central bottleneck' and that there was a selection device prior to it. The only point of disagreement regarded the location of the selection device. This aspect of filter theory and its alternatives will be taken up again in the section on the locus and mechanisms of selection.
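The sequence of components described in this section (a pre-filter store, a filter and a limited-capacity channel) can be made concrete in a small simulation. What follows is only an illustrative sketch and not Broadbent's own formalization: the class name `FilterModel`, the `channel_capacity` parameter, the helper `split_span` and the coding of messages as dictionaries are assumptions introduced here, and decay of the stored material is not modeled.

```python
from collections import deque


class FilterModel:
    """Toy version of the S system -> filter -> P system sequence."""

    def __init__(self, channel_capacity=1):
        self.buffer = deque()                     # S system: unidentified input
        self.channel_capacity = channel_capacity  # P system limit per cycle

    def present(self, ear, item):
        # Input is stored under a physical feature only (here: ear of arrival).
        self.buffer.append({"ear": ear, "item": item})

    def process_cycle(self, attended_ear):
        """Pass at most `channel_capacity` items from the attended ear."""
        identified = []
        remaining = deque()
        for entry in self.buffer:
            if entry["ear"] == attended_ear and len(identified) < self.channel_capacity:
                identified.append(entry["item"])  # passes the filter, reaches P
            else:
                remaining.append(entry)           # blocked, stays in the buffer
        self.buffer = remaining
        return identified


def split_span(left, right):
    """Recall one ear's digits first, then the other ear's digits."""
    model = FilterModel(channel_capacity=1)
    for left_digit, right_digit in zip(left, right):  # simultaneous pairs
        model.present("left", left_digit)
        model.present("right", right_digit)
    report = []
    for ear in ("left", "right"):
        while any(entry["ear"] == ear for entry in model.buffer):
            report.extend(model.process_cycle(ear))
    return report
```

With these assumptions, `split_span([7, 3, 4], [2, 1, 5])` returns `[7, 3, 4, 2, 1, 5]`: the right-ear digits wait in the buffer until the filter is switched to them, which is the ear-by-ear report pattern that the split-span observation was taken to support.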
1.2 Unspecific Capacity
Engineering psychology not only provided empiric research questions for the psychology of attention that took shape in the 1950s and 1960s; communication engineering also furnished some of the basic theoretical ideas. These changed as communication engineering progressed. Filter theory had viewed the human being as a device for transmitting information; hence the central portion of the processing system was conceptualized as a channel. During the 1960s, a different technological device began to fascinate psychologists: the computer. Computer technology became the major model for theorizing in the area of attention, as it did in other areas. This did not require abandoning the central concept of limited capacity. Computers suffer from limited capacity, just as transmission lines do. The capacity of the computers of the 1960s was woefully limited. As Moray (1967) mentioned in one of the papers that initiated this new kind of theorizing about attention, the
computer in his institute had, at the time, a working memory capacity of 8 kilo(!)bytes, 4 of which were occupied by an ALGOL compiler. Not surprisingly, scientists who used to work with these computers were inclined to think of the human processing system in terms of limited central capacity. Limited capacity thus survived the shift from the channel metaphor to the computer metaphor. As Moray (1967) pointed out, there are, though, some differences between the capacity of a transmission line and the capacity of a computer, which does not simply transmit information, but 'receives, transforms and generates messages' (Moray, 1967, p. 87). At about the same time, empiric evidence began to accumulate that not all mental processes produce, and suffer from, interference if paired with other, simultaneous processes. For example, Posner and Boies (1971) reported that responding to a probe stimulus (pressing a button when a tone sounded) was not substantially delayed when the probe stimulus was presented at a time during which the processing system was supposedly occupied with the identification of a visual stimulus. Within the computer metaphor, this suggested that there were certain 'wired-in' processes that did not 'demand capacity' (Kerr, 1973). Later on, this led to two-process theories, which distinguished between 'automatic' and 'controlled' processing (LaBerge and Samuels, 1974; Posner and Snyder, 1975; Shiffrin and Schneider, 1977; for reviews see Neumann, 1984, 1989; see Chapter 6). Only controlled processing was believed to occupy the central, capacity-limited portion of the processing system. While Moray (1967) wrote a kind of manifesto of this new, central processing capacity view of attention, and Posner (1978) summarized many of its applications, there was no formal theoretical elaboration in the style of Broadbent's (1958) elaboration of the information transmission metaphor.
This gap was filled by a monograph that, in a way, represented an outsider view, and yet became probably the most often cited comprehensive treatment of attention in the 1970s and early 1980s. Kahneman (1973) used the term 'unspecific capacity', but instead of relating it to the computer metaphor, he interpreted it as a 'non-specific input, which may be variously labeled "effort", "capacity", or "attention"' (Kahneman, 1973, p. 9), similar to Freud's concept of libido that had inspired Kahneman's theorizing. This capacity concept combined the functions of Broadbent's central channel with those of the filter. The capacity supply was limited, as had been the transmission capacity of the channel. Capacity could be allocated to some inputs and withheld from others, just as the filter had been able to accept inputs and reject others. Thus, Kahneman's theory impressed by its simplicity. However, by equating capacity with physiological activation, or effort, Kahneman departed widely from the mainstream of the period that sought the inspiration for psychological theories in the communication sciences rather than the brain sciences.
1.3 Specific Resources
Kahneman's (1973) main contention had been that 'interference is nonspecific, and it depends only on the demands of both tasks' (Kahneman, 1973, p. 11). At the time when the book appeared, it had, however, already become evident that, as an empiric generalization, this was not correct. There had been numerous demonstrations of specific interference, i.e. of more interference between structurally similar
than between structurally dissimilar tasks. For example, Brooks (1968) had shown that a spatial output mode (pointing) interfered more heavily with a spatial task (visual imagery) than a vocal output mode (speaking), whereas a verbal task was more strongly interfered with by the vocal than by the spatial output mode. Similarly, Allport et al. (1972) reported a drastic reduction in interference when subjects who shadowed an auditory-verbal message had to monitor pictures instead of a second spoken message. Kahneman (1973) tried to account for these and similar findings by postulating 'structural interference' as a type of interference that was not really attentional. However, this solution rapidly lost its plausibility as more and more examples of specific interference accumulated. At the verge of the 1980s, it had become fairly clear that specific interference was not the exception but the rule. Capacity theory had to come to grips with this empiric situation. It did so by invoking specific resources. A precursor of the concept of specific resources had been Allport et al.'s (1972) concept of 'multiple processors', which was clearly ahead of its time when it was proposed at about the time when Kahneman's (1973) monograph successfully propagated the concept of unspecific capacity. In 1977, Andries Sanders presented an important conference paper on mental load that appeared in print 2 years later (Sanders, 1979). In this paper, he discussed different concepts of limited capacity. His 'processor type C' stated a view of attention according to which '... each mechanism may well have its own capacity which is not exchangeable with that of other mechanisms' (Sanders, 1979, p. 55). This idea was subsequently worked out in two ways. First, theorists tried to identify and characterize individual resources. In one of the most influential papers in this area, Wickens (1980; see also Wickens, 1984) proposed three main types of resource pool: stages, hemispheres and modalities. 
The idea that resources are related to (though not identical with) processing stages was later elaborated by Sanders (1983), while others (e.g. Friedman and Polson, 1981) took up and developed the idea of hemispheres as processing resources. The second way in which the concept of resources was developed and refined regarded the formal status of resources. Norman and Bobrow (1975) had already introduced the distinction between resource-limited and data-limited processes and suggested the notions of performance-resource function (a function relating the level of performance to the amount of invested resources) and performance-operating characteristic (a function relating performance in task A to performance in task B in a dual-task situation). In an elaborate attempt to formalize resource theory further, Navon and Gopher (1979) took up these ideas and developed them into a formal system that was inspired by similar formalisms from microeconomic theory (see also Gopher and Sanders, 1984). Despite these impressive contributions, resource theories soon turned out to be built on shaky ground. As Allport (1980) pointed out, performance-resource functions cannot be determined from dual-task data unless one knows in advance that the two tasks rely on the same resource(s). Heuer (1985) showed that the logic of resource measurement suffers from some of the same shortcomings that had besieged classical factor analysis, and Neumann (1985, 1987) concluded from literature reviews that the observed patterns of dual-task interference cannot be explained by a reasonably small number of basic resources. Similarly, David Navon, who had been one of the original proponents of the resource concept, came
to the conclusion that this concept lacks explanatory value (Navon, 1984), and Neumann (1995) showed that both Wickens' (1980, 1984) and Navon and Gopher's (1979) variants of resource theory amount to a redescription, rather than an explanation, of the observed data. In view of these shortcomings, it seems fair to say that the resource concept has failed as an heir to Broadbent's channel capacity and Kahneman's unspecific capacity, i.e. as an explanatory concept for all kinds of attentional phenomena. By contrast, a much more narrow usage of the resource concept, which restricts it to energetic aspects of attention (Sanders, 1983), is immune against most of these criticisms and may, in fact, turn out to be a valuable tool in bridging gaps between different research traditions, both within experimental psychology and between psychology and neurophysiology (see Chapter 7).
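The two formal notions mentioned in this section, the performance-resource function and the performance-operating characteristic, can be illustrated numerically. The sketch below is hypothetical throughout: the saturating exponential form, the `asymptote` and `rate` parameters, and the assumption of a single pool of size 1.0 shared by both tasks are choices made here for illustration and are not taken from Norman and Bobrow (1975) or Navon and Gopher (1979).

```python
import math


def performance(resources, asymptote=1.0, rate=5.0):
    """Hypothetical performance-resource function.

    Performance rises with invested resources (resource-limited region)
    and levels off near `asymptote` (data-limited region).
    """
    return asymptote * (1.0 - math.exp(-rate * resources))


def operating_characteristic(prf_a, prf_b, total=1.0, steps=5):
    """Performance in task A versus task B for different splits of one pool."""
    points = []
    for i in range(steps + 1):
        share_a = total * i / steps           # resources given to task A
        points.append((prf_a(share_a), prf_b(total - share_a)))
    return points


poc = operating_characteristic(performance, performance)
for perf_a, perf_b in poc:
    print(f"task A: {perf_a:.2f}   task B: {perf_b:.2f}")
```

The sketch also makes the force of Allport's (1980) objection visible: the trade-off curve is computed by dividing one pool between the two tasks, so it presupposes exactly what dual-task data alone cannot establish, namely that both tasks draw on the same resource.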
1.4 Selection and Integration
The approaches that have so far been discussed have shared the conviction that the basic characteristic of attention is limited capacity. Other aspects of attention were assumed to derive from this fundamental property. In particular, selection was conceptualized as a functional consequence of limited capacity. For example, because Broadbent's (1958) central channel had a limited transmission capacity, there was a filter that protected it from overload. Similarly, because there existed only a limited supply of Kahneman's (1973) unspecific capacity, it had to be strategically allocated. Selection was viewed as the way in which the system copes with its limited capacity. This view did not take into account the possibility that attention may have functions other than selection, and that selection may have functions other than coping with limited capacity. These alternatives began to emerge gradually as theoretical options during the 1980s. At about the same time, there were two further trends. First, the emphasis shifted from dual-task interference to sensory attention, and within the field of sensory attention from the auditory to the visual modality. Second, connectionism began to have its impact on attention theory. Together, these developments have led to a new style of theorizing that stresses specific attentional mechanisms and their functions. At a general level, there have been attempts to chart different types of attentional mechanisms related, for example, to sensory processing, action planning and motor control (Neumann, 1987, 1992, 1995; Pashler, 1993; Sanders, 1983). 
Within the field of visual attention, the 1980s have seen several very productive experimental paradigms and theoretical ideas, in particular Anne Treisman's proposal that visual attention serves to integrate the features that belong to the same visual object (Treisman, 1988, 1992; Treisman and Gelade, 1980; Treisman and Gormican, 1988; Treisman and Paterson, 1984; Treisman and Schmidt, 1982; Treisman and Souther, 1985), and the cueing paradigm that was introduced by Michael Posner and associates to study the temporal dynamics of shifts of visual attention and their effects (Posner, 1980; Posner, Snyder and Davidson, 1980). The results from these and similar paradigms and approaches have led to a variety of theories and models at different levels of specificity. Treisman's feature integration theory has developed from the initial 'glue' metaphor (Treisman and Gelade, 1980) into an elaborate theory that proposes a set of specialized
mechanisms (Treisman, 1988, 1992). Similarly, the proposition from the 'spotlight' metaphor that attention 'prioritizes' processing in the 'lighted' area (for overviews see Eriksen and St James, 1986; Eriksen and Murphy, 1987; Schneider, 1991) has found more exact counterparts in connectionist models of visual attention, in which the effect of attention is conceptualized as an additional input into the units representing the selected stimuli (Cohen, Dunbar and McClelland, 1990; Phaf, van der Heijden and Hudson, 1990; see Chapter 1). As several authors have pointed out (Allport, 1987; Neumann, 1990), this way of modeling visual attention is supported by findings from neurophysiology and electrophysiology, which indicate that sensory attention consists of the enhancement of the activity of neural units in certain brain areas (e.g. the posterior parietal cortex; see Chapter 9). An insightful theoretical integration of much of this literature has recently been proposed by van der Heijden (1992).
1.5 Conclusions and Preview
This brief survey of the historical development of modern theorizing about attention was intended to provide the reader with a frame of reference for the subsequent sections, which will be organized around topics rather than theories. However, these two principles of organization are not completely orthogonal. Theories have stressed some topics and neglected others. Not all theories have spoken to all topics. Therefore, each theory will be discussed more extensively in some sections than in others.
2 THE FUNCTIONAL BASIS OF LIMITED CAPACITY
2.1 Basic Conceptualizations and Metaphors
In what may have been the first reference to limited attentional capacity in a philosophical work, Aristotle remarked in 'On the senses and the sense objects' that we may fail to notice what happens before our eyes if we are absorbed in thought, or frightened, or hear a loud noise (Bekker, 1831). It is important to realize in which sense this passage may be interpreted as referring to 'limited capacity'. Aristotle describes empiric observations, which point to the general fact that strong impressions prevent us from noticing other, simultaneous impressions. In this descriptive sense, the existence of limited capacity has been taken as a plain fact ever since ancient philosophy. The usual descriptive terms were 'limitatio attentionis' in Latin and 'Enge des Bewußtseins' in German (roughly, 'limited range of consciousness'; see Neumann, 1971, for a survey). It was against this background that experimental psychology began to investigate attention (e.g. James, 1890; Pillsbury, 1908; Wundt, 1903). In modern attention research, this empiric, descriptive sense of the term 'limited capacity' has tended to be confounded with a theoretical meaning of the term (Neumann, 1978, 1987, 1995). In theories such as those of Broadbent (1958) and
Kahneman (1973), 'capacity' is a theoretical construct that is invoked to explain the fact that capacity is limited in a descriptive sense. Both usages of 'capacity' are legitimate, but they must be kept apart. The word can be used to designate the explanandum ('There is a limit to what a person can perceive and do at the same time'), or the explanans ('This is because the brain cannot process more than x bits per second', or 'This is because all processing operations need an energetical input of which there is only a limited supply in the brain', or 'This is because of crosstalk between similar processing operations', etc.). If, however, both meanings appear within the same theory, then there is a danger of confusing the data and their proposed explanation. In the following, the word 'capacity' will usually refer to the descriptive meaning of the term. When referring to one of the theoretical meanings, I will in most cases use an appropriate synonym (e.g. 'channel capacity', 'effort'), or the context will specify that I refer to the theoretical meaning of the term. Although there have been many theories of why and how capacity is limited, they have all been based on a small number of basic ideas and metaphors. Probably the best-known metaphor has been that of a bottleneck, i.e. a component in the sequence of processing operations that can handle only a limited amount of information per time unit. There have been several variants of this basic view that have differed with respect to where the bottleneck is located and how it is limited, for example in terms of bits per second that can be transmitted (Broadbent, 1958), or in terms of stimuli that can be passed simultaneously (Welford, 1967). Another capacity metaphor expresses the idea that there is some kind of limited supply of something that is needed by all, or almost all, mental processes. 
One variant has conceptualized this something as energy, with subvariants that either assume that the energy is unspecific (Kahneman, 1973), or else propose different energy supplies for different types of processes (Sanders, 1983) (see Chapter 7). A second main variant, based on the computer hardware analogy, has proposed a limited supply of something similar to the processing power of the central processing unit (CPU) as the functional basis of limited capacity (Moray, 1967). While these approaches assume that limited capacity has a direct counterpart in some basic limitation of the processing system, other theories have proposed more specific causes of limited capacity. Basically, these theories have been built on two ideas. One is the notion of crosstalk-type interference, i.e. interference between processes that are close together in some functional sense and therefore negatively affect each other (Allport, 1980; Goebel, 1991; Kinsbourne and Hicks, 1978; Navon, 1985; Schneider, 1985). The other is the idea that limited capacity is either directly or indirectly a result of peripheral interference at the level of effectors, or effector systems (Allport, 1980; Keele and Neill, 1978; Neisser, 1976; Shallice, 1972, 1978). These theories still stick to the idea that capacity is limited because of deficiencies in processing, though these deficiencies are conceived of as much more specific (and therefore more 'structural') than in theories of channel capacity or computational or energetical capacity. A more radical departure from capacity as an explanatory concept can be found in a different type of theory, according to which limited capacity is not a deficiency, but an achievement of the processing system, with the function of selectively coupling input information to action control (Allport, 1987; Neumann, 1983, 1987, 1992, 1995; van der Heijden, 1990, 1992). To summarize, there have been two basic approaches, each with two major variants. 
Limited capacity in a descriptive sense has been attributed either to
limited capacity in a functional sense, or to more specific mechanisms. Within the former approach, the limitation has been conceptualized either as a bottleneck or as a limited supply. Within the latter approach, specific interference has either been attributed to competing central and/or peripheral processes, or interference has been viewed as a consequence of the selective coupling of input information to action control. I now review these approaches in more detail.
2.2 The Central Bottleneck
2.2.1 Broadbent's (1958) 'P System'
The idea of a central bottleneck was mainly derived from the channel model of Shannon and Weaver's (1949) Mathematical Theory of Communication, which had a strong impact on the psychology of the 1950s and early 1960s. This theory, usually simply called 'information theory', provided two important theoretical tools for psychology. One was its communication model, which consisted of a sender, a receiver and a channel with a limited transmission capacity between them. The other was the quantification of information by means of the 'bit' measure, where one bit was equal to the information content of a decision between two equiprobable alternatives. Psychology exploited the channel model in different manners. In psycholinguistics and social psychology, the sender and the receiver were usually equated with persons, and the communication channel was conceptualized in terms of a sign system (a language or a set of nonverbal signals) that they use to exchange information. In human performance research, the model was used in a different manner: stimulus input was equated with the sender, motor output was an analog to the receiver, and the human processing system as a whole was assigned the function of communication channel. As to the bit measure, the strictness with which the theory was applied varied widely, ranging from exact calculations of bit rates to purely metaphorical usages of the channel concept. Initially, there had been numerous attempts to determine the capacity of the human processing system in terms of transmission rate as a quantitative measure (e.g. Attneave, 1959; Garner, 1962; Miller, 1956). However, the results were largely disappointing. Though the information content of the stimulus could sometimes predict performance (e.g. reaction time: Hick, 1952; and performance in absolute judgment tasks: Miller, 1956), there were many exceptions, and no general limit of human channel capacity could be established.
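The 'bit' measure mentioned above can be made concrete with a small calculation. The following is only a minimal sketch: the function names are hypothetical and chosen for illustration, and the formula is the standard Shannon measure of information, not a procedure taken from any of the studies cited here.

```python
import math

def information_bits(probabilities):
    """Average information (in bits) of a choice among alternatives
    that occur with the given probabilities (standard Shannon measure)."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Alternatives with probability 0 contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def equiprobable_bits(n_alternatives):
    """Special case: n equally likely alternatives carry log2(n) bits."""
    return math.log2(n_alternatives)

if __name__ == "__main__":
    # A decision between two equiprobable alternatives is one bit.
    print(information_bits([0.5, 0.5]))
    # Eight equiprobable alternatives carry three bits.
    print(equiprobable_bits(8))
    # Unequal probabilities carry less information than equal ones.
    print(information_bits([0.9, 0.1]))
```

In this sense, a choice reaction task with more equiprobable alternatives presents more information per stimulus, which is the quantity that the early studies tried to relate to performance.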
Neisser (1967) closed the case by concluding that attempts 'to quantify psychological processes in informational terms have usually led, after much effort, to the conclusion that the 'bit rate' is not a relevant variable after all' (p. 112). These early attempts to measure actual human channel capacity had, however, surprisingly little impact on attention research. When Broadbent (1958) introduced the idea that the limitation of attentional capacity is due to the limited-channel capacity of a central component of the human processing system, called the 'P system', he did not try to determine channel capacity quantitatively, nor even to offer evidence that the performance limits found in dichotic listening tasks were, in terms of bits per second, in the same order of magnitude as other estimates of human channel capacity. In fact, they were not. An estimation of channel capacity
from performance limits in typical dichotic listening tasks, such as the 'split-span' task of Broadbent (1954), leads to much lower values than all other estimates of human channel capacity that were published at the time (for details see Neumann, 1995, ch. 4). Thus, although Broadbent (1958, p. 5) insisted that he used the term 'capacity' in the exact meaning of the Mathematical Theory of Communication, his 'channel capacity' was actually only a nonquantitative derivative of it. It was a derivative that maintained, however, one important property of the original concept. Shannon and Weaver's (1949) channel capacity had been defined in terms of bits per time unit, not signals per time unit. This implies that the lower the information content of individual signals, the more signals can be transmitted in a given amount of time. The system is limited not in terms of signals, but in terms of the information that they carry. This was also the kind of capacity limitation that Broadbent (1958) proposed for his P system: 'To some extent, two messages may be dealt with simultaneously, if they convey little information. But there is a limit to the amount of information which a listener can absorb in a certain time, that is, he has limited capacity' (Broadbent, 1958, p. 17). As Sanders (1979) has pointed out, an unspecific capacity limitation can be based on either of two types of processors. Processor type A, as Sanders (1979) called it, allows for shared capacity processing, while processor type B can process only one signal (or group of signals) at a time. If the information load per signal is reduced, a type A processor will be able to process more signals in parallel, while processor type B will need less time to process a signal, but continue to process only one signal at a time. 
Though Sanders (1979) illustrated his concept of a type A processor with Moray's (1967) computational capacity model, it is equally compatible with the concept of channel capacity as defined by the Mathematical Theory of Communication. Broadbent's (1958) central channel was a type A processor. Whereas the idea of humans as a capacity limited transmission channel did not fulfill the original expectations in the areas in which it had been originally explored (e.g. choice reaction time (RT), pattern recognition, short-term memory, language processing) and was therefore largely abandoned, Broadbent's derivative of Shannon and Weaver's (1949) channel concept turned out to be surprisingly long-lived. One reason may have been that it was not formulated in quantitative terms and could therefore not be disproved by simply calculating bit rates. As we shall see in a later section, it disappeared not because it was demonstrated to be wrong, but because the emerging computer age suggested a more attractive metaphor than that of a transmission line. The fate of the alternative concept, Sanders' type B processor, was different. It continued to play a role and has recently received renewed interest.
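The difference between the two processor types distinguished by Sanders (1979) can be illustrated with a toy calculation. This is a sketch under stated assumptions, not Sanders' own formalization: the capacity value is arbitrary, all signals are assumed to carry the same information load, and the type A processor is assumed to divide its capacity equally among the signals.

```python
def type_a(capacity, bits_per_signal, n_signals):
    """Shared-capacity processor (assumption: equal sharing).
    All signals are processed in parallel and finish together."""
    finish = n_signals * bits_per_signal / capacity
    return {"max_concurrent": n_signals,
            "finish_times": [finish] * n_signals}

def type_b(capacity, bits_per_signal, n_signals):
    """One-at-a-time processor: each signal gets the full capacity,
    but the signals are handled strictly in sequence."""
    per_signal = bits_per_signal / capacity
    return {"max_concurrent": 1,
            "finish_times": [per_signal * (i + 1) for i in range(n_signals)]}

if __name__ == "__main__":
    capacity = 10.0  # hypothetical capacity in bits per second
    # Two signals of 5 bits each, then four signals of 2.5 bits each.
    for bits, n in [(5.0, 2), (2.5, 4)]:
        print("type A:", type_a(capacity, bits, n))
        print("type B:", type_b(capacity, bits, n))
```

With the halved information load, the type A processor handles twice as many signals in parallel within the same time, whereas the type B processor needs half the time per signal but still handles only one signal at a time.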
2.2.2 Single-Channel Theory
The idea that, independently of information content, only one signal at a time can pass through the central bottleneck has been known as 'single-channel theory' (Welford, 1967, 1980). The standard experiment on which this notion has been based is the 'psychological refractory period' (PRP) paradigm in which two stimuli, to which the subject has to react with two different responses, are presented in close succession (for reviews see Kantowitz, 1974; Koch, 1993; Pashler, 1993; Smith, 1967; Welford, 1980). As first reported by Telford (1931), the response to the second stimulus is delayed if the stimulus onset asynchrony (SOA) is short, typically less than 500 ms. The size of the delay has often been found to be complementary to
the SOA, which results in an approximately constant inter-response interval. The standard interpretation of this finding has been that there is some central mechanism that can deal with only one stimulus-response pair at a time, and hence processing of the second stimulus has to be delayed until the first stimulus has left this bottleneck. The recent renewed interest in psychological refractoriness has led to different views of the phenomenon. The research by Pashler and his coworkers (reviewed by Pashler, 1993) suggests two conclusions. First, the PRP is a stable, universal phenomenon that is neither abolished by practice, nor does it disappear when the two responses are made dissimilar. In this sense, the bottleneck does seem to be a general capacity limitation. Second, there are tasks such as visual attention shifts and storing information in visual short-term memory that do not appear to depend on the bottleneck mechanism. Still, they have their own attentional limitations, suggesting that 'divided attention costs do not reflect a single mechanism or capacity' (Pashler, 1989, p. 507). A more radical departure from the classical view of psychological refractoriness has recently been proposed by Koch (1993). Based on a comprehensive re-analysis of data from the literature and his own experiments, Koch has suggested that the PRP results from a combination of several factors. First, there are costs of responding to the two stimuli in the correct serial order. Second, responses to the second of two imperative stimuli are normally less optimally prepared than responses to a single stimulus. Finally, there is specific interference if the two responses involve similar effectors, e.g. the two hands. By eliminating all three factors, Koch (1993) was able practically to abolish the refractory period. 
These developments in theorizing about the refractory period are typical for a general trend in modern theories of attention: the early general metaphors that were intended to capture a single construct 'attention' give way to models that assume a multitude of specific mechanisms (Neumann, 1992). This was the fate of the bottleneck theories, and, as we shall see in the next sections, it has likewise been the fate of the capacity supply theories.
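The standard single-channel interpretation of the refractory period, as described earlier in this section, can be sketched as a simple timing model. The division of each task into a stage before the bottleneck, a central stage and a stage after it, as well as all durations, are assumptions made for illustration only; they are not estimates from the studies cited.

```python
def prp_bottleneck(soa, pre=(100, 100), central=(150, 150), post=(80, 80)):
    """Return (RT1, RT2) in ms under a strict central bottleneck.

    pre, central and post hold hypothetical stage durations (ms)
    for task 1 and task 2. Only the central stage is assumed to be
    limited to one stimulus-response pair at a time.
    """
    rt1 = pre[0] + central[0] + post[0]
    # Time (from S1 onset) at which task 1 leaves the bottleneck.
    bottleneck_free = pre[0] + central[0]
    # Time (from S1 onset) at which S2 is ready to enter the bottleneck.
    s2_ready = soa + pre[1]
    central2_start = max(bottleneck_free, s2_ready)
    # RT2 is measured from the onset of S2, hence the subtraction of soa.
    rt2 = central2_start + central[1] + post[1] - soa
    return rt1, rt2

def inter_response_interval(soa):
    """Time between the first and the second response."""
    rt1, rt2 = prp_bottleneck(soa)
    return soa + rt2 - rt1

if __name__ == "__main__":
    for soa in (0, 50, 100, 150, 300, 500):
        rt1, rt2 = prp_bottleneck(soa)
        print(soa, rt1, rt2, inter_response_interval(soa))
```

Under these assumed durations, the delay of the second response shrinks by exactly the amount by which the SOA grows, so that the inter-response interval stays constant as long as the second stimulus has to wait for the bottleneck. A model of this kind captures only the classical view; it does not represent the additional factors proposed by Koch (1993).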
2.3 Scarce Capacity Supply
The group of limited-capacity theories to be discussed in this section conceive of limited capacity not as a narrow space, but as a scarce supply. The shift from the channel metaphor to the supply metaphor is more than a change in the way in which the abstract idea of a central processing limitation is visualized. It also marks a different conceptual view of attention. A narrow space is located at some definite place. By contrast, a scarce supply can be freely distributed among different recipients. Consequently, the supply metaphor has been preferred by those theorists who have believed that limited central capacity is not confined to one particular type of operation within the processing system, but is a general limitation to essentially all kinds of processing operations. This notion of an almost ubiquitous capacity limitation has been put forward in two main variants. The first has been derived from an analogy to the processing capacity of a computer. The second has been closer to physiology than to computing, and has linked limited capacity to an energetical supply. I first describe these
two variants and then discuss their merits and shortcomings. This is followed by two sections on approaches that have assumed multiple resources instead of a single capacity supply.
2.3.1 The CPU Metaphor and the Energetical View of Capacity
Moray (1967) was probably the first theoretician to state clearly the scarce supply view of limited capacity: '... the brain can divide up its capacity ... and allocate it in different ways according to the task it is set... The Plan, its execution, data storage if needed, transmission line etc., all compete for the available capacity' (Moray, 1967, p. 87f). Moray suggested that all these operations are performed by a device similar to the central processing unit (CPU) of a computer: 'a limited capacity central processor whose organisation can be flexibly altered by internal self-programming' (Moray, 1967, p. 85). Strictly speaking, the CPU metaphor classifies Moray's model as a type B processor in Sanders' (1979) terminology, since the CPU of a von Neumann machine is a serial device that can perform only one processing operation at a time. However, this aspect of the computer analogy has hardly played a role in the 'scarce supply' variants of capacity theory. From the user's perspective, serial computers can share their processing capacity between several concurrent tasks, e.g. text processing can be continued while a document is printed. The fact that, at the level of what happens in the CPU, the computer switches between the two tasks is insignificant for the user. Similarly, the limited central processing capacity supply has usually been understood as something that can be time-shared. In his influential monograph Attention and Effort that appeared a few years later, Kahneman (1973) classified his own and Moray's (1967) approach under the name of 'capacity theories'. This common label seems to have somewhat obscured the fact that, despite their shared view of attention as a limited supply, the two concepts of 'capacity' were quite different. Moray (1967) used the term in its usual meaning, as the capability of the processing system to carry out operations. For him, capacity was a feature of the processing system.
Kahneman (1973) used the term in a different sense. In Kahneman's theory, capacity was an input into the processing system: ' . . . the completion of a mental activity requires two types of input into the corresponding structure: an informational input specific to that structure, and a nonspecific input, which may be variously labeled "effort", "capacity", or "attention"' (Kahneman, 1973, p. 9). It is not easy to find out how Kahneman (1973) defined the relationship between the concepts of attention, capacity and effort. The above citation gives the impression that he regarded the three terms as synonyms. On the other hand, the title Attention and Effort suggests that he thought of attention and effort as two different concepts. Indeed, a careful reading of the text reveals that he used the two terms with different meanings. However, he did not define their difference consistently, nor did he use the term 'capacity' consistently. One conceptualization of the relationship between attention and effort was based on the distinction between the selective and the intensive aspect of attention, which Kahneman (1973) adopted from Berlyne (1969). The selective aspect refers to the fact that attention can be directed to some contents at the expense of others, while the intensive aspect refers to the fact that a person can deploy more or less
attention. Based on this distinction, Kahneman stated that 'the intensive aspect of attention corresponds to effort' (Kahneman, 1973, p. 12). This suggests that Kahneman regarded attention as the more general concept, and effort as a term that designated the intensive, but not the selective, aspect of attention. Other formulations imply, however, that effort was meant to be the basis of selective attention as well as of the intensive aspect of attention: '...selective attention is viewed as the selective allocation of effort to some mental activity in preference to others' (Kahneman, 1973, p. 12). This seems to express a different conceptualization of the relation between attention and effort, viz. that 'effort' refers to the supply that is allocated, while 'attention' is the act of allocating it. This interpretation is, however, refuted by other passages in the book where Kahneman refers to an 'input of attention' (e.g. pp. 9 and 12). It is likewise not quite clear how Kahneman conceptualized the relationship between the concepts of 'attention' and 'capacity'. Sometimes he seems to have used the two terms as synonyms (e.g. pp. 9 and 13). Other passages (e.g. 'a capacity theory of attention', p. 7) suggest that he regarded capacity as a theoretical construct that was intended to explain attention.
2.3.2 Merits and Shortcomings of the Capacity Supply Approach
As mentioned, Moray's (1967) model of attention has usually been interpreted as an instance of a type A processor that allows for time-sharing, despite its usage of the CPU metaphor. The fact that a device that is, strictly speaking, a type B processor can nevertheless be described as a type A processor at the level of the 'user illusion' is important for an adequate understanding of capacity supply theories. They have been typically situated at a macroscopic level of description. If one asks subjects to perform a tracking task while at the same time adding numbers (North, 1977), it is obvious that they can execute these tasks simultaneously in the sense that they do not have first to finish one task before they can start the other. It is quite another matter whether there is real time-sharing at a microscopic level, for example whether a central decision mechanism can make two decisions at exactly the same time. There is evidence to the contrary (Fisher, 1975; Ostry, Moray and Marks, 1976; Pashler, 1993; see also Chapter 4). Nevertheless, the description that subjects share their attention between the tasks remains correct as long as one considers only overall performance measures, such as mean tracking error or the average number of correct additions per time unit. The concern with overall performance rather than detailed mechanisms explains many of the strengths and weaknesses of capacity supply theories. Their main strength has been that they have, at the macroscopic level of analysis, contributed considerably to a refined empiric description of dual-task data. In particular, the performance-operating characteristic (POC) has proved to be a valuable tool for describing how two tasks affect each other in a dual-task situation (Navon and Gopher, 1979; Norman and Bobrow, 1975; see Chapter 4). To obtain a POC, dual-task data are collected under different instructions, emphasizing one or the other task in graded degrees.
Performance in one task is then plotted against performance in the other task for each instruction. The shape of the resulting function provides information about the kind of tradeoff between the tasks. For example, a flat portion of the POC indicates that better performance in one task can be obtained without a performance loss in the other task, which may indicate either
that the two tasks draw on separate resources (see below), or that one of them is in the data-limited region, where performance cannot be improved even if the task is allotted more capacity. The distinction between data limitation and resource limitation (Norman and Bobrow, 1975) is another example of the success of this kind of analysis at the descriptive level. It refers to the fact that performance can be less than perfect either because the task is not allotted a sufficient amount of capacity, or because of the insufficient quality of the data. In the first case, performance can be improved by devoting more attention to the task, while in the latter case it cannot. It is less certain that this approach has added much to a theoretical understanding of limited capacity. In particular, the hope to discover quantitative relationships between invested capacity or resources and measured performance (performance-resource functions), and thus predict performance from resources, has proven futile (Navon, 1984). As has been pointed out by Neumann (1995), the fundamental problem is that, unlike economic resources in microeconomic theory, which are observables that can be directly measured, the hypothetical resources of capacity supply theories can be determined only indirectly via performance measures. This means that the intervening variable (resources) that is intended to predict a dependent variable (performance) is anchored only in this dependent variable, rendering their relationship circular. Performance-resource functions can therefore not be determined empirically; their shape is a matter of definition, not of discovery. In principle, this difficulty could be overcome by using converging operations, i.e. by anchoring capacity and resources in variables other than performance. Kahneman (1973) intended this when he suggested measuring effort by means of psychophysiological measures such as pupil dilatation.
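How a performance-operating characteristic with a flat, data-limited portion can arise may be sketched with a toy model. Everything in it is stipulated for illustration: the linear performance-resource function, the ceiling values and the fixed total supply are assumptions, not empirical quantities. This is in keeping with the point just made, namely that the shape of such a function is a matter of definition.

```python
def performance(resources, data_limit, gain=1):
    """Hypothetical performance-resource function: performance grows
    linearly with allotted resources until it reaches a ceiling set
    by the quality of the data (the data-limited region)."""
    return min(gain * resources, data_limit)

def poc(data_limit_a, data_limit_b, total=10, step=2):
    """Return POC points (performance A, performance B) obtained by
    shifting a fixed, hypothetical supply from task B to task A."""
    points = []
    for share_a in range(0, total + 1, step):
        share_b = total - share_a
        points.append((performance(share_a, data_limit_a),
                       performance(share_b, data_limit_b)))
    return points

if __name__ == "__main__":
    # Task A becomes data-limited at 4 units; task B never does here.
    for point in poc(data_limit_a=4, data_limit_b=10):
        print(point)
```

In this example, once task A has reached its ceiling, withdrawing resources from it improves task B without any loss in task A, which produces the flat portion of the POC described above.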
However, Kahneman (1973) also noted that it is difficult to separate the physiological effects of effort from those of stress. As Sanders (1979) has pointed out ' . . . the relation between a psychophysiological measure and capacity assumptions will be very hard to prove. A simple relation between the physiological measure and a task variable is clearly not enough to qualify the measure as one for mental load...' (p. 70). The problem with the concept of effort is not only one of measurement. Successfully anchoring this construct in psychophysiological measures presupposes that there is indeed a general, undifferentiated capacity supply. Empirically, this postulate has been based on anecdotal evidence (e.g. the example of the driver who interrupts a conversation to make a turn; Kahneman, 1973, p. 9) and on experimental data showing that even dissimilar activities such as card sorting and generating random numbers (Baddeley, 1966), or comparing letters and responding with a button press to a tone ('probe signal task'; Posner and Boies, 1971), exhibited interference. However, it is fairly obvious that observations like these do not provide strong evidence in favor of the assumption that there is only one, undifferentiated capacity supply. First, even seemingly dissimilar tasks may well load the same specific process or mechanism. For example, the letter matching and probe signal tasks of Posner and Boies (1971) had in common that subjects responded by pressing a button. A change in the response mode of one of the tasks has been shown to affect strongly the degree of interference (McLeod, 1977). Furthermore, interference between dissimilar tasks proves at best that a competition for unspecific capacity is one source of interference. To demonstrate that it is the only source of interference, one has to compare dual-task interference between two
different primary tasks and two different secondary tasks across all four possible task combinations (Heuer, 1985; Neumann, 1978, 1981). However, this kind of test has rarely been attempted by proponents of undifferentiated capacity theory. Where such data are available (Brooks, 1968; Neumann, 1981; Sanders, 1979; Wickens, 1980) they have usually not supported this theory. It seems, then, that the belief in an undifferentiated capacity supply was motivated more by a theoretical conviction than by empiric facts. In Moray's (1967) variant, this conviction seems to have been based mainly on the attractiveness of the computer CPU metaphor, which led him to postulate 'universal neurons' in the brain, similar to the universal computational power of a CPU. As we have seen, Kahneman's (1973) variant had a different background: the notion of a mental, or psychophysiological, energy in the style of Freud's concept of cathexis. Indeed, Kahneman had been influenced by David A. Rapaport's psychoanalytic thinking and, as he wrote in the preface to Attention and Effort, '... my understanding of attention bears the permanent imprint of that encounter' (Kahneman, 1973, p. V). Freud's energetical (or, as he called it, 'economical') theorizing had been one instance of a theoretical view that enjoyed much prominence at the turn of the century. The idea that attention is based on a limited supply of mental or psychophysiological energy was proposed by many theorists of the time, such as McDougall (1902, 1903, 1906), Spiller (1901) and Erdmann (1920) (see also Chapter 7). Thus, Kahneman (1973) took up an old tradition. However, these energetic notions had always met with criticism, the main objections against the older versions being that they were physiologically unfounded, and that they did not have any explanatory value (Dürr, 1907; Ziehen, 1920; see Neumann, 1995, ch. 2).
The same kinds of criticism have been directed by present-day authors against Kahneman's concept of undifferentiated capacity (Allport, 1980; Neumann, 1978, 1995). There is the risk that the empiric observations about the phenomena of attention (it is limited, can be shared between different tasks, can be allotted voluntarily and involuntarily, etc.) are simply reworded as theoretical statements about the construct of capacity, suggesting an explanation where there is actually only a redescription. This danger of a pseudoexplanation is certainly not restricted to energetical theories (homunculus theories are another example; see Allport, 1980), but energetical theories are especially susceptible to it as long as they are not based on independent measures of energy. One of the most important developments since Kahneman's (1973) book has been that such measures are nowadays widely in usage, both in the form of independent empiric variables (e.g. drugs, sleep loss) and of dependent physiological measures such as heart rate and the galvanic skin response (see Chapter 7). This modern version of the energetics of attention has clearly demonstrated the importance of not ignoring its intensive aspects, but it has also accumulated ample evidence that there is no single, undifferentiated 'attentional energy', thus supporting the type of theory to which we now turn.
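The logic of the four-combination test mentioned earlier in this section can be sketched as follows. The sketch rests on one possible reading of that test, which is an assumption here: if interference stemmed only from a single undifferentiated supply, the secondary task that interferes more with one primary task should also interfere more with the other, so that a reversal of this ordering points to specific interference. The task labels and interference values are invented for illustration.

```python
def ordering_reverses(interference, primaries, secondaries):
    """Check whether the ordering of two secondary tasks, in terms of
    the interference they produce, reverses between two primary tasks.

    interference maps (primary, secondary) to a dual-task decrement
    in arbitrary, hypothetical units.
    """
    p1, p2 = primaries
    s1, s2 = secondaries
    diff_p1 = interference[(p1, s1)] - interference[(p1, s2)]
    diff_p2 = interference[(p2, s1)] - interference[(p2, s2)]
    # Opposite signs mean that s1 interferes more with one primary
    # task and s2 interferes more with the other.
    return diff_p1 * diff_p2 < 0

if __name__ == "__main__":
    # Invented pattern in which the ordering reverses.
    specific = {("verbal", "vocal"): 30, ("verbal", "manual"): 10,
                ("spatial", "vocal"): 10, ("spatial", "manual"): 30}
    # Invented pattern in which one secondary task is always harder.
    unspecific = {("verbal", "vocal"): 30, ("verbal", "manual"): 10,
                  ("spatial", "vocal"): 20, ("spatial", "manual"): 5}
    tasks = (("verbal", "spatial"), ("vocal", "manual"))
    print(ordering_reverses(specific, *tasks))
    print(ordering_reverses(unspecific, *tasks))
```

A reversal of this kind cannot be produced by competition for a single supply alone, whereas the absence of a reversal is compatible with both undifferentiated and specific sources of interference.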
2.3.3
Multiple Resource Theory
The amount of dual-task interference depends not only on task difficulty, but also on the structure of the competing tasks. Wickens (1980) has gathered many empiric examples, and grouped them into two kinds of effects. Structural alteration effects are
said to exist when a structural change, without a change in task difficulty, affects the degree of interference. For example, interference has been shown to be smaller when task modalities (e.g. visual or auditory, vocal or manual) are different than when they are similar (e.g. Allport et al., 1972; Brooks, 1968; Neumann, 1981; Treisman and Davies, 1973). Difficulty insensitivity refers to the case that a change in the difficulty of one task does not affect performance in a second task (Kantowitz and Knight, 1976; North, 1977).
Initially, there were attempts to reconcile such findings of specific interference with the notion of undifferentiated capacity. Kahneman (1973) suggested that, besides capacity interference, which 'arises because of the attentional demands of the competing activities', there is structural interference which 'occurs because the activities occupy the same mechanisms of perception or response' (Kahneman, 1973, p. 196). This saved the idea of unspecific capacity, but at the price of excluding conflicting results (namely, all cases of specific interference) from the range of explanation of the theory. This attempt at a solution lost its attractiveness as examples of specific interference accumulated in the 1970s (reviewed by Wickens, 1980).
Another way out that seemingly saved the idea of a single capacity was suggested by Kantowitz and Knight (1976). They were concerned with empiric examples of difficulty insensitivity (e.g. varying the difficulty of a verbal task does not affect performance in a concurrent tapping task). To explain these observations, Kantowitz and Knight (1976) capitalized on a particular aspect of the capacity supply metaphor: If capacity is similar to a fluid, it needs a receptacle that contains it. Kantowitz and Knight's (1976) suggestion was that there is more than one such reservoir, and that not all reservoirs can exchange capacity freely.
Insensitivity to difficulty occurs if two tasks draw in part on reservoirs that cannot exchange capacity. In hindsight, it is obvious that this was not very different from postulating separate, specific resources instead of a single, undifferentiated capacity.
The concept of multiple processors or multiple resources imposed itself as the consequence of the failure of undifferentiated capacity theories. These had predicted that dual-task interference did not depend on the structure of the competing tasks. When it became clear that it did, the natural conclusion seemed to be that capacity is not undifferentiated. At the beginning this was more a negative conclusion than a positively formulated new theory. Thus, when Allport et al. (1972) first suggested the idea of multiple processors, their main intent was (as the title of the paper says) a 'disproof of the single channel hypothesis'. When Sanders (1979) put forward the idea of a type C processor (a processor consisting of several specialized subprocessors), he similarly pointed out that 'basically, processor C suggests a complete reconsideration of the capacity concept' (Sanders, 1979, p. 55). When Wickens (1980) analyzed the available literature on specific interference, he struck the final blow against the notion of general, undifferentiated capacity by showing that specific interference was ubiquitous. All three contributions were convincing in their negative message: dual-task interference depends on the structure of the competing tasks, not simply on their difficulty.
However, it proved much harder to arrive at positive conclusions about specific resources. Basically, there have been two types of attempts toward an inventory of resources. One has been to examine systematically all known cases of structural alteration and difficulty insensitivity effects, and find out whether there is a recurrent pattern in them.
One heroic effort of this kind was Wickens' (1980, 1984) analysis of the dual-task literature, which will be described in this section. A second
approach has been to go beyond the dual-task methodology and analyze how different types of independent variables that may be expected to affect resources (e.g. knowledge of results, sleep loss, various psychopharmaca) influence different stages of information processing, as assessed by Sternberg's (1969) additive factors method. The most notable attempt of this kind has been Sanders' (1983) cognitive-energetic model of information processing, which we shall consider in the next section.
The first of these methodological approaches was suggested more than half a century ago by Bornemann (1942). However, the first author who tried to put it to work by systematically analyzing the available dual-task literature seems to have been Wickens (1980). His strategy was to postulate three candidates for resources (processing stages, modalities and hemispheres of the brain) and to scrutinize the literature for structural alteration effects and examples of difficulty insensitivity that support the existence of these resources. His article has usually been cited as evidence for all three kinds of resources. Actually, the pattern of data was complex and ambiguous. Structural alteration effects were present in only about 75% of the experiments in which they should have emerged according to the resource model. The data on difficulty insensitivity were even more equivocal. There were numerous cases of difficulty insensitivity, but many of them did not conform to the three-types-of-resources scheme. For example, tracking tasks did not reveal modality-specific difficulty insensitivity (Wickens, 1980, p. 246). On the other hand, there was evidence that tracking and memory tasks draw on some common resource, a resource that does not seem to have a place in the three-types-of-resources scheme. Furthermore, tracking difficulty seemed to affect all processing stages, suggesting, as one possibility, that 'resources were undifferentiated across stages' (Wickens, 1980, p. 247).
Wickens (1980) did not deny these difficulties and discussed them explicitly. Nevertheless, his paper has usually been interpreted as supporting the three-types-of-resources model. One reason may have been that the article contained two different messages that could be easily confounded. Its negative point was that, contrary to undifferentiated capacity theories, dual-task interference is task-specific in a large proportion of the relevant experiments. In this respect, Wickens' survey was convincing, and in this sense it strongly supported a multiple resources theory. The second, positive message was that task-specific interference is based on a small number of resources that can be organized in a simple scheme. With respect to this conclusion the survey was ambiguous, and as far as 'resource theory' refers to this scheme, the survey supported it only weakly.
In a second paper that appeared a few years later, Wickens (1984) was mainly concerned with the conceptual clarification of his resource model. The discussion in this paper was very thoughtful and went through three reformulations of the model. In a first variant, Wickens put forward a dimensional scheme with the dimensions 'stages', 'modalities' and 'codes'. The result was 'a set of independent, nonoverlapping reservoirs each defined by a combination of levels on the three dichotomous dimensions' (Wickens, 1984, p. 87). This was a major reformulation of the 1980 model, since the term resource now referred not to a single processing characteristic such as auditory processing, but to a conjunction of processing characteristics that defined a cell of the scheme, e.g. auditory/verbal/encoding-central processing (the latter two processes were regarded as a common level on the dimension 'processing stages'). This revision had the advantage of providing a
coherent, well-organized scheme of resources. Its disadvantage was that it predicted no interference between tasks that belonged into different cells of the scheme, which was empirically untenable. As Wickens pointed out, specific interference can be found between tasks that share only one of these processing characteristics, even if they differ on the other two dimensions. This cannot be explained by the dimensional model. Wickens (1984) therefore considered a second, hierarchical model of nested resources. One variant that he explored had auditory and visual resources at the lowest level of the hierarchy, more general verbal and spatial resources available to both modalities at the next level, still more general resources of perceptual/central and response processing at the subsequent level, and finally 'a pool of "undifferentiated resources" that is available to and demanded by all tasks, modalities, codes, and stages' at the highest level (Wickens, 1984, p. 88). Thus, the model incorporated unspecific interference as a subclass of specific interference. Like the dimensional model, it did not, however, come to grips with the empiric data. The problem was that the hierarchical model implies an asymmetry of interference: if two tasks demand different resources at a given level, the model permits that they draw on the same resource at a higher, but not at a lower, level. This is because different resources cannot share the same resource at a lower level in the hierarchy. However, such a pattern of interference has often been found, e.g. input modality specific interference when one task is verbal and the other is spatial. Therefore, this model also had to be abandoned. The final possibility that Wickens (1984) considered can hardly be called a model. 
He concluded from the failure of the dimensional and the hierarchical models that it 'is probable ... that some aspect of a "shared feature" model must be employed to predict performance, a feature being defined as a level along each of the dichotomous dimensions of the resource space' (Wickens, 1984, p. 88). He goes on to suggest that, according to this scheme, interference will be the stronger, the more features the tasks have in common. This is obviously hardly more than an empiric generalization.
To summarize, a careful reading of Wickens' (1980, 1984) contributions shows that they express the problems of the multiple resources approach at least as clearly as its assets. The empiric pattern of results does not seem to lend itself to a simple scheme of resources. We next turn to an approach that also used the term resources, but in a different sense and based on a different kind of data.
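The contrast between the dimensional scheme and the 'shared feature' generalization can be made concrete with a minimal sketch. It assumes that each task is coded on the three dichotomous dimensions (stage, modality, code); the four tasks and their codings are hypothetical illustrations, not data from Wickens' survey, and the function names are introduced here only for this example.

```python
from dataclasses import dataclass

# Dimension names follow the 1984 scheme described in the text; the task
# codings further down are hypothetical and serve only as an illustration.
DIMENSIONS = ("stage", "modality", "code")


@dataclass(frozen=True)
class Task:
    name: str
    stage: str      # "perceptual/central" or "response"
    modality: str   # "auditory" or "visual"
    code: str       # "verbal" or "spatial"


def shared_features(a: Task, b: Task) -> int:
    """Count the dimensions on which two tasks occupy the same level."""
    return sum(getattr(a, d) == getattr(b, d) for d in DIMENSIONS)


def same_cell(a: Task, b: Task) -> bool:
    """Dimensional model: interference is predicted only within one cell."""
    return shared_features(a, b) == len(DIMENSIONS)


listening = Task("word monitoring", "perceptual/central", "auditory", "verbal")
reading = Task("reading", "perceptual/central", "visual", "verbal")
tone_location = Task("tone localization", "perceptual/central", "auditory", "spatial")
tracking = Task("manual tracking", "response", "visual", "spatial")

pairs = [(listening, reading), (reading, tone_location), (listening, tracking)]
for a, b in pairs:
    # Shared-feature count (ordinal prediction) versus the all-or-none
    # prediction of the dimensional model.
    print(f"{a.name} + {b.name}: shared={shared_features(a, b)}, "
          f"same cell={same_cell(a, b)}")
```

In this sketch the dimensional model predicts no interference for any of the three pairs, because none of them shares a cell, whereas the shared-feature count orders them from two common features down to none. The count is only an ordinal statement, which is why the text calls the scheme hardly more than an empiric generalization.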
2.3.4
Stages and Resources
Despite its shortcomings as a general theory of attention, Kahneman's (1973) monograph had the merit of stressing the fact that attentional phenomena have not only psychological but also physiological aspects that should be incorporated into an adequate theory. This psychophysiological approach was by no means new. It had been the standard approach in classical theorizing about attention (Henning, 1925; Ribot, 1906; Wundt, 1903; Ziehen, 1920; cf. Chapter 7). Though psychophysiological investigations of attentional phenomena had never disappeared, they did not play a prominent role in the majority of modern theoretical approaches to attention, which were dominated by metaphors from communication engineering. As a consequence, there was a structural (mainstream experimental psychology)
and an energetical (psychophysiological) research tradition, with little exchange between them. Kahneman (1973) tried to bridge the gap by using his energetical approach to interpret data that had been collected within the structural tradition. But, apart from the statement that processing structures require a capacity input in addition to informational input, he had little to say about how structural and energetical factors interact in the processing system. The credit for having first systematically pursued this problem belongs to A. F. Sanders, who has put forward a model that was, as its name 'cognitive-energetical model' suggests, explicitly aimed at an integration of the two research traditions (Sanders, 1983; see also Gopher and Sanders, 1984, and Chapter 7).
On the structural side, Sanders (1983) proposed a linear stage model with four stages (stimulus preprocessing, feature extraction, response choice and motor adjustment). This aspect of the model was mainly based on results from RT research with the additive factors logic, which Sanders (1980) had analyzed in a preceding paper. On the energetical side, Sanders proposed 'three types of energetical supply or resources. In line with the notion of multiple resources, the processes involved in different stages draw upon different energetical resources' (Sanders, 1983, p. 74f). With the exception of the stimulus-preprocessing stage, each of the stages is associated with its own resource type, according to the model. The resource that belongs to the feature extraction stage is arousal; response adjustment is associated with activation, and effort is the resource that is demanded by the response choice stage. Besides supplying this stage, effort serves to control and coordinate arousal and activation. Sanders (1983) based this model on a wealth of data, in part from psychophysiology, but mainly from RT research in which the effects of so-called state variables (e.g.
sleep loss, knowledge of results and the application of psychopharmaca) were investigated. The obtained patterns of interaction with structural variables such as signal quality and stimulus-response (S-R) compatibility provided the main empiric basis of the model. One example is Sanders' interpretation of the effects of sleep loss on RT. It interacts with signal quality, impairing responses to degraded stimuli more than responses to intact stimuli. This suggests that sleep loss affects a resource that is required for feature extraction (arousal). Sleep loss further interacts with foreperiod variability, which is known to affect the response stage, suggesting that there is an effect of sleep loss on another resource, specific to motor adjustment (activation). Finally, there is no interaction between sleep loss and S-R compatibility, implying that central decision processes draw on a resource (effort) that is not affected by sleep loss. In a similar way, Sanders interpreted the effects of other variables, e.g. the finding that amphetamines and barbiturates interact with different structural variables. For an excellent review of further research on the relationship between structural and energetical aspects, see Chapter 7.
To understand what distinguishes this resource concept from that of Wickens (1980, 1984), we have to note several facts. First, Sanders' resources were intended to explain the intensive, and only the intensive, aspect of attention, whereas the resource systems as discussed by Wickens were designed to explain all kinds of attentional phenomena. Second, Wickens' resources (at least those of the 1984 version) were purely psychological constructs, whereas Sanders (1983) adopted the names of his resources from Pribram and McGuinness' (1975) physiological theory of attention and pointed out the similarities between the two models. Third, and most important, Wickens' resources were derived from the dependent measures that they were intended to explain.
By contrast, Sanders' (1983) resources were
anchored in independent variables and were used to explain dependent measures (RT data). Hence, Sanders' (1983) notion of resources does not suffer from the major shortcomings of the resource concept of which Wickens (1980, 1984) has been the most prominent representative. They are intervening variables that relate independent variables to dependent variables, and do not therefore suffer from the danger of circularity. Further, they integrate behavioral and physiological research and are operationalized by converging operations. Third, by restricting himself to the intensive aspect of attention, Sanders (1983) does not fall into the trap of explaining all cases of attentional selectivity as the allocation of resources to the attended contents at the expense of the nonattended contents, i.e. of explaining the allocation of attention by the allocation of attention. In short, Sanders' (1983) cognitive-energetical model was so different from resource theorizing in the style of Wickens (1980, 1984) that it was perhaps unfortunate that the two approaches went under the same name. Sanders' (1983) theory was a theory of regulatory psychophysiological mechanisms and of how they are related to information processing mechanisms, not an attempt to account for everything that has been investigated under the heading of 'attention'.
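The additive factors reasoning behind the sleep loss example in this section can be illustrated with a minimal sketch. It assumes a 2 x 2 design with one state variable and one structural variable and looks only at the pattern of cell means; all RT values are hypothetical, and the function name is introduced here for illustration. An actual analysis would test the interaction statistically (e.g. by analysis of variance) rather than inspect means.

```python
def interaction_contrast(rt):
    """Return the 2 x 2 interaction contrast in ms.

    `rt` maps (state_level, structural_level) to a mean RT, with level 0 as
    the baseline (e.g. normal sleep, intact signal) and level 1 as the
    manipulated condition (e.g. sleep loss, degraded signal).
    """
    effect_under_state = rt[(1, 1)] - rt[(1, 0)]
    effect_at_baseline = rt[(0, 1)] - rt[(0, 0)]
    return effect_under_state - effect_at_baseline


# Hypothetical mean RTs (ms), chosen only to mimic the pattern in the text.
sleep_by_signal_quality = {(0, 0): 400, (0, 1): 450, (1, 0): 420, (1, 1): 520}
sleep_by_sr_compatibility = {(0, 0): 400, (0, 1): 460, (1, 0): 420, (1, 1): 480}

# Nonzero contrast: the two variables interact, read as sleep loss affecting
# the resource that supplies the stage influenced by signal quality.
print(interaction_contrast(sleep_by_signal_quality))    # 50

# Zero contrast: the effects are additive, read as sleep loss leaving the
# resource of the response choice stage unaffected.
print(interaction_contrast(sleep_by_sr_compatibility))  # 0
```

The sketch shows why resources in this sense are anchored in independent variables: the inference runs from a manipulated state variable, through the presence or absence of an interaction with a stage-specific structural variable, to the resource that the stage is assumed to draw on.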
2.4
Mechanisms of Interference
Despite their insistence on the multiplicity of resources, resource theories such as those of Navon and Gopher (1979) and Wickens (1980, 1984) were unitary theories in the sense that they postulated a single functional cause of all attentional phenomena, namely, an insufficient supply of something that is required for processing. In this respect, they did not differ from previous capacity theories. This common conviction of all dominant theories between the 1950s and the 1980s had two implications, one for the theoretical view of interference, and one for the conceptualization of the relationship between interference and selection. As to interference, it was deemed to be a direct consequence of the limitation. If there is not enough capacity (transmission capacity, effort, computational power, etc.) for perfect performance, then the quality of the output will suffer, resulting in interference. This was such a simple and powerful idea that alternative causes of interference seem to have been substantially overlooked by most theorists. However, it is fairly obvious that there are factors that can potentially produce interference without being due to a scarcity of resources. For example, the simultaneous performance of two tasks may induce problems of coordination and of preventing crosstalk. Though these factors were sometimes mentioned by resource theorists (one example is Navon and Gopher's (1979) concept of concurrence costs), they were generally not considered as a theoretically relevant aspect of limited capacity. The first authors to point out the potential importance of these types of interference for an understanding of limited capacity were Neisser (1976) and Allport (1980). This approach will be the topic of the first part of this section. Regarding the relationship between selection and interference, there was likewise a broad consensus from the 1950s to the 1980s. 
Since Broadbent's (1958) filter theory, attentional selectivity was viewed as a secondary consequence of limited capacity: selection is required to come to grips with capacity limitations. It was largely overlooked that even an organism whose brain enjoys an (for all practical
purposes) unlimited capacity would have to select between alternative actions, and between alternative stimuli that could control these actions. According to the view of capacity that will be discussed in the second part of this section (Allport, 1987, 1989; Neumann, 1983, 1987, 1992; van der Heijden, 1990, 1992), this means that unwanted actions have to be suppressed, and unwanted stimuli have to be prevented from gaining access to the control of behavior, causing a type of interference that is an achievement rather than a shortcoming.
2.4.1
Noncapacity Sources of Interference
In his thoughtful and provocative critical analysis of the information processing approach, Neisser (1976) presented a radical alternative to capacity theories. He began his discussion of the capacity problem by pointing out that no one 'has ever demonstrated that the facts of selective attention have any relation to the brain's real capacity, if it has any' (Neisser, 1976, p. 98). On the other hand, it cannot be denied that people may have problems doing several things at the same time. What Neisser (1976) set out to show was that these difficulties can be explained as resulting from several specific problems that have nothing in common with the conventional concepts of capacity or resources. Neisser considered the following types of problems. First, there is the possibility of peripheral interference. Two actions may be physically incompatible, and two sensory stimuli may mask each other. Second, there can be problems of coordination. For example, two actions may be hard to combine because they require incompatible timings and postures. To perform them together, they have to be reorganized, which requires learning. A lack of practice is also invoked by Neisser to explain cases where the simultaneous performance of two actions (such as car driving and leading a conversation) is interrupted when an emergency arises. He argues that this type of situation is usually not well practiced. A further type of interference is attributed to problems that arise when the same perceptual schema has to be used for two incompatible purposes, as, for example, in observing one spatial arrangement and at the same time imagining another (Brooks, 1968). Finally, Neisser assumes that there is probably 'some genuine informational impediment to the parallel development of independent but similar schemata' (Neisser, 1976, p. 102f). The extreme difficulty of attending to two similar events in the same modality, e.g.
listening to two dichotic messages, may, Neisser suggests, be due to a danger of crosstalk: 'If each schema involves anticipations that span appreciable amounts of time ... the problem of applying new information to the correct schema may be insuperable' (Neisser, 1976, p. 103).
Thus, Neisser (1976) suggested a radically new perspective on limited capacity. He proposed a variety of perceptual and motor control difficulties that result in the empiric phenomenon of limited capacity. Basically, they fall into two categories. First, if two actions have to be performed simultaneously, they are not simply added. They have to be coordinated, which may be difficult. Therefore, dual-task performance cannot be expected to be identical to single-task performance. Second, if two events have to be perceived simultaneously, there may be problems in keeping them apart and using the correct perceptual schemata to interpret them. Therefore, they cannot be expected to be perceived as readily as single events.
With this view of the capacity problem, Neisser (1976) departed from the main assumptions that had been shared by all strands of capacity theorizing. He attributed interference between two activities not to a competition for something that they both need and that is scarce, but to specific problems that arise because performing one action changes the conditions under which the other is executed. These difficulties may arise in different areas, from perception to peripheral motor coordination. Consequently, 'attention' does not refer to a particular mechanism or type of mechanism in the brain: 'So far ... no separate mechanisms of attention have been found. In my opinion, that is because none exist' (Neisser, 1976, p. 80). These were provocative theses by an author who, with his monograph Cognitive Psychology (Neisser, 1967), had been one of the theoretical founders of information processing psychology a decade earlier. Nonetheless, their impact was surprisingly weak. There may have been several reasons for this (e.g. the generally cool reception of the book). In hindsight, one obvious reason was that Neisser (1976) was simply ahead of his time. Subsequently, similar ideas were put forward with gradually increasing acceptance by Allport (1980), Navon (1985; Navon and Miller, 1987) and Hirst and Kalmar (1987), among others.
Allport (1980) based his argument on a comprehensive analysis of the empiric findings on dual-task interference that were available at the time, and on considerations about the functional organization of processing in the brain. At the former, empiric level, he showed that there are tasks in which there is little or no interference (e.g. simultaneous monitoring tasks and concurrent skilled transcription tasks). At the latter, theoretical level, he anticipated the turn that theoretical psychology took in the 1980s by pointing out the importance of 'distributed, content-specific, parallel processing in the nervous system' (Allport, 1980, p. 124).
Taken together, both aspects suggest that the limitations of attention cannot be simply explained by an insufficient computational power of the brain. Thus, 'the question ... arises: Why should there be "attentional" limitations at all?' (p. 125). Similar to Neisser (1976), Allport's answer was that the simultaneous execution of processing operations is limited by various specific difficulties. His list included three kinds of such difficulties: function-specific limitations, data-specific limitations and limitations in keeping different goals active. In discussing function-specific limitations, Allport first points out that actions within the same category may be physically incompatible; we cannot, for example, execute two eye movements into different directions at the same time. Mechanisms of inhibition could play an important role in establishing dominance in this case, and likewise in the case where the competing actions are physically compatible, such as similar categories of action with the left and right hands. Finally, Allport extends this principle to 'competition for the same, function-specific subsystem' (p. 145), such as visual analyzers or an auditory-verbal short-term memory. Data-specific limitations arise, according to Allport (1980), when 'inputs or the intermediate products of processing proper to one task are also eliciting conditions for the domain of action required by the other' (p. 145). One classical example is dichotic listening. The words in the irrelevant channel have to be prevented from controlling the vocal shadowing response, which 'apparently has the effect of also decoupling that information from the control of other voluntary responses' (p. 145). Third, Allport suggests that there are limitations if several action goals have to be kept active simultaneously, a variety of divided attention of which 'we know
singularly little' (p. 147), and which is, for example, responsible for concurrence costs if two tasks have to be prepared simultaneously.
In retrospect, it is apparent that these ideas were, like those of Neisser (1976) and similar suggestions by Keele and Neill (1978), an enormous step forward towards an analysis of specific mechanisms of attention. Being a first step, they could, of course, not solve all problems. They were mainly concerned with processing problems and had comparatively little to say about how the system solves these problems. They convincingly analyzed peripheral causes of interference, but then went on to generalize these causes to central mechanisms, without making clear why this can be done, if there are no central capacity limitations. In the next section, I discuss an approach that presents an attempt to overcome these shortcomings. Its basic assumption is that there are indeed central processing limitations, but that these are produced by functionally useful mechanisms, rather than being based on deficiencies of the processing system.
2.4.2
Limited Capacity as a Consequence of Selection
The idea that the need for selection is a consequence of capacity (resource) limitations seems to have been so plausible that it has long been regarded as trivially true. In filter theory, the P system needed a filter to protect it, because its capacity was limited. In resource theorizing, resources had to be selectively allocated, because they were scarce. The implicit assumption was that no attentional selectivity would be required if the central channel were sufficiently large, or if there were an unlimited supply of resources. Occasionally, this tacit assumption was stated explicitly. For example, Broadbent wrote in the 1971 revision and extension of filter theory: 'If there were really sufficient machinery available in the brain to perform such an analysis [recognition, O.N.] for every stimulus, ... it is difficult to see why any selection at all should occur. The obvious utility of a selection system is to produce an economy in mechanism' (Broadbent, 1971, p. 147).
While this view of the relationship between selection and interference has been commonplace since the 1950s, several earlier theories of attention had been based on a different perspective. Theorists such as Ribot (1906) and Henning (1925) proposed that the attentional selection of stimuli serves a vital function in the control of behavior by determining which stimuli gain access to motor control. When Deutsch and Deutsch (1963) suggested that attentional selection does not decide which stimuli are perceptually analyzed, but which stimuli gain access to further processes, such as controlling behavior, they referred to a similar possibility. (However, their view, which was rooted in behavior theory, has as a rule been interpreted in the light of capacity theorizing and has regularly been cited as an example of a 'late selection' theory, which has usually been understood as differing from filter theory only with respect to the location of the bottleneck.)
More recently, this action-oriented view of attentional selectivity has been taken up by several theorists (Allport, 1987, 1989, 1993; Neumann, 1978, 1983, 1985, 1987, 1990, 1992, 1995; van der Heijden, 1990, 1992). Since these approaches are very much alike, they will be discussed together. Their starting point is similar to that
of Neisser (1976) and Allport (1980): based on a criticism of capacity theorizing, they analyze problems of coordination and control. The main difference to these earlier theories is that these problems are not regarded as the immediate causes of interference. Rather, it is assumed that organisms possess specific mechanisms for coping with these problems, given the functional characteristics of the central nervous system (modularity, massively parallel processing, etc.). These mechanisms produce selectivity, and limited capacity is a byproduct of selectivity. Thus, this approach reverses the standard view of how limited capacity and selection are functionally related: selection is the basic phenomenon, and limited capacity is its functional consequence. Interference is based on peripheral control problems in the sense that mechanisms that cause it are in the service of solving these problems, not in the sense that these problems themselves produce interference. Within this common framework, Allport's (1987, 1989) and van der Heijden's (1990, 1992) analyses focused on visual attention, while Neumann (1983, 1985, 1987, 1992, 1995) tried to classify the whole range of phenomena that have been studied under the label of attention. He started with a distinction between two kinds of control problems: effector recruitment and parameter specification. The problem of effector recruitment arises because the same effectors can be recruited for different actions, but have usually to be dedicated to one action at a time to prevent a behavioral chaos. The problem of parameter specification refers to the fact that the same intended action can be executed in many different ways, but only one specific mode of execution can be carried out at a given time. According to Neumann (1987, 1992), the mechanisms of attention have evolved to cope with these problems. He proposed five such mechanisms. First, there is what biologists have called behavioral inhibition. 
Alternative action tendencies inhibit each other, with the result that normally one action will become dominant and be the only one that gains access to the effector system (cf. Shallice, 1972, 1978). This is the way in which the effector recruitment problem is generally solved by animals; they switch between actions instead of performing dual tasks. Humans can perform dual tasks because of a second mechanism, action planning and action control by an action plan that coordinates simultaneous actions. This was assumed to be mainly a function of the prefrontal cortex (cf. Kuhl, 1995; Norman and Shallice, 1986; Posner and Petersen, 1990). Dual tasks are, however, subject to a limitation that results from a way in which the parameter specification problem is solved: due to specific inhibition, the same skill cannot be used for alternative purposes at the same time (skill-based interference as a third attentional mechanism). One manner of overcoming this limitation is through practice, which renders skills more specific and thereby reduces skill-based interference (cf. Heuer, 1984). A second way of solving the parameter specification problem, and a fourth mechanism of attention, is the selection of the stimuli that control an action (selection-for-action; cf. Allport, 1987, 1989). In vision, spatial selection is the main means to achieve this. One of the major brain structures involved in selection-for-action was assumed to be the posterior parietal cortex (cf. Posner and Petersen, 1990; see Chapter 9). Fifth, Neumann argued that organisms have to find an equilibrium between protecting an ongoing action against interruption, and the need to orient towards new, important stimuli. This was attributed to the mechanisms of arousal, activation and effort as described by Pribram and McGuinness
(1975) and Sanders (1983) (cf. also Chapter 5). Allport (1987, 1989) and van der Heijden (1990, 1992; cf. Chapter 1) focused on the mechanisms of selection-for-action in vision. Similar to Neumann, Allport (1989) started from the basic assumption that 'attentional functions have evolved to satisfy a range of positive, biological ("computational") purposes' (Allport, 1989, p. 648). In visually controlled actions, there are usually several objects that could control the action. 'Consequently some selective process is necessary to map just those aspects of the visual array, specific to the target object, selectively onto the appropriate control parameters of the action' (Allport, 1989, p. 648). This problem has to be solved by a system in which 'all cognitive processes have to be realized by means of massively parallel computation, in distributed neuronal networks' (Allport, 1989, p. 654). Based on a comprehensive analysis of behavioral and physiological data, Allport concluded that this is performed by 'mechanisms of competitive priority assignment ..., implemented through the selective modulation, potentiation, tuning, output inhibition, and so on in specific coding pathways' (Allport, 1989, p. 654). One example of such a mechanism has been worked out in detail by van der Heijden (1992). His model is not a model of 'attention' as such, but of the visual system and the place of attentional mechanisms within its architecture. Based on data and theories from neurophysiology, van der Heijden assumes three visual modules: an input module (primary visual cortex), an identity module (probably inferior temporal cortex) and a location module (probably posterior parietal cortex). Information from the input module is transferred to the other two modules in parallel.
The mechanism of visual attention consists of a feedback loop from the location module to the input module, resulting in an enhanced activation of one of the items in the input module, which is thereby selected and gains access to motor control. Although all input items are fully identified (have access to the identity module), only the most strongly activated item in the input module will gain access to the response system. Thus, 'limited capacity' at the response level is a necessary consequence of selection. In a careful analysis, van der Heijden (1992) showed that this model accounts for a wealth of data from behavioral experiments on selective attention. This model illustrates the new view of the relationship between selection and limited capacity that characterizes the theories that have been reviewed in this section. For capacity supply theories such as Kahneman's (1973), attention was limited capacity, and selection existed simply because capacity had to be allocated. For these new theories, attention is selection, and limited capacity exists simply because the nonselected items have not been selected. Thus, these theories reflect the general current trend towards a stronger emphasis on the selection aspect of attention. We now turn to this aspect.
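The selection mechanism attributed above to van der Heijden (1992) can be illustrated with a minimal sketch. It is not his model: the class `Item`, the function `select_item`, the multiplicative `gain` and all activation values are hypothetical choices made purely for illustration. The sketch only shows the logical point that every item can remain identified while a single, most strongly activated item gains access to the response system.

```python
from dataclasses import dataclass


@dataclass
class Item:
    identity: str      # what the identity module would report (assumed available for all items)
    location: int      # position index in the display
    activation: float  # activation level in the input module (arbitrary units)


def select_item(items, attended_location, gain=2.0):
    """Return the one item that wins access to the response system.

    The multiplicative gain is an illustrative stand-in for the feedback
    from the location module to the input module.
    """
    boosted = []
    for item in items:
        # Feedback enhances only the item at the attended location.
        factor = gain if item.location == attended_location else 1.0
        boosted.append(Item(item.identity, item.location, item.activation * factor))
    # Only the most strongly activated item is selected.
    return max(boosted, key=lambda it: it.activation)


# Hypothetical three-letter display.
display = [Item("A", 0, 1.0), Item("B", 1, 1.2), Item("C", 2, 0.9)]

# All items are identified, irrespective of selection.
identified = [it.identity for it in display]

# Attending to location 2 lets "C" win, although "B" was strongest before feedback.
winner = select_item(display, attended_location=2)
```

In this sketch, 'limited capacity' appears only at the output: `identified` contains every letter, whereas `winner` is a single item.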
3 THE LOCUS AND MECHANISMS OF SELECTION
As we have seen, most modern theories of attention until the early 1980s have viewed the selection problem and the capacity problem as two sides of the same coin. Selection was supposed to exist because capacity is scarce, and selection was assumed to occur where capacity is scarce. This had decisive consequences for
theorizing about both the locus and mechanisms of attentional selectivity. As to its locus, the question 'Where does selection take place?' was considered to be essentially identical with the question 'Where is capacity limited?'. As to mechanisms of attention, they were conceptualized with regard to how well they served the purpose of handling the limited capacity supply, either by blocking access to the limited-capacity portion of the processing system (Broadbent, 1958), or by allocating resources to stimuli and processing operations (Kahneman, 1973). There were mainly four developments since the early 1980s that have led away from this classical doctrine. First, it was pointed out that, at least logically, the question of where selection takes place is different from the question of where capacity is limited (Neumann, 1980; van der Heijden, 1981). Second, there was a shift in empiric research interests from dual-task experiments with their natural emphasis on the capacity aspect of attention to experiments on visual attention, rooted in a research tradition that had always stressed the selection aspect, e.g. ignoring visual noise elements (Eriksen and Hoffman, 1973; Eriksen and Schultz, 1978), visual search (Treisman and Gelade, 1980; Treisman and Souther, 1985) and shifting visual spatial attention (Posner, 1980; Posner et al., 1980). The other two developments unfolded somewhat later, mainly since the mid-1980s. There was immense progress in research on the brain mechanisms of visual selection, including single-cell recordings (Allport, 1987, 1989) and event-related potential (ERP) research (see Chapter 9), both of which began to have their impact on psychological theorizing. Finally, connectionist models of visual attention started to appear (Cohen et al., 1990; Phaf et al., 1990), which were essentially models of how selection is achieved. 
Together, these developments have led to a broad stream of research on (mainly visual) attention with a clear stress on selection and comparatively little emphasis on limited capacity, almost reversing the situation that had existed until about 15 years ago. Another difference is that present-day research is no longer dominated by overarching theories in the style of Broadbent (1958) or Kahneman (1973). Diverse, much more restricted, theoretical approaches exist that have often been centered around one particular, empirically anchored idea with its specific experimental paradigms, such as feature integration or the spotlight metaphor. Instead of discussing these approaches in detail, the rest of the chapter will concentrate on the common issues that they continue to address. Indeed, the basic issues and metaphors have remained remarkably similar during the past three decades. The dominating issue has been where in the processing sequence ('early' or 'late') selection takes place. Many theories have assumed that there is one, and only one, locus of selection, although some other theories have postulated two or many such places. The second main issue has concerned the mechanism(s) of selection. The selection mechanism has been conceptualized either as a device that prevents or reduces the processing of unwanted items, or as a device that enables or facilitates the processing of the desired items. Selection has been described as either all-or-none (an item is either selected or not) or graded (there is a more or less pronounced difference in processing between selected and nonselected items). Finally, there has been the question of whether selection is a selective transition from one stage of processing to the next, or whether it consists in some modification within a given level of processing. The rest of this section is structured around these issues.
3.1 The Locus of Selection
The early-versus-late-selection issue developed in the early 1960s mainly in two, largely independent, lines of research. The more prominent one, documented in every textbook of cognitive psychology, was rooted in filter theory and started with a controversy about an alternative to filter theory, put forward by Deutsch and Deutsch (1963). The main methodological tool in this line of research was dichotic listening. The other, parallel development was based on paradigms from visual information processing, such as the partial-report paradigm (Averbach and Coriell, 1961; Sperling, 1960), that were initially aimed not at selective attention, but at other information processing issues such as iconic memory, scanning operations and visual backward masking. During the 1980s, these issues lost their prominence, and it became apparent that the earlier experiments had (also) been experiments about visual attention. Although these two research traditions developed largely independently, the logical structure of the early-versus-late-selection issue was strikingly similar. The two approaches shared the following six assumptions (cf. Neumann, 1990): (1) Selection is the selective transition from one stage of processing to a subsequent stage of processing. (2) There is only one such locus of transition. (3) This locus is situated between the unlimited-capacity portion of the system and its limited-capacity portion. (4) Selected stimuli are represented in conscious awareness, and unselected stimuli are not represented in conscious awareness. The final two assumptions are corollaries to the first four assumptions and relate to what were considered to be the critical data that are needed to decide where the point of transition is located.
Both approaches assumed that there were two kinds of such data: (5) If some stimulus aspect x can be used as an efficient selection cue, then this aspect is processed prior to selection, and hence in the unlimited-capacity portion of the system; conversely, if a stimulus aspect is not an efficient selection cue, this is because it is not processed prior to selection. (6) If a stimulus aspect x can be shown to be processed although the subject ignores it and is unaware of it, then this stimulus aspect has been processed in the unlimited-capacity portion of the system. Hence this aspect is processed unselectively, i.e. whenever an appropriate stimulus is presented. These assumptions were made in both the visual and the auditory research traditions, and, moreover, they were shared by most late as well as early selection theorists in both traditions. The only difference was that the point of transition was localized between different processing stages. According to early selection theorizing, it is located at a place before the stimulus has been semantically processed (categorized, fully identified, etc.). According to late selection theorizing, it is located after this stage.
3.1.1 Filter Theory and Late Selection Theory
The original filter model of Broadbent (1958) provides a coherent illustration of the six assumptions and their interrelationship. As described earlier, the two main components of the model were STM, with an unlimited capacity, and the P system, whose capacity was limited. Selection took place in the filter, which was located at the point of transition between these two components (assumptions 1 and 3); and this was the only point of transition where attentional selection was possible (assumption 2). The model was an early selection model because of Broadbent's assumption that only general physical features such as 'pitch, localization, or similar qualities' (Broadbent, 1958, p. 42) are analyzed at the STM level, whereas all further stimulus processing is performed in the P system. This assumption was mainly based on the observation that general physical features are good selection cues, i.e. a relevant message can be efficiently selected if it differs from the irrelevant message with respect to such a feature, or 'sensory channel' (assumption 5). In the 1960s and 1970s, evidence accumulated from dichotic listening experiments that, contrary to this account, unattended stimuli can receive semantic analysis.
Among the relevant data (summarized and critically evaluated by Holender, 1986; see also Chapter 6) was the 'own name' effect (the claim, often cited, apparently never formally replicated, but easy to set up as a classroom demonstration, that the subject's own name draws attention even if presented in an unattended channel; Moray, 1959); demonstrations of channel switching if the shadowed message was continued in the unattended channel (Treisman, 1960) and of electrodermic responses to unattended stimuli that belonged to a specific semantic category (Corteen and Wood, 1972; Dawson and Schell, 1982; von Wright, Anderson and Stenman, 1975); reports that unattended stimuli affected shadowing latency if they were synonyms of attended stimuli (Lewis, 1970; Treisman, Squire and Green, 1974); and experiments that indicated that the meaning of a word in the unattended channel could disambiguate ambiguous sentences in the attended channel (Dennis, 1977; MacKay, 1973). The most influential account of this type of findings was the late selection view, according to which the unattended stimuli in these experiments receive semantic processing for the simple reason that all unattended stimuli receive semantic processing (Deutsch and Deutsch, 1963; Keele, 1973; Marcel, 1983). This proposal reflected assumption (6), based on assumptions (1), (2) and (4): if there is only one locus of selective transition, then unattended stimuli must be processed unselectively. If stimuli of which the subject is not consciously aware are unattended, then these stimuli cannot be processed selectively. Hence, if some stimuli of which the subject is unaware can be demonstrated to be semantically processed, this implies that all unattended stimuli are always semantically processed. Late selection theory was thus based on essentially the same assumptions as early selection theory.
Assumptions (1) to (6) not only dominated theorizing about auditory attention in the 1960s and early 1970s; they were likewise at the basis of work on visual information processing that was not explicitly concerned with attention. An excellent account of this research tradition can be found in Chapter 1. The next section summarizes some of the main work.
3.1.2 Selection from the Visual Information Storage
When theorists began to explore the computer metaphor, their natural interest was in tracing the 'flow of information in the organism' (Broadbent, 1963). An obvious starting point was to ask what happens to stimulus information immediately after a stimulus has been presented. Since the architecture of the system was supposed to consist of a sequence of stores, this was equivalent to asking which was the first store through which the flow of information passes. The answer that emerged in the 1960s was that it was a sensory store of a very short duration (less than half a second according to most estimates), but with, for all practical purposes, unlimited capacity (visual information storage; VIS). Of interest in our present context are two aspects of this line of research: first, one of the methods that were used to demonstrate VIS and, second, the way in which the VIS was supposed to be linked to the subsequent storage system, short-term memory (STM). To demonstrate VIS, Averbach and Coriell (1961) and Sperling (1960) independently invented the partial-report technique in which a cue or indicator (e.g. a visual bar marker or a tone of variable pitch) tells the subject which stimulus or stimuli to report from a multielement (e.g. letter) display. If the indicator follows the stimulus display after a short interval, performance is essentially perfect, indicating that most of the stimulus information is still available after the end of the exposure. Though this was not stressed by the original authors, this was an experiment not only on sensory memory, but also on visual attention. It demonstrated the subjects' ability to select part of the stimulus information for their verbal report, based on the information from the bar marker or the tone as the selection cue.
This experimental paradigm is still prominent in present-day research, with the modification that RT instead of accuracy has become the preferred dependent measure, and that the major research interest is now visual attention, e.g. the kind of mechanism (automatic or 'exogenous' versus voluntary or 'endogenous') that controls the selection process (for reviews see van der Heijden, 1992, and Chapter 1). Sperling's (1960) theoretical account of the partial-report data was in terms of a sequential readout process that transfers the information from VIS to STM. Thus, his view conformed to assumptions (1), (2) and (3): selection was the transition from one stage to a subsequent stage; there was (as far as attention is concerned) only one such locus of transition; and since VIS had unlimited capacity, while STM capacity was limited, this locus was situated between the unlimited- and the limited-capacity portions of the system. Most important, however, is the similarity between Sperling (1960) and filter theory with regard to assumption (5). In Sperling's basic experiment, the stimuli were letters arranged in three rows, and the tone indicated which row to report. The finding that subjects could efficiently use this selection cue implied that information regarding letter location was available in VIS. What about information regarding category membership? Sperling examined this issue by intermixing letters and digits and asking for partial report according to category membership rather than according to spatial position. No partial-report superiority was found, suggesting to Sperling that category membership is not coded in VIS; that is, VIS is precategorical. Though Sperling's data have not always been replicated (Merikle, 1980), the bulk of the subsequent work has confirmed that location, color and other simple physical
attributes are good selection cues, while semantic attributes are poor selection cues (see Chapter 1). This was in excellent accord with the data on dichotic listening upon which filter theory had been based, and the conclusion was the same: attentional selection is early selection. Like Broadbent's early selection view in the auditory research tradition, Sperling's early visual selection view was challenged on the basis of findings indicating that unattended visual stimuli receive semantic processing. For example, masked stimuli were demonstrated to cause semantic priming (Carr et al., 1982; Marcel, 1980), and analyses of partial-report data indicated that selection was not from a precategorical store, but from a 'character buffer' that contains identified letters (Campbell and Mewhort, 1980; Mewhort et al., 1981). This was interpreted as evidence that all visual stimuli receive full semantic processing (Coltheart, 1984; Marcel, 1983). As can be readily seen, this interpretation follows the same logic as the reasoning that had led to the late selection alternative to filter theory, i.e. it is based on assumption (6), which states that if some unattended stimuli can be shown to be semantically processed, then this must be true for all unattended stimuli. As will be remembered, this assumption is based on assumptions (1) (selective transition), (2) (only one locus of transition) and (4) (conscious awareness). Alternatives to these two assumptions will be discussed in later sections. In the next section, we look into the assumption on which Broadbent (1958) as well as Sperling (1960) based their early selection theories. This is assumption (5), according to which the efficiency of attentional selection indicates which kind of coding has taken place prior to selection.
3.1.3 Selection, Coding and Capacity
If an attribute can be used as an efficient selection cue, then this attribute must have been coded prior to selection. This is indisputable. The opposite conclusion is, however, not valid. The finding that an attribute cannot be used as a selection cue does not necessarily imply that it has not been coded prior to selection. One illustration is provided by the filter metaphor itself: a mechanical filter, e.g. a sieve, selects among the elements that are put into it according to one attribute, namely size. However, this does not imply that the input elements do not differ with respect to other attributes, such as their color or their specific weight. The filter is simply not sensitive to these attributes. Similarly, the finding that only physical attributes are good selection cues does not imply that only these attributes are coded prior to selection. It could also mean that the selection mechanism is only sensitive to them, but not to semantic attributes. Surprisingly, this logical state of affairs seems to have long been overlooked. To the best of my knowledge, it was first discussed independently by van der Heijden (1981) and Neumann (1980). An extended and updated version of van der Heijden's reasoning can be found in Chapter 1 of the present volume. Essentially, van der Heijden points out that, while data on the efficiency of different selection cues imply early selection, other kinds of data (e.g. from analyses of identification and location errors) demonstrate that several visual inputs (e.g. letters from a multielement display) can be processed in parallel, and that this processing includes semantic (categorical) coding. Within the logic of assumption (5), these two groups of findings seem to be contradictory.
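The sieve illustration used in this section can be made concrete with a small sketch. The names `Pebble` and `sieve`, the mesh width and the example values are hypothetical and serve only to restate the logical point: a filter that is sensitive to one attribute says nothing about whether the elements carry other attributes as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pebble:
    size_mm: float  # the only attribute the sieve is sensitive to
    color: str      # present in every element, but invisible to the sieve


def sieve(pebbles, mesh_mm):
    """Pass every element smaller than the mesh; color plays no role."""
    return [p for p in pebbles if p.size_mm < mesh_mm]


# Hypothetical input: size and color vary independently.
pebbles = [
    Pebble(2.0, "red"),
    Pebble(8.0, "red"),
    Pebble(3.0, "grey"),
    Pebble(9.0, "grey"),
]

passed = sieve(pebbles, mesh_mm=5.0)

# The sieve cannot separate red from grey, yet every element has a color.
colors_passed = {p.color for p in passed}
```

Here `colors_passed` contains both colors: the failure to select by color reflects the insensitivity of the filter, not the absence of the attribute in its input.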
The contradiction disappears, however, as soon as one distinguishes between the early versus late selection issue and the limited versus unlimited capacity issue. According to van der Heijden, all inputs are fully processed (unlimited capacity), but selection can occur only on the basis of simple physical attributes as the selection cues (early selection). Van der Heijden's term for this early selection of fully identified information is 'postcategorical filtering' (see also van der Heijden, 1984; van der Heijden, Hagenaar and Bloem, 1984). This is a clear and convincing position, strongly based on empiric evidence. However, by decoupling selection from capacity, van der Heijden's theory deviated from the consensus of almost all theorizing about attention. It violated not only assumption (5), but also assumption (3), according to which selection takes place at the interface between the unlimited-capacity and the limited-capacity portions of the system. Given the strong entrenchment of these assumptions, it is not surprising that van der Heijden's position has sometimes been interpreted as a late selection theory (Pashler, 1984). In part, this may have been due to the unusual meaning in which van der Heijden used the term 'early selection' (namely, to denote selection of fully identified stimuli, from a visual store and based on physical attributes). But the difficulty has probably not only been one of terminology. Within capacity theory, it is hard to understand why selection should occur where capacity is not limited. Noncapacity theories that view selection in the context of action control (see above) do not, of course, have this problem. Van der Heijden (1992) therefore regards theories such as those of Allport (1987, 1989) and Neumann (1987) as natural extensions of his point of view. Neumann's (1980, pp. 347-362, 384-442) line of argument resembled that of van der Heijden in its logical structure, but differed from it in its details.
Like van der Heijden, Neumann started from an analysis of empiric findings that strongly suggest that only simple physical attributes are good selection cues. Similar to van der Heijden, he contrasted this body of evidence with findings and considerations that speak against early selection (in the usual sense of the term, not van der Heijden's), mainly from the field of subliminal perception and from neurophysiology. Like van der Heijden, he concluded from this apparent contradiction that the superiority of simple physical attributes as selection cues is related to the selection process itself, and not to the level of coding that is reached by unattended stimuli. In trying to explain why physical selection cues are superior to semantic selection cues, Neumann (1980) suggested several distinctions. First, the attentional selection process encompasses the selection of a relevant action (e.g. to be reported) attribute X, based on a (usually, though not necessarily, different) attribute Y, the selection criterion, where X and Y belong to the same perceptual object. (For example, in Sperling's (1960) standard experiment, X was the identity of a letter and Y its location.) Second, a distinction was made between intentional and unintentional selection. In intentional selection, both the relevant attribute X and the selection criterion Y are defined prior to stimulus presentation, while no such definition exists in the case of unintentional selection (for a comprehensive discussion of this distinction see Chapter 5). Further, two cases of intentional selection were distinguished, called 'constant focusing' and 'focus shifts'. In constant focusing, attention stays locked to one perceptual object, as in dichotic listening or in auditory streaming. A focus shift is a displacement of attention from one perceptual object to another.
The superiority of physical selection criteria was explained within this conceptual framework. First, Neumann (1980) noted that the superiority is restricted to intentional selection, whereas semantic selection criteria are at least as efficient as physical criteria in the case of unintentional selection (Moray's, 1959, 'own name' effect). This suggests that the superiority is related to some characteristic(s) of the intentional focusing process. In the case of intentional constant focusing, the task of the system is to keep track of a perceptual object despite its change, e.g. to listen to one message in dichotic listening despite its changing words and sounds. This requires that there is at least one property that constitutes object permanence (Neisser's, 1967, 'primitive unity', extended into the time domain). Object permanence can be constituted by physical properties such as location, pitch and/or temporal coherence. (Neumann mentioned some examples from listening experiments; for an extensive analysis see Chapter 3. Another example was the demonstration by Neisser and Becklen (1975), that two visual, dynamically changing scenes, one projected upon the other, can be easily disentangled by selectively attending to one of them.) Since semantic attributes alone cannot constitute object permanence, so the argument went, they cannot be effective selection cues in the case of constant focusing. For example, words that belong to the same semantic class, but differ with respect to their pitch and location, do not form a coherent auditory stream and cannot therefore be the object of constant focusing. As to focus shifts, Neumann (1980) argued that an intentional (unlike an unintentional) focus shift requires that the to-be-selected object is somehow represented in perceptual experience prior to selection, at a hierarchical level different from that of the object itself.
For example, in order to shift attention intentionally to the only round letter in a display, its roundness has to be detected prior to the focus shift. Thus, roundness must be represented at both the global level (as an aspect of the overall texture of the layout) and at the local level (as an attribute of the letter itself). The same is true for selection according to spatial position, color, etc. This multiple representation at different hierarchical levels is possible for physical, but not for semantic, attributes. For example, the meaning of a word is represented only at the level of the word itself. The content of a sentence cannot be determined independently of the meaning of the words in it; words have to be identified before the meaning of the sentence can be understood. There is no 'semantic texture' at the supraword level. Therefore, according to Neumann's (1980) argument, semantic attributes cannot be used as selection criteria for an intentional focus shift. They are not available at the hierarchical level of conscious representation from which the focus shift is controlled. This account was obviously speculative and contained several shortcomings (for example, it overlooked the possibility that an intentional act of selection can be based on information that is not represented in conscious perceptual experience; see Neumann and Klotz, 1994). The problem why physical attributes are so much better selection criteria than semantic attributes is, however, still unsolved. What seems clear from the work of Neumann (1980) and van der Heijden (1981) is that an explanation in terms of the classical early selection account is not logically required by the data. In other words, assumption (5) cannot be accepted on a priori grounds.
3.1.4 How Many Loci of Selection?
Assuming the correctness of assumption (1), that selection is the transition from one processing stage to the next, one obvious question is how many such points of transition exist. Essentially, there have been three answers: one, two and many. As we have seen, filter theory and late selection theories have shared the assumption that there is only one locus of selective transition: the filter or, respectively, the late selection mechanism. This one-locus-of-selection view has been shared by many other theorists. For example, Neisser (1967) distinguished between preattentive processes whose main function is to segregate figural units, and focal attention. Focal attention selects the perceptual objects that will undergo a more detailed analysis and will be subject to constructive processes. Similarly, Kahneman (1973) put forward an account of perception in which attention (effort) was given the role of 'figural emphasis', based on preattentive grouping processes. However, Kahneman's attention (effort) was also involved in all postperceptual processes, and it could therefore be argued that he belonged more to the multiple-loci-of-selection camp. Despite its radical departure from information processing theorizing, Neisser's (1976) theory also postulated a single locus of transition. However, this was not really a mechanism of attention. As already mentioned, Neisser's (1976) position was that there are no special mechanisms of attention at all. He held an extreme top-down view of perception, in which schemata are used to pick up information by anticipating it. 'Attention is nothing but perception: we choose what we will see by anticipating the structured information it will provide... What, then, happens to the unattended information? In general, nothing happens to it...; we simply don't pick it up' (Neisser, 1976, p. 87).
Neisser's point of transition was thus, at least in part, located at the sensory periphery; consequently, the exposition of the theory started with a section on selective looking. This was similar to Neisser's emphasis on peripheral interference that we discussed in the section on limited capacity. Neisser (1976) was certainly correct when he stressed the contribution of peripheral mechanisms of selection, which had largely been neglected in previous theories of attention (see Chapter 2, on the state of the art about attention and head and eye movements). On the other hand, there are obvious examples of nonperipheral selection (e.g. dichotic listening, visual selection with short tachistoscopic exposures). Some of Neisser's 'selective information pickup' must therefore have been assumed to take place somewhere in the nervous system; but it seems that Neisser found it uninteresting to specify its locus. Unlike Neisser's (1976) theory of interference, discussed earlier, which provided a seminal new position, his theory of selection was not really developed and had no strong impact on subsequent theorizing. A much more influential theory of one locus of selection has been Anne Treisman's feature integration theory (Treisman, 1988, 1992; Treisman and Gelade, 1980). Like most present-day theories, it is mainly a theory of visual attention, and like most of them, it puts special emphasis on the spatial allocation of attention. The point of transition is, according to this theory, demarcated by different ways in which the visual information is represented. At the preattentive level, the representation is in terms of features such as color, spatial orientation or shape. In order to 'glue' these features together, attention has to be directed to their location.
Feature integration theory has been the second theory put forward by Anne Treisman that has had a deep influence on more than a decade of attention research. The first instance was a two-loci-of-selection theory, in fact the two-loci-of-selection theory (Treisman, 1960, 1964), which was later incorporated into Broadbent's (1971) revision of filter theory. It was originally designed to explain the kind of findings that also gave rise to late selection theories, and that Broadbent (1982) has subsumed under the heading 'breakthrough of the unattended'. At the time, the examples were mainly involuntary shifts of attention to the unattended dichotic channel, based on semantic attributes of the words in this channel. Treisman (1960) accounted for these findings by two modifications of the filter model: an alteration and an addition. The alteration consisted in the assumption that information from the unattended channel is not completely blocked, but merely attenuated. The addition concerned the processes that take place after the information has passed the filter. Treisman assumed that stimulus identification is based on the activation of dictionary units (later usually called logogens; Morton, 1969) with variable thresholds. If a dictionary unit has a low threshold (e.g. one's own name, or a word that fits into the attended context and is therefore preactivated), then even an attenuated input may be sufficient to activate it above threshold, and the input will be identified. Broadbent (1971) developed these ideas into a comprehensive revision of the filter model, which accounted for a wealth of data. He continued to use the term filtering for what Treisman had called attenuation, and suggested the term 'pigeonholing' for Treisman's activation of dictionary units (his equivalent for these units themselves was 'category states'). The corresponding attentional sets were stimulus set and response set.
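The logic of attenuation combined with variable thresholds can be illustrated with a minimal sketch. Everything numerical in it is an assumption made for illustration: the attenuation factor, the threshold values and the function name are hypothetical and are not taken from Treisman (1960) or Broadbent (1971).

```python
# Illustrative sketch only: all numeric values are hypothetical.

ATTENUATION = 0.3  # assumed gain on the unattended channel (1.0 would mean no attenuation)

# Assumed thresholds of "dictionary units". A low threshold stands for
# one's own name or a word that is preactivated by the attended context.
THRESHOLDS = {
    "table": 0.8,
    "own_name": 0.2,
}


def is_identified(word: str, signal_strength: float, attended: bool) -> bool:
    """Return True if the (possibly attenuated) input exceeds the unit's threshold."""
    gain = 1.0 if attended else ATTENUATION
    return signal_strength * gain > THRESHOLDS[word]
```

With these assumed values, an ordinary word of strength 1.0 is identified only on the attended channel, whereas the low-threshold unit is activated even by the attenuated input, which is the pattern described as 'breakthrough of the unattended'.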
By arguing that effects at the level of filtering should have their influence on the sensitivity measure of signal detection theory (d'), while effects at the level of pigeonholing should show up as changes in the criterion measure (beta), Broadbent (1971) was able to account for data from many areas of attention research. An instructive discussion of this model can be found in Chapter 7. Both the original and the revised versions of filter theory were examples of the multistore, information flow theorizing that Atkinson and Shiffrin's (1968) model exemplified in memory research. In the area of memory, this model was challenged by Craik and Lockhart's (1972) concept of depth of processing. A similar line of thought appeared in attention research. It assumed not one or two definite loci of selection, but a continuum of selectivity: 'Selective operations can potentially be performed anywhere along a continuum from early to late in perceptual processing, and nontarget processing can vary from shallow to deep levels' (Johnston and Heinz, 1979, p. 169). Johnston and Heinz (1979) based this generalization on findings indicating that the degree of interference from an unattended channel depended on the kind of selection criterion that the subjects could use (see also Dark et al., 1985). Whereas this multiple-loci-of-selection view has so far not become very popular in experimental psychology (perhaps because many theorists feel that it is unparsimonious and difficult to refute), it has recently found considerable support from neurophysiological, neuropsychological and electrophysiological studies of attention. As pointed out, for example, by Neumann (1990), these findings strongly suggest that there are multiple loci of selection. For example, single unit recordings indicate that visual selection is mediated by brain structures from the superior colliculus to the pulvinar and posterior parietal cortex. Similarly,
electrophysiological evidence points to a multitude of selection mechanisms, possibly starting as early as the retina. For an excellent summary of many of these data, see Chapter 9. These authors also discuss the interesting idea, supported by electrophysiological data, that the bandwidth of selection becomes smaller from early to late loci of selectivity, i.e. selection starts at a coarse level, at which many stimuli that are similar to the targets are still accepted, and becomes progressively more sharply tuned to exclusively the relevant stimuli. If one takes into account these data from brain research in addition to behavioral findings, then a multiple-loci-of-selection view seems at present to be the best answer to the question 'Where does selection take place?'. Up to now, we have equated this question with the question 'Where are the loci, or is the locus, of transition?'; that is, we have assumed the correctness of assumption (1). This assumption is one of the topics of the next section.
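Broadbent's (1971) argument earlier in this section, that filtering should affect d' while pigeonholing should affect beta, presupposes the two standard measures of signal detection theory. As a reminder of how they are obtained from hit and false-alarm rates, the following sketch assumes the usual equal-variance Gaussian model; the function name is illustrative and the chapter itself gives no formulas.

```python
from math import exp
from statistics import NormalDist


def d_prime_and_beta(hit_rate: float, false_alarm_rate: float) -> tuple[float, float]:
    """Sensitivity (d') and likelihood-ratio criterion (beta).

    Assumes the equal-variance Gaussian model; both rates must lie
    strictly between 0 and 1, otherwise inv_cdf raises an error.
    """
    z = NormalDist().inv_cdf            # inverse of the standard normal CDF
    z_hit = z(hit_rate)
    z_fa = z(false_alarm_rate)
    d_prime = z_hit - z_fa              # separation of signal and noise distributions
    beta = exp((z_fa ** 2 - z_hit ** 2) / 2)  # ratio of densities at the criterion
    return d_prime, beta
```

On this reading, a manipulation that acts on filtering would be expected to change the first returned value, and a manipulation that acts on pigeonholing the second.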
3.2
Mechanisms of Selection
This section is concerned with three questions. First, is selection performed by inhibiting (attenuating, blocking, rejecting) the unwanted information, or via a facilitation (enhancement, anticipation) of the processing of the desired information? Second, is selection all-or-none or graded? Third, does selection consist of the transition to a further processing stage, or in some kind of modification within a level of processing?
3.2.1
Inhibition or Facilitation?
Among the most influential assumptions of filter theory has been the proposition that attention serves to regulate the flow of information in the sense of determining which portion of the input information passes through all stages, and which information is not further processed. Let us assume for the moment that this is correct. How is the selective transition performed? Obviously, there are two possibilities. It is possible that the nondesired information is somehow prevented from further processing, or it could be that the desired information is somehow given an advantage that enables its further processing. At first glance, one might surmise that the preference for one or the other alternative depends on one's overall conception of the flow of information in the processing system. If the system is viewed as a basically bottom-up device, then the task of selection is to prevent the processing of unwanted information. If the system is assumed to function via top-down mechanisms, then the task is to foster processing of the wanted information. Filter theory, so it seems, was the prototype of the first variant, while Neisser's (1976) schema theory was a straightforward example of the latter. Broadbent's filter was an inhibitory mechanism for controlling bottom-up processing, whereas Neisser's (1976) 'anticipation' was a facilitatory mechanism for the control of top-down processing. Upon closer examination, the situation is somewhat more complicated. While Broadbent (1958) seems to have thought of the filter as a device for blocking
('rejecting') unwanted information, the filter metaphor as such does not carry this meaning. One might as well argue that a filter provides pathways for the processing of the desired information, while leaving the rest of the information unaffected. Indeed, given the wealth of information that impinges upon the sensory surface at every moment in time, it is very hard to think of mechanisms that actively affect the processing of all of them in a negative manner, while leaving the processing of the comparatively small amount of attended information unaffected. Such a system would require an immense amount of inhibition and would therefore work very inefficiently. Thus, although Broadbent (1958) may have visualized the functioning of the filter as an active 'rejection', a (both technologically and biologically) more plausible interpretation of the filter metaphor would be in terms of selective facilitation. Things are similar for the 'attenuation' metaphor. As reported by Neisser (1967, p. 212), Treisman explained that it referred to the signal-to-noise ratio between the selected and the nonselected message, with respect to information content. While she seems to have assumed that this ratio is reduced in all unwanted messages, the same effect could be obtained by improving the signal-to-noise ratio in the wanted message; that is, by enhancing the wanted information relative to its noise background. Indeed, virtually all theories of selection seem to be compatible with a selective facilitation view. More recent approaches have in fact exhibited a preference for facilitation. One reason has been that neurophysiological single-unit studies show facilitation (enhancement) effects associated with selective attention, while there is much less evidence for inhibition (see Allport, 1987, 1989; Neumann, 1990). Likewise, connectionist models of selective attention have relied more on facilitation than on inhibition (Cohen et al., 1990; Phaf et al., 1990). 
On the other hand, it would be premature to dismiss inhibitory mechanisms completely. First, inhibition has been reported in some animal studies of visual attention at the single unit level (for a review see Neumann, 1990). Second, while attention units themselves are not inhibitory in the models of Cohen et al. (1990) and Phaf et al. (1990), they change the conditions for intralevel inhibition and thereby help one unit to inhibit the others and win the intralevel competition. One might say that they are facilitatory, but also facilitate inhibition. Third, there are findings from behavioral studies that are easier to interpret in terms of inhibition than in terms of facilitation. One example is negative priming, as first reported by Allport, Tipper and Chmiel (1985). If the component of Stroop-like stimuli that was irrelevant in the experimental trial n becomes the relevant component in trial n + 1, this causes an increase in RT, presumably because the response to the irrelevant component had to be suppressed, and this suppression is still operative in the subsequent trial (for a review see Tipper, 1992). These observations suggest that it would be unwarranted to dismiss inhibition completely as a mechanism of selection, in addition to facilitation. Some theorists have proposed that inhibition might play a role, especially at output. As mentioned earlier, Neumann (1987, 1992) has argued that the problems of effector recruitment and parameter specification can be solved only if alternative actions that would engage the same effectors, and alternative ways of performing an action, are efficiently inhibited. This is similar to the suggestion of Shallice (1972, 1978) that one task of attention is to assure the dominance of one action system over competing action systems by inhibiting them.
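The remark that attention units can be 'facilitatory, but also facilitate inhibition' can be made concrete with a toy simulation. The sketch below is a generic leaky-integrator network with mutual inhibition, written for this illustration; it is not the model of Cohen et al. (1990) or of Phaf et al. (1990), and the function name and all parameter values are assumptions.

```python
def compete(bottom_up, attention, inhibition=0.5, rate=0.1, steps=200):
    """Simulate units at one level that inhibit each other.

    bottom_up: stimulus input to each unit.
    attention: purely excitatory input from a hypothetical attention unit.
    Returns the activations after `steps` update cycles.
    """
    act = [0.0] * len(bottom_up)
    for _ in range(steps):
        total = sum(act)
        new_act = []
        for a, b, t in zip(act, bottom_up, attention):
            # Excitatory input minus inhibition from all other units at this level.
            net = b + t - inhibition * (total - a)
            a = a + rate * (net - a)               # leaky integration toward net input
            new_act.append(min(1.0, max(0.0, a)))  # keep activation within [0, 1]
        act = new_act
    return act


baseline = compete([0.5, 0.5], [0.0, 0.0])  # no attentional input
attended = compete([0.5, 0.5], [0.2, 0.0])  # attention excites the first unit only
```

In this sketch the attentional input is never negative, yet the second unit ends up less active in `attended` than in `baseline`, because the strengthened first unit inhibits it more. The suppression is a by-product of facilitation combined with intralevel competition.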
3.2.2
All-or-none or Graded Selection?
Both facilitation and inhibition can be conceived of as all-or-none or as graded. Filtering in the original filter model was purely all-or-none; an input item was either accepted or rejected by the filter. Selection in Neisser's (1976) schema theory was also all-or-none; input information was either picked up or discarded. Other theories have assumed some variant of graded selection. Graded selection could exist in several forms. The most obvious variant would be a continuum of facilitation or inhibition, from completely unattended stimuli (no facilitation or maximum inhibition) to fully attended stimuli (maximum facilitation or no inhibition), with the possibility of all gradings in between (variant A). A second variant might assume that all attended stimuli receive full processing (maximal facilitation or no inhibition), but unattended stimuli are also facilitated or inhibited to some degree (variant B). Third, it could be that unattended stimuli receive no facilitation, or maximal inhibition, but there is a continuum of facilitation or inhibition for attended stimuli (variant C). Some theories have been explicit about which of these options they represent, while others are difficult to classify. One example of the former class is the Treisman-Broadbent revision of filter theory. As we have seen, one change from the original filter model to the revised model consisted of the replacement of blocking by attenuation, which was an instance of variant B (no inhibition at all of attended stimuli, and some, but not a complete, inhibition of unattended stimuli). The lowering of the thresholds of dictionary units (category states) was an example of variant A (a continuum of threshold values). Another example of theories that allow a precise interpretation are connectionist models that encompass attention units which selectively activate the units in the content domains. 
If the input from an attention unit may be either zero or have some positive value, this constitutes an example of variant C. One example of a theory that is difficult to classify is Kahneman's (1973) account of selection. As mentioned earlier, he viewed selection as the selective allocation of effort to some mental activity in preference to others. In the chapter on attention and perception, Kahneman described the flow of information from sensory registration to response selection and identified two stages that are affected by attention: figural emphasis and response selection. Figural emphasis was related to figure-ground organization: emphasized objects are perceived as figures, while the rest of the objects constitute the ground (Kahneman, 1973, p. 79). Since figure-ground organization is an all-or-none phenomenon, this seems to suggest that selection at the stage of figural emphasis is all-or-none, similar to the selection of one input channel according to the original filter theory. On the other hand, Kahneman stated that 'some of the units isolated earlier receive greater Figural Emphasis than others' (Kahneman, 1973, p. 68), which seems to imply graded selection, perhaps similar to the Treisman-Broadbent revision of filter theory. In more recent theorizing, both variants of Kahneman's (1973) views about figural emphasis have been elaborated within a more general discussion about the spatial characteristics of visual attention. The figure-ground idea has been developed by theorists who assume that visual attention is directed to objects (Duncan, 1984; Wolff, 1977), which may be classified as an all-or-none theory of selection. The idea of a continuum of more or less figural emphasis has been elaborated in theories that conceive of spatial attention as a gradient of varied intensity (Hughes and
Zimba, 1987), corresponding to graded selection, variant A. A somewhat different view is expressed by the zoom-lens or searchlight metaphor, according to (one variant of) which attention is distributed over a more or less extended spatial area, with an inverse relationship between the extension and the intensity of attention (Eriksen and St James, 1986). This is an instance of variant C. With respect to the all-or-none versus graded issue, these more recent theories can thus be classified according to the same criteria as the classical theories. However, they differ from them in one important respect: they equate the allocation of attention not with the transition from one stage of processing to the next, but with a modulation within one stage of processing. The next section looks into this theoretical alternative, which is closely associated with the relationship between attention and conscious awareness. The topics of this section are thus assumptions (1) and (4).
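The classification used in this section (all-or-none selection and graded variants A, B and C) can be summarized as a set of simple functions. The sketch expresses everything as facilitation on a scale from 0 to 1, although the text allows an equivalent description in terms of inhibition. The function names, the residual value for unattended stimuli and the particular zoom-lens formula (a fixed amount of resources spread evenly over a disc) are assumptions chosen for illustration; the theories cited specify only the qualitative pattern.

```python
import math


def all_or_none(attended: bool) -> float:
    """Original filter model: an item is either accepted or rejected."""
    return 1.0 if attended else 0.0


def variant_a(degree: float) -> float:
    """Variant A: a full continuum from unattended (0.0) to fully attended (1.0)."""
    return min(1.0, max(0.0, degree))


def variant_b(attended: bool, residual: float = 0.3) -> float:
    """Variant B: attended stimuli are fully processed, unattended ones partly."""
    return 1.0 if attended else residual


def variant_c(attended: bool, degree: float) -> float:
    """Variant C: unattended stimuli get nothing, attended ones a graded amount."""
    return variant_a(degree) if attended else 0.0


def zoom_lens(distance: float, radius: float, resources: float = 1.0) -> float:
    """Assumed zoom-lens form: intensity falls as the attended area grows."""
    if radius <= 0:
        raise ValueError("radius must be positive")
    if distance > radius:
        return 0.0  # outside the attended area
    return resources / (math.pi * radius ** 2)
```

Under the assumed formula, widening the attended area lowers the intensity at every location inside it while locations outside receive nothing, which is why the zoom-lens metaphor falls under variant C.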
3.2.3
Transition or Modulation?
The idea that attentional selection involves a selective transition from one processing stage to the next has been among the most influential assumptions from filter theory. As we have seen, it was shared by late selection theories, which located the point of transition at a more central place in the processing sequence. In Neisser's (1967) theory, this point was situated between the preattentive, parallel processes that segment the input into perceptual units and the subsequent constructive processes that elaborate it sequentially. Similarly, Kahneman (1973) located it between early processes that segregate and group figures, and the stage of figural emphasis. For two-process theories (Kerr, 1973; Posner, 1978) the transition was from effortless, automatic, obligatory processing ('pathway activation') to effortful, optional, controlled processes. Even Neisser (1976), who departed radically from the information processing framework, held the selective transition view. In his theory, the transition occurred when information was picked up by a perceptual schema. If a theoretical assumption is so widely held, this suggests that it is viewed as having a high degree of a priori plausibility, independently of supporting empiric evidence. There have probably been two main reasons why the selective transition assumption has appeared so plausible. The first has already been discussed: within filter theory, the concepts of limited-channel capacity and of a selective transition were functionally interdependent; a selective transition was required to protect the central channel from overload. It is less obvious why late selection models should adhere to the selective transition assumption. If selection occurs between full perceptual analysis and the control of overt behavior, as suggested, for example, by Deutsch and Deutsch (1963), then all that is required is a mechanism that determines which stimulus is to control behavior.
Indeed, Deutsch and Deutsch suggested just this type of mechanism: essentially, a device that finds the highest value among a set of values, not unlike models of response selection in RT research. However, as we have already seen, late selection has usually been interpreted as a variant of the information flow model, with the filter simply located later in the processing sequence than assumed by early selection theories. As to capacity supply theories, they are, in principle, compatible with both a selective transition and a selective modulation view.
However, most of them have assumed early processes that do not demand capacity, which implies a selective transition from these processes to capacity-demanding processes. A further likely reason why the selective transition assumption has enjoyed such a high popularity is rooted in a conviction that dates back to the earliest theories of attention in ancient philosophy (see Neumann, 1995). Ever since Aristotle first described the phenomenon of selective attention, there has been a strong tendency to equate it with the phenomenon of selective access to conscious awareness. At least since Descartes, it was commonly assumed that this selectivity was located not at the sensory periphery, but at some more central site, e.g. between the purely mechanical brain processes that also occur in animals and mental processes proper (Descartes), or between the nonconscious 'minor perceptions' and full apperception (Leibnitz). In 19th century psychology, there was a kind of standard model according to which all sensory stimuli produce sensations, but only some proportion of these sensations is integrated into conscious perceptions (Ziehen, 1890; see below). Perhaps the most elaborate modern version of this idea has been Marcel's (1983) theory of recovery. He suggested to 'separate functionally those representations automatically yielded by and utilized by perceptual analyses from those of which we are conscious' (Marcel, 1983, p. 243). The former processing stage produces nonconscious representations of all incoming stimuli ('perceptual records'), i.e. it works in a purely bottom-up manner. The selective transition to the second stage, or recovery, is identical to a transition into consciousness. 'A conscious percept is obtained by a constructive act of fitting a perceptual hypothesis to its sensory source' (Marcel, 1983, p. 245). 
Although Marcel (1983) did not explicitly declare this theory as a theory of attention, he equated the stage of consciousness with focal attention (Marcel, 1983, p. 254) and the act of recovery with selectively attending (p. 271). This model (to which we shall return in the section on perceptual activity and voluntary control) was thus strikingly similar to 19th century theorizing. At a more general level, the classical tradition has found its sequel in modern two-process theories that have contrasted automatic and controlled information processing (Posner and Snyder, 1975; Shiffrin and Schneider, 1977). Usually, this distinction has been more or less explicitly equated with the distinction between unconscious and conscious processes (for reviews see LaBerge, 1981; Neumann, 1984, 1989; Schneider, Dumais and Shiffrin, 1984; Chapter 6). More recently, functional differences between conscious and nonconscious processes have been explored in numerous empiric research contexts, including explicit versus implicit learning and memory, discrimination of subliminal stimuli, and direct parameter specification (for a recent summary of the state of the art, see Umilta and Moscovitch, 1994). There have thus been several reasons why the view that attentional selection is a selective transition from an early (preattentive, automatic, nonconscious) stage to another (late, controlled, conscious) stage of processing has come to be regarded as almost self-evident. There is, however, the alternative that selective attention causes a modulation within a level of processing or representation, instead of a transition. This notion also has a long historical tradition. It has existed in several variants. One has maintained that attentional selection takes place not at the point of access to conscious awareness, but within consciousness itself. 
A complementary view has assumed that there is attentional selectivity even for stimuli that do not reach, or have not yet reached, a conscious representation. The two views are not mutually
exclusive. Further, there have been approaches that have conceptualized attentional selection as a modulation without committing themselves as to the consciousness issue. The idea that attention has its effects within consciousness itself rather than at access to it dates back to the 18th century (see Neumann, 1995). It was put forward by the French philosopher Etienne Bonnot de Condillac (1715-1780) and elaborated by the Scottish philosopher Dugald Stewart (1753-1828). According to Condillac (1947) and Stewart (1792), we are conscious of all stimuli, but unattended stimuli are immediately forgotten and can therefore not be reported. Attention serves to strengthen the representation of a stimulus, so that we become more conscious of it than of unattended stimuli, and can remember it. A modern version of the view that attention counteracts forgetting and thereby enables a verbal report of the attended stimuli has been suggested by van der Heijden (1981). However, van der Heijden's theory has been formulated in purely functional terms, without any direct reference to consciousness. The most influential proponent of selection-within-consciousness in 19th century psychology was Wilhelm Wundt. He distinguished between the 'field of consciousness' (Blickfeld des Bewußtseins) and the 'focus of consciousness' (Blickpunkt des Bewußtseins). The focus of consciousness, which encompassed only part of what was represented in consciousness, was determined by the direction of attention. Thus, a shift of attention was equivalent to bringing some contents from the field of consciousness into its focus, while others, which had been in the focus of consciousness, receded into the field of consciousness. In the section on the nature and functions of consciousness we shall return to this distinction in connection with Wundt's concept of apperception, which was his theoretical equivalent of attention.
For Wundt, 'consciousness' was thus the more inclusive concept, and 'attention' was a function that had its effects within consciousness. While this view has not been shared by many modern theorists, there have been some authors who have considered the possibility that attention causes a modification within conscious experience instead of controlling access to consciousness. One example is Neumann (1980, 1990). As already mentioned, Neumann (1980) argued that the intentional selection of a perceptual object requires that it is somehow represented in conscious awareness before attention is focused on it. The underlying view of perceptual representation was mainly taken from Wolff (1977). It was assumed that the structure of the visual world consists of nested representations at several hierarchical levels (e.g. text page - word - syllable - letter - part of a letter), and that attention can be focused only on a single perceptual object or group of objects at a given hierarchical level. Attention 'moves within this representation of the situation' (Neumann, 1980, p. 416), either by switching to a different hierarchical level ('zooming'), or by shifting to a different object or group of objects at the same level ('camera shift'). That which is represented at the nonselected levels, and the nonfocused objects and groups at the selected level, may be termed 'preattentive', but, as Neumann (1980) stressed, this does not imply that they have only been processed in a crude, global manner, as assumed by Neisser (1967). They differ from the attended objects or groups only in that they are not included in an ongoing action. Thus, according to this scheme, the focused object or group was similar to Wundt's 'focus of consciousness', while the rest of the representation of the situation was similar to his 'field of consciousness'. However, Neumann (1980) did not address the question of whether the unfocused representations can in some
respect be called conscious, as Wundt had assumed. In a later paper, Neumann (1990) discussed this issue within a modified version of the 1980 approach. The representation of the situation was now called an internal model, and it was argued that one of the mechanisms of 'visual attention' (which was no longer regarded as unitary) consists of the updating of some aspect of this internal model, in response to a change in the stimulation. Neumann (1990, p. 257) suggested that the updating usually produces awareness of the selected stimuli. A more difficult question concerned the status of the internal model if there is no updating, and of the portions of it that are not updated. Neumann suggested from introspection that 'I am no longer attending to it, and yet it would be incorrect to say that I am not conscious of it. Metaphorically speaking, it has receded into the background of my consciousness' (Neumann, 1990, p. 258). This position was essentially identical to that of Wundt, and Neumann related it to Allport's (1988) argument that the term consciousness does not refer to a unitary phenomenon. However, introspection is of course unreliable, and one might well argue that the impression that something is in the 'background of consciousness' is actually based on occasional brief shifts of attention to these representations. As mentioned, the option that attentional selection involves an intralevel modification rather than a selective transition is not restricted to consciousness. Neumann (1990) suggested that, besides the mechanisms that subserve an updating of the internal model, there are phylogenetically older mechanisms that select information for the control of action. Part of the evidence for these mechanisms comes from single-cell recordings and from ERP research. These data suggest that attention modifies output strength, whereas there is little evidence that attended stimuli are processed in subsystems that nonattended stimuli do not reach. 
One example of such a modulation that does not seem to be related to conscious awareness is the response to a change in auditory stimulation, the so-called mismatch negativity (see Chapter 9). The physiological evidence seems thus to point to a selective modulation rather than a selective transition. Behavioral research has recently also tended toward this position, though the issue does not seem to have often been discussed explicitly. One dominant experimental paradigm in the last decade has been the cueing paradigm that dates back to Averbach and Coriell (1961) (see above), but has become popular in the version of Posner (1980; Posner et al., 1980). Essentially, a cue is used to direct the subject's spatial attention to a stimulus object or an area in the visual field, and the effect of this manipulation is registered as a change in response latency relative to a neutral condition. Many interesting effects have been discovered with this paradigm (for summaries see van der Heijden, 1992); but the important point in our present context is that the effect of selective attention is to speed up (or slow down) processes, i.e. to modify them. Similar (but not identical) effects of cueing have also been found with psychophysical judgements as the dependent variable (Neumann, Esselmann and Klotz, 1993). Similarly, connectionist models of attention have usually been selective modification models. While there must, of course, be a selective transition to the response level (only one response can be selected at a time), the effect of attention at the other levels has usually been assumed to consist of a modification of the pattern of activation over the units, with attention enhancing the activation of the desired unit(s) (Cohen et al., 1990; Phaf et al., 1990).
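The dependent measure of the cueing paradigm described above, a change in response latency relative to a neutral condition, can be written out as a short computation. The labels 'valid', 'invalid', 'benefit' and 'cost' follow common usage in the cueing literature rather than the wording of this chapter, and the reaction times are hypothetical values invented solely to make the example run.

```python
from statistics import mean


def cueing_effects(rts_ms: dict[str, list[float]]) -> dict[str, float]:
    """Return speeding and slowing of responses relative to the neutral condition.

    rts_ms maps the condition names 'valid', 'neutral' and 'invalid'
    to lists of reaction times in milliseconds.
    """
    neutral = mean(rts_ms["neutral"])
    return {
        "benefit": neutral - mean(rts_ms["valid"]),  # speed-up at the cued location
        "cost": mean(rts_ms["invalid"]) - neutral,   # slow-down at an uncued location
    }


# Hypothetical data, not taken from any study.
hypothetical_rts = {
    "valid": [250.0, 260.0, 255.0],
    "neutral": [280.0, 275.0, 285.0],
    "invalid": [310.0, 305.0, 315.0],
}
effects = cueing_effects(hypothetical_rts)
```

Both quantities are differences in the latency of the same response, which is the sense in which the paradigm measures a modification of processing rather than a transition between stages.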
Taken together, the developments and alternatives that have been discussed in this section indicate a shift in the dominant view of attentional selectivity away from the classical approach as it was represented by filter theory. As for the capacity issue, there seems to be a trend towards a multitude of mechanisms instead of a unitary construct of attention (cf. Neumann, 1992). Early and late selection are no longer viewed as mutually exclusive; facilitation seems to be the basic principle of selection, but accessory inhibitory mechanisms are not excluded; and it seems that modulations within processing stages have to be assumed, at least in addition to selective transitions. Many of these changes reflect the influence of the brain sciences and of connectionist modeling, but behavioral data have had their share in bringing them about.
4 THE NATURE AND FUNCTIONS OF ATTENTION
Most theories of attention encompass explicit or implicit assumptions about the nature and function(s) of attention, i.e. about the kind of task(s) that attentional mechanisms fulfill. The following summary of these different approaches will be brief, because most of them have been mentioned in passing in previous sections. However, the present perspective will be somewhat different. With a few exceptions, this chapter has not been concerned with the historical predecessors of modern theories of attention. The issue of the nature and functions of attention invites a summary in terms of historical lines of development.
4.1 Concepts of Attention
The historical evolution of the concept of attention may be viewed under two aspects (Neumann, 1971, 1995). There is, first, the gradual development of a descriptive concept of attention, which refers to empiric observations of attentional phenomena. This development began in ancient philosophy and was largely completed in the 18th century; the phenomena that authors such as Condillac, Leibnitz or Stewart discussed under the label of attention were essentially the same that we still class under this heading. Second, diverse theoretical concepts of attention evolved. Until the late 18th and early 19th century, theoretical discussions of attention were usually embedded in more general philosophical issues; often epistemological, but sometimes even theological, as in Malebranche (1967a). Stewart (1792) may have been the first author who wrote a whole chapter on attention in which he discussed it as a problem in its own right. Most of these early authors failed to draw a clear distinction between the empiric and theoretical meanings of the term attention. Theoretical considerations were usually based on one particular empiric aspect of attention, which was assumed to reflect its nature. This style of theorizing continued into the 19th century, when attention became one of the central subjects of scientific psychology. In his comprehensive work on 19th century theories of attention, Pillsbury (1908) concluded that 'each has picked out some more or less important concomitant process
or some aspect of attention and regarded it as the explanation of all the remaining parts or aspects.' (Pillsbury, 1908, p. 292). This suggests distinguishing the different historical views of attention according to the empiric aspects of attention upon which their theorizing was mainly built. These theoretical strands have been continued into modern theorizing, so that the differences between them can also be used to classify the present-day approaches. Neumann (1971) reviewed the development of the empiric concept of attention and suggested that it has encompassed three main components: attention is limited (the capacity aspect), it is selective (the selection aspect), and it can be allocated voluntarily (the activity aspect). Each of these aspects has been stressed by one theoretical tradition. A division into these three major strands provides a convenient way of ordering the different views of the nature and functions of attention, although there have certainly been theories that have belonged to more than one of these traditions.
4.2 Coping with Limited Capacity
The first strand of theorizing has focused on the limited-capacity aspect of attention, the 'narrowness of consciousness' in classical terminology. It dates back to Aristotle and has, of course, dominated most modern theorizing. Its main concern has been to explain how and why capacity is limited, and how this limitation is dealt with. The attentional phenomena therefore fall into two classes: those that are the direct consequence of the limitation (e.g. interference), and those that can be attributed to coping with it (e.g. selection). In filter theory, each of these two phenomena had its own underlying mechanism (the P system and the filter), whereas capacity supply and resource theories have attributed them to two aspects of the same construct, capacity: it is limited, and it can be selectively allotted. As to the functions of attention, limitation does not have a function; it just arises from the physical characteristics of the system. If attention can be said to have a function, it resides in the selection component or aspect, which serves to counteract some of the adverse effects of the limitation. Selection is, however, not the only possible way of coping with limited capacity. Probably the first consideration of how interference can be coped with dates back to Aristotle (see Neumann, 1995). As mentioned earlier, Aristotle developed his theory of attention in one of his shorter writings, 'On the senses and the sense objects'. The basic observation was that we fail to notice even what occurs before our eyes if we are lost in thought, or frightened, or if we hear a loud noise. The reason is, according to Aristotle, that a stronger movement (in the soul) inhibits a weaker movement, so that we do not perceive the weaker stimulus.
This was a mechanistic view of attention that attributed limited capacity to a mutual interference between processing operations; an idea that has reappeared in modern theories such as those of Allport (1980) and Navon (1985), discussed earlier. Aristotle did not content himself with explaining why capacity is limited; he also discussed when and how interference can be prevented. He argued that, if both stimuli are about equally strong and are in the same sense modality, neither will completely inhibit the other. Rather, they will be mixed and form a common percept. Though this renders each of them more difficult to perceive, this has the
positive effect that they can be perceived simultaneously. This answers the question (which Aristotle had already discussed in 'On the soul') whether several things can be perceived at the same time: they can, but only if they are integrated into a single whole. Interference can be overcome by integration. This idea has been revived in modern theorizing. For example, Neumann (1987) has suggested that alternative actions inhibit each other, but that this inhibition can be overcome if they are controlled by either a common superordinate skill or a common action plan. Task integration turns a dual-task situation into a single-task situation as, for example, in the case of the eye-voice span in oral reading, where the reader identifies one part of the text while at the same time pronouncing a different text passage, identified earlier. Recently, Korteling (1994) has taken up these ideas and developed them within a cognitive neuroscience framework. Besides selection and integration, automatization has long been discussed as a further means of preventing or overcoming capacity limitations. The idea that part of our brain processes are automatic was clearly expressed by Descartes (1973), who believed that all processes that can also be found in animals are automatic, i.e. they are performed without an intervention of the soul. Malebranche (1967b) developed this idea in his discussion of perception. Anticipating Helmholtz's (1924) doctrine of unconscious inference, he suggested that we perceive distance in depth by means of a 'natural judgement', which occurs so fast that we do not become aware of it. In the 18th century there was a lively debate about whether automatic processes occur without attention, as maintained, for example, by Hartley (1749), or whether they are accompanied by fast shifts of attention that we do not, however, consciously remember. 
The latter position was defended by Stewart (1792), who used the example of the juggler who shifts attention rapidly between balls, while Hartley's example was playing the cembalo while at the same time conducting a conversation (an example that was experimentally explored by Allport et al., 1972!). The idea that automatic processes are not subject to capacity limitations and interference was commonly held in the 19th century (see Neumann, 1989). As discussed in earlier sections, the distinction between automatic and controlled processes returned in psychology in the 1970s (see also Chapter 6). The automatic-controlled distinction is an example of an issue that cuts across the boundaries of the three theoretical traditions. Within the capacity-oriented view of attention, it expressed the idea that there are capacity-free processes. Within the activity-oriented view of attention to which we now turn, it has referred to processes that are not under top-down control.
4.3 Perceptual Activity and Voluntary Control
This second thought tradition has had its roots in the dualist doctrine that was largely founded by Plato. It was developed in the Roman era by philosophers such as Lucretius and Augustinus, and has had many adherents throughout the history of theorizing about attention, from Descartes and Malebranche to Wundt and James. Its proponents have viewed attentional processes as an inner activity, often conceptualized as a voluntary effort. For Lucretius, conscious perception could occur only if the soul directs itself towards a sense object. Augustinus maintained a radically dualist position, according to which matter could not causally affect the
soul. Perception was not a direct result of stimulation, but an active response, which requires an effort by the soul, called 'attentio' (from ad = towards and tendere = to tense). Elements of this doctrine appeared in Descartes' theory of attention, according to which the soul directs attention by moving the pineal gland in an appropriate direction or by maintaining it in a fixed position, as in the case of sustained attention. Malebranche took up this view and accentuated it by speaking of the 'labor of attention', which is required to resist temptations of the soul. In the 19th century, Wundt and James were the most prominent proponents of this view of attention. Wundt's theoretical term for attention was apperception, which he considered to be an inner voluntary action. For James, voluntary attention was an expectation or anticipation: a 'preperception', which preactivates brain centers for the expected or anticipated object and may be regarded as 'half of the perception of the looked-for thing,' (James, 1890/1950, p. 442). It should be noted that both Wundt and James recognized that there is also a passive, involuntary form of attention. This was also in the tradition of Descartes, who had called this involuntary attention 'admiration'. All theories within this tradition have related attention to conscious perception and to an inner, voluntary activity. However, they have differed with respect to the relative emphasis that they have put on these two aspects. Augustinus wanted to explain perception, and he did so by invoking an activity of the soul. Malebranche focused on attention as a voluntary action, not necessarily related to perception. Wundt put about equal emphasis on both aspects, but regarded volition as the essence of attention, whereas James' theory was one of sensory attention, and he did not generally equate attention with volition. The modern variants of the attention-as-activity approach exhibit an even sharper segregation. 
Some have focused on the effort aspect without relating it explicitly to perception (effort theories), while others have taken up the perception aspect without relating it to volition (construction theories). Both directions appeared relatively late within modern theorizing about attention. In the original version of filter theory there had been no place for top-down processes. This was true for the P system as well as for the filter itself. At the end of Perception and Communication, Broadbent (1958) summarized his theory in 12 postulates. Three were concerned with selection. They referred to factors such as stimulus properties and previous reinforcements, but did not mention the voluntary control of selection. Though the subjects in dichotic listening experiments were instructed to attend intentionally to one of the channels, the theory actually failed to explain how this could be done! Theories that included top-down processes were put forward in the early 1970s. Broadbent (1971) extended filter theory to include regulatory energetical mechanisms, an 'upper' and a 'lower' mechanism (see Chapter 7). Kahneman (1973) presented his theory of effort, and Pribram and McGuinness (1975) introduced the distinction between arousal, activation and effort as three regulatory mechanisms, a scheme that was later taken up by Sanders (1983). These theories have been discussed in earlier sections. In our present context, the important thing to note is that at least one proposed type of mechanism (Broadbent's, 1971, upper mechanism; the effort mechanisms of Pribram and McGuinness, 1975, and Sanders, 1983; and in part Kahneman's, 1973, effort) was in the tradition founded by theorists such as Augustinus, Descartes and Malebranche: the function of attention is top-down control, and this requires effort to counteract adverse influences or compensate for
insufficiencies of other mechanisms. More recent approaches have continued this line of thought. Often they focus on voluntary control, with less emphasis on energetics (Norman and Shallice, 1986; Posner and Petersen, 1990). While these modern theories have continued the line of thought that relates attention to volition, a second strand has focused on perception as the product of active attentional processes. As we have seen, Augustinus had maintained that conscious perception is not simply a consequence of stimulation, but the result of an activity of the soul that he called attention. This basic idea has taken many forms in subsequent theorizing. All theories of this type have shared the assumption that perception is not exclusively determined by sensory stimulation, but also by central processes that interpret the stimuli and actively construct a coherent representation of the world. Usually, this distinction between two types of determinants has been paralleled by a distinction between two stages of processing: a first, unselective, data-driven ('automatic') stage and a second, selective ('controlled') stage at which the constructive processes take place. Theories of this type have therefore in most cases also been selective transition theories, and several of them have already been discussed in the section on selective transition or selective modulation. In our present perspective, the interesting aspect is historical continuity. Several of the selective transition theories are in the tradition of Augustinus, Descartes, and of Leibnitz, on whom Wundt's doctrine of apperception was based. As we have already seen, Wundt distinguished between the field of consciousness and the focus of consciousness. Apperception selected the focus of consciousness. However, this was not its only function. There was not only analytic apperception, which isolates a content, but also synthetic apperception, which creates new wholes, from perceptual structures to concepts. 
Thus, apperception was involved both in perception and in all higher cognitive activities. In modern theorizing, it was probably Neisser (1967) who came closest to this view of attention. For Neisser, one of the functions of attention was to select the 'field of focal attention' (Neisser, 1967, p. 88), thereby making the stimuli in it available for further analysis. However, this was actually more than an analysis. According to Neisser, it is 'important to think of focal attention as a constructive, synthetic activity' (Neisser, 1967, p. 94). Like Wundt's apperception, this synthetic activity was not restricted to perception. In his chapter on memory and thought, Neisser proposed that the 'secondary processes of directed thought and deliberate recall are like focal attention in vision. They are serial in character, and construct ideas and images' (Neisser, 1967, p. 304). This was essentially Wundt's doctrine of apperception, and even the terms were similar ('focus', 'synthesis'). Neisser's (1967) two-stage view was taken up by the two-process theories of the late 1970s (Posner and Snyder, 1975; Schneider and Shiffrin, 1977; for an analysis of the similarities between these theories and Wundt's doctrine see Neumann, 1989). A more recent view of attention that belongs to this tradition is Marcel's (1983) two-stage model, already mentioned in the section on selective transition or selective modulation. In the present context, the point of interest is that Marcel not only proposed a selective transition, but that his second stage involved a synthetic activity. Phenomenal experience consists of 'the imposition of a particular interpretation' (Marcel, 1983, p. 243), which is performed by 'a constructive act' (p. 245) in 'an attempt to make sense of as much data as possible' (p. 248), which involves 'the use of hypotheses as frameworks to both recover and synthesize the information'
(p. 249). As Marcel (1983) stressed, this is more than a selective transition. Access to consciousness is not simply like opening a door, rather it involves an inferential step (p. 250). The idea that attention serves a constructive, synthetic function can also be found in theories of a more limited scope. The most prominent example is Treisman's feature integration theory, discussed earlier (Treisman, 1988, 1992; Treisman and Gelade, 1980). Though it differs considerably from theories such as those of Neisser (1967) and Marcel (1983), its view of the basic function of attention is similar: attention is required to integrate the features of an object that are represented in separate maps at the preattentive level. In this sense, attention has a synthetic function. Unlike Neisser (1967), Treisman does not, however, extend this synthetic function to processes beyond the level of object perception, and she does not relate it to an inner activity. Another example of such a more restricted synthetic function of attention is provided by theories such as that of Underwood (1976), according to which attention is required to integrate novel stimulus sequences, as in sentence comprehension (see Chapter 6).
4.4 Selection, Memory and Action Control
In both theoretical strands discussed so far, selection has been regarded as an important aspect of attention, but not as its central aspect. Within the capacity-oriented tradition, selection has been a functional consequence of limited capacity. In the activity-oriented tradition, it has been one of the functions of attentional activity, besides synthesis. The thought tradition to which we now turn has regarded selection as the basic feature of attention. This tradition had its earliest roots in Locke's associationism and was worked out by Condillac and Stewart in the 18th century, whose theories have already been mentioned. Among its most prominent 19th century proponents were Helmholtz, Ribot, Titchener and Ziehen. Much of the recent research on selective attention, e.g. the cueing experiments discussed earlier, has been in its spirit. Modern theories of attention that belong to it have been put forward by Deutsch and Deutsch (1963), Allport (1987, 1989) and van der Heijden (1981, 1992), among others. The theorists in this tradition have shared not only similar views of attention, but also a common scientific attitude. It has been more mechanistic than that of the other two traditions, and more critical of commonsense concepts. It has tended to detect problems where others believed they had found solutions, and it has not shied away from introspectively implausible assumptions. Its proponents have often been polemical against capacity- and activity-oriented approaches, which they have regarded as nonexplanatory. Many, though not all, theorists in this tradition have tried to explain why attention exists at all, i.e. what its functions are for the organism. As discussed earlier, the concept of limited capacity was the theoretical derivative of the phenomenal experience that the range of attention is limited, and the concept of attention as an inner voluntary activity was the theoretical counterpart of the observation that we can direct our attention at will.
The key to an understanding of the selection-oriented approach is that its proponents have denied that simply giving a phenomenal experience the status of a theoretical construct is an
acceptable explanation. At a functional level, attention may well be fundamentally different from its phenomenology. Some of these theorists have maintained that capacity is unlimited, although the range of phenomenal attention is not, and all have proposed that selection occurs according to mechanical principles, despite our phenomenal experience that the self is directing it. The reality of these phenomenal experiences has, of course, not been denied by these theorists. But they have regarded them as the explanandum and not the explanans. The explanans has been sought in the mechanisms of attentional selection. The argument between the voluntarist, activity-oriented and the mechanistic, selection-oriented approach was, in a way, preprogrammed by Descartes' views of attention (for details see Neumann, 1995, ch. 1). Descartes the metaphysician regarded attention as an activity of the soul that moves the pineal gland (see above). Descartes the physiologist held a mechanistic view according to which the basis of attention is an afflux of little particles in the nerves (animal spirits) to the locus in the brain where an impression has been generated, thereby enhancing this impression. The former concept of attention was taken up by nativists such as Leibnitz, while the latter was further developed by empiricist theorists. The empiricist tradition dates back to John Locke. For Locke, the mind consisted of ideas and their associations. Among the ideas were sensations, and all sensations were conscious. Thus, attended stimuli were simply those that caused a conscious sensation, and unattended stimuli were those that, for various reasons, did not. There was no need for an inner attentional activity. This was both a mechanistic, selection-oriented view and an early selection view. However, early selection is not a necessary feature of selection-oriented theories. In fact, many have been late selection theories.
What unites these theories is not their position in the early versus late selection debate, but their conviction that, to explain attention, one has to understand how selection works. Perhaps the most outspoken proponent of this position in the 19th century was Theodor Ziehen, who was strongly influenced by British associationism. Ziehen (1890) argued vigorously against Wundt's concept of apperception, which he regarded as a cheap deus ex machina that does not explain anything. He held an early selection position. There were two processing stages, the stage of sensations and that of ideas. We become aware of sensations only when they are assimilated to an idea. Attentional selection is based on two kinds of competition. First, all latent ideas (ideas that are not presently conscious) compete for becoming conscious, and only the most strongly activated idea will actually enter consciousness. Second, there is a competition among the sensations. Sensations that are assimilated to the winning idea become conscious, while all others remain unconscious and have no effect whatsoever. The outcomes of these competitions are determined by many factors, which Ziehen analyzed in detail, e.g. the intensity of the sensations, their similarity to the latent ideas, the emotionality of the sensations, and contextual facilitation or inhibition of ideas (the so-called constellation). As can be seen, the architecture of this theory is quite similar to that of contemporary connectionist models of attention (Cohen et al., 1990; Phaf et al., 1990). There are two layers (sensations and ideas); there are facilitatory connections between them (sensations activate ideas); and all units within a layer inhibit each other. Further, the interplay between facilitation and inhibition leads to the effect that only one idea will win the competition and become conscious ('winner-takes-all'). The similarity to the model of van der Heijden (1992), which describes
attention as selection, based on intermodule facilitation and intramodule competition, is also readily apparent. Ziehen summarized his theory with the statement that the 'act of attending consists of nothing but this selection among simultaneous sensations' (Ziehen, 1920, p. 428). A modern author who used almost the same words to characterize his theory was Neisser (1976). His formula was that 'attention is nothing but perception' (Neisser, 1976, p. 87). However, perception was for Neisser (1976) an active process, so that his theory may belong more properly to the attention-as-activity camp. There is one assumption in Ziehen's theory that he does not justify: only one idea can win the competition, and therefore only one group of sensations (those that are assimilated to the winning idea) can become conscious. Ziehen's failure to explain why this should be the case points to a general problem of selection-oriented theories: why should there be selection at all? Capacity-oriented and activity-oriented approaches can easily answer this question (because capacity is limited, because only one activity can be carried out at a time), but an analysis of the mechanisms of selection as such does not yet provide an answer. Several selection-oriented theorists have recognized this gap. Basically, two kinds of answer have been suggested. One is that attention serves the selective storage of experiences; the other is that selection is needed for the control of action. The idea that selection is related to memory storage was first clearly expressed by Stewart (1792). As will be remembered, Stewart adhered to Condillac's late selection theory. Like Condillac, he maintained that all stimuli are represented in consciousness, but those that are not attended will be immediately forgotten, and therefore we do not know of them. Thus, the effect of attention is to create a permanent memory trace. Without attention we 'have no recollection or memory whatever' (Stewart, 1792, p. 108).
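Ziehen's two competitions, as described earlier in this section, lend themselves to a small computational sketch. The following is not Ziehen's own formalism, nor the model of Cohen et al. (1990) or Phaf et al. (1990); it is a minimal illustration under stated assumptions. The names (`compete`, `attend`), the example sensations and ideas, the numerical values, the inhibition rate and the assimilation threshold are all hypothetical.

```python
def compete(activations, inhibition=0.2, steps=50):
    """Mutual inhibition within one layer: each unit is suppressed in
    proportion to the summed activity of its rivals (floored at zero)."""
    acts = list(activations)
    for _ in range(steps):
        total = sum(acts)
        acts = [max(0.0, a - inhibition * (total - a)) for a in acts]
        if sum(1 for a in acts if a > 0) <= 1:
            break  # at most one unit is still active
    return acts


def attend(intensity, similarity, context, threshold=0.5):
    """Return the winning idea and the sensations assimilated to it.

    intensity:  sensation -> strength of the sensation
    similarity: sensation -> {idea -> facilitatory weight}
    context:    idea -> contextual bias (the 'constellation')
    """
    ideas = list(context)
    # Facilitation between layers: sensations activate similar ideas.
    drive = [
        context[i] + sum(intensity[s] * similarity[s][i] for s in intensity)
        for i in ideas
    ]
    # Competition within the idea layer.
    final = compete(drive)
    winner = ideas[final.index(max(final))]
    # Only sensations assimilated to the winning idea become conscious.
    conscious = [s for s in intensity if similarity[s][winner] >= threshold]
    return winner, conscious


intensity = {"bell": 0.9, "light": 0.5, "touch": 0.4}
similarity = {
    "bell": {"church": 0.9, "lamp": 0.1},
    "light": {"church": 0.2, "lamp": 0.8},
    "touch": {"church": 0.1, "lamp": 0.1},
}

print(attend(intensity, similarity, {"church": 0.0, "lamp": 0.0}))
print(attend(intensity, similarity, {"church": 0.0, "lamp": 0.6}))
```

In this toy setting a contextual bias in favor of one idea can reverse the outcome of the competition even though the sensations are unchanged. The sketch also makes the unexplained assumption visible: that only one idea survives is built into the inhibition rule rather than derived from anything else.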
Stewart then goes on to analyze the usefulness of such a selective storage. We need not store everything, but we should store what is important for the future; therefore 'the great use of attention and memory is to enable us to treasure up the results of our experience and reflexion for the future regulation of our conduct' (Stewart, 1792, p. 117). The notion that attention may be related to memory has played a certain role in modern theorizing, although it has rarely been proposed that controlling access to memory is the function of attention. Deutsch and Deutsch (1963) proposed that the consequence of attentional selection is to 'switch in further processes, such as motor output, memory storage, and whatever else it may be that leads to conscious awareness' (p. 84). Both filter theory and corresponding models of visual information processing such as that of Sperling (1960) assumed that nonselected stimuli do not contact short-term or long-term memory. Van der Heijden (1981) proposed explicitly that attention has the effect of preventing 'short term visual information forgetting'. The second answer to the question of why selection takes place was probably first explicitly formulated in the late 19th century. As already briefly mentioned, Théodule Ribot (1906) suggested a view of attention that was radically different from that of most of his contemporaries. He did not start from introspection, but from biological considerations. For him, attention was necessary to select the impressions from the environment that are required to control action. 'Attention is in the service of need and depends on it' (Ribot, 1906, p. 44). Ribot went on to distinguish between a primitive, reflexive form of attention and voluntary attention in adult humans, which he localized in the prefrontal cortex.
Henning (1925) developed these ideas into a detailed evolutionary theory of attention. At the lowest level, behavior is not yet integrated and different effectors may show incongruent behavior, as when different tentacles of the sea anemone act independently. At the next level, the organism acts as a unit. This is achieved by the mutual inhibition between alternative action tendencies. In higher vertebrates, this is insufficient, due to their highly developed sensory systems, which provide a wealth of potentially action-relevant stimuli at any moment in time. Coordinated action is achieved by sensory selection, the 'limited range of consciousness'. Finally, humans, who have to cope with a still much larger number of potential actions, possess an additional attentional system, which permits the selection of stimuli for action control not according to their strength, but according to their meaning and importance. These theories were the predecessors of the action-oriented approach to attention of contemporary theorists such as Allport (1987, 1989), Neumann (1983, 1987, 1995) and van der Heijden (1987, 1992), already discussed in previous sections. The ideas that selection is in the service of action control and that it serves memory storage are not mutually incompatible. Neumann (1990) has suggested that there are phylogenetically old selection mechanisms that subserve the immediate control of action (selection-for-action in Allport's, 1987, terms), and that the evolution of mammals has produced a second function of attentional mechanisms, the updating of an internal representation of the world. The latter function is essentially the same as that envisaged by Stewart.
4.5 What is Attention?
Like all psychological concepts, the notion of attention was initially based on common experience and, like most psychological concepts, it referred initially to a unitary entity, which dissolved into components, as research and theoretical sophistication progressed. Consider the concept of memory. Today we know that there is no unitary memory. It has been suggested that there is short-term and long-term memory, episodic and semantic, procedural and declarative, explicit and implicit memory, and so on. Similarly, there is every reason to believe that the term 'attention' does not refer to a unitary entity or mechanism. This should not prevent us from using the term, but it should be clear that it is a descriptive term, and that it describes the effects of a variety of mechanisms. It is therefore by no means clear whether the different approaches to attention are really incompatible. Possibly, at least some of them, which were meant by their authors as models of attention as such, are valid, and mutually compatible, as local models of specific mechanisms.
REFERENCES
Allport, D. A. (1980). Attention and performance. In G. Claxton (Ed.), Cognitive Psychology: New Directions. London: Routledge and Kegan Paul.
Allport, D. A. (1987). Selection for action: Some behavioral and neurophysiological considerations of attention and action. In H. Heuer and A. F. Sanders (Eds), Perspectives on Perception and Action. Hillsdale, NJ: Erlbaum.
Allport, D. A. (1988). What concept of consciousness? In A. J. Marcel and E. Bisiach (Eds), Consciousness in Contemporary Science. Oxford: Clarendon Press.
Allport, D. A. (1989). Visual attention. In M. I. Posner (Ed.), Foundations of Cognitive Science. Cambridge, MA: MIT Press.
Allport, D. A. (1993). Attention and control: Have we been asking the wrong questions? In D. E. Meyer and S. Kornblum (Eds), Attention and Performance 14. Cambridge, MA: MIT Press.
Allport, D. A., Antonis, B. and Reynolds, P. (1972). On the division of attention: A disproof of the single channel hypothesis. Quarterly Journal of Experimental Psychology, 24, 225-235.
Allport, D. A., Tipper, S. P. and Chmiel, N. R. J. (1985). Perceptual integration and postcategorical filtering. In M. I. Posner and O. S. Marin (Eds), Attention and Performance 11 (pp. 107-132). Hillsdale, NJ: Erlbaum.
Atkinson, R. C. and Shiffrin, R. M. (1968). Human memory: A proposed system and its control processes. In K. W. Spence and J. T. Spence (Eds), The Psychology of Learning and Motivation: Advances in Research and Theory, vol. 2. New York: Academic Press.
Attneave, F. (1959). Applications of Information Theory to Psychology: A Summary of Basic Concepts, Methods and Results. New York: Holt, Rinehart and Winston.
Averbach, E. and Coriell, A. S. (1961). Short-term memory in vision. Bell System Technical Journal, 40, 309-328.
Baddeley, A. D. (1966). The capacity of generating information by randomization. Quarterly Journal of Experimental Psychology, 18, 119-129.
Bekker, I. (Ed.) (1831). Aristoteles Opera. Berlin: Preußische Akademie der Wissenschaften. Reprint: Berlin: Gruyter, 1970.
Berlyne, D. E. (1969). The development of the concept of attention in psychology. In C. R. Evans and T. Mulholland (Eds), Attention as a Neurophysiological Concept. London: Butterworths.
Bornemann, E. (1942). Untersuchungen über den Grad der geistigen Beanspruchung. (Investigations on the degree of mental load.) Arbeitsphysiologie, 12, 142-191.
Broadbent, D. E. (1952a). Speaking and listening simultaneously. Journal of Experimental Psychology, 43, 267-273.
Broadbent, D. E. (1952b). Listening to one of two synchronous messages. Journal of Experimental Psychology, 44, 51-55.
Broadbent, D. E. (1952c). Failures of attention in selective listening. Journal of Experimental Psychology, 44, 428-433.
Broadbent, D. E. (1954). The role of auditory localization in attention and memory span. Journal of Experimental Psychology, 47, 191-196.
Broadbent, D. E. (1958). Perception and Communication. London: Pergamon Press.
Broadbent, D. E. (1963). Flow of information within the organism. Journal of Verbal Learning and Verbal Behavior, 2, 34-39.
Broadbent, D. E. (1971). Decision and Stress. New York: Academic Press.
Broadbent, D. E. (1982). Task combination and selective intake of information. Acta Psychologica, 50, 253-290.
Brooks, L. R. (1968). Spatial and verbal components of the act of recall. Canadian Journal of Psychology, 22, 349-368.
Campbell, A. J. and Mewhort, D. J. K. (1980). On familiarity effects in visual information processing. Canadian Journal of Psychology, 34, 134-154.
Carr, T. H., McCauley, C., Sperber, R. D. and Parmelee, C. M. (1982). Words, pictures and priming: On semantic activation, conscious identification and the automaticity of information processing. Journal of Experimental Psychology: Human Perception and Performance, 8, 757-777.
Cohen, J. D., Dunbar, K. and McClelland, J. L. (1990). On the control of automatic processes: A parallel distributed processing account of the Stroop effect. Psychological Review, 97, 332-361.
Coltheart, M. (1984). Sensory memory. A tutorial review. In H. Bouma and D. G. Bouwhuis (Eds), Attention and Performance 10. Hillsdale, NJ: Erlbaum.
Condillac, E. de (1947). Traité des sensations. (Treatise on sensations.) Oeuvres Philosophiques de Condillac, vol. 1. Paris: Presses Universitaires de France.
Corteen, R. S. and Wood, B. (1972). Autonomic responses to shock-associated words in an unattended channel. Journal of Experimental Psychology, 94, 308-313.
Craik, F. I. M. and Lockhart, R. S. (1972). Levels of processing: A framework for memory research. Journal of Verbal Learning and Verbal Behavior, 11, 671-684.
Dark, V., Johnston, W. A., Myles-Worsley, M. and Farah, M. J. (1985). Levels of selection and capacity limits. Journal of Experimental Psychology: General, 114, 472-497.
Dawson, M. E. and Schell, A. M. (1982). Electrodermal responses to attended and unattended significant stimuli during dichotic listening. Journal of Experimental Psychology: Human Perception and Performance, 8, 315-324.
Dennis, I. (1977). Component problems in dichotic listening. Quarterly Journal of Experimental Psychology, 29, 437-450.
Descartes, R. (1973). Les passions de l'âme. (The passions of the soul.) Oeuvres Philosophiques de Descartes, vol. 3. Paris: Garnier.
Deutsch, J. A. and Deutsch, D. (1963). Attention: Some theoretical considerations. Psychological Review, 70, 80-90.
Duncan, J. (1984). Selective attention and the organization of visual information. Journal of Experimental Psychology: General, 113, 501-517.
Dürr, E. (1907). Die Lehre von der Aufmerksamkeit. (The doctrine of attention.) Leipzig: Quelle and Meyer.
Erdmann, B. (1920). Grundzüge der Reproduktionspsychologie. (Essentials of reproduction psychology.) Berlin, Leipzig: Vereinigung wissenschaftlicher Verleger.
Eriksen, C. W. and Hoffman, J. E. (1973). The extent of processing of noise elements during selective encoding from visual displays. Perception and Psychophysics, 14, 155-160.
Eriksen, C. W. and Murphy, T. D. (1987). Movement of attentional focus across the visual field: A critical look at the evidence. Perception and Psychophysics, 42, 299-305.
Eriksen, C. W. and Schultz, D. W. (1978). Temporal factors in visual information processing: A tutorial review. In J. Requin (Ed.), Attention and Performance 7. Hillsdale, NJ: Erlbaum.
Eriksen, C. W. and St James, J. D. (1986). Visual attention within and around the field of focal attention: A zoom lens model. Perception and Psychophysics, 40, 225-240.
Fisher, S. (1975). The microstructure of dual-task interaction. 1. The patterning of main-task responses within secondary-task intervals. Perception, 4, 267-290.
Friedman, A. and Polson, M. C. (1981). Hemispheres as independent resource systems: Limited-capacity processing and cerebral specialization. Journal of Experimental Psychology: Human Perception and Performance, 7, 1031-1058.
Garner, W. R. (1962). Uncertainty and Structure as Psychological Concepts. New York: Wiley.
Goebel, R. (1991). Binding, episodic short-term memory and selective attention, or why are PDP models poor at symbol manipulation? In D. S. Touretzky, J. L. Elman, T. J. Sejnowski and G. E. Hinton (Eds), Connectionist Models. Proceedings of the 1990 Summer School. San Mateo, CA: Morgan Kaufmann.
Gopher, D. and Sanders, A. F. (1984). 'S-Oh-R': Oh stages! Oh resources! In W. Prinz and A. F. Sanders (Eds), Cognition and Motor Processes. Berlin: Springer.
Hartley, D. (1749/1971). Observations on Man, his Frame, his Duty and his Expectations. New York: Garland.
Helmholtz, H. v. (1924). Treatise on Physiological Optics. Rochester: Optical Society of America.
Henning, H. (1925). Die Untersuchung der Aufmerksamkeit. (The study of attention.) In E. Abderhalden (Ed.), Handbuch der Biologischen Arbeitsmethode, Abt. 6, Teil 3. Berlin: Urban and Schwarzenberg.
Heuer, H. (1984). Motor learning as a process of structural constriction and displacement. In W. Prinz and A. F. Sanders (Eds), Cognition and Motor Processes. Berlin: Springer.
Heuer, H. (1985). Some points of contact between models of central capacity and factor-analytic models. Acta Psychologica, 60, 135-155.
Hick, W. E. (1952). On the rate of gain of information. Quarterly Journal of Experimental Psychology, 4, 11-26.
Hirst, W. and Kalmar, D. (1987). Characterizing attentional resources. Journal of Experimental Psychology: General, 116, 68-81.
Holender, D. (1986). Semantic activation without conscious identification in dichotic listening, parafoveal vision and visual masking: A survey and appraisal. Behavioral and Brain Sciences, 9, 1-66.
Hughes, H. C. and Zimba, L. D. (1987). Natural boundaries for the spatial spread of directed visual attention. Neuropsychologia, 25, 5-18.
James, W. (1890/1950). The Principles of Psychology, vol. 1. Reprint. New York: Dover Publications.
Johnston, W. A. and Heinz, S. P. (1979). Depth of nontarget processing in an attention task. Journal of Experimental Psychology, 5, 168-175.
Kahneman, D. (1973). Attention and Effort. Englewood Cliffs, NJ: Prentice Hall.
Kantowitz, B. H. (1974). Double stimulation. In B. H. Kantowitz (Ed.), Human Information Processing: Tutorials in Performance and Cognition. Potomac, MD: Erlbaum.
Kantowitz, B. H. and Knight, S. L. (1976). Testing tapping time-sharing, II: Auditory secondary task. Acta Psychologica, 40, 343-362.
Keele, S. W. (1973). Attention and Human Performance. Pacific Palisades, CA: Goodyear.
Keele, S. W. and Neill, W. T. (1978). Mechanisms of attention. In E. C. Carterette and M. P. Friedman (Eds), Handbook of Perception, vol. 9: Perceptual Processing. New York: Academic Press.
Kerr, B. (1973). Processing demands during mental operations. Memory and Cognition, 1, 401-412.
Kinsbourne, M. and Hicks, R. E. (1978). Functional cerebral space: A model for overflow, transfer and interference effects in human performance: A tutorial review. In J. Requin (Ed.), Attention and Performance 7. Hillsdale, NJ: Erlbaum.
Koch, R. (1993). Die psychologische Refraktärperiode. (The psychological refractory period.) Unpublished doctoral dissertation, Ludwig-Maximilian-University, Munich.
Korteling, J. E. (1994). Multiple-Task Performance and Aging. Groningen: Bariet, Ruinen.
Kuhl, J. (1995). Wille und Freiheitserleben: Formen der Selbststeuerung. (Volition and the experience of freedom: Forms of self control.) In J. Kuhl and H. Heckhausen (Eds), Motivation, Volition und Handlung. Enzyklopädie der Psychologie, Series 'Motivation und Emotion', vol. 4. Göttingen: Hogrefe, in press.
LaBerge, D. (1981). Automatic information processing: A review. In J. Long and A. Baddeley (Eds), Attention and Performance 9. Hillsdale, NJ: Erlbaum.
LaBerge, D. and Samuels, S. J. (1974). Toward a theory of automatic information processing in reading. Cognitive Psychology, 6, 292-323.
Lewis, J. L. (1970). Semantic processing of unattended messages using dichotic listening. Journal of Experimental Psychology, 85, 225-228.
Lovie, A. D. (1983). Attention and behaviorism: Fact and fiction. British Journal of Psychology, 74, 301-310.
MacKay, D. G. (1973). Aspects of the theory of comprehension, memory, and attention. Quarterly Journal of Experimental Psychology, 25, 22-40.
Malebranche, N. de (1967a). Méditations chrétiennes. (Christian meditations.) Oeuvres de Malebranche. Paris: J. Vrin.
Malebranche, N. de (1967b). La recherche de la vérité. (The search for truth.) Oeuvres de Malebranche. Paris: J. Vrin.
Marcel, A. J. (1980). Conscious and preconscious recognition of polysemous words: Locating selective effects of prior verbal context. In R. S. Nickerson (Ed.), Attention and Performance 8. Hillsdale, NJ: Erlbaum.
Marcel, A. J. (1983). Conscious and unconscious perception: An approach to the relations between phenomenal experience and perceptual processes. Cognitive Psychology, 15, 238-302.
McDougall, W. (1902). The physiological factors of the attention-process (I). Mind, 11, 316-351.
McDougall, W. (1903). The physiological factors of the attention-process (II and III). Mind, 12, 289-302, 473-488.
McDougall, W. (1906). The physiological factors of the attention-process (IV). Mind, 15, 329-359.
McLeod, P. (1977). A dual-task response modality effect: Support for multiprocessor models of attention. Quarterly Journal of Experimental Psychology, 29, 651-667.
Merikle, P. M. (1980). Selection from visual persistence by perceptual groups and category membership. Journal of Experimental Psychology: General, 109, 279-295.
Mewhort, D. J. K., Campbell, A. J., Marchetti, F. M. and Campbell, J. I. D. (1981). Identification, localization, and 'iconic memory': An evaluation of the bar-probe task. Memory and Cognition, 9, 50-67.
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81-97.
Moray, N. (1959). Attention in dichotic listening: Affective cues and the influence of instructions. Quarterly Journal of Experimental Psychology, 11, 56-60.
Moray, N. (1967). Where is capacity limited? A survey and a model. Acta Psychologica, 27, 84-92.
Morton, J. (1969). Interaction of information in word recognition. Psychological Review, 76, 165-178.
Navon, D. (1984). Resources. A theoretical soup stone? Psychological Review, 91, 216-234.
Navon, D. (1985). Attention division or attention sharing? In M. I. Posner and O. S. M. Marin (Eds), Attention and Performance 11. Hillsdale, NJ: Erlbaum.
Navon, D. and Gopher, D. (1979). On the economy of the human processing system. Psychological Review, 86, 214-255.
Navon, D. and Miller, J. (1987). Role of outcome conflict in dual-task interference. Journal of Experimental Psychology: Human Perception and Performance, 13, 435-448.
Neisser, U. (1967). Cognitive Psychology. New York: Appleton-Century-Crofts.
Neisser, U. (1976). Cognition and Reality. San Francisco, CA: Freeman.
Neisser, U. and Becklen, R. (1975). Selective looking: Attending to visually specified events. Cognitive Psychology, 7, 480-494.
Neumann, O. (1971). Aufmerksamkeit. (Attention.) In J. Ritter (Ed.), Historisches Wörterbuch der Philosophie, vol. 1. Basel, Stuttgart: Schwabe.
Neumann, O. (1978). Aufmerksamkeit als 'zentrale Verarbeitungskapazität'. Anmerkungen zu einer Metapher. (Attention as 'central processing capacity': Remarks on a metaphor.) In M. Tücke and W. Deffner (Eds), Proceedings of the 2nd Osnabrück Psychology Workshop. Osnabrück: Universität Osnabrück.
Neumann, O. (1980). Informationsselektion und Handlungssteuerung. (Information selection and action control.) Doctoral dissertation, Faculty of Philosophy, Ruhr-University Bochum.
Neumann, O. (1981). Interferenz beim Beachten simultaner sprachlicher Texte: Unspezifische Kapazitätsbegrenzung oder spezifische Verarbeitungsschwierigkeiten? (Interference in attending to simultaneous verbal messages: Unspecific capacity limitation or specific processing problems?) Reports from the Cognitive Psychology Unit, Ruhr-University Bochum (FRG), 18.
Neumann, O. (1983). Über den Zusammenhang zwischen Enge und Selektivität der Aufmerksamkeit. (On the relation between the limits and the selectivity of attention.) Reports from the Cognitive Psychology Unit, Ruhr-University Bochum (FRG), 19.
Neumann, O. (1984). Automatic processing: A review of recent findings and a plea for an old theory. In W. Prinz and A. F. Sanders (Eds), Cognition and Motor Processes. Berlin: Springer.
Neumann, O. (1985). Die Hypothese begrenzter Kapazität und die Funktionen der Aufmerksamkeit. (The limited-capacity hypothesis and the functions of attention.) In O. Neumann (Ed.), Perspektiven der Kognitionspsychologie. Berlin: Springer.
Neumann, O. (1987). Beyond capacity: A functional view of attention. In H. Heuer and A. F. Sanders (Eds), Perspectives on Perception and Action. Hillsdale, NJ: Erlbaum.
Neumann, O. (1989). On the origins and status of the concept of automatic processing. Zeitschrift für Psychologie, 197, 411-428.
Neumann, O. (1990). Visual attention and action. In O. Neumann and W. Prinz (Eds), Relationships between Perception and Action: Current Approaches. Berlin: Springer.
Neumann, O. (1992). Theorien der Aufmerksamkeit: Von Metaphern zu Mechanismen. (Theories of attention: From metaphors to mechanisms.) Psychologische Rundschau, 43, 83-101.
Neumann, O. (1995). Konzepte der Aufmerksamkeit. Entstehung, Wandlungen und Funktionen eines psychologischen Begriffs. (Concepts of attention. Origins, mutations, and functions of a psychological concept.) Göttingen: Hogrefe, in press.
Neumann, O., Esselmann, U. and Klotz, W. (1993). Differential effects of visual spatial attention on response latency and temporal order judgment. Psychological Research, 56, 26-34.
Neumann, O. and Klotz, W. (1994). Motor responses to nonreportable, masked stimuli: Where is the limit of direct parameter specification? In M. Moscovitch and C. Umiltà (Eds), Attention and Performance 15. Cambridge, MA: MIT Press.
Norman, D. A. and Bobrow, D. G. (1975). On data-limited and resource-limited processes. Cognitive Psychology, 7, 44-64.
Norman, D. A. and Shallice, T. (1986). Attention to action: Willed and automatic control of behavior. In R. J. Davidson, G. E. Schwartz and D. Shapiro (Eds), Consciousness and Self-regulation, vol. 4. New York: Plenum Press.
North, R. A. (1977). Task Components and Demands as Factors in Dual-Task Performance (Technical Report). Savoy, IL: University of Illinois at Urbana-Champaign.
Ostry, D., Moray, N. and Marks, G. (1976). Attention, practice and semantic targets. Journal of Experimental Psychology: Human Perception and Performance, 2, 326-336.
Pashler, H. (1984). Evidence against late selection: Stimulus quality effects in previewed displays. Journal of Experimental Psychology: Human Perception and Performance, 10, 429-448.
Pashler, H. (1989). Dissociations and dependencies between speed and accuracy: Evidence for a two-component theory of divided attention in simple tasks. Cognitive Psychology, 21, 469-514.
Pashler, H. (1993). Dual-task interference and elementary mental mechanisms. In D. Meyer and S. Kornblum (Eds), Attention and Performance 14. Cambridge, MA: MIT Press.
Phaf, R. H., van der Heijden, A. H. C. and Hudson, P. T. W. (1990). SLAM: A connectionist model for attention in visual selection tasks. Cognitive Psychology, 22, 273-341.
Pillsbury, W. B. (1908). Attention. London: George Allen and Unwin.
Posner, M. I. (1978). Chronometric Explorations of Mind. Hillsdale, NJ: Erlbaum.
Posner, M. I. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology, Section A, 32, 3-25.
Posner, M. I. (1982). Cumulative development of attentional theory. American Psychologist, 37, 168-179.
Posner, M. I. and Boies, S. J. (1971). Components of attention. Psychological Review, 78, 391-408.
Posner, M. I. and Petersen, S. E. (1990). The attention system of the human brain. Annual Review of Neuroscience, 13, 25-42.
Posner, M. I. and Snyder, C. R. R. (1975). Attention and cognitive control. In R. Solso (Ed.), Information Processing and Cognition: The Loyola Symposium. Potomac, MD: Erlbaum.
Posner, M. I., Snyder, C. R. R. and Davidson, B. J. (1980). Attention and the detection of signals. Journal of Experimental Psychology: General, 109, 160-174.
Pribram, K. H. and McGuinness, D. (1975). Arousal, activation and effort in the control of attention. Psychological Review, 82, 116-149.
Ribot, T. (1906). Psychologie de l'Attention. Paris: Felix Alcan.
Sanders, A. F. (1971). Psychologie der Informationsverarbeitung. (Information processing psychology.) Bern: Huber.
Sanders, A. F. (1979). Some remarks on mental load. In N. Moray (Ed.), Mental Workload. Its Theory and Measurement. New York: Plenum Press.
Sanders, A. F. (1980). Stage analysis of reaction time. In G. E. Stelmach and J. Requin (Eds), Tutorials in Motor Behavior. Amsterdam: North-Holland.
Sanders, A. F. (1983). Towards a model of stress and human performance. Acta Psychologica, 53, 61-97.
Schneider, W. (1985). Toward a model of attention and the development of automatic processing. In M. I. Posner and O. S. M. Marin (Eds), Attention and Performance 11. Hillsdale, NJ: Erlbaum.
Schneider, W. X. (1991). Visuelle Aufmerksamkeit, Handlungssteuerung und die Lichtkegel-Metapher. Untersuchungen zur Wirkung räumlicher Hinweisreize in einem Interferenz-Versuch. (Visual attention, action control and the spotlight metaphor: Investigations on the effect of spatial cues in an interference experiment.) Unpublished doctoral dissertation, Bielefeld University.
Schneider, W., Dumais, S. T. and Shiffrin, R. M. (1984). Automatic and control processing and attention. In R. Parasuraman and D. R. Davies (Eds), Varieties of Attention. New York: Academic Press.
Shallice, T. (1972). Dual functions of consciousness. Psychological Review, 79, 383-393.
Shallice, T. (1978). The dominant action system: An information-processing approach to consciousness. In K. S. Pope and J. L. Singer (Eds), The Stream of Consciousness. New York: Plenum Press.
Shannon, C. E. and Weaver, W. (1949). The Mathematical Theory of Communication. Urbana, IL: University of Illinois Press.
Shiffrin, R. M. and Schneider, W. (1977). Controlled and automatic human information processing: II. Perceptual learning, automatic attending and a general theory. Psychological Review, 84, 127-190.
Smith, M. C. (1967). Theories of the psychological refractory period. Psychological Bulletin, 67, 202-213.
Sperling, G. (1960). The information available in brief visual presentations. Psychological Monographs, 74(11) (whole no. 498).
Spiller, G. (1901). The dynamics of attention. Mind, 10, 498-524.
Sternberg, S. (1969). The discovery of processing stages: Extension of Donders' method. Acta Psychologica, 30, 276-315.
Stewart, D. (1792/1971). Elements of the Philosophy of the Human Mind. New York: Garland.
Telford, C. W. (1931). Refractory phase of voluntary and associative responses. Journal of Experimental Psychology, 14, 1-36.
Tipper, S. P. (1992). Selection for action: The role of inhibitory mechanisms. Current Directions in Psychological Science, 1, 105-109.
Treisman, A. M. (1960). Contextual cues in selective listening. Quarterly Journal of Experimental Psychology, 12, 242-248.
Treisman, A. M. (1964). Selective attention in man. British Medical Bulletin, 20, 12-16.
Treisman, A. M. (1988). Features and objects: The fourteenth Bartlett memorial lecture. Quarterly Journal of Experimental Psychology, 40A, 201-237.
Treisman, A. M. (1992). Visual attention and the perception of objects. International Journal of Psychology, 27, 13.
Treisman, A. M. and Davies, A. (1973). Divided attention to ear and eye. In S. Kornblum (Ed.), Attention and Performance 4. New York: Academic Press.
Treisman, A. M. and Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97-136.
Treisman, A. M. and Gormican, S. (1988). Feature analysis in early vision: Evidence from search asymmetries. Psychological Review, 95, 15-48.
Treisman, A. M. and Paterson, R. (1984). Emergent features, attention and object perception. Journal of Experimental Psychology: Human Perception and Performance, 10, 12-31.
Treisman, A. M. and Schmidt, H. (1982). Illusory conjunctions in the perception of objects. Cognitive Psychology, 14, 107-141.
Treisman, A. M. and Souther, J. (1985). Search asymmetry: A diagnostic for preattentive processing of separable features. Journal of Experimental Psychology: General, 114, 285-310.
Treisman, A. M., Squire, R. and Green, J. (1974). Semantic processing in dichotic listening? A replication. Memory and Cognition, 2, 641-646.
Umiltà, C. and Moscovitch, M. (Eds) (1994). Attention and Performance 15. Cambridge, MA: MIT Press.
Underwood, G. (1976). Semantic interference from unattended printed words. British Journal of Psychology, 67, 327-338.
van der Heijden, A. H. C. (1981). Short-Term Visual Information Forgetting. London: Routledge and Kegan Paul.
van der Heijden, A. H. C. (1984). Postcategorical filtering in a bar-probe task. Memory and Cognition, 12, 446-457.
van der Heijden, A. H. C. (1987). Central selection in vision. In H. Heuer and A. F. Sanders (Eds), Perspectives on Perception and Action. Hillsdale, NJ: Erlbaum.
van der Heijden, A. H. C. (1990). Visual information processing and selection. In O. Neumann and W. Prinz (Eds), Relationships Between Perception and Action: Current Approaches. Berlin: Springer.
van der Heijden, A. H. C. (1992). Selective Attention in Vision. London: Routledge and Kegan Paul.
van der Heijden, A. H. C., Hagenaar, R. and Bloem, W. (1984). Two stages in postcategorical filtering and selection. Memory and Cognition, 12, 458-469.
von Wright, J. M., Anderson, K. and Stenman, U. (1975). Generalization of conditioned GSRs in dichotic listening. In P. M. A. Rabbitt and S. Dornic (Eds), Attention and Performance 5. New York: Academic Press.
Welford, A. T. (1967). Single-channel operation in the brain. Acta Psychologica, 27, 5-22.
Welford, A. T. (1980). The single-channel hypothesis. In A. T. Welford (Ed.), Reaction Times. New York: Academic Press.
Wickens, C. D. (1980). The structure of attentional resources. In R. S. Nickerson (Ed.), Attention and Performance 8. Hillsdale, NJ: Erlbaum.
Wickens, C. D. (1984). Processing resources in attention. In R. Parasuraman and D. R. Davies (Eds), Varieties of Attention. New York: Academic Press.
Wolff, P. (1977). Entnahme der Identitäts- und Positionsinformation bei der Identifikation tachistoskopischer Buchstabenzeilen. Ein theoretischer und experimenteller Beitrag zur Grundlagenforschung des Lesens. (Extraction of identity and position information during the identification of tachistoscopically presented letter rows. A theoretical and experimental contribution to the investigation of basic processes in reading.) Unpublished doctoral dissertation, Ruhr University Bochum.
Wundt, W. (1903). Grundzüge der physiologischen Psychologie (Principles of physiological psychology), 5th edn. Leipzig: Engelmann.
Ziehen, T. (1920). Leitfaden der physiologischen Psychologie in 16 Vorlesungen (A manual of physiological psychology in 16 lectures), 11th edn. Jena: Fischer.
Index
For core concepts and keywords, please refer to the Table of Contents as well.

arousal 233 ff, 242 ff, 256 ff, 289, 292, 297 ff, 304 ff, 334, 413
  upper vs lower mechanism 240 ff, 434
  Yerkes-Dodson law 129, 230, 297, 306
attentional trace theory 344, 352, 367
auditory illusions 101 ff
auditory streaming 79, 84 ff
automatic vs controlled processing 26, 28, 44, 54, 144, 145, 155, 163, 185 ff, 205 ff, 291, 309, 333, 334, 342, 363 ff, 390, 393, 427 ff, 433
bar probe task 11, 20, 27, 28
closed loop vs open loop control 188 ff
cocktail party effect 79 ff
cognitive-energetical model 256 ff, 390, 406, 408, 409
contextual facilitation hypothesis 198, 199
coupling of movements 140
covert orienting 60, 162 ff, 208 ff, 294, 395
  central vs peripheral cue 163 ff
  abrupt onset 163 ff, 367
dichotic listening 79 ff, 94 ff, 203 ff, 286, 291, 411, 422, 423
early vs late selection 17, 24 ff, 32, 33, 44, 52, 53, 58, 196 ff, 205, 211 ff, 233, 238 ff, 291, 334, 345, 351, 376, 377, 390 ff, 415, 416 ff
EEG desynchronisation 236, 297
effort 124, 186, 238, 242 ff, 256 ff, 292, 402, 408, 409, 413, 434
electromyogram 370 ff
evoked potentials 160 ff, 170 ff, 289, 294 ff, 333 ff, 415
  lateralised readiness potential 334, 370 ff
  neural basis 335, 336
  mismatch negativity 170 ff, 343, 367
  processing negativity 338 ff, 344
  P3 341 ff, 359, 360
facilitation 424 ff
feature integration theory 17, 33, 196, 395, 422, 423
filled duration illusion 91 ff
functional cerebral space 137, 138
functional visual field 48 ff
Gestalt 57, 105, 142, 233, 234
galvanic skin response 82, 288, 404
heart rate response 262 ff, 404
inhibition 424 ff
instance theory 188, 191 ff, 292
interference
  concurrence costs 125, 409
  structural interference 113, 114, 124, 126, 136, 410, 411
  process interference 137 ff, 186, 392, 394, 397, 403, 407, 409 ff
limited capacity 14, 17 ff, 54, 116 ff, 122 ff, 190, 191, 239, 242, 256, 291, 292, 359, 390 ff, 396, 397, 400 ff, 410, 414, 432
masking 22
memory span 7, 9, 16
multiple resources 130 ff, 245, 291, 292, 359, 391, 393, 404 ff
multidimensional selection 359 ff
neural model 159, 235
neural specificity theory 345, 350 ff
neurophysiology of attention 245 ff, 293 ff, 349, 350
non-selective access hypothesis 198
orientation reaction 158, 169 ff, 235, 246, 288
  habituation 158, 159, 246, 280, 288
  level shift 157 ff
  rule deviation 157, 159, 168 ff
parallel processing 18, 22, 23, 28, 31 ff, 52, 55, 402
partial-report task 9, 20
performance operating characteristic 122 ff, 402
peripheral vision 58, 59, 68, 207 ff
postcategorical filtering model 30, 420
postperceptual stm 6, 7, 10
priming 24, 201 ff, 395
psychological refractory period 118 ff, 399, 400
reaction time and attention 230 ff, 238 ff, 249 ff
rhythmically guided attention 86 ff
secondary task 54, 190, 245
similarity of tasks 132, 135, 139
single channel theory 116 ff, 234, 397, 399 ff
search strategies 44, 62 ff, 69
  cognitive search models 44
  conditional sampling 65, 66
  optimal scanning 64 ff, 69, 116
  random vs systematic search 62, 63
  reconstruction 65, 113
selection of action 43, 113, 412 ff, 436 ff
  effector recruitment 413
  parameter specification 413
spatial localisation (spotlight, zoomlens) 28 ff, 44, 58, 347, 348, 353 ff, 358, 362, 365, 376, 377, 395, 414, 430
speed-accuracy trade-off 254 ff
stage models 128, 135, 238, 249 ff, 333, 357, 369, 394, 406, 408
stress and performance 44, 67 ff, 230, 238 ff, 252 ff, 297
  alcohol 255, 301 ff
  drugs 252, 260, 261, 281, 298, 301 ff, 404, 408
  noise 234, 240, 297 ff
  sleep-loss 237, 240 ff, 252, 261, 404, 408
Stroop effect 24, 194 ff, 207, 236, 291
structural displacement 145 ff
time-sharing skill 147, 148, 288
two-process selection theory 14 ff, 52 ff, 175, 290, 422, 427, 435
vigilance 277 ff
  applied roots 277, 278, 281
  individual differences 285 ff
  task taxonomy 282, 283
  theories 279 ff
visual acuity 18, 23
visual image; icon 6, 11, 12, 14, 15, 22, 25, 161, 391, 392, 418
  location errors 28, 29
visual search 24, 26, 30, 43 ff, 174 ff, 371 ff, 415
  background control 55 ff, 175 ff
  conspicuity area 45, 58
  lateral inhibition 21, 24, 207
  lobe 45 ff, 49, 54, 55
  saccades 19, 44, 47 ff, 58 ff, 116, 208, 422
  scan paths 44
  visibility area 48