Fechner’s Legacy in Psychology: 150 Years of Elementary Psychophysics
Edited by
Joshua A. Solomon
LEIDEN • BOSTON 2011
Cover illustration: Fechner’s Functions © Nicholas Wade

This book is printed on acid-free paper.

Library of Congress Cataloging-in-Publication Data

Fechner’s legacy in psychology : 150 years of elementary psychophysics / edited by Joshua A. Solomon.
p. cm.
Includes index.
ISBN 978-90-04-19220-1 (hbk. : alk. paper)
1. Psychophysics. 2. Fechner, Gustav Theodor, 1801–1887. Elemente der Psychophysik. I. Solomon, Joshua A.
BF237.F38 2011
152–dc22
2011000135
ISBN-13: 978-90-04-19220-1
Copyright 2011 by Koninklijke Brill NV, Leiden, The Netherlands. Koninklijke Brill NV incorporates the imprints Brill, Hotei Publishing, IDC Publishers, Martinus Nijhoff Publishers and VSP. All rights reserved. No part of this publication may be reproduced, translated, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior written permission from the publisher. Authorization to photocopy items for internal or personal use is granted by Koninklijke Brill NV provided that the appropriate fees are paid directly to The Copyright Clearance Center, 222 Rosewood Drive, Suite 910, Danvers, MA 01923, USA. Fees are subject to change.
Contents

Preface: Commemorating Elemente der Psychophysik (J. A. Solomon) ... 1
Fechner’s Law: Where Does the Log Transform Come From? (D. Laming) ... 7
Fechner’s Elusive Parallel Law (H. E. Ross and N. J. Wade) ... 25
Magnitude of Perceived Change in Natural Images May Be Linearly Proportional to Differences in Neuronal Firing Rates (D. J. Tolhurst, M. P. S. To, M. Chirimuuta, T. Troscianko, P.-Y. Chua and P. G. Lovell) ... 39
Measuring Perceptual Hysteresis with the Modified Method of Limits: Dynamics at the Threshold (H. S. Hock and G. Schöner) ... 63
Functional Adaptive Sequential Testing (E. Vul, J. Bergsma and D. I. A. MacLeod) ... 87
A Criterion Setting Theory of Discrimination Learning that Accounts for Anisotropies and Context Effects (M. Lages and M. Treisman) ... 121
Sensory Integration Across Modalities: How Kinaesthesia Integrates with Vision in Visual Orientation Discrimination (M. Treisman and M. Lages) ... 155
Fechner’s Aesthetics Revisited (F. Phillips, J. F. Norman and A. M. Beers) ... 183
What Comes Before Psychophysics? The Problem of ‘What We Perceive’ and the Phenomenological Exploration of New Effects (B. Pinna) ... 193
Index ... 213
Preface Commemorating Elemente der Psychophysik
The articles following this preface were collected to commemorate the 150th anniversary of Fechner’s most famous publication: Elemente der Psychophysik. By his own account (in Volume II, as yet untranslated into English), Fechner had something of an epiphany on 22 October 1850. It’s a good story, how the logarithmic transformation of physical intensity into psychical intensity came to him at dawn, but since Fechner didn’t mention it until 10 years after the fact (Rosenzweig, 1987), perhaps it is only that, a story.

Fechner’s writing betrays more than a little inclination for hyperbole. He complains that, ‘I ruined my eyesight... looking often at the sun... so that by Christmas 1839 I could no longer use my eyes’. According to one expert, ‘Fechner became so sensitive to light that he blindfolded himself and diagnosed himself as blind. ... In 1841 he lost all appetite and emaciated himself to the point that he could no longer stand upright’ (Heidelberger, 2004). What was Fechner’s problem? Well, a diagnosis consistent with the DSM-IV might not be available, but (again, according to Heidelberger) Imre Hermann suspected ‘intrauterine regression’. Then, in 1843, Fechner recovered. ‘God himself called me to do extraordinary things’. Elemente is just one of the 26 books (and 61 articles) he produced between 1843 and 1887. And this period excludes all the early classics, such as Carving Meat and Setting the Table (I’m not making this up).

Fechner’s first lecture after recovering was ‘On The Greatest Good’, one of the tenets of a new philosophy called the ‘day view’ (in contradistinction to the ‘night view’ of a purely mechanical or deterministic universe), which I cannot pretend to understand. However, I do understand that, by the greatest good, Fechner meant that people strive (or at least should strive) to bring happiness to the world. He dubbed this drive Lustprinzip, or the ‘pleasure principle’, not dissimilar from Freud’s later description of what constitutes the id.
The day view incorporated Fechner’s belief that plants (see Nanna, or On the Soul Life of Plants, 1848) and worlds have souls. This would seem crazy to most people today, but is it really any crazier than believing that people have souls? I don’t know. I’m not a metaphysician like Fechner, but if I were, I would certainly
agree with his opinion that metaphysics was best approached using ‘unpretentious empiricism’. According to Fechner, the whole of Elemente ‘evolved on the basis of and in connection with’ the idealistic interpretation of immortality as initially set out in his Little Book on Life After Death (1836). Fechner may not have had an undying soul, but you cannot argue with the persistence of his influence. His greatest contribution to psychology must surely have been the notion that mental events can be measured in terms of the stimuli that elicit them. In Elemente, he explicitly refuses credit for inventing any of the three now-standard methods of measuring mental events (which he calls the method of just-noticeable differences, the method of right and wrong cases, and the method of average error), but he certainly must have been the first to fully compare their various pros and cons. His deep appreciation for these pros and cons clearly came from extensive practice. He accurately notes that ‘Practice soon allows one to carry out these (psychophysical trials) quite mechanically’, and arguably attributes this feeling of automaticity to low attentional load: ‘Even the direction of attention soon becomes uniform and mechanical so that, as my data themselves show, attention does not seem noticeably weakened’ (p. 82, see Note 1). Fechner’s concept of attention was inspired by energy conservation in physics (p. 32), but it really wasn’t too different from the more contemporary notion of a limited resource that can be allocated to various activities as need be.

Assembly of this commemorative edition gave me an excuse to read Adler’s translation of Elemente, cover to cover. I was amused by descriptions of several phenomena that seem to have been re-discovered later. One particularly entertaining example was something that I needlessly re-dubbed ‘stochastic re-calibration’ (a.k.a. criterion fluctuation; see Solomon and Morgan, 2006).
Fechner wrote: ‘Unfortunately there is, strictly speaking, no constancy of constant errors... [T]he variability due to constant errors becomes mixed up with the purely variable error and contaminates it’ (p. 76). My favourite sentence is this one: ‘(Psychophysical) methods possess the big advantage that they rely on the proven principles of probability and may even themselves add something to the development of these principles’ (p. 64). Initially, I was struck by the alliteration, which probably is not present in the original German. However, upon reflection, this seems to be Fechner’s one statement of faith, or maybe only hope, which we now know was right on the money.

A thorough evaluation of Fechner’s contribution to statistics appears elsewhere (Sheynin, 2004), but I would like to mention one particular section of Elemente where Fechner’s application of statistics to a perceptual problem parallels my own current interest. In his extensive measurements of the threshold for size discrimination, Fechner implicitly accepted what we might now call an equivalent noise model (see Pelli, 1990). The basic equation first appears on page 186. Volkmann’s constant is the name he gives to what is effectively the root-mean-squared amplitude of the equivalent noise for size discrimination. It’s not clear to me why he didn’t similarly adopt
an equivalent noise model for light discrimination. Instead, he advocated the existence of an essentially constant intrinsic light.

Just about every aspect of Fechner’s legacy is discussed in this volume. Laming’s contribution (this volume) summarises decades of his work, and succinctly describes exactly how the logarithmic transformation of luminosity to brightness arises in the visual system. Ross and Wade (this volume) examine one of Elemente’s all-but-forgotten hypotheses: the Parallel Law. Fechner wrote, ‘When the sensitivity to two stimuli changes in the same ratio, the perception of their difference will nevertheless remain the same’. Why would sensitivity change? One obvious reason is adaptation, and this is the focus of Ross and Wade’s investigation. My reading of their paper leads me to the conclusion that Fechner’s Parallel Law is effectively a gain-control model of adaptation. It is based on the aforementioned logarithmic transformation of physical intensity φ to perceived intensity ψ, i.e., ψ = a log φ + C, where a and C are arbitrary constants. The difference between two perceived intensities is thus: ψ2 − ψ1 = a log(φ2/φ1). If, and only if, the Parallel Law holds, then ψ2 − ψ1 will remain unchanged after adaptation. I think this can only happen if, after adaptation, ψ = a log(bφ) + C′, where b and C′ are also arbitrary constants. In other words, if both Weber’s Law and the Parallel Law are true, then adaptation effectively reduces the input gain of the physical signal (i.e., when b < 1) and/or shifts the range of perceived intensities (i.e., when C′ ≠ C) (see Note 2).

Fechner was naturally aware that organisms transduce physical stimuli into physiological responses, and therefore wished to know how these responses would be related to psychical, or perceived, intensities.
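This gain-control reading of the Parallel Law is easy to check numerically. A minimal sketch (the function and variable names are mine, and the intensities are arbitrary): under ψ = a log(bφ) + C, changing the gain b moves every perceived intensity by the same amount, so perceived differences are untouched.

```python
import math

def psi(phi, a=1.0, b=1.0, C=0.0):
    # Fechnerian transform with an adaptation gain b and offset C:
    #   psi = a * log(b * phi) + C
    return a * math.log(b * phi) + C

phi1, phi2 = 10.0, 26.0  # two physical intensities (hypothetical, arbitrary units)

diff_before = psi(phi2) - psi(phi1)               # before adaptation (b = 1)
diff_after = psi(phi2, b=0.5) - psi(phi1, b=0.5)  # after a gain reduction (b = 0.5)

# The difference depends only on the ratio phi2/phi1, so the gain change
# leaves it untouched: the Parallel Law under Fechner's log transform.
assert abs(diff_before - diff_after) < 1e-12
assert abs(diff_before - math.log(phi2 / phi1)) < 1e-12
```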
The work of Tolhurst, To, Chirimuuta, Troscianko, Chua and Lovell (this volume) addresses this ‘inner psychophysics’ by comparing the ability of human observers to discriminate between slightly different natural images with that of a physiologically based image-processing model. When perceived intensities in the latter are proportional to neural activity, its performance is similar to that of human observers.

Psychophysical methodology is the subject of each of the next four papers. Hock and Schöner (this volume) provide a summary of the problems with Fechner’s method of just-noticeable differences, and review the logic and benefits of their more contemporary version: the modified method of limits. Vul, Bergsma and MacLeod (this volume) also offer a contemporary modification of one of Fechner’s three methods; in this case, the method of right and wrong cases. Of Fechner’s three methods, this is probably the most popular amongst today’s psychophysicists. Between Fechner’s time and now, several algorithms (e.g.,
staircases) have been proposed for sampling from a continuum of stimulus intensities. Vul et al. take these algorithms one step further, and demonstrate how best to sample from, not merely a single continuum, but a plane of stimulus values. Regardless of how stimulus intensities are selected, most contemporary psychophysicists use the method of right and wrong cases in tandem with analyses based on Signal-Detection Theory (SDT; Green and Swets, 1966), which ignores the influence any given trial may have on performance in the following trial. In the first of two papers, Lages and Treisman (this volume) review an extension of SDT, called Criterion Setting Theory (CST), in which these sequential effects play a starring role. To highlight the features of CST, they present data from a particularly elegant experiment, featuring my all-time favourite task: orientation discrimination.

For me, reading these papers was a lot like reading Elemente. I was repeatedly finding citations of old ideas I had thought were either mine or my collaborators’, e.g., linking observers’ increased variable error with oblique orientations (i.e., the oblique effect) to criterial fluctuation (Solomon and Morgan, 2009). Moreover, I found CST to be conceptually similar to a Bayesian model (Tomassini et al., 2010), in which perceptual biases arise because of a predisposition for orientations with which we or our ancestors have had ‘prior’ experience. CST improves on this notion by incorporating very recent sensory experience. For a truly Bayesian model to capture all the effects described by Lages and Treisman (this volume), its prior would have to evolve on a trial-by-trial basis.

Above, I claimed not to be a metaphysician. Nonetheless, each time a criterion shift, adaptation or other experience-related effect is successfully modelled, it seems to me that there is really less randomness in human performance than previously thought.
The science Fechner developed to support his ‘day view’ of indeterminism increasingly seems to support its opposite: psychological responses are highly mechanical. Indeed, the ability to predict behaviour with great accuracy is what led me away from the social sciences and toward psychophysics in the first place.

Perhaps because of his philosophy, Fechner was attracted to psychological conundra less amenable to mechanistic interpretation. In particular, his study of aesthetics at least implicitly conceded an inability to manipulate and predict a single observer’s affinities. Instead, Fechner attempted to examine aesthetics using a large sample from the population. In what may have been one of the first psychological questionnaires (Heidelberger, 2004), Fechner solicited preferences for one of two drawings of the Madonna. Disappointed with the response rate, Fechner returned to the laboratory for a more straightforward task, using simple rectangles. In what must have been a fun experiment, Phillips, Norman and Beers (this volume) adopted Fechner’s method for use with slightly more naturalistic shapes. The results confirm Fechner’s suspicion that aesthetics are not completely random. It is possible to predict which types of shape people tend to prefer.

The final paper is this collection’s most radical. Ever since Fechner, one recipe for experimentation has dominated the field of psychophysics: the experimenter
formulates a hypothesis, measurements are recorded, and these measurements are used either to support or to reject the initial hypothesis. Pinna (this volume) argues for a better recipe. He advocates a systematic phenomenological investigation to reveal which hypotheses should be psychophysically tested. Pinna defends this sequence of phenomenology and psychophysics on philosophical grounds, but as I have noted repeatedly, I am not a philosopher. Instead, I would like to see it tested with something analogous to psychophysics. In other words, I am advocating a Fechnerian sequence: (1) Hypothesize that systematic phenomenological investigation will turn up something worth measuring. (2) Conduct the aforementioned systematic phenomenological investigation. (3) Evaluate the resultant hypotheses. Of course, the problem with my sequence is that we have no obvious metrics on which to evaluate hypotheses. All that psychophysics can tell us is whether any given hypothesis is true or false. Still missing is a universally accepted scale for measuring whether a given hypothesis is worth testing or not.

Joshua A. Solomon
City University London

Notes

1. Page numbers refer to the Adler translation edited by Howes and Boring (1966).
2. Range shifts have been proposed for adaptation on cyclical dimensions, like orientation (e.g., Gibson, 1937), where Weber’s Law does not apply.

References

Fechner, G. (1860/1966). Elements of Psychophysics (trans. E. Adler), D. H. Howes and E. G. Boring (Eds). Holt, Rinehart and Winston.
Gibson, J. J. (1937). Adaptation, after-effect, and contrast in the perception of tilted lines. II. Simultaneous contrast and the areal restriction of the after-effect, J. Exper. Psychol. 20, 553–569.
Green, D. M. and Swets, J. A. (1966). Signal Detection Theory and Psychophysics. Wiley, New York, NY, USA.
Heidelberger, M. (2004). Nature From Within: Gustav Fechner and His Psychophysical Worldview (trans. C. Klohr). University of Pittsburgh Press, Pittsburgh, PA, USA.
Pelli, D. G. (1990). The quantum efficiency of vision, in: Vision: Coding and Efficiency, C. B. Blakemore (Ed.). Cambridge University Press, Cambridge, UK.
Rosenzweig, S. (1987). The final tribute of E. G. Boring to G. T. Fechner, Amer. Psychol. 42, 787–790.
Sheynin, O. (2004). Fechner as a statistician, British J. Mathemat. Statist. Psychol. 57, 53–72.
Solomon, J. A. and Morgan, M. J. (2006). Stochastic re-calibration: contextual effects on perceived tilt, Proc. Royal Soc. London B 273, 2681–2686.
Solomon, J. A. and Morgan, M. J. (2009). Strong tilt illusions always reduce orientation acuity, Vision Research 49, 819–824.
Tomassini, A., Morgan, M. J. and Solomon, J. A. (2010). Orientation uncertainty reduces perceived obliquity, Vision Research 50, 541–547.
Fechner’s Law: Where Does the Log Transform Come From?

Donald Laming (E-mail: [email protected])
University of Cambridge, Department of Experimental Psychology, Downing Street, Cambridge, England, CB2 3EB
Abstract

This paper looks at Fechner’s law in the light of 150 years of subsequent study. In combination with the normal, equal variance, signal-detection model, Fechner’s law provides a numerically accurate account of discriminations between two separate stimuli, essentially because the logarithmic transform delivers a model for Weber’s law. But it cannot be taken to be a measure of internal sensation because an equally accurate account is provided by a χ² model in which stimuli are scaled by their physical magnitude. The logarithmic transform of Fechner’s law arises because, for the number of degrees of freedom typically required in the χ² model, the logarithm of a χ² variable is, to a good approximation, normal. This argument is set within a general theory of sensory discrimination.

Keywords: detection, differential coupling, discrimination, Fechner’s law, logarithmic transform, signal detection theory
1. Introduction

On the morning of 22nd October, 1850, whilst Fechner was lying in bed thinking about the problem of sensation, the idea came to him to make “the relative increase of bodily energy the measure of the increase of the corresponding mental intensity” (Boring, 1950, p. 280). Expressing this idea formally, if X1 and X2 are two stimulus magnitudes generating sensations S1 and S2 respectively, then (S2 − S1) is a function of the ratio (X2/X1). From this it follows that (see Note 2):

Sensation = log(physical magnitude).    (1)
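The step from the ratio postulate to equation (1) can be made explicit. One standard sketch (assuming the unknown function is continuous; Fechner himself argued via the summation of just-noticeable differences):

```latex
% Write S_2 - S_1 = f(X_2 / X_1) for some fixed function f.
% Then for any u, v > 0 and any magnitude X,
%   f(uv) = [S(uvX) - S(vX)] + [S(vX) - S(X)] = f(u) + f(v).
% The only continuous solutions of Cauchy's equation
% f(uv) = f(u) + f(v) are f(u) = a \log u, whence
\[
  S(X) = a \log X + \mathrm{const},
\]
% which is equation (1) up to the choice of units (a) and origin.
```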
Fechner conceived of an ‘outer’ psychophysics (the psychophysics of physical stimuli) and an ‘inner’ psychophysics (the psychophysics of sensation). The logarithmic
transform in equation (1) connected the two. A unit difference in sensation (at the inner level) corresponds to a fixed ratio of stimulus magnitudes (at the outer level), and Fechner’s intuition thereby accounts for Weber’s law.

In this paper I look back at Fechner’s intuition with the benefit of 150 years’ hindsight of subsequent experimental study and theoretical exploration. It can now be shown where the log transform comes from. It plays a pivotal role in a numerically accurate model for discriminations between two separate stimuli but, alas, it does not provide a basis for measuring sensation. Accordingly, the analysis that follows is expressed in terms of physical stimulus magnitudes (i.e., ‘outer’ psychophysics), not in terms of sensation or perceptual entities (‘inner’ psychophysics). The argument relies on the data from two visual studies by Nachmias and Steinman (1965) and Leshowitz et al. (1968). These two have been chosen, first, because they are visual, second, because their results are typical of many other similar studies, but, third, most important of all, because their designs embrace all the controls that are needed in the argument. Moreover, the two studies use very similar stimuli (a circular field 1° in diameter), differing only in duration. These two studies are typical, not only of visual experiments, but of experiments in all other sensory modalities as well (Laming, 1986), so that the theoretical argument is of very general applicability.

2. Difference Discriminations

The empirical properties of a discrimination between two stimulus magnitudes, luminances L and L + ΔL, depend on the configuration in which the two levels are presented to the observer. It matters whether the two levels are contiguous in space and time, sharing a common boundary, or whether they constitute two separate stimuli.
In Fechner’s day, limitations of instrumentation meant that most experimental studies concerned discriminations between two separate stimuli — two flashes of light in darkness, two bursts of noise in silence, two weights to be lifted in succession, or two lines placed end to end. Figure 1 shows the typical result when such a ‘difference’ discrimination is combined with a signal-detection rating paradigm. On each trial of this experiment (Nachmias and Steinman, 1965, Expt. III) the subject observed a single flash of light filling a circular field 1° in diameter. The luminance was either L or L + ΔL (where L + ΔL was 0.1 log unit greater than L — an increase of 26%) and after each stimulus presentation the subject was asked to say whether the greater or the lesser luminance had been presented, expressing her posterior confidence on a six-point scale (cf. Swets et al., 1961). The two different sets of data relate to two different durations of flash, 52 ms and 230 ms. The data conform well to the normal, equal variance, signal-detection model, and this result has been obtained repeatedly in all sensory modalities.

This experiment can be modelled as follows (see Fig. 2). The presentation of a stimulus is represented by a random sample from a normal distribution with mean either 0 (luminance L) or d′ (luminance L + ΔL). Each data point corresponds to a
Figure 1. Signal-detection data for a discrimination between two brief flashes of light differing by 0.1 log unit in luminance (Nachmias and Steinman, 1965, Expt. III). The two sets of data correspond to two different durations, 52 and 230 ms, and the two sets of operating characteristics correspond to two different models, the normal (Fig. 2) and χ² (Fig. 7). (Adapted from Sensory Analysis by D. Laming, p. 26. © 1986, Academic Press. Reproduced by permission.)
Figure 2. The normal model for discriminations between two separate stimuli. The continuous density functions and the five criteria (broken lines) have been calculated to model the 52 ms data in Fig. 1. Additional density functions (dotted curves) can be added to model the entire continuum of luminance.
criterion (broken lines in Fig. 2); the coordinates of the point are the areas under the continuous density functions to the right of the criterion (shaded areas in Fig. 2).
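The equal-variance model just described is easy to simulate. A minimal sketch (the value of d′, the criteria, and all names are mine, not taken from Nachmias and Steinman’s data): one normal sample per trial and five fixed criteria yield five pairs of tail areas (α, β) which, converted to z-scores, are separated by the same d′ at every criterion, as the model requires.

```python
import random
from statistics import NormalDist

random.seed(1)
z = NormalDist().inv_cdf                 # z-score (inverse normal) transform
d_prime = 1.2                            # assumed separation (hypothetical value)
criteria = [-0.8, -0.3, 0.1, 0.5, 1.0]   # five criteria -> six confidence categories
n = 200_000

# One sample per trial: N(0, 1) for luminance L, N(d', 1) for L + dL.
noise = [random.gauss(0.0, 1.0) for _ in range(n)]
signal = [random.gauss(d_prime, 1.0) for _ in range(n)]

gaps = []
for c in criteria:
    alpha = sum(x > c for x in noise) / n    # tail area under the L distribution
    beta = sum(x > c for x in signal) / n    # tail area under the (L + dL) distribution
    gaps.append(z(beta) - z(alpha))          # separation on double-probability axes

# Equal-variance model: every operating point gives the same separation d'.
assert all(abs(g - d_prime) < 0.05 for g in gaps)
```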
2.1. Canonical Model of the Operating Characteristic

It is important not to read too much into this model. It consists of just two components: (a) the operating characteristic and (b) the choice of an operating point on that characteristic. The experiment by Nachmias and Steinman (1965) speaks only to the shape of the characteristic, and that shape, the very same shape, can be modelled in many different ways. Think of the experiment as a communication channel. The input is a stimulus, of luminance either L or (L + ΔL), selected by computer according to a quasi-random schedule. The computer knows the identity of the stimulus without error. The output is the observer’s judgment, and collectively the observer’s responses show an interesting frequency of errors. (This is the raison d’être of the experiment.) The operating characteristic represents the information in the observer’s responses about the stimuli actually presented and, by implication, the information that is lost in transmission. (Note that the capacity of the channel is of no consequence; capacity limitations do not enter into the model.)

The operating characteristic represents a limit on the accuracy of discrimination by the observer. In principle, any point on this characteristic is achievable, and randomisation between different operating points will deliver an aggregate performance lying below the characteristic. But performance corresponding to a point lying above the characteristic is not possible. The operating characteristic is the relation between the tail areas in Fig. 2 that is generated as the criterion varies. It is convenient to describe it by the parametric equations:

α(x) = 1 − F(x|L),    β(x) = 1 − F(x|L + ΔL),    (2)
where F(x|L) and F(x|L + ΔL) are the cumulative distribution functions conditional on luminances L and L + ΔL, α is the tail area under the L-distribution (α is traditionally used to represent the significance level in a statistical test) and β the corresponding area under the (L + ΔL)-distribution (β represents the power of the test). These tail areas are invariant under any monotone transformation of the decision variable (any monotone transform of the argument x in equation (2)), so that the normal distributions need have no particular significance for the structure of underlying processes. The normal model in Fig. 2 is simply a convenient vehicle for calculation. It is interesting, however, to substitute the variable ξ = ln(dβ/dα) for x in equation (2):

ξ = ln(dβ/dα) = ln{f(x|L + ΔL)/f(x|L)},

where f(x|L) and f(x|L + ΔL) are the density functions in Fig. 2:

ξ = ½{(x − log L)² − (x − log(L + ΔL))²}
  = x log((L + ΔL)/L) − ½ log((L + ΔL)/L){log(L + ΔL) + log L}
  = d′{x − ½[log(L + ΔL) + log L]},    (3)
because d′ = log(1 + ΔL/L). The variable ξ is therefore normal with difference of mean equal to d′²; d′² is the mean information per trial in favour of luminance (L + ΔL) and against luminance L. The variance is also d′², so that division by the standard deviation (d′) returns to the exact model of Fig. 2. What is interesting is that the transformation ξ = ln(dβ/dα) may be applied following any prior monotone transform of the decision variable to give the same end result. It happens thus because, while an arbitrary monotone transformation of the decision axis in Fig. 2 may introduce all kinds of distortions to the shape of the distributions, their relative densities (that is, dβ/dα) are left unchanged. Equation (3), therefore, provides a canonical formulation of the operating characteristic, independent of whatever discrimination model might be preferred (see Laming, 1973, pp. 97–104).

Nothing has so far been said about how, in practice, the random sample from one or the other distribution is converted into a response. For the sake of simplicity, the model in Fig. 2 assumes a fixed criterion; this is known to be a gross oversimplification. Successive trials in a signal detection experiment are not statistically independent. The criterion appears to wander from trial to trial, with errors occurring chiefly at the extreme excursions. This is most readily apparent from the study by Tanner et al., 1970 (see also Laming, 2004, Ch. 12). Feedback indicating an error produces a large and immediate shift in criterion. Moreover, that shift appears to be little more than ordinal (Laming, 2001). This has two implications for the present argument. First, the fact that a discrimination model might appear to require some non-linear transform of the decision variable to determine the response is neither here nor there. That non-linear transform might be approximated by purely ordinal adjustments.
Second, variation of the criterion trial to trial introduces an additional source of variability and consequent further loss of information. At present, that further loss is subsumed in the operating characteristic. There is a surreptitious assumption here that variation of the criteria of judgment, trial to trial, does not interact with those stimulus parameters (e.g., luminance) that govern the loss of information in transmission. The objective of theory then is to characterise that loss of information in transmission in a manner that applies to as wide a range of experiments as possible. Fechner sought to do just this.

Further normal densities can be added to this model to represent other luminances (dotted curves, Fig. 2). If ‘0’ be replaced by log L and d′ by log(L + ΔL), then conformity to Weber’s law is preserved if the other densities are located such that they all have means equal to log(Luminance). This realises Fechner’s intuition. The resultant model then describes discriminations over the entire continuum of luminance (of 1° diameter flashes of 52 ms duration). Two predictions follow.

2.2. Method of Constant Stimuli

Suppose that both luminances are presented on each trial, but in random order: Which is the brighter? The natural development of the model in Fig. 2 is to suppose two random samples, one from each distribution. The stimuli will be correctly
Figure 3. Proportions correct in the method of constant stimuli for (a) difference discriminations of luminance (filled symbols) and (b) increment detections (open symbols). The broken curve fitted to the difference discrimination data is a normal integral with respect to ΔL, and the continuous curve a normal integral with respect to log(1 + ΔL/L). The dotted curve fitted to the increment detection data is a normal integral with respect to (ΔL)². (The data have been recovered from figures published by Leshowitz et al., 1968, and Treisman and Leshowitz, 1969.)
identified if the sample from the (L + ΔL)-distribution is greater than that from the L-distribution. Using the canonical formulation of the operating characteristic (equation (3)), the difference of mean between the two is d′² and the standard deviation of that difference is √(2d′²), so that

P(Correct identification) = Φ(log(1 + ΔL/L)/√2),    (4)

where Φ(·) is the cumulative normal integral. Fechner’s intuition contributes the log transform in equation (4). Figure 3 (filled symbols) shows data from Leshowitz et al. (1968), also for 1° diameter flashes, but now of 32 ms and 320 ms duration, presented to the dark-adapted eye of the subject. The continuous line is the prescribed normal integral with respect to log(1 + ΔL/L). For these data, this fits a little better than a normal integral with respect to ΔL/L (broken curve).

2.3. Weber’s Law

If the probability of a correct response in the Method of Constant Stimuli increases as a normal integral with respect to log(1 + ΔL/L) (equation (4)), 75% correct is achieved at a fixed ratio ΔL/L. Figure 4 (filled symbols) shows the 75% thresholds corresponding to the data in Fig. 3. The straight line has gradient 1 and represents Weber’s law. For a discrimination between two separate luminances, Weber’s law holds down to approximately the region of absolute threshold (shown by the vertical arrows). Similar results are obtained for difference discriminations with nearly
Figure 4. 75% thresholds from the same experiment as in Fig. 3. As before, filled symbols represent difference thresholds and open symbols increment thresholds. The continuous straight line has gradient 1 and corresponds to Weber’s law; the broken line has gradient 1/2 and represents the limit imposed by the de Vries–Rose law. The data have been recovered from the figure published by Leshowitz et al. (1968). (Adapted from Sensory Analysis, by D. Laming, p. 10. © 1986, Academic Press. Reproduced by permission.)
all quantitative stimulus attributes. In combination with the normal signal-detection model, Fechner’s law gives a numerically accurate account of discriminations between two separate stimuli.

3. The Logarithmic Transform

Fechner’s ‘outer’ psychophysics concerned the properties of discriminations between physical stimuli; ‘inner’ psychophysics concerned the internal processes underlying those discriminations. Fechner conjectured that the logarithmic transform characterised the transition between the two and that, subsequent to that transformation, inner psychophysics was strictly linear (see Adler, 1966, pp. 56–57). Accordingly, experimenters have repeatedly looked for logarithmic relations between stimulus intensity and neural response in physiological experiments. The best-known finding comes from Hartline and Graham (1932) recording from a single ommatidium in the king crab, Limulus. The maximal frequency of discharge at onset increases as the logarithm of luminance over about three log units; though the sustained rate of discharge, measured after 3.5 s, does not, but follows a power law instead. In a subsequent investigation Hartline (1938), recording from the frog retina, could find only one logarithmic relation, between luminance and the latency of initial response. These results are, at best, equivocal. However, there is other, much more compelling, evidence to show that (a) the underlying transform
14
D. Laming
is not logarithmic, and (b) the mean rate of neuron discharge has nothing to do with the case.

3.1. Increment Detection

Suppose that the luminance L is now presented continuously and ΔL is added to the L for 32 ms. The open symbols in Fig. 4 are the 75% increment thresholds from the same study. At the two highest luminances the increment thresholds are lower by 0.7 log units; that is, the eye is five times more sensitive when the stimulus levels are contiguous. (At lower luminances, however, the increment thresholds are limited by the de Vries–Rose law, represented by the broken line of gradient 1/2.) The open symbols in Fig. 3 are the corresponding psychometric function data. They fit well to the dotted curve, which is a normal integral with respect to (ΔL/L)². They do not conform to the normal integral with respect to log(1 + ΔL/L), which describes the function for a difference discrimination. All this is consequent on a simple change in the configuration of the stimulus levels to be distinguished. It needs to be emphasised that both difference and increment thresholds were obtained from the same observer; the stimuli were presented using the same instrumentation, in the same experimental paradigm, and at the same luminances. They are literally from the same eye. A single sensory analysis is needed to deliver these rather different results, contingent solely on the levels to be distinguished being contiguous. So what is going on? The eye is sensitive principally to changes in luminance — to edges and temporal discontinuities. In sensory discrimination, information from such cues dominates any contribution from the absolute comparison of luminances. This is shown by the Craik–Cornsweet Illusion.

3.2. The Craik–Cornsweet Illusion

When the sectored disk in the left-hand panel of Fig. 5 is rotated at 1600 rpm, it appears as in the centre panel. The inner disk appears darker than the annulus
Figure 5. The Craik–Cornsweet Illusion. The centre and right-hand panels have been made from the same photographic negative which was taken while the disc in the left-hand panel was rotated at high speed. The inner disc has the same (time-average) luminance as the periphery, but it appears darker in the centre panel because at the boundary only the sharp step in luminance is perceptible. (From Sensory Analysis, by D. Laming, p. 64. © 1986, Academic Press. Reproduced by permission.)
surrounding it. This is so, notwithstanding that both the periphery and the centre present equal sectors of black and white and, once the rate of rotation is sufficiently fast that the flicker is no longer perceptible, ought, according to the Talbot–Plateau law, to be indistinguishable. But if the boundary between centre and the periphery be obscured by an opaque annulus (right-hand panel), it can be seen that the Talbot–Plateau law does not lead us wrong, for now the inner disc and the outer annulus appear of equal brightness. Remove the annulus from the boundary and the illusion is restored. This illusion was first described by Craik (1966) and related illusions have been explored by O’Brien (1958) and Cornsweet (1970, pp. 270–275). Most people will suppose that a difference in brightness between centre and periphery would be detected when visual neurons focussed on those two different areas signal different luminances. While differences in luminance (of 25 percent or more) can be detected in this manner, the illusory difference in Fig. 5 is inferred from the change in luminance at the boundary. Detection of a step change in luminance is much more sensitive (2 percent) than a direct comparison between centre and periphery (and in Fig. 5 there is no difference between centre and periphery to be detected). The eye is therefore organised to detect changes in input, not to compare luminances absolutely.

3.3. Differential Coupling

The eye is principally sensitive to edges and temporal discontinuities because the interface between the physical stimulus and neural response is differential. This has to be so to accommodate the shape of the detectability function in Fig. 3. The argument runs as follows (Laming, 1973, pp. 148–150). Suppose that neural response is some smooth transform f(L) of luminance. The diagram in the upper panel of Fig. 6 shows a log function, but any smooth function will do.
Such a function can be expanded about L in a Taylor series to give:

f(L + ΔL) = f(L) + ΔL · f′(L) + ½(ΔL)² f″(L) + · · · ,   (5)
for small excursions about L. The internal difference in sensation between stimuli of luminances L and L + ΔL is therefore ΔL · f′(L) + ½(ΔL)² f″(L) + · · · — the interval CB in the upper panel of Fig. 6. This interval is approximately proportional to ΔL. On this basis one should expect the probability of a correct response to increase approximately as a normal integral with respect to ΔL. But the detectability function is approximately a normal integral with respect to (ΔL)², and (ΔL)² enters into the reckoning only if f′(L) = 0, that is, if the gradient is horizontal as in the lower panel of Fig. 6. But the square-law shape of the detectability function is not specific to one particular luminance; it is a general finding at all luminances, so that the gradient must be horizontal everywhere. That implies a differential interface. I emphasise, to pre-empt possible confusion, that this argument, of itself, does not account for the shapes of either discriminability or detectability functions; that is yet to come. But it does show that the interface has to be differential. In
Figure 6. Diagrams to illustrate the argument demonstrating differential coupling. In the upper diagram (a) an increase in stimulus magnitude from L to L + ΔL produces an increase CB in mean internal sensation, which is approximately proportional to ΔL. So d′ should increase in proportion to ΔL, and this is so for difference discriminations. However, for increment detection d′ increases approximately as (ΔL)², so that a configuration is required like that in the lower diagram (b). Here C and B coincide, and BB′ is approximately proportional to (ΔL)². But this can happen only if the tangent at A is horizontal. (From Sensory Analysis, by D. Laming, p. 62. © 1986, Academic Press. Reproduced by permission.)
consequence, (a) the transfer function cannot be logarithmic and (b) mean neural discharge has nothing to do with the case.

4. The χ² Model

Let us begin the argument over again. A visual stimulus is a Poisson process of light quanta. If the luminance is suitably scaled, its mean and variance are both equal to L. The upper trace in Fig. 7(a) shows a sample output from a light-emitting diode by way of example. The receptive fields of sensory neurons have two components, one positive (excitatory; upper trace in Fig. 7(a)) and one negative (inhibitory; lower trace). The χ² model assumes that these two components absorb statistically independent copies of the stimulus and are, moreover, balanced.
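This premise of the χ² model — two balanced, statistically independent Poisson inputs whose means cancel while their variances add — can be illustrated with a small simulation (a sketch of the premise only, not of the chapter's full analysis; the Poisson sampler is Knuth's, adequate for small rates):

```python
import math
import random

def poisson(lam, rng):
    """Knuth's Poisson sampler (adequate for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while p > limit:
        k += 1
        p *= rng.random()
    return k - 1

rng = random.Random(1)
L = 10.0    # scaled luminance: mean and variance of each Poisson input
n = 20000
# Excitatory input minus independent inhibitory input, per time bin:
diff = [poisson(L, rng) - poisson(L, rng) for _ in range(n)]

mean = sum(diff) / n
var = sum((d - mean) ** 2 for d in diff) / n
print(mean, var)    # mean near 0, variance near 2L
```

The mean cancels while the noise power remains proportional to L, which is what carries Weber's law in what follows.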
Figure 7. A: Two Poisson traces of equal density and opposite polarity, representing the inputs respectively to the excitatory and inhibitory components of a receptive field. B: Their sum, a Gaussian noise process centred on zero mean. (From Sensory Analysis, by D. Laming, p. 80. © 1986, Academic Press. Reproduced by permission.)
In consequence, the means cancel, but the variances combine in square measure, because the two inputs are statistically independent. Cancellation appears to be an intrinsic property of visual neurons. Even visual inputs, e.g., gratings, that ought to unbalance the neuron nevertheless elicit a response that quickly fades, a problem that every electro-physiologist has to grapple with. The response of such a neuron to a uniform level of stimulation consists simply of Gaussian noise (Fig. 7(b)). The resultant Gaussian noise process has power L. A discrimination between two separate stimuli cannot thereafter be more sensitive than a discrimination between two samples of Gaussian noise of powers proportional to L and L + ΔL. However, while sensory neurons accept inputs of both polarities, they transmit action potentials of one polarity only. There is an internal half-wave rectification. This means that only the positive excursions of the Gaussian noise are transmitted; they are observed as a maintained discharge (see Fig. 9 below). It also means that the discrimination of luminance between two separate stimuli has to depend on the positive excursions of the Gaussian noise (the maintained discharges) only. The statistical properties of half-wave rectified noise do not differ from those of the full process; there is simply a loss of half the degrees of freedom. The energy in a sample of Gaussian noise, of bandwidth W, duration T, and power L, is distributed approximately as χ² with 2WT d.f., scaled by L/2WT. Envisage a sample from a Gaussian noise process with an exactly level spectrum over the frequency band (0, W), and no frequencies higher than W. The waveform cannot change more rapidly than W⁻¹ and for this reason the sample waveform can be reconstructed from a knowledge of its values at intervals (2W)⁻¹ (Green and
Swets, 1966, pp. 157–161). In a sample of length T, 2WT such values are needed to specify the sample waveform. They are normal variables of zero mean and variance proportional to the power of the noise process; they are mutually independent. The sum of their squares is a χ² variable with 2WT d.f. They equate to the integral of the square of the waveform, which is equal to its power times duration (T). An alternative and equivalent model of a sample of Gaussian noise of duration T represents it as the sum of 2WT sinusoids of frequencies n/T (n = 1, . . . , WT). Both sine and cosine waveforms are needed at each frequency to capture the phase of that component. The amplitudes of these sinusoids are again normal variables of zero mean and variance proportional to the power, and the sum of their squares is a χ² variable with 2WT d.f. (Green and Swets, 1966, pp. 174–175). A Gaussian noise process with an exactly level spectrum over the frequency band (0, W), and no frequencies higher than W, is actually a mathematical ideal that cannot occur in nature, so that, in practice, the χ² distribution is only an approximation (see Note 3). But proportionality to the power of the noise (in the present case, to luminance) is nevertheless exact, so that the explanation of Weber’s law is not compromised. So, the data in Fig. 1 might also be modelled with two χ² distributions in a manner exactly analogous to the normal model in Fig. 2. This χ² model is shown in Fig. 8, with criteria calculated, as before, from the 52 ms data in Fig. 1. Each density function in Fig. 8 represents the distribution of energy in samples of Gaussian noise from stimuli of different luminances. This model too describes discriminations over the entire continuum of luminance (of 1° diameter flashes of 52 ms duration). But how accurate is it? The signal detection data from Nachmias and Steinman (1965) require 73 d.f. (52 ms) and 53 d.f. (230 ms) respectively in the χ² model. For numbers of degrees
Figure 8. The χ² model for discriminations between two separate stimuli, analogous to the normal model of Fig. 2. The continuous density functions and the five criteria (broken lines) have again been calculated to model the 52 ms data in Fig. 1. The dotted curves indicate some of the additional density functions that would be needed to model the entire continuum of luminance.
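The scaling claim can be checked directly. Under the model, the energy of a noise sample is a sum of 2WT squared normal coefficients whose variance is proportional to the power L, so both the mean and the spread of the energy distribution grow in proportion to L, and discriminability depends only on the ratio of the two luminances. A sketch (variable names mine; 73 d.f. is the value fitted to the 52 ms data):

```python
import random

def sample_energy(power, df, rng):
    """Energy of one noise sample: a chi-squared variable with df d.f.,
    scaled by power/df (a sum of df squared zero-mean normals)."""
    s = (power / df) ** 0.5
    return sum(rng.gauss(0.0, s) ** 2 for _ in range(df))

rng = random.Random(7)
df, n = 73, 5000
stats = {}
for L in (1.0, 10.0):
    e = [sample_energy(L, df, rng) for _ in range(n)]
    m = sum(e) / n
    sd = (sum((x - m) ** 2 for x in e) / n) ** 0.5
    stats[L] = (m, sd)
    print(L, m, sd, sd / m)    # mean ~ L, sd ~ L, so sd/mean is constant
```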
of freedom as large as these, the approximation of log χ² by a normal variable is very good. The precision of this approximation can be assessed from Fig. 1, in which the broken operating characteristics are calculated from the χ² model. Both models fit the data to within ordinary limits of experimental error. As previously, the operating characteristic represents a limit on the accuracy of discrimination that the observer can achieve; and as explained above, the shape of the operating characteristic and the other properties of the model are independent of the particular transform of the decision variable — a log transform in Fig. 8. The model is formulated in terms of stimulus energy as a matter of convenience in calculation, and this choice of formulation has no implications for the manner in which a random sample from one or the other of the χ² distributions is converted into a response. Nothing is said here about how, in practice, this is accomplished. So the fact that the decision variable is the aggregate of the squares of the Gaussian excursions is of no consequence at all.

4.1. Where Does Fechner’s Log Transform Come From?

There remains one difference between the normal and χ² models that needs to be emphasised. In the normal model (Fig. 2) the different distributions are spaced in proportion to log(Luminance) — that is to say, a logarithmic transformation of the stimulus variable is essential in that model to accommodate the progressively increasing loss of information in transmission as luminance increases. In the χ² model, on the other hand, the distributions are scaled strictly according to luminance (L). The progressively increasing loss of information, as luminance increases, that underlies Weber’s law now comes about through differential coupling eliminating the mean input, and there is no need of any non-linear transformation of physical stimulus magnitude. So . . .
A single stimulus — a flash of light — generates a neural process with the statistical structure of a sample of Gaussian noise. A discrimination between two separate flashes is, therefore, equivalent to a discrimination between two samples of Gaussian noise. The distribution of energy in a sample of Gaussian noise is approximately χ². For the number of degrees of freedom typically implied by the precision of a difference threshold, log χ² is approximately normal. The log transform in Fechner’s law is simply a reflection of that mathematical relationship. There is this consequence: Sensory discrimination does not provide any empirical foundation for a measure of sensation distinct from the physical magnitude of the stimulus
because the data can equally accurately be modelled in terms of the physical stimulus magnitude.

5. Detectability Functions

The normal model in combination with Fechner’s law works because it is an approximation to the χ² model. For discriminations between two separate stimuli, both are accurate to within ordinary limits of experimental error. But the χ² model is to be preferred because it accommodates a much wider range of phenomena. This is demonstrated here with respect to the shape of detectability functions.

5.1. The Square Law Transform

The centre component of a receptive field has a slightly shorter latency than does the surround (in the retina, for example, the surround is routed through amacrine cells and has a slightly longer path). So, when an increment (of duration T) is added to a continuous background, the negative component is delayed (as in Fig. 9(a)). That delay means that, after cancellation between the positive and negative inputs, the onset and offset of the increment survive as positive and negative perturbations in the output (Fig. 9(b)). Sensory neurons transmit action potentials of one polarity only, so that the combined output is half-wave rectified (Fig. 9(c)). The negative-going perturbation is thereby attenuated relative to the positive. Envisage, now, that some subsequent ‘observation window’ scoops both perturbations into one pool for examination. Mathematical analysis shows that the net perturbation (Fig. 9(d)) is then proportional to the square of the increment and the proportion correct is approximately a normal integral with respect to (ΔL)². This relation is shown as the dotted curve in Fig. 3. To pre-empt misunderstanding, positive and negative perturbations in the output result from relative delay in the positive and negative inputs to the same neuron. It happens that the retina contains both on-centre and off-centre neurons. It also happens that these two populations of neurons conduct along different pathways.
There is no suggestion here that these two populations interact. In fact, the phenomena of sensory discrimination can be modelled with reference to one population (e.g., the on-centre neurons) only; the other population does not show.

5.2. The Transition from Outer to Inner Psychophysics

Notwithstanding the numerical accuracy of the normal model (Fig. 2), the underlying transform is not logarithmic. Instead, it consists of: (i) a differential coupling (realised by cancellation between two statistically independent copies of the stimulus), followed by (ii) a half-wave rectification (implicit in neural transmission by there being action potentials of one polarity only).
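The square law of Section 5.1 can be checked with standard normal closed forms (this calculation is mine, a sketch of the rectification step only): for unit Gaussian noise Z, E[max(Z + δ, 0)] = δΦ(δ) + φ(δ), so pooling a positive and a negative half-perturbation of size δ into one observation window leaves a net effect in which the linear terms cancel and only the order-δ² term survives.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def rect_mean(delta):
    """E[max(Z + delta, 0)] for Z ~ N(0, 1): mean after half-wave rectification."""
    return delta * norm_cdf(delta) + norm_pdf(delta)

def net(delta):
    """Net pooled effect of a positive and a negative half-perturbation."""
    return rect_mean(delta) + rect_mean(-delta) - 2.0 * rect_mean(0.0)

# Halving the perturbation quarters the net effect: a square law.
print(net(0.02) / net(0.01))    # ~ 4
print(net(0.01) / 0.01 ** 2)    # ~ phi(0) = 0.3989
```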
Figure 9. A: Sample functions from two Poisson processes of equal density and opposite polarities, each with an increment of duration T superimposed; the increment to the negative component is delayed relative to the positive increment by a fixed interval τ. B: The combination of the two samples in A, being a Gaussian noise process with two half-increments of opposite polarities embedded in it. C: The combined process after low-pass filtering and half-wave rectification. D (scale enlarged 10×): The process after aggregation within an ‘observation window’ of a span long compared with the duration of the incremental perturbation. The depression of responding due to the negative half-increment has now disappeared; so also has the greater part, but not all, of the positive half-increment. (From Sensory Analysis, by D. Laming, p. 112. © 1986, Academic Press. Reproduced by permission.)
Fechner envisaged just one transition between the two domains, outer and inner. But this particular transform is applied successively many times over by successive layers of receptive field units. This gives sensory pathways the capability of analysing an immense array of different patterns of stimulation. Of course, it was impossible to envisage all that 150 years ago, at the virtual beginning of sensory-experimental time. What Fechner’s intuition has accomplished is to inspire all the experimental study that has led to our present-day understanding of sensory discrimination (see Note 4).

Notes

1. This article is a greatly expanded version of D. Laming. Fechner’s law: Where does the log transform come from? In: E. Sommerfeld, R. Kompass and
T. Lachmann (Eds), Fechner Day 2001. Lengerich: Pabst, pp. 36–41. Copyright © 2001 Pabst Science Publishers, D-49525 Lengerich. The material in that earlier version is incorporated here by permission.

2. Let X1, X2 and X3 be three stimulus magnitudes generating sensations of magnitudes S1, S2 and S3. The equation: (S3 − S2) + (S2 − S1) = (S3 − S1), transposes into: f(X3/X2) + f(X2/X1) = f(X3/X1), where f(X2/X1) is the functional relation between (S2 − S1) and the ratio (X2/X1). Putting X2/X1 = e^x, etc. leads to Cauchy’s functional equation: g(x) + g(y) = g(x + y), which, under quite mild regularity conditions (if g(x) is continuous, the conclusion certainly follows) admits only the solution g(x) = cx. It follows that f(X2/X1) = c ln(X2/X1).

3. The model of Gaussian noise presented by Green and Swets (1966, pp. 154–161) is a special case of a very general theory. A Gaussian noise process can be specified by just two entities, its autocorrelation function and its power (or scale factor). These two entities are independent. A sample of noise of specified autocorrelation function over an interval T can be represented as the sum of a finite number of orthogonal components. The orthogonal functions are determined by the combination of the autocorrelation function and the sample interval. (sin 2πnt/T and cos 2πnt/T are the orthogonal functions for the special case considered in the text.) The coefficients of these orthogonal functions are normal variables of zero mean and variance proportional to the noise power (see Davenport and Root, 1958, pp. 96–99 and App. 2). The only difference from the idealised situation considered in the text is that the variances of the normal coefficients are no longer equal. Their sum of squares (the energy in the sample) is properly a general gamma distribution (McGill, 1967). But proportionality to the power of the noise is still exact, so that the explanation of Weber’s law is not compromised.

4.
Since Fechner’s day, Stevens (1957) has proposed that sensation increases as a power function of stimulus magnitude, rather than a logarithmic function. Apart from their common use of the word ‘sensation’, Fechner and Stevens are talking about two quite different concepts. For Fechner, the unit of sensation is the just noticeable difference and a scale of sensation may be constructed by adding up j.n.d.s. The validity of this operation depends on the validity of the logarithmic model. For Stevens, sensation is measured by the numbers assigned by participants in magnitude estimation and other ‘direct’ methods of scaling,
and is therefore a matter of usage of numerical responses. Those numerical responses measure sensation only by fiat. The issues involved in attempts to measure sensation, and the question whether sensation is in fact measurable, have been examined by Laming (1997).

References

Adler, H. E. (trans.) (1966). Elements of Psychophysics, Fechner, G. T., Vol. 1. Holt, Rinehart and Winston, New York, NY, USA.
Boring, E. G. (1950). A History of Experimental Psychology, 2nd edn. Appleton-Century-Crofts, New York, NY, USA.
Cornsweet, T. N. (1970). Visual Perception. Academic Press, New York, NY, USA.
Craik, K. J. W. (1966). The Nature of Psychology, Sherwood, S. L. (Ed.). Cambridge University Press, UK.
Davenport, W. B. and Root, W. L. (1958). An Introduction to the Theory of Random Signals and Noise. McGraw-Hill, New York, NY, USA.
Green, D. M. and Swets, J. A. (1966). Signal Detection Theory and Psychophysics. Wiley, New York, NY, USA.
Hartline, H. K. (1938). The response of single optic fibers of the vertebrate eye to illumination of the retina, Amer. J. Physiol. 73, 400–415.
Hartline, H. K. and Graham, C. H. (1932). Nerve impulses from single receptors in the eye, J. Cell. Compar. Physiol. 1, 277–295.
Laming, D. (1973). Mathematical Psychology. Academic Press, London, UK.
Laming, D. (1986). Sensory Analysis. Academic Press, London, UK.
Laming, D. (1997). The Measurement of Sensation. Oxford University Press, UK.
Laming, D. (2001). Statistical information, uncertainty, and Bayes’ theorem: some applications in experimental psychology, in: Symbolic and Quantitative Approaches to Reasoning with Uncertainty (Lecture Notes in Artificial Intelligence), Benferhat, S. and Besnard, P. (Eds), Vol. 2143, pp. 635–646. Springer-Verlag, Berlin, Germany.
Laming, D. (2004). Human Judgment: The Eye of the Beholder. Thomson Learning, London, UK.
Leshowitz, B., Taub, H. B. and Raab, D. H. (1968). Visual detection of signals in the presence of continuous and pulsed backgrounds, Percept.
Psychophys. 4, 207–213.
McGill, W. J. (1967). Neural counting mechanisms and energy detection in audition, J. Mathemat. Psychol. 4, 351–376.
Nachmias, J. and Steinman, R. M. (1965). Brightness and discriminability of light flashes, Vision Res. 5, 545–557.
O’Brien, V. (1958). Contour perception, illusion and reality, J. Optic. Soc. Amer. 48, 112–119.
Stevens, S. S. (1957). On the psychophysical law, Psychol. Rev. 64, 153–181.
Swets, J. A., Tanner, W. P., Jr. and Birdsall, T. G. (1961). Decision processes in perception, Psychol. Rev. 68, 301–340.
Tanner, T. A., Jr., Rauk, J. A. and Atkinson, R. C. (1970). Signal recognition as influenced by information feedback, J. Mathemat. Psychol. 7, 259–274.
Treisman, M. and Leshowitz, B. (1969). The effects of duration, area, and background intensity on the visual intensity difference threshold given by the forced-choice procedure: derivations from a statistical decision model for sensory discrimination, Percept. Psychophys. 6, 281–296.
Fechner’s Elusive Parallel Law

Helen E. Ross¹,∗ and Nicholas J. Wade²

¹ Department of Psychology, University of Stirling, Stirling FK9 4LA, UK
² School of Psychology, University of Dundee, Dundee DD1 4HN, UK
Abstract

Weber’s Law states that the differential threshold or just-noticeable-difference (jnd) is proportional to the physical intensity of the stimulus. Fechner built up his logarithmic law of sensation intensity from Weber’s Law and the assumption that all jnds are subjectively equal. He thought it important that the Parallel Law should also hold. The Parallel Law states that, when perceived stimulus intensity is changed by something other than physical intensity (such as adaptation), Weber’s Law continues to hold: discrimination should be unchanged provided the perceived values of the two stimuli change in the same ratio. Fechner claimed that weight discrimination was unaffected by weight adaptation; he was unsure about light adaptation; and he claimed that tactile length discrimination was unaffected by perceived changes caused by the bodily location of the stimulus. Modern research on adaptation for weights and other sensory stimuli shows that changes occur both in perceived intensity and in discrimination. Discrimination between stimuli is usually finest when the adaptation level is appropriate to the test level. There is insufficient evidence concerning the discrimination of tactile length and visual length when perceived length is changed. However, the Parallel Law may be untestable because of the difficulty of obtaining measures in the same experiment both for changes in discrimination and for the ratios of the perceived changes of the stimuli.

Keywords: Weber’s law, Fechner’s law, parallel law, discrimination, adaptation, weight, light, length
1. Introduction

One hundred and fifty years ago, Fechner (1860) published Elemente der Psychophysik. In this book he sought to establish Weber’s Law for several sense modalities, because he believed that the logarithmic law of sensation magnitude was dependent on the validity of Weber’s Law. Weber’s Law states that the jnd (just-noticeable-difference) or DL (difference limen or threshold) is proportional to the

∗ To whom correspondence should be addressed. E-mail: [email protected]
26
H. E. Ross, N. J. Wade
Figure 1. Fechner’s Parallel Law by Nicholas Wade. Fechner can be seen at a slightly different brightness in the text from p. 300 of the first volume of Elemente der Psychophysik (Fechner, 1860).
physical intensity of the stimulus. Fechner wanted to build a relationship between a physical dimension of a stimulus (such as its weight) and its psychological attribute (heaviness). He assumed that all jnds are subjectively equal, and that this would produce a logarithmic relation between the stimulus intensity and the sensation — assumptions that have both been questioned (see Heidelberger, 2004; Masin et al., 2009). Most researchers nowadays accept that a power function is a more likely relationship, or that a logarithmic function is one of a family of possible functions (see review by Murray, 1993, and commentaries). Nevertheless Fechner is still respected for his pioneering work in psychophysics and for his other wide-ranging interests (Heidelberger, 2004). One of Fechner’s lesser known statements was the Parallel Law, the assumption that Weber’s Law applies to changes in perceived stimulus intensity in the same way as to changes in physical intensity: if the perceived intensity is altered by some factor other than physical intensity (such as adaptation), discrimination should remain unchanged. He added the proviso that this would be true only if the two stimuli appear to change in the same ratio. In Fig. 1, Fechner is shown in the text from his Elemente in which the Parallel Law was introduced. Fechner discussed the Parallel Law in Chapter 12 of The Elements of Psychophysics (1966, p. 250). He wrote:
“I shall call the law with which we are mainly concerned the law parallel to Weber’s Law, or, in short, the parallel law, since we can look at it as a transposition of Weber’s Law from the external to the internal realm. It can be formulated in this way: When the sensitivity to two stimuli changes in the same ratio, the perception of their difference will nevertheless remain the same. It can also be rephrased in the following equivalent manner: When two stimuli are both perceived as weaker or stronger than before, their difference appears unchanged, if both stimuli would have to be changed in the same ratio to restore them to their previous absolute sensation level”.
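Fechner’s expectation here follows directly from his logarithmic law. If sensation is S = k·log(I/I0), then changing both stimuli by a common ratio g shifts both sensations by the same amount, k·log g, and leaves their difference untouched. A short illustration (the constants are arbitrary):

```python
import math

k, I0 = 1.0, 1.0

def S(I):
    """Fechner's logarithmic law of sensation."""
    return k * math.log(I / I0)

I1, I2, g = 10.0, 12.0, 0.5    # g: a common ratio applied to both stimuli
d_before = S(I2) - S(I1)
d_after = S(g * I2) - S(g * I1)
print(d_before, d_after)       # identical: the sensation difference is preserved
```

This is why, for Fechner, the Parallel Law stood or fell with the logarithmic form: under any other smooth transform, a common ratio change would in general alter the sensation difference.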
Fechner was concerned about the validity of the Parallel Law for two reasons. Firstly, he thought the Parallel Law was a bridge between ‘outer’ and ‘inner’ psychophysics: Weber’s Law should be transferable from the measurements of external stimuli to the measurements of internal effects. This was important to him as part of his belief in ‘psychophysical parallelism’ regarding the mind-body problem (see Heidelberger, 2004). Secondly, he believed that Weber’s Law could not be true unless the Parallel Law was also true. However, he gave no justification for this assertion. He was unwilling to accept that Weber’s Law might hold when other factors are held constant, but not when spatial location or adaptation level is varied (Murray and Ross, 1988). Fechner derived his logarithmic scale of sensation indirectly from jnds, thus creating a tautologous link between sensory magnitude and discrimination. Many modern theorists (see Murray, 1993) have tried to establish a unified psychophysical law, enquiring whether there is some relation between sensory discrimination and scales of sensory magnitude constructed by some other direct method. For example, Ekman (1956, 1959) argued that the subjective size of the jnd is a linear function of the subjective magnitude of the standard (as derived from a ratio scale), and that this is analogous to Weber’s law which applies in the physical realm. Similarly, Teghtsoonian (1971) discussed the relation between the dynamic range of a sensory system (the ratio of the greatest to the smallest stimulus intensity to which an observer can respond), and the sensory range (the ratio of the corresponding sensory magnitudes). He argued that widely varying dynamic ranges may all be mapped onto the same sensory range. 
He claimed that the exponents of the power function for the sensory scale varied with the dynamic range of the sense modality; and that there was a constant relation between the exponent of the power function for a given modality and its typical (steady-state) Weber fraction. Others have questioned whether the empirical data support that relationship (e.g., Laming, 1997), the values of the Weber fractions being particularly debateable. However, those seeking a unified law restrict their enquiries to changes in sensory magnitude brought about by changes in the intensity of the physical stimulus. Changes in sensory magnitude can occur for many other reasons (e.g., Lockhead, 1992). Fechner’s Parallel Law was concerned with some of these other changes. Fechner thought the Parallel Law could be investigated by examining spatial differences (some parts of the body being more sensitive than others) and temporal differences (adaptation altering sensitivity). He devoted the greater part of Chap-
ter 12 to experiments on weight discrimination, a smaller section to visual sensation and light adaptation, and one paragraph to sensations of extent. This paper follows the order of Fechner’s topics, and attempts to redress the balance of his material. 2. Weight Perception and Discrimination 2.1. Fechner’s Arguments Fechner devoted the bulk of the section on weights (1966, pp. 252–267) to a description of his personal experiments on adaptation. He recounted a gruelling series of experiments in which he lifted standard weights of 1.0, 1.5 or 3.0 kg for varying lengths of time (0.5–4 s) and compared the standards with slightly heavier weights. He measured discrimination from the number of correct judgements as to which was heavier. He claimed that discrimination usually remained unchanged with lifting time, even though the weights felt heavier with fatigued arms. He also lifted weights of 9.5 pounds (4.53 kg if the German Apothecary pound equalled 477 g — see Zupko, 1978; or 4.56 kg if it equalled 480 g — see Ross and Murray, 1996, p. 20). He raised them above his head to induce fatigue, before discriminating between lighter weights. These experiments brought on various ailments (such as raised pulse and temperature), which Fechner described in greater detail than his experimental methods. The ailments prevented him from continuing the tests and obtaining what he regarded as sufficient data. Nevertheless he concluded that the Parallel Law held because adaptation did not affect discrimination. Fechner then examined the spatial question by looking at Weber’s data on ‘absolute sensitivity’ (Weber’s and Fechner’s term for ‘perceived heaviness’) and ‘differential sensitivity’ (differential thresholds) for weights placed on different parts of the body, and found no correspondence between the two. 
Weber (1834, 1996) argued that static weight perception (the sense of pressure) should be related to tactile acuity (two-point threshold), whereas active (muscular) weight perception need not be so related. Weber measured static weight discrimination thresholds in different parts of the body and argued that the smaller the jnd, the more acute the touch sense, and the heavier the impression of the weights. He also measured relative heaviness in different parts of the body, and assumed that heaviness was a measure of tactile sensitivity. He concluded that the rank order of sensitivity for different body parts was much the same as for the two-point threshold — but the correlation is only moderately convincing. Fechner rightly pointed out that Weber’s different methods of measurement could not be compared. Nevertheless, he took the absence of a good correlation between the two to mean that differential discrimination was unchanged when perceived heaviness varied in different body parts, and he used this as evidence in favour of the Parallel Law.

2.2. Modern Work

Modern work on the relation between different tactile abilities gives more support to Weber than to Fechner. Stevens (1979) found that sensitivity to weights placed
on the body (as measured by magnitude estimates) agreed well with Weinstein’s (1968) data on point localisation and two-point threshold. However, there appear to be no adequate data on whether weight magnitude estimates correlate with weight discrimination in different body parts. No light is shed on the validity of the Parallel Law for variations in bodily location. Modern work on the effect of adaptation is more informative, and the fact that changes in discrimination occur seems to contradict the Parallel Law. Various predictions could be made about the effects of adaptation. For weight discrimination, as for most types of sensory discrimination, Weber’s Law holds only in the middle range of intensities and the Weber fraction rises at the extremes of the scale (see Laming, 1986; Masin, 2009). It could then be predicted that (for light weights) a reduction in perceived weight should raise the Weber fraction, and an increase in perceived weight should lower it; but the Parallel Law might hold for weights in the mid range. Adaptation is a cause of changes in perceived weight: weights feel heavier after adaptation to lighter weights, and lighter after adaptation to heavier weights. Thus, when testing very light weights, adaptation to heavy weights should worsen discrimination and adaptation to even lighter weights should improve it. (Note that Fechner’s experiments dealt with the effects of extreme fatigue, which made all weights feel heavier. See Burgess and Jones (1997) for a discussion of the difference between fatigue and normal weight perception.) An alternative prediction is that any inappropriate adaptation, whether too high or too low, will worsen discrimination. Most modern theories regard adaptation as a change in the gain of the system, the gain being set to the appropriate level for maximum discrimination and for protection against sensory overload (e.g., Keidel et al., 1961). 
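The prediction above rests on the standard form of Weber’s Law and its breakdown at the extremes of intensity; stated compactly (in conventional modern notation, not Fechner’s own):

```latex
\[
\frac{\Delta I}{I} \approx k \quad \text{(mid-range intensities)},
\qquad
\frac{\Delta I}{I} > k \quad \text{(very low or very high } I\text{)},
\]
```

where \(I\) is stimulus intensity, \(\Delta I\) is the difference limen and \(k\) is the Weber fraction. On this view, if adaptation shifts a stimulus’s effective position along this function, the measured Weber fraction should shift with it, except in the flat mid range.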
Adaptation also changes the apparent value of a stimulus, with the result that accurate absolute intensity perception is sacrificed to increased discrimination. There is evidence from many sense modalities showing that discrimination is finest when the observer is adapted to the test intensity level, and is poorer when adapted to some other level, a higher level usually being more harmful than a lower level. Weight experiments show that discrimination deteriorates when the observer is adapted to an inappropriate weight level, whether too low or too high. For example, adaptation to heavier weights causes a deterioration in weight discrimination (Gregory and Ross, 1967; Holway et al., 1938), and the DL for lighter weights may rise by a factor of 2–6 (Ross, 1981). A varying standard causes poorer discrimination than a steady standard (e.g., Hellström, 2000; Woodrow, 1933), because it prevents appropriate adaptation: the weight DL may rise by a factor of about 1.3. Discrimination is also impaired in many other situations requiring adaptation, such as under water or under reduced or increased accelerative forces (Jones, 1986; Ross, 1981). Changes in expectation can cause illusions and seem to act like changes in physiological adaptation, by adjusting the range of expected intensities. The size–weight illusion may be described as contrast with the expected weight (e.g., Buckingham and Goodale, 2010). Small objects are expected to be lighter than large objects of
the same material, but if the objects have the same weight the smaller object feels heavier when lifted. Several different predictions have been made as to whether the illusion affects the discriminability of two objects of the same size and material but slightly different weight. Seashore (1896) and Ross and Gregory (1964) suggested that the DL should be proportional to the perceived rather than the physical stimulus intensity, so that an increase in perceived weight should raise the Weber fraction. Ross and Gregory measured the DL for canisters with densities of about 0.61 and 5.2 and found a higher threshold for the denser and apparently heavier canisters. However, when the same authors tested over a wider range of densities (Ross and Gregory, 1970), the results supported a different hypothesis: discrimination should be best for stimuli close to the expected density, and poorer for those with higher or lower densities. Ross (1969) measured the expected density by having subjects lift visible weights of different volumes and masses and compare them with a hidden weight. Weight perception can be defined as veridical when the lifted objects have the expected weight for their volume and material, and feel equal to a hidden weight. This value was found to correspond to a density of about 1.7 for tin canisters and 0.14 for blocks of polystyrene. Ross and Gregory (1970) found that discrimination for tins and polystyrene blocks tended to be finest for containers nearest to those densities. They also measured the expected density of tins for three subjects, obtaining a value of 1.0, and found that their discrimination was best for tins near that value. The DL was raised by a factor of up to 1.2 for objects that are apparently too heavy or too light for their size (see Ross, 1981). They argued that the effect of the illusion was similar to that of adaptation to inappropriate weights, with discrimination being best when the gain was set appropriately.
A different hypothesis was put forward by Jones and Burgess (1998). They suggested that the neural gain is increased when a smaller object is lifted, so that a given sensory input has a larger central effect and the heaviness of an object is felt to be greater; and the increased gain also causes finer discrimination. They found that the judged difference ratio between two small metal cans (density about 1.55) was greater than that between two larger cans of the same mass (density about 0.11). The method employed by Jones and Burgess is interesting because they used suprathreshold differences and asked the subjects to estimate the ratio of one can to the other. They did not directly measure discrimination, though they argued that their measure was related to it. Unfortunately they did not test for cans with significantly higher densities than 1.5: these should give even better discrimination according to their theory, but poorer discrimination according to the theory of Ross and Gregory that the optimum density is in the region of 1.5. Regardless of the outcome of such an experiment, both predictions seem to refute the Parallel Law. There remains a formidable obstacle to testing the validity of the Parallel Law. As Ward (1876) pointed out, it is questionable whether a change in ‘sensibility’ would leave the ratios of two stimuli unchanged, as required by Fechner’s definition of the law. If the law only holds when the two stimuli to be discriminated are changed in the same ratio, the law is in danger of being a tautology. If discrimination changes
when the perceived values change, this could be because the two stimuli are not changed in the same ratio, or because there is some change in the ‘noise’ in the system, or both. It is difficult to design experiments to disentangle these factors.

3. Light Adaptation

Fechner discussed the Parallel Law for brightness discrimination under various lighting conditions (1966, pp. 268–273) — that is, during adaptation. Adaptation refers either to changes with constant stimulation or to adjustments with varying stimulation. An example of the former is the change in apparent intensity of a light source when observed for some time. A common instance of the latter is dark and light adaptation. Both of these were mentioned by Fechner, but he paid most attention to what would now be called dark adaptation. As the editor of the translation noted, ‘Every time this topic comes up Fechner is stumped’ (1966, p. 268). Unlike the previous section on weight discrimination, Fechner’s considerations of brightness and its variations are qualitative. Indeed, they are not his own observations but are drawn from contemporary texts. A major problem was that no adequate definition of adaptation was available at that time. Aubert’s (1865) distinctions between adaptation and accommodation lay ahead. Aubert systematically plotted the time course of light and dark adaptation, calibrating the intensity of the light source with an episcotister of his own invention: he measured his own thresholds for detecting light at regular intervals throughout almost two hours in darkness. Fechner, on the other hand, either recounted the observations of others or returned to his earlier experiments on afterimages. Afterimages can be seen following either brief, intense illumination of the eye or prolonged fixation of an illuminated stimulus. As an example, a negative portrait of Fechner is shown in Fig. 2; fixating on his left eye for about 30 s and then looking at the white square will result in seeing his face in normal contrast. Historically, afterimages have been given several names, like ocular spectra and accidental colours, both of which refer to the coloured characteristics of the phenomenon. In fact, the term ‘afterimage’ was used by Purkinje (1823) and it was taken up by Fechner (1840) in his more detailed study of them; it has since superseded the other names given to them. Fechner noted that observing afterimages had amplified the variations in perceived brightness over time: “When one is observing an afterimage, the appearance jumps to an earlier or a later phase, depending upon the sudden increase or decrease of light entering the eye. . . . The longer one has observed an object, the more intense the whole appearance of the afterimage and the longer it lasts, before it becomes indistinct. . . . This result appeared to me so distinctive and important that I have repeated my observations several times. . . . The experiments with measurements of duration in diffuse daylight and with the cross-bar have yielded very constant results. . . . The conclusion is that if the afterimage disappears quickly, only the first phase can be seen clearly. . . . Movement of the eye, or the rest of the body, leads to the disappearance of complementary afterimages” (Fechner, 1840, pp. 215, 217–218 and 221).

Figure 2. Fixating on Fechner’s left eye for about 30 s and then shifting the gaze to the black dot in the right square will produce a negative afterimage. The face will be in its normal contrast, surrounded by a dark disc and a white square. When the afterimage starts to fade it can be restored by blinking rapidly. The portrait was derived from an illustration in Hall (1912).
Fechner knew that adaptation to inappropriate light intensity impaired intensity discrimination, but he was unable to provide a satisfactory explanation. He suggested that the breakdown of the Parallel Law followed the breakdown of Weber’s Law at high and low intensities: ‘. . . these deviations from the parallel law occur under conditions completely analogous to those of the deviations from Weber’s Law at its lower limits. . . .We do not claim validity for the parallel law, within wider limits than those for Weber’s Law’ (p. 270). Modern approaches to visual adaptation converge on gain control rather than returning to the ‘parallel’ concepts applied by Fechner (see Clifford and Rhodes, 2005). Work on adaptation and discrimination extends to phenomena that are not subject to Weber’s Law. For example, adaptation to the luminance contrast of a sinewave grating may reduce the apparent contrast of that grating but enhance the discrimination of differences in contrast between gratings, though the effects vary with the viewing conditions (Abbonizio et al., 2002; Greenlee and Heitger, 1988). Many authors have examined adaptation to a grating pattern, and its effects on detection thresholds and on contrast discrimination for similar patterns of differing contrast. The varying results have led to several different models for the components of changes in gain control (e.g., Foley and Chen, 1997; Snowden, 2001; Wilson and Humanski, 1993). Some authors have compared models of adaptation with models of attention (e.g., Pestilli et al., 2007; Reynolds and Heeger, 2009). The nature of visual adaptation remains a controversial topic.
4. Sensations of Extent

4.1. Tactile Acuity and Perceived Tactile Length

Fechner had little to say on sensations of extent, so it is worth quoting him in full (1966, p. 273): “I have carried out some comparative experiments by the methods of average error and of equivalents, once on chin and upper lip, another time on the five fingers, in order to find out whether the appearance of a difference between two sets of compass points would vary according to the region of the skin, or if no essential relationship exists in this comparison. My experiments speak against any essential dependence of one on the other. Since my observations in this respect were neither complete nor fully discussed, I shall, for the present, refrain from giving further details”.
Fechner did not publish further details. He also failed to mention Weber’s finding that there was a relationship between areas of greater tactile acuity and apparent length, though he presumably had this in mind as the reason for his experiments on the discrimination of differences in extent. Weber (1834, 1996, p. 46) stated that two points feel further apart in areas of the skin with better tactile acuity. If two distinguishable compass points are moved over different parts of the body, the distance between the points seems to shrink or grow in those areas where the two-point threshold is larger or smaller. This is known as Weber’s illusion, and it is well supported by modern experiments (e.g., Cholewiak, 1999; Green, 1982). Cholewiak found that a given length was estimated as about 4–5 times longer on the finger than on the palm, but only slightly longer on the palm than on the thigh. Spatial acuity could be related to peripheral receptor density, receptive field size and cortical magnification (e.g., Brown et al., 2004). It seems plausible that an increase in spatial acuity should cause some increase in perceived length or separation. It has also been shown that a change in the perceived length of the index finger (caused by tendon vibration) causes related changes in the perceived tactile distance on that finger (de Vignemont et al., 2005). The above findings form the background to the question of whether the Weber fraction for tactile line length varies in different parts of the body. It is doubtful that Weber’s Law holds for tactile line length, or cutaneous extent, when the length is impressed upon the skin. Danesino (1932) and Ricci (1937) (both cited in Masin, 2009) measured the Weber fraction for lengths of 1–5 cm, and found that the fraction decreased with length. Jones and Vierck (1973) used reference lengths of 0.16–12.7 cm, and again found that the fraction was smaller for longer lengths.
Fechner would argue (in support of the Parallel Law) that if tactile line lengths obey Weber’s Law, discrimination would not vary in areas of differing tactile acuity, despite differences in perceived length. However, if the Weber fraction is lower for perceptually longer lines, it could be argued that the Weber fraction should be lower in areas of higher tactile acuity. Again, there seem to be no data on this question.
4.2. Size Scaling and the Weber Fraction for Visual Line Length and Size

There is some evidence that the visual perception of line length is analogous to Weber’s illusion for tactile perception, in that there is some variation in perceived size connected to location in the visual field. Variations could be related to optical blur, or to neural factors such as retinal receptor density, receptive field size, or cortical mapping. In general, higher retinal and cortical receptor density is associated with slightly larger perceived size (Ross, 1997). The Parallel Law would state that variations in perceived length due to retinal location should not cause any changes in the Weber fraction. As with the tactile question, there appear to be no data to resolve the issue. Size-constancy scaling produces very large changes in perceived size for the same retinal image size, making it possible to investigate whether there are any associated changes in the Weber fraction. Weber’s Law probably does not hold for visual line length. Weber himself (1834, 1996) measured discrimination around one length of line (about 100 mm), for which he found a Weber fraction of about 0.01 or 0.02. The majority of later studies yield similar values when the DL is calculated at the 75% correct level, but few studies explored a sufficient range of line lengths to establish Weber’s Law (Ross, 2003). In fact, Volkmann (1863, cited in Laming, 1986) and Kiesow (1926, cited in Laming, 1986, and Masin, 2009) found that the Weber fraction declined with line length. Because the fraction is not constant, it can be asked whether it follows retinal or perceived length more closely. Ono (1967) used lines subtending angles of about 0.5–6.0° at viewing distances of 1.5–4.5 m, thus contrasting retinal size and object size (perceived size through size-constancy scaling): he found that the Weber fraction decreased with line length, but the relationship lay midway between that expected for retinal size and objective size.
Constancy scaling thus had the expected direction of effect on the Weber fraction, though the size of the effect was less than predicted. This result could be taken to refute or support the Parallel Law, depending on whether allowance is made for the increase in the Weber fraction for smaller lines. The discrimination of size has also been compared with that of depth. Kaufman et al. (2006) used a 2AFC procedure with stereoscopic distance to measure the precision of depth discrimination at different distances from 2.5 to 20 m, and found that discrimination decreased in proportion to the square root of distance. They used a similar procedure to measure size discrimination when the perceived size changes were caused by size–distance scaling: observers had to detect a difference in perceived size at different perceived distances when angular size was constant. This type of size discrimination was nearly an order of magnitude poorer than depth discrimination, and decreased with distance in parallel with depth discrimination. The size data imply that as size grew perceptually larger with perceived distance, size discrimination became poorer. This appears to be evidence against the Parallel Law.
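A sketch of how a DL ‘at the 75% correct level’, as in the line-length studies above, can be read off discrimination data by interpolation. The proportions below are invented for illustration; they are not data from any of the studies cited:

```python
def dl_at_75(deltas, p_correct, criterion=0.75):
    """Linearly interpolate the stimulus difference at which the
    proportion of correct judgements crosses the criterion level
    (75% correct by default)."""
    points = list(zip(deltas, p_correct))
    for (d0, p0), (d1, p1) in zip(points, points[1:]):
        if p0 <= criterion <= p1:
            return d0 + (criterion - p0) * (d1 - d0) / (p1 - p0)
    raise ValueError("criterion not bracketed by the data")

standard_mm = 100.0                          # standard line length
deltas_mm = [0.5, 1.0, 1.5, 2.0, 3.0]        # length increments tested
p_correct = [0.55, 0.65, 0.78, 0.88, 0.97]   # invented proportions correct

dl = dl_at_75(deltas_mm, p_correct)
print(f"DL = {dl:.2f} mm, Weber fraction = {dl / standard_mm:.3f}")
```

With these invented proportions the Weber fraction comes out near 0.014, within the 0.01–0.02 range quoted above for a 100 mm line.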
5. Adaptation and Contrast Effects in Other Visual Tasks

There are many examples of the effects of adaptation on discrimination in various sense modalities. We describe a few recent visual studies to illustrate the ubiquity of the phenomena. Visual motion was studied by Clifford and Wenderoth (1999), who showed that after adaptation to a moving pattern perceived speed decreased, and discrimination of speed differences improved near the adapted level. Similarly, Durgin and Gigone (2007) found that discrimination of the speed of optic flow when walking was better if the visual speed was near walking speed, and was worse for slower visual speeds. This may be an example of long-term perceptual learning rather than short-term adaptation. Most of the examples discussed in this paper involve sensory magnitudes that might in principle obey Weber’s Law (in so far as the law holds). There are other modalities (such as colour and orientation) to which Weber’s Law cannot apply. These other modalities are also subject to adaptation and contrast effects, and it can be asked whether such changes are accompanied by changes in discrimination. Spatial contrast illusions cause displacement effects similar to those due to adaptation. Tilt contrast illusions occur when a line is surrounded by lines of a different orientation. Meng and Qian (2005) fixed the physical orientation of their targets at 5° off vertical, and found better orientation discrimination when the targets appeared closer to vertical (because of the presence of more oblique surrounds) than when the targets appeared more oblique (because of the presence of near-vertical surrounds). The authors concluded that the oblique effect (poorer discrimination of oblique than vertical lines) depended on perceived rather than physical orientation. However, Solomon and Morgan (2009) showed that loss of discrimination was not solely due to the apparent oblique effect.
They used oblique reference orientations, and found that discrimination was best when the surround and reference lines were parallel and there was no tilt illusion; discrimination deteriorated when contrasting surround orientations caused tilt illusions. While this is not a test of the Parallel Law as defined by Fechner, it does add to the evidence that perceptual illusions may cause a deterioration of discrimination.

6. Conclusions

Fechner’s discussion of the Parallel Law is one of his least satisfactory contributions to psychophysical science. He failed to justify the theoretical importance of the law, or to offer any clear empirical evidence in support of it. Support would require accepting the null hypothesis — that there are no changes in discrimination accompanying changes in perceived value, when the latter are caused by adaptation or spatial location. No adequate discrimination experiments have been conducted on the effects of spatial location, perhaps because the perceived changes are rather small. However, adaptation has large effects, and experiments have revealed many instances where the Parallel Law does not appear to hold. Adaptation causes contrast illusions, when the perceived value of a stimulus is increased after adaptation
to a lower value, and decreased after adaptation to a higher value. Discrimination is generally best when observers are adapted to the value of the test stimulus, and is poorer when they are adapted to some other value. Thus, contrary to the Parallel Law, changes in perceived but not physical stimulation are associated with changes in discrimination. In its strict form, the Parallel Law is probably untestable. Fechner specified that the law would only hold when the perceived values of both comparison stimuli changed in the same proportion. It does not seem to be possible to obtain satisfactory simultaneous measures both of discrimination and of changes in perceived values. Instead, investigators have concentrated on establishing the changes in discrimination that occur with changing adaptation, and proposing models to account for such changes. This continues to be a topic of interest to current researchers, whereas the theoretical value of Fechner’s Parallel Law is lost in obscurity.

Acknowledgements

We should like to thank David Murray, Joshua Solomon and three anonymous referees for comments on earlier versions of this paper.

References

Abbonizio, G., Langley, K. and Clifford, C. W. (2002). Contrast adaptation may enhance contrast discrimination, Spatial Vision 16, 45–58.
Aubert, H. (1865). Physiologie der Netzhaut. Morgenstern, Breslau, Germany.
Brown, P. B., Koerber, H. R. and Millechia, R. (2004). From innervation density to tactile acuity: 1. Spatial representation, Brain Research 1011, 14–32.
Buckingham, G. and Goodale, M. A. (2010). Lifting without seeing: the role of vision in perceiving and acting upon the size–weight illusion, PLoS ONE 5, e9709; doi:10.1371/journal.pone.0009709.
Burgess, P. R. and Jones, L. F. (1997). Perceptions of effort and heaviness during fatigue and during the size–weight illusion, Somatosens. Motor Res. 14, 189–202.
Cholewiak, R. W. (1999). The perception of tactile distance: influences of body site, space and time, Perception 28, 851–875.
Clifford, C. W. G. and Rhodes, G. (2005). Fitting the Mind to the World: Adaptation and After-Effects in High-Level Vision. Oxford University Press, Oxford, UK.
Clifford, C. W. G. and Wenderoth, P. (1999). Adaptation to temporal modulation can enhance differential speed sensitivity, Vision Research 39, 4324–4331.
Danesino, A. (1932). Sopra l’apprezzamento di differenze spaziali nel campo delle sensazioni tattili pure, Archivio Italiano di Psicologia 10, 160–166.
Durgin, F. H. and Gigone, K. (2007). Enhanced optic flow speed discrimination while walking: contextual tuning of visual coding, Perception 36, 1465–1475.
Ekman, G. (1956). Discriminal sensitivity on the subjective continuum, Acta Psychologica 12, 233–243.
Ekman, G. (1959). Weber’s law and related functions, J. Psychol. 47, 343–352.
Fechner, G. T. (1840). Ueber die subjectiven Nachbilder und Nebenbilder, Annalen der Physik und Chemie 40, 193–221.
Fechner, G. T. (1860, 1966). Elemente der Psychophysik. Breitkopf und Härtel, Leipzig, Germany (1860). Transl. H. E. Adler (1966) as The Elements of Psychophysics, D. H. Howes and E. G. Boring (Eds), Vol. 1. Holt, Rinehart and Winston, New York, USA.
Foley, J. M. and Chen, C.-C. (1997). Analysis of the effect of pattern adaptation on pattern pedestal effects: a two-process model, Vision Research 37, 2779–2788.
Green, B. G. (1982). The perception of distance and location for dual tactile pressures, Percept. Psychophys. 31, 315–323.
Greenlee, M. W. and Heitger, F. (1988). The functional role of contrast adaptation, Vision Research 28, 791–797.
Gregory, R. L. and Ross, H. E. (1967). Arm weight, adaptation and weight discrimination, Percept. Motor Skills 24, 1127–1130.
Hall, G. S. (1912). Founders of Modern Psychology. Appleton, New York, USA.
Heidelberger, M. (2004). Nature From Within: Gustav Theodor Fechner and His Psychophysical Worldview. Transl. C. Klohr. University of Pittsburgh Press, Pittsburgh, USA.
Hellström, A. (2000). Sensation weighting in comparison and discrimination of heaviness, J. Exper. Psychol.: Human Percept. Perform. 26, 6–17.
Holway, A. H., Goldring, L. E. and Zigler, M. J. (1938). On the discrimination of minimal differences in weight: IV. Kinesthetic adaptation for exposure intensity as variant, J. Exper. Psychol. 23, 536–544.
Jones, L. A. (1986). Perception of force and weight: theory and research, Psycholog. Bull. 100, 29–42.
Jones, L. F. and Burgess, P. R. (1998). Neural gain changes subserving perceptual acuity, Somatosens. Motor Res. 15, 190–199.
Jones, M. B. and Vierck, C. J. Jr. (1973). Length discrimination on the skin, Amer. J. Psychol. 86, 49–60.
Kaufman, L., Kaufman, J. H., Noble, R., Edlund, S., Bai, S. and King, T. (2006). Perceptual distance and the constancy of size and stereoscopic depth, Spatial Vision 19, 439–457.
Keidel, W. D., Keidel, U. O. and Wigand, M. E. (1961). Adaptation: loss or gain of sensory information?, in: Sensory Communication, W. A. Rosenblith (Ed.), pp. 319–338. Wiley, London, UK.
Kiesow, F. (1926). Über die Vergleichung linearer Strecken und ihre Beziehung zum Weberschen Gesetze, Archiv für die Gesamte Psychologie 56, 421–451.
Laming, D. (1986). Sensory Analysis. Academic Press, London, UK.
Laming, D. (1997). The Measurement of Sensation. Oxford University Press, Oxford, UK.
Lockhead, G. R. (1992). Psychophysical scaling: judgments of attributes or objects? Behav. Brain Sci. 15, 543–601.
Masin, S. C. (2009). The (Weber’s) law that never was, in: Fechner Day 2009, M. A. Elliott, S. Antonijević, S. Berthaud, P. Mulcahy, C. Martyn, B. Bargery and H. Schmidt (Eds), pp. 441–446. International Society for Psychophysics, Galway, Eire.
Masin, S. C., Zudini, V. and Antonelli, M. (2009). Early alternative derivations of Fechner’s law, J. History Behav. Sci. 45, 56–65.
Meng, X. and Qian, N. (2005). The oblique effect depends on perceived, rather than physical, orientation and direction, Vision Research 45, 3402–3413.
Murray, D. J. (1993). A perspective for viewing the history of psychophysics, Behav. Brain Sci. 16, 115–186.
Murray, D. J. and Ross, H. E. (1988). E. H. Weber and Fechner’s psychophysics, in: Passauer Schriften zur Psychologiegeschichte Nr. 6, G. T. Fechner and Psychology, J. Brožek and H. Gundlach (Eds), pp. 79–86. International Gustav Theodor Fechner Symposium 1987, Passau, Germany.
Ono, H. (1967). Difference threshold for stimulus length under simultaneous and nonsimultaneous viewing conditions, Percept. Psychophys. 2, 201–207.
Pestilli, F., Viera, G. and Carrasco, M. (2007). How do attention and adaptation affect contrast sensitivity? J. Vision 7 (7), art. 9; doi: 10.1167/7.7.9.
Poulton, E. C. (1989). Bias in Quantifying Judgments. Erlbaum, Hove, UK.
Purkinje, J. (1823). Beobachtungen und Versuche zur Physiologie der Sinne. Beiträge zur Kenntniss des Sehens in subjectiver Hinsicht. Calve, Prague, Czechoslovakia.
Reynolds, J. H. and Heeger, D. J. (2009). The normalization model of attention, Neuron 61, 168–185.
Ricci, A. (1937). Sulla sensibilità di differenza nell’apprezzamento tattile di stimuli estesi applicati su regioni differenti della pelle, Archivio Italiano di Psicologia 16, 383–392.
Ross, H. E. (1969). When is a weight not illusory? Quart. J. Exper. Psychol. 22, 346–355.
Ross, H. E. (1981). How important are changes in body weight for mass perception? Acta Astronautica 8, 1051–1058.
Ross, H. E. (1997). On the possible relations between discriminability and apparent magnitude, Brit. J. Mathemat. Statist. Psychol. 50, 187–203.
Ross, H. E. (2003). Context effects in the scaling and discrimination of size, in: Fechner Day 2003, B. Berglund and E. Borg (Eds), pp. 257–262. International Society for Psychophysics, Stockholm, Sweden.
Ross, H. E. and Gregory, R. L. (1964). Is the Weber fraction a function of physical or perceived input? Quart. J. Exper. Psychol. 16, 116–122.
Ross, H. E. and Gregory, R. L. (1970). Weight illusions and weight discrimination — a revised hypothesis, Quart. J. Exper. Psychol. 22, 318–338.
Ross, H. E. and Murray, D. J. (1996). E. H. Weber on the Tactile Senses, 2nd edn. Erlbaum (UK) Taylor and Francis, Hove, UK.
Seashore, C. E. (1896). Weber’s Law in illusions, Studies from the Yale Psychological Laboratory, IV, USA.
Snowden, R. J. (2001). Contrast gain mechanism or transient channel? Why the effects of a background pattern alter over time, Vision Research 41, 1879–1883.
Solomon, J. A. and Morgan, M. J. (2009). Strong tilt illusions always reduce orientation acuity, Vision Research 49, 819–824.
Stevens, J. C. (1979). Thermal intensification of touch sensation: further extensions of the Weber phenomenon, Sensory Processes 3, 240–248.
Teghtsoonian, R. (1971). On the exponents in Stevens’ Law and the constant in Ekman’s Law, Psycholog. Rev. 78, 71–80.
de Vignemont, F., Ehrsson, H. H. and Haggard, P. (2005). Bodily illusions modulate tactile perception, Curr. Biol. 15, 1286–1290.
Volkmann, A. W. (1863). Physiologische Untersuchungen im Gebiete der Optik. Breitkopf und Härtel, Leipzig, Germany.
Ward, J. (1876). An attempt to interpret Fechner’s Law, Mind 1, 452–466.
Weber, E. H. (1834, 1996). De tactu. Annotationes anatomicae et physiologicae. Koehler, Leipzig (1834). Transl. H. E. Ross and D. J. Murray (Eds) (1996). E. H. Weber on the Tactile Senses, 2nd edn. Erlbaum (UK), Taylor and Francis, Hove, UK.
Weinstein, S. (1968). Intensive and extensive aspects of tactile sensitivity as a function of body part, sex, and laterality, in: The Skin Senses, D. R. Kenshalo (Ed.), pp. 195–222. Thomas, Springfield, Ill., USA.
Wilson, H. R. and Humanski, R. (1993). Spatial frequency adaptation and contrast gain control, Vision Research 33, 1133–1149.
Woodrow, H. (1933). Weight discrimination with a varying standard, Amer. J. Psychol. 45, 391–416.
Zupko, R. E. (1978). British Weights and Measures: A History from Antiquity to the Seventeenth Century. University of Wisconsin Press, Madison, USA.
Magnitude of Perceived Change in Natural Images May Be Linearly Proportional to Differences in Neuronal Firing Rates
David J. Tolhurst 1,∗, Michelle P. S. To 1, Mazviita Chirimuuta 1, Tom Troscianko 2, Pei-Ying Chua 1 and P. George Lovell 2
1 Department of Physiology, Development & Neuroscience, University of Cambridge, Downing Street, Cambridge, CB2 3EG, UK
2 Department of Experimental Psychology, University of Bristol, 12a Priory Road, Bristol, BS8 1TU, UK
Abstract
We are studying how people perceive naturalistic suprathreshold changes in the colour, size, shape or location of items in images of natural scenes, using magnitude estimation ratings to characterise the sizes of the perceived changes in coloured photographs. We have implemented a computational model that tries to explain observers' ratings of these naturalistic differences between image pairs. We model the action-potential firing rates of millions of neurons, whose linear and non-linear summation behaviour is closely modelled on that of real V1 neurons. The numerical parameters of the model's sigmoidal transducer function are set by optimising the same model against experiments on contrast discrimination (contrast 'dippers') with monochrome photographs of natural scenes. The model, optimised on a stimulus-intensity domain in an experiment reminiscent of the Weber–Fechner relation, then produces tolerable predictions of the ratings for most kinds of naturalistic image change. Importantly, rating rises roughly linearly with the model's numerical output, which represents the difference in neuronal firing rates in response to the two images under comparison; this implies that rating is proportional to the neuronal response.
Keywords: Natural images, ratings, V1 model, transducer function
∗ To whom correspondence should be addressed. E-mail: [email protected]
40
D. J. Tolhurst et al.
1. Introduction

Fechner and Weber suggested that, in many sensory domains, there is a geometric scale of sensory intensity perception, and that this might arise from a compressive relation between neuronal response magnitude and stimulus intensity (Murray and Ross, 1988; Ross, 1995). These pioneering psychophysical ideas may not have led to a perfect unifying Law of Sensation (Stevens, 1961), but the proposals about psychophysical magnitude discriminations have underlain many of the important and testable comparisons between psychophysical performance and single-neuron sensory physiology. Fechner and Weber might have supposed that ΔC is directly proportional to C, implying a logarithmic transducer function between neuronal response and contrast, but careful psychophysical measurements have suggested other compressive transducer shapes, and explanations have been sought in the actual behaviour of sensory neurons. This has been especially the case for the study of sinewave grating contrast discrimination (Boynton et al., 1999; Chirimuuta and Tolhurst, 2005a; Foley, 1994; Goris et al., 2009; Legge and Foley, 1980; Watson and Solomon, 1997). Measurements of how visual neurons respond to contrast (Albrecht and Hamilton, 1982; Heeger et al., 2000; Sclar et al., 1990; Tolhurst et al., 1981, 1983) can inspire quantitative models of how a whole organism discriminates contrast stimuli (Watson and Solomon, 1997). In this paper, we extend such ideas to consider how the responses of visual cortex neurons to simple stimuli can help us understand how people perceive changes in natural images.
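The step from Weber's ΔC ∝ C to Fechner's logarithm can be made concrete by counting just-noticeable differences. A minimal numerical sketch; the baseline c0 = 1 and Weber fraction k = 0.1 are illustrative values, not figures from the chapter:

```python
def jnd_count(c, c0=1.0, k=0.1):
    """Count just-noticeable-difference steps from baseline c0 up to intensity c,
    assuming Weber's law: each JND is a fixed fraction k of the current level."""
    n = 0
    level = c0
    while level * (1.0 + k) <= c:
        level *= (1.0 + k)
        n += 1
    return n

# Each doubling of intensity adds the same number of JND steps, so the
# accumulated sensation grows with log(c): Fechner's logarithmic scale.
steps_to_double = jnd_count(2.0)
steps_to_quadruple = jnd_count(4.0)
```

A compressive transducer of any other shape would make the JND count grow with some other function of intensity, which is why the shape of the transducer is empirically testable.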
Since the earliest electrophysiological recordings of the responses of sensory neurons and the discovery that intensity is coded as action potential frequency (Adrian and Zotterman, 1926), it has been an important question how neuronal response properties relate to human psychophysical performance (e.g., Barlow and Levick, 1969; Borg et al., 1966; Parker and Newsome, 1998; Werner and Mountcastle, 1965). The way that neuronal response depends upon stimulus intensity has been of particular interest, since it bears direct comparison with the well formulated psychophysical ideas of Fechner and Weber (e.g., Chirimuuta and Tolhurst, 2005a, b; Tolhurst et al., 1983, 1989; Werner and Mountcastle, 1965). As early as 1931, B. H. C. Matthews studied the muscle spindle (a receptor which increases its activity when the muscle is stretched) and he reported: ‘If the frequency [of action potential firing]. . . be plotted against the logarithm of the load, the points lie very nearly on a straight line. . . . It has long been held that, as a stimulus increases in geometric progression, the sensation increases in arithmetic progression (Fechner’s Law)’. That is, he suggested that the logarithmic relation between response and load (i.e., the tension in the muscle’s tendon) might underlie Fechner’s Law of Sensation. While this looks like a pleasingly straightforward relation between neuronal behaviour and psychophysical performance, the observation points to a number of complications in trying to compare overall psychophysical performance with the behaviour of single neurons.
Perceived changes in natural images and Fechner’s Law
41
1.1. What Is the Appropriate Measure and Range of Stimulus Intensity?

The later study of the muscle spindle (P. B. C. Matthews, 1972) illustrates important caveats in the search for Laws of sensory coding or perception. First, it turns out that the response of the muscle spindle is determined by muscle length and not by tension, and it also turns out that tension is logarithmically related to length. Thus, response is actually linearly proportional to length, the appropriate measure of stimulus intensity in this case. In trying to deduce relations between neuronal response or psychophysical magnitude estimation and stimulus intensity (Stevens, 1961), it is clearly important to understand what system of measurement is appropriate for the stimulus intensity. Indeed, it may sometimes be the case (as in our present study) that there is no obvious single-dimensional metric to use. The behaviour of the muscle spindle illustrates a second caveat: while it is a valid experimental strategy to study a system under the widest range of conditions, it should be noted when, in everyday life, that system is subject to only limited parts of the potential range. It is important to recognise the natural intensity range and to ask whether the system's behaviour is the same within the natural range as it is overall. Muscle spindle response is linearly proportional to muscle length (as noted above), but only for very small stretches. While this seems, at first, to lessen any interest in a linear response range, it is compatible with the observation that, in situ, the length of a muscle actually changes rather little (about 5–10%), even when the joints move between full flexion and full extension. In this paper, we study human perception of naturalistic changes in digitised photographs of everyday scenes: i.e., natural images. In vision, we may be exposed at different times to stimuli that vary by many log units in intensity (mostly at different times of day).
Stevens (1961) found that the power law relation between psychophysical magnitude and stimulus intensity for bright spots had an exponent of about 0.33 over a large range of intensities. However, when viewing natural scenes, the range of intensities in any one scene is much more modest. More generally, it is a tenet of vision science that the appropriate measure of stimulus intensity should be contrast, a measure of the intensity of an object relative to the average level (Enroth-Cugell and Robson, 1966; Troy and Enroth-Cugell, 1993) and, in a typical natural scene, the contrast is mostly low and ranges over only 2 log units (Brady and Field, 2000; Clatworthy et al., 2003; Lauritzen and Tolhurst, 2005). The relation between perceived magnitude and contrast has a different power (0.65–1.0) from that reported by Stevens for intensity (Biondini and De Martelli, 1985; Cannon, 1979; Gottesman et al., 1981; Peli et al., 1991). It has also been questioned whether the appropriate metric might actually be contrast energy, the contrast squared (Solomon, 2009). As with the example of the muscle spindle, this visual example shows the importance of considering the appropriate intensity metric and a natural range of stimuli. Simply equating neuronal response magnitude with the substrate for human magnitude estimates raises a third caveat. It is well accepted that human detection thresholds or intensity discrimination limens involve statistical judgments about
changes in an ‘internal variable’ (Green and Swets, 1966). It is a common observation in sensory neurophysiology that neuronal response variability (or ‘noise’) increases with response magnitude and presumably with stimulus intensity (Matthews and Stein, 1969; Tolhurst et al., 1981, 1983, 2009; Werner and Mountcastle, 1965). Any model of how psychophysical judgments depend upon neuronal behaviour should include knowledge not only of how neuronal response magnitudes depend upon intensity, but also of how the ‘response noise’ changes (we shall consider this point in the Discussion).

1.2. Is a Single-Neuron Model Appropriate?

While it is attractive that overall psychophysical performance seems to relate to the response-intensity relations of single sensory neurons, it is obviously the case that neurons will not be active in isolation: overall performance must be the result of activity in many (probably disparate) neurons. Take, for instance, Weber's classical experiment on the ability to discriminate the weights of objects (Brodie and Ross, 1984). Suppose that we hold our upper arms by our sides, our forearms horizontally with our palms upright and that we hold two different weights on our two palms. It may be that there are muscle spindles (compare B. H. C. Matthews, 1931) in the biceps whose stretch and responses will depend on the weights on the palms. However, there are very many different kinds of sensory receptor in the arm (including the elbow or wrist), which must surely be involved (spindles in forearm and upper-arm muscles, receptors in joints, and pressure receptors in the skin of the hand). The magnitude of response given by a muscle spindle in response to a given stretch also depends upon efferent outflow from the nervous system and is affected by the amount of force needed to keep the forearms horizontal (P. B. C. Matthews, 1972).
Finally, it is likely that judgements about the relative weights in the two hands will involve some deliberate ‘testing’ motor activity, determining how much extra force is needed to lift the objects a small distance (Brodie and Ross, 1984). In all, the neural activity is complex and disparate even for a ‘simple’ judgement of which of two stimuli is the more intense. How much more complex when the stimuli vary in multiple dimensions that are not easy to quantify individually? A true picture of how perceptual magnitude relates to neural responses requires a comprehensive description of the individual contributions of all the disparate sensory signals involved, and hypotheses about how those many signals are weighted and combined to give a single judgement. In fact, just such an enterprise has been ongoing for 25 years in vision science (Daly, 1993; Foley, 1994; Lovell et al., 2006; Lubin, 1995; Parraga et al., 2005; To et al., 2009, 2010; Watson, 1987; Watson and Ahumada, 2005; Watson and Solomon, 1997). Using the immense amount of quantitative data on single V1 neuron responses and human channel behaviour, it has been possible to build computational models of how millions of neurons respond to visual stimuli, and then to compute how the population behaviour differs in response to different stimuli. Such models are used to understand the neural contributions to ‘simple’ decisions about stimulus contrast (an intensity metric in the
line of Weber and Fechner), and they have also been used as the basis of image quality metrics to describe the degree of perceived corruption in images that have been subject, say, to JPEG compression (e.g., Daly, 1993; Lubin, 1995). We have been investigating human suprathreshold perception of changes in natural images (To et al., 2008, 2009, 2010): e.g., changes in the shapes, colours or numbers of objects in a scene. The great variety of image changes in our experiments (e.g., changes in hue, blur, object size or posture, number of objects in view) means that there is no obvious single physical stimulus metric against which we can compare the observers’ ratings to all those change types, separately or in combination; a model of the responses of millions of V1 neurons is an attempt to unify the data. In this paper, we shall examine whether such a model helps us to understand the relationship between psychophysical judgements and neural population response for suprathreshold changes in natural visual images. We have already applied a V1-based model to the results of a ratings experiment with a disparate set of images constructed with an unsystematically chosen variety of image changes (To et al., 2010); we argued that some of those image changes (e.g., changes in facial expression) might not be amenable solely to V1 modelling. Here, we extend the V1 modelling to show explicitly how a single model with few parameters can link ‘classical’ psychophysical experiments on contrast discrimination (‘dipper’ functions with sinewave gratings) to our experiments which measure the perceived magnitudes of changes in, for example, the shapes, colours, locations and numbers of objects in natural scenes. We shall discuss how the ‘transducer function’ (Legge and Foley, 1980), which we model as underlying the dipper function, is related to the population response of V1 neurons. 
We ask particularly whether the model gives a straight-line fit to our ratings data on natural image differences. We concentrate on a newer set of experimental results, obtained with stimuli in which the various changes (e.g., in object size, location and colour) are systematically changed (separately and in combination) to provide a well-distributed range of rating data to challenge the model.

2. Methods

2.1. Coloured Natural Image Stimuli and Magnitude Estimate Ratings

Our methods for constructing and presenting stimuli are given in detail by To et al. (2008, 2010). In a first experiment (To et al., 2008), 294 image pairs were made from a small number of parent images. Six parent images led to 48 variants each. An image could differ from the parent along one dimension, along a second dimension or along both together. Including ‘no change’, there were 7 steps along each dimension (the steps were intended to be of equal perceptual magnitude), giving 7 × 7 = 49 combinations in total (the unchanged parent plus 48 variants). Variants could differ in the locations of objects within the image (e.g., Fig. 1A, B), the sizes or colours of objects, or in the intensity of shadows (Fig. 1B). The 294
Figure 1. Monochrome representations of some of the kinds of image pair used in our experiments. (A) and (B) from the ‘garden scene’ series. The left-hand images show two of the parent images in the experiments, while the middle and right-hand images show variant images against which the parents could be compared. There were 48 variants of each. (A) Shows two variants that differ in the magnitude of a single change type, while (B) shows variants that differ in 2 different ways. (C–E) From the ‘varied pairs’ series; the upper stimulus is the parent, and the lower image is one of 5 variants in the experiments. (C) Shows a colour change; (D) shows a shape change; (E) shows an item disappearing. The ‘colour change’ was achieved by changing the hue and the saturation of one banana, using code written in Matlab (The Mathworks). ‘Shape’ and ‘appearance’ changes used time-lapse photography. For details and coloured examples, see To et al. (2008, 2010).
images could be presented upright or inverted, in random order, giving 588 pairs altogether. We have not previously applied a V1-based model to these data. In a second experiment (To et al., 2010), 900 pairs of images were made from a wide variety of coloured photographs of natural scenes, covering subject matter such as animals, plants, people, man-made objects, landscapes, still-lifes and garden scenes. Some image pairs could be made by taking one natural photograph and using some kind of image processing technique to change the colour (hue and/or saturation) of all or part of the scene (e.g., Fig. 1C; coloured examples are given in To et al., 2008). Images could also be blurred or sharpened. Many image pairs were made from a pair of photographs of the same scene. In the time between photographs, the shape or arrangement of objects in the scene may have changed (e.g., Fig. 1D), or an object may have appeared or disappeared (e.g., Fig. 1E), or the natural lighting and shadowing may have changed due to a change in the time of day or the weather. Some image pairs were made by combining the natural shape changes with image-processed colour or blur changes. There were 180 parent images, each paired against 5 variants. The images were 256 by 256 pixels, 3.2 deg square, surrounded by uniform grey in a larger display. The stimuli were presented through a ViSaGe system (Cambridge Research Systems) so that we had precise control and knowledge of the luminance of each pixel. In a given trial, the observer had to compare two related images; one was a parent image and the other was a variant of it. The observer viewed a small spot in the centre of the display, and then the images in a pair were presented sequentially. The first image was presented for 833 ms, followed by an 83 ms interval when the screen was uniform grey apart from the fixation spot. The second image was then presented for 833 ms, followed by an 83 ms grey interval and an 833 ms re-presentation of the first image.
The fixation spot was extinguished while the images were present, but the observers were instructed to view the centre of the images and not move their gaze about. The observer then gave a numerical rating of the perceived difference between the images. Every 10 trials, one particular image pair was presented (a picture of a red flower where the difference was in colour saturation); the numerical difference between this standard pair was set to ‘20’, and observers were instructed and trained to use a ratio scale to rate any kind of difference in any other image pair with respect to this standard. The observers were sometimes surprised by this instruction at first, but after a little practice, they mostly reported being very comfortable with the idea that they could rate any perceived change against, say, a colour saturation change. However, this does raise the question whether the observers really did maintain a single rating standard or whether they might have unintentionally held slightly different standards for different image change types. Observers could give a rating of ‘0’ if they perceived the images in a pair to be identical; they had been told that some of the image pairs might be identical, but they were not told what proportion. There was no upper limit set for the ratings. At the end of the experiments, each of the observers’ ratings was
divided by the median value for that observer, and the normalized ratings for each pair were averaged across observers. This stimulus presentation protocol is similar to those that might produce ‘change blindness’ (Simons and Rensink, 2005). However, apart from image pairs that differed in the detailed organizations of textures or in small movements of objects (see below), the observers did not seem to be subject to change blindness. The observers had had sufficient training and practice (see To et al., 2010) with image pairs like those to be presented in the main experiment that, we presume, they were clearly aware of what sorts of image change to expect.

2.2. Contrast Discrimination Dippers

Our methods for stimulus presentation and the staircase procedure for obtaining contrast thresholds have been described in detail before (Chirimuuta and Tolhurst, 2005a). Pedestal images were presented at a variety of contrasts, defined as dB attenuation from the maximum (when the brightest pixel in the image was double the surrounding grey of the display and the darkest pixel had a nominal value of zero). The pedestal image was a 6 deg square image derived either from a monochrome photograph of a street scene (Fig. 2A) or a 1/f random noise pattern (Fig. 2C) or versions of these that had been notch filtered or bandpass filtered with 1 octave wide filters. The increment was added to the pedestal and its contrast was adjusted in a 2AFC staircase to determine the increment threshold — the increment and pedestal were presented on alternate frames (frame rate 120 Hz) and their contrasts were controlled separately using pseudo-15 bit LUTs. The increment was a Gaussian-weighted patch of the street scene, the 1/f pattern or a filtered variant; the spread of the Gaussian was 0.38 deg. Eight different combinations of increment and pedestal were studied, with 11 different pedestal contrasts in each.
In an experimental session, the staircases for 5–6 pedestal contrasts would be randomly interleaved. In a trial, the pedestal would be presented alone in one 100 ms interval and the pedestal plus the increment in the other interval. The increment was assigned to the first or second interval at random on each trial. In response to the observer's choices, the increment contrast was increased or decreased, and the staircase generally stabilized at a contrast near to the point where the observer would correctly choose the increment interval on 75% of trials. Threshold was calculated by fitting an error function to the psychometric function that resulted from 100–200 trials.

2.3. A V1 Based Model of Visual Discrimination

Our model is based on that of Watson and Solomon (1997), which is used to explain detection thresholds for monochrome grating stimuli. We have tried to extend that model to encompass coloured stimuli (see Lovell et al., 2006; To et al., 2010) and suprathreshold decisions. We have implemented a variety of models with different receptive field shapes and interactions (To et al., 2010) and here we describe the one that yielded the best match to the psychophysical dipper data: a phase-invariant ‘complex cell’ model.
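The threshold estimation described in Section 2.2, a staircase converging near 75% correct followed by an error-function fit, can be sketched as follows. This is a hypothetical reimplementation (a coarse grid search fitting a cumulative-Gaussian psychometric function by least squares), not the authors' analysis code; contrast values are in dB attenuation and the grid ranges are arbitrary choices.

```python
import math

def p_correct(c, mu, sigma):
    """2AFC psychometric function: a cumulative Gaussian (error function)
    scaled from the 0.5 guessing floor up to 1.0. By construction,
    p_correct(mu, mu, sigma) = 0.75, so mu is the 75%-correct threshold."""
    phi = 0.5 * (1.0 + math.erf((c - mu) / (sigma * math.sqrt(2.0))))
    return 0.5 + 0.5 * phi

def fit_threshold(contrasts_db, proportions):
    """Least-squares grid search for (mu, sigma), a sketch of 'fitting an
    error function to the psychometric function'."""
    best_err, best_mu, best_sigma = float("inf"), None, None
    for m in range(-400, 0):        # candidate mu from -40.0 to -0.1 dB
        for s in range(1, 100):     # candidate sigma from 0.1 to 9.9 dB
            mu, sigma = m / 10.0, s / 10.0
            err = sum((p_correct(c, mu, sigma) - p) ** 2
                      for c, p in zip(contrasts_db, proportions))
            if err < best_err:
                best_err, best_mu, best_sigma = err, mu, sigma
    return best_mu, best_sigma

# Synthetic session: proportions generated from a known threshold of -20 dB.
contrasts = [-26.0, -23.0, -20.0, -17.0, -14.0]
observed = [p_correct(c, -20.0, 3.0) for c in contrasts]
threshold, spread = fit_threshold(contrasts, observed)
```

Because the 2AFC function is anchored at a 0.5 guessing floor, p_correct equals 0.75 exactly at c = mu, so the fitted mu is itself the 75%-correct increment threshold.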
Figure 2. Contrast ‘dipper’ experiments that were used to optimize the numerical parameters of the V1 model. In all, 8 experiments were examined, and a single set of numerical parameters was fitted to them; 3 of those 8 are summarized here. (A) The circles show the average discrimination thresholds of 2 observers in an experiment where a small patch of the image was added to the centre of a monochrome photograph of a street scene. The line is the dipper calculated from the V1 model with the optimized parameters. (B) The same except that the full street scene has been notch filtered to remove vertical components near 16 cycles per picture, while the test patch was band-pass filtered around vertical 16 c/picture. (C) The same except that the full ‘picture’ is a 1/f filtered random noise pattern; the small central patch is of the same filtered noise.
Briefly, the model consists of ‘simple-cell’ receptive fields with Gabor profile — 6 orientations by 5 spatial frequencies by two spatial phase symmetries. The fields are elongated and are not self-similar: the spatial frequency and orientation tuning gets sharper as spatial frequency is increased (Tolhurst and Thompson, 1981; Yu et al., 2010). The bandwidths were graded with optimal frequency: for fields with optima of 1.25, 2.5, 5, 10, 20 cycles per degree, the frequency bandwidths were 2.12, 1.43, 0.93, 0.64, 0.45 octaves (full width at half height) and the orientation bandwidths were 43.4, 34.5, 22.8, 17.7, 11.6 degrees (full width at half height). These values lie in the ranges found for single neurons and deduced for psychophysical channels. The sensitivities of these frequency bands within the model were weighted according to typical observers’ contrast sensitivity for gratings of those frequencies. There are, in fact, three of each field type to deal with image colour: a ‘luminance’ detecting field, and an isoluminant red–green opponent and a blue–yellow opponent field based on the MacLeod–Boynton transformation (Lovell et al., 2006). Each field must be represented at every spatial location in the images and so the field templates must be convolved with the images. The first image in the pair is first convolved with all the different field types, giving a set of values proportional to image luminance. These linear responses are divided by the local mean luminance values to give contrast responses. We follow the method of Peli (1990): for each spatial frequency band, the image is also convolved with a 2D Gaussian blob having the same spread as the Gaussian envelope of the Gabor fields. The ‘linear response’ of each Gabor field is divided by the ‘linear response’ of the matched Gaussian blob whose field is centred on the same point in space. The r.m.s. is taken of the responses of paired odd-symmetric and even-symmetric ‘simple cells’ to give phase-invariant ‘complex cell’ responses. The contrast output of each ‘complex cell’ is weighted according to estimates of a human observer’s contrast sensitivity for luminance or isoluminant sinusoidal gratings of the field’s centre spatial frequency, orientation and eccentricity from the fovea.
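The first stage of this pipeline, convolving with a Gabor field and dividing by the response of a matched Gaussian blob to obtain a local contrast response (after Peli, 1990), can be sketched in one dimension. This is an illustrative reimplementation, not the model's code; the kernel parameters (sigma of 8 samples, frequency of 0.125 cycles/sample) are arbitrary choices, and the Gabor kernel is mean-corrected so that a uniform field gives zero response.

```python
import math

def gaussian_kernel(sigma, half):
    """Gaussian blob, normalised to unit sum so it reads out local mean luminance."""
    ker = [math.exp(-i * i / (2.0 * sigma * sigma)) for i in range(-half, half + 1)]
    total = sum(ker)
    return [k / total for k in ker]

def gabor_kernel(sigma, freq, half):
    """Even-symmetric (cosine-phase) Gabor with the same Gaussian envelope.
    The residual DC is subtracted so a uniform field gives zero response."""
    ker = [math.exp(-i * i / (2.0 * sigma * sigma)) * math.cos(2.0 * math.pi * freq * i)
           for i in range(-half, half + 1)]
    dc = sum(ker) / len(ker)
    return [k - dc for k in ker]

def convolve(signal, kernel):
    """Direct 1D convolution, clamping indices at the borders."""
    half = len(kernel) // 2
    out = []
    for x in range(len(signal)):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = min(max(x + j - half, 0), len(signal) - 1)
            acc += signal[idx] * k
        out.append(acc)
    return out

def local_contrast(luminance, sigma=8.0, freq=0.125):
    """Band-pass (Gabor) response divided, point by point, by the local mean
    luminance read out by the matched Gaussian blob (after Peli, 1990)."""
    half = int(3 * sigma)
    band = convolve(luminance, gabor_kernel(sigma, freq, half))
    mean = convolve(luminance, gaussian_kernel(sigma, half))
    return [b / m for b, m in zip(band, mean)]
```

Because numerator and denominator both scale linearly with luminance, doubling the overall light level leaves the output unchanged: the construction responds to contrast, not to absolute luminance, which is the model's effective intensity metric.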
We measured foveal contrast thresholds for luminance gratings of appropriate orientations and spatial frequencies, and deduced the sensitivities within the model so that the model would explain those thresholds; we estimated thresholds for chromatic gratings from Mullen (1985) and Mullen and Kingdom (2002). The spatial fall-off in sensitivity from the centre of the fovea was modelled from Pointer and Hess (1989) and Mullen and Kingdom (2002). For luminance gratings the sensitivity falls by a factor of 10 in about 40 cycles of the grating along the vertical meridian and in about 60 cycles along the horizontal meridian (Pointer and Hess, 1989; Robson and Graham, 1981). The sensitivity for isoluminant gratings falls off much more quickly with eccentricity (Mullen and Kingdom, 2002): a factor of 10 in only 8 cycles for low spatial frequency RG gratings and in 15 cycles at higher frequencies. The sensitivity for BY isoluminant gratings falls off by a factor of 10 in about 25 cycles. The point of maximum sensitivity was, of course, in the centre of the target image. In the simplest model, the second image would be similarly processed and the responses to the image pair would be compared neuron by neuron. However, realistic models of V1 incorporate threshold behaviour, and non-linear suppressive interactions between neurons (e.g., Blakemore and Tobin, 1972; Heeger, 1992). The simple
contrast responses of the neurons must be transformed (following Legge and Foley, 1980) by a sigmoidal transducer function of the form (see Fig. 3):

Response = |Contrast|^p / (1 + WN · Normalise^q + WS · Surround^r)    (1)
The numerator with power p gives a positive acceleration or threshold; the terms in the divisor represent two different kinds of inhibitory interaction between neurons, and their effect is to cause the response to become compressed at higher contrasts. Divisive normalizing nonlinearities have been established neurophysiologically and psychophysically (Bonds, 1989; Foley, 1994; Heeger, 1992; Watson and Solomon, 1997). We have modeled two forms of suppressive signal, treating them as distinct since neurophysiology has usually shown them to have different properties. However, Cavanaugh et al. (2002) do suggest that the two suppressive phenomena may grade into each other. First, we model a spatially localized, non-specific suppression (Heeger, 1992); we implemented the normalizing signal for all the neurons at a given point in the image as the sum of the responses of all the neurons with fields centred at that point; the absolute value of each suppressing neuron’s response was raised to a power q before the sum. Thus the spatially-localized signal is summed across all spatial frequencies and orientations, and the same divisive signal is applied to all neurons at that point. We have also modeled surround suppression that is stimulus specific (Blakemore and Tobin, 1972; Cavanaugh et al., 2002; Maffei and Fiorentini, 1976); this is necessary to explain the forms of contrast discrimination functions for gratings of different geometry (Meese, 2004). Here, the suppressive signal applied to a neuron at a given location in the image is derived only from neurons of the same optimal spatial frequency and orientation, but whose fields are centred in a blurred annulus around the neuron being suppressed. The absolute values of the responses of these suppressing cells are raised to a power r, then summed with weights given by a blurred annulus around the inhibited neuron:

Surround_strength_f = d · e^(−d² / (2 · rad_f²)),    (2)
where d is the distance from the centre of the suppressed field, and rad_f is the radius of the annulus, which is proportional to the spatial period of the carrier sinewaves of the Gabor fields. Thus, we have two suppressive signals: one is spatially very localized but is diffuse in spatial frequency and orientation (Heeger, 1992), whilst the other is specific to spatial frequency and orientation but diffuse in space (Blakemore and Tobin, 1972). Watson and Solomon (1997) used one suppressing signal which is potentially more diffuse in space than ours and is partly specific for orientation or frequency; Cavanaugh et al. (2002) would suggest that the degree of stimulus specificity should vary with distance from the centre of the suppressed neuron’s receptive field. The transformed responses of the millions of neurons to one image are subtracted from the transformed responses to the other image in the pair, and the millions of differences are pooled by Minkowski summation of the absolute values (Watson
and Solomon, 1997) with exponent m. The luminance, red–green and blue–yellow planes were processed entirely separately, and the cues they provided to image difference were finally combined only at this Minkowski summation stage. This gives a single number which is the model’s prediction of the observer’s magnitude estimate. Although the images may have differed in many, disparate ways along different dimensions, the whole dataset of 588 or 900 naturalistic image pairs is summarized by giving a single numerical output value to each image pair. The 7 model parameters (5 in equation (1), rad_f and m) were adjusted to give the best overall fit to the 88 data points in 8 contrast discrimination experiments with monochrome images (Fig. 2). The iterative search minimized the sum of squares deviation between model prediction and experimental thresholds. However, the fit with just those 7 parameters was not satisfactory: some model dippers were slightly displaced above the experimental data and some slightly below. We added an extra private parameter to the fits for 6 of the dippers to allow the model to slide each model dipper to better fit the data. The 6 extra parameters increased or decreased the grating contrast thresholds underlying the model by up to ±2 dB, independently for each dipper. These shifts may represent day-to-day differences in observer sensitivity or differences in the adaptational state caused by slight differences in the contrast energy in the stimuli making up the different dipper experiments.

3. Results

The 7 numerical parameters of the V1 model (plus the 6 threshold adjustment parameters) were sought by iteratively searching to minimize the sum of squares deviation between the results of 8 contrast discrimination ‘dipper’ experiments (11 thresholds each) and the predictions of the model for the 88 threshold measurements that comprised the experiments.
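The Minkowski pooling step of the Methods reduces to a one-line function; a sketch, with the default exponent set to the fitted value m = 2.16 reported in the Results:

```python
def minkowski_pool(diffs, m=2.16):
    """Minkowski summation of absolute response differences across 'neurons'.
    m = 1 is a city-block sum of all differences; as m grows the pool is
    increasingly dominated by the single largest difference (a max rule);
    the default m = 2.16 is the fitted exponent reported in the Results."""
    return sum(abs(d) ** m for d in diffs) ** (1.0 / m)

# With m = 2 the pool is a Euclidean norm: differences of 3 and 4 pool to 5.
pooled = minkowski_pool([3.0, 4.0], m=2.0)
```

An exponent near 2, as fitted here, therefore behaves like a root-sum-of-squares combination of the many neuronal difference signals rather than a winner-take-all rule.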
The mean sum of squares error (MSE) between model and experimental data point was 2.21 dB squared, compatible with estimated standard errors on the threshold measurements of 1–1.5 dB. Figure 2 shows the experimental results and the model fit for 3 of those 8 dippers; these fits are representative of the MSE across the whole data set.

Figure 3. (A) The sigmoidal transducer function resulting from our V1-based model of contrast discrimination in monochrome naturalistic stimuli (Fig. 2). The transducer shows the response of a ‘neuron’ to a sine-wave grating; the neuron’s receptive field matches the orientation and spatial frequency of the grating, and is in the centre of the grating (the ‘fovea’). (B) The transducer function reported by Legge and Foley (1980) to describe their ‘dipper’ functions with sine-wave gratings. The graph of (A) may seem to have a threshold at low contrasts, while that of (B) has a small positive acceleration. The different appearance only reflects how the different graphs intersect the identical log–log axes in parts (A) and (B). (C) A histogram of the equivalent luminance contrasts in a selection of our naturalistic stimuli. Equivalent contrast is calculated from the first stage of the V1 model: the image is convolved with a receptive-field template and the resultant convolution is divided by the local mean luminance; that value is then calibrated against the model’s initial response to sine-wave gratings of known contrast and the appropriate spatial frequency (Tadmor and Tolhurst, 2000).

The images beside the graphs show
Perceived changes in natural images and Fechner’s Law
the pedestal stimulus, based either on a photograph of a street scene or an image of 1/f filtered noise. The 8 dippers used these pedestals, or bandpass- or notch-filtered (e.g., Fig. 2B) versions of them. The increments were small Gaussianly-weighted central patches of the main images or filtered versions of them. The best-fitting exponents p, q and r in equation (1) were 4.23, 3.45 and 3.86, with a Minkowski summing exponent m of 2.16. These values are comparable with those in other studies, primarily of the visibility of sinewave gratings (Foley, 1994; Foley et al., 2007; Watson and Ahumada, 2005; Watson and Solomon, 1997). The weights WN and WS were 0.0504 and 0.779, with a surround radius radf of 2.14 periods. For the ‘dipper’ in Fig. 2A, the y-axis intercept (−28 dB) and the pedestal contrast at the depth of the dip (about −35 dB) are lower than for the other two illustrated ‘dippers’. This is likely because the street scene (with its regularly spaced vertical door and window frames) contains a band of higher contrast energy than do the other stimuli. The biggest equivalent Michelson contrast (see below, Fig. 3C) in the street scene is 0.56, but is only 0.19 in the 1/f filtered noise (an approximately 10 dB difference). Six of the 7 numerical parameters determine the form of the Naka–Rushton transducer function (the 7th is the Minkowski summing exponent m). Figure 3A plots the transducer function for the optimized set of parameters; it shows the result of equation (1) for the single ‘neuron’ giving the largest response when the stimulus is a sinusoidal grating covering the full spatial extent of the modeled x, y space. Figure 3B shows the transducer given by Legge and Foley (1980) for their seminal description of contrast discrimination of sinewave gratings. They used a simple Naka–Rushton formulation with p of 2.4 and q of 2.0. The transducers in Fig. 3A and 3B are not identical, but the inflexion is at a remarkably similar contrast in the two.
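A minimal sketch of a Naka–Rushton-style transducer may help here. The full equation (1), with its surround-suppression terms, is not reproduced in this excerpt, so the sketch uses the simpler two-exponent Legge and Foley (1980) form with their p = 2.4 and q = 2.0; the semi-saturation constant z below is an arbitrary illustrative value, not a fitted one.

```python
def naka_rushton(c, p=2.4, q=2.0, z=0.01):
    """Two-exponent Naka-Rushton transducer of the Legge-Foley form.
    On log-log axes the slope is ~p at low contrast (accelerating)
    and ~(p - q) at high contrast (compressive), since p > q."""
    return c ** p / (z + c ** q)
```

Because p > q the function is monotonic yet compressive at high contrast, giving the sigmoid-like appearance of Fig. 3B on log–log axes.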
For interest, Fig. 3C tries to indicate approximately where in the non-linear transducer range our model is operating for natural image stimuli. The histogram shows the frequency with which the many ‘neurons’ in the model ‘saw’ the stimulus as containing a feature giving the same magnitude of response as their favoured sinewave grating — the equivalent Michelson contrast, which is the contrast of the optimal sinewave grating that would evoke the same response as did that location in the natural image (Lauritzen and Tolhurst, 2005; Tadmor and Tolhurst, 2000). These responses are calculated at an early stage of the model: after the convolution with receptive field templates and division by the mean luminance, but before application of the suppressive nonlinearities. The modal value of ‘contrast in natural images’ is low, as we have shown before, and is close to the inflexion in the non-linear transducer functions. The model, with exactly the same parameters as shown for the contrast-discrimination dippers of Fig. 2, was applied to the coloured image pairs in two experiments where observers gave magnitude estimation ratings of the perceived difference between images in each pair. The 7 model parameters were, of course, derived for monochrome images (Fig. 2), and we used the same parameter values
for the MacLeod–Boynton RG and BY planes of the images, which we have used to investigate perception of chromatic changes (see To et al., 2010, for detailed discussion). The model treated the Luminance plane and the RG and BY planes entirely separately; the only differences in processing were in the luminance or isoluminant grating sensitivities fed into the model and the ways in which these fell off with eccentricity (Mullen and Kingdom, 2002). Figure 4A plots the observers’ ratings for the 294 ‘garden scene’ image pairs against the output of the model (which is in arbitrary units). Each image pair was presented once upright and once upside-down; we were interested in whether high-level cognitive cues might have contributed to the ratings, and we supposed that image inversion might frustrate such cues. The results for upright and inverted are very similar. The solid line shows the robust linear regression through the graph (r is 0.83; n of 588). It is worth noting that the ratings were correlated with the pixel-by-pixel r.m.s. difference between the images with r of 0.60. The dashed line shows the regression with an added quadratic term. Although the modest curvature of the quadratic regression does not seem to add much to the overall relation, addition of the quadratic term did have a highly significant effect on the residuals (F of 75.7). Figure 4B plots the ratings for the 900 ‘varied image pairs’ against the output of the model. There is clearly scatter in the fit of the ratings to the model (r = 0.55; n = 900) and, not surprisingly, there must be more to predicting an observer’s ratings than understanding only the low-level coding processes of V1. While this correlation is low, it is substantially higher than the correlation between the ratings and the r.m.s. differences between the images in the pair (r = 0.28).
However, the correlation is good enough that we can see that rating is roughly directly proportional to the model’s output; adding a quadratic term (dashed line) gave no significant improvement. We have discussed in detail (To et al., 2010) possible reasons why the V1-based model might give poor correlations for some kinds of image change. The higher correlation in Fig. 4A is probably because the 294 image pairs differed along only a few dimensions and because the stimuli were based on only 6 parent images, so that each dimensional change was represented at 7 magnitudes. The 900 image pairs in the other set (Fig. 4B) were derived from 180 rather different parent images, and the pairs could differ in one or more disparate ways. We have argued (To et al., 2010) that one particular kind of image change will be badly modeled: image changes where there are small changes in object location or where there are changes in textures of image parts (e.g., changes in the detailed arrangements of pebbles on a gravel path). The model, very literally, compares the two images with exact spatial precision, and this seems an unrealistic expectation of human vision. Figure 4C replots the data of Fig. 4B, after discarding the many image pairs where the major difference was in object textures or where objects moved only a small amount. The remaining ratings have a higher correlation with the model (r = 0.65; n = 722), but there is still work to do to explain all the variance in the observers’ ratings. Again, adding a quadratic term to the regression did not give any significant improvement.
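The linear-versus-quadratic comparison used above can be sketched generically as a partial F-test on nested polynomial regressions. This is an illustrative reconstruction on synthetic data, not the robust-regression code actually used in the study.

```python
import numpy as np

def quadratic_term_f_test(x, y):
    """Partial F-test: does adding a quadratic term to a linear
    regression significantly reduce the residual sum of squares?"""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(y)
    rss = []
    for deg in (1, 2):                      # linear, then linear+quadratic
        coef = np.polyfit(x, y, deg)
        resid = y - np.polyval(coef, x)
        rss.append(np.sum(resid ** 2))
    # one extra parameter; the full (quadratic) model has n - 3 residual df
    return (rss[0] - rss[1]) / (rss[1] / (n - 3))
```

The F statistic compares the drop in residual sum of squares against the residual variance of the fuller model; a large value (as for the F of 75.7 above) means the quadratic term captures real structure beyond the linear trend.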
Overall, the 3 graphs in Fig. 4 seem to show that observers’ ratings of the multidimensional and disparate changes in natural images tend to be directly proportional to the output of the model, based on the pooling of the supposed response magnitudes of cortical neurons. Quadratic regressions have only slight curvature or add nothing beyond a linear regression.

4. Discussion

We have extended a quantitative model (Daly, 1993; Lubin, 1995; Watson, 1987; Watson and Solomon, 1997) of populations of V1 neurons to examine how it would fare with naturalistic stimuli (Rohaly et al., 1997; To et al., 2010). We have determined the numerical values of the model parameters by optimizing the model on a relatively straightforward experimental paradigm: contrast discrimination ‘dippers’ (compare Foley, 1994; Legge and Foley, 1980; Solomon, 2009; Watson and Solomon, 1997). Rather than being measured conventionally with sinusoidal gratings, our ‘dippers’ were obtained with a photograph of a natural scene, a 1/f random noise pattern (a surrogate for natural images, whose power spectra are very roughly 1/f ) and various filtered versions of these. The model, therefore, was designed to give a good match to experiments of the same kind as considered by Weber and Fechner — measurements of the ability to distinguish stimuli of different intensity. We have then examined how the same model ‘perceives’ the suprathreshold differences between the coloured naturalistic images for which we have observers’ magnitude estimate ratings. The images in these experiments covered a very large range of subject matter and there were many kinds of image change, which might be considered as the basic elements in many everyday visual tasks (see To et al., 2010). V1-based modeling (after Watson, 1987) has proved useful in developing metrics for assessing the quality of compressed or corrupted images (e.g., Daly, 1993; Lubin, 1995), and Rohaly et al.
(1997) have also tried to model the visibility of targets in terrain scenes. We, too, are interested in applying vision research to explain the visibility of objects in natural scenes and the detectability of changes in those scenes. After five decades of fundamental psychophysical and neurophysiological research on the elements of visual processing and coding, we are in a position to ask whether all the careful and systematic experiments on, for example, sinusoidal gratings are enough to start to model everyday vision.

Figure 4. Plots of the experimental magnitude ratings against the output (in arbitrary units) of the V1 model optimized on contrast-discrimination dippers. The solid lines show robust least-squares regression lines, while the dashed lines have an added quadratic term. (A) The ratings for the 294 garden scenes are plotted against the model output (r = 0.83); each image pair was presented twice — once upright (filled symbols) and once upside-down (open symbols). (B) The ratings for all 900 ‘varied pairs’ series are plotted against the model output (r = 0.55). (C) As for (B), except that a subset (722) of the ‘varied pairs’ is plotted (r = 0.65); stimulus images differing in small spatio-chromatic (‘texture’) changes have been discarded.

While we cannot possibly say that
we have uniformly sampled the disparate multi-dimensional space of all possible natural images, we have considered a great variety of everyday scenes and everyday differences (see To et al., 2008, 2010, for many examples). The correlations we have shown (Fig. 4) between human response and V1 model are promising, and suggest that it is not premature to study vision with natural images. We are also in a position to ask how different aspects of visual processing contribute to everyday vision and whether the detailed knowledge of V1 processing is sufficient. We have reported (To et al., 2010) that, surprisingly, a model without the well-studied phenomenon of divisive normalization fares little worse than a full model. We have been able to show that V1 modeling will have limits, since there are some scenes or scene changes where the models consistently fail to explain human performance, such as changes in facial expressions, in shadowing or in textures (To et al., 2010). In one experiment, with a relatively limited range of images and image changes, the correlation between experiment and model output was 0.83; in the other experiment, with a greater range of subject matter in the images, the correlation was poorer. We have discussed elsewhere (To et al., 2010) why the implementation of such models may fail to match an observer’s perception of the magnitude of some kinds of image change. It should be noted that V1-inspired models are substantially better predictors of ratings than the pixel-by-pixel r.m.s. difference between images in the pairs. It should also be noted that we can develop ‘better’ models of the ratings by optimizing the model parameters on the rating experiment stimuli themselves, rather than on the ‘dipper’ data. The best correlations that we have obtained for the data in Fig. 4 are 0.86, 0.65 and 0.73, with a model in which the combination of numerical parameters is such that the transducer (compare Fig. 3A) is no longer monotonic.
While such a re-optimized model may be useful in some other context, we wish to stress that the model we describe here directly links to the tradition of Weber and Fechner because it is optimized on intensity-discrimination data. The pioneers of sensory science (Fechner, Weber, Adrian, and B. H. C. Matthews) supposed that a person’s numerical magnitude judgment would be directly proportional to the response magnitude of appropriate neurons. We can argue whether sensory receptors or cortical neurons are the appropriate neurons to choose, and we can argue about the physical metrics used to define stimuli. We can surely accept that stimuli of greater magnitude must produce some greater internal neural response and that this leads to an increased magnitude of sensation and increased numerical rating values. However, it has been debated whether the internal magnitude of sensation need be directly proportional to neuronal response or whether numerical magnitude estimation ratings need be directly proportional to the internal magnitude of sensation (e.g., Gescheider, 1997). We instructed our observers to give ratings proportional to the magnitude of their sensations; in so far as the model gives an independent measure of image difference, the linear relation between ratings and model predictions implies that the observers did use an appropriate scale. Thus, our present results suggest that observers’ ratings do depend linearly on neuronal response levels. The rating of perceived complex differences in natural
scenes seems directly proportional to the numerical output of our V1 modelling which, we suppose, reflects differences in the magnitudes of neuronal responses to the two images under comparison. At first sight, it would seem that the perceived magnitude difference between two natural images is directly proportional to the difference in neuronal response to the two images, where response might be expressed as total number of action potentials generated during an image presentation. This presumes that the transducer function (Fig. 3A) underlying our modelling is a representation of how neuronal response magnitude increases with contrast. Boynton et al. (1999) have argued that the sigmoidal transducer function which is presumed to underlie contrast discrimination thresholds is the same shape as the relation between the V1 BOLD (fMRI) signal and contrast, while Heeger et al. (2000), in turn, argue that the BOLD signal follows the relation between neuronal action potential rate and contrast. However, the hypothesized sigmoidal transducer function (like equation (1); Fig. 3) does not simply describe the relation between response amplitude and contrast for single V1 neurons. Individual V1 neurons each respond to very limited ranges of contrast, while the dynamic ranges of different neurons cover different contrast ranges (Albrecht and Hamilton, 1982; Sclar et al., 1990; Tolhurst et al., 1981, 1983). The BOLD response is like the summed activity of many neurons whose threshold contrasts differ and whose responses saturate at different contrasts (Heeger et al., 2000).
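The summed-activity idea can be illustrated with a toy population: single units with steep contrast–response functions and staggered semi-saturation contrasts each cover only a narrow range, but their sum grows over the full contrast range. All unit parameters below are arbitrary illustrative choices, not fitted values.

```python
def unit_response(c, c50, n=3.0, rmax=1.0):
    """Single neuron: steep Naka-Rushton contrast-response function
    with a narrow dynamic range centred on its semi-saturation c50."""
    return rmax * c ** n / (c50 ** n + c ** n)

def population_response(c, c50s=(0.02, 0.05, 0.12, 0.3, 0.7)):
    """Summed activity of units that saturate at different contrasts."""
    return sum(unit_response(c, c50) for c50 in c50s)
```

Each unit is nearly saturated a log unit above its own c50, yet the pooled response keeps increasing across the whole contrast range — the kind of extended dynamic range attributed above to the BOLD signal.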
Thus, if we were to model the psychophysical transducer truly from the behaviour of V1 neurons, we would have to pool the responses of many neurons, and the transducer’s shape would reflect the number of neurons responsive at each contrast as well as the shape of the response-contrast functions of single neurons (Chirimuuta and Tolhurst, 2005a; Clatworthy et al., 2003; Goris et al., 2009; Watson and Solomon, 1997). Furthermore, psychophysical judgments are probabilistic and there is no explicit decision ‘noise’ within our model. We have modelled average performance rather than trial-by-trial variability, and so fixed noise is implicit as the fixed extra amount of ‘response magnitude’ in the transducer (Fig. 3A, B) that would be needed, on average, for discrimination. However, we need to recognize that the variability of neuronal responses, and not just their magnitude, changes with contrast; our ability to discriminate contrasts will be affected by how response variability changes with contrast (Kontsevich et al., 2002). It is an often-repeated observation that the variance of neuronal firing rates increases with increasing response level in visual cortex (Dean, 1981; Geisler and Albrecht, 1997; Snowden et al., 1992; Tolhurst et al., 1981, 1983; Vogels et al., 1989; Wiener et al., 2001). The standard errors of our ratings tend to be higher for higher rating values (To et al., 2010). Even if firing rate were linearly proportional to contrast, we would still see something approaching Fechner’s Law (despite the absence of a logarithmic transducer) because we would need bigger contrast increments at the higher contrasts to overcome the higher response variability (see Solomon, 2009). The transducer of Fig. 3 may not be simply a schematic of how pooled neuronal firing rate depends upon contrast;
it is an ‘effective transducer’ in that it fits the dipper data, implicitly incorporating any changes of variance (Chirimuuta and Tolhurst, 2005a; Goris et al., 2009; Kontsevich et al., 2002). It may, however, be that the dominant source of noise in decision processes lies elsewhere than in V1 and that it is less dependent on stimulus intensity. In fact, the interpretation of the contrast dipper function is debatable (Georgeson and Meese, 2006; Kontsevich et al., 2002; Solomon, 2009). According to signal detection theory, appropriate contrast-dependent changes in response variance could as much give rise to the contrast dipper as could contrast-dependent changes in response magnitude; ‘it could be argued that the transducer is not much of an explanation, simply an alternative way to describe the data’ (Solomon, 2009). However, the effective transducer does look similar to the response magnitude relation implied by the BOLD signal (Boynton et al., 1999). While ‘the jury is still out’ (Georgeson and Meese, 2006) as regards the degree to which the shape of the effective transducer that explains dipper functions incorporates contrast-dependent changes in variance, this ambiguity may have an important effect on our use of the same transducer for suprathreshold magnitude ratings (J. M. Foley, personal communication). The transducer is designed to explain the results of contrast discrimination experiments, and variability in responses will be a crucial contributor to discrimination thresholds. However, this is not true for magnitude ratings, where the average rating would be expected to depend on the average neuronal response magnitude, and not on any variance (fixed or changing with contrast). Perhaps there would be a better match between model and ratings if the magnitude and variance aspects of the effective transducer were better distinguished.
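The role of contrast-dependent variance can be made concrete with a toy calculation: if the transducer were strictly linear, R(c) = k·c, but the response standard deviation grew in proportion to the response, then the increment needed for a fixed d′ (taken here as ΔR/σ, an assumption of this sketch) would be a constant fraction of the pedestal contrast — Weber-like behaviour with no logarithmic transducer at all. All numerical values are illustrative.

```python
def weber_increment(c, k=100.0, noise_frac=0.2, d_prime=1.0):
    """Contrast increment giving criterion d' when R(c) = k * c and
    sigma(R) = noise_frac * R (multiplicative noise)."""
    sigma = noise_frac * k * c     # noise grows with response level
    delta_r = d_prime * sigma      # response change needed to reach d'
    return delta_r / k             # invert the linear transducer
```

Here weber_increment(c) / c equals noise_frac × d_prime for every pedestal c; that is, ΔI/I is constant even though the transducer is linear.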
If the transducer that is effective for the dipper functions does include an element of increased variance for large stimulus intensities, then this may tend to exaggerate the predicted magnitude ratings for big stimulus differences; this is consistent with the significant downward deviation from a straight line fit of the graph of ratings against model prediction in Fig. 4A. B. H. C. Matthews (1931) proposed that the logarithmic relation between action potential firing rate and tension in a muscle receptor could be an explanation for Fechner’s hopefully-universal Law of Sensation, where the ability to discriminate stimuli along simple intensity dimensions followed the rule that ΔI/I is constant. While this proposal is now seen to be too simplistic in specific detail, it was of fundamental importance in the quest to link human perceptual performance with the behaviour of individual or populations of nerve cells.

Acknowledgements

This research was supported by grants from the EPSRC and Dstl on the Joint Grants Scheme (EP/E037097/1 and EP/E037372/1) to D. J. Tolhurst and T. Troscianko.
M. P. S. To and P. G. Lovell were employed on those grants. M. Chirimuuta received a research studentship from the MRC. P.-Y. Chua received a studentship from the Defence Science and Technology Agency (Singapore). We thank J. A. Solomon and J. M. Foley for their challenging criticisms of this work.

References

Adrian, E. D. and Zotterman, Y. (1926). The impulses produced by sensory nerve endings, J. Physiol. (Lond.) 61, 151–171.
Albrecht, D. G. and Hamilton, D. B. (1982). Striate cortex of monkey and cat: contrast response function, J. Neurophysiol. 48, 217–237.
Barlow, H. B. and Levick, W. R. (1969). Three factors limiting the reliable detection of light by retinal ganglion cells of the cat, J. Physiol. (Lond.) 200, 1–24.
Biondini, A. R. and De Martelli, M. L. F. (1985). Suprathreshold contrast perception at different luminance levels, Vision Research 25, 1–9.
Blakemore, C. and Tobin, A. (1972). Lateral inhibition between orientation detectors in the cat’s visual cortex, Exper. Brain Res. 15, 439–440.
Bonds, A. B. (1989). Role of inhibition in the specification of orientation selectivity of cells in the cat striate cortex, Vis. Neurosci. 2, 41–55.
Borg, G., Diamant, H., Strom, L. and Zotterman, Y. (1966). The relation between neural and perceptual intensity: a comparative study on the neural and psychophysical response to taste stimuli, J. Physiol. (Lond.) 192, 13–20.
Boynton, G. M., Demb, J. B., Glover, G. H. and Heeger, D. J. (1999). Neural basis of contrast discrimination, Vision Research 39, 257–269.
Brady, N. and Field, D. J. (2000). Local contrast in natural images: normalisation and coding efficiency, Perception 29, 1041–1055.
Brodie, E. E. and Ross, H. E. (1984). Sensorimotor mechanisms in weight discrimination, Percept. Psychophys. 36, 477–481.
Cannon, M. W. (1979). Contrast sensation: a linear function of stimulus contrast, Vision Research 19, 1045–1052.
Cavanaugh, J. R., Bair, W. and Movshon, J. A. (2002).
Nature and interaction of signals from the receptive field center and surround in macaque V1 neurons, J. Neurophysiol. 88, 2530–2546.
Chirimuuta, M. and Tolhurst, D. J. (2005a). Does a Bayesian model of V1 contrast coding offer a neurophysiological account of human contrast discrimination? Vision Research 45, 2943–2959.
Chirimuuta, M. and Tolhurst, D. J. (2005b). Accuracy of identification of grating contrast by human observers: Bayesian models of V1 contrast processing show correspondence between discrimination and identification performance, Vision Research 45, 2960–2971.
Clatworthy, P. L., Chirimuuta, M., Lauritzen, J. S. and Tolhurst, D. J. (2003). Coding of the contrasts in natural images by populations of neurons in primary visual cortex (V1), Vision Research 43, 1983–2001.
Daly, S. (1993). The visible differences predictor: an algorithm for the assessment of image fidelity, in: Digital Images and Human Vision, A. B. Watson (Ed.), pp. 179–206. MIT Press, Cambridge, Mass., USA.
Dean, A. F. (1981). The variability of discharge of simple cells in the cat striate cortex, Exper. Brain Res. 44, 437–440.
Enroth-Cugell, C. and Robson, J. G. (1966). The contrast sensitivity of retinal ganglion cells of the cat, J. Physiol. (Lond.) 187, 517–552.
Foley, J. M. (1994). Human luminance pattern-vision mechanisms: masking experiments require a new model, J. Optic. Soc. Amer. A, Optic. Image Sci. Vis. 11, 1710–1719.
Foley, J. M., Varadharajan, S., Koh, C. C. and Farias, M. C. (2007). Detection of Gabor patterns of different sizes, shapes, phases and eccentricities, Vision Research 47, 85–107.
Geisler, W. S. and Albrecht, D. G. (1997). Visual cortex neurons in monkeys and cats: detection, discrimination and identification, Vis. Neurosci. 14, 897–919.
Georgeson, M. A. and Meese, T. S. (2006). Fixed or variable noise in contrast discrimination? The jury’s still out, Vision Research 46, 4294–4303.
Gescheider, G. A. (1997). Psychophysics — The Fundamentals. Lawrence Erlbaum Associates, USA.
Goris, R. L. T., Wichmann, F. A. and Henning, G. B. (2009). A neurophysiologically plausible population code model for human contrast discrimination, J. Vision 9 (15), 1–22.
Gottesman, J., Rubin, G. S. and Legge, G. E. (1981). A power law for perceived contrast in human vision, Vision Research 21, 791–799.
Green, D. M. and Swets, J. A. (1966). Signal Detection Theory and Psychophysics. Wiley, Chichester.
Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex, Vis. Neurosci. 9, 181–197.
Heeger, D. J., Huk, A. C., Geisler, W. S. and Albrecht, D. G. (2000). Spikes versus BOLD: what does neuroimaging tell us about neuronal activity? Nature Neurosci. 3, 631–633.
Kontsevich, L. L., Chen, C. C. and Tyler, C. W. (2002). Separating the effects of response nonlinearities and internal noise psychophysically, Vision Research 42, 1771–1784.
Lauritzen, J. S. and Tolhurst, D. J. (2005). Contrast constancy in natural scenes in shadow or direct light — a proposed role for contrast-normalisation (non-specific suppression) in visual cortex, Network, Comput. Neur. Syst. 16, 151–173.
Legge, G. E. and Foley, J. M. (1980). Contrast masking in human vision, J. Optic. Soc. Amer. A, Optic. Image Sci. Vis. 70, 1456–1471.
Lovell, P.
G., Párraga, C. A., Ripamonti, C., Troscianko, T. and Tolhurst, D. J. (2006). Evaluation of a multi-scale color model for visual difference prediction, ACM Trans. Appl. Percept. 3, 155–178.
Lubin, J. (1995). A visual discrimination model for imaging system design and evaluation, in: Vision Models for Target Detection and Recognition, E. Peli (Ed.), pp. 245–283. World Scientific, Singapore.
Maffei, L. and Fiorentini, A. (1976). The unresponsive regions of visual cortical receptive fields, Vision Research 16, 1131–1139.
Matthews, B. H. C. (1931). The response of a single end organ, J. Physiol. (Lond.) 71, 64–110.
Matthews, P. B. C. (1972). Mammalian Muscle Receptors and Their Central Actions. Hodder and Stoughton, London.
Matthews, P. B. C. and Stein, R. B. (1969). The regularity of primary and secondary muscle spindle afferent discharges, J. Physiol. (Lond.) 202, 59–82.
Meese, T. S. (2004). Area summation and masking, J. Vision 4, 930–943, http://journalofvision.org/4/10/8/, doi:10.1167/4.10.8
Mullen, K. T. (1985). The contrast sensitivity of human color vision to red–green and blue–yellow chromatic gratings, J. Physiol. (Lond.) 359, 381–400.
Mullen, K. T. and Kingdom, F. A. (2002). Differential distributions of red–green and blue–yellow cone opponency across the visual field, Vis. Neurosci. 19, 109–118.
Murray, D. J. and Ross, H. E. (1988). E. H. Weber and Fechner’s psychophysics, in: Passauer Schriften zur Psychologiegeschichte Nr. 6, G. T. Fechner and Psychology, J. Brozeck and H. Gundlach (Eds), pp. 79–86. Passavia Universitatsverlag, Passau, Germany.
Parker, A. J. and Newsome, W. T. (1998). Sense and the single neuron: probing the physiology of perception, Ann. Rev. Neurosci. 21, 227–277.
Párraga, C. A., Troscianko, T. and Tolhurst, D. J. (2005). The effects of amplitude-spectrum statistics on foveal and peripheral discrimination of changes in natural images, and a multi-resolution model, Vision Research 45, 3145–3168.
Peli, E. (1990). Contrast in complex images, J. Optic. Soc. Amer. A, Optic. Image Sci. Vis. 7, 2032–2040.
Peli, E., Yang, J. A., Goldstein, R. and Reeves, A. (1991). Effect of luminance on suprathreshold contrast perception, J. Optic. Soc. Amer. A, Optic. Image Sci. Vis. 8, 1352–1359.
Pointer, J. S. and Hess, R. F. (1989). The contrast sensitivity gradient across the human visual field: with emphasis on the low spatial frequency range, Vision Research 29, 1133–1151.
Robson, J. G. and Graham, N. V. (1981). Probability summation and regional variation in contrast sensitivity across the visual field, Vision Research 21, 409–418.
Rohaly, A. M., Ahumada, A. J. and Watson, A. B. (1997). Object detection in natural backgrounds predicted by discrimination performance and models, Vision Research 37, 3225–3235.
Ross, H. E. (1995). Weber then and now, Perception 24, 599–603.
Sclar, G., Maunsell, J. H. and Lennie, P. (1990). Coding of image contrast in central visual pathways of the macaque monkey, Vision Research 30, 1–10.
Simons, D. J. and Rensink, R. A. (2005). Change blindness: past, present, and future, Trends Cognit. Sci. 9, 16–20.
Snowden, R. J., Treue, S. and Andersen, R. A. (1992). The response of neurons in areas V1 and MT of the alert rhesus monkey to moving random dot patterns, Exper. Brain Res. 88, 389–400.
Solomon, J. A. (2009). The history of dipper functions, Attent. Percept. Psychophys. 71, 435–443.
Stevens, S. S. (1961). To honor Fechner and repeal his Law, Science NY 133, 80–86.
Tadmor, Y. and Tolhurst, D. J. (2000). Calculating the contrasts that retinal ganglion cells and LGN neurones encounter in natural scenes, Vision Research 40, 3145–3157.
To, M. P. S., Gilchrist, I. D., Troscianko, T., Kho, J. S. B. and Tolhurst, D. J.
(2009). Perception of differences in natural-image stimuli: why is peripheral viewing poorer than foveal? ACM Trans. Appl. Percept. 6 (26), 1–9.
To, M. P. S., Lovell, P. G., Troscianko, T. and Tolhurst, D. J. (2008). Summation of perceptual cues in natural visual scenes, Proc. Royal Soc. Lond. B. Biol. Sci. 275, 2299–2308.
To, M. P. S., Lovell, P. G., Troscianko, T. and Tolhurst, D. J. (2010). Perception of suprathreshold naturalistic changes in colored natural images, J. Vision 10, 1–22.
Tolhurst, D. J. (1989). The amount of information transmitted about contrast by neurones in the cat’s visual cortex, Vis. Neurosci. 2, 409–413.
Tolhurst, D. J., Movshon, J. A. and Dean, A. F. (1983). The statistical reliability of signals in single neurones in cat and monkey visual cortex, Vision Research 23, 775–785.
Tolhurst, D. J., Movshon, J. A. and Thompson, I. D. (1981). The dependence of response amplitude and variance of cat visual cortical neurones on stimulus contrast, Exper. Brain Res. 41, 414–419.
Tolhurst, D. J., Smyth, D. and Thompson, I. D. (2009). The sparseness of neuronal responses in ferret primary visual cortex, J. Neurosci. 29, 2355–2370.
Tolhurst, D. J. and Thompson, I. D. (1981). On the variety of spatial frequency selectivities shown by neurons in area 17 of the cat, Proc. Royal Soc. London B Biol. Sci. 213, 183–199.
Troy, J. B. and Enroth-Cugell, C. (1993). X and Y ganglion cells inform the cat’s brain about contrast in the retinal image, Exper. Brain Res. 93, 383–390.
Vogels, R., Spileers, W. and Orban, G. A. (1989). The response variability of striate cortical neurons in the behaving monkey, Exper. Brain Res. 77, 432–436.
Watson, A. B. (1987). Efficiency of a model human image code, J. Optic. Soc. Amer. A, Optic. Image Sci. Vis. 4, 2401–2417.
62
D. J. Tolhurst et al.
Watson, A. B. and Ahumada, A. J. (2005). A standard model for foveal detection of spatial contrast, J. Vision 5, 717–740, doi: 10.1167/5.9.6, http://www.journalofvision.org/5/9/6/ Watson, A. B. and Solomon, J. A. (1997). Model of visual contrast gain control and pattern masking, J. Optic. Soc. Amer. A, Optic. Image Sci. Vis. 14, 2379–2391. Werner, G. and Mountcastle, V. B. (1965). Neural activity in mechanoreceptive cutaneous afferents: stimulus-response relations, Weber functions, and information transmission, J. Neurophysiol. 28, 359–397. Wiener, M. C., Oram, M. W., Liu, Z. and Richmond, B. J. (2001). Consistency of encoding in monkey visual cortex, J. Neurosci. 21, 8210–8221. Yu, H. H., Verma, R., Yang, Y., Tibballs, H. A., Lui, L. L., Reser, D. H. and Rosa, M. G. P. (2010). Spatial and temporal frequency tuning in striate cortex: functional uniformity and specializations related to receptive field eccentricity, Eur. J. Neurosci. 31, 1043–1062.
Measuring Perceptual Hysteresis with the Modified Method of Limits: Dynamics at the Threshold

Howard S. Hock 1,* and Gregor Schöner 2

1 Department of Psychology and Center for Complex Systems and Brain Sciences, Florida Atlantic University, Boca Raton, FL 33431, USA
2 Institute for Neuroinformatics, Ruhr-University Bochum, Germany
Abstract
This article describes modifications to the psychophysical method of limits that eliminate artifacts associated with the classical method, and thereby indicate whether or not there is perceptual hysteresis. Such hysteresis effects, which are characteristic of dynamical systems, would provide evidence that the near-threshold perception of an attribute is affected by stabilization mechanisms intrinsic to individual neural detectors, and by nonlinear interactions that functionally integrate the detectors when there is sufficient stimulus-initiated activation, thereby stabilizing activation at suprathreshold levels. The article begins with a review of research employing the modified method of limits. It concludes with a model and computational simulations showing how detection instabilities inherent in neural dynamics can create ‘activational gaps’ between the functionally integrated and functionally independent states of neural ensembles, resulting in clear and distinct discrimination between the perception and non-perception of an attribute. The ‘self-excitation’ threshold for engaging such functionally integrating detector interactions is differentiated from the traditional ‘read-out’ threshold (criterion) that determines whether or not the attribute in question can be perceived.
Keywords Method of limits, psychophysics, hysteresis, neural dynamics, detection instability, motion quartets, apparent motion
1. The Classical Method of Limits

The method of limits is typically cited as one of the psychophysical methods developed by Fechner (1860), though its antecedents have been traced back as far as 1700 (Boring, 1942). The method, which has been used to measure absolute thresholds (the minimal intensity for detection) as well as difference thresholds (the minimal noticeable difference), entails trials that begin with parameter values that are well above threshold (descending trials) alternating with trials that begin with parameter values that are well below threshold (ascending trials). For both ascending and descending trials, the parameter is gradually changed over a sequence of steps until values are reached for which the observer reports that the attribute in question is now perceived, or that it no longer is perceived (see Note 1). A well-known issue for threshold measurements using the classical method of limits is that the parameter value for the transition from non-perception to perception differs from the parameter value for the transition from perception to non-perception. The artifacts contributing to this difference have been understood for many years (see Woodworth and Schlosberg, 1938, as well as more recent textbooks). Artifactual ascending/descending differences have been attributed to: (1) response perseveration (historically called ‘habituation’), a tendency to repeat the same response for successive stimuli irrespective of what is perceived; (2) inferences from trial duration (historically called ‘anticipation’), whereby perceptual transitions are reported after a long sequence of stimulus steps because the parameter is thought to have reached values for which the percept should have changed; and (3) judgment uncertainty, whereby reports of a change are withheld until the transition is perceptually definite. Less frequently discussed are potential artifacts for rapidly changing stimuli, as in the study of motion perception. Differences between ascending and descending trials might then be artifacts of decision/response time: the time required for the observer to reach a decision and execute a response would delay the response until after the perceptual transition has occurred.

* To whom correspondence should be addressed. E-mail: [email protected]
The assumption in using the method of limits has been that the artifacts described above are symmetrical, so an accurate measure of the threshold would be obtained by averaging the transitional parameter values for the ascending and descending trials. More recently, however, it has been recognized that the difference in transitional parameter values between ascending and descending trials can reflect a meaningful, non-artifactual perceptual effect.

2. Perceptual Hysteresis

Perceptual hysteresis occurs when the percept formed at the start of a descending trial persists despite a parameter changing to values for which the alternative would have been perceived during an ascending trial, and vice versa for the percept formed at the start of an ascending trial (e.g., Fender and Julesz, 1967; Williams et al., 1986). Such hysteresis effects are signatures of state-dependent neural dynamics (Hock and Schöner, 2010; Hock et al., 2003; Wilson, 1999). Accordingly, perception at any moment in time depends not only on stimulus-initiated detector activation, but also on the immediately preceding activation state of the ensembles of detectors that are activated by the stimulus. It is implicit in classical psychophysics that near-threshold perception depends on whether a stimulus adequately activates detectors that are responsive to it. The
threshold indicates the parameter value for which the stimulus is barely adequate; i.e., the value for which the attribute in question can be perceived, but the percept is uncertain and indistinct. The dynamical account is quite different. Perceptual hysteresis indicates that, in addition to the activation initiated by the stimulus, near-threshold perception depends on whether the stimulus-initiated detector activation is sufficient to create excitatory interactions among the stimulated detectors. When such excitatory interactions are engaged, activation is amplified and stabilized well above the threshold ‘read-out’ level that determines whether or not there is sufficient detector activation for an attribute to be perceived. When excitatory interactions are not engaged, detector activation is stabilized below the read-out threshold. The activational gap between the alternative perceptual states that is created by the presence vs the absence of self-excitatory interactions makes the alternatives (whether the attribute is perceived or not) clear and distinct. Moreover, excitation-amplified activation states persist over time, so there is a predisposition for the perception of the attribute to persist, even when there are stimulus changes that would otherwise result in a different percept. This is the basis for perceptual hysteresis. Classical and dynamical psychophysics also differ with respect to whether near-threshold perception reflects only the feedforward processing of visual information. Activation induced in the feedforward path is largely stimulus determined, the preferential responding of different detectors occurring because different receptive fields respond selectively to different stimulus attributes. Although this would be consistent with the classical perspective, most neuronal activity entails more than feedforward processing.
Braitenberg (1978) has estimated that 95% of the input to each cortical neuron comes from its connectivity with other cortical neurons; Felleman and Van Essen (1991) have determined that there are more feedback than feedforward connections between higher- and lower-level areas in the brain; and Movshon and Newsome (1996) and Girard et al. (2001) have shown that feedforward and feedback signals are more than fast enough (on the order of several milliseconds) for feedback from higher brain levels to affect percepts established at lower brain levels. It will be shown that it is because of this neural connectivity that percepts can be stabilized at activation levels beyond the minimal level required for perception, so when something is perceived for near-threshold parameter values, the percept is clear and distinct, and persists despite parameter changes that favor a change in perception (as per perceptual hysteresis). The distinction between classical and dynamical psychophysics thus entails more than a technical question of how best to measure thresholds using the method of limits. Its importance lies in the evidence it provides that detectors that would otherwise function independently (classical psychophysics) are potentially organized into functional units that amplify the differences in activation that determine whether or not a near-threshold attribute is perceived (dynamical psychophysics). Given this theoretical significance of perceptual hysteresis, the classical method of limits was modified in order to eliminate potential artifacts involving response
perseveration, decision/response time, inferences from trial duration, and judgment uncertainty.

3. Response Perseveration and Decision/Response Time

Response perseveration refers to the persistence of a response that is repeated over and over again for each stimulus step in a sequence of ascending or descending parameter changes. If there were no perceptual hysteresis, the parameter value for perceptual transitions would be the same for ascending and descending trials, but response perseveration would result in artifactual hysteresis because the responses indicative of a perceptual transition would not occur for some period of time subsequent to the actual perceptual change. Decision/response time poses a similar potential for artifact: hysteresis could occur because of the time required for the observer to reach a decision and execute a response. Once again, if there were no perceptual hysteresis the parameter value for the perceptual transition would be the same for ascending and descending trials. However, the time required for a decision and response execution (depending on the extent to which response speed is stressed) would make it appear as though the initial percept had persisted until values were reached that are later in the parameter sequence, after the perceptual transition had actually occurred. Different transition values for ascending and descending trials due to decision and response execution time could result in a hysteresis effect, but it would not be perceptual hysteresis. The time required for decision and response execution is not a factor when the parameter in the method of limits is changing very slowly, so that the perceiver’s response to an ascending or descending parameter step would occur before there is a change to the next parameter value.
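The decision/response-time artifact can be made concrete with a small sketch (hypothetical function name; the parameter values echo the aspect-ratio steps used later in the chapter). A fixed response lag shifts the reported transition later in both ascending and descending sequences, producing an ascending/descending gap even when the underlying perceptual transition occurs at exactly the same parameter value:

```python
def reported_transition(true_step, lag_steps, direction, start, step=0.25):
    """Parameter value at which a transition is REPORTED when the response
    lags the true perceptual transition by a fixed number of parameter steps."""
    sign = step if direction == "ascending" else -step
    return round(start + sign * (true_step + lag_steps), 2)

# True transition at the same parameter value (1.25) in both directions,
# but a 2-step response lag yields different reported values:
asc = reported_transition(3, 2, "ascending", start=0.5)    # reported at 1.75
desc = reported_transition(3, 2, "descending", start=2.0)  # reported at 0.75
```

Note that the artifactual gap has the same signature as genuine perceptual hysteresis: the initial percept appears to persist beyond the true transition in both directions.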
Even when the parameter changes more quickly, perceptual hysteresis can be inferred by showing that the size of the measured hysteresis effect is larger than an independent estimate of the hysteresis that would be attributable to the time required for decision and response execution (see, for example, Gori et al., 2008). However, it is possible for true perceptual hysteresis effects to be smaller than estimates of artifactual hysteresis due to decision/response time, perhaps because of stimulus perturbations that reduce the size of the perceptual hysteresis (e.g., Hock and Ploeger, 2006). It is shown next how the modified method of limits eliminates artifacts of response perseveration, decision/response time, and dependence on the rate of parameter change.

4. The Modified Method of Limits

The key to the modified method of limits is that it allows one to determine when perceptual transitions have occurred without requiring the observer to respond during the sequence of ascending or descending steps, and without concern for how quickly the parameter has changed or how quickly or slowly the observer decides that there was a perceptual change and executes an appropriate response. As in
the classical version, the modified method of limits begins ascending and descending trials with parameter values for which only one of two perceptual alternatives is possible. The parameter is then gradually decreased (descending trials) or gradually increased (ascending trials) by a variable number of steps, so the final, end-point parameter value for each trial also is variable. For trials with just a few parameter steps, it is unlikely that there will be a change in perception. For trials with more steps, the probability of the initial percept persisting for the entire trial will decrease as the number of parameter steps increases. It then can be determined when perceptual transitions were likely to have occurred by comparing reports of perceptual change for ascending and descending trials with different end-point parameter values. Perceptual hysteresis would be indicated if the percept for a particular end-point parameter value is different, depending on whether the end-point is reached via an ascending or descending sequence of parameter changes. Response perseveration, decision/response time, and the rate of parameter change are not factors because the order of ascending and descending trials is randomized, and because the observer does not respond until the end of each trial (and then, without speed stress).

4.1. Motion Quartets

The modified method of limits was first used to measure perceptual hysteresis with motion quartets (Hock et al., 1993). The motion quartet is an apparent motion stimulus for which two spots of light corresponding to the opposite corners of an imaginary rectangle are presented during odd-numbered frames, and two spots of light corresponding to the other, opposite corners of the imaginary rectangle are presented during even-numbered frames. Either parallel-path horizontal motion or parallel-path vertical motion can be perceived for the same stimulus, but both are never perceived at the same time (Fig. 1(a)).
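The geometry of the quartet can be sketched in a few lines (hypothetical function name; the aspect ratio is the vertical divided by the horizontal distance between the spots, as defined in the text):

```python
def quartet_frames(aspect_ratio, horizontal=1.0):
    """Spot positions (x, y) for the two alternating frames of a motion
    quartet: each frame shows one diagonal pair of corners of a rectangle
    whose vertical/horizontal extent equals the aspect ratio."""
    vertical = aspect_ratio * horizontal
    odd_frame = [(0.0, 0.0), (horizontal, vertical)]   # one diagonal pair
    even_frame = [(0.0, vertical), (horizontal, 0.0)]  # the other diagonal pair
    return odd_frame, even_frame
```

With aspect_ratio greater than 1 the vertical separation exceeds the horizontal one (the regime favoring horizontal motion), and conversely for aspect_ratio less than 1.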
The control parameter for the motion quartet is its aspect ratio, the vertical divided by the horizontal distance between the spots of light composing the quartet. Large aspect ratios favor the perception of horizontal motion (vertical motion is perceived less often than horizontal motion), as in Fig. 1(b). Small aspect ratios favor the perception of vertical motion (horizontal motion is perceived less often than vertical motion), as in Fig. 1(c). Ascending trials in Hock et al. (1993) all began with an aspect ratio of 0.5, which strongly favored the perception of vertical motion. Descending trials all began with an aspect ratio of 2.0, which strongly favored the perception of horizontal motion. The aspect ratio changed in steps of 0.25 for both ascending and descending trials. As indicated in Table 1, there were six kinds of ascending trials that varied with respect to the number of steps by which the aspect ratio was increased, and six kinds of descending trials that varied with respect to the number of steps by which the aspect ratio was decreased. The twelve ascending and descending trials were presented in random order, with observers indicating at the end of each trial whether or not they perceived a change from the initially perceived to the alternative motion pattern at any time during the trial. (In other experiments observers first indicated whether their initial motion percept was horizontal or vertical, and then, whether or not there was a change to the alternative percept.) The frequency with which the initial percept switched to the alternative percept was graphed as a function of the trial’s end-point aspect ratio. The results for one of the participants in Experiment 2 of Hock et al. (1993) are presented in Fig. 2.

Figure 1. (a–c) Illustration of the motion quartets used in the hysteresis experiments reported by Hock et al. (1993) and Hock et al. (2005). Either parallel-path vertical motion or parallel-path horizontal motion is perceived, depending on the aspect ratio of the motion quartet (the vertical divided by the horizontal path length). (d–f) Stimuli with independent vertical and horizontal motions that were matched in aspect ratio with the motion quartets in Hock et al. (2005).
Table 1. Twelve trials (one per row) that differ with respect to whether the sequences of motion quartet aspect ratios constituting each trial are ascending (starting with 0.5) or descending (starting with 2.0), and differ as well with respect to their end-point aspect ratio

Ascending trials
0.5  0.75
0.5  0.75  1.0
0.5  0.75  1.0  1.25
0.5  0.75  1.0  1.25  1.5
0.5  0.75  1.0  1.25  1.5  1.75
0.5  0.75  1.0  1.25  1.5  1.75  2.0

Descending trials
2.0  1.75
2.0  1.75  1.5
2.0  1.75  1.5  1.25
2.0  1.75  1.5  1.25  1.0
2.0  1.75  1.5  1.25  1.0  0.75
2.0  1.75  1.5  1.25  1.0  0.75  0.5
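Under the assumption that the aspect-ratio sequences follow Table 1, trial construction and the tallying of end-of-trial reports can be sketched as follows (hypothetical function and variable names):

```python
import random
from collections import defaultdict

def make_trials(seed=0):
    """The twelve variable-length trials of Table 1 (six ascending from 0.5,
    six descending from 2.0, step 0.25), shuffled into a random order."""
    trials = []
    for n_steps in range(1, 7):
        trials.append(("ascending",
                       [round(0.5 + 0.25 * i, 2) for i in range(n_steps + 1)]))
        trials.append(("descending",
                       [round(2.0 - 0.25 * i, 2) for i in range(n_steps + 1)]))
    random.Random(seed).shuffle(trials)  # randomized presentation order
    return trials

def switch_proportions(reports):
    """reports: iterable of (direction, end_point, switched). Returns the
    proportion of trials with a reported switch for each (direction,
    end_point) pair, i.e., the quantity plotted in Fig. 2."""
    tally = defaultdict(lambda: [0, 0])
    for direction, end_point, switched in reports:
        tally[(direction, end_point)][0] += int(switched)
        tally[(direction, end_point)][1] += 1
    return {key: n_switch / n for key, (n_switch, n) in tally.items()}
```

Hysteresis is indicated when, for the same end-point value, the switch proportion differs between ascending and descending trials.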
Figure 2. Hysteresis effect observed by gradually increasing or gradually decreasing the aspect ratio of a motion quartet for a participant in Hock et al.’s (1993) second experiment. The proportions of trials with switches from horizontal to vertical motion, and vice versa, are graphed as a function of the aspect ratio at which each ascending or descending sequence of aspect ratios ends. (Note the inversion of the axis on the right.)
It can be seen that the frequency with which there were switches during trials with a particular end-point aspect ratio was different, depending on whether that aspect
ratio was preceded by an ascending sequence (the vertical axis on the left side of the graph) or by a descending sequence of aspect ratios (the inverted vertical axis on the right side of the graph). For example, when the end-point aspect ratio was 1.25, horizontal motion was perceived without a switch to vertical motion for all of the descending trials and vertical motion was perceived without a switch to horizontal motion for most of the ascending trials. Perception therefore was bistable for the aspect ratio of 1.25 and other aspect ratios near it; both horizontal and vertical motion could be perceived for the same stimulus, the proportion of each depending on the direction of parameter change. As indicated above, this evidence for perceptual hysteresis was obtained under conditions in which potential artifacts of response perseveration and decision/response time were eliminated. Described next is an extension of the modified method of limits, which showed that the hysteresis effects obtained for motion quartets are not an artifact of judgment uncertainty.

5. Judgment Uncertainty

In most psychophysical procedures, observers are required to distinguish between two perceptual alternatives: “Do you perceive the attribute, or not?” or “Do you perceive attribute A or attribute B?” Judgment uncertainty would arise if the observer’s percept does not clearly correspond to one of the alternatives. Hock et al. (2005) addressed the issue of judgment uncertainty within the framework of the modified method of limits by comparing ascending and descending trials on the basis of two different response criteria. One criterion was the same as above; observers indicated after each trial whether or not there was a change from one of the specified perceptual alternatives to the other anytime during the trial. For the second criterion, they indicated whether or not their perception of the initial alternative was lost anytime during the trial.
The idea was that judgment uncertainty would be indicated if there were an interval during a trial for which an observer’s initial percept was replaced by an intermediate percept that could not be confidently judged to be the alternative to the initial percept. The modified method of limits is particularly well suited for this kind of determination. If judgment uncertainty were a factor, trials with a relatively small number of steps would reach parameter values for which the initial percept was lost, but trials with more steps would be required to reach parameter values for which there was a change to the alternative percept. The intermediate percept would occur during the intervening steps. Hock et al. (2005) tested for judgment uncertainty with motion quartets and with stimuli for which observers were required to judge the relative length of independent horizontal and vertical motion paths (Fig. 1(d)–(f)). The variable-duration ascending and descending trials were constructed as in Table 1, with matching aspect ratios for the motion quartets and the stimuli with independent horizontal and vertical motions (the particular values of the aspect ratio were somewhat different from those in Hock et al., 1993).
A participant’s results for judgments of path length are presented separately for ascending and descending trials in Fig. 3(a) and 3(b). It can be seen for the ascending trials that the initial percept (“the horizontal path is longer”) was lost for trials with smaller end-point aspect ratios compared with trials for which there was a change to the alternative percept (“the vertical path is longer”). For the descending trials, the initial percept (“the vertical path is longer”) was lost for trials with larger end-point aspect ratios compared with trials for which there was a change to the alternative percept (“the horizontal path is longer”). The difference in aspect ratio between the loss of the initial percept and the emergence of the alternative percept indicated the range of aspect ratios for which there was an intermediate percept (“the motion paths are equal in length”), so judgments were uncertain with respect to the specified alternative percepts. It can be seen in Fig. 3(c) and 3(d) that this was not the case for the motion quartets (all the results in Fig. 3 are for the same participant). Even though the two response criteria were tested during separate blocks of trials, the loss of the initial percept and the change to the alternative percept occurred for the same end-point aspect ratios. There were no aspect ratios for which participants were unsure whether the perceived motion pattern was vertical or horizontal, confirming that the hysteresis effect obtained for motion quartets was indicative of perceptual hysteresis, and was not an artifact of judgment uncertainty due to the occurrence of an intermediate percept (e.g., diagonal motion).

6. Inferences from Trial Duration and Single-Element Apparent Motion

It might be argued that, rather than reporting a true change in perception, observers tested with the modified method of limits were basing their responses on inferences drawn from the duration of each trial (“the trial lasted long enough for the percept to have changed”). For example, they might never perceive switches between horizontal and vertical motion for motion quartets, but nonetheless report that switches had occurred more frequently for long-duration trials (many parameter steps) than for short-duration trials (fewer parameter steps). This possibility has been addressed for motion quartets by Hock et al. (1993) and for single-element apparent motion by Hock et al. (1997). We focus here on the latter because the single-element apparent motion paradigm is closer to the intent of threshold-measuring psychophysical procedures. Hock et al.’s (1997) study was based on a generalized version of a single-element apparent motion stimulus that was similar to a stimulus previously described by Johansson (1950). In standard apparent motion, a visual element appears first at one location, and then is shifted discretely to another location (Fig. 4(a)). For generalized apparent motion, elements are simultaneously visible at both locations (Fig. 4(b)). Motion is perceived when luminance contrast decreases at one location and increases at the other. (Standard apparent motion is a special case of generalized apparent motion for which the lower luminance value at each element location corresponds to the luminance of the background.)
Figure 3. Results for one of the participants in Hock et al.’s (2005) comparison of hysteresis effects for motion quartets and for stimuli with independent vertical and horizontal motion paths. In separate blocks of trials, the participants reported whether or not they ‘lost’ their initial percept anytime during a trial, or whether or not there was a change to the alternative percept anytime during a trial. The results for these response criteria are reported separately for trials with ascending and trials with descending aspect ratios, for both independent motion paths, (a) and (b), and for motion quartets (c) and (d).
The parameter for the generalized apparent motion stimulus was the background-relative luminance contrast (BRLC), the change in luminance for each element divided by the difference between the element’s average luminance and the luminance of the background: the larger the BRLC value, the greater the likelihood that motion will be perceived. Hock et al. (1997) created trials with ascending and descending BRLC steps, as per the modified method of limits. Ascending trials began with a BRLC value of 0.1, for which non-motion always was perceived. Descending trials began with a BRLC value of 0.9, for which motion always was perceived. BRLC values were then changed in steps of 0.1 for a variable number of steps. In Hock et al.’s (1997) third experiment, ‘inferences from trial duration’ were eliminated as a potential artifact by constructing trials with equal duration but different end-point BRLC values. This was done by repeating the first BRLC value in the series an appropriate number of times, as indicated in Table 2.

Figure 4. Examples of standard and generalized apparent motion stimuli.

Table 2. Sixteen trials (one per row) that differ with respect to whether the sequences of BRLC values for the single-element apparent motion stimuli constituting each trial are ascending (starting with 0.1) or descending (starting with 0.9), and differ as well with respect to their end-point BRLC value

Ascending trials
0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.2
0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.2  0.3
0.1  0.1  0.1  0.1  0.1  0.1  0.2  0.3  0.4
0.1  0.1  0.1  0.1  0.1  0.2  0.3  0.4  0.5
0.1  0.1  0.1  0.1  0.2  0.3  0.4  0.5  0.6
0.1  0.1  0.1  0.2  0.3  0.4  0.5  0.6  0.7
0.1  0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8
0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9

Descending trials
0.9  0.9  0.9  0.9  0.9  0.9  0.9  0.9  0.8
0.9  0.9  0.9  0.9  0.9  0.9  0.9  0.8  0.7
0.9  0.9  0.9  0.9  0.9  0.9  0.8  0.7  0.6
0.9  0.9  0.9  0.9  0.9  0.8  0.7  0.6  0.5
0.9  0.9  0.9  0.9  0.8  0.7  0.6  0.5  0.4
0.9  0.9  0.9  0.8  0.7  0.6  0.5  0.4  0.3
0.9  0.9  0.8  0.7  0.6  0.5  0.4  0.3  0.2
0.9  0.8  0.7  0.6  0.5  0.4  0.3  0.2  0.1

The frequency with which the initial motion percept switched to non-motion (descending trials) and the initial non-motion percept switched to motion (ascending trials) was graphed as a function of each trial’s end-point BRLC value. A participant’s hysteresis effect is presented in Fig. 5. It can be seen that the frequency with
which there were switches during trials with a particular end-point BRLC value was different, depending on whether that BRLC value was preceded by an ascending (vertical axis on the left side of the graph) or a descending sequence of BRLC values (the inverted vertical axis on the right side of the graph). For example, when the end-point BRLC value was 0.5, motion continued to be perceived without a switch to non-motion for 90% of the descending trials, and non-motion continued to be perceived without a switch to motion for 58% of the ascending trials. Perception therefore was bistable for this BRLC value and other BRLC values near it; both motion and non-motion could be perceived for the same stimulus, the proportion of each depending on the direction of parameter change. It was thus confirmed that the hysteresis effect obtained for single-element apparent motion was indicative of perceptual hysteresis, and was not an artifact of ‘inferences from trial duration’.

Figure 5. Hysteresis effect observed by gradually increasing or gradually decreasing the background-relative luminance contrast (BRLC) for a participant in Hock et al.’s (1997) third experiment. The proportions of trials with switches from the perception of motion to the perception of non-motion, and vice versa, are graphed as a function of the BRLC value at which each ascending or descending sequence of BRLC values ends. (Note the inversion of the axis on the right.)

7. Near-Threshold Neural Dynamics

The perceptual hysteresis effect described above indicates that there are two stable activation states possible for the motion detectors stimulated by generalized apparent motion stimuli, one suprathreshold (motion is perceived) and the other subthreshold (motion is not perceived). Because of this stabilization of near-threshold activation, motion and non-motion percepts both can occur for the same stimulus (bistability), and both can resist random fluctuations and stimulus changes that would result in frequent switches between them.

7.1. Why Stabilization Is Necessary

Whether an individual detector is activated by a stimulus or not, a random perturbation will with equal probability increase or decrease its activation. Assume it
increases activation. The next and all following random perturbations will again with equal probability increase or decrease activation. For unconstrained sequences of perturbations that by chance resulted in more increases than decreases, activation would drift toward higher levels. Similarly, for unconstrained sequences of perturbations that by chance resulted in more decreases than increases in activation, activation would drift toward lower activation levels. Although the steady-state, mean activation would remain the same, the variance of the activation would increase indefinitely over time were it not constrained by stabilization mechanisms intrinsic to individual neurons. Such mechanisms would resist random changes in activation that would move activation away from the neuron’s steady-state level by ‘pushing’ activation back toward the mean, steady-state value. The biophysics of ion flows through neural membranes provides a mechanism through which this stabilization of neural activation could be achieved (Trappenberg, 2002). Nonetheless, neural stability does not guarantee perceptual stability. When an appropriate stimulus is presented, detector activation increases from no-stimulus, resting states toward stimulus-determined, steady-state values. The changing activation values are continuously stabilized by change-resistant neural mechanisms, as described above. If, however, activation settles at a value close to the read-out threshold (which determines whether or not there is sufficient detector activation for an attribute to be perceived), random fluctuations would rapidly shift activation back and forth across the threshold (Fig. 6(a)). This would render perception highly unstable and uncertain, despite the neural stabilization of activation. Such perceptual instability and uncertainty might occur for some stimulus attributes, but they do not occur for others.
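This variance argument can be checked numerically. In the sketch below (illustrative parameter values, hypothetical function name), each activation trajectory receives Gaussian perturbations; with no restoring force the squared deviation from the steady state grows with the number of steps, whereas a restoring ‘push’ back toward the steady state keeps it bounded:

```python
import random

def mean_squared_deviation(n_steps, restore=0.0, n_runs=2000, seed=1):
    """Average squared deviation of activation from its steady state after
    n_steps of unit-variance Gaussian perturbations. restore=0.0 gives an
    unconstrained random walk; restore>0 models the intrinsic stabilization
    that pushes activation back toward the steady-state value."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_runs):
        a = 0.0  # deviation from the steady-state activation
        for _ in range(n_steps):
            a += -restore * a + rng.gauss(0.0, 1.0)
        total += a * a
    return total / n_runs
```

For the pure random walk the mean squared deviation is close to n_steps and keeps growing; with a restoring coefficient of 0.5 it saturates near a small constant, mirroring the bounded variance of neurally stabilized activation.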
For example, either stable motion or stable non-motion is clearly perceived for generalized single-element apparent motion stimuli, with minimal uncertainty (Hock et al., 1997).

7.2. Detection Instability

The stabilization of activation at suprathreshold levels for stimulus values that would otherwise have brought activation close to the read-out threshold is made possible by the connectivity of detectors, and in particular, by virtue of activation passing through a detection instability (Bicho et al., 2000; Johnson et al., 2009; Schneegans and Schöner, 2008; Schöner, 2008). The principle is that when a stimulus is presented for which an ensemble of detectors with similar selectivity is responsive, the activation of the detectors rises from their no-stimulus resting level until a level is reached for which their activation is boosted by self-excitation; i.e., via mutual excitatory interactions within the ensemble, and/or excitatory feedback from higher-level detectors. Crossing such a self-excitation threshold results in detection instability; i.e., unstabilized, transient changes in activation that rapidly raise it to steady-state levels that are well above the read-out threshold (Fig. 6(b)). In signal detection terms (Green and Swets, 1966), the read-out threshold would correspond to the criterion, and the increased detector activation that results from passing through the detection instability would increase the detectability of a near-threshold
76
H. S. Hock, G. Schöner
Figure 6. Illustration of detector activation increasing from its no-stimulus resting level (h) upon the presentation of a stimulus attribute to which it responds. The increasing activation is stabilized by intrinsic neural mechanisms. Panel (a) illustrates why perception would be unstable if the steady-state activation value were near the read-out threshold for the detector, which determines whether or not there is sufficient activation for the stimulus attribute to be perceived. Panel (b) illustrates the transient increase in detector activation (i.e., the detection instability) that would result from activation crossing a self-excitation threshold, with activation stabilizing at a level that is well above the read-out threshold.
attribute by decreasing the overlap of the signal-plus-noise with the noise-alone distribution.

7.3. A Dynamical Model

The effect of detection instability on near-threshold perception for generalized apparent motion is illustrated with a simplified feedforward/feedback model (Fig. 7). The stimulus, which was discussed in the preceding section, is depicted in Fig. 4(b). In the model, leftward and rightward motion detectors are alternately activated by the back-and-forth apparent motion stimulus. When activation for either reaches a threshold level, it feeds forward to a bidirectional 'horizontal motion' detector, which is activated by both leftward and rightward motion signals. (Cortical neurons with either unidirectional or bidirectional selectivity are found in Areas V1 and MT; Albright, 1984; Felleman and Kaas, 1984.) Excitatory feedback from the bidirectional horizontal detector closes the loop, adding activation to both the leftward and rightward motion detectors that boosts their activation well over the threshold level required for their perception (see Note 2). The coupled dynamical equations that determine how activation evolves over time for the three detectors are presented in the Appendix. The equations for the leftward and rightward detectors indicate, at each moment in time, how their activation (uL and uR) will change in the immediate future, as determined by whether duL/dt and duR/dt are positive or negative. Whether activation will increase or decrease, and by how much, depends on the detectors' current level of activation,
Figure 7. A feedforward/feedback model implementing detection instability. It is composed of unidirectional leftward and rightward motion detectors whose stimulus-initiated activation, if greater than 0, feeds forward to a bidirectional horizontal motion detector. Excitatory feedback from the horizontal to the leftward and rightward detectors boosts their activation.
the level of stimulus-initiated activation relative to their no-stimulus resting level, the feedback that is received from the bidirectional horizontal detector, and random noise. How the activation of the bidirectional horizontal detector (uH ) will change in the immediate future will likewise depend on whether duH /dt is positive or negative, as determined by its current level of activation, the input it receives from the leftward and rightward motion detectors relative to its no-stimulus resting level, and random noise. With these recursive increases and decreases, the activation levels of the three detecting units evolve over time until they settle at steady-state values for which all remaining changes in activation are due to random fluctuations. This occurs when duL /dt, duR /dt, and duH /dt are approximately equal to 0. The model generates motion-perceived signals when the steady-state activation values for the leftward and rightward detectors exceed the read-out threshold, which is set at 2. Whether or not motion is perceived then depends on the extent to which activation exceeds this threshold relative to the level of noise in the decision process. Feedforward from the leftward and rightward detectors to the bidirectional horizontal detector occurs in the model only when leftward or rightward activation is greater than the self-excitation threshold of 0. This nonlinear excitatory interaction is immediately followed by the feedback of excitation from the horizontal to the leftward and rightward detectors. The feedback also is nonlinear. It is implemented in the model with a Naka–Rushton equation (Naka and Rushton, 1966) that approximates a step function; i.e., there is no feedback when there is no feedforward activation, and the amount of feedback is the same for all input activation values greater than 0.1 (the latter prevented activation levels from soaring as a result of the closed feedforward/feedback loop being excitatory) (see Note 3).
In the four simulations that follow, the temporally evolving activation states for the leftward, rightward, and horizontal detectors were determined for trials composed of eight back-and-forth frames. With the exception of differences in the stimulus-initiated activation of the directionally selective leftward and rightward motion detectors, the model parameters were the same for all the simulations.

7.4. Simulations 1 and 2: Near-Threshold Perception without and with Feedback

The stimulus-initiated activation of the leftward and rightward detectors was S = 10.5 for Simulation 1, so with a no-stimulus resting level of h = −8, and in the absence of feedback, steady-state leftward and rightward activation values were reached at uL = uR = S + h = 2.5 (Fig. 8(a)). Although this stabilized activation is just above the read-out threshold of 2, low signal-to-noise ratios in the decision process would sometimes result in motion-perceived decisions and sometimes in motion-not-perceived decisions. Feedback was introduced in Simulation 2 (stimulus-initiated activation was the same as in Simulation 1). Because the stimulation of the leftward and rightward detectors was sufficient for their activation to exceed the self-excitation threshold of 0, the feedforward/feedback loop was engaged. The resulting detection instability led to the stabilization of activation for the leftward and rightward detectors at a steady-state value of 8.5 (the feedback excitation was 6), well above the read-out threshold level for perception (Fig. 8(b)). With the noise level in the decision process the same as in Simulation 1, the higher signal-to-noise ratio would much more consistently result in motion-perceived decisions.

7.5. Simulation 3: Bistability

The feedforward/feedback loop is activated when the activation of leftward or rightward detectors exceeds the self-excitation threshold, but because of random fluctuations in activation, crossing this threshold can occur for values of stimulus-initiated activation that would, without feedback, result in activation levels less than 0. Random changes in activation occur once every millisecond in the model, so for motion signals nominally lasting for 200 ms, there were 200 opportunities per frame for the occurrence of a threshold-crossing random fluctuation. When such a fluctuation occurs, the feedforward/feedback loop is engaged and activation is transiently boosted to values well above the read-out threshold for perception. When a sufficiently large fluctuation does not occur, the leftward and rightward motion detectors remain stabilized at an activation level that is subthreshold for perception. This bistability was demonstrated in Simulation 3 for a stimulus-initiated activation of S = 7 (Fig. 9). Within the same trial, activation initially was stabilized below the read-out threshold at uL = uR = −1. Later in the trial, there was a random fluctuation large enough for activation to cross the self-excitation threshold, and there was a switch to activation levels that were well above the read-out threshold (uL = uR = 5).
Figure 8. (a) Simulation 1: single trial simulation without feedback for stimulus-initiated activation (S = 10.5) that just exceeds the read-out threshold (u = 2), which determines whether there is sufficient detector activation to signal the perception of motion. (b) Simulation 2: single trial simulation for the same stimulus, but with feedback from the bidirectional horizontal motion detector boosting the activation of the unidirectional leftward or rightward motion detectors. This occurs when the stimulus-initiated activation of the leftward or rightward detectors is sufficient for activation to pass through a detection instability; the feedforward/feedback loop is engaged when the activation of the leftward or rightward detector exceeds the threshold for self-excitation (u = 0 in these simulations).
Figure 9. Simulation 3: single trial simulation for stimulus-initiated activation (S = 7.0) that would be insufficient to engage the feedforward/feedback loop (as was the case during the first part of the simulated trial) were it not for random fluctuations in activation that crossed the self-excitation threshold for a detection instability (as occurred during the second part of the simulated trial).
7.6. Simulation 4: Perceptual Hysteresis

The final simulation brings this article back to its beginning, where it was argued that the elimination of various artifacts in the classical method of limits could reveal the presence of perceptual hysteresis. Simulation 4 (Fig. 10) showed that neural feedback can produce perceptual hysteresis. Descending trials began with stimulus-initiated activation values that were well above the self-excitation threshold for the perception of motion and were gradually decreased to values well below it. The opposite was the case for the ascending trials. The presence of hysteresis was indicated in the model by the fact that, for stimulus-initiated activation values of S = 5, 6, and 7, perception depended on the immediately preceding perceptual history. That is, the perception of motion was signaled for these stimulus activation values when activation was above the self-excitation threshold during the preceding frames (descending trials), but not when activation was below the self-excitation threshold during the preceding frames (ascending trials).

Figure 10. Simulation 4: single trial simulations for an ascending trial composed of a sequence of frames with increasing stimulus-initiated activation, and for a descending trial composed of a sequence of frames with decreasing stimulus-initiated activation. Hysteresis occurs when stimulus-initiated activation of the leftward or rightward detectors has crossed the self-excitation threshold of u = 0 and engages (ascending trials) or disengages (descending trials) the feedforward/feedback loop. It is indicated for end-point attribute values of S = 6, 7, 8 and 9. Whether motion or non-motion is perceived for these attribute values depends on the activational history that precedes their presentation; i.e., motion is perceived when these attribute values are encountered during descending trials, and non-motion is perceived when these attribute values are encountered during ascending trials. This is shown only for leftward motion. The same result is obtained for rightward motion, which is not shown.

8. Conclusion

When an attribute is presented for which an ensemble of detectors with similar selectivity is responsive, activation will increase for each detector at a rate determined by its intrinsic neural stabilization mechanism. It will settle at a steady-state level below the read-out threshold for weak attribute values and above the read-out threshold for strong attribute values. For intermediate values, steady-state activation will lie near the read-out threshold, where in the absence of a self-excitation-induced detection instability, perception of the attribute is uncertain. This description is characteristic of classical psychophysics; detectors function independently and their cumulative effect is well characterized by signal detection theory (Green and Swets, 1966). What sets the dynamical approach apart is what happens when stimulus-initiated activation reaches a level that engages excitatory feedforward/feedback loops and/or within-ensemble excitatory interactions. The stimulated detectors would then become functionally integrated, and each detector's activation would be transiently boosted to levels that exceed the read-out threshold. As a result of such detection instabilities, near-threshold perception can be clear and distinct, even for attribute values for which the alternatives are perceived equally frequently (motion and non-motion in our example), and non-artifactual perceptual hysteresis can be
observed for near-threshold attribute values when using the modified method of limits. To be sure, there may be many stimuli for which the classical rather than the dynamical account pertains; e.g., the detection of dim light. For such stimuli, perceptual hysteresis would not be expected when potential artifacts due to response perseveration, decision/response time, and inferences from trial duration are eliminated by using the modified method of limits. In addition, evidence for judgment uncertainty would be expected for such stimuli, as per the ‘two response criteria’ methodology that could be used along with the modified method of limits. In conclusion, there are two conceptually important implications of observing artifact-free perceptual hysteresis for near-threshold stimuli. The first is the indication it gives that individual detectors that would otherwise function independently can be organized into functional units when stimulus-initiated activation is sufficient to engage either feedforward/feedback loops or mutual excitatory interactions within ensembles of stimulated detectors. The second implication concerns the traditional definition of the threshold as a relatively arbitrary read-out criterion that determines whether or not there is sufficient detector activation for an attribute to be perceived. The neural dynamic account of perceptual hysteresis includes such a read-out threshold, but in addition, specifies a threshold for excitatory interaction that must be reached in order for feedforward/feedback loops and ensemble interactions to create an activational gap that enhances discrimination between alternative perceptual states. In contrast with the read-out threshold, or criterion, the self-excitation threshold is directly involved in the processing of near-threshold stimuli.
Notes

1. The parameter that is varied when using the method of limits need not correspond to the to-be-perceived attribute. For example, the size of the change in element luminance could be the varied parameter, and motion could be the to-be-perceived attribute, as illustrated for generalized apparent motion stimuli in Fig. 4(b).
2. A more complete model would include mutually inhibitory interactions among the leftward and rightward detectors to reflect the fact that they are generally not perceived simultaneously across the same space, as might occur (but does not) for counterphase sine gratings (Levinson and Sekuler, 1975). These interactions are not a factor in the current simulations because the opposing directions are never simultaneously stimulated.
3. Activation in a closed feedforward/feedback loop also can be prevented from soaring by the addition of delayed inhibitory interactions.
References

Albright, T. D. (1984). Direction and orientation selectivity of neurons in visual area MT of the macaque, J. Neurophysiol. 52, 1106–1130.
Bicho, E., Mallet, P. and Schöner, G. (2000). Target representation on an autonomous vehicle with low-level sensors, Intl J. Robotics Res. 19, 424–447.
Boring, E. G. (1942). Sensation and Perception in the History of Experimental Psychology. Appleton-Century-Crofts, New York, USA.
Braitenberg, V. (1978). Cortical architectonics: general and areal, in: IBRO Monograph Series, Architectonics of the Cerebral Cortex, 3, Brazier, M. A. B. and Petsche, H. (Eds), pp. 443–466. Raven Press, New York, USA.
Fechner, G. T. (1860). Elemente der Psychophysik (Elements of Psychophysics). Trans. Adler, H. E. Holt, Rinehart and Winston, New York, USA.
Felleman, D. J. and Kaas, J. H. (1984). Receptive field properties of neurons in middle temporal visual area (MT) of owl monkeys, J. Neurophysiol. 52, 488–513.
Felleman, D. J. and Van Essen, D. C. (1991). Distributed hierarchical processing in primate visual cortex, Cerebral Cortex 1, 1–47.
Fender, D. and Julesz, B. (1967). Extension of Panum's fusional area in binocularly stabilized vision, J. Optic. Soc. Amer. 57, 819–830.
Girard, P., Hupé, J. M. and Bullier, J. (2001). Feedforward and feedback connections between areas V1 and V2 of the monkey have similar rapid conduction velocities, J. Neurophysiol. 85, 1328–1331.
Gori, S., Giora, E. and Pedersini, R. (2008). Perceptual multistability in figure-ground segregation using motion stimuli, Acta Psychologica 129, 399–409.
Green, D. M. and Swets, J. A. (1966). Signal Detection Theory and Psychophysics. Wiley, New York, USA.
Hock, H. S., Bukowski, L., Nichols, D. F., Huisman, A. and Rivera, M. (2005). Dynamical vs judgmental comparison: hysteresis effects in motion perception, Spatial Vision 18, 317–335.
Hock, H. S., Kelso, J. A. S. and Schöner, G. (1993). Bistability and hysteresis in the organization of apparent motion patterns, J. Exper.
Psychol.: Human Percept. Perform. 19, 63–80.
Hock, H. S., Kogan, K. and Espinoza, J. K. (1997). Dynamic, state-dependent thresholds for the perception of single-element apparent motion: bistability from local cooperativity, Perception and Psychophysics 59, 1077–1088.
Hock, H. S. and Ploeger, A. (2006). Linking dynamical decisions at different levels of description in motion pattern formation: psychophysics, Perception and Psychophysics 68, 505–514.
Hock, H. S. and Schöner, G. (2010). A neural basis for perceptual dynamics, in: Nonlinear Dynamics in Human Behavior, Jirsa, V. and Huys, R. (Eds). Springer-Verlag, Berlin, Germany.
Hock, H. S., Schöner, G. and Giese, M. (2003). The dynamical foundations of motion pattern formation: stability, selective adaptation, and perceptual continuity, Perception and Psychophysics 65, 429–457.
Johansson, G. (1950). Configurations in Event Perception. Almqvist and Wiksells Boktryckeri AB, Uppsala, Sweden.
Johnson, J. S., Spencer, J. P. and Schöner, G. (2009). A layered neural architecture for the consolidation, maintenance, and updating of representations in visual working memory, Brain Research 1299, 17–32.
Kloeden, P. E. and Platen, E. (1992). Numerical Solution of Stochastic Differential Equations. Springer-Verlag, Berlin.
Levinson, E. and Sekuler, R. (1975). The independence of channels in human vision selective for direction of movement, J. Physiol. 250, 347–366.
Movshon, J. A. and Newsome, W. T. (1996). Visual response properties of striate cortical neurons projecting to area MT in macaque monkeys, J. Neurosci. 16, 7733–7741.
Naka, K. I. and Rushton, W. A. (1966). S-potentials from colour units in the retina of fish, J. Physiol. 185, 584–599.
Schneegans, S. and Schöner, G. (2008). Dynamic field theory as a framework for understanding embodied cognition, in: Handbook of Cognitive Science: An Embodied Approach, Calvo, P. and Gomila, T. (Eds), pp. 241–271. Elsevier, The Netherlands.
Schöner, G. (2008). Dynamical systems approaches to cognition, in: Cambridge Handbook of Computational Cognitive Modeling, R. Sun (Ed.), pp. 101–126. Cambridge University Press, Cambridge, UK.
Trappenberg, T. P. (2002). Fundamentals of Computational Neuroscience. Oxford University Press, Oxford, UK.
Williams, D., Phillips, G. and Sekuler, R. (1986). Hysteresis in the perception of motion direction as evidence for neural cooperativity, Nature 324, 253–255.
Wilson, H. R. (1999). Spikes, Decisions and Actions: Dynamical Foundations of Neuroscience. Oxford University Press, Oxford, UK.
Woodworth, R. S. and Schlosberg, H. (1938). Experimental Psychology. Holt, Rinehart and Winston, New York, USA.
Appendix

The dynamical model for the perception of single-element apparent motion is composed of three coupled stochastic differential equations representing the activational states for the unidirectional leftward and rightward detectors (uL and uR), and the bidirectional horizontal detector (uH). The equations are as follows:

τ duL/dt = −uL + huni + SL(t) + ω · σ(uH) + q · ξ(t),
τ duR/dt = −uR + huni + SR(t) + ω · σ(uH) + q · ξ(t),
τ duH/dt = −uH + hbi + Λ(uL) + Λ(uR) + q · ξ(t),

where τ determines the time scale of activation change, huni is the no-stimulus resting level of the unidirectional detectors, hbi is the resting level of the bidirectional detector, Λ denotes the feedforward ramp function defined below, and q is the strength of the additive Gaussian white noise, ξ(t). The above equations are written somewhat differently by mathematicians: 'dt' is shifted to the right side of the equations and ξ(t) is replaced by 'dW', where 'W' denotes a Wiener process, i.e., Brownian motion; its differentiation leads to white noise stochastic perturbations. The stimulus-initiated activation is SL(t) for leftward motion, and alternating with it, SR(t) for rightward motion. Feedback from the bidirectional horizontal detector to the unidirectional detectors is ω · σ(uH), where ω denotes the maximum strength of the excitatory feedback and σ(uH) denotes a Naka–Rushton equation
(Naka and Rushton, 1966) approximating a step function that begins at the self-excitation threshold of 0:

σ(uH) = (uH)^4 / ((0.05)^4 + (uH)^4)  for uH ≥ 0;
σ(uH) = 0  for uH < 0.

The feedforward input from the leftward and rightward motion detectors to the bidirectional horizontal detector is determined by ramp functions Λ restricted to positive activation levels, as follows:

Λ(uL) = uL and Λ(uR) = uR  for uL ≥ 0 and uR ≥ 0;
Λ(uL) = Λ(uR) = 0  for uL < 0 and uR < 0.
The time-varying stimulus-initiated activations for the leftward and rightward motion detectors (SL and SR) are present during alternating 200-millisecond frames. In Simulations 1 and 2, SL = SR = 10.5. In Simulation 3, SL = SR = 7.0. In Simulation 4, the ascending trial begins with SL = SR = 3 and increases in steps of 1 until SL = SR = 12. The reverse is the case for the descending trial. All simulations are otherwise based on the same set of parameters: τ = 10 ms, huni = −8, hbi = −2, and q = 0.008. The excitatory strength for the feedback is ω = 6, except when feedback is excluded in Simulation 1 (when ω = 0). The read-out threshold is 2. Motion is signaled when the activation of the leftward or rightward motion detectors exceeds this value. The numerical integration used the forward Euler procedure for stochastic differential equations (see Kloeden and Platen (1992), chapter 10).
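As a concreteness check, a model of this kind can be sketched as a forward Euler integration of three coupled equations (a hedged reimplementation for illustration, not the authors' code; the function names and the exact noise scaling are assumptions, and the Naka-Rushton feedback is written as a near-step nonlinearity with an assumed half-saturation of 0.05):

```python
import math
import random

# Parameters as given in the Appendix.
TAU, H_UNI, H_BI = 10.0, -8.0, -2.0   # time scale (ms) and resting levels
Q, OMEGA, READOUT, DT = 0.008, 6.0, 2.0, 1.0

def sigma(u_h):
    """Naka-Rushton feedback nonlinearity: zero at or below the
    self-excitation threshold (0), near-constant above about 0.1."""
    return 0.0 if u_h <= 0 else u_h ** 4 / (0.05 ** 4 + u_h ** 4)

def ramp(u):
    """Feedforward input: only positive activation is passed on."""
    return u if u > 0 else 0.0

def run_trial(s, frames=8, frame_ms=200, seed=0):
    """Euler-integrate uL, uR, uH over alternating left/right frames and
    return the larger unidirectional activation at the end of the trial."""
    rng = random.Random(seed)
    u_l = u_r = H_UNI
    u_h = H_BI
    for frame in range(frames):
        s_l = s if frame % 2 == 0 else 0.0   # stimulus alternates direction
        s_r = s - s_l
        for _ in range(frame_ms):
            du_l = (-u_l + H_UNI + s_l + OMEGA * sigma(u_h)) / TAU
            du_r = (-u_r + H_UNI + s_r + OMEGA * sigma(u_h)) / TAU
            du_h = (-u_h + H_BI + ramp(u_l) + ramp(u_r)) / TAU
            u_l += du_l * DT + Q * rng.gauss(0.0, 1.0) * math.sqrt(DT)
            u_r += du_r * DT + Q * rng.gauss(0.0, 1.0) * math.sqrt(DT)
            u_h += du_h * DT + Q * rng.gauss(0.0, 1.0) * math.sqrt(DT)
    return max(u_l, u_r)

# S = 10.5 engages the feedback loop and settles near 8.5, well above the
# read-out threshold; S = 7 remains stabilized near -1 in this sketch.
assert run_trial(10.5) > READOUT
assert run_trial(7.0) < READOUT
```

With these parameters the S = 10.5 case reproduces the qualitative pattern of Simulation 2 (activation near S + h + ω = 8.5) and S = 7 the subthreshold branch of Simulation 3; the small noise strength used here only rarely produces the spontaneous threshold crossing described in the chapter, so that aspect is illustrated qualitatively at best.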
Functional Adaptive Sequential Testing
Edward Vul ∗, Jacob Bergsma and Donald I. A. MacLeod
Department of Psychology, University of California, San Diego, USA
Abstract

The study of cognition, perception, and behavior often requires the estimation of thresholds as a function of continuous independent variables (e.g., contrast threshold as a function of spatial frequency, subjective value as a function of reward delay, tracking speed as a function of the number of objects tracked). Unidimensional adaptive testing methods make estimation of single threshold values faster and more efficient, but substantial efficiency can be further gained by taking into account the relationship between thresholds at different values of an independent variable. Here we present a generic method — functional adaptive sequential testing (FAST) — for estimating thresholds as a function of another variable. This method allows efficient estimation of parameters relating an independent variable (e.g., stimulus spatial frequency; or reward delay) to the measured threshold along a stimulus strength dimension (e.g., contrast; or present monetary value). We formally describe the FAST algorithm and introduce a Matlab toolbox implementation thereof; we then evaluate several possible sampling and estimation algorithms for such two-dimensional functions. Our results demonstrate that efficiency can be substantially increased by considering the functional relationship between thresholds at different values of the independent variable of interest.

Keywords

Psychophysical methods, adaptive testing, Bayesian inference
1. Introduction

In many psychophysical and cognitive experiments, stimuli are presented in multiple trials, and the subject makes a response that is classified into a binary alternative (e.g., seen/not seen, correct/incorrect, or too much/too little) after each trial. For instance, in a taste detection experiment, one might aim to estimate how many grams of sugar should be dissolved in a liter of water to be detected 90% of the time; in contrast sensitivity experiments: the contrast at which subjects respond with 75%
* To whom correspondence should be addressed. E-mail: [email protected]
88
E. Vul et al.
accuracy; or in delay discounting experiments: the immediate value of a delayed reward that corresponds to a 50% chance of preferring a smaller immediate reward to the delayed reward, etc. In general, the 'psychometric function' relates the proportion of positive responses to stimulus strength (e.g., contrast, chromaticity, or immediate reward value), and the results of such experiments are often summarized as the stimulus strength necessary for a certain proportion of positive responses (e.g., a threshold for detection, a threshold-level accuracy for discrimination, or a match point giving equal proportions of positive and negative responses for matching). Fechner's pioneering discussion of psychophysical methods (Fechner, 1860) introduced three methods to estimate these thresholds: the Method of Constant Stimuli, the Method of Limits, and the Method of Adjustment. Because the method of limits tends to produce bias through hysteresis effects and the method of adjustment yields slow, and potentially over-thought, responses, the method of constant stimuli has been more often used as a general procedure for experiments of this type. In the method of constant stimuli, the stimulus strength on any trial is selected randomly from a fixed set. Due to the importance and prevalence of threshold estimation, and the growing need to test more conditions faster, many adaptive testing techniques have been developed to expedite the estimation process over and above the original method of constant stimuli. The initial development of these techniques focused on unidimensional threshold estimation: the parameters of a psychometric function, or just a single point along the psychometric function.
However, modern psychophysical and cognitive experiments almost always measure not just a single threshold or match point, but how the threshold or match point changes depending on some independent stimulus or task parameter of interest: for instance, the contrast sensitivity function, which is defined by the change of contrast threshold (the reciprocal of sensitivity) with spatial frequency. A single experiment typically has the goal of estimating such a function, relating (usually) threshold stimulus strengths or (less often) match points or null settings (we can think of these measures as dependent variables in an experiment) to other stimulus parameters (independent variables). We will refer to such a function, that a particular experiment is intended to estimate, as the ‘threshold function’, because the threshold case is the most common. However, our discussion and proposals apply equally to matching or null measurements that are based on responses that may be classified into binary outcomes. The threshold function in the sense used here is quite different from the psychometric function that relates the proportion of positive responses to stimulus strength. The threshold function specifies how the psychometric function as a whole depends on some independent variable — for example, spatial frequency in the case of the contrast sensitivity function. Typically the independent variable translates the psychometric function on a suitably chosen axis. For example the change in contrast sensitivity with changing frequency can be considered as a translation of the psychometric function on a scale of log contrast. Likewise the threshold vs contrast (TvC) function describes the just detectable contrast increment as a function of the
Functional adaptive sequential testing
89
contrast to which the increment is added; cortical magnification may be measured as the threshold size necessary for some probability of detection (e.g., gap size for Vernier acuity) as a function of eccentricity; the time-course of adaptation may be expressed by the nulling stimulus strength as a function of adaptation time; and the delay-discounting function may be expressed by the subjectively equivalent proportion of immediate to delayed reward amount. In all these cases and indeed in most psychophysical experiments, the estimation problem is two dimensional, in the sense that the goal is the estimation of threshold as a function of another variable. However, unidimensional adaptive methods for assessing thresholds are less efficient for such problems, and there has been relatively little work on general formalisms and algorithms for two-dimensional adaptive testing. This is the problem we aim to address with Functional Adaptive Sequential Testing: we seek to make estimation of threshold functions more efficient, just as estimation of unidimensional thresholds has been made more efficient with the development of adaptive testing methods since Fechner introduced psychophysics.
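The translation assumption just described can be made concrete with a small sketch (illustrative functional forms only: the logistic psychometric function and the quadratic log-threshold function below are hypothetical stand-ins, not the forms used by any particular study):

```python
import math

def psychometric(z, slope=2.0):
    """Proportion of positive responses as a function of distance (in log
    units) from threshold; exactly 0.5 at threshold."""
    return 1.0 / (1.0 + math.exp(-slope * z))

def log_threshold(freq, peak=2.0, sens=2.5, width=1.0):
    """Toy threshold function: log10 contrast threshold as a quadratic in
    log10 spatial frequency (a crude stand-in for a CSF)."""
    return -sens + ((math.log10(freq) - math.log10(peak)) / width) ** 2

def p_detect(freq, log_contrast):
    """The two-dimensional probability surface: the same psychometric
    function, translated along the log-contrast axis by the threshold
    function evaluated at this spatial frequency."""
    return psychometric(log_contrast - log_threshold(freq))

# At threshold, detection probability is 0.5 at every spatial frequency;
# the whole surface is parameterized by the threshold-function parameters
# plus one shared slope.
for f in (0.5, 2.0, 8.0):
    assert abs(p_detect(f, log_threshold(f)) - 0.5) < 1e-9
```

The point of the parameterization is parsimony: instead of one threshold parameter per tested frequency, the surface is described by a handful of threshold-function parameters, so every trial at any frequency constrains every threshold.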
2. Unidimensional Adaptive Testing

While there is no need for a thorough review of unidimensional adaptive testing here (for useful reviews see: King-Smith and Rose, 1997; Klein, 2001; Leek, 2001; McKee et al., 1985; Treutwein, 1995) it is useful to note where and how progress has been made on this problem, as efficient estimation of the two-dimensional threshold function must begin with efficient methods for the one-dimensional case. Early methods for threshold estimation (Fechner's method of constant stimuli; Fechner, 1860) were inefficient (see Note 1). Many trials were wasted: stimuli were presented at very high or very low stimulus strengths, at which subjects will nearly always be correct or incorrect, and thus responses are minimally informative. This was improved with adaptive reversal staircases, which reduced the frequency of testing at the less informative stimulus strengths (Dixon and Mood, 1948; Wetherill and Levitt, 1965). Later, by explicitly formalizing the underlying psychometric function, procedures like QUEST (Taylor and Creelman, 1967; Watson and Pelli, 1983) could achieve even greater efficiency. By fixing the slope parameter of the underlying function a priori, Watson and Pelli could use data from all of the trials to find the most informative stimulus placement to estimate the threshold. Others (Cobo-Lewis, 1997; King-Smith and Rose, 1997; Kontsevich and Tyler, 1999) extended this procedure to estimate the slope parameter concurrently with the threshold, thus rendering the estimation procedure more efficient and robust under variable slopes. These are the current state-of-the-art adaptive, parametric threshold estimation procedures (see Note 2).
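The logic these procedures share (maintain a posterior over the psychometric function's parameters and place each trial where it is expected to be informative) can be sketched as follows. This is a simplified illustration under assumed forms, not any published algorithm: the slope is fixed a priori, the posterior is a grid over threshold alone, and each trial is simply placed at the current posterior-mean threshold.

```python
import math
import random

def psi(x, threshold, slope=1.5, guess=0.5, lapse=0.02):
    """Assumed logistic psychometric function for a 2AFC task."""
    p = 1.0 / (1.0 + math.exp(-slope * (x - threshold)))
    return guess + (1.0 - guess - lapse) * p

def adaptive_estimate(true_threshold, n_trials=300, seed=0):
    """Grid posterior over threshold, updated by Bayes' rule after each
    simulated trial; each stimulus is placed at the posterior mean."""
    rng = random.Random(seed)
    grid = [i * 0.1 for i in range(-50, 51)]      # candidate thresholds
    post = [1.0 / len(grid)] * len(grid)          # uniform prior
    for _ in range(n_trials):
        x = sum(t * p for t, p in zip(grid, post))    # placement rule
        correct = rng.random() < psi(x, true_threshold)
        like = [psi(x, t) if correct else 1.0 - psi(x, t) for t in grid]
        post = [l * p for l, p in zip(like, post)]
        z = sum(post)
        post = [p / z for p in post]
    return sum(t * p for t, p in zip(grid, post))

est = adaptive_estimate(true_threshold=1.0)
assert abs(est - 1.0) < 0.75   # trials concentrate near the true threshold
```

Because each placement depends on all previous responses, few trials land at uninformatively easy or hard stimulus strengths, which is exactly the inefficiency of the method of constant stimuli that adaptive procedures were designed to remove.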
3. Multidimensional Adaptive Testing
Despite this progress in estimating single thresholds, estimation of threshold functions has seldom been considered. A straightforward approach has been called the Method of 1000 Staircases (Cornsweet and Teller, 1965; Mollon et al., 1987): to estimate the threshold function, estimate a threshold at many points along the function independently, and then fit a function to those points. The threshold at each point of the function may be estimated efficiently (for instance with QUEST or a reversal staircase), but because each threshold is estimated independently, such a procedure encounters inefficiencies similar to those that arise in the method of constant stimuli. Because each staircase is independent, it must start with assumptions of ignorance, leading to several trials at minimally informative stimulus strengths — across all of the independent staircases, many trials are wasted in this manner. The QUEST (Watson and Pelli, 1983) and Psi (Kontsevich and Tyler, 1999) procedures allow more efficient unidimensional threshold estimation by choosing the presented stimulus values that best constrain the parameters of the psychometric function. Similarly, estimation of the threshold function can be made efficient by specifying that function formally in terms of parameters whose values are to be experimentally estimated. With the assumption that the slope of the psychometric function is the same at all points along the underlying threshold function, one may define a probability surface (Fig. 1) with a relatively parsimonious parameterization (Fig. 2). This idea has previously been implemented for the cases of estimating the TvC function (Lesmes et al., 2006), the CSF (Lesmes et al., 2010), and the somewhat more general case of ellipsoidal threshold functions (Kujala and Lukka, 2006).
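This factoring, a fixed-shape psychometric function translated along the stimulus-strength axis by a threshold function (Figs 1 and 2), can be sketched in a few lines. The following is a minimal Python illustration (the FAST toolbox itself is written in Matlab); the function names and parameter values are our own, chosen to mirror the log-parabola CSF and logistic psychometric function that appear below as equation (3), including that equation's sign convention:

```python
import math

def threshold_csf(x, t1, t2, t3):
    """Log-parabola CSF: log contrast threshold as a function of spatial frequency x.

    t1: log threshold at peak sensitivity; t2: log peak frequency;
    t3: log inverse bandwidth (parameter roles as described for equation (3)).
    """
    return t1 + 10.0 ** t3 * (math.log10(x) - t2) ** 2

def p_positive(y, x, t1, t2, t3, S):
    """Response probability surface: a logistic psychometric function of stimulus
    strength y, translated along y by the threshold function evaluated at x."""
    y_c = threshold_csf(x, t1, t2, t3)
    return 1.0 / (1.0 + math.exp((y - y_c) / S))
```

Whatever the value of x, the response probability at y equal to the threshold is always 0.5; only the translation of the psychometric function changes, which is what makes the parameterization parsimonious.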
Although these examples use specific parametric models for specific applications, the assumption of independent translation of the psychometric function across the threshold function can be fruitfully extended to a very broad family of parametric models that cover most experiments aiming to estimate parametric threshold functions, as suggested by Lesmes et al. (2006, 2010). Here, we write down the
Figure 1. (Left panel) The two-dimensional response probability surface corresponding to a contrast sensitivity function (probability of detection as a function of grating spatial frequency and contrast; from equation (3)). (Right panel) The same surface for a decaying exponential curve, as would be seen in the time-course of aftereffects.
Figure 2. Like many two-dimensional functions of psychophysical interest, the contrast sensitivity and decayed exponential response probability surfaces can be defined by factoring into a threshold function (relating threshold contrast to spatial frequency, and match-point to time) and an invariant psychometric function (relating probability of detection to stimulus contrast or match-point).
formalization of such a generalized two-dimensional testing procedure and provide a generic implementation of such an algorithm.
There are two primary advantages to characterizing the threshold function using parametric optimization, rather than independently measuring points along the threshold function. First and foremost, each point along the threshold function is informed by every other one: each trial contributes information about the psychometric functions at all values of the independent variable by changing the most likely values of the threshold function parameters (see Note 3). The utilization of data across the threshold function speeds up the selection of optimally informative stimulus parameters at each point as the sequence of trials progresses (Fig. 3). Thus, estimation of all thresholds is more efficient and fewer trials are necessary to converge on an estimate of the underlying function. Second, because any point along this function can be used to inform the overall estimate of the function parameters, estimation need not be restricted to several discrete points: instead one can sample any point along the function. Such continuous sampling not only produces prettier graphs (Fig. 3), but may also indicate systematic deviations from the assumed functional form that may otherwise be missed if the function is sampled at only a few discrete points.

Figure 3. 200 trials of a simulated experiment on the contrast sensitivity function and exponential time-course using isolated sampling (independent adaptive threshold estimation at each point; upper graphs), and adaptive functional sampling (lower graphs). Adaptive functional sampling can sample x values randomly, and samples fewer trials at uninformative points of the stimulus space, where the y value is far from its corresponding threshold. Dark grey circles correspond to negative responses; light grey circles to positive responses; the curves correspond to response probability quantiles of the final function fit.

We will begin with a formal description of functional adaptive sequential testing, then briefly describe the Matlab toolbox in which we have implemented this method, and present simulations and experiments demonstrating the reliability and efficacy of this tool.

4. Theoretical Framework
The goal of functional adaptive sequential testing (FAST) is to estimate efficiently the response probability as a function of stimulus strength (y) and an independent mediating variable (x). As described previously for the TvC function (Lesmes et al., 2006) and an ellipsoid function (Kujala and Lukka, 2006), this two-dimensional probability surface is parsimoniously parametrized by partitioning it into two functions: a psychometric function, and a threshold function that describes the translation of the psychometric function along the stimulus strength axis. First, the probability of a response is described by a psychometric link function:

pr = Ψ(yp, yc, S),
(1)
where pr is the probability of a particular response, which we will refer to as the positive response (for instance 'yes' in a detection task, the correct alternative in a forced choice task, or 'too much' in a matching task). Ψ is the psychometric function, which may be of any formulation desired (logistic, cumulative normal, etc.). yp is the presented stimulus strength — for instance, if estimating a contrast sensitivity function, the y dimension would typically be the logarithm of the test grating contrast. yc is the 'threshold' parameter that determines the translation of the psychometric function along the y dimension. S is the width (or inverse slope) parameter of the psychometric function — while psychometric functions may differ in what this slope parameter means, all have a parameter that corresponds to the steepness of the function. Other parameters of the psychometric function (such as lapse probability, or guessing probability) are typically fixed, and need not be estimated through testing.
The threshold function relates the critical value, or translation parameter, of the psychometric function described above to the independent variable of interest (for example, the spatial frequency of a grating in the case of the contrast sensitivity function):

yc = F(x, θ),
(2)
where F is the 'threshold function': a function that relates the independent variable, x, to the translation of the psychometric function, yc. θ is a vector of parameters of the threshold function, and the ultimate goal of the experiment is to estimate those parameters.
To make this abstract formulation more specific, we continue with the contrast sensitivity example and assume that we are estimating the probability of detection as a function of grating contrast and spatial frequency. We employ the logistic psychometric function for Ψ as a function of the log contrast yp (lapse probability parameters are excluded in this example). For F we adopt an intuitive parameterization of the CSF as a log parabola (Pelli et al., 1986), which allows sufficient flexibility in specifying yc as a function of the spatial frequency x (see Figs 1, 2 and 3 for plots of this function). Note that this is a simpler function than that of Watson and Ahumada (2005) as used in Lesmes et al. (2010), although it appears sufficient when no particularly low spatial frequencies are being tested:

pr = 1/(1 + exp((yp − yc)/S)),
yc = θ1 + 10^θ3 (log10(x) − θ2)².
(3)
Here yc corresponds to the decimal log of contrast threshold. The parameter θ1 is the decimal log of the contrast threshold at peak sensitivity, θ2 is the decimal log of the spatial frequency that yields peak sensitivity, and θ3 is the decimal log of the inverse bandwidth of the CSF.
For any trial with presented stimulus values xp and yp, the theoretical probability of either possible response (r, quantified as 1 for the 'positive' response or 0 for the other response option, corresponding to yes/no or correct/incorrect) can be stated for any set of model parameters (θ and S) as:

P(r | xp, yp, θ, S) = pr for r = 1; (1 − pr) for r = 0. (4)

Multiplying this probability by the prior likelihood function in the model parameter space in accordance with Bayes' theorem, we obtain the posterior probability of a given set of parameters (θ and S) as:

Ppost(θ, S | xp, yp, r) ∝ P(r | xp, yp, θ, S) Pprior(θ, S),
(5)
where Pprior(θ, S) corresponds to the prior probability of a given set of θ and S parameters. The prior can be uninformative at the onset of the experiment, but after the first trial will reflect any information gained from previous trials — that is, the posterior from the current trial will be the prior for the subsequent trial. With this equation, all of the elements necessary for Bayesian estimation of the posterior probability of the parameters of Ψ and F are in place, and we must next decide how to sample stimuli for efficient estimation of these parameters.

4.1. Stimulus Placement
Placement of a stimulus on any given trial determines the information we should expect to obtain on that trial. There are several available placement strategies that may be roughly divided along two dimensions. First, should we choose a global 'optimal' position in both dimensions (x and y), or a local optimum by picking the best y for a given x? Second, what defines 'optimality'? What follows is a description and discussion of this space of placement strategies as implemented in FAST.

4.2. Global Placement Strategies
One may decide to choose x and y values simultaneously to find the globally most informative stimulus parameters (Lesmes et al., 2006). The obvious advantage of choosing the globally most informative point is that the trial is guaranteed to provide the most useful information, given the assumed parameterization of the response probability surface. One drawback of global placement strategies is that if the form of the assumed threshold function is wrong, global minimization may choose (x, y) locations that would be most informative if the function were correct, but are not efficient for detecting deviations from this hypothesis and may not be efficient for estimating parameters of an alternative function, once the deviation is established. Moreover, the computations required for choosing the globally optimal stimulus may be slower than is desirable.

4.3. Local Placement Strategies
Alternatively, one may choose the locally optimal y value given a fixed x (choosing the most informative contrast given a pre-determined spatial frequency), while x is either sampled randomly, or according to some fixed scheme (for instance,
decay-time in adaptation experiments). Although local placement strategies are theoretically less efficient than choosing the global optimum, the lost efficiency is traded for increased robustness.

4.4. 'Optimality': Minimizing Expected Posterior Entropy
The principle of minimizing the expected posterior entropy of the multidimensional probability density function for the parameter values has recently been advocated as a method for stimulus placement (Cobo-Lewis, 1997; Kontsevich and Tyler, 1999; Kujala and Lukka, 2006; Lesmes et al., 2006). This method is motivated by information theory and selects stimuli so as to maximize the certainty about parameter values one may have after the trial, thus maximizing the information gained from that trial. This method has been described in sufficient detail in the papers cited above, and our procedure is broadly similar to theirs (see Appendix B). This method has the advantage of providing the theoretically most informative point, given the goal of increasing confidence about parameter values; however, the tradeoff between efficiency of estimating one or another parameter is made arbitrarily, based on the scale and resolution of those parameter values. In practice, small changes along one parameter might make a much larger difference to the response probability surface than large changes along another parameter, and the expected posterior entropy of the distribution of parameter values cannot take these factors into account.

4.5. 'Optimality': Minimizing Average Expected Posterior Predictive Variance
One principled method for trading off efficiency in estimating one parameter or another would be to explicitly take into account the impact of these parameters on the response probability surface. To do so, we define a new optimality criterion: minimizing expected average posterior predictive variance (or 'minimizing predictive variance' for short).
Essentially, this amounts to finding a stimulus value which is expected to alter the posterior distribution in a manner that decreases the uncertainty about the predicted threshold (y*) values across a range of x values deemed relevant (see Appendix C for a mathematical treatment). On this optimality criterion, the tradeoff between different parameters is made implicitly through their impact on the predicted threshold values: focusing on the statistics of the expected threshold distribution ensures that trials will not be wasted on efficiently estimating parameters that have little impact on the predicted thresholds. One could choose to minimize the entropy, rather than the variance, of the posterior predictive pdf of the threshold. But variance minimization is appropriate if larger errors carry a greater cost than small ones. Variance minimization is optimal when cost increases as the square of the error, while entropy minimization is not supported by any such increasing cost function, instead minimizing errors without regard to their size (Twer and MacLeod, 2001). The minimum expected variance criterion provides a principled method for trading off the precision of different parameters, one that can be applied generally to all threshold functions. But it has some drawbacks. It provides no incentive for efficient estimation of the slope parameter of the psychometric function. It is slower to compute than minimum expected posterior entropy, and, most important, in our hands (see simulation section) it does not in practice provide a substantial benefit.

4.6. 'Optimality': Obtaining Constant Response Probabilities
A classical, and intuitive, stimulus selection procedure favors sampling a specific response probability quantile: that is, choosing stimulus parameters that will yield a particular pr. This is done in many staircase procedures, transformed or weighted to converge to a specific quantile (Kaernbach, 2001). If the data are being used to constrain the slope of the psychometric function, multiple response probabilities must be tracked, much as in multiple staircase procedures. For the case of two-dimensional estimation, only one y value will produce a particular probability of response at a given x value. This y value can be found by adopting the current best estimate of the parameters θ and S: computing yc for the chosen x value from the threshold function and the current best estimate of θ, and then computing the inverse of the psychometric function to find the y value at quantile pr. But it is not logical to condition one's estimates of one parameter on definite assumed values for other parameters that are in fact uncertain. So in FAST, we instead combine all possible values in the parameter space in proportion to their posterior probability, thus avoiding premature commitment to a currently plausible estimate of the model parameters.
This space of placement strategies yields several plausible combinations: global and local entropy minimization, global and local minimization of predictive variance, and local quantile sampling. These alternatives are implemented in FAST, and their relative advantages are discussed in later sections.

5. Implementation
5.1. Estimating the Posterior Likelihoods
Functional adaptive sequential testing is designed to be a general procedure for estimating threshold functions. The estimation approach we have described may be implemented in a number of different ways, each with certain advantages and drawbacks. A number of probabilistic sampling methods are guaranteed to converge to the true posterior probability defined over the entire space of θ and S (e.g., MCMC; Kuss et al., 2005); however, these methods may take thousands of iterations to converge. For adaptive testing in psychophysical experiments, the posterior over θ and S should ideally be recomputed before each trial so that the stimulus for the next trial can be chosen most appropriately; thus MCMC strategies tend to be prohibitively slow. An alternative to running a full MCMC search is to employ sequential Monte Carlo (particle filtering). This method will also converge on the true posterior, and is efficient for online estimation problems. However, the optimal proposal distribution for the essential MCMC rejuvenation step (Kujala and Lukka, 2006) is usually specific to particular parameters, and as such, this step may be ineffective or slow in the generic case.
Thus, to make this computation speedy for any threshold function, we opted for a simple grid search (Kontsevich and Tyler, 1999; Lesmes et al., 2006). At initialization of the FAST structure, we define a lattice over parameter space. Each parameter is sampled within user-defined bounds (either logarithmically or linearly), with a particular uniform sampling density (this sampling density varies depending on the number of parameters, since for each added parameter, the total size of the parameter grid is multiplied by the number of sampled values). Each point in the lattice defined over parameter space corresponds to a certain vector of values of psychophysical function parameters (θ) and the psychometric slope parameter (S). Each point is also initialized with a prior probability (which may be informative, uninformative, or uniform, as deemed appropriate). After each trial, the log-posterior probability of each parameter lattice point is updated by adding the log of the likelihood of the new response under the parameter values represented by that lattice point. Thus, the un-normalized log-posterior is computed after each trial.

5.2. Dynamic Lattice Resampling
With only a finite number of points in the parameter lattice, there is an unavoidable tradeoff between the range and density of sampled parameters. From one experiment to the next, one might opt for a larger range, or a finer sampling grain, but inevitably, a situation will arise wherein both a large range and a fine sampling grain are required.
In such situations, if one samples too narrow a range (but with high enough density) one may find that the maximum a posteriori parameter values are not contained in the parameter lattice, and testing will be inefficient, if not useless; similarly, if one samples too coarsely (but with a sufficiently wide range), steps between lattice points may be too large for efficient testing. Although Lesmes et al. (2006) reported that the sampling density over parameter space did not substantially alter the efficiency of the qTvC algorithm, we find that in cases of more extreme undersampling these problems do arise. In the qTvC case, such problems can, for the most part, be avoided due to the small number of parameters required for the linear TvC function. However, for functions like the CSF, where the total number of parameters (including slope) is 5 or higher, the number of lattice points rises with n5 (or even steeper), where n is the number of sampled points along each parameter. In such cases, the computational costs associated with high sampling densities become prohibitive. Since FAST is designed to operate in the general case, with any threshold function, this is a considerable problem. Our solution is to adopt dynamic resampling, or a ‘roving range’. Once some considerable amount of data has been gathered (enough to provide a reasonable estimate of the probability surface over the current parameter lattice), one may resample the parameter lattice such that it either shifts or shrinks appropriately to converge on the global maximum.
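A minimal sketch of the lattice update and the 'roving range' for a single threshold parameter might look as follows. This is Python for illustration only (the toolbox itself is Matlab); the one-parameter lattice, the increasing logistic sign convention, and all names are our own simplifying assumptions:

```python
import numpy as np

def update_log_posterior(log_post, grid, y_p, r, S=0.4):
    """One Bayesian trial update: add the log-likelihood of response r (1 or 0)
    at stimulus strength y_p to every point of the parameter lattice."""
    p = 1.0 / (1.0 + np.exp(-(y_p - grid) / S))   # P(r = 1 | threshold = grid point)
    p = np.clip(p, 1e-12, 1.0 - 1e-12)            # guard the logarithm
    return log_post + (np.log(p) if r else np.log(1.0 - p))

def roving_range(log_post, grid, n=101, k=2.0):
    """'Roving range': recenter a fresh lattice on the posterior mean,
    spanning k standard deviations on either side of it."""
    w = np.exp(log_post - log_post.max())         # normalize in a stable way
    w /= w.sum()
    mu = float((w * grid).sum())
    sd = float(np.sqrt((w * (grid - mu) ** 2).sum()))
    return np.linspace(mu - k * sd, mu + k * sd, n)
```

Because only un-normalized log-posteriors are kept, each trial costs a single vectorized likelihood evaluation over the lattice, which is what makes per-trial recomputation feasible.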
Specifically, we obtain an estimate of the parameter values, and of the uncertainty about those values (the marginal mean and standard deviation; see next section). Using these estimates, we pick a new range that is centered on the marginal mean and spans several standard deviations on either side of the mean (2 is used in the subsequent simulations). With this method, an initial coarse range will shrink as additional information is gleaned about the underlying parameter values — similarly, an initially misplaced parameter range will shift to higher-probability regions of parameter space. Although the details of this algorithm are rather unprincipled, we find that in practice, dynamic resampling is quite helpful (see simulations in later sections).

5.3. Estimating Parameter Values
From the computed joint posterior probability density (pdf) over the parameter lattice, there are several possible ways to derive a point estimate and corresponding confidence measure for each parameter value. We focus mainly on the 'marginal' probability density function, in which the N-dimensional pdf in the space of parameter values is integrated over all parameters except the one of interest, to yield 1-dimensional pdfs for each parameter in turn (this may be loosely thought of as the profile of the N-dimensional joint probability density as projected onto the axis of interest). The peak of the marginal pdf does not generally agree with the peak of the N-dimensional pdf. The latter estimate, from the 'maximum a posteriori' (MAP) point in the N-dimensional space of parameter values, would be appropriate if we knew that the values of all other parameters were the ones associated with the MAP point. But in reality, of course, the values of all parameters are uncertain, and the probability of any given value for a single parameter of interest has to include all the cases generated by variation in the other parameters. This is what the marginal pdf represents (illustrated in Fig. 4).
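The marginalization and Gaussian summary can be sketched as follows. This is Python for illustration; moment matching is used here as a simple stand-in for the least-squares Gaussian fit to the lattice probabilities described in the text:

```python
import numpy as np

def marginal_pdf(joint, axis):
    """Marginal pdf for one parameter: sum the joint lattice pdf over all other axes."""
    other = tuple(i for i in range(joint.ndim) if i != axis)
    m = joint.sum(axis=other)
    return m / m.sum()

def gaussian_summary(values, pdf):
    """Mean and sd of a Gaussian matched to the sampled marginal: a moment-matching
    stand-in for the Gaussian fit, less sensitive to a coarse lattice than raw peaks."""
    mu = float((pdf * values).sum())
    sd = float(np.sqrt((pdf * (values - mu) ** 2).sum()))
    return mu, sd
```

Summing over the other axes, rather than slicing through the MAP point, is what folds the uncertainty about all other parameters into the estimate for the parameter of interest.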
In specifying the confidence interval or standard error for the estimates, similar considerations arise. The most likely value for one parameter generally depends, often strongly, on the values assumed for the other parameters. This creates a ridge of high probability in parameter space. The standard error for one parameter is often calculated on the assumption that the parameter of interest is varied in isolation, with other parameters pinned at fixed values. The resulting error measure reflects the width of that ridge (usually, the cross-section through the MAP point in the relevant direction). But when the concurrent uncertainty about other parameters, as reflected in the N -dimensional pdf, is taken into account, the appropriate ‘integrated’ uncertainty measure is determined from the marginal pdf. FAST reports this ‘integrated standard error’, which is commonly much greater than the ‘isolated’ standard error obtained from the cross-section through the MAP point. In practice, FAST samples the continuous marginal pdf for any parameter at lattice points equally spaced over a finite range. The lattice can be recentered and rescaled by resampling, as described above, but the estimate of the mean and standard error of the pdf are always prone to distortion by truncation and sampling error.
Figure 4. Parameter estimation procedure. We begin with a multidimensional probability density function (pdf) sampled at a few discrete lattice points (grey bars). By summing over all other parameters, we obtain a marginal probability distribution for a parameter of our choosing (two sets of black bars correspond to these marginal distributions). We then fit a Gaussian distribution to these marginals to obtain an estimate of the marginal mean and standard deviation that is less susceptible to truncation (grey lines).
Notably, when the parameter lattice is too coarse and concentrates the probability mass at one sample point, the mean regresses toward that point, and the standard deviation based on the sampled points is an underestimate of the standard deviation of the complete pdf. Both in FAST's intermediate computations and in its final estimates, these problems are addressed by fitting a Gaussian to the lattice point probabilities (see Fig. 4), and using the mean and standard deviation of the fitted Gaussian, rather than the sample values, to represent the mean and standard deviation of the continuous pdf.
By focusing on the 1-dimensional pdfs during intermediate computations, FAST initially neglects the correlational structure of the N-dimensional pdf. That structure might seem important not only for a complete characterization of the uncertainty in parameter values, but also for directing the resampling process described above along the ridge of high probability in parameter space. The resampling procedure in FAST addresses this by always recentering the sample points for each parameter in turn. In this way the sequence of N 1-dimensional translations moves the lattice efficiently along the ridge.
The Gaussian description may be poor in cases where the contours of the posterior probability function in parameter space deviate markedly from symmetry around the peak, as commonly happens when the slope parameter S of the psychometric function must be estimated from the data. In these cases the mean of the Gaussian fit to the marginal pdf differs from the mode of the marginal pdf as well as (generally) from the MAP. But even here the mean remains the optimum estimate in the least squares sense, and minimizes the cost of errors if cost increases as the square of the error (see Note 4).

5.4. Choosing Stimuli
Choosing informative stimuli by minimizing the entropy of the expected posterior probability distribution requires that the expected posterior probabilities be evaluated for every stimulus of interest. This is impossible in practice, and numerical approximations must be used. To carry out these calculations for global stimulus placement strategies, we define conditional probability look-up tables for a pre-defined range of stimulus values. The conditional probability look-up table specifies the probability of a positive or of a negative response under each set of parameters in the lattice for a given stimulus. These conditional probabilities are the factors by which the relative likelihood of each set of parameter values is changed when the response is known. Using this look-up table, one can compute, for any proposed stimulus placement, the expected posterior entropy for a trial with that stimulus placement, and choose the stimulus placement that minimizes this value. In this manner, the expected posterior probability over the space of x and y stimulus values may be computed with reasonable efficiency. To circumvent the loss of resolution associated with discrete sampling of the stimulus space in the pre-defined look-up table, we also implement quadratic interpolation to estimate the optimal stimulus values, as we did for the interpolated MAP estimator.
Carrying out the conditional probability calculations for local stimulus placement is easier because fixing x greatly reduces the number of alternative stimulus placements that need to be considered.
Thus, the optimal stimulus strength y for a particular x value is a relatively fast computation that can usually be made on a trial-by-trial basis without constructing a conditional probability look-up table for all x values in advance. Our implementation of this search is recursive: we coarsely sample a range of y values and compute the expected posterior entropy for each. We then choose the y value that minimizes this quantity, and resample more finely around this y value. A few iterations quickly produce a fine-grained optimal estimate, typically much faster than finely sampling the full initial range of y values.

6. Simulation Results
In this section we present computer simulations that illustrate the efficiency and flexibility of FAST. In these simulations, all parameter estimates are obtained using the mean of a Gaussian fit to the marginal pdf (see the section on Estimation for details and a priori justification; see Fig. 9 and accompanying discussion for empirical justification).
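Before turning to the simulations, the recursive coarse-to-fine entropy search just described might be sketched, for a one-parameter lattice, roughly as follows (Python for illustration; the logistic sign convention and all names are our own assumptions):

```python
import numpy as np

def expected_entropy(y, grid, post, S=0.4):
    """Expected posterior entropy over a 1-D threshold lattice after a trial at y."""
    p = 1.0 / (1.0 + np.exp(-(y - grid) / S))     # P(r = 1 | threshold = grid point)
    p_r1 = float((post * p).sum())                # predictive probability of r = 1
    ent = 0.0
    for pr, lik in ((p_r1, p), (1.0 - p_r1, 1.0 - p)):
        q = post * lik                            # posterior if this response occurs
        q = q / q.sum()
        ent += pr * float(-(q * np.log(q + 1e-12)).sum())
    return ent

def best_y(grid, post, lo, hi, iters=4, n=11):
    """Coarse-to-fine recursive search for the y minimizing expected entropy."""
    for _ in range(iters):
        ys = np.linspace(lo, hi, n)
        ents = [expected_entropy(y, grid, post) for y in ys]
        i = int(np.argmin(ents))
        step = (hi - lo) / (n - 1)
        lo, hi = ys[i] - step, ys[i] + step       # zoom in around the current minimum
    return float(ys[int(np.argmin(ents))])
```

For a unimodal posterior the minimum sits near the posterior mean of the threshold, which is why the zooming search converges in only a few iterations.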
We evaluate the parameter estimates obtained from a particular simulation in terms of their error:

MSE(j) = (1/n) Σi=1..n (θ̂i,j − θi,j)², (6)

where θ is the true parameter vector, θ̂ is the parameter estimate, i is the experiment number, and j is the parameter number. Thus, the error MSE(j) is the mean squared deviation, to which bias and variance both contribute.

6.1. Efficiency in an Extreme Case
The FAST algorithm is most effective when estimating functions with many sampled points. It was developed for the purpose of estimating time-courses of aftereffects — a scenario in which the gains from functional estimation over isolated estimation are most dramatic. We compare adaptive functional testing to adaptive testing at several discrete x values without assuming an underlying threshold function — 'the method of 1000 staircases' (Mollon et al., 1987; or, more precisely, 100 staircases, to simulate a sensible time-course experiment, Vul et al., 2008). Each independent staircase tests adaptively according to our implementation of the Psi testing algorithm described by Kontsevich and Tyler (1999).
For these simulations, we adopt a logistic psychometric function, as would be natural to use in a matching task, and assume that the underlying threshold function is an exponential decay:

yc = θ1 + (θ2 − θ1)(1 − exp(−x/θ3)),
(7)
where θ1 is the initial (pre-adaptation) threshold value, θ2 is the saturation point, and θ3 is the time-constant. We compare the error in estimating these parameters over 1000 trials when those trials are sampled according to 100 independent staircases (at each of 100 time-points), to estimation over those same time-points using functional adaptive testing, where the stimulus placement at each point is informed by all other points. For each of the 250 simulated experiments, we randomly choose new parameter values for θ and S. θ1 is drawn uniformly from the range of −4 to 4; θ2 is drawn uniformly from the range of −15 to 15; θ3 is drawn uniformly from the range of 1 to 50; and S is fixed at 0.4. The prior probabilities over these values for the estimation are: θ1: normal with μ = 0, σ = 3; θ2: normal with μ = 0, σ = 8; θ3: log10-normal with μ = 0.75, σ = 0.5; S: log10-normal with μ = −1, σ = 0.8 (the parameters that have log-normal priors are those with [0, ∞) support). Figure 5 shows the results of our simulations: functional testing confers a substantial early advantage over independent testing at different points along the function, and this initial advantage is sustained over 1000 trials: nearly all (local)
102
E. Vul et al.
Figure 5. The mean squared error as a function of the number of trials for different stimulus placement strategies estimating an exponential decay curve (as often used to describe the time-course of aftereffects). The four panels represent different parameters, from top to bottom: initial null-setting value (at x = 0), saturated null-setting value, time-constant, and logistic slope parameter. Mean squared error decreases substantially faster under all functional sampling schemes compared to independent adaptive estimation procedures at each timepoint (of 100), and this initial advantage persists throughout the course of many trials.
functional testing strategies outperform independent testing throughout the duration of the ‘experiment’, thus validating the basic premise of FAST: substantial testing and estimation efficiency may be gained by explicitly considering the function relating variation in thresholds to an independent variable.

6.2. Efficiency in a Less Extreme Case (CSF)

The previous example was designed to illustrate an extreme situation in which one seeks to obtain threshold estimates at 100 different points along a function — in this
Functional adaptive sequential testing
103
case, running 100 independent staircases is grossly inefficient compared to sampling those 100 points while considering the underlying function. However, few psychophysical experiments aim to assess a function with that level of precision; often, a much smaller set of points is used. In this section, we aim to both illustrate the generality of functional adaptive testing by applying it to the contrast sensitivity function, and also to quantify the reduced benefit of functional adaptive testing over independent testing when fewer points are considered. In this case we adopt a logistic psychometric function with a 4AFC detection task (mimicking the psychophysical data presented later in this paper), and assume that the underlying threshold function is the log-parabola contrast sensitivity function from equation (3) and Figs 1–3. We compare the error in estimating the three parameters of this function over 800 trials when those trials are sampled according to 14 independent staircases (at each of 14 spatial frequencies, equally spaced on a logarithmic scale), to estimation over those same spatial frequencies where the stimulus placement at each point is informed by all other points via the commonly estimated function. For each of the 100 simulated experiments, we randomly choose new parameter values for θ and S. θ1 is drawn uniformly from the range of −4 to −1; θ2 is drawn uniformly from the range of −1 to 1.5; θ3 is drawn uniformly from the range of −1.5 to 0.5; and S is drawn from the range of 0.01 to 0.4. The prior probabilities over these values for the estimation are: θ1: normal with μ = −2, σ = 2; θ2: normal with μ = −1, σ = 2; θ3: normal with μ = −1, σ = 0.3; S: log10-normal with μ = −1, σ = 1. Figure 6 reveals the predicted effect — with only 14 points being independently tested, the advantage of functional testing is much less dramatic than with 100 points. Nonetheless, a considerable advantage remains and lasts over the whole range of trials.
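As an illustration (in Python; the published toolbox itself is Matlab), the simulated 4AFC observer used in this comparison can be sketched as follows. The log-parabola form follows equation (3); the 0.25 lower asymptote for a four-alternative task, and all function and variable names, are our assumptions:

```python
import math

def log_parabola_csf(freq, t1, t2, t3):
    """Log contrast threshold at spatial frequency `freq` (equation (3)):
    t1 sets the threshold at the peak, t2 the log-frequency of the peak,
    and t3 the rate at which sensitivity drops off."""
    return t1 + 10.0 ** t3 * (math.log10(freq) - t2) ** 2

def p_correct_4afc(log_contrast, threshold, slope):
    """Logistic psychometric function with a 0.25 guess rate for 4AFC;
    at the threshold itself the observer is correct 0.25 + 0.75/2 = 0.625
    of the time."""
    return 0.25 + 0.75 / (1.0 + math.exp(-(log_contrast - threshold) / slope))

# 14 spatial frequencies equally spaced on a log scale (the 0.5-40 c/deg
# range here is illustrative, not taken from the paper)
freqs = [10 ** (math.log10(0.5) + i * (math.log10(40) - math.log10(0.5)) / 13)
         for i in range(14)]
thresholds = [log_parabola_csf(f, -2.0, 0.5, -0.5) for f in freqs]
```

Note that performance at threshold is 62.5% correct under this parameterization, which is consistent with the 62.5% quantile plotted for the psychophysical data later in the paper.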
Again, as can be seen in the results for the slope parameter, the optimality criterion has an important effect: stimulus placement via posterior predictive variance minimization is particularly poor at estimating the slope parameter — this is to be expected, as the variance of the predicted threshold is completely independent of the slope, so stimuli are not chosen to estimate this parameter.

6.3. Dynamic Lattice Resampling

Although Lesmes et al. (2006) found no substantial effects of decreasing the parameter sampling grain for the qTvC method, all of the sampling grains tested were rather fine. This is practicable for the qTvC case, in which there are only three parameters to be estimated. However, in the general case, where one might aspire to test different functions with potentially higher-dimensional parameter spaces, the sampling density becomes an issue (indeed, the total number of lattice points increases exponentially with the number of dimensions). Thus we tested the effects of rather dramatic under-sampling of parameter space: 5 lattice points per parameter.
Figure 6. The mean squared error as a function of the number of trials for different stimulus placement strategies estimating a contrast sensitivity function. The four panels represent different parameters (see equation (3)). Here, the advantage of functional estimation over independent estimation is smaller because fewer independent staircases (14) are used. Moreover, a marked detriment is evident in the precision of estimating the slope parameter under predictive variance minimization.
Figure 7 shows the perhaps intuitive results of under-sampling: lower sampling density effectively caps the accuracy of parameter estimates. A coarser parameter sampling grain results in a higher asymptotic RMSE for all parameters. However, when we allow dynamic resampling of the 5³ lattice (the ‘roving range’ we implemented), this problem is substantially reduced: we achieve much lower asymptotic errors by dynamically shrinking and resampling the (under-sampled) parameter lattice.
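The roving-range idea can be illustrated with a minimal Python sketch (hypothetical names and a one-dimensional lattice; the toolbox's own implementation is fastResample): re-center a coarse lattice on the current posterior mass and shrink it to span a fixed number of posterior standard deviations.

```python
import math

def rove_lattice(values, posterior, n=5, span=4.0):
    """Re-center and shrink a 1-D parameter lattice around the current
    posterior mass: the new n-point lattice spans mean +/- (span/2) SDs."""
    total = sum(posterior)
    mean = sum(v * p for v, p in zip(values, posterior)) / total
    var = sum((v - mean) ** 2 * p for v, p in zip(values, posterior)) / total
    half = span * math.sqrt(var) / 2.0
    return [mean - half + i * (2 * half) / (n - 1) for i in range(n)]

# a coarse 5-point grid concentrates where the posterior peaks
new_lattice = rove_lattice([0, 1, 2, 3, 4], [0.05, 0.2, 0.5, 0.2, 0.05])
```

With 5 points per parameter, a 3-parameter model needs only 5³ = 125 lattice points, and resampling compensates for the coarse grain.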
Figure 7. A ‘roving range’ (dynamic lattice resampling) mitigates the problems that arise from sparse sampling of the parameter lattice — by shifting and shrinking the lattice, we can achieve greater accuracy.
6.4. Effects of Assuming an Incorrect Function

Here we consider the effects of assuming an incorrect model of the data. Although global entropy minimization is theoretically most efficient for estimating the parameters of the assumed model, the data gathered by global stimulus placement may not be as effective at constraining the parameters of another model (which may turn out to be the reality underlying the data). To test this idea, we run 250 experiments, each with 800 trials, in which we attempt to fit the correct (non-linear) TvC function to data that were gathered on the assumption of a linear TvC function, using two sampling schemes: (1) global entropy minimization, and (2) local entropy minimization. Specifically, we conduct simulations in which the real model is a non-linear TvC function:

yc = θ2 (x/θ1)^θ3 for x > θ1, and yc = θ2 for x ≤ θ1,  (8)
Figure 8. When stimulus sampling is done assuming a linear TvC function (equation (9)) but the correct model is in fact a non-linear TvC curve (equation (8)), global stimulus placement schemes incur a substantial penalty, as seen in the final mean squared error when fitting the final, correct, function.
while functional testing assumes a simpler, linear TvC function:

yc = θ2 (x/θ1) for x > θ1, and yc = θ2 for x ≤ θ1.  (9)
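For concreteness, the two competing TvC forms of equations (8) and (9) can be written as short Python functions (a sketch with our own parameter names; the toolbox itself is Matlab):

```python
def tvc_nonlinear(x, t1, t2, t3):
    """Non-linear TvC (equation (8)): flat at t2 below the 'knee' at t1,
    rising as a power law with exponent t3 above it."""
    return t2 * (x / t1) ** t3 if x > t1 else t2

def tvc_linear(x, t1, t2):
    """Linear TvC (equation (9)): the special case of exponent 1."""
    return t2 * (x / t1) if x > t1 else t2
```

The linear form is nested within the non-linear one (set t3 = 1), which is why the two models disagree only subtly and why data concentrated by global placement under the wrong model fail to reveal the deviation.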
Figure 8 shows that data gathered with the goal of global entropy minimization are far less effective at constraining the parameters of an alternate (and only slightly different) model, yielding substantial bias and greatly inflated mean squared error relative to parameter values estimated using local entropy minimization. For this reason, we advocate using fixed or random sampling of x values, and local, functional, stimulus selection.

6.5. Validation of Estimator and Estimation Procedure

Thus far, we have presented data describing the efficiency of different adaptive sampling schemes while the estimates themselves were obtained using a somewhat rarely adopted estimation procedure: fitting a Gaussian probability density function to the marginals of each parameter, and using the resulting mean and standard deviation. In this section, we demonstrate the validity of this estimator by empirically calibrating the confidence intervals obtained from it. We ask: does a q% confidence interval around the mean of this Gaussian have a q% probability of containing the true value (in simulations)? This can be measured in two ways: Z-quantiles over time, and quantile–quantile plots. To obtain Z-quantiles over time, for each simulated experiment, trial, and parameter, we compute the mean and standard deviation of the current Gaussian fit to the marginals. We compute Zerr = (θ − μ̂)/σ̂, where Zerr is the Z-score distance of the true parameter value θ from the Gaussian mean (μ̂) and standard deviation (σ̂) estimated for that parameter. By computing this quantity for each trial, each experiment, and each parameter, we obtain a large number of Z-scores. To check the calibration of the Gaussian fit to reality, we then compute the Z-quantiles, asking: what is the Z-score threshold such that q% of all computed Zerr values fall below it? If the procedure is calibrated, we should expect these quantiles to remain constant over an experiment (despite the fluctuations in mean squared error seen in Fig. 5), and to correspond to the theoretically correct quantiles: Zthresh(q) = G⁻¹CDF(q, μ = 0, σ = 1), where G⁻¹CDF is the inverse of the Gaussian cumulative distribution function. For example, q = 0.025 yields the familiar Zthresh(q) = −1.96. Figure 9 shows that this plot for the exponential data from Fig. 5 reflects an ideal, stable calibration (similar results obtain for other simulations). A quantile–quantile (QQ) plot is a fine-grained summary of the results presented above. To compute the QQ plot in Fig. 9, we calculate the Z-score boundary of a q% confidence interval around the mean: CI = G⁻¹CDF(0.5 ± q/2, 0, 1). Then we
Figure 9. (Left panel) Z-quantile calibration: here we show a representative sample of two parameters (the initial setting in an exponential decay, and the slope parameter from an exponential decay — see equation (7)). Z-score quantiles of parameter errors are stable over time, and near their theoretical values. (Right panel) This calibration of the Gaussian fit to the marginal distribution can be seen in the QQ plot, where the confidence interval percentile predicts with near-perfect identity the empirical quantile (e.g., 95% of true parameter values lie within a 95% confidence interval). Different lines here correspond to different parameters of the exponential; all are near the unity line, indicating that our estimator is generally well calibrated.
compute the proportion of trials, p, in which the true parameter value lies within this confidence range. Figure 9 plots p as a function of q. A perfectly calibrated plot will yield a perfect diagonal with unity slope, such that every q% interval contains p% of the true values. Our data are well aligned with this ideal, showing none of the over-confidence (a line below the diagonal) of truncated marginal estimators, or much of a miscalibration due to the true posterior not being Gaussian (often revealed as a crossing of the diagonal).

7. Psychophysical Results

FAST has been used to advantage in our investigations of topics including the time course of the generation and decay of afterimages; the dynamics of contrast gain control; and contrast sensitivity functions. We next report an experiment designed specifically to provide a comparison between the efficiency of FAST and other methods. This was a CSF measurement in which trials generated by FAST are interleaved with trials generated by a similar search algorithm operating independently at each spatial frequency. We used a four-alternative forced-choice procedure to measure contrast sensitivity as a function of spatial frequency. The stimulus was an annulus displayed for 200 ms. On each trial, one quadrant, randomly selected, contained a radial sinusoidal contrast grating, while the other three quadrants were of uniform luminance. Subjects were asked to indicate which quadrant was nonuniform. The luminance of each quadrant was independently offset by a small random amount on each trial, to prevent detection of the nonuniform quadrant through a difference in average luminance (despite a gamma-corrected display). Eight spatial frequencies were presented in a cycle, and an equal number of trials were presented at each spatial frequency. To compare the performance of FAST with the performance of independent staircases, we used two conditions, which ran simultaneously with trials randomly interleaved.
The difference between the two conditions was in the method used to select the level of contrast for the stimulus grating. In both cases, the contrast level was chosen so as to be detectable at a rate between 0.65 and 0.85, based on data from previous trials. In the independent condition, the detection threshold was estimated for each spatial frequency based only on data collected from that spatial frequency. In the CSF condition, FAST estimated the detection threshold based on data previously collected from all spatial frequencies, under the assumption that the contrast sensitivity function would fit the form of a log parabola CSF (Pelli et al., 1986). The log parabola CSF model fits a parabola to the logarithm of the spatial frequency (equation (3)), such that θ1 determines the level of contrast sensitivity at the peak (shifting the parabola up and down); θ2 determines at what spatial frequency this peak occurs (shifting the parabola left and right); and θ3 determines the rate at which sensitivity drops off (stretching the parabola horizontally). Data were only used to inform the adaptive testing procedure in the same condition, so the CSF
and independent conditions remained independent in their parameter estimates and choice of stimuli throughout the experiment. After data were collected, they were pooled to calculate a final estimate of contrast sensitivity, both independently at each spatial frequency, and using the log parabola CSF (Fig. 10). We used the final independent estimates of contrast threshold at each spatial frequency x (y*_x) as ‘ground truth’ to evaluate the error. Mean squared error for each trial, for each condition, was computed using equation (6). The estimated contrast threshold, ŷ*_x, for a given spatial frequency was calculated using the same method by which data were collected for that condition: for the independent condition, each spatial frequency’s threshold was calculated independently, and for the CSF condition FAST estimated the threshold using the log-parabola CSF. For both conditions, the estimate at trial t was compared against the final independent estimate y*_x based on all data from every condition at a given spatial frequency. In the first 100 or 200 trials, the FAST condition obtains a more accurate estimate than the independent condition, owing to the advantages of pooling information across spatial frequencies and more effective stimulus placement. Thus, as our simulations suggested, FAST can be used to expedite the estimation of contrast sensitivity functions, even when a small number of spatial frequencies are tested (for a thorough treatment of expedited CSF estimation through functional adaptive testing, see Lesmes et al., 2010). However, after about 200 trials, the precision of the independent threshold estimates becomes sufficient to uncover systematic deviations from the model. If the model cannot fit the data perfectly (as is likely in most real-world cases), the error of a model-based estimate will have a nonzero asymptote: even after the parameters reach their optimal values, there will still be a difference between the model and reality.
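The per-trial error measure of equation (6) amounts, in essence, to a mean squared deviation of the running threshold estimates from the final independent ('ground truth') estimates; a minimal Python sketch under that assumption, with hypothetical names:

```python
def mse_vs_ground_truth(estimates, ground_truth):
    """Mean squared error of per-frequency threshold estimates (the running
    estimates at some trial t) against the final independent estimates
    y*_x, one value per spatial frequency."""
    assert len(estimates) == len(ground_truth)
    return sum((e - g) ** 2
               for e, g in zip(estimates, ground_truth)) / len(estimates)
```

Computing this quantity after every trial, separately for the FAST and independent conditions, yields curves like those in the top panel of Fig. 10.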
The independent estimate, providing more degrees of freedom, can be expected to perform better in the long run, and does so here; but, of course, data collected efficiently using FAST may later be fit with independent thresholds to take advantage of these additional degrees of freedom.

8. Discussion

Since Fechner (1860) suggested that perception can be measured against a physical standard, researchers have been estimating psychophysical thresholds and match points across the gamut of perceptual domains. As the sophistication of psychophysical experiments has increased, so too has the demand for efficient measurement of thresholds, and experimental methods have been gradually improving to meet this demand. However, as yet, no general two-dimensional threshold estimation procedure has been proposed. We have described a general and efficient algorithm for estimating two-dimensional response probability surfaces, and provided a Matlab implementation of the algorithm. While the FAST toolbox provides a variety of alternative strategies for stimulus placement and parameter estimation, here
Figure 10. (Top panel) Mean squared error (log scale) for estimated threshold values as a function of the number of trials (log scale). Predictions using isolated staircases are less accurate than those from functional adaptive testing. (See text for details.) (Center panel) Simulations using the final threshold estimates obtained from the psychophysical data and the same simulated testing procedure — these results indicate that, given identical model mis-specification, our idealized simulations predict gains similar to those seen for real human observers. (Bottom panel) Final model fit to the data (lines correspond to different response probability quantiles — 40%, 60% and 80%) and final independent threshold estimates of the 62.5% quantile (asterisks); in this case, the simple log-parabola CSF function appears to be a satisfactory (albeit imperfect) model of the data.
we want to briefly discuss the strategies that seem best, and potentially useful future extensions of this work.

8.1. Assumptions of Functional Adaptive Sequential Testing

As mentioned throughout the paper, the functional adaptive testing formulation does make several important assumptions, which may yield imprecise, and perhaps misleading, results if violated. First, like nearly all psychophysical testing procedures and experimental designs, our formulation assumes that experimental trials are exchangeable — in other words, they are invariant to order. This assumption is crucial to essentially all cognitive and psychophysical experiments, but may often be violated due to a temporal correlation of errors (e.g., from phasic internal states) or learning effects. For basic threshold estimation, FAST may be used to account for online learning by modeling the threshold as an exponential learning curve, thus allowing for some robustness to these effects; however, if FAST is used to estimate a threshold function, like a CSF, learning effects will constitute unaccounted-for variability. This is not a serious problem — in the worst-case scenario, such learning or temporal correlation effects will increase response variance and thus reduce efficiency when testing human participants, but this loss of efficiency will apply to any common testing procedure. Second, and more critically, we must assume that the psychometric function is translation invariant in both stimulus strength and the independent variable of interest; in other words, a single slope parameter captures the relationship between increasing stimulus strength and probability of positive response regardless of variations of the independent variable (e.g., spatial frequency) and critical value (e.g., 75% discrimination contrast threshold).
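The translation-invariance assumption can be stated compactly in code: with stimulus strength expressed in log units, changing the threshold merely shifts the psychometric function along the axis without changing its shape. A minimal sketch, assuming a logistic form and hypothetical names:

```python
import math

def p_positive(intensity, threshold, slope):
    """Logistic psychometric function in log-intensity coordinates.
    A single slope serves every threshold: changing `threshold` only
    translates the curve along the log-intensity axis."""
    z = (math.log10(intensity) - math.log10(threshold)) / slope
    return 1.0 / (1.0 + math.exp(-z))
```

For any two thresholds t and t′ and any contrast ratio c, p_positive(c*t, t, s) equals p_positive(c*t′, t′, s) — exactly the invariance that lets FAST share one slope parameter across the whole threshold function.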
Within the apparently wide jurisdiction of ‘Crozier’s Law’ of proportionality between threshold stimulus magnitudes and their standard deviations (Treisman, 1965), the translation assumption holds provided that the psychometric function is expressed in terms of the logarithm of the stimulus magnitude. With that caveat, the translation assumption appears to be approximately valid in the literature (Watson and Pelli, 1983), but it may well be violated in particular psychophysical or cognitive paradigms. Third, FAST requires an assumption of a particular parametric form of the threshold function — this assumption will generally be insecure and at best inexact (for instance, will delay discounting follow an exponential or a power law?). For this reason, we prefer local to global stimulus placement (see next section), thus trading off some theoretical efficiency in favor of robustness to an incorrect assumed functional form. We stress that FAST is not rendered useless by failure of the theoretical form assumed for the threshold function. On the contrary, it provides an efficient method for detecting small deviations from the assumed theoretical form. But the efficiency of FAST for that purpose, and for threshold estimation in general, declines when the deviations of the theoretical model from reality exceed the current threshold uncertainty. This limitation is illustrated by the results of Fig. 10
(top panel) after large numbers of trials. We have also found that FAST can efficiently distinguish between models of different form: multiple FAST algorithms can be run concurrently, each using a different candidate threshold function; on each trial, a threshold function is randomly selected, and stimuli are chosen based on that threshold function, while the data are used to update the parameters of all functions. This procedure yields some confidence that no threshold function will benefit from a bias due to adaptive testing.

8.2. Global or Local Stimulus Placement?

Theoretically and empirically, selecting the point in stimulus space that is globally most informative generally results in faster convergence to the true function parameters than alternative methods. Indeed, so long as the psychometric function is monotonic, no disadvantages are incurred by choosing the optimal stimulus parameters in the unidimensional threshold estimation case. However, for two-dimensional estimation — particularly if the variation of response probability along one dimension is not monotonic — the stimulus that is globally most informative in theory may not be most informative in practice. When employing global optimization for stimulus placement, there is no guarantee that the stimulus space will be sampled in an intuitively informative manner. Once a particular form of the threshold function (and hence, the two-dimensional probability surface) is assumed, there will often be particular locations in stimulus space that are always most informative about the parameters of the assumed function as the experiment progresses. For example, if one assumes that an aftereffect decays to zero, and attempts only to estimate the initial aftereffect strength and the rate of decay, the points that would best constrain these two parameters are in the initial period of rapid decay, and there is no need to sample stimuli once the aftereffect has decayed to the (assumed) limit of zero.
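The local placement strategy — fixing the x value in advance and letting the algorithm choose only the stimulus strength y by expected-entropy minimization — can be sketched in Python (hypothetical interface and names; the toolbox's own stimulus chooser is fastChooseY):

```python
import math

def expected_entropy_after_trial(prior, p_resp):
    """Expected posterior entropy over a parameter lattice after one trial.
    prior: lattice probabilities; p_resp: P(positive response | lattice point)."""
    def entropy(ps):
        return -sum(p * math.log(p) for p in ps if p > 0)
    p_yes = sum(q * p for q, p in zip(prior, p_resp))
    out = 0.0
    for resp_p, label_p in ((p_yes, p_resp),
                            (1 - p_yes, [1 - p for p in p_resp])):
        if resp_p > 0:
            # Bayes update of the lattice, weighted by response probability
            post = [q * lp / resp_p for q, lp in zip(prior, label_p)]
            out += resp_p * entropy(post)
    return out

def choose_local_stimulus(prior, candidate_stimuli, p_model):
    """Local placement at a fixed x: pick the stimulus strength y whose
    expected posterior entropy is smallest. p_model(i, y) gives
    P(positive response | lattice point i, stimulus y)."""
    return min(candidate_stimuli,
               key=lambda y: expected_entropy_after_trial(
                   prior, [p_model(i, y) for i in range(len(prior))]))
```

With a two-point lattice whose members predict very different thresholds, the chosen stimulus falls between them, where the response discriminates the hypotheses best.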
In all such cases, stimulus selection based on global entropy minimization will choose trials isolated to a few particular regions of the stimulus space. This is optimal if one is correct about the form of the threshold function. However, the assumed functional form may not be accurate; for example, an aftereffect may not actually decay to zero, but may have a persistent component (Vul et al., 2008). In this case, the stimulus parameters that seemed to be most informative under incorrect assumptions about the underlying function will actually fail to inform about the deviation. Because of such cases, it is generally useful to sample the stimulus space more uniformly, so as to detect systematic deviations from the assumed functional form. One may refuse on these grounds to entrust the sampling of the stimulus x axis to an entropy minimization algorithm, and instead select in advance the x values to sample, in the standard way. A second alternative is to sample the x axis randomly. A third (discussed in the ‘Assumptions’ section) is to choose stimuli based on multiple candidate threshold functions intermittently. If the goal of testing is explicitly to distinguish different threshold models, one might wish not just to optimize stimulus placement for the estimation of parameters in a specific model,
but instead to optimize stimulus placement for selection among alternative models and functional forms. This approach has been used to design experimental protocols in advance of experimentation with some success (Myung and Pitt, 2009), but not in online adaptive testing — this is a promising direction for future research. Nevertheless, when adaptively testing with the goal of model selection, it would not be possible to specify all possible alternative models, and the tradeoff between efficiency and robustness entailed in global vs local stimulus placement may still fall on the side of robustness.

8.3. What Optimality Criterion to Use?

We have considered several criteria for ‘optimal’ stimulus placement. According to information theory, the ideal is minimizing expected posterior entropy — when this is optimized, one chooses the theoretically most informative stimulus. However, the goals of typical psychophysical experiments are not to minimize entropy over the joint probability distribution over all parameters, but rather to minimize the expected variance (that is, to decrease standard errors and confidence intervals) of some or all parameters, or else of the threshold itself. In the former case there is no principled way to combine variances across parameters at different scales. In future work it may be useful to consider explicit costs of errors in different parameters. This way, stimulus placement in an experiment primarily interested in the estimation of the peak frequency of the contrast sensitivity function would optimize this goal, while an experiment interested in the steepness of the sensitivity drop-off with increasing spatial frequencies could best estimate that parameter. Although such explicit formulations of the loss function may yield the most useful experiment, the efficiency gained may not outweigh the added complexity.

8.4. Estimation of Non-parametric Functions

Although in many psychophysical experiments there is much to be gained from conducting functional adaptive testing with simple parametric functions, in some cases the parametric function is unknown, or very complex. In these cases, one may want to consider functions with many parameters, or even non-parametric estimation procedures like locally weighted regression. These might provide a basis for an alternative, multi-dimensional adaptive search technique requiring weaker assumptions, but to our knowledge such approaches have not yet been explored for adaptive testing.

9. Summary

We introduced and validated a method for efficient estimation of two-dimensional psychophysical functions and a Matlab toolbox implementation of it: Functional Adaptive Sequential Testing (see Appendix A). This toolbox (and corresponding manual) may be downloaded here: http://code.google.com/p/fast-toolbox/. This tool substantially improves the efficiency of psychophysical testing under correct (or approximately correct) assumptions about the parametric model of the
function relating threshold (e.g., contrast) to some independent variable of interest (e.g., spatial frequency). Specifically, so long as the model error is smaller than the uncertainty inherent in the data, FAST will remain a more efficient alternative to running multiple independent staircases. However, when sufficient data are gathered that they provide threshold estimates with precision exceeding the model error, FAST will lead to less efficient estimation, and independent staircases should be used instead.

Acknowledgement

This work was supported by Russell Sage, NDSEG, and NSF Dissertation grants (EV), and National Institutes of Health Grant EY01711 (DM).

Notes

1. The method of adjustment tends to be quite efficient, but is often disfavored in practice because making a given setting takes a substantial amount of time, so it is inapplicable to quickly decaying effects, and may be susceptible to ‘overthinking’ on the part of the participants.

2. We do not discuss here current developments in non-parametric adaptive testing procedures, which improve upon classic staircase methods without invoking an explicit psychometric function (Faes et al., 2007).

3. Each point is not equally informative, and thus the testing procedure should be adaptive, to choose the most informative points.

4. To see this, recall that the mean squared value is the variance plus the squared mean, for any distribution; if the mode is adopted as an estimate of a random variable, the distribution of error in the estimate is the distribution of deviations from the mode, and the mean squared error is increased by the squared difference between mean and mode.

References

Cobo-Lewis, A. B. (1997). An adaptive psychophysical method for subject classification, Percept. Psychophys. 59, 989–1003.
Cornsweet, T. and Teller, D. (1965). Relation of increment thresholds to brightness and luminance, J. Optic. Soc. Amer. 55, 1303–1308.
Dixon, W. and Mood, A. (1948).
A method for obtaining and analyzing sensitivity data, J. Amer. Statist. Assoc. 43, 109–126.
Faes, L., Nollo, G., Ravelli, F., Ricci, L., Vescoci, M., Turatto, M., Pavani, F. and Antolini, R. (2007). Small-sample characterization of stochastic approximation staircases in forced-choice adaptive threshold estimation, Percept. Psychophys. 29, 254–262.
Fechner, G. (1860). Elemente der Psychophysik, Breitkopf and Härtel, Leipzig, Germany.
Kaernbach, C. (2001). Adaptive threshold estimation with unforced-choice tasks, Percept. Psychophys. 63, 1377–1388.
King-Smith, P. E. and Rose, D. (1997). Principles of an adaptive method for measuring the slope of the psychometric function, Vision Research 37, 1595–1604.
Klein, S. A. (2001). Measuring, estimating, and understanding the psychometric function: a commentary, Percept. Psychophys. 63, 1421–1455.
Kontsevich, L. L. and Tyler, C. W. (1999). Bayesian adaptive estimation of psychometric slope and threshold, Vision Research 39, 2729–2737.
Kujala, J. and Lukka, T. (2006). Bayesian adaptive estimation: the next dimension, J. Mathemat. Psychol. 50, 369–389.
Kuss, M., Jakel, F. and Wichmann, F. (2005). Approximate Bayesian inference for psychometric functions using MCMC sampling, Max Planck Institute for Biological Cybernetics Technical Report No. 135.
Leek, M. R. (2001). Adaptive procedures in psychophysical research, Percept. Psychophys. 63, 1279–1292.
Lesmes, L., Jeon, S., Lu, Z. and Dosher, B. (2006). Bayesian adaptive estimation of threshold versus contrast external noise functions: the quick TvC method, Vision Research 46, 3160–3176.
Lesmes, L., Lu, Z.-L., Baek, J. and Albright, T. D. (2010). Bayesian adaptive estimation of the contrast sensitivity function: the quick CSF method, J. Vision 10, 1–21.
McKee, S., Klein, S. and Teller, D. (1985). Statistical properties of forced-choice psychometric functions: implications of probit analysis, Percept. Psychophys. 37, 286–298.
Mollon, J. D., Stockman, A. and Polden, P. G. (1987). Transient tritanopia of a second kind, Vision Research 27, 637–650.
Myung, J. and Pitt, M. (2009). Optimal experimental design for model discrimination, Psycholog. Rev. 116, 499–518.
Pelli, D. G., Rubin, G. S. and Legge, G. E. (1986). Predicting the contrast sensitivity of low-vision observers, J. Optic. Soc. Amer. A 13, 56.
Taylor, M. and Creelman, C. (1967).
PEST: efficient estimates on probability functions, J. Acoustic. Soc. Amer. 41, 782–787.
Treisman, M. (1965). Signal detection theory and Crozier’s law: derivation of a new sensory scaling procedure, J. Mathemat. Psychol. 2, 205–218.
Treutwein, B. (1995). Adaptive psychophysical procedures, Vision Research 35, 2503–2522.
Twer, T. von der and MacLeod, D. (2001). Optimal nonlinear codes for the perception of natural colours, Network 12, 395–407.
Vul, E., Krizay, E. and MacLeod, D. I. A. (2008). The McCollough effect reflects permanent and transient adaptation in early visual cortex, J. Vision 8, 1–12.
Watson, A. B. and Ahumada, A. J. (2005). A standard model for foveal detection of spatial contrast, J. Vision 5, 717–740.
Watson, A. B. and Pelli, D. G. (1983). QUEST: a Bayesian adaptive psychometric method, Percept. Psychophys. 33, 113–120.
Wetherill, G. and Levitt, H. (1965). Sequential estimation of points on a psychometric function, Brit. J. Mathemat. Statist. Psychol. 18, 1–10.
Appendix A: Functional Adaptive Sequential Testing (FAST) Matlab Toolbox

Our implementation of FAST as a Matlab toolbox can be downloaded from the Google Code repository: http://code.google.com/p/fast-toolbox/
The implementation includes functions for:
1. Initializing a FAST structure (fastFull and fastStart).
2. Updating the structure with new data (fastUpdate).
3. Dynamic resampling — implementing the roving range (fastResample).
4. Obtaining estimates and confidence intervals on parameters (fastEstimate).
5. Plotting the current data and function fits (fastPlot).
6. Selecting the next stimulus (fastChooseY).
Details on the use of these functions may be found in the associated help files.

The FAST toolbox contains several common threshold functions:
1. A single value (funcVal), a function that does not vary with x — used for simple, unidimensional threshold estimation: yc = θ1.
2. Linear (funcLine), a simple linear change: yc = θ1 + θ2·x; or reparameterized (funcMscale) to have more intuitive parameters, as magnification scaling with eccentricity: yc = θ1·(1 + θ2·x).
3. Hyperbolic/simplified exponential decay — as often used to describe delay discounting, or simple decay of aftereffects (funcHyperbolic: yc = 1/(1 + θ1·x) and funcExpS: yc = exp(−θ1·x)).
4. Exponential decay (funcExp) — as commonly used to describe time-courses of aftereffects and adaptation, with more variable parameters: yc = θ1 + (θ2 − θ1)·(1 − exp(−x/θ3)).
5. Polynomial (funcPolynomial), a polynomial of any degree (determined by the number of parameters provided): yc = Σi θi·x^(i−1).
6. CSF — the contrast threshold function as described by Pelli et al. (1986), which is simply a log-parabola relating log(threshold contrast) to log(spatial frequency) (funcCSF — more complex parameterizations are included as options): yc = θ1 + 10^θ3·(log10(x) − θ2)².
7. Threshold vs contrast — including both a simple, linear, threshold-vs-contrast function (funcTVC):

   yc = θ2·x/θ1  for x > θ1,
   yc = θ2       for x ≤ θ1,

and a non-linear version (funcTVCnl):

   yc = θ2·(x/θ1)^θ3  for x > θ1,
   yc = θ2            for x ≤ θ1.
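As a concrete illustration, two of these threshold functions can be sketched in Python (our own reimplementation for exposition; the toolbox itself is Matlab, and the function names and parameter packing here are ours, not the toolbox's):

```python
import numpy as np

def func_csf(x, theta):
    # Log-parabola CSF (cf. funcCSF): threshold as a function of
    # spatial frequency x, with parameters theta = (t1, t2, t3).
    t1, t2, t3 = theta
    return t1 + 10.0**t3 * (np.log10(x) - t2)**2

def func_tvc(x, theta):
    # Linear threshold-vs-contrast function (cf. funcTVC): flat at t2
    # for pedestal contrasts below the knee t1, rising linearly above.
    t1, t2 = theta
    x = np.asarray(x, dtype=float)
    return np.where(x > t1, t2 * x / t1, t2)
```

For example, func_csf(1.0, (0.01, 0.0, 0.5)) returns 0.01, the threshold at the parabola's peak, since log10(1.0) equals the assumed peak location 0.0.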
There is also a set of psychometric functions built into FAST; each of these is a full ogival cumulative distribution function (0–1) parameterized by a scale parameter S. These full ogival functions are then scaled depending on task parameters (number of alternatives, and lapse parameters) by the FAST function fastPsyScale:
1. Gumbel: pr(y) = 1 − exp(−exp(S·(y − yc))).
2. Normal: pr(y) = 1/2 + (1/2)·erf((y − yc)/(√2·S)).
3. Half-normal: pr(y) = erf((y/yc)^S/√2), for y > 0.
4. Logistic: pr(y) = 1/(1 + exp(−(y − yc)/S)).
5. Weibull: pr(y) = 1 − exp(−(y/yc)^S), for y > 0.
The Weibull function differs from the others listed in that the critical value yc scales the stimulus magnitude instead of subtracting from it. But it is equivalent to case 1, the Gumbel function, if y and yc are replaced by their logarithms there. A more thorough description of the virtues and uses of each of these different functions may be found in the manual, or in the help for the individual files in the FAST toolbox.

Appendix B: Minimizing Expected Posterior Entropy

While others have described the calculations necessary to compute the point that minimizes expected posterior entropy (Kontsevich and Tyler, 1999), for the sake of completeness we provide the description here as well. First, we define the probability of obtaining a particular response (r, coded as 0 or 1) to a particular stimulus (x, y), by integrating over all possible parameter values (θ, S):

   P(r = 1|x, y) = Σ(θ,S) P(r = 1|x, y, θ, S)·P(θ, S),
   P(r = 0|x, y) = Σ(θ,S) P(r = 0|x, y, θ, S)·P(θ, S).    (B.1)
Using these probabilities as normalizing constants, we can compute the posterior probability over parameters given a particular response:

   P1 = P(θ, S|r = 1, x, y) = P(r = 1|x, y, θ, S)·P(θ, S) / P(r = 1|x, y),
   P0 = P(θ, S|r = 0, x, y) = P(r = 0|x, y, θ, S)·P(θ, S) / P(r = 0|x, y).    (B.2)

For each of these posterior probability distributions we can define the entropy over parameters as:

   H(P) = −Σ(θ,S) P·log2 P,    (B.3)

and we can compute the expected entropy:

   E[H(P)] = H(P1)·P(r = 1|x, y) + H(P0)·P(r = 0|x, y).    (B.4)
The goal of global entropy minimization is the solution to:

   argmin(x,y) E[H(P)],    (B.5)

and solving for the local solution amounts to solving this with x fixed (to c):

   argmin(y) { E[H(P)] | x = c }.    (B.6)
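On a discrete parameter grid, (B.1)–(B.4) and the global minimization (B.5) reduce to a few lines of array arithmetic. The following Python sketch is ours, not part of the toolbox; it assumes a one-parameter logistic observer with S fixed at 1, so the grid runs over candidate thresholds only:

```python
import numpy as np

def expected_posterior_entropy(prior, p_resp1):
    # prior:   P(theta) on a discrete grid (sums to 1)
    # p_resp1: P(r=1 | stimulus, theta) on the same grid
    pr1 = float(np.sum(p_resp1 * prior))          # P(r=1|x,y), eq. (B.1)
    pr0 = 1.0 - pr1
    post1 = p_resp1 * prior / pr1                 # posteriors, eq. (B.2)
    post0 = (1.0 - p_resp1) * prior / pr0
    h = lambda p: -np.sum(p[p > 0] * np.log2(p[p > 0]))  # entropy, eq. (B.3)
    return h(post1) * pr1 + h(post0) * pr0        # expected entropy, eq. (B.4)

# Toy observer: p(r=1|y) = 1/(1 + exp(-(y - yc))), threshold yc unknown.
yc_grid = np.linspace(-3.0, 3.0, 61)              # hypotheses about yc
prior = np.full(yc_grid.size, 1.0 / yc_grid.size)  # flat prior
candidates = np.linspace(-4.0, 4.0, 81)           # candidate stimulus levels y
eh = [expected_posterior_entropy(prior, 1.0 / (1.0 + np.exp(-(y - yc_grid))))
      for y in candidates]
best_y = candidates[int(np.argmin(eh))]           # global minimizer, eq. (B.5)
```

With a flat prior the most informative stimulus falls near the centre of the hypothesis grid; after each response the posterior replaces the prior and the computation is repeated.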
Appendix C: Minimizing Expected Average Posterior Predictive Variance

Here we introduce an optimality criterion that provides a principled and general method for trading off efficiency in estimating different parameters. The goal of minimizing posterior variance is to choose a stimulus (xp, yp) that is expected to alter the posterior distribution in such a way that it decreases the variance of the predicted critical values (y*) for a number of relevant x values (x^eval). In other words, this optimality criterion aims to increase the confidence of our predictions about the threshold (the y* value) for a number of points along the threshold function (specified by x^eval). In contrast to entropy minimization, the minimization of posterior predictive variance is supported by a rational cost function that associates errors with a cost proportional to the square of their magnitudes, whereas entropy minimization is tantamount to minimizing error frequency in parameter space without regard to the magnitude of the error, either in parameter space or in its consequences for our predictions (Twer and MacLeod, 2001). The calculation starts just as in Appendix B. First, we define the probability of obtaining a particular response (r, coded as 0 or 1) to a particular stimulus (xp, yp), by integrating over all possible parameter values (θ, S):

   P(r = 1|xp, yp) = Σ(θ,S) P(r = 1|xp, yp, θ, S)·P(θ, S),
   P(r = 0|xp, yp) = Σ(θ,S) P(r = 0|xp, yp, θ, S)·P(θ, S).    (C.1)
Using these probabilities as normalizing constants, we can compute the posterior probability over parameters given a particular response:

   P(θ, S|r = 1, xp, yp) = P(r = 1|xp, yp, θ, S)·P(θ, S) / P(r = 1|xp, yp),
   P(θ, S|r = 0, xp, yp) = P(r = 0|xp, yp, θ, S)·P(θ, S) / P(r = 0|xp, yp).    (C.2)

Using these (possible) posterior probability distributions, we then compute the variance of the predicted critical values (y*) over k potential x^eval values (x_1^eval to x_k^eval). These x^eval values correspond to the relevant x values that it is our goal to learn about. Recalling that y* = F(x, θ), the mean and variance of y* for each x^eval, and each response r, can be written as follows:

   E[μ_i^y* | r = 1] = Σ(θ,S) F(x_i^eval, θ)·P(θ, S|r = 1, xp, yp),
   E[μ_i^y* | r = 0] = Σ(θ,S) F(x_i^eval, θ)·P(θ, S|r = 0, xp, yp),
   E[σ_i^y* | r = 1] = Σ(θ,S) F(x_i^eval, θ)²·P(θ, S|r = 1, xp, yp) − E[μ_i^y* | r = 1]²,
   E[σ_i^y* | r = 0] = Σ(θ,S) F(x_i^eval, θ)²·P(θ, S|r = 0, xp, yp) − E[μ_i^y* | r = 0]².    (C.3)

From these expected posterior predictive variances for each x^eval conditioned on a response, we compute the expected average posterior predictive variance by averaging over x^eval and integrating out the two possible responses, weighted by their prior probability:

   E[σ^y*] = (1/k)·Σ(i=1..k) E[σ_i^y* | r = 1]·P(r = 1|xp, yp) + (1/k)·Σ(i=1..k) E[σ_i^y* | r = 0]·P(r = 0|xp, yp).    (C.4)
The goal of global predictive variance minimization is the solution to:

   argmin(xp,yp) E[σ^y*],    (C.5)

and solving for the local solution amounts to solving this with x fixed (to c):

   argmin(yp) { E[σ^y*] | xp = c }.    (C.6)
Finally, if some x^eval values are more important than others (for instance, if we aim to estimate the contrast sensitivity function to assess reading function, where specific spatial frequencies may be expected to play a greater role), the variances associated with different x^eval values can be differentially weighted to reflect this.
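The same grid approach gives the expected average posterior predictive variance of (C.1)–(C.4). A minimal Python sketch follows (again ours, with made-up array shapes rather than the toolbox's data structures):

```python
import numpy as np

def expected_predictive_variance(prior, p_resp1, thresholds):
    # prior:      P(theta, S) on an n-point parameter grid (sums to 1)
    # p_resp1:    P(r=1 | xp, yp, theta, S), shape (n,)
    # thresholds: F(x_i^eval, theta) for k x_eval values, shape (k, n);
    #             these are the predicted critical values y*
    pr1 = float(np.sum(p_resp1 * prior))                 # eq. (C.1)
    pr0 = 1.0 - pr1
    post1 = p_resp1 * prior / pr1                        # eq. (C.2)
    post0 = (1.0 - p_resp1) * prior / pr0
    def avg_var(post):
        mu = thresholds @ post                           # E[y*] per x_eval
        return np.mean(thresholds**2 @ post - mu**2)     # eq. (C.3), averaged
    return avg_var(post1) * pr1 + avg_var(post0) * pr0   # eq. (C.4)
```

The candidate stimulus (xp, yp) enters through p_resp1; minimizing this quantity over candidates implements (C.5), and weighting the k rows of thresholds unequally before averaging implements the weighted variant just described.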
A Criterion Setting Theory of Discrimination Learning that Accounts for Anisotropies and Context Effects

Martin Lages 1,* and Michel Treisman 2

1 Department of Psychology, University of Glasgow, 58 Hillhead Street, Glasgow G12 8QB, Scotland, UK
2 Department of Experimental Psychology, University of Oxford, Oxford OX1 3UD, UK
Abstract We can discriminate departures from the vertical or horizontal more accurately than from other orientations. This may reflect perceptual learning, but the mechanisms behind such learning are not well understood. Here we derive a theory of discrimination learning based on criterion setting theory (CST; Treisman and Williams, 1984), an extension of signal detection theory in which judgment of the current stimulus is partly determined by previous discriminations and context. The CST-based theory of discrimination learning (CST-DL) describes mechanisms which use information from previous acts of discrimination to improve current decision making. CST-DL distinguishes between types of decision criteria and provides an account of anisotropies and context effects affecting discrimination. Predictions from this model are tested in experiments on anisotropies in orientation and depth perception. The results obtained support CST-DL. They also support the conclusion that the account of the retention of sensory information in delayed discrimination provided by CST is superior to the traditional belief that information retention relies on a fixed memory trace or representation of the stimulus. Keywords Discrimination learning, orientation anisotropy, oblique effect, depth perception, context, criterion setting theory, memory trace
* To whom correspondence should be addressed. E-mail: [email protected]

1. Introduction

One of the earliest findings in psychophysics was that the threshold for discrimination at a point on a sensory dimension is proportional to the intensity at that point. This relation, Weber's Law (Weber, 1834/1996), can be given an explanation
based on signal detection theory (Treisman, 1964). But discrimination can also vary non-monotonically at specific values on a sensory dimension, giving anisotropies that are not well understood. A familiar example is superior discrimination of the vertical or horizontal (‘cardinal’ or ‘principal’ axes) as compared with other orientations: orientation anisotropy or the oblique effect (Appelle, 1972; Essock, 1980; Howard and Templeton, 1966). How can anisotropies be explained? Here we present a theory of discrimination learning that also accounts for anisotropies and context effects on discrimination. Discrimination is commonly understood in terms that go back to Fechner’s (1860/1966) account, in which a standard or reference stimulus is presented, accompanied or followed by a test stimulus, and the subject must ‘notice’ any difference between them. This implies that the perception of the reference stimulus, when the inputs are simultaneous, or a representation of that stimulus, when they are not, acts as a fixed record with which the subjective input from the test stimulus is compared. We refer to the belief that a fixed representation or memory trace of the reference stimulus underlies discrimination as Fixed Memory Trace Theory (FMTT). This assumption has an intuitive appeal which has helped to maintain it as the leading theory of sensory information retention. Orban and Vogels (1998), for example, talk of ‘the three processes underlying temporal comparison: (1) sensorial representation of visual stimuli, (2) maintaining a trace of the preceding stimulus, and (3) comparison of the incoming stimulus with that trace’, and similar accounts can be found in Blake et al. (1997), Lee and Harris (1996), Magnussen et al. (2003), Matthews et al. (2005), Regan (1985), and elsewhere. 
To explain orientation anisotropy, this model would require that internal long-term representations of the cardinal orientations, of exceptional strength and accuracy, underlie the superior discrimination. The development of signal detection theory (SDT; Green and Swets, 1966; Macmillan and Creelman, 1991) has long offered an alternative approach, and SDT has been further extended by criterion setting theory (CST; Treisman, 1987; Treisman and Williams, 1984) which models sensory decision making as a dynamic process. CST has been applied to phenomena such as sequential dependencies (Treisman and Williams, 1984), levels of confidence (Treisman and Faulkner, 1984a), the contingent coding of criteria (Treisman, 1984b), the effect of signal probability on the ROC slope (Treisman and Faulkner, 1984b), criterion interchange (Treisman and Faulkner, 1985), and absolute judgment and the limits on information transmission (Treisman, 1985). It has provided an alternative to the attention band and response ratio hypotheses (Treisman, 1984a) and a model for random number generation (Treisman and Faulkner, 1987). Lages and Treisman (1998) showed that it provides a better account of the retention of sensory information in delayed discrimination than FMTT. We commence by briefly describing CST, then extend the theory to provide an account of contextual effects on discrimination, sensory discrimination learning, and anisotropies. Predictions are derived and tested in experiments on visual orientation discrimination and, to examine whether the theory provides an account of anisotropies in general, on ocular vergence and binocular disparity. Where FMTT offers alternative predictions, these are also examined.

2. Criterion Setting Theory

In SDT, presentation of a stimulus generates an input on the corresponding sensory decision axis. The decision process uses the direction of the difference between the value of the sensory input and the decision criterion to determine the response. The position of the decision criterion is calculated from the signal and noise probabilities and the values and costs of different outcomes. These parameters are normally assumed to have fixed values for an experimental session and the criterion is correspondingly also fixed. The accuracy of the decision process is compromised by noise that introduces variability into the relation between the intensity of the stimulus and the magnitude of the corresponding sensory input. This variability is described by a probability density function. It is this noise that limits discrimination or detection accuracy. Although it has been successful, SDT has a flaw. The assumption that the parameters determining the criterion are constant over time is a simplification that does not reflect the conditions under which we discriminate in daily life. An experimenter may hold parameters constant for the duration of an experimental session, but in the real world the conditions of discrimination vary from one moment to the next. To function optimally, the decision mechanism must take account of this continuing variation. It should update its estimate of the optimal value of the decision criterion for each new trial. CST corrects this flaw: it makes the decision criterion adjustable from trial to trial and adds mechanisms that have been selected to attempt to optimize the location of the criterion for each new decision, using information from any source that can contribute to this.
This principle of continuing optimization and the employment of relevant information from any source is basic to CST. The theory predicts that even if the experimental parameters are held constant, the current effective criterion may vary from one trial to another. This variation provides an account of the sequential dependencies that are found in psychophysical data: these result from the ability of a previous stimulus or response to influence a current judgment. Sequential dependencies have often been described (e.g., Collier, 1954; Helson, 1947, 1964; Jesteadt et al., 1977; Parducci and Sandusky, 1965; Vogels and Orban, 1986; Ward, 1982, 1990; Ward and Lockhead, 1970) but not adequately explained. CST attributes them to trial-by-trial adjustments in the criterion. It assumes that an initial reference criterion, E0 , is selected, based on previous experience, knowledge of the experimental parameters, or the initial stimuli. On each successive trial a criterion setting mechanism (CSM) updates the effective decision criterion Ec to move it closer to the optimal value for that trial. We briefly describe two criterion setting
processes, defined by different sources of information, that may update the criterion over trials.

2.1. The Criterion Stabilization Mechanism

As a sequence of stimuli is presented, a corresponding flux of sensory inputs is generated on the decision axis which may be described by a probability density function (pdf). (We refer to a stimulus undergoing judgment as a critical stimulus.) This pdf may be centred on the initial criterion value, E0, or its mean may be higher or lower, perhaps because the stimuli are mainly higher or lower. It is possible for the mean of the distribution to fall initially near E0 but for the flux of sensory inputs to drift to another location over time, for example, due to adaptation. If that leaves the criterion too low (or high) in relation to the bulk of the inputs, the responses will be mostly HIGH (or LOW), which conveys little information. (We use lower case for stimuli and UPPER CASE for responses.) The essential requirement for discrimination is that it should transmit information. To maximize information transmitted, on each trial the stabilization mechanism should place the current effective criterion as near as possible to the mean of current sensory inputs. To do this, it uses negative feedback to prevent the criterion diverging from the central tendency of the sensory inputs: the input on each trial causes the criterion values for following trials to shift towards its location. Thus if, over time, more inputs fall above the original criterion location than below it, their net effect will be to move the criterion upwards until this imbalance is corrected. This system produces a sequential dependency between trials. If the input on trial i falls above the criterion (giving the response HIGH), this will cause the criterion to be higher on trial i + 1, reducing the probability of HIGH on that trial, a negative dependency on preceding stimuli or 'contrast'. The operation of the stabilization mechanism is illustrated in Fig.
1 for a discrimination experiment in which an initial reference stimulus is followed by a sequence of test stimuli, randomly selected from a test range. Corresponding to each stimulus is a normal distribution of sensory effects on the decision axis, E. Distributions are shown for two of the test stimuli, sn and sn+1. On trial 1 the current effective value of the criterion, Ec(1), is assumed to be equal to the reference criterion value E0. E1, the sensory input registered on the decision axis on trial 1, is shown falling to the right of Ec(1). The information this provides is that the criterion may be too low and should move to the right. The CSM for this criterion stores this information in a form which will be referred to as a stabilization indicator trace. This is a very different concept from the traditional memory trace. The latter is a more or less pictorial representation of a stimulus previously presented; it is subject to attrition by decay or interference. The indicator trace is a record of two quantities, a magnitude and a direction in which the criterion should shift in coming trials; the CSM may compute changes in the magnitude over trials. In the figure, the indicator trace set up on trial 1 is shown on the right as an arrow whose magnitude and direction indicate the required criterion adjustment. The magnitude is calculated as Δs·(E1 − Ec(1)), where Δs is a weighting constant. On trial 2, Δs·(E1 − Ec(1)) has declined to Δs·(E1 − Ec(1)) − δs, where δs is a decrementation parameter: the trace is reduced by δs on each successive trial until it reaches zero, when it disappears. This decrementation reflects the decreasing relevance of past evidence as time proceeds. On trial 2, the residue of the trace is added to the reference criterion to give the effective criterion for that trial, Ec(2) = E0 + Δs·(E1 − Ec(1)) − δs. The sensory input received on trial 2, E2, is compared with Ec(2) to determine the response; in Fig. 1, E2 is shown falling to the right of the criterion, giving the response HIGH. Consequently, the stabilization indicator trace set up on this trial, which has initial magnitude Δs·(E2 − Ec(2)), points to the right. On trial 3 both indicator traces have decreased but a residue of each is still present, giving Ec(3) = E0 + Δs·(E1 − Ec(1)) − 2δs + Δs·(E2 − Ec(2)) − δs. The input on that trial, E3, falls to the left of Ec(3), so that the response is LOW and a negative indicator trace, Δs·(E3 − Ec(3)) < 0, is generated, pointing to the left.

Figure 1. Criterion setting theory: operation of the stabilization mechanism when an initial reference stimulus is followed by a random sequence of test stimuli; four trials are shown. Distributions of sensory effects are shown on the decision axis, E, for two of the test stimuli, sn and sn+1. On each trial i, (a) the current criterion value Ec(i) is computed as the sum of the initial reference value E0 plus indicator traces persisting from previous trials; (b) the current input Ei is registered and the direction of the difference between it and Ec(i) determines the response; (c) the indicator trace for trial i is set up. This stores an estimate of the magnitude and direction of criterion adjustment required to optimize the criterion. Indicator traces decline to zero over trials.
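The trace arithmetic in this walkthrough is easy to verify in code. The following deterministic Python sketch uses our notation; the Δs = 0.5 and δs = 0.1 values are invented for illustration:

```python
def criterion_sequence(inputs, e0, delta_s=0.5, dec=0.1):
    # Track the effective criterion Ec(i) = E0 + sum of surviving
    # stabilization indicator traces, each born as delta_s*(Ei - Ec(i))
    # and reduced in absolute value by dec on every later trial.
    traces = []        # surviving trace magnitudes (signed)
    history = []
    for e_in in inputs:
        ec = e0 + sum(traces)
        history.append(ec)
        traces.append(delta_s * (e_in - ec))        # trace set up this trial
        traces = [(abs(t) - dec) * (1 if t > 0 else -1)
                  for t in traces if abs(t) > dec]  # decrement for next trial
    return history

ecs = criterion_sequence([1.0, 2.0, -1.0], e0=0.0)
# -> [0.0, 0.4, 1.0] (up to float rounding): ecs[1] is
# E0 + delta_s*(E1 - Ec(1)) - dec, matching the trial-2 expression above.
```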
Stabilization tends to maintain the criterion in the neighbourhood of the inputs received, E1, E2, E3, . . . . Thus the responses LOW and HIGH occur roughly equally often, maximizing information transmission.

2.2. The Probability Tracking Mechanism

In SDT the probability of the signal, regarded as constant for the experimental session, is important in determining the value of the criterion. In daily life, however, the probability we assign to the source of a signal is rarely fixed; it may be frequently modified by new information. If a hunter in the field briefly sights what may be prey, the subjective probability assigned to the possibility that prey is present rises. If no evidence of prey can be found, the probability estimate falls. Thus a realistic account of criterion setting should include a mechanism that tracks changes in signal probability over trials and uses the most recent information to compute the best criterion for the next trial. Positive observations raise the subjective probability of the signal, justifying a lower criterion in upcoming trials; negative responses indicate the criterion should be raised. The relevant information comes from prior responses, each of which embodies the best decision about the signal the observer could make at that time. Since in the real world objects usually persist through time, a positive response on trial i raises the probability that the signal will be observed again on trial i + 1, a negative response lowers it. If the response is HIGH (or YES in a detection task), the criterion should move down next time, increasing the probability of repeating the same response, a positive dependency on past responses ('assimilation'). If the response was LOW (or NO) the criterion should rise. To achieve this positive feedback, following each decision the probability tracking mechanism sets up a response-dependent tracking indicator trace which specifies the direction and magnitude of criterion shift required.
Thus HIGH generates a negative trace, LOW a positive one. Tracking indicator traces have a constant initial absolute value Δr, and this decreases by δr ≥ 0 on each trial until the trace reaches zero and disappears.

2.3. Feedback

CST also embodies a mechanism for dealing with a third source of information, experimental feedback, but as this is not employed in the experiments below it is not further discussed here (Treisman and Williams, 1984).

2.4. Integrating Information Sources

To compute the best current value for the criterion, the CSM simply sums the indicator traces from the different information sources, and this determines the net criterion shift. The magnitude parameters, Δs and Δr, and decrementation parameters, δs and δr, determine the relative weights of the different mechanisms and the number of trials over which the influence of an event will persist. Other factors, such as values or costs attaching to different outcomes, may also be relevant to the
optimal placement of the criterion; these may be accommodated by appropriately modifying the magnitude parameters attaching to positive and negative indicator traces. As stabilization and tracking tend to shift the criterion in opposite directions, their effects may interfere, but they do not necessarily cancel out because the values of their parameters may differ. The sources of information above are not the only ones that may affect the criterion: a further source is the context in which a decision is made. The sequence of previous similar judgments is part of the context of a current decision, and there are other contextual sources of information. Thus CST opens the way to develop a theory of context. A related topic is the improvement that may occur in psychophysical discrimination with practice. This is an instance of learning but it is difficult to apply familiar models of learning, such as association or connectionism, to explain it. Here we extend CST to provide an account of discrimination learning, the effects of context on discrimination, and anisotropies, based on decision making processes.

3. A Criterion Setting Model for Discrimination Learning

To extend CST to provide a theory of discrimination learning (CST-DL) we add four assumptions:

(A1) Indicator traces may be differentially weighted and decremented in order to optimize performance.

CST proposes that the senses have evolved to process sensory information efficiently. This requires using all available relevant information to optimize the current value of the decision criterion. To maximize the effective use of information, the CSM may adjust the magnitude parameters, Δs and Δr, to weight more informative traces more highly, and may adjust the decrementation parameters, δs and δr, to ensure that traces are preserved while they continue relevant or decremented at the rate they lose their relevance.
Thus if conditions are stable, as in most laboratory experiments, so that traces retain their relevance over time, δs and δr should be small or, if conditions are completely constant, zero. We can extend this argument to past occasions. To benefit from past experiences of continuing relevance, δs and δr should be sufficiently low for indicator traces from one session to be available in future sessions. As experience grows and the residue of cumulated past traces becomes larger, the criterion will be more effectively stabilized near an optimal location. This use of past experience constitutes a mechanism of discrimination learning.

(A2) Criteria may be more or less well-established.

Different degrees of prior discrimination learning produce differences in how well criteria are established. Frequently-recurring tasks produce criteria which may be described as permanent or semi-permanent as the residue of traces preserved from earlier experiences contributes to stabilizing them. For example, the need to align
ourselves with gravity ensures that verticality is monitored and departures from the vertical rapidly discriminated, producing a permanent criterion, stabilized by a substantial residue of past traces. But to discriminate departures from a novel standard, say 13◦, the system must set up an ad hoc criterion based on the first few stimuli presented, and this will have less support and be more variable. Intermediate cases, such as the obliques, 45◦ and 135◦, are discriminated less often than cardinal orientations, but are sometimes important and so may acquire the support of a smaller residue of traces; these are relatively ad hoc. Thus discrimination learning provides a basis for anisotropies: well-established and overlearned permanent criteria.

(A3) The context may contribute to setting the criterion.

A stimulus is discriminated in the context of other stimuli. We refer to earlier test stimuli that contribute indicator traces to CSM(test), the criterion setting mechanism for the current criterion, as 'intrinsic context'. All other ambient stimuli constitute the 'extrinsic context'. Of these, some stimuli may be ineffective in modifying the current discrimination, others effective. Two questions arise: which ambient stimuli constitute effective context, and how do they contribute to setting the criterion? We may think of a stimulus as a collection of values on different sensory dimensions, such as orientation, size, colour, location, that are associated together. We hypothesize that a contextual stimulus is effective in modifying the test stimulus criterion if it is 'similar' to the test stimuli in that both sets of stimuli share the dimension being judged, say orientation, and their values on this dimension fall in the same or similar ranges. (We refer to such stimuli as 'isodiscriminal'.)
Isodiscriminal stimuli may contribute to setting the criterion for test stimuli on the shared dimension even if on orthogonal sensory dimensions, such as colour, the contextual and test stimuli are quite different.

(A4) A projection process enables extrinsic context to modify the criterion for the test stimuli by transferring indicator traces to CSM(test).

How do contextual stimuli affect the discrimination of a test stimulus? CST-DL posits a 'projection process' that allows information generated by the covert or overt discrimination of isodiscriminal contextual stimuli to be transferred to CSM(test). Consider the discrimination of orientation in relation to a standard of 90◦, using a criterion, Ec(test), dedicated to the set of test stimuli. The effective contextual stimuli may also undergo discrimination, in the normal course of perception, in relation to 90◦ or a similar standard, and produce indicator traces employed in maintaining their own criterion, Ec(context). The projection process makes copies of these indicator traces and transfers or 'projects' them to CSM(test), which adds them to the store of intrinsic traces. CSM(test) sums the projected traces with the intrinsic indicator traces to determine a current value for Ec(test). The magnitude and decrementation parameters of the contextual indicator traces may be adjusted when they
are copied to CSM(test) to reflect the quality and relevance of the information they convey, and perhaps other factors such as the general similarity of the stimuli.

4. Predictions from CST-DL

(DL-P1) The distribution of the test stimuli affects the location of the psychometric function: the PSE shift prediction. This holds for both ad hoc and permanent criteria.

The location of the psychometric function is the 50 percent point (traditionally known as the 'point of subjective equality' or PSE). Is this a fixed value, determined by a memory trace, or can the context modify it? We consider delayed discrimination experiments in which a standard or reference stimulus is followed after a delay by a test stimulus. Test stimuli are selected equally often in random order from a range of equally spaced values. The subject indicates whether the test is higher or lower than the reference stimulus. In the method of constant stimuli (MCS) the reference and test stimuli are both presented on each trial. In the method of single stimuli (MSS) the reference stimulus may be presented just once initially, and test stimuli alone are presented on the test trials. A normal cdf fitted to the data provides estimates of the mean μ and standard deviation σ: μ is estimated by the PSE and corresponds to the mean location of the criterion on the decision axis; σ is estimated by the sample standard deviation (SD), and its value is determined by the sources of variation affecting the discrimination. Accuracy is measured by some function of the slope of the psychometric function such as the SD or threshold. In the classical model of discrimination the critical stimulus is compared with a fixed memory trace of the reference stimulus. As this determines a corresponding location for the PSE, shifting the test stimulus range can have no effect on it. CST makes a different prediction.
Indicator traces produced by the context of preceding test trials and reference stimulus presentations contribute to determining the current effective criterion. A range of test stimuli will produce inputs on the decision axis that fall equally about the range midpoint and produce corresponding stabilization traces. (We use R(m) to refer to a range of equally spaced test stimuli with midpoint m. If m is greater [less] than the reference value, R(m) is a positive [negative] asymmetric range.) Thus if m falls above (below) the reference value, the criterion will more often experience stabilization indicator traces tending to shift it upwards (downwards) towards m. Therefore stabilization will tend to shift the PSE towards the midpoint of the range: the PSE shift prediction. The predictions this makes for asymmetrical ranges are illustrated in Fig. 2. It might be argued for FMTT that, leaving ad hoc criteria aside, fixed memory traces would be needed to sustain the precise discrimination characteristic of anisotropies. Then the PSE shift might be shown for ad hoc criteria but not for permanent criteria. However, CST predicts a shift will occur for all criteria.
M. Lages, M. Treisman
Figure 2. FMTT and CST predictions for orientation discrimination using MSS. Hypothetical results for a 45◦ reference stimulus and test ranges R(43◦ ) (long dashes), R(45◦ ) (continuous line) and R(47◦ ) (short dashes). The psychometric functions show the probabilities of a counterclockwise (CCW) response for each range; they illustrate the CST PSE-shift prediction. FMTT would predict the continuous line function in each case.
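The PSE shift prediction can be illustrated with a toy simulation. The code below is our sketch rather than the authors' model: the linear trace-decrementation rule and the parameter values (s, δs) are assumptions chosen only to reproduce the qualitative effect, namely that stabilization traces drag the criterion from the reference toward the midpoint of an asymmetric range, and only part of the way there.

```python
import random

def simulate_criterion(midpoint, reference=45.0, s=0.1, delta_s=0.001,
                       n_trials=4000, seed=7):
    """Toy CST stabilization mechanism (assumed parameters): each input
    above (below) the current criterion deposits a trace of +s (-s);
    every trace decrements toward zero by delta_s per trial and is
    discarded when spent; the criterion is the reference value plus the
    summed surviving traces. Returns the mean criterion over the final
    quarter of trials."""
    rng = random.Random(seed)
    stimuli = [midpoint + d for d in range(-5, 6)]   # R(midpoint), 1-deg steps
    traces, history = [], []
    criterion = reference
    for _ in range(n_trials):
        x = rng.choice(stimuli)
        traces.append(s if x > criterion else -s)
        # decrement every surviving trace toward zero; drop spent traces
        traces = [t - delta_s if t > 0 else t + delta_s
                  for t in traces if abs(t) > delta_s]
        criterion = reference + sum(traces)
        history.append(criterion)
    q = n_trials // 4
    return sum(history[-q:]) / q
```

With these (assumed) parameters the simulated criterion settles between the reference and the range midpoint, consistent with a Span between 0 and 1.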
(DL-P2) Permanent criteria show smaller PSE shifts than ad hoc criteria. Permanent criteria benefit from a residue of indicator traces from past sessions which, on average, we can assume, tend to place Ec near the reference value. When this residue is summed with traces generated by an asymmetrical test range, the former will dilute the effect of the latter on the current location of the permanent criterion. With no such residue, an ad hoc criterion should shift more. The PSE shift is given by the slope of the regression of obtained PSE values onto range midpoints. For example, assuming a 45◦ reference and negative and positive asymmetric ranges R(43◦) and R(47◦), the magnitude of the PSE shift is given by:

Span = (PSE(R(47◦)) − PSE(R(43◦)))/(47◦ − 43◦). (1)
FMTT would predict Span = 0, CST-DL that 0 < Span ≤ 1. CST-DL predicts that Span(ad hoc) will be higher than Span(permanent). Thus for any intermediate value of Span, such as Span = 0.50, values of Span(permanent) are relatively more likely to fall below it, and values of Span(ad hoc) are relatively more likely to fall above it. (DL-P3) Partitioning the intrinsic context: sequential dependencies. PSE shifts represent the summed effects of all past test stimuli and responses. We can also attempt to isolate the effect that a specific event on a preceding trial ti may have on the current or index trial tk; we refer to this as a sequential dependency. We measure the time lag from the past stimulus to the index stimulus in terms
of trials: lag = (k − i). These preceding events include the sensory input on the decision axis (about which all we may know is the stimulus value presented) and the response made. The effect on trial tk is a measure of the residual magnitude of the trace produced by the event on trial ti. By examining the average impact of each category of past event we can separate out the effects of past stimulus magnitude, past response, and the passage of time (or trials). FMTT has no mechanism by which events on a previous trial can systematically influence the response on the current trial. It must attribute any apparent dependency to unstable and unpredictable noise and error. According to CST, Ēc, the mean value of Ec for a block of randomly ordered test trials, reflects the net effect of all context events affecting the criterion. The PSE for the block of trials corresponds to Ēc, which we may assign the value Ēc = 0 on the central scale. To determine the general effect of a HIGH response one trial back on the criterion for an index trial, we would select all trials in the block that follow a HIGH response at lag 1, fit a psychometric function to them, and find the PSE. The effect of all preceding trials, other than the immediately preceding trial, would place the criterion at 0. But the trials immediately preceding the index trials always gave the response HIGH and therefore always generated tracking traces −r which will have decreased to −(r − δr) on the index trials. Thus the preceding HIGH responses will move the mean criterion for the index trials downward, as compared with Ēc = 0. We could similarly determine the effect of presenting a high stimulus one trial back. This would generally produce an input higher than the criterion, generating a stabilization trace that would move the mean criterion for the index trials upwards.
The negative dependency on past stimuli will tend to oppose the positive dependency on past responses, but any cancellation is incomplete because the two processes have different parameters. However, attempts to calculate either dependency alone give results that may be difficult to interpret. Examining different combinations of past stimulus and response is more informative. The predictions that result are illustrated in Fig. 3. As both a high stimulus and a LOW response tend to move the criterion upward, the index criterion values contingent on a preceding (high LOW) combination show the highest positive criterion displacements. Similarly, the (low HIGH) combination predicts the most extreme negative displacements. In the (low LOW) and (high HIGH) combinations, the two components have opposite effects so the curves should be intermediate. Their relative positions as shown are arbitrary, as they depend on the exact values of s and r . FMTT assumes that memory traces, being subject to decay and interference, inevitably decline with time. In CST, in contrast, the CSM sets the rates at which indicator traces decrement to match the rates at which their relevance decreases. When the conditions of observation may change rapidly, old indicator traces may become uninformative quite soon and so the decrementation parameter will be large and sequential dependencies will extend for a few lags only. But in constant laboratory conditions, past trials may have the same relevance whether they are early or
Figure 3. Predicted sequential dependencies. Each point gives the mean criterion for a set of index trials, where these are contingent on the different stimulus–response combinations, at a lag from 1 to 5. Index trial criterion values, expressed as a departure from the overall mean (Criterion Displacement = 0), are shown for four different combinations of past stimulus and RESPONSE: high LOW (filled circles), low LOW (empty circles), high HIGH (filled squares) and low HIGH (empty squares).
recent; δs or δr may then be small or zero, in which case the sequential dependency curves decline slowly or not at all with lag. Since both are maintained in the same way, ad hoc and permanent criteria will show the same patterns of dependencies. (DL-P4) The variance is smaller for permanent than for ad hoc criteria. The variance of a psychometric function is determined partly by noise arising in the stimulus and in neural transmission and processing, and partly by criterion variance. The sample of indicator traces that sets the momentary position of the criterion differs from trial to trial, as traces decrement and new traces are added. This causes Ec to vary; the resulting criterion variance adds to that from other sources and thus reduces the slope of the psychometric function. The magnitudes of δs and δr affect the criterion variance. If these are large, traces disappear quickly, leaving a small recent sample to determine Ec on each trial, giving a larger variance. If they are small, traces persist, giving a larger sample and smaller variance. For a task encountered repeatedly, such as discriminating the vertical, the CSM should set the decrementation parameters low so as to preserve past traces; this larger sample will give a lower criterion variance. As the sample available to determine Ec is less for an ad hoc than for a permanent criterion, the variance will be greater. (DL-P5) Extrinsic context may produce PSE shifts. Criterion optimization employs all useful information, including that from effective extrinsic contextual stimuli. Indicator traces from discriminations of such stimuli
may be projected to the CSM(test) and contribute to stabilizing Ec (test). Thus extrinsic context may affect the location of the psychometric function. Consider two ranges of visual orientation stimuli that are easily distinguishable on an orthogonal dimension (one might be Red, the other Green) both used with the same reference, say 45◦ , and presented in a randomly intermingled order. When we analyze responses to the ‘Red’ stimuli we shall refer to them as the Judged Range (JR) and the ‘Green’ stimuli as the Contextual Range (CR), and vice versa. CST-DL predicts that a judged range with its midpoint at 47◦ , JR(47◦ ), will tend to shift the PSE toward 47◦ . Similarly, the projected indicator traces from an asymmetrical contextual range, CR(47◦ ), will shift the PSE for the JR towards the CR midpoint, 47◦ . When JR and CR have the same midpoint (they are ‘concordant’) both will tend to shift the Judged Range criterion, Ec (JR), towards this common midpoint. But if the ranges are ‘discordant’, i.e., opposite midpoints are paired, their effects will partially cancel, placing Ec (JR) at an intermediate position at or near the reference value. (DL-P6) Extrinsic context may produce sequential dependencies. Just as with preceding test stimuli, the effects of preceding extrinsic contextual stimuli on criterion adjustment will be exhibited in greater detail if we examine any sequential dependencies of Ec (JR) on preceding contextual stimuli and responses partitioned by lag, stimulus and response. Preceding CR trials should shift Ec (JR) in the same way as preceding JR test trials, producing sequential dependencies on contextual trials similar to those predicted in Fig. 3. (DL-P7) Criterion specificity: the same discrimination performed on different sets of isodiscriminal stimuli may employ different criteria in each case. 
Consider a hypothetical vertical orientation discrimination experiment on two similar test stimulus ranges, one consisting of Short Lines (SL), say 1 cm long, displayed for 100 ms at low contrast, the other consisting of Long Lines (LL), 10 cm long, displayed for 10 s at high contrast. If the two ranges are randomly intermixed in a series of judgments, and if SL is analyzed as the JR, with LL as the CR, indicator traces from previous judgments of both SL and LL stimuli may be used by CSM(JR = SL) to optimize successive values of Ec (JR = SL). Conversely, if LL is analyzed as the JR with SL contextual, both SL and LL traces may be employed in optimizing Ec (JR = LL). This allows us to consider two models for the organization of criteria. First, the single criterion model: there may be a unique criterion for the visual vertical; a single CSM(vertical) may use indicator traces from both SL and LL presentations to optimize the single criterion; and this criterion Ec (vertical) would be used in discriminating stimuli from both sets. In this case both test ranges should give the same PSE, within error. The criterion variance would be the same for both, though not necessarily the total variance. If ‘vertical’ were represented by a FMT, a similar result for PSE would be entailed.
An alternative, the multiple criterion model, is suggested by the CST principle that for each specific situation all useful information should be drawn upon, and by the ability of the CSM to select information sources and modify magnitude parameters to reflect the quality of information provided. Then for every combination of type of test stimulus and type of context, a corresponding criterion should be computed. It follows that there may be separate CSMs, CSM(JR = SL) and CSM(JR = LL), computing different criteria. The same traces could be used in calculating Ec (JR = SL) and Ec (JR = LL), but they may be weighted differently, giving different results. Thus CSM(JR = SL) might give high weights to traces from past SL stimuli, since these are test stimuli, and high weights to LL stimuli, as offering high-quality information. Using the same stimulus ranges, CSM(JR = LL) might give high weights to LL traces, as test stimuli, and low weights to SL stimuli, as offering poorer information. Thus the criteria may differ, giving different PSEs and there may be different strengths of contextual sequential dependencies in the two cases. This is the CST-DL prediction. These predictions are tested below. 4.1. Experiment 1: Orientation Discrimination 4.1.1. Method Orientation discrimination was measured for vertical and oblique reference orientations, using two intermingled asymmetrical test stimulus ranges in each case, and MSS. 4.1.2. Subjects Six students, selected from a subject panel, two male and four female, between 19 and 28 years, with normal or corrected-to-normal visual acuity, served as subjects. They were naive about the experiment and were paid to attend two sessions on consecutive days. One subject misunderstood the instructions and had to be replaced. 4.1.3. Apparatus The experiment was programmed in C/C++ and run on a Macintosh PowerPC with a Macintosh 12 inch high-resolution monitor. This was a cathode-ray tube with aluminized PC104 and PC193 phosphors. 
This display appeared achromatic. The monitor was calibrated using a Minolta LS-110 photometer with close-up lens, using routines from VideoToolbox (Pelli and Zhang, 1991). Its frame rate was 66.7 Hz. The on-screen luminance modulation was improved by using a video attenuator that combined the red, green, and blue output signals from the computer’s 8-bit DACs to simulate a linear 12-bit display. The centre of the screen was viewed binocularly at a distance of 114 cm through a tube of the same length. The tube was 15.25 cm in diameter and presented a circular aperture of 7.6◦ visual angle at the screen. It was airbrushed internally with non-reflecting black and bore virtually no markings. Black goggles were attached to the near end of the tube to exclude peripheral visual cues such as the monitor frame, other equipment and corners of walls.
The observer was seated comfortably in front of the monitor and keyboard in a darkened cubicle, and rested his or her head on a chin-rest to view the monitor through the goggles. 4.1.4. Stimuli All stimuli were presented at the centre of the screen on a uniform background of 38.7 cd/m2 in a circular Gaussian envelope with a SD of 1.2◦ visual angle. The uniform background continued between trials. The stimuli were sine-wave gratings presented at random spatial phase; they varied in orientation and SF; they subtended 7.2◦ visual angle (14.4 cm by 14.4 cm). Stimulus and surround subtended 7.6◦ visual angle when viewed through the tube. The stimuli had a mean luminance of 38.7 cd/m2 and a Michelson contrast of 20%. A pattern of randomly selected black and white pixels with the same average luminance and Michelson contrast as the stimuli provided a random-noise mask. The reference gratings were either oblique (45◦ or 135◦ ) or vertical (90◦ ). The test stimulus ranges were centred at 43◦ and 47◦ , or 133◦ and 137◦ , in the oblique conditions, and 88◦ and 92◦ in the vertical condition. Each set of 11 test stimuli covered a range extending ±5◦ around the midpoint in steps of 1◦ . For every reference orientation x, two sets of 11 test gratings were generated, centred on x − 2◦ or x + 2◦ , and ten gratings for warm-up trials. 4.1.5. Procedure Each subject attended two sessions, on consecutive days. In one session the reference orientation was vertical, in the other oblique (either 45◦ or 135◦ ). In each session four blocks of test trials were given. In each block two ranges of test stimuli were presented, one with the gratings at 2 cpd and the other at 10 cpd, randomly intermixed. For each reference, two ranges of 2 cpd stimuli and two of 10 cpd were combined factorially with x − 2◦ and x + 2◦ midpoints. The sequence of conditions across blocks was balanced over observers. 
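The gratings described in the Stimuli section can be sketched as follows. This is our reconstruction, not the authors' display code: it renders a sine-wave carrier of a given orientation and spatial frequency in a circular Gaussian envelope as a luminance image, ignoring gamma correction, the random phase selection, and the random-noise mask.

```python
import numpy as np

def make_grating(size_px=512, deg_per_image=7.2, orientation_deg=45.0,
                 cpd=2.0, phase=0.0, contrast=0.20, sd_deg=1.2,
                 mean_lum=38.7):
    """Sine-wave grating in a circular Gaussian envelope, returned as a
    luminance image (cd/m^2) modulated about the mean background.

    cpd      : spatial frequency in cycles per degree
    contrast : peak Michelson contrast of the carrier
    sd_deg   : SD of the Gaussian envelope in degrees of visual angle"""
    half = deg_per_image / 2
    x = np.linspace(-half, half, size_px)        # degrees of visual angle
    xx, yy = np.meshgrid(x, x)
    theta = np.deg2rad(orientation_deg)
    # coordinate along the grating's modulation axis
    u = xx * np.cos(theta) + yy * np.sin(theta)
    carrier = np.sin(2 * np.pi * cpd * u + phase)
    envelope = np.exp(-(xx ** 2 + yy ** 2) / (2 * sd_deg ** 2))
    # luminance = mean * (1 + c * carrier * envelope), so the Michelson
    # contrast reaches `contrast` at the envelope peak
    return mean_lum * (1 + contrast * carrier * envelope)
```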
In each block the reference stimulus was presented initially only, and on subsequent trials test stimuli were presented alone. The reference consisted of a grating at an orientation of x degrees, where x is one of 45◦ , 90◦ or 135◦ . This was followed after an interval by test stimuli randomly selected on each trial from two test stimulus ranges, with midpoints x − 2◦ or x + 2◦ . (We use 0(180)◦ to represent the horizontal, 45◦ for the right oblique, 90◦ for the vertical and 135◦ for the left oblique, with respect to the oculocentric [and also gravitational] vertical.) Each block proceeded as follows: (1) the reference orientation was presented, first as a 2 cpd grating presented for 10 s, followed by a 0.5 s mask, then as a 10 cpd grating at the same orientation presented for 10 s, followed by a 0.5 s mask. The spatial frequencies were always in this order. (2) A 10 s retention interval followed. (3) A beep signalled the onset of 20 warm-up trials in which ten 2 cpd and ten 10 cpd gratings were presented in random order; these trials were not analyzed. They were followed immediately by twelve presentations of each of eleven 2 cpd gratings and each of eleven 10 cpd gratings in random order. Each test grating
appeared on the screen for 1 s, followed by a mask to hide any afterimage; this lasted until a response was made. The subject responded by pressing a labelled key on the keyboard to indicate whether the grating was rotated clockwise (CW) or counter-clockwise (CCW) from the reference stimulus of the same SF. A 1.5 s ISI intervened before the next trial. No feedback was given. (4) The block was followed by a rest period of 3–4 min. 4.1.6. Results and Discussion For each condition, the relative frequency of CCW responses was plotted against the test stimulus orientation, and psychometric functions fitted. (CCW and CW represent counter-clockwise and clockwise responses, and ccw and cw stimuli rotated counter-clockwise or clockwise from the reference value.) Gaussian cumulative distribution functions with parameters μ and σ are commonly used for psychometric data (Finney, 1971), but the Weibull function with parameters a and b may also be used (Quick, 1974). The data for each observer, for each block of each session and for each spatial frequency, were fitted both by a Gaussian cumulative distribution function G(x) and by a Weibull distribution W (x). The Weibull distribution was fitted using a maximum likelihood monotonic method (Pelli, 1997; Watson, 1979) and for comparability, a similar procedure was used to fit the Gaussian cumulative distribution function. A value of χ 2 was calculated for each psychometric function, and these were summed to give an overall measure of goodness of fit. Both the Gaussian and Weibull distributions gave acceptable fits, but the Gaussian fits to the psychometric functions were significantly better overall than the Weibull fits (F (678, 678) = 1.14, p < 0.05). Therefore, only Gaussian fits are used in the following analyses. 
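The goodness-of-fit calculation can be sketched as follows (our reconstruction, not the authors' code): the two-category Pearson statistic for binomial data at each stimulus level, summed over levels. Computing this for the Gaussian and for the Weibull predictions, and summing over psychometric functions, gives overall measures of fit of the kind compared above.

```python
def pearson_chi2(n_higher, n_trials, predicted_p):
    """Pearson chi-square for binary psychometric data, summed over
    stimulus levels. Each level contributes (k - n p)^2 / (n p (1 - p)),
    the two-category statistic for a binomial outcome.

    n_higher    : observed 'higher' response counts per level
    n_trials    : presentations per level
    predicted_p : fitted probabilities of a 'higher' response per level"""
    chi2 = 0.0
    for k, n, p in zip(n_higher, n_trials, predicted_p):
        chi2 += (k - n * p) ** 2 / (n * p * (1 - p))
    return chi2
```

Given two fitted candidate functions, the one yielding the smaller summed χ² fits the data better at the same number of parameters.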
Figure 4 shows Gaussian psychometric functions fitted to data for discrimination of oblique gratings on the left, and vertical gratings on the right, for each combination of a JR at one SF and an intermixed CR at the other SF. In each case, for presentation only, the data were pooled across observers, and for oblique stimuli they were pooled across the 45◦ and 135◦ conditions, as the results were comparable in the two cases. The proportions of counter-clockwise responses, P(CCW), are plotted against the orientation of the 2 cpd test stimuli in the upper panels and against the 10 cpd test stimuli in the lower panels. The results appear to conform with the CST-DL predictions. 4.1.7. PSE Predictions CST predicts that the psychometric function will shift toward the midpoint of the judged range. For FMTT all psychometric functions should be located near the reference stimulus, whatever the distribution of test stimuli, and especially for the strong long-term memory traces FMTT would assume to underlie anisotropy. But in each set of data in Fig. 4 the curves for the high and low concordant range pairs (e.g., l47 h47 and l43 h43) are displaced toward the high and low orientation values, for both oblique and vertical standards.
Figure 4. Experiment 1: Orientation discrimination. Gaussian psychometric functions for 2 cpd (l) stimuli (JR) when intermixed with 10 cpd (h) stimuli (CR), upper panels. Results for the same 10 cpd stimuli (now JR) when intermixed with 2 cpd stimuli (now CR), lower panels. Oblique reference gratings (45◦ or 135◦ , pooled) on left, vertical (90◦ ) on right. See figure for symbol labels. These show the SF (l or h) and midpoint values for each range in a pair; the 2 cpd range is always followed by the 10 cpd range. Upper panel labels are in the order JR-CR; lower panels, in the order CR-JR. Circles indicate high midpoint JRs; triangles, low. Filled symbols indicate high midpoint CRs; empty symbols low.
CST-DL attributes anisotropy to an accumulation of past indicator traces which stabilize the criterion and reduce the relative effect of an asymmetric distribution of test stimuli. Thus permanent criteria should show smaller PSE shifts than ad hoc criteria, giving a lower value of Span for the vertical than for the obliques. In Fig. 4 the concordant range pairs’ PSEs are more widely separated for the oblique stimuli on the left than for the vertical concordant pairs on the right. Span for the oblique data is 0.58 (SE = 0.215) and for the vertical condition 0.34 (SE = 0.108); although in the predicted direction, these values are not significantly different (see further below).
l47 h47 pair, and the l43 h47 pair is shifted to the right compared with the l43 h43 pair. This pattern of displacement is seen in 7 out of 8 cases (p = 0.035). It is not compatible with FMTT. These results were tested by an analysis of variance of the PSE values. To render the data for the different reference stimuli comparable the PSEs were transformed to standardized estimates given by μ = PSE − Reference value. These values served as the dependent variable, with repeated measurements on the factors Orientation (vertical and oblique; data for the reference stimuli 45◦ and 135◦ were treated as equivalent in the analysis and entered as oblique), JR (−2◦ and +2◦ ), CR (−2◦ and +2◦ ), and SF of the test grating (2 and 10 cpd). This gave statistically significant main effects for both JR (F (1, 5) = 50.55, p = 0.001) and CR (F (1, 5) = 10.13, p = 0.024), confirming the shifts in PSE and the effectiveness of context, but not for Orientation and SF. There was a significant two-way interaction between CR and SF (F (1, 5) = 8.148, p = 0.04), and between JR and CR (F (1, 5) = 6.55, p = 0.05). The main effects for JR and CR indicate that both preceding judged trials and intermixed contextual trials systematically influence the PSE. This confirms the PSE shift, and it also confirms that contextual stimuli at one SF may contribute to setting the criterion used to judge the orientations of stimuli at another SF. Similar contextual effects were found for judgments of spatial frequency (Lages and Treisman, 1998). CST-DL predicts criterion specificity, that is, distinct criteria may be used for 2 cpd and 10 cpd SF orientation stimuli. The data in Fig. 4 support this. For both oblique and principal orientations, the separation between the PSEs of the discordant psychometric functions for 2 cpd JRs (upper panels) is less than the corresponding separations for 10 cpd JRs (lower panels). 
This indicates that different criteria were used in the two cases even though, for example, the same trials underlie the curve for l47 h43 above (where l47 is analyzed as JR, and h43 is context) and the curve for l47 h43 below (where h43 is analyzed as JR, with l47 as CR). The significant CR × SF interaction (p = 0.04) confirms this finding. It arises because 10 cpd contextual stimuli were more effective in shifting the 2 cpd criterion than the reverse. This is further illustrated in Fig. 5, which shows the PSEs for 2 cpd JRs regressed on JR midpoints on the left, and for 10 cpd JRs on the right. Consider the 2 cpd data. All the slopes are positive, illustrating the PSE shift induced by the intrinsic context, the preceding test stimuli. The filled symbols show vertical data, the empty symbols oblique. Squares are used when the intermingled CRs have high midpoints, triangles for low midpoints. Thus the vertical displacements between the curves with filled squares and triangles (vertical discrimination), and empty squares and triangles (oblique), illustrate the PSE shifts induced by the intermixed 10 cpd CRs. The vertical displacement between the curves with filled symbols is less than that for empty symbols, illustrating the greater resistance to context for the vertical.
Figure 5. Experiment 1. PSEs are plotted against the range midpoints expressed as deviations from the reference value, for 2 (10) cpd JRs in the left (right) panel. Filled (empty) symbols represent judgements of the vertical (obliques). Squares indicate a high midpoint CR; triangles, a low midpoint CR. The symbols are shown in the panels. Error bars are SEs of the mean.
The right panel has 10 cpd JRs. These show a similar pattern except that the vertical displacements between the two curves for the vertical and between the two oblique curves are much less. This illustrates the lesser effectiveness of 2 cpd contextual stimuli in modifying the criterion for 10 cpd stimuli than the reverse. This suggests that larger s values were assigned to the stabilization indicator traces produced by 10 cpd stimuli. In this sense the 10 cpd sine-wave stimuli are more salient than the 2 cpd stimuli. It is not evident that this difference could arise at the earliest stages of processing, assuming that for each SF this requires activation of an appropriate narrow passband filter. There was no significant main effect of SF in the ANOVA of SDs (below). The salience accorded 10 cpd stimuli may be related in some way to the number of oriented linear features in the final percept. Assumption (A3) proposed that stimuli that are isodiscriminal to the test stimuli but differ on other dimensions may function as effective context. Lages and Treisman (1998) confirmed this for judgments of SF: vertical and horizontal contextual stimuli both produced similar effects on judgments of spatial frequency with vertical (or horizontal) test stimuli. The present results generalize this finding to judgments of orientation with contextual stimuli of similar or dissimilar SF. A re-analysis of data obtained by Lages and Treisman (1998; experiment 3) on spatial frequency discrimination provides further evidence for the criterion specificity prediction. Figure 6 illustrates SF discrimination results for reference values of 2.5 cpd using JRs and CRs with midpoints of 2.25 or 2.75 cpd. Two SF ranges were intermingled that differed in orientation, one being horizontal, the other vertical. This did not prevent CRs significantly shifting JR decision criteria. 
The right panel illustrates that discordant vertical CRs were more effective in reducing the PSE shifts produced by horizontal JRs than the reverse (left panel).

Figure 6. SF discrimination (Lages and Treisman, 1998; Experiment 3, re-analyzed), reference value 2.5 cpd, intermingled horizontal and vertical SF stimuli, JRs and CRs centered at 2.25 or 2.75 cpd. Left panel: Gaussian psychometric functions fitted to vertical SF JRs with horizontal CRs. Right panel: horizontal JRs with vertical CRs. Continuous lines, concordant ranges. Long dashes: JR(2.25), CR(2.75) in each case; short dashes JR(2.75), CR(2.25).

Significant interactions CR × OR (orientation) (F(1, 5) = 6.925, p = 0.046) and JR × OR (F(1, 5) = 13.687, p = 0.014) support this evidence of specificity. In this respect vertical stimuli are more salient than horizontal, perhaps because of their more direct relationship to the direction of gravity. For both SF and orientation, we have found that the same discrimination may employ different criteria for distinct sets of isodiscriminal stimuli, indicating that traces may be weighted differently according to their sources and their application. For orientation, 10 cpd stimuli may project larger traces to the CSM for 2 cpd stimuli than the reverse. For SF, vertical stimuli project larger traces to the CSM for horizontal stimuli than the reverse.

4.1.9. The Slopes of the Psychometric Functions

CST-DL proposes that permanent criteria, being stabilized by larger samples of traces, give lower variances. This provides an account of orientation discrimination anisotropy. Figure 4 shows that vertical discrimination produces steeper psychometric functions than the obliques. An ANOVA of the SDs of the Gaussian fits gave a highly significant main effect of Orientation (F(1, 5) = 595.91, p = 0.0001). (There were no other significant main effects. The interactions JR × SF (F(1, 5) = 6.92, p = 0.047) and OR × JR × SF (F(1, 5) = 15.04, p = 0.012) were also significant.)

4.1.10. Sequential Effects

The indicator traces that are summed on trial k determine the value of the current criterion Ec(k); this approximates the mean criterion value for all trials, Ēc. (We set Ēc = 0 for the present discussion.) But if we analyse only those index trials that follow at a lag of n trials after a trial on which a particular stimulus–RESPONSE
combination s–R occurred, Ēc will have added to it the characteristic effects produced by the traces generated by s–R on those trials. (See Fig. 3.) To calculate sequential dependencies, each test stimulus range was divided into two subsets of five stimuli each: the lowest five stimuli were classified as cw, the highest five as ccw, and midpoint trials were discarded. Dependencies were calculated only for test blocks with concordant JR and CR to ensure all stimulus values were classified in the same way for both ranges. The data were combined across observers for each condition. Figure 7 shows the effects of the different preceding stimulus–RESPONSE combinations on the criterion; estimated criteria (μ = PSE − Reference Value) are plotted. The data in the upper panels are pooled over intrinsic and extrinsic context and SF. The middle panels show dependencies on earlier JR trials (intrinsic context) and the bottom panels show dependencies on CR trials (extrinsic context): judgments of 2 cpd stimuli are shown on the left, and of 10 cpd stimuli on the right. The data in the lower four panels are pooled over orientation. The results show: (1) Preceding stimulus–response combinations determine differences in the position of the criterion on the index trial at all lags examined. (2) The curves for the four preceding stimulus–response combinations are ordered as predicted in Fig. 3 (‘cw’ corresponding to ‘low’, and ‘ccw’ to ‘high’). There is a positive dependency on preceding responses and a negative dependency on preceding stimuli, as predicted by the properties of tracking and stabilization. (3) Dependencies on JR stimuli (middle panels) and on CR stimuli (bottom panels) appear very similar. The effects of preceding 10 cpd stimuli on judgments of current 2 cpd stimuli cannot be distinguished from the effects of preceding 2 cpd stimuli on judgements of current 10 cpd stimuli in these data.
(4) The curves have very shallow slopes or are flat, indicating that under the constant experimental conditions δs and δr were very small or zero, as expected. A five-way analysis of variance on these data used the factors Preceding Stimulus (cw and ccw), Preceding Response (CW and CCW), Spatial Frequency of Preceding Stimulus (2 and 10 cpd), Spatial Frequency of Current Stimulus (2 and 10 cpd) and Lag (1 to 10). As the four- and five-way interactions were not significant, their sums of squares were combined to give the error term. The main effects for Preceding Stimulus (F (1, 46) = 684.4, p < 0.0001) and Preceding Response (F (1, 46) = 1497.9, p < 0.0001) were highly significant. No other main effects, including Lag, were significant. The interactions Preceding Response × Lag (F (9, 46) = 6.81, p < 0.025), Preceding Stimulus × Preceding Response (F (1, 46) = 13.23, p < 0.01), Preceding Response × Spatial Frequency of Preceding Stimulus × Spatial Frequency of Current Stimulus (F (1, 46) = 6.10, p < 0.025) and Preceding Stimulus × Spatial Frequency of Preceding Stimulus × Spatial Frequency of Current Stimulus (F (1, 46) = 4.00, p < 0.05) were all significant. (5) The prediction that sequential effects would be shown by both ad hoc and permanent criteria is confirmed by the findings for obliques (left top panel) and for the vertical (right top panel). These results present a problem for FMTT.
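The lag analysis just described can be made concrete with a small sketch. The code below classifies each preceding trial’s stimulus as cw or ccw, then estimates, for each of the four preceding stimulus–response combinations, the response rate on index trials at a given lag; a shift in this conditional rate is used here as a simple proxy for a criterion shift (the paper instead plots estimated criteria, μ = PSE − Reference Value). The synthetic data, the toy observer model and all names are illustrative assumptions, not the experiment’s data:

```python
import random

random.seed(0)

# Synthetic session: stimulus levels 0..10 (an 11-level test range) and
# binary responses from a noisy toy observer (1 = 'CCW', 0 = 'CW').
n_trials = 5000
stim = [random.randrange(11) for _ in range(n_trials)]
resp = [int(s + random.gauss(0, 3) > 5) for s in stim]

def stim_class(s):
    """Lowest five levels -> 'cw', highest five -> 'ccw'; midpoint discarded."""
    if s < 5:
        return 'cw'
    if s > 5:
        return 'ccw'
    return None

def dependency(stim, resp, lag, prev_s, prev_r):
    """Mean 'CCW' response rate on index trials that follow, at the given
    lag, a trial with preceding stimulus class prev_s and response prev_r.
    Shifts in this rate track shifts of the decision criterion."""
    hits = total = 0
    for k in range(lag, len(stim)):
        if stim_class(stim[k - lag]) == prev_s and resp[k - lag] == prev_r:
            hits += resp[k]
            total += 1
    return hits / total

# The four preceding stimulus-response combinations at lag 1:
rates = {(s, r): dependency(stim, resp, 1, s, r)
         for s in ('cw', 'ccw') for r in (0, 1)}
```

With real data, CST-DL predicts a positive dependency on the preceding response and a negative dependency on the preceding stimulus; with the independently generated data above the four rates simply estimate the overall response rate.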
142
M. Lages, M. Treisman
Figure 7. Experiment 1: sequential dependencies. Estimated criteria are plotted, scaled as μ = PSE − Reference Value, for four preceding combinations of stimulus and RESPONSE, at lags 1–10, for oblique discrimination data in the left top panel and for vertical data in the right top panel. These data are pooled over the SF values. Criteria for data pooled over orientations are similarly plotted in the middle panels which show dependencies on preceding JR trials for 10 cpd stimuli on the left and 2 cpd on the right. The bottom panels show dependencies on preceding CR trials, preceding 10 cpd CR stimuli on the left, preceding 2 cpd stimuli on the right.
Unexpectedly, the effects appear larger for the vertical than the obliques. This finding was confirmed by a four-way ANOVA conducted on the PSE estimates in this form, using three-way and higher interactions as the error term.
This gave the following significant effects: Preceding Response (F (1, 37) = 122.21, p < 0.001), Preceding Stimulus (F (1, 37) = 70.77, p < 0.001), Orientation (F (1, 37) = 4.22, p < 0.05), Orientation × Preceding Response (F (1, 37) = 6.12, p < 0.025), and Orientation × Preceding Stimulus (F (1, 37) = 7.49, p < 0.01). The lack of a main effect for Lag (over 10 trials!) indicates that δr and δs were small or zero, as expected in the strictly controlled laboratory environment. The Orientation effect indicates that traces originating from preceding vertical trials were given greater weight, presumably by assigning larger values of s and r, than traces from preceding oblique trials, perhaps indicating that vertical trials are treated as more reliable information sources. The system would have been right to assume this: although the test stimulus ranges were the same size, 10°, for both orientations, the standardized distances from the criterion would be larger for the vertical stimuli, since the variance was less, so that the proportion of correct responses was greater. The system may be sensitive to this difference in reliability and choose values for s and r accordingly. This finding echoes the stronger effect of vertical than horizontal context when judging SF illustrated in Fig. 6. 4.2. Depth Discrimination We have confirmed CST-DL predictions relating to PSE shifts, contextual effects, sequential dependencies, and the slope of the psychometric function. These results support CST-DL’s distinction between types of criteria and its attribution of enhanced discrimination of the cardinal axes to long-term discrimination learning. They do not support a qualitative distinction between permanent and ad hoc criteria that might suggest different information retention mechanisms in the two cases.
But it might be that orientation is a special case, and that anisotropies on other sensory dimensions might be found to be determined by FMTs and not by criterion setting and discrimination learning. To test the generality of our findings we turn to another dimension, depth perception. This provides an anisotropy in the form of the heightened depth discrimination provided by visual disparities near the fusion point (Farell et al., 2004): zero binocular disparity might well be thought to be a physiologically determined hard-wired value; if so it should not satisfy CST-DL predictions. We ask, first, whether the PSE is fixed or can shift when the disparity at fixation is used as a criterion for depth; is Span > 0? Second, if the permanent-ad hoc criterion distinction applies to depth, is Span lower for the presumed permanent criterion at fixation, and higher for an ad hoc criterion? Third, it would be useful to confirm that the PSE shift is not restricted to the MSS but can also be obtained with the MCS, despite the presence of the reference stimulus on every trial. CST-DL applies to any procedure which employs a criterion, whereas FMTT should assume that an FMT is set up on each MCS trial, which ensures greater resistance to ‘error’ and thus precludes PSE shifts.
Fourth, FMTT would assume that presentation of the reference stimulus on each MCS trial would set up a memory trace that could decay or be modified only during the inter-stimulus interval (ISI) between the reference ending and presentation of the test stimulus. Thus, FMTT might explain away any PSE shift as due to effects acting on the memory trace during the ISI. But if there is no ISI, no PSE shift should occur. For CST-DL, indicator traces from past reference and test stimuli may persist and be effective whatever the ISI. Thus we ask whether, if we abolish the ISI by presenting reference and test stimulus at the same time, CST-DL predictions will fail. Absent monocular and movement cues, the main cues to depth are lateral binocular disparity and vergence (Backus and Matza-Brown, 2003; Enright, 1991a, b; Rady and Ishak, 1955; Wright, 1951). Retinal disparity offers a well-practised basis for a permanent criterion at zero disparity. The subject’s useful input can be restricted to disparity by requiring maintained fixation on the reference stimulus; vergence and accommodation can then make only a constant contribution to perceived depth. On the other hand, if vergence alone is used, the depth of the reference target would constitute an ad hoc criterion. Retinal disparity works best when reference and test stimuli are simultaneous; depth thresholds rise sharply when the stimuli are successive (Enright, 1991b; Westheimer, 1979). Previous work employing rapid sequential fixations on targets separated in space has shown that vergence can provide significant information at larger separations (Enright, 1991a; Wright, 1951). Brenner and van Damme (1998) showed this was not restricted to equidistance judgments and suggested depth information was provided by the change in vergence as subjects fixated on targets at different distances. 
Backus and Matza-Brown (2003) demonstrated the effectiveness of ‘delta vergence’, as they call this cue, in a conflict between disparity and vergence cues. The next experiment offers an opportunity to determine whether vergence can provide depth information when reference and test targets at different depths are not laterally separated in space (requiring vergence but not version) but are considerably separated in time. We examine depth discrimination using the ocular vergence cue, MCS, and a 900 ms reference-test stimulus interval, sufficient to ensure that disparity is unlikely to contribute. The reference stimulus, presented at an arbitrary distance, provides a corresponding ad hoc criterion on a dimension, vergence, which is continuous with no salient values. 4.3. Experiment 2: Ocular Vergence in Depth Discrimination The PSE shift prediction was tested using ocular vergence to provide depth information. On each trial the reference stimulus was presented at a fixed distance, followed after an inter-stimulus interval by the test stimulus. When the reference stimulus is fixated, the eyes verge so as to reduce the retinal disparity to zero or near zero. We assume that the vergence angle adopted contributes to setting a criterion. The vergence angle required to fixate the subsequent test stimulus provides an input that is tested against that criterion to determine the relative depth of the stimulus.
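Before turning to the method, it may help to sketch the analysis that yields Span: fit a cumulative-Gaussian psychometric function to each test range, take the fitted mean as the PSE, and regress the PSEs on the range midpoints; the slope estimates Span. A minimal stdlib-only illustration on simulated data (the generating Span of 0.8, the SD of 2, and every other number are assumptions for the example, not the paper’s data):

```python
import math, random

random.seed(1)

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def fit_gaussian_pf(levels, n_pos, n_per_level):
    """Crude grid-search maximum-likelihood fit of a cumulative-Gaussian
    psychometric function to binomial counts; returns (PSE, SD)."""
    best_mu, best_sd, best_ll = 0.0, 1.0, -float('inf')
    for mu10 in range(-60, 61):          # mu from -6 to 6 in steps of 0.1
        mu = mu10 / 10.0
        for sd10 in range(5, 101):       # sd from 0.5 to 10 in steps of 0.1
            sd = sd10 / 10.0
            ll = 0.0
            for x, k in zip(levels, n_pos):
                p = min(max(phi((x - mu) / sd), 1e-9), 1.0 - 1e-9)
                ll += k * math.log(p) + (n_per_level - k) * math.log(1.0 - p)
            if ll > best_ll:
                best_mu, best_sd, best_ll = mu, sd, ll
    return best_mu, best_sd

# Simulate three 11-level test ranges centred at -2, 0 and +2 units, with a
# criterion that follows the range centre with a generating Span of 0.8.
true_span, true_sd, n_rep = 0.8, 2.0, 120
midpoints = [-2.0, 0.0, 2.0]
pses = []
for mid in midpoints:
    levels = [mid + d for d in range(-5, 6)]
    shift = true_span * mid              # criterion pulled toward range centre
    n_pos = [sum(random.random() < phi((x - shift) / true_sd)
                 for _ in range(n_rep)) for x in levels]
    pse, _ = fit_gaussian_pf(levels, n_pos, n_rep)
    pses.append(pse)

# Span is the slope of the linear regression of PSE on range midpoint.
mean_m = sum(midpoints) / len(midpoints)
mean_p = sum(pses) / len(pses)
span = (sum((m - mean_m) * (p - mean_p) for m, p in zip(midpoints, pses))
        / sum((m - mean_m) ** 2 for m in midpoints))
```

Span = 0 then corresponds to a criterion fixed regardless of the test range, and Span = 1 to a criterion that tracks the range centre completely.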
Accommodation is normally linked to vergence and can itself supply depth information (Fisher and Ciuffreda, 1988). However, the present stimuli are produced by presenting two similar haploscopic images, one to each eye. A difference in lateral disparity between the two projections evokes a corresponding vergence angle, but as the distance from each eye to the corresponding distal image is the same for all stimuli, accommodation is constant. Depth presented in a display typically appears flattened or distorted because focus cues — accommodation and retinal blur gradient — do not vary as in a real scene (Watt et al., 2005). But this should not affect the comparisons with which we are concerned. 4.3.1. Method Depth discrimination was measured using vergence angle information with the reference stimulus presented on each trial, using the MCS. Two asymmetrical and one symmetrical test stimulus ranges were employed. 4.3.2. Subjects Three male subjects, 21 to 33 years, with normal or corrected-to-normal vision, participated. Two subjects, one an author (ML), were practised and accustomed to the stereoscopic setup; the third subject was less familiar with stereoscopic viewing. A fourth subject performed almost randomly and was judged to have impaired stereovision; he was excluded. All subjects attended three sessions of half an hour each on consecutive days. They were not paid. 4.3.3. Apparatus and Stimuli The stimuli were presented in a standard Wheatstone configuration (Howard and Rogers, 1995). Two vertically and horizontally aligned Macintosh 12 in high-resolution monochrome monitors, the face of each parallel to the observer’s median plane and equidistant from it, were controlled by a Macintosh PowerPC computer. The monitors were unattenuated but with linearized Colour Lookup Tables.
The observer was comfortably seated facing two haploscopic mirrors, each at 45° to the observer’s median plane and at 45° to the monitor face, in a completely darkened laboratory, with his head on a chin-rest. He viewed the two screens reflected in the mirrors at a distance of 114 cm. The stimuli on the screens were identical but could be displaced horizontally so as to vary their lateral disparity. The stimulus was perceived at the centre of the visual field. The reference stimulus was a fixation cross consisting of a horizontal and a vertical line of 12 arcmin (12 pixels) intersecting at the centre of the screen and flanked by nonius lines. The test stimulus was a bright vertical line blurred to a Gaussian luminance profile with vertical and horizontal SDs of 0.6 arcmin and a peak luminance of 75 cd/m2. Each bar was centred vertically and subtended 60 to 80 arcmin vertically; the length was randomized to prevent relative size providing a depth cue. Instructions were to focus on depth only and ignore line length. The bars were displaced horizontally on the two monitors, giving the perception of a single bar in depth at the centre of the display when the two images were fused.
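As a sanity check on the geometry of this setup, a short sketch relating viewing distance to vergence demand. Only the 114 cm viewing distance comes from the apparatus description; the 6.3 cm interpupillary distance is an assumed typical value:

```python
import math

def vergence_angle_arcmin(distance_cm, ipd_cm=6.3):
    """Vergence angle needed to binocularly fixate a target straight
    ahead at the given distance, in arcmin. The 6.3 cm interpupillary
    distance is an assumed typical value, not taken from the text."""
    return math.degrees(2.0 * math.atan(ipd_cm / (2.0 * distance_cm))) * 60.0

# Vergence demand at the 114 cm viewing distance used here (~190 arcmin):
ref = vergence_angle_arcmin(114.0)

# Local trade-off between vergence change and depth near 114 cm: how many
# cm of depth correspond to a 1 arcmin change in required vergence?
arcmin_per_cm = vergence_angle_arcmin(114.0) - vergence_angle_arcmin(115.0)
cm_per_arcmin = 1.0 / arcmin_per_cm      # roughly 0.6 cm per arcmin
```

This makes explicit why the 1 arcmin disparity steps of the test ranges correspond to small, well-controlled depth intervals at this viewing distance.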
4.3.4. Procedure Each trial began with the presentation of the fixation cross (reference stimulus) at the centres of the two screens, with zero disparity, for 1.4 s. The reference stimulus was then followed by a 0.9 s ISI during which only a uniform dark background was visible. The test stimulus was then presented centrally; it remained present until the subject responded, or 45 s elapsed. When it appeared, the subject fixated the bar and pressed a labelled key on a Kensington keypad to indicate whether it was in front of or behind the reference stimulus. Each response was followed by a 1.4 s completely dark inter-trial interval. Three ranges of test stimuli were used, each consisting of 11 test bars spaced at intervals of 1 arcmin of horizontal disparity. They were centred on −2, 0 and 2 arcmin of disparity. Each subject attended three sessions on consecutive days. Each session consisted of one block of test trials, using one of the three test ranges. For each subject the range was centred on 2 arcmin in the first session, on 0 arcmin in the second, and on −2 arcmin in the third. Each block consisted of 12 presentations of each of 11 stimuli in random order. The experiment was conducted in complete darkness. 4.3.5. Results We asked whether a PSE shift would be shown and if so, whether Span would be relatively large. Since Span may vary between 0 and 1, we take this to mean that it will lie in the upper half of this range. Both Weibull and Gaussian psychometric functions were fitted to the data sets for each observer and range, and the overall goodness of fit was tested for each using χ². For both functions there were significant departures from goodness of fit, but the functions were not significantly different. An oddity is that in each of the three ranges, the zero horizontal disparity stimulus gave a response probability close to 0.5.
It is possible that this particular stimulus was unaffected by context, but it is also possible that the agreement at this point may be merely a coincidence due to noise, which would seem the more parsimonious assumption. This invites further investigation. The Gaussian fits to the data for the three test ranges (summed over observers) are shown in the upper left panel of Fig. 8. The parameters for the fitted curves (PSE, SD) for the ranges centred at −2, 0 and 2 arcmin are, respectively, −2: −3.27, 5.77; 0: −0.32, 5.76; 2: 0.061, 7.35. A one-way ANOVA for Range found that the separation of the psychometric functions was highly significant (F (2, 4) = 23.0, p = 0.006). The best-fitting linear regression of PSE on range midpoint is illustrated in the upper right panel (PSE = −1.177 + 0.832 midpoint). The slope gives Span = 0.832 (SE = 0.087). These results confirm that the PSE shift occurs on this dimension, and that Span is relatively large. It is unlikely that retinal disparity contributed significantly to the results. Absolute retinal disparity is a much weaker cue than relative disparity (Regan et al., 1986) and depth thresholds rise sharply when reference and test stimuli are not simultaneous, and continue to rise as the ISI increases (Enright, 1991b; Foley, 1976; Westheimer, 1979). A vergence movement consists of an initial pre-programmed section, as in a saccade, followed by one or more feedback-controlled adjustments to achieve fixation on the target (Semmlow et al., 1994). Backus and Matza-Brown (2003) have suggested two models for the operation of delta vergence: “one in which relative disparity is measured by integrating all changes in vergence needed to achieve fixation of the second target, versus one in which the absolute retinal disparities of the targets are measured before and after the eye movement, and added to the vergence change during the eye movement”. The models are not exclusive, but here the 900 ms ISI favors the first model.
Figure 8. The results for Experiment 2, ocular vergence, are shown above and for Experiment 3, binocular disparity, below. Gaussian ogives are fitted to the data (pooled over observers) for three test ranges on the left, and regressions of the PSEs against the range midpoints are shown on the right; error bars are SEs of the mean. Squares represent +2, circles 0 and triangles −2 arcmin/AU. Range midpoints are indicated by vertical lines.
4.4. Experiment 3: Binocular Disparity in Depth Discrimination Lateral binocular disparity is used as the cue for depth. The superior discrimination at zero disparity defines it as a locus of anisotropy (Farell et al., 2004). CST-DL assumes that implicit judgments of divergences from accurate fixation in the past
148
M. Lages, M. Treisman
have established a residue of indicator traces that will sum with and dilute the effect of indicator traces set up during the current session, thus reducing the extent of PSE shifts. 4.4.1. Method Depth discrimination is measured using the MCS, with no reference-test stimulus delay. Two asymmetrical and one symmetrical test stimulus ranges are employed. On each trial (a) the reference stimulus, a fixation cross, is presented and the observer maintains vergence on this; (b) without removing the fixation cross, the test stimulus, a vertical bar varying in horizontal disparity is presented for a short interval during which both reference and test stimuli are simultaneously present. 4.4.2. Subjects One female and three male postgraduate students, of whom one was an author (ML), 21 to 33 years, participated. All were experienced psychophysical observers and, except for the author, naive regarding the experiment. They had normal or corrected-to-normal vision. Each subject attended three sessions of half an hour, held on consecutive days with at least 24 h between sessions. Subjects were not paid. 4.4.3. Apparatus and Stimuli The same apparatus was employed. The observer was seated comfortably facing the haploscopic mirrors and the keyboard in a dimly lit room with a chin-rest supporting the head, in order to keep viewing angle and distance constant. The background illumination was approximately 6.5 cd/m2 , allowing the two monitor faces to provide a fusion lock. They were viewed at a distance of 57 cm. The subject depressed a labeled key on the keyboard to indicate whether the test stimulus was in front of or behind the fixation cross. The fixation cross was displayed at zero disparity. It consisted of horizontal and vertical lines measuring 20 by 2 arcmin (10 by 1 pixel) intersecting at the screen centre and flanked vertically by nonius lines above and below which each subtended 14 by 2 arcmin (7 by 1 pixel). 
The fixation cross together with the nonius lines subtended 52 arcmin vertically and 20 arcmin horizontally and had a peak luminance of about 70 cd/m2 . The test stimulus was a vertical line which subtended 40 arcmin vertically and 2 arcmin horizontally. It had a Gaussian luminance profile with a vertical and horizontal SD of 1.2 arcmin. The stimulus appeared as a thin bright bar with slightly blurred edges above the fixation cross. The stimuli were presented on a uniform dark background of near-zero luminance that was constantly present. 4.4.4. Procedure In a preliminary session the adaptive up-down transformed response method (UDTR, Wetherill and Levitt, 1965) was used to estimate each subject’s discrimination threshold (calculated as the SD) for horizontal binocular disparity. There was considerable variation between subjects, their thresholds ranging from 0.22 to
0.51 arcmin, so that the same scale of physical values could not conveniently be used for all of them. To make the data more easily comparable when combined in the figure, with each set of data given similar weight, the horizontal disparity measures for each subject were expressed in units of that subject’s estimated SD, where a unit, referred to below as an arbitrary unit (AU), was defined as half the SD. For each subject horizontal disparity test stimuli were generated at intervals of 1 AU on this scale. All stimuli were within Panum’s fusional area (Krol and van de Grind, 1982). The use of arbitrary units here, to allow for large differences between subjects, was not necessary in the orientation experiment, where the differences were less, and constitutes a difference between the experiments. The MCS was used. At the beginning of each session the observer dark-adapted for 5 min to the uniform, blank display set at minimal luminance. Each trial began with the presentation of the fixation cross and vertical nonius lines, which the observer maintained fixation on. After 450 ms the test bar was added to the screen; it was displayed for 240 ms and then removed, leaving the fixation cross visible until the subject responded or 45 s elapsed. The fixation cross then disappeared and the black background alone was visible for a 240 ms inter-trial interval. Each subject attended three experimental sessions, held on consecutive days. Three ranges of test stimuli were employed, consisting of 11 equally spaced horizontal disparities centred on −2, 0 or 2 AU. A session consisted of a block of test stimuli employing one of these ranges: 132 trials, 12 trials at each of 11 stimulus values, were administered in random order. The ranges were presented in an ascending or descending sequence over sessions, balanced over subjects. 4.4.5. 
Results We asked whether the PSE would shift if there were no reference-test delay and, if so, whether the use of a permanent criterion would cause Span to be relatively small. Weibull and Gaussian psychometric functions were fitted to the data sets for each observer and range, and their overall goodness of fit was tested using χ². Departures from goodness of fit were significant for both functions, but the two functions were not significantly different from each other, and the departures from goodness of fit were not systematic. The Gaussian fits to the data for the three test ranges (summed over observers on the basis of AU values) are shown in the lower left panel of Fig. 8. The parameters for the fitted curves (PSE, SD) for the ranges centred at −2, 0 and 2 AU, respectively, are −2: 0.65, 2.91; 0: 1.22, 2.32; 2: 1.96, 2.11. A one-way ANOVA of the PSE values found Range highly significant (F (2, 6) = 16.0, p = 0.004). The best-fitting linear regression of PSE on range midpoint is illustrated in the lower right panel. This gives PSE = 1.277 + 0.328 midpoint. Span (the regression slope: 0.328, SE = 0.065) falls in the lower half of the range. All observers consistently underestimated the depth of the test stimuli. Since these were presented nearly half a degree above the fixation point, and since the vertical horopter is typically slanted in depth, this response bias corresponds with predictions derived from the binocular viewing geometry (Ledgeway and Rogers,
1999; Schreiber et al., 2008). As fixation on the reference stimulus was maintained, vergence and accommodation were constant, which may have reduced the range of perceived depth. The continuously visible monitor face provided a fusion lock as in most stereoscopic experimental settings, and the fixation cross was present throughout each trial; thus the reference value was constantly available and there could be no need to rely on a memory trace of the reference stimulus, yet PSE shifts were obtained, as predicted. This frustrates any argument linking PSE shifts to loss of clarity of a memory trace of the reference during an ISI. CST-DL predicted that Span would be high for ocular vergence but low for lateral disparity. Span was 0.83 in the former case and 0.33 in the latter. Vergence is considered to be a weak source of depth information which provides relatively low acuity (Foley, 1980). The average threshold (the SD of the psychometric function) was found to be 6.30 arcmin for ocular vergence but 0.49 arcmin for horizontal disparity. The difference may result from different levels of noise in the two systems, but the lower threshold for disparity may also reflect the advantage of employing a permanent criterion, the continual presence of the fixation cross, and the fusion lock provided by the monitor faces. 5. Discussion and Conclusion Criterion setting theory proposes that the sensory system applies a simple set of procedures to approximate the optimal decision criterion on each trial. The theory of discrimination learning erected on this basis, CST-DL, makes a number of predictions that have been tested here and have received significant support. Asymmetric test stimulus ranges produced PSE shifts for visual orientation discrimination, and for depth discrimination based on vergence or on binocular disparity.
These results appear robust against variation in procedure (MCS or MSS) and in ISI, and extend previous observations on SF (Lages and Treisman, 1998). The data show that isodiscriminal contextual stimuli contribute to criterion setting, and both preceding test and contextual stimuli produce sequential dependencies that occur in the patterns predicted. Similar results were found both for discrimination of the vertical (permanent criterion) and of obliques (relatively ad hoc criterion), with variances significantly smaller for the former than for the latter. Significant evidence was also obtained for criterion specificity: for the same discrimination and the same stimulus sets the criterion may differ depending on which isodiscriminal stimulus set is tested and which forms the context. This was shown both for orientation discrimination and, in a re-analysis of earlier data, for SF. CST-DL predicts smaller Span values for permanent than for ad hoc criteria. In agreement with this, the Span for disparity (0.33) is much less than for vergence (0.83). For orientation, the difference between the Spans for cardinal and oblique orientations is in the predicted direction but not significant. However, when these and other results are taken together (Treisman and Lages, 2010), the 9 points predicted to be large fall above Span = 0.5 and the 11 points predicted to be small fall below it. The probability of this occurring by chance is p = 2^−20 < 0.000001. These findings support the theory of discrimination learning and related phenomena developed here. CST-DL explains a number of phenomena very economically. Improvement in discrimination with practice (discrimination learning) arises from the continuing recruitment of indicator traces, giving more accurate criterion stabilization. Anisotropies result, at least in part, from highly practised discrimination learning, leading to the accumulation of traces from relevant past stimuli. Effective contextual stimuli exert their effects by projection. Sensitivity to contextual effects occurs because the CSM has the ability to hold simultaneously and sum together indicator traces from different sources, preceding test and isodiscriminal contextual trials. Anisotropies and contextual effects may at first sight seem very different phenomena, but in the present model they draw on the same capacity: to accumulate a large residue of traces that can provide a well-stabilized permanent criterion. For the vertical, for example, the system needs to be able to draw together traces from the many different types of vertically-oriented critical and contextual stimuli encountered in daily life. The finding that both sensory anisotropies studied, the visual vertical and the lateral disparity associated with fixation, were susceptible to criterion shifts induced by intrinsic context and, where examined, varied in response to effective extrinsic context, is evidence that anisotropies are not invariant features fixed by the physiological structure of the sensory system. The present results also reject a widely held traditional view that discrimination is a process of ‘comparing’ a sensory input to a fixed memory trace (Lages and Paul, 2006; Lages and Treisman, 1998).
In every comparison between CST-DL and FMTT, the former gave a better account of the data. An important alternative to memory trace theory was proposed by Bartlett (1932). He envisaged that memory was a ‘construction’ based on ‘a whole, active mass of organised past reactions or experience’ (p. 213), continually modified by new experiences, which he referred to as a schema. However, Bartlett did not offer a model of the processes underlying construction and use of the schema. CST proposes, and the present results support, a model for information retention that is similar in spirit to Bartlett’s schemata but provides an account of the underlying processes and a rationale for their operation. The present analysis has an important limitation: we have considered only test and contextual stimuli in the same modality. But judgments in one modality may be able to affect discriminations in another, and any understanding of anisotropies that does not include an account of cross-modal interactions is incomplete. This question is taken further by Treisman and Lages (2010). Acknowledgements The experimental work described herein was carried out by M. Lages in partial fulfilment of the requirements for the D.Phil. degree at Oxford University, under
the supervision of M. Treisman. Any errors in the theory presented herein are the responsibility of M. Treisman. This work was supported by an EPSRC studentship to M. L. and a BBSRC grant to M. T. References Appelle, S. (1972). Perception and discrimination as a function of stimulus orientation: the “oblique effect” in man and animals, Psychol. Bull. 78, 266–278. Backus, B. T. and Matza-Brown, D. (2003). The contribution of vergence change to the measurement of relative disparity, J. Vision 3, 737–750. Bartlett, F. (1932). Remembering: A Study in Experimental and Social Psychology. Cambridge University Press, London, UK. Benjamin, A. S., Diaz, M. and Wee, S. (2009). Signal detection with criterion noise: applications to recognition memory, Psychol. Rev. 116, 84–115. Blake, R., Cepeda, N. J. and Hiris, E. (1997). Memory for visual motion, J. Exper. Psychol.: Human Percept. Perform. 23, 353–369. Brenner, E. and van Damme, W. J. M. (1998). Judging distance from ocular convergence, Vision Research 38, 493–498. Collier, G. (1954). Intertrial association at the visual threshold as a function of intertrial interval, J. Exper. Psychol. 48, 330–334. Enright, J. T. (1991a). Exploring the third dimension with eye movements: better than stereopsis, Vision Research 31, 1549–1562. Enright, J. T. (1991b). Stereo-thresholds: simultaneity, target proximity and eye movements, Vision Research 31, 2093–2100. Essock, E. A. (1980). The oblique effect of stimulus identification considered with respect to two classes of oblique effects, Perception 9, 37–46. Farell, B., Li, S. and McKee, S. P. (2004). Coarse scales, fine scales, and their interactions in stereo vision, J. Vision 4, 488–499. Fechner, G. (1860/1966). Elements of Psychophysics, Vol. 1 (transl. H. E. Adler). Holt, Rinehart and Winston, New York, USA. Finney, D. J. (1971). Probit Analysis. Cambridge University Press, Cambridge, UK. Fisher, S. K. and Ciuffreda, K. J. (1988).
Accommodation and apparent distance, Perception 17, 609–621. Foley, J. M. (1976). Successive stereo and Vernier position discrimination as a function of dark interval duration, Vision Research 16, 1269–1273. Foley, J. M. (1980). Binocular distance perception, Psychol. Rev. 87, 411–434. Green, D. M. and Swets, J. A. (1966). Signal Detection Theory and Psychophysics. Wiley, New York, USA. Helson, H. (1947). Adaptation-level as a frame of reference for prediction of psychophysical data, Amer. J. Psychol. 60, 1–29. Helson, H. (1964). Adaptation-Level Theory: An Experimental and Systematic Approach to Behavior. Harper & Row, New York, USA. Howard, I. P. and Rogers, B. J. (1995). Binocular Vision and Stereopsis. Oxford University Press, New York, USA. Howard, I. P. and Templeton, W. B. (1966). Human Spatial Orientation. Wiley, New York, USA. Jesteadt, W., Luce, R. D. and Green, D. M. (1977). Sequential effects in judgments of loudness, J. Exper. Psychol.: Human Percept. Perform. 3, 92–104.
Krol, J. D. and van de Grind, W. A. (1982). Rehabilitation of a classical notion of Panum’s fusional area, Perception 11, 615–619.
Lages, M. and Paul, A. (2006). Visual long-term memory for spatial frequency? Psychonom. Bull. Rev. 13, 486–492.
Lages, M. and Treisman, M. (1998). Spatial frequency discrimination: visual long-term memory or criterion setting? Vision Research 38, 557–572.
Ledgeway, T. and Rogers, B. J. (1999). The effects of eccentricity and vergence angle upon the relative tilt of corresponding vertical and horizontal meridia revealed using the minimum motion paradigm, Perception 28, 143–153.
Lee, B. and Harris, J. (1996). Contrast transfer characteristics of visual short-term memory, Vision Research 36, 2159–2166.
Macmillan, N. A. and Creelman, C. D. (1991). Detection Theory: A User’s Guide. Cambridge University Press, Cambridge, UK.
Magnussen, S., Greenlee, M. W., Aslaksen, P. M. and Kildebo, O. O. (2003). High-fidelity perceptual long-term memory revisited — and confirmed, Psychol. Sci. 14, 74–76.
Matthews, N., Rojewski, A. and Cox, J. (2005). The time course of the oblique effect in orientation judgments, J. Vision 5, 202–214.
Mueller, S. T. and Weidemann, C. T. (2008). Decision noise: an explanation for observed violations of signal detection theory, Psychonom. Bull. Rev. 15, 465–494.
Orban, G. A. and Vogels, R. (1998). The neuronal machinery involved in successive orientation discrimination, Prog. Neurobiol. 55, 117–147.
Parducci, A. and Sandusky, A. (1965). Distribution and sequence effects in judgment, J. Exper. Psychol. 69, 450–459.
Pelli, D. G. (1997). The VideoToolbox software for visual psychophysics: transforming numbers into movies, Spatial Vision 10, 437–442.
Pelli, D. G. and Zhang, L. (1991). Accurate control of contrast on microcomputer displays, Vision Research 31, 1337–1350.
Quick, R. F. (1974). A vector magnitude model of contrast detection, Kybernetik 16, 65–67.
Rady, A. A. and Ishak, I. G. H. (1955). Relative contributions of disparity and convergence to stereoscopic acuity, J. Optic. Soc. Amer. 45, 530–534.
Regan, D. (1985). Storage of spatial-frequency information and spatial-frequency discrimination, J. Optic. Soc. Amer. A2, 619–621.
Regan, D., Erkelens, C. J. and Collewijn, H. (1986). Necessary conditions for the perception of motion in depth, Investigat. Ophthalmol. Vis. Sci. 27, 584–597.
Schreiber, K. M., Hillis, J. M., Filippini, H. R., Schor, C. M. and Banks, M. S. (2008). The surface of the empirical horopter, J. Vision 8, 1–20.
Semmlow, J. L., Hung, G. K., Horng, J.-L. and Ciuffreda, K. J. (1994). Disparity vergence eye movements exhibit preprogrammed motor control, Vision Research 34, 1335–1343.
Treisman, M. (1964). Noise and Weber’s law: the discrimination of brightness and other dimensions, Psychol. Rev. 71, 314–330.
Treisman, M. (1984a). A theory of criterion setting: an alternative to the attention band and response ratio hypotheses in magnitude estimation and cross-modality matching, J. Exper. Psychol.: General 113, 443–463.
Treisman, M. (1984b). Contingent aftereffects and situationally coded criteria, Annals of the New York Academy of Sciences 423, Timing and Time Perception, 131–141.
Treisman, M. (1985). The magical number seven and some other features of category scaling: properties of a model for absolute judgment, J. Math. Psychol. 29, 175–230.
Treisman, M. (1987). Effects of the setting and adjustment of decision criteria on psychophysical performance, in: Progress in Mathematical Psychology — I, E. E. Roskam and R. Suck (Eds), pp. 253–297. Elsevier Science Publishers B. V. (North-Holland), Amsterdam, The Netherlands.
Treisman, M. and Faulkner, A. (1984a). The setting and maintenance of criteria representing levels of confidence, J. Exper. Psychol.: Human Percept. Perform. 10, 119–139.
Treisman, M. and Faulkner, A. (1984b). The effect of signal probability on the slope of the receiver operating characteristic given by the rating procedure, Brit. J. Math. Statistical Psychol. 37, 199–215.
Treisman, M. and Faulkner, A. (1985). Can decision criteria interchange locations? Some positive evidence, J. Exper. Psychol.: Human Percept. Perform. 11, 187–208.
Treisman, M. and Faulkner, A. (1987). Generation of random sequences by human subjects: cognitive operations or psychophysical process? J. Exper. Psychol.: General 116, 337–355.
Treisman, M. and Lages, M. (2010). Sensory integration across modalities: how kinesthesia integrates with vision in visual orientation discrimination, Seeing and Perceiving 23, 435–462.
Treisman, M. and Williams, T. C. (1984). A theory of criterion setting with an application to sequential dependencies, Psychol. Rev. 91, 68–111.
Vogels, R. and Orban, G. A. (1986). Decision factors affecting line orientation judgments in the method of single stimuli, Percept. Psychophys. 40, 74–84.
Ward, L. M. (1982). Mixed-modality psychophysical scaling: sequential dependencies and other properties, Percept. Psychophys. 31, 53–62.
Ward, L. M. (1990). Mixed-method mixed-modality psychophysical scaling, Percept. Psychophys. 48, 571–582.
Ward, L. M. and Lockhead, G. R. (1970). Sequential effects and memory in category judgments, J. Exper. Psychol. 84, 27–34.
Watson, A. B. (1979). Probability summation over time, Vision Research 19, 515–522.
Watt, S. J., Akeley, K., Ernst, M. O. and Banks, M. S. (2005). Focus cues affect perceived depth, J. Vision 5, 834–862.
Weber, E. H. (1834/1996). E. H. Weber on the Tactile Senses (transl. H. E. Ross and D. J. Murray). Erlbaum and Taylor & Francis, Hove, UK.
Westheimer, G. (1979). Cooperative neural processes involved in stereoscopic acuity, Exper. Brain Res. 36, 585–597.
Wetherill, G. B. and Levitt, H. (1965). Sequential estimation of points on a psychometric function, Brit. J. Math. Statistical Psychol. 18, 1–10.
Wright, W. D. (1951). The role of convergence in stereoscopic vision, Proc. Phys. Soc. 64, 289–297.
Sensory Integration Across Modalities: How Kinaesthesia Integrates with Vision in Visual Orientation Discrimination

Michel Treisman 1,∗ and Martin Lages 2

1 Department of Experimental Psychology, University of Oxford, Oxford OX1 3UD, UK
2 Department of Psychology, University of Glasgow, 58 Hillhead Street, Glasgow G12 8QB, Scotland, UK
Abstract
Stimuli in one modality can affect the appearance and discriminability of stimuli in another, but how they do so is not well understood. Here we propose a theory of the integration of sensory information across modalities. It is based on criterion setting theory (CST; Treisman and Williams, 1984), an extension of signal detection theory which models the setting and adjustment of decision criteria. The theory of sensory integration based on CST (CST-SI) offers an account of cross-modal effects on sensory decision-making; here we consider its application to orientation anisotropy. In this case, CST-SI postulates that the postural senses are concerned with the relations between momentary body posture and the cardinal dimensions of space, vertical and horizontal; that they also contribute to stabilizing the perception of the cardinal orientations in vision through actions on the corresponding visual decision criteria; but that they have little effect on the perception of diagonal orientations. Predictions from CST-SI are tested by experimentally separating the contributions that different information sources make to stabilizing the visual criteria. It is shown that reducing relevant kinaesthetic input may increase the variance for discrimination of the visual cardinal axes but not the obliques. Predictions that shifts in the location of the psychometric function would be induced by varying the distribution of the test stimuli, and that this effect would be greater for oblique than for cardinal axes, were confirmed. In addition, peripheral visual stimuli were shown to affect the discrimination of cardinal but not oblique orientations at the focus of vision. These results support the present account of anisotropies.
Keywords: Sensory integration, orientation anisotropy, oblique effect, context, decision criterion, criterion setting theory
* To whom correspondence should be addressed. E-mail: [email protected]
1. Introduction
Sensory discrimination may be enhanced at certain points on a sensory dimension. An example is orientation anisotropy or the oblique effect (Appelle, 1972; Essock, 1980; Howard and Templeton, 1966), the superior discrimination of the vertical or horizontal (‘cardinal’ or ‘principal’) axes as compared with oblique and other orientations. Lages and Treisman (2010) have presented a theory of sensory decision-making and discrimination learning which provides an account of anisotropies for stimuli within a single modality. But a fuller understanding requires that we also consider cross-modal contributions. Below we extend the theory of discrimination learning to provide an account of cross-modal sensory integration and of its role in orientation and other anisotropies, and test it experimentally.

In Fechner’s seminal model of discrimination (Fechner, 1860/1966), a standard or reference stimulus was presented, together with or followed by a test stimulus, and the subject either did or did not ‘notice’ a difference between them; the measure of discrimination was the ‘just noticeable difference’ (jnd). This assumed that a perception of the test stimulus was compared with that of the reference stimulus when they were simultaneous, or with a representation or copy, a unique memory trace of it, when they were not. To explain orientation anisotropy on the basis of such a model would require that the cardinal axes produce particularly strong, long-lasting representations or fixed memory traces (FMTs) to define them. The greater stability of these memory traces would give them a special resistance to forgetting and to external sources of noise and interference, such as adaptation and contextual effects, resulting in improved discrimination.
An alternative to the Fechnerian approach was provided by signal detection theory (SDT; Green and Swets, 1966; Macmillan and Creelman, 1991; Tanner and Swets, 1954; Thurstone, 1927) which introduced a more sophisticated account of the role of noise, and replaced the concept of a fixed threshold with that of an adjustable decision criterion, to which the assumption of criterion variability was added by Wickelgren (1968). This approach was taken further by criterion setting theory (CST; Treisman, 1984a, b, 1985, 1987; Treisman and Faulkner, 1984a, b, 1985, 1987; Treisman and Williams, 1984) which offers an account of the setting of criteria and their modification from trial to trial. Lages and Treisman (this issue) have presented a theory of discrimination learning based on criterion setting theory (CST-DL) which also models contextual effects and anisotropies. While Lages and Treisman (2010) have successfully tested predictions derived from this model, for both orientation and depth discrimination, CST-DL has a major limitation: it is restricted to test and contextual stimuli in the same modality. But information from more than one sensory modality may be used in judging orientation in visual space: vision may be supplemented by information from other postural senses, the vestibular system, proprioceptive sensation from muscles, ligaments and joints, and somaesthetic sensation from touch and pressure; we use the term ‘kinaesthesia’ to cover these senses. For example, in Mittelstaedt’s (1983) study of the effect of posture on recognition of the visual vertical, when observers
who had been rotated away from the vertical were asked to adjust a tilted line to the subjective visual vertical, they placed it at an orientation rotated towards the current position of the body axis. This demonstrates that kinaesthesia affects the criterion for the visual vertical, which raises the question: how does postural or other cross-modal information affect the value of the visual criterion? The present paper develops a theory of sensory integration based on CST (CST-SI) that provides a basis for cross-modal interactions, and we test the theory in orientation discrimination. As the theory derives from CST-DL, we commence by briefly summarising its main features; for a fuller account and further references see Lages and Treisman (2010).

2. Criterion Setting Theory and Discrimination Learning
SDT substitutes the concepts of noise and a computable criterion for the traditional fixed memory trace theory (FMTT), in which the test stimulus is compared with an unchanging copy or representation of the standard or reference stimulus. But it assumes that the criterion is fixed over the course of a session, which does not allow for the changeability of the environment within which we normally operate. Criterion setting theory assumes that a criterion setting mechanism (CSM) attempts to compute the best current value of the criterion on each trial. It aims to optimize the criterion by making use of all available information, and it scales and modifies its records of such information to reflect its quality, initial relevance and changing relevance as time passes. CST includes the following mechanisms.

2.1. The Criterion Stabilization Mechanism
This mechanism adjusts the criterion to ensure that it is centrally placed, so as to transmit the maximum information, in relation to the distributions of inputs from signal and noise or from different magnitudes of the signal. This is illustrated in Fig. 1.
An initial reference decision criterion, E0, is selected, based on previous experience, the experimental parameters, or the first few stimuli presented. On each successive trial k the CSM updates the current effective criterion Ec(k) according to rules that will tend to make it closer to optimal, given the information available. It does so by calculating Ec(k) as the sum of E0 and records of needed adjustments to the criterion, referred to as stabilization indicator traces, retained from earlier trials. If on trial k the sensory input Ek is registered on the decision axis, its deviation from the criterion, (Ek − Ec(k)), indicates the direction in which the criterion should shift on the next trial. The indicator trace is further modified by a magnitude or weighting parameter, Δs, determining the extent of the shift, and a decrementation parameter, δs, giving the rate at which the indicator trace declines over trials. Thus on trial k + 1, the magnitude of (a positive) trace will be Δs(Ek − Ec(k)) − δs. (The absolute magnitude of the trace declines by δs on each trial and it disappears on reaching zero.) Discrimination responses are of no value unless they transmit information. To maximize information transmitted, the criterion should lie at the mean of the current sensory inputs. The stabilization mechanism uses negative feedback to ensure the criterion tracks the central tendency of the sensory inputs if this changes.

Figure 1. Criterion setting theory: operation of the stabilization mechanism when an initial reference stimulus is followed by a random sequence of test stimuli; four trials are shown. Distributions of sensory effects are shown on the decision axis, E, for two of the test stimuli, sn and sn+1. On each trial i, (a) the current criterion value Ec(i) is computed as the sum of the initial reference value E0 plus indicator traces persisting from previous trials; (b) the current input Ei is registered and the direction of the difference between it and Ec(i) determines the response; (c) the indicator trace for trial i is set up. This stores an estimate of the magnitude and direction of criterion adjustment required to optimize the criterion. Indicator traces decline to zero over trials.

2.2. The Probability Tracking Mechanism
This uses a different source of information, the best estimate of the probability of the signal. Our best guides to that are our previous responses; negative or positive detections should lower or raise the probabilities we assign to the presence of the target. The probability tracking mechanism resembles the stabilization mechanism except that it uses positive feedback: a positive response should lower the criterion, encouraging more positive responses; a negative response should raise it. Tracking indicator traces have a constant initial absolute value Δr and the trace decreases by δr on each trial until it reaches zero and disappears.
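As an illustrative sketch (not the authors' implementation), the stabilization and probability tracking mechanisms just described can be simulated directly: the criterion on each trial is E0 plus the surviving indicator traces, and each trial lays down one negative-feedback stabilization trace and one positive-feedback tracking trace. Parameter names mirror the text's Δs, δs, Δr, δr; their numerical values here are arbitrary assumptions.

```python
def simulate_cst(inputs, E0=0.0, w_s=0.2, dec_s=0.01, w_r=0.1, dec_r=0.01):
    """Return the sequence of effective criteria Ec(k) for a series of sensory inputs Ek.

    w_s, dec_s play the role of the text's weighting and decrementation
    parameters for stabilization traces; w_r, dec_r for tracking traces.
    """
    traces = []    # list of (signed trace value, per-trial decrement)
    criteria = []
    for Ek in inputs:
        Ec = E0 + sum(t for t, _ in traces)         # (a) criterion = E0 + surviving traces
        criteria.append(Ec)
        positive = Ek > Ec                           # (b) decision on this trial
        # (c) lay down indicator traces that act from the next trial on:
        traces.append((w_s * (Ek - Ec), dec_s))      # stabilization: shift toward the input
        traces.append((-w_r if positive else w_r, dec_r))  # tracking: positive feedback
        # each trace decays toward zero by its own decrement; at zero it disappears
        traces = [(t - d if t > 0 else t + d, d) for t, d in traces if abs(t) > d]
    return criteria
```

With tracking switched off, the criterion converges on the central tendency of the inputs, which is the stabilization mechanism's stated goal; with tracking on, runs of positive responses pull the criterion down, reproducing the positive-feedback behaviour described above.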
2.3. Feedback
Feedback from external sources may be dealt with in a similar manner. As feedback is not employed below it is not further discussed here. In CST the different information sources are integrated by simply summing the currently available indicator traces to determine the net criterion shift.

CST provides a theory of the effect of preceding trials on current decision-making. Lages and Treisman (2010) have extended CST to provide a theory of discrimination learning (CST-DL) which includes effects of past trials and contextual stimuli in the same modality; this allows it to account for a number of further phenomena. Under the extended theory, the retention of information from past occasions can improve performance over time by making current criteria more stable, which constitutes a mechanism of discrimination learning. Where a discrimination is frequently made in daily life the result may manifest as an anisotropy. CST-DL adds the following assumptions to CST (Lages and Treisman, 2010).

(A1) The CSM can adjust the magnitude parameters, Δs and Δr, to weight more informative or reliable traces more highly, and it can adjust the decrementation parameters, δs and δr, to ensure that relevant traces are preserved and traces that lose their relevance are decremented at the same rate.

(A2) Frequently recurring discriminations produce criteria which may be described as permanent, since the residue of traces preserved from past discriminations ensures they are more effectively stabilised and maintained. For a novel task, the first few stimuli or other available information may be used to set up an ad hoc criterion, which will have less support and be more variable.

(A3) Earlier test stimuli that contribute indicator traces to the CSM for the current criterion constitute the ‘intrinsic context’. There is also an ‘extrinsic context’ of ambient stimuli, some of which may be effective in modifying the test stimulus criterion.
Effective contextual stimuli are those in the same modality that are ‘isodiscriminal’ to the test stimuli, in that they are discriminable on the same dimension for a similar range of values.

(A4) Effective extrinsic context acts on Ec(test) through a projection process which copies indicator traces generated by discriminations of the isodiscriminal contextual stimuli and transfers or ‘projects’ these copies to CSM(test), which adds them to the store of intrinsic traces, modifying their parameters if appropriate. The projected traces are summed with the intrinsic indicator traces to determine a value for Ec(test).

Lages and Treisman (2010) have derived and successfully tested a number of predictions given by CST-DL. Of these, three will be tested again in the course of the study below. These are:

(DL-P1) An asymmetric distribution of test stimuli can cause the point of subjective equality (PSE) of the psychometric function to shift towards the midpoint of the test stimulus range. (R(m) represents a range with
midpoint m.) This PSE shift will occur whether the criterion is permanent or ad hoc.

(DL-P2) Permanent criteria undergo smaller PSE shifts than ad hoc criteria. A measure of PSE shift, Span, is given by the slope of the regression of obtained PSE values onto range midpoints (Lages and Treisman, 2010).

(DL-P4) Criterion variance is smaller for permanent than for ad hoc criteria.

Below we extend CST-DL to cover cross-modal effects, including cross-modal contributions to determining anisotropies.

3. A Criterion Setting Theory of Sensory Integration (CST-SI)
Our environment is dominated by the vertical direction of the force of gravity and the horizontal surface of the earth. We navigate through it seeking to maintain our stability in relation to these axes. One sign of the effort that goes into this is our superior visual discrimination of the cardinal axes. While CST-DL provides an account of how we use visual information from previous discrimination trials and from the visual context to maintain this high performance, an account limited to the visual modality may tell only part of the story. There is considerable evidence that other modalities can affect visual orientation discrimination (Howard and Rogers, 1995; Howard and Templeton, 1966). Judgments of the visual vertical are most acute when the body is in its normal upright position, and acuity decreases with increasing tilt of the body (Fregly et al., 1965). Previous studies have demonstrated that judgments of orientation may be affected by inputs in more than one modality, including somaesthetic and vestibular (Buchanan-Smith and Heeley, 1993; Fahle and Harris, 1998; Howard and Templeton, 1966). While these findings demonstrate interaction between vision and kinaesthesia in orientation, the mechanisms that integrate information from different modalities remain to be explained.
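The Span measure defined in (DL-P2) above is simply an ordinary least-squares slope of obtained PSE values on range midpoints. A minimal sketch (the function name is ours): a Span near 1 indicates that the PSE follows the range midpoint fully (a labile, ad hoc criterion), while a Span near 0 indicates a criterion resistant to range shifts (a permanent criterion).

```python
def pse_span(midpoints, pses):
    """OLS slope of obtained PSEs regressed onto test-range midpoints (the Span measure)."""
    n = len(midpoints)
    mx = sum(midpoints) / n
    my = sum(pses) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(midpoints, pses))
    sxx = sum((x - mx) ** 2 for x in midpoints)
    return sxy / sxx
```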
CST-DL offers an account of contextual effects when stimuli are isodiscriminal (Lages and Treisman, 2010) but cross-modal stimuli necessarily do not share the same discriminal dimension. It may also be possible to have contextual interactions between stimuli in the same modality that are not isodiscriminal. To account for cross-modal sensory integration in anisotropies, we extend CST-DL to provide a criterion setting theory of sensory integration in discrimination (CST-SI) by adding a fifth assumption. (A5) Stimuli in different modalities (or the same modality) may be functionally linked; such stimuli may act as effective extrinsic context for each other. Judgments of the distance of an object and judgments of its size are not isodiscriminal. Different sources of information enter into each discrimination. But they are functionally linked, in that the distance and retinal size of a given object vary inversely, so that information provided by one source may be relevant to judging the other, and variation in either dimension may have the same implications for behaviour. This could make it advantageous, under appropriate conditions, for judgments of retinal size to contribute to setting criteria for judgments of distance. In general,
we assume that if two sets of non-isodiscriminal stimuli, A and B, are functionally relevant to each other, then either set may act as effective extrinsic context, contributing to setting criteria for the other. The concept of functional relevance will include cases where a real-world relationship imposes a correlation between the values to be expected of two different inputs, as with size and distance in depth perception. It will cover cases where two sets of inputs may be determined by a common cause. Consider an observer who is on the watch for signs of alarm in members of two species of birds, indicating the possible detection of a predator; if he sees such signs in one species, this might justify lowering his criterion for detecting alarm in the other. It will apply when the consequences for the behaviour of the observer of detecting a high value on one dimension are the same as the consequences of detecting a high (or if inversely related, low) value on the other. In all cases where coordinating decisions on two dimensions may improve the efficiency of behaviour, we make the assumption: If variation of contextual stimuli A along dimension A is functionally relevant to variation of stimuli B (which could be the same stimuli) along dimension B , indicator traces generated by covert or overt A judgments may be projected to the corresponding criterion setting mechanism for dimension B , CSM(B ). Thus discrimination of the direction of gravity in proprioception is functionally relevant to and so may affect the discrimination of departures from the vertical in vision. This functional link arises because detection of a given deviation from the direction of gravity (which defines the visual vertical) in either modality would require the organism to make a similar corrective response to maintain its posture. 
Thus functional relatedness provides the condition for kinaesthetic discriminations of body position to project indicator traces to the mechanisms setting the criterion for the visual vertical. The experiment below further tests the three predictions, (DL-P1), (DL-P2) and (DL-P4), given above. To these CST-SI adds three further predictions: (SI-P1) Homomodal stimuli that are isodiscriminal to the test stimuli may be effective as extrinsic context whatever the number of orthogonal dimensions on which the sets may differ. Lages and Treisman (1998, 2010) examined whether isodiscriminal contextual stimuli needed to look similar to the critical stimuli to act as effective context. They showed that such stimuli could modify the criterion for the critical stimuli despite differing in value on an orthogonal dimension. For example, the criterion for discriminating spatial frequencies, using SF test stimuli that are vertical, can be modified by contextual SF stimuli that are horizontal. We take this further: the principle that all relevant information should be used in criterion setting requires that the possession of useful information, not overall appearance, should determine the relevance of context. Thus isodiscriminal stimuli can be effective context even if they differ in many aspects of appearance.
(SI-P2) Functionally relevant kinaesthetic inputs may contribute to setting the criteria for visual discrimination of cardinal orientations. Although kinaesthesia and vision are not isodiscriminal, they share common implications for behaviour. Since the need to stay erect and avoid falls requires kinaesthetic discrimination of the direction of gravity and the slope of the ground, but not of obliques, a significant kinaesthetic input may be made to the criteria for the visual cardinal axes, but little or none for obliques.

(SI-P3) If depriving the observer of information from source A, or from source B (where these are contextual sources in the same or different modalities), impairs discrimination, CST-SI predicts that depriving the observer of both sources of information should impair discrimination more. This assumes that the A and B indicator trace magnitude parameters remain the same under both conditions. Thus if depriving the observer of kinaesthetic information, or of peripheral visual information, impairs discrimination of the vertical at the centre of vision, the impairment when the observer loses both should be greater than for either one alone.

4. An Experiment on Orientation Discrimination
The aim was to study the effects of two types of extrinsic context on the variance of discrimination and on criterion shifts. The subject discriminated the orientations of stimuli presented on a computer screen in central vision. (We use 0(180)° to represent the horizontal, 45° for the right oblique, 90° for the vertical and 135° for the left oblique, with respect to the oculocentric (and also gravitational) vertical.) Two sources of extrinsic information were studied: (a) peripheral visual stimuli, not nominally part of the experimental stimulus presentation, which were left in view or hidden; and (b) covert postural discriminations serving to maintain body posture. These have two functions that are relevant here.
First, they serve to maintain posture in relation to gravity. When the observer is in a normal sitting position, facing a vertical screen, the direction of a vertical feature on the display is closely aligned with the direction of gravity acting on the observer’s body. We compare this with a condition in which the relation between the gravitational vertical, as detected by the postural senses, and the visually defined vertical seen on the screen is removed. This is done by having subjects lie on their backs and judge stimuli on a visual display placed horizontally above their heads (see Fig. 2). The direction of gravity is now orthogonal to the display and to the long axis of the body. However, the second major function of the postural senses remains: to provide the observer with a sense of the ongoing relations between different body parts, and in particular a sense of the location of the long axis of the body and head, which can be related to components of the environment. Information relating to the long axis of the body may still be of use in defining the vertical on the horizontal display screen.
Figure 2. Observation conditions. The subject may or may not use a viewing tube and goggles, and may sit or lie flat.
4.1. Method
Orientation discrimination was measured using the method of single stimuli (MSS). The reference stimulus was presented at the start of each block, once only, and judgments were subsequently made of test stimuli presented alone. These were sine-wave gratings presented on a computer display. The initial reference stimulus consisted of a grating at an orientation of (one of) 45°, 90°, 135° or 180°. This was followed after an interval by a block of randomly ordered test stimuli. Visual surrounds were occluded (V−) or visible (V+), allowing the effect of peripheral visual cues to be addressed. Comparing performance when the observer is sitting upright (S) or lying supine (L) allows the gravitational contribution to kinaesthetic orientation discrimination to be controlled. When the subject sits upright, somaesthetic, proprioceptive and vestibular inputs may contribute to stabilising the visual criteria for the principal axes. When the observer is supine, with the display now horizontal overhead, the gravitational vertical is perpendicular to the display screen, largely excluding gravitational cues to the axes on the horizontal screen. When the observer lies flat and also uses the viewing tube to block peripheral vision, the effects of the two deprivations are combined. Two SFs were used, 2 cpd and 10 cpd. It might seem that the latter would offer more information than the former, having about five times as much oriented contour. It thus seemed possible that the high-frequency stimuli would produce steeper psychometric functions. Alternatively, the eye might reach its ceiling in extracting orientation information with the 2 cpd stimuli, in which case the 10 cpd results would be similar.

4.2. Subjects
One of the authors (ML) and three graduate students, aged between 24 and 28 years, with normal or corrected-to-normal visual acuity, participated in this experiment. All participants were experienced in psychophysical discrimination tasks.
Except for ML, they were naive as to the aim of the experiment. They attended four sessions of two hours each, on different days. They were not paid.

4.3. Apparatus and Stimuli
The tasks were programmed in C/C++ and run on a Macintosh PowerPC with a Macintosh 12 inch high-resolution monitor. The monitor was a cathode-ray tube with aluminized PC104 and PC193 phosphors. This display appeared achromatic. The monitor was calibrated using a Minolta LS-110 photometer with close-up lens, using routines from VideoToolbox (Pelli and Zhang, 1991). The on-screen luminance modulation was improved by using a video attenuator that combined the red, green and blue output signals from the computer’s 8-bit DACs in order to simulate a linear 12-bit display. The monitor had a frame rate of 67 Hz. The centre of the screen was viewed binocularly at a distance of 114 cm. In the V− condition it was viewed through a tube of the same length. The tube was 15.25 cm (6 in) in diameter and presented a circular aperture of 7.6° visual angle at the screen. It was airbrushed inside with non-reflecting black and bore virtually no markings. Black goggles were attached to the near end of the tube to exclude peripheral visual cues such as the monitor frame, other equipment and corners of walls. The observer was seated comfortably in front of the monitor and keyboard in a darkened cubicle, and rested his or her head on a chin-rest to view the monitor through the goggles. All stimuli were presented at the centre of the screen on a uniform background of 38.7 cd/m². This background continued between trials. The reference and test stimuli were sine-wave gratings presented at random spatial phase; they varied in orientation and were at one of two values of spatial frequency, 2 or 10 cycles per degree (cpd). The stimuli subtended 7.2° visual angle (14.4 cm by 14.4 cm). Stimulus and surround subtended 7.6° visual angle when viewed through the tube.
The stimuli had a mean luminance of 38.7 cd/m² and a Michelson contrast of 20%, and were windowed by a circular Gaussian spatial envelope with a standard deviation of 1.2° visual angle, centred on the screen. A random-noise mask consisted of a pattern of randomly selected black and white pixels with the same average luminance and Michelson contrast as the test stimuli. Orientations of the reference gratings were 45°, 90°, 135° or 180°. The midpoints of the ranges of test stimuli were shifted clockwise by 2° from the corresponding reference orientations: the ranges were distributed around midpoints of 43°, 88°, 133° and 178°. For each reference orientation, at each spatial frequency, a set of 11 test stimuli covered a range of ±5° around the midpoint in steps of 1°; a further 10 stimuli were prepared for training trials. Under the restricted viewing condition (V−) the observer viewed the stimuli through the circular tube and goggles. In the unrestricted viewing condition (V+) the tube was removed, so that vertical, horizontal and other contours of equipment,
Sensory integration in orientation anisotropy
furniture and walls were visible in the visual periphery of the black, dimly lit cubicle. Background illumination was approximately 6.5 cd/m². To control postural cues, observers either sat in an upright position with their heads supported by a chin rest (S) or lay supine with their heads supported by a head rest (L). Stimuli were displayed on a monitor mounted in front of the observer (Condition S) or vertically above the observer with the screen horizontal (Condition L).

4.4. Procedure

Each observer attended four sessions, on different days, each with a different testing condition. The two viewing conditions and two postural conditions were combined factorially to give four testing conditions for the four sessions. A session comprised four blocks of test trials, one block for each reference orientation, 45°, 90°, 135° and 180°. The sequence of testing conditions across sessions and the sequences of blocks within sessions were randomized for each observer. Blocks of trials were separated by a rest interval of 3–4 min. At the beginning of each block a 2 cpd reference grating of oblique or cardinal orientation was presented for 10 s, followed by a random-pixel mask for 0.5 s. Then a 10 cpd grating of the same orientation was presented for 10 s, again followed by a mask for 0.5 s. The reference stimuli were always given in this order. After a retention interval of 10 s, whose end was signalled by a short beep, the block of test trials began with twenty warm-up trials. These were followed by twelve repetitions of each of the eleven low and eleven high spatial frequency test stimuli, with 2 cpd and 10 cpd trials randomly intermixed. Each test stimulus was presented for 1.0 s followed by a random-noise mask which persisted until the observer pressed a response key.
Following each response, there was a 1.5 s intertrial interval during which only the uniform blank background was visible. On each trial the observer had to decide whether the grating was tilted clockwise or counter-clockwise from the initial reference grating. The subject responded by pressing one of two labelled keys. No feedback was given.

5. Results

The main experimental questions relate to the slopes of the psychometric functions and the PSE shifts. Both Weibull and Gaussian distribution functions were fitted to each data set. However, the ratio of summed χ² values for the two function types significantly favoured Gaussian over Weibull psychometric functions (F(968, 968) = 1.22, p < 0.05). Therefore, only results from the Gaussian analyses are reported below.

Figure 3 plots the proportion of counterclockwise responses for each stimulus value and the Gaussian psychometric functions fitted to each combination of viewing condition and body position, for oblique gratings on the left and principal axes on the right. The two upper panels give results for 2 cpd stimuli and the lower panels for 10 cpd. To prepare representative curves for plotting, the data were pooled across observers, as individual effects were comparable, and were pooled for the two oblique orientations and for the two principal axes. The difference between filled and empty symbols of the same shape illustrates the effect of peripheral visual stimulation. A comparison of circles and triangles, both filled or both empty, illustrates the effect of reducing gravitational cues.

Figure 3. Gaussian psychometric functions fitted to the proportions of CCW responses, for oblique gratings on the left and principal axes on the right. Upper panels: judgments of 2 cpd stimuli; lower panels: 10 cpd. Curves are shown for each combination of viewing condition and body position. S/V+: filled circles; L/V+: filled triangles; S/V−: empty circles; L/V−: empty triangles.

Table 1(a) shows the mean standard deviations of the psychometric functions for the four testing conditions for the oblique and principal orientations. Figure 3 and Table 1 provide answers to the questions raised earlier (significance tests are presented below). The first prediction (DL-P1) was that PSE shifts would be generated by both ad hoc and permanent criteria. In each panel in Fig. 3, the right vertical line is the reference value, and the left vertical line represents the midpoint of the test stimulus range. In every case, for both oblique and cardinal orientations, at the 50 per cent level the psychometric function is shifted away from the reference value in the direction of the range midpoint. If the cardinal axes were anchored by fixed and unchanging memory traces, such PSE shifts should not occur.
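The Gaussian psychometric functions used here are cumulative normals: the fitted mean gives the PSE and the fitted standard deviation is the SD (inverse slope) reported in Table 1(a). A minimal maximum-likelihood fit can be sketched as follows. The response counts are illustrative, not the published data, and the coarse grid search merely stands in for whatever optimizer was actually used.

```python
import math

def norm_cdf(x, mu, sigma):
    """Cumulative normal: the Gaussian psychometric function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Illustrative response counts (NOT the published data): 12 presentations
# of each of 11 test orientations, in degrees from the range midpoint.
orientations = list(range(-5, 6))
n_trials = 12
n_ccw = [0, 0, 1, 2, 4, 6, 8, 10, 11, 12, 12]  # counter-clockwise counts

def neg_log_likelihood(mu, sigma):
    nll = 0.0
    for x, k in zip(orientations, n_ccw):
        # Clamp p away from 0 and 1 so the log-likelihood stays finite
        p = min(max(norm_cdf(x, mu, sigma), 1e-6), 1.0 - 1e-6)
        nll -= k * math.log(p) + (n_trials - k) * math.log(1.0 - p)
    return nll

# Coarse grid search for the maximum-likelihood estimates: mu is the PSE,
# sigma the SD (inverse slope) of the fitted function.
_, pse, sd = min(
    (neg_log_likelihood(m / 10.0, s / 10.0), m / 10.0, s / 10.0)
    for m in range(-30, 31) for s in range(5, 51))
```

For these symmetric illustrative counts the recovered PSE lies at the range midpoint and the SD near 2°; in the experiment the PSE is then compared with the reference orientation to measure the criterion shift.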
Table 1.
(a) Mean standard deviations for the four testing conditions (S = Sitting, L = Lying; V+ = visual periphery visible, V− = visual periphery occluded), for the two classes of orientation (O = Obliques, P = Principal)

        Obliques   Principal axes
S/V+    2.26       0.35
L/V+    2.16       0.31
S/V−    2.42       0.89
L/V−    2.19       1.43

(b) Mean Half-Spans for eight conditions, compared across the two orientations (left) and the two postures (right) (2 = 2 cpd; 10 = 10 cpd)

          Oblique   Principal              Sitting   Lying
S/V+/2    0.620     0.254       O/V+/2     0.620     0.443
S/V+/10   0.639     0.265       P/V+/2     0.254     0.191
S/V−/2    0.485     0.372       O/V+/10    0.639     0.329
S/V−/10   0.547     0.311       P/V+/10    0.265     0.121
L/V+/2    0.443     0.191       O/V−/2     0.485     0.373
L/V+/10   0.329     0.121       P/V−/2     0.372     0.290
L/V−/2    0.373     0.290       O/V−/10    0.547     0.490
L/V−/10   0.490     0.296       P/V−/10    0.311     0.296

(c) Half-Spans: interaction between Orientation and Spatial Frequency

          Oblique             Principal
          45°       135°      90°       180°
2 cpd     0.873     0.0875    0.342     0.211
10 cpd    0.492     0.510     0.288     0.208
The second prediction (DL-P2) was that PSE shifts would be larger for ad hoc than for permanent criteria. Thus Half-Span = (PSE − Reference orientation)/(Range midpoint − Reference orientation) should be greater for oblique orientations than for the principal axes. The results are listed in Table 1(b) for 8 conditions under Oblique and Principal orientations. In each case the mean Half-Span is greater for Obliques than for Principal axes.

Third (DL-P4), the variance should be less for the cardinal axes than for the obliques. Figure 3 shows that the psychometric functions are steeper for the cardinal axes. Table 1(a) and Fig. 4 confirm that in every condition SD is less for the cardinal axes than for the obliques.

The fourth question (SI-P1) was whether large differences in appearance between the test and isodiscriminal contextual stimuli would exclude an effect of the latter on the criterion. The peripheral visual stimuli in this experiment differed in many respects from the test stimuli, but they were mainly oriented at or near the cardinal directions. The differences between the S/V+ and S/V− conditions and between the L/V+ and L/V− conditions in the right-hand panels of Fig. 3 confirm that occluding the peripheral visual stimuli considerably increased the variance of the psychometric functions for the cardinal axes (see also Table 1(a)), but did not detectably affect the variance of the obliques (left-hand panels). It appears that, however different isodiscriminal contextual stimuli may otherwise appear, perception can abstract the presence of the relevant dimension and use this information in maintaining the criterion.

Prediction (SI-P2) implies that kinaesthetic inputs carrying information about the cardinal axes will contribute to stabilizing the criteria for the corresponding visual dimensions. If the kinaesthetic contributions are reduced, the criterion variance will increase, reducing the slopes of the visual psychometric functions. This prediction is tested by the difference between the Sitting and Lying conditions, and is supported, when the peripheral visual stimuli are occluded, by the steeper psychometric functions for S/V− than for L/V−. With the visual periphery occluded, the SD is 60% greater when subjects lie supine than when they sit upright. The expectation that this would be shown for the cardinal axes, but weakly if at all for the obliques, is also confirmed. The predictions that the peripheral visual and the kinaesthetic inputs would make a greater contribution to setting criteria for the cardinal axes than for other visual orientations, and that removing these inputs would reduce the cardinal axes' advantage, decreasing the anisotropy, are supported, as can be seen in Fig. 4.

Figure 4. The mean standard deviations in the four viewing conditions are plotted separately for the left and right oblique and the vertical and horizontal orientations. S/V+: filled circles; L/V+: filled squares; S/V−: empty circles; L/V−: empty squares. The data for 0° and 180° are the same.

Finally (SI-P3), we expected that the effects of removing kinaesthetic information and peripheral visual information should be roughly additive. But this was confirmed only in part.
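The Half-Span index used in these comparisons is straightforward to compute; a minimal sketch (the function name is assumed), using the design values from the Method (reference 45°, test-range midpoint 2° clockwise at 43°):

```python
def half_span(pse, reference, range_midpoint):
    """Half-Span = (PSE - reference) / (range midpoint - reference):
    0 means the criterion stays anchored at the reference; 1 means the
    PSE has been dragged all the way to the test-range midpoint."""
    return (pse - reference) / (range_midpoint - reference)

# Reference at 45 deg, test range centred 2 deg clockwise at 43 deg.
# A hypothetical PSE of 44 deg lies halfway between the two:
print(half_span(44.0, 45.0, 43.0))  # 0.5
```

On this normalized scale a well-stabilized (permanent) criterion should yield values near 0, and a purely ad hoc criterion values near 1.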
For the cardinal axes, the comparisons between the S/V+ and S/V− conditions, and between the S/V− and L/V− conditions show the expected effect: the loss of postural information in the last case adds to the loss of
visual information in the first. But there is no apparent effect when we go from the S/V+ to the L/V+ condition: with peripheral visual stimulation present, the loss of postural information is tolerated. This suggests that peripheral visual information dominates the postural information: when the visual input is present it can compensate for loss of the kinaesthetic information, perhaps because there is a ceiling effect. The two experimental manipulations may also not have been equally effective. It was possible to occlude peripheral visual information completely. But the reduction in kinaesthetic information was less complete: although the L condition made the direction of gravity orthogonal to the horizontal display, subjects’ bodies lay in a straight line with their heads supported by headrests, so that covert postural judgments relating to the long axis of the body could still have projected traces to stabilize the visual criterion. The mean SDs for the four testing conditions are plotted for each orientation in Fig. 4. At 45◦ and 135◦ these values are similar for the four conditions, in keeping with the assumption that the peripheral visual and cross-modal inputs had little effect on the criteria for visual obliques. But for both the horizontal and vertical axes the anisotropy is reduced in the S/V− condition and further reduced in the L/V− condition. Not even in the L/V− condition does SD rise to the level of the obliques, perhaps because postural information specifying the longitudinal axis of the body is still available. Separate analyses of variance were performed on the SDs for the principal orientations and the oblique orientations for the four subjects. 
In each analysis, the factors were Viewing Condition (V−, V+), Body Position (S, L), Orientation (45° and 135° for the oblique analysis, 90° and 180° for the principal analysis), and Spatial Frequency of the judged stimulus (2 and 10 cpd), giving a four-way (2 × 2 × 2 × 2) ANOVA with repeated measures on all factors. The analysis for principal orientations found statistically significant effects of Viewing Condition (F(1, 3) = 75.31, p = 0.003) and Body Position (F(1, 3) = 14.91, p = 0.031), but no significant main effects of Orientation (F(1, 3) < 1) or Spatial Frequency (F(1, 3) < 1). A significant two-way interaction of Viewing Condition × Body Position (F(1, 3) = 31.99, p = 0.011) was also obtained. The analysis for oblique orientations showed no significant main effects of Viewing Condition (F(1, 3) < 1), Body Position (F(1, 3) = 6.24, p = 0.088), Orientation (F(1, 3) < 1), or Spatial Frequency (F(1, 3) = 3.75, p = 0.148), but the two-way interaction of Orientation × Spatial Frequency (F(1, 3) = 13.68, p = 0.034) was just significant. This relates to the occurrence of larger SDs for 10 cpd conditions than for 2 cpd conditions, the difference being greater at 45°. It suggests that any additional information the 10 cpd stimuli may carry is redundant.

The Half-Span measures for the four observers were analysed for the factors Viewing Condition, Posture, Orientation (45°, 90°, 135° and 180°) and Spatial Frequency. The results showed a highly significant two-way interaction, Orientation × Spatial Frequency (F(3, 9) = 13.501, p = 0.001), as well as a significant three-way interaction, Orientation × Viewing Condition × Spatial Frequency
(F(3, 9) = 4.623, p = 0.032), but no significant main effects. However, a significant interaction in an experiment 'can sometimes disguise or mask' significant main effects (Walpole, 1982; see also Keppel, 1973; Kirk, 1968; Mendenhall and Sincich, 1988) and requires careful examination of the data. Mean Half-Spans for Orientation × Spatial Frequency are given in Table 1(c). They show that for the Oblique but not for the Principal orientations, SF has a different effect at the two orientations. For 2 cpd stimuli, Half-Span is much greater at 45° than at 135°, but at 10 cpd the two values are almost identical. For the principal orientations, there is a suggestion that the vertical gives larger Half-Spans than the horizontal. These effects, if repeatable, are of interest and deserve further investigation, but they appear to be located in contrasts between the two oblique orientations, and between the two principal orientations, that are not relevant to the present model. We can therefore bypass them by comparing results for Principal and Oblique orientations directly, averaging over the two orientations in each case; the results are shown in Table 1(b), for 8 conditions for the Oblique and Principal orientations, and the greater oblique measures accord with expectation. The overall values are Half-Span(Obliques) = 0.490 (SE = 0.0387) and Half-Span(Principal axes) = 0.263 (SE = 0.0271); a paired one-tailed t-test gives t = 6.18 (d.f. = 7, p = 0.00025), showing that the difference between the PSE shifts for Oblique and Principal orientations is highly significant, confirming prediction (DL-P2).

Another observation is of interest. Inspection of the mean Half-Span values in Table 1(b) shows that for every comparable pair of readings, the Sitting value is higher than the Lying value. The overall values are Half-Span(Sitting) = 0.437 (SE = 0.0554) and Half-Span(Lying) = 0.316 (SE = 0.0432); a paired two-tailed t-test gives t = 3.68 (d.f. = 7, p = 0.0078), showing that the PSE shifts are significantly larger when Sitting than when Lying. This result was not predicted; it will be discussed further below.

Random sampling without replacement was used to order the trials. A reviewer raised the question whether subjects could have exploited this to improve their performance. We believe this can be discounted because naïve subjects did not know the number of stimuli in the test range, the number of presentations of each stimulus, or whether sampling was with or without replacement.

5.1. Span for ad hoc and Permanent Criteria in Earlier and Present Experiments

The theory predicts that Span will be higher for an ad hoc criterion than for a permanent criterion. Thus for any intermediate value of Span, such as 0.50, values of Span(permanent) are relatively more likely to fall below it, and values of Span(ad hoc) are relatively more likely to fall above it. Dividing the range of Span values, for convenience, at its midpoint, this suggests we might find 0 < Span(permanent) ≤ 0.5 and 0.5 ≤ Span(ad hoc) ≤ 1. Data from earlier experiments as well as the present results bear on this prediction. Results from several sources are brought together in Fig. 5 for comparison.
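The two paired t-tests reported above can be approximately reproduced from the rounded Table 1(b) means; this is a plain-Python sketch (the published statistics were presumably computed from unrounded data, so agreement is only to rounding).

```python
import math
from statistics import mean, stdev

def paired_t(a, b):
    """Paired t statistic with len(a) - 1 degrees of freedom."""
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))

# Mean Half-Spans from Table 1(b), rows in the order
# S/V+/2, S/V+/10, S/V-/2, S/V-/10, L/V+/2, L/V+/10, L/V-/2, L/V-/10:
oblique   = [0.620, 0.639, 0.485, 0.547, 0.443, 0.329, 0.373, 0.490]
principal = [0.254, 0.265, 0.372, 0.311, 0.191, 0.121, 0.290, 0.296]
# The same 16 values re-paired by posture (rows O/V+/2 ... P/V-/10):
sitting = [0.620, 0.254, 0.639, 0.265, 0.485, 0.372, 0.547, 0.311]
lying   = [0.443, 0.191, 0.329, 0.121, 0.373, 0.290, 0.490, 0.296]

t_ob_pr = paired_t(oblique, principal)  # reported: t = 6.18, d.f. = 7
t_sit_lie = paired_t(sitting, lying)    # reported: t = 3.68, d.f. = 7
```

Both statistics come out within a few hundredths of the published values, confirming the Oblique > Principal and Sitting > Lying patterns described in the text.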
Figure 5. Span is plotted for three SF discrimination experiments (Lages and Treisman, 1998), for 2.5 and 5.0 cpd reference values (SF1, SF2, SF3/V, SF3/H). It is plotted for orientation, for 2 cpd and 10 cpd orientation data (OR1/2 and OR1/10), for oblique and principal axes; for ocular vergence; and for horizontal disparity (Lages and Treisman, 2010). Half-Span is plotted for each viewing condition, orientation and SF (S/V+/2 … L/V−/10) in the present experiment.
Half-Span values from the present experiment are plotted for each combination of posture, viewing condition, orientation and SF. SF discrimination data, based on ad hoc criteria, are plotted for three experiments of Lages and Treisman (1998). Orientation data (OR1/2 and OR1/10) for oblique and vertical discrimination; a value for ocular vergence (ad hoc criterion); and for binocular disparity (permanent criterion), are plotted from Lages and Treisman (2010). SF and ocular vergence discrimination should employ ad hoc criteria; we expect Span to be large. The values for binocular disparity and for orientation at the cardinal axes relate to permanent criteria and should be small. Although the oblique orientations are less salient than the cardinal axes they are sometimes given a special place in perception and so may have the support of a smaller residue of traces. This places them at an intermediate position between the cardinal axes and a randomly chosen unfamiliar orientation, suggesting they should be intermediate in value. If we take Span = 0.5 as an arbitrary division between ‘large’ and ‘small’, the values for SF and vergence in the figure are all large. The mean of the 8 points plotted for SF is 0.82 and the value for vergence is 0.83. The values for discrimination of the moderately salient oblique orientations have a mean of 0.49. The 10 points for the principal axes average 0.28 and the value for binocular disparity is 0.33.
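The 20 points plotted in Fig. 5 all fall on their predicted side of Span = 0.5; the chance probability of that classification is a one-line check:

```python
# 9 points predicted large and 11 predicted small all fall on the
# predicted side of Span = 0.5; under a fair-coin null, each point lands
# on either side with probability 1/2, independently.
p_chance = 0.5 ** 20
print(p_chance)  # about 9.5e-7, i.e. p < 0.000001
```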
Every result accords with the expectations of the theory: the 9 points predicted to be large fall above Span = 0.5 and the 11 points predicted to be small fall below it. The probability of this occurring by chance is p = 2⁻²⁰ < 0.000001.

6. Discussion and Conclusion

CST-SI proposes that decision criteria can vary between permanent and ad hoc extremes, with principal-axis criteria being permanent and criteria for the obliques relatively ad hoc. It predicts that asymmetric stimulus ranges will produce PSE shifts, but that these will be smaller for permanent criteria, which are better stabilized, than for ad hoc criteria. Our results confirm these predictions. The theory predicts that, as proprioceptive recognition of the direction of gravity and the slope of the ground is important for monitoring and maintaining posture, kinaesthetic information will contribute to maintaining the visual criteria for the principal axes, but will be less important or unimportant for the obliques. The greater variance for L/V− than for S/V−, for the principal orientation psychometric functions but not for the obliques, supports these predictions. The low variances for the principal orientation V+ conditions confirm that peripheral visual stimuli that differ on many dimensions from the test stimuli can act as effective context, contributing to setting the principal axis criteria. Similarity of appearance is not required for contextual stimuli to be effective.

Kinaesthetic inputs may serve at least two purposes. To align our bodies with gravity we require measures of the alignment of the body and the direction of gravity; kinaesthetic (including vestibular) inputs may provide both. When we sit or stand, corrective movements as the body is adjusted to align with gravity will generate vestibular and proprioceptive inputs that may be used to stabilize kinaesthetic criteria for the principal axes, which are defined in relation to gravity, e.g., Ec(kin vertical).
Kinaesthesia may also be used to monitor the changing relations between the parts of the body, maintaining a body schema and relating it to a schema for the immediate environment. An important component of the body schema is the long axis of the trunk and head; kinaesthetic inputs would maintain a criterion Ec(kin long axis) defining the general lie of the body. Of the information sources that may affect the visual criteria, normal vision has the highest resolution; measures based on gravity are likely to benefit from the specialized vestibular system; while our ability to define the long axis of the body, mobile and flexible as it is, in relation to the other contents of space may be least precise. The CSM may weight traces from different sources according to their usefulness; it may not be beneficial to include a source that is much less reliable than another in use (Treisman, 1998). With this in mind, we offer a tentative interpretation of the results in Table 1(a). By far the lowest variance is seen for the principal axes under V+ conditions. This illustrates the advantage of peripheral visual context in maintaining the visual criterion; the failure of performance to worsen in the L/V+ condition strongly suggests that, with visual information present, kinaesthetic inputs were given low weights or not used. For the principal axes performance deteriorated
for the V− conditions, as compared with V+; in the absence of peripheral visual inputs performance would have needed to rely on kinaesthetic inputs. For these conditions, the clear advantage of S/V− over L/V− suggests that the kinaesthetic inputs related to gravity that were available in the former case were more reliable than the long-axis inputs that remained available when subjects lay supine.

In contrast to the differences seen with the principal axes, the four combinations of Viewing Condition and Posture produce remarkably similar results for oblique discriminations. It seems that when judging oblique visual stimuli, the relatively ad hoc criteria employed gain little benefit from information relating to the cardinal axes, whether from peripheral visual or gravity-related kinaesthetic sources. To obtain information relevant to the oblique criteria, an interpolation process might be necessary, and this might be unreliable. The result is consistent with one source, the body schema (long axis) kinaesthetic inputs, providing a basis for oblique judgments.

We noted above a finding that was not predicted but is significant and of interest. A comparison of the mean Half-Span values for Sitting and Lying in Table 1(b) shows that in every case, Half-Span for Sitting is markedly higher than for Lying. Why should this be? A possible explanation is suggested by findings reported by Lages and Treisman (2010). CST-DL assumes (A1) that the CSM can adjust the magnitude parameters, s and r, to weight more informative or reliable traces more highly. This may be illustrated by the finding that, in discrimination of SFs, if the critical stimuli are oriented vertically and otherwise identical contextual isodiscriminal stimuli horizontally, or the reverse, vertical contextual stimuli produce bigger criterion shifts than do horizontal stimuli. This suggests that vertical stimuli are weighted more highly, perhaps because they are assessed as more reliable.
Similarly, in orientation discrimination, if the critical stimuli have SFs of 2 cpd and contextual stimuli 10 cpd, or the reverse, 10 cpd contextual stimuli produce bigger shifts of the orientation criterion than do 2 cpd stimuli. A similar distinction may be in play here. Orientation discriminations made when the subject is lying supine and the display screen horizontal may be assessed as less reliable than discriminations made in the more familiar upright position, and Lying indicator traces assigned smaller values of s than Sitting traces. Smaller stabilization traces would result in the PSE shift produced by a block of Lying trials being less than for a block of Sitting trials. If this is so, a component of variance should be reduced in the Lying conditions. This would not apply to the comparison of S/V− and L/V− for the principal axes as it seems likely that here ‘gravity’ information employed in S/V− is replaced by considerably more variable ‘long axis’ information in L/V−. But in every other such comparison, S/V+ versus L/V+, for both principal and oblique axes, and S/V− versus L/V− for obliques, there is a small fall in SD in the Lying condition. This has not been shown to be significant, however, and at most suggests that this finding deserves further examination.
Models of information combination in discrimination fall into two classes: 'averaging' models, which are 'modular' in that decisions are first made separately by two (or more) sensory discrimination mechanisms on each trial, and their final outputs are subsequently combined according to some rule; and 'integration' models, in which information is integrated on each trial prior to a single final decision on the combined input (Treisman, 1998). Applied to discrimination, 'averaging' models do not show overall discrimination (as measured by the slope of the psychometric function) that is superior to the performance of the better of the two individual discrimination mechanisms. But 'integration' models can generate discrimination performance that is superior to that of either individual mechanism. CST-SI is such a model.

We have contrasted CST with FMTT. Memory traces as fixed, unchanging simulacra of past stimuli are still widely employed in modeling discrimination (e.g., Magnussen et al., 1990). But in no case here or earlier (Lages and Treisman, 1998, 2010) has a result been obtained that favoured the FMTT prediction.

A more complex model for intensity scaling and discrimination, which draws on both the Thurstonian assumptions of SDT and fixed memory trace theory, and separates sensitivity and response bias, was proposed by Durlach and Braida (1969). It assumes two types of memory: a sensory trace mode (which decays over time) and a context coding mode (which is more stable). The latter is employed in both one- and two-interval paradigms; the trace mode, in which a stimulus may be compared with the memory trace of a previous stimulus, operates in two-interval paradigms only. In the context mode the stimulus is compared with the context (in the earlier theory) or (in a later revision, Braida et al. (1984)) with internal references or perceptual anchors, fixed values within or at the extremes of the stimulus range.
The contribution of both memory modes to the two-interval paradigm makes it more stable. However, it is not evident that this model could account for the major findings here. If two-interval and one-interval paradigms are indeed processed differently, with the latter being less stable, it might be thought that criterion setting theory should apply only to one-interval designs. Indeed, SDT has employed a model of 2AFC which bypasses a criterion (Green and Swets, 1966; Macmillan and Creelman, 1991). However, Treisman and Leshowitz (1969) compared the SDT model for 2AFC with a model employing decision criteria, and found that the latter fitted the data better. In Lages and Treisman (Experiment 2, 2010), two intervals are compared on each trial and the data conform with CST predictions. CST requires that a criterion is employed in all psychophysical procedures. Other approaches to cross-modal sensory integration have been proposed. Landy et al. (1995) present a model for the combination of depth cues in which individual cues produce estimates of depth at each point in a ‘depth map’, and the ‘final estimate of depth at each location is a weighted average of the estimates derived from the individual cues’ (p. 409). It is shown that a similar weighted average can be derived from a Bayesian analysis. Under appropriate conditions the weights are the inverse variances of the two cues. This model is concerned with estimates of depth;
here we are concerned specifically with discrimination, a different psychophysical task. Several authors have proposed models in which stimulus estimates from different sources are combined by weighting them by their inverse variances and averaging them, a procedure which minimizes the resulting variance (e.g., Alais and Burr, 2004; Beierholm et al., 2008; Dyde et al., 2006; Ernst and Banks, 2002; Jacobs, 1999; Landy et al., 1995; Muller et al., 2009; Yuille and Bülthoff, 1996). In this minimal variance model (MVM), the variance of the combined estimate is less than or equal to the variances of the individual sources. In an interesting study, Alais and Burr (2004) applied this model to information combination in discrimination of stimulus localization, where the stimuli providing localization information were a visual blob and an auditory click. They obtained evidence that for a particular value of the visual stimulus the bimodal variance was better than either unimodal variance, and concluded that this is evidence for the MVM. But there is a major difficulty with this. The MVM is usually applied to a measure derived from a series of trials, such as a magnitude estimate, variance, PSE or threshold. Such measures are based on, and summarize, many successive decisions. This makes it difficult to regard the MVM as a plausible model of discrimination performance at the level of the individual trial, because such a model should explain how performance can be shaped on each separate trial so as to determine a desired overall effect, such as a reduction in variance for the block of trials. But the information available to the sensory system on an individual trial is the location of the sensory input in relation to the decision criterion on that trial, and the response this leads to, not the contribution that response may make to the variance or PSE which will characterize the block of trials once they have concluded.
Thus combining variances or PSEs in accordance with the MVM cannot explain the underlying processes of discrimination. The MVM defines what the optimum may look like; it does not tell us how the sensory system gets there. Other models are based on theories of coding in neural populations. Jazayeri and Movshon (2006), for example, offer an account of discrimination and other procedures which assumes that the responses of different sensory neurons to a stimulus are weighted by the logs of their tuning curve values and summed, to provide a likelihood function, and Ma et al. (2006) offer an alternative which avoids calculating log likelihood values. Such models do not provide an account of the effects of rewards and costs on sensory decision making (Green and Swets, 1966; Macmillan and Creelman, 1991), nor of phenomena such as the PSE shift, or the patterns in which sequential dependencies fall (Lages and Treisman, 1998, 2010; Treisman and Williams, 1984), or context effects, or the patterns of cross-modal integration found above. Thus they do not offer alternatives to CST, though this does not mean they cannot be reconciled with it. For instance, the coding schemes developed by Jazayeri and Movshon (2006) or Ma et al. (2006) could apply to the processes that compute the sensory input on the decision axis produced by a given stimulus presentation (Ei in Fig. 1).
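For reference, the inverse-variance averaging rule that defines the MVM can be written in a few lines. This is a generic sketch of the rule itself (the function name is assumed), not of any particular model discussed here.

```python
def mvm_combine(sources):
    """Minimal variance model: combine (estimate, variance) pairs by
    inverse-variance weighting. The combined variance, 1 / sum(1/v_i),
    never exceeds the smallest individual variance."""
    weights = [1.0 / v for _, v in sources]
    total = sum(weights)
    estimate = sum(w * e for w, (e, _) in zip(weights, sources)) / total
    return estimate, 1.0 / total

# e.g. a precise visual cue (variance 1.0) and a coarser auditory cue
# (variance 4.0) locating the same event:
est, var = mvm_combine([(0.0, 1.0), (5.0, 4.0)])
print(est, var)  # 1.0 0.8 -- combined variance below both inputs
```

The sketch shows the sense in which the MVM specifies an optimum; as argued above, it does not by itself say how trial-by-trial decisions reach it.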
M. Treisman, M. Lages
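The inverse-variance weighting that defines the MVM can be sketched in a few lines (a minimal illustration of the optimum the text describes, not of any trial-level mechanism; the numerical values are hypothetical):

```python
def combine_cues(estimates, variances):
    """Combine independent cue estimates by inverse-variance weighting.

    Returns the weighted-average estimate and its variance; the combined
    variance is never larger than any individual cue variance.
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    estimate = sum(w * x for w, x in zip(weights, estimates)) / total
    return estimate, 1.0 / total

# Hypothetical visual and auditory location estimates (deg) and variances:
est, var = combine_cues([2.0, 6.0], [1.0, 4.0])
print(est, var)  # 2.8 0.8: the bimodal variance beats both unimodal variances
```

Note that the combined variance, 1/(1/σ₁² + 1/σ₂²), is exactly the quantity the MVM predicts for the block of trials; nothing in the sketch says how any single trial's decision is made, which is the difficulty raised above.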
The MVM could also be reconciled with CST. In CST the parameters s and r that weight the indicator traces may be given values that reflect the reliability of the corresponding sources. A feedback mechanism that optimizes performance would tend to adjust these parameters so that the final performance might well approximate the optimal calculated by the MVM. In CST, the weighted information from different sources is combined on each trial prior to the decision, and so it is capable of reducing the variance (Treisman, 1998). Wickelgren (1968) proposed that variation in the criterion contributed to the variance of the psychometric function, and Solomon and Morgan (2006) suggested that the same phenomenon may underlie the oblique effect. Another approach interprets the oblique effect as a feature of encoding in a sensory reference frame. Lipshits et al. (2005), Lipshits and McIntyre (1999) and McIntyre et al. (2001) have discussed the multisensory reference frames (Ma and Pouget, 2008; Soechting and Flanders, 1992) in relation to which visual orientation information may be stored. They found that while the multimodal reference frame may include proprioceptive and gravitational cues, the latter are not essential for the oblique effect to occur, a conclusion consistent with our results. These approaches lead us to ask how criterion variance arises and whether it may have a function; and how reference frames are set up and stitched together across different modalities and sources of information. CST represents an attempt to answer these questions. The concept of reference frames underlies work by Luyat and Gentaz (2002) who ask whether the oblique effect is defined in an egocentric or a gravitational reference frame. They define the possible sensory frames and review the relevant and sometimes conflicting evidence. 
Their experimental results lead them to conclude that reproduced orientations are mapped in neither a gravitational nor an egocentric reference frame but in what they call a subjective gravitational reference frame. In this approach the problem of explaining the oblique effect is seen as one of determining which of the alternative frames the orientations are defined in relation to. This looks at the oblique effect as resulting from features of the reference frame employed. CST defines the problem as arising from the nature of the task: not from the architecture of a particular sensory framework but from the operation of the decision mechanisms and the particular constraints operating on decision, such as the extent of relevant past experience that can affect the stability of the criterion in the current situation, and the availability of relevant information from other modalities and sources of information. Luyat and Gentaz (2002) believe that oblique effects such as we have described may be explained by ‘a general bias in the representation of an oblique stimulus’ which they attribute to an oblique orientation being ‘encoded in relation to the vertical and horizontal norms that define a reference frame, whereas the vertical and horizontal orientations would be encoded directly’. This leads them to conclude that ‘identifying the nature of the spatial reference frame in which the orientations are mapped would allow a better understanding of the origin’ of the effect.
Sensory integration in orientation anisotropy
In contrast with this classical reference framework approach, CST is concerned with the process of discriminating a stimulus in the sensory system and the effects of past, contextual and heteromodal inputs on this process. Criterion setting is a sensory-stage mechanism prior to the operations of representing and encoding that may occur at the higher perceptual and cognitive levels. If CST were not able to fully explain the phenomena in terms of the decision stage, it would then be appropriate to consider possible higher-level operations as well. The differences between the approaches can be illustrated by considering the experiments reported by Luyat and Gentaz (2002). In these experiments subjects adjusted a luminescent rod set against a black background to produce verbally defined orientations or to reproduce a previously viewed orientation of the rod. The orientation produced when subjects attempted to set the rod to the physical vertical is described as the subjective vertical (SV). The subject's body was either vertical or oblique. The authors' convention for orientations is to refer to the vertical as 0°, the right oblique as 45°, and the left oblique as −45°. This convention will be used while discussing their work. Some results: (1) With body vertical, subjects produced a subjective vertical on average less than 1° from the true vertical. With their bodies in an oblique position (45° or −45°) the SV was shifted 3.7° toward the body axis. CST would suggest the following explanation. Over time, the criterion for the visual vertical, Ec (vis vertical), is determined by frequent covert judgments of visual stimuli expected to be vertical, such as the junctions between the walls of a room. These may generate orientation inputs that represent clockwise (CW) or counterclockwise (CCW) deviations from the criterion; the resulting stabilization indicator traces will maintain the criterion relatively stable at a value near the physical vertical.
In upright observers, kinaesthetic measures of pressure and tension in the body may be similarly compared with a criterion Ec (kin vertical), identifying the direction of gravity on a kinaesthetic decision axis. These are likely to generate stabilization indicator traces distributed about Ec (kin vertical). Projections of the kinaesthetic traces onto the visual decision axis will also help to stabilize Ec (vis vertical). If now the observer is placed at a −45° orientation, the kinaesthetically generated inputs in this position will all represent CCW departures from the vertical, and will give rise to indicator traces projected to the visual decision axis that will tend to shift Ec (vis vertical) in the CCW direction. Thus the phenomenon of the shifted SV is simply a PSE shift effect produced by the asymmetric range of kinaesthetic inputs generated in an obliquely positioned observer. (2) In the Experiment 1 reproduction task, a standard orientation was presented (one of the cardinal and oblique orientations, and the previously determined SV), the rod was moved to a different position, and the observer reproduced the standard orientation. With the observer upright, the oblique effect was obtained. When the observer was tilted, there was no oblique effect. SV was more precisely reproduced than other orientations. CST analyses these results as follows. SV corresponds to the permanent cardinal criterion for the vertical, shifted to a small extent by the asymmetric traces projected from the kinaesthetic system when the observer is in a tilted condition. This criterion benefits from a residue of past indicator traces, which gives it greater stability. To understand the reported absence of the oblique effect when the observer is tilted, say at −45°, compare the situations when SV and the physical vertical (0°) are presented. Reproducing SV locates the position of Ec (vis vertical), which is the (especially stable) cardinal criterion, shifted away from the physical vertical towards −45° by a small PSE effect. Reproducing 0° requires setting up a corresponding ad hoc criterion, distinct from Ec (vis vertical), and therefore no better stabilized than the obliques. The oblique effect has in fact not disappeared; it is still to be seen in the greater precision of reproduction of SV. The extension of CST to sensory integration provides new insights into context effects. CST-DL identified as 'effective context' ambient stimuli in the same modality that were isodiscriminal to the test stimuli, and it was shown in two cases that isodiscriminal stimuli differing from the critical stimuli on a single orthogonal dimension could act as effective context (Lages and Treisman, 1998, 2010). Here we have shown that stimuli (visual stimuli in the periphery) that are isodiscriminal to the test stimuli used in cardinal orientation discrimination can act as effective context even though they may differ on numerous dimensions from the test stimuli.
These were sinewave gratings displayed for one second at the centre of the display, while the peripheral stimuli were the edges of the bezel surrounding the display screen, edges of the monitor and of the table, and the lines at which the walls and ceiling met each other; mainly approximating the vertical and horizontal. We were interested in determining whether it is safe to disregard the range of peripheral visual stimuli that may occur in a visual experiment, in the belief that they are ineffective, as is sometimes assumed. We did not attempt at this stage to distinguish the contributions of different types of such stimuli. The present result shows such stimuli are important, and suggests it would be useful to explore their effects further by examining, for instance, whether the closest stimuli, like the bezel on the monitor, have more effect, because they are nearest to the display, or whether distant stimuli like the lines at which walls and ceiling meet are more important because they best delineate the main dimensions of the experimental space, or whatever the case may be; also whether allowing only oblique lines as peripheral stimuli would reduce the oblique effect. The effect of peripheral visual stimulation in orientation discrimination has received little attention in the past literature. At earlier times, efforts were generally made to hide peripheral visual cues, but more recently it has sometimes been implicitly assumed that such stimuli are unimportant and measures to mask them off may not be reported (e.g., Clifford et al., 2001; Li and Westheimer, 1997; Westheimer and Beard, 1998; Westheimer and Li, 1996) or we may be told only that
the face of the monitor was masked (e.g., Boltz et al., 1979; Burr and Wijesundra, 1991). Essock et al. (2003) do report that they masked room contours. Our results point to the need to take this precaution routinely. CST-SI extends the concept of effective context to stimuli which are not isodiscriminal to the test stimuli, whether in the same or a different modality, but are functionally related to them, and it defines a mechanism for contextual interaction: modification of the criterion for the test stimuli. An example in perception may be the use of information from visually observed lip movements to disambiguate heard speech (Green, 1998). The discrimination of departures from the vertical in vision, and the discrimination of the direction of gravity based on kinaesthetic inputs, are functionally related, as a given deviation of the body from the direction of gravity, detected in either modality, will require the same corrective movement to maintain balance. Under the assumptions of CST-SI this leads to a model for the oblique effect, based on cross-modal interactions and also on the intra-modal effects described by CST-DL (Lages and Treisman, 2010).

Acknowledgements

The experimental work described herein was carried out by M. Lages in partial fulfilment of the requirements for the D. Phil. degree at Oxford University, under the supervision of M. Treisman. Any errors in the theory presented herein are the responsibility of M. Treisman. This work was supported by an EPSRC studentship to M. L. and a BBSRC grant to M. T.

References

Alais, D. and Burr, D. (2004). The ventriloquist effect results from near-optimal bimodal integration, Current Biology 14, 257–262.
Appelle, S. (1972). Perception and discrimination as a function of stimulus orientation: the "oblique effect" in man and animals, Psycholog. Bull. 78, 266–278.
Beierholm, U. R., Körding, K. P., Shams, L. and Ma, W. J. (2008). Comparing Bayesian models for multisensory cue combination without mandatory integration, Adv. Neural Info. Proc. Systems 20, 1–8.
Boltz, R. L., Harwerth, R. S. and Smith III, E. L. (1979). Orientation anisotropy of visual stimuli in rhesus monkey: a behavioral study, Science 205, 511–513.
Braida, L. D., Lim, J. S., Berliner, J. E., Durlach, N. I., Rabinowitz, W. N. and Purks, S. R. (1984). Intensity perception. XIII. Perceptual anchor model of context-coding, J. Acoust. Soc. Amer. 76, 722–731.
Buchanan-Smith, H. M. and Heeley, D. W. (1993). Anisotropic axes in orientation perception are not retinotopically mapped, Perception 22, 1389–1402.
Burr, D. C. and Wijesundra, S.-A. (1991). Orientation discrimination depends on spatial frequency, Vision Research 31, 1449–1452.
Clifford, C. W. G., Wyatt, A. M., Arnold, D. H., Smith, S. T. and Wenderoth, P. (2001). Orthogonal adaptation improves orientation discrimination, Vision Research 41, 151–159.
Durlach, N. I. and Braida, L. D. (1969). Intensity perception. I. Preliminary theory of intensity resolution, J. Acoust. Soc. Amer. 46, 372–383.
Dyde, R. T., Jenkin, M. R. and Harris, L. R. (2006). The subjective visual vertical and the perceptual upright, Exper. Brain Res. 173, 612–622.
Ernst, M. O. and Banks, M. S. (2002). Humans integrate visual and haptic information in a statistically optimal fashion, Nature 415, 429–433.
Essock, E. A. (1980). The oblique effect of stimulus identification considered with respect to two classes of oblique effects, Perception 9, 37–46.
Essock, E. A., DeFord, J. K., Hansen, B. C. and Sinai, M. J. (2003). Oblique stimuli are seen best (not worst!) in naturalistic broad-band stimuli: a horizontal effect, Vision Research 43, 1329–1335.
Fahle, M. and Harris, J. (1998). The use of different orientation cues in vernier acuity, Percept. Psychophys. 60, 405–426.
Fechner, G. (1860/1966). Elements of Psychophysics, Vol. 1 (transl. H. E. Adler). Holt, Rinehart and Winston, New York, USA.
Fregly, A. R., Graybiel, A., Miller, E. F. and Van den Brink, G. (1965). Visual localization of horizontal as function of body tilt utilizing several positions with respect to gravity, Technical Report NASA-CR-69427, Document ID 19660006291.
Green, D. M. and Swets, J. A. (1966). Signal Detection Theory and Psychophysics. Wiley, New York, USA.
Green, K. P. (1998). The use of auditory and visual information during phonetic processing: implications for theories of speech perception, in: Hearing by Eye II: Advances in the Psychology of Speechreading and Auditory-visual Speech, R. Campbell and B. Dodds (Eds). Erlbaum, London, UK.
Howard, I. P. and Rogers, B. J. (1995). Binocular Vision and Stereopsis. Oxford University Press, New York, USA.
Howard, I. P. and Templeton, W. B. (1966). Human Spatial Orientation. Wiley, London, UK.
Jacobs, R. A. (1999). Optimal integration of texture and motion cues to depth, Vision Research 39, 3621–3629.
Jazayeri, M. and Movshon, J. A. (2006). Optimal representation of sensory information by neural populations, Nature Neurosci. 9, 690–696.
Keppel, G. (1973). Design and Analysis: A Researcher's Handbook. Prentice-Hall, Englewood Cliffs, NJ, USA.
Kirk, R. E. (1968). Experimental Design: Procedures for the Behavioral Sciences. Brooks Cole Publishing Company, Belmont, CA, USA.
Lages, M. and Paul, A. (2006). Visual long-term memory for spatial frequency? Psychonom. Bull. Rev. 13, 486–492.
Lages, M. and Treisman, M. (1998). Spatial frequency discrimination: visual long-term memory or criterion setting? Vision Research 38, 557–572.
Lages, M. and Treisman, M. (2010). A criterion setting theory of discrimination learning that accounts for anisotropies and context effects, Seeing and Perceiving 23, 401–434.
Landy, M. S., Maloney, L. T., Johnston, E. B. and Young, M. (1995). Measurement and modeling of depth cue combination: in defense of weak fusion, Vision Research 35, 389–412.
Li, W. and Westheimer, G. (1997). Human discrimination of the implicit orientation of simple symmetrical patterns, Vision Research 37, 565–572.
Lipshits, M., Bengoetxea, A., Cheron, G. and McIntyre, J. (2005). Two reference frames for visual perception in two gravity conditions, Perception 34, 545–555.
Lipshits, M. and McIntyre, J. (1999). Gravity affects the preferred vertical and horizontal in visual perception of orientation, Neuroreport 10, 1085–1089.
Luyat, M. and Gentaz, E. (2002). Body tilt effect on the reproduction of orientations: studies on the visual oblique effect and subjective orientations, J. Exper. Psychol.: Human Percept. Perform. 28, 1002–1011.
Ma, W. J., Beck, J. M., Latham, P. E. and Pouget, A. (2006). Bayesian inference with probabilistic population codes, Nature Neurosci. 9, 1432–1438.
Ma, W. J. and Pouget, A. (2008). Linking neurons to behavior in multisensory perception: a computational review, Brain Research 1242, 4–12.
Macmillan, N. A. and Creelman, C. D. (1991). Detection Theory: A User's Guide. Cambridge University Press, Cambridge, UK.
Magnussen, S., Greenlee, M. W., Asplund, R. and Dyrnes, S. (1990). Perfect visual short-term memory for periodic patterns, Eur. J. Cognit. Psychol. 2, 345–362.
McIntyre, J., Lipshits, M., Zaoui, M., Berthoz, A. and Gurfinkel, V. (2001). Internal reference frames for representation and storage of visual information: the role of gravity, Acta Astronautica 49, 111–121.
Mendenhall, W. and Sincich, T. (1988). Statistics for the Engineering and Computer Sciences, 2nd edn. Collier Macmillan, London, UK.
Mittelstaedt, H. (1983). A new solution to the problem of the subjective vertical, Naturwissenschaften 70, 272–281.
Muller, C. M. P., Brenner, E. and Smeets, J. B. J. (2009). Testing a counter-intuitive prediction of optimal cue combination, Vision Research 49, 134–139.
Pelli, D. G. and Zhang, L. (1991). Accurate control of contrast on microcomputer displays, Vision Research 31, 1337–1350.
Soechting, J. F. and Flanders, M. (1992). Moving in three-dimensional space: frames of reference, vectors, and coordinate systems, Ann. Rev. Neurosci. 15, 167–191.
Solomon, J. A. and Morgan, M. J. (2006). Stochastic re-calibration: contextual effects on perceived tilt, Proc. Royal Soc. B 273, 2681–2686.
Tanner, W. P. and Swets, J. A. (1954). A decision-making theory of visual detection, Psycholog. Rev. 61, 401–409.
Thurstone, L. L. (1927). Psychophysical analysis, Amer. J. Psychol. 38, 369–389.
Treisman, M. (1984a). Contingent aftereffects and situationally coded criteria, Ann. New York Acad. Sci. 423, Timing and Time Perception, 131–141.
Treisman, M. (1984b). A theory of criterion setting: an alternative to the attention band and response ratio hypotheses in magnitude estimation and cross-modality matching, J. Exper. Psychol.: General 113, 443–463.
Treisman, M. (1985). The magical number seven and some other features of category scaling: properties of a model for absolute judgment, J. Mathemat. Psychol. 29, 175–230.
Treisman, M. (1987). Effects of the setting and adjustment of decision criteria on psychophysical performance, in: Progress in Mathematical Psychology — I, E. E. Roskam and R. Suck (Eds), pp. 253–297. Elsevier Science Publishers B. V. (North-Holland), Amsterdam, The Netherlands.
Treisman, M. (1998). Combining information: probability summation and probability averaging in detection and discrimination, Psychological Methods 3, 252–265.
Treisman, M. and Faulkner, A. (1984a). The setting and maintenance of criteria representing levels of confidence, J. Exper. Psychol.: Human Percept. Perform. 10, 119–139.
Treisman, M. and Faulkner, A. (1984b). The effect of signal probability on the slope of the receiver operating characteristic given by the rating procedure, Brit. J. Math. Statistical Psychol. 37, 199–215.
Treisman, M. and Faulkner, A. (1985). Can decision criteria interchange locations? Some positive evidence, J. Exper. Psychol.: Human Percept. Perform. 11, 187–208.
Treisman, M. and Faulkner, A. (1987). Generation of random sequences by human subjects: cognitive operations or psychophysical process? J. Exper. Psychol.: General 116, 337–355.
Treisman, M. and Leshowitz, B. (1969). The effects of duration, area, and background intensity on the visual intensity difference threshold given by the forced-choice procedure: derivations from a statistical decision model for sensory discrimination, Percept. Psychophys. 6, 281–296.
Treisman, M. and Williams, T. C. (1984). A theory of criterion setting with an application to sequential dependencies, Psycholog. Rev. 91, 68–111.
Walpole, R. E. (1982). Introduction to Statistics, 3rd edn. Collier Macmillan, New York, USA.
Westheimer, G. and Beard, B. L. (1998). Orientation dependency for foveal line stimuli: detection and intensity discrimination, resolution, orientation discrimination and vernier acuity, Vision Research 38, 1097–1103.
Westheimer, G. and Li, W. (1996). Classifying illusory contours by means of orientation discrimination, J. Neurophysiol. 75, 523–528.
Wickelgren, W. A. (1968). Unidimensional strength theory and component analysis of noise in absolute and comparative judgments, J. Mathematical Psychol. 5, 102–122.
Yuille, A. L. and Bülthoff, H. H. (1996). Bayesian decision theory and psychophysics, in: Perception as Bayesian Inference, D. C. Knill and W. Richards (Eds). Cambridge University Press, Cambridge, UK.
Fechner's Aesthetics Revisited
Flip Phillips¹,*, J. Farley Norman²,* and Amanda M. Beers²
¹ Department of Psychology and Neuroscience Program, Skidmore College, 815 North Broadway, Saratoga Springs, NY 12866, USA
² Department of Psychology, Western Kentucky University, USA
Abstract
Gustav Fechner is widely respected as a founding father of experimental psychology and psychophysics, but fewer know of his interests and work in empirical aesthetics. In the late 1800s, toward the end of his career, Fechner performed experiments to empirically evaluate the beauty of rectangles, hypothesizing that the preferred shape would closely match that of the so-called 'golden rectangle'. His findings confirmed his suspicions, but in the intervening decades significant evidence has accumulated against that finding. Regardless of the results of this one study, Fechner ushered in the notion of using a metric to evaluate beauty in a psychophysical way. In this paper, we recreate the experiment using more naturalistic stimuli. We evaluate subjects' preferences against models that use various types of object complexity as metrics. Our finding that subjects prefer either very simple or very complex objects runs contrary to the hypothesized result, but is systematic nonetheless. We conclude that there are likely to be useful measures of aesthetic preference, but that they will be complicated by the difficulty of defining some of their constituent parts.
Keywords
Fechner, history, aesthetics, beauty, shape, form
1. Introduction
Widely respected as a founding father of experimental psychology and psychophysics, Fechner also made, toward the end of his career, a rather significant contribution to the field of empirical aesthetics. His curiosity about the oft-worshiped golden section (Fechner, 1865, 1871, 1876) and the authenticity questions surrounding the Holbein Madonna (Fechner, 1876) was the beginning of a significant movement whose goal was to empirically investigate art and aesthetics. Of course, structured inquiry into the questions of beauty and aesthetics can be found as far
* To whom correspondence should be addressed. E-mail: [email protected] or [email protected]
back as the works of Plato and Aristotle. Fechner was aware of other efforts to define beauty and its elements, such as Hogarth's 'line of beauty' (Hogarth, 1873), as seen in Fig. 1 (Barasch, 2000). Nevertheless, Fechner's was the first effort to empirically and psychophysically parameterize the elements of beauty. One of the ongoing fundamental problems in empirical aesthetics is the definition and discovery of appropriate metrics that can be used to evaluate artwork. McWhinnie (1965, 1968, 1971) provides a somewhat older review of the topic, and Gibson (1975) and Gibson and Pickford (1976a, b) provide an interesting discussion of the pros and cons of such an endeavor. To make empirical aesthetics a truly psychophysical enterprise there must be a meaningful way to compare the artwork with subjects' ratings, preferences, and evaluations. The golden ratio (sometimes called the 'golden section' or the 'divine proportion') is a well-known geometrically-oriented metric that, for thousands of years, has been thought to possess special aesthetic properties. The ratio,

(1 + √5) / 2 ≈ 1.61803,    (1)

typically denoted by the symbol ϕ, appears in artistic artifacts that date back as early as 440 BC with the construction of the Parthenon (Hemenway, 2005), as well as in various locations in nature, such as nautilus shells and sunflower seeds (Ghyka, 1977). In his rectangle study, Fechner found peak pleasingness with rectangles possessing the proportions of the golden section. Alas, there has been a terribly mixed bag
Figure 1. Hogarth’s line of beauty, an ‘s’-shaped curve. Here we have scaled it to have the proportions of the golden ratio.
of results when attempting to replicate these findings (see Höge, 1997, for an extensive review). The most recent findings seem to suggest that Fechner's results were anomalous and/or due primarily to methodology (e.g., Green, 1995; Höge, 1997). Of course, this has not put the question to rest (e.g., Dio et al., 2007). Furthermore, we would contend that it is somewhat difficult to assign a precise aesthetic valuation to a simple rectangle based solely on its proportions, and in isolation from all other contextual information. It is rare to observe a debate over the appeal of a simple four-sided shape, yet, as is well known by experimental psychologists, the nature of the demand characteristic is such that subjects will report some preference in these experiments. Though not reported by Fechner, we suspect that, even if the effect of the golden section is real, its effect size is somewhat limited by other factors. There are certainly other metrics that can be used as a hypothetical model for aesthetic choice. One frequently posited metric is that of complexity (see Berlyne, 1968, 1971, 1974; Birkhoff, 1933; Boselie and Leeuwenberg, 1985; Leeuwenberg and Helm, 1991, for some examples). Berlyne suggested that the aesthetic value and pleasurableness of a stimulus starts at a relatively indifferent level, increases as a function of complexity up to a certain level, then decreases and becomes more unpleasant as complexity increases further. This theory is clearly inspired by the ideas of Wundt (Wundt, 1874) with respect to stimulus intensity rather than complexity (see Fig. 2). This relationship has gained further currency within the field of robotics, concerning the desired realism of humanoid robots, where it is known commonly as The Uncanny Valley (Mori, 1970/2005; Pollick, in press; Tondu and Bardou, 2009). The mathematician George Birkhoff extended the notion of simple complexity to include another factor, order:

M = O / C.    (2)

In this scenario, complexity, C, is thought of as attentional 'effort' while order, O, takes on a role as the amount of 'harmony' or symmetry in the object, their ratio,
Figure 2. The Wundt curve, an inspiration to Berlyne and others with respect to complexity and preference.
yielding the aesthetic measure, M. There is a certain appeal to this extension: the idea that complexity and the potential for complexity are somehow in balance. Still, as with complexity, 'order' has its own definitional difficulties. What follows is a simple experiment in the spirit of Fechner's rectangle study, using three-dimensional objects with greater visual and structural complexity.

2. Experimental

2.1. Method

2.1.1. Stimuli
Fifty randomly-shaped, smoothly-curved virtual objects (5 sets of 10 objects) with varying levels of complexity were created using procedures developed by Norman et al. (1995). To produce each object, a sphere was sinusoidally modulated in depth (i.e., along the z-axis). The object was then rotated randomly about each of the three Cartesian coordinate axes and modulated in depth (sinusoidally) again. This process was repeated a number of times to create the final object. We manipulated the complexity of the objects by varying the number of iterations, which ranged from 2 (producing the least complex object in each set, object 1) to 31 (producing the most complex object in the set, object 10). The magnitude of the modulation in depth was constant at each iteration for all objects. The surface of each object was defined by the positions and orientations of 8192 triangular polygons (there were 4098 unique vertices). One set of 10 stimulus objects is shown in Fig. 3. We used the variability in vertex distance (across all 4098 vertices) from the center of the objects as a measure of complexity (see Fig. 2 of Norman and Raines, 2002). Consider a sphere, the simplest globally-convex object. For a sphere, there is no (i.e., zero) variation in vertex distance from its center; the standard deviation of vertex distances is zero. Next consider a potato: its shape is more 'complex' in that there is more variability in the distances from its center to each surface location (i.e., there is a larger standard deviation of surface distances).
In comparison, the shape of a bell-pepper is even more complex, in that there is even more variation in the surface distances from its center. All of our objects have the same overall size (the mean vertex distance was approximately 6.7 cm for all objects). However, the complexity of our objects varied widely. For each of our five sets of ten objects, the least complex object (object 1) possessed a standard deviation (of vertex distances) of 0.2 cm, while the most complex object (object 10) possessed a standard deviation of 1.1 cm. Intermediate surface complexities possessed intermediate standard deviations. The objects were rendered with texture and Lambertian shading. The texture resembled red granite and the shading was produced using OpenGL's standard reflectance model, with an ambient component of 0.3 and a diffuse component of 0.7; the specular component was set to zero, simulating matte surfaces. The object surfaces were illuminated by a simulated point light source at infinity that was coincident with the observer's line of sight. The stimulus displays were rendered by a dual-processor Apple Power Macintosh G4 computer using OpenGL. Four random spatial arrangements of the ten objects for each set were created and printed, in color, on white paper. There were thus a total of 20 stimulus sheets, each depicting a different spatial arrangement of one of the five sets of ten objects.

Figure 3. An example stimulus set, ordered by complexity.

2.1.2. Procedure
Each observer was shown one of the five sets of ten stimulus objects. One of the four spatial arrangements of that object set was chosen at random. We used Fechner's method of choice (Fechner, 1876/1997). The observers were asked to indicate which of the objects possessed the most attractive (i.e., most aesthetically pleasing) 3-D shape. The observers were given as much time as they needed to make their judgment.

2.1.3. Subjects
Two hundred observers (101 males and 99 females) participated in the experiment. All had normal or corrected-to-normal vision.
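The complexity measure described above, the standard deviation of vertex distances from an object's center, can be sketched in a few lines (a toy point set stands in for the 8192-polygon meshes, and the random radial perturbation is only a crude stand-in for the sinusoidal depth modulation):

```python
import math
import random

def complexity(vertices, center=(0.0, 0.0, 0.0)):
    """Standard deviation of vertex distances from the object's center."""
    dists = [math.dist(v, center) for v in vertices]
    mean = sum(dists) / len(dists)
    return math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))

# Points all at unit distance from the origin: zero variation, zero complexity.
sphere = [(math.cos(t), math.sin(t), 0.0)
          for t in (2 * math.pi * k / 100 for k in range(100))]

# Perturbing each vertex radially raises the complexity, as with the
# potato and bell-pepper examples above.
random.seed(1)
bumpy = []
for v in sphere:
    scale = 1.0 + random.uniform(-0.2, 0.2)
    bumpy.append(tuple(c * scale for c in v))

print(complexity(sphere))  # ~0.0
print(complexity(bumpy))   # > 0
```

Because the measure depends only on radial variation, two objects of the same mean size (as all the stimuli here were, at roughly 6.7 cm) can still differ widely in complexity.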
Figure 4. Each bar in the histogram indicates the number of times a given stimulus was selected as the most attractive/aesthetically pleasing. Object complexity increases along the x-axis.
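The chi-square statistic reported in the Results below tests the preference counts against a uniform (random-choice) distribution. A sketch with hypothetical counts (illustrative only, not the data behind Fig. 4):

```python
def chi_square_uniform(counts):
    """Pearson chi-square statistic for observed counts against a
    uniform expectation (df = len(counts) - 1)."""
    expected = sum(counts) / len(counts)
    return sum((obs - expected) ** 2 / expected for obs in counts)

# Hypothetical choices of 200 observers among 10 objects, skewed toward
# the most complex (rightmost) and simplest (leftmost) objects:
counts = [30, 10, 8, 6, 5, 7, 9, 15, 50, 60]
stat = chi_square_uniform(counts)
print(stat)  # 179.0, far above the df = 9 critical value of ~16.9 at p = 0.05
```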
2.1.4. Results
The results are shown in Fig. 4. Each bar in the histogram indicates the number of times a given stimulus was selected as the most attractive/aesthetically pleasing. Object complexity increases along the abscissa. As can be easily seen, performance was decidedly not random, χ² = 134.7 (p < 0.0001), with the two most complex objects garnering the greatest preference, followed by the simplest object. There was no effect of object set or gender on preference (both p > 0.1). This result runs counter to the simple complexity-based predictions and is, in fact, the inverse of such a prediction. Factoring in an 'ordering' parameter that emphasized symmetry (as one possible factor) would only affect the results slightly: geometric symmetry decreases with object complexity in these object sets. We constructed one such measure, based on a method proposed by Beran (1968) for testing the uniformity of spherical distributions. The Gauss map (see Note 1) of a symmetrical spherical object will be uniformly distributed. As the spherical object becomes more asymmetric (especially in the way that our stimulus generation procedure distorts them) the Gauss map will exhibit clusters of normals and regions relatively devoid of them. The calculation of this hypothetical M is shown in Fig. 5 for our stimuli. Comparing this predicted result to the actual results shown in Fig. 4 accounts for the preference for the simplest stimuli but not for the most complex. One possible explanation for the discrepancy between our results and those predicted by the Berlyne/Wundt hypothesis might be that our range of stimuli did not feature objects of sufficient complexity to drive preference toward the middle. While this is certainly possible, we would suggest that, since the entire range of stimuli were
Fechner’s aesthetics revisited
189
Figure 5. Calculations of M à la Birkhoff, using an ordering calculation for O based on the distribution of each stimulus' surface normals relative to those of a sphere, and a complexity C based on the average displacement of the surface from that of a sphere. Comparing this predicted result to the actual results shown in Fig. 4 accounts for the preference for the simplest stimuli but not for the most complex.
presented simultaneously, the subjects were able to easily calibrate their judgement against the full range of available complexities. Additionally, there are very few natural objects that possess a complexity greater than that of our maximally complex object.

3. General Discussion

We believe that much of the problem with applying a metric to beauty comes from the simultaneous interplay of the denotative and, more significantly, the connotative properties of artwork. While the denotative properties are usually clear-cut and easily discernible by the viewer, connotation is a more private event that may be vulnerable to very large individual differences. For example, there are numerous examples in the art world of the notorious 'blank painting', but we contend that the aesthetic appeal of such works lies more in their connotative value — information that is not completely conveyed by the work itself but is suggested in addition to its literal interpretation. For example, the shape of the 'C3' series Corvette, manufactured from 1968 to 1982, is sometimes said to be evocative of a reclining nude woman (see Note 2). Clearly the car was not a literal expression of this shape, but rather only suggestive of it. For this study we restricted ourselves to the denotative or literal values of the artwork under examination, but the effect of connotation should not be underestimated. Indeed, several subjects responded that the objects reminded them of other, real objects (hearts,
190
F. Phillips et al.
various parts of human anatomy, faces, rocks). Any emotional attachment to such denotation may sway an observer toward or away from a particular stimulus. In our study, the most complex objects were preferred, followed by the simplest. This runs counter to several existing theories of aesthetic choice. Of course, the notion of complexity is a difficult one. By some definitions, such as description length, our objects possess the same or similar complexity, whereas by others, such as the variability-based metric used here, they fall into a natural-seeming progression. Adding other factors to normalize the complexity, such as order, further complicates the matter. Our symmetry measure helped to explain the observers' preference for the simpler objects, but it is totally contrary to our finding of preference for the most complex objects. Lastly, demand characteristics clearly come into play. It is interesting to note that, across 200 subjects, not a single one refused to pick a preferred object, and no one complained that they could not do the task. In fact, some subjects ruminated at length, trying to reduce their choice to a single object. Along with others who seek empirical explanations for aesthetic phenomena, we believe that the problem is tractable. Clearly, our results and the results of others are not random — there is structure to many of the findings. We simply must carry on Fechner's quest to find the right metrics to explain it.

Notes

1. This is, essentially, the mapping of an object's surface normals onto a sphere.
2. It should be noted that the Corvette is manufactured in Bowling Green, Kentucky — home of Western Kentucky University and two of the authors.

References

Barasch, M. (2000). Theories of Art 3: From Impressionism to Kandinsky, 2nd edn. Routledge, New York, USA.
Beran, R. J. (1968). Testing for uniformity on a compact homogeneous space, J. Appl. Probability 5, 177–195.
Berlyne, D. E. (1968). The Psychology of Aesthetic Behavior (Tech. Rep.).
State College, Penn State, PA, USA.
Berlyne, D. E. (1971). Aesthetics and Psychobiology. Appleton Century Crofts, New York, USA.
Berlyne, D. E. (1974). Studies in the New Experimental Aesthetics. Hemisphere, Washington, USA.
Birkhoff, G. D. (1933). Aesthetic Measure. Harvard University Press, Cambridge, USA.
Boselie, F. and Leeuwenberg, E. (1985). Birkhoff revisited: beauty as a function of effect and means, Amer. J. Psychol. 98, 1–39.
Di Dio, C., Macaluso, E. and Rizzolatti, G. (2007). The golden beauty: brain response to classical and renaissance sculptures, PLoS ONE 2, e1201.
Fechner, G. (1865). Über die Frage des goldenen Schnittes [On the question of the golden section], Archiv für die zeichnenden Künste 11, 100–112.
Fechner, G. (1871). Zur experimentalen Ästhetik [On experimental aesthetics]. Hirzel, Leipzig, Germany.
Fechner, G. (1876). Vorschule der Aesthetik [Primer of aesthetics]. Druck und Verlag von Breitkopf und Härtel, Leipzig, Germany.
Fechner, G. (1997). Various attempts to establish a basic form of beauty: experimental aesthetics, golden section, and square (translated by M. Niemann, J. Quehl, H. Höge and C. von Ossietzky), Empir. Stud. Arts 15, 115–130. (Original work published 1876 in Vorschule der Aesthetik, Druck und Verlag von Breitkopf und Härtel, Leipzig, Germany.)
Ghyka, M. (1977). The Geometry of Art and Life. Dover, New York, USA.
Gibson, J. (1975). Pickford and the failure of experimental esthetics, Leonardo 8, 319–321.
Gibson, J. and Pickford, R. (1976a). On the failure or success of experimental aesthetics (continued), Leonardo 9, 260–261.
Gibson, J. and Pickford, R. (1976b). On the failure or success of experimental aesthetics (continued), Leonardo 9, 348–349.
Green, C. (1995). All that glitters: a review of psychological research on the aesthetics of the golden section, Perception 24, 937–968.
Hemenway, P. (2005). Divine Proportion: Phi in Art, Nature and Science. Sterling, New York, USA.
Hogarth, W. (1873). Analysis of Beauty. J. Reeves, London, UK.
Höge, H. (1997). The golden section hypothesis — its last funeral, Empir. Stud. Arts 15, 233–255.
Leeuwenberg, E. and van der Helm, P. (1991). Unity and variety in visual form, Perception 20, 595–622.
McWhinnie, H. (1965). A review of some research on aesthetic measure and perceptual choice, Stud. Art Educ. 6, 34–41.
McWhinnie, H. (1968). A review of research on aesthetic measure, Acta Psychologica 28, 363–375.
McWhinnie, H. (1971). A review of selected aspects of empirical aesthetics III, J. Aesth. Educ. 5, 115–126.
Mori, M. (1970/2005). Bukimi no tani [The uncanny valley] (translated by F. MacDorman and T. Minato), Energy 7, 33–35.
Norman, J. F. and Raines, S. R. (2002). The perception and discrimination of local 3-D surface structure from deforming and disparate boundary contours, Perception and Psychophysics 64, 1145–1159.
Norman, J. F., Todd, J. T. and Phillips, F. (1995). The perception of surface orientation from multiple sources of optical information, Perception and Psychophysics 57, 629–636.
Pollick, F. (in press). In search of the uncanny valley, UC Media, LNICST Proceedings, Springer.
Tondu, B. and Bardou, N. (2009). Aesthetics and robotics: which form to give to the human-like robot?, World Acad. Sci. Engng Technol. 58, 650–657.
Wundt, W. M. (1874). Grundzüge der physiologischen Psychologie [Outline of physiological psychology]. Engelmann, Leipzig, Germany.
What Comes Before Psychophysics? The Problem of ‘What We Perceive’ and the Phenomenological Exploration of New Effects Baingio Pinna ∗ Department of Architecture, Design and Planning, University of Sassari, Palazzo del Pou Salit, Piazza Duomo 6, 07041 Alghero (SS), Italy
Abstract

The psychophysical methods were developed by Fechner to find the perceptual threshold of a stimulus, that is, the weakest stimulus that can be perceived. Despite their great efficiency in measuring thresholds, they do not help to define the multiplicity and complexity of possible percepts emerging from the same stimulus conditions and, accordingly, what we perceive. In order to define what we perceive it is also necessary to define what we can perceive within the multiplicity of possible visual outcomes, and how these outcomes are reciprocally organized. Usually the main experimental task is aimed at focusing on the specific attribute to be measured: what comes before psychophysics, i.e., the phenomenological exploration, is typically not fully investigated either epistemologically or phenomenally, even though it plays a basic role in the process of scientific discovery. In this work, the importance of the traditional approach is not denied. Our main purpose is to place the two approaches side by side so that they complement each other: the phenomenological exploration complements the quantitative psychophysical measurement of the qualities that emerge through the preliminary exploration. To demonstrate the basic role played by the phenomenological exploration in complementing the psychophysical investigation, we introduce three critical visual conditions, called the visual gradient of perceptibility, perceptible invisibility and visual levels of perceptibility. Through these conditions several new illusions are studied and some phenomenological rules are suggested.
Keywords: Gestalt psychology, psychophysics, phenomenology, visual illusions, perceptual organization
* E-mail: [email protected]
1. From Phenomenology to Psychophysics: The Problem of 'What We Perceive'

The psychophysical methods were developed by Fechner (1860/1966) to find the perceptual threshold of a stimulus, that is, the weakest stimulus that can be perceived. They are the methods of adjustment, limits and constant stimuli. These methods investigate the relationship between physical stimuli and correlated percepts: the perceptual effect is measured by systematically varying the properties of a stimulus along one or more physical dimensions. Before starting any psychophysical measurement it is necessary to define what to measure and, more generally, what we perceive. Despite its great efficiency in measuring absolute and difference thresholds, psychophysics does not help to define the multiplicity and complexity of possible percepts emerging from the same stimulus conditions and, accordingly, what we perceive. In order to define what we perceive it is also necessary to define what we can perceive within the multiplicity of possible visual outcomes, i.e., within a visual gradient of perceptibility, and how these outcomes are reciprocally organized. This is what each scientist usually does during the preliminary phase of exploration of a phenomenon, though focusing mostly on the main attribute under investigation. This implies that further results related to the same stimulus are not considered, or not perceived at all, even if they might cast new light on the main phenomenon and hypotheses, or be much more theoretically interesting and enlightening than the main one. In addition, this preliminary investigation is done without any conventional rule and with little phenomenological attention. Usually the main experimental task is aimed at focusing on the specific attribute to be measured. What comes before psychophysics is typically not fully investigated either epistemologically or phenomenally, even though it plays an essential role in the process of scientific discovery.
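To make the threshold-finding goal of these classical methods concrete, here is a minimal sketch of the method of constant stimuli reduced to its core computation: a fixed set of intensities is presented many times each, and the threshold is the intensity at which detection crosses 50%. All stimulus levels and response proportions below are invented placeholders, not data from this chapter.

```python
# Method of constant stimuli, sketched: estimate the detection threshold as the
# intensity where the proportion of 'yes, I perceived it' responses crosses 50%.
# All numbers are invented placeholders.

def threshold_from_constant_stimuli(intensities, p_yes, criterion=0.5):
    """Linearly interpolate the first upward crossing of `criterion`."""
    pairs = list(zip(intensities, p_yes))
    for (x0, p0), (x1, p1) in zip(pairs, pairs[1:]):
        if p0 <= criterion <= p1 and p1 > p0:
            frac = (criterion - p0) / (p1 - p0)   # how far between the two levels
            return x0 + frac * (x1 - x0)
    raise ValueError("criterion never crossed within the tested range")

levels = [1, 2, 3, 4, 5, 6]                      # stimulus intensity (arbitrary units)
p_yes  = [0.05, 0.10, 0.30, 0.70, 0.95, 1.00]    # proportion detected at each level
print(round(threshold_from_constant_stimuli(levels, p_yes), 3))  # approx. 3.5
```

In practice a smooth psychometric function would be fitted to such data rather than interpolated, but the logic — mapping a physical dimension onto the probability of a percept — is the same.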
More specifically, 'what to measure' is guided by experimental hypotheses; however, 'what to measure' is usually a specific visual phenomenon or property, which can emerge in its full complexity, and within the network of possible relations with other phenomena and properties, only through a good phenomenological exploration. To demonstrate the basic role played by the phenomenological exploration in complementing the psychophysical investigation we introduce three critical visual conditions, called the visual gradient of perceptibility, perceptible invisibility and visual levels of perceptibility. On the basis of these conditions, several new illusions are also studied and some phenomenological rules are suggested. This work does not deny the importance of the traditional approach and is not in contrast with the current psychophysical tradition, which is deemed to be Fechner's legacy; what we propose is a deeper and more systematic scientific attention to the phenomenological exploration. Phenomenological exploration and psychophysics are not considered here as alternatives, one to the other — alternative views are suggested instead by other authors (Bozzi, 1989, 1990; Kanizsa, 1980, 1991; Masin, 1993; Massironi, 1998; Metzger, 1941; Thinès, 1977; Vicario, 2001, 2005), who introduced the epistemological idea of a science of percepts iuxta propria principia
independent from any psychophysical and neural substrates — but as supports, one for the other, to obtain a more complete view of the object to be studied. The main purpose of this work is therefore to place the two approaches synergistically side by side so that they complement each other: the phenomenological exploration complements the quantitative psychophysical measurement of the qualities that emerge through the preliminary exploration.

2. General Methods

In order to define the phenomena appropriately, it is necessary to adopt at least two suitable methods. Firstly, a phenomenological free-report method (as used by the Gestalt psychologists) is chosen, through which untutored subjects are given a carefully chosen series of visual stimuli and asked to report anything they see. Secondly, the free-report method is supplemented by a quantitative one (magnitude estimation), where subjects are instructed to rate (in percent) the descriptions obtained in the phenomenological experiments.

2.1. Subjects

Two groups of 45 undergraduate students of linguistics, art, literature, architecture and design participated in the experiments. Subjects had some notions of Gestalt psychology and visual illusions but were naive as to the effects presented here and as to the purpose of the experiments. They were both male and female undergraduates at the University of Sassari, and all had normal or corrected-to-normal vision.

2.2. Stimuli

The stimuli were the figures illustrated in the next sections. The overall size of each figure was ∼5°. The luminance of the white background was 122.3 cd/m². Black and gray shapes had luminance values of 2.6 cd/m² and 51.3 cd/m², respectively. The stimuli were presented on a computer screen with ambient illumination from an Osram Daylight fluorescent light (250 lux, 5600 K). All conditions were displayed in the frontoparallel plane at a distance of 50 cm from the observer.

2.3. Procedure

2.3.1. Phenomenological Task
A phenomenological free-report method was adopted with the first group of 45 subjects. The task of the subjects was to report spontaneously what they perceived, giving, as far as possible, an exhaustive description of the main visual properties. The descriptions reported in the next sections were given by no fewer than 40 of the 45 subjects. All the reports were quite spontaneous, and observation time was unlimited. The edited descriptions were judged by four graduate students of linguistics, naive as to the hypotheses, to provide a fair representation of those provided by the observers. The descriptions are included within the text to aid the reader in the stream of argument.
2.3.2. Scaling Task
The phenomenological free-report method is supplemented by a more quantitative one (magnitude estimation), where subjects rate (in percent) the descriptions obtained in the phenomenological experiments. A new group of 45 subjects was instructed to scale the relative strength or salience (in percent) of the descriptions from the phenomenological task, i.e., the degree to which each captures the phenomenon being rated: 'please rate whether this statement is an accurate reflection of your perception of the picture, on a scale from 100 (perfect agreement) to 0 (complete disagreement)'. Throughout the text, each description is followed by the result of the magnitude estimation (mean rating). Observation time was unlimited. During the experiment, subjects were allowed to make free comparisons, confrontations and afterthoughts; to see in different ways; to vary the illumination, distance, etc.; and to match the stimulus with every other one. The subjects could also receive suggestions/questions of any kind, for example: What is the shape of each small element? What is the global shape? All the variations and possible comparisons occurring during the free exploration were noted down by the experimenter. This was necessary to define the best conditions for the occurrence of the emerging phenomena.

3. From What We Perceive to What We Can Perceive: Exploring a World of Possible Perceptions

3.1. The Visual Gradient of Perceptibility

Within a visual stimulus, the phenomenal results do not pop out with the same strength: some appear immediately and with a clear vividness, while others are so slight that they require time or prompting to be perceived, and their strength can be unstable or reversible. More generally, the visual results within the same stimulus pattern can be considered as placed along a visual gradient of perceptibility.
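The quantitative step of the scaling task described above is simple aggregation: the parenthetical numbers that follow each description in the remainder of the text are the mean 0–100 ratings from that task. A minimal sketch of the computation, with invented placeholder descriptions and ratings (not the study's data):

```python
# Magnitude-estimation aggregation: each subject rates (0-100) how well a
# description matches their percept; the text then reports the mean rating per
# description. Descriptions and ratings below are invented placeholders.
from statistics import mean

ratings = {
    "a large square made up of small black squares": [100, 95, 100, 97],
    "a white orthogonal grid on a black background": [85, 92, 90, 89],
}

for description, scores in ratings.items():
    print(f"{description} ({mean(scores):.0f})")
```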
This notion of a visual gradient of perceptibility is treated as a phenomenal matter and, therefore, in a phenomenal sense prior to and epistemologically independent from the Bayesian framework. The Bayesian approach is in fact a model of statistical inference (Kersten, 1999; Kersten et al., 2005; Knill and Pouget, 2004; Knill and Richards, 1996; Mamassian et al., 2002; Rao et al., 2002) for extracting information that treats perception as a visual inference problem (Helmholtz, 1866), based on Bayes' theorem: the posterior probability is proportional to the product of the prior probability and the likelihood. In Fig. 1(a), a large square shape made up of small black squares is perceived (100). This result is similar to but, at the same time, different from or opposite to the following one: small black squares arranged in a large square array (93). In the first description, the object emerging first is the whole, which organizes the perceptual meaning and the role of the single elements. In the second description, the elements create the whole shape. A third possible description is: a white orthogonal grid on a black square background (89). This result reverses the figure–ground segregation of the previous possible outcomes. Another effect, first noticed and studied
Figure 1. The rectangle illusion: both small and large squares appear similar to vertical (b) and horizontal (c) rectangles. Control: (a). The grouping diamond illusion: (d) (see Fig. 2).
by Hermann (1870) and not immediately visible, refers to the dark spots placed at the intersections of the squares and mostly visible in peripheral vision (88). All these effects do not emerge simultaneously, and they emerge with different strengths. Some are due to local or foveal vision, others to global or peripheral vision. It is not immediately clear which result emerges first and in which order. This figure reveals the richness and complexity of the possible visual results; furthermore, it can be considered as a control for the more interesting conditions illustrated in Fig. 1(b) and 1(c), where the visual gradient of perceptibility can be more easily described. In Fig. 1(b) and 1(c), two large square shapes made up of columns (b) or rows (c) of small black and gray squares are perceived (98) (more details of this description are not needed in this context). This global–local description is placed at about the same level as the local–global one, i.e., rows and columns of black and gray squares create two large squares (95). The organization in rows or columns is due to the Gestalt grouping principle of similarity (Wertheimer, 1922, 1923). The outer global organization in large square shapes is due to the principle of similarity of shape and to the principle of exhaustiveness. After having perceived the organization in rows and columns of Fig. 1(b) and 1(c), the same kinds of organization can be perceived in Fig. 1(a), even if they are reversible and of short duration (89). This result was previously invisible (93).

3.1.1. The Rectangle Illusion
In Fig. 1(b) and 1(c), there are two other effects, which do not emerge at first sight as strongly as the row and column organizations, although they probably depend on these organizations. The perception of the columns 'distorts' the large square by lengthening its height, so that it appears like a vertical rectangle (94).
On the contrary, the row organization induces the perceptual widening of the large square, which appears like a horizontal rectangle (91). The perception of these phenomena requires a
more precise and analytical observation of the sides, and the suggestion to focus the perception on the comparison between base and height. The effect is more clearly perceived by comparing the three previous figures. Different kinds of observations, suggestions and comparisons are basic components of the phenomenological exploration of this illusion. A demonstration of the importance of this exploration is the following related effect, less immediate than the previous ones and requiring an even more subtle observation and more suggestions. This effect is the shape distortion of the small squares in the same direction as the whole square shape: each small square appears distorted like a rectangle according to the direction of the grouping (91). Both the whole and the local shape distortions are what we call the rectangle illusion (see also Pinna, 2010; Pinna and Albertazzi, in press; Pinna and Reeves, 2009). This is perceived even more strongly by comparing these figures with the control illustrated in Fig. 1(a), or by zooming the gaze in on only a small array of squares, for example the 4 × 4 squares in the top right-hand corner of Fig. 1(b) and 1(c) (96). The different difficulty experienced in localizing the 4 × 4 squares in the two figures demonstrates once more the differences in shape perception. The rectangle illusion is opposite to the Oppel–Kundt illusion, according to which an empty (unfilled) space looks wider than a space filled by some objects (Da Pos and Zambianchi, 1996; Kristof, 1961; Oppel, 1854–1855), and to Helmholtz's square illusion, where a square appears wider when it is filled with vertical lines and higher when filled with horizontal lines (Helmholtz, 1866). These results demonstrate that grouping influences shape perception: squares 'become' rectangles (92). However, grouping principles per se do not make any prediction about shape.
In fact, the role of the Gestalt principles is to define the rules of 'what is or stays with what', not the shape (Wertheimer, 1922, 1923). The notion of 'whole' due to grouping is phenomenally different from the one due to shape. The direction imparted by grouping is the phenomenal meta-attribute that influences the shape by polarizing it in one or another specific direction, both globally and locally (see Pinna, 2010; Pinna and Albertazzi, in press; Pinna and Reeves, 2009). These results clearly demonstrate the important role of the phenomenological exploration — based on different kinds of observation, suggestions and comparisons — in favouring the perception of something not immediately visible. Finally, the rectangle illusion suggests that perceptions are placed along a visual gradient of visibility, which is not a static and absolute structure but a set of possible percepts with a flexible organization in terms of perceptibility: the visibility of the 'rectangular' shape of the squares increased after it had been noticed (93). The phenomenological exploration influences the gradient by changing the strength and therefore the order of the possible percepts. This suggests some questions that need to be explored more deeply in further work: Why do some percepts emerge more clearly than others? How does their visibility change?
3.1.2. The Grouping Diamond Illusion
In the wake of the previous phenomenological results, we can go on exploring how the inner organization of elements influences the shape of both the elements and the whole. This purpose is sufficient to induce in the reader a new gradient of visibility for Fig. 1(d). This gradient is different from that of naive subjects, who saw this figure for the first time without any suggestion. Expectation and past experience are important in determining and influencing not what we can see (this was fully studied by the Gestalt psychologists, who demonstrated their limited influence; see Kanizsa, 1980, 1991; Koffka, 1935; Metzger, 1975) but the shape of the gradient, namely what appears in the foreground, in the background and in the other intermediate phenomenal locations of the gradient. In Fig. 1(d), the similarity principle groups the small squares along the diagonal of the whole square (97). Under these conditions, the direction of the grouping influences the shape: the small squares appear slightly like diamonds (87). Naive subjects noticed this effect after a suggestion and after perceiving the rectangle illusion of Fig. 1(b) and 1(c). The perception of the whole square as a diamond is, on the contrary, much less immediate, but it can be perceived through comparison with the previous figures, now used as controls. This is more easily perceived if the whole of Fig. 1(d) is perceptually considered as a diamond rotated by 45° (95). This effect, involving both small and whole elements, is the grouping diamond illusion. This illusion can be enhanced (see Figs 2 and 3; the figures are shown scattered within the text to avoid possible contamination by the effects of adjacent stimuli) by comparing new conditions in which squares and diamonds, either small or
Figure 2. The grouping diamond illusion: the inner elements and the whole figures appear to have a diamond shape ((b) and (c)) or a square shape ((d) and (e)). Control: (a).
Figure 3. Further examples of the grouping diamond illusion.
large, are systematically varied in horizontal/vertical and in clockwise and anticlockwise oblique orientations. In Fig. 2, large squares are made up of small diamonds (100). The small elements are grouped in columns (b), rows (c), and clockwise (d) and anticlockwise (e) oblique orientations (97). Figure 2(a) (control) shows diamonds joined at their vertices to create a large square (95). The single elements can also appear as squares rotated by 45°, but this result is perceived very weakly (93). In Fig. 2(b), the perception of the diamond shape of the single elements is enhanced and pushed much further into the foreground of the gradient of perceptibility (94). The whole shape appears like a vertical rectangle (97). In Fig. 2(c), the small diamonds, similar to those of Fig. 2(b), appear wider than they are high, and the global shape is perceived as a horizontal rectangle (96). All these percepts are placed at different levels within the gradient of perceptibility. During the previous description the reader can perceive this gradient and its changes by moving the gaze from one figure to another and by comparing them. This is clearly demonstrated in Fig. 2(d) and 2(e), where the single elements are perceived both as diamonds and, alternately, as squares rotated by 45°, whose directions go anticlockwise or clockwise (98). By comparing these figures with Fig. 2(b) and 2(c), the rotated-square result appears more strongly (91). A more accurate observation reveals that the squares can also be perceived as rotated rectangles (94), with the longer side oriented in the direction of the grouping (the rectangle illusion). Moreover, in Fig. 2(d) and 2(e), the whole shapes are seen like diamonds rotated in
the direction of the side but in opposite directions (97). These results appear neither immediately nor easily. It is worth noticing that the rotated diamonds are not perceived as lying in the horizontal plane, but as slightly tilted, respectively anticlockwise or clockwise, by 1° or less (88). Figure 2(a–c) is likely related to the square–diamond illusion (Mach, 1914/1959; Schumann, 1900), according to which the same geometrical figure is perceived as a square when its sides are vertical and horizontal, but as a diamond when its sides are diagonal. This effect is clearly enhanced by the grouping of the components in columns and rows. Nevertheless, the role of this illusion is very weak or absent when these figures are compared with Fig. 2(d) and 2(e), where the perceived results are the opposite of what would be expected from the square–diamond illusion: the 'diamonds' appear like rotated squares. (To see the distinction between 'diamonds' and 'rotated squares', compare the phenomenal results of Fig. 2(b) and 2(c) with Fig. 2(d) and 2(e). All the shapes illustrated here are 'diamonds', or squares with sides that are diagonal. Nevertheless, they appear like 'diamonds' in Fig. 2(b) and 2(c) and like 'rotated squares' in Fig. 2(d) and 2(e).) Conversely, Fig. 3 demonstrates the independent role of the grouping diamond illusion with respect to the square–diamond illusion. Under these conditions, the small and large components are the opposite of those of Fig. 2, i.e., large diamonds and small squares with vertical/horizontal and oblique grouping (95). The comparison between the two sets of figures demonstrates the strength of the grouping diamond illusion. By considering Fig. 3(a) as a control, where small squares and large diamonds are more clearly perceived (the small elements are perceived as diamonds only along the boundaries of the large shape), the small squares appear more and more as squares in Fig. 3(b) and 3(c) or as diamonds in Fig. 
3(d) and 3(e) (90). On the contrary, the large squares appear as diamonds in Fig. 3(b) and 3(c), or as squares rotated clockwise or anticlockwise by 45° (92). These phenomenal comparisons clarify and ease the emergence of what can be perceived but does not necessarily emerge spontaneously, i.e., squares that are at the same time perceived as diamonds, and vice versa. The phenomenal exploration before the quantitative psychophysical measurement is necessary to obtain this result. The direction imparted by virtue of the grouping by similarity can be stronger than the direction created by both single and global shapes, as illustrated in Fig. 4 (91). In Fig. 4(a) and 4(b), the two controls clearly show the diamonds, which can appear stronger (Fig. 4(b), 4(c) and 4(f)) or more similar to squares (Fig. 4(e) and 4(g)) (92). It is worth noticing that the local and global shape deformations (base vs height sizes) of the rectangle illusion persist under these conditions (compare Fig. 4(a) with Fig. 4(c) and 4(d)). These shape effects involve both the small and the large components (91).

Figure 4. New conditions of the grouping diamond illusion.

Now that we have perceived these illusory shape effects, it is easier to perceive similar effects in conditions, like those illustrated in Fig. 5, where the shape effects are induced not by similarity but on the basis of the main direction created by different objects. In Fig. 5(a) and 5(b), the circle groups with the array of black elements along two different directions, respectively the upper sides of the array and its top vertex. The whole direction orients the shape of the small and large components, which are perceived respectively as diamonds or rotated squares (90). In Fig. 5(c) and 5(d), the large components are perceived similarly to those of Fig. 5(a) and 5(b), but the small elements appear respectively like squares (c) and rotated diamonds (d) (91). The previous results demonstrate that the preliminary phenomenological approach explores the visual gradient of perceptibility and picks out possible percepts that are like islands of organized properties within the whole visual pattern. Through this exploration the profile of the gradient changes, making visible what is invisible or scarcely perceptible at first sight. This does not mean that we can perceive everything ad libitum: what is invisible at first sight is what we can perceive within the set of possible 'things' placed along the gradient of perceptibility. The extreme of the gradient, i.e., perceptible invisibility, is the topic of the next section.
What comes before psychophysics?
Figure 5. The whole direction, induced by the arrangement of the circle and the array of small squares, orients the shape of the small and large components, which are perceived respectively as diamonds or rotated squares.
3.2. The Perceptible Invisibility
Figure 6 shows four conditions where the direction, determined by the grouping principle of similarity, induces different shapes within the same pattern. Briefly, when the sides of the large shapes appear made up of squares (Fig. 6(a–c)), the diagonals are perceived as made up of diamonds, and vice versa (Fig. 6(b–d)) (88). These alternating shape effects occur in spite of the context, in which all the components are geometrically equivalent, and in spite of the Gestalt Prägnanz principle (Wertheimer, 1922, 1923).
Figure 6. When the sides of the large shapes appear made up of squares, (a–c), the diagonals are perceived as made up of diamonds and vice versa, (b–d).
3.2.1. The Illusion of the Diagonal
In Fig. 6, at the extreme of the gradient of perceptibility, in the region that we may call perceptible invisibility, there is another illusion related to the perceived direction/continuation due to grouping: the diagonal is perceived as slightly longer than the true diagonal of the square array, which appears distorted or protruding towards the direction where the diagonal seems to continue. This is the illusion of the diagonal. After the illusion has been described, it becomes easy to perceive (from 5 out of 45 subjects to 39 out of 45), but at first sight the effect is absent, weak or invisible. In Fig. 7(a) and 7(b), the illusory length of the diagonal and the protruding effect of the gray squares (92) are perceived more clearly than in Fig. 6. The direction,
emerging by virtue of the grouping, induces the illusory continuation of the diagonal even though the array of small squares closes the whole shape and defines its boundary contours. By removing the gray squares (Fig. 7(c) and 7(d)), the effect is enhanced, although the comparison of the two pairs of figures, now that the effect is more clearly perceived, does not reveal any significant difference in strength (90). In Fig. 7(e) and 7(f), where the effect of the direction of grouping is annulled by removing all the elements of the diagonal except the one placed at its extreme, i.e., in the crucial place for the illusion, the illusion is weakened even though it persists slightly (88). Under these conditions the isolated element appears to float beyond the boundaries of the virtual square created by the portions of the sides (92). The closure principle of grouping is likely responsible for this effect. Nevertheless, this principle cannot fully include the isolated square: even if closure is favoured by the presence of this element, it cannot be totally part of the global virtual square, because it is phenomenally perceived as isolated (compare the role of the closure principle in Fig. 7(e) and 7(f) with Fig. 7(g) and 7(h), by focusing on the virtual vertex). These figures demonstrate that there are other factors responsible for the illusion, also related to perceptual grouping.
Figure 7. The illusion of the diagonal: the diagonal is perceived as slightly longer than the true diagonal of the square array, which appears distorted or protruding towards the direction where the diagonal seems to continue (for details see the text).
It is not the aim of this work to study this illusion more systematically. This can be done through the phenomenological exploration of the conditions eliciting it. The main purposes of this section were to study the emergence of the diagonal illusion on the basis of the preliminary invisibility and to show the need to explore it further phenomenologically. The previous results suggest that the phenomenological exploration can be a useful preliminary tool to probe and discover new visual effects. The tool is based on free observations, comparisons, confrontations, afterthoughts, different ways of
seeing, variations of the physical–geometrical conditions, a posteriori matches, etc. Within the phenomenological context, the object under consideration is not restricted to rigid and controlled conditions but is a more complex set of properties and qualities that behave and evolve within a whole context of conditions. This is really the way we perceive our world in everyday life. The variations and the possible comparisons occurring during the phenomenological exploration, noted down during the experiment, are useful for defining the best conditions for the psychophysical experiment that follows and quantitatively complements the phenomenological one.
3.3. The Visual Levels of Perceptibility
3.3.1. The Illusion of Meaning
There is a third condition useful for showing the role of the phenomenological exploration in picking up the qualities that will become inputs for the following psychophysical measurements. This condition is called the visual level of perceptibility; it shows conditions placed at about the same location on the gradient of perceptibility but at different visual levels of complexity.
Figure 8. The illusion of meaning: squares and something that happens to the squares (see the text for details).
In Fig. 8(a), similarly to Fig. 7(g) and 7(h), two visual levels with about the same strength of perceptibility can be described as follows: (i) two pairs of segments of different length placed at right angles, joined through their end points but with a gap between the two small segments (89); (ii) a square perimeter whose top right angle is missing, deleted or cut (90). The two visual levels can be described more easily in Fig. 8(b): (i) a pentagon (92); (ii) a beveled square (98). In Fig. 8(b), a third description is quite immediate even if placed at another, more analytical visual level: five segments of three different sizes joined through their end points and creating a closed figure (91). These descriptions can be defined as geometrical and perceptual. However, it is worth noticing that even a ‘geometrical’ description is a possible perceptual result among others, as suggested spontaneously by the subjects. These descriptions do not represent three different phenomena, but three levels or ways of seeing the same stimulus, revealing three sets of perceptual meanings (Pinna, 2010; Pinna and Albertazzi, in press; Pinna and Reeves, 2009) placed at different levels of complexity. They are also different ways of organizing the same elements. These different forms of visual organization are called form of grouping, form of shape and form of meaning (Pinna, 2010). In Fig. 8(b), the good continuation, Prägnanz and closure principles form the grouping of five segments of three different sizes joined through their end points and creating a closed figure; the form of shape joins all the sides of the figure to form a pentagon. Nevertheless, the third phenomenal result is a beveled square.
Within the forms of grouping and the form of shape there is neither a ‘square’ nor a ‘beveling’. A ‘beveled square’ is the result of the form of meaning, through which a complex set of reciprocally related meanings emerges (see Pinna, 2010). It is phenomenally clear that the identification of these visual levels of perceptibility precedes any psychophysical measurement and can emerge only through a phenomenological exploration.
As the conditions become more and more complex, like those illustrated in Fig. 8(c–f), the phenomenological exploration becomes increasingly crucial for extracting the different visual levels of complexity. In addition to the visual levels, there are other qualities that emerge only through the phenomenological exploration. These qualities are organized through complementary perceptual meanings, namely a square and something that happens to the square, e.g., a square that is beveled. The complementation between squares and happenings creates what we call the illusion of meaning. The illusion is due to the fact that, for example in the case of the beveled square of Fig. 8(b), there is not a square and, ipso facto, there is not a beveling. In fact, we clearly perceive that this figure is not a square even though we perceive, at the same time and at another level of complexity, that it is a square. This double meaning of perception, modal and amodal (Pinna, 2010), depends on the phenomenological exploration. In all the conditions illustrated in Fig. 8, the subjects perceived ‘squares with something happening to them’: cutting, beveling, breaking, tearing, gnawing or nibbling, deforming, twisting, deliquescing (92). These happenings are related to the qualities of the matter of the squares, which differ from one condition to another, like, for example, the glass-like and paper-like matter of Fig. 8(c) and 8(b) (95). These qualities are not perceived immediately, nor reported at all within a naive description. Also, the empty–filled antinomy of Fig. 8(f) can pass unnoticed without the phenomenological exploration. Finally, and more importantly, even nameless objects, or those whose description is hard or impossible to code, like the square of Fig. 8(e), can be defined through the phenomenological exploration. The importance of the perceptual levels of complexity and of the phenomenological exploration is shown in Fig. 9(a), where the word ARTE (Italian for ‘art’) is illustrated (see Pinna, 2008). Under these conditions, multiple levels of perception, going from the reverse figure–ground organization to the formation of illusory contours and the creation of meta-levels of perceptual meanings, contribute synergistically to create something that is not only a word but the word ‘art’ written in a special way, adapted to the meaning of the term. The same word, as illustrated in Fig. 9(b), reveals a new visual level based on deformations that induce new emerging qualities. The study of complex qualities like these is part of the purpose of the phenomenological exploration.
4. On the Phenomenological Exploration
The previous results suggest that the phenomenological exploration plays a basic role in solving the problem of ‘what we perceive’ and in detecting visual qualities, as well as new perceptual illusions, before starting any quantitative psychophysical measurement. This role was demonstrated through three critical conditions, which show the complexity of the visual results and, thus, the need to explore the phenomenological sphere in depth. These conditions are the visual gradient of perceptibility, the perceptible invisibility and the visual levels of perceptibility.
Figure 9. The word ARTE (Italian for ‘art’), showing multiple levels of perception, going from the reverse figure–ground organization to the formation of illusory contours and the creation of meta-levels of perceptual meanings.
The phenomenological approach considered here represents the preliminary exploration necessary to fully collect the possible percepts and their organization in terms of the gradient of perceptibility and the different levels of complexity. This approach does not imply or assume percept–percept couplings, i.e., the study of relations among percepts in the organized perceptual world. In other words, it does not suggest a science of percepts iuxta propria principia, whose main experimental tool is experimental phenomenology, considered as an ethology of visual objects independent of any psychophysical and neural substrate (Bozzi, 1989, 1990; Kanizsa, 1980, 1991; Masin, 1993; Massironi, 1998; Metzger, 1941; Vicario, 2001, 2005). Unlike this epistemological approach, we suggest that the phenomenological exploration complements and enriches not only the classical psychophysical approach, by enhancing and enlarging the visual context of ‘what we perceive’ before deciding ‘what to measure’, but also the neurophysiological domain. Good phenomenology not only encompasses the complexity of visual results but also guides psychophysical and neurophysiological methods toward understanding the underlying neural processes and the complexity of the brain circuitry. The descriptive role of phenomenology can pass through the exploration of the neural substrate and consider the phenomenal object and the neural object as two sides of the same coin (see also Spillmann, 2009). The previous experimental results bring phenomenological knowledge within the realm of neurophysiology, showing that the two are not dichotomous and irreducible. In order to be fully effective, the phenomenological exploration requires free observations, comparisons, confrontations, afterthoughts, different ways of seeing, variations of the physical–geometrical conditions, a posteriori matches, etc. In fact, in everyday life, vision is never restricted to rigid and controlled conditions; rather, a complex set of properties and qualities behaves and evolves within a more general context of conditions. The phenomenological approach explores the visual gradient of perceptibility by picking up possible percepts, similar to islands of properties organized against the background of other visual islands. As a consequence, this exploration changes the profile of the gradient, thus making visible what is scarcely perceptible or even invisible at first sight. In other words, through this exploration vision determines vision, in the sense that vision determines what can be visible and what remains invisible or pushed into the background: what is perceived pushes something else (otherwise visible) into the background or makes it invisible. This suggests that what is invisible at first sight can become visible through the phenomenological exploration, and that what can become visible is not just anything, but only what we can perceive within the set of possible ‘things’ located within the gradient of perceptibility.
4.1. Some Rules of the Phenomenological Exploration
The phenomenological exploration, applied to the previous figures, suggests some general instructions or rules useful for codifying this approach. These rules are based on those suggested by the Gestalt psychologists. However, differently from what they proposed, and on the basis of the previous results, we propose a comparison, within the same phenomenological study, between opposite components and aspects of the exploration, useful for studying the complexity of the visual percepts within the gradient and the levels of perceptibility.
In other words, if the purpose of the phenomenological exploration is to discover the potential percepts based on the stimuli, and not to study only what emerges immediately, then the comparisons among the opposite components of the phenomenological exploration, described below, are what can make this purpose achievable. These opposite components define the rules and the path that allow in-depth exploration of the visual gradient of perceptibility, the perceptible invisibility and the levels of perceptibility before any psychophysical measurement. The comparison of the outcomes emerging from these components can change the gradient of perceptibility, make visible what was previously invisible, and induce the perception of new perceptual levels of complexity, as we have seen in the previous sections. This implies that different gradients/levels are compared to create a new gradient/level of perceptibility. These opposite components of the phenomenological exploration are the following.
Naive vs Expert Subjects. The subject can be untutored and naive, but also tutored, expert or scholarly.
Free vs Controlled Conditions. The stimuli, the ambient illumination and the observation conditions can be freely manipulated by the subject to explore the complexity of the possible visual results or, conversely, they can be kept fixed under controlled conditions. The comparison between the results emerging from the two kinds of procedure explores the range of possible percepts within different kinds of gradients of perceptibility.
Free vs Guided Tasks. The description of a phenomenon can be a free report, devoid of any a priori indication of what to see, test, measure or define, or, on the contrary, it can be a guided report based on a precise experimental task. Subjects are given a carefully chosen series of visual stimuli and asked to report anything they see or, conversely, to report precisely on a specific perceived phenomenon or attribute. Free vs Guided Reports. The phenomenological exploration requires comparing free observations, comparisons, confrontations, afterthoughts, different ways of seeing and a posteriori matches with reports based on suggestions or precise indications guiding the subjects to perceive a specific target attribute in a predetermined way. We suggest that these general instructions and rules, and the comparison between the opposite sets, can open new perspectives for studying the complexity of vision and its possible results within the same pattern of stimuli. They allow us (i) to reveal the visual gradient of perceptibility and the levels of perceptibility; (ii) to study the profile of the gradient of perceptibility and the relationships among visual levels, here only hinted at; (iii) to understand how stimuli organize themselves not only in space but also in time during the phenomenological exploration, to determine or change their visual gradient of perceptibility (perceptual contamination); (iv) to study how the gradient of perceptibility changes and under which conditions (the laws); and (v) to explore the limits of linguistic descriptions within the complexity of visual results that include nameless or, in the linguistic sense, meaningless objects. Beyond these perspectives, the most important result obtained here is that the phenomenological exploration complements the psychophysical investigation, making it easier to understand the complexity of vision and to discover new phenomena.
Acknowledgements Supported by Finanziamento della Regione Autonoma della Sardegna, ai sensi della L.R. 7 agosto 2007, n. 7, Fondo d’Ateneo (ex 60%) and Alexander von Humboldt Foundation. I thank the Guest Editor and the Reviewers for their suggestions that greatly improved the paper. References Bozzi, P. (1989). Fenomenologia sperimentale. Il Mulino, Bologna, Italy. Bozzi, P. (1990). Fisica ingenua. Oscillazioni, piani inclinati e altre storie: studi di psicologia della percezione. Garzanti, Milano, Italy. Da Pos, O. and Zambianchi, E. (1996). Visual Illusions and Effects. Guerini, Milan, Italy. Fechner, G. (1860/1966). Elements of Psychophysics, Vol. 1. Holt, Rinehart and Winston, New York, USA. Hermann, L. (1870). Eine Erscheinung simultanen Contrastes, Pflügers Archiv für die gesamte Physiologie 3, 13–15. Kanizsa, G. (1980). Grammatica del vedere. Il Mulino, Bologna, Italy.
What comes before psychophysics?
211
Kanizsa, G. (1991). Vedere e pensare. Il Mulino, Bologna, Italy. Kersten, D. (1999). High-level vision as statistical inference, in: The New Cognitive Neurosciences, M. S. Gazzaniga (Ed.). MIT Press, Cambridge, MA, USA. Kersten, D., Mamassian, P. and Yuille, A. (2005). Object perception as Bayesian inference, Ann. Rev. Psychol. 55, 271–304. Knill, D. C. and Pouget, A. (2004). The Bayesian brain: the role of uncertainty in neural coding and computation, Trends Neurosci. 27, 712–719. Knill, D. C. and Richards, W. (1996). Perception as Bayesian Inference. Cambridge University Press, Cambridge, UK. Koffka, K. (1935). Principles of Gestalt Psychology. Routledge and Kegan Paul, London, UK. Kristof, W. (1961). Versuche mit der Waagrechten Strecke-Punkt-Figur, Acta Psychologica 18, 17–28. Mach, E. (1914/1959). The Analysis of Sensations. Open Court, Chicago, USA. Mamassian, P., Landy, M. S. and Maloney, L. T. (2002). Bayesian modelling of visual perception, in: Probabilistic Models of the Brain: Perception and Neural Function, R. P. N. Rao, B. A. Olshausen and M. S. Lewicki (Eds), pp. 13–36. MIT Press, Cambridge, MA, USA. Masin, S. C. (Ed.) (1993). Foundations of Perceptual Theory. Elsevier, Amsterdam, The Netherlands. Massironi, M. (1998). Fenomenologia della percezione visiva. Il Mulino, Bologna, Italy. Metzger, W. (1941). Psychologie: die Entwicklung ihrer Grundannahmen seit der Einführung des Experiments. Steinkopff, Dresden, Germany. Metzger, W. (1975). Gesetze des Sehens. Kramer, Frankfurt-am-Main, Germany. Oppel, J. J. (1854–1855). Über Geometrisch-optische Täuschungen, Jahresbericht des Physikalischen Vereins zu Frankfurt am Main, pp. 37–47. Pinna, B. (2008). The illusion of art, in: Art and Perception. Towards a Visual Science of Art, Part 2, B. Pinna (Ed.), pp. 1–32. Brill Academic Publishers, Leiden, The Netherlands. Pinna, B. (2010). New Gestalt principles of perceptual organization: an extension from grouping to shape and meaning, Gestalt Theory 32, 1–67. 
Pinna, B. and Albertazzi, L. (in press). From grouping to visual meanings: a new theory of perceptual organization, in: Information in Perception, L. Albertazzi, G. van Tonder and D. Vishwanath (Eds). MIT Press, Cambridge, MA, USA. Pinna, B. and Reeves, A. (2009). From perception to art: how the brain creates meanings, Spatial Vision 22, 225–272. Rao, R. P. N., Olshausen, B. A. and Lewicki, M. S. (2002). Probabilistic Models of the Brain: Perception and Neural Function. MIT Press, Cambridge, MA, USA. Schumann, F. (1900). Beiträge zur Analyse der Gesichtswahrnehmungen. Zur Schätzung räumlicher Grössen, Zeitschrift für Psychologie und Physiologie der Sinnesorgane 24, 1–33. Spillmann, L. (2009). Phenomenology and neurophysiological correlations: two approaches to perception research, Vision Research 49, 1507–1521. Thinès, G. (1977). Phenomenology and the Science of Behaviour. An Historical and Theoretical Approach. Allen and Unwin, London, UK. Vicario, G. B. (2001). Psicologia generale: i fondamenti. GLF editori Laterza, Roma, Italy. Vicario, G. B. (2005). Il tempo: saggio di psicologia sperimentale. Il Mulino, Bologna, Italy. von Helmholtz, H. (1866). Handbuch der Physiologischen Optik, Part III. Voss, Leipzig, Germany. Wertheimer, M. (1922). Untersuchungen zur Lehre von der Gestalt, I. Psychologische Forschung 1, 47–58. Wertheimer, M. (1923). Untersuchungen zur Lehre von der Gestalt, II. Psychologische Forschung 4, 301–350.
Index
A
adaptation, 25
adaptive testing, 87
aesthetics, 183
apparent motion, 63
B
Bayesian inference, 87
beauty, 183
C
context, 121, 155
criterion setting theory, 121, 155
D
decision criterion, 155
depth perception, 121
detection, 7
detection instability, 63
differential coupling, 7
discrimination, 7, 25
discrimination learning, 121
F
Fechner, 183
Fechner’s law, 7, 25
form, 183
G
Gestalt psychology, 193
H
history, 183
hysteresis, 63
L
length, 25
light, 25
logarithmic transform, 7
M
memory trace, 121
method of limits, 63
motion quartets, 63
N
natural images, 39
neural dynamics, 63
O
oblique effect, 121, 155
orientation anisotropy, 121, 155
P
parallel law, 25
perceptual organization, 193
phenomenology, 193
psychophysical methods, 87
psychophysics, 63, 193
R
ratings, 39
S
sensory integration, 155
shape, 183
signal detection theory, 7
T
transducer function, 39
V
V1 model, 39
visual illusions, 193
W
Weber’s law, 25
weight, 25