VIRTUAL AND ADAPTIVE ENVIRONMENTS Applications, Implications, and Human Performance Issues
Edited by
Lawrence J. Hettinger and Michael Haas
Air Force Research Laboratory, Wright-Patterson AFB

LAWRENCE ERLBAUM ASSOCIATES, PUBLISHERS
Mahwah, New Jersey    London
2003
Senior Acquisitions Editor: Anne Duffy
Editorial Assistant: Kristin Duch
Cover Design: Kathryn Houghtaling Lacey
Textbook Production Manager: Paul Smolenski
Full-Service Compositor: TechBooks
Text and Cover Printer: Hamilton Printing
This book was typeset in 10/12 pt. Times, Italic, Bold, and Bold Italic. The heads were typeset in Helvetica, Helvetica Italic, and Helvetica Bold.
Copyright © 2003 by Lawrence Erlbaum Associates, Inc. All rights reserved. No part of this book may be reproduced in any form, by photostat, microfilm, retrieval system, or any other means, without prior written permission from the publisher.
Lawrence Erlbaum Associates, Inc., Publishers
10 Industrial Avenue
Mahwah, New Jersey 07430
Library of Congress Cataloging-in-Publication Data

Virtual and adaptive environments : applications, implications, and human performance / edited by Lawrence J. Hettinger, Michael Haas.
    p. cm.
    Includes bibliographical references and index.
    ISBN 0-8058-3107-X (alk. paper)
    1. Virtual computer systems. 2. Human-computer interaction. I. Hettinger, Lawrence J. II. Haas, Michael W.
QA76.9.V5V56 2003
005.4'3—dc21        2002151264
To my mother—for guiding me toward a life devoted to the pursuit of interesting and helpful work. To Dean H. Owen and Robert S. Kennedy—valued and respected mentors, in grateful thanks for all their many efforts on my behalf.
L.J.H.

To Patty, my wife—for being the love of my life, my partner, and my best friend. To Jon, my son—for sharing your perspectives, your creativeness, and your compassion.
M.W.H.
Contents
Preface

1 Introduction
Lawrence Hettinger and Michael Haas

GENERAL ISSUES IN THE DESIGN AND USE OF VIRTUAL AND ADAPTIVE ENVIRONMENTS

2 Visual Perception of Egocentric Distance in Real and Virtual Environments
Jack Loomis and Joshua Knapp

3 A Unified Approach to Presence and Motion Sickness
Jerrold Prothero and Donald Parker

4 Transfer of Training in Virtual Environments: Issues for Human Performance
Marc Sebrechts, Corinna Lathan, Deborah Clawson, Michael Miller, and Cheryl Trepagnier

5 Beyond the Limits of Real-Time Realism: Moving from Stimulation Correspondence to Information Correspondence
Pieter Jan Stappers, William Gaver, and Kees Overbeeke

6 On the Nature and Evaluation of Fidelity in Virtual Environments
Thomas Stoffregen, Benoit Bardy, L. J. Smart, and Randy Pagulayan

7 Adapting to Telesystems
Robert Welch

VIRTUAL ENVIRONMENTS

8 A Tongue-Based Tactile Display for Portrayal of Environmental Characteristics
Paul Bach-y-Rita, Kurt Kaczmarek, and Mitchell Tyler

9 Spatial Audio Displays for Target Acquisition and Speech Communications
Robert Bolia and W. Todd Nelson

10 Learning Action Plans in a Virtual Environment
Simon Goss and Adrian Pearce

11 Fidelity of Disparity-Based Stereopsis
Ian Howard

12 Configural Scoring of Simulator Sickness, Cybersickness, and Space Adaptation Syndrome: Similarities and Differences
Robert S. Kennedy, Julie M. Drexler, Daniel E. Compton, Kay M. Stanney, D. Susan Lanham, and Deborah L. Harm

13 A Cybernetic Analysis of the Tunnel-in-the-Sky Display
Max Mulder, Henk Stassen, and J. A. Mulder

14 Alternative Control Technology for Uninhabited Aerial Vehicles: Human Factors Considerations
W. Todd Nelson, Timothy R. Anderson, and Grant R. McMillan

15 Medical Applications of Virtual Reality
Richard Satava and Shaun Jones

16 Face-to-Face Communication
Nadia Magnenat Thalmann, Prem Kalra, and Marc Escher

17 Integration of Human Factors Aspects in the Design of Spatial Navigation Displays
Eric Theunissen

18 Implementing Perception–Action Coupling for Laparoscopy
Fred Voorhorst, Kees Overbeeke, and Gerda Smets

19 Psychological and Physiological Issues of the Medical Use of Virtual Reality
Takami Yamaguchi

ADAPTIVE ENVIRONMENTS

20 Supporting the Adaptive Human Expert: A Critical Element in the Design of Meaning Processing Systems
John Flach and Cynthia Dominguez

21 A Human Factors Approach to Adaptive Aids
Sylvain Hourlier, Jean-Yves Grau, and Claude Valot

22 Adaptive Pilot/Vehicle Interfaces for the Tactical Air Environment
Sandeep S. Mulgund, Gerard Rinkus, and Greg Zacharias

23 The Implementation of Psycho-Electrophysiological Interface Technologies for the Control and Feedback of Interactive Virtual Environments
Alysia Sagi-Dolev

Author Index
Subject Index
Preface
The use of virtual and adaptive environments promises to revolutionize the ways in which humans live their daily lives. Virtual and adaptive environments are systems composed of humans, computers, and interface devices. They are very likely to significantly alter the behavioral landscape of work, recreation, education, and the manner in which people routinely communicate with and otherwise interact with one another. Their development is being approached with the goal of enhancing the effectiveness and safety with which various complex tasks, such as medical procedures and the control of an increasingly crowded civil airspace, can be executed. In other instances, their development is geared toward enabling the performance of tasks, such as telerobotic construction of extraterrestrial structures like the space station, which might otherwise prove to be too logistically challenging to accomplish. In still other cases, such as the many entertainment applications of virtual environments that are currently being planned and developed, these technologies are being pursued simply to promote a sense of fun and enjoyment—a compelling form of temporary escape and relief from the stresses and challenges of everyday life.

Clearly, the full promise of virtual and adaptive environment technology will not be realized for quite a few years to come. There are, of course, many technical issues to be overcome. The ubiquitous head-mounted visual display, perhaps the signature piece of hardware in the pantheon of virtual environment devices, is still limited in its ability to fully replicate a sense of real-world perceptual experience. The development of haptic and tactile display technologies, while having made impressive strides in recent years, is still in the early days of its developmental maturity. We could continue to recite a litany of technical advances that are sorely needed to support the realization of these systems' full potential, but in the end we would confidently conclude that they are all being addressed by skilled and
innovative researchers. We share a sense of profound optimism that the technical promise of virtual and adaptive systems will be realized, certainly in the 21st century.

In our opinion, a possibly more challenging set of problems and issues revolves around the overlapping questions of (1) how our knowledge of human perception, cognition, and behavior can be expanded and applied in such a way as to most effectively support the development of these technologies, (2) how our knowledge of human perception, cognition, and behavior can be used to determine the fidelity level required when applying a virtual and adaptive environment, and (3) how we will use these technologies, once they are developed, in a way that promotes maximal efficiency and safety for the user and society as a whole. Indeed, the fundamental assumption motivating the publication of this book is that these systems are first and foremost human-centered technologies, in that their purpose is to complement and extend human capabilities across a wide variety of domains. Therefore, the role that psychological scientists and their colleagues in related domains such as human factors engineering and the neurosciences must play in providing guidance for the design and deployment of these systems is vital. The contributions to this book illustrate the many ways in which psychological science is contributing to and benefiting from the increased development and application of these nascent systems.

This book was inspired by our interaction with many colleagues over a period extending from 1992 to 2002. First and foremost among these in terms of their importance and simple pleasure as a medium of exploration and interaction was the collection of unique and remarkable scientists and technicians with whom we have worked in the Air Force Research Laboratory's Synthesized Immersion Research Environment, located at Wright-Patterson Air Force Base in Ohio. These valued colleagues include Ken Aldrich, Kevin Bennett, Kenneth Boff, Robert Bolia, Douglas Brungart, Bart Brickman, Gloria Calhoun, Jeff Collier, Jeff Craig, Jeff Cress, Jim Cunningham, Mark Draper, William Dember, Leon Dennis, Andre Dixon, Brian Donnelly, John Flach, Scot Galster, Eric Geiselman, Glen Geisen, David Hoskins, Jenny Huang, Andrew Junker, Liem Liu, Grant McMillan, Matt Middendorf, Brian Moroney, Todd Nelson, Barbara Palmer, Herb Pick, Michael Poole, Dave Post, Dan Repperger, Gary Riccio, Merry Roe, Robert Shaw, Clar Sliper, David Snyder, Dean Stautberg, Tom Stoffregen, Rob Tannen, Lee Task, Rob Tricke, Lloyd Tripp, Mike Vidulich, Mark Visconti, Joel Warm, Glenn Wilson, and Greg Zacharias. It has been our great pleasure over the years to work with and learn from some of the finest scientists and engineers working in this field, including those listed above.

We also wish to gratefully acknowledge the hard work and encouragement of Anne Duffy and Kristin Duch, both of Lawrence Erlbaum Associates, our publisher. Without their patience, pushing, prodding, and valued technical support it is not clear that this book would have ever been completed.
Finally, we wish to extend our most sincere thanks to each of the contributors to this book. We have been fortunate to assemble a collection of authors that consists not only of many immediately recognizable experts in the field of virtual and adaptive environments, but also of many of the "up-and-coming" young researchers in these areas—those from whom much will be heard in the years to come. We extend our heartfelt thanks and admiration to them all.
1
Introduction

Lawrence J. Hettinger
Northrop Grumman Information Technology

Michael W. Haas
United States Air Force Research Laboratory, Wright-Patterson Air Force Base
This book is devoted to the exploration of psychological issues involved in the design and use of virtual and adaptive environments. Specifically, it is concerned with the examination of issues affecting the ability of human users to successfully interact with these emerging technologies, as well as the ability of the technologies themselves to enable users to achieve specific behavioral objectives. Another major theme of the book involves the use of these technologies for the empirical examination of human perception, cognition, and behavior in innovative and scientifically advantageous ways.

As has frequently been noted (e.g., Durlach & Mavor, 1994; Picard, 1997; Stanney, 2002), these systems have the potential to significantly impact human behavior at the individual, group, and social levels. If properly designed and deployed, these systems could significantly enhance human performance and well-being across a wide variety of domains. However, if improperly designed and used, their positive impact will undoubtedly be greatly reduced. Indeed, in some cases they could have an appreciably negative impact on individual users and, by extension, society in general. Therefore, it is critical that we approach their design
and use from a position of solid understanding of the human performance factors that will promote their most effective implementation. [Footnote 1: Clearly, psychological issues are just one among many different areas of concern that must be addressed if virtual and adaptive technologies are to be successfully developed for long-term, real-world applications. Besides the purely technical challenges, there are sociological, anthropological, and broader cultural, ethical, and philosophical implications of these technologies that are vitally important to address. However, this book concerns itself principally with psychological aspects of these technologies likely to most directly impact the behavior of individuals and small groups.]

Our purpose in this introductory chapter is to set the stage for the many contributions that follow by providing a general introduction to the area of virtual and adaptive environment technology, and to provide an overview of several of the psychological issues involved in their design and use. We will also describe some of the emerging applications of these technologies across a number of applied and scientific domains. In doing so, we will draw on our own work in the design and evaluation of these systems for use in current and future aviation, naval, and medical systems, as well as the rapidly expanding literature concerned with similar issues in these and other application areas.

Virtual and adaptive interface technologies represent fascinating and important new domains of human-centered technological research and development. These systems must rely heavily on knowledge about human performance, perception, and cognition in order to be effectively developed—knowledge that in many cases is only beginning to be uncovered, its discovery spurred in large part by the fascination with and the increased demand for these technologies. In turn, virtual and adaptive environments afford intriguing new means for examining human behavior, permitting the examination of many facets of human behavior within the context of increasingly realistic, yet controlled, experimental settings.

There is an important reciprocity between virtual and adaptive environments and human factors engineering, psychological–cognitive science, and the other disciplines involved in their examination and development. On one hand, developers of these technologies look to experts from these domains for answers to important questions concerned with how humans can most effectively and safely interact with them. Many of the chapters in this book directly address issues of this type, such as Ian Howard's chapter on disparity-based stereopsis; Nadia Magnenat Thalmann, Prem Kalra, and Marc Escher's chapter on face-to-face communication; the chapters on sickness in virtual environments by Robert S. Kennedy and his colleagues, as well as by Jerrold Prothero and Donald Parker; and several others. The goal of much of this work is to provide human-centered design guidance to help develop safe and effective technologies.

On the other hand, the technologies themselves provide a unique means for examining aspects of complex human behavior in realistic, yet experimentally controllable, settings. A number of chapters in this book address this facet of these new technologies. For example, Jack Loomis and Joshua Knapp utilize virtual
environments as an empirical environment for the study of visual space perception, Robert Bolia and W. Todd Nelson's chapter illustrates how virtual auditory environments can expand our knowledge of spatial audition in real environments, and Robert Welch examines issues involved with adaptation to rearranged perceptual inputs in light of issues unique to virtual environment design and use.

The chapters that have been gathered for this book represent a wide range of research and development interests. They were not selected to provide an encyclopedic overview of the area [Footnote 2: Readers are encouraged to consult Stanney (2002) and Durlach and Mavor (1994) for excellent, wide-ranging overviews of the virtual environment domain, and Picard (1997) and Scerbo and colleagues (2001) for overviews of issues concerned with adaptive interfaces and "affective computing." Additionally, special issues of The International Journal of Aviation Psychology devoted to multisensory displays and adaptive interfaces have recently been published (Haas & Hettinger, 2001; Hettinger & Haas, 2000).]; rather, they are intended to provide a means for many of the leading researchers in this area from across the world to provide a sampling of their views on the relevant research and development issues, as well as their approaches to solving problems associated with the safe and effective design and use of these technologies. In a domain that is characterized by relatively rapid advances in technical capability and sophistication, the chapters in this book are intended to deal with comparatively durable issues involved in complex human-systems development, particularly those of relevance to virtual and adaptive environment systems.

DEFINING VIRTUAL AND ADAPTIVE ENVIRONMENTS

Virtual and adaptive environments (VEs and AEs) are both highly innovative approaches to the design of human–machine systems. Each is intended to significantly expand and facilitate human interactions with computer-based technologies. VEs represent an approach to human–machine system design that seeks to produce (to varying degrees) a sense of "immersion" or "presence" within a computer-generated or synthetic environment. AEs, on the other hand, seek to establish a broad, symmetrical communications channel between a computer-based system and its user, enabling the former to detect the condition or current state of the latter and thereby adjust its own activity to facilitate the attainment of some specific behavioral goal (e.g., Bennett, Cress, Hettinger, Stautberg, & Haas, 2001; Hettinger, Branco, Encarnacao, & Bonato, in press; Scallen & Hancock, 2001).

AEs need not be coupled with highly immersive, computer-generated multisensory environments. Indeed, most current approaches to the development and use of AE technology concern the implementation of adaptive algorithms to enhance the performance of otherwise relatively conventional human–machine interface concepts (e.g., see Scerbo et al., 2001, for a comprehensive review of issues involved
in the identification and assessment of psychophysiological variables involved in the design and use of adaptive automation). However, the potential benefits of combining virtual and adaptive interface technology, perhaps in combination with alternative control technologies such as brain-actuated or gesture-based control, afford a number of very intriguing possibilities for supporting human performance as well as examining human behavior (e.g., Brickman, Hettinger, Haas, & Dennis, 1998; Haas & Hettinger, 1993; Nelson, Anderson, & McMillan, this volume; Nelson et al., 1997).

In many ways, virtual and adaptive environments represent two very distinct approaches to the design of human–machine interfaces. For instance, VE technology is primarily geared toward developing highly compelling and intuitive sensorimotor interfaces (i.e., sets of multisensory displays and multimodal controls) intended to simplify interactions between humans and complex computer-based systems. VEs emphasize, to varying degrees, the use of immersive, multisensory displays that simulate key aspects of information specific to real or imaginary environments. In turn, users interact with these environments by means of a number of more or less natural control mechanisms working in concert with body-motion detection systems (e.g., head-motion detectors, gestural interfaces, eye-motion detectors, etc.). [Footnote 3: See Foxlin (2002), Wilder, Hung, Tremaine, and Kaur (2002), and Turk (2002) for general overviews of these important technical considerations.] The desired end product is a synthetic, computer-generated environment that responds in a natural and predictable way to relatively intuitive user inputs and normal body-motion activities.

AEs, on the other hand, rely on a more purely computational, "behind the scenes" approach to facilitating human–machine system performance. Specifically, AEs are based on the collection and application of behavioral, biological, and/or psychophysiological information about the user, as well as data about the situation in which the human–machine system is immersed. This information is processed in real time to draw reliable inferences about the current state of the user's condition in order to dynamically alter and improve the nature of the information and control characteristics of the human–machine system. The goal of AE systems can be thought of as providing the means to improve human performance under relatively consistent or somewhat diminished performance conditions (e.g., high stress and high cognitive workload) as well as to support the maintenance of performance within a relatively constant behavioral envelope in the face of dramatically deteriorating conditions (e.g., significant loss of spatial and/or situational awareness and the presence of multiple high-priority tasks). [Footnote 4: It may seem somewhat counterintuitive to consider devoting effort to the design of a system whose goal is to maintain performance at a constant level. However, in the face of sufficient deterioration in the condition of the user and/or his situation, maintaining human–system output at a constant level may indeed be a challenging and important accomplishment (Jones & Kennedy, 1996).]

[Fig. 1.1. The combination of intuitive, immersive, and transparent elements of multisensory displays contributes to the sense of presence in virtual environments.]

Technically, therefore, there are significant differences between these two approaches to human–machine system design. However, VEs and AEs have a number
of very important psychological similarities. Both technologies are concerned with producing human–machine interfaces that are often described as more "intuitive," "immersive," and "transparent" than traditional designs. Each of these descriptors has begun to appear with increasing frequency in recent years in discussions of human–computer interface design (e.g., Hettinger et al., in press; Negroponte, 1996), and there is a substantial degree of overlap in their meaning. Indeed, the area of overlap might be considered as the domain of "presence" (see Fig. 1.1), the behavioral and phenomenal sense of fidelity in perception and action that may occasionally accompany and characterize individuals' experience in VEs and AEs.

Each of these three descriptors denotes what are generally assumed to be positive aspects of cognitive and perceptual experience not commonly associated with the use of complex technologies. Each refers to an essentially subjective, and in many cases perhaps even subconscious, phenomenon that may in some instances positively impact human performance in the use of otherwise complex systems. [Footnote 5: The validity of this assumption is an issue that has yet to be satisfactorily addressed within the VE and AE communities and bears a very strong resemblance to similar, earlier controversies involved in determining the utility of "realism" for the training of complex skills in simulators (e.g., Allen, Hays, & Buffardi, 1986; Carter & Laughery, 1985). One often hears, particularly in the medical VE domain, that "realism" is critical to its success, particularly as a training medium. However, to our knowledge no empirical justification of this important (and costly) assumption has ever been offered.] In nearly all instances of contemporary user-centered design of human–machine technologies, the attainment of one or more of these attributes is given particularly high priority, even if it is not always clear how to objectively define them, how to empirically assess the degree to which they are present or absent within a given system, and what benefit, if any, they might bestow on its use. Given the importance
of these concepts (which frequently take the form of design goals and/or success criteria) for VE and AE systems, it seems appropriate to explore their meaning in somewhat more depth.

Intuitive Interfaces

The quest to develop human–machine systems that are easier to learn, use, and maintain has been perhaps the most dominant theme of human factors engineering since its inception. Originally an almost exclusive concern of the armed forces, in recent years the concept of "usability" has also become a concern of product designers and manufacturers in domains as diverse as personal computing, telecommunications, medical technology, and home appliances. In all cases, the basic theme is the same—in order to enhance the probability of safe, easy, and effective use of technical systems, the design of the human–machine interface must take appropriate account of the normal perceptual, cognitive, and physical capabilities and limitations of the targeted user population.

One recurring aspect of this design philosophy has been the attempt to produce "intuitive" interfaces. An intuitive interface can be defined as an ensemble of controls and displays whose design enables users to quickly and accurately apprehend the rules governing its use while also enabling them to perceive and understand the effects of their actions when using it. Additionally, an intuitive interface should afford a clear understanding of the nature and potential consequences of errors when they are committed while also clearly specifying the actions that need to be taken to correct those errors.

Therefore, in many respects, the notion of the intuitive interface is one of the most fundamental concepts within human factors engineering, and its constraints apply as directly to VE and AE systems as they do to any other human–machine system. What sets VE and AE technology somewhat apart, however, is that they are conceived as means for accomplishing the design of intuitive human–machine systems across a wide variety of applications. For example, by immersing users within a synthetic environment that provides directly perceivable and readily comprehensible representations of processes occurring within a complex system, VE designers hope to enable users to attain a more effective understanding of how to successfully interact with it. Similarly, by devising an AE algorithm that enables a computer-based system to automatically factor the nature of the user's dynamically changing state and situation into its operation, designers hope to develop human–machine technologies whose ability to adapt to the needs of the user in real time will significantly simplify their operation.

Donald Norman, in discussing elements of good and bad design (from a human usability perspective), provides a number of criteria for intuitive designs, including providing users with the capability to clearly determine the relationships between: (1) their behavioral intentions and the possible actions that can be taken with the system to fulfill those intentions, (2) their actions and subsequent effects on the
system, and (3) the current and projected status of the system as a function of the needs and intentions of the user (Norman, 1988, p. 199). In general, the concept of the intuitive interface goes well beyond the simple notion of "user friendliness" to include the broader idea of effectively incorporating elements of design that promote a functional synergy between users' perceptual and cognitive capabilities and strategies and rules for system operation. Simply put, when using an "intuitive" system, the user can clearly perceive what needs to be done to fulfill his intentions, can maintain an accurate awareness of the operation of the system with relative ease, and can quickly and accurately diagnose and solve problems that might arise in its use.

The many problems associated with the use of nonintuitive human–machine systems range from relatively simple, annoying inconvenience to deadly peril, and have been well documented over the years (e.g., Casey, 1993; Perrow, 1984; Reason, 1997). Simply put, human–machine systems whose design fails to take into account the inherent perceptual and cognitive capabilities and limitations of the intended user population can occasionally place those users at substantial risk of committing errors in their use. A major assumption underlying the drive to develop VE and AE systems is that they will help to achieve more intuitive designs, the former through more effective representation of information and execution of control activities, and the latter by enabling computer-based systems to more effectively anticipate the changing needs of the user over time. The degree to which they are currently able to do so, and their continued progress in this regard, is still very much an open question of considerable importance and forms the subject matter of much of this book.

Transparent Interfaces

There are many methods that can be used in the attempt to produce intuitive interfaces. For instance, designers commonly make use of time-honored techniques and tools such as color coding, functional grouping of related displays, and other similar methods. These approaches are based on a design metaphor that envisions the human–machine interface as what might best be described as a working surface that exists between the user and the system being operated on. This working surface consists of information about the system and control means for exerting influence over its activities. When designing systems with this metaphor in mind we hope to make that surface as functional as possible by factoring in what we know about how humans normally think about and perceive the world, solve problems, and make decisions. We also strive to take into account what we know about the errors that humans are prone to commit and the conditions that promote them, and we attempt to design those types of features out of the system. This is a design metaphor that has existed for decades, and much work is still being done to develop and generalize its rules and applications.
However, a new design metaphor underlies the means by which VEs and AEs seek to achieve high levels of intuitive functionality—the "transparent interface" metaphor. The goal of this approach to design is to produce human–machine interfaces whose appearance and operation are so natural and undemanding in terms of the ongoing expenditure of the user's perceptual and cognitive energy, that the phenomenal and behavioral consequence is almost as if there were no interface present at all. In other words, the interface is functionally transparent in terms of the level of effort required to operate it—attention can instead be solely devoted to the ongoing nature of the underlying processes themselves, rather than being consumed by the intermediary operations required to monitor and control them. Clearly, the immersive sensorimotor characteristics of VEs and the enhanced human–computer communication capabilities of AEs that enable systems to respond to the real-time needs of the user provide a good set of raw materials with which to begin the approach toward such an idealistic design.

A transparent interface, therefore, is a set of displays and controls whose operation and use is so natural and effortless (in terms of the expenditure of perceptual and cognitive effort) that one might almost forget that one is interacting with a set of displays and controls at all. In other words, the use of the system is so natural and intuitive that it is almost as if the interface was not even there, the user having achieved a high-level, functional connection with the real object of attention—the underlying process being monitored and controlled. In most current cases, transparency is less a function of good design and more a function of (1) the level of a user's familiarity with the technology and (2) the level of their involvement with the task they are currently performing. Nearly everyone at some point in their working life has experienced the sense of being so caught up and involved in performing a given task that one's phenomenal awareness of the interfaces mediating that experience is, for the time being, nonexistent. One of the principal goals of VE and AE technology is to promote the ease with which such a functional state can be achieved and maintained.

Immersive Interfaces

VE and AE systems each provide a means of promoting the development of intuitive and transparent interfaces. However, each attempts to achieve this goal by implementing different technical solutions. VE systems rely heavily on sensorimotor immersion to accomplish this end. In other words, in many applications the user is literally surrounded by computer-generated multisensory imagery. For example, in a "fully immersed" virtual environment, the vast majority of a user's sensory inputs are computer-generated; that is, essentially all of their visual and auditory inputs are synthetic in nature. [Footnote 6: Even in a fully immersed VE, however, users still have many nonvirtual sources of sensory information available to them. For instance, it is possible to attend to vestibular, tactile, haptic, and other sources of information specifying the real-world context within which the virtual world exists. However, it is also possible for sensory information derived from the VE to strongly influence and interact with that derived from the real world, as when visually specified motion in a VE causes observers to experience strong postural disturbances (e.g., Stoffregen, Hettinger, Haas, Roe, & Smart, 2000).] Immersion, for VE technology, is the principal
technical means by which designers attempt to produce intuitive and transparent human–machine interface concepts.

The term immersion is not nearly as often applied to the AE domain, but in our opinion it is just as relevant a concept. However, with AE technology, "immersion" does not refer to the compelling sensorimotor qualities of VE systems that seek to surround the user in a synthetic (yet behaviorally and perhaps even phenomenally realistic) environment. Rather, it refers to immersing the user in a more direct, closed-loop interaction with the computer-based system, one in which the computer is equipped to detect meaningful changes in the user's condition. With AE systems, the type of immersion that may occur is more cognitive in nature, as opposed to the perceptual–motor immersion of VE systems. A well-designed AE system should produce a sense in the user of being directly immersed in an ongoing "conversation" of sorts with the computer—one in which the computer-based system is able to more effectively "listen" and attend to the real-time needs of the user. This is accomplished by means of implementing a closed-loop connectivity between the user and computer-based system in which the latter detects and processes situationally relevant information from the user, and employs it to effect adaptive changes in the nature of the human–machine interface (see Fig. 1.2).

[Fig. 1.2. Schematic overview of basic adaptive interface architecture: a closed loop linking the human user, signal extraction and processing, displays and controls, and the situational environment.]
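To make the closed-loop idea in Fig. 1.2 concrete, the short Python sketch below walks through a single cycle of a hypothetical adaptive interface: user signals are sampled, an inference about the user's state is drawn, and the interface configuration is adjusted. It is only an illustrative sketch of the architecture described above; the particular signals (heart rate, blink rate, task errors), the thresholds, and the adaptation rules are assumptions made for the example, not a description of any system discussed in this volume.

# Minimal sketch of one cycle through a closed-loop adaptive interface.
# Signal names, thresholds, and adaptation rules are illustrative assumptions only.
from dataclasses import dataclass


@dataclass
class UserState:
    workload: float    # 0.0 (low) to 1.0 (high)
    engagement: float  # 0.0 (disengaged) to 1.0 (fully engaged)


def infer_user_state(heart_rate: float, blink_rate: float, task_errors: int) -> UserState:
    """Draw a crude, illustrative inference about the user's current condition."""
    workload = min(1.0, max(0.0, (heart_rate - 60.0) / 60.0 + 0.1 * task_errors))
    engagement = min(1.0, max(0.0, 1.0 - blink_rate / 40.0))
    return UserState(workload=workload, engagement=engagement)


def adapt_interface(state: UserState) -> dict:
    """Map the inferred state onto hypothetical display and automation settings."""
    if state.workload > 0.7:
        # High workload: declutter the display and hand low-priority tasks to automation.
        return {"display_detail": "low", "automation": "high", "alerts": "auditory"}
    if state.engagement < 0.3:
        # Low engagement: raise display salience to recapture attention.
        return {"display_detail": "medium", "automation": "low", "alerts": "multisensory"}
    # Nominal conditions: leave the interface in its default configuration.
    return {"display_detail": "high", "automation": "low", "alerts": "visual"}


# One pass through the loop with made-up sensor readings.
state = infer_user_state(heart_rate=95.0, blink_rate=12.0, task_errors=3)
print(adapt_interface(state))

In a fielded AE, of course, the inference step would rest on validated psychophysiological and behavioral measures, and the adaptation rules on empirically derived design guidance of the kind discussed throughout this book.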
Immersion, at least in the case of VE systems, may not be a universally positive feature. There is strong reason to believe that the immersive characteristics of VEs may be among the very features that lead to problems associated with postadaptation effects and the many motion sickness-like signs and symptoms that can frequently occur among users (e.g., Hettinger & Riccio, 1992; Kennedy et al., this volume; Welch, this volume). Therefore, any benefits of VE immersion may occasionally be offset by such negative side effects.

Presence

The nature, definition, psychometric status, and utility of the concept of presence in virtual environments have been among the most durable points of contention in the VE domain since its inception. The first issue of the most prestigious technical journal in the area, Presence: Teleoperators and Virtual Environments, featured a forum entitled "The Concept of Telepresence" (e.g., Held & Durlach, 1992; Zeltzer, 1992), and the controversy has hardly died down since. We propose that presence is comprised of the behavioral and phenomenal sense of fidelity in perception and action that may occasionally accompany and characterize individuals' experience in VEs and AEs. It is most importantly manifest as functional and effective human performance, and secondarily as a phenomenal sense of reality or suspension of disbelief that characterizes the operator's ontological relationship to the technical system with which he is interacting. We further propose that presence exists at the intersection of the human–machine interface design attributes described above, as illustrated in Fig. 1.1.

Slater and Steed (2000) provide a valuable recent overview of the concept of presence, as well as efforts to lend some degree of precision to its definition and empirical reliability to its assessment. Similarly, Sadowski and Stanney (2002) have recently summarized the state of current thinking and empirical research related to this concept. The reader is referred to these sources for excellent discussions of the many issues and disputes that have been raised in this area within the past decade.

Despite its central importance as a topic of debate and investigation in the research and development community, it is still far from clear whether or not presence is a necessary or even important attribute of an effective VE or AE human–machine system. Ultimately, the importance of presence (and the technical features that underlie it) depends on the extent to which it provides a significant advantage in the ability of users to achieve their behavioral goals in the use of a system. In other words, presence is ultimately simply another attribute of these systems that may or may not support the larger, more important goals of whether or not the system can accomplish its end goals with respect to the user. The fundamental questions involved in the design of these systems should not be how "realistic" it is or how much of a sense of "presence" it imparts. Rather, the important questions should be related to the degree to which the system achieves its task-specific goals (e.g., How well does this device train doctors to perform heart surgery? How well does this system enable a user to functionally control a physically distant mechanical system?). In our opinion, presence is only meaningful from a design standpoint if
it substantially contributes to the more important behavioral goals of the system. As a design end in itself, it could in many cases prove to be a seductive waste of time and resources.
THE ROLE OF PSYCHOLOGICAL SCIENCE IN THE DESIGN AND USE OF VIRTUAL AND ADAPTIVE ENVIRONMENTS

Psychology and its related disciplines (e.g., neuroscience and human factors engineering) have a key role to play in the development of virtual and adaptive environment technology. These systems are intended to significantly augment and support human behavior, and as such it is logical that their design should be based on a firm understanding of the factors that drive and influence targeted behaviors at the individual and group levels. Virtual and adaptive environments, perhaps more than any other class of emerging technologies, rely directly on the development of an expanding and advanced database of knowledge regarding human perception, cognition, and behavior to support their development if they are to exert a meaningful, positive impact on society.

The large number of "applied" psychological questions involved in the development of these technologies is stimulating many interesting research and development programs (examples of which are described throughout this book) whose investigations are not only providing important technical guidance for system development, but which are also making significant contributions to the scientific knowledge base of factors underlying human behavior. However, the appearance of these technologies is also clearly supporting the emergence of a more behaviorally complex and ecologically valid laboratory-based research paradigm within the psychological sciences.

Therefore, the advent of VE and AE technology is stimulating growth in the psychological sciences in at least two ways. First, psychologists and human factors engineers involved in the development of these technologies are being confronted with real-world engineering and design questions (e.g., What do people pay attention to in complex driving tasks, and what do I need to include in a virtual world to effectively support that behavior? Why do people sometimes become sick and disoriented in VE systems?) that often cannot be efficiently addressed (if indeed they can be at all) by means of traditional theories of human perception, cognition, and performance and their frequently reductionist research methods. Therefore, researchers are being forced to look at these questions in new and innovative ways, and in so doing they are making significant contributions to psychological science. Second, a significant number of psychologists, many of them strongly influenced by the ecological theory of James Gibson (e.g., Gibson, 1966, 1979), have for several decades been arguing for greater ecological validity in psychological
research. In other words, these researchers have been arguing for some time that a truly useful science of psychology must be able to account for real-world aspects of human behavior and not simply the often artifactual types of behaviors characteristic of the psychological laboratory. The emergence of VEs and AEs greatly expands the possibilities now open to these researchers to conduct research with high ecological validity while still maintaining the type of experimental control needed to produce reliable scientific results.

As previously noted, there is a vital reciprocity between the technologies discussed in this book and psychological science. On the one hand, the development of the technologies themselves will rely heavily on input from psychological science. The engineering and design realities of the situation demand that researchers provide answers at a more functional, real-world level of complexity than they are perhaps accustomed to doing. These real-world questions are stimulating tremendous growth in the methods and knowledge base of psychology. Meanwhile, the technologies themselves afford the opportunity for researchers to examine behavioral issues in a more ecologically (and technologically) relevant way than has been possible to date. Indeed, the emergence of these technologies will enable psychological researchers to conduct research that can be expected to greatly advance our knowledge of many issues that relate to human performance in realistic and complex environments. This is primarily due to the fact that VEs and AEs can both be configured to enable experimental subjects to perform tasks with a high level of ecological validity, but also under the types of controlled conditions that will enable researchers to draw inferences about the impact of variables of interest.

As Fig. 1.3 illustrates, VEs and AEs will, in our opinion, promote the production of advanced knowledge concerning human perception, cognition, and behavior as (1) a direct product of pure psychological research performed with these technologies as experimental media, and (2) as a by-product of research and development efforts primarily geared toward the advancement of the technologies themselves. [Footnote 7: This is analogous to the growth of basic knowledge concerning the effects of time delay on human performance that arose as the result of very applied concerns with the deleterious effects it might have on human performance in flight simulators.]

The results of the types of research efforts described above will clearly feed directly into the design of VE and AE systems. However, a different class of problems faced by system designers involves how to design systems "from the ground up" in such a way that the knowledge generated by these empirical investigations is taken into consideration while at the same time effectively accounting for the unique performance requirements of a particular system. Toward this end, human factors engineers have developed a variety of techniques that can be described under the general heading of user-centered design. We now turn to this area as a means of suggesting mechanisms for effectively incorporating knowledge (and empirical techniques) from psychological science into the design and testing of prototype VE and AE systems.
[Fig. 1.3. An illustration of the multifaceted, reciprocal relationships between applied and fundamental psychological research and virtual and adaptive environments. The figure links applied systems research and development, virtual and adaptive environment system development, fundamental research on human perception, cognition, and behavior, and enhanced knowledge of human perception, cognition, and behavior in complex environments.]
User-Centered Design

To the extent that psychology concerns itself directly with the development of novel technical systems, it does so by contributing knowledge about the human user. Ideally, this knowledge will result in the production of systems that "work better" because they are better accommodated to the perceptual, cognitive, and motor requirements of the user and her task. Although there are as many unique perspectives on this design philosophy as there are designers and design problems, to the extent that they focus first and foremost on the requirements of the user they are commonly referred to as user-centered design techniques.

Elements of user-centered design include any of a number of techniques that are intended to help system developers design systems that take into account the capabilities and limitations of the user population. In some cases the greatest emphasis may be placed on the user's comfort and sense of satisfaction (e.g., using a personal computer purely for entertainment purposes), and in other cases the emphasis may be on optimizing critical aspects of human performance, such as speed and accuracy of task performance (e.g., performing air combat tasks in a fighter cockpit). Physical and psychological (cognitive and perceptual) criteria are both taken into account.

The user-centered design process places critical importance on the role of the user throughout the design process. In work that we have performed to develop VE and AE concepts for use in future fighter cockpits, experienced fighter pilots
have always served as the most important members of our multidisciplinary design teams (Brickman et al., 1998). When designing new and complex human–machine systems, the role of the user is vital as a source of information about the real-world constraints of performing specific tasks that the emerging technology is intended to support. Users possess vital knowledge about the limitations of current human–machine systems, the aspects of task performance which if unsatisfactorily supported could result in the most serious consequences, factors that differentiate skilled from unskilled performance and the role played by human–machine systems in both, and a host of other information that is central to successful design efforts.

A key aspect of the user-centered design approach involves the frequent conduct of human performance testing as a central element of the design process. A key assumption of this design philosophy is that the criteria determining the success or failure of novel human–machine systems should be based on human performance metrics as opposed to engineering metrics. Further, the periodic assessment of human performance during the design process, particularly with novel human–machine concepts such as VE and AE technology, is considered to be vital as a means of assuring the efficient management of the development process. Fundamentally, the user-centered design philosophy is an essential "risk reduction" technique, guarding against the possibility that significant research and development efforts might result in an end product that is of little or no value to the end user.

In summary, we have tried to illustrate in this chapter some of the ways in which psychological science is vital to the development of VE and AE technologies. At the same time, we have attempted to demonstrate the relevance of these emerging systems as exciting new experimental media for the generation of more purely fundamental types of knowledge concerning human perception, cognition, and performance. Each of the remaining 22 contributions to this volume provides fascinating illustrations of these general points, and in so doing provides a broad sampling of the psychological issues involved in the design and use of virtual and adaptive environments.
OVERVIEW OF THE BOOK

This book has three major sections. The first, "General Issues in the Design and Use of Virtual and Adaptive Environments," is devoted to discussions of general theoretic and human performance issues that relate to the development of systems of this type:
- Jack Loomis and Joshua Knapp discuss issues involved in the perception of egocentric distance (the distance from an observer to a target) and with the more general issue of perception of scale in virtual environments. They describe the challenges of performing such research using VE systems and compare studies performed in the real world with those performed in the virtual world.
- Jerrold Prothero and Donald Parker discuss the utility of the "rest frame hypothesis" notion as a means of understanding causal elements of motion sickness–like symptoms in VEs, as well as general consideration of spatial perception in real and virtual environments.
- Marc Sebrechts, Corinna Lathan, Deborah Clawson, Michael Miller, and Cheryl Trepagnier discuss a variety of human performance issues involved in the utility of VEs as a training medium, with particular concentration on issues related to transfer of skills learned in the virtual world to the real world. An issue of particular relevance in their discussion involves the role played (or not played) by perceptual fidelity in promoting a system's training objectives.
- Pieter Jan Stappers, William Gaver, and Kees Overbeeke also examine questions regarding the relevance of traditional notions of perceptual realism and fidelity in VE systems design. They provide a Gibsonian perspective to guide effective analysis and identification of information requirements that may ultimately prove to be a far more efficient (and cost-effective) means of designing virtual environments than simply attempting to mimic reality in all its many details.
- Thomas Stoffregen, Benoit Bardy, L. J. Smart, and Randy Pagulayan examine several basic issues involved in simulation, including VEs. Their primary interest is examining issues related to the evaluation of fidelity in simulators and VEs, particularly with regard to the nature of presence.
- Finally, Robert Welch describes and discusses the advantages of a number of approaches to aiding users as they attempt to adapt to the rearranged perceptual–motor characteristics of "telesystems" (i.e., VEs, augmented reality systems, and teleoperator systems). The intent is to enable users to overcome the deleterious perceptual–motor side effects that often accompany the use of these systems.
The second major section of the book primarily concentrates on issues involved solely in the design and use of VE systems:
- Paul Bach-y-Rita, Kurt Kaczmarek, and Mitchell Tyler describe an innovative approach to information display that makes use of the tongue as its primary interface with the human body. The relevance of such a display technology for supporting the behavior of blind individuals is described, along with general issues involved in the "sensory substitution" domain.
- Robert Bolia and Todd Nelson discuss issues involved in the use of spatial audio displays for the support of target acquisition performance, as well as the facilitation of communication.
- Simon Goss and Adrian Pearce explore the topic of how individuals learn action plans in virtual environments. Their interesting contribution describes a combination of adaptive and virtual environment applications in which the exploratory behavior of an individual within a virtual environment is monitored and compared to models of possible behaviors and activities.
- Ian Howard discusses the area of disparity-based stereopsis as visual information for depth in VE displays, a topic of considerable theoretical importance in terms of questions surrounding the foundations of visual depth perception, as well as of practical importance to the effective design of visual displays that can successfully support depth perception in virtual worlds.
- Robert S. Kennedy, Julie M. Drexler, Daniel E. Compton, Kay M. Stanney, D. Susan Lanham, and Deborah L. Harm have been engaged in the examination of simulator sickness and sickness in virtual environments for many years. Their chapter examines the similarities and differences between these two types of motion sickness phenomena and suggests diagnostic means for assessing the tendencies of individual systems to create such problems in their users.
- Max Mulder, Henk Stassen, and J. A. Mulder present an empirical analysis of an innovative use of an augmented reality concept referred to as a tunnel-in-the-sky display. The goal of this display is to provide 4-D information to aircraft pilots in as intuitive and natural a format as possible.
- W. Todd Nelson, Timothy R. Anderson, and Grant R. McMillan discuss the area of alternative control technologies. Briefly described above, the use of novel control technologies such as EEG-based control and gesture-based control is of great intrinsic interest in itself. However, its combination with VE technologies affords very interesting possibilities as a means of greatly expanding the functionality of these systems.
- Richard Satava and Shaun Jones provide an overview of one of the most important areas of applied development of VE systems—the medical domain. Emerging medical applications of VE technology range from procedural training and certification to surgical rehearsal systems and image-guided surgery techniques.
- Nadia Magnenat Thalmann, Prem Kalra, and Marc Escher describe their work in the development of techniques to facilitate face-to-face communications in virtual environments. The authors describe a unique technical approach to the solution of some of the many complex issues involved in supporting functional communications among individuals within VE settings.
- Eric Theunissen presents a discussion of human factors issues involved in the design of egocentric spatial navigation displays. Specifically, he concentrates on the representational aspects of these displays, the factors that should influence the choice of a frame of reference, and a method to relate the magnitude of task-related visual cues to a set of design parameters.
- Fred Voorhorst, Kees Overbeeke, and Gerda Smets describe an important area of research whose goal is to enhance the performance of laparoscopic surgery, a medical technique whose importance and frequency of use has significantly increased in recent years. Specifically, the authors examine ways of enhancing the functional utility of laparoscopic technology (and, by extension, other VE-related medical technologies) by taking into account normal human perception–action relationships.
- Takami Yamaguchi describes a variety of technical and human factors issues related to the design of a virtual medical clinic known as the Hyper Hospital. Using a variety of behavioral and psychophysiological measures of users' response to prototype systems, he illustrates how a functional design can be achieved through systematic testing of user response.
The final section of the book deals with issues more exclusively related to the design and use of adaptive environments:
- John Flach and Cynthia Dominguez introduce a "meaning processing" framework whose intent is to support humans as adaptive controllers of complex systems. In addition, the principles of ecological interface design are discussed as a means of adequately making available critical properties of specific, dynamic work domains.
- Sylvain Hourlier, Jean-Yves Grau, and Claude Valot discuss research issues involved in the design and use of adaptive pilot–vehicle interfaces for use in fighter aircraft. Specifically, they address a number of design issues and approaches from the perspective of a theory of "situated cognition" in which the focus of study is the human user as observed within the constraints of the particular situation in which he or she must overcome largely unpredictable problems and obstacles to reach a particular behavioral goal.
- Sandeep S. Mulgund, Gerard Rinkus, and Greg Zacharias also present a discussion of adaptive pilot–vehicle interface concepts for use in fighter aircraft. They describe a research project examining the utility of fuzzy networks and belief logic as a means of effectively handling the large sets of rapidly changing data that will necessarily characterize the use of adaptive interfaces in environments as dynamic as air combat.
- Alysia Sagi-Dolev provides a very useful overview of concepts and techniques used in obtaining psychophysiological signals from human users of adaptive environment systems. As the author notes, one of the key challenges in developing useful AE concepts is developing the ability to obtain valid and reliable physiological indices of user state in as unobtrusive a manner as possible.
REFERENCES

Allen, J. A., Hays, R. T., & Buffardi, L. C. (2001). Maintenance training simulator fidelity and individual differences in transfer of training. In R. W. Sweezy & D. H. Andrews (Eds.), Readings in training and simulation: A 30-year perspective (pp. 272–284). Santa Monica, CA: Human Factors and Ergonomics Society.
Bennett, K. B., Cress, J. D., Hettinger, L. J., Stautberg, D., & Haas, M. W. (2001). A theoretical analysis and preliminary investigation of dynamically adaptive interfaces. The International Journal of Aviation Psychology, 11, 169–196.
Brickman, B. J., Hettinger, L. J., Haas, M. W., & Dennis, L. B. (1998). Designing the supercockpit: Tactical aviation and human factors for the 21st century. Ergonomics in Design, 6, 15–20.
Carter, R. J., & Laughery, K. R. (1985). Nuclear power plant training simulator fidelity assessment. In Proceedings of the Human Factors and Ergonomics Society 29th Annual Meeting (pp. 1017–1021). Norfolk, VA: Human Factors and Ergonomics Society. Also reprinted in R. W. Sweezy & D. H. Andrews (Eds.). (2001). Readings in training and simulation: A 30-year perspective. Santa Monica, CA: Human Factors and Ergonomics Society.
Casey, S. M. (1993). Set phasers on stun and other true tales of design, technology, and human error. Santa Barbara, CA: Aegean.
Durlach, N. I., & Mavor, A. S. (1994). Virtual reality: Scientific and technical challenges. Washington, DC: National Academy Press.
Foxlin, E. (2002). Motion tracking requirements and technologies. In K. M. Stanney (Ed.), Handbook of virtual environments: Design, implementation, and applications. Mahwah, NJ: Lawrence Erlbaum Associates.
Gibson, J. J. (1966). The senses considered as perceptual systems. Boston: Houghton Mifflin.
Gibson, J. J. (1979). The ecological approach to visual perception. Boston: Houghton Mifflin.
Haas, M. W., & Hettinger, L. J. (1993). Applying virtual reality technology to cockpits of future fighter aircraft. Virtual Reality Systems, 1(2), 18–26.
Haas, M. W., & Hettinger, L. J. (2001). Current research in adaptive interfaces (Preface to special issue). The International Journal of Aviation Psychology, 11, 119–121.
Held, R. M., & Durlach, N. I. (1992). Telepresence. Presence: Teleoperators and Virtual Environments, 1, 109–112.
Hettinger, L. J., Branco, P., Encarnacao, M., & Bonato, P. (in press). Neuroadaptive technologies: Applying neuroergonomics to the design of advanced interfaces. Theoretical Issues in Ergonomics.
Hettinger, L. J., & Haas, M. W. (2000). Current research in advanced cockpit display concepts (Preface to special issue). The International Journal of Aviation Psychology, 10, 227–229.
Hettinger, L. J., & Riccio, G. E. (1992). Visually induced motion sickness in virtual environments. Presence, 1, 306–310.
Jones, M. B., & Kennedy, R. S. (1996). Isoperformance curves in applied psychology. Human Factors, 38, 167–182.
Kuhlen, T., Kraiss, K., & Steffan, R. (2000). How VR-based reach-to-grasp experiments can help to understand movement organization within the human brain. Presence: Teleoperators and Virtual Environments, 9, 350–359.
Negroponte, N. (1996). Being digital. New York: Knopf.
Nelson, W. T., Anderson, T. R., & McMillan, G. R. (2002). Alternative control technology for uninhabited air vehicles: Human factors considerations. In Hettinger & Haas (current volume).
Nelson, W. T., Hettinger, L. J., Cunningham, J. A., Roe, M. M., Haas, M. W., & Dennis, L. B. (1997). Navigating through virtual flight environments using brain-body-actuated control. In Proceedings of the 1997 Virtual Reality Annual International Symposium (pp. 30–37). Albuquerque, NM: IEEE Computer Society Press.
Norman, D. A. (1988). The psychology of everyday things. New York: Basic Books.
Perrow, C. (1984). Normal accidents: Living with high-risk technologies. New York: Basic Books.
Picard, R. W. (1997). Affective computing. Cambridge, MA: MIT Press.
Prothero, J., & Parker, D. (2002). A unified approach to presence and motion sickness. In Hettinger & Haas (this volume).
Reason, J. (1997). Managing the risks of organizational accidents. Aldershot, England: Ashgate.
Sadowski, W., & Stanney, K. M. (2002). Presence in virtual environments. In K. M. Stanney (Ed.), Handbook of virtual environments: Design, implementation, and applications. Mahwah, NJ: Lawrence Erlbaum Associates.
Scallen, S. F., & Hancock, P. A. (2001). Implementing adaptive function allocation. International Journal of Aviation Psychology, 11, 197–221.
Scerbo, M. W., Freeman, F. F., Mikulka, P. J., Parasuraman, R., Nocera, F., & Prinzel, L. J. (2001). The efficacy of psychophysiological measures for implementing adaptive technology (Tech. Rep. No. NASA TP-2001-211018). Hampton, VA: NASA Langley Research Center.
Slater, M., & Steed, A. (2000). A virtual presence counter. Presence: Teleoperators and Virtual Environments, 9, 413–434.
Stanney, K. M. (Ed.). (2002). Handbook of virtual environments: Design, implementation, and applications. Mahwah, NJ: Lawrence Erlbaum Associates.
Stoffregen, T. A., Hettinger, L. J., Haas, M. W., Roe, M. M., & Smart, L. J. (2000). Postural instability and motion sickness in a fixed-base flight simulator. Human Factors, 42, 458–469.
Turk, M. (2002). Gesture recognition. In K. M. Stanney (Ed.), Handbook of virtual environments: Design, implementation, and applications (pp. 223–238). Mahwah, NJ: Lawrence Erlbaum Associates.
Wilder, J., Hung, G. K., Tremaine, M. M., & Kaur, M. (2002). Eye tracking in virtual environments. In K. M. Stanney (Ed.), Handbook of virtual environments: Design, implementation, and applications (pp. 211–222). Mahwah, NJ: Lawrence Erlbaum Associates.
Zeltzer, D. (1992). Autonomy, interaction, and presence. Presence: Teleoperators and Virtual Environments, 1, 127–132.
PART I: General Issues in the Design and Use of Virtual and Adaptive Environments
2

Visual Perception of Egocentric Distance in Real and Virtual Environments

Jack M. Loomis* and Joshua M. Knapp
Department of Psychology, University of California, Santa Barbara
Even after decades of research, visual space perception remains the subject of active investigation, indicating that it is indeed a challenging problem. Even at a functional level, we are still far from fully understanding some fundamental issues, such as the mapping between physical and visual space (visually perceived space), the connection between visual space and action, and which aspects of visual stimulation are most important in determining the structure of visual space. In the absence of a functional-level understanding, it is hardly surprising that our understanding of the underlying physiological mechanisms lags further behind. Nowhere is our lack of understanding more apparent than when one attempts to synthesize realistic virtual environments using computer graphics; most challenging in this regard is making large-scale vistas and structures appear as immense as their real-world counterparts.

* Correspondence regarding this chapter should be addressed to Jack M. Loomis, Department of Psychology, University of California, Santa Barbara, CA 93106. E-mail:
[email protected]
The primary interest in visual space perception, at least through most of its history, has been with its phenomenology—why the visual world appears the way it does. Within this context, visual space is the visually based component of the phenomenal world, which is the totality of conscious experience (Brain, 1951; Koffka, 1935; Loomis, 1992; Russell, 1948; Smythies, 1994). Research on the structure of visual space and the processes underlying it has been pursued both for its own sake and out of the belief that visual space is a primary determinant of spatial behavior. The most sustained and concerted effort to understand the phenomenology of visual space can be found in the research of Walter Gogel, and the broadest exposition of this understanding can be found in two of his more recent works (Gogel, 1990, 1993). In the last couple of decades, two lines of research have begun to challenge the assumption that consciously experienced visual perception is causally linked to action. The first of these is research within the ecological tradition, initiated by James Gibson (1958, 1966, 1979) and subsequently pursued by many others (e.g., Flach & Warren, 1995; Lee, 1980, 1993; Turvey, 1977; Turvey & Carello, 1986; Turvey & Remez, 1979; Warren, 1990; Warren & Wertheim, 1990). A widespread assumption within this tradition is that it is possible to understand spatial behavior in terms of control by very specific aspects of visual stimulation (e.g., the pattern of global radial outflow associated with observer translation) without the need to posit any mediating internal representation, such as visual space. The second of these is research showing that the processes underlying certain forms of action appear to be distinct from those processes giving rise to consciously experienced visual space (e.g., Bhalla & Proffitt, 1999; Bridgeman & Huemer, 1998; Bridgeman, Kirch, & Sperling, 1981; Creem & Proffitt, 1998; Goodale & Humphrey, 1998; Milner & Goodale, 1995; Proffitt, Bhalla, Gossweiler, & Midgett, 1995; Weiskrantz, 1986, 1990). Although these two lines of research indicate that a consciously experienced 3-D representation of nearby objects may not be necessary for the control of certain types of action, other research indicates just the opposite (e.g., Philbeck & Loomis, 1997; Philbeck, Loomis, & Beall, 1997). If future research reveals that much of spatial behavior is not controlled by conscious visual perception, then the rationale for investigating the phenomenology of visual space will be substantially undermined. Without prejudging the outcome, however, we are confident that phenomenology will continue to remain an important topic, if only for its intrinsic interest. Moreover, developers of virtual environments will wish for a better understanding of the topic, for the perceptual realism of virtual environments is of great importance to users in a wide variety of applications involving entertainment, architecture, aesthetics, social interaction, and more. Accordingly, much of the rest of this chapter is concerned with consciously experienced visual perception, with an emphasis on distances beyond the 2-m limit of “personal space” (Cutting & Vishton, 1995).
PERCEPTUAL VARIABLES AND THEIR COUPLINGS

An important perceptual variable in any theory of visual space is that of perceived location, which in turn comprises the perceptual variables of perceived direction and perceived egocentric distance, both of which are defined with the observer as origin. Another variable, perceived exocentric distance, usually refers to the perceived separation of two points along a common visual direction (Foley, 1980; Gogel, 1977, 1990) but can be generalized to refer to the perceived separation between any two points in visual space. There is evidence indicating that perceived separation is determined by more than just the perceived locations of the points defining the exocentric interval (e.g., Foley, Ribeiro-Filho, & Da Silva, 2001; Gogel, 1977; Levin & Haber, 1993; Loomis, Da Silva, Philbeck, & Fukusima, 1996). Other important perceptual variables are perceived size, perceived shape, perceived object motion, and perceived displacements of the observer. Gogel’s “theory of phenomenal geometry” takes perceived direction, perceived egocentric distance, and perceived displacements of the observer as its primitives, with these other variables being derivative (Gogel, 1990, 1993).

Throughout much of the history of visual space perception, it has been recognized that certain perceptual variables covary with one another (Epstein, 1982; Gogel, 1984; Sedgwick, 1986). This coupling may sometimes be the result of joint determination by common stimulus variables that naturally covary, but there is evidence that variation in one perceptual variable can directly affect the value of another perceptual variable (Epstein, 1982; Gogel, 1990, 1993; Oyama, 1977) even in the absence of any stimulus change (e.g., in objects that sometimes undergo spontaneous depth reversals; Turner & Braunstein, 1995). Similarly, it is likely that this coupling occurs in the total absence of visual stimulation, as in visual hallucinations (Zubek, Pushkar, Sansom, & Gowing, 1961). The best-known coupling is that between perceived size and perceived egocentric distance and is referred to as the size–distance invariance hypothesis (Gilinsky, 1951; Kilpatrick & Ittelson, 1953; McCready, 1985; Sedgwick, 1986). As depicted in Fig. 2.1a, size–distance invariance is the relationship between perceived size (S’) and perceived egocentric distance (D’) for a given visual stimulus of angular size α, as expressed by this equation:

S’ = 2D’ tan(α/2)    (1)
(In generalized versions of size–distance invariance, the constant 2 in Equation 1 is replaced by an observer constant, and the physical value of α is replaced by a perceptual value, α’.) Another coupling of perceptual variables occurs when either a stimulus object or the observer translates. In this case, perceived displacement of the target
FIG. 2.1. (a) The size–distance invariance hypothesis. A visual stimulus of angular size α, when perceived at a distance D’, appears to have a size S’. (b) The apparent distance–pivot distance hypothesis. A stationary point stimulus at pivot distance Dp will appear to move as the head moves if its perceived distance D’ is different from its pivot distance. Here, the target is depicted as appearing at a distance D’, which is twice Dp . Assuming that the perceived displacement (K’) of the head is equal to the physical displacement (K), the target in this case will appear to move through a distance of W’ in a direction opposite to the head.
is coupled to its perceived egocentric distance, according to the apparent distance–pivot distance hypothesis (Gogel, 1982, 1990, 1993; Gogel & Tietz, 1992). Gogel has treated the general case at length, but to simplify the exposition here, we consider the special case of a translating observer and a stationary target point (Fig. 2.1b). Thus, if an observer’s head translates smoothly through a distance K as the observer views a point at pivot distance Dp and the observer perceives the head displacement correctly (i.e., K’ = K), the point will appear to move smoothly through a displacement of W’ when it is perceived to be at a distance D’, according to the following equation:

W’ = K’(1 − D’/Dp)    (2)
So, for example, if the point appears to be twice as far away as its pivot distance (i.e., D’ = 2 Dp ), it will appear to move through a displacement equal to the perceived head displacement, but in a direction opposite to the head (W’ = −K’). Besides this simple case of a single-point stimulus, Gogel (1990) has shown how this analysis applies to 3-D reversible objects, like a concave mask that appears convex. In this case, depth reversal of the mask, which is presumably the consequence of an observer tendency to see facelike objects as convex, results in apparent motion of
the stationary mask concomitant with translation of the observer’s head. Gogel’s analysis of motion perception under conditions of observer translation is highly relevant for understanding perception and performance in virtual environments, for misperception of distance ought to result in misperception of the motion (or lack thereof) of virtual objects that are simulated to be moving or stationary while the observer is in motion.
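Because both couplings are simple trigonometric relations, they can be stated compactly in code. The following Python sketch (added here for illustration; the function names and example values are hypothetical and the generalized observer constants mentioned above are omitted) expresses Equations 1 and 2 as written.

```python
import math

def perceived_size(angular_size_deg, perceived_distance):
    """Equation 1 (size-distance invariance): S' = 2 D' tan(alpha / 2)."""
    return 2.0 * perceived_distance * math.tan(math.radians(angular_size_deg) / 2.0)

def apparent_displacement(head_displacement, perceived_distance, pivot_distance):
    """Equation 2 (apparent distance-pivot distance): W' = K' (1 - D'/Dp).

    Assumes the head displacement is perceived correctly (K' = K), as in the
    special case discussed in the text.
    """
    return head_displacement * (1.0 - perceived_distance / pivot_distance)

# Hypothetical example: a 2-deg target perceived at 10 m appears about 0.35 m wide.
print(perceived_size(2.0, 10.0))

# Text example: a point perceived at twice its pivot distance (D' = 2 Dp) appears
# to move opposite to the head by the full head displacement (W' = -K').
print(apparent_displacement(0.5, perceived_distance=2.0, pivot_distance=1.0))  # -> -0.5
```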
EFFECTIVENESS OF DISTANCE CUES

A great deal of research has been concerned with the effectiveness of the various distance cues in contributing to the perception of egocentric and exocentric distance (for review and analysis, see Baird, 1970; Cutting & Vishton, 1995; Foley, 1980; Gogel, 1977; Howard & Rogers, 2002; Sedgwick, 1986). To see how different cues might contribute differently to space perception, it is instructive to consider the consequence of rescaling a rich environment consisting of many surfaces and objects, such that an observer’s eye receives exactly the same projective imagery before and after the rescaling. An example would be monocularly viewing a town from a stationary helicopter at an altitude of 200 m and monocularly viewing a perfect 1/100 scale model of the same from an altitude of 2 m, provided that the spatial distribution of illumination is also appropriately scaled. It is evident that all of the various static perspective cues relating to spatial layout (relative size, texture gradient, linear perspective, height in the field, and shading) are invariant with the rescaling, and therefore uninformative about scale (and egocentric distance). To the extent that these perspective cues are effective in specifying perceived layout, they must do so independently of scale (Loomis & Philbeck, 1999). If the eye translates through space at a speed proportional to the scale (i.e., in eye heights/sec), the resulting optic flow (motion parallax) is also informative about the shape of the layout but uninformative about its scale. Aerial perspective (change in the clarity and color of objects due to scattering within the intervening medium) increases with distance and is thus informative about scale, but not invariantly so because atmospheric conditions (haze, fog, etc.) greatly influence the degree of aerial perspective. The oculomotor cues of accommodation and convergence are absolute distance cues, but are only effective within several meters and thus do not permit the discrimination of larger scale environments. Binocular disparity, which results from the fixed lateral separation of the two eyes, is a relative distance cue, but it does depend on scale; only in conjunction with absolute distance cues can it specify exocentric distance along a line of visual direction (Foley, 1980). Its effectiveness, however, in contributing to the perception of metric distance is limited probably to no more than 100 m, although it must contribute to the perception of depth order well past 200 m. From the foregoing, it is clear that when there are no constraints on the observer’s viewing circumstances (e.g., altitude unknown, speed unknown, atmospheric
conditions unknown), the observer has no information about absolute scale other than that provided by the oculomotor cues and binocular disparity (at smaller scales). Conversely, when certain assumptions are met, more can be known about scale and distance. For example, under the assumption that a familiar object is of normal size, its angular size can, in principle, be used to establish its egocentric distance. Similarly, if the observer is viewing an object on the ground plane from normal eye height, height in the field (angular elevation) of the object can, in principle, be used to infer its egocentric distance (Sedgwick, 1986) and, by size–distance invariance, its size. Alternatively, the object’s vertical angular extent relative to the horizon can be used to determine its size (Sedgwick, 1980, 1986). Finally, if the observer knows his or her translational speed, the absolute motion parallax of an object known to be stationary can, in principle, be used to determine the object’s size and distance. Although some research has investigated the effectiveness of these various cues (e.g., familiar size: Gogel, 1976; Gogel & Da Silva, 1987; angular elevation: Ooi, Wu, & He, 2001; Philbeck & Loomis, 1997; vertical extent in relation to the horizon: Wraga, 1999a, 1999b; absolute motion parallax: Beall, Loomis, Philbeck, & Fikes, 1995; Ferris, 1972; Gogel & Tietz, 1979; Philbeck & Loomis, 1997), the extent to which these cues determine perceived egocentric distance is still far from settled.
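The ground-plane relations just mentioned can be made concrete. The sketch below (a minimal illustration, not taken from the cited studies; the assumed eye height and example angles are hypothetical, and a level ground plane is assumed) computes egocentric distance from the angular declination of a ground point below the horizon and object height from the horizon-ratio relation described by Sedgwick (1980, 1986).

```python
import math

EYE_HEIGHT_M = 1.6  # assumed observer eye height (hypothetical value)

def distance_from_declination(declination_deg, eye_height=EYE_HEIGHT_M):
    """Distance over the ground to a point on the ground plane, given the angular
    declination of that point below the horizon: d = h / tan(declination)."""
    return eye_height / math.tan(math.radians(declination_deg))

def height_from_horizon_ratio(elev_top_deg, decl_base_deg, eye_height=EYE_HEIGHT_M):
    """Object height from the horizon-ratio relation: the horizon cuts an object
    standing on the ground at the observer's eye height, so
    H = h * (1 + tan(elevation of top) / tan(declination of base))."""
    return eye_height * (1.0 + math.tan(math.radians(elev_top_deg)) /
                         math.tan(math.radians(decl_base_deg)))

# A ground point seen about 9.1 deg below the horizon lies roughly 10 m away (h = 1.6 m).
print(distance_from_declination(9.1))

# An object whose top reaches the horizon (elevation 0 deg) is exactly eye height tall.
print(height_from_horizon_ratio(0.0, 9.1))
```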
FOCAL AWARENESS, SUBSIDIARY AWARENESS, AND PRESENCE IN VIRTUAL ENVIRONMENTS

An important issue relevant to perception in virtual environments concerns the observer’s awareness of the virtual environment as a representation. Both Pirenne (1970) and Polanyi (1970) have treated this issue at some length in the context of representational paintings and photographs; by extension, their analysis also applies to television and cinema. These artifacts, like other forms of representation, have a dual character—they are both objects in 3-D space and representations of other 3-D spaces. When the observer has visual information about the location and orientation of the picture surface, the observer has “focal awareness” of the represented scene and “subsidiary awareness” of the picture surface (Polanyi, 1964, 1970). There is evidence that awareness of the 2-D picture surface interferes with the perceived three-dimensionality of the represented scene (Pirenne, 1970); in particular, a small depiction of a large object results in a perceived object of intermediate size. Perhaps for this reason, it is much easier to induce a perception of very large objects in large-projection displays than in the typical CRT display (Yang, Dixon, & Proffitt, 1999). Besides the perceptual conflict between the depicted scene and the representing medium, there is evidence that when one is viewing a 3-D scene, the suggestion that one is instead viewing a depiction, as conveyed by a surrounding “picture frame,” reduces the amount of perceived depth within the scene (Eby & Braunstein, 1995).
Virtual desktop systems that use conventional CRTs are like pictures and paintings in that the CRT surface is well localized in space by the observer. As such, the perceptual and cognitive awareness of the display qua display is likely to conflict with the intended awareness of the represented environment (Yang et al., 1999). The use of a restrictive aperture in conjunction with monocular viewing (Smith & Smith, 1961) or the use of stereoscopic display techniques can greatly reduce awareness of the display surface, thus reducing the perceptual conflict. The use of immersive virtual displays further reduces awareness of the display surface. One way of achieving immersion is the CAVE technique, in which the observer is enclosed within a cube of multiple screens projecting stereoscopic imagery. The more common technique is the use of head tracking in conjunction with head-mounted displays (HMDs). HMDs virtually eliminate all visual cues about the location and orientation of the constituent displays (e.g., LCDs or CRTs) so that the only visual cues available to the observer are those about the represented environment. Either way, immersion technology gives the observer the impression of being surrounded by the computer-generated environment. This sense of immersion, especially when coupled with realistic imagery and a high degree of interactivity between the observer and the virtual environment, promotes “presence” or the experience of “being in” the virtual environment (Barfield, Zeltzer, Sheridan, & Slater, 1995; Heeter, 1992; Held & Durlach, 1992; Lombard & Ditton, 1997; Loomis, 1992; Slater, Usoh, & Steed, 1994; Zahorik & Jenison, 1998). As noted by Loomis (1992), complete presence is tantamount, in Polanyi’s analysis, to complete focal awareness of the simulated environment (i.e., with no subsidiary awareness of the virtual display system). As Yang and colleagues (1999) have demonstrated with their experiments on the vertical–horizontal illusion, perception of virtual environments under conditions of minimal subsidiary awareness is subject to exactly the same analysis as perception of real environments.
MEASURING PERCEIVED EGOCENTRIC DISTANCE IN REAL ENVIRONMENTS

Visually perceived location and its constituents, perceived direction and perceived egocentric distance, are aspects of conscious experience. As such, they cannot be measured directly. Instead, measurement can proceed only from a theory that relates measures based on observers’ responses to the phenomenological variables of interest. Given the widespread belief that visual direction is perceived quite accurately, most of the research has centered on the measurement of perceived egocentric distance. Even today, this effort likely remains the biggest challenge in the field of visual space perception. Measures of perceived egocentric distance are wide ranging. The most straightforward and easiest to obtain are those based on direct judgment of perceived distance and a subsequent response that reflects that judgment. Most common are procedures based on numerical estimates. Here the observer estimates the
distance either in familiar distance units (e.g., feet or meters) or as a multiple of some given perceived extent and then communicates the resulting estimate (e.g., by speech or keyboard input to a computer). Verbal communication of numerical judgments that are expressed in familiar units of measurement is referred to as verbal report. Another common direct estimation method involves expressing the estimate by way of some open-loop motor behavior, such as reaching without sight of the hand to the perceived visual location of a target or walking without vision to the location of a previously viewed target. Numerical estimation requires that observers have internalized the unit of measurement, and motoric responding assumes that the response is calibrated to perception, at least over some limited range of distance. Equally important for the validity of these methods is the assumption that the observer’s responses are driven by perception alone, uncontaminated by what the observer knows (Carlson, 1977; Gogel, 1976). This assumption, especially in connection with numerical estimation, is likely not to be true in general. For example, objects seen from a great distance appear undersized, but an observer, when asked to report the objective size and distance of a familiar object, can still provide reasonably accurate estimates using knowledge and inference, instead of perceptual appearance (Gogel & Da Silva, 1987). However, when familiar size is eliminated as a cue, there is more reason to believe that verbal reports are informative about perceived distance (e.g., Da Silva, 1985; Foley et al., 2001; Loomis, Klatzky, Philbeck, & Golledge, 1998; Philbeck & Loomis, 1997), but circumspection is always warranted. Open-loop motoric responding has been extensively used for measuring visually perceived distance in real environments. Some of the earliest uses were in connection with very short distances (ball throwing to targets within a room: Smith and Smith, 1961; pointing to targets within arm’s reach: Foley, 1977, 1985; Foley & Held, 1972). More recently, visually directed action by a locomoting observer has come into use for the measurement of larger perceived egocentric distances. One of these, mentioned above, involves walking to a previously perceived target without further perceptual information about its location. The results of such experiments under full-cue viewing generally indicate the absence of systematic error when walking to targets up to 20 m away (Elliot, 1987; Elliot, Jones, & Gray, 1990; Loomis, Da Silva, Fujita, & Fukusima, 1992; Loomis et al., 1998; Rieser, Ashmead, Talor, & Youngquist, 1990; Sinai, Ooi, & He, 1998; Steenhuis & Goodale, 1988; Thomson, 1980, 1983). The results of many of these experiments are summarized in Fig. 2.2; the data of the different experiments have been displaced vertically for purposes of clarity. It might be thought that the finding of accurate performance is not the result of accurate perception of egocentric distance but is simply a consequence of the calibration process accompanying everyday perceptual–motor activity. A problem with this calibration hypothesis is that observers rarely walk blindly to previewed targets more than 3 m away. Stronger evidence against the calibration hypothesis
FIG. 2.2. Summary of results using visually directed walking. The data from the different studies have been displaced vertically for purposes of clarity. The dashed line in each case represents correct responding. Sources: (1) from Elliott (1987), (2) average of two groups of observers from Experiment 1 of Loomis et al. (1998), (3) from Experiment 2b of Loomis and colleagues (1998), (4) from Experiment 1 of Loomis et al. (1992), (5) from Thomson (1983), (6) from Rieser and colleagues (1990), (7) from Steenhuis and Goodale (1988), and (8) from Experiment 2a of Loomis and colleagues (1998).
comes from experiments in which the observer’s response is less tightly coupled to the target distance. In one such experiment, Thomson (1983) showed the observer a target on the ground ahead, after which the observer began walking toward it without vision. At some unpredictable location during the walk, the observer was signaled to stop and then throw a bean bag the rest of the way to the target. Even with such a two-component response, observers performed with nearly the same accuracy as when walking the full distance. Other evidence against the calibration hypothesis comes from research using triangulation methods. In triangulation by pointing (Fig. 2.3a), the observer views a target and then walks blindly along an oblique path while attempting to continue pointing in the direction of the previously viewed and now imaginally updated target (Fukusima, Loomis, & Da Silva, 1997; Loomis et al., 1992). The terminal
FIG. 2.3. (a) Triangulation by pointing. The observer points at the target with eyes open while standing at Location 1 and then translates with eyes closed to Point 2 while updating the image of the previously viewed target. The pointing direction of the arm at Point 2 can be used to triangulate the position of the initially perceived target under the assumption that the only error in the terminal pointing direction is associated with initial perception of the target. (b) Triangulation by walking. The observer looks at the target with eyes open while standing at Location 1 and then translates with eyes closed to Point 2 while updating the image of the previously viewed target. The observer’s terminal heading (or terminal walking direction after a turn toward the target at Point 2) can be used to triangulate the position of the initially perceived target under the assumption that the only error in this terminal heading (or walking direction) is associated with initial perception of the target. (Adapted from Fig. 1 in Fukusima, Loomis, & Da Silva, 1997, (p. 87)).
pointing direction is used to triangulate the initially perceived target location and, hence, its perceived distance from the viewing location. In triangulation by walking (Fig. 2.3b), the observer views a target and also then walks blindly along an oblique path. At some unanticipated location, the observer is instructed to turn and face the target (Knapp, 1999) or begin walking toward the target (Fukusima et al., 1997). The terminal heading (facing direction) or course (travel direction) after the turn is used to triangulate the initially perceived target location and, hence, its perceived distance from the viewing location. In another variant of triangulation by walking, the observer walks blindly along an oblique path, turns on command, and then attempts to walk the full distance to the target (Loomis et al., 1998; Philbeck, Loomis, & Beall, 1997). Because the directional responses of the observer toward the target, following a traverse along an oblique path, are not likely to have been previously calibrated by open-loop behavior, the evidence is strong that observers
FIG. 2.4. Summary of results using triangulation by pointing, triangulation by walking, and walking along direct and indirect paths to the perceived target, all obtained under full-cue conditions. The data from the different studies have been displaced vertically for purposes of clarity. The dashed line in each case represents correct responding. Sources: (1) from the outdoor experiment on triangulation by walking of Knapp (1999), (2) average of two conditions from Experiment 3 (triangulation by walking) of Fukusima and colleagues (1997), (3) from Experiment 3 (direct and indirect walking) of Loomis and colleagues (1998), (4) average of two conditions from Experiment 4 (triangulation by walking) of Fukusima and colleagues (1997), (5) from the experiment on direct and indirect walking by Philbeck and colleagues (1997), and (6) average of two conditions from Experiment 2 (triangulation by pointing) of Fukusima and colleagues (1997).
accurately perceive visual target distances up to at least 15 or 20 m away, as can be seen in the summary of results in Fig. 2.4. If these measurements based on visually directed action are indeed reflecting perceived egocentric distance, they ought to result in a pattern of systematic error when distance cues are diminished, as observed using other methods (e.g., Gogel & Tietz, 1979). Philbeck and Loomis (1997) compared verbal report and visually directed walking under different conditions of distance cue availability. Self-luminous targets of constant angular size were viewed in either a dark or a
FIG. 2.5. Mean indicated distance (using verbal report and visually directed walking) as a function of target distance in four different viewing conditions. Error bars represent standard error of the mean. (Reprinted from Philbeck & Loomis, 1997).
lighted room (full cues) and were positioned either at eye level or on the ground, the latter condition providing the cue of angular elevation. The results are shown in Fig. 2.5 and support a number of conclusions: (1) the similarity of verbal and motoric responses, (2) accurate responding under full cues, and (3) the importance of angular elevation for egocentric distance perception, as evident in the comparison of the two darkroom conditions. Out of concern that numerical estimation procedures are subject to contamination by what the observer knows or can infer about target position, Gogel and his associates have developed “indirect” procedures for measuring perceived
egocentric distance (Gogel, 1976, 1979; Gogel & Newton, 1976). These indirect measurements involve the observer judging perceptual variables other than perceived egocentric distance, variables thought to be less subject to cognitive intrusion. The first of these procedures involves the judgment of object size and the size–distance invariance hypothesis, discussed earlier (also see Fig. 2.1a). Under this hypothesis, Equation 1 indicates that if the perceived size, S’, of a visual stimulus of angular size α is judged by an observer, perceived egocentric distance, D’, can be solved for. The second indirect procedure involves the judgment of perceived motion and the apparent distance–pivot distance hypothesis, also discussed earlier (also see Fig. 2.1b). Under this hypothesis, Equation 2 indicates that if the perceived displacement, W’, of a point stimulus is judged by an observer whose head undergoes a displacement of K, the perceived egocentric distance, D’, can be solved for, provided that K’ (perceived head displacement) is equal to K. Gogel, Loomis, Newman, and Sharkey (1985) performed an experiment evaluating the congruence of the two measures of perceived egocentric distance. In a full-cue laboratory situation, they independently manipulated binocular and motion parallax cues, among others, thus producing variations in perceived distance from about 1 to 3 m. Figure 2.6 shows the computed measures of D’ using the two different response measures, averaged over observers, for a variety of target stimuli
FIG. 2.6. Congruence of two indirect measures of perceived egocentric distance, one based on perceived size (S’) and the other based on perceived displacement of a target (W’) during a lateral movement of the head. The data points represent different combinations of distance cues. (Source: Fig. 5 from Gogel, Loomis, Newman, & Sharkey, 1985, (p. 24)).
varying in terms of the manipulated distance cues. As can be seen in the figure, the two different procedures resulted in estimates of D’ that were highly correlated (r = .98); however, the estimates based on size were approximately 20% larger than those based on perceived displacements. Lest it be thought that the two judgments were really of the same perceptual variable, the reader should note that perceived size judgments grow in magnitude as the perceived target moves away from the observer (Fig. 2.1a), whereas judgments of perceived displacement grow in magnitude as the perceived target moves away from the pivot distance of the target (Fig. 2.1b). The congruence of the two measures of perceived egocentric distance is a small part of the strong empirical base supporting Gogel’s theory of phenomenal geometry (Gogel, 1990). The common observation that a very distant object tends to look much smaller than it is physically is consistent with the size–distance invariance hypothesis when the object is perceived to be much closer than it is (Gogel & Da Silva, 1987). Contrasting with this is the geometric analysis of Sedgwick (1980), showing that the vertical extent of an object resting on the ground, when judged relative to the visible or implicit horizon, provides information about its actual size. For example, the top of an object, which is equal in height to the observer’s eye height, will coincide with the visible horizon; similarly, an object twice as high as the observer’s eye height will extend as much above the horizon as its base lies below. This optic rule for judging height implies that judged object size ought to be invariant with egocentric distance; this would seem especially so for objects that are the same size as the observer’s eye height, for optical coincidence between object’s top and the visible horizon unambiguously specifies constant size. Yet, objects viewed from very far away appear very much smaller even under these conditions. It would thus appear that using angular height of an object to judge its size, while grounded in the rules of perspective, fails to determine perceived size. It is precisely for reasons such as this that Gogel and his colleagues have developed indirect measures of perceived egocentric distance that are more immune to cognitive intrusion. How then do the various methods of measuring perceived egocentric distance, of which these are the primary ones, agree with one another, especially in connection with egocentric distances beyond several meters? Da Silva (1985) has reviewed the results of many studies on perceived distance in indoor and outdoor environments. However, none of these studies involved direct comparisons of the different methods mentioned above. Given the huge variation in target distances, visual context, and method, the variable results make generalizations about distance perception difficult. As mentioned earlier, Philbeck and Loomis (1997) found that verbal report and visually directed walking were in quite close agreement in terms of mean values of perceived egocentric distance (Fig. 2.5), with the verbal reports exhibiting slightly greater variability. Loomis and colleagues (1998) observed the same result for larger target distances (out to 16 m) in an outdoor environment, both for visual perception and auditory perception. Although these two studies indicate that
verbal report and visually directed action result in similar estimates of perceived egocentric distance, it is unlikely that the two responses always tap the same judgmental process. Clearly, verbal estimation is appropriate in some situations where visually directed action is nonsensical. For example, an observer is likely to find it reasonable to estimate the depicted distance of an object seen in a photograph viewed normally, whereas the same observer would find it unusual to be asked to perform a visually directed action with respect to the object. The indirect procedures discussed above have yet to be used for measuring perceived egocentric distance of targets at large distances in natural environments. Gogel’s “head motion procedure” has so far only been studied in the laboratory for short egocentric distances; it remains to be seen whether it can be extended to measurement of much larger distances using larger excursions of the head than used until now (50 cm or less). The indirect procedure using judgments of size seems somewhat more promising given that prima facie it should be applicable to all target distances. However, making judgments of size in the context of natural outdoor scenes seems likely to engage cognitive processing nearly to the same extent as direct estimation of egocentric distance does, thus possibly defeating the very intent of it as an indirect method.
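As a concrete illustration of the geometry behind the triangulation responses sketched in Fig. 2.3, the following minimal Python sketch (added for illustration; the function name and numbers are hypothetical) recovers perceived egocentric distance from the terminal heading after an oblique traverse, for the simple case in which the blind walk is orthogonal to the initial sight line and all error is attributed to the initial perception of the target.

```python
import math

def triangulated_distance(lateral_walk_m, heading_deviation_deg):
    """Triangulation by walking (cf. Fig. 2.3b) for a walk perpendicular to the
    initial line of sight.  The terminal facing direction deviates from the
    original sight line by an angle whose tangent is (walked distance / perceived
    target distance), so D' = walked distance / tan(deviation)."""
    return lateral_walk_m / math.tan(math.radians(heading_deviation_deg))

# Hypothetical example: after a 3-m lateral walk, a terminal heading about 16.7 deg
# off the original sight line implies a perceived target distance of roughly 10 m.
print(triangulated_distance(3.0, 16.7))
```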
PERCEPTION OF VERY LARGE SCALE

Closely related to the perception of egocentric distance is the perception of scale. The difference is one of emphasis, with egocentric distance being associated with a single target and scale being associated with an entire scene. The issue here concerns the basis on which an observer perceives the difference between a complex visual scene and copies of the scene varying only in terms of scale. In particular, we raise the very puzzling question of what perceptual and cognitive processing is involved when a person experiences immensity in the visible environment. When we view a huge object within a natural scene (e.g., the Eiffel Tower, the St. Louis Gateway Arch, or El Capitan in Yosemite Valley), we are impressed with the enormity of such an object. Even large cloud formations viewed from an airliner (where binocular cues and motion parallax information are useless for specifying scale) can appear immense. Although it is likely that we perceive huge objects as smaller than they actually are, the fact remains that we commonly experience immensity in the natural world. Although the experience of immensity is commonplace, it has been neglected by perceptual researchers, perhaps because, in our everyday stance of naive realism (Loomis, 1992), we simply attribute it to the immensity of the objects themselves. This is, of course, no explanation, and we must instead seek to determine the informational basis leading the observer to such an experience. The next time that the reader is in the presence of a huge object, observe that the object continues to look immense with monocular viewing and stationary head, even with hands cupped to limit the field of view. These observations raise the
question of what stimulus information supports the experience of immensity. If one were to close one’s eyes for a moment and a small-scale model were moved into place, would one continue to experience immensity, or would subtle visual cues lead immediately to a different perceptual experience? We do not have the answer to this, but we have done unpublished research on the role of prior visual stimulation on perception of size and have found evidence that observers interpret momentary visual stimulation in the context of an existing perceptual model. However, our work scarcely scratches the surface of this interesting question. We raise it here, for it is of fundamental importance to the implementation of effective virtual environments. Although some uses of virtual environments may not depend at all on a proper rendering of scale, the importance of the phenomenology of visual space should not be underestimated. The aesthetic and affective impact of realistically rendered virtual environments is absolutely vital to their success for many applications. For example, a virtual environment used by the travel industry to give a potential visitor a preview of a destination with expansive vistas needs to convey the impression of its size for maximum impact. Whether virtual displays other than those employing multistory projection scenes (e.g., IMAX theaters) will succeed in doing so is a fascinating scientific question with major implications for commercial uses of virtual environments.
PERCEIVED EGOCENTRIC DISTANCE IN VIRTUAL ENVIRONMENTS

Given the role that visual perception of distance plays in an observer’s experience within a visually based virtual environment, it is hardly surprising that this is one of the first topics to have been investigated by virtual environment researchers (Barfield & Rosenberg, 1995; Beall et al., 1995; Ellis & Menges, 1997; Kline & Witmer, 1996; Knapp, 1999; Rolland, Gibson, & Ariely, 1995; Surdick, Davis, King, & Hodges, 1997; Witmer & Kline, 1998; Witmer & Sadowski, 1998). These studies address a variety of issues, most of which take us beyond our immediate interest in the accuracy with which egocentric distance is perceived. Thus, we focus on a few studies that have measured perceived egocentric distance in virtual environments involving HMDs. Our immediate concern in reviewing this work is with the methodological issue of how to assess the perception of distance and scale in virtual environments. Once we are confident about the methods, we can then use them to evaluate how well different virtual environment implementations convey a natural sense of scale. Given that virtual environment technology, including the rendering software, is constantly improving, only future research will tell us what we can ultimately expect from the technology in affording an accurate perception of distance and scale. For several years, our laboratory at the University of California at Santa Barbara has been using a high-quality virtual display system developed by Andrew C. Beall and Jack M. Loomis (for a detailed description, see Chance, Gaunet, Beall, &
Loomis, 1998). The Virtual Research FS5 HMD has a 44-deg (horizontal) × 33-deg (vertical) field of view in each eye, with 100% binocular overlap. Display resolution in each eye, with input from a Silicon Graphics Indigo2, High Impact graphics computer, is 800 horizontal lines × 486 vertical lines. The field-sequential display provides full-color, stereoscopic presentation with an effective visual acuity of about 20/70 (judging from the angular size of each pixel and informal assessments of letter identification performance). The hybrid head/body-tracking subsystem uses video capture of two lights worn on a backpack to measure torso position and heading and a goniometer (mechanical linkage with potentiometers) to measure orientation of the head in relation to the torso. The tracker allows the observer to walk around within a large room with natural head movements (Beall, 1997). We have taken great care to minimize the end-to-end system lag (down to about 50 msec) and to maximize the graphics update rate (30 Hz in each eye), but 80 msec lag and 15 Hz update rate in each eye are more commonly obtained in environments with the complexity of those in the experiments described below. The sense of presence in many of our virtual environments is very compelling, as judged by the informal comments of over 300 people who have had demonstrations with the system. Here we report the results of two experiments on the perception of distance done as part of Joshua M. Knapp’s doctoral dissertation (Knapp, 1999). Both involved binocular viewing of projectively-correct imagery, adjusted for the observers’ interpupillary distances and eyeheights. The first experiment compared three measures of perceived egocentric distance: verbal report, walked distance in a visually directed walking task, and a measure based on perceived size. For two of these measures, the targets were spheres placed on a textured ground plane (tesselated with a base texture pattern) at simulated distances of 1, 2.5, and 4 m. Given that one of the tasks involved walking to the target, the upper limit of 4 m was set by the size of the laboratory. The binocular cues of convergence and binocular disparity and the perspective cues of texture gradient, linear perspective, and angular elevation were all available; motion parallax due to head translations was not available, for observers viewed the targets from a fixed location. Because of the limited vertical field of view of the display, observers had to pitch their heads downward to sense the angular elevation of the target. For the verbal report judgment, the observer judged target distance in units of feet over the ground plane (i.e, not direct line of sight from head to target). For visually directed walking, the observer viewed the target and, when ready, walked with eyes closed to the judged position of the target (the display was also turned off). For the judgment based on perceived size, the observer saw a continuous untextured wall extending 3 m up from the ground in a frontoparallel plane at a distance (measured over the ground) of 1, 2.5, or 4 m. The observer turned a knob that varied the width of a vertically oriented aperture in the wall; the task was to make the aperture appear just passable (i.e., just as wide as the observer’s body, measured at the shoulders). This judgment of the passability of an aperture has been shown to be very accurate under full cues in real environments (Warren & Whang, 1987). 
We took the observer’s adjusted simulated width as equal to the observer’s perceived body width and
FIG. 2.7. Results of two recent experiments using the University of California at Santa Barbara virtual display system (Knapp, 1999). (a) Comparison of three measures of perceived egocentric distance: verbal report, walked distance in visually directed walking, and a measure derived from adjustment of a visible aperture to match the maximal width of the observer’s body. (b) Comparison of three measures of perceived egocentric distance: verbal report, a measure derived from triangulation by walking, and a measure derived from the judgment of perceived size. Error bars represent 1 SE of the mean.
used this, in conjunction with size–distance invariance (Equation 1), to obtain an indirect measurement of the perceived distance of the aperture and wall. Figure 2.7a gives the results of the experiment. Each value of indicated distance (perceived egocentric distance) is based on the mean of the data of 7 observers. Although the aperture-based measure is consistently higher in value than the other two, the three measures all indicate that, for this environment, distance is systematically underperceived, even for these short distances. The second experiment also compared three measures: verbal report, a measure based on triangulation by walking, and a measure based on perceived size. In contrast with visually directed walking, this triangulation response allowed us to obtain perceived estimates for much larger target distances because walking was in a direction orthogonal to the simulated visual target. Again, spheres were simulated as lying on a ground plane with a tesselated texture, this time at distances of 2, 6, and 18 m. Binocular and perspective cues provided information about the distance of the target. For the verbal response, the observers made their judgments in units of feet. For the size judgment task, spheres of constant angular size were used, and the observer judged the sphere diameter and verbally reported in units of inches. The verbal estimates were converted to perceived distance (over the ground) using size–distance invariance (Equation 1). For the triangulation response, the observer faced the target with the body turned in the direction of the subsequent walking response (as in Fig. 2.3b). When ready to respond, the observer closed the eyes
and initiated the walk until told to stop. After a distance of 3.1 m, the observer, still with eyes closed, turned to face the previously viewed and imaginally updated target. We used the terminal heading of the body (and head) to triangulate the initially perceived location of the target, from which we computed the perceived egocentric distance of the target. Figure 2.7b gives the results of this second experiment. Each value of indicated distance is based on the mean of 7 observers in the triangulation condition and 10 observers in the other two conditions. The concordance of the three measures is strong evidence that they are all measures of perceived egocentric distance and is also additional support for the idea that action (here, triangulation by walking) is controlled by conscious perception. As with the previous experiment (Fig. 2.7a), the results clearly indicate that egocentric distance is underperceived by a factor of about 2. This is surprising, for the environments in both experiments employed all of the known distance cues that are available to an observer in a natural outdoor environment when viewing from a fixed location. Results indicating more accurate distance perception in a virtual environment are those reported by Witmer and Sadowski (1998) and replotted in Figure 2.8a. They compared visually directed walking to real targets in a long corridor to visually directed walking (on a treadmill) to virtual targets within a simulated corridor. The indicated distances for the real environment mirror those summarized in Figure 2.2. The indicated distances for the virtual environment, although systematically lower, are still much more accurate than those obtained in our experiments (Fig. 2.7). Also, Witmer and Kline (1998) conducted a similar experiment, this
FIG. 2.8. Results of two experiments by Witmer and his colleagues. (a) Indicated distance (putatively perceived egocentric distance) as measured by visually directed walking in real and virtual environments. Walking in the virtual condition was performed on a treadmill. (Source: Fig. 2 from Witmer & Sadowski, 1998, (p. 483)). (b) Indicated distance (putatively perceived egocentric distance) as measured by verbal report in real and virtual environments. (Source: Fig. 3 from Witmer & Kline, 1998, (p. 155)).
time obtaining verbal reports. These data are replotted in Fig. 2.8b. These results for the virtual environment are closer to what we have obtained (Fig. 2.7). As a possible reason for the apparent difference between our results for the virtual environments and those of Witmer and colleagues, we note that their and our visual displays differed in terms of field of view. They used a head-tracked stereoscopic display (Fakespace Labs BOOM2C) with a 140-deg (horizontal) × 90-deg (vertical) field of view, in contrast with our 44-deg × 33-deg field of view. Kline and Witmer (1996) found that limiting the field of view had a large effect on variation in judged egocentric distance (see also Psotka, Lewis, & King, 1998). However, the head was not free to rotate in the Kline and Witmer study, which might have made display field of view more critical in their experiment than in the more usual situation, in which the head is free to rotate. Consistent with this interpretation, Knapp (1999) studied visually directed walking to targets in a full-cue outdoor environment and compared normal unrestricted field of view with a field of view that was restricted by goggles and approximately matched to the head-mounted display used in our virtual environment experiments. Observers performed very accurately in both conditions, with no significant difference between the two. Thus, it appears that the field of view of our head-mounted display is not the reason for underperception of distance in our virtual environments. We have hypothesized that the more likely reason is that the rendering of the scenes used in our experiments is lacking subtle but important visual cues (e.g., natural texture, and highlights). Supporting our hypothesis are some informal observations we have made—real environments, when viewed with our head-mounted display as it is driven by two video cameras located at a fixed vantage point, appear much more realistic in terms of distance and scale than our computer-synthesized virtual environments. If this hypothesis were to be correct, it would mean that photorealistic rendering of the surfaces and objects in a simulated environment ought to produce more accurate perception of distance, including the perception of very large scale. However, a recent study by Thompson, Willemsen, Gooch, Creem-Regehr, Loomis, and Beall (submitted) found that distance in photorealistic virtual environments was perceived no more accurately than it was in more artificially appearing virtual environments. Obviously, more research is needed to determine the stimulus and cognitive factors that underlie this difference in distance perception between real and virtual environments.
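Both conversions used in the two experiments described above reduce to Equation 1. As a purely illustrative sketch (the function names and numeric values below are hypothetical and are not data from the experiments), an adjusted aperture width and a verbal size judgment each yield a perceived-distance estimate as follows; the example values show how a factor-of-2 underperception would appear in each measure.

```python
def distance_from_aperture(body_width_m, adjusted_width_m, wall_distance_m):
    """Aperture task: the observer widens a simulated aperture at a known simulated
    distance until it looks just passable (perceived width = body width).  By
    Equation 1, perceived distance = body_width * wall_distance / adjusted_width."""
    return body_width_m * wall_distance_m / adjusted_width_m

def distance_from_size_report(reported_size_m, simulated_size_m, simulated_distance_m):
    """Verbal size task: a target of known simulated size and distance is judged to
    have some size; Equation 1 gives perceived distance = reported * D / simulated."""
    return reported_size_m * simulated_distance_m / simulated_size_m

# Hypothetical illustrations of a factor-of-2 underperception:
print(distance_from_aperture(0.45, 0.90, 4.0))      # -> 2.0 m for a wall simulated at 4 m
print(distance_from_size_report(0.25, 0.50, 18.0))  # -> 9.0 m for a sphere simulated at 18 m
```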
SIGNIFICANCE OF DISTANCE PERCEPTION FOR APPLICATIONS OF VIRTUAL ENVIRONMENTS

There are two important reasons why visual distance perception in virtual environments needs to closely mimic visual distance perception in real environments. The first concerns spatial behavior. Users who acquire a skill in one setting will be able to effortlessly transfer this skill to the other setting with a minimum of recalibration
and relearning. The second concerns phenomenology. Virtual environments that look real in terms of scale will generally have a much greater aesthetic, cognitive, and emotional impact. This is nowhere more true than in the rendering of large-scale landscapes and objects. If virtual environment technology eventually succeeds in evoking the same experience of immensity and awe that one feels when viewing the Grand Canyon or the Egyptian Pyramids in person, it will have achieved what large-screen projection techniques of today can only begin to elicit at great expense.

CONCLUSION

In this chapter, we have noted the importance of phenomenological aspects of visual space for virtual environment applications while acknowledging recent research questioning the role that visual space might play in the control of spatial behavior. We have also noted the dual aspects of representational media and the relevance of this for understanding virtual environments. We have treated at some length the very difficult issue of how to measure perceived egocentric distance and have reviewed some of the research on the perception of egocentric distance, both in real and virtual environments. In view of the importance of the topic for virtual environments and the early stage of virtual environment technology, it is clear that much more research is needed to understand the perception of egocentric distance and scale in real and virtual environments, an understanding that will undoubtedly further the development of more realistic and effective virtual environments.

ACKNOWLEDGMENTS

Office of Naval Research grant N00014-95-1-0573 supported development of the virtual display system, the experiments on visual distance perception conducted with the system, and preparation of this chapter. The authors thank Andrew Beall and Jerome Tietz for technical support and Rocco Greco, Kathleen Keating, Jeffrey Aller, and Andreas Gingold for their assistance with the experiments.
REFERENCES

Baird, J. C. (1970). Psychophysical analysis of visual space. Oxford: Pergamon. Barfield, W., & Rosenberg, C. (1995). Judgments of azimuth and elevation as a function of monoscopic and binocular depth cues using a perspective display. Human Factors, 37, 173–181. Barfield, W., Zeltzer, D., Sheridan, T., & Slater, M. (1995). Presence and performance within virtual environments. In W. Barfield & T. A. Furness, III (Eds.), Virtual environments and advanced interface design (pp. 474–513). New York: Oxford University Press. Beall, A. C. (1997). Low-cost position and orientation tracking system for small-scale environments. Unpublished manuscript, University of California, Santa Barbara, Department of Psychology.
Beall, A. C., Loomis, J. M., Philbeck, J. M., & Fikes, T. J. (1995). Absolute motion parallax weakly determines visual scale in real and virtual environments. In Proceedings of the Conference on Human Vision, Visual Processing, and Digital Display, 2411, (pp. 288–297). Bellingham, WA: Society of Photo-Optical Instrumentation Engineers. Bhalla, M., & Proffitt, D. (1999). Visual-motor calibration in geographical slant perception. Journal of Experimental Psychology: Human Perception & Performance, 25, 1076–1096. Brain, W. R. (1951). Mind, perception, and science. London: Blackwell. Bridgeman, B., & Huemer, V. (1998). A spatially oriented decision does not induce consciousness in a motor task. Consciousness & Cognition, 7, 454–464. Bridgeman, B., Kirch, M., & Sperling, A. (1981). Segregation of cognitive and motor aspects of visual function using induced motion. Perception & Psychophysics, 29, 336–342. Carlson, V. R. (1977). Instructions and perceptual constancy judgments. In W. Epstein (Ed.), Stability and constancy in visual perception: Mechanisms and processes (pp. 217–254). New York: Wiley. Chance, S. S., Gaunet, F., Beall, A. C., & Loomis, J. M. (1998). Locomotion mode affects the updating of objects encountered during travel: The contribution of vestibular and proprioceptive inputs to path integration. Presence: Teleoperators and Virtual Environments, 7, 168–178. Creem, S. H., & Proffitt, D. R. (1998). Two memories for geographical slant: Separation and interdependence of action and awareness. Psychonomic Bulletin & Review, 5, 22–36. Cutting, J. E., & Vishton, P. M. (1995). Perceiving layout and knowing distances: The integration, relative potency, and contextual use of different information about depth. In W. Epstein & S. Rogers (Eds.), Perception of space and motion (pp. 69–117). New York: Academic Press. Da Silva, J. A. (1985). Scales for perceived egocentric distance in a large open field: Comparison of three psychophysical methods. American Journal of Psychology, 98, 119–144. Eby, D. W., & Braunstein, M. L. (1995). The perceptual flattening of three-dimensional scenes enclosed by a frame. Perception, 24, 981–993. Elliott, D. (1987). The influence of walking speed and prior practice on locomotor distance estimation. Journal of Motor Behavior, 19, 476–485. Elliott, D., Jones, R., & Gray, S. (1990). Short-term memory for spatial location in goal-directed locomotion. Bulletin of the Psychonomic Society, 8, 158–160. Ellis, S. R., & Menges, B. M. (1997). Judgments of the distance to nearby virtual objects: Interaction of viewing conditions and accommodative demand. Presence: Teleoperators and Virtual Environments, 6, 452–460. Epstein, W. (1982). Percept–percept coupling. Perception, 11, 75–83. Ferris, S. H. (1972). Motion parallax and absolute distance. Journal of Experimental Psychology, 95, 258–263. Flach, J. M., & Warren, R. (1995). Active psychophysics: The relation between mind and what matters. In J. M. Flach, P. A. Hancock, J. Caird, & K. J. Vicente (Eds.), Global perspectives on the ecology of human–machine systems: Vol. 1. Resources for ecological psychology (pp. 189–209). Hillsdale, NJ: Lawrence Erlbaum Associates. Foley, J. M. (1977). Effect of distance information and range on two indices of visually perceived distance. Perception, 6, 449–460. Foley, J. M. (1980). Binocular distance perception. Psychological Review, 87, 411–434. Foley, J. M. (1985). Binocular distance perception: Egocentric distance tasks. 
Journal of Experimental Psychology: Human Perception and Performance, 11, 133–149. Foley, J. M., & Held, R. (1972). Visually directed pointing as a function of target distance, direction, and available cues. Perception & Psychophysics, 12, 263–268. Foley, J. M., Ribeiro-Filho, J. P., & Da Silva, J. A. (2001). Visual localization and the metric of visual space in multi-cue conditions. Investigative Ophthalmology & Visual Science, 42, S939.
Fukusima, S. S., Loomis, J. M., & Da Silva, J. A. (1997). Visual perception of egocentric distance as assessed by triangulation. Journal of Experimental Psychology: Human Perception and Psychophysics, 23, 86–100. Gibson, J. J. (1958). Visually controlled locomotion and visual orientation in animals. British Journal of Psychology, 49, 182–194. Gibson, J. J. (1966). The senses considered as perceptual systems. Boston: Houghton Mifflin. Gibson, J. J. (1979). The ecological approach to visual perception. Boston: Houghton Mifflin. Gilinsky, A. S. (1951). Perceived size and distance in visual space. Psychological Review, 58, 460– 482. Gogel, W. C. (1976). An indirect method of measuring perceived distance from familiar size. Perception & Psychophysics, 20, 419–429. Gogel, W. C. (1977). The metric of visual space. In W. Epstein, (Ed.), Stability and constancy in visual perception: Mechanisms and processes (pp. 129–181). New York: Wiley. Gogel, W. C. (1979). The common occurrence of errors of perceived distance. Perception & Psychophysics, 25, 2–11. Gogel, W. C. (1982). Analysis of the perception of motion concomitant with head motion. Perception & Psychophysics, 32, 241–250. Gogel, W. C. (1984). The role of perceptual interrelations in figural synthesis. In P. C. Dodwell & T. Caelli (Eds.), Figural synthesis (pp. 31–82). Hillsdale, NJ: Lawrence Erlbaum Associates. Gogel, W. C. (1990). A theory of phenomenal geometry and its applications. Perception & Psychophysics, 48, 105–123. Gogel, W. C. (1993). The analysis of perceived space. In S. C. Masin (Ed.), Foundations of perceptual theory (pp. 113–182). Amsterdam: Elsevier. Gogel, W. C., & Da Silva, J. A. (1987) Familiar size and the theory of off-sized perceptions. Perception & Psychophysics, 41, 220–238. Gogel, W. C., Loomis, J. M., Newman, N. J., & Sharkey, T. J. (1985). Agreement between indirect measures of perceived distance. Perception & Psychophysics, 37, 17–27. Gogel, W. C., & Newton, R. E. (1976). An apparatus for the indirect measurement of perceived distance. Perceptual and Motor Skills, 43, 295–302. Gogel, W. C., & Tietz, J. D. (1979). A comparison of oculomotor and motion parallax cues of egocentric distance. Vision Research, 19, 1161–1170. Gogel, W. C., & Tietz, J. D. (1992). Determinants of the perception of sagittal motion. Perception & Psychophysics, 52, 75–96. Goodale, M. A., & Humphrey, G. K. (1998). The objects of action and perception. Cognition, 67, 181–207. Heeter, C. (1992). Being there: The subjective experience of presence. Presence: Teleoperators and Virtual Environments, 1, 262–271. Held, R., & Durlach, N. I. (1992). Telepresence. Presence: Teleoperators and Virtual Environments, 1, 109–112. Howard, I. P., & Rogers, B. J. (2002). Seeing in depth: Volume 2: Depth perception. Thornhill, Ontario: I. Porteous. Kilpatrick, F. P., & Ittelson, W. H. (1953). The size–distance invariance hypothesis. Psychological Review, 60, 223–231. Kline, P. B., & Witmer, B. G. (1996). Distance perception in virtual environments: Effects of field of view and surface texture at near distances. In Proceedings of the Human Factors and Ergonomics Society 40th Annual Meeting (pp. 112–116). Philadelphia: Human Factors and Ergonomics Society. Knapp, J. M. (1999). The visual perception of egocentric distance in virtual environments. Doctoral dissertation, University of California, Santa Barbara, Department of Psychology. Koffka, K. (1935). Principles of Gestalt psychology. London: Routledge & Kegan Paul.
Lee, D. N. (1980). Visuo–motor coordination in space-time. In G. E. Stelmach & J. Requin (Eds.), Tutorials in motor behavior (pp. 281–295). Amsterdam: North-Holland. Lee, D. N. (1993). Body–environment coupling. In U. Neisser (Ed.), The perceived self: Ecological and interpersonal sources of knowledge (pp. 43–67). New York: Cambridge University Press. Levin, C. A., & Haber, R. N. (1993). Visual angle as a determinant of perceived interobject distance. Perception & Psychophysics, 54, 250–259. Lombard, M., & Ditton, D. (1997). At the heart of it all: The concept of presence. Journal of Computer Mediated-Communication. [on-line], 3, 2. Available at http://www.ascusc.org/jcmc/ vol3/issue2/lombard.html Loomis, J. M. (1992). Distal attribution and presence. Presence: Teleoperators and Virtual Environments, 1, 113–119. Loomis, J. M., Da Silva, J. A., Fujita, N., & Fukusima, S. S. (1992). Visual space perception and visually directed action. Journal of Experimental Psychology: Human Perception and Performance, 18, 906–921. Loomis, J. M., Da Silva, J. A., Philbeck, J. W., & Fukusima, S. S. (1996) Visual perception of location and distance. Current Directions in Psychological Science, 5, 72–77. Loomis, J. M., Klatzky, R. L., Philbeck, J. W., & Golledge, R. G. (1998) Assessing auditory distance perception using perceptually directed action. Perception & Psychophysics, 60, 966–980. Loomis, J. M., & Philbeck, J. W. (1999). Is the anisotropy of perceived 3-D shape invariant across scale? Perception & Psychophysics, 61, 397–402. McCready, D. (1985). On size, distance, and visual angle perception. Perception & Psychophysics, 37, 323–334. Milner, A. D., & Goodale, M. A. (1995). The visual brain in action. New York: Oxford University Press. Ooi, T. L., Wu, B., & He, Z. J. (2001). Distance determined by the angular declination below the horizon. Nature, 414, 197–200. Oyama, T. (1977). Analysis of causal relations in the perceptual constancies. In W. Epstein (Ed.), Stability and constancy in visual perception: Mechanisms and processes (pp. 183–216). New York: Wiley. Philbeck, J. W., & Loomis, J. M. (1997). Comparison of two indicators of visually perceived egocentric distance under full-cue and reduced-cue conditions. Journal of Experimental Psychology: Human Perception and Performance, 23, 72–85. Philbeck, J. W., Loomis, J. M., & Beall, A. C. (1997). Visually perceived location is an invariant in the control of action. Perception & Psychophysics, 59, 601–612. Pirenne, M. H. (1970). Optics, painting, and photography. Cambridge, England: Cambridge University Press. Polanyi, M. (1964). Personal knowledge. New York: Harper & Row. Polanyi, M. (1970). What is a painting? British Journal of Aesthetics, 10, 225–236. Proffitt, D. R., Bhalla, M., Gossweiler, R., & Midgett, J. (1995). Perceiving geographical slant. Psychonomic Bulletin and Review, 2, 409–428. Psotka, J., Lewis, S. A., & King, D. (1998). Effects of field of view on judgments of self-location: Distortions in distance estimations even when the image geometry exactly fits the field of view. Presence: Teleoperators and Virtual Environments, 7, 352–369. Rieser, J. J., Ashmead, D. H., Talor, C. R., & Youngquist, G. A. (1990). Visual perception and the guidance of locomotion without vision to previously seen targets. Perception, 19, 675–689. Rolland, J. P., Gibson, W., & Arierly, D. (1995). Towards quantifying depth and size perception as a function of viewing distance. Presence: Teleoperators and Virtual Environments, 4, 24–49. Russell, B. (1948). 
Human knowledge: Its scope and limits. New York: Simon & Schuster. Sedgwick, H. A. (1980). The geometry of spatial layout in pictorial representation. In M. A. Hagen (Ed.), The perception of pictures: Vol. 1. Alberti’s window. New York: Academic Press.
Sedgwick, H. A. (1986). Space perception. In K. R. Boff, L. Kaufman, & J. P. Thomas (Eds.), Handbook of perception and human performance: Vol. 1. Sensory processes and perception (pp. 21.1–21.57). New York: Wiley. Sinai, M. J., Ooi, T. L., & He, Z. J. (1998). Terrain influences the accurate judgement of distance. Nature, 395, 497–500. Slater, M., Usoh, M., & Steed, A. (1994). Depth of presence in virtual environments. Presence: Teleoperators and Virtual Environments, 3, 130–144. Smith, P. C., & Smith, O. W. (1961). Ball throwing responses to photographically portrayed targets. Journal of Experimental Psychology, 62, 223–233. Smythies, J. (1994). The walls of Plato’s cave. Aldershot, England: Aveburg. Steenhuis, R. E., & Goodale, M. A. (1988). The effects of time and distance on accuracy of targetdirected locomotion: Does an accurate short-term memory for spatial location exist? Journal of Motor Behavior, 20, 399–415. Surdick, R. T., Davis, E. T., King, R. A., & Hodges, L. F. (1997). The perception of distance in simulated visual displays: A comparison of the effectiveness and accuracy of multiple depth cues across viewing distances. Presence: Teleoperators and Virtual Environments, 6, 513–531. Thomson, J. A. (1980). How do we use visual information to control locomotion? Trends in Neuroscience, 3, 247–250. Thomson, J. A. (1983). Is continuous visual monitoring necessary in visually guided locomotion? Journal of Experimental Psychology: Human Perception and Performance, 9, 427–443. Thompson, W. B., Willemsen, P., Gooch, A. A., Creem-Regehr, S. H., Loomis, J. M., & Beall, A. C. (submitted). Does the quality of the computer graphics matter when judging distances in visually immersive environments? Turner, J., & Braunstein, M. L. (1995). Size constancy in structure from motion. Perception, 24, 1155– 1164. Turvey, M. T. (1977). Preliminaries to a theory of action with reference to vision. In R. J. Shaw & J. Bransford (Eds.), Perceiving, acting, and knowing: Toward an ecological psychology. Hillsdale, NJ: Lawrence Erlbaum Associates. Turvey, M. T., & Carello, C. (1986). The ecological approach to perceiving–acting: A pictorial essay. Acta Psychologica, 63, 133–155. Turvey, M. T., & Remez, R. E. (1979). Visual control of locomotion in animals: An overview. In L. Harmon (Ed.), Interrelations of the communicative senses (pp. 275–295). Washington, DC: National Science Foundation. Warren, R., & Wertheim, A. H. (1990). Perception & control of self-motion. Hillsdale, NJ: Lawrence Erlbaum Associates. Warren, W. H. (1990). The perception–action coupling. In H. Block & B. I. Bertenthal (Eds.), Sensory– motor organizations and development in infancy and early childhood (NATO Advanced Science Institutes Series, D: Behavioral and Social Sciences, Vol. 56, pp. 23–37). Dordrecht, the Netherlands: Kluwer. Warren, W. H., & Whang, S. (1987). Visual guidance of walking through apertures: Body-scaled information for affordances. Journal of Experimental Psychology: Human Perception and Performance, 13, 371–383. Weiskrantz, L. (1986). Blindsight: A case study and implications. Oxford, England: Oxford University Press. Weiskrantz, L. (1990). Outlooks for blindsight: Explicit methodologies for implicit processes. Proceedings of the Royal Society (London), B239, 247–278. Witmer, B. G., & Kline, P. B. (1998). Judging perceived and traversed distance in virtual environments. Presence: Teleoperators and Virtual Environments, 7, 144–167. Witmer, B. G., & Sadowski, W. J., Jr. (1998). 
Nonvisually guided locomotion to a previously viewed target in real and virtual environments. Human Factors, 40, 478–488.
Wraga, M. A. (1999a). The role of eye height in perceiving affordances and object dimensions. Perception & Psychophysics, 61, 490–507. Wraga, M. A. (1999b). Using eye height in different postures to scale the heights of objects. Journal of Experimental Psychology: Human Perception and Performance, 25, 518–530. Yang, T. L., Dixon, M. W., & Proffitt, D. R. (1999). Seeing big things: Overestimation of heights is greater for real objects than for objects in pictures. Perception, 28, 445–467. Zahorik, P., & Jenison, R. L. (1998). Presence as being-in-the-world. Presence: Teleoperators and Virtual Environments, 7, 78–89. Zubek, J. P., Pushkar, D., Sansom, W., & Gowing, J. (1961). Perceptual changes after prolonged sensory isolation (darkness and silence). Canadian Journal of Psychology, 15, 85–100.
3
A Unified Approach to Presence and Motion Sickness

Jerrold D. Prothero
Human Interface Technology Laboratory, University of Washington
email:
[email protected]
Donald E. Parker*
Department of Otolaryngology, University of Washington
email:
[email protected]

*Correspondence regarding this chapter should be addressed to Donald E. Parker, Department of Otolaryngology-HNS, Box 357923, University of Washington, WA 98195-7923.

To say that the world is resting on the wheel of space or on the wheel of wind is not the truth of the self or the truth of others. Such a statement is based on a small view. People speak this way because they think that it must be impossible to exist without having a place on which to rest.
—Dogen, thirteenth century
We present a framework for considering some features of spatial perception in virtual environments. This framework is embodied in a general rest frame hypothesis (RFH), which states that a particular reference frame, the "rest frame," is selected as the comparator for spatial judgments. The RFH derives from previous work in numerous areas, including figure–ground studies, research on sensory integration, theoretical consideration of reference frames, and examination of visually induced self-motion. The RFH implies that simulator sickness should be reducible through visual background manipulations, specifically by providing an "independent" visual
background that matches inertial orientation and motion cues. A second implication is that the sense of presence, of “being in” a virtual environment, should correlate with rest frame manipulations. Specifically, presence should be increased by manipulations that facilitate perception of a virtual scene as a rest frame. Experiments described in this chapter assessed the hypothesis that making motion cues from an independent visual background congruent with inertial cues could reduce simulator sickness. When an inertially congruent visual background was available, decreased simulator sickness and increased per-exposure postural stability were observed. These results suggest that appropriate visual background manipulations may reduce the prevalent unwanted side effects of simulators and immersive virtual environments. Experiments to assess RFH implications for presence, described later, used two procedures: self-reported presence and visual–inertial “crossover,” defined as switching between conflicting visual and inertial cues for determining perceived self-motion (see Fig. 3.3). Results indicated that a meaningful virtual scene, as opposed to a nonmeaningful one, increased both reported presence and the level of inertial motion required to overcome perceived self-motion elicited by scene motion. The presence research introduces a procedure, possibly based on brain stem–level neural processing, to measure the effectiveness of visual virtual environments. Both lines of research may contribute to developing effective virtual interfaces that have the potential to increase the human–computer bandwidth, and thus to partially address the complexity explosion. Both simulator sickness and presence deal with complex psychological issues. It is likely that a useful approach to these problems will have to be based on fundamental principles and broad models of basic perceptual phenomena. Perhaps the RFH will provide one of those principles.
THE REST FRAME HYPOTHESIS

The RFH is a convenient means to summarize some of the literature on spatial perception. It derives from the observation that humans have a strong perception that certain things are stationary. From a mathematical point of view, relative motion between two entities (for instance, a train and the surface of the Earth) could be interpreted as implying that either (or neither) is stationary. We suggest that certain things are selected as being stationary in order to minimize calculations. For instance, if one is primarily concerned with moving from one place to another, it is useful to assume that the Earth's surface is stationary and to use it as the basis for spatial orientation and motion. Similarly, it is much more efficient to compute the motion of one's hand with respect to a room that is assumed to be stationary than the converse. Adoption of specific coordinate
systems to reduce complexity is a common strategy among mathematicians and physicists. To simplify the extraordinary difficulty of matching several sensory and motor spaces, the brain may be designed to use the intrinsic Euclidean frame of reference provided by the vestibular receptors. As noted by Berthoz (1991), this could reduce complexity of computation because what appears to be a three-dimensional problem is reduced, by projection on three orthogonal planes, to a two-dimensional problem. The RFH simply formalizes these considerations. Borrowing from physics, a coordinate system used to define positions, angular orientations, and motions is called a reference frame. The particular reference frame that a given observer takes to be stationary is called the rest frame for that observer. (As an example of a reference frame that is not a rest frame, consider a train that is perceived as moving through a landscape. One might note where a person is on the train. Because one is making a judgment with respect to the train, which is perceived as moving, the train acts as a moving reference frame.) The RFH is stated as follows. The brain has access to many rest frames. Under normal conditions, one of these is selected as the comparator for spatial judgments. We call this the selected rest frame. When equally attention-eliciting reference frames are in conflict, stable selection of a single rest frame may not be possible. The RFH derives from work in several areas, including figure–ground studies, sensory integration, reference frames, and visually induced self-motion. As noted later, the RFH has implications for both simulator sickness and the measurement of presence. Additional implications of the RFH, including the classification of spatial illusions and manipulation of level of presence, are presented in Prothero (1998).
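The computational economy of selecting a rest frame can be illustrated with a small numerical sketch. The values, the one-dimensional simplification, and the variable names below are ours, chosen only to make the point; they are not drawn from the chapter:

```python
import numpy as np

# Hypothetical 1-D example: a passenger walking inside a moving train.
# Positions are measured along the track, in meters, sampled once per second.
t = np.arange(5)                        # seconds
train = 20.0 * t                        # train moves at 20 m/s over the ground
passenger = train + 1.0 * t             # passenger walks at 1 m/s within the train

# Ground selected as the rest frame: both objects carry large motions.
passenger_re_ground = passenger         # [0, 21, 42, 63, 84]

# Train selected as the rest frame: the train is treated as stationary, so
# only the passenger's small relative displacement needs to be tracked.
passenger_re_train = passenger - train  # [0, 1, 2, 3, 4]

print(passenger_re_ground, passenger_re_train)
```

The relative judgment is the same either way; what changes is which quantities must be represented and updated, which is the sense in which selecting a rest frame "minimizes calculations."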
DERIVATION OF THE REST FRAME HYPOTHESIS

Figure–Ground Studies

Hebb (1949) suggested that the primitive unity of a figure might be its most fundamental feature. Hebb's view was supported by Senden's (1932) observations: following removal of cataracts and consequently seeing for the first time as adults, patients immediately reported seeing figures even though they were unable to distinguish between such simple forms as circles and triangles. Rubin (1915) postulated fundamental differences between figure and ground, including: figures have form whereas ground may be relatively formless; figure has the character of a "thing" whereas the ground does not; figures are more apt to suggest meaning; figure seems to be in front and ground seems to extend behind it. How are figures constituted, and what rules determine whether elements are grouped to form a figure? This question was addressed by the Gestalt
psychologist Wertheimer (1923), who suggested several well-known principles, including proximity, similarity, closure, continuation, symmetry, momentary set (Einstellung), and past experience. An enormous research effort has evaluated and extended Gestalt principles. One aspect of this effort concerns ambiguous and reversible figure–ground relationships (see Goldstein, 1984; Schiffman, 1990). Implications of the RFH described below follow from research indicating how stimulus properties and expectations can be manipulated to control figure–ground organization. Research has focused primarily on characteristics of figure, whereas ground has been defined mostly in terms of what it is not, that is, not thinglike, not formed, and not meaningful. For derivation of the RFH, the critical distinction between figure and ground may be that figure is in front and ground extends behind. This feature of ground appears to account for the observation that ground usually provides a rest frame, as suggested by rod-and-frame studies and visually induced self-motion (vection) research. Clearly, ground is neither formless nor meaningless. Our self-orientation and self-motion perceptual systems are able to extract information from ground.

Sensory Integration

Perception of self-orientation and self-motion is served by a multisensory system that receives inputs from inertial receptors, including the vestibular apparatus and somatic receptors, as well as from the eyes (Gibson, 1966; Parker, 1980, 1991; Precht, 1979). Changes in inertial receptor response are usually interpreted as altered self-motion or self-orientation. Appropriate manipulation of stimuli to inertial receptors may result in ecologically invalid self-motion perception, such as the "cross coupling" produced by pitching the head forward while rotating about an Earth-vertical axis (see Howard, 1982) or by skin pressure cues from a "g-seat" (McMillan, Martin, Flach, & Riccio, 1985). Similarly, visual-field flow is usually associated with self-motion. To the extent that the visual surround is interpreted as indicating what is stationary, as ground, visual flow with respect to an inertially stationary observer may elicit perceived self-motion. The phenomenon of visually induced perceived self-motion has been examined in numerous experiments and is the basis for many vehicle simulators (Rolfe & Staples, 1986). Neural integration of visual, inertial, and somatic stimuli occurs at several locations, including the vestibular nuclei and the parietoinsular cortex (see Cohen & Henn, 1988). For example, Waespe and Henn (1979) demonstrated that vestibular nucleus neurons might be excited by clockwise inertial rotation. Similar excitation from the same neuron may be evoked by counterclockwise rotation of the visual surround (an optokinetic drum). These multisensory neurons, which might underlie self-motion perception, apparently do not distinguish between inertial and visual motion stimuli. Further, because these neurons are in the brain stem,
self-motion perception associated with their response may provide a more "fundamental" measure of spatial perception than self-report procedures that are influenced by unpredictable cognitive factors. This is described in the later section entitled "Implications for Measuring Presence."

Reference Frames

Spatially oriented behavior presupposes a reference frame within which that behavior occurs. Numerous reference frames are available from the environment as well as from our bodies. Psychologists and neurophysiologists have long distinguished two fundamental types: egocentric frames, defined with respect to the person, and allocentric frames, defined by features of the environment (see Howard & Templeton, 1966). Due to the omnipresence of gravity, some suggest a third fundamental type, a geocentric frame (e.g., Paillard, 1971). Numerous subtypes can be differentiated within each fundamental type. For example, egocentric frames have been divided into oculocentric, headcentric, and bodycentric subtypes. Many people are familiar with the effects of a tilted room on one's perception of upright, a phenomenon that was examined quantitatively by Asch and Witkin (1948). Based on their work, subsequent studies have addressed "rod-and-frame" effects. Observers are said to be field dependent to the degree that a surrounding frame influences their setting of a rod with respect to gravity. These studies show that people weigh competing visual exocentric and gravicentric reference frames differently. People adopt different reference frames depending on their motor activity, environment, expectations, goals, and so on, and are able to switch between frames. This is particularly apparent during exposure to microgravity, where observers often assign position in space of visual objects in terms of a bodycentric frame (Friederici & Levelt, 1990). Astronauts who adopt a bodycentric frame report that the orientation of their own head and torso defines "up" when in microgravity; others take their "up" orientation from the exocentric frame provided by the visual scene (Harm & Parker, 1993). During the course of a space mission, most astronauts report an increasing tendency to adopt a bodycentric reference frame. Harm, Parker, Reschke, and Skinner (1998) reported that astronauts categorized as dependent on visual reference frames were more susceptible to space motion sickness and exhibited shorter latencies to onset of circular vection. This led to the as-yet-untested suggestion that teaching astronauts to switch between reference frames prior to flight might reduce space motion sickness. In summary, the RFH draws heavily on the extensive prior work concerning the necessity of reference frames for orientation, alternative frames, factors that lead to adoption of a particular frame in given circumstances, and the ability to switch between reference frames. Of course, visual reference frames underlie the phenomenon of visually induced self-motion, as described later.
Illusory Self-Motion (Vection)

Movement of a visual scene is fundamentally ambiguous: The motion may be attributed to the self or the visual surround. Raising or lowering a striped towel may elicit postural adjustments in supine infants. This simple observation suggests that the default is to attribute scene motion to the self. Mach (1875, 2001) investigated illusions of self-rotation (circular vection) by placing subjects at the center of a cylinder painted with vertical stripes. Similar illusions of self-movement along a straight path (linear vection) were examined by Fischer and Kornmuller (1930). Several investigators, including Brandt and his colleagues (1973, 1975) and Howard and his colleagues, have carefully examined characteristics of visual stimuli associated with circular and linear vection (described later). Traditionally, it was believed that a necessary condition for vection was stimulation of peripheral vision through a wide field-of-view display. Andersen and Braunstein (1985) showed that "central vection" (vection as a result of stimulating only the central visual field with stimuli subtending angles as small as 7.5 deg) was possible. A similar nondependence of vection on peripheral vision was reported by Howard and Heckmann (1989). The importance of visual background for perception of self-motion has been examined in numerous studies (e.g., Brandt, Dichgans, & Koenig, 1975; Ohmi & Howard, 1988; Ohmi, Howard, & Landolt, 1987). Visual background is also critical for several other perceptual phenomena, including induced motion of external objects (Wallach, 1959); tilt of one's self and of external objects (Howard, 1986); and judgment of spatial distances (Levine & Shefner, 1991). The influence of the visual background on the selected rest frame is understandable in that the visual background generally provides a large set of consistent spatial cues available for use as the comparator for spatial judgments. The likelihood that the moving part of a visual scene will be interpreted as background may be increased following various manipulations. For example, occluding part of a moving foreground may cause that moving part to be judged as background and, consequently, to evoke vection. Howard and Heckmann (1989) made essentially this point:

The configuration in which we obtained good centre-consistent vection, namely a moving scene seen at some distance through a window in a large stationary surround, is typical of situations in which the world is seen through the window of a moving vehicle. The fact that good vection may be obtained under these circumstances is a point to be borne in mind by those who wish to avoid the high cost of producing widefield displays to induce convincing sensations of self-motion in aircraft simulators.
Similarly, Mergner and Becker (1990) exposed subjects to a 30-deg × 30-deg vection stimulus in central vision. Vection was never reported when the limited field of view was created to give the impression that the display was not the visual
background (by putting a box with a small opening over the projection system). In contrast, when other background cues were removed and a mask worn on spectacles created the same field-of-view restriction, the subjects did report vection. In the latter case, the vection was qualitatively less than with full-field stimuli; however, subjects’ quantitative estimates were only slightly lower.
IMPLICATIONS OF THE REST FRAME HYPOTHESIS

Implications for Reducing Simulator Sickness

The inability to integrate visual and inertial motion cues has been suggested as one probable cause for motion sickness in simulators and virtual environments (e.g., Parker & Parker, 1990; Presence, 1992; Stanney, Kennedy, Drexler, & Harm, 1999). According to traditional sensory conflict theory, one type of motion sickness results from the inability to resolve conflicting visual and inertial motion cues. Implications of the RFH for reducing simulator sickness arise through a slight refinement of this theory. Part of the malaise experienced in a simulator is due to imperfections in the technology, including lag in updating the visual scene and geometric distortions. Another part would occur even if the display were technically perfect; this component may be called motion sickness. Consequently, motion sickness may be viewed as a component of simulator sickness. Following the sensory rearrangement approach, motion sickness may be experienced in a simulator because stimuli provided by the simulator are incongruent and/or do not meet expectations. For instance, a driving simulator without a motion platform provides strong visual self-motion cues but no inertial self-motion cues. The standard sensory rearrangement theory of motion sickness states: "All situations which provoke motion sickness are characterized by a condition of sensory rearrangement in which the motion signals transmitted by the eyes, the vestibular system and the non-vestibular proprioceptors are at variance either with one another or with what is expected from previous experience" (Griffin, 1990; Reason, 1970; Reason, 1978). Motion sickness in simulators is likely to become an increasingly serious problem as technological improvements result in highly attention-eliciting virtual reference frames that conflict with the inertial reference frame provided by the real world. The RFH suggests that rest frames are crucial to spatial perception. The inability to consistently select a particular rest frame should have serious consequences. This implies that motion sickness does not arise from conflicting orientation and motion cues per se, but rather from conflicting rest frames implied by those cues. That is, what is crucial is not the full set of cues in an environment, but rather how those cues are interpreted to influence one's sense of what is and is not stationary. Although this is only a slight refinement to sensory rearrangement theory, it suggests that attempts to reduce motion sickness may usefully focus on the particular stimuli
that influence the selected rest frame rather than on all orientation and motion stimuli. Previous research provides direction for this effort. Extensive research in visual perception, some of which is noted in this chapter, indicates that the selected rest frame is heavily influenced by the visual background, that is, that one's spatial comparisons tend to be made with respect to the visual background. The importance of the visual background to the selected rest frame suggests that it may be possible to significantly reduce simulator sickness by making the visual background of a simulator scene agree with the inertial cues. This suggests splitting the simulator scene into two parts: the simulator's usual "content of interest" in the foreground and, "behind" it, an "independent visual background" (IVB). The IVB can be made consistent with the inertial rest frame even if the foreground is not. Two experiments based on this suggestion are briefly summarized below (a schematic sketch of the scene-splitting idea follows their discussion). We have discussed implications of the RFH for reducing simulator sickness in the framework of a sensory conflict approach. However, those implications could also be developed using concepts from a perception–action theory of motion sickness similar to the one proposed by Riccio and Stoffregen (1991). Effective action presupposes that the actor is spatially oriented. Judgments to guide both perception and action can only be made with respect to some reference system. Information regarding what is stationary, what constitutes an ecologically appropriate rest frame, is critical for these judgments. Inappropriate rest frame selection may disrupt the perception–action loop, leading to disturbances, including postural instability, which Riccio and Stoffregen postulate is critical for motion sickness.

Implications of the RFH for Measuring Presence

Presence is a broadly defined term that encompasses several dimensions of experience (see Stanney & Salvendy, 1998). Schubert, Friedmann, and Regenbrecht (1999) recently reported results from a factor analytic study of subjective responses to virtual experience. "Spatial presence," defined as a sense of being in a place, along with "involvement" and "realness," were the highest loading items in the analysis. The "being in a place" definition of presence implies a sense of spatial orientation. Given the importance of the selected rest frame to spatial perception, the following "presence hypothesis" is suggested: The sense of presence in an environment reflects the degree to which that environment influences the selected rest frame. That is, presence in a virtual environment is related to the virtual environment's ability to influence the sense of position, angular orientation, and motion. This implies that the sense of presence could be measured by procedures that create a conflict between rest frame cues implied by the virtual and real environments. Presence should be indicated by the relative influence on the subject's motion perception of virtual as opposed to real rest frame cues, that is, the degree to which virtual cues overwhelm real cues.
As described by Carpenter-Smith, Futamura, and Parker (1995), inertial and visual motion cues that indicate different self-motion directions can be presented simultaneously. The subjects' responses indicate whether their perceived self-motion is determined by the visual or the inertial cues. By varying inertial stimulus velocity, the "point of subjective equality" ("crossover" between visual and inertial dominance) can be determined. Consequently, the effectiveness of different visual stimuli for eliciting self-motion can be scaled in terms of inertial stimulus velocity. The presence experiment presented later describes an extension and refinement of the procedure reported by Carpenter-Smith and his colleagues (1995). We also attempted to relate visually induced self-motion, indicated by the visual–inertial crossover procedure, to self-reported presence.
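To make the logic of the crossover procedure concrete, the sketch below runs a simplified up/down staircase over inertial peak velocity. The experiment described later used a PEST rule; the function names, starting value, and step sizes here are illustrative assumptions rather than the authors' parameters:

```python
def crossover_staircase(respond, v_start=10.0, step=8.0, min_step=1.0):
    """Estimate the inertial peak velocity (deg/s) at which reported
    self-motion switches from visual to inertial dominance.

    respond(v) runs one trial at inertial peak velocity v and returns
    'visual' or 'inertial', naming the cue that dominated the report.
    """
    v, last, history = v_start, None, []
    while step >= min_step:
        answer = respond(v)
        history.append((v, answer))
        if last is not None and answer != last:
            step /= 2.0                     # reversal: tighten the bracket
        # If vision still dominates, more inertial velocity is needed.
        v = v + step if answer == 'visual' else v - step
        last = answer
    return v, history

# Hypothetical observer whose true crossover lies near 22 deg/s.
estimate, trials = crossover_staircase(lambda v: 'visual' if v < 22 else 'inertial')
print(round(estimate, 1), len(trials))
```

Raising the inertial velocity after a visually dominated trial and lowering it after an inertially dominated trial brackets the velocity at which the two cues trade dominance, the "crossover amplitude" defined in the experiment below.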
SIMULATOR SICKNESS EXPERIMENTS

The research on simulator sickness, reported here in abbreviated form, was conducted in collaboration with Mark Draper. For a complete presentation, see Prothero, Draper, Furness, Parker, and Wells (1998) and Prothero (1998). This research addressed the following questions: (1) Can an independent visual background (IVB), which provides orientation and motion cues in agreement with inertial cues, reduce reported simulator sickness and related side effects; (2) if so, can it do so without reducing the subjective impact, as measured for example by reported vection, of the foreground scene; and (3) can an IVB be effective even when the subject's attention is directed to the visual foreground in which visual self-motion cues disagree with the inertial cues?

Methods

The foreground scene was a circular vection stimulus. This was created by placing a video camera on a tripod in an open plaza on the University of Washington campus and continuously rotating the camera in yaw with a period of 6 sec. In two experiments, the vection stimulus was shown for 3–4.5 min in a Virtual i/O i-glasses! head-mounted display. The video image appeared on a semitransparent surface. For the IVB (see-through) condition, the stationary laboratory wall was visible behind the video image; for the no-IVB (occluded) condition, a mask was placed behind the display surface so that the laboratory wall could not be seen (see Fig. 3.1). Subjects rated the rotating foreground stimulus as more visible than the visual background in the see-through condition. Response measures included a standard simulator sickness questionnaire (Kennedy, Lane, Berbaum, & Lilienthal, 1993) and per-exposure recording of postural instability. The postural instability measure involved having subjects adopt a "Sharpened Romberg" stance (one foot in front of the other, heel touching toe,
FIG. 3.1. A scene rotating at 60 deg/sec around the Earth-vertical axis was shown in an HMD. The IVB condition (A) provides a stationary real background, visible through the half-silvered mirror of the HMD, which is consistent with inertial cues detected by the stationary subject. The non-IVB condition (B) placed an occlusion behind the half-silvered mirror such that only the rotating scene displayed on the HMD was visible.
weight evenly distributed between the legs, arms folded across the chest, and chin up). The number of times a subject had to break this stance while viewing the circular vection stimulus was the postural instability measure. A postexposure self-reported vection rating was also recorded to assess the "subjective impact" of the foreground scene. In the second of two experiments, a visual task was added to force the subject's attention to the visual foreground. The task consisted of calling out the colors of signs shown in the videotape. It served to test whether an IVB would be effective even when attention was forced to the rotating visual foreground.

Results

The IVB condition was found to produce significantly higher postural stability in both experiments. Self-reported simulator sickness symptoms were lower for the IVB condition only in the first experiment. No differences were found in reported vection between the two conditions in either experiment.

Discussion

The results suggest that an IVB may reduce simulator sickness, in accordance with the theoretical arguments presented earlier. It was encouraging that vection reports did not differ between the IVB and non-IVB conditions in either experiment. This suggests that it may be possible to avoid some of the problems associated with simulators without reducing subjective impact, as discussed by Prothero (1998). These experiments used a narrow field-of-view "low-end" system. Future research should address applicability of the IVB approach to high-end systems, such as advanced driving and aircraft simulators.
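The scene-splitting idea behind the IVB conditions can be sketched as a two-layer render loop: the foreground follows the simulated (here, videotaped) motion, while the background layer is driven only by the tracked head pose so that it remains stationary with respect to the inertial world, much as the laboratory wall did in the see-through condition. This is an illustrative sketch under our own assumptions (a yaw-only 2-D rotation and placeholder function names), not the optics of the see-through display actually used:

```python
import numpy as np

def yaw(deg):
    """2-D rotation matrix about the vertical axis (yaw only, for brevity)."""
    r = np.radians(deg)
    return np.array([[np.cos(r), -np.sin(r)],
                     [np.sin(r),  np.cos(r)]])

def frame_poses(t, head_yaw_deg, scene_yaw_rate_deg_s=60.0):
    # Foreground ("content of interest"): rotates at the scene's own rate,
    # so its motion cues can disagree with what the inertial senses report.
    foreground = yaw(scene_yaw_rate_deg_s * t - head_yaw_deg)
    # Independent visual background: compensates only for head rotation,
    # so it stays fixed in the real (inertial) world.
    background = yaw(-head_yaw_deg)
    return background, foreground      # background is drawn first, behind

bg, fg = frame_poses(t=2.0, head_yaw_deg=5.0)
```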
PRESENCE MEASUREMENT EXPERIMENT

The research described in this section was designed to support development of an "objective" presence measure. The degree to which visual self-motion cues overwhelmed conflicting inertial self-motion cues was determined using a "crossover" procedure. Both inertial and visual rotation were around the Earth-vertical axis; conflict was restricted to the horizontal plane to avoid the strong inertial cues provided by gravity. We postulated that this procedure might provide a "fundamental" measure of presence, a measure less contaminated by uncontrolled cognitive variables than magnitude estimation. This study addressed the following questions: (1) does a meaningful visual scene influence the sense of self-motion more than a nonmeaningful scene composed of random elements; (2) what is the relationship between the perceptual crossover measure and self-reported presence; and (3) are the crossover and presence measures reliable? The first question asks whether the crossover and presence measures are influenced by visual scene content. We predicted that the meaningful stimulus would increase the sense of presence. If so, the meaningful stimulus should have a greater influence on self-motion; that is, more intense inertial rotation would be required to overwhelm visual rotation. The second question asks to what degree the crossover and presence measures are equivalent. Both measures may imperfectly reflect subjective presence: The crossover measure may introduce error because of failure to assess accurately the subject's cognitive state; the self-reported presence measure may err due to well-known limitations of magnitude estimation procedures. The third question concerns reliability. The validity of a measure is limited by its test–retest correlation, that is, by the degree to which it consistently gives the same answer for the same conditions.

Methods

The within-subjects experimental design featured two levels of visual condition (meaningful/nonmeaningful) compared in each of two sessions. The two sessions were usually separated by days or weeks and never by less than 4 hours. The dependent variables were the crossover measure and self-reported presence, as described later. Twelve participants were seated upright in a chair that oscillated sinusoidally around an Earth-vertical axis (see Figs. 3.2 and 3.3). They wore a Virtual Research VR4 head-mounted display that had a 48-deg field of view. Head tracking was not used. The scene was displayed to both eyes but was not stereoscopic. To keep the frame rate high (60 frames per sec), the resolution was lowered to 240 × 320 pixels. Both the chair and the visual scene were oscillated sinusoidally at 0.1 Hz. The visual velocity was fixed at an amplitude of 30 deg/sec throughout the experiment. The image used was stored on a standard personal computer using Virtual TV (VTV) software from Warp, Ltd. This allows a very large image to be stored
FIG. 3.2. The rotating chair used in the visual–inertial crossover experiments. It permitted sinusoidal oscillations in the horizontal plane under computer control.
in memory and different parts of it to be quickly indexed. Thus, visual oscillations were implemented by indexing different parts of the image according to a fixed schedule rather than by the slower procedure of computing the image dynamically. In each session, participants were exposed to two visual conditions: a scene from Maui (meaningful condition) and the same set of pixels randomized spatially (nonmeaningful condition). At the beginning of each session, per-exposure presence ratings were obtained. These were taken following a 1-min period, during which participants were asked to look around and gather their impressions of the scene. When presence was reported, inertial and visual self-motion cues were congruent. Participants were read the following presence question: "I feel like I am in: One equals the laboratory wearing a head-mounted display; seven equals the virtual world shown by the head-mounted display." This question was thought to most directly capture the informal definition of presence as the sense of "being in" a virtual world. Higher values on this 1–7 scale indicated higher levels of reported presence in the virtual environment. To measure visual–inertial crossover, the self-motion implied by visual oscillation was set to lag the self-motion implied by chair (inertial) motion by 90 deg. Trials for the meaningful versus nonmeaningful visual scene conditions alternated using ABBA or BAAB sequences. Before each trial, the chair was started from rest and subjects were asked to close their eyes and count down by sevens from an arbitrary number for 25 sec. This served to keep them from "locking in" to the chair motion. They were then asked to open their eyes and continue counting for
FIG. 3.3. Presence Measure Experimental Configuration. (A) Subject is seated in a chair which oscillates in yaw (i.e., around the Earth-vertical axis) while wearing an HMD. The HMD shows a scene that also oscillates in yaw. (B) In the real world, when one turns to the right the visual field flows to the left. The sense of self-motion is toward the right. (C) The visual motion in the HMD can be set to turn in the same direction as the inertial motion. In this case, the inertial cues make one think one is moving to the right, whereas the visual cues make one think one is moving to the left. The sense of self-motion depends on their relative strength. (D) Any visual–inertial phase angle is possible.
an additional 25 sec. This gave the visual cues time to evoke vection. (Latencies to "saturated" vection are on the order of tens of seconds; e.g., Wong & Frost, 1978.) Finally, they were asked to stop counting, continue watching the visual scene, and to signal, by switching a toggle, their perception of the right/left extremes (peaks) of chair motion. These data allowed us to determine whether subjects' responses were determined by inertial (toggle switches in phase with chair motion) or visual stimuli (toggle switches in phase with visual scene motion).
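A minimal sketch of the stimulus geometry and of one plausible way to score the toggle responses is given below. The scoring rule (assigning each press to the nearer set of motion extremes) and all names are our own assumptions, since the chapter does not specify how phase agreement was computed:

```python
import numpy as np

F = 0.1                                   # oscillation frequency (Hz)
PERIOD = 1.0 / F                          # 10 s
VIS_PEAK_VEL = 30.0                       # visual peak velocity (deg/s)

def yaw_angle(t, peak_vel, phase=0.0):
    """Sinusoidal yaw whose *velocity* amplitude is peak_vel (deg/s)."""
    amp = peak_vel / (2 * np.pi * F)      # position amplitude (~48 deg at 30 deg/s)
    return amp * np.sin(2 * np.pi * F * t + phase)

# chair motion:  yaw_angle(t, inertial_peak_vel)
# scene motion:  yaw_angle(t, VIS_PEAK_VEL, phase=-np.pi / 2)   # lags chair by 90 deg
print(yaw_angle(2.5, VIS_PEAK_VEL))       # scene position amplitude check

def classify_trial(toggle_times):
    """'inertial' if toggle presses fall nearer the chair's motion extremes
    than the scene's, otherwise 'visual' (simplified scoring rule)."""
    chair_peaks = np.arange(PERIOD / 4, 200.0, PERIOD / 2)   # chair extremes (s)
    scene_peaks = chair_peaks + PERIOD / 4                   # scene extremes, 90 deg later
    t = np.asarray(toggle_times)[:, None]
    d_chair = np.abs(t - chair_peaks).min(axis=1).mean()
    d_scene = np.abs(t - scene_peaks).min(axis=1).mean()
    return 'inertial' if d_chair < d_scene else 'visual'

print(classify_trial([27.5, 32.5, 37.5, 42.5]))    # presses at chair extremes
```

With the scene lagging the chair by 90 deg, the two sets of motion extremes are separated by a quarter period (2.5 sec at 0.1 Hz), so each press can be assigned unambiguously to one reference frame or the other.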
Following each trial, the chair was returned smoothly to rest. The peak inertial velocity for the next trial was higher than for the previous trial if the subject's responses had been in phase with visual motion and lower if the responses had been in phase with chair motion. The size of the inertial velocity adjustments was determined by a PEST procedure (Taylor, 1967). The data produced were the ranges of inertial velocities in which the subject switched between visual and inertial "dominance" for the meaningful and nonmeaningful visual conditions. The midpoint of this range is referred to as the "crossover amplitude." A higher inertial crossover velocity implies a greater influence of the visual stimulus on the sense of self-motion. The range was determined to within at most 5 deg/sec peak velocity, except for a few cases in which this was impossible due to scheduling or equipment problems.

Results

Separate analyses of variance (ANOVAs) were computed for the crossover and presence data. Factors were visual scene (meaningful/nonmeaningful) and session number (first/second). These factors were tested using their interaction with the subjects factor as the error term. Inertial crossover velocity was higher for the meaningful visual scene condition than for the nonmeaningful condition (p < 0.05), as predicted. Reported presence tended to be greater for the meaningful visual scene condition, but the difference was not statistically significant (p < 0.10). The correlation between the crossover and presence data across subjects was 0.06. (This was computed by comparing all data between the two measures with matched subject identifier, treatment condition, and session number.) However, the correlation of the difference between visual scene conditions for the crossover measure with the difference between conditions for the presence measure was 0.38. (This was computed by finding the differences across treatment conditions for matched subjects and session numbers, and then correlating these differences across measures.) This correlation approached statistical significance (p < 0.07). The test–retest correlations were quite high for both measures: 0.83 for the crossover measure and 0.80 for the reported presence measure.

Discussion

As predicted, both the crossover and presence measures were higher for the meaningful visual scene than for the nonmeaningful one, although this was only a trend for reported presence. These measures appear to be assessing a common factor; however, the data also suggest that unique factors contribute to each. Both measures showed reasonable test–retest reliability. It is not surprising that no relationship (r = 0.06) was found between the crossover and presence measures. This is consistent with the lack of a standard
scale across subjects for assignment of numbers to mental states. One person's presence rating of 2 need not bear much relationship to the strength of subjective presence felt by someone else who reported the same presence rating of 2. However, a weak relationship (r = 0.38) was found for differences in visual scene conditions across the two measures. This suggests that subjects who reported large presence differences between scene conditions had a tendency to also show large differences between the two conditions on the crossover measure. In summary, this preliminary study has shown that the visual–inertial crossover measure is capable of finding an effect of visual scene in the predicted direction that agrees with a trend found by a presence measure.
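The two correlations discussed above can be illustrated with a small sketch. The numbers are hypothetical, chosen only to show the two computations; they are not the study's data:

```python
import numpy as np

# Hypothetical per-session scores, one entry per (subject, session), for the
# meaningful (M) and nonmeaningful (N) scene conditions.
cross_M = np.array([24., 18., 30., 22., 27., 20.])   # crossover velocity (deg/s)
cross_N = np.array([19., 17., 24., 21., 22., 18.])
pres_M  = np.array([5., 4., 6., 4., 5., 3.])          # 1-7 presence ratings
pres_N  = np.array([4., 4., 4., 3., 4., 3.])

def r(x, y):
    return np.corrcoef(x, y)[0, 1]

# (1) Raw correlation: pool every matched observation across conditions.
raw_r = r(np.concatenate([cross_M, cross_N]),
          np.concatenate([pres_M, pres_N]))

# (2) Correlation of condition differences (meaningful minus nonmeaningful),
#     computed within each subject/session, then correlated across measures.
diff_r = r(cross_M - cross_N, pres_M - pres_N)

print(round(raw_r, 2), round(diff_r, 2))
```

Pooling all matched observations mixes in each subject's idiosyncratic use of the rating scale, whereas differencing the two conditions within each subject and session removes that scaling before correlating across measures, which is why the two correlations can diverge as they did here.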
GENERAL DISCUSSION

Virtual environments have the potential to increase the human–computer communication bandwidth by mimicking natural stimulation of sensory channels. However, use of virtual interfaces raises two fundamental problems. First, because they effectively stimulate human perceptual systems, virtual interfaces have the capability to create confusion with unpleasant consequences, including simulator sickness. The second problem is the lack of robust measures for the quality of interfaces, without which it is difficult to build the knowledge needed for systematic and high-quality interface engineering. We suggest that a useful (if partial) measure for the quality of an interface is the degree to which it can induce presence. Easily used, intuitive interfaces—interfaces that provide a consistent set of stimuli that match sensory capabilities—may also increase the sense of presence. Conversely, a good measure for presence should tell us something about the quality of the interface. There is as yet no agreed-on definition of "presence." Some emphasize "cognitive" features, others focus on "behavioral" properties (e.g., Stanney & Salvendy, 1998). In this chapter, we have emphasized the spatial orientation property of presence for two reasons. First, spatial orientation is a fundamental requirement for ecologically appropriate behavior (Gibson, 1966; Howard & Templeton, 1966). One cannot act if one is disoriented. Second, as discussed below, measures of spatial orientation may permit reliable, valid determination of at least one basic feature of presence.

Advantages of Visual–Inertial Crossover for Measuring Presence

Assessment of presence may be useful for evaluating interfaces. The crossover procedure described in this chapter could provide an objective, reliable, and valid means for scaling presence. This procedure has several possible advantages over a presence measure based on magnitude estimation.
Numeric verbal magnitude estimation of presence is likely to produce results of marginal validity. Following its introduction by Stevens (1961), magnitude estimation procedures were widely used and produced many useful observations. However, numerous limitations soon became apparent (Falmagne, 1986; Poulton, 1968). One of these is the “range effect” (Teghtsoonian, 1971): Subjects’ numerical ratings are strongly influenced by the range of physical stimuli to which they are exposed. The possible influence of range effects can be controlled or evaluated for cases where the physical stimulus dimension is well described, for example, when relating physical sound intensity to perceived loudness. Similar control or evaluation is not available when the domain for which verbal magnitude estimations are being provided is not described. Presence is a product of the observer’s cognitive processes; the physical manipulations that appropriately alter perceived presence are only vaguely known. Consequently, we are unable to evaluate possible range effects when performing magnitude estimation of presence. (Similar limitations are encountered when using magnitude estimation for assessment of cognitively mediated percepts, such as pain and motion sickness.)

A second difficulty with magnitude estimation is anchor effects (Tversky & Kahneman, 1974). The values subjects assign to a given condition may depend on the conditions to which it is compared. Range and anchor effects have serious consequences for magnitude estimation. These effects sharply limit our ability to draw valid conclusions from comparisons between data gathered in separate experiments with different conditions.

Because the perceptual crossover procedure may be more deeply rooted in central nervous system processes than cognition-based magnitude estimation, crossover amplitudes found in different experiments may be less distorted by uncontrolled cognitive effects than are presence magnitude estimates. Consequently, use of the crossover procedure may allow knowledge to be built up systematically by pooling data across experiments rather than being limited to single experiments.

The prediction that the crossover measure is less prone to uncontrolled cognitive effects than self-reported presence estimates could be tested as follows. Condition A could be compared to Condition B in one experiment, and that same Condition A could be compared to a quite different Condition C in a second experiment with different subjects. The prediction would be that crossover values would be similar for both experiments for Condition A. However, significantly different presence estimates would be reported in the two experiments due to anchor and/or range effects. Variables known to influence presence should be manipulated in these experiments.

There are certainly limitations to presence as a measure for the quality of a computer–user interface. It is a general and somewhat ambiguous measure; it does not replace task-specific performance measures. And higher presence is not always useful. For instance, if the task requires switching rapidly between two displays, high presence in either may impede a smooth transfer. Nevertheless,
presence is an important variable to study in the search for interface-goodness measures.

Design and Evaluation of Virtual Interfaces

Underlying the drive toward more advanced interfaces is the need to manage complex information. The wealth of the industrial economies is increasingly in the form of information rather than of physical materials. In the United States, information services have grown relative to the rest of the economy since 1860, with a sharp acceleration after 1940 (Katz, 1988). A monograph surveying the underlying trends and projecting their implications can be found in Prothero (1996).

Virtual interfaces are one means to address the complexity problem. By using more of the human sensory capability and making use of natural human interaction modalities, virtual interfaces have the potential to increase the human–computer bandwidth. Harnessing this potential may depend on avoiding simulator sickness and measuring the quality of virtual interfaces.
SUMMARY AND CONCLUSIONS

This chapter introduced the rest frame hypothesis as a means to summarize some of the literature on spatial perception. Building on the rest frame hypothesis, techniques are suggested for reducing simulator sickness and for measuring presence with a visual–inertial crossover procedure. It is suggested that motion sickness (and the component of simulator sickness due to motion sickness) arises from conflicting rest frames rather than from conflicting motion cues per se. Consequently, to reduce simulator sickness, it may be sufficient to remove the particular motion cue conflicts that indicate conflicting rest frames rather than removing all conflicting motion cues. This line of thought leads to the suggested use of independent visual backgrounds. It is also suggested that presence reflects the degree to which the selected rest frame is determined by the virtual interface. This implies that presence can be measured by perceptual experiments that gauge the degree to which the selected rest frame is determined by the virtual interface as opposed to the real environment. Research using a visual–inertial crossover procedure is described. This procedure may be preferable to magnitude estimation as a presence measure in that the former may be more deeply based in the function of the nervous system.

Future work on the independent visual background technique should examine its applicability to high-end simulators, where simulator sickness is a more serious problem than in the low-end system used in the research discussed here. Future work on presence measures should involve both the systematic application of the crossover procedure to interface issues and the development of perceptual
techniques that are less cumbersome than the one introduced here. A colleague made the following interesting suggestion: Presence is known to be increased by the addition of auditory cues. Therefore, if auditory cues were added to the environment (thereby increasing presence), one would expect that greater inertial motion would be required to produce crossover.
ACKNOWLEDGMENTS

As mentioned in the main text, the simulator sickness research reported here was conducted in collaboration with Mark Draper. Others who contributed ideas to the research reported include Thomas Furness, John Jahnke, Robert Welch, and Maxwell Wells. This research was supported in part by the Air Force Office of Scientific Research Grant F49620-93-1-0339, and NASA grants NAG5-4074 and 9-958.
REFERENCES Andersen, G., & Braunstein, M. (1985). Induced self-motion in central vision. Journal of Experimental Psychology, 11, 122–132. Asch, S., & Witkin, H. (1948). Studies in space orientation II: Perception of the upright with displaced visual fields and with body tilted. Journal of Experimental Psychology 38, 455–477. Berthoz, A. (1991). Reference frames for the perception and control of movement. In J. Paillard (Ed.), Brain and space (pp. 81–111). Oxford, England: Oxford University Press. Brandt, T., Wist, E., & Dichgans, J. (1975). Foreground and background in dynamic spatial orientation. Perception & Psychophysics, 17, 497–503. Brandt, T., Dichgans, J., & Koenig, E. (1973). Differential effects of central and peripheral vision on egocentric and exocentric motion perception. Experimental Brain Research, 16, 476–491. Carpenter-Smith, T., Futamura, R., & Parker, D. (1995). Inertial acceleration as a measure of linear vection: An alternative to magnitude estimation. Perception & Psychophysics, 57, 35–42. Cohen, B., & Henn, V. (1988). Representation of three-dimensional space in the vestibular, oculomotor and visual systems. New York: New York Academy of Sciences. Falmagne, J. (1986). Psychophysical measurement and theory. In K. Boff, L. Kaufman, & J. Thomas (Eds.), Handbook of perception and human performance (pp. 1-1–1-66). New York: Wiley. Fischer, M., & Kornmuller, A. (1930). Optokinetisch ausgeloeste Bewegungswahrnehmung und optokinetischer Nystagmus. Journal von Psychologie und Neurologie, 41, 273–308. Frederici, A., & Levelt W. (1990). Spatial reference in weightlessness: Perceptual factors and mental representations. Perception & Psychophysics, 47, 253–266. Gibson, J. (1966). The senses considered as perceptual systems. Boston: Houghton Mifflin. Goldstein E. (1984). Sensation and perception. Belmont, CA: Wadsworth. Griffin, M. (1990). Handbook of human vibration. London: Academic Press. Harm, D., & Parker, D. (1993). Perceived self-orientation and self-motion in microgravity, after landing and during preflight adaptation training. Journal of Vestibular Research, 3, 297–305. Harm D., Parker D., Reschke M., & Skinner N. (1998). Relationship between selected orientation rest frame, circular vection and space motion sickness. Brain Research Bulletin, 47, 497–501. Hebb, D. (1949). The organization of behavior. New York: Wiley.
Howard, I. (1982). Human visual orientation. New York: Wiley. Howard, I. (1986). The perception of posture, self-motion and the visual vertical. In K. Boff, L. Kaufman, & J. Thomas (Eds.), Handbook of perception and human performance (pp. 18-1– 18-62). New York: Wiley. Howard, I., & Heckmann, T. (1989). Circular vection as a function of the relative sizes, distances and positions of two competing visual displays. Perception, 18, 657–665. Howard, I., & Templeton W. (1966). Human spatial orientation. London: Wiley. Katz, R. (1988). The information society: An international perspective. New York: Praeger. Kennedy, R., Lane, N., Berbaum, K., & Lilienthal, M. (1993). Simulator sickness questionnaire: an enhanced method for quantifying simulator sickness. International Journal of Aviation Psychology, 3, 203–220. Levine, M., & Shefner, J. (1991). Fundamentals of sensation and perception. Pacific Grove, CA: Brooks/Cole. Mach, E. (1875). Grundlinien der Lehre von der Bewegungsempfindungen. Leipzig: Engelmann. Mach, E. (2001). Fundamentals of the theory of movement perception, Translated by Young, L., Henn, V. and Scherberger, H. New York: Klumer Academic/Plenum Publishers. McMillan, G., Martin, E., Flach, J., & Riccio, G. (1985). Advanced dynamic seats: an alternative to platform motion? In Proceedings of the 7th Interservice/Industry Technical Equipment Conference. Arlington, VA: American Defense Preparedness Association, 37–51. Mergner, T., & Becker, W. (1990). Perception of horizontal self-rotation: multisensory and cognitive aspects. In R. Warren & A. Wertheim (Eds.), Perception and control of self-motion (pp. 219–263). Hillsdale, NJ: Lawrence Erlbaum Associates. Ohmi, H., & Howard, I. (1988). Effect of stationary objects on illusory forward self-motion induced by a looming display. Perception, 17, 5–12. Ohmi, M., Howard, I., & Landolt, J. (1987). Circular vection as a function of foreground–background relationships. Perception, 16, 17–22. Paillard, J. (1971). Les determinants moteurs de l’organisation spatiale. Cahiers de Psychologie, 14, 261–316. Parker, D. (1980). The vestibular apparatus. Scientific American, 243, 118. Parker, D. (1991). Human vestibular function and weightlessness. The Journal of Clinical Pharmacology, 31, 904–910. Parker, D., & Parker, K. (1990). Adaptation to the simulated rearrangement of weightlessness. In G. Crampton (Ed.), Motion and space sickness (pp. 247–262), Boca Raton, FL: CRC. Poulton, E. (1968). The new psychophysics: six models for magnitude estimation. Psychological Bulletin, 69, 1–19. Precht, W. (1979). Vestibular mechanisms. Annual Review Neuroscience, 2, 265–89. Presence (1992). Spotlight on simulator sickness. Presence, 1, 295–363. Prothero, J., Draper, M., Furness, T., Parker, D., & Wells, M. (1998). The use of an independent visual background to reduce simulator side-effects. Aviation, Space Environmental Medicine, 70, 277– 283. Prothero, J. (1996). The political science of the internet. [Online, Human Interface Technology Laboratory Tech. Rep. No. R-96-2]. Available: http://www.hitl.washington.edu Prothero, J. (1998). The role of rest frames in vection, presence and motion sickness [Online]. Doctoral dissertation, University of Washington. Available: http://www.hitl.washington.edu Reason, J. (1970). Motion sickness: A special case of sensory rearrangement. Advancement of Science, 386–393. Reason, J. (1978). Motion sickness adaptation: A neural mismatch model. Journal of the Royal Society of Medicine, 71, 819–829. Riccio, G., & Stoffregen, T. (1991). 
An ecological theory of motion sickness and postural instability. Ecological Psychology, 3, 195–240. Rolfe, J., & Staples, K. (1986). Flight simulation. Cambridge, England: Cambridge University Press.
Rubin, E. (1915). Synsoplevede figurer. Copenhagen, Denmark: Gyldendalska. Schiffman, H. (1990). Sensation and perception. New York: Wiley. Schubert, T., Friedmann, F., & Regenbrecht, H. (1999). Decomposing the sense of presence: Factor analytic insights. In Proceedings of the Second International Workshop on Presence. Colchester, England: University of Essex, 1–6. Senden, M. v. (1932). Raum- und Gestaltauffassung bei operierten Blindgebornenen vor und nach der Operation. Leipzig, Germany: Barth. Stevens, S. (1961). To honor Fechner and repeal his law. Science, 133, 80–86. Stanney, K., Kennedy, R., Drexler, J., & Harm, D. (1999). Motion sickness and proprioceptive aftereffects following virtual environment exposure. Applied Ergonomics, 30, 27–38. Stanney, K., & Salvendy, G. (1998). Aftereffects and sense of presence in virtual environments: formulation of a research and development agenda. International Journal of Human–Computer Interaction, 10, 135–187. Taylor, M. (1967). PEST: efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782–787. Teghtsoonian, R. (1971). On the exponents in Stevens’ power law and the constant in Ekman’s law. Psychological Review, 78, 71–80. Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124–1131. Waespe, W., & Henn, V. (1979). The velocity response of vestibular nucleus neurons during vestibular, visual and combined angular accelerations. Experimental Brain Research, 37, 337–347. Wallach, H. (1959). The perception of motion. Scientific American, 222. Wertheimer, M. (1923). Untersuchungen zue Lehre von der Gestalt II. Psychologisch Forschung, 4, 301–350. Wong, S., & Frost, B. (1978). Subjective motion and acceleration induced by the movement of the observer’s entire visual field. Perception & Psychophysics, 24, 115–120.
4 Transfer of Training in Virtual Environments: Issues for Human Performance

Marc M. Sebrechts (The Catholic University of America), Corinna Lathan (AnthroTronix, Inc.), Deborah M. Clawson (The Catholic University of America), Michael S. Miller (National Defense University), and Cheryl Trepagnier (National Rehabilitation Hospital)
Transfer is an important aspect of training that emphasizes the ability to use what is learned in one context when performing in another context or at another time. Virtual environments (VEs) have been proposed as a particularly promising means to provide such training because they enable substantial control of the relations between training and testing contexts. This chapter examines those prospects in relation to the general psychological issues of transfer. First, we review the key concepts related to transfer using both traditional and VE approaches. We then provide examples of potential use of VE for skill transfer in three areas—navigation, medical simulation, and rehabilitation—and point out some key issues in assessing transfer in each of those contexts.

*Correspondence regarding this chapter should be addressed to Marc M. Sebrechts, Ph.D., Department of Psychology, The Catholic University of America, 4001 Harewood Road NE, Washington DC 20064. E-mail: [email protected]
There is a long-standing belief that mental tasks can be thought of as analogous to physical ones. If we train the mental muscles, we will produce general improvement in reasoning on other tasks. This view was prevalent among a number of psychologists in the early part of the twentieth century and was known as the doctrine of formal discipline (Angell, 1908; Woodrow, 1927). The mind was thought to have general faculties, which, when exercised appropriately, would lead to fairly broad transfer. Although this view has been frequently used to drive educational policy and practice, it has proven difficult to provide any convincing data demonstrating this type of generalization. In fact, a number of early empirical studies supported a surprising degree of specificity in the case of skill transfer (e.g., Thorndike & Woodworth, 1901). Based on these results, Thorndike (1906) proposed an alternative view, the theory of identical elements. Apparent transfer between different tasks occurred not because of a common strength of general reasoning, but because two related tasks shared identical elements.

A number of more recent studies have supported the finding of specificity of transfer. Chase and Ericsson (1982), for example, trained a person, referred to as S.F., to increase short-term digit recall from the typical digit span of about seven or eight items to 81 digits after some 200 hours of practice. However, this memory span increase did not transfer from digits to letters.

Even when given relevant information on problem solving, people have substantial difficulty applying the solution to another context. In studies of analogy, for example, people have shown relatively little spontaneous transfer. One classic example is a problem posed by Duncker (1945): A patient has a malignant tumor, but an x-ray powerful enough to destroy the tumor will destroy the intervening healthy tissue. Surgery is not an option. What should be done?
In the absence of other information, the solution rate on this problem is very low. The effect of prior information was examined by Gick and Holyoak (1980), who provided people ahead of time with another analogous problem. In this alternative problem, a general must attack a fortress, but the roads into the fortress have been mined to allow light traffic but not a substantial force. Even when people have received the solution to this problem—divide up the army into smaller, lighter forces and attack simultaneously along several roads—transfer to the x-ray problem is poor. However, if people are told of the relationship between the problems, then the solution of using several lower intensity x-rays that converge on the tumor is readily seen. A number of other studies have confirmed that spontaneous transfer rates are low, and in many instances even aided transfer is low. Presumably specific relationships need to be recognized for transfer to occur. It is interesting that specificity of transfer is also exhibited in the inappropriate generalization of procedural sequences. Once one learns a specific solution
procedure, there is a tendency to apply that same solution even when it is inappropriate or inefficient. This is known as the mechanization of thought, or Einstellung effect, as illustrated in Luchins’ water-jug problem (Luchins, 1942; Luchins & Luchins, 1959). In the original study, subjects were given a series of problems in which they needed to use three different-size jugs (A, B, and C) to get a desired amount of water. The first several problems could be solved by filling jug B and then pouring out some water to fill jug A and some to fill jug C twice (Answer = B–A–2C). Although later problems could be solved by a simpler procedure, A – C, most people continued to use the more complex procedure when possible and failed to solve the problem when the first procedure they had used would not work. The extreme form of this view is that only the specific task is learned. Transfer to another task occurs only if two tasks have identical elements or components. Singley and Anderson (1989) have suggested that this identical elements view of transfer is too restrictive. Transfer does not require surface identity in two tasks but a commonality of logical structure. Thus, for example, the ability to write computer programs in one language (such as Pascal) may facilitate the ability to write computer programs in another related language (such as C) even though the specific representations of individual instructions differ. In sum, the literature suggests strongly that training may be better to the extent that the learning conditions and the performance conditions are closely matched. The precise character of that match and what counts as essential to the match is still an issue. However, research on the learning of specific events, or episodic memory, suggests that everything, including the learning context, is relevant (Tulving, 1983). At the same time, the context that the learner brings to the situation, as well as their prior knowledge, is dynamic, so that in this sense the learning situation is never identical to the performance context. Perhaps the closest training–performance match is achieved by on-the-job training, where as many characteristics as possible are held constant. There are, however, three limitations to this approach of matching training and performance as a general solution. First, there may be practical limits to working in the performance environment due to risk, cost, complexity, or control. Consider, for example, the question of preparing a plan of action to deal with a site where there has been a toxic chemical explosion. Spending time orienting at the site itself would result in a substantial health risk. Second, the target environment may not present all of the relevant information in the optimal way. To return to the chemical explosion example, consider the attempt to rescue injured victims. The structures in the building may be damaged in ways that hide the structural cues that are important to recovery efforts. Being able to use a model of the building as it was prior to the explosion could provide substantial benefit in directing a search. Third, learning does not always occur optimally with all information present. A complex structure may be overwhelming and can actually inhibit learning. Carroll and Carrithers (1984a, 1984b; Carroll, 1992) studied these issues extensively in the
case of computer skill acquisition, leading to the design of a minimalist approach to learning. By minimizing the available tools and required procedures during initial learning, errors were reduced. More strikingly, such restricted initial learning enhanced subsequent learning when the full array of tools and procedures was made available.

Virtual environments (VEs) provide one solution strategy that addresses each of these limitations. They can provide a means to mimic a substantial portion of the perceptual features of physical reality without the attendant risk or cost; the designer has explicit control over the environment and can modify its complexity. A VE also allows manipulations of the space that cause it to differ from reality in clearly specified ways. Environments can be changed or restricted to enhance specific needs of a given task. Thus, VEs hold the possibility of being an especially well-suited tool for transfer. Assessing this potential will depend both on the VE technology and on the ability of the technology to fit with performance demands of the user. Many of the major design properties of the technology are covered in a number of excellent texts (e.g., Barfield & Furness, 1995; Stuart, 1996; Stanney, 2002). Here we focus on the role of user characteristics as part of the VE design process for effective transfer.

Although each specific VE will evoke particular questions, a standard information-processing model can serve as the basis for linking user characteristics to system design (Card, Moran, & Newell, 1983). In brief, anyone learning a task is subject to the constraints of his or her prior memory as well as the capacities of the perceptual processor that receives the information, the cognitive processor that transforms it and relates it to remembered information, and the motor processor that produces a response. The adequacy of a VE system will then depend on its ability to match or perhaps augment those characteristics. With perfect fidelity of a VE to reality, the match would be comparable to that in the physical world. However, few current VEs provide a sustained experience that is perceptually and experientially indistinguishable from reality. Most criticisms of fidelity are actually criticisms of trade-offs between different aspects of fidelity. For example, high-quality visual images usually result in some lag in tracking. It remains an empirical question as to how much fidelity is required for any given performance objective. In the limit case, we know, for example, that a lag of a few milliseconds should make little difference to the visual processor, whereas a lag of half a second will. There is still much to be learned about how specific changes in perceptual properties influence training or transfer performance.

More interesting, in at least some cases, is that mimicking the physical world may not be the ideal for training. For example, one might better learn certain laws of physics by participating in a VE that violates normal constraints, or at least lets a person test alternative virtual physical worlds. In other cases, as we argue below in more detail, having a changed response in the virtual world may actually benefit patients recovering from certain types of stroke. At this point, we know very little about how such modified versions of reality might influence transfer.
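To give a rough sense of how such processor constraints bear on fidelity trade-offs like display lag, the following sketch works through the arithmetic for a simple serial perceive-decide-act cycle. The cycle times are illustrative nominal values in the spirit of the Card, Moran, and Newell model rather than figures reported in this chapter.

```python
# Back-of-the-envelope comparison of display lag against the cycle
# times of a simple serial human information-processing model (in the
# spirit of Card, Moran, & Newell, 1983). The millisecond values are
# nominal placeholders chosen for illustration, not data from this chapter.

PERCEPTUAL_CYCLE_MS = 100   # registering the updated display
COGNITIVE_CYCLE_MS = 70     # deciding on a response
MOTOR_CYCLE_MS = 70         # executing the movement

def reaction_cycle_ms(display_lag_ms=0.0):
    """One rough perceive-decide-act cycle, with any display lag added."""
    return (display_lag_ms + PERCEPTUAL_CYCLE_MS
            + COGNITIVE_CYCLE_MS + MOTOR_CYCLE_MS)

baseline = reaction_cycle_ms()
for lag in (5.0, 500.0):
    cycle = reaction_cycle_ms(lag)
    print(f"{lag:5.0f} ms lag -> cycle {cycle:.0f} ms "
          f"({(cycle - baseline) / baseline:.0%} longer than with no lag)")
# A few milliseconds disappear into the cycle itself, whereas half a
# second roughly triples the closed-loop time.
```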
VEs provide the first test bed for a full range of controlled studies on the influence of both physical simulations and modified reality on transfer. In what follows, we characterize some important issues in assessing the viability of VE for transfer by examining three areas of application with which we have some familiarity: spatial navigation, medical simulation, and rehabilitation. For each of these areas, we consider how different aspects of the human as information processor need to be taken into account in the analysis of the VE task. We discuss how the utility of a VE is at least in part dictated by an appreciation of how it can help to constrain the solution space during learning by reducing perceptual, motor, or cognitive demands.

Although all of the processing components must function together in training and transfer, for each area of application described below, we focus on one of the three principal components. In the case of navigation, the emphasis is on perceptual processing and how the visual information is used to learn a specific route. The analyses use currently available VEs implemented in moderate-cost technology. Recent empirical data are described, together with specific techniques of assessment that are required in this environment. In the case of medical simulation, we highlight the motor components that need to be added in this environment. In addition to visual guidance, force feedback plays an important role. Training for a spine biopsy procedure is used as an example case study. Here, we focus on the trade-off between manual and automated tasks, and discuss how VEs may provide an especially useful context for developing part-task training. By enabling the learner to focus on one aspect of the task while the VE controls other aspects, it is possible to design a training environment that can facilitate transfer to systems with varying degrees of automation. Finally, in the case of rehabilitation, we emphasize cognitive processes and explore how those processes may be influenced by the rapidly evolving condition of a patient. In this context, the ability to change the training environment to more accurately fit the new constraints on processing makes VE especially relevant.

In addition, our three example areas differ in the degree of maturity in research on assessment. For spatial navigation, there is already a growing body of literature (reviewed briefly in Darken & Peterson, 2002), and we have conducted specific experiments to contribute to that analysis. In the case of medical simulation, a prototype spine-biopsy system has been designed, but no empirical evaluation has been completed. Although there are a rapidly expanding number of simulations available in medicine, our analysis focuses on a particular methodological approach and the need to validate what appear to be promising VR approaches (Satava & Jones, 2002; Westwood, Hoffman, Stredney, & Weghorst, 2000). In the case of rehabilitation, significant work has been done on motor skills (Holden & Todorov, 2002), but there is limited empirical assessment on the nature of cognitive demands related to rehabilitation (Rizzo, Buckwalter, & van der Zaag, 2002). The progress in our laboratories to some extent is reflective of general progress in our understanding and use of VEs for better assessment of transfer of training.
TRANSFER OF TRAINING IN SPATIAL NAVIGATION

There are many situations in which it would be useful to understand the spatial layout of a building or area before entering it for the first time. People moving to a new city would benefit from touring apartment buildings before a home-hunting trip, and home owners would benefit from wandering through an addition or renovation before it is built. More critically, military forces, law enforcement personnel, and firefighters all encounter situations in which they must navigate through unfamiliar buildings or large areas in order to rescue hostages, disarm defenses, or search for people in need of rescue. In each of these situations, exploring the spaces themselves is impractical or impossible, so another approach to spatial learning is needed that will support transfer to performance on the actual tasks. Virtual reality promises one means of learning such spatial layouts and rehearsing routes. Two aspects of spatial learning are especially important in evaluating these prospects: Do VEs provide the requisite representation for the user to learn and transfer knowledge of spatial layouts? Do VEs lead to flexibility in the application of the spatial models that have been learned? Answers to these questions will depend both on what is presented to the user and the interface tools for controlling that presentation. To date, research on VE navigation has emphasized the visual display, and this section likewise focuses primarily on assessing the adequacy of visual immersion for VE training and transfer.

In learning to navigate large-scale spaces, it is generally acknowledged that the mental representation progresses through three stages (Siegel & White, 1975). The earliest representation of a space is a set of disconnected landmarks. With more experience in the space, a “route” representation is formed that includes links between landmarks. When a more global, integrated representation is developed, the person is said to possess “survey” knowledge of the space. To understand the potential of VEs for navigation, then, it is important to assess their role in helping to train route and survey knowledge. Route knowledge is the minimum level needed to be able to follow a path through a space. Survey knowledge can allow higher levels of performance, permitting efficient discovery of new routes through the space.

Route Knowledge

There is substantial evidence that VEs are effective training tools for developing route representations. Witmer, Bailey, Knerr, and Parsons (1996) demonstrated that experimental participants who had learned a route in a VE could then walk that route in the actual building with few wrong turns and at the same speed as participants who had learned the route in the building itself. Although the VE experience in that study was limited to the visual modality, the visual representation
was high fidelity, with complete furnishings in all of the traversed rooms. In our own studies, we have found similar route learning for a substantially lower fidelity VE that emphasized only the structural architecture in the model (see Fig. 4.1; Clawson, Miller, & Sebrechts, 1998; Miller, 2001). In a number of studies, including the one by Witmer and colleagues (1996), a VE was used as a practice supplement to maps and descriptions. In our case, we found that a VE, exclusive of other aids such as maps, could still produce transfer comparable to using a map or practice in the real world. There is other evidence that also suggests reasonable learning of routes from other implementations of virtual reality; for example, Waller, Hunt, and Knapp (1998), using extensive practice in a virtual model of a maze, found effective transfer to the physical maze. The requirement for fidelity is an important transfer issue, but the evidence to date suggests that route transfer (as measured by wrong turns or task completion) can be successful with limited fidelity, a restricted field of view of no more than 60 degrees, and certain restrictions on movement. It remains an open question whether or not fidelity will play a more significant role with other measures of performance.

FIG. 4.1. Interior view of a medium-fidelity VE model of an actual building used in examining route learning and transfer. The perspective is from the second floor of a two-story building. On this floor, spaces are separated by dividing walls that do not reach the ceiling.
Survey Knowledge

Walking the learned route is a critical measure for evaluating learning, but it is also important to know whether the learners have developed effective survey representations of the space. Besides being able to follow the exact route followed during practice, do they understand the layout of the space they have traversed? The extant data are somewhat ambiguous on this question. Witmer and colleagues (1996) found no difference in survey knowledge among their practice groups, but this assessment was based on only a small number of trials in which participants needed to judge the spatial relations among objects. Waller and colleagues (1998) concluded that an immersive VE was less effective than other training methods (map, real maze, or nonimmersive VE) in allowing participants to develop survey representations of the space. However, it is possible that their VE participants, with a limited number of training trials, were not able to achieve a survey model of the VE space in the time available. In our studies (Clawson et al., 1998; Miller, 2001), training was performed to a criterion, so it is reasonable to assume a minimum competence on the specified route. Under these circumstances, participants’ accuracy in identifying the Euclidean distances and angular directions between environmental objects along the specified route after VE training was comparable to that after training using a map or in the real building. Using a fixed study-time approach, Koh, von Wiegand, Garnett, Durlach, and Shinn-Cunningham (2000) also found VE-trained survey performance comparable to that of a real-world-trained group.

Not all VE studies, however, have shown comparable acquisition and transfer of survey knowledge. Darken and Peterson (2002) have suggested that the varying results indicate that the utility of VEs for the development of survey knowledge may depend on training time. Whereas maps can provide rapid learning of spatial layout, moving through a virtual space, like moving through a real space, may require more time for survey learning. This hypothesis about VE utility remains to be tested systematically. In addition, acquisition of survey knowledge likely depends on a number of other characteristics of the environment. For example, studies in our lab (Knott, 1999; Piller, 2001) suggest that making walls transparent can lead to a survey-based representation more quickly than typical opaque walls; the transfer effectiveness of this approach remains to be tested. One of the advantages of virtual environments is that they can be manipulated in a variety of ways that may be responsive to specific task demands.
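As a concrete illustration of how such distance and direction judgments can be scored, the sketch below computes pointing and distance-estimate errors from landmark coordinates. The landmark names, coordinates, and sample judgments are hypothetical rather than taken from the studies cited above.

```python
import math

# Sketch of scoring the survey-knowledge measures described above:
# pointing (angular) error and distance-estimate error between pairs
# of landmarks. All values below are made-up placeholders.

LANDMARKS = {              # floor-plan coordinates in meters (hypothetical)
    "stairwell": (0.0, 0.0),
    "copy_room": (12.0, 5.0),
}

def true_bearing_deg(a, b):
    """Compass-style bearing from landmark a to landmark b, in degrees."""
    (xa, ya), (xb, yb) = LANDMARKS[a], LANDMARKS[b]
    return math.degrees(math.atan2(xb - xa, yb - ya)) % 360.0

def pointing_error_deg(judged_deg, a, b):
    """Absolute angular error, folded into the range 0-180 degrees."""
    diff = abs(judged_deg - true_bearing_deg(a, b)) % 360.0
    return min(diff, 360.0 - diff)

def distance_error_m(judged_m, a, b):
    """Absolute error of a Euclidean distance estimate, in meters."""
    (xa, ya), (xb, yb) = LANDMARKS[a], LANDMARKS[b]
    return abs(judged_m - math.hypot(xb - xa, yb - ya))

print(pointing_error_deg(80.0, "stairwell", "copy_room"))   # ~12.6 deg
print(distance_error_m(10.0, "stairwell", "copy_room"))     # 3.0 m
```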
Flexibility

In assessing survey knowledge, the flexibility with which mental representations can be applied to the actual space is also an important aspect of navigational learning. Moar and Carleton (1982) experimentally documented a common limitation
in navigation. After participants studied a series of photographs in order to learn a pair of intersecting routes, they were asked to give distance and direction estimates from one landmark to another, a test of survey knowledge. One limitation on spatial knowledge acquired from learning a route became apparent when participants were asked to give these estimates from one landmark to another that had been viewed earlier on the route in the learned direction. More errors were observed when estimates were made in the opposite direction from practice than when estimates were made in the same direction as in practice. It appeared that participants had constructed a mental representation of the space that was specific to the direction in which it was learned. Would this limitation be apparent after VE training?

In our studies, flexibility, or inversely, specificity, was assessed by asking half of the participants to walk the route in the opposite direction rather than in the direction of training. These participants walked the route through the real building, pointing to features and indicating distances, in the opposite direction from training. Accuracy of route traversal was high. VE training, as well as training using floor plans or the real building, led to nearly perfect route traversal, with an average of only 1.6 wrong turns per person. Time required to traverse the route was affected by training condition and testing direction, as shown in Fig. 4.2.

FIG. 4.2. Mean time in minutes to navigate the route by training condition (VR, Map, Real) and testing alignment (practiced vs. opposite direction). VR is comparable to real-world training in the practiced orientation, but is worst in the opposite orientation, demonstrating specificity of transfer.

Those
who had learned the route in a virtual environment traversed the route quickly when following the practiced direction. When walking the route in the opposite direction, though, the VE-trained group was slower than the floor-plan and real-building groups. In part, this was because they hesitated more often before making turns. Thus, they made few wrong turns while walking the route, but they spent more time considering which way to turn.

Similar specificity-of-practice effects were found on the pointing measure. As with total traversal time, pointing accuracy after VE training was comparable even to that of the group who had practiced in the building itself, as long as testing was in the practiced direction (mean error was 15 deg). When testing was in the opposite direction, though, performance for the VE-trained participants fell (to a mean error of 41 deg). This pattern was also seen in the distance estimates. As with traversal time and pointing, distance estimates revealed less flexibility after VE learning than after floor-plan or real-building learning. Direction of testing affected distance estimates for VE, leading to lower accuracy when participants were tested in the direction opposite from training. When following the route in the opposite direction, VE learners were less able to apply their mental representations of relative distances between features. It appears that navigational training in a VE may not be effective for learning distances in a way that can be flexibly applied.

We further examined flexibility of learning in a VE by having participants learn a single route through a building, or multiple routes through the building, among the same set of landmarks (Miller, 2001). We also altered half of the building by changing the navigational affordances (e.g., walls and doorways) such that the routes learned during training were not possible at testing in the actual building. Participants learned the route using a VE or a paper map. Those who trained using a single route through the building navigated through the unaltered zone faster than participants who trained using multiple routes. However, in the altered zone, participants who trained using multiple routes navigated faster than participants who trained using a single route. Apparently, over-training on a single route resulted in faster navigation when the training and testing layout matched. Training on multiple routes, however, resulted in increased flexibility, as was evident when participants who had received multiple-path training navigated more rapidly through the altered zone. So while it appears that training in a VE produces learned-route specificity, it is possible that the effect can be attenuated by further manipulating the nature of exposure to the VE to include approaching the landmarks from varying vantage points.

Summary: Transfer of Training in Spatial Navigation

Training in a VE appears to lead to highly accurate performance when following a learned route in the real world. This transfer of route knowledge appears to be achievable even with limited-fidelity VE training. In addition, route following seems to generalize relatively well from the learned direction to the opposite
direction, especially when exposure to the VE includes navigation using varying route perspectives. The utility of VEs for training other aspects of spatial knowledge is less clear. Survey knowledge, as assessed by identifying locations of objects along the learned route, is acquired through VEs, as well as it is through other training. However, assessment of ability to locate objects when following a route in a direction opposite to that used in training indicated that VE training was more sensitive to a change in the route direction than was training on floor plans or in the real building. It appears therefore that limited VE training may not be adequate for conveying a flexible survey perspective of large-scale spaces. Perhaps, as suggested by Waller and colleagues (1998), this limitation may be overcome by extended practice navigating in the VE to lighten the mental load associated with using this unfamiliar tool. In addition to the ability to mimic the traversed space, VE training offers a number of other potential advantages. For example, VE makes it possible to vary the perceptual complexity of a space by reducing the visual clutter that may occupy the real space. Using a VE can help control the social interactions or other inhibitions that might be encountered in a real space and interfere with learning. In addition, VE training substantially reduces the physical effort associated with exploring a large space. At the same time, these issues only begin to delineate the possibility of transfer of training for navigation. By manipulating the spaces, for example, by introducing impossible navigation routes to induce flexibility (Miller, 2001) or by making walls transparent (Sebrechts & Knott, 1998; Knott, 1999; Piller, 2001), it may be possible to change the way in which we think of traditional navigational training.
TRANSFER OF TRAINING IN MEDICAL SIMULATION

Training on medical procedures is another area in which transfer of training is a critical issue. For ethical as well as practical reasons, patients cannot be used directly in many aspects of training. Further, there are often substantial limitations on the availability of cadavers or animal models. Even when such alternatives are possible, any individual instance may have peculiarities that do not generalize. Virtual environments are especially promising in this regard and are increasingly used to provide surgical simulation that incorporates computer graphics and interactive devices.

One approach to simulator development has been to strive for maximum fidelity, developing highly realistic interactive models of the whole task. An alternative approach is to shift some of the workload from the surgeon to the simulator (Lathan, Cleary, & Greco, 1998; Lathan & Cleary, 1998; Lathan, Cleary, & Traynor, 2000); the surgeon can then practice on a separable part of the procedure. On other tasks, such part-task training methods (also referred to as part-whole training) have been shown to be effective for tasks that can be separated into temporally distinct components (Marmie & Healy, 1995; Wightman & Lintern, 1985) rather than temporally integrated ones (Frederiksen & White, 1989).
As with spatial navigation, the medical virtual environment engages a range of perceptual, cognitive, and motor abilities. In this analysis, however, we will focus on evaluation of the transfer of motor skills from simulator training. We will also discuss how simulation can be used as a test bed for designing new procedures.

The Spine Biopsy Task—Toward an Evaluation of Transfer of Training

The spine biopsy task can serve as an illustration of the process needed to develop an assessment of transfer of training. In previous research (Tracey & Lathan, 2001), we have shown transfer of training in a related motor task in which the teleoperator of a robot arm learned a pick-and-place task. Elsewhere (Lathan, Traynor, & Cleary, 2000), we have shown that the spine biopsy simulator is effective as an experimental testbed. The following describes the way in which we can assess the utility of VEs for part-task training in a specific context. First we describe the analysis of the task, followed by an allocation of subtasks to a VE simulation. This approach would then allow for the development of an optimal transfer of training through appropriate modification of the VE aspects of simulation.

The first step in our simulator development included a task analysis to identify temporally separated components to target for part-task training. Task analysis is a method for producing a list of the tasks needed to complete a goal, the order in which they need to be completed, the time it takes to complete the tasks, and what or who is needed to complete them. Task analysis is a standard method in human factors engineering and has been used for other surgical simulator applications. Both Beagley, Kelly, and Shepherd (1997) and Higgins, Kaufmann, Champion, and Anderson (1997), for example, used a hierarchical task analysis (Shepherd, 1989) to determine the key cognitive and motor skills needed by the surgeon in order to complete a surgical procedure.

The current computed tomography (CT)–directed biopsy procedure is fairly slow and tedious. The patient lies prone on the CT table, and an initial CT scan is done. The doctor then selects the best slice to reach the lesion, and the entry point on the patient is marked. The skin is anesthetized over the entry site, and the needle is inserted part way. Another scan is done to confirm the needle position, and the procedure continues with the needle being inserted further and rescanning as needed to verify needle position. When the lesion is reached, a core sample is removed and sent for pathologic analysis.

In our simulator, the patient consists of a dummy (Fig. 4.3), which is for visual realism, and a computer model of the anatomical structure. A biopsy needle protrudes through the dummy’s back to a haptic feedback device mounted inside a hollowed-out section of the torso.

Figure 4.4 shows a hierarchical task analysis (HTA) of the biopsy needle progression subtask. This subtask begins with CT imaging and ends with a movement of the needle, and may occur many times within one biopsy procedure. HTA allows one to break larger tasks and goals into subtasks to fully understand the
demands on the system. We can see that path planning—calculating the path from the skin entry point (or current needle location) to the lesion location—takes the majority of the subtask time. Improving path planning is the key to optimizing the speed–accuracy trade-off of the spine biopsy task. Through task analysis we can see which objective parameters may be used for quantitative assessment of the simulator’s effectiveness as a trainer. In the real task, we would expect to see an improved ability of the surgeon to track the planned path and a reduction in the number of times the subtask is completed. Improved endpoint accuracy could also be measured indirectly through the size of the tumor successfully biopsied.

FIG. 4.3. The biopsy simulator physical interface, a dummy human torso, and a PHANTOM™ force-reflecting joystick from SensAble Technologies.

The Spine Biopsy Simulation as a Testbed for Designing New Procedures

The second step in our virtual spine biopsy development is a function allocation of the task components to determine the workload of the human information processor. Function allocation of a system is determined by assigning each system function to a system component (software, hardware, human operator, etc.). The
goal is to identify critical task components (i.e., those that have high probability of error, substantial workload, or potential for significant time and cost savings) and then relieve the human workload by a reallocation of system functions to the virtual components of the simulation. Reallocation in the simulator allows the operator to focus on training on a specific part of the task. Eventually, the operator is tested on the integrated task. Another option is to change the integrated task to a more advanced system so that the workload remains low.

FIG. 4.4. The biopsy needle progression subtask hierarchical task analysis (HTA).

Function reallocation arises directly from the task analysis because the workload can be broken down into the perceptual (select image), cognitive (plan path), and motor (move needle) human information processors. The current biopsy procedure, as represented by Fig. 4.5(a), shows that the human operator carries most of this workload. The radiologist controls the insertion of the needle directly without the use of any force feedback and only uses occasional “snapshots” from the CT scanner for visual feedback. The system in Fig. 4.5(b) is still manually controlled, but the visual feedback has been enhanced greatly to give the radiologist real-time image guidance during the
insertion. Real-time 3-D image guidance would relieve some of the perceptual and cognitive human operator workload, allowing the operator to focus on the motor task of controlling the needle. In addition, incorporating this step into the real procedure would eliminate the need to re-image the patient and reduce the time for path planning and total time for task completion. Figure 4.5(c) depicts an even further step, a semiautonomous system in which a haptic master arm provides force feedback to help guide the radiologist and drive a robotic system that actually performs the task. In this VE scenario, the functions handled by the person are significantly reduced, allowing the radiologist to concentrate on the parts of the procedure for which human judgment is required most. Shared control will reallocate some of the motor function from the human to the robot, reducing total error and improving accuracy. Note that the progression from Fig. 4.5(a) to 4.5(c) is downward compatible; it does not preclude the semiautonomous system from being operated in manual mode.

FIG. 4.5. Relative function allocation of system components to the human operator, imaging information, and needle insertion. Surgery could advance from the current manual state (a), to the real-time visually guided state (b), to the robot-assisted state (c). Copyright Lawrence Erlbaum Associates.

Summary: Medical Simulation Transfer

Computed tomography–directed needle biopsies are routinely performed by interventional radiologists to gather tissue samples near the spine for pathological analysis so that a treatment plan can be developed. Biopsies are a major factor in deciding whether to perform surgery, give chemotherapy, give radiation therapy, or do nothing at all. As currently practiced, this procedure is slow and requires a great deal of geometric reasoning, practice, and skill on the part of the radiologist. Depending on the biopsy location, potential risks include bowel perforation, bleeding (life threatening if a large vessel is hit), pneumothorax (perforated lung), and infection. In addition to the risk to the patient and potential discomfort, complications can also result in a longer hospital stay, which would increase the cost to the hospital and patient.

Image-guided systems that could assist the radiologist in needle placement and alignment would be a great improvement, reducing time and risk. Even greater
gains could be realized by the development of a semiautonomous robotic biopsy system that could place the needle under the guidance of the radiologist. There is a substantial potential payoff for improved transfer of training through the use of medical simulators. As already demonstrated in several cases (reviewed in Satava & Jones, 2002), this training can provide a risk-free environment for developing a skill. In addition, we have suggested how a VE can be used to help analyze and partition the surgical task between the surgeon and the simulator. By providing a part-task training environment in which the VE can be modified to simulate varying degrees of task automation, it becomes possible to merge the study of transfer with the design of transferable subcomponent skills.
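To make the task-analysis and function-allocation ideas above concrete, the sketch below represents a needle-progression subtask breakdown as data and compares the operator's share of the workload under the three system concepts of Fig. 4.5. The subtask names, nominal times, and allocation choices are illustrative assumptions, not values from the authors' hierarchical task analysis.

```python
# Illustrative representation of a subtask breakdown (cf. Fig. 4.4) and
# of alternative function allocations (cf. Fig. 4.5). All names, times,
# and allocation choices below are made-up placeholders.

SUBTASKS = [
    # (subtask, information processor loaded, nominal time in seconds)
    ("acquire CT image",   "perceptual", 30),
    ("select image slice", "perceptual", 20),
    ("plan needle path",   "cognitive",  90),
    ("advance needle",     "motor",      40),
]

ALLOCATIONS = {
    # who carries each subtask under the three system concepts
    "manual (a)": {
        "acquire CT image": "system", "select image slice": "human",
        "plan needle path": "human",  "advance needle": "human",
    },
    "image-guided (b)": {
        "acquire CT image": "system", "select image slice": "system",
        "plan needle path": "human",  "advance needle": "human",
    },
    "robot-assisted (c)": {
        "acquire CT image": "system", "select image slice": "system",
        "plan needle path": "human",  "advance needle": "system",  # shared control
    },
}

def human_workload_share(allocation):
    """Fraction of total nominal subtask time carried by the operator."""
    total = sum(seconds for _, _, seconds in SUBTASKS)
    human = sum(seconds for name, _, seconds in SUBTASKS
                if allocation[name] == "human")
    return human / total

for concept, allocation in ALLOCATIONS.items():
    print(f"{concept:18s} human share of subtask time: "
          f"{human_workload_share(allocation):.0%}")
```

The same representation can be read the other way for part-task training: hold fixed, in the simulator, the components the trainee is not practicing, and hand only the remaining subtasks to the learner.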
TRANSFER OF TRAINING IN REHABILITATION

Rehabilitation is an attempt to equip individuals who have an acquired or developmental disability with a repertoire of skills that will support coping with the exigencies of everyday life (Franzen, Roberts, Schmits, Verduyn, & Manshadi, 1996; Paquin & Perry, 1990). This poses a substantial transfer of training challenge, further exacerbated, in the case of acute rehabilitation, by the inevitable differences between the hospital context and the life to which the patient will be discharged. Real-world target tasks are varied and unpredictable. The settings to which recovering patients are discharged and the capabilities of the people who support their further recovery can differ along numerous dimensions. The car in which a stroke patient may have learned new entry and exit techniques, and the expertise with which a therapist supervised exercises, may not be representative of the circumstances encountered back at home or in a nursing home.

There is a major additional transfer challenge, however, in the case of rehabilitation. The individual is undergoing a period of rapid change affecting cognitive, physical and emotional status, as well as relationships to other people and to the community. With shortened length of stay in rehabilitation hospitals, patients receive their major rehabilitative therapy while still in acute recovery, so transfer needs somehow to bridge these other changes.

Another distinctive feature of rehabilitation has to do with the ability to manage one’s own behavior. The recovering stroke patient may perform acceptably during training on how to get into a car safely without loss of balance, and yet fail to apply this training later when the taxi comes to pick her up for her doctor’s appointment. Under the stresses associated with an actual outing, failing to call on procedures, rules or strategies that one has acquired is a common phenomenon in people with residual brain injury or developmental brain deficits. Successful transfer of training requires not only learning the skill, but learning appropriate metacognitive techniques that help determine when and how to apply those skills (Grigsby, Kaye, & Robbins, 1995; Owen, 1997; Zelazo, Carter, Reznick, & Frye, 1997). Although skills such as use of a walker or operation of a communication
device can be trained, it is more difficult to convey the problem-solving skills needed to decide how and when to use or adapt these techniques. If judgment, organization, and self-management are impaired, and coordination and motor skills are in the process of reorganizing, there may be a significant gap between the performance level necessary to function safely and efficiently and the actual capabilities of the recently discharged patient. In the absence of continued intensive therapy provided by specialists, there is a clear need for continued intensive, monitored practice. To the extent that skills are overlearned and are exercised in a variety of contexts, the likelihood that the patient will invoke them when they are needed is raised.

In a rehabilitative context, tasks may be unpredictable, there may be deficits in executive functions, and individuals may need to become effective managers of newly reorganized cognitive, emotional, and physical resources. Together with managed care reductions in specialist rehabilitative services, these characteristics convey special urgency to the issue of transfer of training. They also underscore the importance of the role that virtual reality therapy can play. The following sections outline some ways in which VEs may contribute, or are already contributing, to rehabilitation. Although transfer in rehabilitation contexts raises a number of perceptual and motor concerns, here we focus primarily on cognitive processes.

Motivation and Control

The likelihood of successful transfer increases with amount of training and practice. Not surprisingly, gamelike therapeutic activities implemented on computers increase the time that patients invest in therapy, especially if the games are engaging. Such activities also put participation in therapy under patients’ control, making patients the arbiters of when and whether to use the programs, with further benefits to their motivation (Petheran, 1996). VEs have significant and growing advantages over standard computer technology in terms of richness, multidimensionality, and relatively greater verisimilitude. In this respect, computer-based therapies may be seen as simply the first stage towards fully flexible, controllable, virtual therapeutic environments.

The increasing recognition of untapped potential for recovery of function, even in patients with some types of chronic neurological disability, underlines the need for effective therapies that patients can pursue after hospitalization and outpatient benefits have run out, and when, paradoxically, they may be most apt to benefit. VR-based therapies combine the advantages, shared by other computer-based activities, of compatibility with remote monitoring, increased control by the individual patient, and customizability and adaptability to the individual’s changing needs and goals. VR has much greater potential benefit in that it can incorporate multiple modalities of input and display (e.g., sensing hand motion, head movement, gaze angle; and providing visual, auditory, vestibular and force feedback), is likely to provide a much more engaging activity, and offers a much broader scope for creating very specific, targeted and carefully titrated experiences.
VR, Mental Health, and Adaptation to Disabilities

One of the key challenges in training during rehabilitation is the change in patient ability. As computer-based models, VEs can be structured to meet the patient’s ability level and to progress in difficulty or environmental complexity in increments that match patient transitions. This progressive disclosure model for transfer of training is related to research on the use of VEs for desensitization therapies. Gradually increasing exposure, by means of the VE, to an object or situation that elicits a phobic response (e.g., spiders, or flying, or being in public) accustoms an individual to the experience by presenting tolerable increments of the stimulus, until that person is able to confront the object or situation in actuality without experiencing debilitating anxiety (Wiederhold & Wiederhold, 1998; Rothbaum, Hodges, Smith, Lee, & Price, 2000). Therapies of this type delivered using virtual reality have already proven clinically to be as effective as traditional therapies in producing transfer of training (Anderson, Rothbaum, & Hodges, 2001), despite major disparities between the virtual and actual environments.

Virtual environments may also be useful in a rather different type of transfer, conceptually related to their use in desensitization therapies. They can serve as nonthreatening, more private ways to introduce consumers to assistive technology, or to the adjustments they may need to make to rejoin their social, vocational or other activities as individuals with disabilities. Newly disabled individuals sometimes find it difficult to accept the need for assistance in performing functions they used to carry out without a second thought, such as walking, speaking, and accomplishing basic tasks of everyday life. VE-based practice can address desensitization in this context in small increments, and may accordingly make it easier for the individual with acquired disability to come to accept his or her new persona. Virtual navigation through social settings peopled by avatars, for example, may be useful as a stage in the acceptance of going out in public as a wheelchair user. In this type of application, the goals are cognitive and emotional adaptation, rather than skill acquisition. Once the desensitization accomplished in a virtual environment is transferred to the patient’s initial reentry encounters, these actual experiences would support further and continued desensitization.
Retraining

Although there is a tendency to think of transfer as a single training/performance event, effectiveness of transfer can be viewed more accurately within a context of training and retraining. In the case of rehabilitation, training over extended periods, after discharge, shifts responsibility to the individual patient, and may be vital if the skills are to transfer to real-world tasks (Cicerone, 1997). For persons with disability or chronic conditions, there is, moreover, an increased likelihood of
periods of prolonged reduction of activity, for example in the event of secondary complications, other medical setbacks, or reduced opportunities for social and work activities and reduced mobility. Periods of being sidelined can detract from the individual’s confidence in his or her skills and willingness to become active again. Access to a VE may allow practice in a realistic environment so that the user can regain skill and confidence without exposure to danger.

Individuals with certain neurological conditions experience unpredictable variability in their performance over time. Persons with multiple sclerosis, for example, sometimes experience exacerbations that may be temporary or may leave residual effects (van Oosten, Truyen, Barkhof, & Polman, 1995). Individuals with Parkinson’s disease may find their motor control better or worse from day to day or at different times of the day. Having a VE available in which to practice or monitor control skills conveniently and safely may be very useful in such circumstances.

In addition to the skills themselves, achieving confidence, reducing stress, and supporting the individual’s emotional equilibrium are crucial in enabling the individual to make functional use of rehabilitative training (Parente & Stapleton, 1997). As length of stay in a rehabilitation hospital shortens, having the option of replaying, in the particularly vivid manner of VEs, training that went by too quickly under the stress of adaptation to catastrophic life changes, clearly has the potential to improve patients’ rehabilitation success.

Neuropsychiatric disorders involving panic attacks, avoidance, or compulsive behaviors are subject to relapse, often requiring ‘booster’ therapy or therapy to recover from setbacks. Virtual-reality-delivered desensitization and exposure therapies, once they become more widely available, have the potential for greater flexibility in scheduling, and reduced wait-listing, since they could consume less direct therapist time. ‘Booster’ therapy could thus be accessed at the patient’s initiative at the earliest signs of relapse, instead of waiting for a crisis that impairs function, self-efficacy, and quality of life.

Acquisition of Low-Incidence, High-Importance Skills

Having to learn to perform common tasks in new ways can be difficult and time consuming. Learning how to cope with low-incidence but critical situations is especially challenging. Rehabilitative training typically emphasizes common situations, both because of their more general utility and because of the difficulty in practicing skills that occur in unusual circumstances. For example, substantial effort may be required in learning to drive an automobile with modified controls. It is impractical to extend that training to practicing under hazardous conditions, such as severe reduction in visibility and traction. VEs can provide risk-free opportunities to practice responding in such low-incidence, high-cost circumstances. In these circumstances, it would seem to be particularly important for the controls and
the display (e.g., force feedback) to represent accurately the critical components of the experience.

Summary: Rehabilitation Transfer

Virtual reality simulations that are programmable, customizable, and realistic can offer nonthreatening, finely graduated training environments for persons who are faced with the challenges of adapting to a new way of being in the world. They can be used to extend initial rehabilitation training, to retrain after periods of inactivity or changes in medical status, and to prepare for circumstances that, if encountered in the actual environment without preparation, would pose threats to survival. They also provide one means of introducing novel assistive technologies.

Rehabilitation incorporates many of the same principles that drive our understanding of transfer of training in general. At the same time, its special circumstances force us to appreciate a series of additional issues in the development of VEs. By using VEs, it is possible to enhance motivation and control beyond that in the real world; a person who may be unable to ambulate without support can experience the fun of using leg movement to guide airplanes under bridges, all the while improving ankle control and strength (Girone, Burdea, Bouzit, Popescu, & Deutsch, 2000). Which is more likely to lead to recovery of ankle strength and agility—a page of sketches and instructions for exercise, or an opportunity to exercise power and control that surpass what nondisabled individuals could possibly accomplish?

In addition, the ability to modify conditions found in the physical environment can substantially improve training. It becomes possible to practice events that occur too infrequently to learn in the real world, or that may be too dangerous to test in reality. Since the rehabilitation context is often a particularly dynamic one, the ability to adapt the VE interaction to the individual’s emerging ability levels is particularly promising.

Finally, VEs may help to extend the concept of transfer of training to include retraining. Transfer can be judged not only by initial success but also by its place in a longer process of learning. As the technology develops, VEs may provide a useful way to integrate initial rehabilitation with ongoing efforts through telerehabilitation. The targeted learning environments could include two-way video and data transfer that can follow the recovering patient into the community. Patients would then be able to receive remote guidance from their clinicians, who could view and comment on a dynamic picture of patients’ level of performance and skill. The incorporation into VEs of video telecommunication and data sharing with remote clinicians could add, in major respects, to the independence and maximization of function for those coping with disabilities. Although VE tools have special significance for transfer of training in rehabilitation, research in this area may also help us to think more broadly about how to achieve successful transfer in the more general case.
CONCLUSIONS

On first analysis, the question of transfer from virtual environments may seem straightforward. If the goal is to optimize transfer, we should mimic the target environment as closely as possible. Thus, fidelity is the central issue: the more realistic the virtual environment, the more success there will be in transfer. However, the actual situation is substantially more complicated.

To begin with, there is a practical problem. Higher fidelity comes with a variety of costs, so it is important to think about which costs are worth incurring, primarily by examining their behavioral consequences. For learning to follow a route, for example, a relatively untextured environment appears to be roughly comparable to the real environment. And, as we have argued, differences that fall within human processing limitations should not produce any measurable behavioral consequences.

A more important issue, which has been a central focus here, is the way in which VEs are different environments, not just lower-fidelity ones. For example, the VE provides a mechanism to eliminate a range of distractions during training. For a navigational task, this may just help eliminate some extraneous information. For rehabilitation, linking skill training to virtual games may extend the rate and range of learning. VE also provides for scaled training in a variety of ways. Using a part-task approach makes it possible to learn constituent parts of a task, although as noted earlier, it is important to know which aspects of a task are separable and which are integral. In the case of medical simulation, VE can serve not only as part of the training environment, but as part of the development process. By changing the degree of human/simulator trade-off, it provides a means of advance assessment of performance and learning under a variety of conditions.

Transfer has proven to be a difficult area to assess, in part because of the constraints in providing systematic control of the alternative environments. In traditional evaluation, variations in task characteristics were always constrained by the “controlled” aspects of the study. In a virtual space, those constraints are no longer necessary. It is possible to learn only a constrained route, to walk through a building that has no physical equivalent, to violate physical laws, and to modify the perceived world to accommodate individual deficits. VE not only adds a new technology for transfer; it provides a new way to think about what transfer tasks might look like.
ACKNOWLEDGMENTS

The authors would like to acknowledge the Army Research Institute, Alexandria, Virginia (DASW01-96-K-0004), and the Office of Naval Research (N00014-970358) for support of research on “Using Virtual Reality to Improve the Learning, Retention, and Application of Spatial Mental Models.” For the spine biopsy analysis, we acknowledge the input and collaboration of Dr. Kevin Cleary in the
Imaging Sciences and Information Systems Center, Radiology Department, Georgetown University Medical Center, and clinical input from Drs. Matthew Freedman, Craig Platenberg, and Robert Greco, also at the center. This work was funded in part by U.S. Army grant DAMD17-96-2-6004. Research and writing on rehabilitation was supported by the National Rehabilitation Hospital Assistive Technology Research Center funded by the Department of the Army under award DAMD 17-00-1-0056. The content of this manuscript does not necessarily reflect the position or policy of the U.S. government and no official endorsement should be inferred.

REFERENCES

Anderson, J. R. (1990). The adaptive character of thought. Hillsdale, NJ: Lawrence Erlbaum Associates.
Anderson, P. L., Rothbaum, B. O., & Hodges, L. (2001). Virtual reality: Using the virtual world to improve quality of life in the real world. Bulletin of the Menninger Clinic, 65(1), 78–91.
Angell, J. R. (1908). The doctrine of formal discipline in light of the principles of general psychology. Journal of Experimental Psychology, 36, 1–14.
Barfield, W., & Furness, T. A. (Eds.). (1995). Virtual environments and advanced interface design. New York: Oxford University Press.
Beagley, N., Kelly, M., & Shepherd, A. (1997). New issues in virtual surgery draw from task analysis. Paper presented at MEDTEC 1997, Tysons Corner, VA.
Card, S. K., Moran, T. P., & Newell, A. (1983). The psychology of human–computer interaction. Hillsdale, NJ: Lawrence Erlbaum Associates.
Carroll, J. M. (1992). The Nurnberg funnel: Designing minimalist instruction for practical computer skill. Boston, MA: MIT Press.
Carroll, J. M., & Carrithers, C. (1984). Blocking learner error states in a training-wheel system. Human Factors, 26(4), 377–389.
Carroll, J. M., & Carrithers, C. (1984). Training wheels in a user interface. Communications of the Association for Computing Machinery, 27, 800–806.
Casey, S. (1993). Set phasers on stun: And other true tales of design, technology, and human error. Santa Barbara, CA: Aegean.
Chase, W. G., & Ericsson, K. A. (1982). Skill and working memory. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 16). New York: Academic Press.
Cicerone, K. D. (1997). Cognitive rehabilitation: Learning from experience and planning ahead. NeuroRehabilitation, 8(1), 13–19.
Clawson, D. M., Miller, M. S., & Sebrechts, M. M. (1998). Specificity to route orientation: Reality, virtual reality, and maps. Poster presented at the 10th annual convention of the American Psychological Society, Washington, DC.
Darken, R. P., & Peterson, B. (2002). Spatial orientation, wayfinding, and representation. In K. M. Stanney (Ed.), Handbook of virtual environments: Design, implementation, and applications (pp. 493–518). Mahwah, NJ: Lawrence Erlbaum Associates.
Duncker, K. (1945). On problem solving (L. S. Lees, Trans.). Psychological Monographs, 58(270).
Franzen, K. M., Roberts, M. A., Schmits, D., Verduyn, W., & Manshadi, F. (1996). Cognitive remediation in pediatric traumatic brain injury. Child Neuropsychology, 2(3), 176–184.
Frederiksen, J. R., & White, B. Y. (1989). An approach to training based upon principled task decomposition. Acta Psychologica, 71, 89–146.
Gick, M. L., & Holyoak, K. J. (1980). Analogical problem solving. Cognitive Psychology, 12, 306–355.
Girone, M., Burdea, G., Bouzit, M., Popescu, V., & Deutsch, J. E. (2000). Orthopedic rehabilitation using the “Rutgers ankle” interface. Studies in Health Technology and Informatics, 70, 89–95.
Grigsby, J., Kaye, K., & Robbins, L. J. (1995). Behavioral disturbance and impairment of executive functions among the elderly. Archives of Gerontology & Geriatrics, 21(2), 167–177.
Higgins, G. R., Kaufmann, H. R., Champion, & Anderson, J. H. (1997). Validation of new simulation technologies for surgical training. Paper presented at MEDTEC 1997, Tysons Corner, VA.
Holden, M. K., & Todorov, E. (2002). Use of virtual environments in motor learning and rehabilitation. In K. M. Stanney (Ed.), Handbook of virtual environments: Design, implementation, and applications (pp. 999–1026). Mahwah, NJ: Lawrence Erlbaum Associates.
Johnson, D. M. (1996). Learning in a synthetic environment: The effect of visual display, presence, and simulator sickness (Final Report). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Knott, B. (1999). Learning route and survey representations from a virtual reality environment. Unpublished doctoral dissertation, The Catholic University of America, Washington, DC.
Koh, G., von Wiegand, T., Garnett, R., Durlach, N., & Shinn-Cunningham, B. (2000). Use of virtual environments for acquiring configurational knowledge about specific real-world spaces: Preliminary experiment. Presence: Teleoperators and Virtual Environments, 8(6), 632–656.
Lathan, C. E., & Cleary, K. (1998). Performance feedback in a spine biopsy simulator. Proceedings of Surgical-Assist Systems, SPIE, 3262, 86–92.
Lathan, C. E., Cleary, K., & Greco, R. (1998). Development and evaluation of a spine biopsy simulator. In J. D. Westwood, H. M. Hoffman, D. Stredney, & S. J. Weghorst (Eds.), Medicine meets virtual reality. Amsterdam: IOS Press.
Lathan, C. E., Cleary, K., & Traynor, L. (2000). Human-centered design of a spine-biopsy simulator and the effects of visual and force feedback on path-tracking performance. Presence, 9(4), 337–349.
Luchins, A. S. (1942). Mechanization in problem solving. Psychological Monographs, 54(1), (entire).
Luchins, A. S., & Luchins, E. H. (1959). Rigidity of behavior: A variational approach to the effects of Einstellung. Eugene, OR: University of Oregon Books.
Marmie, W. R., & Healy, A. F. (1995). The long-term retention of a complex skill. In A. F. Healy & L. E. Bourne, Jr. (Eds.), Learning and memory of knowledge and skills: Durability and specificity (pp. 30–65). Thousand Oaks, CA: Sage.
Miller, M. S. (2001). Specificity, transfer, and retention of spatial knowledge from navigation using maps, virtual, and real environments. Unpublished doctoral dissertation, The Catholic University of America, Washington, DC.
Miller, M. S., Clawson, D. M., Sebrechts, M. M., & Knott, B. A. (1998). Interface design for inducing and assessing immersion in VR. Poster presented at CHI ’98: Conference on Human Factors in Computing Systems, Los Angeles.
Moar, I., & Carleton, L. R. (1982). Memory for routes. Quarterly Journal of Experimental Psychology: Human Experimental Psychology, 34A(3), 381–394.
Owen, A. M. (1997). Cognitive planning in humans: Neuropsychological, neuroanatomical and neuropharmacological perspectives. Progress in Neurobiology, 54(3), 431–450.
Paquin, M. J., & Perry, G. P. (1990). Maintaining successful interventions in social, vocational, and community rehabilitation. Canadian Journal of Community Mental Health, 9(1), 39–49.
Parente, R., & Stapleton, M. (1997). History and systems of cognitive rehabilitation. NeuroRehabilitation, 8(1), 3–11.
Petheran, B. (1996). Exploring the home-based use of microcomputers in aphasia therapy. Aphasiology, 10(3), 267–282.
Piller, M. (2001). Virtual environments as a tool for transfer and recall of spatial information. Unpublished MA thesis, The Catholic University of America, Washington, DC.
Regian, J. W., & Yadrick, R. M. (1994). Assessment of configurational knowledge of naturally and artificially acquired large-scale space. Journal of Environmental Psychology, 14, 211–223.
Rizzo, A. A., Buckwalter, J. G., & van der Zaag, C. (2002). Virtual environment applications in clinical neuropsychology. In K. M. Stanney (Ed.), Handbook of virtual environments: Design, implementation, and applications (pp. 1027–1064). Mahwah, NJ: Lawrence Erlbaum Associates.
Rothbaum, B. O., Hodges, L., Smith, S., Lee, J. H., & Price, L. (2000). A controlled study of virtual reality exposure therapy for the fear of flying. Journal of Consulting and Clinical Psychology, 68(6), 1020–1026.
Satava, R. M., & Jones, S. B. (2002). Medical applications of virtual environments. In K. M. Stanney (Ed.), Handbook of virtual environments: Design, implementation, and applications (pp. 937–957). Mahwah, NJ: Lawrence Erlbaum Associates.
Sebrechts, M. M., & Knott, B. A. (1998). Learning spatial relations in virtual environments: Route and Euclidean metrics. Poster presented at the American Psychological Society 10th Annual Convention, Washington, DC.
Shepherd, A. (1989). Analysis and training of information technology tasks. In D. Diaper (Ed.), Task analysis for human–computer interaction. Chichester: Ellis Horwood.
Siegal, A. W., & White, S. H. (1975). The development of spatial representations of large-scale environments. In H. W. Reeve (Ed.), Advances in child development and behavior (Vol. 10). New York: Academic Press.
Singley, M. K., & Anderson, J. R. (1989). The transfer of cognitive skill. Cambridge, MA: Harvard University Press.
Stanney, K. M. (Ed.). (2002). Handbook of virtual environments: Design, implementation, and applications. Mahwah, NJ: Lawrence Erlbaum Associates.
Stuart, R. (1996). The design of virtual environments. New York: McGraw-Hill.
Thorndike, E. L. (1906). Principles of learning. New York: A. G. Seiler.
Thorndike, E. L., & Woodworth, R. S. (1901). The influence of improvement in one mental function upon the efficiency of other functions. Psychological Review, 9, 374–382.
Tracey, M. R., & Lathan, C. E. (2001). Interaction of spatial ability and motor learning in the transfer of training from a simulator to a real task. In J. D. Westwood, H. M. Hoffman, D. Stredney, & S. J. Weghorst (Eds.), Medicine meets virtual reality (pp. 521–527). Amsterdam: IOS Press.
Tulving, E. (1983). Elements of episodic memory. London: Oxford University Press.
Van Oosten, B. W., Truyen, L., Barkhof, F., & Polman, C. H. (1995). Multiple sclerosis therapy: A practical guide. Drugs, 49(2), 200–212.
Waller, D., Hunt, E., & Knapp, D. (1998). The transfer of spatial knowledge in virtual environment training. Presence, 7, 129–143.
Westwood, J. D., Hoffman, H. M., Stredney, D., & Weghorst, S. J. (Eds.). (2000). Medicine meets virtual reality. Amsterdam: IOS Press.
Wiederhold, B. K., & Wiederhold, M. D. (1998). A review of virtual reality as a psychotherapeutic tool. CyberPsychology & Behavior, 1(1), 45–52.
Wightman, D. C., & Lintern, G. (1985). Part-task training for tracking and manual control. Human Factors, 27, 267–283.
Witmer, B. G., Bailey, J. H., Knerr, B. W., & Parsons, K. C. (1996). Virtual spaces and real world places: Transfer of route knowledge. International Journal of Human–Computer Studies, 45, 413–428.
Witmer, B. G., & Kline, P. B. (1998). Judging perceived and traversed distance in virtual environments. Presence, 7(2), 144–167.
Woodrow, H. (1927). The effect of type of training upon transference. Journal of Educational Psychology, 18, 159–172.
Zelazo, P. D., Carter, A., Reznick, J. S., & Frye, D. (1997). Early development of executive function: A problem-solving framework. Review of General Psychology, 1(2), 198–226.
5 Beyond the Limits of Real-Time Realism: Moving from Stimulation Correspondence to Information Correspondence

Pieter Jan Stappers∗
Kees Overbeeke
Delft University of Technology
William Gaver
Royal College of Art
Immersive virtual reality (VR) refers to systems that replace people’s natural environments with synthesized perceptual stimuli, usually under interactive control via gestures and head movements. Virtual reality is a promising tool in application areas dealing with complex spatial problems, such as teleoperation, minimally invasive surgery, computer-aided design, architectural evaluation, and city planning. Many of these tasks currently impose a high burden on the user’s abilities for mental imagery, assisting the user only with two-dimensional views and TV images over which the user has no control. Virtual reality allows its users to apply exploratory perceptual and motor skills to their task rather than to spatial problem solving.
∗ Correspondence regarding this chapter should be addressed to Pieter Jan Stappers, Delft University of Technology, Faculty of Industrial Design Engineering, Landberghstraat 15, NL-2628CE Delft, the Netherlands. E-mail: [email protected]
Most work in developing VR has been aimed at reproducing the natural world, making the stimulation the observer receives indistinguishable from the “real thing.” We argue that this “stimulation correspondence” view of VR has a number of drawbacks if we want to make optimal use of VR as a tool. Correspondence to the natural world is not always necessary or even desirable, is often wasteful, and tends to inhibit the development of new possibilities that VR offers.

The basic problem of the realistic aim is that it puts an objective world first, and the user’s experience, his tasks, and the information he needs for those tasks second. From a Gibsonian standpoint, the problem of the “realistic” aim is similar to that of classical perception theories that suppose the observer makes a detailed internal reconstruction of the physical environment. As a result, much power is wasted on generating stimulation rather than information. However, if VR is treated as an expressive medium to be used as a tool rather than a slavish reproduction of the everyday world, interesting shortcuts and extensions emerge, including possibilities for creating more “expressive” forms of VR. Focusing on task-relevant information allows greater freedom in dealing with other aspects of appearance and interaction. They can be left out to improve performance. They can be left out for clarity (as in maps and diagrams). They can be changed, either for aesthetic reasons, to express an emotional tone, or to signal the status of the interface (as sketchiness signals openness).

In this chapter, we discuss some of the advantages this sort of “information correspondence” approach has over the more traditional stimulation correspondence perspective. In particular, we suggest that the Gibsonian concept of information in the specificational sense¹ may provide a useful basis for developing a theoretical understanding of the differences between these approaches, and for developing new VR tools more generally.
VIRTUAL REALITY

Virtual reality is a computer technology used to allow users to interact with a synthetic environment in such a way that the users can employ the exploratory skills that they have learned in interacting with the natural environment.² For a nontechnical introduction, see Biocca (1992).

¹ In this chapter, we will use the term information in this specificational sense (Gibson, 1979), rather than in the classical “bits transmitted” or “level of detail” sense of Shannon and Weaver. Therefore, in Fig. 5.1, we changed the labels “information” to “level of detail.”
² We will use the words synthetic to stand for the computer-simulated entities and natural for the everyday “real” environment, rather than the popular but misleading terms virtual and real.

VR is at its most promising when it allows users to experience direct spatial interaction with objects rather than mediated interaction with more or less
symbolic representations. This is especially the case for immersive VR, where the user is fully submerged into the virtual environment, not just looking at an augmented display stand. Such systems offer spatial support beyond that delivered by more conventional aids, such as pictures, photographs, or perspective computer graphics renderings. Users can employ their everyday sensorimotor skills for dealing with much of the spatial complexity in the problems they attack. With conventional means, they have to resort to spatial problem solving and mental imagery, extra hurdles to their objective (Smets, Stappers, Overbeeke, & Mast, 1995).

Typical application areas for immersive VR are teleoperation (remote control of robots, e.g., in hazardous situations), minimally invasive surgery (where surgeons operate using endoscopes instead of cutting the patient wide open), and product design, architecture, and urban planning (where designs can be experienced before they are physically implemented). All of these areas present spatial problems whose solution requires a human’s perception–action skills, for example, in coping with unpredictable, insufficiently formalized conditions where automated solutions fail (see, e.g., Brooks, 1988; Smets, 1995a).

The component technologies of VR can be split into three parts: rendering (output to the user), sensing (input from the user), and simulation (tying input and output together through a model world). All these must be performed and integrated to produce a real-time system upholding the perception–action cycle for the active user. Each of these parts is important, and important progress has been made in each of these technologies, but each in itself is insufficient to produce a VR experience. The properties of the whole system, such as response delays, determine the quality of the simulation.

As a technology, VR reflects an ecological paradigm in its emphasis on perception–action coupling to enable the user to become an active participant rather than a passive spectator. In fact, Gibsonian notions find resonance especially in this applied field more than in classical perception laboratories that have a long and venerable tradition in studying the reactions of passive subjects to arbitrary stimulation (Smets, 1995a; Smets et al., 1995).
A THEORY FOR DEVELOPING VR TOOLS: TWO APPROACHES

Virtual reality is still in, or only just out of, its infancy: If we overlook the closed field of flight and combat simulation, applications outside the laboratory are typically arcade games. All too often, the main asset of VR is still its technological gloss, not its power to allow users to be much more productive at real-world tasks (Krueger, 1995). A great deal of work is still necessary to realize the promise of VR. In what way should the development of VR be guided? Two approaches
present themselves: the “realist” aim of producing stimulation to mimic the natural environment, and the ecological aim of producing task-specific information.

Stimulation Correspondence

The mainstream of development seems to follow Sutherland’s much-quoted ideal: “The screen is a window through which one sees a virtual world. The challenge is to make that world look real, act real, sound real, feel real” (Sutherland, 1965). What this means hangs on how we understand “real.” The straightforward interpretation is that the synthetic world should reproduce the physical world, but not many people are prepared to strive for that level of correspondence. People are not sensitive to all of physics’ details. The motions of atoms and photons may well be ignored, as long as the user cannot tell the difference. This leads to the mainstream stance that successful VR rests on reproducing sensory stimulation: “The artificial world is simulated or synthesized by the appropriate stimulation of the observer; since so much of our experience and knowledge is directly derived from the senses, it is possible to fool the perceiver by making it difficult for them to discern that the world they are experiencing is artificial” (Christou & Parker, 1995). From this perspective, the higher the correspondence between the stimulation that the user receives from the synthetic world and the stimulation in a possible natural world, the higher the “level of realism” and the greater the sense of experienced reality.

The term presence is often used to express the ideal situation for most advocates of virtual reality, in which the user feels somehow located in the synthetic environment. Sheridan (1992; see also Biocca, 1992; Steuer, 1992), and likewise Zeltzer (1992), distinguish three components that determine the sense of presence: the detail of sensory stimulation (e.g., resolution of the rendering display), the ability to modify the environment (e.g., the amount of simulated interaction with the world), and the control of the sensors (e.g., the degree to which details of the user’s body movements are registered). These components, shown in Fig. 5.1,
FIG. 5.1. Sheridan’s “presence cube” (left) and Zeltzer’s Autonomy-Interaction-Presence cube (right); wordings were adapted to fit the definitions used in this article. [Figure labels: extent of sensory input, ability to modify environment, control over sensors; surface of equal presence; perfect presence; autonomy, interaction, presence; “Virtual Reality.”]
are similar to the technology components of rendering, simulation, and sensing, respectively. Each typically ranges on a scale of 0 to 1, with (0,0,0) meaning that the user is totally alienated from the system, and (1,1,1) “perfect presence.” Sheridan’s axes of rendering, sensing, and interaction go from 0 beyond 1, suggesting that it may be possible to render with more precision than the user’s senses can register. Yet the degree of presence is measured as a distance to the condition at (1,1,1). More appropriately, we suggest, scale values greater than 1 should indicate a level of presence, or engagement, greater than that with the everyday world. The degree to which people become engrossed in movies, books, computer games, and the like suggests the possibility that this sort of “hyper-presence” is not as absurd as it may sound. However, the stimulation-correspondence approach would seem, by definition, unable to create levels of engagement higher than that of the world being emulated.

Information Correspondence

From an ecological standpoint, neither physical nor stimulation correspondence is satisfactory. Both lay an undue emphasis on passive reception rather than active exploration, ignoring the importance of the perception–action coupling. Christou and Parker (1995) discuss direct perception where it concerns higher order image variables such as texture gradients and optic flow (Gibson, 1950, 1966), but not the later paradigm of the ecological approach (Gibson, 1979). Gibson’s most radical attack on established theory was his insistence that the observer–environment interaction is the foundation of perception–action theory. This leads to a focus on the user’s task rather than on properties of the observer or environment in themselves, and to a different concept of information, as structure specifying affordances (opportunities for action).

The information correspondence approach suggests that rather than trying to slavishly imitate stimulation, designers should focus on task requirements first, then on the information that guides these tasks, and finally on means of making that information accessible to the user. The solution may not be veridical, in that it may not be experienced as the “real thing.” Nonetheless, it may well lead to better tools. For instance, consider basing a VR system on Lee’s (1976a, 1976b) tau variable for the control of approaches to a collision (braking). Tau (time to contact) is an optical variable (the inverse of the relative rate of optical expansion) that can guide the task (braking in time to achieve a soft collision), without the need for geometrical variables (distance and velocity). Where a classical approach would aim to produce images of sufficiently high quality to make accurate distance and velocity judgments, the Gibsonian alternative is to make sure optical expansion is sufficiently rendered. Smets (1995b) shows how the direct approach can lead to innovative design solutions in product design in general and VR applications in particular. Gaver (1991) discusses how information specifying affordances can be applied to computer interface design in general.
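To make the tau example concrete, here is a minimal sketch (ours, not taken from Lee or from any system described in this chapter) of how a simulation loop might estimate time to contact directly from the optical angle subtended by an approaching surface and trigger braking; the sampling interval and the braking threshold are illustrative assumptions.

```python
def time_to_contact(theta_prev, theta_now, dt):
    """Estimate tau from two samples of the optical angle (in radians)
    subtended by an approaching surface: tau ~= theta / (d theta / dt)."""
    expansion_rate = (theta_now - theta_prev) / dt
    if expansion_rate <= 0:              # not looming, so no imminent contact
        return float("inf")
    return theta_now / expansion_rate

TAU_BRAKE = 1.5  # seconds; an assumed threshold, not a published constant

def brake_needed(theta_prev, theta_now, dt=1.0 / 60):
    """Decide whether to brake using only optical information."""
    return time_to_contact(theta_prev, theta_now, dt) < TAU_BRAKE
```

The point of the sketch is that only the optical expansion has to be rendered and tracked faithfully; no metric distances or velocities enter the computation at all.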
Comparing the Approaches

To reiterate, the mainstream of VR development is guided by the reproduction of stimulation; its success is judged by the level of correspondence between stimulation from natural and synthetic environments. On the other hand, the Gibsonian paradigm starts from task-specific information for active observers, and judges success by how well that information is presented. Gibsonians seek correspondence on the level of information rather than stimulation.

These two approaches may be seen as reflecting different goals. From the stimulation correspondence perspective, the experience of “presence” in the simulation seems to be the ultimate aspiration. Information correspondence, in contrast, focuses on effective action within the environment. From this perspective, if presence is taken to be the illusion of being in a world just as encompassing as the natural one—in just the same way—then it may have to be sacrificed for systems that work. However, if presence is taken more broadly to include the kind of total engagement people find in books, films, or games, then we believe that concentrating on information may prove the more effective approach.

The two approaches lead to different solutions, and both will be needed, especially in the short run, while processing power and display sizes are bottlenecks to any VR application. The Gibsonian theory provides new leads, but very few concrete examples of task-specific information have been worked out until now. The “realism” aim provides concrete directions, but too little guidance for developing effective tools for going beyond “realistic” applications, and for making trade-offs in the face of limited resources. In the next section, we develop these arguments using examples of successful simulations that do diverge from stimulation correspondence in various ways. In the following section, we compare the implications of stimulation correspondence and information correspondence for the three constituent technologies of VR: rendering, sensing, and simulation, and their real-time integration.
USEFUL WAYS OF AVOIDING REALITY

Departures from literal simulations of the natural world abound in VR and simulation in general. For instance, users do not die in simulated plane crashes, even symbolically (e.g., being unable to rerun the simulation). Users do not have to walk from place to place, at least in most simulated environments. They can fly by pointing their fingers, they can change their own body scale from dwarf to giant, or reach over great distances using nonlinear scaling of hand movements (see, e.g., Bowman & Hodges, 1997; Poupyrev, Weghorst, Billinghurst, & Ichikawa, 1996; Song & Norman, 1993; see Fig. 5.2).

Deviations from reality such as these should not merely be glossed as deficiencies of simulation software or hardware, but recognized as an inherent, even
FIG. 5.2. The virtual body (outline) of the VR user may scale nonlinearly with the real body (solid shape). In this way, arm stretches may allow the user to reach toward infinity in a virtual environment.
crucial, part of virtual reality. Without such freedom, VR systems could never offer more than the everyday world. For instance, an important finding from flight simulator studies is the occurrence of positive transfer: people can learn some tasks better in a simulator than with the real thing. The simulation can prevent the trainee pilot from tuning to terrain-specific tricks such as “Cut off engine thrust when the roof of the yellow house appears at the same height as the crooked tree.” This can be achieved either by changing the accidents of landscape on every landing, or by completely eliminating these contiguities through abstracted rendering of the landing strip and its surroundings.

VR engineering theory should encompass techniques for departing from reality in meaningful ways. In this section, we discuss two interesting models for potential developments: computer games and cartoons.
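Before turning to those models, one of the departures just mentioned, the nonlinear scaling of hand movements in Fig. 5.2, can be made concrete. The following is a minimal sketch in the spirit of such techniques (cf. Poupyrev et al., 1996); the threshold and gain are illustrative assumptions of ours, not parameters taken from any published system.

```python
import numpy as np

def virtual_hand_position(hand, chest, threshold=0.45, gain=10.0):
    """Map the real hand position to a virtual one.

    Within `threshold` meters of the chest the mapping is one-to-one, so
    fine manipulation near the body is unaffected; beyond it the extra
    reach grows quadratically, so a full arm stretch places the virtual
    hand far into the scene.
    """
    hand, chest = np.asarray(hand, float), np.asarray(chest, float)
    offset = hand - chest
    r = np.linalg.norm(offset)
    if r < threshold or r == 0.0:
        return hand
    r_virtual = r + gain * (r - threshold) ** 2
    return chest + offset * (r_virtual / r)
```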
Computer Games

Computer games, one of the most thoroughly developed regions of 3-D interactive applications, often cheat at mimicking the natural world. The most striking examples are found in 3-D computer games like DOOM (id Software) and Tomb Raider (Core Design), which sport impressive graphics, engaging game play, and high user performance in a 3-D “roam-a-maze-and-shoot-monsters” task. These examples of “light entertainment” compare impressively to contemporary simulations shown at scientific conferences, in which users performed tasks of less complexity with less skill. What lessons can they teach us about creating effective simulations?

A first lesson may be that constraints on interaction can bring advantages to its quality. One example is the graphics in DOOM (see Fig. 5.3, left), which were very fast and sufficiently spatial to give the user a convincing 3-D impression. The game features a first-person viewpoint, allowing the user to move horizontally and turn, but not look up or down. The environmental layout and game challenges are likewise restricted so that the user has no need to look up and down. By eliminating this degree of interaction, the rendering could be made much faster than if the user were allowed to look in arbitrary directions. Users hardly notice this lack, which is more than compensated for by the gain in speed.
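A sketch of what that interaction constraint amounts to (our illustration, not id Software's actual code): the camera controller simply has no pitch degree of freedom, so the renderer never has to draw arbitrarily tilted views.

```python
import math

class YawOnlyCamera:
    """A camera that can translate in the ground plane and turn (yaw),
    but cannot look up or down, the constraint exploited by DOOM-style
    rendering."""

    def __init__(self, x=0.0, y=0.0, yaw=0.0):
        self.x, self.y, self.yaw = x, y, yaw

    def update(self, forward, strafe, turn, pitch_input=0.0):
        # Pitch input is accepted from the device but deliberately ignored.
        self.yaw += turn
        self.x += forward * math.cos(self.yaw) - strafe * math.sin(self.yaw)
        self.y += forward * math.sin(self.yaw) + strafe * math.cos(self.yaw)
```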
FIG. 5.3. Examples of graphic environments in the first-person computer game DOOM (left) and “undulating colors and perspective” effects in underwater scenes of the second-person game Tomb Raider (right).
A second lesson may be that approximations of perceptual information are often sufficient. In Tomb Raider’s underwater scenes, for example, tricks like bit-map color cycling and sinusoidal image warping are used to produce a compelling visual effect even though careful scrutiny reveals the fake for what it is (see Fig. 5.3, right). The effects are not “correct,” but good enough to produce a convincing experience. An examination of these sorts of heuristics, discovered under the pressures of computer game development, might well give clues to the higher level information we use in the everyday world.

A third lesson goes beyond this to suggest that different forms of perceptual information may replace one another. For example, the interaction in Tomb Raider is a break with the perspective of other games. Instead of using a traditional third- or first-person perspective (i.e., the views used by games like PacMan, or Doom and Quake, respectively), Tomb Raider uses a second-person perspective, in which the user views the world through the eyes of a virtual cameraman following the heroine. An important advantage of the second-person perspective is that it provides new sources of information about the actor’s environment, sources that are usually impoverished in VR due to poor field of view and proprioceptive feedback. Performing action in the virtual world is easier than with either first- or third-person perspective because the body of the heroine is viewed against the context of the environment around her. This is similar to the behind-the-tail view provided by many flight simulators, a viewpoint that no pilot has, but that many users prefer to the in-cockpit view. This may be an example of replacing information we gather in the natural world—available through proprioception and peripheral vision—with information only available via simulations.
There are other advantages to the “second-person” perspective used in Tomb Raider as well. First, it makes for a dramatic narrative experience (Laurel, 1992), with the camera motions used as in movies to add drama and perspective. Search scenes in action movies similarly achieve suspense by showing the hero(ine) rather than what he or she sees (Boorstin, 1995). Finally, user control over the heroine is shifted from an effector level (controlling head and hand motions) to an intention level, as if the user were telling the heroine where to go but leaving details of that action to the heroine figure. For example, the user indicates the direction of motion, but the heroine figure stops before she hits an obstacle. Again, these features could be analyzed in terms of the information and affordances offered by the systems, and appropriated for use in more “functional” forms of VR.

There are more general lessons to be drawn from computer games as well. First, even state-of-the-art academic and research VR environments can often be improved by good designers (e.g., graphic or product designers, or architects) with a clear view of the user’s experience and task. They may not always be able to articulate the effects they achieve, but these may be subjected to later analysis as necessary. Second, as DOOM showed, processing power is not always a bottleneck. This reinforces a general argument we are making in this chapter. Most generally, some games exhibit successful deviations from the natural world, and some deviations are more successful than others. Their success is not arbitrary, but perhaps an important clue to more effective VR, and even to perceptual information in the natural world. In sum, we suggest that we should not neglect our games in doing our work.

Illustration and Cartooning

A basic lesson in illustration is that photorealism, the reproduction of the light reflected by a scene, is not necessary or even desirable for all tasks. This is known from static rendering in the arts of painting, illustration, cartooning, and sketching. After the advent and perfection of photography, many artists abandoned the struggle for photorealism in paintings and instead pursued the creation of other realities.1 In illustration, for example, medical illustration or cartography, the photograph has not replaced the line drawing (Tufte, 1983). Drawings are more effective than photographs at conveying certain kinds of information, especially in separating what is the necessary structure from merely contiguous detail. Good cartoonists need only a few strokes to successfully depict a person or an emotion, and cartoons appear especially effective at drawing emotional responses from readers. The Japanese Manga style, for instance, uses cartooning styles for its heroes and heroines, and realistic styles for their foes. The cartoonish rendering is thought to promote readers’ identification with the heroes to increase the experience of participation or presence (McCloud, 1993).

Sketching by designers and artists similarly goes beyond representational realism. In the early stages of design, sketches are vague and ambiguous, and these qualities are needed for the sketches to serve their purpose, which is to sustain and direct new ideas. An example is shown in Fig. 5.4. The controlled vagueness
FIG. 5.4. Sketch for automotive design, with an emphasis on the use context rather than form geometry. (Sketch by Rick Porcelijn.)
FIG. 5.5. Samples of geometric, realistic, and cartoonish rendering. (Model by Daniel Saakes.)
of design sketches is an asset rather than a drawback (Coyne & Snodgrass, 1993; Tovey, 1989). Designers complain that the cold, dead, and overly precise rendering styles of CAD packages, let alone the distracting way in which they are operated, make them utterly useless and even detrimental for early design. In the automotive industry, one of the most advanced as far as the presence of heavy CAD computers is concerned, stylists stick to sketching on paper until their ideas have crystallized to a definite form. Only then do they move to the computer (Tovey, 1992).

Recent years have seen a rapid development of nonphotorealistic rendering in computer-aided design under the name of expressive rendering, allowing designers to output sketchlike drawings from geometric models (Lansdown & Schofield, 1995; Meier, 1996; Schumann, Strothotte, Raab, & Laser, 1996; see Fig. 5.5). Currently, the designers still have to first create that geometry using a traditional CAD interface. But work on sketchlike and gestural input is also under way, allowing users to specify shapes by quick strokes of a pen or other input device (Hummels & Stappers, 1998; Overbeeke, Kehler, Hummels, & Stappers, 1997; Smets et al., 1994; Zeleznik, Herndon, & Hughes, 1996).

Nonphotorealistic techniques are powerful tools for separating necessary visual information from accidental and contiguous optical structure. Expressive
renderings, such as cartoons, can be used as shortcuts, depicting the same information as photographs in a more economical way. They can also be used as extensions, depicting information that is not present in photographs. Designers, especially, have used drawings as tools to be chosen, modified, discarded, and tuned to fit the purposes of their users—and it seems likely that the visual qualities of their drawings serve to indicate how provisional they are. Designers’ collaging techniques, and McCloud’s Manga example, also indicate that photorealistic and expressive techniques are not mutually exclusive, but may be used in combination to create different effects for different aspects of the same representation.

If the stimulation correspondence view of virtual reality is relaxed, it should be clear that there are many lessons to be drawn from nonphotorealistic representations. Simpler, more diagrammatic VR worlds could not only be clearer than more realistic ones, but more expressive as well. Different styles could be used in tandem to differentiate relevant parts of the world or to distinguish different actors. Worlds might even appear overtly “sketchy” as they are developed, either to indicate that they are works in progress or to suggest that users might change them. It is important to note, however, that not all simplifications of photorealism are equally effective. In particular, many of the low-resolution, “blocky” VR worlds created due to processing or hardware limitations are neither effective nor attractive. In order to exploit the potential advantages of expressive renderings, the skills of illustrators, cartoonists, or designers must be used or at least understood. Once again, such a pursuit may have the added bonus of providing clues to the visual information we use in the everyday world.

These are examples of success stories that do not rely on stimulation correspondence. We have no indication that these successes were based on an ecological approach. Instead, skill and artistry seem to have been guiding forces. Yet a number of features in these systems seem to resonate so well with the Gibsonian approach that they might well have been derived from it. For example, in a paper explaining the ecological approach, Pittenger (1983) discusses how an ecological information analysis could deal with such unnatural but principled phenomena as Superman’s X-ray vision. We suggest that a better look at the world of games and cartoons might yield many other phenomena amenable to such analyses, and that such an investigation might be of both theoretical and practical importance.
INFORMATION CORRESPONDENCE FOR VR COMPONENT TECHNOLOGIES

By focusing not on stimulation at the sensory level, but on information at the task level, the Gibsonian approach can offer shortcuts to technology, just as stimulation correspondence offers shortcuts as compared to full physical realism. In addition, the Gibsonian approach may be useful not only to describe how we are sensitive to the natural environment, but also beyond the natural habitat, to the extensions VR has to offer. Moreover, understanding the extended reality of VR may shed
insights into how we act and perceive in the natural world as well. In this section, we discuss two examples regarding the component technologies of rendering and simulation, and their integration. For sensing technology, as yet the least advanced of the three, we found no comparably good examples.

Rendering

The realistic aim has been most prominently visible in computer graphics, where the most striking technological advances have been made. High-fidelity computer graphics aim at achieving the optical quality of the photograph. However, an ecological perspective may lead us in a different direction when considering rendering. A good example of how the Gibsonian approach has helped us in developing a VR environment lies in the information carried by the visual horizon (Smets et al., 1994). It shows how task-related information is more important than pure “pixel power.”

People are sensitive to the visual horizon: They use it to perceive distance and scale in the environment (Mark, 1987; Sedgwick, 1973; Warren, 1984; Warren & Whang, 1987). The visual horizon provides scale information for objects standing on a flat ground surface. For instance, it is easier to see that one of the trees in the background of Fig. 5.6c is closer, not bigger, than the other, because its roots appear lower in the picture. People can estimate the size of objects that reach to the ground: The portion of an object that falls below the visual horizon is exactly as tall as the observer’s eye height above the ground. If there is twice as much tree above the horizon as below it, the tree must be three times as high as the observer. The horizon ratio thus expresses object size information in terms of the observer’s body scale.

In many VR-CAD attempts in the early 1990s, the user floated in empty space with only the object he was creating, much like an astronaut in space, and with similar problems of disorientation (see Fig. 5.6a). It is very hard for the VR user to estimate the distance and size of the ball shown in the picture, even though all the geometry, the visual angles, may be correct, and even though the user can move his head and view the object in 3-D. This situation can be improved if we add an explicit visual horizon. This is a horizontal separation between ground and sky, at infinity, which helps users in orienting themselves and scaling their environment (Fig. 5.6b). The visual horizon is also implicitly specified by texture gradients on the ground, that is, objects or patterns of approximately the same size strewn over the ground. These structures converge at the horizon. The texture gradients may be given either by “bit-map textures” or by random patches such as the trees and hatch marks in Fig. 5.6c. The latter may be computationally cheaper and sometimes provide a more convincing spatial impression than the flat-and-polished look that bit-mapped textures often give.

A limitation of horizon scaling is that it only works for objects connected to the ground, such as the trees in Fig. 5.6c. But consider the two flying balls in Fig. 5.6d. One is twice as far away as the other, but also twice as large.
FIG. 5.6. Developing a VR environment that supports sensitivity to the visual horizon (from Stappers, 1998). The pictures show the user’s view; the user’s body is shown on the left for reference to the horizon height. (a) empty space with a floating representation of the user’s hand; (b) a visual horizon provides a ground and disambiguates the scale experience; (c) the horizon intersection specifies the height of the tree in proportion to the viewer’s eye height; (d) a textured ground works even more strongly, and the horizon line specifies how far the trees exceed the viewer’s eye height; (e) floating balls are ambiguous with respect to size and distance; (f) drop shadows resolve the ambiguity in both respects.
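The horizon relation sketched in the caption can be stated compactly. Writing E for the observer's eye height, and a and b for an object's projected extents above and below the horizon in the picture plane (our notation, assuming a flat ground plane and an object resting on it), the object's height H follows from the fact that the horizon cuts every such object exactly at eye height:

```latex
H \;=\; E\,\frac{a+b}{b} \;=\; E\left(1 + \frac{a}{b}\right)
```

Twice as much tree above the horizon as below it thus gives H = 3E, as in the example in the text.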
You can’t see which is which in the picture, and even with a stereoscopic display it is hard to tell which is nearer. Yet if we add drop shadows—dark discs on the ground below a floating object—the ambiguity is instantly gone (Fig. 5.6e). This technique is often seen in computer games, but less frequently in “serious” VR applications, “because it isn’t consistent with the lighting model.” The important perception-theoretical aspect here is that it need not fit photorealism: drop shadows tap into a human perceptual mechanism that works better than the “realistic” lighting algorithms. Even if the light sources shine from the side, the dark disk below the object retains its informational value.
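A drop shadow of this kind is trivial to add and entirely independent of the lighting model. A minimal sketch (ours; the scene-graph calls are hypothetical pseudocode rather than any particular toolkit's API):

```python
def add_drop_shadow(scene, obj, ground_y=0.0, darkness=0.35):
    """Place a dark disc on the ground plane directly below `obj`.

    The disc ignores the scene's light sources entirely; its only job is
    to specify to the observer where the object stands relative to the
    ground, and hence its distance and size.
    """
    x, y, z = obj.position
    radius = 0.5 * obj.bounding_radius                      # roughly match the object
    disc = scene.add_disc(center=(x, ground_y + 0.001, z),  # offset avoids z-fighting
                          radius=radius,
                          color=(0.0, 0.0, 0.0),
                          opacity=darkness)
    disc.unlit = True                                       # not affected by lighting
    return disc
```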
The horizon scaling theory thus provides an a posteriori explanation for the success of one expressive rendering technique, the drop shadow. Although such a theoretical explanation can be devised after the fact, in this case art appears to have preceded theory: drop shadows are known from early cartoon animations and may even have been used earlier. By explaining which information the drop shadow carries (observer-scaled size and distance), the theory can help us determine for what, when, and how drop shadows can best be applied, and perhaps how the technique can be generalized.

Simulation

Simulation is what the VR computer does besides rendering and sensing. It manages the state of all the simulated objects in the environment, including a representation of the user’s state, and all the interactions between these objects. Full physical realism is theoretically impossible, as physical theory becomes intractable for any but the most simplistic situations. This means that physical modeling is doomed to approximation, and the findings of chaos theory have shown that for many physical systems accuracy comes at an unrealistic price, especially if it has to be achieved in real time. Physics in VR will therefore have to be approximated in a ruthless way.

The “realist” aim of reproducing sensation as in the natural world suggests we must make simulations mimic the environment’s physics on the environment’s scale, and mostly this has been interpreted as using classical physics for dynamics. In most VR simulation management packages, the one typical behavior for objects is that you can assign gravity to them, which causes them to fall down or move in parabolic paths when given a velocity by throwing or hitting. However, work on intuitive physics (e.g., Clement, 1982; McCloskey, 1983; Kaiser, McCloskey, & Proffitt, 1986) has shown that people are quite unfamiliar with Newtonian physics, and suggests that they reason and act by notions more similar to medieval or Aristotelian models. Examples are a preference for linear motions, inertia along curved paths, and everything grinding to a halt as soon as we stop pushing it. If our intuitions, perception, and action so closely fit these alternative models, then the latter may be more useful candidates for simulation dynamics than the “veridical” ones of classical dynamics. They may serve as shortcuts where the Newtonian models become computationally expensive (Poston & Fairchild, 1993). Moreover, they may serve as a “reason to the madness” of apparent magic in VR tools that allow the user to fly or stretch his body.

To explore such alternative dynamics, we ran an exploratory study in which subjects threw balls at a stationary target in virtual reality (Stappers, 1997, 1998).
FIG. 5.7. Throwing a ball in VR. Left: sample trajectories under different dynamic models. Right: sample user view of the VR environment.
In different conditions, the simulation used different dynamic models for the flight of the ball, making it move according to Newton's (parabola), Aristotle's (straight line), or a medieval (circle) model of dynamics. These trajectories look strange in a third-person view, shown in Fig. 5.7, but quite acceptable from the first-person viewpoint of the immersive VR user. This was borne out by our data: not only could participants hit the targets well with each of the dynamics, but none of the subjects were surprised by the different motions. Participants readily interpreted the events in experiential terms (e.g., with the circular trajectories, a few subjects reported that it felt like throwing against the wind).

In the same study, we also asked participants to raise or lower the parameter underlying the dynamics until the trajectories of the balls they threw appeared to move as in the natural world. For the Newtonian model, stimulation correspondence would predict that subjects set the value of gravity to g = 9.8 m/sec². Instead, participants picked an acceleration that was always much lower, typically half the value of g. This contrasts with the findings of a similar experiment in which subjects adjusted the simulated value of gravity in an animation of a water fountain shown on a video display (Stappers & Waller, 1993); there, participants adjusted g to values much closer to that in the natural world. The difference may be due to the observer's viewpoint (first person in the ball-throwing experiment, third person for the fountain), or to the 80-msec delay experienced in the VR ball-throwing experiment. With delay, as with lower frame rates, VR users tend to move more slowly themselves, and thus may experience "slower" events in the environment as more natural.

Experiments like these suggest that the literature on intuitive physics can provide some foothold for building a theory of "expressive" dynamics for VR simulation. Expressive dynamics may be used as computational shortcuts, or for defining behaviors for virtual tools (motion dynamics of magnets or lassos in VR CAD) or virtual agents (behavior patterns of semiautonomous simulated life forms).
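To illustrate how cheap such alternative dynamics are to compute, here is a minimal sketch of the three flight models just described. The Newtonian case is standard; the particular parameterizations of the Aristotelian (straight-line) and medieval (circular-arc) models are illustrative assumptions, not necessarily those used in the experiment.

```python
import math

# Ball-flight positions in a vertical x-y plane, for three dynamic models.

def newton(p0, v0, t, g=9.8):
    """Parabolic flight: constant downward acceleration g."""
    (x0, y0), (vx, vy) = p0, v0
    return (x0 + vx * t, y0 + vy * t - 0.5 * g * t * t)

def aristotle(p0, v0, t):
    """Straight-line flight: the ball keeps its launch direction and speed."""
    (x0, y0), (vx, vy) = p0, v0
    return (x0 + vx * t, y0 + vy * t)

def medieval(p0, v0, t, turn_rate=0.8):
    """Circular-arc flight: speed stays constant while the heading rotates
    downward at a fixed rate (rad/sec), so the path is an arc of a circle."""
    (x0, y0) = p0
    speed = math.hypot(v0[0], v0[1])
    h0 = math.atan2(v0[1], v0[0])
    h1 = h0 - turn_rate * t
    x = x0 + (speed / turn_rate) * (math.sin(h0) - math.sin(h1))
    y = y0 - (speed / turn_rate) * (math.cos(h0) - math.cos(h1))
    return (x, y)

# Hypothetical throw: start 1.5 m up, 6 m/s forward, 3 m/s upward.
launch = ((0.0, 1.5), (6.0, 3.0))
for t in (0.0, 0.25, 0.5, 0.75):
    print(t, newton(*launch, t), aristotle(*launch, t), medieval(*launch, t))
```

Halving g in the Newtonian model, as participants tended to prefer, is a one-parameter change; none of the models costs more than a few arithmetic operations per frame.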
It seems clear that expressive dynamics do not fit a stimulation correspondence view of VR. To what degree they might be explained by a Gibsonian approach to information, perhaps by reflecting dynamics in the complex environment of the everyday world, is still an open question.

Integrated System Evaluation

Although looking at the component technologies helps to structure the examples we just discussed, the quality of a VR environment can only be tested when all three are integrated, so that rendering and simulation can be tested as interactive feedback to user actions, not as movies made of external events. For instance, in the previous section we mentioned that system delay may affect the experienced quality of a simulation; this is a property of the system's integration rather than of rendering, sensing, or simulation alone.

The differences between isolated and integrated testing can be remarkable and have an important impact on tuning technologies to tasks. An example was found by Smets and Overbeeke (1995), who studied the visual resolution subjects needed for solving a picture puzzle in a telepresence setup. Participants were seated at a table and asked to put together a puzzle. However, they could view the puzzle pieces and their hands only through a head-mounted display (see Fig. 5.8). The image on the display was obtained by a camera that was either stationary overlooking the scene, moved from side to side by an external mechanism, or fixed to the head-mounted display. Moreover, a video processor between the camera and display was used to reduce spatial resolution and the number of grey levels. Thus, three levels of interaction (camera movement), three of spatial resolution, and two of temporal resolution were investigated. A main result of this study was that subjects could complete the task with a resolution as low as 18 × 15 pixels, but only when they had active control over the camera movements.
FIG. 5.8. Experimental setup in the resolution experiment, showing image viewing conditions varying in level of interactivity (left to right: active, passive, and still).
This stands in stark contrast to values claimed necessary for convincing VR, typically hundreds or thousands of pixels and hundreds of intensity levels. The latter values were derived from judgments of passive animation sequences, with subjects judging veridicality. As this experiment indicates, requirements appear quite different when assessed using VR as a tool to perform a task, such as putting a spatial puzzle together.

These findings reflect a central concept of Gibson's theory and a central technology of VR: the linking of perception and action in a closed cycle. Numerous experiments, among them Gibson's (1962) experiments on active touch, have found dramatic differences in task performance between active, passive, and static conditions. Yet most of our technology is still tested with static or passive users, often because it is component technologies that are tested, not integrated systems. In addition to a passive psychophysics (for a VR example, see Barfield, Hendrix, Bjorneseth, Kaczmarek, & Lotens, 1995), we need an active psychophysics that deals with people performing sensorimotor tasks (Flach, 1990; Flach & Warren, 1995).
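To give a feel for how impoverished an 18 × 15 pixel view with a handful of grey levels actually is, here is a minimal sketch of the kind of degradation the video processor applied; the array sizes and level counts are illustrative assumptions, and NumPy is used only for convenience.

```python
import numpy as np

def degrade(image, out_rows=15, out_cols=18, grey_levels=4):
    """Reduce an image's spatial resolution and number of grey levels.

    image: 2-D array of floats in [0, 1] (a monochrome camera frame).
    Returns an (out_rows, out_cols) array quantized to `grey_levels` values,
    roughly the kind of impoverished view used in the puzzle study.
    """
    rows, cols = image.shape
    r_edges = np.linspace(0, rows, out_rows + 1).astype(int)
    c_edges = np.linspace(0, cols, out_cols + 1).astype(int)
    small = np.empty((out_rows, out_cols))
    for i in range(out_rows):          # block-average each output pixel
        for j in range(out_cols):
            block = image[r_edges[i]:r_edges[i + 1], c_edges[j]:c_edges[j + 1]]
            small[i, j] = block.mean()
    return np.round(small * (grey_levels - 1)) / (grey_levels - 1)

frame = np.random.rand(480, 640)       # a synthetic 480 x 640 camera frame
print(degrade(frame).shape)            # (15, 18)
```

Viewed passively, such a frame is nearly useless; the point of the experiment is that with active control over the camera it was nonetheless enough to do the task.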
CONCLUSIONS

Virtual reality technology holds promise as a tool for solving spatial problems, but its development still has a long way to go. In this chapter we have sketched some problems and limitations of the mainstream approach, which holds that VR should try to reproduce stimulation as it occurs in the natural environment. Against this view we have placed the Gibsonian view that VR technology is a tool, and as a tool it should convey task-specific information to its user; that information, however, may contain shortcuts from and extensions to the natural world. We also sketched how elements of ecological theory may provide a theoretical framework for further developing some of the tricks of the simulation trade that have come about through artistry and craft rather than theory. Harnessing the power of these tricks into useful tools is a necessary part of making VR fulfill its promise of creating the experience of environments that may in some ways be richer than the one we meet in everyday life.

Releasing VR from the obligation of mimicking the natural world brings freedom for more expressive variants, which are tuned to the user's experience rather than to the natural world. This may have practical and philosophical implications in multiuser virtual environments: if everyone's world becomes personally adapted, a single shared environment no longer exists. Solipsism then becomes a real, if questionable, possibility. The experiences of users in such environments need not even be consistent. It is easy to imagine an arcade game in which two players battle it out as knights, but each player sees himself as the knight in white armor and the other as the knight in black. The outcome of the fight is clear to both, but which color has won depends on which player you ask. Although both contestants have participated in the same actions and events, they did not share a single "reality" at the level of stimulation.

Natural and expressive VR are not exclusive, just as stimulation correspondence and information correspondence may coexist for some purposes. When a
car designer wants to evaluate a new model in VR rather than by building an expensive foam-and-clay model, the virtual model should "look real, act real, sound real, feel real," because the virtual car model is a tool for evaluating the envisaged real product. But in creating the virtual model, the user should be helped with all the tricks we can offer. These tricks will have to be intuitive and clear to use, and they must express how and to what effect they can be used: to be useful, they must convey their information; they must "look good, act good, sound good, and feel good" without forcing the designer to do metalwork sweating at a forge.

An ecological approach to virtual reality suggests that such systems should build on the information conveyed in the natural world, not the stimulation. We have shown a number of examples in which systems are both engaging and effective at allowing people to perform tasks, yet do not emulate the appearance of physical reality. Rather, they seem to work by abstracting and focusing on information, information in a Gibsonian sense, to provide users with access to a lawful virtual world. Such systems may have developed through artistry and the craft of computer science, but they seem congruent with the Gibsonian approach. We suggest that the ecological approach might amplify the achievements of nonveridical VR and that, conversely, advances in expressive VR may provoke new insights for an ecological approach to perception in the natural world.

REFERENCES

Barfield, W., Hendrix, C., Bjorneseth, O., Kaczmarek, K. A., & Lotens, W. (1995). Comparison of human sensory capabilities with technical specifications of virtual environment equipment. Presence, 4, 329–356.
Biocca, F. (1992). Virtual reality technology: A tutorial. Journal of Communication, 42(4), 23–72.
Boorstin, J. (1995). Making movies work: Thinking like a filmmaker. Los Angeles, CA: Silman-James.
Bowman, D. A., & Hodges, L. F. (1997). An evaluation of techniques for grabbing and manipulating remote objects in immersive virtual environments. In Proceedings of the Symposium on Interactive 3D Graphics, 35–38.
Brooks, F. P., Jr. (1988). Grasping reality through illusion: Interactive graphics serving science. In Proceedings of the Conference on Human Factors in Computing Systems (CHI '88). New York: ACM Press, 1–11.
Carr, K., & England, R. (Eds.). (1995). Simulated and virtual realities: Elements of perception. London: Taylor & Francis.
Christou, C., & Parker, A. (1995). Visual realism and virtual reality: A psychological perspective. In K. Carr & R. England (Eds.), Simulated and virtual realities. London: Taylor & Francis.
Clement, J. (1982). Students' preconceptions in introductory mechanics. American Journal of Physics, 50, 66–71.
Coyne, R., & Snodgrass, A. (1993). Rescuing CAD from rationalism. Design Studies, 14, 100–123.
Flach, J. M. (1990). Control with an eye for perception: Precursors to an active psychophysics. Ecological Psychology, 2, 83–110.
Flach, J., Hancock, P., Caird, J., & Vicente, K. (Eds.). (1995). Global perspectives on the ecology of human–machine systems. Hillsdale, NJ: Lawrence Erlbaum Associates.
Flach, J. M., & Warren, R. (1995). Active psychophysics: The relation between mind and what matters. In J. Flach, P. Hancock, J. Caird, & K. Vicente (Eds.), Global perspectives on the ecology of human–machine systems (Vol. 1). Hillsdale, NJ: Lawrence Erlbaum Associates.
Gaver, W. (1991). Technology affordances. In Proceedings of the Conference on Human Factors in Computing Systems (CHI '91). New York: ACM Press, 79–84.
Gaver, W. W. (1996). Situating Action II: Affordances for interaction: The social is material for design. Ecological Psychology, 8, 111–129.
Gibson, J. J. (1962). Observations on active touch. Psychological Review, 69, 477–491.
Gibson, J. J. (1966). The senses considered as perceptual systems. Boston: Houghton Mifflin.
Gibson, J. J. (1979). The ecological approach to visual perception. Boston: Houghton Mifflin.
Hummels, C. J., & Stappers, P. J. (1998). Meaningful gestures for human–computer interaction: Beyond hand postures. In Proceedings of the Third IEEE International Conference on Automatic Face and Gesture Recognition. Los Alamitos, CA: IEEE Computer Society Press.
Kaiser, M. K., McCloskey, M., & Proffitt, D. R. (1986). Development of intuitive theories of motion: Curvilinear motion in the absence of external forces. Developmental Psychology, 22, 67–71.
Krueger, M. W. (1995). When, why, and whether to experience virtual reality. Proceedings of Virtual Reality World 1995, 477–481.
Lansdown, J., & Schofield, S. (1995). Expressive rendering: A review of nonphotorealistic techniques. IEEE Computer Graphics and Applications, 15(3), 29–37.
Laurel, B. (1992). Computers as theatre. Reading, MA: Addison-Wesley.
Lee, D. N. (1976a). The optic flow field: The foundation of vision. Proceedings of the Royal Society of London, B290, 169–179.
Lee, D. N. (1976b). A theory of visual control of braking based on information about time-to-collision. Perception, 5, 437–459.
Mark, L. S. (1987). Eyeheight-scaled information about affordances: A study of sitting and stair climbing. Journal of Experimental Psychology: Human Perception and Performance, 13, 361–370.
McCloskey, M. (1983). Intuitive physics. Scientific American, 248, 114–122.
McCloud, S. (1993). Understanding comics: The invisible art. Princeton, NJ: Kitchen Sink Press.
Meier, B. J. (1996). Painterly rendering for animation. In 23rd SIGGRAPH Computer Graphics Proceedings (Annual Conference Series), 477–484.
Overbeeke, C. J., Kehler, T., Hummels, C. C. M., & Stappers, P. J. (1997). Exploiting the expressive: Rapid entry of car designers' conceptual sketches into the CAD environment. In D. Roller (Ed.), Proceedings of ISATA-2 97, 30th International Symposium on Automotive Technology and Automation (pp. 243–250). Croydon, UK: Automotive Automation.
Pittenger, J. B. (1983). On the plausibility of Superman's X-ray vision. Perception, 12, 635–639.
Poston, T., & Fairchild, K. M. (1993). Virtual Aristotelian physics. In Proceedings of the 1993 Virtual Reality Annual International Symposium (pp. 70–74). Los Alamitos, CA: IEEE Computer Society Press.
Poupyrev, I., Weghorst, S., Billinghurst, M., & Ichikawa, T. (1996). The go-go interaction technique: Nonlinear mapping for direct manipulation in VR. In Proceedings of the ACM Symposium on User Interface Software and Technology (UIST). New York: ACM Press, 79–80.
Schumann, J., Strothotte, T., Raab, A., & Laser, S. (1996). Assessing the effect of non-photorealistic rendered images in CAD. In Proceedings of the Conference on Human Factors in Computing Systems (CHI '96). New York: ACM Press, 35–41.
Sedgwick, H. A. (1973). The visible horizon: A potential source of visual information for the perception of size and distance. Unpublished doctoral dissertation, Cornell University. (University Microfilms No. 73–22530)
Sheridan, T. (1992). Musings on telepresence and virtual presence. Presence, 1, 120–126.
Smets, G. J. F. (1995a). Designing for telepresence: The Delft Virtual Window System. In P. Hancock, J. Flach, J. Caird, & K. Vicente (Eds.), Local applications of the ecological approach to human–machine systems (pp. 182–207). Hillsdale, NJ: Lawrence Erlbaum Associates.
Smets, G. (1995b). Industrial design engineering and the theory of direct perception and action. Ecological Psychology, 7, 329–374.
Smets, G. J. F., & Overbeeke, C. J. (1995). Trade-off between resolution and interactivity in spatial task performance. IEEE Computer Graphics & Applications, 15, 46–51.
Smets, G. J. F., Overbeeke, C. J., & Stratmann, M. H. (1987). Depth on a flat screen. Perceptual & Motor Skills, 64, 1023–1034.
Smets, G. J. F., Stappers, P. J., Overbeeke, C. J., & v.d. Mast, C. (1995). Designing in virtual reality: Implementing perception–action coupling and affordances. In K. Carr & R. England (Eds.), Simulated and virtual realities: Elements of perception. London: Taylor & Francis.
Song, D., & Norman, M. (1993). Nonlinear interactive motion control techniques for virtual space navigation. In Proceedings of the IEEE Virtual Reality Annual International Symposium (pp. 111–117). Los Alamitos, CA: IEEE Computer Society Press.
Stappers, P. J. (1997). Intuitive physics in the design of virtual environment interfaces. In G. J. Torenvliet & K. Vicente (Eds.), Proceedings of the Ninth International Conference on Perception and Action (p. 102). Toronto, Canada: University of Toronto.
Stappers, P. J. (1998). Experiencing non-Newtonian physics in VR. In M. Bevan (Ed.), Proceedings of Virtual Reality in Education & Training '97 (pp. 173–179). Loughborough, England: UR News.
Stappers, P. J., & Waller, P. E. (1993). Using the free fall of objects under gravity for visual depth estimation. Bulletin of the Psychonomic Society, 31, 125–127.
Steuer, J. (1992). Defining virtual reality: Dimensions underlying telepresence. Journal of Communication, 42, 73–93.
Sutherland, I. E. (1965). The ultimate display. In Information Processing 1965: Proceedings of IFIP Congress 65 (Vol. 2, pp. 506–508). Laxenburg, Austria: International Federation for Information Processing.
Tovey, M. (1989). Drawing and CAD in industrial design. Design Studies, 10, 24–39.
Tovey, M. (1992). Intuitive and objective processes in automotive design. Design Studies, 13, 23–41.
Tufte, E. R. (1983). The visual display of quantitative information. Cheshire, CT: Graphics Press.
Warren, W. H. (1984). Perceiving affordances: Visual guidance of stair climbing. Journal of Experimental Psychology: Human Perception and Performance, 10, 683–703.
Warren, W. H., & Whang, S. (1987). Visual guidance of walking through apertures: Body-scaled information for affordances. Journal of Experimental Psychology: Human Perception and Performance, 13, 371–383.
Zeleznik, R. C., Herndon, K. P., & Hughes, J. F. (1996). SKETCH: An interface for sketching 3D scenes. Computer Graphics (Proceedings SIGGRAPH '96), 163–170.
Zeltzer, D. (1992). Autonomy, interaction, and presence. Presence, 1, 127–132.
6 On the Nature and Evaluation of Fidelity in Virtual Environments Thomas A. Stoffregen∗ University of Minnesota
Benoit G. Bardy University of Paris
L. J. Smart Miami University
Randy Pagulayan Microsoft Game Studios
*Correspondence regarding this chapter should be addressed to Thomas A. Stoffregen, School of Kinesiology, University of Minnesota, 141A Mariucci Arena, 1901 4th St. S.E., Minneapolis, MN 55414.

In this chapter, we discuss some issues relating to the evaluation of simulations. We concentrate on simulations that involve motion of the user, but our analysis may apply to simulation in general. We argue that in the strict sense of duplication, simulation is essentially impossible: with rare exceptions, sensory stimulation in a simulator cannot be identical to sensory stimulation that is available in the system being simulated. We suggest that these differences in stimulation provide information to the user about the nature of the simulation, as such. If this information
is picked up, then it may be rare for users to perceive a simulation to be "the real thing." Researchers and developers commonly define the fidelity of simulations in terms of their tendency to give rise to illusory subjective experience in users. However, if simulation is perceived as such (that is, if perception is accurate rather than illusory), then there may be little practical value in metrics that rely on an illusory perception of reality. Metrics that assume that perception in simulations is illusory may be counterproductive if they focus the attention of researchers and developers on aspects of the simulation that are not directly relevant to issues of fidelity. We suggest that in many application domains, more useful metrics of fidelity can be developed from measurements of performance, rather than subjective experience. Our use of simulation includes virtual environments (VEs) and virtual reality (VR). We discuss relations between simulations and the events or systems that they are intended to mimic. We begin by describing a scenario that we believe brings into focus issues that are central to the evaluation of simulator fidelity.

A Scenario: Simulation or Reality?

Imagine this. You awaken from a deep sleep to find yourself wearing a pilot's flight suit and helmet. You are strapped into a flight seat in a cockpit. The windscreen shows that you are at cruising altitude, flying through clouds. Quick, unpredictable motions of the cockpit indicate the presence of turbulence. Somehow in the night you have been spirited out of your bed and put into this situation. What, exactly, is this situation? It may be that you are in flight, that your sleeping body was inserted into the cockpit of an aircraft that then taxied, took off, and climbed under remote control. However, another possibility is that you are in a high-fidelity flight simulator, perhaps a 6-DOF motion base in a visual dome, with 3-D sound imaging.

These two situations differ greatly in their meaning, that is, in their consequences for your behavior. If you are in flight, then you need to control the aircraft (or at least figure out how to turn on the autopilot), and a mistake may lead to a fatal crash. If you are in a simulator, then it does not much matter what you do (Shapiro & McDonald, 1992, p. 102, note that "real events sometimes demand action, while obviously fictional events can usually be enjoyed vicariously"). The difference between the flight and the simulator, then, is highly consequential; in this case it may be a matter of life and death.¹

Will you be able to determine whether you are in an aircraft or a simulator?² If so, how? Will you be able to execute adaptive behaviors? If so, on the basis of what
information? We see these questions as being the essence of the issue of simulator fidelity. Experiments at the level of our scenario would be required in order to determine the extent to which simulations are (or can be) differentiated from the systems that they simulate. Such experiments are very difficult to conduct, as they would depend on the absence not just of informed consent but of any consent at all. However, it is only in such a situation that we could be certain that the subject's perceptions and behavior were not influenced by prior knowledge and expectations.

¹ From the perspective of the ecological approach to perception and action (Gibson, 1979/1986), we would say that the simulator and the simulated have different affordances for behavior.
² This is the simulation version of the Turing Test: Can a user distinguish between a computer-generated reality and the real thing?
SIMULATION AND NATURAL LAW

In this section, we discuss relations between physical events and patterns of ambient energy that constitute potential sensory stimulation. We describe the hypothesis that physical reality structures ambient energy fields in such a way that the structure of these fields specifies, that is, is uniquely related to, the underlying reality. We then suggest that among real things are simulations (e.g., simulation devices and software). An implication of this is that simulations structure ambient energy fields differently than do the simulated systems and events, that is, that simulation is specified as such. Some aspects of our discussion may be obvious to many readers. However, we believe that the issues raised have important implications for the development of vehicular simulations, and that these implications have rarely been addressed in the simulation literature.

In this section, we also consider only patterns in energy (potential sensory stimulation), not the pickup of information by living systems. Accordingly, our discussion does not address psychological or physiological issues. It is logically prior to such issues. The activity and results of perception, and technological strategies that are designed to influence psychological processes (such as washout algorithms; Nahon & Reid, 1990), will be addressed in later sections.

Reality Is Specified

The spatiotemporal structure of light, sound, and other forms of ambient energy is influenced by events and situations in accordance with natural law (e.g., laws of the generation and propagation of optical, acoustic, and mechanical energy). Different laws apply to different forms of energy. For example, the differential absorption of optical energy at different frequencies is governed by certain properties of objects (e.g., their surface texture and pigmentation), whereas the absorption of acoustic energy at different frequencies is governed by other properties of objects (e.g., their rigidity and density). In addition, physical motion produces changes in ambient energy arrays, such as optical flow (Gibson, 1979/1986). The lawful relations that exist between reality and ambient energy arrays give rise to the hypothesis of specificity. This is the idea that there exists a unique (that is, 1:1) relation
between any given physical reality and the energy patterns to which it gives rise (Gibson, 1979/1986; Stoffregen & Bardy, 2001). If such a unique, lawful relation exists, then the energy patterns can be said to specify the corresponding reality. We stress that this is not a psychological hypothesis (Runeson & Vedeler, 1993). Specification exists prior to and independent of any psychological activity. The specificity hypothesis does not mandate any particular perceptual process, but it does make it possible for perception to be direct and veridical.

Simulation Is Specified, as Such

There are many differences between simulations and the things that they simulate, between the simulator and the simulated. After all, the purpose of simulation is to reproduce some of the characteristics of a system or situation (e.g., sensory stimulation and constraints on behavior) without reproducing others (e.g., the expense and/or danger of operating the actual system). Exactly which characteristics of a simulated system can be reproduced? Are there some characteristics that cannot be reproduced? In particular, can a simulator faithfully recreate the sensory stimulation that occurs in the simulated system?

A simulator is a real device that is different from the device or situation that it simulates. Both simulations and the systems that they imitate influence the structure of ambient energy arrays in accordance with physical law. If sensory stimulation produced by a simulator were exactly identical to that produced by the simulated system, then the user would have no perceptual basis for differentiating the simulator from the simulated. We refer to this as a state of stimulus fidelity. Researchers have not claimed that stimulus fidelity exists in current simulations or that it will exist in future ones. In fact, there is widespread agreement that technological improvement will not lead to 100% accuracy (Gibson, 1971; Hochberg, 1986; Stappers, Overbeeke, & Gaver, 2002), implying that the energy arrays in the simulator will never be identical to those in the simulated. We use nonidentity to refer to differences between the simulator and the simulated in the structure of ambient energy arrays.³ Do nonidentities contain useful information, or are they noise? Are they picked up by users, or are they ignored? It is common to interpret nonidentities as sources of ambiguity (noise), that is, as increasing the observer's uncertainty about reality (e.g., Edgar & Bex, 1995; Hochberg, 1986). An alternative interpretation, which we endorse, is that nonidentities specify the
simulations, as such, and that this information is picked up by users (Stoffregen, 1997; cf. J. J. Gibson, 1971).⁴ What follows is a partial list of ways in which the structure of energy in simulations is nonidentical with the structure of energy in the simulated systems. Our contention is that physical law places severe and irremediable constraints on the conditions in which a simulator can give rise to patterns of energy that are identical to those created by the simulated system.

³ It is important to note that our use of nonidentity is not equivalent to the concept of sensory conflict. The latter is an interpretation of discrepancies that exist in sensory stimulation (Stoffregen & Riccio, 1991). The discrepancies that are interpreted as sensory conflict exist in the sensory stimulation associated with a given system (either simulator or simulated). The nonidentities discussed in this chapter exist between simulation and simulated systems. Because we do not accept the sensory conflict interpretation (Riccio & Stoffregen, 1991), we use discrepancy to refer to intermodal nonidentities that exist within a given system.
⁴ Our position contrasts with that of Michaels and Beek (1995), who argued that there can be patterns of stimulation that are not specific to the physical events that cause them. As examples of their position, Michaels and Beek (p. 274) included "any information created by artifice (e.g., a hologram or computer simulation)." Michaels and Beek appear to be asserting that simulation is not specified as such.

The Optic and Acoustic Arrays

There are a wide variety of differences between optic and acoustic arrays generated by simulated events and those generated by simulations of those events (for a detailed list of examples in optics, see Hochberg, 1986; for a related discussion about audition, see Gilkey & Anderson, 1997). Many of these will be remediated through technological development, but some may prove to be fundamental, such that they cannot be eliminated (for examples and discussion, see Gibson, 1971; Edgar & Bex, 1995; Hochberg, 1986; Wann, Rushton, & Mon-Williams, 1995).⁵

⁵ It should be possible to control for the possibility that simulation, as such, is specified by artifacts of optical or acoustic display technologies. This could be done by creating a vehicle in which the operator could not see or hear the outside world directly, but had access to it only through electronic media (e.g., video and radio). In such a vehicle, many distortions of audiovisual imaging should be the same as those found in vehicle simulators. Subjects would operate this experimental vehicle and would also operate a state-of-the-art full-motion simulator, with identical optical and acoustic display technology (in the vehicle the content of the displays would be "real," whereas in the simulator the content of the displays would be computer generated). The dependent variable would be whether users could differentiate the simulator from the vehicle. If so, this would suggest that specification of the simulator, as such, was not limited to artifacts arising from audiovisual image generation and display technology.

Mechanical and Inertial Arrays

The structure of haptic and inertial arrays in any fixed-base simulator will be identical to haptic and inertial structure in a simulated vehicle when the simulated motion is in a straight line at constant velocity. For all other motions, haptic and inertial structure in the simulator will differ from that in the simulated vehicle (cf. Stoffregen & Riccio, 1991). Rather than being subtle (such that they might be below detection threshold), these differences will often be of great magnitude. Inertial displacement of the body always structures stimulation of the vestibular and somatosensory systems (and usually of the visual and auditory systems). Thus, in order to simulate the totality of (multimodal) inputs that result from inertial motion, it is necessary to control the structure of a variety of different forms of
energy, not only the optic array, but also the acoustic array, the haptic array, the gravito-inertial array, and so on. This is true both for motion-base (inertial) and for fixed-base (noninertial) simulations. We will discuss these separately.

Consider an inertially stationary observer seated at the center of a rotating drum (e.g., Brandt, Dichgans, & Koenig, 1973). During constant-velocity angular rotation, optical and inertial stimulation might be identical to that created by inertial rotation of the observer within a stationary drum. Similarly, during constant-velocity linear motion of a "moving room" (Lishman & Lee, 1973; Stoffregen, 1985), an inertially stationary observer might experience optical and inertial stimulation that was identical to what would occur if the observer moved at constant linear velocity within a stationary room. In these situations, observers have no basis for perceptual differentiation of the simulator (optical motion with inertial stasis) from the simulated (inertial motion with optical stasis). But the stimulus fidelity that exists with these "simulations" ceases to exist if the simulated event includes any inertial motion, such as changes in velocity of the observer, the environment, or both. The optical display may be faithful to the optical consequences of the simulated event, but stimulation of the vestibular and somatosensory systems will not be. These examples show that, while it is possible to present the optical consequences of inertial motion, such optical depictions will not produce the changes in stimulation of the haptic and vestibular systems that accompany inertial motion (Riccio, 1995).

In the scenario presented at the beginning of this chapter, changes in direction or velocity of motion will produce nonidentities (in stimulation) in the simulator relative to the real aircraft. Because the scenario is likely to include substantial motion, these nonidentities will be large. For example, in flight, a climb gives rise to large and sustained changes in the magnitude and direction of the gravito-inertial force vector. In a fixed-base flight simulator, these changes will be wholly absent. In a motion-base simulator, a simulated climb can produce changes in the inertial array, but these will tend to be of lower magnitude, in a different direction, and of much shorter duration than those found in flight.

In general, vehicles have a much greater range of motion than do simulator motion bases. Many aircraft can execute 360-deg rolls and turns, pulling more than 8 g of acceleration, with very high rates of linear and angular motion. By contrast, simulator motion bases are greatly restricted. Consider the Link 6-DOF "six-post" motion platform, one of the more sophisticated systems currently in use. This system has 170-cm linear and 50-deg angular excursion, with peak velocities of 60 cm/sec (linear) and 20 deg/sec (angular), and peak accelerations of 0.8 g (linear) and less than 60 deg/sec² (angular). Unlike those of an aircraft, these capabilities are independent and cannot be achieved simultaneously. The capabilities of the aircraft exceed those of the motion base in a variety of dimensions. Similarly, there are gross deviations from stimulus fidelity in any device that simulates walking (e.g., a head-mounted optical display, in which there is no inertial translation of the body) or in virtual reality "scooters" (Smets, Stappers, Overbeeke, & Mast, 1995).
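The mismatch in scale is easy to quantify. Here is a rough worked example using the 170-cm excursion quoted above; the 0.2-g maneuver is a hypothetical target, and the platform is assumed to start from rest at one end of its travel.

```python
# How long can a motion base sustain a given linear acceleration before it
# runs out of travel? Uses the 170-cm excursion quoted in the text; the
# 0.2-g sustained maneuver is a hypothetical example.

excursion_m = 1.70          # total linear travel of the platform
accel = 0.2 * 9.8           # a modest sustained acceleration (~0.2 g), m/s^2

# Starting from rest, s = 0.5 * a * t^2, so the acceleration can be held
# only for t = sqrt(2 * s / a).
t_max = (2 * excursion_m / accel) ** 0.5
print(f"{t_max:.2f} s")     # about 1.3 s before the platform hits its stop
```

An aircraft, by contrast, can hold such an acceleration for tens of seconds; whatever the platform does after those first moments (washing out, tilting, or simply stopping) structures the inertial array differently than the simulated flight does.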
Designers of flight simulators sometimes attempt to minimize conscious awareness of the limitations of motion bases by employing washout algorithms (Ellis, 1991; Nahon & Reid, 1990). The general strategy is to reach the limits of the motion base in such a manner that the termination of inertial motion is below detection threshold. Although they may enjoy success as psychophysical strategies, washout algorithms cannot reduce or overcome the fundamental physical limits of motion bases. Accordingly, we would predict that users should be able to differentiate motion-base simulations from the simulated vehicles even when washout algorithms are in use. We are not aware of any direct tests of this hypothesis.
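For background, the following is a minimal sketch of the idea behind classical washout: a high-pass filter applied to the commanded acceleration, so that onsets are reproduced while sustained components decay and the platform can drift back toward neutral. It illustrates the concept only; it is not any particular production algorithm, and the cutoff frequency is an arbitrary assumption.

```python
import math

def washout(accel_commands, dt=0.01, cutoff_hz=0.3):
    """First-order high-pass filter applied to commanded platform acceleration.

    Transient (high-frequency) components pass through, so the onset of a
    maneuver is felt; sustained (low-frequency) components are washed out,
    keeping the platform within its limited travel.
    """
    alpha = 1.0 / (1.0 + 2.0 * math.pi * cutoff_hz * dt)
    filtered, prev_in, prev_out = [], 0.0, 0.0
    for a in accel_commands:
        out = alpha * (prev_out + a - prev_in)
        filtered.append(out)
        prev_in, prev_out = a, out
    return filtered

# A sustained 0.2-g pull held for 5 s: the platform reproduces the onset,
# then the commanded acceleration decays toward zero.
step = [0.2 * 9.8] * 500
response = washout(step)
print(response[0], response[-1])   # ~1.9 m/s^2 at onset, near 0 after 5 s
```

The filter keeps the platform within its travel, but the sustained component of the maneuver is simply not delivered to the vestibular and somatosensory systems, which is the point made in the text.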
We have seen that inertial stimulation in a simulator generally is different from inertial stimulation in the simulated system, and that this is true for both motion-base and fixed-base simulations. A major consequence of this is that simulations must be inaccurate if they simulate motion relative to the inertial environment, either of the body as a whole or of its parts separately (Stoffregen & Bardy, 2001; Stoffregen & Riccio, 1991). Stimulus fidelity can occur only in the absence of such differential motions. As we noted earlier, this means that for a moving observer, stimulus fidelity in the haptic and inertial arrays can be achieved only for motion in a straight line at constant velocity, for example, straight and level flight. Thus, for any simulation that includes inertial motion, there will be nonidentities between the structure of energy arrays produced by the simulator and by the simulated. Based on the hypothesis of specificity between physical reality and the structure of ambient energy arrays, we conclude that for any simulation that depicts inertial motion of the user, the simulation is specified as such.

Within inferential theories of perception, nonidentities between the simulator and the simulated are not problematic. In these views, it is assumed that specificity does not exist and that (as a result) perception is primarily an inferential process operating on impoverished sensory stimulation that bears only a probabilistic relation to reality. In such theories, the sensory stimulation in a simulator might be probabilistically consistent with sensory stimulation in the simulated system without being identical to it. For this reason, such theories would tend to predict that simulations can be and are truly mistaken for the simulated systems and events. By contrast, theories based on the concept of specificity would tend to predict that nonidentities between the simulator and the simulated would result in each being perceived as such. This raises the question of whether simulation, being specified as such, actually is perceived as such: Is perception in simulators veridical?

WHAT IS PERCEIVED?

It is widely believed that many types of nonidentity can be "tolerated" because of the psychophysical limitations of sensory systems and perceptual processes (e.g., Hochberg, 1986). Arguments of this kind may be plausible when applied to optical and acoustic stimulation, in which nonidentities can be relatively small
in magnitude and may shrink as technology continues to develop. However, we believe they are less plausible in the context of nonidentities in the structuring of haptic and inertial arrays by simulators and the simulated, which are often of great magnitude. Given the existence of these large nonidentities, will there be a perception of reality (an illusion), and if so, under what circumstances?

Contemporary flight simulation typically uses very sophisticated display technology (e.g., fractal terrain models, multiaxis motion bases, and closed-loop control). Despite this technological sophistication, it is widely acknowledged that users easily differentiate the simulator from the simulated, detecting the simulation as such (Edgar & Bex, 1995; Hochberg, 1986; Stoffregen, 1997). Even multimillion-dollar flight simulators are still easily discriminated from "the real thing." For example, the Synthesized Immersion Research Environment (SIRE) at Wright-Patterson Air Force Base features a Silicon Graphics Onyx computer image-generation system specially modified to drive six high-resolution channels, which are projected onto a specially constructed hemispherical dome. The graphics have a resolution of 2-min arc/pixel and an update rate of 30 Hz. The system produces exceptionally robust vection. However, pilots using the facility have no difficulty discriminating it from physical flight, even for "things that we actually try hard to model as highly convincing representations" (L. Hettinger, personal communication, June 1995).

The pickup and use of information in nonidentities need not imply conscious awareness. This is suggested by the occurrence of motion sickness in simulators during maneuvers that do not produce sickness in the simulated systems (cf. Biocca, 1992). Also, adaptation to sensorimotor "rearrangements" in simulators (e.g., DiZio & Lackner, 1992) can be interpreted as evidence that the information specifying the simulation as such has been detected and used.

The examples discussed in this section are suggestive, but they are not definitive. Very few existing data relate directly to the question of whether users perceive simulations as such. In our view, this is because there has been very little research involving direct comparisons of perception in the simulator and in the simulated (most comparisons that have been done have been concerned primarily with transfer of training and not with whether there is differentiation of simulators and simulated; e.g., Moroney, Hampton, Beirs, & Kirton, 1994). In our Conclusion, we propose a program of research of this kind.

The Role of Exploration

Perceivers do not pick up all of the information that is available to them at any given time, or in any given situation. This means that perception is selective. Selective perception requires controlled motion of perceptual systems. This can include not only motion of receptor organs, but of the entire body, as when we walk toward an object to get a better view of it. We refer to the activity of perceptual selection as exploration (Gibson & Rader, 1979; Stoffregen, 1997). In this section, we
discuss ways in which perception can be influenced through either the promotion or restriction of exploration.

Our discussion has shown that stimulus fidelity will occur only in highly limited circumstances. Given this fact, it might seem impossible for a simulator to be effective. However, simulation may be effective if users can be prevented from picking up the information that specifies the simulation as such. Conversely, simulation can be perceived as such if users are able to pick up patterns in ambient energy that specify it. Exploration produces changes in information (in the relation between action and sensory stimulation) that can disambiguate situations. For example, circular vection that is created by a rotating visual surround ceases if the observer is permitted to execute head movements (Dichgans & Brandt, 1978; cf. Prothero, Parker, Furness, & Wells, 1995, p. 362). The activity of moving the head produces changes in the stimulation of the visual, haptic, and vestibular systems. The changes that occur with the rotating visual surround differ dramatically from the changes that occur when head movements are executed during inertial body rotation. For example, head movements with inertial rotation will lead to Coriolis forces, which are not present when rotation is solely optical. This difference specifies the actual motion; that is, it specifies whether the motion is optical or inertial.⁶

⁶ This should be true also of head movements in a moving room, and so raises the question of why, if the moving room is specified as such, people sway in it. It may be that they perceive it quite accurately, but sway so as to maintain stable looking (Stoffregen, Smart, Bardy, & Pagulayan, 1999).

We have argued that stimulus fidelity exists only when the simulated motions are of constant velocity. Simulations can be perceived as such if the user has and exploits opportunities to explore beyond this limited situation, that is, if the user can alter the simulated velocity (cf. Mark, Balliet, Craver, Douglas, & Fox, 1990). We regard this as being logically identical to the disambiguating effect of head movements in circular vection. The use of bite bars and chin rests in experiments on circular vection can be interpreted as means to restrict perceptual exploration (cf. Mark et al., 1990). This is consistent with the fact that many simulations produce subjective realism only if exploration is limited through physical restraint (e.g., Hochberg, 1986). If it is important that the user perceive the simulator to be the simulated system, designers must develop ways to limit users' exploratory activities. That vection can be so easily reduced or destroyed by simple head movements illustrates the power of exploration in perception, and is consistent with our hypothesis that stimulus fidelity exists only when a simulation does not attempt to depict inertial motion. Further research is needed to determine the kinds of exploratory activity that permit differentiation of simulators from the simulated.

Restrictions on exploration (such as bite bars and chin rests) may themselves alter sensory stimulation in ways that specify the fact that the user is not in the simulated situation. The use of a bite bar, for example, causes changes in the structure of ambient energy that specify that the user is not in an aircraft, automobile, or other
vehicle. A related example is restrictions on the acceleration and excursion of a simulation that are mandated by the limited range of its motion base. Such restrictions impose limits on the motions that the user can execute (and correspondingly limit the training utility of the simulation). Because these limits differ from those found in the simulated system, they provide information about the simulation. Our earlier analysis of specification leads us to conclude that in almost all cases, simulation is specified in patterns of ambient energy that are available to perceivers. This provides a logical basis for perceiving the simulation as such. Such a percept would be accurate and not an illusion. There is little experimental evidence that relates directly to the issue of whether users actually differentiate simulations from simulated systems. We have reviewed anecdotal reports and some indirect evidence that suggest that such differentiation does take place in at least some situations. In the next section, we consider implications of this possibility for the evaluation of simulator fidelity.
IMPLICATIONS FOR EVALUATING SIMULATOR FIDELITY

What are the appropriate criteria for evaluating the effectiveness of simulations? This is an area of considerable uncertainty: "A major concern relating to the use of simulation for training stems from the difficulty of determining, in many cases, the simulation's adequacy for training purposes" (Nickerson, 1992, p. 155). In terms of evaluation, we equate effectiveness with fidelity (that is, with the faithfulness of the user's behavior in the simulator to their behavior in the simulated). Metrics for fidelity fall into two classes: fidelity of subjective experience and fidelity of performance. Following Riccio (1995) we refer to these as experiential fidelity and action fidelity, respectively. Riccio (1995, p. 136) noted that "there is a lack of general agreement about the criteria for fidelity of flight simulators. This makes it difficult to resolve controversies about the sufficiency of particular displays. Progress in flight simulation has been limited by a poor understanding of experiential fidelity and action fidelity." We will argue that metrics of action fidelity are more useful as constraints on the design and evaluation of vehicular simulations. We will further argue that a profound limitation of any simulation is the range of circumstances (e.g., the type of simulated events) over which it can produce action fidelity.

Experiential Fidelity: Presence

Experiential fidelity is the extent to which a simulation gives rise to a subjective experience of "being there" (e.g., Held & Durlach, 1991; Smets, 1995). Prothero and colleagues (1995, p. 359) have argued that experiential fidelity should be the sole criterion for the design of virtual environments. Recently, there have been attempts to formalize experiential fidelity in the concept of presence. This effort is
derived largely from studies of situation awareness and vection (e.g., Prothero et al., 1995), and may be related to immersion. Presence has been operationalized in terms of conscious reports such as questionnaires or numerical intensity ratings. For example, Prothero and colleagues (1995) assessed presence by asking subjects to give numerical ratings of the degree to which a simulation seemed "real." Presence is widely believed to be common in simulators (e.g., Carr, 1995; Prothero et al., 1995; Stappers & Smets, 1995).

Presence is widely understood to be an illusory percept. Prothero and colleagues (1995, p. 359; see also Slater, Usoh, & Steed, 1994) defined presence as "an illusion of position and orientation" (this complements vection, which is defined as an illusion of self-motion; e.g., Dichgans & Brandt, 1978). Carr (1995, p. 1) described virtual reality as "fooling people into accepting as real what is only perceived." Similarly, Ellis (1991, p. 323) referred to creating "the illusion of an enveloping environment," whereas Christou and Parker (1995, p. 53) asserted that with virtual environments "any sense of reality is . . . illusory" and that "it is possible to fool the perceiver by making it difficult for them to discern that the world they are experiencing is artificial." These definitions reflect widespread agreement that presence can exist if and only if people are fooled by the simulation.

Reality or Realism?

Despite its intuitive appeal, the concept of presence is not entirely straightforward. We regard as critically important a complication that has been noted by Carr: "It is important to distinguish between the perception of realism and the perception of reality . . . A 'sense of reality' does not necessarily imply belief in reality" (Carr, 1995, p. 6; see also Stoffregen, 1997). We equate "perception of realism" with perception of the simulation as such, and "perception of reality" with perception of that which is simulated.⁷ This is a critical distinction that has not been addressed in the literature on presence. Researchers often appear to confuse presence with realism (e.g., Christou & Parker, 1995, pp. 53–54).

⁷ Is there a difference between the "immersion" of simulation and the "suspension of disbelief" (Goffman, 1974) that is required for appreciation of fictional entertainment? Suspension of disbelief embodies the conceptual distinction between realism and reality in theatrical performance. The suspension of disbelief implies a perception of the simulation as such. In the case of the theater, it implies a perception of the fact of theatrical performance (for a detailed discussion, see Goffman, 1974). That is, suspension of disbelief implies that theater and movie patrons experience the situation as realistic, but not as reality (Stoffregen, 1997). This underscores the fact that the distinction is not peculiar to simulation technology, but is an extension of issues that predate the development of computer-based simulation (cf. Steuer, 1992).

The distinction between perceiving realism and perceiving reality is important because these percepts may differ qualitatively. A perception of realism would be accurate, reflecting an actual resemblance of the simulation to the simulated. For example, a painting of a building may be said to resemble the building, and an
impersonation may be said to resemble the person who is impersonated. Thus, a perception of realism would not be an illusion. In a simulator, by contrast, a perception of reality would necessarily be erroneous; the person is not in the “real” system, and to perceive otherwise would be an error. Because it is defined as an illusion, presence cannot refer to the (accurate) perception of realism but only to an (inaccurate) perception of reality. Thus, a more precise definition of presence might be “an illusory (false) perception that the simulator is the simulated.” The logical distinction between perception of realism and perception of reality has implications for methods that are used to assess presence. Questionnaires that are intended to measure the illusory perception of reality (presence) may, instead, measure accurate perception of realism. For instance, Prothero and colleagues (1995) exposed experimental participants to a virtual environment. After exiting the virtual environment, participants were asked: “How real did the virtual world seem to you?” However, Slater and colleagues (1994) asked to what extent the “computer-generated world” was “more real” than the “real world.” In their phrasing, these questions inform the subject that the situation they experienced was “virtual,” or “computer generated.” A subject who had truly been fooled by the simulation would be disabused of their error by this information. In addition, all possible answers to these questions are in terms of accurate, nonillusory perceptions of realism. These questions are poorly formulated if the goal is to assess an illusory perception that the simulator is the simulated. Rather than asking, “How real does this seem?” a better question might be: “Is this a simulation, or is it reality?” (participants might also be asked to give numerical certainty ratings). With existing simulation systems, such questions have little credibility; the status of the simulation as such is obvious (this suggests that, rather than being common, presence may be very rare). A rigorous empirical evaluation of this would require the development of paired situations, one member of which was real whereas the other was simulated. In most contemporary research on presence, participants are exposed only to simulations (e.g., Prothero et al., 1995). As a comparison, consider identical twins, Jane and Mary, both of whom are known to you. If one day you encounter Jane, you may correctly identify her as Jane. If you are asked, “Does this person resemble Mary?” the question would seem reasonable and you would reply in the affirmative. Another possibility is that in encountering Jane you erroneously perceive her to be Mary. In this case the question “Does this person resemble Mary?” would be nonsensical (it makes little sense to say that Mary resembles herself). In the same way, to ask whether a simulation “seems real” is to assume that the participant already knows that it is not real, that is, that there is no illusory perception of reality (presence) but only an accurate perception of realism. One reason that users do not experience the simulation as being real is that they have prior knowledge that it is not. They know this because they put on a
headset rather than getting into an aircraft, because they are being paid to use a simulator rather than an aircraft, and so on.⁸ This is a powerful deterrent to the occurrence of any "belief in reality" (Carr, 1995, p. 6). As we noted earlier, it is extremely difficult to control or eliminate situational information. Users almost always know, before using a system, whether it is a simulator or the real thing. With adult humans, such prior knowledge could be eliminated only by draconian interventions of the kind found in the scenario that opens this chapter.

⁸ Knowledge of this kind is often thought to originate in cognition. However, scheduling, waiting rooms, donning of simulation hardware, and other preparations all produce characteristic patterns of sensory stimulation. This is specification; the sense is global, but real nevertheless. Thus, such knowledge may have a perceptual basis (cf. Stoffregen, Gorday, Sheng, & Flynn, 1999).

A related example occurs in the cinema. Despite recent advances in film technology (e.g., Omnimax cinema), movie patrons do not run away from cinematic dinosaurs, murderers, and so on. Presumably, this is because they experience realism but do not have any perception of reality, and this, in turn, is due to stimulus differences between film and reality (Stoffregen, 1997). When film was first developed, there were incidents in which patrons failed to differentiate events in a film from physical reality (Shapiro & McDonald, 1992). Early in the 20th century, a Montana patron fell asleep during a film. When he awoke, the film included a bear. He mistook this for an actual bear and fired a gun. Similarly, it is reported that at the Lumière brothers' first public exhibition of the new motion picture technology, audience members confused a scene of a train with a real train and fled from the theater in terror. The rarity of reactions of this kind is testimony to the fact that patrons perceive realism rather than reality.

Action Fidelity

We have defined stimulus fidelity in terms of prepsychological relations between reality and ambient energy arrays. Action fidelity (Riccio, 1995; cf. Caro, 1979) is defined in terms of relations between performance in the simulator and performance in the simulated system (this is similar to the concept of functional fidelity; Moroney & Moroney, 1998). Action fidelity exists when performance in the simulator transfers to behavior in the simulated system. An appropriate measure of action fidelity is transfer of learning, or transfer of training (e.g., Flach, Riccio, McMillan, & Warren, 1986; Kozak, Hancock, Arthur, & Chrysler, 1993; Moroney, Hampton, Beirs, & Kirton, 1994).

We have seen that experiential fidelity is measured via subjective reports. By contrast, action fidelity is measured in terms of task performance. Common metrics that could be used to compare performance in a simulator and in the simulated system are time to completion of a task, variance in performance across trials, and trials to criterion (Kozak et al., 1993; Moroney et al., 1994). Performance metrics appropriate for a flight control task might include time on course and magnitude
of heading and altitude deviation (e.g., Moroney et al., 1994). Appropriate metrics for a manual tracking task might include position errors, time, and phase (Knight, 1987), whereas appropriate metrics for a telemanipulation task might include the accuracy with which users can position objects in a closed-loop 3-D video image (Smets, 1995, p. 193).

It can be very difficult to obtain data on transfer of skills from a simulation to the simulated system. An example is high-performance jet aircraft, for which it is both difficult and very expensive to collect data on actual flight control. Another example might be virtual environments for which there is not a corresponding physical system, such as a “walk-through” of the human body. For these systems, it may be impractical to proceed directly to the development of action fidelity metrics. An alternative might be to begin by concentrating on systems for which transfer of training can be measured more directly. For example, rather than concentrating on high-performance aircraft, measures of action fidelity might be first developed in automobiles.

Reliance on experiential fidelity motivates designers to maximize stimulus fidelity, in an effort to maximize the subjective experience. By contrast, action fidelity does not mandate a concentration on stimulus fidelity. Transfer of skills from the simulator to the simulated system may occur despite departures from stimulus fidelity. A display that does not look or feel realistic may nevertheless facilitate performance. For example, Moroney and colleagues (1994) studied the acquisition of instrument flight skills. Some participants were trained in an FAA-standard simulator, whereas others were trained using a PC-based desktop retail flight simulation program (whose cost was 96% less). Both groups were later tested for instrument flight skills in actual flight. Results showed that there was not a significant difference in actual flight skills between the groups trained on the FAA-approved simulator and those trained with the desktop system, despite the fact that the desktop system was less “realistic.”

In some cases, departures from stimulus fidelity may produce improvements in action fidelity. Consider a situation in which a simulator is used to teach nap-of-the-earth flight (minimum controllable altitude, following terrain contours). Simulator training at realistically low altitudes would be inefficient because novices could not control the aircraft (this is why training is needed). There would be many crashes, and learning would be reduced or delayed. Training at high altitudes would be inefficient, because at high altitudes the optical consequences of changes in altitude are reduced. A solution might be deliberately to reduce the stimulus fidelity of the simulation; changes in optical splay resulting from controlled changes in altitude could be made to be greater than in real flight at the same altitude (i.e., an increase in the gain of the closed-loop optical splay function). This deliberate departure from stimulus fidelity might lead to improvements in training (Warren & Riccio, 1986; cf. Flach et al., 1986; Stappers et al., 2002), despite a reduction in subjective realism.
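To make the splay-gain manipulation concrete, the following minimal sketch computes the optical splay angle of a ground line over a flat ground plane and then exaggerates the splay change produced by a departure from a reference altitude. The flat-earth geometry, the particular gain rule, and all function names and numbers are illustrative assumptions; they are not taken from Warren and Riccio (1986).

```python
import math

def splay_angle(lateral_offset_m: float, altitude_m: float) -> float:
    """Optical splay angle (radians) of a ground line parallel to the
    direction of travel, for an eye at the given altitude above a flat
    ground plane (tan(splay) = lateral offset / altitude)."""
    return math.atan2(lateral_offset_m, altitude_m)

def displayed_splay(lateral_offset_m: float, altitude_m: float,
                    reference_altitude_m: float, gain: float) -> float:
    """Exaggerate the splay change produced by departures from a reference
    altitude by the factor `gain` (values > 1 amplify the optical
    consequences of a given altitude change)."""
    reference = splay_angle(lateral_offset_m, reference_altitude_m)
    actual = splay_angle(lateral_offset_m, altitude_m)
    return reference + gain * (actual - reference)

# Dropping from 50 m to 30 m changes the true splay angle by about 12 deg;
# with gain = 2 the displayed change is roughly 24 deg.
true_change = math.degrees(splay_angle(20.0, 30.0) - splay_angle(20.0, 50.0))
shown_change = math.degrees(displayed_splay(20.0, 30.0, 50.0, 2.0)
                            - splay_angle(20.0, 50.0))
print(round(true_change, 1), round(shown_change, 1))
```

Whether an exaggeration of this kind actually improves transfer is, as the preceding paragraph notes, an empirical question for action-fidelity research rather than a settled design rule.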
Students of presence have not offered a clear reason why the user’s subjective experience should be of interest to or important for the evaluation of simulator fidelity. The importance of presence appears to be assumed: Presence “is thought to correlate with improved task performance in virtual environments” (Prothero et al., 1995, p. 361; cf. Held & Durlach, 1992). However, the relevance of presence is an assumption that can, and should, be evaluated empirically. A consequence of our analysis is that the utility of experiences of realism and reality will be application specific: If the purpose of a simulation is to influence subjective awareness (e.g., entertainment), then experiential measures may be most appropriate. However, if the simulation is intended to influence behavior (e.g., vehicular control and training), then experiential measures may be insufficient.
CONCLUSION

In this chapter, we have developed several arguments: (1) that simulation is specified as such, in patterns of ambient energy that can serve as stimuli for perception; (2) that because of this, the simulator can be, and almost always is, differentiated from the simulated system; and (3) that this reflects an accurate perception of realism (the reality of simulation), and not an illusory perception of reality (the illusion of the simulated).

These conclusions have implications for the scenario that we presented at the beginning of this chapter. If you awoke in this situation, we would expect that you would quickly perceive the true nature of the situation (whether you were in a simulator or real aircraft), and that your behavior would vary greatly depending on which situation you were in, with these variations being, for the most part, adaptive. Your percepts would be accurate, that is, you would perceive the reality of the situation. There would be no illusion. If you were in the simulator, you might enjoy noticing that it was highly realistic. This, too, would be an accurate (nonillusory) percept, and it would not interfere with your simultaneous perception of the simulator as such.

The arguments developed in this chapter may have important consequences for the design of experiments in VEs and for the assessment of simulator fidelity. Experiments are needed that evaluate the perception of reality in addition to (or rather than) evaluating the perception of realism. These experiments are difficult to design because they require that subjects have no a priori knowledge about whether they will be in a simulator or in the simulated system. However, such experiments appear to be essential to a satisfying assessment of the perception of reality in simulations. A second need is for experiments on action fidelity, using paradigms such as transfer of learning. In particular, research is needed on the possibility that task-specific departures from subjective realism could be used to improve users’ sensitivity to the relevant dynamics of the simulated system. Finally, it may be important to determine the actual correlations between performance and
subjective experiences (perception of realism and, separately, reality). Work of this kind may help to determine the respective and complementary contribution of action fidelity and experiential fidelity to the design and evaluation of simulations.
ACKNOWLEDGMENTS

Preparation of this chapter was supported by grants from the National Science Foundation (SBR-9601351 and INT-9603315) to Thomas A. Stoffregen and by a grant from the National Center for Scientific Research (CNRS/NSF-3899) to Benoît G. Bardy, with additional support from the French Ministry of National Education, Research and Technology.
REFERENCES

Biocca, F. (1992). Will simulation sickness slow down the diffusion of virtual environment technology? Presence, 1, 334–343.
Brandt, T., Dichgans, J., & Koenig, E. (1973). Differential effects of central versus peripheral vision on egocentric and exocentric motion perception. Experimental Brain Research, 16, 476–491.
Caro, P. W. (1979). Relationship between flight simulator motion and training requirements. Human Factors, 21, 493–501.
Carr, K. (1995). Introduction. In K. Carr & R. England (Eds.), Simulated and virtual realities (pp. 1–10). Bristol, PA: Taylor & Francis.
Christou, C., & Parker, A. (1995). Visual realism and virtual reality: A psychological perspective. In K. Carr & R. England (Eds.), Simulated and virtual realities (pp. 53–84). Bristol, PA: Taylor & Francis.
Dichgans, J., & Brandt, T. (1978). Visual–vestibular interaction: Effects on self-motion perception and postural control. In R. Held, H. Leibowitz, & H. Teuber (Eds.), Handbook of sensory physiology (Vol. 8, pp. 755–804). New York: Springer-Verlag.
DiZio, P., & Lackner, J. R. (1992). Spatial orientation, adaptation, and motion sickness in real and virtual environments. Presence, 1, 319–328.
Edgar, G. K., & Bex, P. J. (1995). Vision and displays. In K. Carr & R. England (Eds.), Simulated and virtual realities (pp. 85–102). Bristol, PA: Taylor & Francis.
Ellis, S. R. (1991). Nature and origins of virtual environments: A bibliographical essay. Computing Systems in Engineering, 2, 321–347.
England, R. (1995). Sensory–motor systems in virtual manipulation. In K. Carr & R. England (Eds.), Simulated and virtual realities (pp. 131–178). Bristol, PA: Taylor & Francis.
Flach, J. M., Riccio, G. E., McMillan, G., & Warren, R. (1986). Psychophysical methods for equating performance between alternative motion simulators. Ergonomics, 29, 1423–1438.
Gibson, E. J., & Rader, N. (1979). Attention: The perceiver as performer. In G. A. Hale & M. Lewis (Eds.), Attention and cognitive development (pp. 1–21). New York: Plenum.
Gibson, J. J. (1971). The information available in pictures. Leonardo, 4, 27–35.
Gibson, J. J. (1986). The ecological approach to visual perception. Mahwah, NJ: Lawrence Erlbaum Associates. (Original work published 1979)
Gilkey, R. H., & Anderson, T. R. (1997). Binaural and spatial hearing in real and virtual environments. Mahwah, NJ: Lawrence Erlbaum Associates.
Goffman, E. (1974). Frame analysis. New York: Harper & Row.
Held, R., & Durlach, N. (1992). Telepresence. Presence, 1, 109–112.
Hochberg, J. (1986). Representation of motion and space in video and cinematic displays. In K. R. Boff, L. Kaufman, & J. P. Thomas (Eds.), Handbook of perception and human performance (Vol. 1, Chapter 22). New York: Wiley.
Kozak, J. J., Hancock, P. A., Arthur, E. J., & Chrysler, S. T. (1993). Transfer of training from virtual reality. Ergonomics, 36, 777–784.
Knight, J. R., Jr. (1987). Manual control and tracking. In G. Salvendy (Ed.), Handbook of human factors (pp. 182–218). New York: Wiley.
Lishman, J. R., & Lee, D. N. (1973). The autonomy of visual kinaesthesis. Perception, 2, 287–294.
Mark, L. S., Balliet, J. A., Craver, K. D., Douglas, S. D., & Fox, T. (1990). What an actor must do in order to perceive the affordance for sitting. Ecological Psychology, 2, 325–366.
Moroney, W. F., Hampton, S., Beirs, D. W., & Kirton, T. (1994). The use of personal computer-based training devices in teaching instrument flying: A comparative study. In Proceedings of the Human Factors and Ergonomics Society 38th Annual Meeting (pp. 95–99). Santa Monica, CA: Human Factors and Ergonomics Society.
Moroney, W. F., & Moroney, B. W. (1998). Simulation. In D. J. Garland, J. A. Wise, & V. D. Hopkin (Eds.), Human factors in aviation systems (pp. 358–388). Mahwah, NJ: Lawrence Erlbaum Associates.
Michaels, C., & Beek, P. (1995). The state of ecological psychology. Ecological Psychology, 7, 259–278.
Nahon, M. A., & Reid, L. D. (1990). Simulator motion-drive algorithms: A designer’s perspective. Journal of Guidance, Control, and Dynamics, 13, 356–362.
Nickerson, R. S. (1992). Looking ahead: Human factors challenges in a changing world. Mahwah, NJ: Lawrence Erlbaum Associates.
Pausch, R., Crea, T., & Conway, M. (1992). A literature survey for virtual environments: Military flight simulator visual systems and simulator sickness. Presence, 1, 344–363.
Prothero, J. D., Parker, D. E., Furness, T. A., & Wells, M. J. (1995). Towards a robust, quantitative measure for presence. In Proceedings, Experimental analysis and measurement of situation awareness (pp. 359–366). Daytona Beach, FL: Embry-Riddle Aeronautical University Press.
Riccio, G. E., & Stoffregen, T. A. (1991). An ecological theory of motion sickness and postural instability. Ecological Psychology, 3, 195–240.
Riccio, G. E. (1995). Coordination of postural control and vehicular control: Implications for multimodal perception and simulation of self-motion. In P. Hancock, J. Flach, J. Caird, & K. Vicente (Eds.), Local applications of the ecological approach to human–machine systems (pp. 122–181). Hillsdale, NJ: Lawrence Erlbaum Associates.
Runeson, S., & Vedeler, D. (1993). The indispensability of precollision kinematics in the visual perception of relative mass. Perception & Psychophysics, 53, 617–632.
Shapiro, M. A., & McDonald, D. G. (1992). I’m not a real doctor, but I play one in virtual reality: Implications of virtual reality for judgments about reality. Journal of Communication, 42, 94–114.
Slater, M., Usoh, M., & Steed, A. (1994). Depth of presence in virtual environments. Presence, 3, 130–144.
Smets, G. J. F. (1995). Designing for telepresence: The Delft Virtual Window System. In P. Hancock, J. Flach, J. Caird, & K. Vicente (Eds.), Local applications of the ecological approach to human–machine systems (pp. 182–207). Mahwah, NJ: Lawrence Erlbaum Associates.
Smets, G. J. F., Stappers, P. J., Overbeeke, K. J., & Mast, C. (1995). Designing in virtual reality: Perception–action coupling and affordances. In K. Carr & R. England (Eds.), Simulated and virtual realities (pp. 189–208). Bristol, PA: Taylor & Francis.
Stappers, P. J., Overbeeke, C. J., & Gaver, W. W. (in press). Beyond the limits of real-time realism: Uses, necessity, and theoretical foundations of non-realistic virtual reality. In M. W. Haas & L. J. Hettinger (Eds.), Psychological issues in the design and use of virtual and adaptive environments. Mahwah, NJ: Lawrence Erlbaum Associates.
Steuer, J. (1992). Defining virtual reality: Dimensions determining telepresence. Journal of Communication, 42, 73–93. Stoffregen, T. A. (1985). Flow structure versus retinal location in the optical control of stance. Journal of Experimental Psychology: Human Perception and Performance, 11, 554–565. Stoffregen, T. A. (1997). Filming the world: An essay review of Anderson’s The reality of illusion. Ecological Psychology, 9, 161–177. Stoffregen, T. A., & Bardy, B. G. (2001). On specification and the senses. Behavioral and Brain Sciences, 24, 195–261. Stoffregen, T. A., Gorday, K. M., Sheng, Y-Y., & Flynn, S. B. (1999). Perceiving affordances for another person’s actions. Journal of Experimental Psychology: Human Perception & Performance, 25, 120–136. Stoffregen, T. A., & Riccio, G. E. (1988). An ecological theory of orientation and the vestibular system. Psychological Review, 95, 3–12. Stoffregen, T. A., & Riccio, G. E. (1990). Responses to optical looming in the retinal center and periphery. Ecological Psychology, 2, 251–274. Stoffregen, T. A., Smart, L. J., Bardy, B. G., & Pagulayan, R. J. (1999). Postural stabilization of looking. Journal of Experimental Psychology: Human Perception & Performance, 25, 1641–1658. Wann, J. P., Rushton, S. K., & Mon-Williams, M. (1995). Natural problems for stereoscopic depth perception in virtual environments. Vision Research, 19, 2713–2736. Warren, R., & Riccio, G. E. (1986). Visual cue dominance hierarchies: Implications for simulator design. Transactions of the Society for Automotive Engineering, 6, 937–951.
7
Adapting to Telesystems
Robert B. Welch
NASA-Ames Research Center
This chapter presents an array of training procedures for assisting users of “telesystems” (i.e., teleoperator systems, virtual environments, and “augmented realities”) to overcome the sensory and sensorimotor “rearrangements” often found with these devices. A rearrangement is said to exist when the relationship between sensory systems or between sensory and motor systems has been altered (e.g., Welch, 1978). When first encountered, rearrangements elicit misperception, contradicted expectations, inappropriate motor actions such as misreaching, and sometimes motion sickness-like symptoms. Two examples are a misalignment between felt and seen limb position and a delay between bodily movements and visual feedback. The goal of most of the procedures described in this chapter is to help users conquer rearrangements of this sort by means of the well-studied process of adaptation. In short, these techniques are based on the variables known from laboratory research to control or facilitate perceptual and perceptual-motor adaptation to rearranged environments. It will be argued that these procedures represent a significant advance over the casual or “sink-or-swim” philosophy frequently found in telesystems training. The chapter begins with a review of the many limitations of current telesystem technology and the specific sensory rearrangements that may result. It is argued
that, although telesystem users can and do adapt to or otherwise compensate for many of these problems without special assistance, it is desirable, when possible, to accelerate and maximize this naturally occurring process. This can be accomplished by means of systematic training procedures that capitalize on the variables known to control or augment adaptation to the more traditional sensory rearrangements such as prismatic displacement. It will be noted that the acquisition of maximal adaptation may be accompanied by substantial and potentially troublesome aftereffects. Fortunately, there are at least two ways of dealing with these aftereffects: (1) unlearning (or readaptation) procedures and (2) dual adaptation training (i.e., repeated alternation between adaptation and readaptation). For those rearrangements (e.g., visual magnification) that resist substantial perceptual adaptation the cognitive strategies of “intellectual” correction or avoidance can be employed. Finally, the presence of individual differences in adaptation to sensory rearrangements is acknowledged and its implications for future telesystem training procedures discussed.
LIMITATIONS OF CURRENT TELESYSTEMS AND THE TYPICAL RESPONSE OF THEIR DESIGNERS

The Limitations

Contrary to the widespread impression from the mass media, “virtual reality” is rarely confused with reality.1 Rather, a typical virtual environment (VE) is subject to an array of sensory and sensorimotor deficiencies, discordances, and other limitations that inhibit the observer’s subjective sense of “presence” or “telepresence” (i.e., the feeling of being located in the displayed environment rather than the actual one) and, in general, render it a poor imitation of everyday experience. The same is true for the related technologies of “augmented reality,” produced, for example, by a see-through, head-up display (HUD; e.g., Barfield, Rosenberg, & Lotens, 1995; Roscoe, 1993), and teleoperation (e.g., Sheridan, 1989). Thus, for the sake of brevity, the designation “telesystem” is used to include all three technologies. This term, which was suggested by T. Sheridan (personal communication, April 1994), is apt because all three devices entail a sensory environment that is located remotely with respect to the observer, albeit metaphorically in the case of VEs and augmented reality, whose displays “reside” within a computer.2

1 The term virtual reality is quickly going out of vogue among researchers, many of whom consider it an oxymoron. Therefore, it will not appear again in this chapter, and the term virtual environment (VE) is used instead.
2 This term, although perhaps not ideal, was chosen only after a number of options were rejected. For example, the possible alternate term of virtual display systems, although appropriate for VEs and augmented realities, does not apply to teleoperator systems.
The failure of most current telesystems to reproduce the sensory and sensorimotor capacities of their human operators (e.g., Barfield, Hendrix, Bjorneseth, Kacsmarek, & Lotens, 1995) is usually considered a serious shortcoming to be remedied as quickly as possible, perhaps even at substantial cost.3 Thus, there exists a pervasive and persistent belief, both inside and outside the telesystem community (e.g., Barfield & Furness, 1995, p. 6), that such devices can achieve their goals only if they (1) deliver stimuli that fully utilize human perceptual and motor capacities such as high visual–spatial resolution, extensive field of view (FOV), stereopsis, externalized localization of sound and natural hand–eye coordination, (2) avoid seriously violating users’ expectations about the physical world (e.g., objects should not move through other objects), and (3) elicit a strong sense of “presence” (or “telepresence”). This article of faith may be disputed on several grounds. First, in many instances, it is simply wrong. For example, aircraft simulators can successfully train pilots to fly real airplanes despite (or perhaps even because of ) their reliance on relatively crude, unrealistic graphic displays. Likewise, most if not all successful teleoperator systems deliberately deprive their users of the entire array of sensory stimuli present in the environment in which the remote device is located (e.g., at the bottom of the ocean), particularly those of the dangerous variety (e.g., intense cold and pressure). Second, although a strong sense of “presence” (or “telepresence”) may well be considered desirable for telesystems used for entertainment purposes (e.g., the Back to the Future attraction at Universal Studios), there is as yet no clear evidence that these subjective experiences actually improve the operator’s ability to perform a task either during or after interacting with the device (e.g., Welch, 1997; 1999). Third, rather than attempting to distribute limited computer resources equally among all aspects of the telesystem, it would seem more sensible to assign them more heavily to the especially task-relevant aspects at the expense of characteristics likely to be irrelevant to task performance. Indeed, the inclusion of a new telesystem capability may sometimes cause more problems than it solves. For example, adding luminance to a device with a wide FOV may result in flicker, whereas the old (i.e., night vision) system did not (R. S. Kennedy, personal communication, May 15, 1994). 3 Although perceptual experience that fails to match physical reality is said to be “nonveridical,” one might question this characterization with respect to VEs or “augmented realities” because the world to which these telesystems refer is not physical but rather created by a computer. Nevertheless, it is implicit in this chapter that the sensory displays of these telesystems depart from physical reality under one or more of the following conditions: (1) a mismatch (conflict) exists between normally redundant sensory modalities (e.g., a discrepancy between felt and seen limb position), (2) the observer’s perception is accompanied by significant initial errors in performance (e.g., underreaching for virtual targets), (3) perceptual constancy is violated (e.g. 
the experience of illusory motion of the visual field during head movements), or (4) the perception caused by a virtual stimulus differs from the perception caused by the same stimulus in the natural environment (e.g., a familiar object covering a given area of the retina is perceived as smaller than its physical counterpart when covering the same area of the retina).
As further illustration of this point, engineers seem to have an inordinate fondness for including stereoscopic displays in their VEs and teleoperator systems, even when the tasks for which these devices are designed do not entail the near-distance conditions for which this capacity is useful. The important moral here is that just because a feature can be added to a telesystem does not mean that it should be added. These pitfalls can probably best be avoided by keeping one’s eye on the performance goal of the telesystem and what is required to achieve it. If this goal can be met just as well without such frills as stereopsis or elegant pictorial fidelity, then, at least for now, it is unnecessary and perhaps even counterproductive to include them. Omitting a useless capability might, for instance, serve to reduce or eliminate sensory feedback delays that would otherwise exist, as well as decrease the incidence of motion sickness symptoms (e.g., James & Caird, 1995). There is even some evidence that poor telesystem fidelity is actually preferable to nearly perfect fidelity. Kennedy, Lilienthal, Berbaum, Baltzley, and McCauley (1989), for example, reported data suggesting that the incidence of simulator sickness was greater for simulators that users rated especially high in fidelity (although apparently not high enough). Thus, it is possible, at least with respect to the relationship between VE fidelity and induced sickness, that “a (near) miss is worse than a mile.” If confirmed, this notion leads to the somewhat surprising suggestion that, unless one can exactly duplicate a real-life environment, it may be preferable to make it less authentic.4

Even if the designers and implementers of current telesystems avoid the preceding hazards, other deficiencies or limitations may remain that are so unpleasant and disruptive to the user that they simply must be resolved one way or another. Possibilities include (1) poor lighting, (2) inadequate visual resolution, (3) the absence of certain cues or entire sensory modalities, (4) restricted FOVs, (5) sparse and/or unrealistic computer graphics, (6) delayed, asynchronous, faulty, variable, or absent sensory feedback from the user’s movements, (7) ambiguous or conflicting depth and distance cues, (8) decorrelations between sensory cues or systems, (9) distortions and intersensory conflicts involving visual size, shape, and spatial orientation, and (10) uncomfortably heavy head gear.

User Complaints

As many as 90% of telesystem users report at least one of a large array of unpleasant effects or aftereffects (e.g., Barfield & Weghorst, 1993; Biocca, 1992; Carr & England, 1995; Chien & Jenkins, 1994; Durlach & Mavor, 1995; Kalawsky, 1993;

4 Perhaps
the notion that experienced pilots are more susceptible to simulator sickness than less experienced pilots (e.g., Havron & Butler, 1957; Miller & Goodson, 1960) can be explained by assuming that for the latter flyers the intersensory discrepancies of the simulator that presumably cause simulator sickness tend to be below threshold and therefore less nauseogenic, whereas for experienced pilots they tend to be above threshold.
Kennedy & Stanney, 1997; Kennedy, Lane, Lilienthal, Berbaum, & Hettinger, 1992; Pausch, Crea, & Conway, 1992; Wilson, Nichols, & Haldane, 1997). These complaints can be divided into three categories—physical, sensory–perceptual, and behavioral, although it is not always clear which telesystem limitations are associated with which ailments. This tripartite classification is somewhat arbitrary because many telesystem complaints are not independent of one another. For example, simulator sickness (a physical complaint) and dizziness (a perceptual complaint) are likely to interfere with task performance (a behavioral complaint).

Physical complaints from telesystems include (1) eyestrain, or “asthenopia” (e.g., Mon-Williams, Wann, & Rushton, 1993; Rushton, Mon-Williams, & Wann, 1994), which may be symptomatic of underlying distress of or conflict between oculomotor subsystems (e.g., Ebenholtz, 1992), (2) headaches (e.g., Mon-Williams, Rushton, & Wann, 1995), (3) cardiovascular, respiratory, or biochemical changes (e.g., Calvert & Tan, 1994), and, perhaps most serious of all, (4) motion sickness symptoms (e.g., pallor, sweating, fatigue, and drowsiness, although rarely vomiting, e.g., Gower, Lilienthal, Kennedy, & Fowlkes, 1987; Kennedy et al., 1992; McCauley & Sharkey, 1992). The last-mentioned is variously referred to as simulator sickness, virtual environment sickness, or cybersickness.5

5 As Pausch, Crea, and Conway (1992) noted, the term “simulator sickness” (or other like terminology) applies only when a telesystem causes sickness while attempting to recreate a situation that does not typically cause sickness. For example, if a simulator was designed to reproduce the conditions of a violently rocking boat, the sickness caused by it is not an example of “simulator sickness.”

The sensory–perceptual problems experienced by some telesystem users include (1) momentary reduction in binocular acuity (e.g., Mon-Williams et al., 1993), (2) misperception of depth (e.g., Cobb, Nichols, Ramsey, & Wilson, 1999), (3) changes in dark focus of accommodation (Fowlkes, Kennedy, Hettinger, & Harm, 1993), and (4) potentially dangerous “delayed flashbacks” (e.g., illusory experiences of climbing, turning, and inversion) that may not appear until several hours after a pilot has operated an aircraft simulator (e.g., Kennedy, Fowlkes, & Lilienthal, 1993).

Finally, the disruptive behavioral effects of telesystem use include (1) disruption of hand–eye coordination (e.g., Biocca & Rolland, 1998; Kennedy, Stanney, Ordy, & Dunlap, 1997; Rolland, Biocca, Barlow, & Kancherla, 1995), (2) locomotory and postural instability (e.g., DiZio & Lackner, 1997; Kennedy, Stanney, Ordy, & Dunlap, 1997), and (3) slowed, uncertain, and otherwise degraded task performance (e.g., Fowlkes et al., 1993; McGovern, 1993). Sometimes these effects can even cause accidents, as seen, for example, in the frequent turnovers reported by McGovern (1993) for remotely operated land vehicles, at least partially attributable to their failure to provide their users with somatosensory feedback. Additionally, the fact that as many as 10% of aircraft simulator trainees have experienced delayed “flashbacks” from their training sessions has led the U.S. Navy to institute a mandatory policy of grounding pilots for
a period of 12–24 hours after a simulator flight (e.g., Baltzley, Kennedy, Berbaum, Lilienthal, & Gower, 1989; Kennedy, Lanham, Drexler, & Lilienthal, 1995).
The Current Response to the Limitations of Telesystems and a Practical Alternative

The Search for (and Anticipation of) Engineering Solutions

Designers and implementers of telesystems should take seriously the problematic side effects of their devices, out of concern for users’ well-being, safety, and potential litigious tendencies, and because such unpleasant experiences are likely to discourage future interactions with these telesystems and perhaps even with telesystems in general (e.g., Biocca, 1992). In principle, there are two distinct, albeit complementary, approaches for addressing these problems: (a) change the telesystem to accommodate the user and (b) change the user to accommodate the telesystem. In actuality, however, the typical current response, at least from the human factors and human–computer interface communities, has been limited primarily to the first of these two strategies. Thus, rather than addressing the deficiencies of their devices by providing operators with training procedures to cope with them, telesystem designers have usually sought engineering solutions such as more powerful computers, faster tracking devices, and better display systems and have confidently (or at least hopefully) anticipated their imminent arrival. Indeed, much of the discipline of human factors (e.g., Wickens, 1992) has assumed that the best solution for a human–machine interface problem is to modify the machine rather than assist the human to overcome the problem. In fact, according to F. Biocca (personal communication, May 1998), any telesystem that requires a certain amount of adaptation before it can be used in an optimal manner tends to be viewed by current workers in the field as poorly designed and “oppressive.”

Although many of the limitations of current telesystems will undoubtedly be resolved by future improvements of both software and hardware, an unquestioning reliance on this belief and concomitant neglect of other possible interim remedies represent serious mistakes for at least two reasons. First, despite the extremely rapid advances in telesystem technology, many problems (e.g., delays of sensory feedback and the subtle intersensory conflicts presumably responsible for simulator sickness) have yet to be remedied and may resist solution for a long time. Second, even when an engineering breakthrough does finally occur, its initial cost is likely to be prohibitive for all but the most generous research budgets, making its widespread implementation slow and sporadic. Thus, certain trade-offs will be inevitable until computer resources increase sufficiently and prices decline to an affordable level. Finally, telesystems will always exist whose very design and construction present users with substantial intersensory or sensorimotor rearrangements. Many teleoperator devices, for example, entail a large, multidimensional spatial discord between actions and visual consequences (e.g., Vertut & Coiffet, 1986). The operation
of a robotic arm, for example, frequently entails hand control movements that are non-intuitively related to the motions of the effector which, in turn, are viewed at some distance away and perhaps misaligned with respect to body orientation. Thus, it should be clear that the limitations of both current and future telesystem technology make it imperative that designers and implementers consider the strategy of changing the user to accommodate to their devices as well as vice versa.

A Practical Alternative: Change the User Rather Than the Device

Some problems of current telesystem technology can only be solved by improved technology. Such limitations include poor lighting, small FOV, and heavy head-gear. There are, however, other problems that, although currently avoidable, occur nonetheless because many telesystem designers are untrained in the area of human perception and perceptual-motor behavior and thus unaware of the problems their devices are likely to cause for these capacities. For example, Draper (1998) demonstrated that a number of common VE interaction scenarios are likely to present users with visual-vestibular rearrangements to which they are forced to adapt and which therefore produce potentially disruptive postexposure aftereffects. Obviously, to the extent that current technology can avoid such rearrangements, designers need to be made aware of these and similar pitfalls and construct their devices accordingly. Finally, there is a subset of telesystem sensory rearrangements for which there are no current (or perhaps even future) technological solutions, but that are nevertheless likely to be amenable to specialized user training procedures based on the variables known to control and facilitate human adaptation to such conditions (e.g., Welch, 1978, 1986). It is these procedures to which most of the remainder of this chapter is addressed.

It is important to understand from the outset that these techniques are not designed simply to help the user perform the specific task for which the telesystem is being used. Rather, they incorporate more general variables for overcoming the perceptual and perceptual-motor consequences of telesystem rearrangements which will otherwise impede performance on that or any other task that could be supported by the telesystem in question. For example, as described in a later section, one of the variables that facilitates adaptation is incremental (rather than all-at-once) exposure to the rearrangement. Therefore, where possible, this factor should be included in the ideal telesystem-training procedure, regardless of the specific task being trained.

Typical Current Practices of User Orientation, Calibration, and Training

Unfortunately, some users are allowed to interact with a telesystem with little or no preamble, often to their sorrow. For example, some Army test pilots have been known to put aircraft simulators through their paces to the limit the very first time they use these devices, causing many of them to become motion sick (L. Hettinger, personal communication, October 1998). More prudent telesystem instructors provide first-time users with detailed instructions, orientation, familiarization, practice, and
perhaps a few strategies. In addition, the device may be adjusted and calibrated to accommodate individual characteristics (e.g., body height, arm length, and interocular distance). In some cases the user may actually be warned of the more serious limitations of the device and advised to engage in or avoid certain behaviors, thus allowing them to circumvent (rather than confront) these limitations (e.g., Ellis, 1995). For example, they may be cautioned against making fast movements in order to minimize sensory feedback delays or admonished to keep the head as still as possible to reduce the chances of becoming sick (e.g., Jones, 1997; Wilpizeski, Lowry, Contrucci, Green, & Goldman, 1985). Similarly, in the case of aircraft simulators, they may be advised to avoid especially aggressive flight maneuvers (e.g., those involving high rates of linear or rotational acceleration or flying backward), at least at the outset of training. Or they may be instructed to close their eyes occasionally and to avoid looking at the ground during such unusual maneuvers (e.g., McCauley & Sharkey, 1992). Providing the operator with initial practice on the telesystem task is likely to result in a certain amount of adaptation (along with various other more cognitive compensatory behaviors to be discussed at the end of this chapter). However, the procedures described in subsequent sections should greatly accelerate and maximize such adaptation because they capitalize on the conditions and variables that extensive research (e.g., Welch, 1978, 1986) has shown to facilitate this process. This, it is argued, should result in optimal task performance when interacting with the telesystem, a decline in the incidence and severity of cybersickness, and hopefully increased positive transfer of performance to the real-world task for which the device is being used. Finally, all of these advantages should increase the likelihood that users will return to and have confidence in the telesystem device in question and that others will be encouraged to do likewise.
SENSORY REARRANGEMENTS (AND DISARRANGEMENTS) CAUSED BY TELESYSTEMS

Telesystem deficiencies relevant to the processes of perceptual and perceptual–motor adaptation include (1) intersensory conflicts, (2) distortions of depth and distance, (3) distortions of form and size, (4) loss of perceptual stability due to sensory feedback delays and asynchronies, and (5) sensory “disarrangement” (i.e., randomly changing distortions).

Intersensory Conflicts

This category comprises discordances between two (or perhaps more) of the spatial modalities: vision, audition, touch, proprioception, and the haptic and vestibular senses. Thus, an observer’s hand (or its computer-generated image or mechanical
surrogate) may be seen in one location and felt in another,6 an object may look rough but feel smooth, and a sound source may be seen in one place but heard elsewhere. Some instances of intersensory discrepancy entail the absence of one of two normally correlated sensory impressions. For example, in fixed-base aircraft simulators, visual motion is often displayed without the vestibular and other inertial cues (e.g., tactual and somatosensory stimuli) that usually accompany it. Likewise, many VEs allow observers to manipulate a virtual object with their virtual hands, but provide no tactual or force feedback about these “contacts.”

The type and severity of the effects of intersensory conflict depend greatly on which sensory modalities are involved. For example, conflicts between seen and felt position of the limb will produce reaching errors, particularly during very rapid or “ballistic” reaching, or when visual feedback is precluded (e.g., Welch, 1978, 1986). On the other hand, the conflicts between visual and vestibular/inertial cues that characterize many flight simulators frequently cause disorientation, postural instability (ataxia), and symptoms of motion sickness (e.g., Oman, 1990, 1991; Reason, 1978).

Another conflict involving spatial orientation occurs when the displayed visual scene is misaligned relative to the direction of gravity sensed by the user’s vestibular and somatosensory receptors. This can happen, for example, with teleoperator systems in which a distant camera is pitched or rolled with respect to the operator’s control coordinations, creating a discrepancy between the remote visual information about verticality and the operator’s (accurate) perception of the direction of gravity. A number of studies have revealed that the judgment of apparent eye level (i.e., the visual horizon) is very strongly biased in the direction of the rotation of the display. For example, observers in a room that is pitched forward or backward by 20 degrees will undergo a shift of apparent eye level of about 10 degrees in the direction of the pitch (e.g., Stoper & Cohen, 1989; Matin & Fox, 1989). Of particular interest in the present context is the finding of Nemire, Jacoby, and Ellis (1994) of comparable effects for a pitched virtual room. Surprisingly, several studies have shown that the substantial visual effects of the pitched environment are accompanied by only minor errors in open-loop (i.e., no visual feedback) pointing (Ballinger, 1988; Cohen & Ballinger, 1989; Welch & Post, 1996).

Conflicts between spatial vision and vestibular sensations during movement of the observer’s head or entire body are particularly problematic, both because of their unpleasant behavioral and gastrointestinal effects and the fact that they are very difficult and sometimes impossible to avoid. Thus, even very sophisticated telesystem technology may be unable to eliminate serious intersensory conflicts of this sort. For example, DiZio and Lackner (1992) and Stoffregen, Bardy, Smart, and Pagulayan (Chapter 6 of this book) have argued that the ability of VE

6 The virtual or mechanical hand of the user will also typically fail to look much like a normal hand.
It is not obvious, however, if this experience will cause problems other than initial surprise and minor distraction.
technology to simulate inertial motion may never improve to the point where at least some adaptation will not be necessary. Thus, motion-based aircraft simulators may always be beset by significant discrepancies between the ranges and amplitudes of the applied inertial forces and the visual motions with which they are paired. There are, for example, many simulations, such as an airplane taking off from an aircraft carrier or engaging in violent maneuvers, in which a substantial conflict between the two senses is unavoidable because the appropriate g forces exceed the capacity of the simulator. Similarly, most motion-based aircraft simulators are incapable of enough lateral displacement to produce the experience of a coordinated turn, one in which only Z (head-to-foot) axis acceleration is present. The false sense of lateral acceleration produced by this maneuver is likely to cause performance deficits and other problems. The fact that motion-based aircraft simulators tend to be more nauseogenic than fixed-base ones (e.g., Gower et al., 1987) may be due to such intersensory discordances. Further limitations in simulating gravitational–inertial forces stem from the fact that sustained microgravity cannot be produced by earthbound telesystems and hypergravity requires a human centrifuge (e.g., Cohen, Crosbie, & Blackburn, 1973), a device that is unavailable or unaffordable to most investigators.

For spatial discrepancies in which a normally correlated modality is simply missing, the behavioral and physical effects are also varied. The absence of tactual or force feedback when attempting to grasp a virtual object, although unlikely to disturb gross reaching behavior, may interfere with fine manual control. McGovern (1993) reports that the absence of the somatosensory feedback necessary to provide the “feel of the road” when operating remotely controlled land vehicles contributes to mishandling and accidents (e.g., rollovers). Another problematic instance of an omitted sensory experience is when visual motion occurs without the normally accompanying vestibular and other inertial cues, as in fixed-base aircraft simulators. Such a situation may produce a dramatic sense of virtual body motion known as vection. For some, this illusion is nauseogenic, as is the “freezing” of a simulation in an unusual orientation caused when the user has just performed an extraordinary maneuver (e.g., McCauley & Sharkey, 1992).

Depth and Distance Distortions

One form of depth distortion in some VEs is when visual objects that are being depicted as closer to the observer’s viewpoint fail to occlude supposedly more distant objects. Another example entails visual objects whose edges are fuzzy due to poor resolution and which, on the basis of “aerial perspective” cues, should thus appear far away but are nevertheless presented with the saturated hue and optical sharpness of a nearer object. Moving the head while viewing a VE through a stereoscopic head-mounted display (HMD) frequently causes virtual objects to appear to change relative position in the distance dimension, as well as their size and shape (e.g., Robinett & Rolland,
1992). These distortions are caused by the inability of the VE device to calculate correctly and instantly the images presented to the HMD during the movement. The same is true if objects are moved around a stationary HMD wearer. The movement-contingent depth distortion, referred to as the kinetic depth effect, is likely to disrupt visual–motor interactions (e.g., pointing or reaching) with respect to these objects and perhaps cause simulator sickness. Fortunately, this form of sensory rearrangement is susceptible to a certain amount of visual adaptation (e.g., Wallach & Karsh, 1963a, 1963b; Welch, 1978, pp. 182–184). Besides such problems with depth perception, the absolute distance that objects appear to be located may be under- or overestimated, as demonstrated for example, by visually “open-loop” pointing responses (i.e., hand–eye coordination in the absence of visual feedback). Further, in situations with very few depth cues, distance perception may be ambiguous, which is likely to cause substantial variability in hand–eye coordination. A common discrepancy (in this case an intrasensory one) observed with stereoscopic VE displays is between ocular vergence and accommodation of the lens. Under everyday circumstances, fixating an object in the distance dimension leads to convergence of the eyes to produce a single image, together with the appropriate shaping of the lens for fine focusing. However, in many stereoscopic VEs, the convergence and accommodation appropriate for the visual display are incongruent. Thus, the stereoscopic stimuli may be set for a distance that differs from the one for which optimal focusing occurs. Besides reaching errors, this conflict may cause diplopia, asthenopia, and even nausea (e.g., Ebenholtz, 1992; Howarth & Costello, 1996; Mon-Williams et al., 1993; Wann, Rushton, & Mon-Williams, 1995). Unfortunately, it may not be possible to overcome this problem by means of adaptation because many current stereoscopic VE displays confront users with a range of binocular targets on which to converge their eyes (Wann & Mon-Williams,1997). Clearly, such a situation entails no consistent rule to guide the adaptive process. Roscoe (1993) has demonstrated that pilots who are viewing collimated virtual images by means of see-through HUDs will not focus their eyes on infinity, but instead toward the resting accommodation distance (approximately 1 m). The consequences of this misaccommodation are that (1) the visual world beyond the HUD appears minified, (2) a terrain or airport runway appears as if viewed from a higher altitude than is actually the case, and (3) familiar objects appear farther away than their true distances. The behavioral implications of these misperceptions are obvious and, according to Roscoe (1993), potentially lethal. Unfortunately, magnifying the images in an attempt to compensate for these illusions is not feasible because of the limitation this hardware modification places on display size (Roscoe, 1993). There is, however, evidence that humans are capable of a certain amount of adaptation to such distance (and size) distortions. It is interesting that the evidence for this claim comes almost entirely from studies of adaptation to underwater distortions caused by the diver’s face mask (e.g., Ross, 1971; Welch, 1978, Chapter 12).
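A simple way to gauge the size of this conflict is to compare the vergence angle demanded by the disparity-specified (rendered) distance with the accommodative demand set by the display optics. The sketch below assumes a 63-mm interpupillary distance, a target rendered at 0.5 m, and optics focused at a fixed 2 m; these values and the function names are illustrative and are not drawn from the studies cited above.

```python
import math

def vergence_deg(ipd_m: float, distance_m: float) -> float:
    """Vergence angle (degrees) needed to fixate a point at the given
    distance, for the given interpupillary distance."""
    return math.degrees(2.0 * math.atan(ipd_m / (2.0 * distance_m)))

def accommodation_diopters(distance_m: float) -> float:
    """Accommodative demand (diopters) for a target at the given distance."""
    return 1.0 / distance_m

ipd = 0.063      # assumed interpupillary distance (m)
rendered = 0.5   # distance specified by binocular disparity (m)
focal = 2.0      # fixed focal distance of the display optics (m)

# The eyes must converge for 0.5 m (about 7.2 deg) but focus for 2 m,
# leaving roughly a 1.5-diopter vergence-accommodation mismatch.
print(round(vergence_deg(ipd, rendered), 1),
      round(accommodation_diopters(rendered) - accommodation_diopters(focal), 1))
```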
Form and Size Distortions

An artifact of many VE systems is an illusory curvature of contours, most notably the so-called pincushion effect, in which the sides of a rectilinear shape (especially one occupying much of the field of view) appear to be bowed inwardly. Magnification or minification of the entire visual scene or objects within it can also occur with some VEs and augmented realities, especially those that distort perceived distance. Thus, in the example of the see-through HUD described above (Roscoe, 1993), overestimation of the distance of familiar objects was accompanied by an overestimation of their size, probably based on the mechanism of “misapplied size constancy scaling” that has been used to explain the “moon illusion” (e.g., Rock & Kaufman, 1962).

The few published studies on optically induced size distortions have revealed little visual adaptation (Welch, 1978, Chapter 8). The same is true for prismatically induced curvature, as seen most convincingly in an experiment by Hay and Pick (1966) in which participants who wore prism goggles for a heroic 7 consecutive weeks underwent adaptation of only about 30% of the optical curvature caused by the prisms. On the other hand, when head movements are made while viewing a magnified or minified visual field, the resulting oculomotor disruption (and loss of visual position constancy) caused by the increased or reduced vestibulo-ocular reflex (VOR) gain, although likely to be nauseogenic, can apparently be completely overcome by adaptation (e.g., Collewijn, Martins, & Steinman, 1981, 1983). It appears that the drastic loss of visual position constancy from head movements while wearing right–left reversing goggles can be eliminated (e.g., Stratton, 1897b), although, interestingly, the resulting loss of the VOR under these circumstances may resist recovery (e.g., Gonshor & Melvill Jones, 1976).
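As a rough illustration of why magnification stresses the VOR, the sketch below computes the residual image motion (retinal slip) that remains when the reflex continues to operate at its habitual gain while the optics demand a higher one. The small-angle approximation, the numbers, and the function names are assumptions made for illustration; they are not taken from Collewijn et al. (1981, 1983) or Gonshor and Melvill Jones (1976).

```python
def required_vor_gain(magnification: float) -> float:
    """Eye-velocity / head-velocity ratio needed to stabilize a uniformly
    magnified (or minified) visual field during head rotation
    (small-angle approximation)."""
    return magnification

def retinal_slip_dps(head_velocity_dps: float, vor_gain: float,
                     magnification: float) -> float:
    """Residual motion of the scene on the retina (deg/s) when the VOR
    runs at `vor_gain` instead of the gain the optics demand."""
    return magnification * head_velocity_dps - vor_gain * head_velocity_dps

# With 2x magnification and a habitual gain near 1.0, a 40 deg/s head turn
# leaves about 40 deg/s of apparent scene motion until the gain adapts.
print(required_vor_gain(2.0), retinal_slip_dps(40.0, 1.0, 2.0))
```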
Delays of Sensory Feedback

Perhaps the most serious and currently intractable flaw of many VEs and almost all teleoperator systems is the presence of noticeable delays (lags) between the operator’s movements and visual, auditory, and/or somatosensory feedback. For VEs, such delays can occur because of insufficient refresh rates (exacerbated by the use of highly detailed graphics) and relatively slow position trackers. Studies of flight simulation have revealed that lags of as little as 50 msec can have a measurable effect on performance and longer delays cause serious behavioral oscillations (e.g., Wickens, 1986). Sensory feedback delays are unavoidable in teleoperator systems that involve extreme operator–teleoperator distances, for example, between the Earth and the Moon, due to the finite speed of light. However, they can also occur over much smaller ranges (e.g., between a ship and a remotely operated submersible several miles distant) from inherent slowness of transmission and/or the tardy response of remote devices to commands. Finally, it is possible for teleoperator equipment to be of such mass and inertia as to resist the user’s efforts and thus slow its response (T. Sheridan, personal communication, April 1994). With delays
of 30 msec or longer, direct teleoperation becomes difficult (e.g., Brooks, 1990); when they exceed 1 sec, it is virtually impossible (e.g., Ferrell, 1966; McKinnon & Kruk, 1993).

An especially disruptive and unpleasant effect of visual delay is the interference with the VOR and concomitant illusory visual motion that occurs during head or entire body movements. Normally, a head movement in one direction causes an approximately equal movement of the eyes in the opposite direction, effectively nulling motion of the visual field relative to the head. However, visual feedback delay causes the visual field to lag behind the head movement, making the initial VOR inappropriate and causing the visual world to appear to move in the same direction as the head. It is unclear if human beings are capable of adapting to this sporadic loss of visual position constancy (i.e., oscillopsia) or even if it causes motion sickness (e.g., Draper, 1998). Indeed, because the discordance between head movement and visual feedback occurs only at the instant at which the subject’s head is reversing direction, this situation may be best characterized as one of “disarrangement” (i.e., a randomly changing distortion). As we shall see in a later section, disarrangement does not lead to adaptation, but rather to a degradation of performance.

Visual feedback delays from motor movements that do not involve vestibular stimulation, such as reaching, pointing, or using manual devices to control the movements of teleoperator devices, rarely cause motion sickness symptoms. However, they can lead to other problems, ranging from mere distraction to serious discoordination (e.g., Funda, Lindsay, & Paul, 1992). Delays of auditory feedback also cause difficulties, particularly with speech (e.g., Smith & Smith, 1962, Chapter 13). Even more problematic perhaps are variable asynchronies between the onsets of two (or more) sensory systems, as, for example, with some motion-base aircraft simulators that present mismatches between the onset times of visual and inertial stimuli. R. S. Kennedy (personal communication, May 15, 1994) believes that these problems may be an especially important factor in the etiology of simulator sickness. Finally, there may be some situations in which the disadvantages of visual feedback delays are outweighed by their advantages. McCandless, Ellis, and Adelstein (2000) showed that, despite the deterioration in distance estimations caused by delays of feedback when subjects turned their head from side to side (thereby acquiring distance information via motion parallax), performance was still better than when subjects made distance estimates with a motionless head.

Sensory “Disarrangement”

When the size and/or direction of a telesystem-induced sensory rearrangement varies from moment to moment, rather than remaining constant, sensory disarrangement is said to exist. Examples with respect to telesystems include (1) so-called “jitter” (jiggling of the visual image due to electronic noise in the position tracker and/or the image-generator system) and (2) moment-to-moment
variability in the absolute and relative accuracy of position trackers in monitoring the operator’s limbs and body transport. Meyer, Applewhite, and Biocca (1992) have discussed these limitations in the context of VEs. Such noise can, for example, cause the relationship between seen and felt limb position to change randomly. Not surprisingly, such unpredictable sensory conditions are resistant to adaptation because there is no constant compensatory “rule” on which adaptation can be based. However, as discussed in a later section, they can result in the degradation of perceptual and perceptual–motor performance in the form of increased moment-to-moment performance variability (e.g., Cohen & Held, 1960).
EVIDENCE FOR AND POSSIBLE CAUSES OF IMPROVED PERFORMANCE AND THE DECLINE OF USER COMPLAINTS

Evidence of Improvement

Telesystem users’ physical complaints and performance difficulties usually decline after prolonged and/or repeated experience with the device in question (Biocca, 1992; McCauley & Sharkey, 1992; Kennedy, Jones, Stanney, Ritter, & Drexler, 1996; Kennedy et al., 1993; Regan, 1995; Uliano, Kennedy, & Lambert, 1986). For example, in the study by Kennedy et al. (1996), marked reductions in sickness were observed in the second of two 40-min VE exposures. Regan and Ramsey (1994) reported that subjects who had experienced four 20-min exposures to a VE in which they engaged in a prescribed set of activities experienced progressively less malaise and other motion sickness symptoms. The “savings” were especially great when VE immersions were separated by only 1–2 weeks. Finally, the fact that simulator training often results in posttraining aftereffects such as ataxia (Kennedy et al., 1993) and delayed “flashbacks” (e.g., Baltzley et al., 1989) represents strong, albeit indirect, evidence that some form of adaptation has occurred.

Possible Causes of the Improvement

Adaptation to Sensory Rearrangement

Of the several possible reasons for the observed declines in complaints as a result of continued telesystem use, the one of primary interest here is perceptual and perceptual–motor adaptation as a response to those telesystem problems that involve sensory rearrangements. Adaptation to sensory rearrangement may be defined as “a semi-permanent change of perception and/or perceptual–motor coordination that serves to reduce or eliminate a registered discrepancy between or within sensory modalities or the errors in behavior induced by this discrepancy” (Welch, 1978, p. 8). There are two major indices of adaptation: (1) the reduction
of perceptual and/or perceptual–motor errors during exposure to the sensory rearrangement and (2) postexposure aftereffects.7 Frequently, this adaptation takes the form of a recalibration of proprioception (sometimes including a modification in the felt direction of the eyes) to conform to the relatively unmodifiable visual sense. For example, in the case of adaptation to prismatic displacement (the most commonly studied optical rearrangement), the subject's hand may soon come to feel as if it is located where it was seen through the prism during the exposure period, even if it is currently out of view (e.g., Harris, 1965). Adaptation can be automatic and "unthinking," as seen, for example, in the fact that aftereffects of hand–eye coordination will occur even when the subject is aware that the sensory rearrangement is no longer present. Visual–motor adaptation can occur even when visual adaptation has not, as seen in a classic study by Snyder and Pronko (1952) in which, after a month of exposure to inverted vision, the subject had acquired essentially error-free perceptual–motor coordination. This occurred despite the fact that, when critically examining his experience, he had to admit that the world still looked upside-down. On the other hand, Kohler (1962) and Stratton (1896) have reported possible evidence of a truly perceptual righting of the optically rotated world. In any case, although the presence of perceptual adaptation probably always predicts behavioral adaptation, the reverse is not true.

It is extremely likely that at least one of the responses to protracted exposure to a telesystem is an adaptive process of the sort observed in studies of adaptation to optical rearrangements. Indeed, it may be argued that the fact that simulator sickness eventually decreases is indirect evidence of adaptation to the sensory conflicts generally assumed to be its cause. A study by Biocca and Rolland (1998) directly tested the possibility of adaptation to telesystems. The subjects in their experiment actively engaged in a pegboard task while wearing a prototype see-through HUD that displaced virtual eye position forward by 165 mm and upward by 62 mm. Their results demonstrated adaptation in terms of recovered speed and accuracy on the task while wearing the HUD and substantial aftereffects of hand–eye coordination when the device was removed. In another study, Groen and Werkhoven (1998) introduced into an HMD-displayed VE a 10-cm lateral displacement of the virtual hand relative to its true position and obtained a postexposure aftereffect, as measured by open-loop pointing at a real target. They also found a large aftereffect in the distance dimension, presumably because their VE also foreshortened perceived distance.
7. It is important to distinguish between the aftereffects of exposure to a telesystem (or to any sensory rearrangement) and a mere perseveration of effects. Only if a particular effect (motion sickness symptom, hand–eye miscoordination, etc.) has declined or disappeared by the end of the telesystem exposure period (or perhaps never occurred in the first place) and then reappears at some time in the postexposure period is it correct to say that an aftereffect has been observed. It is not accurate, for example, to refer to postexposure malaise as an aftereffect if subjects were experiencing such malaise prior to leaving the telesystem environment.
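The two indices of adaptation, together with the criterion in footnote 7 for what counts as an aftereffect, can be made concrete with a small illustrative sketch. The threshold, the scores, and the function names are hypothetical assumptions, not a scoring procedure taken from the chapter:

```python
from statistics import mean

def reduction_of_effect(early_exposure, late_exposure):
    """Index 1: how much the error (or symptom score) declines during exposure."""
    return mean(early_exposure) - mean(late_exposure)

def aftereffect(late_exposure, postexposure, threshold=1.0):
    """Index 2: a postexposure effect counts as an aftereffect only if the effect
    had already declined below `threshold` by the end of exposure and then
    reappears afterward (footnote 7); otherwise it is mere perseveration."""
    if mean(late_exposure) < threshold <= mean(postexposure):
        return mean(postexposure)
    return 0.0

# Illustrative numbers only (arbitrary units of error or symptom severity):
early, late, post = [8.5, 7.9, 8.2], [0.6, 0.4, 0.5], [3.1, 2.8, 3.0]
print(reduction_of_effect(early, late))   # ~7.7: a large reduction of effect
print(aftereffect(late, post))            # ~2.97: the effect reappears, hence an aftereffect
```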
Thus, it is apparent that many of the sensory distortions found in current telesystems are akin to those deliberately introduced by investigators over the long history of research on adaptation to sensory rearrangements. Extensive reviews of this research have been provided by Dolezal (1982), Harris (1965), Kornheiser (1976), Rock (1966), Smith and Smith (1962), and Welch (1978, 1986). The sensory rearrangements examined include optically induced (1) lateral rotation (yaw) of the visual field (e.g., Held & Hein, 1958), (2) visual tilt (roll; e.g., Ebenholtz, 1969), (3) displacement in the distance dimension (e.g., Held & Schlank, 1959), (4) curvature (e.g., Hay & Pick, 1966), (5) right–left reversal (e.g., Kohler, 1964), (6) inversion (e.g., Stratton, 1896), (7) depth and distance distortion (e.g., Wallach, Moore, & Davidson, 1963), and (8) altered size (e.g., Rock, 1965). Acoustical rearrangements have entailed (1) small (e.g., 10-deg) lateral rotations of the auditory field (e.g., Held, 1955), (2) right–left reversal (Willey, Inglis, & Pearce, 1937; Young, 1928), and (3) functional increases in the length of the interaural axis (e.g., Shinn-Cunningham, Durlach, & Held, 1992). Another atypical sensory environment to which humans have been exposed and adapted is altered gravitational–inertial force—hypergravity (e.g., Welch, Cohen, & DeRoshia, 1995), hypogravity (e.g., Lackner & Graybiel, 1983), and alternating hyper- and hypogravity (e.g., Cohen, 1992). Finally, underwater distortions caused by the diver's mask have been the subject of extensive investigation (e.g., Ross, 1971; Welch, 1978, Chapter 12).

From this vast literature, one conclusion is certain: Human beings (and a number of other mammalian species) are able to adapt their behavior and, to a lesser extent, their perception to any sensory rearrangement to which they are actively exposed, as long as this rearrangement remains essentially constant over time. Adaptation is not a unitary process, but rather varies greatly in both acquisition rate and magnitude as a function of the type of sensory rearrangement and the specific adaptive component (visual, proprioceptive, vestibulo-ocular, etc.). And, as mentioned previously, although humans are capable of adapting their behavior to even the most dramatic visual rearrangements, concomitant adaptive changes in visual perception often fail to occur (e.g., Dolezal, 1982; Snyder & Pronko, 1952). Despite the absence of visual adaptation to optical inversion even after several months of exposure, a certain amount of truly visual adaptation does reliably occur for lesser distortions such as prismatic displacement (e.g., Craske & Crawshaw, 1978) and curvature of vertical contours (e.g., Hay & Pick, 1966). Finally, adaptation is likely to be very specific. For example, Redding (e.g., 1973) found that both visual and visuomotor adaptation to optical tilt are unaffected by simultaneous exposure to prismatic displacement and vice versa and that the magnitudes of the two types of adaptation are uncorrelated. Similarly, Groen and Werkhoven (1998), using a VE-induced rearrangement of both the lateral and distance dimensions, found the aftereffects of adaptation to the former distortion to be uncorrelated with those of the latter.

Clearly, there are similarities between many of the limitations of current telesystems and the "traditional" sensory rearrangements studied in the laboratory and
substantial, albeit sometimes indirect, evidence that an adaptation process occurs during exposure to these devices. Therefore, it appears reasonable to conclude that procedures that incorporate the variables known to control or facilitate adaptation will be especially useful for the systematic training of telesystem users. Expediting this adaptive process should, in turn, facilitate task performance and help users recover from and/or reduce the severity of any simulator sickness they may be undergoing.

Other Possibilities

Adaptation to sensory rearrangement is not the only possible explanation for the observed declines in telesystem user complaints with protracted use. Additional causes include (1) the reduced novelty of the situation, (2) sensory habituation (or "desensitization"), (3) the acquisition of "cognitive strategies," such as intellectual correction and avoidance behaviors, and (4) "intersensory bias."

Frequently confused with adaptation, intersensory bias is defined as "a fast-acting, quickly dissipating, often complete resolution of the imposed intersensory discrepancy in which each modality influences the other" (Welch, 1994, p. 119). The best-known example is "visual capture," in which the conflict between seen and felt position of the hand when viewed through a light-displacing prism is immediately resolved, largely in favor of vision (e.g., Hay, Pick, & Ikeda, 1965). Thus, the hand is felt to be located very close to where it is being seen through the prism, and the seen position of the hand is shifted slightly in the direction of its felt position. Another example of intersensory bias is the "dominance" of vision over audition in spatial perception, referred to as ventriloquism (e.g., Radeau, 1994; Radeau & Bertelson, 1977) because of its role in ventriloquists' ability to "throw" their voices. Intersensory bias, in the case of visual capture, does not persist nearly as long after termination of exposure to the sensory rearrangement as does adaptation (e.g., Hay et al., 1965). Ventriloquism, however, differs in this respect in that passive exposure to a visual–auditory discrepancy does lead to relatively long-lasting postexposure aftereffects (e.g., Canon, 1970, 1971; Radeau & Bertelson, 1974).

The experimental observations of visual capture, ventriloquism, and other forms of intersensory bias (e.g., Welch & Warren, 1980) suggest that any small-to-moderate intersensory conflicts caused by a particular telesystem will be perceptually resolved from the outset, as long as exposure is uninterrupted. Thus, users may fail to perceive the sensory discrepancy between vision and proprioception or vision and audition and will therefore be capable of accurate hand–eye coordination. This will be true, however, only if the visual–proprioceptive conflict is not too large and the hand's movements are made relatively slowly and under continuous visual guidance. Thus, rapid and/or "blind" pointing responses will err in the direction of the visual stimulus. As indicated above, visual capture is a temporary condition, quickly dissipating as soon as the prismatically displaced limb is out of view. The same is true for right–left auditory reversals when the visual source of the sound is removed (e.g., Young, 1928).
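One common way to formalize intersensory bias of this kind is as a reliability-weighted average of the visual and proprioceptive position estimates. This is a standard cue-combination account offered purely as an illustration; it is not a model advanced in the chapter:

\[
\hat{x} \;=\; w_v x_v + w_p x_p,
\qquad
w_v \;=\; \frac{1/\sigma_v^{2}}{1/\sigma_v^{2} + 1/\sigma_p^{2}},
\qquad
w_p \;=\; 1 - w_v ,
\]

where \(x_v\) and \(x_p\) are the seen and felt positions of the hand and \(\sigma_v^{2}\) and \(\sigma_p^{2}\) index their variabilities. Because visual localization is usually far more precise than proprioception, \(w_v\) approaches 1: the hand is felt to be nearly where it is seen (visual capture), while the seen position shifts only slightly toward the felt one, just as described above.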
Continued, active interaction with the intersensory conflict will cause visual capture to be replaced by "true" adaptation. In contrast to the former, adaptation may persist when the hand is no longer visible, leading to accurate pointing even when the limb is moved rapidly and/or can no longer be seen and to relatively persistent aftereffects when the discrepancy is removed. A large body of research has identified many of the variables that control the incidence, rate, and extent of adaptation, and it may be assumed, unless demonstrated otherwise, that these variables will have the same effects on adaptation to the sensory rearrangements found in many telesystems.

CONTROLLING AND FACILITATING VARIABLES FOR ADAPTATION TO SENSORY REARRANGEMENTS AND THEIR POTENTIAL APPLICABILITY TO TELESYSTEM TRAINING PROCEDURES

A Stable Rearrangement

As indicated previously, exposure to a sensory rearrangement that is continuously changing in magnitude and/or direction fails to produce adaptation. More surprising, perhaps, is the fact that such a sensory disarrangement, at least with respect to prismatic displacement, actually degrades hand–eye coordination by increasing moment-to-moment variability in open-loop reaching. In an experiment by Cohen and Held (1960), participants viewed their actively moving hands through prisms whose strength varied continuously from 22 deg leftward through no displacement to 22 deg rightward and the reverse, at a rate of one cycle every 2 min. Although this experience failed to change average target-pointing accuracy along the lateral dimension (i.e., no adaptation occurred), it greatly increased the trial-to-trial variability of subsequent open-loop target-pointing responses. An analogous result has been reported for auditory disarrangement (Freedman & Pfaff, 1962a, 1962b; Freedman & Zacks, 1964) as measured in terms of right–left auditory localization errors. Another example of disarrangement is the situation in which delays of sensory feedback from bodily movements change from moment to moment, as may occur in some telesystems. Such delays are difficult to handle when they remain constant over a reasonable period of time, but they represent an even more serious problem when they are inconsistent or variably asynchronous (e.g., Uliano, Kennedy, & Lambert, 1986).

It may be concluded that if telesystems expose operators to intersensory or sensorimotor discordances whose parameters are changing from moment to moment, not only will adaptation fail to occur but, worse yet, performance will become inconsistent. Thus, disarrangement represents a sensory distortion whose effects cannot be ameliorated by adaptation-training procedures because the basic premise on which adaptation is based (i.e., the presence of a relatively stable sensory rearrangement) does not hold. It would appear, therefore, that telesystems that suffer
from this problem must await engineering solutions, such as more stable position trackers.

Active Interaction

It is generally agreed that the most powerful of all the variables affecting adaptation to sensory rearrangement is active interaction with the altered sensory environment. Thus, it has been amply demonstrated that passive exposure to a sensory rearrangement produces little or no adaptation (e.g., Held & Hein, 1958). Although the basis for this observation is controversial (e.g., Welch, 1978, pp. 28–29), one likely possibility is that active interaction provides the observer with unambiguous information about the sensory rearrangement that, in turn, initiates and/or catalyzes the adaptive process. For example, because felt limb position is more precise when bodily movement is self-initiated than when controlled by an external force (e.g., Paillard & Brouchon, 1968), the discrepancy between seen and felt limb position can be assumed to be particularly pronounced during active movement, thereby facilitating adaptation.

However, one qualification to the preceding endorsement of active interaction as a facilitator of adaptation is that the motor intentions ("efference") generating these bodily movements must actually conflict with the resulting visual feedback ("reafference"). For example, when wearing prism goggles, subjects' active hand or head movements initially lead to visual consequences (e.g., errors in localizing a visual target) that conflict with those that previous experience has led them to expect. However, their eye movements will not be in error. That is, although subjects who are wearing prism goggles and are instructed to rapidly turn and face a visually perceived object will err in the direction of the prismatic displacement, they will have no difficulty turning their eyes to fixate the perceived locus of the object. The reason for this is that prism goggles do not alter the relationship between the retinal locus of stimulation and the appropriate fixating eye movements. On the other hand, if the prisms are affixed to contact lenses instead of goggles, or are otherwise controlled by eye movements (e.g., White, Shuman, Krantz, Woods, & Kuntz, 1990), the situation is quite different. With this optical arrangement, not only will head movements be in error, but eye movements as well. The results of the few relevant studies using this unusual arrangement (Festinger, Burnham, Ono, & Bamber, 1967; Taylor, 1962) suggest that observers who are so accoutered are not only able to acquire correct eye movements but also undergo much more visual adaptation to prismatically induced curvature of vertical contours than is the case with prism goggles. Perhaps adaptation to other discrepancies such as altered size and displacement would also be enhanced by such means. Therefore, for VEs designed to be controlled by the operator's eyes, it is possible that any visual discordance that is present will undergo stronger adaptation than if the same discordance is experienced in a head movement–controlled VE such as one involving an HMD.
It is clear from the preceding discussion that maximal adaptation to telesystems requires that users be allowed to actively interact with the sensory environment being displayed. This may be a two-edged sword, however, because before active movement can benefit adaptation, it may make some users sick. For example, VEs that create a sense of gravitoinertial force by means of passive transport in a centrifuge or a visual flow field are very likely to induce motion sickness symptoms as soon as the user engages in active bodily movements, particularly of the head (e.g., DiZio & Lackner, 1992). Unfortunately, those participants who cannot tolerate these unpleasant effects may exit the telesystem before much adaptation has had a chance to occur.

Error-Corrective Feedback

Making and correcting errors when attempting to point at or reach for objects in a rearranged visual environment facilitates adaptation beyond the effect attributable to visual–motor activity alone (e.g., Coren, 1966; Welch, 1969). For example, Welch (1969) demonstrated that when subjects were allowed to make visually closed-loop target-pointing responses, they adapted significantly more than when they moved the visible hand in the same active manner but in the absence of visual targets. It is very important to understand that the mere availability of targets is insufficient to produce this facilitating effect on adaptation. Rather, target-pointing responses are most efficacious if they are performed either so rapidly that their trajectories en route to the target cannot be altered (i.e., they are ballistic) or in such a manner that the outcome of a given response is not observable until it has reached its goal (often referred to as "terminal exposure"). In the Welch (1969) experiment, the latter procedure was used. If observers are allowed to make slow, visually guided reaching movements ("concurrent exposure") instead, they will almost certainly zero in on the target on each attempt and thus experience little or no error when the hand finally reaches its goal. The fact that concurrent exposure is less informative than terminal exposure is the most likely reason why adaptation is much greater in the latter condition (e.g., Welch, 1978, pp. 29–31). In sum, it can be concluded that telesystem exposure should, where possible, entail numerous targets with which the user can actively interact, together with unambiguous error-corrective feedback from these interactions at the conclusion of each response.

Immediate Sensory Feedback

As discussed previously, delays of motor-sensory feedback represent one of the most serious limitations of current telesystems, causing severe behavioral disruption for latencies in excess of 1 sec (e.g., McKinnon & Kruk, 1993) that do not appear amenable to "true" adaptation. This state of affairs would seem to argue against the use of adaptation-training procedures to ameliorate this problem. Current (non-adaptation) means of circumventing deleterious effects of feedback delays
include the use of (1) sparse graphics to accelerate refresh rates, (2) reconstructive models for the “rehearsal” of the appropriate motor manipulations, (3) predictive models, and (4) the creation of, and real-time interaction with, virtual models of the distant environment in which a teleoperator system is located (e.g., Funda et al., 1992). Fortunately, there are at least some situations for which the reports of the “fatal” effects of sensory feedback delays for adaptation have been greatly exaggerated. The fact is that the role played by this variable with respect to adaptation depends on two other factors: (1) the presence or absence of a second sensory rearrangement and (2) whether the observer is allowed to make discrete movements that result in error-corrective feedback versus continuous movements with no error-corrective feedback. The Presence of a Second Sensory Rearrangement and Error-Corrective Feedback from Hand–Eye Responses It is true that when a sensory feedback delay is superimposed on another intersensory discrepancy, such as a conflict between felt and seen limb position, the ability to adapt to the latter is likely to be severely impaired. For example, Held, Efstathiou, and Greene (1966) showed that visual–motor adaptation to prismatic displacement is substantially reduced or even abolished by delays of visual feedback as short as 300 msec. Because many current telesystems are subject to much greater delays than this, it might seem, as Held and Durlach (1993) have argued, that users will be unable to use adaptation as a means of overcoming any spatial (or other) discordances that may also be present. Fortunately, however, it appears that this pessimistic conclusion is limited to the type of exposure condition typically preferred by Held and his colleagues in which subjects do not reach for specific targets, but merely view their hands as they swing them side to side before a visually homogeneous background (e.g., Held & Hein, 1958). In contrast, it has been amply demonstrated that when subjects are provided with error-corrective feedback from target-pointing responses, substantial adaptation will occur despite significant visual delays. For example, in an experiment by Rhoades (described by Welch, 1978, p. 105), subjects revealed substantial hand–eye adaptation to prismatic displacement as a result of a series of target-pointing responses for which error-corrective feedback was delayed by varying amounts. Although, as expected, adaptation declined with increasing delay, it was still statistically significant for a delay of 8 sec! Additionally, quite substantial adaptation has been obtained in a large number of published studies (e.g., Baily, 1972; Dewar, 1970; Welch, 1972) in which, for procedural reasons, the investigators were forced to institute visual error-corrective feedback delays of about 1 sec. What, then, can we conclude about the role of sensory feedback delays for telesystem adaptation? First, if a feedback delay represents the only intersensory
discordance present, motor responses will be seriously disrupted by delays of a second or more. Under such circumstances, users may, at best, only be able to circumvent these problems by means of conscious strategies such as deliberately moving very slowly. Second, if the feedback delay is accompanied by another discordance, such as displacement of the visual field, and the observer’s task involves continuous movement, not only will visual–motor performance be disturbed, but even very short visual delays will prevent adaptation from occurring. Finally, when observers are allowed to engage in discrete perceptual–motor responses, followed by error-corrective feedback, some adaptation to the discordance will occur even with feedback delays as long as several seconds.
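These recommendations (discrete responses, terminal exposure, and error-corrective feedback that tolerates some delay) can be sketched as a single training-trial routine. This is only an illustrative outline; the callables passed in are placeholders for whatever display and tracking toolkit a particular telesystem provides, not an existing API:

```python
import time

def run_terminal_feedback_trial(hide_hand, get_reach_endpoint, show_outcome,
                                target_deg, feedback_delay_s=0.0):
    """One discrete reach under 'terminal exposure': the virtual hand is hidden
    while the movement is in flight, and the outcome is displayed only after the
    response has ended, so the user still receives unambiguous error-corrective
    feedback even if that feedback arrives late."""
    hide_hand()                          # no concurrent visual guidance during the reach
    endpoint_deg = get_reach_endpoint()  # blocks until the single, discrete response ends
    if feedback_delay_s:
        time.sleep(feedback_delay_s)     # a delay here degrades adaptation less than
                                         # delayed feedback during continuous movement
    error_deg = endpoint_deg - target_deg
    show_outcome(endpoint_deg, error_deg)  # terminal, error-corrective feedback
    return error_deg
```

A block of such discrete trials, rather than continuous visually guided movement, is the exposure condition the evidence above favors when feedback delays cannot be engineered away.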
Incremental Exposure

Another controlling variable for adapting to sensory rearrangement, and presumably to telesystems as well, is the provision of incremental (rather than "all-at-once") exposure. Thus, it has been shown that if exposure to prismatic displacement (Lackner & Lobovits, 1977, 1978; Howard, 1968), optical tilt (Ebenholtz, 1969; Ebenholtz & Mayer, 1968), and slowly rotating rooms (e.g., Graybiel, Deane, & Colehour, 1969; Graybiel & Wood, 1969) is introduced gradually, adaptation (and/or the elimination of motion sickness symptoms) is more complete than if the observer is forced to confront the entire sensory rearrangement right from the start. With regard to rotating environments, Reason and Brand (1975) reported that the use of incremental ("graded") exposure resulted in protective adaptation that could be detected 6 months later.

There are a variety of ways in which this variable might be applied to telesystem adaptation. First, it has been advocated (e.g., Kennedy et al., 1987, p. 48) that aircraft simulator operators begin their training with short "hops" before attempting longer ones. Second, because intense, rapid responses are likely to cause greater delays of visual feedback than mild, slow ones, training should begin with the latter. Perhaps, having adapted first to minor visual delays, participants will be better able to tolerate and/or acquire cognitive strategies for ameliorating the disruptive effects of longer ones. A third potential means of incrementing feedback delays in VEs might be to begin training by using very simple graphics such as edges and stick figures (which entail very rapid refresh rates) and, when the user is ready, to increase the graphic complexity and with it the delay. A final form of incremental training that might prove useful would be to impose a gradual increase in the size of the FOV rather than beginning with its largest possible extent. That is, because it is known that telesystems with large FOVs produce better task performance than those with small FOVs but are also more nauseogenic (e.g., Pausch et al., 1992), gradually increasing the FOV might reduce the simulator sickness symptoms that participants would otherwise experience while simultaneously improving their performance.
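One possible way to operationalize such a regimen is a session-by-session ramp of exposure duration, graphic complexity, and FOV. The sketch below is only illustrative; the starting values, end values, and number of sessions are assumptions, not recommendations from the chapter or the cited studies:

```python
def incremental_exposure_plan(n_sessions, fov_deg=(40, 110), minutes=(10, 40),
                              scene_detail=("wireframe", "low", "medium", "full")):
    """Ramp exposure duration, FOV, and graphic complexity (and hence feedback
    delay) gradually across sessions instead of presenting everything at once."""
    plan = []
    for i in range(n_sessions):
        frac = i / max(n_sessions - 1, 1)
        plan.append({
            "session": i + 1,
            "duration_min": round(minutes[0] + frac * (minutes[1] - minutes[0])),
            "fov_deg": round(fov_deg[0] + frac * (fov_deg[1] - fov_deg[0])),
            "scene_detail": scene_detail[min(int(frac * len(scene_detail)), len(scene_detail) - 1)],
        })
    return plan

for step in incremental_exposure_plan(5):
    print(step)
```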
Distributed Practice

It has been demonstrated that periodic rest breaks facilitate adaptation to prismatic displacement (Cohen, 1967, 1974; Taub & Goldberg, 1973) and/or result in greater retention of adaptation (Dewar, 1970; Yachzel & Lackner, 1977). Presumably, the same would be true for adaptation to telesystems. Indeed, Kennedy, Lane, Berbaum, and Lilienthal (1993) determined that the ideal inter-"hop" interval for "inoculating" users against aircraft simulator sickness (with respect to the simulators they observed) was 2–5 days; shorter or longer intervals resulted in less adaptation. Cole, Merritt, Fore, and Lester (1990) reported that adaptation of hand–eye coordination with respect to a remote manipulator task occurred primarily between, rather than within, training days, clear evidence of a "distribution effect." Based on these admittedly sparse data, it appears that telesystem training should entail some sort of distributed practice regimen, although the ideal profile of "on" and "off" periods will almost certainly vary from one device to another.
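If one wanted to build such spacing into a training calendar, the arrangement could be as simple as the following sketch; the three-day default and the dates are purely illustrative, and, as noted above, the best interval will differ between devices:

```python
from datetime import date, timedelta

def distributed_sessions(first_day, n_sessions, gap_days=3):
    """Space training 'hops' a few days apart (Kennedy et al., 1993, found 2-5 day
    intervals best for the simulators they studied; 3 days here is arbitrary)."""
    return [first_day + timedelta(days=gap_days * i) for i in range(n_sessions)]

for session_day in distributed_sessions(date(2003, 6, 2), n_sessions=4):
    print(session_day.isoformat())
```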
SIMULATIONS OF NATURALLY REARRANGED SENSORY ENVIRONMENTS Some telesystems are designed for the express purpose of simulating real-life environments that are by their nature rearranged or otherwise out of the ordinary and to which individuals must adapt if they are to perform appropriately. In other words, for these telesystems, the presence of certain intersensory discordances is deliberate rather than a technological limitation. An example is a VE that has been designed to simulate the visual–inertial conflicts of hypogravity caused by the absence of the otolithic cues for gravity, used to help astronauts adapt to this environment before actually entering it. This procedure, sometimes referred to as preflight adaptation training (PAT), has shown promising results (e.g., Harm & Parker, 1993, 1994; Harm, Zografos, Skinner, & Parker, 1993; Parker, Reschke, Ouyang, Arrott, & Lichtenberg, 1986). Presumably, PAT procedures could be used to generate adaptive immunity to other disruptive sensory environments such as rocking ocean vessels or the underwater visual world as viewed through a diving mask. Another example of a telesystem designed to include a sensory rearrangement is that of Virre, Draper, Gailey, Miller, and Furness (1998), who proposed that individuals suffering from chronically low VOR could be exposed and adapted to gradually increased gain demands by means of a VE. Obviously, all of the facilitating procedures and conditions for adaptation that have been proposed in this chapter should prove useful in helping users adapt to these and other deliberately imposed telesystem rearrangements. Alternatively, a telesystem might be designed to produce an intersensory discordance that does not actually exist in the real world. This might be done because the presence of such a discordance serves a useful purpose. A good example
is the magnification of the interaural axes as a potential means of enhancing the localizability of auditory objects (Durlach & Pang, 1986; Durlach, Shinn-Cunningham, & Held, 1993; Shinn-Cunningham et al., 1992). It is important to note that with such an arrangement one should hope that auditory adaptation does not occur, because this would mean that observers' capacity to localize auditory objects has returned to normal, contrary to the aim of the device. On the other hand, it would probably be advantageous if the observers' behaviors (e.g., hand–ear coordination) recovered. Otherwise, they would misreach for auditory objects, assuming as usual that these responses were open-loop or ballistic. Whether it is possible to have it both ways in this situation remains to be seen.

Varieties of Sensory Rearrangement and the Problem of Aftereffects

Varieties of Sensory Rearrangement

A telesystem-induced spatial rearrangement can be relatively innocuous, as with a 3-deg lateral misalignment of felt and seen limb position presented by the HMD of a VE device. Or it can be substantial and multidimensional, as when an astronaut's rightward turn of a joystick causes the Space Station's robotic arm to move at an oblique angle in the distance dimension, as viewed on a monitor. Regardless of the magnitude or complexity of the rearrangement, however, the operator's behavior when using the device is likely to follow the same scenario: large initial errors, rapid improvement, and, finally, perfect or near-perfect performance. The recovery of performance while using the telesystem is what we have referred to as the reduction of effect. It is reasonable to assume that the controlling and facilitating variables described previously will have the same beneficial effects on the reduction of effect regardless of the nature or magnitude of the telesystem rearrangement. Where the two extreme cases may depart is whether or to what extent users experience aftereffects upon leaving the device. Postexposure aftereffects can occur for two very different reasons—negative transfer and perceptual recalibration.

Causes of Aftereffects

If a telesystem user performs a visual-motor task for a while and then moves on to a second task for which the stimuli or effectors are the same but the outcome different, errors will occur as the result of negative transfer. An example would be an astronaut who operates the robotic arm for an hour and then switches to a task that also involves a joystick, the movement of which, however, now produces a different outcome. The astronaut is likely to make errors when initially performing the second task because of negative transfer, errors that can properly be considered aftereffects. If, on the other hand, the stimulus-response relationship of the second task is very different from that of the first, little or no negative transfer will occur
and thus little or no disruptive aftereffects. For example, immediately after an extended session of controlling the robotic arm with a joystick, an astronaut will have no trouble using a computer mouse to position a cursor on a monitor. In contrast to large and multidimensional rearrangements like the robotic arm example, minor spatial distortions, such as a lateral visual displacement of a few degrees, will cause the felt position of the limb to be recalibrated in terms of its visual position (e.g., Harris, 1965). This so-called proprioceptive shift is the basis of the postexposure negative aftereffects of hand-eye coordination that result from exposure to this kind of rearrangement. Thus, because the hand feels as if it is shifted to one side of its true position, initial reaching responses in the postexposure period will err in the direction opposite the previously imposed visual displacement. If only one hand was used for this task, the aftereffects will be confined almost exclusively to that limb and thus will undergo little or no transfer to the other hand (e.g., Harris, 1965). Unlike negative transfer, however, perceptual recalibration will affect the exposed hand with respect to a very wide range of postexposure tasks and hand positions, not just ones whose stimulus characteristics are similar or identical to those of the initial task (e.g., Bedford, 1989). Perceptual recalibration does not occur in the presence of very large and/or multidimensional rearrangements. Rather, interaction with these rearrangements leads to a more cognitive behavioral change referred to as visual-motor skill acquisition (Clower & Boussaoud, 2000). Thus, except in the case of negative transfer, this form of adaptation will not produce aftereffects and is subject to substantial, perhaps even complete, intermanual transfer (Imamizu & Shimojo, 1995). Distinguishing between those situations that produce perceptual recalibration and those producing visual-motor skill acquisition has obvious implications for telesystem training and its outcomes. Although the behavioral disturbances caused by minor sensory rearrangements are likely to be more quickly overcome than those caused by the more dramatic rearrangements, the latter will not lead to large and troublesome aftereffects, except, of course, when negative transfer has occurred. Further, it will be unnecessary to separately train each hand and it may be possible for users to shift with little or no difficulty between different telesystems (each with its own rearrangements) or between a telesystem and the everyday world. Procedures for Eliminating Aftereffects: Unlearning/Relearning and Dual Adaptation Training Whether based on negative transfer or perceptual recalibration, postexposure aftereffects from telesystem use represent a troublesome and sometimes risky side effect of adapting to the device (Welch, 1997). The potential danger of delayed “flashbacks” has already been noted. Further, Biocca and Rolland (1998) raise the plausible scenario of a surgeon who, after using and adapting to a see-through HUD designed to assist in operations, removes the device and makes a serious mistake on a patient due to an aftereffect. Is there a way to retain the advantages
of rapid and complete telesystem adaptation while avoiding its disruptive and potentially dangerous aftermath? Two solutions may be proposed: (1) eliminate the aftereffects by the use of unlearning or relearning procedures and (2) create “contingent” or “dual” adaptation. It is reasonable to assume that the variables that control or encourage adaptation will play the same roles in its unlearning or relearning. Based on this assumption, the optimal way to abolish postexposure aftereffects is to require users to engage in the same activities after leaving the device as when they were in it. Studies of adaptation to the “traditional” forms of sensory rearrangement (e.g., prismatic displacement) have shown that postexposure aftereffects dissipate much more rapidly if subjects are allowed to interact with the non-rearranged visual world than simply sitting immobile in the dark or even in a lighted setting (e.g., Welch, 1978, Chapter 4). Consider a simulator for training oil tanker pilots that deliberately reproduces the large delay between turning the wheel and the response of the simulated ship that exists for real oil tankers. Clearly, it would be risky for users to drive an automobile or other vehicle immediately after an extended training session in such a simulator. Rather, one would want to be assured that the aftereffects had been completely and permanently abolished before allowing them to leave the premises or, if not, to prohibit them from driving vehicles for a period of time. The latter course of action is based on the assumption (or hope) that the passage of time is sufficient to eliminate aftereffects, perhaps through simple decay or unlearning/relearning from random, unspecified perceptual-motor activities. In contrast, the present proposal would have operators again pilot the virtual ship immediately after the training session, but this time with the device arranged to omit the visual-motor delay. Obviously, this strategy will not work for telesystems whose rearrangements cannot be deactivated. In that case, users should be exposed to a real situation that mimics the one provided by the telesystem and required to engage in the same visual-motor behaviors as during training. The identity of the cues that elicit delayed flashbacks subsequent to aircraft simulator training remains unclear. However, it would seem likely that they are stimuli that duplicate or at least resemble those to which adaptive responses were conditioned during training. Examples might include the sight of the centerline of the road while driving at night and the feel of the steering wheel. A great advantage of the unlearning/relearning procedure, however, is that it is not necessary to know what the sensory triggers are because this procedure will automatically decondition all of the stimuli present, whether critical or not. It can be argued, therefore, that this proposed procedure represents a significant improvement over the current strategy for dealing with delayed flashbacks in which pilots are grounded for 12-24 hours after a simulator training session (e.g., Kennedy et al., 1995). Adaptation, whether demonstrated by the reduction of effect or the postexposure aftereffect, does not start from scratch every time one moves from one sensory
environment to another. Rather, both anecdotal and experimental evidence demonstrate that repeated alternation between a rearranged sensory environment and the normal environment leads to decreased perceptual and perceptual-motor disruption at the point of transition. The act of adapting separately to two (or more) mutually conflicting sensory environments has been referred to as "dual adaptation" (e.g., Welch, Bridgeman, Anand, & Browman, 1993) or "context-specific" adaptation (e.g., Shelhamer, Robinson, & Tan, 1992). An everyday example is adjusting to new prescription lenses: after repeatedly donning and doffing their spectacles, wearers report that the depth distortions, illusory visual motion, and behavioral difficulties they had experienced at the outset have now largely disappeared. Thus, as the result of this alternating experience, the presence or the absence of the tactual sensations of the spectacles has become the discriminative cue for turning adaptation on or off. It has also been observed that astronauts who have flown into space two or more times experience progressively less initial perceptual and perceptual-motor interference when entering microgravity (or returning to 1 g) and/or regain normal neurovestibular function more rapidly (e.g., Bloomberg, Peters, Smith, Heubner, & Reschke, 1997; Paloski, 1998; Reschke, Bloomberg, Harm, Paloski, Layne, & McDonald, 1998). Experimental evidence of dual adaptation for both optically and gravitationally rearranged environments has been reported by Bingham, Muchisky, and Romack (1991), Cunningham and Welch (1994), Flook and McGonigle (1977), McGonigle and Flook (1978), Paloski, Bloomberg, Reschke, and Harm (1994), Paloski, Reschke, Doxey, and Black (1992), Welch et al. (1993), and Welch, Bridgeman, Williams, and Semmler (1998).8

8. Evidence that might seem to contradict the dual adaptation hypothesis has been reported by Kennedy and Stanney (1996), whose subjects revealed increasing, rather than the expected decreasing, postural aftereffects (ataxia) from repeated exposures to an aircraft simulator. Before accepting these data as evidence against the notion of dual adaptation, however, it should be determined if, despite the systematic increase in postexposure ataxia, the rate of readaptation to the normal environment changed. An increase in this rate from one simulator exposure to the next would support the dual adaptation hypothesis. Alternatively, the report by Kennedy and Stanney (1996) of progressively increasing aftereffects might signify that subjects' adaptation was accumulating from one adaptation session to the next, thereby causing the aftereffects to increase as well. According to the dual adaptation notion, however, once adaptation has finally reached asymptote such that further exposures provide little or no increment, the size of the postexposure aftereffects should begin to decline on subsequent exposures and the rate of readaptation increase.

The preceding evidence, together with reports of dual adaptation to slowly rotating rooms (Guedry, 1965; Kennedy, 1991), parabolic flight (Lackner & Graybiel, 1982), and aircraft simulators (e.g., Kennedy & Fowlkes, 1990), bolsters the previously mentioned efforts to provide astronaut-trainees with PAT procedures (e.g., Harm & Parker, 1994). Thus, days or even weeks will probably separate the last of these preflight adaptation sessions from the day of lift-off. However, despite this relatively lengthy period of exposure to normal (1-g) visual-inertial conditions, the installation of dual adaptation will increase the likelihood that training effects will be preserved for use upon arriving in space.

Based on the evidence for dual adaptation, it can be argued that users who systematically alternate between adapting to a telesystem and readapting to the normal environment (or to a second telesystem) should eventually be able to make the transition with little or no interference of perception, performance, or physical well-being. Presumably, the ideal dual adaptation-training regimen would include the unlearning/relearning procedure advocated above to eliminate the aftereffects and to produce readaptation. Finally, the "decision" about which form of adaptation to invoke in a given situation requires one or more discriminative cues that reliably differentiate the two conditions. It will be important, therefore, to deliberately provide users with multiple, attention-getting cues for this purpose.
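As a purely hypothetical illustration of that last recommendation, a VE application could overlay several redundant, attention-getting cues whenever the rearranged mapping is in force, giving users a reliable signal for which calibration applies; none of these particular cues comes from the chapter:

```python
def apply_context_cues(frame_settings, rearrangement_active):
    """Mark every frame in which the rearranged mapping is active with multiple
    salient cues (visual border, ambient tone, status text), so the cues can serve
    as discriminative signals for context-specific (dual) adaptation."""
    if rearrangement_active:
        frame_settings.update(border_color="amber",
                              ambient_tone_hz=220,
                              status_text="REMAPPED CONTROLS ACTIVE")
    else:
        frame_settings.update(border_color=None,
                              ambient_tone_hz=None,
                              status_text="")
    return frame_settings

print(apply_context_cues({}, rearrangement_active=True))
```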
Cognitive Strategies “Intellectual” Corrections for Visual Distortions As noted previously, visual-motor adaptation to sensory rearrangements tends to be much more rapid and complete than visual adaptation. Thus, even when one has learned to perform correctly in the presence of a visual rearrangement, the world may continue to look distorted or unusual. For example, subjects learn to make the correct visual-motor and locomotory responses when viewing the world through goggles that rotate the visual field 180 deg but continue to see it as upside down (e.g., Snyder & Pronko, 1952). In another example, deep-sea divers learn to reach and grasp accurately for objects despite the fact that the water-glass-air interface of their facemasks persists in making these objects appear smaller and closer than they actually are (e.g., Ross, 1971). In both of these examples, observers eventually come to learn the true nature of their visual environment, despite its appearance. Further, in the case of underwater distance distortions it has been shown that this “intellectual” correction can be quickly taught to novice divers by means of verbal error-corrective feedback (Ferris, 1972, 1973a, b). It is easy to imagine implementing such a procedure to train telesystem users to compensate intellectually for those sensory or intersensory distortions that resist “genuine” adaptation or perhaps as a “crutch” at the outset of training while waiting for this adaptation to occur. As described in an earlier section, the conflict between accommodation of the lens and ocular convergence that is sometimes caused by see-through HUDs can make the visual world appear much smaller than it really is, leading to serious errors in judgment and perhaps accidents (Roscoe, 1993). However, because visual adaptation to size distortions is very limited (e.g., Welch, 1978, Chapter 8), it is unlikely that the procedures advocated here will produce adaptation to this perceptual effect. Thus, an alternative is to use verbal feedback to train wearers
of see-through HUDs to compensate intellectually for the apparent minification of the visual scene, as Ferris (1973a, b) did for divers' perception of underwater distance.

Avoidance Strategies

Avoidance strategies involve attempts by the telesystem user to circumvent the unpleasant and disruptive effects of the device. For example, pilot trainees will learn to avoid certain head movements in some aircraft simulators after discovering them to be sources of motion sickness symptoms or being so informed by their instructors (e.g., Kennedy et al., 1987, p. 50). Other examples of avoidance strategies are deliberately slowing down one's manual control movements and/or using a "wait and move" strategy in order to minimize the deleterious behavioral effects of visual feedback delays (e.g., Smith & Smith, 1962). Because avoidance strategies, like intellectual corrections, are deliberate and conscious, they are unlikely to endure in the form of aftereffects once the operator has exited the telesystem.

Although avoidance strategies may be useful in the early stages of telesystem use, they may be contraindicated later on. This is because users who persist in such tactics are likely to acquire habits that interfere with subsequent performance in the devices and, more importantly, on the real-world tasks for which they are being trained. Furthermore, if there is good reason to assume that users are capable of adapting to these discordances, they must eventually be forced to confront them (although perhaps by means of small increments) so that this adaptation can occur.
THE ROLE OF INDIVIDUAL DIFFERENCES As with any measure of human perception or performance, the acquisition rate and magnitude of adaptation to sensory rearrangement varies from person to person (e.g., Welch, 1978, Chapter 11). Thus, it is reasonable to assume that telesystem operators will differ reliably from each other with respect to such things as (1) whether or not they actually detect a discordance, (2) how disruptive the discordance is for them, (3) how adaptable they are to it, (4) how quickly they achieve asymptotic adaptation, and (5) how quickly and/or completely they readapt to the normal sensory world. Little is known about the causes or correlates of these presumed individual traits, although some hypotheses have been offered. For example, Reason and Brand (1975) argued that individuals who are particularly receptive to sensory information are likely to be especially susceptible to motion sickness. Support for this proposal as it applies to the present context was provided by Barrett and Thornton (1968) who found “field-independent” (e.g., Witkin, Dyk, Faterson, Goodenough, & Karp, 1962) individuals to be more susceptible to simulator sickness than “field-dependent” individuals. Barrett and Thornton (1968)
had predicted just such a difference based on their conjecture that “field independents” are more sensitive than “field dependents” to the conflict between visual and vestibular sensations because the former group is more likely than the latter to rely on vestibular cues. Another suggestion was made by Wann et al. (1995), who speculated that individuals with unstable binocular vision will experience especially large aftereffects, assuming that their telesystem interaction exposes them to stimuli that stress the accommodation (focal) system, vergence systems, or the crosslinks between them. The presence of individual differences in the response to the intersensory and sensorimotor conflicts of current telesystems means that some users will require more adaptation training than others to reach a given level of adaptation. Indeed, it would seem likely that those users who are particularly prone to the deleterious effects of telesystems will receive the most benefit from adaptation training. Further, such individual variability indicates that, even if or when the major problems of current telesystems have been overcome by engineering advances, at least some users will continue to register and be reactive to the small sensory and sensorimotor rearrangements that will undoubtedly remain. Thus, adaptation-training procedures of the sort advocated here will continue to have a place in telesystem training and use.9
9. However, even if all that remains are these idiosyncrasies, a likely response of the human factors engineer would be to create a device that modified itself to conform to them, rather than employing the adaptive fine-tuning suggested here. A potential problem with this engineering solution, however, is that it entails confronting the user with a changing stimulus condition, which might cause problems of its own (M. Draper, personal communication, May 27, 1998).

SUMMARY AND CONCLUSIONS

Unless the sensory and intersensory rearrangements that characterize many current telesystems are corrected by means of improved design and technology, users will continue to experience disruptive and unpleasant perceptual, behavioral, and physiological effects during and after exposure to these devices. It is important to counteract such problems, both to improve performance and increase the user's well-being and because the failure to do so may jeopardize the diffusion of telesystem technology. One important way to help users overcome these problems is by training procedures based on well-established principles of perceptual and perceptual-motor adaptation to sensory rearrangement. A potential drawback of maximal adaptation is the presence of significant aftereffects. However, it is likely that the latter can be quickly, completely, and perhaps permanently abolished by means of the same procedures and conditions used to produce adaptation in the first place and by the induction of dual adaptation from repeated experience with the telesystem. For those telesystem rearrangements that, for whatever reason, resist perceptual or even behavioral adaptation, the deliberate strategies of "intellectual" correction and avoidance can be used. All of these training procedures are summarized in Table 7.1.

TABLE 7.1
Adaptive and Cognitive Strategies for Overcoming the Effects of Telesystem Rearrangements

Adapting to the telesystem
  Active interaction
  Error-corrective feedback
  Immediate feedback
  Incremental exposure
  Distributed practice
Readapting to the normal sensory environment
  Readaptation (using the "ideal" adaptation procedure)
  Dual adaptation training (alternating adaptation and readaptation)
Cognitive strategies
  "Intellectual" correction
  Avoidance behaviors

There is one serious caveat about the whole undertaking of training telesystem users to overcome the limitations of their devices. Namely, because it is possible for adaptation to inhibit transfer of training to the subsequent real-world task for which it is being used, it is important to carefully balance the advantages and disadvantages of adaptation in terms of its effects on transfer of training. Finally, even when engineering advances ultimately eliminate many of the major sensory and sensorimotor limitations of current telesystems, the adaptation-training procedures proposed here will probably continue to be necessary as a means of accommodating to the idiosyncrasies of the individual user.

ACKNOWLEDGMENTS

The author wishes to thank the following people (presented in alphabetical order) for their comments on earlier drafts of this manuscript: Frank Biocca, Malcolm M. Cohen, Robert Cole, Mark Draper, Larry Guzy, Curtis Ikehara, Mary Kaiser, Robert Kennedy, Jeffrey McCandless, Donald E. Parker, and Thomas Sheridan. This paper was inspired by two public presentations: "Telepresence and Sensorimotor Adaptation," a symposium chaired by the author at a conference entitled "Human–Machine Interfaces for Teleoperators and Virtual Environments" (Santa Barbara, CA, March 1990), and a talk entitled "Adapting to Virtual Environments," delivered by the author at the 64th Annual Scientific Meetings of the Aerospace Medical Association (Toronto, Canada, May 1993).
REFERENCES

Baily, J. S. (1972). Adaptation to prisms: Do proprioceptive changes mediate adapted behavior with ballistic arm movements? Quarterly Journal of Experimental Psychology, 24, 8–20.
Ballinger, C. J. (1988). The effects of a pitched field orientation on hand/eye coordination. Unpublished master's thesis, Naval Postgraduate School, Monterey, CA.
Baltzley, D. R., Kennedy, R. S., Berbaum, K. S., Lilienthal, M. G., & Gower, D. W. (1989b). The time course of postflight simulator sickness symptoms. Aviation, Space, and Environmental Medicine, 60, 1043–1048.
Barfield, W., & Furness, T. A., III (1995). Virtual environments and advanced interface design. New York: Oxford University Press.
Barfield, W., Hendrix, C., Bjorneseth, O., Kaczmarek, K. A., & Lotens, W. (1995). Comparison of human sensory capabilities with technical specifications of virtual environment equipment. Presence: Teleoperators and Virtual Environments, 4, 329–356.
Barfield, W., Rosenberg, C., & Lotens, W. A. (1995). Augmented-reality displays. In W. Barfield & T. A. Furness (Eds.), Virtual environments and advanced interface design (pp. 542–575). New York: Oxford University Press.
Barfield, W., & Weghorst, S. (1993). The sense of presence within virtual environments: A conceptual framework. In G. Salvendy & M. Smith (Eds.), Human–computer interaction: Software and hardware interfaces (pp. 699–704). Amsterdam: Elsevier.
Barrett, G. V., & Thornton, C. L. (1968). Relationship between perceptual style and simulator sickness. Journal of Applied Psychology, 52, 304–308.
Bedford, F. L. (1989). Constraints on learning a new mapping between perceptual dimensions. Journal of Experimental Psychology: Human Perception and Performance, 15, 232–248.
Bingham, G. P., Muchisky, M., & Romack, J. L. (1991, November). "Adaptation" to displacement prisms is skill acquisition. Paper presented at the annual meeting of the Psychonomic Society, San Francisco, CA.
Biocca, F. (1992). Will simulator sickness slow down the diffusion of virtual environment technology? Presence: Teleoperators and Virtual Environments, 1, 334–343.
Biocca, F., & Rolland, J. (1998). Virtual eyes can rearrange your body: Adaptation to visual displacement in see-through, head-mounted displays. Presence: Teleoperators and Virtual Environments, 7, 262–277.
Bloomberg, J. J., Peters, B. T., Smith, S. L., Heubner, W. P., & Reschke, M. F. (1997). Locomotor head–trunk coordination strategies following space flight. Journal of Vestibular Research, 7, 161–177.
Brooks, T. L. (1990). Telerobotic response requirements. Proceedings of the IEEE International Conference on Systems, Man and Cybernetics: SMC '90 (pp. 113–120). New York: IEEE Computer Society Press.
Calvert, S. L., & Tan, S. L. (1994). Impact of virtual reality on young adults' physiological arousal and aggressive thoughts: Interaction vs. observation. Journal of Applied Developmental Psychology, 15, 125–139.
Canon, L. K. (1970). Intermodality inconsistency of input and directed attention as determinants of the nature of adaptation. Journal of Experimental Psychology, 84, 141–147.
Canon, L. K. (1971). Directed attention and maladaptive "adaptation" to displacement of the visual field. Journal of Experimental Psychology, 88, 403–408.
Carr, K., & England, R. (Eds.). (1995). Simulated and virtual realities: Elements of perception. Bristol, PA: Taylor & Francis.
Chien, Y. Y., & Jenkins, J. (1994). Virtual reality assessment: A report of the Task Group on Virtual Reality to the High Performance Computing and Communications and Information Technology Subcommittee of the Information and Communications Research and Development Committee of the National Science and Technology Council. Washington, DC: U.S. Government Printing Office.
Clower, D. M., & Boussaoud, D. (2000). Selective use of perceptual recalibration versus visuomotor skill acquisition. Journal of Physiology, 84, 2703–2708. Cobb, S. V. G., Nichols, S. C., Ramsey, A. R. & Wilson, J. R. (1999). Virtual reality-Induced symptoms and effects (VRISE). Presence: Teleoperators and Virtual Environments, 8, 169–186. Cohen, M.M. (1967). Continuous versus terminal visual feedback in prism aftereffects.Perceptual & Motor Skills, 24, 1295–1302. Cohen, M. M. (1974). Visual feedback, distribution of practice, and intermanual transfer of prism aftereffects. Perceptual and Motor Skills, 37, 599–609. Cohen, M. M. (1992). Perception and action in altered gravity. Annals of the New York Academy of Sciences, 656, 354–362. Cohen, M. M., & Ballinger, C. J. (1989). Hand–eye coordination is altered by viewing a target in a pitched visual frame [Abstract]. Aviation, Space, & Environmental Medicine, 60, 477. Cohen, M. M., Crosbie, R. J., & Blackburn, L. H. (1973). Disorienting effects of aircraft catapult launchings. Aerospace Medicine, 44, 37–39. Cohen, M. M., & Held, R. (1960, April). Degrading visual–motor coordination by exposure to disorded re-afferent stimulation. Paper presented at the annual meeting of the Eastern Psychological Association, New York. Cole, R. E., Merritt, J. O., Fore, S., & Lester, P. (1990). Remote manipulator tasks are impossible without stereo TV. Stereoscopic Display Applications, SPIE, Vol. 1256, Bellingham, WA, Society of Photo-Optical Instrumentation. Collewijn, H., Martins, A. J., & Steinman, R. M. (1981). The time course of adaptation of human compensatory eye movements. In L. Maffei (Ed.), Pathophysiology of the human visual system. Documenta Ophthalmologica Proceedings Series, 30, 123–133. Collewijn, H., Martins, A. J., & Steinman, R. M. (1983). Compensatory eye movements during active and passive head movements: Fast adaptation to changes in visual magnification. Journal of Physiology, 340, 259–286. Coren, S. (1966). Adaptation to prismatic displacement as a function of the amount of available information. Psychonomic Science, 4, 407–408. Craske, B., & Crawshaw, M. (1978). Spatial discordance is a sufficient condition for oculomotor adaptation to prisms: Eye muscle potentiation need not be a factor. Perception & Psychophysics, 23, 75–79. Cunningham, H. A., & Welch, R. B. (1994). Multiple concurrent visual–motor mappings: Implications for models of adaptation. Journal of Experimental Psychology:Human Perception & Performance, 20, 987–999. Dewar, R. (1970). Adaptation to displaced vision: The influence of distribution of practice on retention. Perception & Psychophysics, 8, 33–34. DiZio, P., & Lackner, J. R. (1992). Spatial orientation, adaptation, and motion sickness in real and virtual environments. Presence: Teleoperators and Virtual Environments, 3, 319–328. DiZio, P., & Lackner, J. R. (1997). Circumventing side effects of immersive virtual environments. In M. Smith, G. Salvendy, & R. Koubek (Eds.), Design of computing systems: Social and ergonomic considerations (pp. 893–896). Amsterdam: Elsevier. Dolezal, H. (1982). Living in a world transformed: Perceptual and performatory adaptation to a visual distortion. New York: Academic Press. Draper, M. H. (1998). The adaptive effects of virtual interfaces: Vestibulo-ocular reflex and simulator sickness (Vols. 1 & 2). Unpublished doctoral dissertation, University of Washington, Seattle, WA. Durlach, N. I., & Mavor, A. S. (1995). Virtual reality: Scientific and technological challenges. 
Washington, DC: National Academy Press. Durlach, N. I., & Pang, X. D. (1986). Interaural magnification. Journal of the Acoustical Society of America, 80, 1849–1850. Durlach, N. I., Shinn-Cunningham, B. G., & Held, R. (1993). Supernormal auditory localization. I. General background. Presence: Teleoperators and Virtual Environments, 2, 89–103.
162
WELCH
Ebenholtz, S. M. (1969). Transfer and decay functions in adaptation to optical tilt. Journal of Experimental Psychology, 81, 170–173. Ebenholtz, S. M. (1992). Motion sickness and oculomotor systems in virtual environments. Presence: Teleoperators and Virtual Environments, 1, 302–305. Ebenholtz, S. M., & Mayer, D. (1968). Rate of adaptation under constant and varied optical tilt. Perceptual and Motor Skills, 26, 507–509. Ellis, S. R. (1995). Origins and elements of virtual environments. In W. Barfield & T. A. Furness (Eds.), Virtual environments and advanced interface design. pp. 14–57. New York: Oxford University Press. Ferrell, W. R. (1965). Remote manipulation with transmission delay. IEEE Transactions on Human Factors in Electronics, 6, 24–32. Ferrell, W. R. (1966). Delayed force feedback. Human Factors, 8, 449–455. Ferrell, W. R., & Sheridan, T. B. (1967). Supervisory control of remote manipulation. IEEE Spectrum, 4, 81–88. Ferris, S. H. (1972). Improvement of absolute distance estimation underwater. Perceptual and Motor Skills, 35, 299–305. Ferris, S. H. (1973a). Improving absolute distance estimation in clear and turbid water. Perceptual and Motor Skills, 36, 771–776 Ferris, S. H. (1973b). Improving distance estimation underwater: Long-term effectiveness of training. Perceptual and Motor Skills, 36, 1089–1090. Festinger, L., Burnham, C. A., Ono, H., & Bamber, D. (1967). Efference and the conscious experience of perception. Journal of Experimental Psychology Monograph, 74(4, Whole No. 637). Flook, J. P., & McGonigle, B. O. (1977). Serial adaptation to conflicting prismatic rearrangement effects in monkey and man. Perception, 6, 15–29. Fowlkes, J. E., Kennedy, R. S., Hettinger, L. J., & Harm, D. L. (1993). Changes in the dark focus of accommodation associated with simulator sickness. Aviation, Space, and Environmental Medicine, 64, 612–618. Freedman, S. J., & Pfaff, D. W. (1962a). The effect of dichotic noise on auditory localization. Journal of Auditory Research, 2, 305–310. Freedman, S. J., & Pfaff, D. W. (1962b). Trading relations between dichotic time and intensity differences in auditory localization. Journal of Auditory Research, 2, 311–318. Freedman, S. J., & Zacks, J. L. (1964). Effects of active and passive movement upon auditory function during prolonged atypical stimulation. Perceptual and Motor Skills, 18, 361–366. Funda, J., Lindsay, T. S., & Paul, R. P. (1992). Teleprogramming: Toward delay-invariant remote manipulation. Presence: Teleoperators and virtual environments, 1, 29–44. Gonshor, A. & Melvill Jones, G. (1976). Extreme vestibulo-ocular adaptation induced by prolonged optical reversal of vision. Journal of Physiology (London), 256, 381–414. Gower, D. W., Lilienthal, M. G., Kennedy, R. S., & Fowlkes, J. E. (1987, September). Simulator sickness in U.S. Army and Navy fixed- and rotary-wing flight simulators. Conference Proceedings no. 1433 of the AGARD Medical Panel Symposium on Motion Cues in Flight Simulation and Simulator Induced Sickness. (pp. 8.1-8.20), Brussels, Belgium. Graybiel, A., Deane, F. R., & Colehour, J. K. (1969). Prevention of overt motion sickness by incremental exposure to otherwise highly stressful Coriolis acceleration. Aerospace Medicine, 40, 142–148. Graybiel, A., & Wood, C. D. (1969). Rapid vestibular adaptation in a rotating environment by means of controlled head movements. Aerospace Medicine, 40, 638–643. Groen, J., & Werkhoven, P. J. (1998). Visuomotor adaptation to virtual hand position in interactive virtual environments. 
Presence: Teleoperators and virtual environments, 7, 429–446. Guedry, F. E. (1965). Habituation to complex vestibular stimulation in man: Transfer and retention effects from twelve days of rotation at 10 rpm. Perceptual Motor Skills, 21, (Suppl. 1–V21), 459– 481. Harm, D. L., & Parker, D. E. (1993). Perceived self-orientation and self-motion in microgravity, after landing and during preflight adaptation training. Journal of Vestibular Research, 3, 297–305.
7.
ADAPTING TO TELESYSTEMS
163
Harm, D. L., & Parker, D. E. (1994). Preflight adaptation training for spatial orientation and space motion sickness. The Journal of Clinical Pharmacology, 34, 618–627. Harm, D. L., Zografos, L. M., Skinner, N. C., & Parker, D. E. (1993). Changes in compensatory eye movements associated with simulated conditions of space flight. Aviation, Space and Environmental Medicine, 64, 820–826. Harris, C. S. (1965). Perceptual adaptation to inverted, reversed, and displaced vision. Psychological Review, 72, 419–444. Havron, H. D., & Butler, L. F. (1957). Evaluation of training effectiveness of the 2FH2 helicopter flight trainer research tool (NAVTRADEVCEN 1915-00-1). Port Washington, NY: Naval Training Device Center. Hay, J. C., & Pick, H. L., Jr. (1966). Visual and proprioceptive adaptation to optical displacement of the visual stimulus. Journal of Experimental Psychology, 71, 150–158. Hay, J. C., Pick, H. L., Jr., & Ikeda, K. (1965). Visual capture produced by prism spectacles. Psychonomic Science, 2, 215–216. Held, R. (1955). Shifts in binaural localization after prolonged exposure to atypical combinations of stimuli. American Journal of Psychology, 68, 526–548. Held, R., & Durlach, N. (1993). Telepresence, time delay and adaptation. In S. R. Ellis, M. K. Kaiser, & A. J. Grunwald (Eds.), Pictorial communication in virtual and real environments. (2nd ed, pp. 232– 246). London: Taylor & Francis. Held, R., Efstathiou, A., & Greene, M. (1966). Adaptation to displaced and delayed visual feedback from the hand. Journal of Experimental Psychology, 72, 887–891. Held, R., & Hein, A. (1958). Adaptation of disarranged hand–eye coordination contingent upon reafferent stimulation. Perceptual and Motor Skills, 8, 87–90. Held, R., & Schlank, M. (1959). Adaptation to disarranged eye–hand coordination in the distancedimension. American Journal of Psychology, 72, 603–605. Howard, I. P. (1968). Displacing the optical array. In S. J. Freedman (Ed.), The neuropsychology of spatially oriented behavior. Homewood, IL: Dorsey. Howarth, P. A., & Costello, P. J. (1996). Visual effects of immersion in virtual environments: Interim results from the U.K. Health and Safety Executive Study. Society for Information Display International Symposium Digest of Technical Papers, 27, 885–888. Imamizu, H., & Shimojo, S. (1995). The locus of visual-motor learning at the task or manipulator level: Implications from intermanual transfer. Journal of Experimental Psychology: Human Perception and Performance, 21, 719–733. James, K. R. and Caird, J. K. (1995). The effects of optic flow, propiception, and texture on novice locomotion in virtual environments. In Proceedings of the Human Factors and Ergonomics Society (pp.1405–1409). Santa Monica, CA: Human Factors and Ergonomics Society. Jones, S. A. (1996). Incidence and severity of simulator sickness in posturally-restrained vs. unrestrained drivers. Unpublished doctoral dissertation, University of Central Florida. Kalawsky, R. S. (1993). The science of virtual reality and virtual environments. Wokingham, England: Addison-Wesley. Kennedy, R. S. (1991). Long-term effects of living in rotating artificial gravity environments. Aviation, Space, and Environmental Medicine, 62, 491. Kennedy, R. S., Berbaum, K. S., Dunlap, M. P., Mulligan, B. E., Lilienthal, M. G., & Funaro, J. F. (1987). Guidelines for alleviation of simulator sickness symptomatology (Tech. Rep. No. TR-87– 007). Orlando, FL: Navy Training Systems Center. Kennedy, R. S., & Fowlkes, J. E. (1990, June). 
What does it mean when we say that “simulator sickness is polygenic and polysymptomatic”? Paper presented at the IMAGE V conference, Phoenix, AZ. Kennedy, R. S., Fowlkes, J. E., & Lilienthal, M. G. (1993). Postural and performance changes following exposures to flight simulators. Aviation, Space, and Environmental Medicine, 64, 912– 920.
164
WELCH
Kennedy, R. S., Jones, M. B., Stanney, K. M., Ritter, A. D., & Drexler, J. M. (1996). Human factors safety testing for virtual environment mission-operation training (Contract No. NAS9–19482). Houston, TX: NASA Johnson Space Center. Kennedy, R. S., Lane, N. E., Berbaum, K. S., & Lilienthal, M. G. (1993). Simulator Sickness Questionnaire: An enhanced method for quantifying simulator sickness. The International Journal of Aviation Psychology, 3, 203–220. Kennedy, R. S., Lane, N. E., Lilienthal, M. G., Berbaum, K. S., & Hettinger, L. J. (1992). Profile analysis of simulator sickness symptoms: Application to virtual environment systems. Presence: Teleoperators and Virtual Environment, 1, 295–301. Kennedy, R. S., Lanham, D. S., Drexler, J. M., & Lilienthal, M. G. (1995). A method for certification that after effects of virtual reality exposures have dissipated: Preliminary findings. In A. C. Bittner & P. C. Champney (Eds.), Advances in Industrial Safety VII (pp. 263–270). London: Taylor & Francis. Kennedy, R. S., Lilienthal, M. G., Berbaum, K. S., Baltzley, D. R., & McCauley, M. E. (1989). Simulator sickness in U.S. Navy flight simulators. Aviation, Space, and Environmental Medicine, 60, 10–16. Kennedy, R. S., & Stanney, K. M. (1996). Postural instability induced by virtual reality exposure: Development of a certification protocol. International Journal of Human–Computer Interaction, 8, 25–47. Kennedy, R. S., & Stanney, K. M. (1997). Aftereffects of virtual environment exposure: Psychometric issues. In M. Smith, G. Salvendy, & R. Koubek (Eds.), Design of computing systems: Social and ergonomic considerations (pp. 897–900). Amsterdam: Elsevier. Kennedy, R. S., Stanney, K. M., Ordy, J. M., & Dunlap, W. P. (1997). Virtual reality effects produced by head-mounted display (HMD) on human eye–hand coordination, postural equilibrium, and symptoms of cybersickness. Society for Neuroscience Abstract, 23, 772. Kohler, I. (1964). The formation and transformation of the perceptual world. Psychological Issues, 3, 1–173. Kornheiser, A. S. (1976). Adaptation to laterally displaced vision: A review. Psychological Bulletin, 83, 783–816. Lackner, J. R., & Graybiel, A. (1982). Rapid perceptual adaptation to high gravitoinertial force levels: Evidence for context-specific adaptation. Aviation, Space and Environmental Medicine, 53, 766–769. Lackner, J. R., & Graybiel, A. (1983). Perceived orientation in free-fall depends on visual, postural, and architectural factors. Aviation, Space, and Environmental Medicine, 54, 47–51. Lackner, J. R., & Lobovits, D. (1977). Adaptation to displaced vision: Evidence for prolonged aftereffects. Quarterly Journal of Experimental Psychology, 29, 65–59. Lackner, J. R., & Lobovits, D. (1978). Incremental exposure facilitates adaptation to sensory rearrangement. Aviation, Space and Environmental Medicine, 49, 362–264. Matin, L., & Fox, C. R. (1989). Visually perceived eye level and perceived elevation of objects: Linearly additive influences from visual field pitch and from gravity. Vision Research, 29, 315–324. McCandless J. W., Ellis S. R., & Adelstein, B. D. (2000). Localization of a time-delayed, monocular virtual object superimposed on a real environment. Presence: Teleoperators and Virtual Environments, 9, 15–24. McCauley, M. E., & Sharkey, T. J. (1992). Cybersickness: Perception of self-motion in virtual environments. Presence: Teleoperators and virtual environments, 1(3), 311– 318. McGonigle, B. O., & Flook, J. P. (1978). 
Long-term retention of single and multistate prismatic adaptation by humans. Nature, 272, 364–366. McGovern, D. E. (1993). Experiences and results in teleoperation of land vehicles. In S. R. Ellis, M. K. Kaiser, & A. J. Grunwald (Eds.), Pictorial communication in virtual and real environments (2nd ed., pp. 182–195). London: Taylor & Francis. McKinnon, G. M., & Kruk, R. (1993). Multi-axis control in telemanipulation and vehicle guidance. In S. R. Ellis, M. K. Kaiser, & A. J. Grunwald (Eds.), Pictorial communication in virtual and real environments (2nd ed., pp. 247–264). London: Taylor & Francis. Meyer, K., Applewhite, H., & Biocca, F. (1992). A survey of position trackers. Presence: Teleoperators and Virtual Environments, 1, 173–201.
7.
ADAPTING TO TELESYSTEMS
165
Miller, J. W., & Goodson, J. E. (1960). Motion sickness in a helicopter simulator. Aerospace Medicine, 31, 204–212. Mon-Williams, M., Rushton, S., & Wann, J. P. (1995). Binocular vision in stereoscopic virtual-reality systems. Society for Information Display International Symposium Digest of Technical Papers, 25, 361–363. Mon-Williams, M., Wann, J. P., & Rushton, S. (1993). Binocular vision in a virtual world: Visual deficits following the wearing of a head-mounted display. Ophthalmic and Physiological Optics, 13, 387–391. Nemire, K., Jacoby, R. H., & Ellis, S. R. (1994). Simulation fidelity of a virtual environment display. Human Factors, 36, 79–93. Oman, C. M. (1990). Motion suckness: A synthesis and evaluation of the sensory conflict theory. Canadian Journal of Physiology and Pharmacology, 68, 294–303. Oman, C. M. (1991). Sensory conflict in motion sickness: An observer theory approach. In S. R. Ellis (Ed.), Pictorial communication in virtual and real environments (pp. 362–376). New York: Taylor & Francis. Paillard, J., & Brouchon, M. (1968). Active and passive movements in the calibration of position sense. In S. J. Freedman (Ed.), The neuropsychology of spatially oriented behavior. Homewood, IL: Dorsey. Paloski, W. H. (1998.) Vestibuluospinal adaptation to microgravity. Otolaryngological Head and Neck Surgery, 118, S39–S44, Paloski, W. H., Bloomberg, J. J., Reschke, M. F., & Harm, D. L. (1994). Space flight-induced changes in posture and locomotion. Journal of Biomechanics, 27, 812. Paloski, W. H., Reschke, M. F., Doxey, D. D., & Black, F. O. (1992). Neurosensory adaptation associated with postural ataxia following space flight. In M. Woolacott & F. Horak (Eds.), Posture and Gait: Control Mechanisms. Eugene, Oregon: University of Oregon Press, pp. 311– 315. Parker, D. E., Reschke, M. F., Ouyang, L., Arrott, A. P., & Lichtenberg, B. K. (1986). Vestibulo-ocular reflex changes following weightlessness and preflight adaptation training. In E. Keller & D. Zee (Eds.), Adaptive processes in visual and oculomotor control systems (pp. 103–109). Oxford, England: Pergamon. Pausch, R., Crea, T., & Conway, M. (1992). A literature survey for virtual environments: Military flight simulator visual systems and simulator sickness. Presence: Teleoperators and Virtual Environments, 1(3), 344–363. Radeau, M. (1994). Auditory–visual spatial interaction and modularity. Current Psychology of Cognition, 13, 3–51. Radeau, M., & Bertelson, P. (1974). The aftereffects of ventriloquism. Quarterly Journal of Experimental Psychology, 26, 63–71. Radeau, M., & Bertelson, P. (1977). Adaptation to auditory–visual discordance and ventriloquism in semirealistic situations. Perception & Psychophysics, 22, 137–146. Reason, J. T. (1978). Motion sickness adaptation: A neural mismatch model. Journal of the Royal Society of Medicine, 71, 819–829. Reason, J. T., & Brand, J. J. (1975). Motion sickness. London: Academic Press. Redding, G. M. (1973). Simultaneous visuomotor adaptation to optical tilt and displacement. Perception & Psychophysics, 17, 193–200. Reschke, M. F., Bloomberg, J. J., Harm, D. L., Paloski, W. H., Layne, C. S, & McDonald, P. V. (1998). Posture, locomotion, spatial orientation, and motion sickness as a function of space flight. Brain Research Review, 28, 102–117 Regan, E. C. (1995). An investigation into nausea and other side-effects of head-coupled immersive virtual reality. Virtual Reality, 1(1), 17–32. Regan, E. C., & Ramsey, A. D. (1994). 
Some side-effects of immersion virtual reality: The results of four immersions. Report 94R012, Army Personnel Research Establishment, Ministry of Defence, Farnborough, Hampshire, England.
166
WELCH
Robinett, W., & Rolland, J. P. (1992). A computational model for the stereoscopic optics of a head-mounted display. Presence: Teleoperators and virtual environments, 1, 45–62. Rock, I. (1965). Adaptation to a minified image. Psychonomic Science, 2, 105–106. Rock, I. (1966). The nature of perceptual adaptation. New York: Basic Books. Rock, I., & Kauffman, L. (1962). The moon illusion, II. Science, 136, 1023–1031. Rolland, J. P., Biocca, F. A., Barlow, T., & Kancherla, A. (1995). Quantification of adaptation to virtual-eye location in see-thru head-mounted displays. , IEEE Virtual Reality Annual International Symposium ’95 (pp. 56–66). Los Alimitos: CA: IEEE Computer Society Press. Roscoe, S. N. (1993). The eyes prefer real images. In S. R. Ellis, M. K. Kaiser, & A. J. Grunwald (Eds.), Pictorial communication in virtual and real environments (2nd ed., pp. 577–585). London: Taylor & Francis. Ross, H. E. (1971). Spatial perception underwater. In J. D. Woods & J. N. Lythgoe (Eds.), Underwater science, chap. 3, pp. 69–101. London: Oxford University Press. Rushton, S., Mon-Williams, M., & Wann, J. (1994). Binocular vision in a bi-ocular world: New generation head-mounted displays avoid causing visual deficit. Displays, 15, 55–260. Shelhamer, M., Robinson, D. A., & Tan, H. S. (1992). Context-specific adaptation of the gain of the vestibulo-ocular reflex in humans. Journal of Vestibular Research, 2, 89–96. Sheridan, T. B. (1989). Telerobotics. Automatica, 25, 487–507. Sheridan, T. B. (1992). Defining our terms. Presence: Teleoperators and Virtual Environments, 1, 272–274. Shinn-Cunningham, B. G., Durlach, N. I., & Held, R. (1992). Adaptation to transformed auditory localization cues in a hybrid real/virtual environment. Journal of the Acoustical Society of America, 92, 2334. Smith, K. U., & Smith, W. K. (1962). Perception and motion. Philadelphia: Saunders. Snyder, F. W., & Pronko, N. H. (1952). Vision with spatial inversion. Wichita, KS: University of Wichita Press. Stoper, A. E., & Cohen, M. M. (1989). Effect of strsuctured visual environments on apparent eye level. Perception & Psychophysics, 46, 469–475. Stratton, G. (1896). Some preliminary experiments on vision without inversion of the retinal image. Psychological Review, 3, 611–617. Stratton, G. (1897a). Upright vision and the retinal image. Psychological Review, 4, 182–187. Stratton, G. (1897b). Vision without inversion of the retinal image. Psychological Review, 4, 341–460, 463–481. Taub, E., & Goldberg, I. A. (1973). Prism adaptation: Control of intermanual transfer by distribution of practice. Science, 180, 755–757. Taylor, J. G. (1962). The behavioral basis of perception. New Haven, CT: Yale University Press. Uliano, K. C., Kennedy, R. S., & Lambert, E. Y. (1986). Asynchronous visual delays and the development of simulator sickness. In Proceedings of the Human Factors Society 30th Annual Meeting (pp. 422–426). Dayton, OH: Human Factors Society. Vertut, J., & Coiffet, P. (1986). Robot Technology—Teleoperations and Robotics: Evolution and Development Vol. 3A, Prentice Hall, Englewood Cliff, NJ. Virre, E. S., Draper, M. H., Gailey, C., Miller, D., & Furness, T. A. (1988). Adaptation of the VOR in patients with low VOR gains. Journal of Vestibular Research, 8, 331–334. Wallach, H., & Karsh, E. B. (1963a). The modification of stereoscopic depth perception and the kinetic depth effect. American Journal of Psychology, 76, 429–435. Wallach, H., & Karsh, E. B. (1963b). Why the modification of stereoscopic depth-perception is so rapid. 
American Journal of Psychology, 76, 413–420. Wallach, H., Moore, M. E., & Davidson, L. (1963). Modification of stereoscopic depth perception. American Journal of Psychology, 76, 191–204. Wann, J. P., Rushton, S. K., & Mon-Williams, M. (1995). Natural problems for stereoscopic depth perception in virtual environments. Vision Research, 19, 2731–2736.
7.
ADAPTING TO TELESYSTEMS
167
Wann, J. P., & Mon-Williams, M. (1997). Health issues with virtual reality displays:What we do know and what we don’t. Computer Graphics (May), 53–57. Welch, R. B. (1969). Adaptation to prism-displaced vision: The importance of target pointing. Perception & Psychophysics, 5, 305–309. Welch, R. B. (1972). The effect of experienced limb identity upon adaptation to simulated displacement of the visual field. Perception & Psychophysics, 12, 453–456. Welch, R. B. (1978). Perceptual modification: Adapting to altered sensory environments. New York: Academic Press. Welch, R. B. (1986). Adaptation of space perception. In K. R. Boff, L. Kaufman, & J. P. Thomas (Eds.), Handbook of perception and human performance. Chap. 24 New York: Wiley. Welch, R. B. (1994). The dissection of intersensory bias: Weighting for Radeau. Current Psychology of Cognition, 13, 117–123. Welch, R. B. (1997). The presence of aftereffects. In G. Salvendy, M. Smith, & R. Koubek (Eds.), Design of computing systems: Cognitive considerations (pp. 273–276). Amsterdam: Elsevier. Welch, R. B. (1999). How can we determine if the sense of presence affects task performance? Presence: Teleoperators and Virtual Environments. Welch, R. B., & Post, R. B. (1996). Accuracy and adaptation of reaching and pointing in pitched visual environments. Perception & Psychophysics, 58, 383–389 Welch, R. B., & Warren, D. H. (1980). Immediate perceptual response to intersensory discrepancy. Psychological Bulletin, 88, 638–667. Welch, R. B., Cohen, M. M., & DeRoshia, C. W. (1996). Reduction of the elevator illusion from continued hypergravity exposure and visual error-corrective feedback. Perception & Psychophysics, 58, 22–30. Welch, R. B., Bridgeman, B., Anand, S., & Browman, K. E. (1993). Alternating prism exposure causes dual adaptation and generalization to a novel displacement. Perception & Psychophysics, 54, 195–204. Welch, R. B., Bridgeman, B., Williams, J. A., & Semmler, R. (1998). Dual adaptation and adaptive generalization of the human vestibulo-ocular reflex. Perception & Psychophysics, 60, 1415–1423. White, K. D., Shuman, D., Krantz, J. H., Woods, C. B., & Kuntz, L. A. (1990, March). Destabilizing effects of visual environment motions simulating eye movements or head movements. In N. I. Durlach & S. R. Ellis (Eds.), Human–machine interfaces for teloperators and virtual environments, (NASA Conference Pub. No. 10071). Santa Barbara, CA: National Aeronautics and Space Administration. Wickens, C. D. (1986). The effects of control dynamics on performance. In K. R. Boff, L. Kaufman, & J. P. Thomas (Eds.), Handbook of perception and human performance. pp. 39–60. New York: Wiley. Wickens, C. D. (1992). Engineering psychology and human performance, (2nd ed.). New York: HarperCollins. Willey, C. F., Inglis, E., & Pearce, C. H. (1937). Reversal of auditory localization. Journal of Experimental Psychology, 20, 114–130. Wilpizeski, C. R., Lowry, L. D., Contrucci, R. R., Green, S. J., & Goldman, W. S. (1985). Effects of head and body restraint on experimental motion-induced sickness in squirrel monkeys. Aviation, Space, and Environmental Medicine, 56, 1070. Wilson, J. R., Nichols, S., & Haldane, C. (1997). Presence and side effects: Complementary or contradictory? In M. Smith, G. Salvendy, & R. Koubek (Eds.), Design of computing systems: Social and ergonomic considerations (pp. 889–892). Amsterdam: Elsevier. Witkin, H. A., Dyk, R., Faterson, H., Goodenough, D. R., & Karp, S. A. (1962). Psychological differentiation: Studies of development. New York: Wiley. 
Yachzel, B., & Lackner, J. R. (1977). Adaptation to displaced vision: Evidence for transfer of adaptation and long-lasting aftereffects. Perception & Psychophysics, 22, 147–151. Young, P. T. (1928). Auditory localization with acoustical transposition of the ears. Journal of Experimental Psychology, 11, 399–429.
PART II: Virtual Environments
8 A Tongue-Based Tactile Display for Portrayal of Environmental Characteristics

Paul Bach-y-Rita and Kurt A. Kaczmarek
Center for Neuroscience and Department of Biomedical Engineering, University of Wisconsin

Mitchell E. Tyler
Wicab, Inc.
More than 30 years of studies with vibro-tactile and electro-tactile human–machine interfaces on various parts of the body surface have demonstrated the feasibility of portraying environmental characteristics by means of tactile displays. However, until very recently, tactile communication systems, such as for sensory substitution, have not achieved practical utility for a variety of reasons. Mechanical systems were bulky, power hungry, and noisy. Electrical stimulation of touch (electrotactile stimulation) was not comfortable and produced unreliable and unstable tactile percepts because of the highly variable conditions at the electrode–skin interface. Recent experiments have shown, however, that the tongue may be an ideal site for a practical electro-tactile information display, overcoming many of the problems encountered in previous studies. The tongue is very sensitive and highly mobile. The presence of an electrolytic solution (saliva) assures good electrical contact. Perception with electrical
stimulation of the tongue appears to be better than with fingertip electro-tactile stimulation, and the tongue requires only about 3% of the voltage (5–15 V) and much less current (0.4–2.0 mA) than the fingertip (Bach-y-Rita, Kaczmarek, & Meier, 1998; Bach-y-Rita, Kaczmarek, Tyler, & Garcia-Lara, 1998). The potential exists for a simple, practical, and cosmetically acceptable interface (built into an orthodontic retainer), with FM signals from the artificial sensor carrying the information wirelessly to the tongue display, for relay to the brain. The display has potential application for persons with sensory loss (blindness, deafness), for communications (sensate Internet), for transportation (night vision), for surgery (sensate probes), and many other uses.
CONCEPTUAL FRAMEWORK

Information Transmission and Sensorimotor Integration

For the brain to correctly interpret information from devices, it is not necessary that it be presented in the same form as in natural sensory information systems. We do not “see” with the eyes (Bach-y-Rita, 1972); the visual image does not go beyond the retina, where it is turned into patterns of pulses along nerves. Those individual pulses are not different from the pulses from the big toe; it is the brain that recreates the image from the patterns of pulses. The studies on the development of practical devices are based on concepts of brain plasticity (see, e.g., Bach-y-Rita, 1972, 1995). The brain is able to recreate “visual” images that originate in an artificial receptor (a TV camera), which are transduced into a tactile display (the TVSS system) and carried to the brain via tactile pathways. It is only necessary to present the information from a device in a form of energy that can be mediated by the receptors at the human–machine interface, and for the brain, through a motor system (e.g., a head-mounted camera under the motor control of the neck muscles, for blind persons), to know the origin of the information. The tactile system is as capable as the visual and auditory systems for information transmission (Bach-y-Rita, 1972), but, in the absence of a practical interface, it has largely been ignored. The tongue display does not require that “feeling” be altered; the tongue system will carry information from a variety of devices that have nothing to do with feeling.

Extensive experience with TVSS (reviewed below) and in many reported studies of the mechanisms of sensorimotor integration (recently reviewed by Mariño, Martinez, & Canedo, 1999), demonstrate that somatosensory activity is linked to active exploration of the environment and is required for the discrimination of texture and interpretation of complex spatiotemporal patterns of stimuli activating different classes of mechanoreceptors. The interplay between sensory and motor systems culminates in the subjective perception of shape, form, size, surface structure, and mechanical properties of objects as they are actively explored. The complex limb movements of primates have been shown to be under elaborate feedback from skin, deep muscular, and joint receptors ascending in the dorsal column system. Animals with well-developed explorative and
manipulative skills have a greater number of dorsal root fibers innervating the forelimb than less dexterous animals, and most of those fibers are cutaneous. Sensorimotor integration begins at the level of the second-order neurons of the somatosensory system, at the dorsal column nucleus. Several mechanisms, including efferent control by the cerebral cortex, enhance the activity of second-order neurons. The sensorimotor cortex can discriminate wanted from unwanted sensory information. Sensorimotor integration is common to all the sensory systems and has been important in the development of our sensory substitution systems. Our studies have revealed the capacity for interchangeability of the location of the interface on the body and of the motor control systems, with maintenance of perceptual characteristics.

Late Brain Reorganization

A number of questions that were posed in a 1967 paper (Bach-y-Rita, 1967) have been answered, at least in part, by our TVSS studies and those of others, such as Frost and Metin (1985) and Metin and Frost (1989), and many other studies (cf. Bach-y-Rita, 1995). Among the 1967 questions were the following: Is it possible to alter the central effects of afferent impulses from a circumscribed region? Central sensory representation, although to a degree phylogenetically determined, may possibly be modified if the functional roles of the particular “sensations” are modified. This concept can be submitted to test by increasing the functional demands from a cutaneous area that normally has a limited sensory role. This can be accomplished by presenting suitable coded “visual” information from an artificial receptor to a cutaneous area. A successful vision substitution system, producing high-resolution “visual” (rather than the normal tactile) experiences on presentation of the optical images to the skin of the back may produce measurable central effects. Possible mechanisms of the late brain reorganizations have been evaluated elsewhere (Bach-y-Rita, 1972, 1995). A number of studies provide the neural substrates for plastic changes with training; examples include the greatly increased cortical representation of a fingertip area in monkeys following training in haptic exploration (Jenkins, Merzenich, Ochs, Allard, & Guic-Robles, 1990) and the expanded finger motor cortex representation in piano players, as well as in the sensorimotor cortex in Braille readers, reported by Pascual-Leone and Torres (1993).
ACCESS BY BLIND PERSONS TO COMPUTER GRAPHICS

Tactile Evoked Potentials in Blind Persons

The tactile vision substitution system studies have demonstrated that the brain is sufficiently plastic to reorganize function to utilize the information from sensory substitution systems. However, the neural mechanisms underlying the plastic changes in sensory substitution are not known at present. One study from our
laboratory has demonstrated that blind persons with extensive tactile training show consistent differences from sighted persons in the somatosensory evoked potential (Feinsod, Bach-y-Rita, & Madey, 1973). The stimulus was delivered to the finger, and the evoked potential was recorded over the somatosensory cortex. The fast initial components (N1 and P1) of the sensory evoked response are regarded as thalamic radiation responses. Their latencies are thus related to the conduction along peripheral (median) nerve and spinothalamic tracts, which are not altered by blindness. However, a difference between the blind and sighted subjects is evident at components N2, P2, and N3. In each case, the component occurs earlier in the blind subject. The latency of N2 was shorter by 6%, of P2 by 9%, and of N3 by 13% in the blind subjects.

Auditory Evoked Potentials in Blind Persons

We have also demonstrated changes in the auditory evoked potential in blind persons (Woods, Clayworth, & Bach-y-Rita, 1985). We compared auditory evoked potentials (AEPs) in normal subjects and age- and sex-matched subjects who had been blind since early infancy. Blind and sighted subjects had comparable brainstem auditory evoked potentials (BAEPs), but blind subjects showed shortened latencies and, in some cases, enhanced amplitudes of middle- and long-latency AEPs. Intergroup differences were first observed at latencies of 50–70 msec in middle-latency recordings. Blind subjects also showed amplitude enhancements in tone-evoked N1 and P2 components, and shortened latencies of P3s and reaction times in a target detection task. The results suggested that the reorganization of the auditory system following early blindness is primarily the result of changes in auditory structures in the forebrain.
PERCEPTUAL STUDIES

Tactile Vision Substitution

Over the last 100 years, a number of research groups have been studying tactile interfaces for sensory substitution (cf. Bach-y-Rita, Collins, Saunders, White, & Scadden, 1969; Collins & Bach-y-Rita, 1973; Kaczmarek & Bach-y-Rita, 1995). This discussion will be limited to our own published studies. We developed tactile vision substitution systems (TVSS) to deliver visual information from a TV camera to arrays of stimulators in contact with the skin of one of several parts of the body, including the abdomen, back, thigh, forehead, and fingertip (e.g., Bach-y-Rita, 1972, 1989, 1995; Bach-y-Rita et al., 1969; Bach-y-Rita, Kaczmarek, & Meier, 1998; Bach-y-Rita, Kaczmarek, Tyler, & Garcia-Lara, 1998; Collins & Bach-y-Rita, 1973; White, Saunders, Scadden, Bach-y-Rita, & Collins, 1970). Mediated by the skin receptors, energy transduced
from any of a variety of artificial sensors (e.g., camera, pressure sensor, and displacement) is encoded as neural pulse trains. In this manner, the brain is able to recreate “visual” images that originate in a TV camera. Indeed, after sufficient training with the TVSS, our participants, who were blind, reported experiencing the images in space instead of on the skin. They learned to make perceptual judgments using visual means of analysis, such as perspective, parallax, looming and zooming, and depth judgments. Although the TVSS systems have only had between 100 and 1,032 point arrays, the low resolution has been sufficient to perform complex perception and “eye”–hand coordination tasks. These have included facial recognition, complex inspection-assembly tasks and accurate judgment of speed and direction of a rolling ball with over 95% accuracy in batting the ball as it rolls over a table edge. Several characteristics of these tasks are described in the following paragraphs. Camera movement must be under the control of one of the subject’s motor systems (hand, head movement, or any other). Indeed, we have shown that this is possible, once the blind person has learned the mechanics. This includes camera control: zooming, aperture and focus, and the correct interpretation of the effects of camera movement, such as occurs when the camera is moved from left to right and the image seems to move from right to left. Further, many phenomena associated with vision have to be learned; for example, when viewing a person seated behind a desk, the partial image of the person must be correctly interpreted as a complete person with the image of the desk interposed, rather than perceiving just half a person. The subjective experience is comparable (if not qualitatively identical to) vision, including subjective spatial localization in the three-dimensional world. Even the visual illusions that have been tested (e.g., the waterfall effect) are the same as vision. Once a blind person has learned with one motor system (e.g., a handheld camera, thus using the corresponding kinesthetic system), the camera can be switched to another system (e.g., mounted on the head), with no loss of perceptual capacity. And when the human–machine interface, the electro- or vibro-tactile array, is moved from one area of skin to another (e.g., from the back to the abdomen or to the forehead), there is no loss of correct spatial localization even when the array is switched from back to front, because the trained blind subject is not perceiving the image on the skin, but is locating it correctly in space. Similarly, a blind person using a long cane does not perceive the resulting stimulation as being in the hand, but correctly locates it on the ground being swept with the cane, and a person writing with a pen does not perceive the contact as being on the fingers, but rather locates it subjectively on the page. However, in studies with congenitally blind adolescents and adults, the emotional content, or qualia (as the term that will be used here) of the sensory experience appears to be missing. The apparent absence of qualia in our studies may be related to the few hours of practice that the subjects have had, or it may be related to more fundamental issues. Sensory substitution experience that starts in
early childhood appears to have a different relation to qualia, and thus babies and children react with interest and pleasure (cf. Bach-y-Rita, 1996).

Touch Substitution

In addition to vision substitution, the feasibility of touch sensory substitution has been demonstrated, such as for persons who have insensate hands due to leprosy (but have preserved sensation centrally, such as on the forehead). A special glove was fabricated with a single pressure transducer per fingertip that (with movement of the finger over a surface) relayed the pattern on the fingertips to an area of skin on the forehead where sensation was intact. Within minutes, participants were able to distinguish rough from smooth surfaces, and soft and hard objects, and the structure (curved, irregular, etc.) of the surface was perceived as if coming from the fingers. In fact, a small crack in the table surface could be detected. After 1 day of use, one patient expressed delight at feeling the finger contact when he touched his wife, something that he had been unable to experience for 20 years (Bach-y-Rita, 1995; Collins & Madey, 1974). Thus, even though the actual human–machine interface was on the forehead, the leprosy patient perceived the sensation on the fingertip, as his motor control over the placement of the substitute tactile sensors directed the localization in space to the surface where the stimulation originated. Under sponsorship by the National Aeronautics and Space Administration (NASA), we extended this work to the development of gloves for astronauts. Sensors were placed in the fingertips of gloves in order to compensate for the loss of tactile sensation that causes the decrease in manual performance (Bach-y-Rita, Webster, Tompkins, & Crabb, 1987). This same technology has been extended to other applications as well. For example, an insole pressure-pad receptor system was developed for diabetic persons with insensate feet (Kothari, Webster, Tompkins, Wertsch, & Bach-y-Rita, 1988; Maalej et al., 1987; Wertsch, Webster, & Tompkins, 1992; Zhu et al., 1990). The pressure data acquisition system consisted of a pair of insoles instrumented with 14 pressure sensors, a portable microprocessor-based data acquisition system, and a microcomputer. The user wore the electro-tactile (ET) array on the back of their thigh and readily learned to recognize and use the pressure distribution information to reestablish functional postural control sufficient to allow for standing and slow walking.
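To make the data path described above concrete, the sketch below shows one way readings from such an instrumented insole might be mapped to per-electrode stimulation levels. It is purely illustrative: the 14-sensor count comes from the text, but the function name, the linear scaling, the assumed pressure range, and the 16-level output are assumptions of this sketch, not details of the published system.

```python
# Illustrative sketch only: map raw insole pressure readings to discrete
# stimulation levels for a small electro-tactile array. The linear scaling,
# pressure range, and level count are assumptions made for this example.

def pressures_to_stimulation_levels(pressures_kpa, max_pressure_kpa=300.0,
                                    n_levels=16):
    """Map 14 insole pressure readings (kPa) to integer stimulation levels.

    Returns a list of levels from 0 (off) to n_levels - 1 (maximum).
    """
    levels = []
    for p in pressures_kpa:
        frac = min(max(p / max_pressure_kpa, 0.0), 1.0)  # clamp to 0..1
        levels.append(round(frac * (n_levels - 1)))
    return levels

# Example: heel loaded, toes unloaded (hypothetical values in kPa).
if __name__ == "__main__":
    sample = [250, 180, 120, 60, 30, 10, 5, 0, 0, 0, 0, 0, 0, 0]
    print(pressures_to_stimulation_levels(sample))
```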
TONGUE INTERFACE

Physical Characteristics

Although our studies over 30 years showed the potential for practical sensory substitution systems, the actual development of such systems eluded us, due to comfort and cosmetic considerations, and for technical reasons. To provide for predictable electro-tactile perception, our studies suggested that electrodes for
fingertip use must have a coaxial structure (small center electrode surrounded by a ground plane for the return current) to obtain localized sensation, with the stimulation current explicitly controlled for each electrode because of the highly nonlinear electrode–skin interface (Boxtel, 1977; Kaczmarek & Webster, 1989). However, the electrode geometry and driving circuitry can be simplified for tongue stimulation, possibly because the electrode–skin interface on the tongue is very different from that on the fingertips. In particular, elimination of the ground plane allows closer interelectrode spacing for greater spatial resolution, and using voltage rather than current control greatly simplifies output circuitry, leading to the use of integrated circuits. This is a great advantage for practical implementation, which in the future should include miniaturization and economy of production.

The tactile percept elicited by electrical stimulation of touch varies along a number of dimensions, depending on the stimulus parameters. Along with qualities such as intensity (“loudness”) and frequency (“pitch”), which are associated with mechanical vibratory stimuli, electro-tactile stimuli possess qualities that are not as well defined but which nonetheless are easily discriminated by subjects. Some of these qualities are primarily spatial in nature (e.g., “focused” vs. “diffuse”), whereas some are primarily intensive (e.g., “pressure” vs. “vibration” vs. “pinprick”). The intensive qualities can have a large influence on the overall comfort or discomfort of the stimulus, and can in large part define the useful operating range of the tactile display system, from sensation threshold to pain threshold. (Some applications, such as hazard warning systems, may employ brief painful stimuli that are not easily masked by other situational sensory stimuli.) Further, variation in percept quality may itself be useful for conveying information in the form of “tactile colors” (Aiello, 1998a). We have noted a gradation of sensory threshold, lowest at the tip of the tongue, and thus have mapped the differences over the surface of the tongue because it may be necessary to develop electro-tactile arrays with varying intensities (in preparation).

Electro-tactile stimuli are delivered to the dorsum of the tongue via flexible electrode arrays (Fig. 8.1) placed in the mouth, with connection to the stimulator apparatus via a cable passing out of the mouth and held lightly between the lips. The tongue electrode array is made of a thin (100-µm) strip of polyester material onto which a rectangular matrix of gold-plated circular electrodes has been deposited by a photolithographic process similar to that used to make printed circuit boards. All electrodes are separated by 2.34 mm (center to center) and gold plated for biocompatibility. The 12 × 12 array is approximately 3 cm square.

FIG. 8.1. Picture of the prototype electro-tactile display unit (TDU) with a laptop PC and a 12 × 12 tongue array.

Waveform and Control Method

The electro-tactile stimulus consists of positive, 40-µs pulses delivered sequentially to each of the active electrodes in the pattern. Bursts of three pulses each are delivered at a rate of 50 Hz with a 200-Hz pulse rate within a burst. This structure was shown previously to yield strong, comfortable electro-tactile percepts on the abdomen (Kaczmarek, Webster, & Radwin, 1992); similar confirmatory studies on the tongue are planned.
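As a rough illustration of this burst structure, the following sketch computes pulse onset times for a multiplexed set of active electrodes using the values quoted above (40-µs pulses, three pulses per burst, a 200-Hz within-burst rate, and a 50-Hz burst rate). It is not the authors' firmware; the per-electrode slot scheduling and all names are assumptions introduced here for concreteness.

```python
# Illustrative sketch (not the authors' implementation): generate onset times
# for the burst-structured waveform described above. Active electrodes are
# strobed one after another within each pulse slot, so only one electrode is
# driven at any instant; that scheduling detail is an assumption of this sketch.

PULSE_WIDTH_S = 40e-6            # 40-microsecond positive pulse
WITHIN_BURST_PERIOD_S = 1 / 200  # 200-Hz pulse rate inside a burst (5 ms)
BURST_PERIOD_S = 1 / 50          # 50-Hz burst (frame) rate (20 ms)
PULSES_PER_BURST = 3

def pulse_schedule(active_electrodes, n_bursts=1):
    """Return a list of (onset_time_s, electrode) events for the active set."""
    events = []
    slot = PULSE_WIDTH_S  # each electrode occupies one 40-us slot in turn
    for b in range(n_bursts):
        burst_start = b * BURST_PERIOD_S
        for p in range(PULSES_PER_BURST):
            pulse_start = burst_start + p * WITHIN_BURST_PERIOD_S
            for i, electrode in enumerate(active_electrodes):
                events.append((pulse_start + i * slot, electrode))
    return events

# Example: three active electrodes of a 12 x 12 array, one 20-ms frame.
if __name__ == "__main__":
    for t, e in pulse_schedule(active_electrodes=[(0, 0), (5, 5), (11, 11)]):
        print(f"{t * 1e3:7.3f} ms  electrode {e}")
```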
Output coupling capacitors in series with each electrode guarantee 0 DC current to minimize potential tissue irritation. Elimination of the ground plane simplifies electrode design and allows the use of either larger electrodes for increased percept quality or more electrodes for increased array density. Of course, eliminating the ground plane means that the electrodes themselves must serve as the return current path. (The use of an external ground plane separated from the active electrode array, although common in biopotential recording, is not useful for electro-tactile stimulation because it causes the current to spread away from the active electrodes, often resulting in a diffuse, radiating percept.) We have called this a distributed ground scheme. In addition, if the electrode activation is multiplexed (i.e., only one electrode in a given region of the array is driven at any time), the inactive electrodes can be connected to ground through either active switching or passive leakage through the driver circuit output impedance, both of which are particularly easy to combine with voltage-control circuitry. In summary, the combination of a distributed ground and voltage control enables use of a trivial output
circuit design, one that can be based on commercially available logic integrated circuits, therefore minimizing the need for discrete components. The increasing quality of the electro-tactile percept with electrode size, however, presents a challenge to the system designer, who must choose (along a continuum) between a densely packed array of smaller electrodes with less desirable perceptual qualities and a lower resolution array of larger electrodes that present a higher quality percept. Ongoing research will determine if this limitation can be circumvented by choice of different waveform parameters, which also affect stimulus quality (Aiello, 1998b; Kaczmarek, Webster, & Radwin, 1992). We are presently exploring pattern perception on 12 × 12 tongue-based electrode arrays, both for visual rehabilitation and for tongue-based communication and navigation in hazardous environments, and we plan to construct a 576-point array in the near future. The advantages of dealing with a multiparameter stimulus waveform to partially compensate for the low punctate resolution of the tongue display have not yet been explored in the tongue studies. Although six parameters can be identified—namely, the current level, the pulse width, the interval between pulses, the number of pulses in a burst, the burst interval, and the frame rate (Aiello, 1998b)—only the level of the current was varied in previous studies. All six parameters in the waveforms can, in principle, be varied independently within certain ranges and may elicit potentially distinct responses. In a study of electrical stimulation of the skin of the abdomen, it was found that the best way to encode intensity information with a multidimensional stimulus waveform was through modulation of the energy delivered by the stimulus, which was perceived as a variation of the stimulus intensity. The energy was varied in such a way that the displacement in the parameter space, corresponding to a given transition between energy levels, was minimal (gradient mode of stimulation). Although the gradient mode of stimulation requires a real-time fulfillment of mathematical constraints among the variations of all the parameters, its implementation could be included within the microelectronic package for signal treatment (Aiello, 1998a). The methods tested for the abdomen may prove their efficacy in the stimulation of the tongue as well, especially when more sophisticated and meaningful patterns are to be displayed onto higher resolution electrode arrays. A major goal of these studies is to develop human–machine interface systems that are practical and cosmetically acceptable. For blind persons, a miniature TV camera, the microelectronic package for signal treatment, the optical and zoom systems, the battery power system, and an FM-type radio signal system to transmit the modified image wirelessly will be included in a glasses frame. For the mouth, an electro-tactile display, a microelectronics package, a battery compartment, and the FM receiver will be built into a dental retainer. The stimulator array could be a sheet of electro-tactile stimulators of approximately 27 × 27 mm. Orthodontic retainers from a cross section of orthodontic patients were examined to determine the dimensions of compartments that could be created during the
molding process to accommodate the FM receiver, the electro-tactile display, the microelectronics package, and the battery. The dimensions and location of compartments that could be built into an orthodontic retainer have been determined. For all the retainers of adolescent and adult persons examined, except for those with the narrowest palates, the following dimensions are applicable: in the anterior part of the retainer, a space of 23 × 15 mm and 2 mm deep is available. Two posterior compartments could each be 12 × 9 mm, and up to 4 mm deep (Bach-y-Rita et al., 1998). Knowledge of these dimensions allows the development of a standard components package that could be snapped into individually molded retainers, in which the wire dental clips would also be the FM antenna (Bach-y-Rita, Kaczmarek, & Meier, 1998). For all applications, the mouth display system would be the same, but the source of the information to be delivered to the brain through the human–machine interface would determine the sensor instrumentation for each application. Thus, as examples, for hand amputees, the source would be sensors on the surface of the hand prosthesis. For astronauts, the source would be sensors on the surface of the astronaut glove. For night vision, the source would be similar to the glasses system for blind persons but would have an infrared camera. For pilots and race car drivers, whose primary goal is to avoid the retinal delay (greater than the signal transduction delay through the tactile system) in the reception of information requiring very fast responses, the source would be built into devices attached to the automobile or airplane. Robotics and underwater exploration systems would require other instrumentation configurations, each with wireless transmission to the mouth display. For spinal cord injured persons, the development of three systems is planned: for sex sensation, for foot sensation and lower limb position, and for sensation in a robotic hand and insensate feet (Bach-y-Rita, 1999). With funding from the Defense Advanced Research Projects Agency (DARPA), we demonstrated the feasibility of developing a tongue system for the underwater orientation and navigation of Navy SEALs. We plan to extend our previously reported NASA-sponsored astronaut glove sensation study (Bach-y-Rita et al., 1987) to include the tongue stimulus array for carrying the information wirelessly from the touch, pressure, and shear force sensors on the exterior of the astronauts' gloves to the tongue display, for relay to the brain. Other medical applications under consideration are for amputees (by providing sensory receptors on the prosthesis and information to the brain via the tongue) and minimally invasive surgery. In this case, a surgeon could “feel” through a tiny probe or robot (as if it were his finger) inside the heart. Shear force, touch, pressure, and temperature sensors will be placed on the tip and sides of surgical robots and surgical devices inserted during minimally invasive surgery. The information will be led by wires out of the body to the base of the surgical device and from there to the tongue of the surgeon by means of FM transmission. The perception by the surgeon of being inside the body would be comparable to a blind person with a long cane who explores a doorway, a staircase, and identifies
a person’s foot. In all of these cases, the objects (e.g., foot) are perceived in their correct spatial localization; nothing is perceived in the hand holding the cane that contains the sensory receptors. The hand is a relay to the brain; similarly, in the system discussed here, the tongue plays a comparable relay role.

IMPLICATIONS FOR PERCEPTION

Tactile Perception

In previous studies (Bach-y-Rita & Hughes, 1985), we noted that

An understanding of the functional equivalence between visual and vibro-tactile processing would have both basic scientific and practical implications, the former because it would bear on whether information for the various perceptual systems ought to be considered modality specific or amodal, and the latter because the data would suggest the possibilities and constraints for vision substitution and other prosthetic developments.
Although the early system was termed a tactile vision substitution system, we have been reluctant to suggest that blind users of the device are actually seeing. Others (e.g., Heil, 1983; Morgan, 1977) have not been so reluctant, claiming that because blind participants are being given similar information to that which causes the sighted to see and are capable of giving similar responses, one is left with little alternative but to admit that they are actually seeing (and not merely “seeing”). Our data suggest that, at least initially, the blind participants obtain the “visual” information primarily by an analysis of contours, although simultaneous analysis of the information is also used. Participants using the TVSS learn to treat the information arriving at the skin in its proper context. Thus, at one moment the information arriving at the skin has been gathered by the TV camera, but at another it relates to the usual cutaneous information (pressure, tickle, wetness, etc.). The participant is not confused; when he or she scratches his or her back under the matrix, nothing is “seen.” Even during task performance with the sensory system, the participant can perceive purely tactile sensations when asked to concentrate on these sensations. As learning progresses, the information extraction processes become more and more automatic and unconscious. Miller (1956) considered that a “chunking” phenomenon allows the number of bits per chunk to increase. A blind participant “looking” at a display of objects must initially consciously perceive each of the relevant factors such as the perspective of the table, the precise contour of each object, the size and orientation of each object, and the relative position of parts of each object to others nearby. With experience, information regarding several of these factors is simultaneously gathered and evaluated. The increased information transfer through a sensory substitution system can be interpreted in terms of Miller's “chunking” model. The highly complex “visual” input can thus be reduced, by selective processes, to manageable proportions, allowing the input to be
mediated by the somesthetic system or, in Gibsonian (1966) terms, the participant learns to extract the relevant information.

Sensory Overload

Normal sensory systems do not usually overload. The central nervous system is able to select only the information needed for any particular perceptual task. Thirty years ago we stated:

Many efforts at creating sensory aids set out to provide a set of maximally discriminable sensations. With this approach, one almost immediately encounters the problem of overload—a sharp limitation in the rate at which the person can cope with the incoming information. It is the difference between landing an aircraft on the basis of a number of dials and pointers that provide readings on such things as airspeed, pitch, yaw, and roll, and landing a plane with a contact analog display. Visual perception thrives when it is flooded with information, when there is a whole page of prose before the eye, or a whole image of the environment; it falters when the input is diminished, when it is forced to read one word at a time, or when it must look at the world through a mailing tube. It would be rash to predict that the skin will be able to see all the things the eye can behold, but we would never have been able to say that it was possible to determine the identity and layout in three dimensions of a group of familiar objects if this system had been designed to deliver 400 maximally discriminable sensations to the skin. The perceptual systems of living organisms are the most remarkable information reduction machines known. They are not seriously embarrassed in situations where an enormous proportion of the input must be filtered out or ignored, but they are invariably handicapped when the input is drastically curtailed or artificially encoded. Some of the controversy about the necessity of preprocessing sensory information stems from disappointment in the rates at which human beings can cope with discrete sensory events. It is possible that such evidence of overload reflects more an inappropriate display than a limitation of the perceiver. Certainly the limitations of this system are as yet more attributable to the poverty of the display than to taxing the information handling capacities of the epidermis. (White et al., 1970)
Although there are definite problems with the TVSS in the interpretation of objects in the presence of a cluttered background, these appear to be due not to overload but primarily to the poverty of the display resulting from the low resolution of the present systems.
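To make the resolution constraint concrete, the following sketch (not taken from the authors' work) shows how a camera frame might be reduced to the coarse intensity map that a display of a few hundred tactors can present. The 20 × 20 array size, the block-averaging scheme, and the eight stimulation levels are illustrative assumptions only.

```python
# Illustration only: reducing a grayscale camera frame to the coarse map a
# low-resolution tactile array can present. Array size, averaging scheme,
# and number of stimulation levels are assumptions for this sketch.
import numpy as np

def to_tactile_array(frame, rows=20, cols=20, levels=8):
    """Block-average a grayscale frame down to a rows x cols grid of levels."""
    h, w = frame.shape
    h_trim, w_trim = h - h % rows, w - w % cols          # drop ragged edges
    blocks = frame[:h_trim, :w_trim].reshape(rows, h_trim // rows,
                                             cols, w_trim // cols)
    coarse = blocks.mean(axis=(1, 3))                    # one value per tactor
    return np.round((levels - 1) * coarse / coarse.max()).astype(int)

frame = np.random.rand(480, 640)     # stand-in for a camera image
tactile = to_tactile_array(frame)    # 20 x 20 grid of stimulation levels
```

Most of the detail in the original frame is necessarily discarded at this stage, which is consistent with the interpretation that cluttered backgrounds tax the display rather than the perceiver.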
EDUCATIONAL AND VOCATIONAL CONSIDERATIONS Education of Blind Children We have demonstrated that it is possible to develop an understanding, in congenitally blind students, of the visual world, including visual means of analysis (Miletic, Hughes, & Bach-y-Rita, 1988). Children learn to understand and use visual means of analysis such as monocular cues of depth and interposition; for the latter, one example of the training is to view three candles lined up one behind the other. Only one is perceived because the view of the others is blocked; the child then moves his or her head to see the three candles as they appear when viewed from the side. For an additional rewarding learning experience with the candles, one is lit, and the child views the flame and blows on it to produce waving of the flame, which is perceived with interest and excitement. Comparable “visual” experiences have been explored, and further applications using the tongue interface are planned for learning mathematics, geometry, and science, and for the development of a portable system for itinerant teachers of blind students. Thus, the feasibility of developing visual information educational programs for blind persons has been demonstrated. Such programs are presently being organized in three countries.
Vocational Opportunities A vocational test of the TVSS revealed its potential application to jobs presently reserved for sighted workers. A person totally blind since 2 months from birth spent 3 months on the miniature diode assembly line of an electronic manufacturer. During the assembly process, he received a frame containing 100 small glass cylinders with attached wires, as the frame emerged from an automatic filling machine that filled each cylinder with a small piece of solder. The automatic process was 95% efficient, and so approximately 5% of the cylinders remained unfilled. His first task after receiving each frame was to inspect each of the cylinders and to fill by hand those that remained unfilled. This was accomplished with a small TV camera mounted in a dissecting microscope, under which the blind worker passed the frame containing the cylinders. The information from the cylinders was passed through an electronic commutator in order to transform it into a tactile image, and was delivered to the skin of the abdomen of the worker by means of 100 small vibrating rods in an array clipped to the workbench. In order to receive the image, the blind worker had only to lean his abdomen against the array (without removing his shirt). He did not wear any special apparatus, and his hands were left free to perform the inspection and assembly tasks under the microscope. He filled the empty cylinders by means of a modified injection needle attached to a vacuum: he placed the needle in a dish filled with small pieces of solder. The needle picked up only one piece because the suction was then blocked. He then brought the needle with the solder into the “visual” field
under the microscope, and by hand–“eye” coordination placed the needle in an empty cylinder, at which point he released the suction, and the solder dropped in. He repeated the process for each empty cylinder encountered, and then passed the frame to another loading machine, where it was automatically filled with diode wafers. Again, the task was then to fill the approximately 5% cylinders that did not have a diode wafer, which he did as above, except that this stage offered two extra problems: The wafers were very thin and flat and did not always fall flat into the cylinders; sometimes they landed on edge. Further, the wafers were gold on one side and silver on the other, and they had to be correctly oriented. He had the additional task of turning over 50% of the wafers. This task was accomplished by identifying the color on the basis of light reflectance, because the silver side reflected more light. The blind worker was able to perform the tasks but was much slower than the line workers, and became more fatigued than they did. Thus, he would not have been a competitive worker on that line. However, it demonstrated the feasibility of developing jobs in an industrial setting in the future (Bach-y-Rita, 1982, 1995). Electrophysiological Studies We, and others, are exploring the applications of sensory substitution to the access by blind persons to computer graphics. The ability to perceive two- and three-dimensional graphics would greatly increase the potential of information technology employment.
PSYCHOLOGICAL AND PHILOSOPHICAL ISSUES Seeing or “Seeing” In Molyneux’s Question, Morgan (1977) offers two basic arguments for his position that subjects using a tactile vision substitution system (TVSS) are actually seeing and not merely “seeing.” One, the structural nature of the perceptual system does not offer any criteria for distinguishing seeing from not seeing; for example, the horseshoe crab is offered as an example of a biological system with fewer receptors than most mammals but which can nonetheless see. Morgan’s second argument concerns behavioral equivalence: If blind participants receive (optical conversions of) optical information that would satisfy criteria for seeing in the sighted world and respond in an indistinguishable manner, one might concede that the blind are “seeing.” We noted that using a TVSS is more like visual perception than typical tactile perception (for example, under normal ecologically valid conditions the tactile perceptual system usually involves concurrent kinesthetic information, which leads to designating it the “haptic” system). Are the systems the same? Clearly not on
quantitative grounds, because the resolution of the TVSS is orders of magnitude less than the visual system, but one also might ask if the ways in which the systems differ are crucial. In any case, sighted persons still see, even under environmentally impoverished conditions such as fog or rain or at dusk, where shapes and patterns are difficult to distinguish, and a person with blurred tunnel vision is still using the remaining visual capacity to see (Bach-y-Rita & Hughes, 1985). Qualia Subjects trained with the TVSS have noted the absence of qualia (the emotional content of a sensory experience), which in a number of cases has been quite disturbing. Thus, well-trained subjects are deeply disappointed when they explore the face of a wife or girlfriend and discover that, although they can describe details, there is no emotional content to the image. In two cases, blind university students were presented Playboy centerfolds, and although they could describe details of the undressed women, it did not have the affective component that they knew (from conversations with their sighted classmates) that it had for sighted persons. Similar experiences have previously been noted by congenitally blind persons who acquire sight following surgery: colors have no affective qualities, and faces do not transmit emotional messages (e.g., Gregory & Wallace, 1963). The absence of qualia with substitute sight may be compared to the acquisition of a second language as an adult. The emotional aspects of the new language are often lacking, especially with emotionally charged words and expressions, such as curse words. It appears that both spoken language and other sensory messages require long experience within the context of other aspects of cultural and emotional development to be able to contain qualia. In some sense, qualia have been evident in the tactile sensory substitution experience, but in a limited fashion. For example, school-age children perceiving the flame of a candle for the first time are always pleased by the experience, especially when their actions influence the sensory message, such as when they blow gently on the candle and perceive the flickering flame. A university student who had just married showed great interest and pleasure in the exploration of typical kitchen instruments and activities, such as in the process of baking a cake. Another was delighted that the three-dimensional images helped her to understand the structure of a traffic intersection. A leprosy patient who had lost all sensation in his hands expressed great pleasure when he perceived the sensation of active touch from his wife by using a glove with artificial tactile receptors that transmitted the information from active touch to the skin of the forehead, where his sensation was intact. However, these and similar reports of qualia are overshadowed by the more frequent reports of displeasure at the absence of qualia, noted above. It remains to be demonstrated, by appropriately designed experiments, whether persons who begin to use vision sensory substitution as adolescents or adults will ever fully
develop the qualia that are present with vision in persons sighted since birth. Will specific training be necessary? Will qualia develop with extended use of the substitution system? Systems for blind babies (cf. Bach-y-Rita & Sampaio, 1995) and for blind children (Miletic et al., 1988) have already provided some suggestive evidence for the development of qualia, such as the infant’s smile on perceiving the mother’s approach, and the child’s pleasure at perceiving the flame of a candle, and especially perceiving the wavering flame as he blows on it. Our working hypothesis is that when such systems are used from infancy, the qualia will develop even if the system is not continuously in use. Thus, we expect that an hour a day or even less should be sufficient for the baby to grow with the contextual inclusion of the substitute sensory information. It would then be a part of the development of the entire personality, and when thus included as an integral part of the emotional development process, qualia comparable to those of sighted persons should be obtained. Improved systems for both babies and adults should provide experimental models for exploring the development of qualia with behavioral, as well as brain imaging and electrical activity, studies.
CONCLUSION Tactile sensory substitution studies have demonstrated the capacity of the brain to portray environmental characteristics by means of tactile displays and to integrate the information into perceptual and behavioral responses. The recent development of a tongue human–machine interface now offers the possibility of carrying these studies to the next level: the development of practical systems for sensory substitution, for tactile tele- and internet communications, for surgery, and many other applications.
ACKNOWLEDGMENTS This research was supported in part by the National Institutes of Health grant R01-EY10019, the Charles E. Culpeper Foundation Biomedical Pilot Projects Initiative, the Robert Draper Technology Innovation Fund, and the Industrial and Economic Development Research Fund at the University of Wisconsin–Madison, and DARPA contract BD-8911.
REFERENCES Aiello, G. L. (1998). Multidimensional electrocutaneous stimulation. IEEE Trans Rehab Engr, 6, 1–7. Aiello, G. L. (1998). Tactile colors in artificial sensory communication. In Proc of the 1998 Internat Symp on Info Theory & Its Applications, Mexico City (pp. 82–85).
Bach-y-Rita, P. (1967). Sensory plasticity: Applications to a vision substitution system. Acta Neurol. Scand., 43, 417–426. Bach-y-Rita, P. (1972). Brain mechanisms in sensory substitution. New York: Academic Press. Bach-y-Rita, P. (1982). Sensory substitution in rehabilitation. In L. Illis, M. Sedgwick, & H. Granville (Eds.), Rehabilitation of the neurological patient (pp. 361–383). Oxford, England: Blackwell. Bach-y-Rita, P. (1989). Physiological considerations in sensory enhancement and substitution. Europa Med Phs, 25, 107–128. Bach-y-Rita, P. (1995). Nonsynaptic diffusion neurotransmission and late brain reorganization. New York: Demos-Vermande. Bach-y-Rita, P. (1996). Substitution sensorielle et qualia. In J. Proust (Ed.), Perception et Intermodalité (pp. 81–100). Paris: University Presses of France. Bach-y-Rita, P. (1999). Theoretical aspects of sensory substitution and of neurotransmitter-related reorganization in spinal cord injury. Spinal Cord, 37, 465–474. Bach-y-Rita, P., Collins, C. C., Saunders, F., White, B., & Scadden, L. (1969). Vision substitution by tactile image projection. Nature, 221, 963–964. Bach-y-Rita, P., & Hughes, B. (1985). Tactile vision substitution: Some instrumentation and perceptual considerations. In D. Warren & E. Strelow (Eds.), Electronic spatial sensing for the blind (pp. 171–186). Dordrecht, the Netherlands: Martinus Nijhoff. Bach-y-Rita, P., Kaczmarek, K., & Meier, K. (1998). The tongue as a man–machine interface: A wireless communication system. In Proc of the 1998 Internat Symp on Info Theory and Its Applications, Mexico City (pp. 79–81). Bach-y-Rita, P., Kaczmarek, K., Tyler, M., & Garcia-Lara, J. (1998). Form perception with a 49-point electro-tactile stimulus array on the tongue. J Rehab. Res Develop, 35, 427–430. Bach-y-Rita, P., & Sampaio, E. (1995). Substitution sensorielle chez les adultes et les enfants aveugles. In Y. Christen, M. Doly, & M.-T. Droy-Lefaix (Eds.), Vision and adaptation (pp. 108–116). Amsterdam: Elsevier. Bach-y-Rita, P., Webster, J., Tompkins, W., & Crabb, T. (1987). Sensory substitution for space gloves and for space robots. In G. Rodriques (Ed.), Proceedings of the Workshop on Space Robots (Pub. 87-13, Vol. 2, pp. 51–57). Pasadena, CA: Jet Propulsion Laboratories. Boxtel, A. v. (1977, November). Skin resistance during square-wave electrical pulses of 1 to 10 mA. Med. Biol. Eng. Comput., 15, 679–687. Collins, C. C., & Bach-y-Rita, P. (1973). Transmission of pictorial information through the skin. Advan Biol. Med. Phys, 14, 285–315. Collins, C. C., & Madey, J. M. (1974). Tactile sensory replacement. Proc of the San Diego Biomed Symp, 13, 15–26. Feinsod, M., Bach-y-Rita, P., & Madey, J. M. (1973). Somatosensory evoked responses—Latency differences in blind and sighted persons. Brain Res, 60, 219–223. Frost, D. O., & Metin, C. (1985). Induction of functional retinal projections to the somatosensory system. Nature, 317, 162–164. Gibson, J. J. (1966). The senses considered as perceptual systems. Boston: Houghton Mifflin. Gregory, R. L., & Wallace, J. G. (1963). Recovery from early blindness: A case study (Experimental Psychology Monograph No. 2). Cambridge, England: Heffer. Heil, J. (1983). Perception and cognition. Berkeley: University of California Press. Jenkins, W. M., Merzenich, M. M., Ochs, M. T., Allard, T., & Guic-Robles, E. (1990). J. Neurophysiology, 63, 82–104. Kaczmarek, K., & Bach-y-Rita, P. (1995). Tactile displays. In W. Barfield & T.
Furness, III (Eds.), Advanced interface design and virtual environments (pp. 349–414). Oxford, England: Oxford University Press. Kaczmarek, K. A., & Webster, J. G. (1989). Voltage-current characteristics of the electro-tactile electrode–skin interface. Paper presented at the IEEE Annu. Int. Conf. Eng. Med. Biol. Soc., Seattle, WA, Nov. 9–12.
Kaczmarek, K. A., Webster, J. G., & Radwin, R. G. (1992). Maximal dynamic range electrotactile stimulation waveforms. IEEE Trans. Biomed Eng, 39(7), 701–715. Kothari, M., Webster, J. G., Tompkins, W. J., Wertsch, J. J., & Bach-y-Rita, P. (1988), Capacitive sensors for measuring the pressure between the foot and shoe. Paper presented at the 10th Annu. Int. Conf. of IEEE Eng. Med. Biol. Soc. Nov 4–7 New Orleans. Maalej, N., Zhu, H., Webster, J. G., Tompkins, W. J., Wertsch, J. J., & Bach-y-Rita, P. (1987). Pressure monitoring under insensate feet. Paper presented at the ninth Annu. Conf. of the Engin. Med. Biol. Soc. Mari˜no, J., Martinez, L., & Canedo, A. (1999). Sensorimotor integration at the dorsal column nuclei. News Physiol. Sci., 14, 231–237. Metin, C., & Frost, D. O. (1989). Visual responses of neurons in somatosensory cortex of hamsters with experimentally produced retinal projections to somatosensory thalamus. Proc of the Nat Acad of Sciences, 86, 357–361. Miletic, G., Hughes, B., & Bach-y-Rita, P. (1988, November). Vibro-tactile stimulation: An educational program for spatial concept development. Journal of Visual Impairment and Blindness, vol. 82, 366– 370. Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity to process information. Psychological Review, 63, 81–97. Morgan, M. J. (1977). Molyneux’s question. Cambridge, England: Cambridge University Press. Pascual-Leone, A., & Torres, F. (1993). Plasticity of the sensorimotor cortex representation of the reading finger in Braille readers. Brain, 116, 39–52. Wertsch, J. J., Webster, J. G., & Tompkins, W. J. (1992). A portable insole plantar pressure measurement system. J Rehabil. Res. Dev., 29, 13–18. White, B. W., Saunders, F. A., Scadden, L., Bach-y-Rita, P., & Collins, C. C. (1970). Seeing with the skin. Perception & Psychophysics, 7, 23–27. Woods, D. L., Clayworth, C. C., & Bach-y-Rita, P. (1985). Early blindness reorganizes auditory processing in humans. Soc. Neurosci Abst, 11, 449. Zhu, H., Maalej, N., Webster, J. G., Tompkins, W. J., Bach-y-Rita, P., & Wertsch, J. J. (1990). An umbilical data-acquisition system for measuring pressures between the foot and shoe. IEEE Trans. Biomed Engin, 37(a), 908–911.
9 Spatial Audio Displays for Target Acquisition and Speech Communications Robert S. Bolia W. Todd Nelson Air Force Research Laboratory Wright-Patterson Air Force Base
The evolution during the past two decades of high-speed, low-cost digital signal processors has allowed for the development of inexpensive systems that are capable of reproducing, to high degrees of fidelity, the physical cues that determine the location of a sound source. This is accomplished by measuring the acoustic waveform at or near the eardrum of a listener and interpreting the difference between the spectra of the source and the resultant wave as a transfer function characterizing, for a given ear and a given source position, the frequency-dependent modifications of the incident waveform by the head, torso, and pinna. If this is done for both ears, then it is possible to encode all of the cues necessary for accurate auditory localization using two channels, one for each ear. The transfer functions for each ear, commonly referred to as head-related transfer functions (HRTFs), can then be represented as digital filters and convolved with an appropriate audio signal to produce the sensation of an externalized sound source whose perceived direction is the same as that of the measured HRTF (Martin, McAnally, & Senova, 2001). The availability of this technology has resulted in an increase in the quantity of research conducted on all types of binaural phenomena—from auditory localization to the “cocktail party effect”—and has occasioned the suggestion that the display of spatial information via the auditory modality may be useful for enhancing performance, reducing workload, and improving situation awareness in a number
of tasks. Indeed, several published studies have suggested that this is the case (Begault & Pittman, 1996; Bronkhorst, Veltman, & van Breda, 1996; McKinley, Ericson, & D’Angelo, 1994; Nelson, Bolia, Ericson, & McKinley, 1998a, 1998b; Perrott, Cisneros, McKinley, & D’Angelo, 1996; Sorkin, Wightman, Kistler, & Elvers, 1989; Wenzel, 1992). At present, however, spatial audio displays have not found their way into many real-world environments. The aim of the present chapter is to present some of the more basic research in two areas for which spatial audio displays may prove useful—target acquisition and speech communications—and to discuss the form such displays may take, as well as some implementation issues. Once the two research areas have been addressed individually, the potential for interactions between them will be raised, along with the possibility of dynamically switching between the two interface types according to the state of the operator, of the environment, or both. Finally, the role of these displays as components of multisensory adaptive interfaces will be discussed.
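As a concrete illustration of the rendering step described above, in which the HRTFs are represented as digital filters and convolved with a source signal, the following sketch shows the basic operation. The head-related impulse responses used here are arbitrary placeholders rather than measured filters; a real display would substitute measured HRIRs for the desired source direction.

```python
# Minimal sketch of binaural rendering with head-related impulse responses
# (time-domain HRTFs). The HRIRs below are synthetic placeholders only.
import numpy as np
from scipy.signal import fftconvolve

def render_binaural(mono, hrir_left, hrir_right):
    """Convolve a mono signal with left/right HRIRs to place it in space."""
    left = fftconvolve(mono, hrir_left)
    right = fftconvolve(mono, hrir_right)
    stereo = np.stack([left, right], axis=1)
    return stereo / np.max(np.abs(stereo))   # normalize to avoid clipping

fs = 44100
source = np.random.randn(fs // 2)                               # 0.5 s of noise
hrir_l = np.random.randn(256) * np.exp(-np.arange(256) / 32.0)  # placeholder filter
hrir_r = np.roll(hrir_l, 8) * 0.7    # crude interaural time/level difference
binaural = render_binaural(source, hrir_l, hrir_r)
```

Presenting the two output channels over headphones is what produces the sensation of an externalized source at the direction associated with the measured filters.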
THE USE OF SPATIAL AUDIO DISPLAYS FOR TARGET DETECTION AND IDENTIFICATION Numerous researchers have proposed that one of the primary functions of the auditory system is the redirection of gaze (Heffner, 1997; Perrott, Saberi, Brown, & Strybel, 1990). It has long been known that a spatialized auditory stimulus can occasion reflexive head motions or visual saccades (Sokolov, 1967). Indeed, it has been demonstrated in primates that saccades generated by a spatialized acoustic stimulus are completed more rapidly than those generated by a visual stimulus, and yet are just as accurate (Whittington, Hepp-Reymond, & Flood, 1981). Perrott and his colleagues (1990) have shown that eye movements induced by acoustic stimuli are accurate enough to bring about foveation of a visual target, suggesting the possibility that the redirection of the eyes might be a principal role of the auditory localization system. This hypothesis is strengthened by the work of Heffner and Heffner (1992), who found a strong correlation between size of field-of-best-vision and localization acuity in mammals. Given that the auditory modality plays a paramount role in the detection and identification of objects and events in extracorporeal space, it makes sense to consider the utility of spatial auditory cues for the design of displays for target acquisition. Much of the work in this area has been done by Perrott and his associates (Perrott et al., 1990; Perrott, Sadralodabai, Saberi, & Strybel, 1991; Perrott et al., 1996), employing a paradigm known as auditory guided visual search. In the classic visual search paradigm, an observer searches a visual field for a target among an array of nontarget items, or distractors, and reports either the presence or absence of the target. The relationship between search time and number of distractors is linear, with search time either increasing with set size or remaining approximately constant, independent of the number of nontarget elements in the
field. The former case is generally interpreted as involving some limited-capacity process, in which subjects must direct their attention to small sections of the visual field sequentially, and is hence referred to as a serial search. On the other hand, searches for which reaction times do not vary with the number of distractors are presumed to involve some “preattentive” or parallel process, and are hence termed parallel searches (Treisman & Gelade, 1980; Wolfe, Cave, & Franzel, 1989). Auditory guided visual search differs from classical visual search in two ways. First, for a subset of the trials completed, the visual target is co-located with an acoustic stimulus. The difference in search times between the auditory and the nonauditory trials is interpreted as a measure of the salience of the audio cue for determining the location of a visual target. Further, participants respond using a twoalternative, forced choice discrimination instead of a present–absent response, indicating via a button press which of two potential targets was present on a given trial. In an early study using the auditory guided visual search paradigm, Perrott, and colleagues (1990) had observers complete a simple search (one target, no distractors) for visual targets located in the horizontal plane over a 260 deg range of azimuths centered at 0 deg (i.e, directly in front of the observer). They found, not unexpectedly, that search times varied directly with increasing target azimuth in both the visual-only and the auditory guided conditions. More surprising was the finding that in the auditory guided condition differences between search latencies for targets in the rear hemifield and those for targets in the central visual field were only on the order of 200–300 msec, compared with a difference of 500 msec or more in the absence of spatially correlated audio information. They also found a significant improvement in search times for targets within 10 deg of the fovea, where it had been presumed that auditory information would be redundant. The magnitude of this advantage in the central visual field suggests that it is due not to simple redundancy but rather to early integration of auditory information in the oculomotor system, reinforcing the notion that one of the driving forces behind the evolution of the human auditory system was the need to efficiently regulate gaze. One implication of this result is that spatial audio cueing may be useful for the direction of attention not only to information sources outside of an operator’s field of view, but also to locations within a few degrees of fixation. In a subsequent experiment reported in the same article, the authors introduced uncertainty in the vertical dimension by allowing the target to vary in elevation as well as azimuth, and found substantially the same pattern of results. Both of these studies, however, were conducted against a simple dark background, whereas real-world search tasks often involve search in the presence of distractors similar to the target along one or more visual dimensions (e.g., looking for a particular threat on a situation display). In light of this, Perrott and his colleagues (Perrott et al., 1991) investigated auditory guided visual search performance in the presence of visual distractors. In this study, target locations were limited to within 15 deg of a participant’s initial line of gaze for two reasons: (1) in order to further explore the finding of the 1990 study that a significant advantage of spatial audio cueing
exists even for targets in the central visual field, and (2) because this region seemed like a likely locus for the conduction of real-world visual searches (e.g., cockpit displays). The authors found, predictably, that search times depended on the number of visual distractors in the field and that this dependence was modulated by the target’s distance from the initial fixation point and the presence or absence of a spatially correlated audio cue. In short, the slopes of the RT vs set size curves increased with increasing target-fixation distance, and were lower in the spatial audio condition than in the no-sound condition. The results in the no-sound condition are straightforward, and indeed are predictable from the literature on visual attention (Treisman & Gelade, 1980). The results from the spatial audio condition can be discussed in similar terms if we assume that the accuracy with which listeners can localize is worse than the average angular separation of the targets and distractors for certain set sizes. Specifically, as set size increases, so does distractor density. An increase in the number of distractors thereby results in an increase in the probability that a distractor will fall into a neighborhood around the target whose radius is less than the spatial acuity of the auditory system in that region. This would presumably result in subjects using the audio cue to focus their visual attention on a region of space close to the target and then completing a serial search of the selected visual field. This hypothesis is supported by the work of Rudmann and Strybel (1999), who found that an increase in local distractor density occasioned a corresponding increase in search times. Missing from the studies heretofore mentioned is a study of the effects of spatially correlated audio cues on search performance when the locus of the target is unrestricted in azimuth and elevation. Perrott and his colleagues (1996) undertook such an investigation for the case in which a participant executed a simple visual search (i.e., no distractors) for a target located on the interior surface of a geodesic sphere. Not surprisingly, search times varied as a function of target location. The importance of location was diminished significantly, however, by the introduction of a spatial audio cue, which provided an advantage in all regions of space. This was especially meaningful for targets in the rear hemifield, where the addition of a spatial audio cue decreased search times by several hundred milliseconds. In the same study, Perrott and his colleagues (1996) investigated the utility of a virtual spatial audio cue for guiding visual search. In this case, observers’ searches were guided not by a free-field audio cue, but by an audio cue digitally filtered such that, when presented over headphones, it appeared to emanate from the same location as the visual target. Results indicated a significant advantage of virtual spatial audio cueing over the no-cueing condition, but also a performance degradation over the free-field-cueing condition, a fact that reflects the fidelity of the virtual audio cues. This is most likely due either to (1) the limited spatial resolution of the head-related transfer functions or (2) the fact that the HRTFs employed in this study were not measured from the pinnae of the subjects, but from a generic manikin. This hypothesis is borne out by the observation that
the discrepancies in performance between the free-field and virtual conditions occur for targets in the rear hemifield and for those more than 45 deg off of the horizontal plane, areas in which individual differences in the HRTF are thought to be important. This work was later extended by Bolia, D’Angelo, and McKinley (1999) to examine the effect of visual distractors on three-dimensional auditory guided visual search with free-field and virtual spatial audio cues. The findings of this research, while novel, were not unexpected. As is usually reported in the visual search literature (e.g., Treisman & Gelade, 1980), a linear relationship between search time and set size was exhibited. Further, as in the work of Perrott and his colleagues (1996), performance advantages were found in both the free-field and virtual listening conditions, with performance in the free-field condition significantly better than performance in the virtual condition. It is interesting to think about these results in terms of the slopes of the RT vs set size curves, which represent the additional time required to complete a search for each additional distractor added to the set. In the unaided condition, each additional distractor adds, on average, an additional 248 msec to the search time. In the free-field audio-aided condition, the search has become essentially parallel, that is, independent of set size—the slope of the RT vs. set size function is only 1 msec per distractor. Under the virtual audio condition, search efficiency still depends on the number of distractors in the field, but to a much lesser degree than in the unaided condition. With a slope of only 40 msec per distractor—more than 200 msec per distractor less than that obtained in the condition in which no audio cueing is provided—this represents a savings of between 500 msec in an uncluttered visual field to as much as 12 sec when there are 50 distractors in the field. More recent investigations have disclosed the possibility of using spatial audio displays to direct an operator’s attention not only to locations outside of the operational platform, but also to displays and controls within his or her immediate workspace (Brungart & Simpson, 2001a). Such displays have been made practical by the discovery by Brungart and his colleagues (Brungart & Durlach, 1999; Brungart & Rabinowitz, 1999) that the cues governing the perception of angular position and distance are different for sound sources located within 1 m of the listener (i.e., the near field) than for those located more than 1 m away (i.e., the far field). While near-field spatial audio displays have yet to be tested in occupational environments, their implementation in conjunction with far-field displays suggests the potential to direct an operator’s visual attention to locations both inside and outside of the cockpit. The literature reviewed here provides compelling evidence for the benefits of spatial audio displays in tasks involving the acquisition of visual targets. Although such displays might prove effective in any number of application domains, the implications of such dramatic reductions in reaction time for automotive and aerospace environments cannot be overstated. This is especially true in the tactical
aircraft cockpit, where a savings of a few hundred milliseconds in the acquisition of a threat or the direction of visual attention toward a particular instrument might mean the difference between life and death. Further, the potential to reduce workload seems inherent in the use of such displays, given that pilots, instead of devoting their resources to an exhaustive serial search of the out-the-window scene, can rely on the spatial audio cues to inform them of the presence and location of a target, affording them more time to concentrate on their primary task of flying the aircraft (Tannen et al., 2000). The workload-reducing capacity of spatial audio displays for target acquisition is attested in the work of Nelson and his colleagues (Nelson et al., 1998; Tannen et al., 2000).
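A rough calculation using the slopes reported above illustrates the size of these effects. The linear serial-search model and the attribution of the savings to the slope difference alone are simplifying assumptions for illustration; in the data, the aided conditions also begin from a lower intercept, which is where the advantage of roughly 500 msec in an uncluttered field comes from.

```python
# Back-of-the-envelope reading of the reported slopes (ms per distractor).
slopes = {"unaided": 248, "virtual audio": 40, "free-field audio": 1}

def extra_search_time_ms(condition, n_distractors):
    """Search time added by the distractors alone, under a linear model."""
    return slopes[condition] * n_distractors

n = 50
saving = extra_search_time_ms("unaided", n) - extra_search_time_ms("virtual audio", n)
print(f"Slope difference alone at {n} distractors: {saving / 1000:.1f} s")  # ~10.4 s
```

The slope difference alone thus accounts for roughly 10 of the 12 seconds saved with 50 distractors in the field, with the remainder attributable to the intercept difference between conditions.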
SPATIAL AUDIO DISPLAYS FOR SPEECH COMMUNICATIONS The “cocktail party effect,” a well-known and often cited auditory phenomenon, is the name given to the fact that listeners are able to “hear out” and comprehend the utterance of a single “target” talker in the presence of multiple simultaneous “distractor” talkers. Yost, in a review of the literature on the cocktail party phenomenon (1997), suggests seven physical attributes of a sound source that might contribute to this segregation of speech signals: (1) spectral separation, (2) spectral profile, (3) harmonicity, (4) spatial separation, (5) temporal separation, (6) temporal onsets and offsets, and (7) temporal modulations. Although it is probable that the effect in question is due to several of these factors, and to interactions between them, the one receiving the most attention in the literature is that dealing with the spatial separation of the talkers. Although this subject has not been treated exhaustively, there have been a number of investigations conducted on the effect of spatial separation on speech segregation and intelligibility. A study by Ricard and Meirs (1994) on the intelligibility and localization of speech signals from virtual spatial locations serves as a convenient example of this type of research. Speech signals were spatialized along the horizontal plane using a ConvolvotronTM and nonindividualized HRTFs and presented to listeners over headphones in the presence of a white-noise masking stimulus. Speech intelligibilty was determined using the Modified Rhyme Test, which showed that when localized single-word stimuli were presented in the presence of a masking white noise, intelligibility increased by an average of 4–5 dB relative to that of nonlocalized speech stimuli. In addition, their data suggested that the accuracy of localizing speech stimuli was comparable to that of nonspeech stimuli presented via headphones using nonindividualized head-related transfer functions, a result which has also been reported by Begault and Wenzel (1993). Ericson and McKinley (1997) investigated the effects of wide-band noise and the spatial separation of speech signals on speech intelligibility in a series of experiments conducted in the early 1990s. In this case, speech intelligibility was
determined using the coordinate response measure (CRM; Moore, 1981) and the Voice Communications Effectiveness Test (McKinley & Moore, 1989). The former is used to measure the intelligibility of one of multiple simultaneous utterances, whereas the latter is designed to measure information transfer in representative military communications. Spatial separation of the speech signals was achieved with the Air Force Research Laboratory’s Auditory Localization Cue Synthesizer (McKinley, Ericson, & D’Angelo, 1994) using nonindividualized HRTFs with 1 deg resolution in the horizontal plane. The wideband noise was either correlated diotic pink noise, uncorrelated pink noise, or ambient pink noise. The diotic and dichotic pink noise were mixed with speech signals and presented to the listeners over headphones, whereas the ambient pink noise was played over loudspeakers in a reverberant chamber. Ericson and McKinley (1997) reported that (1) angular separation greater than or equal to ±45 deg provided the highest levels of intelligibility; (2) diotic noise was associated with the greatest degradations in speech intelligibility, followed by ambient noise and dichotic noise, respectively; and (3) no additional benefit was revealed for spatial separations greater than 90 deg. The effects of spatial separation on speech intelligibility were further explored by Nelson and his colleagues (Nelson, Bolia, Ericson, & McKinley, 1998a, 1998b) in a series of studies designed to assess the effect of spatial auditory information on a listener’s ability to detect, identify, and monitor multiple simultaneous speech signals. Specifically, factorial combinations of the number of simultaneous speech signals (1–8 talkers), signal location along the horizontal plane (front right quadrant [RQ], front hemifield [FH], right hemifield [RH], full 360 deg [F], and a nonspatialized control [C]), and the gender of the talker (male or female) were evaluated. Performance efficiency was evaluated using a modified version of the CRM (Bolia, Nelson, Ericson, & Simpson, 2000), which required participants to listen for the occurrence of a critical call sign (e.g., “Baron”) and to identify the color–number combination that emanated from the same spatial location as the critical call sign. For example, if assigned “Baron” as the critical call sign, then the appropriate response to “Ready, Baron, go to red six now” would have been to press the response buttons that corresponded to the red–six combination. If the critical call sign were not presented, then no response was required. Fifty percent of the experimental trials included the critical call sign. Results obtained in the free-field presentation (Nelson et al., 1998a) of the speech signals indicated that the spatialization of simultaneous speech signals (1) increased the percentage of correctly identified critical signals and (2) lowered ratings of perceived mental workload as compared to a nonspatialized control condition. It is interesting that post hoc analyses of the former effect indicated that although performance efficiency associated with the four spatialization conditions was significantly greater than the nonspatialized control condition, the four spatialization conditions were not significantly different from each other. Similarly, results obtained for simultaneous speech signals presented via a virtual spatial audio display indicated that spatialization increased the percentage of
correctly detected and identified critical speech signals. Collectively, these results provide a compelling demonstration of the beneficial effects of spatialization on performance efficiency and may be particularly relevant to application domains in which operators are required to effectively monitor multiple speech communications channels simultaneously (air traffic control, command and control, etc.). A nice addition to this work is that of Brungart and Simpson (2001b), who have demonstrated enhanced speech intelligibility under conditions in which the target and distractor messages varied in distance but not in azimuth. In this study, participants listened to pairs of simultaneous phrases from the CRM played at an azimuth of −90 deg and at distances of 12 cm, 25 cm, and 100 cm. One of the phrases was always located at a distance of 100 cm from the listener, and so the control condition was that in which both phrases were located at that distance. An improvement in intelligibility on the order of 20–30% was obtained when one of the talkers was located at one of the two nearer distances, suggesting that the use of near-field spatial audio displays may prove valuable in the construction of future multichannel communications interfaces. Although much work has been done in the area of spatial audio displays for speech communications, there are many questions that remain to be answered. One question is whether spatialization effects similar to those obtained in the horizontal plane exist in the median plane. Another is whether the effects exhibited by angular separation and those exhibited by separation in distance interact in any way. Both of these questions raise an equally important issue: given what is known about the effects of angular and linear separation of talkers on speech intelligibility, how does one use that knowledge to design an optimal communications display? This question will undoubtedly require a significant amount of additional research.
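As one illustration of how the angular-separation findings might inform display design, the sketch below spreads a set of communication channels evenly across the frontal horizontal plane. The ±90 deg span and the even-spacing heuristic are assumptions of this sketch, and they preserve the roughly 45 deg minimum separation discussed above only for up to five channels.

```python
# Illustrative heuristic for placing communication channels in azimuth
# (degrees; 0 = straight ahead, negative = left of the listener).
def assign_azimuths(n_channels, span=(-90.0, 90.0)):
    """Spread channels evenly across the frontal horizontal plane."""
    lo, hi = span
    if n_channels == 1:
        return [0.0]
    step = (hi - lo) / (n_channels - 1)
    return [lo + i * step for i in range(n_channels)]

azimuths = assign_azimuths(4)   # [-90.0, -30.0, 30.0, 90.0], i.e., 60 deg apart
# Each channel would then be convolved with the HRIR pair for its azimuth
# (as in the rendering sketch earlier) and the binaural outputs summed.
```

Whether such an even spread remains optimal once distance separation, median-plane placement, or semantic constraints on talker location are added is exactly the open design question raised above.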
SPATIAL AUDIO DISPLAYS AS COMPONENTS OF ADAPTIVE HUMAN–MACHINE INTERFACES Although the use of spatial audio displays to enhance performance on specific tasks has been the subject of a number of investigations, few studies have examined the possibility that performance of any one of these tasks may affect simultaneous performance of any of the others (Morley et al., 2001). For example, it is possible that reaction times in an aurally aided target acquisition task may be elevated when the operator is also performing a speech communications task and one of the speech channels is “close to” the location of the target. In this case it is desirable to know (1) the relation between proximity and performance degradation, that is, whether any “spatial” masking—as opposed to energetic or informational masking—occurs, and (2) what measures might be taken to prevent such an occurrence. One possible solution to the second question is the use of an adaptive display (Bennett, Cress, Hettinger, Stautberg, & Haas, 2001; Haas, Nelson, Repperger, Bolia, & Zacharias, 2001; Hettinger, Cress, Brickman, & Haas, 1996) to present
each set of cues in a manner that is optimized for the current state of the human– machine system. This might take the form of a multimodal display, in which information is adaptively switched between sensory modalities; of an adaptive angular display, in which the angular positions of signals are manipulated to prevent spatial overlap; or of a gross distance display, in which signals are switched dynamically between the near and the far acoustic fields. In a multisensory display, it is possible to adaptively switch the modality of presentation as a function of the locations of the target(s) and the spatialized communication channels. For example, if the target is within the visual field of the operator, it might make sense to present it visually rather than aurally, given a secondary auditory load such as a speech communications task. Indeed, this may be the case regardless of the spatial locations of the communications channels, depending on the state of the operator and of the environment (the relative amounts of visual and auditory workload, etc.). Although some researchers have begun to think about similar issues (Moroney et al., 1999; Tannen et al., 2000), to date no experiments have been conducted to address this specific question. In an adaptive angular display, the mode of information display might be switched as a function of the relationship between the spatial location of the audio signals and the semantic information conveyed by it. For example, if multiple intercom channels are spatially separated solely to take advantage of the binaural intelligibility level differences obtained by spatialization, then the absolute locations of the voices are not critical (Nelson et al., 1998b). Indeed, research has demonstrated that, as long as talkers are separated by some minimal critical distance, their locations per se are not important (Ericson & McKinley, 1997). As such, the appearance of a threat might occasion the shifting of the speech signals away from the direction of the threat, which presumably would allow for both enhanced speech intelligibility and accurate target localization. More recent research (Bolia, Nelson, & Morley, 2001) has suggested an interface adaptation based on the cerebral lateralization of speech perception, specifically, that high-priority communications should be presented in the operator’s right hemifield. However, such adaptations would clearly not be practical if the locations of the speech signals were not arbitrary, that is, if some information about the talker were encoded by spatial location (e.g., if the location of a voice were tracked to the location of the talker—perhaps a wingman or another crew member). To date, no research has been conducted to investigate the feasibility of such displays. Another area in which there has been a dearth of research is that of the utility of adaptive auditory distance displays. Indeed, it is only through the recent investigation of near-field auditory localization and the measurement of HRTFs in the near field (Brungart & Durlach, 1999; Brungart & Rabinowitz, 1999) that the consideration of such displays has even become practical. Assuming that the cocktail party phenomenon exists in the near field (which has not been demonstrated but which is likely given the binaural cues subserving near-field localization), it should be possible to separate speech channels and spatialized threat warnings by placing the
former in one distance arc and the latter in the other, and switching the presentation of the speech display adaptively between the near and far fields as a function of whether attention needs to be cued to locations inside or outside of the cockpit. REFERENCES Begault, D. R., & Pittman, M. T. (1996). Three-dimensional audio versus head-down traffic alert and collision avoidance system displays. International Journal of Aviation Psychology, 6, 79–93. Begault, D. R., & Wenzel, E. M. (1993). Headphone localization of speech. Human Factors, 35, 361–376. Bennett, K. B., Cress, J. D., Hettinger, L. J., Stautberg, D., & Haas, M. W. (2001). A theoretical analysis and preliminary investigation of dynamically adaptive interfaces. International Journal of Aviation Psychology, 11, 169–196. Bolia, R. S., D’Angelo, W. R., & McKinley, R. L. (1999). Aurally-aided visual search in three-dimensional space. Human Factors, 41, 664–669. Bolia, R. S., Nelson, W. T., Ericson, M. A., & Simpson, B. D. (2000). A speech corpus for multitalker communications research. Journal of the Acoustical Society of America, 107, 1065–1066. Bolia, R. S., Nelson, W. T., & Morley, R. M. (2001). Asymmetric performance in the cocktail party effect: Implications for the design of spatial audio displays. Human Factors, 43, 208–216. Bronkhorst, A. W., Veltman, J. A. H., & van Breda, L. (1996). Application of a three-dimensional auditory display in a flight task. Human Factors, 38, 23–33. Brungart, D. S., & Durlach, N. I. (1999). Auditory localization of nearby sources. II. Localization of a broadband source. Journal of the Acoustical Society of America, 106, 1956–1968. Brungart, D. S., & Rabinowitz, W. M. (1999). Auditory localization of nearby sources. Head-related transfer functions. Journal of the Acoustical Society of America, 106, 1465–1479. Brungart, D. S., & Simpson, B. D. (2001a). Auditory localization of nearby sources in a virtual audio display. In Proceedings of the 2001 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (pp. 107–110). New Paltz, NY: IEEE. Brungart, D. S., & Simpson, B. D. (2001b). Distance-based speech segregation in near-field virtual audio displays. In Proceedings of the 2001 International Conference on Auditory Display (pp. 169–174). Ericson, M. A., & McKinley, R. L. (1997). The intelligibility of multiple talkers separated spatially in noise. In R. H. Gilkey & T. R. Anderson (Eds.), Binaural and spatial hearing in real and virtual environments (pp. 701–724). Mahwah, NJ: Lawrence Erlbaum Associates. Haas, M. W., Nelson, W. T., Repperger, D., Bolia, R. S., & Zacharias, G. (2001). Applying adaptive control and display characteristics to future air force crew stations. International Journal of Aviation Psychology, 11, 223–235. Heffner, R. S. (1997). Comparative study of sound localization and its anatomical correlates in mammals. Acta Otolaryngologica Supplement, 532, 46–53. Heffner, R. S., & Heffner, H. E. (1992). Visual factors in sound localization in mammals. Journal of Comparative Neurology, 317, 219–232. Hettinger, L. J., Cress, J. D., Brickman, B. J., & Haas, M. W. (1996). Adaptive interfaces for advanced airborne crew stations. In Proceedings of the Third Annual Symposium on Human Interaction with Complex Systems (pp. 188–192). Martin, R. L., McAnally, K. I., & Senova, M. A. (2001). Free-field equivalent localization of virtual audio. Journal of the Audio Engineering Society, 49, 14–22. McKinley, R. L., Ericson, M. A., & D’Angelo, W. R. (1994).
Three-dimensional auditory displays: Development, applications, and performance. Aviation, Space, and Environmental Medicine, 65, 31–38.
McKinley, R. L., & Moore, T. J. (1989). An information theory based model and measure of speech communication effectiveness in jamming. In Proceedings of Speech Tech ’89 (pp. 101–105). New York: Media Dimensions. Moore, T. J. (1981). Voice communication jamming research. In AGARD conference proceedings 311: Aural communication in aviation (pp. 2:1–2:6). Neuilly-sur-Seine, France: Morley, R. M., Bolia, R. S., Nelson, W. T., Alley, T. R., Tyrrell, R. A., & Gugerty, L. (2001). Dual task interference between a speech communications task and a sound localization task. In Abstracts of the 24th Midwinter Meeting of the Association for Research in Otolaryngology (p. 260). Moroney, B. W., Nelson, W. T., Hettinger, L. J., Warm, J. S., Dember, W. N., Stoffregen, T. A., & Haas, M. W. (1999). An evaluation of unisensory and multisensory adaptive flight path navigation displays: An initial investigation. In Proceedings of the Human Factors and Ergonomics Society 43rd Annual Meeting (pp. 71–75). Nelson, W. T., Bolia, R. S., Ericson, M. A., & McKinley, R. L. (1998a). Monitoring the simultaneous presentation of multiple spatialized speech signals in the free field. In Proceedings of the 16th International Congress on Acoustics and the 135th Meeting of the Acoustical Society of America (pp. 2341–2342). Nelson, W. T., Bolia, R. S., Ericson, M. A., & McKinley, R. L. (1998b). Monitoring the simultaneous presentation of spatialized speech signals in a virtual acoustic environment. In Proceedings of the 1998 IMAGE Conference (pp. 159–166). Nelson, W. T., Hettinger, L. J., Cunningham, J. A., Brickman, B. J., Haas, M. W., McKinley, R. M. (1998). The effects of localized auditory information on visual target detection performance using a helmet-mounted display. Human Factors, 40, 452–460. Perrott, D. R., Cisneros, J., McKinley, R. L., & D’Angelo, W. R. (1996). Aurally aided visual search under virtual and free-field listening conditions. Human Factors, 38, 702–715. Perrott, D. R., Saberi, K., Brown, K., & Strybel, T. Z. (1990). Auditory psychomotor coordination and visual search performance. Perception & Psychophysics, 48, 214–226. Perrott, D. R., Sadralodabai, T., Saberi, K., & Strybel, T. Z. (1991). Aurally aided visual search in the central visual field: Effects of visual load and visual enhancement of the target. Human Factors, 33, 389–400. Ricard, G. L., & Meirs, S. L. (1994). Intelligibility and localization of speech from virtual directions. Human Factors, 36, 120–128. Rudmann, D. S., & Strybel, T. Z. (1999). Auditory spatial facilitation of visual search performance: Effect of cue precision and distractor density. Human Factors, 41, 146–160. Sokolov, Y. (1967). Perception and the conditioned reflex. New York: Pergamon. Sorkin, R. D., Wightman, F. L., Kistler, D. J., & Elvers, G. C. (1989). An exploratory study of the use of movement-correlated cues in an auditory head-up display. Human Factors, 31, 161– 166. Tannen, R. S., Nelson, W. T., Bolia, R. S., Haas, M. W., Hettinger, L. J., Warm, J. S., Dember, W. N., & Stoffregen, T. A.(2000). Adaptive integration of head-coupled multi-sensory displays for target acquisition. In Proceedings of the IEA 2000/HFES 2000 Congress (Vol. 3, pp. 390–393). Treisman, A., & Gelade, G. (1980). A feature integration theory of attention. Cognitive Psychology, 12, 97–136. Wenzel, E. M. (1992). Localization in virtual auditory displays. Presence, 1, 80–106. Whittington, D. A., Hepp-Reymond, M.-C., & Flood, W. (1981). Eye and head movements to auditory targets. 
Experimental Brain Research, 41, 358–363. Wolfe, J. M., Cave, K. R., & Franzel, S. L. (1989). Guided search: An alternative to the feature integration model for visual search. Journal of Experimental Psychology: Human Perception and Performance, 15, 419–433. Yost, W. A. (1997). The cocktail party problem: Forty years later. In R. H. Gilkey & T. R. Anderson (Eds.), Binaural and spatial hearing in real and virtual environments (pp. 329–347). Mahwah, NJ: Lawrence Erlbaum Associates.
10 Learning Action Plans in a Virtual Environment Simon Goss Defence Science and Technology Organisation (DSTO) Aeronautical and Maritime Research Laboratory Air Operations Division
Adrian Pearce Department of Computer Science and Software Engineering The University of Melbourne
Our view is that facilitating human-centred design in the adaptive virtual interface is about recognizing the intentionality of the user. The operator has a purposeful motivation in undertaking activity in the virtual environment. This can range from task-focused activities, such as mission rehearsal in a simulator, where the intentions relate to the achievement of mission goals, to the purely recreational exploration of a cyberspace. All of these use the computer as a transformative technology for immersion. The cockpit and computer screens become the experience of flying; the gloves and the screen put the surgeon’s hand in the digital patient of an anatomy trainer. Further, the same interface may provide an augmented reality in the actual environment, such as the surgeon performing telepresence surgery, or in the electronic cockpit, where alternate sensor views are superposed in a fused information device to assist with traversal of physical space and the avoidance of real physical threats. Particular information is required at particular times in response to particular contingencies, and the display device filters and shapes the information appropriately for the context. The context includes the pilot’s intentions. Design choice constrains the affordances offered by the virtual environment interface. For our purposes, a virtual interface requires an explicit representation of intentionality in the internal representation of agent implementation. We subscribe to the folk-psychological belief, desire, and intentionality notion of agency in the
construction of an interface agent, and in its interpretation of the actions of entities in the virtual environment. Here, an agent emulates rational behavior in that it has intentions that it forms according to its beliefs and goals. An agent uses predefined plans, which are applicable to the situation, to fulfil its intentions (long-term persistent goals). Such an agent is differentiated from an object-oriented entity in that it is reflective rather than immediately reactive to environmental sensor input. (For a description of the agent formalism, see Georgeff & Lansky, 1986; Rao & Georgeff, 1991). An intention-facilitating virtual interface recognizes the plans of the user. The level of delegation and authority given to the interface agent (assistant, associate or even supervisor) in taking actions having recognized the plans of the system user by observation of the user’s actions and the sensed environment is a current issue in system construction (for example, in the degree of delegation given to the virtual personal assistant in communication space, or the amount of autonomous authority given to an electronic crew member embedded in the avionics of the cockpit of the future; Miller & Goldman, 1997). The interface agents need to recognize the situation and service the user intentions; all require recognition of intention of the user. Rao and Murray (1994), working in the domain of pilot agents in an operations research simulation system, indicated that one way to implement recognition is to reflect on one’s own behavioral repertoire (the plans one knows about) and ascribe these to other agents (Rao & Murray, 1994). Intention recognition becomes a search-through plan space for plans that match the observed actions of the other entity. This has been demonstrated in a limited capacity in a prototype system (Tidhar & Busetta, 1996) that shows a dramatic change of outcome when agents reason about what other agents might be doing. In military simulations where agents provide artificial players, problems of coordination (Tambe et al., 1995) have been found to be due to failure to recognize intentional situations in teams (Kaminka & Tambe, 1997; Tambe, 1997). The (nontrivial) issue confronting model-based planning systems as interface agents is the recognition of plans of the user while in execution. The problem is harder than identifying an action on its completion. To be of practical assistance, an interface agent needs to know what is happening before that event is over. We explore this in the context of flight simulation and present a method for learning action plans from spatiotemporal data that describe action plans of agents or entities in a virtual environment. These are required for testing candidate operator intentions against operator action history and are interpretable as partial instantiations of (operator or agent) intentionality. Our method of constructing procedures requires three components: (1) an appropriate ontology (model of operator task performance), (2) an appropriate virtual environment architecture (accessibility of data and image generation databases), and (3) a learning procedure (which relates the data stream to the domain ontology). In simple terms, we are looking at the domain of circuit flight. We have a task analysis for circuit flight. The flight simulator has an authentic flight model for
a PC9 aircraft and a cockpit with generic throttle and stick controls. It also has a particular software architecture conferring special data-recording properties. A relational learning technique is used to relate the data from the flight simulator to the task analysis. We build relations that describe generalized flight plan segments. In practice, these run in real time and announce attributed plan segments while the pilot is executing them. This is a compelling demonstration of the feasibility of real-time recognition of intention in a user interface to an immersive virtual environment task. We assert that our results have wider significance and may form part of the foundation for the construction of agent-oriented simulations and, more broadly, virtual environments.
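The recognition step described above can be pictured with a small sketch. The plan library, action labels, and prefix-matching criterion below are illustrative assumptions, not the FSIM or CLARET implementation; the sketch only shows the general idea of testing candidate operator intentions against a growing action history.

# Minimal sketch of intention recognition as search through a plan space.
# Plan names and action labels are hypothetical, loosely based on circuit flight.
PLAN_LIBRARY = {
    "FLY-CROSSWIND-LEG": ["climb-to-500ft", "level-left-turn", "climb-to-1000ft"],
    "FLY-BASE-LEG":      ["medium-level-turn", "descend-no-flaps", "descend-flaps"],
    "LAND":              ["descend-full-flaps", "round-out", "touch-down"],
}

def candidate_intentions(action_history):
    """Return plans whose opening steps match the observed action history."""
    matches = []
    for plan, steps in PLAN_LIBRARY.items():
        n = len(action_history)
        # A plan remains a candidate if the actions observed so far coincide
        # with its first n steps (a partial instantiation of the intention).
        if n <= len(steps) and steps[:n] == action_history:
            matches.append(plan)
    return matches

print(candidate_intentions(["descend-full-flaps", "round-out"]))
# -> ['LAND']: the intention can be announced before touch-down occurs

A real recognizer must, of course, tolerate noisy, interleaved, and partially ordered actions, which is part of what motivates the relational approach described later in the chapter.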
ONTOLOGY ACQUISITION To recognize the intentions of the user across the virtual interface, we need to understand the activities the user is undertaking. To relate a plan-level description of pilot activities to the detailed data observable in the world of the flight simulator requires explicit representation of the activity goal structure. An ontology is a set of terms, their definitions, and axioms relating them; terms are normally organized in a hierarchy (Noy & Hafner, 1997). Ontology acquisition involves task analysis and knowledge engineering. Simple methods were used: consulting training materials used by a flight instructor, interview of a flight instructor, and a contrived method (Schvaneveldt, 1990; Schvaneveldt, Durso, Goldsmith, Breen, & Cooke, 1985) that used pairwise comparison of terms obtained from an aviation psychologist to construct a knowledge network. The results of each were used to construct a task hierarchy for concept demonstration purposes of the recognition of intention. The Flight Domain Pilot skills are comprised of hierarchic competencies. Some of these involve the ability to recover from abnormal situations. Our design choice of an agent-oriented implementation of the interface agent views these as a nested hierarchy of goals. The training method involves acquisition of concepts and then application of the skills these describe to build, first a set of part-task skills, and then more complex combinations and refinements of these. This process is paralleled in the machine learning method here, where a domain ontology is acquired, and then procedural ontological elements are acquired. Pilots first learn the effects of the controls, taxying, straight and level flight, climbing, descent, and turning (level, ascending, descending, and descending to selected headings). Stalls are practiced mainly as a preventive measure. Competency must be demonstrated in recognition and recovery from stalls and incipient spins. These are combined in the complex exercise of circuit flight. A circuit consists of four legs in a box shape, with the runway in the center of one of the long legs. Take-off and landing are into the wind. After take-off,
FIG. 10.1. The normal circuit pattern is flown at 1,000 ft above aerodrome level.
an aircraft climbs to a height of 500 ft above the aerodrome level. On the crosswind leg, the aircraft climbs to 1,000 ft above the aerodrome. Height is maintained at this level on the downwind leg, which is parallel to the runway. The aircraft descends on the base leg to a height of 500–600 ft above aerodrome level and turns to fly the final leg directly along the line of the runway. This is shown in Fig. 10.1 Competency must be demonstrated in a range of sequences. For example, in a crosswind leg, although the wind may be blowing straight down the runway during the take-off, winds can change without notice. Other tasks are learned once the pilot has achieved a first solo flight prior to graduating: steep turns (which require a different control strategy); recovery from unusual attitudes, low-level flight, forced landings without power, and formation flight (keeping station, changing between formation patterns, and flying formation circuits). A Knowledge Map The Pathfinder method of expertise model acquisition was used. An aviation psychologist generated a list of concepts pertaining to circuit flight. Terms from the final landing phase dominate, as this is both the most significant part of the flight task (landing safely) and one of the most studied aspects of aviation. There is literature from simulator studies that seeks to understand the psychophysical cues used to judge height and glide slope in a final approach (Lintern & Koonce, 1992; Lintern & Walker, 1991) Exhaustive pairwise comparison of 53 terms (see Fig. 10.2) took far too long (over 2 hr). A more reasonable number would have been 30. The resulting knowledge structure diagram for an expert was significantly different to that of a novice. There was little understandable structure for the novice. The result for the expert finds the concepts in four clusters; these are related to
activities of general flight, effect of instruments, and approach-and-landing, which is split into out-the-window observables and instrumental cues.

The Ontology

Our ontology is small and incomplete. We present it as a task hierarchy in Fig. 10.2. Ontology is an arena in which psychology and computer science "interface." Some elements are to be found in the architecture of the simulator software and interface, some in the descriptions of activities in the simulator. For instance, the ontology of circuit flight involves control actions to achieve navigation and flight control goals. These goals are part of the task hierarchy; the controls are part of the interface. There is the complication that contingency plans are executed in parallel and that plans at several levels can be currently under execution and interleaved.
Circuit Flight comprises: Take-off (flaps and throttle, lift off); Climb (climb with flaps, climbing turn); Crosswind Leg (medium level turn); Downwind Leg (straight and level, medium level turn); Base Leg (descend with no flaps, descend with flaps, medium descending turn); Final Leg (descend with flaps, descend with full flaps); and Landing (round out, touch down).
FIG. 10.2. The circuit flight task hierarchy. Maneuvers are represented as a hierarchy of maneuvers and submanoeuvers or events. The abstraction possible in our FSIM demonstrator can be nested, providing a decomposition of maneuvers and events. That is, to talk about the activity of compound entities you need to be able to explicitly represent the relationship between entities across the ontology.
For example, the goals of safety are concurrent with goals of navigation and communication. In the implementation, a blackboard system is used in which procedures sit and watch the input space in parallel. However, the hierarchical representation of goals is useful for navigating the knowledge structure and organizing training sessions in the simulator.
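The blackboard arrangement mentioned above can be sketched as follows. The state variables, thresholds, and monitor procedures are invented for illustration; they stand in for the plan-watching procedures of the actual implementation.

# Minimal blackboard sketch: several monitor procedures watch the same shared
# state; here they are simply polled once per update cycle.  Names and
# thresholds are illustrative only.
blackboard = {"altitude_ft": 0.0, "airspeed_kt": 0.0, "flaps_deg": 0}

def watch_stall(bb):
    if bb["airspeed_kt"] < 55 and bb["altitude_ft"] > 0:
        return "stall-recovery goal posted"

def watch_circuit_height(bb):
    if bb["altitude_ft"] > 1000:
        return "level-off goal posted"

MONITORS = [watch_stall, watch_circuit_height]

def update(bb, **sensor_values):
    bb.update(sensor_values)
    fired = []
    for monitor in MONITORS:          # every procedure inspects the shared state
        msg = monitor(bb)
        if msg:
            fired.append(msg)
    return fired

print(update(blackboard, altitude_ft=1050.0, airspeed_kt=95.0))
# -> ['level-off goal posted']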
THE VIRTUAL ENVIRONMENT ARCHITECTURE

The architecture of the virtual environment constrains the interactional possibilities. Paraphrasing Boden (1972), it is not how the world is, but how it is represented as being, that is crucial with regard to the truth of intentional statements. To make sense of the actions of the user and of other virtual agents in the virtual environment in intentional terms, we need to be able to ask and answer questions of the environment. The architecture we describe arose from attempts to use flight simulators as knowledge acquisition tools to obtain rules of pilot performance, with the eventual aim of creating agent rule bases to provide artificial agents as opponents and allies in human-in-the-loop simulation, and to represent crew behavior in the operations research models of air engagements. The insight driving this was that we can have access to the image generator object database and include in it labels for objects as well as rendering information for visual displays. We can then create a data record, on a frame-by-frame basis if required, which relates the user actions to symbol-level descriptions of the virtual world presented to the user as imagery. This is the raw material of descriptions and perception of intentional acts in a virtual environment.

Our Virtual Environment

In addition to the status of navigation instruments and past actions, pilots use knowledge of the world, both in own-ship (egocentric or view-dependent) and map-view (exocentric or view-independent) representations. Such information is critical in control and trajectory planning in the visual flight regime. We refined a workstation-based flight simulator used in the machine learning of control strategies (Sammut, 1992) by rewriting it to provide the worldview and by including a more veridical flight interface and model. Our flight simulator differs significantly in veracity, in the interface to the pilot, and in its data architecture. It has the ability to record not only the actions of pilots, in terms of instrument variables, but also dynamic knowledge of the world: the positions of objects and dynamic entities in the three-dimensional flight course relative to the pilot, and pilot motion (Goss, 1993; Goss, Dillon, & Caelli, 1996). The flight controls for the simulator include a control column, throttle, brakes, rudder, and flaps, in a generic single-seat cockpit with rudder pedals, stick, throttle, and stick-mounted switches. The switches are used for viewpoint controls, the
autopilot and mode switching and cuing of the flight simulator software. The cockpit is trolley-mounted with a seat. A monitor in the trolley is used for instruments. The worldview is projected onto a wall in a darkened booth with a video projector. The simulator runs using (possibly many) video projectors in a projection room. (We are in fact using the work-up and part-task psychometric simulator facilities of the Defence Science and Technology Organisation (DSTO) Air Operations Simulation Centre, which has a variety of wheel-in cockpits for fixed- and rotary-wing aircraft, and a variety of visual displays, including helmet-mounted displays and a 200-deg × 100-deg partial dome.) For remote demonstration purposes, we have implemented a desktop flight simulator throttle and stick. The workstation monitor provides the out-the-window view. A FlyBox (described at http://www.bgsystems.com/products/FlyBox.html) provides throttle, stick, and switches. A general view in the simulation community is that this level of interface is sufficient for ancillary players, whose purpose is to provide agency for other players in the virtual environment, such as wingmen or incoming targets for experimental participants. The simulation center facility was acceptable to most pilots. There is also a mouse and keyboard interface. In each case we can monitor the control actions of the pilot. The displayed instruments include airspeed, direction, an artificial horizon, rate of climb, throttle position, and flap-position indicators. Movement through the virtual environment is based on a six-degree-of-freedom flight model, which uses a database validated from wind tunnel experiments. The flight model is authentic to a particular class of aircraft, in our case the PC9, a high-performance single-engine propeller-driven airplane. We record the virtual world in our data structures and are able to relate operator activity to goals in the outside world. The main additional requirements over a typical flight simulator are to record the visual and geographic positions of objects and to determine their visibility. We accomplish this with a symbolic description generator (SDG), which is analogous in operation to the image generator in a virtual environment. Its function, however, is to render a description of the components of the scenery and their mutual relations rather than render the pixel image from the object database. The SDG consists of two levels. The first generates raw data that describe the positions and visibility of target points on each object in the simulation. The second, and most important, level converts these data into a rich symbolic description of the visual scene. The simulated world is a large area containing natural features such as mountains and rivers, and cultural features like buildings and runways. Objects can be static or dynamic, such as moving vehicles. The world is described through a number of object databases that are loaded via a command file. For recording descriptions of imagery, issues include the frame update rate and the rate of update of description. There are many permutations of imagery that
would correspond to a single symbolic description. The scene description is insensitive to small changes in scale and ranging. The data recording rate is variable, depending on the underlying hardware and the current scene complexity. Time is calculated in absolute terms, so a varying visual update rate does not affect the subsequent symbolic processing. Side, rear, and map views are available, and viewing mode changes are also logged. The object database contains a set of objects, each of which may have a number of trackable or target points on it. These points are used to give quantitative relative relationships when referencing objects. For example, when referencing a mountain, it is useful to reference the peak of the mountain, a set of points around the base, and possibly the volumetric centroid of the mountain. Multiple target points on an object also allow the observation of higher-order properties, such as the relative rotation of an object. The output is in the form of time-series relational statements, which can be illustrated with periodic in-place images as required. An example is shown in Fig. 10.3. The symbolic statements refer to the visibility of objects, their spatial relations, their relationship to the center of visual flow, the pilot controls, and the absolute position of objects and the simulated aircraft. Each variable or variable pair can be controlled to different levels of quantification, or different thresholds for noticing change. The output can be in a linguistic variable form or a numeric form. Additional features of the system include an autopilot system that permits either direct implementation of a discrete time controller or the integration of a machine learning system that uses previously recorded data to determine operational flight rules. A wingman view is available, suitable for flight replay or monitoring autopilot behavior. The design of the data record, the replay modes, and the annotation facilities are significant design issues in the use of virtual environments as research environments.

Learning Plans

In our flight simulator work, we build on the work of Sammut (Sammut, Kedzier, & Michie, 1992; Sammut, 1996). Their work in machine learning concerned the construction of behavioral clones from traces of operator action on a workstation running a flight simulator program. A consensus view (generalization across a number of operator traces) is constructed as an autopilot. This autopilot can then fly the simulator. Because it encompasses general tendencies rather than recording a particular episode, it captures the underlying strategy and reduces the effect of episodic variation. The behavioral clone is a characterization of the control strategy of the operator at the task. This work represented a significant departure for machine learning from dealing with static data and classifier tasks. From our point of view, the ability to reproduce behavior is not the same as being able to recognize it. The subjects in the simulator were told to fly a flight plan with respect to features in the environment. The autopilot rules were at the level of operation of control devices such as flaps and throttle (see Fig. 10.4).
16.2: Control(Pilot,0,-245) 16.2: Position(Plane,-3205.6,5.1,-3975.0) RPY(Plane,0,357,90) 16.2: Thrust=10000.0 Rudder=0.0 Airspeed=98.6 Climb=31.6 16.2: Flaps=20 Gear(Down) Landed(No) Stalled(No) ....................................................................................... 18.1: 1 1 runway1:end viewable visible -2500.0 0.0 -3975.0 0.0 -17.6 18.1: Position(Plane,-3102.6,10.6,-3975.0) RPY(Plane,0,352,90) ....................................................................................... 18.6: 14 0 mountain2:centre viewable visible 1875.0 50.0 -5000.0 -15.7 -17.0 18.6: 16 0 mountain4:centre viewable visible 4050.0 58.3 -3800.0 1.8 -17.0 .......................................................................................
FIG. 10.3. FSIM flight simulator trace. Top: The FSIM flight simulator is based on a PC9, a high-performance single-engine propeller-driven aeroplane, and utilizes a plane model derived from a wind-tunnel database. The world and its objects can be loaded at run time, and the display includes airspeed, direction, artificial horizon, rate of climb, throttle position, and flap-position instruments. Bottom: The names and positions of objects and entities, with their relationships to the center of visual flow, are recorded along with control actions and aircraft status parameters.
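A few lines of code suggest how a trace in the format of Fig. 10.3 might be read back into relational records for later processing. The field layout for the object lines is inferred from the excerpt above and is therefore an assumption; the real record format may differ.

import re

# Sketch of a parser for trace lines such as
#   "16.2: Position(Plane,-3205.6,5.1,-3975.0)"
#   "18.6: 16 0 mountain4:centre viewable visible 4050.0 58.3 -3800.0 1.8 -17.0"
# The interpretation of the object-line fields is an assumption based on Fig. 10.3.
PREDICATE = re.compile(r"(\w+)\(([^)]*)\)")

def parse_line(line):
    time_str, body = line.split(":", 1)        # split only at the time stamp
    record = {"t": float(time_str)}
    preds = PREDICATE.findall(body)
    if preds:                                   # e.g. Position(...), Control(...)
        record["facts"] = [(name, args.split(",")) for name, args in preds]
    else:                                       # assumed object line layout
        fields = body.split()
        record["object"] = fields[2]            # e.g. "mountain4:centre"
        record["flags"] = fields[3:5]           # e.g. ["viewable", "visible"]
        record["values"] = [float(f) for f in fields[5:]]
    return record

print(parse_line("16.2: Position(Plane,-3205.6,5.1,-3975.0)"))
print(parse_line("18.6: 16 0 mountain4:centre viewable visible 4050.0 58.3 -3800.0 1.8 -17.0"))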
Z_feet <= 30642 : thrust_100 Z_feet > 30642 : | elevation > -43 : thrust_20 | elevation <= -43 : | | Z_feet <= -16382 : thrust_10 | | Z_feet > -16382 : thrust_0 FIG. 10.4. An example of an autopilot rule (Sammut, 1992). A rule controlling the throttle during ascent is shown, where Z feet is the distance from the runway during approach to land.
Our research goal was to produce behaviour. The outside world was referenced only indirectly in terms of distance from the origin of the Cartesian coordinate system set at the foot of the runway. Figure 10.4 is an example of a rule controlling the throttle during ascent, where Z feet is the distance from the runway during approach to land. The rules have no explicit representation of the external view or the features in the external (virtual) world. The autopilot, when invoked at an airfield 10 miles to the northeast of the flight profile, flies irrespective of objects and dynamic entities, in the same way that a wayward welding robot might operate if moved off a production line. Rules at this level do not offer explanation or description; they are low-level machine instructions to an autopilot. We use relational learning techniques to relate task descriptions to the control actions and available information from the displays (instruments and external world view). Relational representation is a powerful technique for interpreting visual and temporal information (Bischof & Caelli, 1994, 1997; Pearce, Caelli, & Bischof, 1996), including the online interpretation of traffic intersections for legal turns and right of way (Dance & Caelli, 1993). CLARET (consolidated learning algorithm based on relational evidence theory) utilizes the constraints present in time series data: those of states and their continuous-valued, attributed relationships (the scenery) and actions or designs (the scenario). Relational rules are generated that explicitly depict actions and relationships between states of the general form shown in Fig. 10.5. Here, while is a bounded temporal condition upon if. Learning while structures is a significant step in machine learning in dynamic domains.
The General Form:
WHILE interpreting (or intending to achieve) goal_i
IF this state and that state have these relationships in space
AND this action and that action have those relationships in time
AND ...
THEN describe (or prescribe) sub-goal_j at time t.
An instance:
While in context of FLYING-CIRCUIT
if LEVEL-LEFT-TURN before CRUISING-ALTITUDE
and if LEVEL-LEFT-TURN overlaps WIND-AT-RIGHT-ANGLES ...
THEN intention is TURN-TO-CROSSWIND-LEG
FIG. 10.5. CLARET rules, the general form. Relational rules are generated that explicitly depict actions and relationships between states during maneuvers.
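The instance in Fig. 10.5 can be read as a constraint over labeled time intervals. The sketch below encodes simplified versions of the two interval relations it uses, before and overlaps, and checks them against hypothetical interval end-points; it illustrates the rule form only and is not the CLARET matcher.

# Sketch: evaluating a CLARET-style rule over labeled intervals.
# Interval end-points (seconds) are invented for illustration.
intervals = {
    "LEVEL-LEFT-TURN":      (40.0, 52.0),
    "CRUISING-ALTITUDE":    (55.0, 180.0),
    "WIND-AT-RIGHT-ANGLES": (30.0, 60.0),
}

def before(a, b):     # a ends before b starts
    return intervals[a][1] < intervals[b][0]

def overlaps(a, b):   # the two intervals share some common time (simplified)
    (a0, a1), (b0, b1) = intervals[a], intervals[b]
    return a0 < b1 and b0 < a1

def turn_to_crosswind_leg():
    """While in context of FLYING-CIRCUIT:
    if LEVEL-LEFT-TURN before CRUISING-ALTITUDE
    and LEVEL-LEFT-TURN overlaps WIND-AT-RIGHT-ANGLES
    then intention is TURN-TO-CROSSWIND-LEG."""
    return (before("LEVEL-LEFT-TURN", "CRUISING-ALTITUDE")
            and overlaps("LEVEL-LEFT-TURN", "WIND-AT-RIGHT-ANGLES"))

print(turn_to_crosswind_leg())   # -> True for the interval values above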
(The figure depicts a left-turn maneuver segmented into intervals, from start roll through mid roll, finish roll, bank roll, and mid turn to bank level, alongside the corresponding control and roll, pitch, and yaw trajectories; intervals are labeled S1, S2, S3, . . . with their times.)
FIG. 10.6. Left-turn maneuver. Segmentation and relational feature extraction are carried out on the time-series trajectories obtained from each separate feature space, for example, controls, plane, and objects. These are labeled with unique identifiers S1, S2, S3, . . . , forming interval-based action representations.
This is the key to learning intentional structures in our method, and the heart of the symbol binding that relates explicitly coded rules for an interface agent to parametric bounds over the multiscaled feature space of the user interface. The data structures of the flight simulator permit the logging of both pilot actions and object positions over time, and over a large number of time intervals for different flight scenarios. For coding spatiotemporal sequences, representational hierarchies must integrate the notion of what a state is together with its location relative to other states. An example of a left-turn maneuver is shown in Fig. 10.6. Figure 10.7 shows a diagrammatic description of the glide approach and landing. It can be regarded as a spatiotemporal map. There are strong adjacency relations, both spatially and temporally, in the dynamic traversal of this space. A trajectory is an ordered list or sequence. The CLARET representation of this is shown in Fig. 10.8. Here a maneuver at one scale is a submaneuver at another scale. The data are obtained in the flight simulator when the pilot records the beginning and end of each maneuver or event in the ontology. The input flight trajectory (numeric roll-pitch-yaw and object relationships) from the flight simulator is collected and segmented. Similar segmentation and relational feature extraction is carried out on
the time-series trajectories obtained from each separate feature space, for example, controls, plane, and objects. These are labeled with unique identifiers S1, S2, S3, . . . , forming interval-based action representations. Successive applications of the matching algorithm result in both scene interpretations (states) and hierarchical scenario interpretations (actions). Hierarchical interpretation involves the successive application of attribute partitioning (numerical learning), graph matching, and relational extension (inductive logic programming).
(The annotations in Fig. 10.7 read, in order: (1) pre-landing checks; (2) turn base earlier than usual, especially in strong winds; (3) re-assess wind effect; (4) descent point: power off, maintain height until speed reduces to glide speed, select glide attitude, trim; (5) initially descend with no flaps; (6) select partial flaps as required; (7) adjust base to arrive higher than normal on final; (8) medium descending turn at 20 deg angle of bank (30 deg maximum), monitoring airspeed closely; (9) select full flap when absolutely certain of reaching the field; (10) more pronounced round-out. "Look out" reminders and the aiming point are also marked.)
FIG. 10.7. Glide approach and landing. A diagrammatic description of the glide approach and landing is shown. It can be regarded as a spatiotemporal map. There are strong adjacency relations, both spatially and temporally, in the dynamic traversal of this space.
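A toy version of the segmentation step reads as follows: a one-dimensional roll trace is cut wherever it crosses assumed thresholds, and each run of samples becomes a labeled interval of the kind shown in Fig. 10.6. The thresholds, labels, and sample trace are invented for illustration.

# Sketch: segmenting a roll-angle trace into labeled intervals (S1, S2, ...).
# Thresholds and the sample trace are illustrative only.
def label(roll_deg):
    if roll_deg < 5:
        return "wings-level"
    return "banked-left" if roll_deg < 25 else "steep-left"

def segment(times, roll):
    """Group consecutive samples with the same label into intervals."""
    intervals, start, current = [], times[0], label(roll[0])
    for t, r in zip(times[1:], roll[1:]):
        new = label(r)
        if new != current:
            intervals.append((current, start, t))
            start, current = t, new
    intervals.append((current, start, times[-1]))
    return intervals

t = [0, 1, 2, 3, 4, 5, 6, 7]
roll = [0, 2, 10, 20, 30, 28, 12, 3]      # degrees of left bank
for i, (name, t0, t1) in enumerate(segment(t, roll), 1):
    print(f"S{i}: {name} [{t0}, {t1}]")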
(The upper panel relates the intervals FLYING-CIRCUITS, CRUISING-ALTITUDE, LEVEL-TURN, and WIND-AT-RIGHT-ANGLES in an interval-based representation.)
While in context of FLYING-CIRCUITS
If LEVEL-TURN before CRUISING-ALTITUDE
and If LEVEL-TURN overlaps WIND-AT-RIGHT-ANGLES
Then intention is TURN-TO-CROSSWIND-LEG
FIG. 10.8. Motion sequence graph. Top: Interval-based representation is shown. A maneuver at one scale is a submaneuver at another scale. The data are obtained in the flight simulator when the pilot records the beginning and end of each maneuver or event in the ontology. Bottom: The corresponding CLARET general rule form is shown for this interval-based activity in the FSIM flight simulator.
Recognizing Learned Plans

In supporting intentionality in a virtual environment interface agent, we anticipate user action by recognizing intentions instantiated as plans under execution. In the flight simulator, the aim is to predict future pilot actions, in a given time interval, from pilot actions, instrument settings, object coordinates, and near-object characteristics at previous time intervals. What makes this problem one for machine learning is that more traditional time-series methods cannot be applied. This is because of the complexity of the data and the fact that very different variable states can require similar pilot actions over a flight path. CLARET uses a relational evidence network to deterministically order the search, resulting in an admissible strategy. Real-time performance in the flight simulator is made tractable by reducing the cardinality of the search space, using dynamic programming principles and relational evidence measures to prune the computation involved. Because the data are relational, queries are distinguished from indexing operations such as attribute-based lookup. This is reflected in the form of the input data used, whether as lists of attributes or as relations that encode specific instantiations (combinations) of states. Here, local information or feature-indexed attributes are extracted from states (such as duration), as well as part-indexed relationships (such as time differences between states). In recognition mode, the system dynamically binds to different maneuvers as they occur in the "trajectories" of input time series. For example, in describing flight, an approach-to-land maneuver is defined by the subsequence of different roll-pitch-yaw states of the plane and different actions on the control yoke. The system can now be used to interactively query partially enacted sequences in predictive mode or to describe sequences in descriptive mode.
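In predictive mode the recognizer is, in effect, asking which learned maneuver definitions remain consistent with the intervals observed so far. A highly simplified version of that query is sketched below; the maneuver definitions and the progress score are assumptions and do not reproduce the CLARET evidence computation.

# Sketch of predictive querying: which maneuvers remain consistent with the
# sequence of interval labels observed so far?  Definitions are invented,
# loosely following the stage names of Fig. 10.6.
MANEUVERS = {
    "LEVEL-LEFT-TURN":  ["start-roll", "mid-roll", "finish-roll",
                         "bank-roll", "mid-turn", "bank-level"],
    "APPROACH-TO-LAND": ["descent-point", "descend-no-flaps",
                         "descend-flaps", "round-out", "touch-down"],
}

def query(observed):
    """Score each maneuver by how much of it the observation already covers."""
    results = {}
    for name, states in MANEUVERS.items():
        n = len(observed)
        if n <= len(states) and states[:n] == observed:     # still consistent
            results[name] = {"progress": n / len(states),    # crude evidence
                             "expect_next": states[n] if n < len(states) else None}
    return results

print(query(["start-roll", "mid-roll"]))
# -> a third of the way through a left turn; "finish-roll" is expected next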
Recognition of plans while they are being executed now opens the possibility of adaptive response according to the ascription of user intention, by observation of system state and intentional interpretation of user behavior. For example, the system can advise the pilot during the mission and adapt the visuals and the activity of other entities in the scene accordingly. In the case of the wind changing direction, other planes may be rerouted, ground vehicles may be redeployed, and the ground lighting on the runway adjusted accordingly. A parser converts spatiotemporal rules obtained by the learning procedure into descriptions that are consistent with the syntax of the target belief, desire, and intentionality (BDI) agent formalism. The output is also human-readable, allowing for interactive validation. A human-readable summary of activity that occurred during the simulation is also available for analysis by pilots and trainers.
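The conversion step can be imagined as template filling over the learned rule. The output format below is a generic, human-readable plan skeleton invented for illustration; it is not the syntax of any particular BDI language.

# Sketch: turning a learned spatiotemporal rule into a human-readable,
# BDI-flavoured plan skeleton.  The output format is invented.
def to_plan_text(rule):
    lines = [f"plan Recognise_{rule['intention']}",
             f"  context: {rule['context']}"]
    lines += [f"  condition: {c}" for c in rule["conditions"]]
    lines.append(f"  conclude: intention is {rule['intention']}")
    return "\n".join(lines)

rule = {
    "context": "FLYING-CIRCUIT",
    "conditions": ["LEVEL-LEFT-TURN before CRUISING-ALTITUDE",
                   "LEVEL-LEFT-TURN overlaps WIND-AT-RIGHT-ANGLES"],
    "intention": "TURN-TO-CROSSWIND-LEG",
}
print(to_plan_text(rule))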
DISCUSSION

Task Validity and Fidelity Issues

Fidelity and face validity of the task are significant issues in the provision of synthetic experiences for experts in real-world activity. The fidelity of the flight model (to aircraft class) was at least as important to pilots as the fidelity of the cockpit. Providing synthetic experience is simpler in areas such as vehicle movement through space, where in the natural world some interposed means between the user and the environment is the norm, and actions are effected through controls such as throttles, rudder pedals, and control yokes. If these are electronic rather than mechanical in nature, as in the fly-by-wire cockpit, so much the better. Simulators that are adequate for computer science research into modeling spatiotemporal expertise are not necessarily adequate for aviation-related virtual experiences. Former general aviation and airline crew did not like early versions of our system, in which the flight model was unrealistic. For them, the environment was not immersive. However, the environment, even with a low-fidelity flight model, was adequate for simulation technologists and for aeronautical engineers with some flight training, particularly as the cockpit and controls grew more sophisticated, owing to their familiarity with a range of simulator fidelities and the difference in their expectations of a "flight" simulator. In work by others, it has been found that task performance in maintaining flight profiles was degraded in a simulator compared with the natural world until the pitch changes in engine noise were rendered authentically (Oldfield, Meehan, & Goss, 1995).
rates and polygons per frame. This includes an Aristotelian view of the virtual world in its implementation. Objects are effectively illuminated by rays of light emanating from the design eye point. Complexities occur in scaling visually oriented simulations to multisensate virtual environments. For instance, if an object makes a sound in the middle distance, where is the sound it makes stored? With the sound source, or as an adjunct property of each of the objects the sound touches? Does a tree falling in a forest make a sound? There are work-arounds for each new situation, but because the internal model in the simulation diverges substantially from the physics of the real world, each addition may require a system rebuild rather than a system extension. Aliasing, in which, for instance, a single tree image is rendered multiple times to generate a forest, is a technique commonly used to increase effective polygon counts in image generation. We require a label for each rendered tree, or labeled points in the forest, for the descriptions of the scenery. To exploit machine learning of user models (and machine perception of modelled users), we need labeling of images, access to the image generation database, and user control actions. To date, the suppliers of simulation systems have been more concerned with polygon counts than with the accessibility of the data record (either for postprocessing or available in real time to other computational processes). As users react to the images encoded in the pixels and polygons of displays, statements about the plans they are enacting and their intentions are only sensible with regard to object and entity labels. Encourage your suppliers to support the inclusion of labels in image databases. Incorporate this into the design if you write your own. With appropriate architecture and labeling, we have an environment in which computer processes can act on "high-level percepts." If object, action, and relation labels can be read from a database, it is not necessary to execute low-level perceptual processes. The same formalism that is used for the description of human operator behavior at the cognitive goal level can be used for the implementation of computational processes in an agent-oriented system. The intentional structure gives us a description of what the user does as a template for the model of the virtual interface agents. A benefit of agent-oriented design is that new plans may be entered incrementally without interruption to the system. The virtual environment comes a substantial way out of the laboratory and toward the complexity of the real world, and is an exciting vehicle for experimental cognitive science. This places some requirements on the design of the virtual environment to include task veracity and provision for data gathering and analysis. What we have at the moment is dynamic visual behavior, maybe with motion cueing and 3-D spatial sound. The user of a research simulator is the experimenter. The person manipulating the interface and experiencing the displays is an experimental subject. Research systems should facilitate research tasks such as the recording and interpretation of data, in addition to that which is necessary to provide appropriate stimuli to the subject. The method we demonstrate is well suited to generating summaries of experimental observations. Given a data stream, a coding scheme (which could be high level), and exemplars in a training set, automated coding of data can be achieved in circumstances that now involve laborious video analysis.
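One concrete way of meeting the labeling and data-record requirements argued for above is to carry a symbolic tag and a set of trackable target points alongside each object's rendering data. The record layout below is a hypothetical illustration of that idea, not the format of any particular image generator; the runway end coordinates are taken from the trace in Fig. 10.3, and the second point is invented.

# Sketch: an object-database record that carries symbolic labels and trackable
# target points alongside its rendering data.  Field names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class WorldObject:
    label: str                          # symbolic name, e.g. "runway1"
    mesh_file: str                      # rendering geometry for the image generator
    target_points: dict = field(default_factory=dict)   # name -> (x, y, z)
    dynamic: bool = False

runway = WorldObject(
    label="runway1",
    mesh_file="runway.obj",
    target_points={"end": (-2500.0, 0.0, -3975.0),        # from the Fig. 10.3 trace
                   "threshold": (-3200.0, 0.0, -3975.0)}, # invented second point
)

# A symbolic description generator can then report, per frame, which labeled
# points are visible and where they sit relative to the pilot's viewpoint.
print(runway.label, list(runway.target_points))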
Ontology Revisited

The purpose of the ontological engineering endeavour is to identify the range of possible user interaction. The interface agent and system architecture should support that. The user may have experiences outside the design of the system builder, but the system will only sense that which is built into its model of the world. There is no cogent theory of the construction of these systems on an application scale. Ad hoc programming idioms are used. The engineering development of the internal mechanisms of agent-oriented simulation environments is in advance of theory. In knowledge-based systems such as KADS, structures for the organizational context are in the conceptual scheme, but they are employed as "fix-its" rather than as the cornerstones of systems design (Goss & Shadbolt, 1998). The least well-developed aspect of the work-in-progress reported in this chapter lies in ontology acquisition. The task analysis methods of traditional human-computer interfaces (HCI) are lacking. Most work to date concerns a single user interacting across a single interface. Teamwork and coordination are the metaphors needed in creating virtual interface agents. The intentional structure required for the provision of a virtual team member to work with a user, such as an artificial wingman working in concert with a pilot against teams of antagonists, needs to accommodate shared intentions and dynamic coordination and role allocation. Agents need to model the intentional structure of other agents in order to achieve team behaviours. Modeling of deception and competitive behavior is even more complex, involving recursive "I believe that he believes that I believe . . . " situations. The same structure is needed for the digital personal assistant in the commercial cybersphere. The statement of hierarchic plans has some similarity to those in Air-Soar (Pearson, Huffman, Willis, Laird, & Jones, 1993). Tambe et al. (1995) use the Soar architecture to control a workstation-based flight simulator through the successive application of operators within a series of subgoals and problem spaces. These are hierarchic in nature, ranging from homeostatic goal achievement (maintain altitude) to heterostatic goal achievement (ascend and descend). Tac-Air-Soar has two views: one is centred in aircraft parameters (roll, pitch, yaw, and control settings of ailerons, elevator, and throttle); the other is world-centred, where operators achieve positioning and orientation (in Cartesian space). These operators give changes in rates of climb, turn, and velocity. Because the domain is the same and Soar is an agent architecture, this is not surprising. Our work, however, has some significant differences in the fidelity of simulation and task, and in the architecture of the simulator. The agents we construct can not only undertake activities, but also report on the actions of other agents while those activities are under way, before completion.
This is the important aspect for the design of interface agents, as recently incorporated into our military simulations (Heinze, Goss, & Pearce, 1999; Heinze et al., 2002; Tidhar et al., 1999).
CONCLUSION

The virtual environment comprises a logical map, the database, the human interface, and the stimulation, principally optical and auditory rendering channels. As designers, we control the image databases and can label objects, entities, and even behaviors. Our results are preliminary and do not yet fully achieve our project vision. They do, however, offer some useful insights into the recognition of plans and the construction of plan recognizers. The process of constructing a system has alerted us to deficiencies in current theory in task analysis and in the design and construction of simulators. A focus on the data record, and on techniques for its creation, analysis, and manipulation, as part of the design of the technologies of sensate immersion opens up the possibilities of simulator systems and virtual environments as a new observational environment for human factors and psychology. The results we have achieved are a significant step toward the use of the virtual environment for the capture of procedural expertise and affirm our belief in the utility of (computational) cognitive folk psychology in constructing systems. They demonstrate a method of recognizing intentional actions while they are in the process of execution.
ACKNOWLEDGMENTS Discussions with Terry Caelli, Claude Sammut, and Mike Bain are gratefully acknowledged. This work was undertaken as collaborative research between the Defence Science and Technology Organisation (DSTO), Curtin University and the University of Melbourne. Thanks to Jane Phipps for unpublished PATHFINDER results.
REFERENCES Bischof, W., & Caelli, T. (1994). Learning structural descriptions of patterns: A new technique for conditional clustering and rule generation. Pattern Recognition, 27, 689–698. Bischof, W. F., & Caelli, T. (1997). Visual learning of patterns and objects. IEEE Transactions on Systems Man and Cybernetics, 27(6), 907–918. Boden, M. (1972). Purposive explanation in psychology: Cambridge, Harvard University Press. Dance, S., & Caelli, T. (1993). A symbolic object-oriented picture interpretation network: SOO-PIN. In H. Bunke (Ed.), Advances in structural and syntactic Pattern Recognition. Singapore: World Scientific, 530–541.
Georgeff, M. P., & Lansky, A. L. (1986). Procedural knowledge [Special issue]. Proceedings of the IEEE, 74, 1383–1398. Goss, S. (1993 February). An environment for studying human performance and machine learning. Paper presented at the Second Australasian Cognitive Science Conference, Melbourne, Australia. Goss, S., Dillon, C., & Caelli, T. (1996). Producing symbolic descriptions of generated imagery. Paper presented at the First International Meeting in Simulation Technology and Training (Simtect), Melbourne, Australia. Goss, S., & Shadbolt, N. (1998). Design considerations in agent systems. Manuscript in preparation. Heinze, C., Goss, S., Josefsson, T., Bennett, K., Waugh, S., Lloyd, I., Murray, G & Oldfield, J. (2002). Interchanging agents and humans in military simulation, AI Magazine, 23(2), 37–47. Heinze, C. and Goss, S. and Pearce, A. (1999, August) Plan Recognition in Military Simulation: Incorporating Machine Learning with Intelligent Agents. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI99) Agent Workshop on Team Behaviour and Plan Recognition. Stockholm, 53–64. Kaminka, G. A., & Tambe, M. (1997, November). Towards social comparison for failure Detection. Paper presented at the Fall Symposium on Socially Intelligent Agents, Massachusetts Institute of Technology, Cambridge, MA. Lintern, G., & Koonce, J. (1992). Visual augmentation and scene detail effects in flight training. The International Journal of Aviation Psychology, 2(4), 281–301. Lintern, G., & Walker, M. (1991). Scene content and runway breadth effects on simulated landing approaches. The International Journal of Aviation Psychology, 1(2), 117–132. Miller, C., & Goldman, R. (1997, September). “Tasking” interfaces: Associates that know who’s the boss. Proceedings of the Fourth. Human Electronic Crew Conference, Kreuth, Germany. Noy, N. F., & Hafner, C. D. (1997). The state of the art in ontology design. AI Magazine, 18(3), 53– 74. Oldfield, S., Meehan, J., & Goss, S. (1995, November). Manned simulation—Research challenges and issues for aviation psychologists. Proceedings of the Third Aviation Psychology Symposium, Sydney, Australia. Pearce, A., Caelli, T., & Bischof, W. (1996). CLARET: A new relational learning algorithm for interpretation in spatial domains. Paper presented at the Fourth International Conference on Control, Automation, Robotics, and Vision (ICARv ’96), Singapore. Pearson, J., Huffman, S. B., Willis, M. B., Laird, J. E., & R. M. Jones. (1993). A symbolic solution to intelligent real-time control. IEEE Robotics and Autonomous Systems, 11(3–4), 279–291. Rao, A. S., & Georgeff, M. P. (1991). Modeling rational agents within a BDI architecture. Proceedings of the Second International Conference on Principles of Knowledge Representation and Reasoning, San Mateo, California, 473–484. Rao, A., & Murray, G. (1994). Multi-agent mental-state recognition and its application to air-combat modelling Proceedings of the Thirteenth International Workshop on Distributed Artificial Intelligence, Seattle, Washington, 283–304. Sammut, C. (1996). Automatic construction of reactive control systems using symbolic machine learning. The Knowledge Engineering Review, 11(1), 27–42. Sammut, C. H. Kedziers, S., & Michie, D. (1992). Learning to fly. At the Proceedings of the Ninth International Conference on Machine Learning, Aberdeen, Scotland. Schvaneveldt, R. (1990). Pathfinder associative networks. Norwood, NJ: Ablex. Schvaneveldt, R. W., Durso, F. T., Goldsmith, T. E., Breen, T. 
J., & Cooke, N. M. (1985). Measuring the structure of expertise. International Journal of Man–Machine Studies, 2(3), 699–728. Tambe, M. (1997, November). Towards flexible teamwork. Proceedings of the Fall Symposium on Socially Intelligent Agents, Massachusetts Institute of Technology, Cambridge, MA.
Tambe, M., Johnson, W. L., Jones, R. M., Koss, F., Laird, J. E., Rosenbloom, P. S., & Schwamb, K. (1995, Spring). Intelligent agents for interactive simulation environments. AI Magazine, 16,(1), 15–39. Tidhar, G., & Busetta, P. (1996). Mental state recognition for air mission modeling. Paper presented at the TTCP X-4 Workshop in Situation Assessment, Naval Research Laboratory, Washington DC. Tidhar, G., Heinze, C., Goss, S., Murray, G., Appla, D. and Lloyd, I. (1999). Using Intelligent Agents in Military Simulations or “Using Agents Intelligently.” Proceedings of the Eleventh Innovative Applications of Artificial Intelligence Conference, American Association of Artificial Intelligence (AAAI), Deployed Applications paper, 829–836.
11 Fidelity of Disparity-Based Stereopsis Ian P. Howard Centre for Vision Research York University
“Stereoscopic vision,” or stereopsis, refers to the visual perception of the 3-D structure of the world. Monocular cues to depth include focusing, perspective, image overlap, shading, atmospheric opacities, and motion parallax. There are two binocular cues to depth—the vergence position of the eyes and binocular disparity. Binocular disparity is the difference in the positions of the images in the two eyes due to the eyes being about 6.5 cm apart. Under the best conditions, we can detect depth between two objects when the binocular disparity is only 2–6 arcsec. A three-dimensional scene can be created in a stereoscopic virtual reality system by presenting a distinct two-dimensional image to each eye with appropriate disparities between the images. The proper rendering of depth in a stereoscopic system requires a knowledge of the various types of disparity and of the distortions that can occur. In this chapter, I describe different types of binocular disparity and the role each type plays in stereoscopic vision and in the control of vergence movements of the eyes. I pay particular attention to the role of vertical disparities in stereopsis because until recently it was believed that only horizontal disparities are important. Differences in the way horizontal and vertical disparities are processed are related to the different roles they play in stereopsis.
I describe various types of distortion of three-dimensional vision that are likely to occur in a stereoscopic system. One type of distortion occurs when inappropriate disparities are introduced into a stereoscopic display. In a second type, the inclination of an isolated surface, such as a ground plane, tends to be underestimated, but is more correctly perceived when a second surface inclined at a different angle is introduced into the scene. Also, a vertical surface appears to slant in depth when adjacent to a slanted surface, an effect known as depth contrast. An important attribute of stereopsis is our ability to see a textured surface through a transparent textured surface. To do this, we must segregate two overlapping sets of texture elements, each with a distinct pattern of disparities. I describe several properties of this disparity segregation process. I start with a discussion of disparity between a single pair of images.
TYPES OF BINOCULAR DISPARITY Point Disparities The role of horizontal disparity in stereoscopic vision has been known since Wheatstone invented the stereoscope in 1836. The horizontal disparity of the binocular images of a point in space is their horizontal separation in retinal coordinates. The locus of points in space with zero horizontal disparity is, ideally, a vertical cylinder passing through the point of convergence of the visual axes and the centers of the two eyes, as shown in Fig. 11.1. The horizontal disparity of a point is a linear function of its distance from the locus of zero disparity, as shown in Fig. 11.2a. Thus, the difference in horizontal disparity between two points indicates the distance in depth between the points but not the absolute distance of either point. The horizontal disparity (η) between two points is proportional to their separation in depth (d) and to the interocular distance (a), and inversely proportional to the square of the absolute distance of
FIG. 11.1. The theoretical locus of zero horizontal disparity is a cylinder through the two eyes and the fixation point.
FIG. 11.2. Disparity as a function of distance from the plane of convergence. (a) Horizontal disparity increases with distance from the plane of convergence. (b) Along an oblique line of sight, vertical disparity decreases with increasing distance, and horizontal disparity increases with increasing distance from the plane of convergence. Horizontal disparity changes more rapidly than vertical disparity.
the points from the observer (D):

η ≈ ad/D²  (in radians)
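A short numerical check of this relation, with assumed values for the interocular separation and the viewing distances, shows why a misjudged distance flattens perceived depth.

import math

# Disparity between two points separated in depth:  eta ~ a*d / D^2  (radians).
# Numbers below are assumed for illustration (a = 6.5 cm, distances in cm).
def disparity(a_cm, d_cm, D_cm):
    return a_cm * d_cm / D_cm**2            # radians

def depth_from_disparity(eta, a_cm, D_cm):
    return eta * D_cm**2 / a_cm             # invert the same approximation

a, d, D = 6.5, 10.0, 100.0                  # 10 cm of depth at 1 m
eta = disparity(a, d, D)
print(math.degrees(eta) * 3600)             # disparity in arcsec (about 1,340)

# If the viewer scales disparity with an underestimated distance (say 60 cm),
# the recovered depth shrinks, i.e. the scene looks flattened.
print(depth_from_disparity(eta, a, 60.0))   # -> 3.6 cm instead of 10 cm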
For correct perception, horizontal disparities must therefore be scaled by the square of distance. Objects depicted in stereoscopic systems often appear flattened, like cardboard cutouts. This can occur when the retinal images are the same size and have the same disparity as those created by the actual object. The effect occurs because the size of the retinal image of an object varies inversely with distance, whereas disparities created by an object vary inversely with distance squared. If the visual system uses an incorrect estimate of distance, the perceived ratio of size to depth of the object will be incorrect. Thus, if distance is underestimated, objects and surfaces should look flattened (Johnston, 1991). In most stereoscopic systems, distance, as indicated by accommodation and convergence on the plane of the stereoscopic images, is smaller than that in the original scene. Therefore, objects look flattened. The vertical disparity of the images of a point in space is their vertical separation in retinal coordinates. The locus of zero vertical disparity is, ideally, the horizontal plane of regard and the median plane of the head, as in Fig. 11.3. The vertical disparity of a point is a joint function of its eccentricity with respect to the median plane of the head and the plane of regard, and of its distance from the point midway between the eyes, as shown in Figure 11.4. The angles λ L and λ R subtended in the left and right eyes by a vertical line of length z, on the horizontal plane of regard at an eccentricity of x cm to the right
FIG. 11.3. The locus of zero vertical disparity.
FIG. 11.4. Vertical line OP is nearer to the right eye than to the left eye. Therefore, the right-eye image is larger than the left-eye image. In particular, the disparity of P depends on its height above the plane of regard (z), its distance from the median plane (x), and its distance from the eye (y). Points on the plane of regard or the median plane have zero vertical disparity because these points are an equal distance from the two eyes.
of the median plane, and distance y cm from the interocular axis, are given by

λL = arctan( z / √[(x + a/2)² + y²] )    λR = arctan( z / √[(x − a/2)² + y²] )

The locus of points with zero horizontal disparity and zero vertical disparity is known as the space horopter. Ideally, with symmetrical convergence of the eyes, the space horopter is a circle (the Vieth-Müller circle) through the fixation point and the centres of the two eyes, and a vertical line through the fixation point, as shown in Fig. 11.5. Points with zero disparity appear fused and single. In practice, the horizontal (longitudinal) component of the locus of single vision may not be perfectly circular and the vertical component is inclined top away. Also, the
practical horopter has a certain thickness because the visual system tolerates a certain degree of disparity before diplopia occurs. These issues are discussed in some detail in Howard and Rogers (2002). A change in horizontal disparity over the whole binocular field of view evokes vergence but not an impression of changing depth (Erkelens & Collewijn, 1985).
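Returning to the geometry of Fig. 11.4, the expressions for λL and λR can be evaluated directly. The sketch below uses assumed values for x, y, z, and a; it simply confirms that a vertical line off the median plane subtends a larger angle in the nearer eye, the difference being the vertical disparity.

import math

# Angles subtended by a vertical line of height z, at lateral offset x and
# distance y from the interocular axis (all in cm); a is the interocular
# distance.  Values are assumed for illustration.
def subtended(z, x, y, a, eye):           # eye = +1 for left, -1 for right
    return math.atan(z / math.hypot(x + eye * a / 2.0, y))

z, x, y, a = 10.0, 20.0, 50.0, 6.5
lam_L = subtended(z, x, y, a, +1)
lam_R = subtended(z, x, y, a, -1)
print(math.degrees(lam_L), math.degrees(lam_R))   # about 10.3 and 10.7 deg
# The line is nearer the right eye, so lam_R is the larger angle; the difference
# (roughly half a degree here) is the vertical disparity, which vanishes on the
# median plane (x = 0).
print(math.degrees(lam_R - lam_L))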
Patterns of Disparity Produced by Plane Surfaces Consider patterns of disparity over a textured plane surface. The size and direction of disparity at each location can be represented by a vector. Vectors on flat converged retinas can be constructed by joining corresponding points on superimposed images, as in Fig. 11.6. They resemble those produced on spherical retinas, which are difficult to draw. There are eight types of disparity in a flat surface. Horizontal Displacement Disparity A small horizontal disparity applied to one region of a surface creates a step in depth between the region and the zero-disparity surround. Horizontal disparity also evokes horizontal vergence.
FIG. 11.5. The horizontal circle and the vertical line represent the theoretical space horopter, or locus of points that have zero horizontal and zero vertical disparity for a given angle of convergence of the eyes. Points lying on the horopter should appear single. In practice, the vertical component of the locus of single vision is inclined top away and the horizontal component may not be circular. Also, the locus of single vision has a certain thickness, as shown in the diagram.
(c) FIG. 11.6. Patterns of disparities created by frontal, inclined and slanted surfaces on converged flat retinas. (Adapted from Howard & Rogers 1995) (a) A frontal surface produces horizontal size disparities of opposite sign on the left and right halves of the median plane. The upper and lower halves contain vertical-shear disparities of opposite sign. Disparities are small near the center. (b) A surface slanted about a vertical axis creates an overall horizontal-size disparity and vertical-shear disparities similar to those created by a frontal surface. (c) A surface inclined top away creates an overall horizontal-shear disparity and vertical-shear disparities similar to those created by a frontal surface.
Vertical Displacement Disparity

A vertical displacement disparity has no effect on perceived depth except that a large disparity creates diplopia and destroys depth impressions. Vertical displacement disparity is a stimulus for vertical vergence.

Horizontal-Size Disparity

Horizontal magnification of one eye's image (R) creates a horizontal gradient of horizontal disparity (Fig. 11.6b). This causes a frontal surface to appear slanted in depth about a vertical axis by an angle given by

θ = arctan[ 2D(R − 1) / (a(R + 1)) ]
where a is the interocular distance and D the viewing distance. On a frontal surface, horizontal-size disparities of opposite sign occur on opposite sides of the median plane (Fig. 11.6a).

Vertical-Size Disparity

Vertical magnification of one eye's image of a surface creates a vertical gradient of vertical disparity. This causes a frontal surface to appear slanted about a vertical axis in a direction opposite to the slant created by horizontal magnification of the image. Ogle (1938) called this the induced effect. The induced effect can be explained by stating that the perception of slant about a vertical axis depends on size-deformation disparity, or the difference between horizontal-size disparity and vertical-size disparity. Vertical-size disparity over the binocular field arises only from meridional aniseikonia, or unequal optical magnification of the images along the vertical meridian. A frontal surface contains opposite gradients of vertical-size disparity on either side of the median plane (Fig. 11.6a).

Overall-Size Disparity

Overall magnification of one eye's image can be regarded as equal horizontal and vertical magnification. The depth effects produced by the two components tend to cancel, leaving little if any impression of depth. Another way to explain this is to say that overall-size disparity has no component of size-deformation disparity. Overall-size disparity arises only from aniseikonia, or unequal but uniform optical magnification of the images in the two eyes.

Horizontal-Shear Disparity

Horizontal shear of one eye's image relative to the other eye's image creates a vertical gradient of horizontal disparity. It occurs on a surface inclined about a horizontal axis and therefore creates an impression of inclination, i, when displayed
in stereoscopic systems (Fig. 11.6c). Theoretically,

i = arctan[ 2D tan(θ/2) / a ]

For small disparities,

i = arctan( Dθ / a )  (θ in radians)
where θ is the angle of shear disparity, a the interocular distance, and D the viewing distance.

Vertical-Shear and Rotation Disparity

Vertical shear of one eye's image creates a horizontal gradient of vertical disparity. On a frontal surface, horizontal gradients of vertical-shear disparity occur with opposite sign above and below the plane of regard (Fig. 11.6a). These gradients were previously described as gradients of vertical-size disparity of opposite sign on either side of the median plane. Homogeneous vertical-shear disparity does not occur on natural surfaces. Nevertheless, when imposed on images in a stereoscope, it causes a surface to appear inclined in depth about a horizontal axis in a direction opposite to that created by horizontal shear of the images (Cagenello & Rogers, 1990; Howard & Kaneko, 1994; Rogers, 1992). This is an induced effect in the domain of shear disparity. Inclinations of a random-dot surface produced by the two types of shear disparity are shown in Fig. 11.7a. The curves have been superimposed but, in fact, they run in opposite directions. It can be seen that the inclinations produced by the two types of disparity are similar in magnitude up to a disparity of about 3 deg. They are less than the theoretical values, which is most likely due to the presence of conflicting perspective information, which indicates that the surface remains frontal. Beyond that value, inclination produced by vertical-shear disparity falls off. This is due to the greater effect of conflicting perspective on depth created by vertical disparity than on that created by horizontal disparity (Banks & Backus, 1998). It can be seen from Fig. 11.7b that rotation of one image relative to the other creates little or no inclination. A rotation disparity is a combination of horizontal-shear and vertical-shear disparities of equal sign, as illustrated in Fig. 11.8, and their opposed effects cancel. Figure 11.7b also shows that a combination of horizontal- and vertical-shear disparities of opposite sign (deformation) creates more inclination than either component alone. The above results may be summarized by stating that the perception of inclination about a horizontal axis depends on shear-deformation disparity in the same way that slant about a vertical axis depends on size-deformation disparity. Figure 11.8 depicts the types of disparity on flat surfaces that can be created in a stereoscope. Displacement disparities, like simple point disparities, are zero-order spatial derivatives of disparity. Magnification and shear disparities are disparity gradients, or first spatial derivatives of displacement disparity.
FIG. 11.7. Perceived inclination of an 85-deg × 65-deg random-dot surface as a function of (a) horizontal-shear and vertical-shear disparity and (b) rotation and deformation disparity. Inclination produced by vertical-shear disparity is reversed for comparison with that produced by horizontal-shear disparity. (Adapted from Howard & Kaneko, 1994).
FIG. 11.8. Types of disparities and the depth effects they create:
Horizontal disparities: horizontal disparity (depth step), horizontal-size disparity (slant about a vertical axis), horizontal-shear disparity (inclination about a horizontal axis).
Vertical disparities: vertical disparity (no depth), vertical-size disparity (opposite slant), vertical-shear disparity (opposite inclination).
Combined disparities: overall-size disparity (little depth), rotation disparity (little or no depth).
FIG. 11.9. Stimuli for vergence eye movements: horizontal disparity drives horizontal vergence; vertical disparity drives vertical vergence; rotation disparity drives cyclovergence, with the vertical-shear component evoking cyclovergence and the horizontal-shear component evoking little cyclovergence.
FUNCTIONS OF VERTICAL DISPARITY

It has been generally assumed that depth is coded only by horizontal disparity. We will now see that vertical-size and vertical-shear disparities have several important functions in the perception of depth.
Control of Vergence

Overall elevation of one image relative to the other (vertical translation disparity) arises only when the eyes are out of vertical alignment and induces a corrective vertical vergence (Howard, Allison, & Zacher, 1997). Rotation of one image relative to the other occurs only when the eyes are rotationally misaligned, as in cyclophoria, and induces a corrective cyclovergence, as illustrated in Fig. 11.9 (Howard & Zacher, 1991). Horizontal-shear disparity is used for coding inclination and should therefore not be cancelled by cyclovergence. Vertical-shear disparity over the whole binocular field occurs only when the eyes are misaligned. It is therefore the vertical-disparity component of rotation disparity that should evoke cyclovergence. Rogers and Howard (1991) confirmed that cyclovergence is evoked much more strongly by vertical-shear disparity than by horizontal-shear disparity.

Global Vertical-Size Disparity and Aniseikonia

Koenderink and van Doorn (1976) proposed that the perceived slant of a surface depends on the deformation component of disparity. For size disparity, deformation disparity corresponds to the difference between horizontal-size disparity and vertical-size disparity. The deformation theory accounts for Ogle's induced effect, in which vertical-size disparity produces an opposite slant to that produced by horizontal-size disparity. With overall magnification of one image, the two opposed effects cancel to produce little or no slant, depending on the relative weights that individuals assign to the two types of disparity. The use of global deformation disparity makes the stereoscopic system immune to effects of differences in the sizes of the binocular images (aniseikonia), because an overall magnification of one image produces no deformation disparity. This mechanism requires that an average value of vertical-size disparity from the whole binocular field be applied to each locally detected horizontal disparity. The deformation-disparity mechanism also renders the visual system immune to effects of slight differences in optical magnification in a stereoscopic virtual reality system.

Gradients of Vertical-Size Disparity as a Cue to Absolute Distance

Vertical-size disparities arise locally because a vertical line away from the median plane of the head is nearer one eye than the other and thus projects a larger image in one eye. For a given viewing distance, the vertical disparity of a point on a frontal surface increases with increasing eccentricity from the median plane and with increasing distance from the horizontal plane of regard, as shown in Fig. 11.6a. For a given oblique line of sight of one eye, vertical disparity decreases with increasing viewing distance and is zero at infinity.
FIG. 11.10. Effect of vergence changes on disparity. (a) On flat retinas, disparity gradients vary with vergence. (b) On spherical retinas, disparity gradients do not vary with vergence.
In 1970, I argued that if eccentricity is known, vertical disparities "could form the basis for judgments of absolute distance." Mayhew and Longuet-Higgins (1982) developed a computational theory of how vertical disparities may provide an estimate of the distance of a surface and of the angle of eccentric gaze. At a given distance from the median plane, the vertical disparities can be expressed as a vertical-size ratio, defined as the size of the image of a vertical line element in one eye divided by the size of the image of the element in the other eye. With flat retinas, the gradient of vertical-size disparity produced by a frontal surface varies with the angle of convergence of the retinas, as shown in Fig. 11.10a. When flat retinas are coplanar, there are no gradients of vertical-size disparity at any distance. With spherical retinas, the gradient of vertical-size disparity produced by a frontal surface does not vary with changes in vergence, as shown in Fig. 11.10b. The gradient changes only with viewing distance and can therefore be used to indicate viewing distance (Howard, 1970). For spherical retinas, the vergence angle, and hence the viewing distance, could also be derived from nonvisual information arising in the extraocular muscles. Cumming, Johnston, and Parker (1991) failed to find an effect of vertical-size disparity on apparent distance. However, they used a small display in which vertical disparities were probably too small to be detected. Rogers and Bradshaw (1993) used a large display and found that, when the pattern of vertical-size disparities was appropriate to a surface at infinity, the surface appeared to lie at a greater distance than when the vertical-size disparities were appropriate to a surface at 28 cm.
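A minimal sketch of this geometry (added for illustration; the 0.065-m interocular separation is an assumed typical value, not a figure from the chapter) computes the ratio of a point's distances from the two eyes, which is the quantity the vertical-size ratio reflects, and shows it falling toward 1 as viewing distance increases. That shrinkage is what makes the gradient of vertical-size disparity usable as a cue to absolute distance.

```python
import math

def vertical_size_ratio(eccentricity_m, distance_m, interocular_m=0.065):
    """Ratio of a point's distance from the far eye to its distance from the
    near eye, for a point `eccentricity_m` to the right of the median plane
    and `distance_m` straight ahead of the interocular axis. A vertical line
    element at that location projects a larger image in the nearer eye in
    roughly this proportion."""
    half_a = interocular_m / 2.0
    d_far = math.hypot(eccentricity_m + half_a, distance_m)   # left eye is farther for a point on the right
    d_near = math.hypot(eccentricity_m - half_a, distance_m)  # right eye is nearer
    return d_far / d_near

if __name__ == "__main__":
    # The ratio approaches 1.0 (no vertical-size disparity) as distance grows.
    for d in (0.28, 0.5, 1.0, 2.0, 10.0):
        print(d, round(vertical_size_ratio(1.0, d), 4))
```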
Vertical-Size Disparities and Distance Scaling of Horizontal Disparity

Because the horizontal disparity produced by a fixed depth interval is inversely proportional to the square of viewing distance, horizontal disparities must be scaled by distance squared. Estimates of distance may be based on vergence and monocular cues to distance, such as perspective. However, distance information for an extended surface could also be provided by the pattern of vertical-size disparity. Rogers and Bradshaw (1992, 1993) demonstrated that gradients of vertical-size disparity can be used to scale depth defined by horizontal disparity. They created a large surface with depth corrugations defined by horizontal disparity. When vertical-size disparity was zero, as it is in a surface at infinity, the surface appeared far away and the depth corrugations appeared deep. When the pattern of vertical-size disparities was appropriate to a surface at 28 cm from the observer, the surface appeared near and the depth corrugations appeared only about half as deep. This depth-scaling effect worked only for surfaces larger than 10 deg in diameter. With smaller surfaces, the vertical disparities were too small to be detected and vergence was used to scale distance. In a stereoscope, the disparity between a given pair of images decreases in proportion to the distance of the screen on which the images are presented rather than in proportion to the square of the distance. This can cause objects to appear compressed in depth.
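In small-angle form this scaling requirement can be written η ≈ aΔd/D², where η is the horizontal disparity produced by a depth interval Δd at viewing distance D with interocular distance a. The sketch below (an illustration added here, not taken from the chapter; the 0.065-m interocular distance is an assumed value) shows both directions of the computation and how a misestimate of D propagates quadratically into recovered depth.

```python
def disparity_from_depth(depth_interval_m, distance_m, interocular_m=0.065):
    """Approximate horizontal disparity (radians) produced by a depth interval
    at a given viewing distance: eta ~= a * delta_d / D**2.
    Small-angle approximation, valid when delta_d << D."""
    return interocular_m * depth_interval_m / distance_m ** 2

def depth_from_disparity(disparity_rad, distance_m, interocular_m=0.065):
    """Invert the approximation: the same disparity signals a depth interval
    that grows with the square of the assumed viewing distance."""
    return disparity_rad * distance_m ** 2 / interocular_m

if __name__ == "__main__":
    eta = disparity_from_depth(0.01, 1.0)   # 1 cm of depth at 1 m
    print(eta)                              # ~6.5e-4 rad (about 2.2 arcmin)
    # Overestimating the viewing distance as 2 m quadruples the inferred depth.
    print(depth_from_disparity(eta, 2.0))   # ~0.04 m instead of 0.01 m
```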
Vertical-Size Disparity and Surface Slant

Horizontal disparities are zero for a vertical surface contained in the horizontal horopter and are constant on any vertical surface that is concentric with the horizontal horopter. The pattern of horizontal disparity on a surface is ambiguous because it varies with the slant of the surface with respect to the frontal plane, with the distance of the surface from the horopter, and with eccentricity. For example, a frontal surface in the midline is also the tangent to the horopter and the normal to the cyclopean axis, so that the same pattern of disparity signifies the same angle of slant with respect to each plane. At an eccentric angle θ, a frontal surface is at an angle of θ to the orthogonal to the cyclopean axis and at an angle of 2θ to the tangent to the horopter (see Fig. 11.11). The slant signified by a given pattern of horizontal disparity can be known only if the angle of eccentricity is known. There are three ways to resolve this ambiguity. First, one can use other cues to depth, such as perspective, that are not affected by eccentricity. Second, one could derive information about the direction of a surface from extraocular muscles that control vergence and version. Third, one can use the pattern of vertical-size disparity. The horizontal disparity between the ends of a line can be denoted by the length of the image in one eye divided by the length of the image in the other eye, or the horizontal-size ratio.
FIG. 11.11. Principal planes through an eccentric point. Angle EPD is 2θ. That is, the tangent to the Vieth-Müller circle makes an angle of 2θ to the frontal plane. Also, the orthogonal to the cyclopean axis makes an angle of θ to both the frontal plane and the tangent to the Vieth-Müller circle.
At any location on a surface, the relationship between the horizontal-size ratio and the vertical-size ratio specifies the slant of the surface with respect to each of the three principal planes. For example, for a vertical surface patch normal to the cyclopean axis, the vertical-size ratio equals the horizontal-size ratio because both are affected in the same way by eccentricity and distance. The loci of constant vertical-size ratio of a vertical line and of the horizontal-size ratio of a horizontal line orthogonal to the cyclopean axis are a series of circles in the two quadrants of the visual field. One circle is shown in Fig. 11.12. For any point P on the locus, let r be the ratio of the distances from the two eyes (the vertical-size ratio) and a be the interocular distance. When P is aligned with the two eyes, at P' or P",

r = (a + x1)/x1 = (a − x2)/x2

from which

x1 = a/(r − 1)   and   x2 = a/(r + 1)

The radius of the locus of constant vertical-size ratio is therefore

Radius = (x1 + x2)/2 = ar/(r² − 1)
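The construction can be checked numerically. The sketch below (my addition, not part of the chapter; the 0.065-m interocular distance is an assumed value) returns the two crossing points and the radius for a chosen size ratio and verifies the relations given above.

```python
def locus_of_constant_ratio(r, interocular_m=0.065):
    """For a constant vertical-size ratio r > 1, return the two points where the
    locus crosses the interocular axis (distances x1 and x2 from the nearer eye,
    outside and between the eyes respectively) and the radius of the circle."""
    a = interocular_m
    x1 = a / (r - 1.0)               # crossing point beyond the nearer eye
    x2 = a / (r + 1.0)               # crossing point between the eyes
    radius = a * r / (r ** 2 - 1.0)  # equals (x1 + x2) / 2
    return x1, x2, radius

if __name__ == "__main__":
    a = 0.065
    x1, x2, radius = locus_of_constant_ratio(1.05)
    print(x1, x2, radius)
    # Sanity checks from the definition of r:
    assert abs((a + x1) / x1 - 1.05) < 1e-9
    assert abs((a - x2) / x2 - 1.05) < 1e-9
    assert abs((x1 + x2) / 2 - radius) < 1e-12
```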
FIG. 11.12. A locus of equal vertical-size ratio of a vertical line and of equal horizontal-size ratio of a line orthogonal to the cyclopean axis.
FIG. 11.13. Loci of constant vertical- and horizontal-size ratios for surface patches orthogonal to cyclopean lines of sight. The numbers on OP are the vertical-size ratios. The eyes are at O. The Vieth-Müller circles are loci of constant horizontal disparity.
From this, one can construct a family of loci, as in Fig. 11.13. The eyes are at point O, and the loci are shown only in the binocular visual field because they have no meaning outside this field. The size ratios decrease with increasing distance along any cyclopean line of sight, such as OP. Also, they increase with increasing eccentricity along any frontal line, such as line AB. The loci of constant size
ratios intersect all the Vieth-Müller circles at right angles. A person could set a surface to be normal to the cyclopean line by simply rotating it until the vertical- and horizontal-size ratios are equal. The magnitude of the ratios specifies the eccentricity of the surface at that distance. To take another example, if the horizontal-size ratio is one, the surface patch must be tangential to the horizontal horopter, and its eccentricity, θ, is indicated by the vertical-size ratio. Its angle to the frontal plane is 2θ, and its angle to the normal to the cyclopean line of sight is θ. On a frontal surface, the horizontal-size ratio equals the vertical-size ratio squared. People can use this relationship to distinguish between a frontal surface and a surface curved about a vertical axis, as long as the surface is large (Rogers & Bradshaw, 1995). Horizontal disparities fully specify the curvature of a surface about a horizontal axis when scaled for distance. There is therefore no need to use vertical disparities in judging curvature in depth about a horizontal axis.

Vertical-Shear Disparity and Image Misalignment

Horizontal and vertical disparities induce horizontal and vertical vergences, which keep the eyes properly converged. Rotation disparity, and particularly the vertical-shear component, induces cyclovergence, which keeps the eyes torsionally aligned. But the gain of cyclovergence is not sufficient to maintain exact alignment of the eyes, and some people have a weak response to excyclodisparity (Howard & Kaneko, 1994). Further, the eyes fail to remain torsionally aligned in oblique gaze (Somani, Desouza, Tweed, & Vilis, 1998). The use of shear-deformation disparity protects the visual system from the effects of image misalignment because image misalignment produces rotation disparity, which contains no deformation. Thus, by using the difference between vertical- and horizontal-shear disparities (deformation disparity), the visual system insulates depth perception from effects of image misalignment. Cyclovergence and the use of shear deformation protect the visual system from the effect of small rotational misalignments of the images in a stereoscopic virtual reality system. However, as we shall see, they do not protect it against the effects of rotational misalignment of part of a stereoscopic display.
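The reason whole-image rotation is harmless while a relative shear is not can be seen in a few lines. The sketch below (added for illustration; it is not a model from the chapter) measures the horizontal- and vertical-shear components of an arbitrary 2 x 2 transform applied to one eye's image, using the convention that each component is the tilt of the corresponding image meridian. A rigid rotation tilts both meridians by the same angle, so the shear-deformation difference is zero, whereas a pure horizontal shear leaves a nonzero deformation that signals inclination.

```python
import math

def meridian_tilts(transform):
    """Counterclockwise tilt (deg) of the vertical and horizontal meridians of
    one eye's image after applying a 2x2 transform to that image. Following
    the convention in the text, the tilt of the vertical meridian is the
    horizontal-shear component and the tilt of the horizontal meridian is the
    vertical-shear component of the interocular transform."""
    (a, b), (c, d) = transform          # row-major: x' = a*x + b*y, y' = c*x + d*y
    vx, vy = b, d                       # image of the vertical unit vector (0, 1)
    hx, hy = a, c                       # image of the horizontal unit vector (1, 0)
    h_shear = math.degrees(math.atan2(-vx, vy))  # counterclockwise tilt of the vertical meridian
    v_shear = math.degrees(math.atan2(hy, hx))   # counterclockwise tilt of the horizontal meridian
    return h_shear, v_shear

def shear_deformation(transform):
    """Difference between the horizontal- and vertical-shear components."""
    h, v = meridian_tilts(transform)
    return h - v

if __name__ == "__main__":
    phi = math.radians(2.0)
    rotation = ((math.cos(phi), -math.sin(phi)),
                (math.sin(phi),  math.cos(phi)))   # one image rotated by 2 deg
    horizontal_shear = ((1.0, math.tan(phi)),
                        (0.0, 1.0))                # one image sheared by 2 deg
    print(meridian_tilts(rotation))          # (2.0, 2.0): equal shear components
    print(shear_deformation(rotation))       # ~0: rotational misalignment carries no deformation
    print(shear_deformation(horizontal_shear))  # ~-2 deg: a relative shear does, and signals inclination
```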
LIMITATIONS AND DISTORTIONS OF STEREOPSIS

We must be able to detect local variations in horizontal disparity because we can readily perceive local variations in relative depth. We do not need to detect local variations in vertical disparity for the following reasons:

1. Vertical disparity varies over the visual field more slowly than horizontal disparity. It changes suddenly only where there is a steep step in depth in the
periphery of the visual field. But, at such locations, it is always accompanied by a much larger change in horizontal disparity, which is adequate to code relative depth.

2. Large steps in vertical disparity would not be detected because the larger accompanying horizontal disparity would take the images beyond the range of disparity detectors.

3. None of the uses that I have described for vertical disparity require the detection of local variations in disparity. One need detect only an average value or a mean gradient of vertical disparity over a large area.

The following evidence shows that while horizontal-shear disparity is detected in each location of the visual field, vertical disparity is extracted as an average value over a large area of the binocular field.
Spatial Resolution of Disparity

The spatial resolution of the disparity system is indicated by the highest spatial frequency of disparity modulation of a random-dot display that creates an impression of an undulating surface. For horizontal disparity, depth sensitivity is maximum at a spatial frequency of about 0.3 c/deg and falls to zero at about 4 c/deg (Rogers & Graham, 1982; Tyler, 1974). Finer modulations of disparity are averaged by the visual system (Parker & Yang, 1989; Stevenson, Cormack, & Schor, 1991). We know that vertical-size disparity is not averaged over the whole binocular field because Rogers and Koenderink (1986) obtained induced effects produced by vertical-size disparities of opposite sign simultaneously in right and left halves of a textured display. On the other hand, there must be a considerable degree of averaging of vertical disparity, because Stenton, Frisby, and Mayhew (1984) found that vertical-size disparity in one part of a display produced slant of the whole display. Kaneko and Howard (1997a) found that depth is not seen in spatial modulations of vertical-size disparity higher than about 0.04 c/deg. Thus, vertical-size disparities are averaged over an area about 20 deg in diameter. An average value is sufficient because vertical disparities change only gradually over surfaces. Sensitivity to spatial modulations of vertical-shear disparity has not been determined by this method, but the evidence reviewed in what follows suggests that vertical-shear disparity is averaged over the whole binocular field. I recently obtained further support for the idea that a global estimate of vertical disparity is applied to each locally detected horizontal disparity. Two surfaces with opposite horizontal-size disparities appeared as two intersecting surfaces slanting in opposite directions. When the vertical-size disparity of both surfaces was varied by the same amount, the surfaces appeared to rock together about a vertical axis but the relative angle between them remained constant.
FIG. 11.14. Perceived inclination of a random-dot surface as a function of the diameter of the surface (60, 30, or 10 deg), for (a) horizontal-shear disparity, (b) vertical-shear disparity, and (c) rotation disparity. (Adapted from Howard & Kaneko, 1994.)
Effects of Stimulus Size

Figure 11.14a shows that inclination produced by horizontal-shear disparity of a random-dot surface is reduced only slightly as the size of the display is reduced from 60 to 10 deg (Howard & Kaneko, 1994). Figure 11.14b shows that inclination produced by vertical-shear disparity declines rapidly as display size is reduced. This is because the vertical-disparity signal is weak when summed over a small area. Figure 11.14c shows that inclination produced by rotation disparity increases as the size of the display is reduced. As the mean vertical-disparity component weakens, the horizontal-disparity component produces inclination.
Interactions Between Adjacent Displays

Three types of depth interaction occur between stimuli in distinct depth planes.

Depth Enhancement

Slant, or inclination of a surface in depth, is more accurately and rapidly perceived in the presence of a second surface in another depth plane. This is depth enhancement. For example, a change in horizontal disparity applied to a surface (a zero-order spatial derivative) is perceived as a change in depth only when a stationary reference object is present (Erkelens & Collewijn, 1985; Regan, Erkelens, & Collewijn, 1986). The boundary between the two stimuli introduces a higher-order spatial derivative of disparity. A change in disparity over the whole visual field is most likely due to changes in vergence and is best ignored as a signal for depth. Similarly, the inclination, or slant, of a large surface produced by a first-order spatial derivative of disparity is underestimated and takes a long time to perceive in the absence of a differently oriented reference surface, which provides a second-order depth discontinuity (Gillam, Flagg, & Finley, 1984; Gogel, 1965; van Ee & Erkelens, 1996).

Depth Contrast

The second type of depth interaction is apparent depth created in a frontal-plane display by a superimposed or adjacent display containing a step or gradient of depth. Werner (1938) called this depth contrast. There is contradictory evidence about the occurrence of depth contrast created by steps of disparity. Anstis (1975) observed depth contrast between a disc surrounded by a surface with uncrossed disparity and a coplanar disc surrounded by a surface with crossed disparity. However, Brookes and Stevens (1989) could not see this effect. Graham and Rogers (1982) observed that frontal planes on either side of a horizontal step of disparity appeared inclined in depth and that a frontal surface flanked above and below by inclined induction surfaces appeared inclined in the opposite direction (see also Brookes & Stevens, 1989). Anstis and colleagues (1978) reported the depth analogue of the Craik-O'Brien-Cornsweet illusion. Stimulus duration is an important factor in depth contrast. Kumar and Glaser (1993) used a pair of test dots within a surrounding frame and found that the largest contrast occurred with the shortest exposure of 10 msec, as Werner (1937) had originally noted. With exposures of several seconds, depth contrast almost completely disappeared. Depth enhancement and depth contrast may be two aspects of the same processes, namely, a tendency to underestimate the slant or inclination of a surface relative to the frontal plane (in a headcentric coordinate system) coupled with accurate registration of the relative disparity between two surfaces. According to this theory, the greater the underestimation of depth in a slanted or inclined induction surface, the greater should be the depth contrast in a frontal surface.
The following evidence supports this theory. We found that the large inclination produced by horizontal-shear disparity induced little inclination contrast, whereas the smaller inclination produced by rotation disparity induced a significant degree of contrast in an adjacent or superimposed display with zero disparity. In both cases, more contrast was induced by ceiling surfaces, in which inclination was underestimated, than by floor surfaces, in which inclination was more accurately perceived (Pierce et al., 1998). Werner (1938) noticed that contrast was strongest when depth in the induction stimulus was not perceived. Also, depth contrast is strong during the initial period of stimulus exposure, when depth of the induction stimulus is not fully apparent (Kumar & Glaser, 1993). Van Ee and Erkelens (1995) obtained depth contrast both from shear disparity and from size disparity when the inclination or slant of the induction surface was not evident. Finally, the visual system is less sensitive to slant about a vertical axis, which produces strong contrast, than to inclination about a horizontal axis, which produces weak or no contrast (Cagenello & Rogers, 1993; Gillam et al., 1984; Mitchison & McKee, 1990; Rogers & Graham, 1983). I propose the following explanation of the above results. Relative disparity signals are processed even though the inclinations or slants of surfaces in bodycentric coordinates are weakly registered. Relative disparity indicates only that two surfaces have different inclinations or slants. In the absence of information to the contrary, one tends to perceive surfaces as symmetrically inclined or slanted with respect to the frontal plane.

Interactions Between Local Horizontal Disparity and Global Vertical Disparity

The third type of interaction between surfaces arises from the fact that the perceived inclination and slant of a surface are derived from deformation disparity, or the difference between local horizontal disparity and global vertical disparity. Horizontal disparities in adjacent or superimposed surfaces are detected independently and enable one to perceive distinct surfaces and sharp transitions between them. Vertical disparities are averaged over an area, which smoothes the boundaries between adjacent surfaces and causes superimposed surfaces to appear as one. The slant boundary between a surface with zero disparity and an adjacent surface with horizontal-size disparity appears sharp because horizontal disparities are processed locally. The boundary between a zero-disparity surface and a surface with vertical-size disparity appears curved because vertical-size disparity is averaged across the boundary (Pierce & Howard, 1997). Similarly, a zero-disparity surface produces a sharp inclination boundary with an adjacent surface with horizontal-shear disparity. The boundary between a zero-disparity surface and a surface with vertical-shear disparity cannot be perceived, and the two surfaces appear virtually as one (Howard & Pierce, 1998). Vertical-shear disparity is averaged over a wider area than vertical-size disparity.
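A deliberately crude toy model (added here for illustration, not a model proposed in the chapter) captures the arithmetic of this account: each surface keeps its locally measured horizontal disparity, while a single vertical-disparity estimate is averaged over everything in view, so adding a zero-disparity display changes the slant signal of every surface, as the surround and mixed-dot results described below illustrate.

```python
def perceived_slant_signals(surfaces):
    """`surfaces` is a list of (horizontal_size_disparity, vertical_size_disparity,
    n_dots) tuples, disparities in arbitrary units. Returns one slant signal per
    surface as the difference between its local horizontal disparity and the
    global, dot-weighted mean vertical disparity -- a crude stand-in for
    size-deformation disparity with global vertical averaging."""
    total_dots = sum(n for _, _, n in surfaces)
    global_vertical = sum(v * n for _, v, n in surfaces) / total_dots
    return [h - global_vertical for h, _, _ in surfaces]

if __name__ == "__main__":
    # A surface with overall-size disparity (equal horizontal and vertical
    # magnification) presented alone: deformation ~ 0, so little or no slant.
    print(perceived_slant_signals([(0.04, 0.04, 100)]))
    # The same surface with a zero-disparity display added: the added dots dilute
    # the global vertical estimate, so the test surface now signals slant and the
    # zero-disparity display signals a small opposite slant.
    print(perceived_slant_signals([(0.04, 0.04, 100), (0.0, 0.0, 300)]))
```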
FIG. 11.15. The effect of a 15-deg-wide zero-disparity annulus on the perceived inclination of a 30-deg central random-dot surface with (a) horizontal-shear disparity, (b) vertical-shear disparity, and (c) rotation disparity. Mean of three subjects. (Adapted from Howard & Kaneko, 1994.)
A surface with overall-size disparity appears slightly slanted about a vertical axis because the horizontal-disparity component is weighted more than the vertical-disparity component. However, a surface with overall-size disparity appears strongly slanted in the direction of the horizontal-disparity component when a zero-disparity surface is placed next to it or superimposed on it (Pierce & Howard, 1997). In this case, the horizontal-disparity component is assessed in terms of the mean vertical-size disparity over both displays. The zero vertical disparity of the zero-disparity display dilutes the mean estimate of vertical-size disparity. Similarly, a surface with rotation disparity appears frontal or nearly frontal when presented alone but strongly inclined about a horizontal axis when a zero-disparity surface is placed next to it or superimposed on it (Howard & Pierce, 1998). Figure 11.15a shows that addition of a zero-disparity random-dot surrounding annulus increases the perceived inclination produced by horizontal-shear disparity
in a central textured disc subtending 30 deg (Howard & Kaneko, 1994). Increasing the width of the surrounding annulus from 5 to 30 deg had very little effect (Kaneko & Howard, 1997b). The boundary between the test patch and the zero-disparity surround is a second spatial derivative of disparity, which enhances the effect produced by the first-order derivative of the test patch (Gillam et al., 1984). Figure 11.15b shows that addition of a zero-disparity surround severely reduces the inclination produced by vertical-shear disparity. Inclination is reduced because the vertical disparity within the display is averaged with the zero disparity of the larger surround. This accounts for why Gillam and Rogers (1991) failed to find the shear-disparity-induced effect. Finally, Fig. 11.15c shows that addition of a zero-disparity surround causes a display with rotation disparity to appear strongly inclined about a horizontal axis in the direction of the horizontal component. This effect occurs because the addition of the zero-disparity surround reduces the average vertical-shear disparity over the whole binocular field. The locally derived horizontal component of rotation disparity within the central test disc is assessed in terms of this reduced estimate of vertical-shear disparity. Similar effects on perceived slant were produced by addition of a zero-disparity surround to each of the three types of size disparity (Kaneko & Howard, 1996). However, the zero-disparity surround did not reduce slant induced by vertical-size disparity as much as it reduced that induced by vertical-shear disparity. This difference reflects the fact that vertical-size disparity is averaged over a smaller area than vertical-shear disparity.

Effects of Mixed Disparities

Figure 11.16a shows that addition of dots with zero disparity to a display of dots with 4% horizontal-size disparity creates two superimposed surfaces slanted at different angles. The surface defined by zero-disparity dots appears slanted in the opposite direction to the surface with disparity. This is slant contrast (Gillam, Chambers, & Lawergren, 1988a; Gillam, Chambers, & Russo, 1988b). The two surfaces segregate because horizontal disparities are processed locally and then integrated into distinct global patterns that create depth transparency. This does not happen with vertical disparity. Figure 11.16b shows that addition of zero-disparity dots to a dot display with vertical-size disparity creates one surface with a slant between that created by the zero-disparity dots alone and that created by the disparity dots alone (Kaneko & Howard, 1996). In this case, the vertical-size disparities are averaged to an intermediate value. Figure 11.17a shows that addition of zero-disparity dots to a display with horizontal-shear disparity creates two superimposed surfaces, one inclined and one frontal. The two surfaces segregate because horizontal-shear disparities are processed locally. Figure 11.17b shows that addition of zero-disparity dots to a display of dots with vertical-shear disparity creates one surface that increases in apparent inclination as the percentage of disparity dots increases. However, addition
FIG. 11.16. Perceived slant(s) produced by 4% of (a) horizontal-size and (b) vertical-size disparity as a function of the percentage of disparate dots to zero-disparity dots. For horizontal-size disparity, the two sets of dots created two distinct surfaces. For vertical disparity, the dots created a single slanted surface. Mean results of three subjects. Error bars are SEMs. (Adapted from Kaneko & Howard, 1996.)
of 25–50% of zero-disparity dots has no effect. Dots with vertical-shear disparity are given more weighting than zero-disparity dots in the disparity averaging process (Adams et al., 1996; Kaneko & Howard, 1997b). Thus, like vertical-size disparities, vertical-shear disparities are averaged over a wide area to produce a weighted mean disparity. Figure 11.17c shows that addition of zero-disparity dots to a display of dots with rotation disparity creates two surfaces inclined strongly in opposite directions. The surfaces segregate because each display of dots has distinct horizontal-shear disparities. Once the proportion of zero-disparity dots reaches a threshold value, the average vertical-shear disparity obtained from both displays reduces the perceived inclination of the display with disparity and induces an impression of opposite inclination in the zero-disparity display.
FIG. 11.17. Perceived inclination(s) produced by 2.3 deg of (a) horizontal-shear, (b) vertical-shear, and (c) rotation disparity as a function of the percentage of disparate dots of the total dots. For horizontal and rotation disparities, the two sets of dots created two distinct surfaces. For vertical disparity, the dots created a single inclined surface. Mean results of three subjects. (Adapted from Kaneko & Howard, 1997b.)
Anisotropy Between Slant and Inclination

An anisotropy between slant and inclination has been noted by several investigators. For example, inclination induced by horizontal-shear disparity is more rapidly and accurately perceived than slant induced by horizontal-size disparity (Cagenello & Rogers, 1993; Gillam et al., 1984; Mitchison & McKee, 1990). Also, the depth analogue of the Craik-O'Brien-Cornsweet illusion is larger when the disparity discontinuity is vertical (involving size disparity) than when it is horizontal (involving shear disparity; Rogers & Graham, 1983). We obtained depth contrast from a surface slanted about a vertical axis but not from a surface inclined about a horizontal axis (Howard & Pierce, 1998; Pierce & Howard, 1997).

Implications for Virtual Reality Systems

These effects have implications for virtual reality systems. A virtual reality system is immune to a small horizontal or vertical misalignment of the images because these are corrected by horizontal and vertical vergence. Such a system is also immune to rotational misalignment of the images because this is corrected partly by cyclovergence and partly by the shear-deformation-disparity mechanism. A differential magnification of the images to the two eyes cannot be corrected by vergence but is corrected by the size-deformation-disparity mechanism. However, a relative horizontal or vertical shear of the images will introduce distortions of three-dimensional scenes. Also, a meridional magnification of one image will introduce distortions (Ogle, Martens, & Dyer, 1967). In a system in which a virtual stereoscopic display is superimposed on a real background, misalignment or differential magnification of the stereoscopic images will distort the scene because the vertical-disparity component of the misaligned images will be applied to the local horizontal-disparity components in both the stereoscopic and real scenes.

REFERENCES

Adams, W., Frisby, J. P., Buckley, D., Gårding, J., Hippisley-Cox, D., & Porrill, J. (1996). Pooling of vertical disparities by the human visual system. Perception, 25, 165–176.
Anstis, S. M. (1975). What does visual perception tell us about visual coding. In C. Blakemore & M. S. Gazzaniga (Eds.), Handbook of psychobiology (pp. 269–323). New York: Academic Press.
Anstis, S. M., Howard, I. P., & Rogers, B. (1978). A Craik-Cornsweet illusion for visual depth. Vision Research, 18, 213–217.
Banks, M. S., & Backus, B. T. (1998). Extra-retinal and perspective cues cause the small range of the induced effect. Vision Research, 38, 187–194.
Brookes, A., & Stevens, K. A. (1989). The analogy between stereo depth and brightness. Perception, 18, 601–614.
Cagenello, R., & Rogers, B. J. (1990). Orientation disparity, cyclotorsion, and the perception of surface slant. Investigative Ophthalmology and Visual Science, 31(Abstracts), 97.
Cagenello, R., & Rogers, B. J. (1993). Anisotropies in the perception of stereoscopic surfaces: The role of orientation disparity. Vision Research, 33, 2189–2201.
Cumming, B. G., Johnston, E. B., & Parker, A. J. (1991). Vertical disparities and the perception of three-dimensional shape. Nature, 349, 411–413.
Erkelens, C. J., & Collewijn, H. (1985). Eye movements and stereopsis during dichoptic viewing of moving random-dot stereograms. Vision Research, 25, 1689–1700.
Gillam, B., Chambers, D., & Lawergren, B. (1988a). The role of vertical disparity in the scaling of stereoscopic depth perception: An empirical and theoretical study. Perception and Psychophysics, 44, 473–483.
Gillam, B., Chambers, D., & Russo, T. (1988b). Postfusional latency in slant perception and the primitives of stereopsis. Journal of Experimental Psychology: Human Perception and Performance, 14, 163–175.
Gillam, B., Flagg, T., & Finley, D. (1984). Evidence for disparity change as the primary stimulus for stereoscopic processing. Perception and Psychophysics, 36, 559–564.
Gillam, B., & Rogers, B. J. (1991). Orientation disparity, deformation, and stereoscopic slant perception. Perception, 20, 441–446.
Gogel, W. C. (1965). Equidistance tendency and its consequences. Psychological Bulletin, 64, 153–163.
Graham, M. E., & Rogers, B. J. (1982). Simultaneous and successive contrast effects in the perception of depth from motion-parallax and stereoscopic information. Perception, 11, 247–262.
Howard, I. P. (1970). Vergence, eye signature, and stereopsis. Psychonomic Monograph Supplements, 3, 201–204.
Howard, I. P., Allison, R., & Zacher, J. E. (1997). The dynamics of vertical vergence. Experimental Brain Research, 116, 153–159.
Howard, I. P., & Kaneko, H. (1994). Relative shear disparities and the perception of surface inclination. Vision Research, 34, 2505–2517.
Howard, I. P., & Pierce, B. J. (1998). Types of shear disparity and the perception of surface inclination. Perception, 27, 129–145.
Howard, I. P., & Rogers, B. J. (1995). Binocular vision and stereopsis. New York: Oxford University Press.
Howard, I. P., & Rogers, B. J. (2002). Seeing in depth. Toronto: I. Porteous.
Howard, I. P., & Zacher, J. E. (1991). Human cyclovergence as a function of stimulus frequency and amplitude. Experimental Brain Research, 85, 445–450.
Johnston, E. B. (1991). Systematic distortions of shape from stereopsis. Vision Research, 31, 1351–1360.
Kaneko, H., & Howard, I. P. (1996). Relative size disparities and the perception of surface slant. Vision Research, 36, 1919–1930.
Kaneko, H., & Howard, I. P. (1997a). Spatial limitation of vertical-size disparity processing. Vision Research, 37, 2871–2878.
Kaneko, H., & Howard, I. P. (1997b). Spatial properties of shear disparity processing. Vision Research, 37, 315–324.
Koenderink, J. J., & van Doorn, A. J. (1976). Geometry of binocular vision and a model for stereopsis. Biological Cybernetics, 21, 29–35.
Kumar, T., & Glaser, D. A. (1993). Temporal aspects of depth contrast. Vision Research, 33, 947–957.
Mayhew, J. E. W., & Longuet-Higgins, H. C. (1982). A computational model of binocular depth perception. Nature, 297, 376–378.
Mitchison, G. J., & McKee, S. P. (1990). Mechanisms underlying the anisotropy of stereoscopic tilt perception. Vision Research, 30, 1781–1791.
Ogle, K. N. (1938). Induced size effect. I. A new phenomenon in binocular space perception associated with the relative sizes of the images of the two eyes. American Medical Association Archives of Ophthalmology, 20, 604–623.
Ogle, K. N., Martens, T. G., & Dyer, J. A. (1967). Oculomotor imbalance in binocular vision and fixation disparity. Philadelphia: Lea & Febiger.
Parker, A. J., & Yang, Y. (1989). Spatial properties of disparity pooling in human stereo vision. Vision Research, 29, 1525–1538.
Pierce, B. J., & Howard, I. P. (1997). Types of size disparity and the perception of surface slant. Perception, 26, 1503–1517.
Pierce, B. J., Howard, I. P., & Feresin, C. (1998). Depth interactions between inclined and slanted surfaces in vertical and horizontal orientations. Perception, 27, 87–103.
Regan, D., Erkelens, C. J., & Collewijn, H. (1986). Necessary conditions for the perception of motion in depth. Investigative Ophthalmology and Visual Science, 27, 584–597.
Rogers, B. J. (1992). The perception and representation of depth and slant in stereoscopic surfaces. In G. A. Orban & H.-H. Nagel (Eds.), Artificial and biological vision systems (pp. 241–266). Berlin: Springer-Verlag.
Rogers, B. J., & Bradshaw, M. F. (1992). Differential perspective effects in binocular stereopsis and motion parallax. Investigative Ophthalmology and Visual Science, 33(Abstracts), 1333.
Rogers, B. J., & Bradshaw, M. F. (1993). Vertical disparities, differential perspective and binocular stereopsis. Nature, 361, 253–255.
Rogers, B. J., & Bradshaw, M. F. (1995). Disparity scaling and the perception of frontoparallel surfaces. Perception, 24, 155–179.
Rogers, B. J., & Graham, M. E. (1982). Similarities between motion parallax and stereopsis in human depth perception. Vision Research, 22, 216–270.
Rogers, B. J., & Graham, M. E. (1983). Anisotropies in the perception of three-dimensional surfaces. Science, 221, 1409–1411.
Rogers, B. J., & Howard, I. P. (1991). Differences in the mechanisms used to extract 3-D slant from disparity and motion parallax cues. Investigative Ophthalmology & Visual Science, 32(Abstracts), 695.
Rogers, B. J., & Koenderink, J. (1986). Monocular aniseikonia: A motion parallax analogue of the disparity-induced effect. Nature, 322, 62–63.
Somani, R. A. B., Desouza, J. F. X., Tweed, D., & Vilis, T. (1998). Visual test of Listing's law during vergence. Vision Research, 38, 911–923.
Stenton, S. P., Frisby, J. P., & Mayhew, J. E. W. (1984). Vertical disparity pooling and the induced effect. Nature, 309, 622–624.
Stevenson, S. B., Cormack, L. K., & Schor, C. M. (1991). Depth attraction and repulsion in random dot stereograms. Vision Research, 31, 805–813.
Tyler, C. W. (1974). Depth perception in disparity gratings. Nature, 251, 140–142.
Van Ee, R., & Erkelens, C. J. (1995). Anisotropy in Werner's binocular depth-contrast effect. Vision Research, 36, 2253–2262.
Van Ee, R., & Erkelens, C. J. (1996). Temporal aspects of binocular slant perception. Vision Research, 36, 45–51.
Werner, H. (1937). Dynamics in binocular depth perception. Psychological Monographs, 49, 1–120.
Werner, H. (1938). Binocular depth contrast and the conditions of the binocular field. American Journal of Psychology, 51, 489–497.
12

Configural Scoring of Simulator Sickness, Cybersickness, and Space Adaptation Syndrome: Similarities and Differences

Robert S. Kennedy, Julie M. Drexler, and Daniel E. Compton
RSK Assessments, Inc.
Kay M. Stanney and D. Susan Lanham
University of Central Florida
Deborah L. Harm
NASA Johnson Space Center
Written reports about conditions conducive to motion sickness date back at least to Hippocrates. From the standpoint of operational efficiency, Julius Caesar, Lawrence of Arabia, and Admiral Nelson were plagued with bouts of sickness (Money, 1972), but all appear to have either adapted over repeated exposures or otherwise coped with the symptoms; thus, they were able to distinguish themselves in their respective motion environments despite these adverse effects. However, the practical rule is that motion sickness can be expected to adversely affect operational efficiency (Benson, 1978), and the U.S. Navy has long been concerned with the influence of various ship motions on seasickness and seakeeping performance. Ernie Pyle, who witnessed firsthand the World War II D-Day invasion in
Normandy, wrote about what he observed to be the enormously reduced fighting efficiency of soldiers and sailors due to seasickness and seasickness drugs, and it was observed that the landing occasioned “the greatest mass vomiting ever known in the history of mankind” (Reason & Brand, 1975, p. 18). Symptoms of motion sickness have been with us since the means of passive conveyance achieved wide use. The pathognomonic sign is vomiting (and at times retching). The other signs of the syndrome are many and disparate. They include overt manifestations such as pallor, sweating, and salivation (Colehour & Graybiel, 1966; Stern, Koch, Stewart, & Lindblad, 1987), and, curiously, lassitude and a reluctance to communicate. The major reported symptoms of motion sickness imply involvement of the vagus nerve complex related to the autonomic nervous system, and these include nausea, drowsiness, general discomfort, apathy, headache, stomach awareness, disorientation, fatigue, and incapacitation (Kennedy & Frank, 1986). Accompaniments, but less well-known as outcomes, include postural and eye–hand incoordinations (Kennedy, Stanney, Compton, Drexler, & Jones, 1999) and the sopite syndrome (Graybiel & Knepton, 1976). The latter problems may occur as the sole manifestation of sickness or may be present when other combinations of symptoms are present, and for this reason are insidious and portend a condition that could lead to accidents following exposures. Additional signs of motion sickness include changes in cardiovascular, respiratory, gastrointestinal, biochemical, and temperature regulation functions. In addition to humans, most animals (e.g., monkeys, dogs, birds) appear to exhibit traditional signs of motion sickness (viz., vomiting, salivation, drowsiness). Further, fish and seals being transported in trucks and aboard ships have been known to regurgitate their food (Chinn & Smith, 1955). Even rats—which do not have a characteristic vomiting mechanism—show a disordered operant response (Eskin & Riccio, 1966) after protracted rotation.
GENESIS

Nausea, the cardinal symptom of seasickness, has its origins in the Greek word for sailor (nautes) or boat (naus). Symptoms at sea, likely the most prevalent form of sickness, appear to occur when the predominant motions of the environment are within the frequency range centered around 0.2 Hz (McCauley & Kennedy, 1976). The amount of acceleration and time spent at that frequency (versus one higher or lower) is a major determinant of sickness incidence and severity. Thus, vehicles that move in the low frequency range (viz., most seagoing vessels, some cars, some all-terrain vehicles, large surface-effect ships, most high-winged aircraft, ferries [with or without stabilization gear], buses, swings, some moving-base simulators, and camels) also exhibit their share of sickness. Although it is tempting to posit that most forms of motion sickness can be avoided if one were to design vehicles to only move either below 0.01 Hz or above 0.80 Hz (cf. MIL-STD-1472C, 1981),
there are other forms of motion sickness in which the presence of a stimulus within the bandwidth around 0.2 Hz is not so obvious, specifically: (1) rotation-induced sickness (e.g., carnival devices and merry-go-rounds), particularly those involving Coriolis-type stimulations (Kennedy & Graybiel, 1965), which increase as velocity increases; (2) space sickness, which seems to be related to activity levels during early microgravity exposures, and its aftereffects, which appear to be proportional to the duration of exposure (see footnote 1); (3) many environments where dynamic and static visual displays have been shown to induce motion sickness (Hettinger & Riccio, 1992; see footnote 2); and, relatedly, (4) dynamic visual scenes where depth coding is disrupted.

Footnote 1: There is currently no satisfactory way to predict who is prone to space sickness or what the physical stimulus is that causes it (retinal slip, vestibulo-ocular reflex recalibration, otolith tilt reinterpretation hypothesis, stomach contents at microgravity, etc.; see Reschke, Kornilova, Harm, Bloomberg, & Paloski, 1997, for a complete review).

Footnote 2: For example, Witkin (1949) produced unwanted motion sickness using only a tilted chair.

VISUALLY INDUCED MOTION SICKNESS (VIMS)

While the statement may appear to be axiomatic, it is not entirely true that motion sickness only occurs when the physical environment moves. For example, visually perceived movement can influence the motion sickness experience. Erasmus Darwin (1794), in his "Zoonomia," related that his grandson, who suffered greatly from the motions of the ship Beagle, expressed the view that it was visual disturbances that constituted the principal cause of seasickness, and although blind people can become seasick, "people can increase their resistance to motion sickness by being blindfolded in otherwise provocative moving environments" (Kennedy, Tolhurst, & Graybiel, 1965). Around the turn of the century, Stratton (1897), viewing real images through inverting prisms, described dizziness and nausea in individuals who were made to walk while wearing these glasses. Because most investigators found that humans can rapidly adapt to these unusual conditions, such methods subsequently became a popular paradigm for the study of central nervous system plasticity, and, as a result, a very large research literature and much theory on perceptual adaptation emerged from this work, carrying down into the 1950s and 1960s (Kohler, 1968). Also around the turn of the century, Wood (1895) described the Haunted Swing Illusion from the San Francisco World's Fair. This device offered the first example of which we are aware of a purely visual stimulus producing sickness and disorientation (see Hettinger, 2002, for a discussion). In 1949, Tyler and Bard alluded to others who had made similar observations regarding the importance of visual factors in motion sickness, but they questioned whether these visually related
problems were etiologically identical to those of motion sickness. Crampton and Young (1953) began to explore motion sickness and the perception of ego- or self-motion, and their work, plus the work of clinical otolaryngologists with optokinetic stimuli, anticipated the research concerned with the perception of illusory self-motion (Dichgans & Brandt, 1978). Dichgans and Brandt (e.g., 1972, 1973, 1978) systematically explored the manner in which visual stimulation can influence the perception of illusory self-motion (called vection). Their work on vection forms the basis of what is now known about the psychophysical determinants of the perception of ego motion, and they, along with Hettinger (2002), Howard and Howard (1994), and Kennedy, Hettinger, Harm, Ordy, and Dunlap (1996), are sources that can be consulted for psychophysical parameters that govern the experience of vection. In many of the situations in flight simulators where visually induced motion sickness (VIMS) occurs, it appears that similar symptoms of motion sickness are present (Hettinger & Riccio, 1992). But the relationship of vection to sickness and presence is not simple, since it must reconcile the following facts. For example, it is known that vection is an important ingredient in visually induced sickness because persons who do not ordinarily get vection also do not become sick (Hettinger, Berbaum, Kennedy, Dunlap, & Nolan, 1990). Relatedly, "true" motion sickness in several different environments is not experienced at all by persons with bilateral labyrinthine deficits (Kennedy, Graybiel, McDonough, & Beckwith, 1968; Kellogg, Kennedy, & Graybiel, 1965), and, whereas labyrinthine defectives are also immune to sickness caused by vection (Cheung, Howard, & Money, 1991), they can perceive vection (Cheung, Howard, Nedzelski, & Landolt, 1989). Therefore, the vection experience (Hettinger, 2002; Hettinger et al., 1990; Kennedy, Stanney, Compton, Drexler, & Jones, 1999) constrains the conclusion one might wish to make about the origin of sickness (i.e., that sickness is caused by vection).
SIMULATOR SICKNESS

As a topic of scientific inquiry, motion sickness has been studied primarily in its most common forms: sea and air sickness (Reason & Brand, 1975) and space sickness (Crampton, 1990). Therefore, it is not surprising that when the ability to simulate vehicular motion was developed, a form of motion sickness unique to these conditions emerged. It has been referred to as simulator sickness, simulator aftereffects, or the simulator adaptation syndrome (Kennedy, Hettinger, & Lilienthal, 1990). The development of flight and automobile simulators appears to have been guided by the assumption that more realistic simulation (i.e., wide field-of-view visual displays containing highly detailed representations of environmental features) will result in faster and better training. Engineering talents have focused on creating realistic, high-fidelity simulation environments, but empirical
research has not indicated that increasing fidelity by “x” percent results in “x” percent increase in training benefit. A fundamental thesis of this chapter is that although the effects of simulator realism and fidelity on training effectiveness are poorly understood or unknown, there is strong reason to suspect that increased realism may result in an increase in the incidence of simulator sickness. At present, the psychophysical laws that govern the relationship between the richness or fidelity of visual imagery and training effectiveness are not well-known. However, in what follows, empirical evidence will be presented that indicates that as simulators have become more compellingly realistic and faithful in their representations of reality, the incidence of simulator sickness has increased. Among the more serious problems presented by this syndrome is the seldom recognized potential for residual aftereffects (Baltzley, Gower, Kennedy, & Lilienthal, 1988; Crosby & Kennedy, 1982; Kellogg, Castore, & Coward, 1980; McGuinness, Bouwman, & Forbes, 1981; Ungs, 1989), including illusory sensations of climbing and turning, perceived inversions of the visual field, and disturbed motor control. Above all, the visually related disturbances are more prevalent in simulator sickness than gastrointestinal disturbances. In fact, simulator sickness bears a strong resemblance to the disturbances that individuals experience when wearing reversing, displacing, or inverting lenses previously mentioned (Dolezal, 1982) or when exposed to rotating (Graybiel, Guedry, Johnson, & Kennedy, 1961) or tilted rooms (Witkin, 1949).
SPACE ADAPTATION SYNDROME

Simulator sickness symptoms can also have much in common with reports of astronauts' experiences of the space adaptation syndrome (Homick, 1982; Parker, Reschke, Arrott, Homick, & Lichtenberg, 1985). For example, vomiting in all these cases appears to have a sudden, sometimes unexpected onset, often without accompanying prodromal nausea (Thornton, Moore, Pool, & Vanderploeg, 1987), and dizziness is prominent. As pointed out by Casali (1986), the term motion sickness should perhaps not be used as a global description of sickness induced by simulators; rather, Benson (1978) believes that the generic term should be motion maladaptation syndrome. Many simulators impart no physical motion at all, and yet sickness may still occur as a result of perceiving visual representations of motion (Hettinger et al., 1987; Parker, 1971). Because the signs and symptoms that qualify for a diagnosis of motion sickness are diverse and because motion sickness can be caused by many stimuli, we find it helpful to characterize the malady as polygenic and polysymptomatic (Kennedy & Fowlkes, 1992). The diversity of causes and effects implies that generalizable solutions that will apply to all conditions or that will work on all symptoms will be difficult to obtain. However, there is much ordered information in the scientific
literature, and reviews are available (Kennedy & Frank, 1986; Money, 1970; Reason & Brand, 1975; Tyler & Bard, 1949), along with a field manual that suggests how to use devices in order to minimize symptoms (Kennedy et al., 1987).
MEASURING SICKNESS WITH QUESTIONNAIRES

Historical Beginnings

In our years of research investigating forms of motion sickness, including space and simulator sickness, we have employed a variety of techniques to document incidences of sickness. The major tool utilized in these investigations has been a multisymptom motion sickness questionnaire (MSQ), which in one form or another is now currently in use at more than three dozen laboratories and facilities. The early beginnings of this form of self-report questionnaire were as follows:
- During World War II, Wendt (reviewed in Wendt, 1968) performed research in an attempt to assess the continuum of motion sickness symptoms by employing a 3-point scale where "vomiting" was rated highest, then "nausea without vomiting," and finally "no symptoms."
- Graybiel, Clark, and Zarriello (1960) used a seven-item symptom checklist in the Pensacola Slow Rotation Room.
- Subsequently, we formalized the self-report technique into a checklist and incorporated participants' verbal symptom reports from studies of Coriolis sickness in the Slow Rotation Room (Kennedy & Graybiel, 1965).
- This expanded the symptom checklist to a total of 33 separate symptoms, from which was derived a 5-point composite score that was first used to quantify symptoms experienced during Slow Rotation Room studies (Kennedy, Tolhurst, & Graybiel, 1965).
- Other approaches derived from this checklist, known as the Graybiel classification system (DiZio & Lackner, 1997), include scoring the usage of signs as well as symptoms, but require an experimenter (Graybiel, Wood, Miller, & Cramer, 1968).
- Symptoms covered in both the questionnaires by Wendt (1968) and Graybiel and colleagues (1968) include: cerebral (e.g., headache), gastrointestinal (e.g., nausea, burping, and emesis), psychological (e.g., anxiety, depression, and apathy), and other less characteristic indicators of motion sickness, such as fullness of the head.
- Responses to the questionnaire to be reported below were made for each symptom using a 4-point Likert-type (Likert, 1967) scale ranging from none to severe, and in some cases, yes or no.
- In later applications of the questionnaire, the 4-point scale was expanded to study seasickness (Wiker, Kennedy, McCauley, & Pepper, 1979).
• The Simulator Sickness Questionnaire (SSQ) has also been used to study hurricane-induced sickness in aircraft (Kennedy, Moroney, Bale, Gregoire, & Smith, 1972) and during storms at sea (Kennedy et al., 1968).
• A computerized, automated version of the checklist that can be self-administered is also available (Kennedy, Lane, Berbaum, & Lilienthal, 1993).
PSYCHOMETRIC PROPERTIES

Self-report checklists have the obvious disadvantage of being subject to fabrication, but they also have a proven record of predictive validity: a correlation between seasickness severity and objective signs of vomiting of r = 0.73 (p < .001) has been reported (Wiker et al., 1979). Further, it is probably a safe, although nettlesome, assumption that questionnaire data are roughly twice as reliable as the objective measures that they have been developed to replace. Ironically, in some well-intentioned studies, these “objective” measures are sometimes validated against the self-report score itself as the criterion. Questionnaires generally exhibit high reliability:

1. The split-half correlation for the SSQ for 200 subjects after a VE exposure is r = 0.80 (Kennedy, Stanney, Compton, Drexler, & Jones, 1999), and with the Spearman (1904, 1907) correction for test length, the estimate for the full SSQ form is r = .89.
2. Yoo (1999), using a driving simulator, found reliabilities of r ∼ 0.78 for the SSQ.

It should be mentioned that the lack of reliability of objective measures of sickness lies not only in the recording of the physiological response (e.g., EGG and pallor), but also in the physiological response system itself (viz., not all people get pale before they vomit).

Scoring

Currently, most information regarding cybersickness incidence and severity is available using one or another version of the SSQ (Kennedy, Lane, Berbaum, & Lilienthal, 1993; Lane & Kennedy, 1988). In our early studies with the SSQ technique in military flight simulators, composite (i.e., total) scores showed that sickness was prevalent in nearly all fielded flight simulators of that era and appeared greatest in moving base simulators (Kennedy et al., 1989). Later, in order to improve on the metric properties of the questionnaire and to seek a differential predictor for different environments, a factor analysis was carried out (Lane & Kennedy, 1988), which revealed three clusters of symptoms. In current usage, a weighted average of the three factors comprises a total score, which is intended to reflect the severity of the symptomatology for an individual and, when added to a representative graph of N scores, we believe can be used to index the “troublesomeness”
of a simulator or virtual environment system. However, we should keep in mind that there is also prospective heuristic value in the profile or configural scoring of the devices that can be used to flag those systems with high levels of symptoms. We will return to that approach later in the chapter.
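The split-half figures quoted above can be checked with the Spearman correction for test length (the Spearman–Brown prophecy formula). The short sketch below is only an illustration of that arithmetic, using the correlations already reported in the text; the function name and the choice of Python are ours, not part of the original analyses.

    def spearman_brown(r_half, k=2):
        # Reliability projected when the test length is multiplied by k (k = 2 gives the full-length form).
        return k * r_half / (1 + (k - 1) * r_half)

    # Split-half correlation reported for the SSQ after a VE exposure (Kennedy et al., 1999).
    print(round(spearman_brown(0.80), 2))  # 0.89, the full-form estimate quoted above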
TOTAL (NORMATIVE) SCORES

Although the maximum score possible on the SSQ is 300, in practical application we have no record of ever reaching that high a score. As may be seen in Fig. 12.1, 95% of the obtained scores are less than 100. Normative data from flight simulators, examined over a series of surveys carried out by the navy and army using eight moving base helicopter simulators, were compared in order to show the total sickness scores in Fig. 12.2 (Fowlkes, Kennedy, & Allgood, 1990; Gower et al., 1987; Kennedy, Jones, Lilienthal, & Harm, 1994). These data are for the paper-and-pencil version, which has been used with nearly 3,000 cases. The paper-and-pencil posteffect data track well with the computerized version of the same system, which has been used with ≥ 6,000 cases and which also appears in the cumulative curves shown in Fig. 12.1. There are two known constraints on these data: (1) 95% of the population of persons shown in this figure are male military personnel and (2) nearly all of them were pilots. However, the sheer size of the database (> 9,000 exposures) and the
FIG. 12.1. Normative data for military flight simulators: cumulative frequency (%) of simulator sickness total scores. Paper-and-pencil questionnaires: N = 2,827 (32 simulators); computer questionnaires: N = 6,182 (10 simulators).
FIG. 12.2. Total sickness score for eight military helicopter simulators (2F120, 2F121, 2F64C, 2F135, 2B33, 2B38, 2B31, 2B40).
regularity of the findings suggest that, with the provisos mentioned above, one may use the data both for normative purposes and for comparison with other devices. In the flight simulator studies, in which paper-and-pencil versions of self-report forms were employed and also where a computerized inquiry method was used for a large number of subjects (N = 6,182; 10 simulators), certain regularities appear. Note, for example, in Fig. 12.1 that in the flight simulator data: (1) both cumulative frequency distributions grow at the same rate, (2) more than 40% of those exposed do not have any symptoms at all, (3) the average score is about 5, and (4) 80% of those exposed have scores equal to or less than 20. A finer-grained categorization of these data has been prepared (Table 12.1) to reflect common usage; in our experience, we find it helpful in comparing simulators. We find that if a simulator sickness score is above 10, the incidence may be beginning to reach significance, and average total scores for a simulator above 20 are at the 50th percentile of sickness when compared to a military database.
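These rules of thumb, read together with the categorization in Table 12.1, lend themselves to a simple lookup. The sketch below is only an illustration of that usage; the function name and the handling of boundary values are assumptions of ours rather than part of the published scoring.

    def ssq_category(mean_total_score):
        # Category for a simulator's mean (or median) SSQ Total Score, following Table 12.1.
        # How exact boundary values (e.g., a mean of exactly 10) are assigned is assumed here.
        if mean_total_score == 0:
            return "No symptoms"
        if mean_total_score < 5:
            return "Negligible symptoms"
        if mean_total_score <= 10:
            return "Minimal symptoms"
        if mean_total_score <= 15:
            return "Significant symptoms"
        if mean_total_score <= 20:
            return "Symptoms are a concern"
        return "A problem simulator"

On this reading, the military helicopter simulators discussed below (averages near 10) fall in the minimal-to-significant range, whereas the VE device averages reported later in the chapter (roughly 19 to 55) all fall in the problem category.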
CONFIGURAL (DIAGNOSTIC) SCORING

Based on the thesis that motion sickness is often ascribed to a conflict between two sensory systems, Reason and Brand (1975) listed six types of sensory rearrangement that may produce motion sickness. They argued, for instance, that the conflict may be between vision and vestibular inputs or between two vestibular inputs (cf. Benson, 1978). Each of these possibilities could involve both sensory
TABLE 12.1
Categorization of Symptoms Based on Central Tendency (Mean or Median) Using Military Aviation Personnel in Each Simulator

SSQ Total Score    Categorization
0                  No symptoms
< 5                Negligible symptoms
5–10               Minimal symptoms
10–15              Significant symptoms
15–20              Symptoms are a concern
> 20               A problem simulator
systems (vision/vestibular, e.g., inertial inputs at sea while watching the waves) or only one. Factor analytic scoring (Lane & Kennedy, 1988) was derived from the total scores of the questionnaires from the navy and army database. Three clusters were identified: (1) Nausea, (2) Oculomotor, and (3) Disorientation (Kennedy et al., 1993; Kennedy et al., 1992). Scores on the Nausea (N) subscale are based on the report of symptoms that relate to gastrointestinal distress such as nausea, stomach awareness, salivation, and burping. Scores on the Oculomotor (O) subscale relate to eyestrain, difficulty in focusing, blurred vision, and headache. Scores on the Disorientation (D) subscale are related to vestibular disturbances such as dizziness and vertigo.

Figure 12.3 shows how the three-factor solution reflects the large database of simulator sickness shown above and, for comparison purposes, also shows profile data from a U.S. Coast Guard seasickness study (Wiker et al., 1979) and from a rescoring of the National Aeronautics and Space Administration’s (NASA’s) astronaut responses from space flight studies (Kennedy et al., 1994). Taken together, it may be seen that space sickness, overall, exhibits stronger symptoms than both simulator sickness (as reflected in our large simulator database) and seasickness. Further, the profile scoring of simulators shows that there is relatively greater oculomotor disruption with simulators than in the other two environments. As may be seen in Fig. 12.3, seasickness, as one might expect, has a preponderance of nausea and progressively lower levels of oculomotor- and disorientation-type reports. Conversely, space sickness has very little oculomotor symptomatology, significantly more nausea, and moderate to severe amounts of disorientation.

Figure 12.4 depicts results obtained from a set of helicopter simulators that all appeared to have a common sickness profile. That is, each had a consistently high incidence of oculomotor symptomatology and relatively lower nausea and disorientation symptoms. It is interesting that each of these helicopter simulators
FIG. 12.3. Configural scoring profiles of sickness (Nausea, Oculomotor, Disorientation) for three environments.
FIG. 12.4. Profile sickness scores (Nausea, Oculomotor, Disorientation) for eight army and navy helicopter simulators (2F120, 2F121, 2F64C, 2F135, 2B33, 2B38, 2B31, 2B40).
employed computer-generated imagery over multiple cathode ray tube (CRT) displays that were often set at different physical distances from the operator’s viewing position. Ebenholtz (1988) made the point that such a configuration could lead to eyestrain, and that is essentially what the oculomotor symptom complex appears to be showing. Admittedly, eyestrain can be occasioned by many other factors, but using the information available here, one might look to
common causes of eyestrain in any situation where the simulator sickness profile manifests a prevalence of this class of symptoms. We thought that this observation about equipment configuration being causally related to this particular symptom distribution might have some generalizability and, if so, that there might be an advantage in looking within virtual reality/environment systems to determine whether consistent profiles could be found.

RESULTS OF COMPARATIVE STUDIES USING SICKNESS PROFILES

An opportunity to compare the symptom profiles (Nausea, Oculomotor, Disorientation) and overall level of sickness (total score) in questionnaires from eight different VE experiments was presented to us. We had carried out, in collaboration with three university laboratories, four experiments (1, 2, 4, & 5) using three different virtual environment (VE) head-mounted display (HMD) systems. In addition, we had access to the data from four other experiments (two federal laboratories [3 & 7] and two other university laboratories [6 & 8]). Table 12.2 shows the pertinent details from the eight experiments with the detailed hardware characteristics of each system. It may be seen that most of the experiments lasted 30 min, an extended period of time for purposes of immersion, but only one fourth as long as the average simulator exposure time.

The purpose of this portion of the present discussion is to describe the results from these VE experiments. The symptom profiles and total sickness in the eight VE systems will be compared to the very large database from military flight simulators. Additional comparisons with both space motion sickness and seasickness will be made. It is important to note that the clusters of symptoms encountered when vection is experienced are also like those that have been reported in simulators, although some VEs appear to exhibit profiles that differ from seasickness profiles; this has been related to reports of presence and immersion in both VE and simulator exposures where sickness is also recorded. Currently, as VE systems have enabled provision of compelling sensations of self-motion using visual scenes alone, symptoms of motion sickness have been increasingly reported in these systems as well (Durlach & Mavor, 1995; Hettinger, 2002). The advent of sickness in VEs affords the opportunity to compare sickness rates and profiles in these environments to sea and space sickness.

CYBERSICKNESS IN VIRTUAL ENVIRONMENTS

There is concern that continued development of VE technology may be compromised by the presence of motion sickness–like symptoms, known as cybersickness, which are currently being experienced by a significant proportion of VE users (Chien & Jenkins, 1994; Kingdon, Stanney, & Kennedy, 2001; Stanney, Mourant,
TABLE 12.2
Eight Experiments Using HMD-Based VE Systems

Exp. 1: University of Central Florida (Orlando); Kaiser E/O VIM 500 (Kennedy, Stanney, Dunlap, & Jones, 1996)
Exp. 2: University of Central Florida (Orlando); i*glasses! (Kolasinski, 1996; stereoscopic)
Exp. 3: Army Personnel Research Establishment (Farnborough, U.K.); Virtual Research Flight Helmet (Regan & Price, 1994)
Exp. 4: Murray State University (Murray, Kentucky); i*glasses! (Kennedy, Jones, Stanney, Ritter, & Drexler, 1996; stereoscopic)
Exp. 5: University of Idaho (Moscow, Idaho); CyberMaxx 180 (Rich & Braun, 1996)
Exp. 6: University of Houston (Houston, Texas); Virtual Research VR-4 (Bliss et al., 1996; stereoscopic)
Exp. 7: U.S. Army Research Institute (Orlando, Florida); Virtual Research VR-4 (Lampton et al., 1994; stereoscopic)
Exp. 8: George Mason University (Fairfax, Virginia); Virtual Research VR-4 (Salzman, Dede, & Loftin, 1995; stereoscopic)

Note. Software used across the eight systems included Ascent, WorldTool Kit, demo software, Heretic, and Solid Surface Modeler. Exposure durations ranged from approximately 20 min (in one case dependent on task completion; in another, repeated exposures were used) through 30- and 40-min sessions to 75 min total including breaks. Sample sizes ranged from 23 to 146 participants. Reported display formats ranged from 360 × 240 to 789 × 230 pixels (delta or triad pixel structure), spot sizes from approximately 3 to 8.10 arc min, and fields of view from 30° diagonal to 110°H × 60°V, with 100% vertical overlap where reported.
& Kennedy, 1998). If reports of VE sickness continue unabated (Kolasinski, 1995) and if the recent projected estimates of increased device usage are correct, particularly for entertainment and education (Machover, 1996), occurrences of VE sickness and its accompanying ill effects may soon surpass earlier estimates.

As VE systems were fielded, such ill effects were compared with the symptoms of motion sickness reported in the 1980s by military aircrew and NASA test pilots following their exposures to flight simulators (Frank, Kennedy, Kellogg, & McCauley, 1983; Kennedy, Jones, Lilienthal, & Harm, 1994; Kennedy, Lilienthal, Berbaum, Baltzley, & McCauley, 1989; McCauley & Cook, 1987). It was significant that the symptoms and aftereffects seen in connection with cybersickness and simulator sickness have elements in common with space sickness (Paloski, Black, Reschke, Calkins, & Shupert, 1993; Reschke et al., 1994) and other forms of motion sickness (Crampton, 1990). However, there appear to be some distinct differences between cybersickness and other forms of motion sickness (Stanney & Kennedy, 1997).

A good definition of virtual environments is available from Durlach and Mavor (1995), who consider that a VE system consists of a human operator, a human–machine interface, and a computer:

    The computer, the displays and controls in the interface are configured to immerse the operator in an environment containing three-dimensional objects with three-dimensional locations and orientations in three-dimensional space. Each virtual object has a location and orientation in the surrounding space that is independent of the operator’s viewpoint, and the operator can interact with these objects in real time using a variety of motor output channels to manipulate them. (p. 18)
On the surface, this sounds very much like a definition of a simulator, and we take no position regarding whether one is a subclass of the other or the converse. The simplistic argument of the psychophysical linking hypothesis of Brindley (1960) is analogous to the current situation. It asserts that if the same perceptual experience occurs following two different stimulus conditions, then similar neural pathways may be involved in the action, and if different perceptual experiences occur, then perhaps different pathways were involved. On the other hand, if the symptoms of two forms of motion sickness are very much alike, then one might argue for a common cause, even if they occurred in a different simulator, with different display projection systems and other characteristics. Thus, symptom profiles of VE and simulator systems, along with other forms of motion sickness, may be measured and compared to determine if cybersickness is distinctive or if it can be identified and treated with the same human factors solutions used for other forms of motion sickness.

One challenge in developing human factors solutions to this problem is to quantify and reliably determine the stimulus for the various forms of visually induced sickness. Engineering tests of dynamic systems, such as flight trainers, routinely employ controlled inputs and measure them “end-to-end with visual/motion
hardware response as the output” (Browder & Butrimas, 1981, p. ii). Such an approach is necessary to evaluate the engineering characteristics of a system. However, sickness in VEs is a person-centered problem. An identical VE device can have widely varying effects on different individuals. Given this interaction between users and VEs, determining the contribution of visual scene content, for example, to the incidence and severity of sickness requires human-in-the-loop exposures. It thus becomes an issue of systematic measurement of a human-centered phenomenon.

Whereas investigations of VE systems (described later) generally use male and female college students as experimental subjects, the flight simulation systems that have been studied to date are generally used by experienced military personnel. Further, the training regimes of flight simulators differ from VE systems in that the exposure durations are almost always greater than 1 hour, and in some cases are 4 hours, whereas VE exposures are usually less than 1 hour.

For comparison purposes (and discussed more fully later in this chapter), we have included Fig. 12.5, which depicts comparable data from Fig. 12.1 and adds our experiences with VE devices, using mostly a college-age population. Note that all percentile scores are much higher for VE than for military flight simulators, which implies that sickness is higher at every level for VE systems. Indeed, the average sickness level experienced in VE devices (i.e., the 50th percentile) is higher than that experienced by all but 10% of military aircrew in flight simulators. This implies that subjects in these VE settings are reporting far more severe problems than are reported by a military pilot population in flight simulators. Thus, on rational grounds, these settings may occasion different types of sickness, and we hypothesized that, despite the obvious technical similarities of VEs and simulators, it would be interesting to determine whether different devices can evince characteristically different
FIG. 12.5. Normative data for military simulator and virtual environment devices: cumulative frequency (%) of total sickness scores. Paper-and-pencil questionnaires: N = 2,827 (32 simulators); computer questionnaires: N = 6,182 (10 simulators); virtual reality questionnaires: N = 485 (8 devices).
symptom profiles, and then to compare the profiles of several environments to each other. We decided to address this issue empirically, and in what follows we present our approach and findings.

Apparatus

Virtual Environment Devices

The VE devices used for the experiments were current, commercially available, over-the-counter systems without special modifications or provisions. One of the devices comprised a computer game called Ascent, produced by Gravity for Virtual iO, which comes bundled with i-glasses!(tm). This system was used in two experiments (2 & 4) of differing duration (see Table 12.2). The HMD contained a head tracker that was engaged for all participants. The control device consisted of a standard mouse. The Ascent game was chosen because it met the following requirements: (1) It was easy to learn, uncomplicated, and moderately engaging; (2) the game is such that each participant received essentially the same stimulus, and the game can cycle continuously for a specified amount of time; and (3) previous testing revealed that this game had the potential to induce discomfort in some individuals (Kolasinski, 1996), possibly due to the active head movements required to play.

The other devices listed in Table 12.2 entail similar combinations of HMDs and software, but each system had unique characteristics. A full catalog of every item on which these devices and programs differed would be too lengthy for this chapter, but that does not mean that these differences are considered insignificant. At this stage of our knowledge, it is not yet known what features to list; studies such as those reported here will help to determine significant features.

The procedures for most of the eight experiments were the same. That is, while engaged in the VE task, participants were seated in a chair that allowed 360-deg viewing of the virtual environment. Lights were turned off in the room while the participant was immersed in the VE to reduce glare and reflections within the HMD. Participants were usually administered questionnaires prior to exposure. Participants were exposed to the VE for the durations specified in Table 12.2. The virtual task activities generally involved navigation throughout the VE and virtual object manipulation via the mouse. The SSQ was administered immediately after exposure.
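For readers who wish to reproduce the subscale and total scores discussed in the remainder of this chapter, the sketch below shows the general form of weighted SSQ scoring. The multipliers are the conversion constants we recall being published by Kennedy, Lane, Berbaum, and Lilienthal (1993); they are not restated in this chapter, so they should be treated as assumptions and verified against that source before use.

    # Hedged sketch of weighted SSQ scoring. The multipliers (9.54, 7.58, 13.92, 3.74) are the
    # constants we attribute to Kennedy et al. (1993); verify against that source before use.
    def ssq_scores(nausea_raw, oculomotor_raw, disorientation_raw):
        # Each raw value is the sum of the 0-3 symptom ratings assigned to that cluster.
        n = 9.54 * nausea_raw
        o = 7.58 * oculomotor_raw
        d = 13.92 * disorientation_raw
        total = 3.74 * (nausea_raw + oculomotor_raw + disorientation_raw)
        return {"Nausea": n, "Oculomotor": o, "Disorientation": d, "Total": total}

On this formulation, two devices can share the same Total score while presenting quite different Nausea, Oculomotor, and Disorientation profiles, which is precisely the distinction the configural comparisons below exploit.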
TOTAL SCORES FROM VE SYMPTOMS

Figure 12.6 shows the average total score in the eight VE devices from Table 12.2. The average score over all the devices is at about 30 on the total score scale, but the range is broad, from 19 to 55. This comparison is easily seen in Fig. 12.7 using the same total score, where the military helicopters show an average score
FIG. 12.6. Total sickness scores from eight virtual environment devices (University of Central Florida, Murray State University, University of Central Florida, APRE ’94, University of Idaho, University of Houston, ARI, and George Mason University), with the average VE and average simulator sickness scores shown for comparison.
FIG. 12.7. Comparison of total sickness scores by environment (Coriolis Susceptibility Index, NASA PAT, Space Sickness, Army Helo Avg., Navy Helo Avg., VE Avg Type A, VE Avg Type B).
of approximately 10, indicating substantially lower severity in military helicopter flight trainers. Figure 12.7, also for comparison, lists three examples from NASA’s space sickness research program. The first two are laboratory tests for assessing motion sickness susceptibility (Coriolis Sickness and Preflight Adaptation Trainer, or PAT) and understandably have high scores because they are used to experimentally produce sickness. The third environment reported in Fig. 12.7, Space Sickness, is based on the reports of 85 persons (astronauts) who have actually traveled in space.
FIG. 12.8. Configural sickness scores (Nausea, Oculomotor, Disorientation) from eight virtual environment devices, with average VE and average simulator sickness shown for comparison.
The incidence here is at a level that is between the average for simulators and for VE exposures in general. It would appear that the average VE user is experiencing more sickness than the average military pilot in a flight simulator or an astronaut during space travel, but not as much as with NASA’s provocative tests (Coriolis and PAT) of space sickness.

Figure 12.8 breaks out the VE systems from Fig. 12.6 according to their three-factor configural-scoring basis (i.e., Nausea, Oculomotor, and Disorientation). A striking difference can be seen among the eight systems. First, there appear to be two distinctly different profiles. The more common profile (five VEs) shows lower Oculomotor symptoms than the other two symptom clusters. We refer to these systems as VE Type A. The second group (three VEs), referred to as VE Type B, which are devices produced by the same manufacturer, shows relatively less Nausea but is otherwise consistent with the Type A VE. That is, all VE systems appear to exhibit a significant amount of Disorientation and lesser Oculomotor symptoms. Type A VE shows significantly more Nausea than Type B. On the other hand, it may be seen that the helicopter simulators in Fig. 12.4 also have a very distinctive profile. That profile shows the most prominent symptom cluster to be Oculomotor, which is different from the profile of VE sickness (Fig. 12.9). In Fig. 12.9, the actual and experimentally produced space sickness examples have a profile that resembles VE Type A (high Nausea and Disorientation, low Oculomotor disruption) and, to a lesser extent, VE Type B, which has far less Nausea. Again, it should be obvious that the army and navy helicopter simulator symptom profiles are distinctly alike and are different from the other five types of sickness.

As these profiles demonstrate, simulator and VE sickness are very different. Simulators tend to have disproportionately high oculomotor symptomatology (and low disorientation reports), whereas VEs tend to have high disorientation
FIG. 12.9. Comparison of configural sickness scores (Nausea, Oculomotor, Disorientation) by environment.
symptomatology (and moderate or low oculomotor reports). In addition, VEs generally have higher total sickness scores regardless of the subscale profiles. Given the lower scores of simulators, the moving base devices seem to exhibit relatively more nausea, although some VE systems show high nausea too.

For purposes of hypothesis generation, several “group membership” statistical strategies were performed to identify proportionally different symptom clusters and determine how well each study exhibited such clustering. These primarily consisted of discriminant and chi-square analyses. Discriminant analyses were conducted in order to determine how well group membership on known characteristics could be predicted from the subscales (Nausea, Oculomotor, and Disorientation). Class separation was performed on the basis of straightforward characteristics, usually binary in nature (moving base vs. fixed base, fixed wing vs. rotary wing, HMD vs. dome projection, monoscopic vs. stereoscopic imagery, and simulator vs. virtual environment). The scores used in the discriminant analyses were the average scores for all participants in each study, yielding one set of numbers for each device analyzed. The simulator vs. VE comparison yielded strong results; it is not known, however, how much of that separation is due to the symptom profiles and how much is due to the difference in symptom magnitudes between the two device types.

Another analysis was done to assess how well each participant’s profile in a study matched the profile for the overall study. A chi-square test was used to make this assessment. Originally, the data were divided into all six possible profiles (N > O > D, N > D > O, O > D > N, O > N > D, D > N > O, and D > O > N), but for simplicity, only three categories were used for this analysis. The categories
chosen were simply based on which symptom was the greatest (N, O, or D). This yielded three possible categories (excluding “ties” for highest symptoms). Participants reporting no symptoms were discarded from this analysis. The results are presented below. The first column identifies the study, followed by columns for nausea (N), oculomotor (O), disorientation (D), chi-square, and the total number of cases. For the three subscales (N, O, and D), each cell gives the number of participants having that profile, with the percentage of participants in that study shown in parentheses. Data are presented for various simulators (Table 12.3), VE systems (Table 12.4), and simulators that utilized an electronic version of the SSQ entitled BESS (Table 12.5).
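The chapter does not state the expected frequencies used for this chi-square test. A goodness-of-fit test against an equal split of the three dominant-symptom categories reproduces several of the tabled values (e.g., the 2E6 and 2F110 Miramar rows) but not all of them, so the sketch below should be read only as an illustration of the general approach; the function name is ours.

    # Assumed reconstruction of the profile chi-square in Tables 12.3-12.5: a goodness-of-fit
    # test of the dominant-symptom counts against an equal (chance) split across N, O, and D.
    from scipy.stats import chisquare

    def profile_p_value(n_count, o_count, d_count):
        statistic, p = chisquare([n_count, o_count, d_count])  # expected defaults to a uniform split
        return round(p, 3)

    print(profile_p_value(3, 2, 1))   # 0.607, matching the 2E6 entry in Table 12.3
    print(profile_p_value(6, 19, 6))  # 0.004, matching the 2F110 Miramar entry

On this reading, a small value indicates that a study's participants cluster on one dominant symptom far more strongly than chance would allow, which is how a consistent device profile is flagged.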
TABLE 12.3 Symptom Profiles for Various Simulators
Study
N
2E6
3 50 27 51.9 6 19.4 7 41.2 30 22.4 2 15.4 5 9.4 14 34.1 41 43.6 28 28.6 10 28.6 97 33.7
2E7 Lemoore 2F110 Miramar 2F112 Miramar 2F117 New River 2F132 Lemoore 2F87F Brunswick 2F87F Jacksonville 2F121 New River CH-53E Tustin CH-53E New River 2F64C Jacksonville
O
2 33.3 15 28.8 19 61.3 9 52.9 82 61.2 10 76.9 44 83 24 58.5 43 45.7 58 59.2 18 51.4 152 52.8
D
Chi Square
Total N
1 16.7 10 19.2 6 19.4 1 5.9 22 16.4 1 7.7 4 7.5 3 7.3 10 10.6 12 12.2 7 20 39 13.5
.607
6
.012
52
.004
31
.047
17
.000
134
.004
13
.000
53
.000
41
.000
94
.000
98
.000
35
.000
288
(Continued)
268
KENNEDY ET AL. TABLE 12.3 (Continued)
Study
N
TH57C In-plant
13 100 16 23.2 19 31.7 24 32.4 6 12.5 83 37.6 4 28.6 6 28.6 9 34.6 4 15.4 9 47.4 3 18.8 2 40 2 25 5 10.6 4 5.3 3 27.3 11 45.8
Whiting TH57 ARMY AH-1 Ft. Rucker ARMY UH-60 Ft. Rucker ARMY CH-47 Ft. Campbell ARMY AH-64 2F120 New River Tustin CH-46 Tustin CH-53 Whidbey A-6E Whidbey EA-6B Tustin CH-46 Tustin CH-53 Whidbey EA-6B Mayport SH60 Whiting TH-57 Ocoee I Ocoee II VTRS FAST 2F87(J)
8 32
O
43 62.3 33 55 42 56.8 40 83.3 128 57.9 9 64.3 12 57.1 16 61.5 14 53.8 9 47.4 10 62.5 2 40 5 62.5 36 76.6 52 69.3 4 36.4 8 33.3 45 95.7 13 52
D
10 14.5 8 13.3 8 10.8 2 4.2 10 4.5 1 7.1 3 14.3 1 3.8 8 30.8 1 5.3 3 18.8 1 20 1 12.5 6 12.8 19 25.3 4 36.4 5 20.8 2 4.3 4 16
Chi Square
Total N
.000
13
.000
69
.000
60
.000
74
.000
48
.000
221
.030
14
.050
21
.020
26
.054
26
.026
19
.080
16
.607
5
.197
8
.000
47
.000
75
.913
11
.325
24
.000
47
.087
25
TABLE 12.4
Symptom Profiles for Various VE Devices

Study                   N             O             D             Chi Square   Total N
Kolasinski VE           11 (32.4)     9 (26.5)      14 (41.2)     .572         34
Murray VE               13 (31.7)     12 (29.3)     16 (39)       .728         41
Idaho VE                8 (42.1)      3 (15.8)      8 (42.1)      .268         19
Bliss VE                8 (23.5)      13 (38.2)     13 (38.2)     .479         34
Stanney VE              8 (33.3)      9 (37.5)      7 (29.2)      .882         24
ARI 1a                  5 (12.5)      13 (32.5)     22 (55)       .004         40
ARI 1b                  9 (26.5)      19 (55.9)     6 (17.6)      .017         34
ARI 1d                  6 (15.8)      25 (65.8)     7 (18.4)      .000         38
ARI 2.1                 4 (17.4)      8 (34.8)      11 (47.8)     .200         23
ARI 2.2                 15 (22.4)     23 (34.3)     29 (43.3)     .110         67
ARI 3.1                 10 (23.3)     16 (37.2)     17 (39.5)     .368         43
ARI 4.1a                4 (22.2)      12 (66.7)     2 (11.1)      .009         18
ARI 4.1b                3 (15.8)      12 (63.2)     4 (21.1)      .021         19
Kay\Sue VE              112 (38.4)    59 (20.2)     121 (41.4)    .000         292
Deb Harm SSQ 5/9        35 (44.9)     5 (6.4)       38 (48.7)     .000         78
Dark Focus 10/94        56 (21.4)     158 (60.3)    48 (18.3)     .000         262
Dark Focus NASA         17 (9.8)      112 (64.4)    14 (8)        .000         174
Jim May 9/98            177 (30.5)    78 (13.4)     325 (56)      .000         580
TABLE 12.5
Symptom Profiles for Various Simulators (Using BESS)

Profile                 N             O             D             Chi Square   Total N
TH57C - Whiting         958 (56)      375 (21.9)    379 (22.1)    .000         1712
CH-53E New River        6 (28.6)      2 (9.5)       13 (61.9)     .012         21
CH-46 Tustin            96 (27.3)     107 (30.4)    149 (42.3)    .001         352
CH-53 Tustin            36 (28.6)     45 (35.7)     45 (35.7)     .526         126
2F114 — Whidbey         67 (27.1)     88 (35.6)     92 (37.2)     .112         247
2F143 — Whidbey         14 (25)       19 (33.9)     23 (41.1)     .336         56
CH-46 Tustin            115 (33.9)    95 (28)       129 (38.1)    .075         339
CH-53 Tustin            7 (35)        3 (15)        10 (50)       .157         20
2F143 Whidbey           33 (33.3)     36 (36.4)     30 (30.3)     .761         99
CH-46 North Island      110 (27.2)    111 (27.5)    183 (45.3)    .000         404
DISCUSSION

Because flight simulators and VEs are both visually interactive environments, one might expect their ill effects to be largely the same. The data reveal, however, that these two types of systems are different in at least two important ways:

1. Based on the results of this study, the level of symptoms produced by VE systems is statistically higher (p < .0001) than that engendered by flight simulators. More specifically, Fig. 12.2 indicates that, on the average, flight simulators have total scores ranging from 8 to 20, with most systems being 10 or under. In Fig. 12.7, we display in one chart the Total Sickness scores from our complete database. As can be seen in this figure, the total scores for VE systems are considerably higher, ranging from 19 to 55. There are obvious physical differences between these two types of systems. However, it should be noted that nearly all the persons used in the simulator data are military
pilots who are self-selected, have more experience with novel motion environments, and may be more likely to underreport symptoms, whereas in VE systems the participants were primarily college students. Whether this is a true difference in device or population remains to be investigated. In either case, it is an interesting finding.

2. The symptom profiles of these systems are quite distinguishable. First, Fig. 12.4 demonstrates that, as a family, the flight trainers (all moving-base helicopter simulators from U.S. Army, Navy, and Marine Corps training centers that use multiple CRT display systems) have distinctively different profiles of sickness from space and VE sickness. The simulators show proportionately more reports of Oculomotor disturbance when compared to Nausea and Disorientation, whereas space sickness (see Fig. 12.9) and five of the eight VE devices (see Fig. 12.8) show a reverse of the simulator sickness pattern (i.e., proportionately more Nausea and Disorientation when compared to Oculomotor). In nearly all the simulator data, the prominent symptom cluster (Oculomotor) is statistically verifiable, and in five VE devices Disorientation is the statistically verifiable prominent symptom cluster.

These results demonstrate quite convincingly that flight simulators and VE systems produce different patterns of symptomatology. In addition, VE systems produce higher levels of all three symptom clusters than flight simulators. But why? Although there are not sufficient data to conclude confidently, we believe:

1. For systems with relatively high oculomotor disturbances, one should naturally focus on the visual display system as the source of difficulty. Head-mounted displays can have a distorted field of view that may drive visual disturbances, although in our experience persons in multiple CRT systems report more eyestrain, a key factor in the Oculomotor symptom cluster. These disturbances can be due to several issues, including optical displays imaged at infinity but located at physically different distances (Ebenholtz, 1988), magnification differences between the right and left channels, right and left channel relative image rotation, off-axis views, relative misalignment between the optic axes of the left and right channels, inconsistent focus adjustment between channels, and luminance differences between channels, as in bi-ocular versus binocular displays (Rushton, Mon-Williams, & Wann, 1994).

2. The high rate of Disorientation symptomatology may come about in VEs for several reasons. First, there are rotation-induced effects because the head is capable of moving from side to side. These movements, which often exhibit noticeable lag, can also create pseudo-Coriolis stimuli (Dichgans & Brandt, 1973) as a side effect. Second, because of a difference in visually
displayed perception, and because a person is seated while provided with visual cues that produce self-motion, another visually induced motion sickness (VIMS) problem could arise (Hettinger & Riccio, 1992). In flight simulators persons are also seated—but they would be in the real aircraft. In VEs one is seated, but usually the simulation is of walking. One might also focus on position tracking systems that track a user’s head, hand, or other body part to create virtual worlds from the user’s perspective. When a position tracking error occurs, there is a mismatch between the visual space perceived from the VE and the perceived proprioceptive (felt-position) cues. The cues from the visual system seem to dominate, thus causing cue conflicts, which may lead to nausea and/or to disorientation.

Further support for these possible explanations of why flight simulators and VE systems produce different symptom patterns may be found by examining conditions present in space flight. As noted earlier, the symptom pattern produced by a number of VE systems (Type A) is identical to the symptom pattern associated with Space Motion Sickness (SMS). In the microgravity environment of space flight, the primary sensory information about movement through and position inside the spacecraft is visual. Particularly during the first few days of the mission, when SMS symptoms are most often present, astronauts increase their reliance on vision for self-motion and position information (Harm & Parker, 1993; Reschke et al., 1994). One might argue that there is a mismatch between visual and proprioceptive cues similar to that described above for VE systems. In addition, there also may be similar mismatches between visual and inertial cues in VE systems and the space flight environment.

3. The lowered Nausea in VE Type B compared to Type A could relate to the better computational capabilities of the particular systems that were used and/or to the lowered transport delays of the VE Type B systems found in our study.

Admittedly, these items are speculative, but they are offered to suggest that the cybersickness experienced in VEs may be driven by different technological factors than the simulator sickness experienced in flight simulators. Arguably, if pursued, these symptom profiles may signal which aspects of the equipment should be improved in order to minimize sickness severity and rates.
ACKNOWLEDGMENTS

This research was supported in part by National Aeronautics and Space Administration contracts 9-19482 and 9-19453 and National Science Foundation grant DMI-9561266. The opinions expressed here are the authors’ and should not be construed to be an endorsement by the sponsoring agencies.

The authors are indebted to senior investigators who collected some of the data: Dr. C. Braun, University of Idaho; Dr. J. Bliss, University of Houston; Dr. A.
Ritter, Murray State University; Dr. G. Kolasinski, University of Central Florida; Dr. D. Dryer, University of Central Florida; Dr. S. Goldberg, U.S. Army Research Institute; Dr. C. Dede and M. Salzman, George Mason University.
REFERENCES Baltzley, D. R., Gower, D. W., Kennedy, R. S., & Lilienthal, M. G. (1988). Delayed effects of simulator sickness: Incidence and implications. Aviation, Space, and Environmental Medicine, 59 (5), 465. Benson, A. J. (1978). Motion sickness. In G. Dhenin & J. Ernsting (Eds.), Aviation Medicine: Physiology and Human Factors (pp. 468–493). London, UK: British Crown Copyright. Bliss, J. P., Tidwell, P. D., Loftin, R. B., Johnston, B. E., Lyde, C. L., & Weathington, B. (1996). An experimental evaluation of virtual reality for training teamed navigation skills (University of Houston, Virtual Environment Technology Laboratory, Tech. Rep. No. 96-01). Brindley, G. S. (1960). Physiology of the retina and the visual pathway. Baltimore: William & Wilkins. Browder, G. B., & Butrimas, S. K. (1981). Visual technology research simulator—Visual and motion system dynamics (Tech. Rep. No. IH-326). Orlando, FL: Naval Training Equipment Center. Casali, J. G. (1986). Vehicular simulation-induced sickness: Vol. 1. An overview (Tech. Rep. No. NTSC-TR-86-010). Orlando, FL: Naval Training Systems Center. Cheung, B. S. K., Howard, I. P., & Money, K. E. (1991). Visually-induced sickness in normal and bilaterally labyrinthine-defective subjects. Aviation, Space, and Environmental Medicine, 62, 527– 531. Cheung, B. S. K., Howard, I. P., Nedzelski, J. M., & Landolt, J. P. (1989). Circularvection about Earth-horizontal axes in bilateral labyrinthine-defective subjects. Acta Otolaryngol (Stockholm), 108, 336–344. Chien, Y. Y., & Jenkins, J. (1994). Virtual reality assessment. Washington, DC: U.S. Government Printing Office. (A report of the Task Group on Virtual Reality to the High Performance Computing and Communications and Information Technology Subcommittee of the Information and Communications Research and Development Committee of the National and Science Technology Council). Chinn, H. I., & Smith, P. K. (1955). Motion sickness. Pharmacol. Review, 7, 33–82. Colehour, J. K., & Graybiel, A. (1966). Biochemical changes occurring with adaptation to accelerative forces during rotation (Joint Rep. No. NAMI-959). Pensacola, FL: National Aeronautics and Space Administration/U.S. Naval Aerospace Institute. Crampton, G. (Ed.). (1990). Motion and space sickness. Boca Raton, FL: CRC, Press. Crampton, G. H., & Young, F. A. (1953). The differential effect of a rotary visual field on susceptibles and nonsusceptibles to motion sickness. The Journal of Comparative and Physiological Psychology, 46(6), 451–453. Crosby, T. N., & Kennedy, R. S. (1982, May). Postural disequilibrium and simulator sickness following flights in a P3–C operational flight trainer. In Preprints of the 53rd Annual Scientific Meeting of the Aerospace Medical Association (pp. 147–148). Bal Harbour, FL: Aerospace Medical Association. Darwin, E. (1794). Zoonomia: Or, the laws of organic life. Dublin, Ireland: Byrne & Jones. Dichgans, J., & Brandt, T. (1972). Visual–vestibular interaction and motion perception. In J. Dichgans & E. Bizzi (Eds.), Cerebral control of eye movements and motion perception (pp. 327–338). Basel, NY: Karger. Dichgans, J., & Brandt, T. (1973). Optokinetic motion sickness as pseudo-Coriolis effects induced by moving visual stimuli. Acta Otolaryngology, 76, 339–348. Dichgans, J., & Brandt, T. (1978). Visual–vestibular interaction: Effects on self-motion perception and postural control. In R. Held, H. W. Leibowitz, & H. L. Teuber (Eds.), Handbook of sensory physiology: Vol. 8. Perception (pp. 756–795). Berlin: Springer-Verlag.
DiZio, P., & Lackner, J.R. (1997). Circumventing side effects of immersive virtual environments, In M. Smith, G. Salvendy, & R. Koubek (Eds.). Design of computing systems: Social and ergonomic considerations. Amsterdam: Elsevier, pp. 893–896. Dolezal, H. (1982). Living in a world transformed: Perceptual and performatory adaptation to a visual distortion. New York: Academic Press. Durlach, N. I., & Mavor, A. S. (Eds.). (1995). Virtual reality: Scientific and technological challenges. Washington, DC: National Academy Press. Ebenholtz, S. M. (1988). Sources of asthenopia in Navy flight simulators. Alexandria, VA: Defense Logistics Agency, Defense Technical Information Center. (AD No. A212 699) Eskin, A., & Riccio, D. C. (1966). The effects of vestibular stimulation on spontaneous activity in the rat. Psychological Record, 16(4), 523. Fowlkes, J. E., Kennedy, R. S., & Allgood, G. O. (1990, April). Biomedical evaluation and systemsengineering for simulators (BESS). Paper presented at the International Training Equipment Conference and Exhibition (ITEC), Birmingham, England. Frank, L. H., Kennedy, R. S., Kellogg, R. S., & McCauley, M. E. (1983). Simulator sickness: Reaction to a transformed perceptual world. I. Scope of the problem (Contract No. 81-C-0105). Orlando, FL: Naval Training Equipment Center. (AD No. A192 438) Graybiel, A., Clark, B., & Zarriello, J. J. (1960). Observations on human subjects living in a “slow rotation” room for periods of two days. Archives of Neurology, 3, 55–73. Graybiel, G. A., Guedry, F. E., Johnson, W., & Kennedy, R. S. (1961). Adaptation to bizarre stimulation of the semicircular canals as indicated by the oculogyral illusion. Aerospace Medicine, 32, 321–327. Graybiel, A., & Knepton, J. (1976). Sopite syndrome: A sometimes sole manifestation of motion sickness. Aviation, Space, and Environmental Medicine, 47, 873–882. Graybiel, A., Wood, C. D., Miller, E. F., II, & Cramer, D. B. (1968). Diagnostic criteria for grading the severity of acute motion sickness. Aerospace Medicine, 39, 453. Gower, D. W., Lilienthal, M. G., Kennedy, R. S., & Fowlkes, J. E. (1987, September). Simulator sickness in U.S. Army and Navy fixed- and rotary-wing flight simulators. In AGARD Conference Proceedings No. 433: Motion Cues in Flight Simulation and Simulator-Induced Sickness (pp. 8.1– 8.20). Neuilly-sur-Seine, France: Advisory Group for Aerospace Research and Development. Harm, D. L., & Parker, D. E. (1993). Perceived self-orientation and self-motion in microgravity, after landing and during preflight adaptation training. Journal of Vestibular Research, Equilibrium & Orientation, 3, 297–305. Hettinger, L. J. (2002). Illusory self-motion in virtual environments. In K. Stanney (Ed.), Handbook of virtual environments. (pp. 471–491) Mahwah, NJ: Lawrence Erlbaum Associates. Hettinger, L. J., Berbaum, K. S., Kennedy, R. S., Dunlap, W. P., & Nolan, M. D. (1990). Vection and simulator sickness. Military Psychology, 2(3), 171–181. Hettinger, L. J., Nolan, M. D., Kennedy, R. S., Berbaum, K. S., & Schnitizius, K. P., & Edinger, K. M. (1987). Visual display factors contributing to simulator sickness. Proceedings of the 31st Annual Meeting of the Human Factors Society (pp. 497–501). Santa Monica, CA: Human Factors Society. Hettinger, L. J., & Riccio, G. E. (1992). Visually induced motion sickness in virtual environment. Presence, 1, 306–310. Homick, J. L. (1982). Space motion sickness (Tech. Rep. No. USC 18681). Houston, TX: NASA Johnson Space Center. Howard, I. P., & Howard, A. (1994). 
Vection: The contributions of absolute and relative visual motion. Perception, 23, 745–751. Kellogg, R. S., Castore, C., & Coward, R. E. (1980, May). Psychophysiological effects of training in a full vision simulator. In Preprints of the 51st Annual Meeting of the Aerospace Medical Association (pp. 203–208). Anaheim, CA. Kellogg, R. S., Kennedy, R. S., & Graybiel, A. (1965). Motion sickness symptomatology of labyrinthine defective and normal subjects during zero gravity maneuvers. Aerospace Medicine, 36, 315– 318.
Kennedy, R. S. (1996). Analysis of simulator sickness data (Tech. Rep. under Contract No. N61339-91D-0004 with Enzian Technology, Inc.). Orlando, FL: Naval Air Warfare Center, Training Systems Division. Kennedy, R. S., Berbaum, K. S., Lilienthal, M. G., Dunlap, W. P., Mulligan, B. E., & Funaro, J. F. (1987). Guidelines for alleviation of simulator sickness symptomatology (Tech. Rep. No. TR-87007). Orlando, FL: Naval Training Systems Center. Kennedy, R. S., & Fowlkes, J. E. (1992). Simulator sickness is polygenic and polysymptomatic: Implications for research. International Journal of Aviation Psychology, 2(1), 23–38. Kennedy, R. S., & Frank, L. H. (1986). A review of motion sickness with special reference to simulator sickness (Tech. Rep. No. 81-C-0105-16). Orlando, FL: Naval Training Equipment Center. Kennedy, R. S., & Graybiel, A. (1965). The Dial test: A standardized procedure for the experimental production of canal sickness symptomatology in a rotating environment (Rep. No. 113, NSAM 930). Pensacola, FL: Naval School of Aerospace Medicine. Kennedy, R. S., Graybiel, R. C., McDonough, R. C., & Beckwith, F. D. (1968). Symptomatology under storm conditions in the North Atlantic in control subjects and in persons with bilateral labyrinthine defects. Acta Otolaryngologica, 66, 533–540. Kennedy, R. S., Hettinger, L. J., Harm, D. L., Ordy, J. M., & Dunlap, W. P. (1996). Psychophysical scaling of circular vection (CV) produced by optokinetic (OKN) motion: Individual differences and effects of practice. Journal of Vestibular Research, 6(5), 331–341. Kennedy, R. S., Hettinger, L. J., & Lilienthal, M. G. (1990). Simulator sickness. In G. H. Crampton (Ed.), Motion and space sickness (pp. 179–215). Boca Raton, FL: CRC Press. Kennedy, R. S., Jones, M. B., Lilienthal, M. G., & Harm, D. L. (1994). Profile analysis of aftereffects experienced during exposure to several virtual reality environments. In AGARD Conference Proceedings—“Virtual Interfaces: Research and Applications” (AGARD-CP-541, pp. 2.1–2.9). Neuilly-sur-Seine, France: Advisory Group for Aerospace Research & Development. Kennedy, R. S., Jones, M. B., Stanney, K. M., Ritter, A. D., & Drexler, J. M. (1996). Human factors safety testing for virtual environment mission-operation training (Final Rep., Contract No. NAS919482). Houston, TX: NASA Johnson Space Center. Kennedy, R. S., Lane, N. E., Berbaum, K. S., & Lilienthal, M. G. (1993). Simulator Sickness Questionnaire (SSQ): A new method for quantifying simulator sickness. International Journal of Aviation Psychology, 3(3), 203–220. Kennedy, R. S., Lane, N. E., Lilienthal, M. G., Berbaum, K. S., & Hettinger, L. J. (1992). Profile analysis of simulator sickness symptoms: Application to virtual environment systems. Presence, 1(3), 295–301. Kennedy, R. S., Lanham, D. S., Drexler, J. M., Massey, C. J., & Lilienthal, M. G. (1995). Cybersickness in several flight simulators and VR devices: A comparison of incidences, symptom profiles, measurement techniques and suggestions for research. In M. Slater (Ed.), Proceedings of the Conference of the FIVE Working Group, Framework for Immersive Virtual Environments (ESPRIT Working Group 9122, pp. 243–251). London: QMW University of London. Kennedy, R. S., Lilienthal, M. G., Berbaum, K. S., Baltzley, D. R., & McCauley, M. E. (1989). Simulator sickness in U.S. Navy flight simulators. Aviation, Space, and Environmental Medicine, 60, 10–16. Kennedy, R. S., Moroney, W. F., Bale, R. M., Gregoire, H. G., & Smith, D. G. (1972). 
Comparative motion sickness symptomatology and performance decrements occasioned by hurricane penetrations in C-121, C-130, and P-3 Navy aircraft. Aerospace Medicine, 43(11), 1235–1239. Kennedy, R. S., & Stanney, K. M. (1996a). Postural instability induced by virtual reality exposure: Development of a certification protocol. International Journal of Human–Computer Interaction, 8(1), 25–47. Kennedy, R. S., & Stanney, K. M. (1996b). Virtual reality systems and products liability. The Journal of Medicine and Virtual Reality, 60–64.
Kennedy, R. S., Stanney, K. M., Compton, D. E., Drexler, J. M., & Jones, M. B. (1999). Virtual environment adaptation assessment test battery (Phase II Final Rep., Contract No. NAS9-97022). Houston, TX: NASA Johnson Space Center. Kennedy, R. S., Stanney, K. M., Dunlap, W. P., & Jones, M. B. (1996). Virtual environment adaptation assessment test battery (Final Rep., Contract No. NAS9-19453). Houston, TX: NASA Johnson Space Center. Kennedy, R. S., Tolhurst, G. C., & Graybiel, A. (1965). The effects of visual deprivation on adaptation to a rotating environment (Research Rep. No. 106, NSAM 918). Pensacola, FL: U.S. Naval School of Aviation Medicine. Kingdon, K. S., Stanney, K. M., & Kennedy, R. S. (2001). Extreme responses to virtual environment exposure. Proceedings of the Human Factors and Ergonomics Society 45th Annual Meeting (pp. 1906-1911). Santa Monica, CA: Human Factors and Ergonomics Society. Kohler, I. (1968). The formation and transformation of the perceptual world. In R. N. Haber (Ed.), Contemporary theory and research in visual perception (pp. 474–497). New York: Holt, Rinehart & Winston. Kolasinski, E. M. (1995, May). Simulator sickness in virtual environments (Tech. Rep. No. 1027). Orlando, FL: U.S. Army Research Institute. Kolasinski, E. M. (1996). Prediction of simulator sickness in a virtual environment. Unpublished doctoral dissertation, University of Central Florida, Orlando. Lackner, J. R., & Graybiel, A. (1984). Elicitation of motion sickness by head movements in the microgravity phase of parabolic flight maneuvers. Aviation, Space, and Environmental Medicine, 55(6), 513–520. Lampton, D. R., Kolasinski, E. M., Knerr, B. W., Bliss, J. P., Bailey, J. H., & Witmer, B. G. (1994). Side effects and aftereffects of immersion in virtual environments. In Proceedings of the Human Factors and Ergonomics Society 38th Annual Meeting (pp. 1154–1157). Santa Monica, CA: Human Factors and Ergonomics Society. Lane, N. E., & Kennedy, R. S. (1988, June). A new method for quantifying simulator sickness: Development and application of the simulator sickness questionnaire (SSQ) (Technical Report No. EOTR 88-7). Orlando, FL: Essex Corporation. Likert, R. (1967). The human organization. New York: McGraw-Hill. Machover, C. (1996). What virtual reality needs. Information Display, 12(6), 32–34. McCauley, M. E., & Cook, A. M. (1987). Simulator sickness research program at NASA-Ames Research Center. In Proceedings of the Human Factors Society 31st Annual Meeting (pp. 502–504). Santa Monica, CA: The Human Factors Society. McCauley, M. E., & Kennedy, R. S. (1976, September). Recommended human exposure limits for verylow-frequency vibration (Technical Publication No. TP-76-36). Point Magu, CA: Pacific Missile Test Center. McCauley, M. E., & Sharkey, T. J. (1992). Cybersickness: Perception of self-motion in virtual environments. Presence, 1(3), 311–318. McGuinness, J., Bouwman, J. H., & Forbes, J. M. (1981). Simulator sickness occurrences in the 2E6 Air Combat Maneuvering Simulator (ACMS) (Tech. Rep. No. 80-C-0315-4500-1). Orlando, FL: Naval Training Equipment Center. (AD A097 742/1) Military Standard 1472C. (1981). Human engineering design criteria for military systems, equipment and facilities (MIL-STD-1472C). Washington, DC: Department of Defense. Money, K. E. (1970). Motion sickness. Psychological Reviews, 50(1), 1–39. Money, K. E. (1972). Measurement of susceptibility to motion sickness. In AGARD Conference Proceedings No. 109: Predictability of Motion Sickness in the Selection of Pilots (pp. 
B2-1-B2-4). Neuilly-sur-Seine, France: Advisory Group for Aerospace Research and Development. Oman, C. M. (1991). Sensory conflict in motion sickness: An observer theory approach. In S. R. Ellis, M. K. Kaiser, & A. C. Grunwald (Eds.), Pictorial communication in virtual and real environments (pp. 362–376). New York: Taylor & Francis.
13 A Cybernetic Analysis of the Tunnel-in-the-Sky Display

Max Mulder and J. A. Mulder
Faculty of Aerospace Engineering, Delft University of Technology

Henk G. Stassen
Faculty of Design, Engineering and Production, Mechanical Engineering and Marine Technology, Delft University of Technology
THE FUTURE OF AIR TRAFFIC MANAGEMENT

The management of air traffic will change dramatically in the near future. The increasing number of delays near congested airports indicates a capacity problem in the air traffic management (ATM) system that is currently operational. With a predicted 270% growth in passenger transportation volume before 2015 (RAND, 1993), this problem threatens the economy and safety of the air transportation system. New technologies are being developed with the dual objective of increasing air traffic efficiency and enhancing safety. The future ATM system is based on two principles. First, flexibility in air traffic control is increased by allowing complex curved approach and departure profiles. Second, aircraft must follow these profiles with high accuracy in both position and time, yielding a four-dimensional (4-D) navigation environment. In the future, aircraft must remain within a limited volume of airspace—a bubble in the sky—that moves along the planned trajectory in time. The introduction of 4-D navigation procedures has a significant impact not only on air traffic control but also on the flight crew.
It is a well-known fact that considerable problems exist in the modern automated cockpit environment. Phenomena such as a lack of situation awareness and of flight control system mode awareness may be caused by inadequate interface design and an inappropriate implementation of the automated systems. Current research in cockpit interface design is directed at bringing pilots "back into the loop," with a more active role in the operation of their vehicles. In the future 4-D navigation environment, the pilot task demand load will increase substantially, and situation awareness must therefore be improved. Enhancing flight-deck automation even further is a feasible option to solve these problems but may move the pilot even more out of the loop.

The Tunnel-in-the-Sky Display

Another possible solution for the issues sketched above is to improve the presentation of the 4-D guidance and navigation information by means of intuitive displays (Oliver, 1990). A perspective flight-path display, such as the tunnel-in-the-sky display (Fig. 13.1), is a viable candidate to become the primary flight display of the future. The tunnel display shows a three-dimensional presentation of the planned trajectory in addition to the status information of a conventional primary flight display. Previous research has shown that a tunnel display has significant advantages over conventional displays. It allows high-precision, manual trajectory following (Grunwald, 1984; Theunissen, 1995; Wilckens, 1973), improves situation awareness (Parrish, Busquets, Williams, & Nold, 1994; Regal & Whittington, 1995), and is compatible with pilot manual and supervisory tasks (Funabiki, 1997; Grunwald, 1984).

The application of a tunnel-in-the-sky display in the cockpit has important consequences for the conduct of flight. With conventional displays, the pilot mentally reconstructs the aircraft's spatiotemporal situation from a number of planar, two-dimensional displays (Haskell & Wickens, 1993). With a perspective display, this information is presented in a spatial format, similar to the information humans are confronted with in daily life. The design of a tunnel display should therefore be preceded by a study of human characteristics in visually processing these spatiotemporal visual scenes. This chapter presents a cybernetic approach—centered on dynamic information processing and control—in order to analyze the tunnel-in-the-sky display, which can be integrated with concepts originating from the ecological approach to visual perception (Gibson, 1986).

A CYBERNETIC ANALYSIS

At the Delft University of Technology, a research project was initiated in 1993 to investigate the applicability of a tunnel display for pilot manual control tasks. In contrast to other studies, the goal of the research was not to compare the tunnel
FIG. 13.1. The tunnel-in-the-sky display. The aircraft reference symbol (1) is positioned in the center of the display. The tunnel projection (2) shows the reference trajectory. The two linear tapes on the left and right depict the aircraft velocity (3) and altitude (4), respectively. The horizon line (5) marks the aircraft attitude. The aircraft heading and bank angles are shown by the heading indicators (6) and the bank indicator (7).
display with current displays in terms of pilot performance, situation awareness, and workload. Rather, the objective was to understand how pilots use the tunnel display as their main source of information in the aircraft guidance task (Mulder, 1995). A methodology was developed, labeled the cybernetic approach, which allows substantial insight into the effects of varying display designs on pilot behavior (Mulder, 1999). It is an integrated multiple-stage approach for the study of pilot manual control behavior centered around information, more specifically, information used for control, and it resembles active psychophysics (Flach, 1991). Four steps can be distinguished (cf. Warren, 1988). First, an analysis is conducted of a pilot’s tasks and needs. Second, the perceptual cues that are available and used by the pilot are investigated. The third and fourth stages consist of empirical studies and a mathematical, model-based analysis. The present chapter deals with the second stage only. (For the complete analysis, the reader is referred to Mulder, 1999.) The conclusions of a task analysis are discussed first, followed by an introduction on the concepts of information transfer and information processing.
Next, the structure of the cybernetic analysis is explained, as are the principal subjects of investigation.

The Pilot Manual Control Tasks

Mulder (1995) reasoned that the task of manually guiding an aircraft along a curved trajectory can be divided into several subtasks. When it is assumed that the trajectory consists of a concatenation of straight and curved segments, all subtasks can be categorized as either regulation or anticipation tasks. In a regulation task, which is the subject here, the pilot attempts to maintain a stationary flight condition corresponding to time-invariant guidance constraints. In an anticipation task, the pilot controls a transient maneuver between two stationary flight conditions. In both tasks, the pilot needs information about the aircraft spatiotemporal situation with respect to the guidance constraints.

Information Transfer and Information Processing

With conventional displays, the aircraft status is shown through a set of individual and semi-integrated indicators. The tunnel display, however, shows the aircraft flight condition with respect to the trajectory in an integrated, spatial fashion through the projection of a regular geometrical shape—the tunnel—on the view plane. Because of their high proficiency in visually processing dynamic three-dimensional scenes, humans seem to pick up the information much more easily. But what does this mean? This question is addressed through an analysis of two intimately related cybernetic issues, information transfer and information processing. These terms do not indicate stages in perception. Rather, they are used here to distinguish between those properties of interaction that can be described in mathematical terms and those that cannot. Information processing deals with the question of how pilots perceive and use the information presented in conducting their task. It focuses on the cognitive and perceptual mechanisms of the human observer and guides the selection and analysis of the most informative optical cues conveyed by the display. Information transfer refers to a mathematical description of the characteristics of the optical cues in terms of the aircraft state, the variables affecting tunnel geometry, and the properties of the perspective projection. In short, information transfer is a matter of geometry, whereas information processing is a matter of psychology (Warren & Owen, 1982).

The Ecological Approach

In the tunnel display context, the principal issue in investigating human information processing is the perception of three-dimensional scenes in motion. The earliest developments in this field were conducted by perceptual psychologists,
most notably James J. Gibson (1950, 1986). Central in his ecological approach to visual perception is the locomotion of an observer with respect to a surface. In the 1980s, a number of empirical studies were conducted to investigate the potential sources of information for altitude control (e.g., Johnson, Tsang, Bennett, & Phatak, 1989; Wolpert, 1988; Wolpert & Owen, 1985). In these experiments, participants controlled their altitude during locomotion above a flat ground surface. Three ground textures were examined: (1) lines parallel to the direction of motion, the projection of which makes available the optical splay angle information; (2) lines perpendicular to the direction of motion, the projection of which conveys optical depression angle information (also referred to as optical density); and (3) the combination of both textures. Optical splay and optical density are isolated components from the global optical expansion pattern that results from an approach to a ground surface (Gibson, 1950). Texture parallel to the direction of motion isolates the perspective gradient, whereas texture perpendicular to the direction of motion isolates the compression gradient (Cutting & Millard, 1984). The main interest in these experiments was in identifying the source of information that proved to be the most useful for the control of altitude above a surface: optical splay angle, optical density, or both? Although the results were rather contradictory (see Flach, Hagen, & Larish, 1992, for an excellent survey), the theoretical concepts involved have significant implications in the tunnel display context (Mulder, 1999). The control of distance with respect to a surface is very relevant in the task of guiding the aircraft through a tunnel. The tunnel walls limit the lateral and vertical motion of the aircraft with respect to the trajectory, and the aircraft will always approach two of the four tunnel planes. Because the tunnel is presented by means of a perspective wire frame, conveying the same cues as introduced above, it is hypothesized that these cues—optical splay and optical density, or, equivalently, the gradients of perspective and compression—are essential in examining the tunnel display. Rectilinear and Curvilinear Motion In accordance with the separation of the pilot’s guidance task into several subtasks, the information conveyed by the tunnel display can be analyzed for two elementary motion conditions—rectilinear and curvilinear motion (Gordon, 1966; Mulder, 1999). In the next two sections, the rectilinear motion condition will be discussed, starting with a mathematical analysis—the information transfer—of potential optical cues. The goal is to investigate how small deviations from the reference flight condition are coded in the display. The mathematical analysis goes in depth and applies concepts originating from several fields, such as aircraft kinematics and optical gradient theories. The selection of potential cues is based on the findings from a literature survey (Mulder,
1999). Then, the pilot’s perception and use—information processing—of the selected array of optical cues in order to control the aircraft in the rectilinear motion condition will be discussed. Here, the functionality of the optical cues for the pilot’s task is analyzed in terms of the findings reported in the literature together with their characteristics as specified by the mathematical analysis.
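To make the distinction between the two ground-texture gradients introduced above concrete, the following sketch computes the optical splay angle of a ground line parallel to the direction of motion and the optical depression angle of a ground element ahead of the observer, for locomotion at a given altitude. This is a minimal illustration using one common formalization of these angles (a pinhole projection with the viewing axis parallel to the ground); the exact conventions differ across the studies cited above, so the function names and definitions here are illustrative rather than taken from any of them.

```python
import math

def splay_angle(lateral_offset, altitude):
    """Optical splay angle (rad) of a ground line parallel to the direction
    of travel, measured from the vertical meridian of the image: the line's
    projection rotates away from vertical as altitude decreases."""
    return math.atan2(lateral_offset, altitude)

def depression_angle(distance_ahead, altitude):
    """Optical depression angle (rad) of a ground element below the horizon;
    the gradient of these angles over distance carries the compression
    (optical density) information."""
    return math.atan2(altitude, distance_ahead)

# Descending from 100 m to 50 m altitude: the splay angle of a line 30 m to the
# side opens up (perspective gradient), while the depression angle of a ground
# element 300 m ahead shrinks toward the horizon, so texture perpendicular to
# the motion packs together near the horizon (compression gradient).
for z in (100.0, 50.0):
    s = math.degrees(splay_angle(lateral_offset=30.0, altitude=z))
    d = math.degrees(depression_angle(distance_ahead=300.0, altitude=z))
    print(f"altitude {z:5.1f} m: splay {s:5.1f} deg, depression {d:5.1f} deg")
```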
INFORMATION IN STRAIGHT TUNNEL SECTIONS

Static and Dynamic Optical Cues

The analysis will show that it is useful to distinguish between static and dynamic optical cues. Static cues are sources of information that can be discerned from a single snapshot of the display. Dynamic cues are defined as sources of information that can be discerned from the animated picture resulting from the motion through the tunnel. It will be shown that the static cues of linear perspective essentially provide information about the attitude and position of the aircraft relative to the tunnel. The flight path—the direction of motion of the aircraft relative to the trajectory—cannot, in principle, be perceived from a static tunnel image. Information about the flight path is conveyed by the dynamic optical cues of motion perspective. The dynamic cues can be categorized one step further. First, there are the time derivatives of the static optical cues, labeled indirect dynamic cues. Second, the cues resulting from the optic flow of the animated image are labeled direct dynamic sources of information (Mulder, 1999). The potential sources of visual information in the tunnel display for the task of following a straight trajectory are discussed in this section. First, a generic tunnel geometry is defined. Then the static cues are examined, followed by a study of the dynamic cues.

Definition of the Situation

The general situation considered here is illustrated in Fig. 13.2, which shows the (instantaneous) tunnel display image when flying through a straight section of the tunnel. Three elements of the tunnel geometry are distinguished: (1) the tunnel frames (F), which are positioned in the longitudinal direction; (2) the longitudinal frame lines (Li, i = 1–4), connecting the vertices of the tunnel frames; and (3) the altitude poles (A), which connect each individual frame with the Earth's surface. The trajectory has a nonzero downslope γt (defined positive when downward) and a zero reference heading angle. The tunnel frames have a width Wt and a height Ht, are positioned at intermediate distances Dt, and are numbered 1, 2, and so forth, starting from the first visible frame. The aircraft is positioned in the tunnel with an arbitrary position and attitude with respect to the tunnel
FIG. 13.2. The various geometrical elements of a generic tunnel-in-the-sky display. The tunnel geometry consists of frames (F) positioned on altitude poles (A) and connected with four longitudinal frame lines (Li , i = 1 − 4). The subsequent tunnel frames are numbered 1, 2, and so forth, starting from the first visible frame. The aircraft reference symbol (C) marks the display center. The horizon line (H) shows the aircraft attitude with respect to the world.
centerline. The attitude of the aircraft with respect to the world is defined with the three Euler angles ψ (heading), θ (pitch), and φ (roll). The aircraft position is defined relative to the tunnel centerline in terms of a position error in the lateral (Xe) and vertical (Ve) directions, defined positive to the right of and below the tunnel centerline.

Static Optical Cues

Cue Inventory

When the aircraft is positioned in the tunnel with an arbitrary position and attitude with respect to the straight trajectory, the display image will be similar to that of Fig. 13.2. The main static cues are described with reference to Fig. 13.3, which shows three subsets of cues (defined with respect to the view-plane frame of reference U, V) resulting from the projection of the longitudinal (Fig. 13.3a), vertical (Fig. 13.3b), and lateral (Fig. 13.3c) elements of the tunnel geometry. Figure 13.3a shows that the attitude angles θ and φ are directly coded in the display. In addition, the following cues can be defined:

1. The position of the vanishing point (u∞, v∞), defined as the projection on the view plane of an arbitrary point of the tunnel that lies an infinite distance ahead.
FIG. 13.3. The three subsets of static optical cues in a straight tunnel section: (a) the longitudinal tunnel cues (1)–(2); (b) the lateral tunnel cues (3)–(4); (c) the vertical tunnel cues (5)–(6).
2. The optical splay angles (i = 1–4), defined as the angles of the longitudinal frame lines with respect to the horizon. A fifth splay angle can be defined for the virtual line connecting the tops of all altitude poles.

Figure 13.3b shows the lateral tunnel cues:

3. The lateral displacements εi (left) and ηi (right) of the vertical frame lines (frame i) with respect to the rotated view-plane centerline V. The lateral displacements πi of the altitude poles are similar cues.
4. The relative lateral displacements εij (left), ηij (right), and πij of the vertical frame lines and the altitude poles of frames i and j.

Figure 13.3c shows the vertical tunnel cues:

5. The vertical displacements µi (bottom) and νi (top) of the lateral frame lines (frame i) with respect to the rotated view-plane centerline U.
6. The relative vertical displacements µij (bottom) and νij (top) of the lateral frame lines of frames i and j.

Mathematical expressions can be derived that relate the optical cues to the aircraft position and attitude with respect to the tunnel trajectory. These expressions are in principle nonlinear and must be linearized for later use. A suitable linearization point that is applied throughout the analysis is that of zero position errors and small attitude angles. The latter assumption allows the application of small-angle equivalents of geometric functions. The linearized expressions can be used to study the effects of (small) deviations from the linearization point on the tunnel image. (Only the main findings of the mathematical survey are reported here; for a comprehensive study, the reader is referred to Mulder, 1999.)

Position of the Vanishing Point

The position of the vanishing point on the view plane can be approximated by the following equations:

u∞ = −κ ψ
v∞ = −κ (θ + γt)

with κ a display constant originating from the perspective projection. The position of the vanishing point in the lateral and vertical directions represents the lateral (ψ) and vertical (θ + γt) angular difference between the aircraft longitudinal axis (the viewing axis) and the tunnel centerline, respectively. Its position is independent of
the aircraft position error. Extending the planes of the top and bottom tunnel walls into the distance results in a horizontal pseudo-horizon located at −κγt below the true horizon (Fig. 13.3a). Extending the planes of the left and right tunnel walls into the distance yields a second—vertical—pseudo-horizon perpendicular to the true horizon. The vanishing point is the cross point of both pseudo-horizons and marks the projection of the tunnel centerline at infinity.

The Optical Splay Angles

The linearized relations for the splay angles yield equations for the change in splay angle ω from the reference condition; for example, for the first splay angle:

ω1 = −[2Wt / (Wt² + Ht²)] Ve − [2Ht / (Wt² + Ht²)] Xe

The same holds for the other splay angles, except for sign differences. The splay angle expressions are independent of the perspective projection (κ) and the aircraft attitude with respect to the tunnel. They are only a function of the position error with respect to the trajectory and the tunnel size, and provide strong cues for symmetry. When either the vertical or lateral position error is zero, symmetrical conditions result with respect to the horizontal and vertical pseudo-horizons, respectively. When neither Xe nor Ve is zero, the situation is not symmetric and all splay angles are rotated due to the effects of both position errors. Hence, the position errors are shown in a coupled fashion.

The Lateral Displacement Cues

All displacement cues are derived in the rotated view-plane reference frame U, V. The following linearized expressions hold:

εi = κ [ψ + (Wt + 2Xe) / (2Di)]
ηi = −κ [ψ − (Wt − 2Xe) / (2Di)]
πi = κ [ψ + Xe / Di]

with Di the distance to frame i. The relative lateral displacements εij (left), ηij (right), and πij (poles) are defined simply as εij = εi − εj, and so forth.
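As a rough numerical illustration of the splay-angle relation above, the sketch below evaluates the linearized change in the first splay angle for a given combination of position errors, making the coupling of the lateral and vertical errors in a single cue directly visible. It is a minimal sketch under the small-deviation assumptions of the analysis; the variable names are illustrative, and the remaining three splay angles differ only in sign, as noted above.

```python
def splay_change(Xe, Ve, Wt, Ht):
    """Linearized change in the first splay angle (rad) for a lateral error Xe
    and a vertical error Ve (meters), in a tunnel of width Wt and height Ht.
    Both position errors contribute, so the cue is coupled."""
    denom = Wt**2 + Ht**2
    return -(2.0 * Wt / denom) * Ve - (2.0 * Ht / denom) * Xe

Wt, Ht = 45.0, 45.0  # tunnel cross-section as in the example of Fig. 13.4
print(splay_change(Xe=0.0, Ve=5.0, Wt=Wt, Ht=Ht))    # vertical error only
print(splay_change(Xe=-15.0, Ve=0.0, Wt=Wt, Ht=Ht))  # lateral error only
print(splay_change(Xe=-15.0, Ve=5.0, Wt=Wt, Ht=Ht))  # combined: effects add
```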
Expressions for the relative displacements can be derived using the appropriate linearized equations.

The Vertical Displacement Cues

The derivation of the vertical displacement cues is similar to that of their lateral companions, yielding the following equations:

µi = κ [(θ + γt) + (Ht − 2Ve) / (2Di)]
νi = −κ [(θ + γt) − (Ht + 2Ve) / (2Di)]

These expressions form the basis of the computation of all other vertical displacement cues, conducted in a similar fashion as for their lateral counterparts. The displacement cues show the aircraft attitude and the aircraft position errors in an uncoupled fashion: The lateral expressions are independent of the properties in the vertical dimension, and vice versa. In both dimensions, the displacement of a frame i on the display is due to the attitude and position error with respect to the tunnel. The contribution of the position error is scaled with the distance to the frame Di; increasing distances reduce the effects of a position error. When the distance goes to infinity, the vanishing point is obtained. The relative displacements of tunnel frames i and j depend only on the position error and a compression factor, determined by the distances to the frames involved. When the vanishing point is taken as the reference, the vertical displacement of a lateral frame line with respect to this point—or, equivalently, with respect to the horizontal pseudo-horizon—can be regarded as a depression angle. The same holds for the lateral displacement of a vertical frame line with respect to the vertical pseudo-horizon. Hence, it is hypothesized that the two pseudo-horizons are the primary reference for the displacements of the tunnel frames. For instance, consider the lateral dimension (Fig. 13.3b). A position error to the left of the trajectory leads to a compression of the "texture" of the left tunnel wall—the vertical elements of the tunnel frames at the left of the vertical pseudo-horizon—and an expansion of the texture of the right tunnel wall—the vertical elements of the tunnel frames at the right of the vertical pseudo-horizon:

εi∞ = εi − ε∞ = κ (Wt + 2Xe) / (2Di);    ηi∞ = ηi − η∞ = κ (Wt − 2Xe) / (2Di)

The same holds in the vertical dimension (Fig. 13.3c) for the compression of the texture on the bottom and top tunnel walls. The two pseudo-horizons could serve as a reference for the compression of the texture elements—the lateral and vertical lines of the tunnel frames—on the top/bottom and left/right tunnel walls.
To summarize: The attitude of the aircraft with respect to the tunnel centerline is conveyed by the vanishing point. The position of the aircraft with respect to the trajectory is conveyed by cues that represent the gradients of optical perspective— the splay angles—and optical compression—the relative displacement cues. The static cues do not convey any information about the relative motion with respect to the tunnel trajectory.
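A compact numerical sketch of these static cues is given below: it evaluates, under the same small-deviation assumptions, the vanishing-point position and the frame displacements relative to the two pseudo-horizons for frames at different distances. The display constant and the state values are illustrative rather than taken from a particular implementation; the sketch only serves to show the uncoupled, distance-scaled character of the compression cues discussed above.

```python
import math

KAPPA = 600.0  # illustrative display scaling constant (e.g., pixels per radian)

def vanishing_point(psi, theta, gamma_t):
    """View-plane position of the vanishing point: attitude relative to the
    tunnel only, independent of the position errors."""
    return -KAPPA * psi, -KAPPA * (theta + gamma_t)

def lateral_compression(Xe, Wt, Di):
    """Displacements of the vertical frame lines of frame i relative to the
    vertical pseudo-horizon (left wall, right wall): uncoupled from the
    vertical state and scaled by the distance Di to the frame."""
    eps = KAPPA * (Wt + 2.0 * Xe) / (2.0 * Di)
    eta = KAPPA * (Wt - 2.0 * Xe) / (2.0 * Di)
    return eps, eta

def vertical_compression(Ve, Ht, Di):
    """Displacements of the lateral frame lines of frame i relative to the
    horizontal pseudo-horizon (bottom wall, top wall)."""
    mu = KAPPA * (Ht - 2.0 * Ve) / (2.0 * Di)
    nu = KAPPA * (Ht + 2.0 * Ve) / (2.0 * Di)
    return mu, nu

# State loosely based on the example of Fig. 13.4 (angles in radians).
psi, theta, gamma_t = math.radians(4.0), math.radians(3.0), math.radians(3.0)
Xe, Ve, Wt, Ht = -15.0, 5.0, 45.0, 45.0

print("vanishing point:", vanishing_point(psi, theta, gamma_t))
for Di in (60.0, 120.0, 240.0):  # successive frames: the position effect fades
    print(Di, lateral_compression(Xe, Wt, Di), vertical_compression(Ve, Ht, Di))
```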
DYNAMIC OPTICAL CUES

Definition of a Rectilinear Flight Condition

The rectilinear flight condition is most relevant for the analysis of straight tunnel sections. In this condition, the aircraft travels along a straight line. The aircraft velocity vector V is constant in direction (with respect to the world) and magnitude (referred to as Vtas). Effects of air turbulence and wind are neglected. The aircraft rotation vector is set at zero. The attitude of the aircraft velocity vector with respect to the aircraft longitudinal axis is determined by two aerodynamic angles: the angle of attack α and the angle of sideslip β. The aircraft flight-path angle determines the aircraft direction of motion relative to the world. For small aircraft attitude and aerodynamic angles, the following relations hold: χw = ψ + β and γw = θ − α. Here, χw is defined as the flight-path azimuth angle and γw as the angle of climb. Following a straight trajectory implies that the aircraft velocity vector must remain aligned with this trajectory. Hence, the direction of motion relative to the trajectory constitutes the flight-path angle error: χe = χw and γe = −(γt + γw). When the flight-path angle error is small, the derivatives of the position errors Xe and Ve are equal to the product of the flight-path angle error and the aircraft velocity: Ẋe = Vtas χe and V̇e = Vtas γe.

Derivatives of the Static Optical Cues: Indirect Dynamic Cues

The static cues are defined from a frozen image of the tunnel. When moving through the tunnel, the static cues change in time. Hence, their time derivatives form the first category of dynamic cues, labeled the indirect cues. First, the position of the vanishing point on the view plane is determined by the aircraft attitude relative to the trajectory. Because the aircraft angular velocities are zero in rectilinear motion, the position of the vanishing point on the view plane does not change. It does not convey any information about the direction of motion with respect to the trajectory. Second, the derivatives of the splay angles are a function only of the derivatives of the position errors, which are in turn determined by the flight-path angle error and the aircraft velocity. For example, for the derivative of the change
in splay angle ω1, one obtains

ω̇1 = −[2Wt / (Wt² + Ht²)] Vtas γe − [2Ht / (Wt² + Ht²)] Vtas χe

The direction of motion relative to the trajectory is coded through the angular velocities of the splay angles. Similar to the splay angles themselves, their rates show the lateral and vertical direction of motion in a coupled fashion: A splay angle rate can be due to a lateral or a vertical flight-path angle error, or both. Now consider the displacements of the tunnel frames. Whereas the positions of the frames on the view plane are a function of both the attitude and the position relative to the tunnel, the relative displacements are only a function of the latter. Therefore, the relative velocity of a frame with respect to the stationary vanishing point conveys information about the flight-path angle error. For example, the derivatives of the lateral displacements relative to the vertical pseudo-horizon are

ε̇i∞ = +[Ẋe / (Wt/2 + Xe)] εi∞ + [Vtas / (Wt/2 + Xe)] (εi∞² / κ)
η̇i∞ = −[Ẋe / (Wt/2 − Xe)] ηi∞ + [Vtas / (Wt/2 − Xe)] (ηi∞² / κ)
for the left and right tunnel walls, respectively. The same holds for the derivatives of the vertical displacements with respect to the horizontal pseudo-horizon. The rates of the lateral displacements with respect to the pseudo-horizons are a function of the flight-path angle error in each dimension. The equations show the additive properties of two elements of flow (Flach et al., 1992). The first terms on the right-hand side show the fractional change in distance to the tunnel wall, that is, the temporal cue of time to contact with that particular wall. The second terms on the right-hand side show the relationship with the global optical flow rate. Both the flow components are scaled with the displacement relative to the pseudo-horizons. For a large displacement (close by), the effect of the second component on the right-hand side of the equations is amplified, whereas for smaller displacements (far away), it is the effect of the first component that dominates. The Optic Flow Field: Direct Optical Cues The Focus of Radial Expansion Longuet-Higgins and Prazdny (1980) have shown that the magnitude and direction of the flow velocities of an arbitrary point on the view plane can be computed as the sum of a translational motion and a rotational motion component. In the rectilinear motion condition, the aircraft rotation vector is neglected. The velocity
of a projected point on the view plane can then be expressed in the translational components only, yielding a radial expansion pattern. The focus of radial outflow (FRO) marks the origin of the flow pattern. Its position on the view plane can be computed as

uFRO = +κ β;    vFRO = −κ α
The aircraft aerodynamic angles determine the position of the FRO on the view plane. The FRO shows the attitude of the velocity vector with respect to the aircraft longitudinal axis. The Radial Expansion Pattern The optic flow pattern resulting from rectilinear motion is illustrated in Fig. 13.4 and can be described as straight radial lines through the FRO. The magnitude of the velocity vector of an arbitrary point is determined by: (1) the distance between the position of the FRO and the projection of that particular point on the viewplane (close to the FRO all velocities diminish); (2) the distance with respect to the viewplane (points that are positioned farther away move not as fast as points closer to the viewplane—motion parallax); and (3) the velocity of the locomotion (when the velocity increases, the magnitude of all flow vectors increase). The radial expansion pattern provides information about translation and relative depth. The future path (when the current state of motion remains the same) is specified in the optic flow pattern by the locomotor flow line, that is, the flow line passing directly below the observer (Lee & Lishman, 1977). Mulder (1999) showed that the global optic flow characteristics are identical to those of the indirect dynamic cues, stressing the redundancy in flow information. A flight-path angle error results in an equal contribution of flow for all points on the view plane, scaled by their distance to the view plane. The effect of a position error is scaled twice by the distance; that is, the contribution in flow asymmetry due to a position error is strongest for those points on the tunnel that are located near-by. Temporal Cues The optic flow pattern, as well as the derivatives of local gradients, convey temporal cues. The location of the FRO with respect to the tunnel geometry yields information about the time to contact with the tunnel walls. The impact point is locally specified by the FRO, but also by the radial expansion pattern itself. Consider Fig. 13.4a: Crossing the vertical flow line emerging from the FRO with the right tunnel wall shows the location of impact with that wall. The same holds for the horizontal flow line emerging from the FRO and the bottom tunnel wall. The flow pattern indicates the same information. When the FRO is put at the vanishing point, the flight-path angle error is set at zero and the rectilinear motion is aligned with the tunnel centerline (Fig. 13.4b). The texture compression rates in both the lateral and
FIG. 13.4. Radial flow pattern in rectilinear motion. The dotted lines show the theoretical radial flow pattern originating from the focus of radial outflow (circle). The dashed lines show the view-plane centerlines. The dash–dot lines mark the position of the vanishing point. The thick dashed line shows the locomotor flow line, projected on the bottom tunnel wall. The arrows show the velocities of the tunnel frame elements on the view plane. The aircraft attitude angles (ψ, θ), aerodynamic angles (α, β), and flight-path angles (χ, γ) are as indicated. The following state is plotted: Wt = Ht = 45 [m]; γt = 3 [deg]; Vtas = 70 [m/sec]; β = 3 [deg]; α = 7 [deg]; ψ = 4 (top), −3 (bottom) [deg]; θ = 3, +4 [deg]; γe = 1, 0 [deg]; χe = 7, 0 [deg]; Xe = −15 [m]; Ve = 5 [m].
FIG. 13.4a. Rectilinear flight with a nonzero flight-path-angle error.
FIG. 13.4b. Rectilinear flight with a zero flight-path-angle error.
vertical dimension contain similar temporal cues in terms of the fractional change in distance to the left/right and top/bottom tunnel walls, respectively. The greater the magnitude of this fractional change in distance, the larger the flight-path angle error and the smaller the time before impact with one of the tunnel walls. The discussion shows the importance of the dynamic optical cues for the presentation of flight-path information. Information about the relative motion of the aircraft with respect to the trajectory is redundant. It is coded in the global optic flow field—the location of the FRO and the radial flow pattern—and in local properties such as the splay rates and the texture compression rates. In the next section, the use of the cues in the control of rectilinear motion will be discussed from an information processing perspective.
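The sketch below puts the dynamic relations of this section together for the rectilinear case: the splay-angle rate, the left-wall displacement rate relative to the vertical pseudo-horizon split into its two flow components (the fractional-change, or time-to-contact, term and the global-flow term), and the position of the focus of radial outflow. As before, the numbers and the display constant are illustrative, and the sketch assumes the same small-angle, zero-rotation conditions as the derivation.

```python
import math

KAPPA = 600.0  # illustrative display scaling constant

def splay_rate(chi_e, gamma_e, Vtas, Wt, Ht):
    """Rate of change of the first splay angle: zero only when both
    flight-path angle errors are zero."""
    denom = Wt**2 + Ht**2
    return -(2.0 * Wt / denom) * Vtas * gamma_e - (2.0 * Ht / denom) * Vtas * chi_e

def left_wall_displacement_rate(Xe, chi_e, Vtas, Wt, Di):
    """Rate of the left-wall displacement relative to the vertical
    pseudo-horizon, split into its two additive flow components."""
    eps = KAPPA * (Wt + 2.0 * Xe) / (2.0 * Di)            # current displacement
    Xe_dot = Vtas * chi_e                                  # lateral error rate
    tau_term = (Xe_dot / (Wt / 2.0 + Xe)) * eps            # fractional change in
                                                           # distance (time to contact)
    flow_term = (Vtas / (Wt / 2.0 + Xe)) * eps**2 / KAPPA  # global optical flow rate
    return tau_term + flow_term, tau_term, flow_term

def focus_of_radial_outflow(alpha, beta):
    """View-plane position of the FRO, determined by the aerodynamic angles only."""
    return +KAPPA * beta, -KAPPA * alpha

Vtas, Wt, Ht = 70.0, 45.0, 45.0
chi_e, gamma_e = math.radians(7.0), math.radians(1.0)
print("splay rate [rad/s]:", splay_rate(chi_e, gamma_e, Vtas, Wt, Ht))
print("left-wall rate (total, tau term, flow term):",
      left_wall_displacement_rate(Xe=-15.0, chi_e=chi_e, Vtas=Vtas, Wt=Wt, Di=120.0))
print("FRO position:", focus_of_radial_outflow(math.radians(7.0), math.radians(3.0)))
```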
PERCEPTION AND CONTROL OF RECTILINEAR MOTION

In rectilinear flight, all rotation effects are neglected and the aircraft travels through the world along a straight line. This is a simplified abstraction of reality. In fact, rectilinear motion is a special case of curvilinear motion, namely, one in which the radius of circular motion is extremely large. The rectilinear motion condition is important in the analysis, because many concepts can be derived for this elementary state of locomotion. Guided by the findings reported in the literature, the tunnel display has been analyzed mathematically in the previous section. Below, the sources of optical information are studied from the perspective of the pilot. The principal hypothesis is that the main ecological stimulus of the pilot in controlling the aircraft through the tunnel is that of an approach to a surface. The discussion is categorized with respect to the perception of those variables that are important for controlling the aircraft along the reference trajectory. These motion referents, all defined with respect to the trajectory, are the attitude, position, and flight path of the aircraft. The optical cues conveying information about this set of motion referents are analyzed from the perspective of their functionality (Owen, 1990).

The Aircraft Attitude

The attitude of the aircraft with respect to the world is determined by the three Euler angles. The aircraft roll angle φ and pitch angle θ are shown by the tunnel display in a similar fashion as in conventional primary flight displays, that is, through the translation and rotation of the horizon with respect to the display center. The vanishing point marks the attitude of the aircraft with respect to the tunnel. As indicated in Fig. 13.3a, the two parallel lateral tunnel planes and the two parallel vertical tunnel planes at infinite distance result in two perpendicular axes—the pseudo-horizons through the vanishing point—rotated with the aircraft roll angle φ with respect to the view plane. The depression angle γt of the trajectory yields a
horizontal pseudo-horizon, with respect to which the vertical attitude with respect to the tunnel—(θ + γt)—can be perceived. The vertical pseudo-horizon marks the lateral attitude—heading (ψ)—of the aircraft with respect to the trajectory.

The Aircraft Position

The aircraft position with respect to the trajectory is conveyed by the tunnel display through the gradients of optical perspective and optical compression.

The Gradient of Optical Perspective

The optical splay angles conveyed by the tunnel display are a function of the position error and the tunnel size. A clear disadvantage of the splay angles is that a change in any of them can be caused by a lateral or vertical position error, or both. In automobile driving research, from which the potential use of splay angles originates (Beall & Loomis, 1996; Biggs, 1966; Riemersma, 1981), this disadvantage does not occur simply because the motion of a car is limited to the horizontal plane. The fact that the splay angles show the referents in a coupled fashion led to inferior performance in altitude control studies in which both the lateral and the vertical position disturbances were present (Johnson et al., 1989). Although it is possible that the pilot can mentally distinguish between the effects of a lateral and a vertical position error on a splay angle, this will require some cognitive effort. The splay angles also have some important virtues. First, because it is a property of the entire line, the change in splay angle is the same for all points on that line: The splay gain is independent of which part of the line is perceived (Beall & Loomis, 1996). Second, the splay angles are unaffected by the forward motion of the observer, which affects the perception of the other position cues. Third, the splay angles are unaffected by the aircraft attitude relative to the tunnel.

The Gradient of Optical Compression

The displacements of the elements of the tunnel frames with respect to the two pseudo-horizons convey optical compression information. An important advantage of the compression gradient cues is that they show the information in both dimensions in an uncoupled fashion. The aircraft position with respect to the four tunnel walls is shown through the compression of the texture—the lines of the tunnel frames in the appropriate planes—in the two approaching planes, and through the relative expansion of the texture in the two receding planes. The relative change in compression and expansion of the tunnel frame elements with respect to the pseudo-horizons, however, also depends on the position of the particular elements that are perceived. Hence, the gain in presenting the position referent depends on the distances from the observer to the particular local elements involved (Beall & Loomis, 1996), which is an important disadvantage. Another disadvantage of the fact that relative displacements must be perceived is that these displacements
themselves are affected by the forward motion. Thus, the observer must perceive a change in the relative displacements between frames on top of the displacements of these frames due to the forward motion (Crowell & Banks, 1996). This could have been the reason why splay angles were found superior in some of the altitude control studies (Wolpert, 1988; Wolpert & Owen, 1985). Some additional comments on the last disadvantage can be made. The position of the tunnel walls relative to the pseudo-horizons is itself not affected by the forward motion. To exploit this, however, the observer must perceive the tunnel walls as "walls" instead of as a concatenation of tunnel frames. It can be hypothesized that when the intermediate distance between frames (Dt) is small enough, the percept could be that of a global density instead of merely a local relative displacement.

To summarize: There are two main sources of optical information that specify a position error. Both are essentially static cues. The projection of the tunnel geometry conveys such salient and informative cues by itself that it can be hypothesized that the flow field cues can be regarded as contextual rather than functional (Owen, 1990). The functional cues are those of optical splay and optical compression. Both have their specific virtues, and it depends on the characteristics of the task and the display which of the cues is most effective in specifying the position error referent (Mulder, 1996).

The Flight Path

The attitude of the aircraft does not reveal the direction of locomotion with respect to the trajectory. The aerodynamic angles α (vertical) and β (lateral) depict the direction of the aircraft velocity vector with respect to the aircraft longitudinal axis. Together with the aircraft attitude angles θ and ψ, these angles determine the direction of motion with respect to the world, that is, the flight path (γw, χw). According to the literature, and supported by the mathematical analysis, there are two ways of perceiving the direction of locomotion.

The Perception of Flight Path from the Global Optic Flow Field

Rectilinear motion results in a radial flow pattern. Because the direction of motion is specified by the focus of radial outflow, Gibson (1950) postulated that the FRO serves as an optical basis for the perception of heading. However, it is not the local FRO per se that specifies the direction of motion, but rather the global flow pattern (Warren, Morris, & Kalish, 1988). The direction of motion is implicit everywhere in the flow field, even when the FRO is not in view (Warren, 1976). Figure 13.4 clearly shows that the magnitudes of the optic flow field velocity vectors depend on the distances to the environmental objects, whereas the directions of these vectors are completely determined by the direction of locomotion. The future path is specified in the locomotor flow line. Recent research, investigating
the accuracy of estimating the direction of locomotion during motion parallel to a plane, confirms Gibson's hypothesis (Warren, 1976; Warren et al., 1988). The optical flow patterns provide a sufficient basis for motion direction judgments during rectilinear motion at a level prerequisite for the control of locomotion (Warren et al., 1988), independently of whether the FRO is visible (Warren, 1976). A series of experiments is reported in Grunwald and Kohn (1993), in which subjects had to estimate the flight path during passive rectilinear and curvilinear motion over different types of texture. Results indicated that the accuracy in estimating the direction of rectilinear motion increased, and estimation times decreased, with the global optical flow rate. It was also concluded that the far visual field, as opposed to the near visual field, is essential in estimating flight path. This was attributed to the larger local expansion rates of the optic flow—defined as the change in flow vector direction per unit angular distance—in the far visual field. However, due to the smaller magnitudes of the flow field velocities in the far visual field, estimation times became larger.

The Perception of Flight Path from Local Gradients

When the flight-path angle error is zero, the aircraft travels along a straight line parallel to the reference trajectory, and the relative distances to the tunnel walls remain the same. When the flight-path angle error is not zero, any two of the walls are approaching, whereas the remaining two are receding from the observer. The splay angles—the perspective gradient—are a function of both the lateral and vertical position errors. Their derivatives, the splay rates, are a function of the flight-path angle error in both the lateral (χe) and vertical (γe) directions. Hence, the characteristics of the splay rates are similar to those of the splay angles themselves. A clear disadvantage is that the splay rates are affected by both the lateral and vertical referents of relative motion. Splay rate gain, however, is the same for the entire line, and the splay rates are unaffected by the longitudinal motion. Most important, when the flight-path angle error is zero, so are the splay rates, and the splay angles remain constant (Fig. 13.4b). When on course, the splay angles coincide with the directions of the radial flow lines (Gordon, 1966; Lee & Lishman, 1977). Hence, the splay angles could act as an optical invariant for the relative distance to any of the four tunnel walls. The displacements of the elements of the tunnel frames with respect to the two pseudo-horizons—the compression gradient—are only a function of the lateral and vertical position errors. Therefore, the derivatives of these displacements in each dimension are only a function of the flight-path angle error in that particular dimension. This is an important quality. As with the relative displacements themselves, however, and probably even more so, the rates of the changes in relative displacements must be perceived on top of the motion of the local tunnel elements due to the forward motion. Again, when enough frames are positioned to give the impression of a wall, that is, a surface instead of a concatenation of frames, this disadvantage could be less important.
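The contrast drawn above between the two local gradients can be made concrete with a small numerical comparison: for the same position error, the splay cue is independent of which frame is looked at, whereas the displacement cue relative to the pseudo-horizon scales inversely with the distance to the frame. The sketch below is illustrative only and reuses the linearized relations of the previous sections; the constant and state values are assumptions.

```python
KAPPA = 600.0  # illustrative display scaling constant

def splay_change(Xe, Ve, Wt, Ht):
    # Perspective-gradient cue: a property of the whole longitudinal line,
    # so its value does not depend on the distance along the tunnel.
    denom = Wt**2 + Ht**2
    return -(2.0 * Wt / denom) * Ve - (2.0 * Ht / denom) * Xe

def left_wall_displacement(Xe, Wt, Di):
    # Compression-gradient cue: the gain on the position error shrinks
    # with the distance Di to the frame that is being looked at.
    return KAPPA * (Wt + 2.0 * Xe) / (2.0 * Di)

Wt, Ht, Xe, Ve = 45.0, 45.0, -15.0, 0.0
print("splay cue (same for any frame):", splay_change(Xe, Ve, Wt, Ht))
for Di in (60.0, 120.0, 240.0):
    nominal = left_wall_displacement(0.0, Wt, Di)    # on the centerline
    displaced = left_wall_displacement(Xe, Wt, Di)   # 15 m to the left
    print(f"frame at {Di:5.0f} m: displacement cue changes by {displaced - nominal:7.2f}")
```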
The redundancy in optical information specifying the flight path is impressive. Concerning the functionality of the optical cues in specifying the flight path, some remarks can be made. Most important, it can be hypothesized that the local gradient cues of perspective and compression are more important than the cues emerging from the global optic flow pattern. This assumption originates in the fact that the tunnel display does not contain any elements (e.g., texture) that can carry the flow, except for the tunnel geometry itself. This is a fundamentally different stimulus situation from those used in the experiments referred to above, which applied information-rich displays consisting of fields of random texture elements. It is clear that the changing tunnel geometry provides strong cues for the flight path relative to the trajectory. Thus, why would a pilot bother with the location of the FRO when the changing geometrical tunnel shape itself provides the task-relevant information in an unequivocal manner? Results of automobile driving experiments investigating the nature of the visual information in lanekeeping tasks support this hypothesis (Beall & Loomis, 1996; Gordon, 1966; Kappé, 1997; Riemersma, 1981).
CONCLUDING REMARKS In guiding the aircraft along the trajectory, a pilot needs information about the aircraft spatiotemporal situation with respect to the tunnel. Based on a literature survey, the cybernetic issues of information transfer and information processing are discussed in this paper. The main working hypothesis was that the principal stimulus when moving through a constrained environment as depicted by the tunnel display is that of an approach to a surface. The aircraft position and attitude relative to the trajectory can be perceived through the cues of linear perspective, cues that emerge from a static tunnel image. The aircraft flight path and temporal situation are conveyed by the changing tunnel geometry through cues of motion perspective, cues that are essentially dynamic. The display does not present any information other than the tunnel geometry itself, which makes it questionable whether the global cues of the optic flow field (such as the FRO) are used. Hence, one of the main conclusions is that it is probably the local cues conveyed by the dynamic transformation of the tunnel image that are used to control locomotion. One should keep in mind, however, that the local properties are instances of the global optical flow pattern. Based on the analysis, it is concluded that the ease in using the tunnel display for aircraft guidance can be attributed to (1) the redundancy in optical information and (2) the fact that the display supports human’s innate perception and action mechanisms of direct manipulation. Concerning the first issue, it is clear from the study previously described that almost all aircraft motion referents are conveyed through a set of optical cues. Warren (1988) pointed out, however, that although sources of optical information can specify the same referent in a geometrical
sense, they may not be equally useful to the pilot. The redundancy of optical information in ecological displays such as the tunnel display is a difficult entity to test empirically. The main problem in testing the relative functionality of a set of redundant cues is that they cannot be isolated. Therefore, experiments designed to assess the utilization of a redundant set of optical cues in visual displays are subject to alternate interpretation, facing the experimenter with problems of determining which of the information sources is actually responsible for the pilot’s performance (Warren & Owen, 1982). Concerning the second issue, recall that the tunnel display presents the information through the projection of a regular geometrical object on a two-dimensional display. The stimulus array that results from this transformation is compatible to the stimuli that emerge during everyday locomotion through the environment. Because all human goal-directed behavior involves perception and action in space and time, the ecological stimulus for vision is the inherently spatiotemporal optic flow field (Gibson, 1986). The changing pattern of light that enters the eye during locomotion is structured and conveys information about the structure of the environment, the nature of the locomotion, and about the temporal situation of the locomotion relative to the environment (Lee & Lishman, 1977). The reciprocity of the optic flow field and the locomotion allows observers to control their locomotion using vision only, constituting a perception and action cycle. Central in an empirical evaluation should be that the coupling between perception and action remains intact. Therefore, the experiments in Mulder (1999) are interactive; that is, pilots performed a continuous task of adjusting their locomotion through the environment using the same mechanisms as they would apply in real flight. This is referred to as active psychophysics (Flach, 1991). Further, whereas the set of aircraft motion referents—attitude, position and flight path— can be considered extrinsic to the self-motion event, the optical cues conveyed by the display can be considered intrinsic to the self-motion event (Owen, 1990). The study in this chapter reveals that both the extrinsic and intrinsic variables are coupled in the transforming visual array and that an inverse transformation is not needed: Pilots can use the information conveyed by the optical cues directly for control. In other words, the use of direct manipulation of optical variables allows for inferences to be made directly from those that are manipulated instead of implying them from indirect—extrinsic—relations among environmental variables (Larish & Flach, 1990; Owen, 1990).
REFERENCES Beall, A. C., & Loomis, J. M. (1996). Visual control of steering without course information. Perception, 25, 481–494. Biggs, N. L. (1966). Directional guidance of motor vehicles—A preliminary survey and analysis. Ergonomics, 9(3), 193–202.
Crowell, J. A., & Banks, M. S. (1996). Ideal observer for heading judgments. Vision Research, 36(3), 471–490.
Cutting, J. E., & Millard, R. T. (1984). Three gradients and the perception of flat and curved surfaces. Journal of Experimental Psychology: General, 113(2), 198–216.
Flach, J. M. (1991). Control with an eye for perception: Precursors to an active psychophysics. In W. W. Johnson & M. K. Kaiser (Eds.), Visually guided control of movement (pp. 121–149). Moffett Field, CA: NASA Ames Research Center.
Flach, J. M., Hagen, B. A., & Larish, J. F. (1992). Active regulation of altitude as a function of optical texture. Perception & Psychophysics, 51(6), 557–568.
Funabiki, K. (1997). Tunnel-in-the-sky display enhancing autopilot mode awareness. In Proceedings of the 1997 CEAS Free Flight Symposium (pp. 29.1–29.11). Delft, The Netherlands.
Gibson, J. J. (1950). The perception of the visual world. Boston, MA: Houghton Mifflin.
Gibson, J. J. (1986). The ecological approach to visual perception. Hillsdale, NJ: Lawrence Erlbaum Associates. (Original work published 1979)
Gordon, D. A. (1966). Perceptual basis of vehicular guidance. Public Roads, 34(3), 53–68.
Grunwald, A. J. (1984). Tunnel display for four-dimensional fixed-wing aircraft approaches. Journal of Guidance and Control, 7(3), 369–377.
Grunwald, A. J., & Kohn, S. (1993). Flight-path estimation in passive low-altitude flight by visual cues. Journal of Guidance, Control, and Dynamics, 16(2), 363–370.
Haskell, I. D., & Wickens, C. D. (1993). Two- and three-dimensional displays for aviation: A theoretical and empirical comparison. The International Journal of Aviation Psychology, 3(2), 87–109.
Johnson, W. W., Tsang, P. S., Bennett, C. T., & Phatak, A. V. (1989). The visually guided control of simulated altitude. Aviation, Space, and Environmental Medicine, 60, 152–156.
Kappé, B. (1997). Visual information in virtual environments. Unpublished doctoral dissertation, University of Utrecht & Institute for Perception TNO, the Netherlands.
Larish, J. F., & Flach, J. M. (1990). Sources of optical information useful for perception of speed of rectilinear self-motion. Journal of Experimental Psychology: Human Perception and Performance, 16(2), 295–302.
Lee, D. N., & Lishman, R. (1977). Visual control of locomotion. Scandinavian Journal of Psychology, 18, 224–230.
Longuet-Higgins, H. C., & Prazdny, K. (1980). The interpretation of a moving retinal image. Proceedings of the Royal Society of London, Series B, 208, 385–397.
Mulder, M. (1994). Displays, perception and aircraft control: A survey of theory and modelling of pilot behavior with spatial instruments (Tech. Rep. No. LR-762). Delft, the Netherlands: Delft University of Technology, Faculty of Aerospace Engineering.
Mulder, M. (1995). Towards a mathematical model of pilot manual control behavior with a tunnel-in-the-sky display. In Proceedings of the 14th European Annual Conference on Human Decision Making and Manual Control (pp. 1.2.1–1.2.13). Delft, the Netherlands: Delft University of Technology.
Mulder, M. (1996). Modelling manual control of straight trajectories with a tunnel-in-the-sky display. In Proceedings of the 15th European Annual Conference on Human Decision Making and Manual Control (pp. 1.2.1–1.2.12). Soesterberg, the Netherlands: TNO.
Mulder, M. (1999). Cybernetics of tunnel-in-the-sky displays (Doctoral dissertation, Faculty of Aerospace Engineering). Delft, the Netherlands: Delft University Press.
Oliver, J. G. (1990). Improving situational awareness through the use of intuitive pictorial displays (SAE Technical Paper Series No. 901829). Long Beach, CA: Society of Automotive Engineers.
Owen, D. H. (1990). Perception and control of changes in self-motion: A functional approach to the study of information and skill. In R. Warren & A. H. Wertheim (Eds.), The perception and control of egomotion (pp. 289–326). Hillsdale, NJ: Lawrence Erlbaum Associates.
Parrish, R. V., Busquets, A. M., Williams, S. P., & Nold, D. E. (1994). Spatial awareness comparisons between large-screen, integrated pictorial displays and conventional EFIS displays during simulated landing approaches (Tech. Paper No. 3467). Washington, DC: National Aeronautics and Space Administration. RAND. (1993). Airport growth and safety. A study of the external risks of Schiphol airport and possible safety-enhancement measures. Santa Monica, CA: Author. Regal, D., & Whittington, D. (1995). Guidance symbology for curved flight paths. In Proceedings of the Eighth International Symposium on Aviation Psychology (pp. 74–79). Columbus, OH: Riemersma, J. B. J. (1981). Visual control during straight road driving. Acta Psychologica, 48, 215–225. Theunissen, E. (1995). Influence of error gain and position prediction on tracking performance and control activity with perspective flight path displays. Air Traffic Control Quarterly, 3(2), 95–116. Warren, R. (1976). The perception of egomotion. Journal of Experimental Psychology: Human Perception and Performance, 2(3), 448–456. Warren, R. (1988). Visual perception in high-speed low-altitude flight. Aviation, Space, and Environmental Medicine, 59(Suppl. 11), A116–A124. Warren, R., & Owen, D. H. (1982). Functional optical invariants: A new methodology for aviation research. Aviation, Space, and Environmental Medicine, 53(10), 977–983. Warren, W. H., Morris, M. W., & Kalish, M. (1988). Perception of translational heading from optical flow. Journal of Experimental Psychology: Human Perception and Performance, 14(4), 646–660. Wilckens, V. (1973). Improvements in pilot/aircraft-integration by advanced contact analog displays. In Proceedings of the Ninth Annual Conference on Manual Control (pp. 175–192). Wolpert, L. (1988). The active control of altitude over differing texture. In Proceedings of the Human Factors Society 32nd Annual Meeting (pp. 15–19). Wolpert, L., & Owen, D. (1985). Sources of optical information and their metrics for detecting loss in altitude. In Proceedings of the Third Symposium on Aviation Psychology (pp. 475–481). Columbus, OH:
14
Alternative Control Technology for Uninhabited Aerial Vehicles: Human Factors Considerations
W. Todd Nelson
Timothy R. Anderson
Grant R. McMillan
Air Force Research Laboratory, Wright-Patterson Air Force Base
The purpose of this chapter will be to examine the application of several different types of alternative or nonconventional control technologies as part of human–machine interfaces for uninhabited aerial vehicle (UAV) and uninhabited combat aerial vehicle (UCAV) operations. In the present context, alternative control technologies will be defined as control devices that do not require a mechanical linkage between the operator and the input device, thereby potentially offering a more efficient and intuitive way of achieving system control. Such devices include, but are not limited to, position and orientation tracking systems, automatic speech and gesture recognition systems, head and eye movement tracking systems, and brain- and brain–body-actuated control systems. One of the primary goals of this chapter will be to review several varieties of alternative control technology in terms of their technological maturity, potential benefits, and technological and human factors considerations as they relate to UAV and UCAV applications. UAVS AND UCAVS As described by Fahlstrom and Gleason (1994), the term unmanned or uninhabited aerial vehicle (UAV) generally refers to aircraft that are operated without onboard pilots. More recently, the term uninhabited combat air vehicle (UCAV) has been
used to denote a specific class of UAVs—namely, those that are dedicated to tactical airborne applications, including air-to-ground, air-to-air, and reconnaissance and intelligence-gathering missions (Fulghum, 1997). Although the term UAV has been used to describe aircraft that operate without human intervention—that is, “drones”—our discussion will be restricted to UAV systems in which human operators are required. There are two primary reasons for this focus. First, it is important to recognize that although the air vehicle will be uninhabited, contemporary UAV systems typically consist of numerous human-operated systems, including launch and recovery systems, ground control and payload stations, data link and communications systems, and transportation and maintenance systems. Second, as noted in a UAV study prepared by the U.S. Air Force Scientific Advisory Board (SAB; see Worch, 1996), the role of the human in the operation of UAVs will be extremely important, especially in regard to handling malfunctions and time-critical mission replanning, and executing difficult missions in highly complex and dynamic hostile environments. Along these lines, of the nine missions/tasks that the SAB has identified as strong candidates for UAV applications, six of the missions are “weapons carrying,” including counterweapons of mass destruction, theater missile defense, fixed target attack, moving target attack, suppression of enemy air defenses, (SEAD), and air-to-air combat. Accordingly, the SAB has concluded that “the human’s flexibility and capability for inductive reasoning are desirable attributes that justify the retention of a significant supervisory and intervention capability during UAV operations for the foreseeable future” (Worch, 1996, p. 7-2). UAVs—Why Now? As pointed out by Munson (1988) and others (Fulghum, 1997; Worch, 1996), the use of UAVs is often associated with potential cost benefits compared to that of manned aircraft systems, including significantly lower development and production costs, substantial reductions in aircraft weight, fuel, and power consumption, and sizable savings in aircraft system storage. Consistent with this view, the SAB (Worch, 1996) recently concluded that the potential for life-cycle cost savings is recognized as a principal motivating factor for the development of UAVs for military applications. In addition to these potential fiscal advantages, Davis (1988) has noted that UAVs may be especially advantageous for flight tasks that are difficult or unfeasible for human pilots, including (1) long-term endurance flights, (2) tactical maneuvers that are associated with excessive and/or prolonged gravitational forces, and (3) missions involving high-altitude flight. As a result, Fulghum (1997) reports that military experts recognize that although manned aircraft are approaching a performance asymptote, UAVs are not. Finally, UAVs have the added value of eliminating the risk of pilot fatalities or prisoner taking if the aircraft is shot down (Fulghum, 1997). The latter is especially meaningful if one considers historical
data from the Vietnam War—2,500 manned aircraft lost, approximately 5,000 airmen killed, and nearly 90% of all prisoners of war were pilots and crewmembers (Munson, 1988, p. 7). In spite of the numerous potential advantages associated with UAVs, many technological challenges remain, for example, propulsion system technology for high-endurance aircraft, UAV structural design issues involving weight, stealth, maintainability, and repairability, issues involving life-cycle costs, and human–machine control system technology (Worch, 1996). It is interesting that the SAB identified the latter as being the most critical technological challenge for future tactical UAV systems. Moreover, it concluded that, thus far, the application of human factors principles to issues involving automation, allocation of functions, and human–machine interface design for UAVs has been deficient. Although it is beyond the scope of this chapter to address all of these issues, the sections that follow will address a subset of these, specifically, the application of alternative control technologies as part of the UAV human–machine control interface.
ALTERNATIVE CONTROL TECHNOLOGIES Alternative, or nonconventional, control technologies are defined as control devices “that do not require a direct mechanical linkage between the user and the input device” (McMillan, Eggleston, & Anderson, 1997). Examples of alternative control technologies include speech and gesture recognition systems, position and orientation tracking technologies, eye-movement tracking systems, and brain-actuated control devices. Conversely, traditional control devices such as joysticks, keyboards, foot pedals, flight sticks, push buttons, and switches are excluded from this definition. As noted earlier, the primary motivation for exploring alternative control technologies is to provide the human operator with a more efficient way of interacting with complex systems. Comprehensive reviews of alternative control technologies have recently appeared in several book chapters (Durlach & Mavor, 1995; McMillan et al., 1997; Stanney, 2002), the most complete of which has been provided by McMillan and his colleagues (1997). Additional reviews of these technologies can be found in the proceedings of a recent NATO lecture series entitled “Alternative Control Technologies: Human Factors Issues.” The information presented in this chapter is meant to supplement those reviews with descriptions of several key technologies that may be particularly relevant to UAV applications. Position and Orientation Tracking Technologies Position and orientation trackers are a class of control technologies that enable “real-time” tracking of physical objects in three-dimensional space.
FIG. 14.1. Basic components of a helmet-mounted electromagnetic head tracking system: a transmitter generates coherent electromagnetic fields, the receiver detects the induced current, position and orientation are computed from the received signals, and filtering and distortion algorithms condition the data for the human–machine interface.
With regard to UAV operations, tracking devices may potentially be used to generate control inputs derived from the position of the operator’s head, hands, feet, or any combinations of these. In addition, to the extent that helmet-mounted displays and spatial audio displays are included in the human–machine interface, head-position tracking is likely to be employed. Whereas mechanical, optical, acoustical, and electromagnetic trackers have been developed for various types of virtual environment and teleoperated applications, the latter are by far the most prevalent (see Durlach & Mavor, 1995; Foxlin, 2002, for reviews). Typically, electromagnetic tracking systems include a transmitter, a receiver, and a control box, a schematic of which is depicted in Fig. 14.1. Coherent electromagnetic fields (i.e., arranged along the x, y, and z axes) created at the transmitter induce a unique set of voltages at the receiver, which are converted to position and orientation by the control box and used by the computer to update display devices (see Pimentel & Teixeira, 1993, for a detailed description). Commercially available electromagnetic trackers such as Ascension Technology’s Flock of Birds and Polhemus’s Fastrak are quite popular for head- and hand-mounted tracking due to their low price and reasonable accuracy (Durlach & Mavor, 1995). However, as pointed out by Meyer, Applewhite, and Biocca (1992), electromagnetic trackers continue to be challenged by problems involving time delays, interference caused by nearby metallic objects, range limitations (approximately 3 ft), and problems with position and orientation accuracy. Despite the relative maturity of electromagnetic tracking devices, researchers have only recently started to evaluate and quantify these effects (Kocian & Task, 1995; Nixon, McCallum, Fright, & Price, 1998).
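To make this processing chain concrete, the sketch below shows one way interface software might consume 6-degree-of-freedom samples from such a tracker and apply a simple exponential low-pass filter to suppress jitter before the data drive a display update. The sketch is written in Python; the class names, field layout, and smoothing constant are illustrative assumptions rather than the interface of any particular tracker, and the filter also illustrates the trade-off noted above: heavier smoothing reduces noise but adds to the effective time delay.

```python
from dataclasses import dataclass

@dataclass
class PoseSample:
    """One 6-degree-of-freedom sample from a head tracker (meters, degrees)."""
    x: float
    y: float
    z: float
    yaw: float
    pitch: float
    roll: float

class ExponentialSmoother:
    """First-order low-pass filter applied independently to each axis.

    A smaller alpha suppresses more jitter but adds more effective lag,
    mirroring the filtering/delay trade-off discussed in the text.
    """

    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha
        self._state = None

    def update(self, sample: PoseSample) -> PoseSample:
        if self._state is None:
            self._state = sample          # first sample passes through unchanged
            return sample
        a, s = self.alpha, self._state
        self._state = PoseSample(
            a * sample.x + (1 - a) * s.x,
            a * sample.y + (1 - a) * s.y,
            a * sample.z + (1 - a) * s.z,
            a * sample.yaw + (1 - a) * s.yaw,
            a * sample.pitch + (1 - a) * s.pitch,
            a * sample.roll + (1 - a) * s.roll,
        )
        return self._state

if __name__ == "__main__":
    smoother = ExponentialSmoother(alpha=0.3)
    # Two noisy samples standing in for the receiver output.
    for raw in (PoseSample(0.01, 0.02, 1.50, 10.0, -2.0, 0.5),
                PoseSample(0.03, 0.01, 1.52, 12.0, -1.5, 0.4)):
        print(smoother.update(raw))
```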
Eye-Position Tracking Technologies Eye-tracking technologies permit users to issue control inputs by simply looking at the objects they want to manipulate. In the case of UAVs, eye-tracking technologies could potentially be used to select items presented on visual displays. Of the three general types of eye-tracking technologies—electroocular, electromagnetic, and optical—it has been suggested that optical trackers are the most suitable for general use (see Durlach & Mavor, 1995). Commercially available optical eye trackers, such as those developed by ISCAN, Inc., and ASL, Inc., employ video-based technology to track the position of the eye by measuring movement in the pupil and light source reflections from the cornea (see Borah, 1998, for a review). As noted by McMillan and his colleagues (1997), the tracking resolution of commercially available systems is inferior to a mouse or joystick controller; however, laboratory systems are capable of much greater resolution. In addition, in the absence of head-position tracking, the effective range over which eye movements can be tracked is approximately 20 deg. Accordingly, researchers (Durlach & Mavor, 1995; McMillan et al., 1997) have suggested that eye-tracking technologies be used in combination with other alternative controls, for example, head-position tracking or speech recognition systems. Speech Recognition Technologies Speech recognition technology is a relatively mature alternative control technology that has been under development for almost 3 decades. At a basic level, speech-based control comprises five stages—signal acquisition, signal processing, recognition algorithms, control algorithms, and user feedback. Advances in two of these areas—signal processing and control algorithms—have permitted the development of speech-based control systems capable of recognizing continuous, speaker-independent input. McMillan and his colleagues (1997) have noted that successful application of automatic speech recognition technology has included (1) telephone call handling systems, (2) speech-controlled telephone dialing, (3) speech-based control of numerous appliances and devices for physically disabled users, (4) a telephone-based bank information system, and (5) speech-based control of a multifunction display in the cockpit of an experimental F-16 fighter aircraft. Collectively, such demonstrations indicate that the effectiveness of speech-based control is maximal when used in conjunction with complex information entry tasks that would normally require manual input (McMillan et al., 1997). Despite these advances, the efficacy of speech recognition systems continues to be challenged by (1) vocabulary complexity, (2) noise in the operator environment, (3) differences between the acoustic environments in which the recognizer was trained and where it is used, and (4) variability between and within speakers, for example, changes in accents or dialects and affective state (Durlach & Mavor, 1995). In addition, one of the primary research issues that needs to be determined is the effectiveness of these systems when operators are exposed to high mental and physical workload environments.
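One way to picture the suggested combination of eye-based and speech-based control is a selection loop in which the gaze point nominates an item on the display and a limited-vocabulary speech command confirms the action. The following Python sketch is a minimal illustration of that idea; the item names, screen coordinates, and command vocabulary are hypothetical and are not drawn from any fielded UAV control station.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class DisplayItem:
    """A selectable object on the control-station display (hypothetical)."""
    name: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max

def item_under_gaze(items: Sequence[DisplayItem],
                    gaze_x: float, gaze_y: float) -> Optional[DisplayItem]:
    """Return the display item, if any, that the current gaze point falls on."""
    for item in items:
        if item.contains(gaze_x, gaze_y):
            return item
    return None

def handle_speech_command(word: str, gazed: Optional[DisplayItem]) -> str:
    """Combine a limited-vocabulary speech token with the current gaze target."""
    if gazed is not None and word == "select":
        return f"SELECT {gazed.name}"
    if gazed is not None and word == "zoom":
        return f"ZOOM {gazed.name}"
    return "NO ACTION"

if __name__ == "__main__":
    items = [DisplayItem("waypoint-3", 100, 100, 160, 140),
             DisplayItem("threat-panel", 300, 50, 420, 200)]
    gazed = item_under_gaze(items, gaze_x=120, gaze_y=130)  # simulated eye sample
    print(handle_speech_command("select", gazed))            # -> SELECT waypoint-3
```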
Gesture-Recognition Technologies Gesture recognition systems enable operators to use hand and body motions as control inputs to a system. With respect to UAV applications, gesture-based control involving hand motions could potentially be used to select targets or waypoints, change viewing perspectives (i.e., zoom in or zoom out), or even navigate through the flight environment. Currently, glove-based systems are the most advanced of the gesture-based systems. Such systems use joint-angle tracking technologies in conjunction with electromagnetic position trackers to track independently the motions of the operator’s fingers and hand. To the extent that gesture-based control employs electromagnetic tracking technology, its effectiveness will be challenged by many of the same technological limitations described in the section on position and orientation tracking technologies—that is, time delay, update rate, range, interference, resolution, and accuracy. In addition, gesture interpretation of dynamic hand movements is a formidable challenge given the complex nature of the pattern recognition task and is still in an early stage of development. Typical approaches include neural networks, feature analysis, and template matching (McMillan et al., 1997). Finally, in addition to glove-based gesture control, McMillan (1998) has noted that facial and postural gesture recognition may offer additional means for providing control inputs for teleoperated applications.
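As a simple illustration of the template-matching approach mentioned above, the sketch below compares a glove's joint-angle vector against a small set of stored gesture templates and reports the closest match, rejecting samples that are not close to any template. The gesture names, angle values, and rejection threshold are hypothetical, and a practical recognizer for dynamic gestures would need to model motion over time rather than a single static posture.

```python
import math
from typing import Dict, Sequence

# Hypothetical gesture templates: each gesture is a vector of glove joint angles
# (degrees), reduced here to one flexion value per finger for brevity.
TEMPLATES: Dict[str, Sequence[float]] = {
    "fist":      (95.0, 100.0, 100.0, 95.0, 60.0),
    "point":     (10.0, 100.0, 100.0, 95.0, 55.0),
    "open_hand": (5.0, 5.0, 5.0, 5.0, 10.0),
}

def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two joint-angle vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(sample: Sequence[float], max_distance: float = 40.0) -> str:
    """Return the closest stored template, or 'unknown' if nothing is close enough."""
    best_name, best_dist = "unknown", float("inf")
    for name, template in TEMPLATES.items():
        d = distance(sample, template)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist <= max_distance else "unknown"

if __name__ == "__main__":
    glove_sample = (12.0, 97.0, 102.0, 90.0, 58.0)   # simulated glove reading
    print(classify(glove_sample))                     # -> point
```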
Brain- and Brain–Body-Actuated Control Technologies Brain- and brain–body-actuated control technologies represent a relatively new class of alternative controllers that use electroencephalographic (EEG) signals and combinations of EEG and electromyographic (EMG) signals as control inputs. For example, in UAV applications, brain-actuated control may be used to select particular displays on a control panel, toggle between menu options, or select way points on a computerized map. A schematic of a brain-actuated control system is illustrated in Fig. 14.2. As can be seen in the figure, EEG signals are recorded at the scalp, amplified, converted to digital signals, processed using the techniques of digital signal processing (FFT, band-pass filtering, etc.), and used as input to a control algorithm (McMillan et al., 1997). Brain–body-actuated control uses a similar method, though in this case the electrical signals are made up of a combination of EEG and EMG, often recorded at the forehead (see Junker, Berg, Schneider, & McMillan, 1995). Although research involving brain- and brain–body-actuated control is relatively limited, recent laboratory investigations indicate that it might provide a viable means of “hands-off” control for tasks that previously required manual controllers. For example, brain-actuated control has been used to control computer cursor movement (McFarland, Lefkowicz, & Wolpaw, 1997), to predict button pushing (Pfurtscheller, Flotzinger, & Neuper, 1994), and to perform a single-axis tracking task in a flight simulator (McMillan et al., 1995).
FIG. 14.2. Basic elements of brain- and brain–body-actuated control: signal acquisition, signal processing, a control algorithm, and the controlled device, with feedback returned to the operator.
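A minimal sketch of the signal path in Fig. 14.2 is given below: an epoch of EEG-like data is transformed with an FFT, power is summed in a frequency band of interest, and the band power is thresholded to produce a discrete control input such as toggling a menu item. The band, threshold, and simulated signal are purely illustrative assumptions; an operational system would calibrate these values per operator and would add the band-pass filtering and artifact handling discussed in the cited work.

```python
import numpy as np

FS = 256                 # sampling rate in Hz (assumed)
BAND = (10.0, 14.0)      # illustrative frequency band of interest, in Hz

def band_power(epoch: np.ndarray, fs: int, band: tuple) -> float:
    """Estimate signal power within a frequency band for a 1-D EEG epoch."""
    spectrum = np.abs(np.fft.rfft(epoch)) ** 2
    freqs = np.fft.rfftfreq(epoch.size, d=1.0 / fs)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return float(spectrum[mask].sum())

def to_command(power: float, threshold: float) -> str:
    """Map band power to a discrete control input (e.g., toggle a menu item)."""
    return "TOGGLE" if power > threshold else "IDLE"

if __name__ == "__main__":
    np.random.seed(0)
    t = np.arange(FS) / FS                              # one second of simulated data
    rest = 0.5 * np.random.randn(FS)                    # background activity only
    active = rest + 2.0 * np.sin(2 * np.pi * 12.0 * t)  # added 12-Hz component
    threshold = 5000.0                                  # would be calibrated per operator
    for label, epoch in (("rest", rest), ("active", active)):
        print(label, to_command(band_power(epoch, FS, BAND), threshold))
```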
In addition, Nasman, Calhoun, and McMillan (1997) have recently provided a cogent argument for integrating brain-actuated control with see-through helmet-mounted displays. Recent investigations have also demonstrated that brain–body-actuated control (i.e., using combined EEG and EMG signals) is effective for control tasks that are relevant to air vehicle operations. For example, Junker and colleagues (1995) demonstrated that brain–body-actuated control could be used to perform a target acquisition task. In a different experiment, Nelson and his colleagues (1997) showed that brain–body-actuated control could be used to perform a simple, single-axis tracking task through a virtual flight environment. Nelson and his colleagues (1996) have also employed isolated EMG signals generated at the forehead to respond to visual stimuli presented on a computer display. Remarkably, unpracticed participants achieved a 98% correct response rate, and the EMG-based responses were found to be slightly faster than those issued using a traditional manual response button. UAVS AND VIRTUAL ENVIRONMENTS: IMMERSING THE OPERATOR At the most basic level, UAVs can be viewed as teleoperated systems; that is, they consist of one or more human operators, human–machine interfaces, and one or more teleoperated air vehicles. A fundamental challenge in the design of teleoperated systems is to provide human–machine interfaces that afford operators effective perception and control in the teleoperated environment. As noted by Slenker and Bachman (1990), human–machine interfaces for UAV mission planning and ground control stations, such as the Pioneer’s GCS 2000, have traditionally consisted of standard aircraft-like displays and controls, for example, head-down visual displays, alpha-numeric instruments and gauges, a flight stick and throttle, rudder pedals, buttons, keyboards, and switches. Although these interfaces may be sufficient for operating UAVs during less demanding missions or in benign environments, they are likely to be ineffective in the highly
dynamic and spatially complex environments in which UAVs are expected to operate. For example, it is uncertain whether traditional display and control interfaces will be adequate for conveying the necessary spatial orientation cues required by complex UAV missions. In addition, conventional control interfaces will likely be insufficient for supporting the control of multiple UAVs by a single operator. These criticisms are consistent with the assertion advanced by Furness (1986) in his analysis of traditional fighter aircraft interfaces, specifically, that traditional interfaces severely limit human information processing by failing to exploit the extraordinary multisensory, spatial-processing, and psychomotor capabilities of the operator. As an alternative to traditional interfaces, Furness (1986) introduced the concept of the “Super Cockpit,” an approach to human–machine interfaces that employed numerous multisensory virtual displays, as well as an ensemble of nonconventional control technologies. These novel interface concepts included stereoscopic helmet-mounted displays, spatial audio displays, haptic/tactile force feedback displays, automatic speech and gesture recognition, and head-, eye-, and hand-position tracking devices. In short, the fundamental notion involved “immersing” the pilot in a multisensory, three-dimensional virtual environment, thereby enhancing the efficiency with which he or she could perceive, attend, and respond to critical information in the tactical air environment. In addition to enhancing pilot performance efficiency, it was anticipated that these interfaces would have numerous advantages over conventional cockpit interfaces, including enhanced spatial and situation awareness, reduced visual and manual workload, and the capability to support the remote operation of fighter aircraft. For instance, wide field-of-view pictorial visual displays were proposed as an alternative to the multitude of conventional gauges and dials, thereby allowing the spatial location of critical information (i.e., way points, targets, threats, etc.) to be displayed directly to the pilot. Similarly, spatial audio displays were designed to provide spatial information about the location of threats, targets, other aircraft, and the ground. Extending this argument, we contend that UAV mission effectiveness will vary directly with the immersive qualities of the human–machine interface. This concept is illustrated in Fig. 14.3, which depicts increasing levels of mission effectiveness across several different levels of immersive interface technology, that is, virtually augmented, partially immersive, and fully immersive. Moreover, these three levels of immersion may correspond respectively to missions involving reconnaissance/surveillance, suppression of enemy air defenses, and air-to-air combat. Fig. 14.3 also indicates a shift in responsibilities for the human operator, from hands-on manual control to supervisory control. This notion assumes future UAVs will be capable of moderate levels of autonomy, thereby allowing the human operator to strategize and command at the supervisory level. Virtually augmented interfaces represent UAV interface concepts that are technologically feasible in the near term (i.e., present–2007) and will consist of a
collection of conventional displays—large computer displays, auditory displays, and manual controllers—that are “augmented” by several virtual interface technologies—helmet-mounted displays, spatial audio displays, and various alternative control technologies.
FIG. 14.3. Relationship between mission effectiveness and immersion qualities of the user interface for UAV operations.
As described in a previous section, the relative maturity of speech recognition and position tracking technologies makes them particularly strong candidates for virtually augmented interfaces. In addition, body-actuated control inputs (e.g., isolated EMG signals) may also be effective for simple discrete response inputs. In the case of speech-based controls, current technology would enable operators to use speaker-independent, limited-vocabulary, connected-word speech commands for tasks that normally require manual control inputs, such as keystrokes, mouse clicks, and menu and display selection. In addition, it is anticipated that the effectiveness of speech-based controls would be enhanced when used in conjunction with head-, eye-, or hand-based control. Finally, although the utility of speech-based control is susceptible to degradation in high-noise environments, utilizing sound-attenuated UAV mission planning and control stations may minimize these effects. By contrast, the application of head- and eye-based control in the near term is less straightforward. Indeed, deficiencies in tracking resolution, problems
associated with time delay, and complications due to noise and interference indicate that head- and eye-based control should be used with caution. Accordingly, practical near-term applications of head- and eye-based control may include highlighting or accentuating objects contained in the visual display, thus fostering a more intuitive and efficient mode of “hands-free” control, especially when combined with speech-based commands. Finally, given the high reliability and efficiency of forehead-generated EMG signals (see Nelson et al., 1996), it is possible that head- and eye-based control may be combined with body-actuated control to provide an efficient alternative to button presses or mouse clicks. As illustrated in Fig. 14.3, UAV interface concepts in the midterm (i.e., 2007–2015) represent a radical departure from interfaces used in the near term. Candidate interface technologies for the midterm include wide field-of-view projection and helmet-mounted displays, spatial audio and communication displays, tactile and haptic displays, and a variety of advanced alternative control technologies, for example, speech-based, head-based, and eye-based control, gesture recognition, and brain-actuated control. The underlying assumption is that immersive interfaces will enable operators to more efficiently manage the demands associated with the heightened mission complexity in the midterm, a notion that is illustrated in the central portion of Fig. 14.3. According to the SAB (Worch, 1996), potential midterm missions may include suppression of enemy air defenses and jamming, and attacks on fixed and moving targets, hence the need to provide UAV operators with a more efficient way of comprehending and interacting with the air combat environment. Speech-based controls in the midterm are expected to permit continuous, speaker-independent speech input that makes use of large vocabularies. Although midterm speech-based control interfaces do not quite afford conversational-style interaction, it is anticipated that they will allow UAV operators to “converse,” at least to a limited extent, with the UAV system. This capability would enable operators to query the UAV system for information about the status of various UAV systems—for example, weapons, payload, fuel, and threats—without using their hands. It may also allow teams of operators to “converse” with their control stations and the UAVs, thereby fostering more efficient and intuitive interaction. Advances in position and orientation tracking technologies by the midterm are expected to permit effective head- and eye-based control of various functions that are typically performed with manual controls, including head-aimed target selection and weapons guidance. In addition, head- and eye-slaved HUDs will ensure that mission-critical information is continuously available to the operator regardless of where he or she is looking. As described in a previous section, head-based control is associated with numerous tactical advantages in manned air combat environments, and it is likely that these advantages will also extend to midterm tactical UAV missions. To the extent that brain- and brain–body-actuated controls have matured sufficiently, they may offer an additional control modality for UAV applications in the
midterm. Along this line, Nasman, Calhoun, and McMillan (1997) have recently articulated the advantages of integrating brain-actuated control with HMD technology, and have identified several potential applications, including multifunction display operation, weapons and target selection, and radio frequency switching. Further, the coupling of brain-actuated control with head/eye-based control and speech-based control is expected to expand its utility in the midterm. As shown on the right side of Fig. 14.3, UAV interfaces in the far term (i.e., 2015–2025) will be designed to completely immerse the UAV operator in the tactical environment. By now, the rationale for increasing the immersive qualities of the interface should be familiar: (1) to increase the perceptual and perceptual–motor capabilities of the operator and (2) to offset the cognitive processing demands caused by increased mission complexity. With regard to the latter, the SAB notes that far-term UAV missions may be expanded to include theater missile defense, counterweapons of mass destruction, and air-to-air combat (Worch, 1996). Building on the interface concepts introduced in the midterm, fully immersive UAV interfaces will likely include retinal displays (see Durlach & Mavor, 1995), advanced position and orientation tracking devices, gesture recognition technology, and advanced speech-based control. With regard to control interfaces, perhaps one of the most significant changes will be the discontinuation of the physical throttle and flight stick controller. Instead, position and orientation tracking devices will be used in combination with gesture recognition technology to enable UAV operators to use hand, body, and facial gestures to issue UAV control inputs. Moreover, gesture-based control is expected to be fully integrated with head- and eye-based control, brain-actuated control, and enhanced speech-based control technology. With regard to the latter, speech-based technology in the far term is expected to provide conversational language interfaces featuring speaker-independent, very-large-vocabulary, continuous speech recognition capabilities. An additional, albeit more provisional, advantage of the fully immersive interface concept is that it may enable a single UAV operator to control multiple UAV aircraft. No longer constrained by single-aircraft controls (i.e., flight stick and throttle), UAV operators would be able to dictate high-level mission tactics and strategies by exhibiting what may be termed meta-control. For instance, highly coordinated gesture recognition, head- and eye-slaved control, and speech-based control may allow a single operator to command coordinated clusters of UAVs for air-to-ground, SEAD, and/or air-to-air missions. This notion is portrayed in Fig. 14.4, which shows a fully immersed UAV operator directing a fleet of tactical UAVs.
FIG. 14.4. Fully immersive interface for UAV supervisory command and control.
It has been proposed in the previous section that immersive human–machine interfaces may enhance the effectiveness with which operators control UAVs in tactical air environments. Specifically, we described the putative merit of several alternative control technologies and speculated on their role in UAV missions in the near, mid-, and long terms. Although an appeal to the so-called immersive interface paradigm has numerous intuitive advantages, its utility may be limited
by two major human factors challenges: (1) problems arising from time delay and (2) issues involving simulator sickness. In the following section, these two factors are considered with regard to UAV applications.
HUMAN FACTORS CONSIDERATIONS Out of the Loop: The Ubiquitous Time Delay UAVs and virtual environment technologies are highly susceptible to problems associated with time delay, that is, the time between an input to a system and its corresponding output (Ricard & Puig, 1977). In fact, Davis (1988) has noted that the major limitation of earlier UAV systems was the lack of a “real-time” communication link between the aerial vehicle and the operator. In addition to the time delay associated with the communication link, several components in the immersive interface can add to the time delay. These include rendering of computer-generated images, the frame rate of the display device, application-dependent processing lags, and delays involved in the tracking and filtering of control input (Wloka, 1995). As a result, time delays in UAVs—particularly those that employ immersive interfaces—can be expected to range from a few milliseconds to several
seconds (Durlach & Mavor, 1995), and are expected to be variable. Moreover, the magnitude of the system’s time delay will vary directly with the complexity of the human–machine interface (Durlach & Mavor, 1995; Frank, Casali, & Wierwille, 1988; Jewell & Clement, 1985). This is particularly important, given that the SAB concluded that “wide angle, high resolution cockpit views” will be required in UAV control stations (Worch, 1996, p. 8-6). Accordingly, issues involving time delay and their impact on the utility of UAVs will become increasingly important, especially as these systems are considered for increasingly complex tactical missions. As noted by Ricard (1994), a sizable literature has accumulated that demonstrates the harmful effects of time-delayed visual feedback on operators’ ability to control and regulate dynamic systems. Reviews by several researchers (Poulton, 1974; Ricard, 1994; Wickens, 1986) have noted that the degrading effects of time delay are determined by several factors, including the magnitude of the time delay, the complexity of the system being controlled, and the requirements of the task. In addition, factors such as the modality of the display, characteristics of the control device, the format or type of information depicted in the display (i.e., pursuit, compensatory, etc.), biomechanical constraints of the control movements, and the goals of the operator(s) combine to determine the overall impact of the time delay. The increased use of flight simulation systems throughout the 1970s and 1980s motivated additional empirical investigations concerning the effects of time-delayed visual feedback on performance efficiency. Of primary concern were questions involving (1) the maximum amount of delay that is tolerable and for what types of tasks, (2) the influence of time delay on transfer of training, (3) the effect of time delay on operator workload, and (4) how these effects changed with task complexity and aircraft dynamics. Such questions continue to be relevant to the design of effective human–machine interfaces for UAV and UCAV systems and for UAV training simulators. Evidence for the negative effects of time delay on flight control skills was provided by Hess (1984b) in an investigation of delayed visual feedback on tracking performance and closed-loop stability. Participants in this study performed a compensatory tracking task under three different time-delay conditions: 0, 190, and 380 msec. As one might expect, RMS tracking error varied directly with the magnitude of the time delay. In addition, Hess (1984b) noted that increases in time delay were associated with a form of control instability termed pilot-induced oscillations (PIOs). PIOs refer to oscillatory or unstable aircraft dynamics caused by inappropriate control actions that, if left uncorrected, can eventually lead to closed-loop instability and loss of control. Empirical evidence linking time-delayed visual feedback and the occurrence of PIOs has been provided by a number of researchers (Hess, 1984a, 1984b; Middendorf, Fiorita, & McMillan, 1991; Middendorf, Lusk, & Whiteley, 1990). For example, Middendorf, Lusk, and Whiteley (1990) conducted a study in which participants performed a sidestep landing maneuver in a fixed-base flight simulator
under visual time delays of 90, 200, and 300 msec. Using spectral analysis techniques, Middendorf and his colleagues were able to show that participants’ control inputs became less stable as time delay was increased, and that control inputs were accompanied by instances of PIOs. Similar results were reported by Middendorf, Fiorita, and McMillan (1991) in an investigation assessing operators’ ability to perform a low-level flight task under comparable levels of time delay. As these researchers reported, not only were all aspects of flight control performance degraded by the addition of the time delay, but once again were characterized by the presence of PIOs. Such results are consistent with ideas put forth by Frost (1972), who suggested that oscillatory behavior often occurs in manual control systems when operators attempt to achieve very “tight control” in the presence of time delay. The reason is that “tight control” necessitates high gain which can lead to control instability when combined with significant time delays. Hence, it is not surprising that PIOs accompanied the low-level flight and sidestep flight control tasks used by Middendorf and his colleagues (1990, 1991). With regard to UAV and UCAV applications, these results strongly suggest that performance efficiency may be severely impaired by time delays of several hundred milliseconds, especially when operators are required to perform tasks requiring “tight control.” Although most laboratory investigations have focused on the effects of time delay on manual control, recent research has demonstrated that the negative effects of time delay also extend to head-based or head-slaved tracking tasks (Azuma & Bishop, 1994; Nelson et al., 1998; So & Griffin, 1991, 1993, 1995). Time delays in head-slaved tracking tasks have been associated with decrements in performance efficiency, increased ratings of task difficulty, and disturbing visual illusions (Azuma & Bishop, 1994; So & Griffin, 1991, 1993, 1995). So and Griffin (1991) studied dual-axis, head-slaved tracking task under various levels of time delay (0, 40, 100, 160, 220, 280, 380 msec). Participants used a monocular HMD to track a moving circular target. Their results indicated that significant decrements in tracking performance, measured as percent time on target, occurred with as little as 40 msec of imposed time delay, and that all objective measures of tracking performance were significantly degraded by 100 msec of imposed time delay. Further, self-reported ratings of task difficulty, ranging from “not difficult” to “extremely difficult,” were shown to vary directly with increases in time delay. The results also indicated that with imposed time delays of 0, 40, and 80 msec, improvements in tracking performance were not achieved through practice. Such results are very important with regard to head-aimed control in UAV applications because time delays in the near term and midterm are likely to exceed 100 msec. In addition to the degrading effects of time delay on tracking performance, a frequency domain analysis revealed that increases in time delay were accompanied by increases in gain and phase lag for target motion frequencies around 0.8 Hz (So & Griffin, 1995). Evidently, observers modified their head-slaved tracking
strategies in an attempt to compensate for the effects of time delay, and in the process adopted a control strategy that was not simply ineffective, but detrimental. Such results provide compelling empirical evidence that concomitant increases in operator gain and time delay can contribute to unstable, oscillatory control behavior, and can potentially lead to head-slaved PIOs. Along this line, Dornheim (1995) has recently noted that time delays and the possibility of PIOs in helmet-aircraftweapon systems present a major challenge. In regard to UAVs, these outcomes underscore the importance of recognizing the effects of time delays on head-based control inputs. Moreover, it is reasonable to anticipate that time delays will also negatively impact other forms of nonconventional control and may greatly constrain the types of tasks in which alternative control technologies are employed. Handling Qualities and Operator Workload The degrading effects of time-delayed visual feedback have also been revealed by research demonstrating its association with reductions in real and simulated aircraft handling qualities (Bakker, 1982; Crane, 1983; Gawron, Bailey, Knotts, & McMillan, 1989; Ricard & Harris, 1980; Smith & Bailey, 1982; Smith & Sarrafian, 1986) and elevations in subjective workload (Hess, 1984b; Jewell & Clement, 1985; Middendorf et al., 1991; Wickens, 1986). The influence of time delay on aircraft handling qualities and manual control skills was demonstrated by Gawron, Bailey, Knotts, and McMillan (1989) by comparing pilots’ ratings of handling qualities and dual-axis tracking performance across six time-delay conditions (0, 30, 90, 130, 180, and 240 msec) in both a USAF/FDL variable-stability, NT-33A aircraft and a grounded, fixed-base flight simulator. Gawron and her colleagues found that aircraft handling qualities, as measured by the Cooper-Harper Rating Scale, and tracking performance were degraded by the addition of time delay. Further in both cases, the effects of time delay were greater for the ground-based simulation than the in-flight test. In the case of handling qualities, Cooper-Harper ratings declined approximately 1 unit/100 msec for the in-flight simulation, and 1.5 units/100 msec in the ground simulation. Such results are consistent with the position taken by Smith and Bailey (1982), that the “allowable time delay and the rate of flying quality degradation with time delay are a function of the level of task precision, pilot technique, and subsequent aircraft response” (p. 18-9). Accordingly, time delay will have a profound effect on UAV missions that require operators to perform high-precision flight-control tasks. As noted by numerous researchers (Hess, 1984b; Jewell & Clement, 1985; Middendorf et al., 1991; Wickens, 1986) time-delayed visual feedback has also been shown to increase operator workload. For example, in the study by Middendorf and his colleagues (1991), in which participants were required to perform a low-level flight task, increases in time delay resulted in higher ratings of perceived mental workload. Using the NASA Task Load Index (TLX) to assess operator workload, Middendorf and his colleagues reported that overall workload ratings
for the 300-msec time-delay condition were significantly higher than those in the baseline condition (90 msec). In addition, several of the TLX subscales, including Performance, Frustration, and Effort, were found to be sensitive to time-delay manipulations. Such outcomes corroborate researchers’ claims that the reason time delay elevates workload is that it forces pilots to generate lead information to compensate for the delay (Hess, 1984b; Jewell & Clement, 1985). Outcomes such as these are particularly relevant to UAV applications, especially when one considers the increased mission complexity and its concomitant increases in operator workload. Offsetting the Effects Given the ubiquity of time delay in immersive interface technology and UAV systems, one possible solution may be to apply time-delay compensation algorithms. In general, digital signal processing techniques, such as phase lead and phase lead/lag filters, can be used to generate “lead” in order to offset the delay in the system. Empirical studies involving manual control in flight simulation have indicated that algorithmic prediction is effective for increasing pilot–vehicle performance, pilot–vehicle stability, and handling qualities, and reducing pilot workload and training time (Crane, 1983; Hess & Meyers, 1985; Ricard, 1994; Ricard & Harris, 1980). Similarly, prediction algorithms have been used with helmet-mounted displays to alleviate the effects of time delay on head-slaved control tasks (Azuma & Bishop, 1994; List, 1983; Nelson et al., 1998; So & Griffin, 1991, 1993, 1995; Ricard, 1994; Wloka, 1995). Although algorithmic prediction can offer an effective means for time-delay compensation in VR systems, numerous researchers (Azuma & Bishop, 1994; List, 1983; So & Griffin, 1991, 1993; Wloka, 1995) have pointed out several limitations associated with this approach. First, the accuracy of the algorithm’s prediction depends, to a large extent, on the number of parameters that are being used in the prediction. For example, as noted by List (1983) and others (Azuma & Bishop, 1994), many of the filters used to predict head position assume constant head velocity; that is, changes in velocity are not considered in the prediction. Although filters that make this assumption may be resistant to small changes in velocity, considerable changes in velocity can lead to drastic over- or underestimations of the user’s actual head position (List, 1983). Second, as was described in a previous section, inaccurate head position data resulting from interference, noise, and other factors can cause errors in the appearance of updated images in HMDs. Moreover, when prediction algorithms are applied to these erroneous head position data, the problem can be exacerbated. Generally, the effect of applying prediction algorithms to “noisy” data is to cause images in the HMD to jump around or “jitter”—an effect participants have reported as being subjectively disturbing and annoying (Azuma & Bishop, 1994, 1995; So & Griffin, 1991, 1993).
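Before turning to the third limitation, the first two can be illustrated with a small numerical sketch of constant-velocity prediction applied to head azimuth samples: when the head accelerates, the extrapolated position misses the true position, and when the head is still but the measurements are noisy, differencing the samples produces a spurious velocity that the prediction amplifies into jitter. The sampling rate, compensated delay, and noise level below are assumed values chosen only for illustration.

```python
import random

def predict(theta_prev: float, theta_curr: float, dt: float, lead: float) -> float:
    """Constant-velocity extrapolation of head azimuth (degrees), 'lead' s ahead."""
    velocity = (theta_curr - theta_prev) / dt
    return theta_curr + velocity * lead

def true_azimuth(t: float) -> float:
    """Simulated accelerating head motion: azimuth grows quadratically with time."""
    return 50.0 * t * t

if __name__ == "__main__":
    dt, lead = 0.02, 0.10      # 50-Hz tracker samples, 100-msec delay to compensate
    random.seed(1)

    # (1) Accelerating head: the constant-velocity assumption misses the target.
    t = 0.5
    predicted = predict(true_azimuth(t - dt), true_azimuth(t), dt, lead)
    print(f"accelerating head: predicted {predicted:.2f} deg, "
          f"actual {true_azimuth(t + lead):.2f} deg")

    # (2) Stationary head with measurement noise: differencing noisy samples
    #     produces a spurious velocity, which extrapolation amplifies into jitter.
    noisy_prev, noisy_curr = (random.gauss(0.0, 0.2) for _ in range(2))
    predicted = predict(noisy_prev, noisy_curr, dt, lead)
    print(f"stationary head: predicted {predicted:.2f} deg, actual 0.00 deg")
```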
Third, prediction accuracy declines rapidly as bandwidth of the system and prediction interval are increased. Thus, Azuma and Bishop (1995) recently demonstrated that the magnitude of prediction errors varied directly with increases in (1) the size of the prediction interval and (2) the frequency range of the head motions. These effects were found to persist even when noise-free measures of head position, velocity, and acceleration were used. UAVs, the Immersive Interface, and Simulator Sickness To the extent that UAVs employ immersive virtual interface technology, issues involving simulator sickness, or “cybersickness,” will be an extremely important human factors concern. As described by Kennedy, Allgood, and Lilienthal (1989), simulator sickness refers to motion sickness–like symptoms that occur in aircrew during and following training. Symptoms include general discomfort, stomach awareness, nausea, disorientation and fatigue. There is also a prominent component of visually related disturbances such as eyestrain, headache, difficulty focusing and blurred vision . . . Aftereffects associated with simulator sickness include postural instability, dizziness, and flashbacks. Flashbacks, which include illusory sensations of climbing and turning, sensations of negative g, and perceived inversions of the visual field, are particularly problematic because of their sudden unexpected onset and risk to safety. (p. 62)
More recently, Hettinger and Riccio (1992) have coined the term “visually induced motion sickness,” or VIMS, to refer to manifestations of simulator sickness that occur in conjunction with visual virtual environment technologies. As these researchers have noted, VIMS is most frequently encountered in two general situations. First, it often occurs when operators are required to use HMDs in the presence of excessive time delays. In this case, time delays in the display cause virtual images to appear to float or swim around in the HMD, an effect that has been described as subjectively disturbing and nauseogenic (Azuma & Bishop, 1994; So & Griffin, 1991). Second, VIMS frequently occurs when operators are presented with visual information that promotes the feeling of vection or self-motion in the absence of any physical motion (i.e., vestibular and proprioceptive stimulation). This latter point will be particularly important to the extent that UAVs employ display technologies capable of providing operators with compelling illusions of self-motion that are not accompanied by motion cues. Riccio and Stoffregen (1991) have provided a theoretical explanation linking time-delayed visual feedback and motion sickness, in which they suggest that motion sickness is the result of prolonged postural instability resulting from exposure to “provocative environments”—environments, real or virtual, that contain some type of altered specification in the relations between the user and his or her environment. Examples of provocative environments known to produce postural instabilities and motion sickness include spatially rearranged visual environments
produced by prism goggles (see Dolezal, 1982), rearrangements of the gravitoinertial environment produced by microgravity in spaceflight (Parker & Parker, 1990), and changes in the dynamics of support surfaces such as the undulating deck of a ship in rough waters (Kennedy, Graybiel, McDonough, & Beckwith, 1968) or the heaving cabin of an aircraft during turbulence (Kennedy et al., 1993). In each of these cases, operators’ perceptual–motor control strategies for maintaining stable posture are rendered ineffective, and if these instabilities are left unresolved, motion sickness may result (Riccio & Stoffregen, 1991). It is important to recognize that immersive UAV interfaces, which contain numerous spatial and spatiotemporal rearrangements, may also prove to be “provocative” (Riccio & Stoffregen, 1991). For example, spatiotemporal rearrangements produced by time delays in the visual display may give rise to (1) the illusory motion of virtual objects, (2) oscillatory or unstable control behavior, and (3) illusions of self-motion. To the extent that these effects produce prolonged instabilities in postural control, symptoms of simulator sickness might also arise. In addition to the debilitating physiological effects that accompany simulator sickness, Kennedy, Lane, Lilienthal, Berbaum, and Hettinger (1992) have noted that its occurrence can place serious constraints on training effectiveness. For example, in order to reduce symptomatology, pilots trained in simulators often adopt behavioral strategies that are inappropriate for the task of flying, such as closing their eyes, restricting head movement, or looking away from vection-inducing, time-delayed visual displays (Kennedy et al., 1992). In the case of UAV training or operations, the acquisition of such strategies would greatly compromise training effectiveness and performance efficiency. Further, Kennedy and his colleagues (1992) have warned that behaviors acquired to reduce symptomatology may also jeopardize the positive transfer of skills to other simulated environments. Indeed, it will be necessary for researchers to address issues involving the effects of immersive UAV interfaces on simulator sickness, training effectiveness, and performance efficiency.
CONCLUSION As pointed out by Worch (1996) and Davis (1988), the numerous fiscal and tactical advantages associated with UAVs and UCAVs provide sufficient motivation for their continued development. Unfortunately, one of the consequences of this heightened awareness and interest is that expectations of what UAV technology offers can quickly become unrealistic. Publicity surrounding the topic of “virtual reality” (VR) produced a similar state of affairs several years ago, which led to a situation in which the field of VR was characterized as demonstrating extremely high “talk-to-work” ratios (Durlach & Mavor, 1995). To be sure, the successful development of advanced UAVs will depend on the resolution of many formidable technological challenges—one of the most critical involving the design of advanced
human–machine control interfaces. Along these lines, we have proposed that the use of various alternative control technologies potentially offers a more intuitive and efficient means for interacting with complex human–machine systems, thereby enhancing operator performance and optimizing workload. Further, we have suggested that increasingly immersive interfaces that employ coherent ensembles of alternative control technologies may afford greater mission complexity. It is important, however, to recognize the empirical nature of such claims, which underscores the need for continued research in the field of alternative control technologies. Several research topics that are suggested by the material presented herein include, but are not limited to, investigations of (1) the interaction between two or more alternative control technologies for operationally relevant tasks, (2) the interaction between alternative control technologies and traditional control devices, (3) the effects of using one or more alternative control technologies on various psychological constructs (e.g., operator workload and situation awareness), (4) the effects of alternative control technologies on task performance and cybersickness, (5) the effects of time delay on one or more alternative control technologies, and (6) the effect of operator immersion, task complexity, and alternative control technologies on performance efficiency. Last, given the focus of many of the chapters in this book—adaptive interfaces—it may be particularly useful to consider an adaptive alternative controls approach for UAV and UCAV applications, especially if alternative control technologies are found to be effective only under certain circumstances.
REFERENCES Azuma, R., & Bishop, G. (1994). Improving static and dynamic registration in an optical see-through HMD. Computer Graphics (Proceedings of SIGGRAPH ’94), 197–203. Azuma, R., & Bishop, G. (1995). A frequency-domain analysis of head-motion prediction. Computer Graphics (Proceedings of SIGGRAPH ’95), 401–408. Bakker, J. T. (1982, April). Effect of control system delays on fighter flying qualities. Paper presented at AGARD Flight Mechanics Panel Symposium on “Criteria for Handling Qualities of Military Aircraft,” Fort Worth, TX. Borah, J. (1998). Technology and application of gaze based control. In Proceedings of RTO Lecture Series 215: Alternative Control Technologies: Human Factors Issues (pp. 3:1–3:10), Neuilly-surSeine Cedex, France: RTO/NATO. Brungess, J. R. (1994). Setting the context: Suppression of enemy air defenses and joint war fighting in an uncertain world. Maxwell Air Force Base, AL: Air University Press. Bryson, S., & Fisher, S. S. (1990). Defining, modeling, and measuring system lag in virtual environments. In SPIE Proceedings on Stereoscopic Displays and Applications, (Vol. 1256, 98–109). Bellingham, WA: SPIE. Crane, D. F. (1983). Compensation for time delay in flight simulator visual-display systems. In Proceedings of the AIAA Flight Simulation Technologies Conference (pp. 163–171). Washington, DC: American Institute of Aeronautics and Astronautics. Davis, E. E. (1991). Pioneer unmanned aerial vehicles: Combat proven/combat tested. Proceedings of the Eighteenth Annual AUVS Technical Symposium and Exhibit (pp. 6–25). Washington, DC: Association for Unmanned Vehicle Systems.
Davis, W. (1988). The human role in future unmanned vehicle systems (Tech. Rep. No. HSD-TR-88010). Dornheim, M. A. (Oct 23, 1995). Helmet-mounted sights must overcome delays. Aviation Week & Space Technology, p. 54. Durlach, N. I., & Mavor, A. S. (Eds.). (1995). Virtual reality: Scientific and technical challenges. Washington, D.C.: National Acadmic Press. Fahlstrom, P. G., & Gleason, T. J. (1994). Introduction to UAV systems. Columbia, MD: UAV Systems. Foxlin, E. (2002). Motion tracking requirements and technologies. In K. M. Stanney (Ed.), Handbook of virtual environments: Design, implementation, and applications. (pp. 163–210). New Jersey: Lawrence Erlbaum. Frank, L. H., Casali, J. G., & Wierwille, W. W. (1988). Effects of visual display and motion system delays on operator performance and uneasiness in a driving simulator. Human Factors, 30, 201–217. Frost, G. (1972). Man–machine systems. In H. P. Vancott & R. G. Kinkade (Eds.), Human engineering guide to equipment design (Rev. ed.). Washington, DC: U.S. Government Printing Office. Fulghum, D. A. (1997, June 2). Unmanned strike next for military. Aviation Week & Space Technology, 47–48. Furness, T. A. (1986). The super cockpit and its human factors challenges. In Proceedings of the Human Factors Society 30th Annual Meeting (pp. 48–52). Santa Monica, CA: Human Factors Society. Gawron, V. J., Bailey, R. E., Knotts, L. H., & McMillan, G. R. (1989). Comparison of time delay during in-flight and ground simulation. In Proceedings of the Human Factors Society 33rd Annual Meeting (pp. 120–123). Santa Monica, CA: Human Factors Society. Haas, M. W., & Hettinger, L. J. (1993). Applying virtual reality technology to cockpits of future fighter aircraft. Virtual Reality Systems, 1 (2), 18–26. Halberstadt, H. (1992). The wild weasels: History of the US Air Force SAM killers, 1965 to today. Osceola, WI: Motorbooks International Publishers. Hess, R. A. (1984a). Analysis of aircraft attitude control systems prone to pilot-induced oscillations. Journal of Guidance, Control, and Dynamics, 7, 106–112. Hess, R. A. (1984b). Effects of time delays on systems subject to manual control. Journal of Guidance, Control, and Dynamics, 7, 416–421. Hess, R. A., & Myers, A. A. (1985). A nonlinear filter for compensating for time delays in manual control systems. In Proceedings of the 20th Annual Conference on Manual Control (pp. 93–116). Moffett Field, CA: NASA-Ames Research Center. Hettinger, L. J., Cress, J. D., Brickman, B. J., & Haas, M. W. (1996). Adaptive interfaces for advanced airborne crew stations. Proceedings of the Third Annual Symposium on Human Interaction with Complex Systems (pp. 188–192). Los Alamitos, CA: IEEE Computer Society Press. Hettinger, L. J., & Riccio, G. E. (1992). Visually induced motion sickness in virtual environments. Presence, 1(3), 306–310. Jewell, W. F., & Clement, W. F. (1985). A method for measuring the effective throughput time delay in simulated displays involving manual control. Proceedings of the 20th Annual Conference on Manual Control (pp. 173–183). Moffett Field, CA: NASA-Ames Research Center. Junker, A., Berg, C., Schneider, P., & McMillan, G. R. (1995). Evaluation of the cyberlink interface as an alternative human operator controller (Tech. Rep. No. AL/CF-TR-1995-0011). Kennedy, R. S., Allgood, G. O., & Lilienthal, M. G. (1989). Simulator sickness on the increase. In Proceedings of the AIAA Flight Simulation Technologies Conference (Paper No. 89-3269, pp. 62–67). 
New York: American Institute for Aeronautics and Astronautics. Kennedy, R. S., Lane, N. E., Lilienthal, M. G., Berbaum, K. S., & Hettinger, L. J. (1992). Profile analysis of simulator sickness symptoms: Application to virtual environment systems. Presence, 1 (3), 295–301. Kocian, D. F., & Task, H. L. (1995). Visually coupled systems hardware and the human interface. In W. Barfield & T. A. Furness (Eds.), Virtual environments and advanced interface design (pp. 175–256). New York & Oxford, England: Oxford University Press.
14.
ALTERNATIVE CONTROL TECHNOLOGY
323
List, U. (1983). Nonlinear prediction of head movements for helmet-mounted displays (Tech. Rep. No. AFHRL-TP-83-45). Brooks Air Force Base, TX: Air Force Systems Command. McFarland, D. J., Lefkowicz, A. T., & Wolpaw, J. R. (1997). Design and operation of an EEG-based brain–computer interface with digital signal processing technology. Behavior Research Methods, Instruments, & Computers, 29, 337–345. McMillan, G. R. (1998). The technology and applications of gesture-based control. Proceedings of RTO Lecture Series 215: Alternative Control Technologies: Human Factors Issues (pp. 4:1–4:11), Neuilly-sur-Seine Cedex, France: RTO/NATO. McMillan, G. R., Calhoun, G. L., Middendorf, M. S., Schnurer, J. H., Ingle, D. F., & Nasman, V. T. (1995). Direct brain interface utilizing self regulation of the steady-state visual evoked response. Proceedings of the RESNA 18th Annual Conference, (pp. 693–695). Vancouver, Canada: RESNA. McMillan, G. R., Eggleston, R. G., & Anderson, T. R. (1997). Nonconventional controls. In G. Salvendy, (Ed.), Handbook of human factors and ergonomics (2nd ed.). New York: Wiley. Meyer, K., Applewhite, H. L., & Biocca, F. A. (1992). A survey of position trackers. Presence, 1, 173–200. Middendorf, M. S., Fiorita, A. L., & McMillan, G. R. (1991). The effects of simulator transport delay on performance, workload, and control activity during low-level flight. In Proceedings of the AIAA Flight Simulation Technologies Conference (pp. 412–426). Washington, DC: American Institute of Aeronautics and Astronautics. Middendorf, M. S., Lusk, S. L., & Whiteley, Capt. J. D. (1990). Power spectral analysis to investigate the effects of simulator time delay on flight control activity. In Proceedings of the AIAA Flight Simulation Technologies Conference (pp. 46–52). Washington, DC: American Institute of Aeronautics and Astronautics. Munson, K. (1988). World unmanned aircraft. London: Jane’s Publishing. Nasman, V. T., Calhoun, G. L., & McMillan, G. R. (1997). Brain-actuated controls and HMDs. In J. E. Melzer & K. Moffitt (Eds.), Helmet- and head-mounted displays (pp. 285–312). New York: McGraw-Hill. Nelson, W. T., Hettinger, L. J., Cunningham, J. A., Roe, M. M., Haas, M. W., & Dennis, L. B. (1997). Navigating through virtual flight environment using brain–body-actuated control. In Proceedings of the 1997 Virtual Reality Annual International Symposium (VRAIS 97), (pp. 30–37). Los Alamitos, CA: IEEE Computer Society Press. Nelson, W. T., Hettinger, L. J., Cunningham, J. A., Roe, M. M., Lu, L. G., Haas, M. W., Dennis, L. B., Pick, H. L., Junker, A., & Berg, C. B. (1996). Brain–body-actuated control: Assessment of an alternative control technology for virtual environments. In Proceedings of the 1996 IMAGE Society Conference (pp. 225–232). Scottsdale, AZ: IMAGE Society. Nelson, W. T., Hettinger, L. J., Haas, M. W., Warm, J. S., Dember, W. N., & Stoffregen, T. A. (1998). Compensation for the effects of time delay in a virtual environment. In R. R. Hoffman, M. F. Sherrick, & J. S. Warm (Eds.), Viewing Psychology as a Whole: The Integrative Science of William N. Dember (pp. 579–601). New York: Lawrence Erlbaum. Nixon, M. A., McCallum, B. C., Fright, W. R., & Price, N. B. (1998). The effects of metals and interfering fields on electromagnetic trackers. Presence, 7(2), 204–218. Pfurtscheller, G., Flotzinger, D., & Neuper, C. (1994). Differentiation between finger, toe and tongue movement in man based on 40 Hz EEG. Electroencephalography and Clinical Neurophysiology, 90, 456–460. Pimentel, K., & Teixeira, K. 
(1993). Virtual reality: Through the new looking glass. New York: Intel/Windcres/McGraw-Hill. Poulton, E. C. (1974). Tracking skills and manual control. New York: Academic Press. Ricard, G. (1994). Manual control with delays: A bibliography. Computer Graphics, 28, 149–154. Ricard, G. L., & Harris, W. T. (1980). Lead/lag dynamics to compensate for display delays. Journal of Aircraft, 17, 212–217.
324
NELSON, ANDERSON, AND McMILLAN
Ricard, G. L., & Puig, J. A. (1977). Delay of visual feedback in aircraft simulators. NAVTREQUIPCEN TN-56. Riccio, G. E., & Stoffregen, T. A. (1991). An ecological theory of motion sickness and postural instability. Ecological Psychology, 3, 195–240. Slenker, K. A., & Bachman, T. A. (1990). Development of an unmanned air vehicle mission planning and control station (MPCS) using Ada. In Proceedings of the 1990 AUVS Conference (pp. 394–409). Dayton, OH: Association for Unmanned Vehicle Systems. Smith, R. E., & Bailey, R. E. (1982, April). Effects of control system delays on fighter flying qualities. Paper presented at AGARD Flight Mechanics Panel Symposium on “Criteria for Handling Qualities of Military Aircraft,” Fort Worth, TX. Smith, R. E., & Sarrafian, S. K. (1986). Effect of time delay on flying qualities: An update. Journal of Guidance, Control, and Dynamics, 9, 578–584. So, R. H. Y., & Griffin, M. J. (1991). Effects of time delays on head tracking performance and the benefits of lag compensation by image deflection. In Proceedings of the AIAA Flight Simulation Technologies Conference, (Paper No. 91-2926, pp. 124–130). New York: American Institute for Aeronautics and Astronautics. So, R. H. Y., & Griffin, M. J. (June, 1993). Effect of lags on human performance with head-coupled simulators. (Tech. Rep. No. AL/CF-TR-1993-0101). Wright-Patterson Air Force Base, OH: Air Force Materiel Command. So, R. H. Y., & Griffin, M. J. (1995). Effects of lags on human operator transfer functions with headcoupled systems. Aviation, Space, and Environmental Medicine, 66, 550–556. Stanney, K. M. (Ed.) (2002). Handbook of virtual environments: Design, implementation, and applications. New Jersey: Lawrence Erlbaum. Stinnett, T. A. (1989). Human factors in the super cockpit. In R. Jensen (Ed.),Aviation psychology (pp. 1–37). Brookfield, VI: Gower. Stoffregen, T. A., & Riccio, G. E. (1988). An ecological theory of orientation and the vestibular system. Psychological Review, 95, 3–14. Wells, M., & Griffin, M. (1987). A review and investigation of aiming and tracking performance with head-mounted sights. In IEEE Transactions on Systems, Man, and Cybernetics (Vol. SMC-17, pp. 210–221). Wickens, C. D. (1986). The effects of control dynamics on performance. In K. R. Boff, L. Kaufman, & J. P. Thomas (Eds.), Handbook of perception and human performance: Vol. 2. Sensory processes and perception (pp. 39-1–39-60). New York: Wiley. Wilder, J., Hung, G. K., Tremaine, M. M., & Kavr, M. (2002). Eye tracking in virtual environments. In K. M. Stanney (Ed.). Handbook of virtual environments: Design, implementation, and applications. (pp. 211–222). New Jersey: Lawrence Erlbaum. Wloka, M. M. (1995). Lag in multiprocessor virtual reality. Presence, 4, 50–63. Worch, P. R. (1996). United States Air Force Scientific Advisory Board study on UAV: Technologies and combat operations (Rep. No. SAF/PA 96-1204). Washington DC: U.S. General Printing Office.
15 Medical Applications of Virtual Reality
Richard M. Satava, M.D., FACS*
Yale University School of Medicine
Defense Advanced Research Projects Agency, Advanced Biomedical Technology Program
Shaun Jones
Virtual reality (VR) represents a core technology in the revolution of the "Information Age." Although there is no consensus on a definition of VR, there appear to be two aspects that are generally recognized: It is an environment and an interface. The virtual environment is usually described as having three components: It is a three-dimensional computer-generated image, it is highly interactive, and it provides the user with the impression of actually existing in the environment (presence). The interface is usually described by the component technologies, such as the display (head-mounted display, 3-D monitor, CAVE, etc.) for visualization and input devices (e.g., DataGlove, wand, and 3-D mouse) for interactivity. Although VR originally was described as immersive (being completely surrounded by the 3-D environment), other nonimmersive approaches have provided the illusion of VR. Thus, it is a computer-based tool that gives the user an opportunity to understand something (through 3-D visualization) about the real world that may not be otherwise comprehensible. It is a learning engine.
*Correspondence regarding this chapter should be addressed to Richard M. Satava, M.D., FACS, Yale University School of Medicine, Department of Surgery, Room BB430, University of Washington Medical Center, 1959 NE Pacific Street, Seattle, WA 98195.
As a scientific tool, VR is available to all disciplines, including medicine. The areas of application within medicine are diagnosis, therapy, and education. But more significant is that VR can help us understand the fundamental revolution of the Information Age as applied to medicine. Nicholas Negroponte finds the key to this understanding in the book Being Digital, in which he describes the difference between the real world and the information world in terms of "atoms" and "bits" (Negroponte, 1995). He uses the example of the fax machine. For thousands of years, humans have been sending documents from one place to another in the form of atoms, such as clay tablets, papyrus, and paper. Now, in the Information Age, we use the fax machine to send the bits instead of the atoms. The result is that the document is sent from one place to another more quickly, more cheaply, and more effectively. In the medical field, the most important atoms comprise the human body, or more specifically, a person or patient. With the impetus from Dr. Donald Lindberg, director of the National Library of Medicine, and Dr. Michael Ackerman (cf. Ackerman, 1991), program manager of the Visible Human project, there is now the equivalent of "bits" of a person (see Fig. 15.1). Today, anyone can have a Computed Tomography (CT) or Magnetic Resonance Imaging (MRI) scan, which will provide a computer representation of their body (bits) that represents the "information equivalent" of their atoms. This information equivalent is a "medical avatar," a virtual representation of the person. Admittedly, the representation is quite limited today; however, with technological advances, more information (properties such as vital signs, biochemical values, genetic coding, kinematic and dynamic parameters, etc.) can be added to the image so that eventually it will be an information-rich approximation of the person. Just as each cell in the human body contains the information (DNA, proteins, enzymes, etc.) about the person, so, too, can the medical avatar have the appropriate information encoded in each pixel—a concept of the "deep pixel," each pixel containing not only location but also physiological, biochemical, and genetic data. Thus, by clicking on an image of an organ, it will be possible to retrieve all the information about the organ (because it is stored in the "deep pixels"). In a broader context, VR is a form of 3-D visualization, which is a major component of medical information. In a 1995 workshop chaired by Dr. Nathaniel Durlach (Durlach & Mavor, 1995), the question was informally raised as to how much of medical practice is actually information management (see Fig. 15.2). Extrapolating from the concept of information equivalents, as much as 70–80% could be considered information. For example, when surgeons or radiologists perform procedures or surgical operations, they are looking at the electronic representation of the patient's organs on a monitor (laparoscopic or video-assisted surgery)—the information equivalent of the real organs. When surgery is complete, the physician visits the person in the recovery room, looking at the heart rate and blood pressure on the monitor (the information equivalent of the sense of touch). When looking at X rays, MRIs, ultrasound, or other imaging modalities, the physician views a digital image (bits) instead of film (atoms). Our medical records are becoming electronic, instead of being written with pen and paper.
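To make the "deep pixel" idea concrete, the following minimal sketch (in Python; the field names and query function are invented for illustration, not taken from the chapter) shows one way such a record could be organized so that selecting an organ in the image retrieves the data attached to its pixels.

    from dataclasses import dataclass, field

    @dataclass
    class DeepPixel:
        # Location of the pixel/voxel within the patient image.
        x: int
        y: int
        z: int
        organ: str                      # e.g., "liver"
        hounsfield: float               # anatomic data from the CT scan
        physiology: dict = field(default_factory=dict)    # e.g., {"perfusion": ...}
        biochemistry: dict = field(default_factory=dict)  # e.g., {"alk_phos": ...}
        genetics: dict = field(default_factory=dict)      # e.g., {"BRCA1": ...}

    def query_organ(pixels, organ_name):
        """Return every record attached to the pixels of the selected organ."""
        return [p for p in pixels if p.organ == organ_name]

    # "Clicking" on the liver in the avatar reduces to a query such as:
    liver_data = query_organ([DeepPixel(0, 0, 0, "liver", 55.0)], "liver")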
FIG. 15.1. The Visible Human project, with 3-D reconstruction performed by Michael Sellberg, Engineering Animation, Inc., Des Moines, Iowa.
FIG. 15.2. Representing medical practice as information equivalents (see text for explanation).
The educational process is becoming more and more information-based through computer-aided instruction, virtual anatomy, surgical simulations, electronic texts and references, and Web-based educational resources. If the emerging technologies of robotics and computer-enhanced surgery continue to progress on their current course, even our technical skills will be a form of information—the surgeon moves the input device, the information flows through the system, and the scalpel cuts. We have now replaced blood and guts with bits and bytes, and the virtual can represent the real. Using this expanded concept of virtual reality as information equivalents (or computer representations of the real world), the following sections give examples of current applications in the areas of diagnosis, therapy, and education and training. Nearly all are in the laboratory, prototype, or investigational phase; however, they provide concrete illustrations of what is possible and demonstrate the scientific basis for the framework proposed above.
DIAGNOSIS
The art of diagnostic medicine is based on the science of data acquisition about a patient. As a generalization, data can be obtained through history, physical examination, laboratory tests, and diagnostic imaging. The relevance to VR is
that it is this information (based on scientifically acquired data) that creates the medical avatar, from which all other medical applications derive. The history (taken verbally or by data entry), physical examination (by physician exam or biosensor vital-signs acquisition), and laboratory values (by noninvasive laboratory diagnosis) provide the properties of the image. It is the imaging modalities that create the visual representation (the medical avatar), which is then enriched with these properties. Thus, the use of 3-D visualization in image acquisition and display has become the first application for VR. The modalities commonly used are CT and MRI scans; however, ultrasound, Positron Emission Tomography (PET), and Single Photon Emission Computed Tomography (SPECT) are also adding further anatomic and functional information. The initial patient image (e.g., a CT scan) is created by "rendering"; this consists of segmentation, or separating out, the individual organs and tissues from the serial sections and then displaying the reconstructed 3-D anatomy on a monitor or other display device. Nearly every major medical center is providing 3-D images in many specialties, with the prominent applications being the brain and brain vasculature for neurosurgery, the heart and coronary vasculature for cardiology and cardiac surgery, bones and joints for orthopedic surgery, the skull and face for Ear, Nose, and Throat (ENT), ophthalmology, and plastic surgery, and some solid organs (liver, kidney, uterus, etc.) for general, urologic, and gynecologic surgery. Virtual endoscopy is a recent thrust for noninvasive intraluminal diagnosis, which has traditionally been done by inserting video endoscopes into natural body openings, such as the nasal sinuses for sinusoscopy, the mouth and lungs for bronchoscopy, the mouth and stomach for upper gastrointestinal endoscopy, the anus for sigmoidoscopy and colonoscopy, and joints for arthroscopy. In addition, previously invasive visualization of the arteries and veins (angiography) is converting to noninvasive digital angiography, especially for the arteries of the brain (carotid) and heart (coronary). There are numerous examples of volume visualization of the brain, as exemplified by James Duncan of Yale University School of Medicine (cf. Staib & Duncan, 1996; see Fig. 15.3). A first step in applying volume visualization to the liver is being explored by Jacques Marescaux of Strasbourg, France (Marescaux et al., 1998). Software refined on the Visible Human is now being used to segment and visualize hepatic lesions within the context of the complicated vascular and biliary anatomy (see Fig. 15.4). Besides 3-D volume visualization of an organ, virtual endoscopy is the procedure that is beginning to yield near-term results. The preliminary study selected by the National Institutes of Health and the National Cancer Institute for clinical studies is virtual colonoscopy as a screening procedure for cancer (see Fig. 15.5). This is a reasonable choice because of the large number of patients requiring colonoscopy, because the target disease is an anatomic abnormality (as opposed to diagnoses made by subtle surface changes in mucosa, color, etc.), and because of the impact on the overall health of the nation. One of the drawbacks is the need for an excellent bowel preparation.
FIG. 15.3. Three dimensional image of the brain, typical of diagnostic and anatomic visualization of the brain and associated structures. Courtesy Dr. Jim Duncan, Yale University School of Medicine.
FIG. 15.4. Three dimensional visualization of the liver with segmentation of the vascular and biliary trees in relation to a malignant tumor. Courtesy Dr. Jacques Marescaux, European Institute of Tele Surgery (EITS), Strasbourg, France.
FIG. 15.5. Virtual colonoscopy, a view of the lumen. Courtesy Dr. Richard Robb, Mayo Clinic, Rochester, Minn.
Interestingly enough, one approach might be to "tag" the stool (perhaps by drinking a particular contrast agent that would be incorporated into the stool). Then, by using intelligent digital signal processing, the entire stool with the contrast material can be electronically "subtracted" from the remainder of the bowel wall—a sort of electronic enema or bowel prep. The attraction of this approach for patients would be the elimination of the discomfort of brutal laxative and cathartic bowel preps, of the requirement for sedation, and of the subsequent discomfort of inserting the colonoscope. The current accuracy of virtual endoscopy is over 97% for all lesions greater than 5 mm, and about 75–80% for lesions 3 mm or larger, in a number of series that compare virtual endoscopy to video endoscopy. Because the incidence of cancer in lesions less than 5 mm is so low (less than 1%), this is a reasonable first evaluation of the efficacy of virtual endoscopy for general screening. Principal investigators and participating institutions in this preliminary study are Richard Robb (Mayo Clinic), David Vining (Bowman-Gray Medical Center), Ron Kikinis and Ferenc Jolesz (Brigham and Women's Hospital), William Lorensen (General Electric Medical Research), and Sandy Napel (Stanford University Medical Center; cf. Lorensen, Jolesz, & Kikinis, 1995; Ribin et al., 1996).
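As a rough illustration of the "electronic subtraction" idea described above (a sketch only; the threshold values and function are invented, not taken from any of the cited systems), tagged stool could be removed from a CT volume by masking out voxels whose intensity exceeds the contrast threshold:

    import numpy as np

    def subtract_tagged_stool(ct_volume, contrast_threshold=300.0, air_value=-1000.0):
        """Replace voxels brighter than the tagging threshold with air.

        ct_volume: 3-D array of Hounsfield units; for this illustration, tagged
        stool is assumed to be the only material above contrast_threshold.
        """
        cleaned = ct_volume.copy()
        cleaned[cleaned > contrast_threshold] = air_value  # the "electronic enema"
        return cleaned

    # Example on a synthetic 3-voxel column: bowel wall, tagged stool, lumen air.
    column = np.array([40.0, 500.0, -1000.0])
    print(subtract_tagged_stool(column))   # -> [   40. -1000. -1000.]

A real implementation would restrict the subtraction to the segmented colon (bone is also above such a threshold) and would handle partial-volume voxels at the stool-to-wall interface.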
There are two other next-generation extensions of virtual endoscopy being investigated: organ-specific texture mapping and numerical biopsy. Organ-specific texture mapping refers to using additional information from the CT or MRI scan to provide precise color to the surfaces of the organs and tissues. The concept is possible because of the Visible Human project, which comprises three complete 3-D data sets (CT scan, MRI scan, and color phototomography) of a real person that have been exactly matched, pixel for pixel. Thus, each pixel has a Hounsfield unit value from the CT scan, a T-spin value from the MRI, and a color value from the color photography, and for each organ a "color look-up table" can be created. Michael Sellberg of Engineering Animation, Inc., has created a liver model from the Visible Human that permits a specific color to be substituted for each Hounsfield unit of the CT scan (M. Sellberg, personal communication, 1999). The resultant "colorized" CT scan has the color and texture of the photographed liver. Although in a preliminary stage and completed only for the liver of the Visible Human, this demonstrates how a diagnosis can be made of lesions that do not distort the anatomy and that rely on visual inspection cues, such as inflammation and AV malformations. "Numerical biopsy," proposed by Richard Robb of the Mayo Clinic, is similar in concept to texture mapping in the sense that the combination of absolute numerical values from the CT scan or MRI has the potential to distinguish malignant from benign tissue without actually removing the tissue (Robb, Aharon, & Cameron, 1997). A similar noninvasive technique has been investigated using lasers for hyperspectral analysis of mucosal lesions of the colon and of the esophagus in Barrett's esophagus (Zonios et al., 1998). This "optical biopsy" is accomplished by reading the numerical value of the reflected light. Because larger molecules (like proteins and DNA) have very specific reflected light signals ("optical signatures"), and malignant tissue has a higher concentration of DNA, there are distinctively different signatures for benign and malignant tissues. Researchers will study the potential of the numerical biopsy across a whole host of signatures, including Hounsfield units from the CT scan. There are a number of barriers to be overcome in making virtual endoscopy a simple-to-use, practical, and accurate diagnostic tool. Today, the segmenting of the various organs must be done by hand, an inaccurate and labor-intensive procedure. Research is in progress to have this done automatically by the CT or MRI scanner as it acquires the image—in real time. Thus, immediately after the patient has the scan, the radiologist, gastroenterologist, or surgeon will have the 3-D model of the patient's organ to "fly through." There is also the problem of "registration," or aligning the pixels exactly so that the MRI, CT, and other scan data match the identical pixel of the corresponding scan. Without this registration, the look-up table for texture and the numerical biopsy for diagnosis cannot be accomplished in a reasonable amount of time. Also, as mentioned above, the removal of extraneous material, such as stool for a colonoscopy, must be solved.
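As a concrete illustration of the "color look-up table" idea described above, the sketch below (Python; the table values are invented placeholders, not Sellberg's actual data) maps each CT Hounsfield value in an organ to a color sampled from the matched Visible Human photographs:

    import numpy as np

    # Hypothetical look-up table: Hounsfield unit -> RGB color taken from the
    # registered Visible Human color photographs (values here are placeholders).
    LIVER_LUT = {
        40: (120, 60, 50),     # liver parenchyma
        60: (130, 55, 45),
        200: (200, 190, 180),  # contrast-filled vessel
    }

    def colorize(hounsfield_slice, lut, default=(0, 0, 0)):
        """Return an RGB image whose colors are looked up from the CT values."""
        h, w = hounsfield_slice.shape
        rgb = np.zeros((h, w, 3), dtype=np.uint8)
        for (i, j), hu in np.ndenumerate(hounsfield_slice):
            rgb[i, j] = lut.get(int(round(hu)), default)
        return rgb

    slice_hu = np.array([[40, 60], [200, 40]])
    print(colorize(slice_hu, LIVER_LUT).shape)   # (2, 2, 3)

In practice the table would be built by binning Hounsfield values over the registered data sets rather than listing a few discrete entries, and it depends entirely on the pixel-for-pixel registration discussed in the text.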
A very different diagnostic application of VR is by the artist Rita Addison in her environment called DETOUR (Addison, 1995). As a patient, she had suffered a transient neurologic deficit of vision and expressive aphasia (an inability to express the thoughts she was having) following a head injury. To communicate with her physician, she made a virtual gallery of her paintings and then distorted them to represent the visual field defects that she would transiently experience (see Fig. 15.6). Thus, she was able to show her physician the visual defect she was experiencing, change the distortion, and let the physician make the diagnosis by experiencing the abnormality in a virtual environment.
FIG. 15.6. Painting of reeds in a pond showing the visual distortion in a virtual environment. Courtesy Rita Addison, Princeton, N.J.
THERAPY
The acquisition and segmentation of the 3-D image for diagnosis also constitute the first step toward therapy. With such an image, a surgeon can now preoperatively plan a procedure. Joseph Rosen of Dartmouth University Medical Center has a VR model of a face with deformable skin that allows the practicing of a plastic surgical procedure and demonstration of the final outcome before making the incision on a patient (Rosen, 1992). Scott Delp has a virtual model of a lower leg on which he can practice a tendon transplant operation and then "walk" the leg to predict the short- and long-term consequences of the surgery (see Fig. 15.7; Delp & Zajac, 1992). Likewise, Dr. Altobelli of the Brigham and Women's Hospital has developed a system that creates 3-D images from the CT scan of a child with bony deformities of the face (craniofacial dysostosis); using this 3-D model, the bones can be correctly rearranged to symmetrically match the normal side of the face, permitting repeated practice of this extremely difficult procedure
FIG. 15.7. Virtual lower leg, which had precise kinematic properties that enable operative planning and outcomes analysis. Courtesy Dr. Scott Delp, MusculoGraphics, Inc., Chicago, Ill.
(Gleason et al., 1994). As noted previously, Jacques Marescaux has patient-specific liver models that are imported into a planning tool that permits visualization of the plane of segmental resection, the margins of the tumor, and the points of vascular and biliary tree resection (see Fig. 15.8; Evrad et al., 1992). Another therapeutic application is intraoperative navigation. Currently, ultrasound- and X-ray-guided stereotactic breast biopsies use the image to guide the biopsy location. In neurosurgical applications, Ferenc Jolesz of Brigham and Women's Hospital has provided the capability for 3-D MRI scans of an individual patient's brain tumor (Gleason et al., 1994). At the time of brain surgery, the MRI scan is fused with the video image of the patient's actual skull or brain, thus giving "X-ray vision" of a tumor that is not otherwise visible when it is deeply embedded in the brain tissue (see Fig. 15.9). This is a capability only possible by using information equivalents in a virtual environment. Rehabilitation is another area that has applications for VR. Walter Greenleaf creates virtual environments for persons with disabilities that both provide a quantitative
FIG. 15.8. Preoperative planning with a 3-D reconstruction of a patient’s liver and tumor. Courtesy Dr. Jacques Marescaux, European Institute of Tele Surgery (EITS), Strasbourg, France.
FIG. 15.9. Fusing of a preoperative MRI-derived brain tumor image with the real-time surgical video to provide intraoperative navigation and "X-ray vision." Courtesy Dr. Ferenc Jolesz, Brigham and Women's Hospital, Boston, Mass.
analysis of progress while restoring musculoskeletal function and a teaching environment for acquiring new skills to compensate for the disability (Greenleaf, 1997). One system uses a motorized wheelchair as an input device to a virtual world that teaches users how to navigate various obstacles. In the area of psychiatry, Ralph Lamson has used VR to treat phobias (Lamson & Meisner, 1994). He has created virtual representations of actual places, such as an elevator or the Golden Gate Bridge. These permit patients to very gradually adapt to and experience situations that are threatening to them and slowly conquer their fears. Other efforts are in progress to assess and treat emotional disorders.
EDUCATION
The VR application that has received the most attention is surgical simulation. This is natural, because it is a direct analogy to flight simulation—a very complex, highly intense, information-rich situation requiring immediate life-and-death decisions. Flight simulation has more than 80 years of development behind it and has been used for official FAA certification since the late 1950s. Surgical simulation is less than a decade old. Although there is great similarity, there is one great difference that makes medical simulation much more difficult. Flight simulators are created from the same designs that built the aircraft, and a given aircraft remains exactly the same over time—it is a physical object (in engineering terms, a "rigid body" problem). Medical simulations must learn the design of the human body, which varies from person to person and changes from moment to moment—it is a biologic system (in engineering terms, a "nonrigid body" problem). Biologic systems are orders of magnitude more complex. Nevertheless, even simple simulators can provide enormous value, as demonstrated by the earliest flight simulators, which were adapted carnival rides (see Fig. 15.10). None of the early pilots ever had the illusion that they were flying an actual aircraft; however, even the simple instrumented simulators were able to reduce crash landings in bad weather or the dark by about 95% without subjecting the pilots to the risk of an actual crash. This is the promise for surgical and procedural simulators. Simulations can occur with many different levels of models, from simple graphic objects representing various organs or tissues to highly complex environments with multiple systems portrayed. The simple simulators are used to teach a task, such as intravenous needle insertion (see Fig. 15.11) by HT Medical (Meglan et al., 1996), although more complex models can be used for procedures like angioplasty or complete operations like laparoscopic cholecystectomy (G. Buess, personal communication, 1999). However, the more complicated the environment being simulated, the less realistic the images will appear.
FIG. 15.10. Edwin Link in the first flight simulator, constructed from a carnival ride and his father’s organ instruments. Courtesy CAE Link, Inc.
FIG. 15.11. Simple surgical simulation of inserting a needle below the clavicle for central venous placement. Courtesy Jonathan Merrill, High Techsplanations, Inc., Rockville, Md.
FIG. 15.12. Early surgical simulation of gallbladder surgery (cholecystectomy). Courtesy author.
Even with current high-performance technology, there is not enough computational power to display all the needed aspects of the simulation. These requirements include the visual fidelity of the tissues, the properties of the tissues, the dynamic changes to tissues, the contact detection between surgical instruments and the tissues, and the drawing of the objects in real time in high-resolution full color at least 40 times per second (40 Hz is the minimum video refresh rate that provides completely flicker-free video motion). Thus, all simulations are a compromise among trade-offs of the above components. Will the simulator have cartoonlike organs that behave like real organs in real time, such as the early simulations by Satava (see Fig. 15.12; Satava, 1993), or will the organs look photorealistic but not have any properties (i.e., not bleed when cut or change shape when probed) and be displayed on the monitor in jerky motion at only a few frames per second? Until we have massively more computer power, we must match the educational need with the current level of technological capability. These simulators, such as MusculoGraphics' Limb Trauma Simulator (see Fig. 15.13) or Boston Dynamics' Anastomosis Simulator (see Fig. 15.14; Playter & Raibert, 1997), can not only provide a training opportunity but also serve as a skills performance assessment tool. As the instrument handles of the simulator are manipulated to perform the surgical simulation, the system can also track the hand motions, pressures applied, accuracy and precision of needle placement, and other movements. These data can be compiled into a full analysis of individual skills for objective feedback to enhance training or even for credentialing.
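A minimal sketch of how such objective skill metrics could be computed from tracked instrument positions (Python; the metric choices and sampling rate are illustrative assumptions, not the scoring used by any of the simulators named above):

    import numpy as np

    def skill_metrics(tip_positions, sample_rate_hz=40.0):
        """Compute simple performance metrics from a tracked instrument tip.

        tip_positions: (N, 3) array of x, y, z samples recorded during the task.
        Returns total task time, path length, and a crude smoothness measure
        (mean magnitude of the change in velocity between samples).
        """
        pos = np.asarray(tip_positions, dtype=float)
        dt = 1.0 / sample_rate_hz
        task_time = (len(pos) - 1) * dt
        steps = np.diff(pos, axis=0)
        path_length = np.linalg.norm(steps, axis=1).sum()
        velocity = steps / dt
        jerkiness = np.linalg.norm(np.diff(velocity, axis=0), axis=1).mean()
        return {"time_s": task_time, "path_mm": path_length, "jerkiness": jerkiness}

    # Example: a short, perfectly straight motion scores zero on jerkiness.
    straight = [[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0]]
    print(skill_metrics(straight))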
FIG. 15.13. Limb trauma simulator. Courtesy Scott Delp, MusculoGaphics, Inc., Chicago, Ill.
FIG. 15.14. Anastomosis simulator. Courtesy Dr. Marc Raibert, Boston Dynamics, Inc., Boston, Mass.
Early implementations are in the areas above, as well as in complementing the current Advanced Trauma Life Support course. The key elements in all of these simulators will be the educational content, curriculum, and assessment tools, more than the technical capabilities. Efforts must be focused as much on these nontechnical areas as on the simulator capabilities in order to improve surgical training. Telepresence surgery and dexterity-enhanced surgery (both forms of computer-aided surgery) have not been included here because the procedure is performed on an actual animal model or patient. However, these systems are an integral part of where the future lies. Because the surgeon uses the information-equivalent video image (as opposed to looking directly at the real organs), these systems are both operating systems and surgical simulators. By simply replacing the real video image on the monitor with a computer-generated image from a library of difficult cases or even patient-specific data, the surgical system becomes a simulator. The beauty of such an integrated single-system approach is that you train as you operate, and operate as you train.
HEALTH CARE INTEGRATION
The power of VR as the core element in the above framework for the revolution in medicine resides in the ability of the information equivalents and the medical avatar to integrate the entire spectrum of health care. The following scenario, referred to as the Doorway to the Future, is used to illustrate how such a "systems approach" can improve the provision of health care by orders of magnitude. This description is intended to provide a pathway for integrating individual technologies that are just emerging, and to show how advances in one area can stimulate further leaps forward in collateral technologies. It might require 10, 20, or even 50 years to accomplish the integrated whole; however, early implementation can occur today as the integration process proceeds. A patient enters a physician's office and passes through a doorway, the frame of which contains many scanning devices, from CT to MRI to ultrasound to near infrared and others. These scanners acquire not only anatomic data but also physiologic and biochemical data (like today's pulse oximeters). When the patient sits down next to the physician, a full 3-D holographic image of the patient appears suspended on the desktop, a visual integration of the information acquired just a minute before by the scanners. This image is the patient's medical record. When the patient expresses the complaint of pain over the right flank, the physician can rotate the image, remove various layers, and query the representation of the patient's liver or kidney regarding the lactic dehydrogenase, serum glutamic-oxaloacetic transaminase, alkaline phosphatase, serum creatinine, or other relevant information.
This information, and more, is stored in each pixel of the patient's representative image (the medical avatar), such that the image of each structure and organ (such as the liver) stacks all the relevant information about that structure into a "deep pixel." Each pixel contains not only anatomic data but also biochemical, physiologic, historical, and other data, so that information can be revealed directly from the image rather than by searching through volumes of written medical records or making a prolonged computer-database search. Diagnosis can be performed with virtual endoscopy by a "fly-through" of the patient's virtual organs. Should a problem or disease be discovered, the image could be used immediately for patient education, instantly explaining to the patient, by demonstrating on his or her own avatar, what the problem might be. Should a surgical problem be discovered, this same image can be used by the surgeon for preoperative planning, as is done by Rosen or Altobelli, or imported into a surgical simulator, as is done by Levy, to practice a variety of different approaches to a difficult surgical procedure to be performed on the patient the next morning. At the time of operation, the image can be fused with a video image and used for intraoperative navigation or to enhance precision, as is performed in Jolesz's stereotactic neurosurgery. During the postoperative visits, a follow-up scan can be compared to the preoperative scan, and using digital subtraction techniques, the differences can be processed automatically for outcomes analysis. Because the avatar is an information object, it can be stored on a credit card–size medical record (the U.S. Army has the Personnel Information Carrier, or PIC) and can be made available and distributed (through telemedicine or over the Internet) anytime and anyplace. Thus, this single concept of replacing the written medical record (including X rays and other images) with the visual record of a medical avatar permits the entire spectrum of health care to be provided with unprecedented continuity.
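A minimal sketch of the digital-subtraction step mentioned above (Python; it assumes the pre- and postoperative volumes are already registered to each other, which is the hard part in practice, and the summary threshold is an invented placeholder):

    import numpy as np

    def outcome_difference(pre_scan, post_scan, threshold=50.0):
        """Voxel-wise subtraction of registered pre- and postoperative scans.

        Returns the signed difference volume and the fraction of voxels that
        changed by more than `threshold` Hounsfield units, a crude summary
        measure for outcomes analysis.
        """
        diff = np.asarray(post_scan, dtype=float) - np.asarray(pre_scan, dtype=float)
        changed_fraction = float(np.mean(np.abs(diff) > threshold))
        return diff, changed_fraction

    pre = np.zeros((4, 4, 4))
    post = pre.copy()
    post[1, 1, 1] = 120.0            # e.g., tissue removed or density changed
    _, frac = outcome_difference(pre, post)
    print(f"{frac:.3f} of voxels changed")   # 0.016 (1 voxel of 64)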
CONCLUSION
Clearly, a liberal interpretation of VR demonstrates how this scientific tool infuses and stimulates the next generation of health care. Other equally important areas, such as medical informatics, biotechnology, and health care management, are also shaking the foundations of medicine. It is obvious that not all of the possible technological advances described above will come to fruition, nor will the entire system evolve as described. However, this account provides a framework for evaluating the importance of individual technologies as parts of a whole system and sets a long-term strategic goal that gives meaning to the technologies and their relative importance for program planning. Difficult challenges remain ahead: the stringent and rigorous scientific and clinical evaluation of the technologies and applications, and their practicality based on both improved quality of health care and cost-effectiveness. It must be remembered that the final uncompromising metric can only be how the individual patient is benefited.
ACKNOWLEDGMENTS
The opinions or assertions contained herein are the private views of the authors and are not to be construed as official, or as reflecting the views of the Department of the Army, the Department of the Navy, the Defense Advanced Research Projects Agency, or the Department of Defense.
REFERENCES
Ackerman, M. J. (1991). The visible human project. Biocommunications, 18, 14.
Addison, R. (1995). DETOUR: Brain deconstruction. In R. M. Satava, K. Morgan, H. B. Sieburg, et al. (Eds.), Proceedings of the interactive technology and the new paradigm for healthcare. Amsterdam: IOS.
Delp, S. L., & Zajac, F. R. (1992). Force and moment generating capacity of lower limb muscles before and after tendon lengthening. Clinical Orthopedics and Related Research, 284, 247–259.
Durlach, N. I., & Mavor, A. S. (1995). Virtual reality: Scientific and technological challenges. Washington, DC: National Academy Press.
Evrad, S., Moyses, B., Ghnassia, J. P., Vix, M., Mutter, D., Methelin, G., & Marescaux, J. (1992). Validation of the measurement of hepatic volume by three-dimensional computed tomography. Annals Chirurgie, 46, 601–604.
Gleason, P. L., Kikinis, R., Altobelli, D., Wells, W., Alexander, E., III, Black, P. M., & Jolesz, F. (1994). Video registration virtual reality for nonlinkage stereotactic surgery. Stereotactic Functional Neurosurgery, 63, 139–143.
Greenleaf, W. J. (1997). Applications of virtual reality technology to therapy and rehabilitation: Including physical therapy and disability solutions. Orthopaedic Physical Therapy Clinics of North America, 6, 81–98.
Lamson, R., & Meisner, M. (1994). The effects of virtual reality immersion in the treatment of anxiety, panic, and phobia of heights. In Virtual reality and persons with disabilities: Proceedings of the second annual international conference. San Francisco.
Lorensen, W. E., Jolesz, F. A., & Kikinis, R. (1995). The exploration of cross-sectional data with a virtual endoscope. In R. M. Satava & K. Morgan (Eds.), Interactive technology and the new medical paradigm for health care. Washington, DC: IOS.
Marescaux, J., Clement, J. M., Tassetti, V., Koehl, C., Sotin, S., Russier, Y., Mutter, D., Delingette, H., & Ayache, N. (1998). Virtual reality applied to hepatic surgery simulation: The next revolution. Annals of Surgery, 228, 627–634.
Meglan, D. A., Raju, R., Merrill, G. L., Merrill, J. A., Shankar, N. S., & Higgins, G. A. (1996). The Teleos virtual environment toolkit for simulation-based surgical education. In S. J. Weghorst, H. B. Sieberg, & K. S. Morgan (Eds.), Healthcare in the information age. Amsterdam: IOS.
Negroponte, N. (1995). Being digital. New York: Knopf.
Playter, R., & Raibert, M. (1997). A virtual surgery simulator using advanced haptic feedback. Minimally Invasive Therapy and Allied Technologies, 6, 117–121.
Ribin, G. D., Beaulieu, C. F., Arigiro, V., Ringl, H., Norbash, A. M., Feller, J. F., Dake, M. D., Jeffrey, R. B., & Napel, S. (1996). Perspective volume rendering of CT and MR images: Applications for endoscopic imaging. Radiology, 119, 321–330.
Robb, R. A., Aharon, S., & Cameron, B. M. (1997). Patient-specific anatomic models from 3-dimensional medical image data for clinical applications in surgery and endoscopy. Journal of Digital Imaging, 10, 31–35.
Rosen, J. (1992). From computer-aided design to computer-aided surgery. In Proceedings of Medicine Meets Virtual Reality. San Diego, CA.
Satava, R. M. (1993). Virtual reality surgical simulator: The first steps. Surgical Endoscopy, 7, 203–205.
Staib, L. H., & Duncan, J. S. (1996). Model-based deformable surface finding for medical images. IEEE Transactions on Medical Imaging, 15, 720–726.
Zonios, G., Cothren, R., Crawford, J. M., Fitzmaurice, M., Manoharan, R., Van Dam, J., & Feld, M. S. (1998). Spectral pathology. Annals of the New York Academy of Sciences, 838, 108–115.
16 Face-to-Face Communication
Nadia Magnenat Thalmann, Prem Kalra,* and Marc Escher
MIRALab, CUI, University of Geneva
Airplane and car manufacturers created the first computerized human models 20 years ago. The main idea was to simulate a very simple articulated structure for studying problems of ergonomics. In the 1970s, researchers developed methods to animate human skeletons, mainly based on interpolation techniques. Bodies were represented by very primitive surfaces like cylinders, ellipsoids, or spheres. At the same time, the first experimental facial animation sequences appeared (Parke, 1974). The Juggler, from Information International, Inc. (1982), was the first realistic human character in computer animation. The results were very impressive; however, the human shape was completely digitized, the body motion had been recorded using 3-D rotoscopy, and there was no facial animation. The first 3-D procedural model of human animation was used in producing the 12-minute film Dreamflight (Magnenat-Thalmann, Bergeron, & Thalmann, 1982), one of the first to feature a 3-D virtual human. Simultaneously, an effort with more emphasis on functional aspects than realism for virtual humans was initiated at the University of Pennsylvania (Badler & Morris, 1982) through a software package called Jack.
*Currently at the Indian Institute of Technology, New Delhi, India.
In the 1980s, researchers started to base animation on key-frame and parametric animation, and in the late 1980s on the laws of physics. Dynamic simulation made it possible to generate complex motions with a great deal of realism. However, an ordinary human activity like walking is too complex to be simulated by the laws of dynamics alone. Two people with the same physical characteristics do not move in the same way, and even one individual does not move in the same way all the time. A behavioral approach to human animation is necessary to lend credibility to such simulations.
The face is a relatively small part of a virtual human, but it plays an essential role in communication. We look at faces for clues to emotions or even to read lips. It is a particular challenge to simulate these aspects. Therefore, the ultimate objective is to model human facial anatomy exactly, including its movements, with respect to both structural and functional aspects. Recent developments in facial animation include physically based approximations of facial tissue and the reconstruction of muscle contractions from video sequences of human facial expressions. Problems of correlation between emotions and voice intonation have also been studied. Ensuring synchronization of eye motion, facial expression of emotion, and the word flow of a sentence, as well as synchronization among several virtual humans, is at the heart of our new facial animation system at the University of Geneva.
In the context of interactive virtual environments and animation systems, the relationship between the user as animator and the virtual human as synthetic actor needs to be emphasized. Two-way communication is required, where not only can the animator give commands to the actor, but the actor must also be able to respond both verbally and behaviorally. This chapter is an account of a face-to-virtual-face interaction system in which a clone, representing a real person, can communicate with another virtual human, who is autonomous, in a virtual world. The dialogue consists of both verbal and other expressive aspects of facial communication between the two participants. The next section gives an overview of the problem and describes major contributions related to its different aspects. The sections that follow concentrate on our system, describe its different components, and present issues related to the standardization of parameters for defining the shape and animation of the face. Future trends are outlined in the concluding remarks.
PROBLEM DOMAIN AND RELATED WORK
To clone is to copy. In our context, cloning means reproducing a virtual model of a real person in the virtual world. Here, our interest is restricted to one component of the human figure—the face. The face is the most communicative part of the human figure. Even a passive face conveys a large amount of information, and when it comes to life and begins to move, the range of motions it offers is remarkable: We observe the lips, teeth, and tongue for speech, eye and head movements for additional elements of dialogue, and flexing muscles and wrinkle lines for emotions.
Prerequisites for cloning a face are analyses of several aspects necessary for its reconstruction: its shape, and its movements due to both emotions and speech. This requires techniques from various fields. Shape includes geometrical form as well as other visual characteristics such as color and texture. Input for shape reconstruction may be drawn from photographs and/or scanned data. The synthesis of facial motion involves deforming its geometry over time according to physical or ad hoc rules for generating movements conveying facial expressions and speech. The input for the facial motion of the clone will be the facial expressions and/or the speech of the real person. In the rest of this section, we review work related to shape reconstruction, synthesis of facial animation, the analysis and tracking of facial motion, and facial communication. The vast domain of speech analysis and recognition is beyond the scope of this review.
3-D Shape Reconstruction
Geometrical Representation
Among the variety of ways of representing a face geometrically, the choice should be one that allows for precise shape, effective animation, and efficient rendering. Surface primitives and structures are currently the preferred geometrical representations for faces. Among surface description techniques are polygonal surfaces, parametric surfaces, and implicit surfaces. In a polygonal surface representation, a face is a collection of polygons, regularly or irregularly shaped. The majority of existing models use polygonal surfaces, primarily because of their simplicity and the hardware display facilities available for polygons on most platforms. Parametric surfaces use bivariate parametric functions to define surfaces in three dimensions, for example, bicubic B-spline surfaces (Nahas, Huitric, & Sanintourens, 1988; Waite, 1989). The advantage of these models is that they have smooth surfaces and are determined using only a few control points. However, local high-density details for the eyes and mouth are difficult to add. Hierarchical B-splines developed by Forsey and Bartels (1988) enable more local detail without the need to add complete rows or columns of control points. Wang (1993) has used hierarchical B-splines for modeling and animating faces. An implicit surface is an analytic surface defined by a scalar field function (Blinn, 1982). Interaction with implicit surfaces is difficult with currently available techniques, and these have not yet been used for facial modeling.
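To make the polygonal representation concrete, a face "mask" can be stored as an array of vertex positions plus a list of polygons indexing into it. The following minimal sketch (Python; the data layout is a common convention, not the specific structure used by any system cited here) shows the idea:

    import numpy as np

    # A toy mesh: 3-D vertex positions and triangles given as vertex indices.
    vertices = np.array([
        [0.0, 0.0, 0.0],   # 0
        [1.0, 0.0, 0.0],   # 1
        [0.0, 1.0, 0.0],   # 2
        [0.0, 0.0, 1.0],   # 3
    ])
    triangles = [(0, 1, 3), (0, 3, 2)]

    def triangle_normal(v0, v1, v2):
        """Unit normal of one triangle, as needed for shading the face surface."""
        n = np.cross(v1 - v0, v2 - v0)
        return n / np.linalg.norm(n)

    for tri in triangles:
        v0, v1, v2 = (vertices[i] for i in tri)
        print(tri, triangle_normal(v0, v1, v2))

Animating such a mesh simply means updating the vertex array each frame while the connectivity stays fixed.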
Facial Features
A face consists of many parts and details. Researchers tend to focus on the visible external skin surface of the face from the neck to the forehead—"the mask"—for facial animation study and research. However, it is necessary to add features like eyeballs, teeth, tongue, ears, and hair to obtain realistic results. In addition, these features are essential to the recognition of particular individuals.
Facial Data Acquisition
Face models rely on data from various sources for shapes, color, texture, and features. In constructing geometrical descriptions, two types of input should be distinguished: three-dimensional and two-dimensional.
Three-Dimensional Input
Use of a 3-D digitizer/scanner would seem to be the most direct method for acquiring the shape of a face. A 3-D digitizer involves moving a sensor or locating device to each surface point to be measured. Normally, the digitized points are the polygonal vertices (they can also be the control points of parametric surfaces). There are several types of 3-D digitizers employing different measurement techniques (mechanical, acoustic, and electromagnetic). Many researchers have used Polhemus, an electromagnetic digitizer, for modeling faces (Blinn, 1982; Magnenat-Thalmann & Thalmann, 1987). In other cases, a plaster model has been used for marking the points and connectivities. This procedure is not automatic and is very time-consuming. Laser-based scanners, such as Cyberware (1990), can provide both the range and reflectance map of the 3-D data in a few seconds. The range data produce a large regular mesh of points in a cylindrical coordinate system. The reflectance map gives color and texture information. One of the problems with this method is the high density of the data provided. Another is that the surface data from laser scanners tend to be noisy and have missing points. Some postprocessing operations are required before the data can be used. These may include relaxation membrane interpolation for filling in the missing data (Lee, Terzopoulos, & Waters, 1995); filter methods, for example, hysteresis blur filters for smoothing data (Williams, 1990); and adaptive polygon meshes to reduce the size of the data set for the final face model (Terzopoulos & Waters, 1991). As an alternative to measuring facial surfaces, models may be created using interactive methods like sculpturing (Elson, 1990; LeBlanc, Kalra, Magnenat-Thalmann, & Thalmann, 1991). Here the face is designed and modeled by direct and interactive manipulation of vertex positions or surface control points. This, however, presupposes design skills and sufficient time to build the model. When constructing a clone, relying on subjective visual impressions may not be accurate or rapid enough.
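Since laser range scanners such as Cyberware deliver the head as a regular grid of radii in cylindrical coordinates, a first processing step is simply converting those samples to Cartesian vertices. A minimal sketch follows (Python; the grid dimensions, height step, and axis conventions are illustrative assumptions):

    import numpy as np

    def cylindrical_to_cartesian(radii, height_step=0.001):
        """Convert a range map of radii (rows = heights, cols = angles) to vertices.

        radii: (H, W) array; radii[i, j] is the distance from the scanner's
        vertical axis at height index i and azimuth index j.
        """
        n_heights, n_angles = radii.shape
        theta = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
        vertices = np.empty((n_heights, n_angles, 3))
        for i in range(n_heights):
            vertices[i, :, 0] = radii[i] * np.cos(theta)   # x
            vertices[i, :, 1] = radii[i] * np.sin(theta)   # y
            vertices[i, :, 2] = i * height_step            # z along the scan axis
        return vertices

    scan = np.full((4, 8), 0.1)                    # a toy cylinder of radius 10 cm
    print(cylindrical_to_cartesian(scan).shape)    # (4, 8, 3)

Missing samples would be filled by the interpolation methods mentioned above before the resulting mesh is used.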
Two-Dimensional Input
There are a number of methods for inferring 3-D shape from 2-D images. Photogrammetry of a set of images (generally two) can be used for estimating 3-D shape information. Typically, the same set of surface points is located and measured in at least two different photographs. This set of points may even be marked on the face before the pair of photographs is taken. The measurement can be done manually or using a 2-D digitizer. A better method takes account of perspective distortion by using a projection transformation matrix determined by six reference points with known 3-D coordinates (Parke, 1990). Another approach is to modify a canonical or generic face model to fit the specific facial model using information from photographs of the specific face (Akimoto & Suenaga, 1993; Kurihara & Arai, 1991). This relies on the fact that humans share common structures and are similar in shape. The advantages here are that no specialized hardware is needed and that the modified heads all share the same topology and structure and hence can be easily animated.
Animation Techniques
Because the clones we are interested in creating will not remain static but will have to move like real people, we briefly review work done in animating synthetic models of the face. The different approaches employed for animation are primarily dictated by the context of application. There are five basic kinds of facial animation: interpolation, parametrization, pseudo-muscle-based, muscle-based, and performance-driven. Interpolation resembles the key-frame approach used in conventional animation, in that the desired facial expression is specified for a particular point in time, defining the key frame. An in-betweening algorithm computes the intermediate frames. This approach, which has been used to make several films (Kleiser-Walczak, 1989), is very labor and data intensive. Parametric animation models make use of local region interpolation, geometric transformations, and mapping techniques to manipulate the features of the face (Parke, 1982). These transformations are grouped together to create a set of parameters, and sets of parameters can apply to both the conformation and the animation of the face. In pseudo-muscle-based models, abstract notions of muscles simulate muscle actions, whereas deformation operators define muscle activities (Magnenat-Thalmann, Primeau, & Thalmann, 1988). The dynamics of the different facial tissues are not considered. The idea here is not to simulate detailed facial anatomy exactly but to design a model with a few parameters that emulates muscle actions. There are as yet no facial animation models based on complete and detailed anatomy. Models have, however, been proposed and developed using simplified structures for bone, muscle, fatty tissue, and skin. These models enable facial animation through particular characteristics of the facial muscles. Platt and Badler (1981) used a mass-spring model to simulate muscles. Waters (1987) developed operators to simulate linear and sphincter muscles having directional properties. A physically based model has been developed in which muscle actions are modeled by simulating the trilayer structure of muscle, fatty tissue, and skin (Terzopoulos & Waters, 1990). Most of these methods do not have real-time performance.
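A minimal sketch of the in-betweening step used in interpolation-based facial animation (Python; standard linear key-frame interpolation applied to a vector of facial parameters; the parameter names are invented for illustration):

    import numpy as np

    def in_between(key_a, key_b, t):
        """Linearly interpolate between two key frames for 0 <= t <= 1."""
        key_a, key_b = np.asarray(key_a, float), np.asarray(key_b, float)
        return (1.0 - t) * key_a + t * key_b

    # Key frames given as facial parameter vectors: (jaw_open, smile, brow_raise).
    neutral = [0.0, 0.0, 0.0]
    smile = [0.1, 1.0, 0.2]

    # Generate five frames between the two key expressions.
    for frame, t in enumerate(np.linspace(0.0, 1.0, 5)):
        print(frame, in_between(neutral, smile, t))

The same interpolation can be applied directly to vertex positions of the face mesh rather than to abstract parameters; splines or ease-in/ease-out curves are often substituted for the linear ramp.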
Performance-based animation uses data from real human actions to drive the virtual character. The input data may come from interactive input devices such as a DataGlove, instrumented body suits, and video- or laser-based motion-tracking systems. The next section elaborates on methods of tracking facial motion from video-based input.
Facial Motion Tracking
The ability to track facial motion accurately promises unique opportunities for new paradigms of human–computer interaction. The task of automatically and faithfully tracking and recognizing facial dynamics, however, without invasive markers or special lighting conditions, remains an active exploratory area of research. One of the simplest approaches is to track markers placed directly on the face of the person whose facial motion we wish to track. With retroreflective markers located on the performer's face, the problem of tracking becomes one of tracing a set of bright spots on a dark field. The technique is effective when there are discrete markers and they remain in view at all times. Tracking motion robustly is difficult, and there have been some efforts in computer vision to find reasonable solutions. Both model-based and image-based methods have been employed. One approach makes use of a deformable template, assuming that each face consists of features like the eyes, nose, and mouth, which are uniquely identifiable (Yuille, Cohen, & Hallinan, 1989). A potential energy function for the image, formulated as a combination of terms due to valleys, edges, peaks, the image, and an internal potential, is minimized as a function of the parameters of the template. The limitation of this method is that it becomes unstable in the absence of a feature, as the template continues to look for it. "Snakes," or active contours, have also been used to track features (Kass, Witkin, & Terzopoulos, 1988). A snake is basically the numerical solution of a first-order dynamic system. The human face has many feature lines and boundaries that a snake can track. Snakes can also be used for estimating muscle contraction parameters from a sequence of facial images (Terzopoulos & Waters, 1993); the image-intensity function is transformed into a planar force field by computing the magnitude of the gradient of the image intensity. Some methods use a combination of image-processing techniques for different features of the face, as snakes may not be stable for all regions (Pandzic, Kalra, Magnenat-Thalmann, & Thalmann, 1994). B-splines have also been used for defining snakes whose control points are time varying (Blake & Isard, 1994). Methods using snakes are generally not very robust, as the numerical integration may become unstable.
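A minimal sketch of the bright-spot marker tracking described above (Python; it re-locates one retroreflective marker as the intensity-weighted centroid inside a small window around its previous position; the window size and threshold are illustrative assumptions):

    import numpy as np

    def update_marker(frame, prev_xy, window=7, threshold=200):
        """Return the new (row, col) of one bright marker in a grayscale frame."""
        r0, c0 = (int(round(v)) for v in prev_xy)
        half = window // 2
        top, left = max(r0 - half, 0), max(c0 - half, 0)
        patch = frame[top:r0 + half + 1, left:c0 + half + 1].astype(float)
        patch = np.where(patch >= threshold, patch, 0.0)   # keep only bright pixels
        if patch.sum() == 0.0:
            return prev_xy                                 # marker lost: keep last position
        rows, cols = np.indices(patch.shape)
        r = (rows * patch).sum() / patch.sum() + top
        c = (cols * patch).sum() / patch.sum() + left
        return (r, c)

    frame = np.zeros((20, 20))
    frame[10, 12] = 255
    print(update_marker(frame, (9.0, 11.0)))   # -> approximately (10.0, 12.0)

Running this update once per video frame for each marker yields the trajectories that drive the virtual face; it is only usable when the markers stay visible, which is exactly the limitation noted in the text.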
Facial Action Coding System (Ekman & Friesen, 1978). In another approach, optical flow is used in conjunction with deformable models to estimate shape and motion in human faces (De Carlo & Metaxas, 1996). Methods using optical flow require high textural detail in the images, as the computations rely on pixel-level detail.
Facial Communication
Facial communication among virtual humans has recently attracted much attention. Cassell and colleagues (1994) describe a system that generates automatic speech and gestures, including facial expressions, for modeling conversation among multiple humanlike agents. This system, however, does not include cloning aspects. Thorisson (1997) presents a mechanism for action control, using a layered structure, for communicative humanoids. The virtual environment consists of Gandalf, a simplified caricatural face, used as the autonomous actor. There can be four different situations for face-to-virtual-face communication in a virtual world. These are: real face to virtual face, cloned face to cloned face, virtual face to virtual face, and cloned face to virtual face. The four cases are briefly described as follows.
Real Face to Virtual Face
Here, a real face communicates with a virtual face that is autonomous and has some intelligence to understand what the real face conveys. There may be two types of input from the real face: video input from the camera and speech from audio (microphone). The emotions of the real person are recognized by facial feature extraction and tracking from the video input, and the audio is converted to text and phonemes. These are used by the autonomous virtual face to respond in a believable manner through its facial expressions and speech (see Fig. 16.1). Mimicking the input from the real-face video or speech are special cases of this situation.
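The video input in this configuration is processed with the tracking techniques reviewed above. As a minimal illustration of the simplest of these, the sketch below (NumPy/SciPy; not code from the cited systems) treats retroreflective markers as bright spots on a dark field, thresholds a grayscale frame, and returns the centroid of each bright region; the threshold value is an arbitrary assumption.

import numpy as np
from scipy import ndimage

def track_markers(frame, threshold=200):
    """Locate retroreflective markers in a grayscale frame (2-D uint8 array)."""
    mask = frame > threshold                       # bright spots on a dark field
    labels, n = ndimage.label(mask)                # connected bright regions
    # Centroid (row, column) of each labeled region, one per visible marker.
    return ndimage.center_of_mass(mask, labels, range(1, n + 1))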
FIG. 16.1. Real face to virtual face.
FIG. 16.2. Cloned face to cloned face.
FIG. 16.3. Virtual face to virtual face.
Cloned Face to Cloned Face In a networked 3-D virtual environment, communication may exist between the participants located at different sites represented by their 3-D clones. This requires construction of a 3-D clone and facial motion capture of the real face from the video camera input for each participant. The extracted motion parameters are mapped to the corresponding facial animation parameters for animating the cloned models. The audio or speech input is also reproduced on each clone. Figure 16.2 shows the modules for such a communication. Virtual Face to Virtual Face A virtual face that is autonomous may communicate with another autonomous virtual face. The communication may involve both speech and facial emotions (Fig. 16.3). This situation does not require any cloning aspects. The autonomy of
the virtual humans gives them the intelligence and capacity to understand what the other virtual humans convey and generate a credible response. Cloned Face to Virtual Face A virtual environment inhabited by the clones representing real people and a virtual autonomous human would require communication between a cloned face and a virtual face. This needs the cloning and mimicking aspects to reconstruct the 3-D model and movements of the real face. The autonomous virtual face is able to respond and interact through facial expressions and speech. This is of particular interest when there is more than one real participant being represented by their clone with one or more autonomous virtual humans. Although the participant may not be interested in watching their clone, they would be able to see the presence of the other real participants. Figure 16.4 shows the configuration of the communication between a cloned face and a virtual face. The communication between a cloned face and a virtual face addresses all the problems involved for facial cloning and communication and is considered a general case for all the four situations described here. We now describe our system, with its different modules, which enables face-tovirtual-face communication in a virtual environment inhabited by the virtual clone of a real person and the autonomous virtual human. The system concentrates only on the face. Other body parts, though often communicative in real life, are for the moment considered passive. Although the system is described considering
FIG. 16.4. Cloned face (of real face) to virtual face.
one cloned face and one virtual autonomous face, it can be extended to multiple cloned and multiple virtual faces.
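One way to summarize the four situations is as a table of the processing modules each configuration requires, along the lines of the sketch below. The module names are purely illustrative and do not correspond to the actual system's components.

# Illustrative mapping of the four communication situations to required modules.
REAL, CLONE, VIRTUAL = "real face", "cloned face", "virtual face"

REQUIRED_MODULES = {
    (REAL, VIRTUAL):    ["emotion extraction", "speech to text", "response generation"],
    (CLONE, CLONE):     ["feature tracking", "face cloning", "audio reproduction"],
    (VIRTUAL, VIRTUAL): ["response generation", "speech synthesis"],
    (CLONE, VIRTUAL):   ["feature tracking", "face cloning", "emotion extraction",
                         "speech to text", "response generation", "speech synthesis"],
}

def modules_for(sender, receiver):
    """Return the modules needed for one direction of the dialogue."""
    return REQUIRED_MODULES[(sender, receiver)]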
SYSTEM DESCRIPTION The general purpose of the system is to be able to generate a clone of a real person and make it talk and behave like the real person in a virtual world. The virtual world may contain another virtual human, who is autonomous. Dialogue and other communication can be established between the clone and the autonomous virtual human. The autonomous virtual human understands the expressions and speech of the real person (conveyed via the clone) and communicates using both verbal and nonverbal signals. The dialog is simulated in a 3-D virtual scene that can be viewed from different remote sites over the network. Implementing such a system necessitates solving many independent problems in many fields: image processing, audio processing, networking, artificial intelligence, virtual reality, and 3-D animation. In designing such a system, we divide it into modules, each module being a logical separate unit of the entire system. These modules are: 3-D face reconstruction, animation, facial expression recognition, audio processing, interpretation and response generation, and audiovisual synchronization. The following subsections describe the different modules of the system. The data flow and the inter relationship of modules combined with network issues are described in subsequent sections. 3-D Face Reconstruction The 3-D face model is constructed from a generic/canonical 3-D face using two orthogonal photographs, typically a front and a side view. The process is also referred to as model fitting, as it involves transforming the generic face model to the specific face of the real person. First, we prepare two 2-D templates containing the feature points from the generic face—points that characterize the shape and morphology of the face—for each orthogonal view. If the two views of the person to be represented have different heights and sizes, a process of normalization is required. The 2-D templates are then matched to the corresponding features on the input images (Lee, Kalra, & Magnenat-Thalmann, 1997). This employs structured discrete snakes to extract the profile and hair outline, and filtering methods for the other features like the eyes, nose, and chin. The 3-D coordinates are computed using a combination of the two sets of 2-D coordinates. This provides the set of target positions of the feature points. The generic nonfeature points are modified using a deformation process called dirichlet free-form deformation (DFFD; Moccozet & Magnenat-Thalmann, 1997). DFFD is a generalized method for free-form deformation (FFD; Sederbeg & Parry, 1986) that combines traditional FFD with scattered data interpolation methods based on Delaunay–Dirichlet
FIG. 16.5. Steps for clone construction.
diagrams. DFFD imposes no constraint on the topology of the control lattice. Control points can be specified anywhere in the space. We can then perform model fitting using a set of feature points defined on the surface of the generic face as the DFFD control points (Escher & Magnenat-Thalmann, 1997). For realistic rendering, we use texture mapping to reproduce the small details of facial features that may not show up in the gross model fitting. Figure 16.5 shows the different steps for constructing a 3-D face model for the clone. Animation A face model is an irregular structure defined as a polygonal mesh. The face is decomposed into regions where muscular activity is simulated using rational freeform deformations (Kalra, Mangili, Magnenat-Thalmann, & Thalmann, 1992). As model fitting transforms the generic face without changing the underlying
FIG. 16.6. Different levels of animation control.
structure, the resulting new face can be animated. Animation can be controlled on several levels. On the lowest level, we use a set of 65 minimal perceptible actions (MPAs) related to the muscle movements. Each MPA is a basic building block for a facial motion parameter that controls a visible movement of a facial feature (such as raising an eyebrow or closing the eyes). This set of 65 MPAs allows construction of practically any expression and phoneme. On a higher level, phonemes and facial expressions are used, and at the highest level, a script containing speech and emotions with their duration and synchronization controls animation. Depending on the type of application and input, different levels of animation control can be utilized. Figure 16.6 shows these levels. Facial Expression Recognition Accurate recognition of facial expression from a sequence of images is complex; however, there have been numerous efforts in this area, some of which we have briefly outlined. The difficulty is greatly increased when the task is to be done in real time. In trying to recognize facial expression with a reasonable degree of accuracy and reliability in real time, a few simplifications are inevitable. We focus our attention on only a few facial features for detection and tracking. The method relies on a “soft” mask, which is a set of points defined on the frontal face image. During the initialization step, the mask can be interactively adjusted to the image of the real person, permitting detailed measurements of facial features (Magnenat-Thalmann, Kalra, & Pandzic, 1995). Other information such as a color sample of the skin, background, and hair, are also stored. The feature detection method is based on color sample identification and edge detection. The feature points used for facial
expression recognition are concentrated around the mouth, the neck, the eyes, the eyebrows, and the hair outline. The data extracted from previous frame are used only for features that are easy to track (e.g., the neck edges), thus avoiding the accumulation of error. In order to reproduce the corresponding movements on the virtual face, a mapping is carried out from the tracked features to the appropriate MPAs, the basic motion parameters for facial animation. This allows us to mimic the facial expression of the real person on their clone. Figure 16.7 illustrates how the virtual face of the clone is animated using input from the real person’s facial expressions. All this information still does not tell us enough about the mood and/or the emotion of the real person. The person’s mood and emotions are important information to be input to the process of formulating the autonomous virtual human’s response. We employ a rule-based approach to infer the emotion of the person. These rules are simple mapping of active MPAs to a given emotion; however, these are limited to only a few types of emotion. Another approach to recognizing the basic emotion is to use a neural network with the automatically extracted facial features as input associated to each emotion. We are currently making some experiments for classifying the basic emotions (surprise, disgust, happiness, fear, anger, and sadness) using a neural network approach where the input is the extracted data from video and the output is one of the six emotions. Difficulty remains in identifying a blend of basic emotions. This may be partially resolved by identifying the dominant features of the basic emotion depicted and masking the others. Audio Processing The processing of the audio signal (which contains most of the dialogue content) to text is nontrivial. Recently, some results have been reported in speakerdependent restrained contextual vocabulary for continuous speech (90% success rate; Philips). There are, however, many techniques for speech recognition. Typically, they involve the following processes: digital sampling of speech, acoustic signal processing (generally using spectral analysis), recognition of phonemes, groups of phonemes, and then words (hidden Markov modeling systems are currently the most popular; basic syntactic knowledge of the language may aid the recognition process). However, these techniques are time-consuming and not appropriate for real-time applications. Recent developments suggest that recognition is possible in real time when used in a simplified context (Pure Speech, Vocalis). Audio analysis is essential if the semantics of the input speech is required before formulating a response. Text-to-speech, or speech synthesis, is another important part of audio processing. The usual way is to split textual speech into small units, generally phonemes, the smallest meaningful linguistic unit. Each phoneme has a corresponding audio signal. It is no easy matter, however, to combine them to produce a fluent speech. One commonly employed solution is to use diphones, instead of just phonemes, which contain the transitions between pairs of phonemes. This squares
FIG. 16.7. Facial animation of the clone’s face: (a) working session with the user in front of a CCD camera; (b) real-time facial expression recognition and animation of the clone’s face.
FIG. 16.8. Animation using audio input.
the number of elements to be processed in the database but improves the quality considerably. Inflections corresponding to punctuation are added to generate a more humanlike voice. To animate the face using audio input, the audio phonemes are mapped to their corresponding visual output, called the viseme (a viseme is a particular shape of the mouth region for a given phoneme). Because the viseme is defined as a set of MPAs, as previously mentioned, it can be applied to any face. Figure 16.8 shows the same viseme on two different face models.
Interpretation and Response Generation
For there to be virtual dialogue between the clone and an autonomous virtual human, the autonomous virtual human should be able to “understand” the speech and the emotions of the clone. This requires the addition of an intelligent module to act as the “brain” of the autonomous participant. The information it has to process is composed of the text of the clone’s speech and its emotions inferred from facial expressions. The analysis may involve techniques of natural language processing (see Fig. 16.9).
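As a small illustration of the phoneme-to-viseme step described above, the sketch below expands a timed phoneme sequence into per-frame sets of MPA values. The phoneme symbols, MPA names, and intensities are invented for the example and are not the system's actual tables.

# Each viseme is expressed as a small set of MPA values (names illustrative).
VISEMES = {
    "p": {"close_lips": 1.0},
    "a": {"open_jaw": 0.7, "stretch_corner_lips": 0.2},
    "o": {"open_jaw": 0.5, "pucker_lips": 0.8},
}

def mpa_frames(phonemes, frame_rate=25):
    """Expand (phoneme, duration in seconds) pairs into one MPA dictionary per frame."""
    frames = []
    for phoneme, duration in phonemes:
        viseme = VISEMES.get(phoneme, {})                  # unknown phoneme: neutral mouth
        frames.extend(dict(viseme) for _ in range(int(round(duration * frame_rate))))
    return frames

frames = mpa_frames([("p", 0.08), ("a", 0.20), ("o", 0.16)])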
FIG. 16.9. Natural-language processing steps.
For each word, the lexical analysis retrieves information stored in a lexicon, and then the syntactic analysis relies on a set of grammatical rules to parse each sentence in order to determine the relations among the various groups of words. The semantic analysis infers the meaning of the sentence and also generates the output used in the automatic response generation. This step can be bolstered by pragmatic analysis of real-world states and knowledge, exemplified in our case by the emotions conveyed by the clone. Our simplified prototype is an automaton where the final state after the syntactic and semantic analysis is one of the responses in a database available to the autonomous virtual human. This database contains a number of predefined utterances, each associated with an emotional state. The current system has limited intelligence; however, this is being extended and elaborated by including complex knowledge analysis and treatment.
Audiovisual Synchronization
To produce bimodal output where the virtual human exhibits a variety of facial expressions while speaking, audio and visual output must be synchronized. To synchronize the sound with the animation, the sound stream is stored in an audio buffer, and the animation, in terms of MPAs, is stacked in an MPA buffer. An MPA synchronizer controls the trigger to both buffers. At present, for text to phoneme we use the Festival Speech Synthesis System from the University of Edinburgh, and for phoneme to speech (synthetic voice) we use MBROLA from the Faculté Polytechnique de Mons, Belgium. Both are public domain software. In order to have synchronized output of the MPA arrays from different sources (e.g., emotions from video and phonemes from audio speech) at a predefined frame rate (Fd, generally 25 frames/sec) with the acoustic speech, a buffer or stack is introduced for each source of MPAs. An initial delay is caused if the frame rate of one source is less than Fd (see Fig. 16.10). It is assumed that for each MPA source the frame rate is known (e.g., F1, F2). The intermediate frames are added using interpolation and extrapolation of the existing computed frames in each buffer to match the frame rate Fd. The MPA array from
FIG. 16.10. Synchronization of MPA streams.
each buffer goes to the composer, which produces a single stream of MPAs for the deformation controller, where a face is deformed. The deformation process for each frame on average takes less than 1/40th of a second on a fully textured face with about 2,500 polygons on an SGI O2 work station. The synchronization, however, does not take into account the frame-to-frame difference if caused at the operating-system level. As the animation may involve simultaneous application of the same MPA coming from different types of actions and sources, a mechanism to compose the MPAs is provided. A weight function is defined for each MPA in an action. It is a sinusoidal function with a duration relating to an action, generally considered as 10% of the total duration of the action. This provides a smooth transition with no jump effect when there is overlap of actions with the same MPA. Network Issues From the networking point of view, the virtual dialogue system acts like a simple client server, whose architecture is that of an ongoing networked collaborative virtual environment: Virtual Life Network (VLNET; Capin, Pandzic, Noser, Magnenat-Thalmann, & Thalmann, 1997). The clients include the user, as represented by the participation of their clone, as well as one or more external (nonactive) viewers. To reduce bandwidth, only animation or reconstruction parameters of the clone are transmitted to the server, as opposed to the entire data set of the face. This means that all the video and sound processing has to be computed by the client. The server returns the parameters of the scene to the clients. These include the deformation of objects in the scene, the parameters for speech synthesis of the autonomous
virtual human, and the compressed speech of the clients and/or clones. The client is free to select a point of view of the scene, with the default being determined by the position of the clone. The client has to reconstruct the view according to the parameters given by the server, decompress the audio, and synthesize the speech from the phonemes.
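Returning to the synchronization scheme of Fig. 16.10, the following sketch shows one way the per-source MPA buffers could be brought to the display frame rate and composed into a single stream on the client. It uses plain linear interpolation, omits the sinusoidal weight function described above, and the source rates and array sizes are arbitrary example values.

import numpy as np

def resample(stream, source_rate, display_rate=25.0):
    """Resample one MPA stream (frames x parameters) to the display frame rate
    by linearly interpolating the existing frames."""
    n_out = int(round(len(stream) * display_rate / source_rate))
    t_in = np.arange(len(stream)) / source_rate
    t_out = np.arange(n_out) / display_rate
    return np.stack([np.interp(t_out, t_in, stream[:, k])
                     for k in range(stream.shape[1])], axis=1)

def compose(streams):
    """Sum the MPA contributions of the synchronized sources into one stream."""
    length = min(len(s) for s in streams)
    return sum(s[:length] for s in streams)

emotion_mpas = np.zeros((30, 65))    # e.g., emotions from video at 15 frames/sec
viseme_mpas = np.zeros((50, 65))     # e.g., phonemes from audio at 25 frames/sec
output = compose([resample(emotion_mpas, 15.0), resample(viseme_mpas, 25.0)])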
Complete Pipeline Figure 16.11 gives the overview of the complete pipeline showing the data flow among the various modules. The input can be in various forms or media; here we are primarily concerned with video and audio inputs. However, this can be extended for other input, such as 3-D trackers for body gestures and positions. The audio signal is analyzed to extract the speech content (text) and the phonemes composing this text. In order to synchronize with the animation, we
FIG. 16.11. Data flow of the virtual dialogue system.
need the onset and duration of each phoneme extracted. The data from the video signal is processed to extract all the visual information about the user, including 3-D feature points for modeling the clone face and tracking features for animation. The autonomous virtual human, a separate entity, receives text and emotions as input. This is processed by the knowledge-based module, which then also provides the output response, containing text and emotion. The text is processed to generate temporized phonemes for generating facial deformations as well as synthetic speech. The voice is synchronized with the face deformation, which also accounts for the emotions. To summarize: The output of the system includes the virtual clone reproducing the motion and speech of the real person, and the autonomous virtual human communicating with the real person, as represented by the clone, via both speech and emotion-conveying expressions.
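The interpretation-and-response module described earlier reduces, in the simplified prototype, to a lookup of predefined utterances keyed by the analysed input and the inferred emotion. A toy version might look like the following; the utterance classes, emotions, and replies are invented for illustration.

# Toy response database: (utterance class, inferred emotion) -> (reply, reply emotion).
RESPONSES = {
    ("greeting", "happiness"): ("Hello! You look pleased today.", "happiness"),
    ("greeting", "sadness"):   ("Hello. Is something wrong?", "sadness"),
    ("question", "neutral"):   ("Let me think about that.", "neutral"),
}

def respond(utterance_class, emotion):
    """Select a predefined reply and the emotional state with which to deliver it."""
    return RESPONSES.get((utterance_class, emotion),
                         ("I am not sure I understood you.", "neutral"))

reply_text, reply_emotion = respond("greeting", "sadness")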
STANDARDIZATION FOR SNHC
SNHC (synthetic natural hybrid coding) is a subgroup of MPEG-4 that is devising an efficient coding for graphics models and compressed transmission of their animation parameters specific to the model type (Doenges et al., 1997). The University of Geneva is contributing to the group as a working partner in VIDAS, a European project formulating a standard set of parameters for representing the human body and the face. For faces, the facial definition parameter set (FDP) and the facial animation parameter set (FAP) are designed to encode facial shape and texture, as well as animation of faces reproducing expressions, emotions, and speech pronunciation. The FAPs are based on the study of minimal facial actions (like the MPAs in our system) and are closely related to muscle actions. They represent a complete set of basic facial actions and allow the representation of most natural facial expressions. The lips are well defined to take into account inner and outer contours. Exaggerated FAP values permit actions that are not normally possible for humans but could be desirable for cartoonlike characters. All parameters involving motion are expressed in terms of the facial animation parameter units (FAPU). They correspond to fractions of distances between key facial features (e.g., distance between the eyes). The fractional units are chosen to ensure a sufficient degree of precision. To implement FAPs in our current system, a mapping or conversion table is built that changes FAPs to MPAs. The parameter set contains three high-level parameters. The viseme parameter allows direct rendering of visemes on the face without the intermediary of other parameters; it can also enhance the result when applied together with other parameters, thus ensuring the correct rendering of visemes. The current list of visemes is not intended to be exhaustive. Similarly, the expression parameter allows the definition of high-level facial expressions.
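A conversion table from FAPs to MPAs, of the kind mentioned above, could be sketched as follows. The FAP and MPA names, the gains, and the assumption that one FAPU equals 1/1024 of the eye separation are illustrative only; the MPEG-4 draft should be consulted for the normative definitions.

# Illustrative FAP-to-MPA conversion table: FAP name -> (MPA name, gain).
FAP_TO_MPA = {
    "open_jaw":          ("open_jaw", 1.0),
    "raise_l_i_eyebrow": ("raise_left_eyebrow", 0.8),
}

def faps_to_mpas(fap_values, eye_separation):
    """Convert FAP values (expressed in FAPUs) to MPA intensities."""
    fapu = eye_separation / 1024.0        # assumed unit: a fraction of the eye separation
    mpas = {}
    for fap, value in fap_values.items():
        if fap in FAP_TO_MPA:
            mpa, gain = FAP_TO_MPA[fap]
            mpas[mpa] = gain * value * fapu
    return mpas

mpas = faps_to_mpas({"open_jaw": 300.0}, eye_separation=64.0)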
The FDPs are used to customize a given face model to a particular face. They contain: 3-D feature points (e.g., mouth corners and contours, eye corners, and eyebrow ends), 3-D mesh (with texture coordinates if appropriate, optional), texture image (optional), and other information (hair, glasses, age, and gender, optional). Figure 16.12 shows the feature points that are characteristic points on the face according to the MPEG-4 current draft (MPEG-N1902).
FIG. 16.12. The feature definition points.
CONCLUSION AND FUTURE WORK Providing the computer with the ability to engage in face-to-virtual-face communication—an effortless and effective interaction among real-world people—offers a step toward a new relationship between humans and machines. This chapter describes our system, with its different components, which allows real-time interaction and communication between a real person represented by a cloned face and an autonomous virtual face. The system provides an insight into the various problems embodied in reconstructing a virtual clone capable of reproducing the shape and movements of the real person’s face. It also includes the syntactic and semantic analysis of the message conveyed by the clone through his or her facial expressions and speech, which can then be used for provoking a credible response from the autonomous virtual human. This communication consists of both verbalizations and nonverbal facial movements synchronized with the audio voice. We have shown through our system that in a simple situation, it is possible to gather considerable perceptual intelligence by extracting visual and aural information from a face. This knowledge can be exploited in many applications: natural and intelligent human–machine interfaces, virtual collaborative work, virtual learning and teaching, virtual studios, and others. It is not too far-fetched to imagine a situation, real or contrived, where a fully cloned person (face, body, and clothes) interacts in a virtual environment inhabited by other virtual humans, clones or autonomous. However, further research effort is required to realize this in real time with realistic and believable interaction. Future work will involve recognition and integration of other body gestures and full body communication among virtual humans. We have done some work in the direction of communication between virtual humans using facial and body gestures (Kalra, Becheiraz, Magnenat-Thalmann, & Thalmann, in press). However, cloning of complete virtual humans, including animation, still needs a lot of research.
ACKNOWLEDGMENTS The research was supported by the Swiss National Foundation for Scientific Research and the European ACTS project Video Assisted with Audio Coding and Representation (VIDAS). The authors would like to thank the members of Miralab,
in particular Laurence Suhner, Laurent Moccozet, Jean-Claude Moussaly, Igor S. Pandzic, Marlene Poizat, and Nabil Sidi-Yagoub.
REFERENCES [MPEG-N1902] Text for CD 14496-2 Video, ISO/IEC JTC1/SC29/WG11 N1886. Akimoto T., & Suenaga, Y. (1993). Automatic creation of 3D facial models. IEEE Computer Graphics & Applications, 16–22. Badler, N. I., & Morris, M. A. (1982). Modeling flexible articulated objects. Proceedings of Computer Graphics ’82, Online Conference, 19–38. Blake, A., & Isard, M. (1994). 3D position, attitude and shape input using video tracking of hands and lips. Computer Graphics (Proceedings of SIGGRAPH ’94), 185–192. Blinn, J. (1982). A generalization of algebraic surface drawing, ACM Transactions on Graphics, 1, 235–256. Capin, T. K., Pandzic, I. S., Noser, H., Magnenat Thalmann, N., & Thalmann, D. (1997). Virtual human representation and communication in VLNET. IEEE Computer Graphics and Applications, 42–53. Cassell, J., Pelachaud, C., Badler, N. I., Steedman, M., Achorn, B., Becket, T., Deouville, B., Prevost, S., & Stone, M. (1994). Animated conversation: Rule-based generation of facial expression, gesture and spoken intonation for multiple conversational agents. Computer Graphics (Proceedings of SIGGRAPH ’94), 413–420. Cyberware Laboratory, Inc. (1990). 4020/RGB 3D scanner with color digitizer. Monterey, CA: Author. De Carlo Donglas, & Metaxas, D. (1996). The integration of optical flow and deformable models with applications to human face shape and motion estimation. CVPR ’96, 231–238. Doenges, P., Lavagetto, F., Ostermann, J., Pandzic, I., & Sunday, P. (1997). MPEG-4: Audio/video and synthetic graphics/audio for real-time, Interactive Media Delivery, Image Communications Journal, 5. Ekman, P., & Friesen, W. V. (1978). Manual for the facial action coding system. Palo Alto, CA: Consulting Psychology Press. Elson, M. (1990). Displacement facial animation techniques, In State of the art in facial animation (SIGGRAPH ’90 Course Notes No. 26, pp. 21–42). Escher, M., & Magnenat Thalmann, N. (1997). Automatic cloning and real-time animation of a human face. In Proceedings of Computer Animation ’97 (pp. 58–66): IEEE Computer Society Press. Essa, I., & Pentland, A. (1995). Facial expression recognition using a dynamic model and motion energy. In Proceedings of the International Conference on Computer Vision (pp. 360–367). IEEE Computer Society Press. Forsey, D. R., & Bartels, R. H. (1988). Hierarchical B-spline refinement. Computer Graphics (Proceedings of SIGGRAPH ’88), 22, 205–212. Horace, H. S., & Ip, L. Y. (1996). Constructing a 3D individualized head model from two orthogonal views. In The visual computer (pp. 12:254–12:266): Springer-Verlag. Information International. (1982). SIGGRAPH film festival video review. ACM. Kalra, P., Becheiraz, P., Magnenat Thalmann, N., & Thalmann, D. (in press). Communication between synthetic actors. In S. Luperfoy (Ed.), Automated spoken dialogues systems. Cambridge, MA: MIT Press. Kalra, P., Mangili, A., Magnenat Thalmann, N., & Thalmann, D. (1992). Simulation of facial muscle actions based on rational free form deformations. In Proceedings of Eurographics ’92 (pp. 59–69). Kass, M., Witkin, A., & Terzopoulos, D. (1988). Snakes: Active contour models. International Journal of Computer Vision, 1, 321–331.
Kleiser, J. (1989). A fast, efficient, accurate way to represent the human face. State of the art in facial animation (SIGGRAPH ’89 Tutorial), 22, 37–40. Kleiser-Walczak, (1988). Sextone for president. SIGGRAPH Film Festival Video Review, Kleiser-Walczak (1989). Don’t touch me, SIGGRAPH Film Festival Video Review, Kurihara, T., & Arai, K. (1991). A transformation method for modeling and animation of the human face from photographs. In N. Magnenat Thalman & D. Thalmann (Eds.) Proceedings of Computer Animation ’91 (pp. 45–58). Springer-Verlag. LeBlanc, A., Kalra, P., Magnenat Thalmann, N., & Thalmann, D. (1991). Sculpting with the ball and mouse metaphor. In Proceedings of Graphics Interface ’91 (pp. 152–159). Calgary Canada: Lee, W. S., Kalra, P., & Magnenat Thalmann, N. (1997). Model-based face reconstruction for animation. In Proceedings of Multimedia Modeling (MMM)’97. Singapore: Lee, Y., Terzopoulos, D., & Waters, K. (1995). Realistic modeling for facial animation. Computer Graphics (Proceedings of SIGGRAPH ’95)29, 55–62. Magnenat Thalmann, N., Bergeron, P., & Thalmann, D. (1982). DREAMFLIGHT: A fictional film produced by 3D computer animation, Proceedings of Computer Graphics, Online Conference., 353–368. Magnenat Thalmann, N., Kalra, P., & Pandzic, I. S. (1995). Direct face-to-face communication between real and virtual humans. International Journal of Information Technology, 1, 145–157. Magnenat Thalmann, N., Primeau, N. E., & Thalmann, D. (1988). Abstract muscle actions procedures for human face animation. The Visual Computer, 3, 290–297. Magnenat Thalmann, N., & Thalmann, D. (1987). Synthetic actors in computer-generated 3D films. Tokyo: Springer-Verlag. Moccozet, L., & Magnenat Thalmann, N. (1997). Dirichlet free-form deformations and their application to hand simulation. In Proceedings of Computer Animation ’97 (pp. 93–102). IEEE Computer Society Press. Nahas, M., Huitric, H., & Sanintourens, M. (1988). Animation of a B-spline figure. The Visual Computer, 3, 272–276. Pandzic, I. S., Kalra, P., Magnenat Thalmann, N., & Thalmann, D. (1994). Real-time facial interaction. Displays, 15, 157–163. Parke, F. I. (1974). A parametric model for human faces. Unpublished doctoral dissertation, University of Utah, Salt Lake City. Parke, F. I. (1982). Parameterized models for facial animation. IEEE Computer Graphics and Applications, 2, 61–68. Parke, F. I. (Ed.). (1990). State of the art in facial animation. (SIGGRAPH ’90 Course Notes No 26. Philips Speech Processing. [Online]. Available: http://www.speech.be.philips.com Platt, S. M., & Badler, N. I. (1981). Animating facial expressions. Computer Graphics (Proceedings of SIGGRAPH ’81), 15, 245–252. Pure Speech. [Online]. Available: http://www.speech.com Sederbeg, T. W., & Parry, S. R. (1986). Free-form deformation of solid geometry models. Computer Graphics (SIGGRAPH ’86), 20, 151–160. Terzopoulos, D., & Waters, K. (1990). Physically based facial modeling, analysis, and animation. The Journal of Visualization and Computer Animation, 1, 73–80. Terzopoulos, D., & Waters, K. (1991). Techniques for realistic facial modeling and animation. In N. Magnenat Thalmann & D. Thalmann (Eds.). In Proceedings of Computer Animation ’91 (pp. 59–74). Tokyo: Springer-Verlag. Terzopoulos, D., & Waters, K. (1993). Analysis and synthesis of facial image sequences using physical and anatomical models, IEEE Transactions on Pattern Analysis and Machine Intelligence, 15, 569– 579. Thorisson Kristinn, R. (1997). 
Layered modular action control for communicative humanoids, In Proceedings of Computer Animation ’97 (pp. 134–143). IEEE Computer Society Press. Vocalis. [Online]. Available: http://www.vocalis.com
Waite, C. T. (1989). The facial action control editor, FACE: A parametric facial expression editor for computer-generated animation. Unpublished master’s thesis, Massachusetts Institute of Technology, Media Arts and Sciences, Cambridge, MA. Wang, C. L. (1993). Langwidere: Hierarchical spline-based facial animation system with simulated muscles. Unpublished master’s thesis, University of Calgary, Canada. Waters, K. (1987). A muscle model for animating three-dimensional facial expressions. Computer Graphics (Proceedings of SIGGRAPH ’87), 21, 17–24. Williams, L. (1990). Performance-driven facial animation. Computer Graphics (Proceedings of SIGGRAPH ’90), 24, 235–242. Yuile, A. L., Cohen, D. S., & Hallinan, P. W. (1989). Feature extraction from faces using deformable templates. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 104–109).
17
Integration of Human Factors Aspects in the Design of Spatial Navigation Displays
Eric Theunissen
Faculty of Information Technology and Systems
Delft University of Technology
Spatial displays have been proposed for a wide variety of applications, including telemanipulation (Bos, 1991), conflict detection and resolution (Ellis, McGreevy, & Hitchcock, 1987), air traffic control, and aircraft navigation and guidance (Grunwald, 1984; Wickens, Haskell, & Harte, 1989). Ellis, Kaiser, and Grunwald (1991) present an overview of applications, design questions, and research into spatial displays. For more than 40 years, perspective flight-path displays have been considered for aircraft navigation and guidance. The origin of the concept dates back to the U.S. Army–Navy Instrumentation Program (ANIP), between 1952 and 1963. The ANIP led into the Joint Army–Navy Aircraft Instrumentation Program (JANAIR) in the 1960s. Because of limitations in computer technology, many of the display formats proposed in the context of the ANIP and JANAIR have never been implemented. Although display and computer graphics technology certainly have been a limiting factor, this is no longer the case. Figure 17.1 shows an example of a spatial display combining data about the surrounding terrain with a perspective flight-path display. Figure 17.2 shows an experimental perspective flight-path display during a flight test performed by the Telecommunications and Traffic Control Systems Group of the Delft University of Technology. These flight tests were performed in combination with flight tests for the concept demonstration of the multimode
FIG. 17.1. Example of a spatial display in which data about the terrain is integrated with a perspective flight-path display.
integrated approach system (MIAS), in which the Microwave Landing System (MLS) and the global positioning system (GPS) are used to provide an integrated position solution, and the MLS auxiliary data words are used to uplink differential GPS corrections to the aircraft (Breeuwer, van Goor, Zaaijer, Moelker, & van Willigen, 1995). The hardware of this system consisted of PC-based commercial off-the-shelf components. Another factor that has prevented the introduction of spatial displays for navigation and guidance is the complexity of their design. Fadden, Braune, and Wiedemann (1987) indicated this problem by stating that while the promise of spatial displays is great, the cost of their development will be correspondingly large. The knowledge and skills that must be coordinated to ensure successful results is unprecedented. From the viewpoint of the designer, basic knowledge of how human beings perceive and process complex displays appears fragmented and largely unquantified.
Hardly any detailed guidelines to the design of these types of displays exist that take specific human capabilities in the areas of perception, cognition, and control into account. This causes many problems for designers in terms of understanding how to format displays for navigation and guidance, and ultimately why they should be formatted in a particular manner. Further, it becomes very difficult to maintain an overview of all the design aspects and their relations with the task requirements,
FIG. 17.2. In-flight test of an experimental spatial display for aircraft navigation, guidance, and control (Theunissen, 1997).
which increases the danger that certain undesired effects are overlooked. To change this situation, the question of how to utilize the existing knowledge in the areas of perception, cognitive science, and control theory is addressed. The resulting approach is sufficiently generic that it can be used to aid in the design of spatial displays for control tasks that require guidance of a vehicle along a predefined trajectory.
SPATIALLY INTEGRATED DATA PRESENTATION The idea behind spatially integrated data presentation is that in the real world, humans are very efficient at using position and orientation estimates relative to objects for all kind of purposes, including moving around in their environment. For tasks that require the integration of data, conventional symbolic displays do not exploit this extremely well-developed capability of humans. A spatial display does not necessarily yield a veridical perception of the 3-D environment. McGreevy and Ellis (1986) have demonstrated that with certain types of spatial displays, observers make systematic errors in the judgment of direction. For a variety of tasks, it is not necessary to have accurate quantitative knowledge about relative position and orientation, and certain errors are tolerable. Manual tracking tasks typically comprise both open- and closed-loop control. An error in the estimation of direction can result in an error in the open-loop control action. However, open-loop control actions are almost never completely error free because this would require a perfect internal representation (Stassen, Johannsen, & Moray, 1990) of the dynamics of the system under control and knowledge about the future disturbances. Further, if
the operator is trained with a particular spatial display, the effect of the estimation error in the direction will be included in the internal representation. Because the magnitude of the systematic error in the judgement of direction is a function of the geometric and the observer field of view, this will still lead to estimation errors for other geometric and observer fields of view. The closed-loop control actions are meant to compensate for any remaining errors. Therefore, when using spatially integrated data presentation, one should distinguish between the need for veridical perception of the spatial layout and the goal of reducing the required effort for integration and interpretation of the displayed data. The latter requirement is easier to satisfy than the former one and allows more trade-offs between the various design requirements to be made. An example of a display employing spatially integrated data presentation is a perspective flight-path display, which presents a spatially integrated view of the required trajectory and its constraints. Instead of presenting control commands (Fig. 17.3), a perspective flightpath display makes control intuitive through direct visualization of the guidance requirements (Fig. 17.4). A perspective flight-path display benefits from the fact that an integrated presentation of data in a suitable frame of reference can reduce the amount of required mental integrations and rotations. This alleviates the pilot from the task of performing the mental integrations of the otherwise separately displayed position and orientation data into a spatially coherent picture. Further, as a result of the trajectory preview, guidance and short-term navigation information are available from a single display format. The advantage is that the data used for guidance and control also conveys the information needed to maintain a certain level of spatial
FIG. 17.3. Display presenting control commands by means of the solid bars.
FIG. 17.4. Display presenting guidance requirements by visualizing the future trajectory constraints.
and navigational awareness. The trajectory preview allows the pilot to continuously refresh important information contributing to navigational awareness and thus reduces the demands on working memory. Further, it relieves the pilot from continuously having to scan the navigation display and allows the sampling of information on the navigation display to become more optimal because it is not determined by the limitations of working memory.
DESIGN COMPLEXITY Whereas the development of 2-D command displays such as the flight director is mainly a control engineering problem for which structured approaches to the design exist, the design of spatial displays requires a more integrated approach taking into account control theoretical, perceptual, and cognitive aspects. To be able to pursue an integrated approach, an overview of the elements involved in the presentation of spatially integrated data is needed. Figure 17.5 shows an overview of the different elements involved in the spatially integrated presentation of data for navigation, guidance, and control. With some minor modifications, the overview depicted in Fig. 17.5 can represent other types of applications, for example, a spatial display for telemanipulation. Static synthetic data refers to data that describe abstractions of real-world objects. Symbology specification refers to data describing symbology that due to their specific representation have particular meanings. Properties of the symbology such
FIG. 17.5. Overview of the different elements involved in the spatially integrated presentation of data for navigation, guidance, and control.
as position, orientation, color, and size can be used to convey information. Dynamic synthetic data refers to data that describe the geometry of objects according to a set of representation rules and a forcing function. An example is the representation of the required trajectory. Based on the selection rules, the selection logic controls which data is to be presented. The transform rules determine the dynamic properties of the objects to be presented such as position, orientation, size, color, and style. The data transformation applies the transform rules. Design Questions The design process must address representation and functionality. Representational aspects comprise the symbology specification, the static synthetic data specification, and the representation rules for dynamic synthetic data. Functionality comprises data selection rules and data transformation rules. The specification of these rules raises many design questions. It is evident that to answer these questions, knowledge about how humans perceive and use the data presented by spatial displays is needed. To benefit from the existing knowledge in this area, specific design
questions must be translated into a more general context. To be able to evaluate different design options and potential trade-offs, the underlying relations must be addressed. This is only possible when the design concept and design parameters can be related to the task requirements and task performance. Task requirements determine the data that must be observable. The specific representation determines the observability of the data and the cognitive work needed to extract meaning from the available data.
An Integrated Approach
As indicated earlier, a major argument against spatial displays is the apparent complexity of the design process. Because of the trade-off required between generalizability and level of detail, guidelines to man–machine interface design are typically too general to be of direct use for a detailed design. Fortunately, several concepts from engineering psychology can be used to provide more insight into design questions and to serve as a foundation for a design framework. With such an integrated approach, it becomes possible to relate changes in the design and the design parameters to the available visual cues,1 the range of potential control strategies, and the resulting changes in performance. It allows one to determine plausible causes for problems and can serve as a guide toward solutions. When making modifications or proposing new designs, it is of crucial importance to understand the motivations that resulted in the current and past ones and the reasons that caused other approaches to be abandoned. Results from the following seven research areas provide a solid basis for a more structured design process of spatial displays for navigation, guidance, and control:
1. Research into the use of task-specific visual cues from spatial displays (Owen, 1990)
2. Research into the use of integrated data to reduce the required effort for mental integration (Wickens & Andre, 1990)
3. Research into the use of emergent features to reduce the required cognitive processing (Buttigieg & Sanderson, 1991)
4. Research into the effect of the frame of reference on spatial awareness and task performance (Ellis, Tyler, Kim, & Stark, 1991; Prevett & Wickens, 1994)
5. Research into the use of information about the future forcing function with respect to anticipatory control (McRuer, Allen, Weir, & Klein, 1977)
6. Research into the use of information about future constraints with respect to error-neglecting control (Godthelp, 1984)
7. Research into perspective flight-path displays (Grunwald, 1984; Theunissen, 1993; Wickens et al., 1989).
1 The
stimuli that are responsible for the information transfer from display to observer.
How to Benefit from Previous Research To benefit from the knowledge in the fields of perception, cognitive engineering, and control theory, application-specific design questions need to be translated to these domains. The remaining part of this chapter will focus on how to translate specific design questions with respect to the specification of the representation and transform rules into a more general context, and how to apply concepts from engineering psychology in helping to answer these questions. To accomplish this, results from studies into the determinations of task-specific visual cues are used to identify the representation requirements. After the information content of the task-specific visual cues has been classified, the magnitude of these visual cues is described as a function of the design parameters by deriving a relation between the position and orientation errors of the vehicle and the resulting changes in the position and orientation of the perspectively presented trajectory. In this way, it can be investigated how the various design aspects influence this relation, what the consequences are for the translation of the data into useful information, and how useful the information is with respect to the ability to apply a particular control strategy. Specifying Representation and Transform Rules Figure 17.5 showed an overview of the different elements involved in the spatially integrated presentation of data for navigation and guidance. The following discussion will focus on the specification of the representation and the transform rules. Figure 17.6 shows a schematic overview of the different steps involved in the generation of a perspective image of the future desired trajectory. The upper block contains the 3-D object that represents the future trajectory and its spatial constraints. The shape and size of this object influence the type and magnitude of the visual cues that convey information about position and orientation errors.
FIG. 17.6. Overview of the different steps involved in the generation of a perspective image of the future desired trajectory.
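The steps summarized in Fig. 17.6 and described below (translating the trajectory description into a vehicle-fixed reference frame, rotating it, and perspectively projecting it onto the 2-D display) can be sketched as follows. This is a minimal illustration only: the axis convention (x forward, y right, z down), the rotation order, and the focal length standing in for the geometric field of view are assumptions, not the chapter's actual implementation.

import numpy as np

def world_to_vehicle(heading, pitch, roll):
    """Rotation matrix from the world frame to the vehicle frame (angles in radians)."""
    ch, sh = np.cos(heading), np.sin(heading)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    yaw = np.array([[ch, sh, 0.0], [-sh, ch, 0.0], [0.0, 0.0, 1.0]])
    pitch_m = np.array([[cp, 0.0, -sp], [0.0, 1.0, 0.0], [sp, 0.0, cp]])
    roll_m = np.array([[1.0, 0.0, 0.0], [0.0, cr, sr], [0.0, -sr, cr]])
    return roll_m @ pitch_m @ yaw

def project(points_world, vehicle_position, heading, pitch, roll, focal=1.2):
    """Translate trajectory points into the vehicle frame, rotate them, and
    perspectively project the points ahead of the vehicle onto the display."""
    rotation = world_to_vehicle(heading, pitch, roll)
    p = (np.asarray(points_world) - vehicle_position) @ rotation.T
    ahead = p[:, 0] > 0.1                       # discard points behind the viewpoint
    horizontal = focal * p[ahead, 1] / p[ahead, 0]
    vertical = focal * p[ahead, 2] / p[ahead, 0]
    return np.column_stack((horizontal, vertical))

# Example: corner points of one square tunnel cross section, 200 m ahead of the vehicle.
corners = [(200.0, -15.0, -15.0), (200.0, 15.0, -15.0), (200.0, 15.0, 15.0), (200.0, -15.0, 15.0)]
display_xy = project(corners, np.array([0.0, 0.0, 0.0]), heading=0.0, pitch=0.0, roll=0.0)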
To depict the desired trajectory as seen from the vehicle, its geometric description must be transformed from the world reference frame to the desired reference frame. Typically, the desired reference frame is coupled to the position of the vehicle. The transformation requires translations in the three spatial dimensions, followed by rotations around three axes. The input to the translation is the 3-D position of the vehicle in the world reference frame. The input to the three rotations can be the orientation or the direction of travel of the vehicle. For the translations and the rotations of the viewpoint, the rules establishing the relation with the position and orientation of the vehicle must be specified. The resulting choices determine the frame of reference. The next step is to use a perspective projection to present the 3-D object on a 2-D display. This projection requires the specification of a viewing volume. The specification of the viewing volume directly influences the magnitude of the visual cues that convey information about the orientation. To summarize, the design parameters can be divided into a set that determines the representation of the 3-D object (shape and size), a set that determines from where the 3-D world is viewed (frame of reference), and a set that determines how the 3-D world is projected onto 2-D display space (viewing volume). Frame of Reference The different frames of reference can be divided into egocentric and exocentric ones. In an egocentric frame of reference, the three-dimensional world is depicted as seen from the vehicle. In an exocentric display, the vehicle and the surrounding environment are viewed from another position. The representation determines the cognitive effort that is required to translate the perceived visual cues into useful information. To reduce the cognitive effort, one should aim to exploit specific features that are recognized in the early stages of human perceptual processing. In an egocentric frame of reference, a symmetrical object to represent the future trajectory yields a symmetrical shape in the absence of position and orientation errors. Both position and orientation errors distort this symmetry. The visual cues resulting from a distortion of the symmetrical reference provide information about the magnitude of position and orientation errors. Because the detection of symmetry takes place in the early processing cycles of visual information, this feature can be exploited to reduce the required effort for interpretation and evaluation. With an exocentric frame of reference, however, the natural symmetry of the presentation in a stationary condition is no longer present, and extraction of position and orientation information will require additional mental processing. A significant drawback of an egocentric frame of reference is that the limited geometric field of view2 always imposes restrictions on the amount of visible data, because one
2 The geometric field of view is the visual angle of a scene as measured from the center of projection.
generally only sees the situation ahead. Thus, for tasks requiring an overview of the situation relative to the object of interest, an exocentric frame of reference is likely to prove superior. The selection of the appropriate frame of reference has been the subject of various studies. Ellis and colleagues (1987) report that “While exocentric reference frames are more beneficial for threat-detection and traffic avoidance tasks, egocentric reference frames appear to be better for the aircraft guidance task.” A later study into egocentric and exocentric frames of reference for aircraft guidance and control (Prevett et al., 1994) also confirms that egocentric perspective displays support better tracking performance than either planar or exocentric perspective displays. In a study on manual three-dimensional pursuit tracking with exocentric display formats, Ellis and colleagues (1991b) report that human subjects can simultaneously adapt to a variety of display control misalignments. Egocentric and exocentric refer to the position of the viewing vector. Another distinction is inside-out and outside-in. Inside-out refers to a presentation of the surrounding environment as it would be observed from the vehicle. As a result, with an inside-out frame of reference, changes in orientation are conveyed through translations and/or rotations of the complete visual scene. Outside-in refers to a viewpoint that is stabilized in a world-referenced system. With a vehicle-referenced alignment, in most cases the frame of reference is aligned with the vehicle body axis. In this situation, the viewing vector is aligned with the direction in which the vehicle is pointing. Another option for the alignment of the view plane normal is the direction of the inertial velocity vector of the vehicle. With cars, most of the time the inertial velocity vector and the vehicle body axis point in the same direction; with aircraft this is not the case.
Representation
To benefit from spatially integrated data presentation, the representation of the trajectory must meet the following four requirements:
1. The representation must evoke holistic perception.3
2. The representation should contain emergent features that allow the observer to determine relative position and orientation.
3. To prevent misperception, the representation should not provide ambiguous information.
4. The successive presentation of images should not produce undesired dynamic cues.
The three general characteristics that allow an object to be perceived holistically are the surrounding contours, the correlated attributes, and a potential familiarity
3 The term holistic describes a mode of information processing in which the whole is perceived directly rather than as a consequence of the separate perceptual analysis of its constituent elements.
(Wickens, 1984). Even if the contours are not physically complete, the perceptual mechanism completes the contours through top-down processing. In other words, the level of realism of an object need only exceed a certain threshold; beyond that threshold, additional realism does not significantly contribute to control performance. This presents the designer with some freedom that can be used to trade off between the available computing power, the amount of detail, and potential display clutter. In general, the representation of the trajectory is defined by cross sections spaced over a certain distance. To provide stronger cues about the shape, the cross sections can be interconnected. An alternative is the depiction of a road, for example, by means of tiles.
To reduce the required effort for interpretation and evaluation, emergent features4 can be used to exploit certain cognitive abilities that are involved in the early stages of perceptual processing. An important emergent feature is symmetry (Buttigieg & Sanderson, 1991). To exploit the capability of humans to accurately judge horizontal and vertical, the object should contain horizontal and vertical elements.
The integration of the three spatial dimensions onto a two-dimensional display causes ambiguity that can lead to misperception. This ambiguity is resolved through assumptions about the geometry of the three-dimensional object and through the presence of visual motion cues. Therefore, the object must have a familiar shape and sufficient features to provide texture rate cues. Some cues convey ambiguous information that can only be resolved with additional information. Sometimes this additional information is not explicitly available and assumptions about the structure of the environment are used. In case of an erroneous assumption, misperception results. To prevent errors resulting from misperception, one has to understand its causes. When dealing with potentially misleading visual cues in the real world, adequate training can be used to compensate. In contrast to the real world, the designer of a spatial display has control over the structure of the virtual environment that conveys the visual cues, and as a result has more possibilities to prevent misperception through adequate design.
The perception of velocity is influenced by the average texture rate. If the texture rate is too high, it can become a distracting feature. If the cross sections are connected by solid lines or are not connected at all, the texture flow rate is determined by the spacing between the cross sections. If dashed interconnections are used, the average texture flow rate is higher.
Identification of Task-Specific Visual Cues
In a display showing a spatially integrated presentation of the required trajectory and its constraints, several cues containing task-relevant information may be available. This raises the question of the relevance of each cue for the specific task. Owen (1990) distinguishes between two classes of event variables influencing
4 Emergent features are specific attributes of an object that are instantaneously recognized.
sensitivity to changes in self-motion, which he refers to as functional and contextual variables. He defines a functional variable as “a parameter of an optical flow pattern used to select and guide a control action.” An example of such a parameter is the orientation of the borders and the centerline of a road, which provides essential information for position control during car driving. Contextual variables are defined as “those optical properties that influence sensitivity to functional variables.” Based on results of research into perception and control of changes in self-motion, Owen (1990) states that “results to date indicate that functional variables are of an order high enough to be completely relative, e.g., not specific to either absolute event or optical variables.” For a spatial navigation display, this implies that when the visual cues comprise the task-relevant functional variables, observers do not have to know absolute size, distance, speed, or flow rate to apply the correct control actions.
Describing the Visual Cues as Properties of the Optic Flow Pattern
To better understand the contribution of specific visual cues to task strategies and task performance, a description of the information contents of the visual cues is needed. In an egocentric frame of reference, a representation can be selected that provides a symmetrical reference condition. By expressing the control-oriented visual cues as a distortion of the symmetry, they can be described as properties of the optic flow pattern. Theunissen (1994) presents a set of equations that describe the distortion of the symmetrical reference condition caused by position and orientation errors as a function of a set of design parameters. The parameters in the equations expressing the distortion of symmetry are directly related to the 3-D world. For every element of the trajectory that is presented on the 2-D display, the distance to the viewpoint must be known to calculate the distortion of the symmetry caused by that specific element. Although these equations are useful to provide more insight into the contribution of elements at a certain spatial location to the position and orientation cues, the third dimension in these equations makes it difficult to relate the perceived visual cues to the control actions. Because the emergent feature (distortion of symmetry) is a 2-D phenomenon, that is, the magnitude of the distortion does not vary along the viewing axis, an expression relating the distortion to position and orientation errors as a function of tunnel size and field of view without including the third dimension is desirable. To relate control actions to the available visual cues in terms of the optic flow pattern, such an expression should directly relate the position and orientation of the elements in the 2-D representation to the position and orientation errors of the vehicle under control. An additional advantage of expressing the effects of spatial position and orientation errors as 2-D cues is that perceptual thresholds can be related to minimal perceivable differences in spatial position and orientation errors. This allows the designer to specify minimum display size and resolution. The conceptual framework described by Owen
(1990) allows the distortion of the symmetrical reference condition to be described as properties of the optic flow pattern. In the following discussion, a distinction will be made between cues that can be obtained from a single snapshot and cues that are conveyed through the successive presentation of images.
Static Cues
For position errors that are small compared to the magnitude of the constraints, the distortion of the symmetrical reference condition can be approximated by a rotation of the lines representing the constraints. The effect of small orientation errors can be approximated by a translation of the whole object representing the desired trajectory over a distance that is equal to the ratio of the orientation error and the field of view. Wolpert, Owen, and Warren (1983) refer to the angle between the lines representing the constraints and the line perpendicular to the horizon as the splay angle. To describe the information about position and orientation errors contained in a single snapshot, the approach will be based on the concept of splay angle. Figure 17.7 illustrates the concept of splay angle S0 for the situation of a
FIG. 17.7. Influence of a cross-track error on splay angle S0. The dashed tunnel indicates the symmetrical reference condition, and the solid tunnel is seen when the viewpoint is displaced to the left, yielding a cross-track error. Tunnel lines 1–4 rotate around the vanishing point as a result of the cross-track error. The change in splay angle is referred to as Si, in which the index i is used to identify the tunnel line. As shown, the upper tunnel lines rotate clockwise, and the lower tunnel lines counterclockwise around the vanishing point.
cross-track error XTE, which causes the tunnel lines 1 to 4 to rotate over an angle S1 to S4, respectively. Horizontal and vertical translations of the viewpoint result in changes in the splay angle. For a cross-track error XTE, the change in splay angle S1 is equal to −S3, and S2 is equal to −S4. The size of the tunnel is specified by means of the width w and the height h of a cross section. When the cross-track error is small relative to the size of the tunnel, S1 is approximately equal to S2, and S3 is approximately equal to S4. In the absence of a vertical path error, and when defining a clockwise rotation as positive, S1 can be approximated by −XTE × K_wh/w. The constant K_wh is determined by the ratio of tunnel width and tunnel height, and is equal to 1 when w is equal to h. For a vertical-track error VTE, S1 is equal to −S2 and S3 is equal to −S4. In the absence of a cross-track error, S1 can be approximated by −VTE × K_hw/h. The constant K_hw is equal to 1/K_wh. In the presence of both a cross-track error and a vertical-track error, S1 to S4 follow from the sum of the rotations caused by the cross-track error and the rotations caused by the vertical-track error. To summarize: When the position errors are small compared to the magnitude of the constraints, the absolute ratio K_SPLAY between the change in splay angle and the absolute position error can be approximated by Equation 1.

K_SPLAY = K_wh / size   [rad/m]    (1)
In this equation, size represents the magnitude of the constraints. Orientation errors cause a translation of the whole visual scene. The amount of translation is proportional to the ratio of the orientation error and the field of view. When expressing the amount of translation as a percentage of the total display size, the displacement can be approximated by Equation 2. In this equation, T_SCENE represents the displacement as a percentage of the total display size, ε the orientation error, and GFOV the geometric field of view.

T_SCENE = (ε / GFOV) · 100%    (2)
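To make these static-cue relations concrete, the following sketch (with illustrative function and parameter names that are not taken from the chapter) approximates the splay-angle change of Equation 1 and the scene translation of Equation 2. It assumes that K_wh is simply the width-to-height ratio of the tunnel cross section, which is consistent with the stated property that K_wh equals 1 when w equals h.

```python
import math

def splay_change(position_error_m, tunnel_width_m, tunnel_height_m, vertical=False):
    """Approximate change in splay angle (rad) for a small position error (Eq. 1).

    Assumption: K_wh is taken as the width-to-height ratio of the cross section.
    """
    k_wh = tunnel_width_m / tunnel_height_m
    if vertical:
        return -position_error_m * (1.0 / k_wh) / tunnel_height_m
    return -position_error_m * k_wh / tunnel_width_m

def scene_translation_pct(orientation_error_rad, gfov_rad):
    """Image translation as a percentage of the total display size (Eq. 2)."""
    return orientation_error_rad / gfov_rad * 100.0

# Example: 5 m cross-track error in a 40 m x 40 m tunnel; 2 deg heading error, 60 deg GFOV.
print(splay_change(5.0, 40.0, 40.0))                              # -0.125 rad of splay rotation
print(scene_translation_pct(math.radians(2), math.radians(60)))  # ~3.3% of the display size
```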
Dynamic Cues
When the data update rate of the display exceeds a certain threshold, the successive snapshot images of the situation yield a smoothly animated display. The motion of the viewpoint relative to the object in the 3-D environment representing the desired trajectory generates additional cues. These dynamic cues provide the observer with a sense of ego speed and three-dimensionality, convey position-error rate and directional information, and allow the extraction of temporal range information. Owen (1990) describes several studies that all indicate that splay rate is the functional variable for altitude control. A symmetrical object such as the one depicted in
Fig. 17.7 provides splay rate cues both for vertical and horizontal position control. It can be assumed that when presenting both horizontal and vertical constraints, splay rate is the functional variable for horizontal and vertical position control. Position error rate can be approximated by the product of the velocity V and the directional error ε expressed in radians. Using this relation, the rate of change of the splay angle S can be approximated by Equation 3.

Ṡ = (K_wh · V / size) · ε    [rad/sec]    (3)
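As a minimal illustration of Equation 3, the sketch below (names are again illustrative) shows how the splay rate grows with vehicle velocity and shrinks with the magnitude of the position constraints; the directional error is expressed in radians.

```python
import math

def splay_rate(velocity_mps, directional_error_rad, size_m, k_wh=1.0):
    """Approximate splay rate (rad/s) for a small directional error (Eq. 3)."""
    return k_wh * velocity_mps / size_m * directional_error_rad

# Example: 70 m/s ground speed, 1 deg track-angle error, 40 m constraint magnitude.
print(splay_rate(70.0, math.radians(1), 40.0))   # ~0.031 rad/s
```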
Because position error rate is proportional to the directional error, there are two cues for orientation errors. The gain of the splay rate cue is proportional to vehicle velocity and inversely proportional to the magnitude of the position constraints. The gain of the direct cue (image translation) is inversely proportional to the geometric field of view. Research into the perception of self-motion from the optical flow field provides a good basis for understanding the velocity cues conveyed by a spatial navigation display. Two important velocity cues available in the 3-D world are optical edge rate and global optical flow rate. Experiments into the contribution of edge rate and flow rate to the sensitivity to acceleration (Owen, 1990) revealed that the sensitivity to edge rate and flow rate varies among individuals. Owen (1990) concluded that “the findings indicate that the human visual system has two types of sensitivity for detecting increase in speed of self motion, and that the two types are unequally distributed over individuals.” With a spatial navigation display, velocity cues are conveyed by the motion of the cross-section frames toward the observer and changes in the interconnections between these frames. Optical edge rate is determined by the distance between the successive frames. When this distance varies while the observer is unaware of it, the cues provided by edge rate can cause a misperception of relative velocity. Global optical flow rate is determined by the geometric field of view and the magnitude of the position constraints. If this magnitude varies while the observer is unaware of it, global optical flow rate can cause a misperception of relative velocity. Thus, a tapered segment of the trajectory that might be needed to gradually increase the position error gain also increases global optical flow rate and can potentially yield a misperception of velocity. Further, all velocity cues are relative and inertially referenced. When the preview contains information about changes in the future trajectory, the dynamic presentation of this preview conveys so-called temporal range information. This temporal range information can be used for the timing of anticipatory control actions, for example, the moment to initiate a control action to enter a curve. In the situation of rectilinear motion, the dynamic perspective presentation of the trajectory allows the observer to estimate the time until the center of projection reaches a certain reference point without knowing object dimensions or vehicle velocity. This important phenomenon has been referred to as time to contact (TTC; Lee, 1976) and time to passage (TTP; Kaiser & Mowafy, 1993). In general, the eye reference point will not lie in the center of projection, which introduces
image compression or expansion. However, because both the spatial angle and its rate are compressed by the same factor, this does not affect the estimate of the TTP. With increasing magnitude of the constraints, the maximum angular rate of change decreases, which increases the minimum TTP that can be perceived. For anticipatory control, it is important that within a certain time window an accurate estimate can be made of the time until a specific event. When the minimum TTP exceeds the maximum threshold of the useful time window, its contribution to anticipatory control will become useless. Equation 4 presents the minimum average value of TTP, TTP_min [sec], that can directly be perceived, as a function of velocity V [m/sec], magnitude of the constraints size [m], and geometric field of view GFOV [rad].

TTP_min = size / (V · GFOV)    (4)
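The short sketch below applies Equation 4 and checks the result against an assumed upper bound of the useful anticipation window; the window value and the names are assumptions chosen only for illustration.

```python
import math

def ttp_min(size_m, velocity_mps, gfov_rad):
    """Minimum average time-to-passage (s) that can be perceived directly (Eq. 4)."""
    return size_m / (velocity_mps * gfov_rad)

USEFUL_WINDOW_S = 10.0   # assumed maximum threshold of the useful time window

# Example: 40 m constraints, 70 m/s velocity, 60 deg geometric field of view.
value = ttp_min(40.0, 70.0, math.radians(60))
print(value)                     # ~0.55 s
print(value <= USEFUL_WINDOW_S)  # True: the minimum perceivable TTP falls within the assumed window
```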
Kaiser and colleagues (1993) performed a series of studies in which they examined the ability of an observer to infer temporal range information from objects that are not on a collision course but leave the observer's field of view before they pass them. They report a nonveridical temporal scaling effect; shorter TTPs were overestimated, and longer TTPs underestimated. The previous discussion assumed the absence of a rotational component. A rotational component changes the angular rate of the points projected onto the 2-D view plane, which in turn makes it impossible to estimate the TTP from the motion of a single point. However, when looking at the constructs in 3-D space defined by interconnections between a set of points, the size of these constructs is not affected by a rotational component, and the TTP can be perceived from the size and its rate of change, which is equivalent to the concept of TTC.
Specifying Design Parameters and Coping with Design Constraints
The previous discussion has illustrated how the magnitude of the visual cues is related to the specification of the viewing volume and the specification of the dimensions of the object representing the desired trajectory and its constraints. Because the magnitude of the trajectory constraints is the parameter with which the gain of the functional variable for position control is determined, the selection of this magnitude should be based on requirements with respect to the maximum allowable position error. The magnitude of the constraints is not necessarily limited to physically relevant values. As indicated by Equation 4, the magnitude of the trajectory constraints is proportional to the average value of TTP_min. Therefore, when the magnitude of the constraints exceeds a certain threshold, no useful cues for anticipatory control are available. Based on his research into perspective flightpath displays, Wilckens (1973) reports that “the acceptance for high deviation sensitivity allows the use of channel dimensions as meaningful tolerance limit
indications even in the—generally most critical—landing phase. It was shown by test results that optimum tracking precision can be achieved with a channel calibrated for standard runway width.” Because no validated human operator models for use with a spatial display are presently available, the relation between splay gain and tracking performance must be obtained through human-in-the-loop experiments. To reduce the number of human-in-the-loop studies needed to determine the desired magnitude of the spatial constraints to meet the performance requirements, models describing human control behavior when using a spatial display for the tracking of a forcing function are needed.
Displays have certain constraints in size. This will influence the range of usable values for the design parameters. The size of the display and the distance to the observer's eye point determine the observer's field of view.5 For a given display size, the geometric field of view determines the magnitude with which changes in viewpoint elevation and azimuth are conveyed. If the observer's field of view is smaller than the geometric field of view, the information is compressed. A mismatch between the geometric field of view and the observer field of view causes direction judgment errors (McGreevy & Ellis, 1986). Another design constraint occurs when the display device imposes spatial conformality requirements. This is the case with see-through presentation devices, for example, head-up displays used in aircraft. For non-see-through displays, the field of view can be selected based on requirements with respect to the orientation error gain and constraints with respect to requirements concerning the minimum orientation data range and the maximum allowable perspective distortion. Grunwald and Merhav (1978) indicate that the lack of cues that results from a too narrow field of view can yield an undamped system. They show that the addition of predictive display symbology can compensate for these missing cues. Based on the requirements with respect to the visible range of orientation data and the required resolution of this data (in terms of spatial angles), a minimum display size requirement can be defined. If such requirements cannot be met, the reason behind the range and resolution requirements must be revisited to determine whether alternatives are possible. For example, if a certain resolution of orientation data is needed for a stabilization task, an alternative might be to include predictive information to support the stabilization task.
EVALUATION
For human-in-the-loop evaluation, tasks and measures are needed to quantitatively rate a certain design in terms of performance for the range of potential control strategies. Displays for guidance and control are typically compared with each other
5 The observer field of view is the visual angle of a scene as measured from the observer's eye point.
in terms of maximum tracking performance and control activity. With conventional 2-D command displays, maximum tracking performance is achieved when the operator applies a continuous compensatory control strategy. The description of the visual cues has demonstrated that information about current position and orientation errors and error rates is available from displays providing a spatially integrated depiction of the required trajectory and its constraints, thus making it possible to apply a compensatory control strategy. However, there is more. As indicated by Haskell and Wickens (1993), the way in which a task is performed differs as a function of the displays employed. They point out that “when making empirical comparisons between different display types, researchers must evaluate measures other than performance on only one type of task; they must go beyond performance in any case and examine task performance strategies.” Besides compensatory control, two additional task strategies are possible: anticipatory and error-neglecting control. When preview of changes in the desired system state and an internal representation of the system dynamics are available, an anticipatory control strategy can be used to reduce the effects of these changes and the required gain for closed-loop control can be reduced (McRuer, Allen, Weir, & Klein, 1977). When constraints are presented in addition to the future forcing function, it becomes possible to apply an error-neglecting control strategy.6 This type of control strategy allows the operator to make a trade-off between control effort and tracking performance. With error-neglecting control, the similarity with car driving in terms of control task (boundary control) and type of visual cues (spatially integrated presentation of constraints), together with the results of the experiments performed by Godthelp (1984), suggests that the moment of an error-correcting action is directly related to an estimate of the remaining time before the vehicle crosses one of the constraints. Theunissen and Mulder (1994) refer to this time as the time-to-wall crossing (TWC). Because spatial displays that provide preview of future constraints allow a mix of compensatory, anticipatory, and error-neglecting control strategies to be applied, tasks and measures should be used to assess and rate the possibility of applying each of these different types of control strategies. Two parameters that can be used to rate the potential of a display format for anticipatory control are the deviation from a reference time that indicates the moment the open-loop control action should be initiated, and the deviation from a reference state that is achieved as a result of the anticipatory control action. By defining a task in which subjects must try to minimize the number of control inputs while staying within the spatial constraints, the TWC, measured at the moment a control action is initiated, can be used as a measure to determine the ability to extract information about the constraints.
6 With
error-neglecting control, the pilot willingly ignores position and orientation errors for a certain period of time.
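As a rough illustration of the time-to-wall-crossing measure, the sketch below extrapolates the current lateral velocity in a straight line toward the nearest constraint; the function and its arguments are hypothetical and not taken from the studies cited above.

```python
def time_to_wall_crossing(lateral_position_m, lateral_velocity_mps, half_width_m):
    """Time (s) until the nearest lateral constraint is crossed, assuming the
    current lateral velocity is held constant; inf when not drifting toward a wall."""
    if lateral_velocity_mps > 0:
        distance = half_width_m - lateral_position_m      # drifting toward the right constraint
    elif lateral_velocity_mps < 0:
        distance = half_width_m + lateral_position_m      # drifting toward the left constraint
    else:
        return float("inf")
    return distance / abs(lateral_velocity_mps)

# Example: 5 m right of the centerline, drifting right at 2 m/s, 20 m half-width.
print(time_to_wall_crossing(5.0, 2.0, 20.0))   # 7.5 s before the constraint is crossed
```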
SUMMARY AND CONCLUSIONS
This chapter has focused on the design of egocentric spatial navigation displays based on considerations with respect to perception, human information processing, and control. The three topics that were covered are the representational aspects, the factors that should influence the choice of the frame of reference, and a method to relate the magnitude of task-related visual cues to a set of design parameters. Whereas previously different designs could only be compared with each other in terms of the values of the design parameters, with this approach, it is possible to compare different designs of perspective flight-path displays in terms of gains for the task-related visual cues. Further, the need for new tasks and measures to evaluate the whole range of potential control strategies has been addressed. Although much of the underlying framework resulted from research into perspective flight-path displays, an attempt has been made to remain sufficiently generic so the approach can be used for the design of spatial displays for various types of control tasks. Any operational environment will impose additional task-related information requirements, which likely will necessitate additional data to be presented. Theunissen (1997) presents an elaborate discussion on this subject for perspective flight-path displays.
ACKNOWLEDGMENTS
The research into spatial displays for navigation and guidance performed in the context of the Delft Program for Hybridized Instrumentation and Navigation Systems Phase 2 (DELPHINS II) is sponsored by the Dutch Technology Foundation STW.
REFERENCES
Bos, J. F. (1991). Man–Machine aspects of remotely controlled space manipulators. Delft, the Netherlands: Delft University of Technology.
Breeuwer, E. J., van Goor, S. P., Zaaijer, M. B., Moelker, D. J., & van Willigen, D. (1995). Flight test results of an advanced hybrid position filter for the multi-mode integrated approach system. In Proceedings of the International Symposium on Precision Approach and Automatic Landing (pp. 361–368), Braunschweig, Germany, 21–24 February 1995. German Institute of Navigation.
Buttigieg, M. A., & Sanderson, P. M. (1991). Emergent features in visual display design for two types of failure detection tasks. Human Factors, 33, 631–651.
Ellis, S. R., McGreevy, M. W., & Hitchcock, R. J. (1987). Perspective traffic display format and airline pilot traffic avoidance. Human Factors, 29, 371–382.
Ellis, S. R., Kaiser, M. K., & Grunwald, A. J. (1991a). Pictorial communication in virtual and real environments. Bristol, PA: Taylor & Francis.
Ellis, S. R., Tyler, M., Kim, W. S., & Stark, L. (1991b). Three-dimensional tracking with misalignment between display and control axes. SAE Transactions: Journal of Aerospace, 100-1, 985–989.
Fadden, D. M., Braune, R., & Wiedemann, J. (1991). Spatial displays as a means to increase pilot situational awareness. In S. R. Ellis, M. K. Kaiser, & A. J. Grunwald (Eds.), Pictorial communication in virtual and real environments (pp. 172–182). London: Taylor & Francis. Also in Spatial displays and spatial instruments (NASA CP10032, pp. 35-1–35-12).
Godthelp, H. (1984). Studies on human vehicle control. Soesterberg, the Netherlands: Institute for Perception TNO, 8, pp. 113–122.
Grunwald, A. J. (1984). Tunnel display for four-dimensional fixed-wing aircraft approaches. Journal of Guidance, 7, 369–377.
Grunwald, A. J., & Merhav, S. J. (1978). Effectiveness of basic display augmentation in vehicular control by visual field cues. IEEE Transactions on Systems, Man, and Cybernetics, SMC-8, 679–690.
Haskell, I. D., & Wickens, C. D. (1993). Two- and three-dimensional displays for aviation: A theoretical and empirical comparison. The International Journal of Aviation Psychology, 3, 87–109.
Kaiser, M. K., & Mowafy, L. (1993). Visual information for judging temporal range. In Proceedings of piloting vertical flight aircraft (pp. 4.23–4.27). San Francisco, CA: American Helicopter Society.
Lee, D. N. (1976). A theory of visual control of braking based on information about time-to-collision. Perception, 5, 437–459.
McGreevy, M. W., & Ellis, S. R. (1986). The effect of perspective geometry on judged direction in spatial information instruments. Human Factors, 28, 439–456.
McRuer, D. T., Allen, R. W., Weir, D. H., & Klein, R. H. (1977). New results in driver steering control models. Human Factors, 19, 381–397.
Owen, D. H. (1990). Perception and control of changes in self-motion: A functional approach to the study of information and skill. In R. Warren & A. H. Wertheim (Eds.), The perception and control of self-motion (pp. 290–325). Hillsdale, NJ: Lawrence Erlbaum Associates.
Prevett, T. T., & Wickens, C. D. (1994). Perspective displays and frame of reference: Their interdependence to realize performance advantages over planar displays in a terminal area navigation task. Urbana, IL: University of Illinois, Aviation Research Laboratory.
Stassen, H. G., Johannsen, G., & Moray, N. (1990). Internal representation, internal model, human performance and mental workload. Automatica, 26, 811–820.
Theunissen, E. (1993). A primary flight display for four-dimensional guidance and navigation—Influence of tunnel size and level of additional information on pilot performance and control behaviour. In Proceedings of the AIAA Flight Simulation Technologies Conference (pp. 140–146). Monterey, CA.
Theunissen, E. (1994). Factors influencing the design of perspective flightpath displays for guidance and navigation. Displays, 15, 241–254.
Theunissen, E. (1997). Integrated design of a man–machine interface for 4-D navigation. Delft, the Netherlands: Delft University of Technology Press.
Theunissen, E., & Mulder, M. (1994). Open and closed loop control with a perspective tunnel-in-the-sky display. In Proceedings of the AIAA Flight Simulation Technologies Conference (pp. 32–42). Scottsdale, AZ.
Warren, R. (1982). Optical transformation during movement: Review of the optical concomitants of egomotion (Tech. Rep. No. AFOSR-TR-82-1028). Bolling Air Force Base, DC: Air Force Office of Scientific Research. (NTIS No. AD-A122 275)
Wickens, C. D. (1984). Engineering psychology and human performance. Columbus, OH: Merrill.
Wickens, C. D., & Andre, A. D. (1990). Proximity compatibility and information display: Effects of color, space, and objectness on information integration. Human Factors, 32, 61–77.
Wickens, C. D., Haskell, I., & Harte, K. (1989). Ergonomic design for perspective flight path displays. IEEE Control Systems Magazine, 9(4), 3–8.
Wilckens, V. (1973). Improvements in pilot/aircraft-integration by advanced contact analog displays. In Proceedings of the Ninth Annual Conference on Manual Control (pp. 175–192). Cambridge, MA: MIT Press.
Wolpert, L., Owen, D. H., & Warren, R. (1983). The isolation of optical information and its metrics for the detection of descent. In D. H. Owen (Ed.), Optical flow and texture variables useful in simulating self motion (II). Columbus, OH: Ohio State University, Dept. of Psychology.
18
Implementing Perception–Action Coupling for Laparoscopy
F. A. Voorhorst
C. J. Overbeeke
G. J. F. Smets
Faculty of Industrial Design Engineering
Delft University of Technology
Laparoscopy1 in its current form is far from optimal. Although the surgeon stands next to the patient, he has no direct contact with the patient. For manipulation, special instruments are used that lack tactile information, have limited degrees of freedom, and are limited in their movements because they enter the abdomen at fixed locations. Visual information is obtained using a laparoscope that is controlled by an assistant, because the surgeon needs both hands to operate. In general, the perception–action coupling is hampered. This chapter discusses implementations aimed at restoring perception–action coupling with respect to visual information by linking the point of observation or the point of illumination directly to the head movements of the surgeon. This restores the perception–action coupling and allows the surgeon to obtain spatial information through exploration. However, two problems arise. First, manipulation and making explorative movements conflict with each other. Although observation tasks do invite the observer to move, manipulation tasks, for example, putting a wire through the eye of a needle, do not. Second, there are technical constraints such as size and shape; for example, the mechanism must be small enough to be used during
1 Laparoscopy is a minimally invasive surgical technique, or “keyhole” surgery, in the abdomen area during which the surgeon operates through small keyholes.
laparoscopy, and it must not harm the patient. Implementation aimed at restoring perception–action coupling during laparoscopy is explored within the framework of the ecological approach to visual perception as described by Gibson (1979).
LAPAROSCOPIC OPERATION
Technical Development to the Present
The history of the development of laparoscopic devices demonstrates the close relation between development and applicability: the interaction between researchers and users (Edmonson, 1991; Paraskeva, Nduka & Darzei, 1994; Walk, 1966). On the one hand, researchers develop a technical solution for a problem indicated by the user; on the other hand, users apply the developed solution and, by doing so, explore its limitations. Laparoscopy, in its current form and as a practical tool, was introduced by pioneers such as Philip Mouret (Mouret, 1996). It has evolved from an obscure technique into a generally accepted new type of surgery. In the early 1990s, it was expected that besides laparoscopic cholecystectomy (gallbladder removal), other frequent surgical interventions would be done laparoscopically. However, there are currently only two well-established applications of laparoscopic surgery: laparoscopic cholecystectomy and diagnostic laparoscopy. Other applications, such as hernia repair and antireflux surgery, are less frequently performed for many reasons.
For the patient, laparoscopy has a number of advantages as compared to open surgery, such as less trauma, less morbidity, and faster recovery. For the surgeon, however, there are none. Compared to open surgery, during laparoscopy the surgeon is limited in the information that can be obtained and the manipulations that can be performed. For example, haptic information is reduced as instruments for manipulation have poor mechanical properties (Herder, Horward, Sjoerdsma, Plettenburg, & Grimbergen, 1997; Sjoerdsma et al., 1998). Visual information is reduced because laparoscopes commonly provide monocular information. In addition, the laparoscope is operated by an assistant rather than the actual “user.” In short, the perception–action coupling is greatly reduced. If we want to restore the perception–action coupling to allow the surgeon to obtain spatial information, the question of what information is needed during laparoscopy arises.
Spatial Information During Laparoscopy
Monocular laparoscopes provide only limited information about the spatial layout. This reduced spatial information hampers manipulation, orientation within the abdomen, perception of the spatial structure, and the identification of various tissues. Therefore, the use of a stereoscopic laparoscope often is suggested
(Mitchell, Robertson, Nagy, & Lomax, 1993; Pichler et al., 1993; Wenzl et al., 1994; Zobel, 1993). For example, Cole, Merritt, Fore, and Lester (1990) describe an experiment in which they compared binocular and monocular vision. Participants were asked to maneuver a rod through a maze. The maze was constructed to provide minimal monocular spatial information. They concluded that such manipulation tasks require binocular information. Similar results were reported by Spain (1990; Spain & Holzhausen, 1991), who also investigated the advantage of binocular vision over monocular vision when performing the task of placing a peg in a hole. However, binocular vision does not always prove to be an advantage. For example, Kim, Tendick, and Stark (1987) found equivalent performance for a computer pick-and-place task when sufficient perspective information is available, for example, cues indicating the position of objects relative to the ground surface. Liu, Tharp, and Stark (1992) found that, for a three-dimensional tracking task in a simulated virtual environment, monocular and binocular vision result in similar performance when monoscopic cues such as occlusion and perspective are present. Birkett, Josephs, and Este-McDonald (1994) reported no advantage of binocular vision over monocular vision for a simple laparoscopic procedure, but did find shorter operation time for binocular vision when performing a complex procedure. Tendick, Jennings, Tharp, and Stark (1993), in a laboratory setting, found binocular vision to be an advantage for a pick-and-place task under laparoscopic conditions, but also found a difference in completion time between monocular direct vision and monocular laparoscopic vision that therefore cannot result solely from the lack of binocular vision. They expected that a binocular laparoscope would improve performance, but not to the level of binocular direct viewing. Hanna, Shimi, and Cuschieri (1998) compared monocular and binocular laparoscopic vision in a large clinical study of 60 operations. Contrary to the findings of Tendick et al. (1993), they found that binocular laparoscopic vision did not affect performance.
Why Binocular Disparity Alone May Not Be Enough
The problem with binocular disparity is that it provides only part of the information needed. The spatial information needed by the surgeon to perform his manipulation tasks consists of two parts. The first part of spatial information is the specification of the structure of the environment relative to the point of observation, for which binocular disparity can be used. Normally, the point of observation coincides with the viewpoint of the observer. However, in laparoscopy they do not coincide, as the point of observation is located at the tip of the laparoscope, and the viewpoint of the surgeon, from which he observes the monitor, is located inside the operating room. The second part of the spatial information, therefore, is the specification of the point of observation relative to the viewpoint of the surgeon. The surgeon has to be able to relate the spatial structure to his own viewpoint. Currently, the surgeon relates the spatial structure to his own viewpoint by moving his instruments relative
to the tip of the laparoscope. Assuming that the environment is stationary and that the laparoscope is not moving, a movement of the instrument shows which direction within the monitor image corresponds with a movement of the instrument. For example, when the surgeon moves his hand upwards and the tip of the instrument moves sideways, it follows that the laparoscope is tilted. This way of obtaining information about the spatial layout is current practice. However, the instruments used during laparoscopy do not, in general, match the perceptual–motor system of the surgeon, so his actions are restricted. The instruments can be made more suitable for the actions the surgeon has to perform by, for example, increasing their degrees of freedom and simplifying their handling (Herder et al., 1997). In some cases, it is possible that the surgeon can use his or her hand inside the abdomen. For example, Bannenberg, Meijer, Bannenberg, and Hodde (1996) describe a procedure in which they make an incision large enough for the surgeon to insert a hand into the abdomen and attach a plastic sleeve, airtight, to the skin around the incision. Such a procedure is feasible when a large incision is needed, for example, to remove an organ.
That current laparoscopic operation, during which spatial information is obtained through instrument movements, is far from optimal can be concluded from the following two observations. First, with every movement of the assistant with the laparoscope, the surgeon has to recover the relation between his point of observation and the spatial layout within the abdomen. Second, there seems to be a fundamental conflict because the surgeon has to manipulate his instruments to obtain the spatial information that is needed to perform his manipulation task.
Information about how the point of observation inside the abdomen relates to the viewpoint of the surgeon can also be provided visually by linking the movements of the point of observation directly to the head movements of the surgeon. This not only allows the perception of the spatial layout without the need for binocular disparity (Fig. 18.1), it also specifies the spatial layout relative
FIG. 18.1. Depth perception does not exclusively rely on binocular disparity. A monocular observer, who has the ability to move, is also able to perceive the spatial layout.
to the surgeon's movements, and thus relative to the surgeon. For example, when a vein needs to be clipped, the surgeon has to perceive the direction in which to move the instrument in order to reach the vein. Kim, Tendick, and Stark (1987) experimentally investigated the influence of the orientation of the laparoscope (the azimuth angle) relative to the working area on the performance of a pick-and-place task. They found that for angles of 90 deg or greater, performance of a pick-and-place task decreased significantly. Holden, Flach, and Donchin (1999) describe an experiment in which they alter the placement of the camera, the position of the surgeon relative to the work space, and the positions of the participants within the work space. They found, for a pick-and-place task, that moving the camera relative to the instruments can temporarily be very disruptive to coordination when either the position of the camera or the position of the surgeon is changed. Wade (1996) differentiates between a geocentric frame of reference, which refers to the world, and an egocentric frame of reference, which refers to the surgeon. He suggests that both frames of reference have to be aligned, for example, by using a head-mounted display (Geis, 1996), but he fails to identify what specifies a frame of reference. In this chapter, a distinction is made between the viewpoint of the observer and the point of observation within the abdomen, both of which are specified visually, namely, by the optic array.2 They can be linked by coupling the point of observation directly to the viewpoint of the surgeon. When moving the laparoscope, the surgeon creates parallax shifts, which provide information about the spatial layout. Also, the surgeon can relate this information to himself or herself because the parallax shifts follow from his or her own movements. The act of moving makes the spatial information body scaled, which is important because the surgeon has to manipulate within the abdomen.
The DVWS: Space Through Movement
The relation between observer movement and the perception of the spatial layout through parallax shifts was technically implemented in the so-called Delft Virtual Window System (DVWS) (Overbeeke, Smets & Stratmann, 1987; Smets, 1995; Smets, Overbeeke, & Stratmann, 1988). The DVWS, depicted in Fig. 18.2, creates shifts similar to movement parallax by rotating a camera around a rotation point while it is aimed at this point as well. Then, objects behind the fixation point appear to shift within the image in the same direction as the observer, and objects in front of the fixation point appear to shift within the image in the opposite direction to the observer. Because the fixation point (i.e., the point at which the camera is aimed) and the rotation point (i.e., the point around which the camera rotates) coincide, the fixation point is then stationary relative to the scene displayed on the monitor and relative to the monitor itself. As a result, the fixation point links the spatiality
2 The optic array is the light arriving at a point of observation. As it arrives from different starting points, the structure of the optic array is uniquely related to the structure of the environment.
FIG. 18.2. The Delft Virtual Window System (DVWS). The motions of the camera are linked to the head movements of the observer.
of the observed scene to the movements of the observer. The DVWS therefore provides body-scaled spatial information about the scene observed. If the motions of the camera are linked to the movements of the observer, the observer is able to relate the shifts within the image to his own movements. This allows him not only to obtain spatial information about the scene observed, but also to relate the spatial layout to his own movements, and hence to his or her own body. Figure 18.2 illustrates the principle of the Delft Virtual Window System. The DVWS seems feasible for laparoscopy because it provides not only spatial information, but spatial information that is related to the surgeon. The next section discusses possible technical implementations.
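A minimal sketch of the head-to-camera mapping behind the DVWS principle is given below: the camera is kept on a circular arc around the fixation point and aimed at it, so the fixation point stays stationary on the monitor while nearer and farther objects shift in opposite directions. The linear gain and all names are assumptions for illustration, not details of the original implementation.

```python
import math

def dvws_camera_pose(head_offset_m, fixation_point=(0.0, 0.0), radius_m=0.5, gain_rad_per_m=2.0):
    """Return (camera_x, camera_y, aim_angle_rad) for a lateral head offset.

    The camera travels on a circle of radius_m around the fixation point and is
    always aimed at that point; gain_rad_per_m is an assumed head-to-rotation gain.
    """
    angle = head_offset_m * gain_rad_per_m
    cam_x = fixation_point[0] + radius_m * math.sin(angle)
    cam_y = fixation_point[1] - radius_m * math.cos(angle)   # camera starts "behind" the fixation point
    aim = math.atan2(fixation_point[1] - cam_y, fixation_point[0] - cam_x)
    return cam_x, cam_y, aim

# Example: the observer's head moves 10 cm to the right.
print(dvws_camera_pose(0.10))
```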
IMPLEMENTING PERCEPTION–ACTION COUPLING
The main difficulty when implementing the DVWS for laparoscopy is the technical realization. From Fig. 18.3 it is obvious that implementing the principle of the DVWS as is, without designing a special laparoscope, removes all advantages of minimally invasive surgery. Instead, a special laparoscope is needed with a mechanism that, preferably, should be as simple as possible. However, the mechanism is bound to be complex because the fixation point, around which the camera has to move in a circular manner, is located in front of the tip of the laparoscope. A feasible implementation should include a mechanism for rotating the camera around this fixation point, and therefore will be complex.
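To give a feel for why such a mechanism is unavoidable, the sketch below computes the lateral sweep of the point of observation that a given rotation around a fixation point ahead of the tip would require; the numbers are assumptions chosen only for illustration.

```python
import math

def required_lateral_sweep(fixation_distance_m, rotation_deg):
    """Lateral displacement (m) of the point of observation needed to rotate
    rotation_deg around a fixation point located fixation_distance_m ahead."""
    return fixation_distance_m * math.sin(math.radians(rotation_deg))

# Example: fixation point 0.10 m ahead of the tip, +/-15 deg of exploration.
print(required_lateral_sweep(0.10, 15.0))   # ~0.026 m -- far more than fits inside a 10-mm scope
```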
FIG. 18.3. The movements of a laparoscope around a fixation point. The cube, at which the laparoscope remains aimed, is observed from different points of observation (top) resulting in the corresponding monitor images (bottom). If the DVWS were implemented in this way, laparoscopy would not be minimally invasive.
Possible implementations are constrained in three ways. First, there are perceptual constraints, that is, what information has to be provided for the task to be performed. Second, there are medical constraints; that is, the instruments cannot be allowed to harm the patient. For example, instruments must not lose parts, because these can be left behind inside the patient without being noticed. Also, any part of the instrument that can make contact with either the surgeon, the patient, or another instrument must be capable of being sterilized. Third, there are constraints with respect to usability; that is, the surgeon must be able to use the instrument to obtain the required information. This is not as trivial as it seems. For example, manipulation tasks that require steady dexterity invite the surgeon to stand as still as possible. Linking the motions of a laparoscope to the movements of the surgeon to provide spatial information during these tasks may be feasible in theory, but in practice it is not. Thus, the technical implementation has to be such that the surgeon can use it within the constraints of laparoscopic operation. This section examines possible technical implementations and evaluates them against these three criteria.
Moving a Camera Inside the Laparoscope
A possible technical implementation of the DVWS is to build a small mechanism inside the laparoscope to move an even smaller camera. This was demonstrated with Prototype 1 (Subroto, 1991). Instead of a camera, a small glass-fiber scope
FIG. 18.4. Prototype 1: a mechanism within the tip of the laparoscope for moving a fiber-scope (the arrow indicates the tip of the fiber-scope).
was used. The tip of the glass-fiber scope (indicated by an arrow in Fig. 18.4) could move in the horizontal plane around a fixation point. Analysis of registered head movements showed that subjects mostly move in the horizontal direction (Voorhorst, Overbeeke, & Smets, 1997). The fixation point was located at a distance of 30 cm in front of the prototype and the maximum angle of rotation relative to this point was approximately 20 deg. Prototype 1 demonstrated the feasibility of viewpoint parallax for a laboratory setup; that is, the prototype was fixed in a mechanical support. However, a test in a more realistic situation, when the laparoscope was not fixed but directed by an assistant or by the surgeon himself, showed that the motions of the camera were too small. In conclusion, both from a perceptual point of view and from a technical point of view, Prototype 1 was found not to be feasible for medical application.
Moving the Entire Tip of the Laparoscope
In contrast to the limited space within the laparoscope, the space outside the laparoscope is limited only by the inflated abdomen. Therefore, instead of moving the camera inside the laparoscope, the tip of a nonrigid laparoscope can be made to move. Moving the tip can be achieved by dividing the laparoscope into three segments (Fig. 18.5), with the camera located in the first segment. All segments could be controlled individually, but to minimize the weight of the laparoscope they
FIG. 18.5. Three segments are needed to rotate a camera around a fixation point.
FIG. 18.6. Prototype 2, based on the mechanism shown in Fig. 18.5. The prototype has a diameter of 20 mm.
preferably are controlled in combination. One method explored for connecting the segments is the use of wires, which are commonly used to control flexible endoscopes. Its advantage is that motors to control the segments can be located outside the laparoscope, but the disadvantage is that it is difficult to keep the wires perfectly stretched. If they are not perfectly stretched they introduce dead time; that is, the observer moves his head while the camera remains stationary. Unlike a small time delay, dead time will be noticed almost immediately. Prototype 2 demonstrated the feasibility of such a technical solution (Fig. 18.6). It has the advantage that it allows larger motions of the camera compared to a laparoscope with a built-in movable camera (Prototype 1). The main disadvantage from a user's point of view is that Prototype 2 has to be inserted so that at least the first two segments are inside the patient. This disadvantage partly disappears when the trocar3 is allowed to rotate around the point of entrance of the abdomen (Fig. 18.7). The prototype then requires only two segments, but additionally requires a mechanism for rotating the trocar. However, the laparoscope still has to be inserted so that at least the first segment is inside the patient. In conclusion, although this implementation is feasible from a perceptual point of view, it is technically too complex to be feasible in other than a laboratory setting.
Moving the Light Source
Technical implementation remains the main stumbling block for implementing viewpoint parallax (the DVWS) for laparoscopy. A possible alternative to
3 The trocar is the device through which an instrument or laparoscope enters the abdomen.
FIG. 18.7. If rotation around the point of entrance of the abdomen is allowed, then the second point of rotation can be omitted.
viewpoint parallax is shadow movement parallax. With shadow movement parallax (or shadow parallax) the movements of the light source are linked to the head movements of the observer. Shadow parallax has the advantage over viewpoint parallax in that a moving light source can be simulated by two stationary light sources of which the intensity balance varies. For example, a light source movement to the right can be simulated by increasing the intensity of the right light source while simultaneously decreasing the intensity of the left light source. Implementation now requires no moving parts. Instead, only two separate light guides are needed, each of which terminates to one side of the image guide. Experiments (Voorhorst, 1998) showed the feasibility of shadow parallax for inspection tasks, during which participants were asked to determine a difference in height of an observed surface. In a later experiment, viewpoint parallax and shadow parallax were compared for a spatial observation and manipulation task (Voorhorst, Mijer, Overbeeke, & Smets, 1998). Results showed that viewpoint parallax is preferred over shadow parallax. However, the stimuli used in this experiment were knots made out of electrical wires, from which spatial information through shadows is minimal. Overall results suggested that shadow parallax or a combination of both principles could be feasible, for example, in the situation when the stimuli are more organically shaped. Therefore, to test the principle of shadow parallax in a practical situation, a laparoscope with two separate light guides was built (Fig. 18.8, bottom). Figure 18.8 (top) shows a close-up of the tip of a commonly used 10-mm laparoscope with a single light source, and Figure 18.8 (bottom) shows a laparoscope with the two separate light sources, Prototype 3. It was tested in a laboratory setup and in a practical setting at the Laboratory for Experimental Surgery at the Academic Medical Centre of the University of Amsterdam (AMC/UvA). In the
FIG. 18.8. The tip of a conventional laparoscope (top) and the tip of the laparoscope built by Surgi-Tech (bottom).
laboratory setup, the variations in the balance of the intensity of the two light sources were visible, but in the practical situation they were not. Compared to the movements of the organs inside the abdomen caused by heartbeat and breathing, the variation in the balance of the intensity of the light sources was not noticeable. In conclusion, while shadow parallax is feasible from a technical point of view, from a perceptual point of view it provides insufficient information to be feasible for medical application.
Making Use of the Fish-Eye Lens
Because an alternative method of generating shifts within the image, namely shadow parallax, was found to provide insufficient information, research refocused on implementing viewpoint parallax (the DVWS). Previous prototypes were based on two rotation axes (for example, Fig. 18.6: Prototype 2). As discussed earlier, one of these rotation axes can be omitted if the laparoscope is allowed to rotate around the point of entrance of the abdomen (Fig. 18.7). As a result, a laparoscope with only one rotation axis is needed, but then an additional mechanism will be needed to rotate the laparoscope in synchrony with the movements of the tip. However, a solution without a mechanism in the laparoscope is possible by using the fish-eye lens of the laparoscope.
FIG. 18.9. Movements of a laparoscope when rotated around the point of entrance of the abdomen (top) and the corresponding monitor images (bottom).
A commonly used laparoscope has a fish-eye lens with a viewing angle that can vary from 60–120 deg. This fish-eye lens makes it possible to direct the tip of the laparoscope away from the area of interest while keeping the area of interest within the field of view (Fig. 18.9). Because the point of observation is located at the tip of the laparoscope, rotating the laparoscope around the point of entrance of the abdomen results in a translation of the point of observation. The area of interest is observed from a different angle. By making use of the fish-eye, exploration can be allowed with a commonly used rigid laparoscope that rotates around the point of entrance of the abdomen. There is one major difference between rotating the laparoscope around the point of entrance of the abdomen and the original idea of the DVWS. Originally, with the DVWS the fixation point (i.e., the point at which the camera is aimed) and the rotation point (i.e., the point around which the camera rotates) coincide. Because both points coincide, the observer movements relative to the monitor are linked to the spatiality of the scene that is presented on the monitor. However, when making use of the fish-eye lens, the fixation point and the rotation point do not coincide. As a result, the area of interest does not remain stationary at the center of the screen during observer movements; rather, it shifts toward the edge of the monitor screen. Although the two points do not coincide, the spatiality of the observed scene is still linked to the movements of the observer because the point of rotation is indicated by the shifts during camera movements (Fig. 18.10). In a laboratory setting (Fig. 18.11, left), it was verified that such movements of the laparoscope do allow an observer to explore. With a first prototype (Fig. 18.11, right), it was
FIG. 18.10. Views when the camera moves according to the DVWS, that is, when the fixation point and the point of rotation coincide (left). Views when the camera moves similarly to the proposed movements of the laparoscope, i.e., when the fixation point and the point of rotation do not coincide (right). Although the two points do not coincide, the point of rotation (which is located at the abdomen wall) is implied by the shifts within the monitor image.
FIG. 18.11. The setup used to explore a scene by rotating the laparoscope around the point where the trocar enters the abdomen (left) and a first prototype of a feasible implementation for controlling the movements of the laparoscope (the three panels on the right).
verified in a practical setting at the Laboratory for Experimental Surgery at the AMC/UvA.
Technically, rotating the laparoscope around the point where the trocar enters the abdomen has the advantage that it requires no moving parts within the laparoscope, because it virtually removes all limitations with respect to the mechanism. For example, Finlay and Ornstein (1995) exploit the freedom with respect to the size of the mechanism and use an industrial robot for controlling the motions of the laparoscope. However, a large industrial robot may not be the best of solutions. In our case, for instance, the mechanism needs, if possible, to be small and simple so that it can be manipulated by the surgeon. Perceptually, this solution has the advantage that the surgeon is not disturbed by motions of the laparoscope generated by the assistant holding and directing the laparoscope. The previously described implementations all were based on a mechanism that was included within the laparoscope and that allowed the surgeon to explore a selected area of interest. However, supporting and directing the laparoscope toward the area of interest was performed by an assistant. Experiments have shown that the surgeon's performance of a manipulation task is reduced by the motions generated by the assistant who is holding and directing the laparoscope (Voorhorst et al., 1998), and it was concluded that the laparoscope preferably is supported mechanically. An implementation based on the rotation of the laparoscope around the point where the trocar enters the abdomen will combine the support of the laparoscope with a mechanism for allowing exploration. Such an implementation will take over the assistant's task of supporting the laparoscope. In conclusion, from a technical point of view, implementing viewpoint parallax by making use of the fish-eye is feasible because all mechanical parts are located outside the abdomen. From a perceptual point of view, rotating the laparoscope is feasible as it allows for observing an area of interest from different points of observation and thus for obtaining spatial information. The next section describes the resulting prototype and its evaluation in a practical setting.
A FEASIBLE PROTOTYPE
Controlling the Laparoscope Through Head Movements
Based on the implementation of viewpoint parallax by making use of the fish-eye, a prototype was designed and built (Fig. 18.12; Voorhorst, 1998). The prototype basically consists of a trolley that can move along a guide. It was designed for ease of use during the operation, but also for ease of assembly and disassembly. The main problem for sterilization and cleaning is movable parts, such as rotating shafts and bearings. Therefore, all movable parts of the prototype can be dismantled, for which only three bolts and two pins have to be removed. To use the prototype, it is placed such that the point where the trocar enters the abdomen is located at the center of the guide. Then the trolley makes a circular
FIG. 18.12. The prototype of the circular guide. It clearly shows the movements of the laparoscope around the center of the circular guide, where the laparoscope enters the abdomen.
movement around the point of entrance. A motor, connected to the side of the prototype, controls the motions of the trolley along the guide. The motor is linked with an infrared detection device that registers the head movements of the surgeon.

Selecting an Area of Interest

With the circular guide, the surgeon has direct control over the movements of the laparoscope. Simply by making head movements, the surgeon controls the motions of the laparoscope, allowing him or her to explore an area of interest. The assistant may only be needed to redirect the laparoscope toward a (new) area of interest, for which adjustments in the location and orientation of the circular guide may be needed. However, because the surgeon already has (partial) control over the laparoscope, it is more likely that such redirections will also be done by the surgeon. Therefore, when designing a prototype that allows the surgeon to explore, a mounting device should also be designed to facilitate redirecting the laparoscope toward a new area of interest. There are two types of redirection that are likely to occur, namely, redirections that do not include the reorientation of the circular guide and redirections that do. The first type of redirection (i.e., those that do not include the reorientation of the circular guide) can be implemented simply, as it only involves temporarily unlinking the head movements of the surgeon from the motions of the laparoscope. It is expected that for this type of readjustment the surgeon will grasp the laparoscope and direct it toward the new area of interest. To facilitate this, and to facilitate it only when the surgeon holds the laparoscope, a button was placed on the outer end of the laparoscope
FIG. 18.13. A button on the laparoscope to temporarily disconnect the surgeon from the laparoscope; it was placed at the end of the instrument. When the surgeon holds the laparoscope, the button is automatically pressed (right), and the laparoscope can move freely.
FIG. 18.14. Prototype of the mounting device shown in exploded view (left). To lock the mounting device, the grip has to be rotated upward (right); to unlock it, the grip must be rotated downward (middle).
(Fig. 18.13). Pressing this button allowed the surgeon to move the instrument freely in the horizontal direction. Releasing the button relinks the laparoscope to the surgeon. The second type of redirection (i.e., those that include the reorientation of the circular guide) required the design of a mounting device. A mounting device is used to connect the circular guide to the operation table. When changing the orientation of the circular guide, it first has to be unlocked from the mounting device and relocked in its new orientation. Commonly used mounting devices release and constrain all 6 DF and therefore require two-handed operation: one to support the mounting device and the other to operate it. The surgeon, however, has only one hand to operate the mounting device because the other hand will be directing the laparoscope. Therefore, a mounting device was designed that releases and constrains only 4 DF (Fig. 18.14). It can be manipulated with one hand without having to support the entire mounting device with the other. It has a handle that,
FIG. 18.15. The prototype is tested in a practical setting at the AMC/UvA.
by pushing it upward, locks 3 rotation DF and 1 translation DF. Pulling the handle downward releases all 4 DF. The mounting device is designed so that it can be taken apart for cleaning or sterilization.
EVALUATION

The circular guide and mounting device were tested in a practical setting. The evaluation test started with the insertion of a trocar. After the abdomen was inflated, the circular guide was placed over the trocar. The laparoscope was inserted through the trolley and the trocar. The space between the inner diameter of the trolley and the outer diameter of the laparoscope was small enough for the laparoscope to remain fixed to the trolley. Figure 18.15 shows the situation during the evaluation test. The surgeon started the tryout of the prototype by looking around, followed by a small manipulation task. Initially, the transmission ratio between the head movements and the rotations of the laparoscope was found to be too small, and it was increased. After trying the link with the laparoscope, two extra trocars were placed to insert two instruments. The size of the circular guide was found to be slightly too large for comfortable operation. The surgeon noted that the abdomen of a human is larger than the abdomen of the pig, and he expected that the circular guide would allow comfortable operation on a human. With the two instruments inserted, the surgeon performed a small manipulation task (dissecting a part of the spleen). This was done without assistance in operating the laparoscope or the mounting device. The head-tracking device worked properly, allowing the surgeon to turn his head and even walk away and return. During the test, some adjustments were made to the viewing direction of the laparoscope. The mounting device was designed such that for adjustments in a direction other than the horizontal the surgeon had to use both hands. However,
during the test, all adjustments were performed with one hand. The surgeon simply pulled the laparoscope to the new orientation with one hand while holding both instruments with the other. The mounting device, although it was not used during the operation, was found to make it easy to position the circular guide at the beginning of the operation. Overall, the surgeon was enthusiastic about being linked to the laparoscope. Being linked and controlling the movements of the laparoscope in the horizontal direction allowed him to look around and to explore, but he suggested that a coupling in the vertical direction, as well as a coupling in the direction from and toward the monitor (zoom), would also be desirable. This is the subject of further investigation.
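The chapter does not give the control mapping used between the tracked head position and the trolley, but the kind of coupling described above (head movements driving the trolley along the circular guide, with an adjustable transmission ratio and a disconnect button) can be sketched roughly as follows. All names, the gain value, and the clamping range below are hypothetical and only illustrate the idea; they are not the actual prototype's implementation.

```python
# Illustrative sketch only (not the authors' implementation): lateral head
# displacement, as reported by the infrared head-tracking device, is mapped
# to a commanded trolley angle on the circular guide.  The gain plays the
# role of the "transmission ratio" that was increased during the evaluation.

def trolley_angle_command(head_x_m, gain_deg_per_m, min_deg=-40.0, max_deg=40.0):
    """Convert lateral head displacement (m) into a trolley angle (deg),
    clamped to the mechanical range of the guide (range is hypothetical)."""
    angle = gain_deg_per_m * head_x_m
    return max(min_deg, min(max_deg, angle))

def control_step(head_x_m, button_pressed, held_angle, gain_deg_per_m=60.0):
    """One update of the head-to-laparoscope link.  While the button on the
    laparoscope is pressed, the link is suspended and the last commanded
    angle is held, so the surgeon can reposition the scope by hand."""
    if button_pressed:
        return held_angle
    return trolley_angle_command(head_x_m, gain_deg_per_m)
```

In such a sketch, increasing gain_deg_per_m would correspond to the adjustment made at the start of the evaluation, when the transmission ratio between head movements and laparoscope rotations was found to be too small.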
CONCLUSION

In conclusion, this chapter has described an implementation aimed at restoring perception–action coupling in laparoscopy. The basis has been a technical implementation of movement parallax, the Delft Virtual Window System (Overbeeke et al., 1987; Smets et al., 1987). The DVWS allows for depth perception by directly linking the head movements of an observer to the motions of the camera. Currently, for the surgeon, perception and action with respect to visual information are uncoupled because the laparoscope is held and directed by an assistant. To improve the perception of the spatial layout inside the abdomen, the surgeon should have direct control over the laparoscope. The main difficulty in restoring this perception–action coupling is the design of a feasible technical implementation (a technical constraint) that provides sufficient information (a perceptual constraint) to perform the tasks occurring during laparoscopy. However, restoring perception–action coupling is more than a combination of technical and perceptual constraints. Restoring an action possibility, in this case allowing the surgeon to explore, also reveals previously invisible affordances. For example, providing the surgeon with the ability to explore invites him or her to also (re)direct the laparoscope toward a new area of interest. Thus, restoring perception–action coupling involves more than designing a mechanism; it involves designing a whole new routine of usage. Figure 18.16 shows how perception–action coupling is restored by providing the surgeon with control over his point of observation (located at the tip of the laparoscope and presented on the monitor). There are three ways in which the surgeon has control over the laparoscope. First, for (continuous) exploration of the selected area of interest, the head movements of the surgeon are linked directly to the motions of the laparoscope (see Panel 1 in Fig. 18.16). Second, for (less frequent) adjustments in the selection of the location of the area of interest, the surgeon can redirect the laparoscope toward a new area of interest by hand (see Panel 2 in Fig. 18.16). Third, for installation of the circular guide relative to the trocar, the mounting device (which connects the circular guide with the operation table) has to be
FIG. 18.16. Perception–action coupling is restored in three ways: the movements of the laparoscope are linked directly to the head movements of the surgeon (1), the orientation of the laparoscope can be controlled by hand (2), and the orientation of the circular guide can be adjusted by operating the mounting device (3).
operated (see Panel 3 in Fig. 18.16). (Re)mounting to the table is performed only at the beginning of the operation or when the trocar through which the laparoscope enters the abdomen is changed.

The approach to implementation described in this chapter focuses on use rather than on either user or apparatus, creating a tension field that perhaps is most characteristic of an ecological approach to interface design. On the one hand, implementation explores what information can be provided from a technical point of view; on the other hand, it explores which of this information has to be provided from a perceptual point of view. In other words, it explores affordances in relation to the task to be performed.

ACKNOWLEDGMENTS

This work was supported by the Dutch Technology Foundation (STW) under grant DIO22–2732. Surgi-Tech kindly supported us with a prototype of a laparoscope based on shadow parallax. We are grateful to Dr. D. W. Meijer from the AMC/UvA, who enthusiastically provided the possibility to test (wood and wire) prototypes in a practical setting.

REFERENCES

Bannenberg, J. J. G., Meijer, D. W., Bannenberg, J. H., & Hodde, K. C. (1996). Hand assisted laparoscopic nephrectomy in the pig. Minimally Invasive Therapy and Allied Technologies, 5, 483–487.
Birkett, D. H., Josephs, L. G., & Este-McDonald, J. (1994). A new 3D laparoscope in gastrointestinal surgery. Surgical Endoscopy, 8, 1448–1451.
Cole, R. E., Merritt, J. O., Fore, S., & Lester, P. (1990). Remote manipulation tasks impossible without stereo TV. Proceedings of the SPIE: Stereoscopic Displays and Applications, 1256, 255–265.
Edmonson, J. M. (1991). History of the instruments for gastrointestinal endoscopy. Gastrointestinal Endoscopy, 37(2), S27–S56.
Finlay, P. A., & Ornstein, M. H. (1995). Controlling the movement of a surgical laparoscope. IEEE Engineering in Medicine and Biology, 289–291.
Geis, W. P., Kim, H. C., Zern, J. T., & McAfee, P. C. (1996). Surgeon voice control of the laparoscopic visual field using the robotic arm. Proceedings of the Eighth International SMIT Meeting (p. 48). (Abstract only)
Gibson, J. J. (1979). The ecological approach to visual perception. London: Lawrence Erlbaum Associates. (Reprinted 1986)
Hanna, G. B., Shimi, S. M., & Cuschieri, A. (1998). Randomised study of influence of two-dimensional versus three-dimensional imaging on performance of laparoscopic cholecystectomy. The Lancet, 351, 248–251.
Herder, J., Horward, M., & Sjoerdsma, W. (1997). A laparoscopic grasper with force perception. Minimally Invasive Therapy & Allied Technologies, 6(4), 279–286.
Herder, J. L., Maase, S., Voorhorst, F., Sjoerdsma, W., Smets, G., Grimbergen, C. A., & Stassen, H. G. (1997). Ergonomic handgrip for laparoscopic graspers. Minimally Invasive Therapy & Allied Technologies, 6(1), 55.
Holden, J. G., Flach, J. M., & Donchin, Y. (1999). Perceptual–motor coordination in an endoscopic surgery simulation. Surgical Endoscopy, 13, 127–132.
Kim, W., Tendick, F., & Stark, L. (1987). Visual enhancements in pick-and-place tasks: Human operators controlling a simulated cylindrical manipulator. IEEE Journal of Robotics and Automation, 3(5), 418–425.
Liu, A., Tharp, G., & Stark, L. (1992). Depth cue interaction in telepresence and simulated telemanipulation. Proceedings of the SPIE, 1666, 541–547.
Mitchell, T. N., Robertson, J., Nagy, A. G., & Lomax, A. (1993). Three-dimensional endoscopic imaging for minimal access surgery. J. R. Col. Surg., 285–292.
Mouret, P. (1996). How I developed laparoscopic cholecystectomy. Ann. Acad. Med. Singapore, 25(5), 744–747.
Overbeeke, C. J., Smets, G. J. F., & Stratmann, M. H. (1987). Depth on a flat screen II. Perceptual and Motor Skills, 65, 120.
Overbeeke, C. J., & Stratmann, M. H. (1988). Space through movement. Unpublished doctoral dissertation, Delft University of Technology, the Netherlands.
Paraskeva, P. A., Nduka, C. C., & Darzei, M. (1994). The evolution of laparoscopic surgery. Minimally Invasive Therapy, 1(3), 69–75.
Pichler, C. v., Radermacher, K., Grablowitz, V., Boekmann, W., Rau, G., Jakse, G., & Schumpelick, V. (1993). An ergonomic analysis of stereo-video-endoscopy. In Proceedings of the 15th Annual International Conference on Engineering in Medicine and Biology Society (Vol. 3, pp. 1408–1409).
Semm, K. (1983). Endoscopic appendicectomy. Endoscopy, 15, 59–64.
Sjoerdsma, W., Herder, J. L., Horward, M. J., Jansen, A., Bannenberg, J. J. G., & Grimbergen, C. A. (1997). Force transmission of laparoscopic grasping instruments. Minimally Invasive Therapy & Allied Technologies, 6(4), 279–286.
Smets, G. J. F. (1995). Industrial design engineering and the theory of direct perception and action. Ecological Psychology, 4, 329–374.
Smets, G. J. F., Overbeeke, C. J., & Stratmann, M. H. (1987). Depth on a flat screen.
Perceptual and Motor Skills, 64, 1023–1034.
Spain, E. H. (1990). Stereo advantage for a peg-in-hole task using a force feedback manipulator. Proceedings of the SPIE: Stereoscopic Displays and Applications, 1256, 224–254.
Spain, E. H., & Holzhausen, K. P. (1991). Stereoscopic versus orthogonal view displays for performance of a remote manipulation task. Proceedings of the SPIE: Stereoscopic Displays and Applications II, 1457, 103–110.
Subroto, T. H. (1991). 3D endoscopy. Unpublished internal report, Delft University of Technology, the Netherlands.
Tendick, F., Jennings, R. W., Tharp, G., & Stark, L. (1993). Sensing and manipulating problems in laparoscopic surgery: Experiment, analysis and observation. Presence, 1, 66–81.
Voorhorst, F. A. (1998). Affording action, implementing perception–action coupling for endoscopy. Unpublished doctoral dissertation, Delft University of Technology, the Netherlands.
Voorhorst, F. A., Meijer, D. W., Overbeeke, C. J., & Smets, G. J. F. (1998). Depth perception in laparoscopy through perception–action coupling. Minimally Invasive Therapy & Allied Technologies, 7(4), 325–334.
Voorhorst, F. A., Overbeeke, C. J., & Smets, G. J. F. (1997). Using movement parallax for 3D endoscopy. Medical Progress Through Technology, 21, 211–218.
Voorhorst, F. A., Overbeeke, C. J., & Smets, G. J. F. (1997). Spatial perception during laparoscopy. Proceedings of Medicine Meets Virtual Reality, 379–386.
Wade, N. J. (1996). Frames of reference in vision. Minimally Invasive Therapy & Allied Technologies, 5, 435–439.
Walk, L. (1966). The history of gastroscopy. Clio Medica, 1, 209–222.
Wenzl, R., Lehner, R., Vry, U., Pateisky, N., Sevelda, P., & Husslein, P. (1994). Three-dimensional video-endoscopy: Clinical use in gynaecological laparoscopy. The Lancet, 1621–1622.
Zobel, J. (1993). Basics of three-dimensional laparoscopic vision. Laparoscopic Surgery, 1, 36–39.
19 Psychological and Physiological Issues of the Medical Use of Virtual Reality Takami Yamaguchi, M.D. Department of Mechanical and Systems Engineering Nagoya Institute of Technology
Virtual reality is a novel technology recently developed primarily in engineering-oriented fields such as the military and entertainment industries. Its potential, however, is thought to be vast and deep, involving a very wide variety of related fields. Medical applications are among those most frequently referred to in the literature, and an increasing number of developments in this area have been reported (e.g., Satava & Jones, 1998, 2002). These include surgical simulators, endoscopic examination and surgery trainers, and education tools for medical and co-medical personnel. Sophisticated computer technology has made it possible to "walk" through the inside of the human body (Ackerman, 1998) and to learn safe and effective manipulation techniques using endoscopes without harming real patients. It should be pointed out, however, that there are few attempts among these primarily technological approaches (to the best of our knowledge) in which the patients and their welfare are of primary concern. Most current developments are primarily aimed at the introduction of new technologies into medical diagnosis and treatment. In other words, the driving motivation behind their development appears to be technology centered as opposed to use centered or patient centered. We have proposed a novel medical care system referred to as the Hyper Hospital. It is constructed on an electronic or computerized information network and uses
virtual reality (VR) as the principal human interface (Yamaguchi et al., 1994). The major purpose of the Hyper Hospital is to restore human interaction between patients and various medical caregivers by increasing contact between them, compared with that currently provided in conventional medical practice (Entralgo, 1969; Foster & Anderson, 1978). The Hyper Hospital is intended to be built as a distributed, computerized information network. Each node of the Hyper Hospital network serves as a part of a networked medical care system and shares activities from a variety of medical care facilities. Of these facilities, the most important is the personal VR system, which is designed to support each patient by providing private, individual contact with the Hyper Hospital network. This system should be composed of highly sophisticated personal computers (PCs) with very high-speed graphics subsystems, a high-speed interface for the communication network, and very large amounts of internal and external memory. All of these items were just a dream in the past, but are now readily available thanks to the very rapid advancement of technology. Among these features, two are of particular interest. First, the significance of very high-speed PCs with large amounts of memory storage should be noted. These enable us to devise a patient-centered medical system, which is the most critical concept underlying the Hyper Hospital system. Through the use of this technology, all relevant medical information, either diagnostic or therapeutic, from simple text to gigabyte-scale imaging data, such as X ray, CT, and MRI, can be handed over to and stored by the patient at home. The storage of one's own medical records and information is a central element in establishing a patient's rights in future virtual medical environments, but may also underlie the open and distributed medical information storage system of the future (Yamaguchi, 1997). Another important feature enabled by the advancement of the relevant technology is the availability of highly sophisticated human interface techniques, particularly those related to VR approaches. Using a VR methodology, patients will be able to "meet" and consult with medical caregivers in a private and secure virtual environment by using the personal VR system connected to the Hyper Hospital network. In addition, they can seek care and assistance from remote sites as if they were visiting the out-patient office, rehabilitation center, or other facility. However, to successfully apply VR technology in this situation, we need to be very careful regarding its safety. The safety of this technology is an issue that deserves widespread attention, but which has not been studied extensively. Mon-Williams, Wann, and Rushton (1993) reported the short-term effects on binocular stability of wearing the conventional head-mounted display (HMD) used to explore a virtual reality environment. Although they found clear signs of induced binocular stress in several of their subjects, the physical stresses were not studied extensively. Pausch, Crea, and Conway (1993) presented a literature survey of the virtual environments used in military flight simulators, focusing on the visual systems and simulator sickness (see also Kennedy et al., this volume). However, the literature mentioned in this report consists mainly of military research reports, which are difficult to obtain. Almost all of these studies appear to be on "motion
sickness,” and there were very few that deal with fatigue and other potential risk factors. By contrast, the workload induced by ordinary video display terminals has been studied by a number of authors (Gao, Lu, She, Cai, Yang, & Zhang, 1990; Lundberg, Melin, Evans, & Holmberg, 1993). These studies found that physiological parameters reliably indicated the magnitude of physical and psychological fatigue and stress. Consequently, we examined the safety of our own virtual reality system from physiological, neurological, and psychological perspectives (Igarashi et al., 1994). The development and examination of the safety features of our virtual reality technology have been conducted simultaneously. So far, our studies have been based on an ergonomics framework for analyzing work, stress, and strain (Cox, 1978, 1985; Rohmert, 1987). The ergonomics framework includes various human physiological and psychological factors and is therefore a suitable framework for evaluating the safety of our virtual reality system. We therefore examined circulatory physiological parameters, such as heart rate, its variance, and blood pressure. These are routinely measured in occupational stress studies and are widely accepted measures of physical and mental workloads (Smith, 1987).
THE HYPER HOSPITAL VR SYSTEM

Although it is not always easy to define what virtual reality is and what it might be able to accomplish, we can perhaps agree that there are certain prerequisites to building a functional VR system. In our experience, these include a high-speed graphics computer, with various types of custom hardware, such as a head-mounted stereoscopic display, and software that allows one to create the virtual world that is displayed. Though some types of expensive graphics workstations were thought to be necessary to create an acceptable virtual world as recently as several years ago, the very rapid advancement of PC hardware components, especially 3-D graphics accelerator boards, has almost entirely replaced such expensive machines. Visual output devices are not completely exempt from this problem of rapid obsolescence. To immerse the subject in the graphics environment, we developed our own customized head-mounted display (HMD) hardware for the virtual reality system. We have developed several versions of the HMD, and the one used in this experiment (Version T-5) was the most recent version developed in our laboratory. After the development of T-5, we decided to purchase commercial HMDs because they have become less expensive and of increasingly sophisticated optical design—a feature requiring enormous initial costs if developed in a laboratory. The HMD we used in the current series of studies consisted of an ultrasound point-of-view detector (designed and fabricated in our laboratory) and two liquid-crystal graphic-image displays (LCDs; SONY XC-M07, 100,000 pixels each) with an optical system that magnifies the graphic images approximately 15 times. Our HMD gave a wider field of view (more than 90 deg) than that obtained with
FIG. 19.1. A representative view of the inside of a virtual out-patient office of the proposed Hyper Hospital system.
commercially available HMDs at that time. The total weight of the HMD was approximately 1.0 kg, which varied slightly depending on the attachments used for personal adjustment. The software that created our virtual space was also developed from scratch in our laboratory. Its innovation lies in the mode of interaction between the user and the virtual world. We invented a virtual world building system that allows the user, as well as the developer or administrator, to modify the look and feel of the virtual world. The technical details of this software system were reported separately (Hayasaka, Nakanishi, & Yamaguchi, 1995). The graphics images generated to view the virtual world are displayed stereoscopically in the HMD. In Fig. 19.1, a representative view of the Hyper Hospital (virtual out-patient office) created by our software is shown.
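As a rough, back-of-the-envelope illustration (not taken from the chapter) of the display resolution involved, the figures above (roughly 100,000 pixels per LCD and a field of view of more than 90 deg) can be turned into an approximate visual angle per pixel. The 4:3 aspect ratio and the uniform spread of the field of view over the horizontal pixel count assumed below are assumptions, not specifications from the text.

```python
import math

# Back-of-the-envelope estimate; the 4:3 aspect ratio and the uniform spread
# of the 90-deg field of view across the horizontal pixel count are assumptions.
total_pixels = 100_000
aspect = 4 / 3
horizontal_pixels = math.sqrt(total_pixels * aspect)  # about 365 pixels
fov_horizontal_deg = 90.0

arcmin_per_pixel = (fov_horizontal_deg / horizontal_pixels) * 60
print(f"approx. {arcmin_per_pixel:.0f} arcmin of visual angle per pixel")  # ~15
```

Roughly 15 arcmin per pixel is coarse compared with the approximately 1 arcmin resolution of normal visual acuity, which is one way to read the later remark that display resolution may matter for the images created on the retina.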
EXPERIMENTAL SETUP AND MEASUREMENTS

Subjects

To measure the stress induced while using virtual reality and the resultant mental, psychological, and physical fatigue, we performed two series of controlled experiments using healthy, young, male participants. In the first preliminary series
of experiments (Igarashi et al., 1994), 20 healthy, young, male student volunteers participated. To screen for effective measures, we made several different groups of measurements. They included various physiological and biochemical measurements, neurological tests, and psychological and subjective measures related to the VR experiences. Based on the first-series experimental data, we designed a second series of controlled experiments using 12 young, male student volunteers. In the following, we concentrate on the second series of experiments. None of the 12 participants had any previous experience with virtual reality or with using an HMD. The data obtained from two subjects were ultimately excluded from this study because they did not conform to the experimental protocol and smoked during the experiment. The average age of the 10 participants included in the study was 20.6 years. Although it was believed that this study would not harm the participants in any way, the requirements of the Declaration of Helsinki were strictly observed. Before giving consent, each participant was fully informed of the experimental procedure so that he understood the nature of the study, and he was made aware that he was free to withdraw from the experiment at any time. Because there was no formal authority to oversee human experiments at the university where this study was conducted, no such approval was sought. All the experiments were carried out in the presence of and under the supervision of a licensed medical doctor.

Protocol of Stress Loading

The experiment consisted of two parts: the first being the control condition and the second the VR-experience condition. In the control condition, participants were asked to watch an environmental video on a normal home TV set in a dimly lit room shielded from outside light and noise (Fig. 19.2a). The conditions of the participant, such as the time of day, meal schedule, and ban on smoking, were carefully matched in the subsequent VR-experience condition. Each volunteer participated in the control condition the day before participating in the VR-experience condition. The VR sessions consisted of two parts: a low-stress load period and a high-stress load period. As shown in Fig. 19.2b, the stress loads were further subdivided into "psychological" load and "visual" load components. For the psychological load, in each VR-experience session, each participant was interviewed in the virtual reality space by an electronic nurse, displayed in the HMD. To determine whether the responses differed according to the appearance of the interviewer, we randomly chose one of two faces for the nurse, either a cartoon image (Fig. 19.3) or a photographic image (Fig. 19.4). The response to the different images was analyzed separately as part of another series of experiments, and will be reported elsewhere. The participant then watched moving graphical objects from the perspective of a roller-coaster ride in the HMD. Two different speeds were used. In the first half of the VR interview session, the participants were asked low-stress questions about their daily lifestyle for 10 min. In the second half of the session, they were asked potentially stressful questions, such as those relating to
FIG. 19.2. (a) Protocol for the control experiment; (b) Protocol for the VR-experience experiment.
FIG. 19.3. Appearance of the cartoon electronic nurse used in the virtual interview sessions.
FIG. 19.4. Appearance of the photographic electronic nurse used in the virtual interview sessions.
their private or sexual behavior, for 10 min. During these sessions, only a limited number of necessary coworkers were allowed to watch the experiments, and the privacy of the participants was fully respected. The responses to these interviews were recorded on a video recorder and kept strictly confidential.

Measurement Protocol

We conducted various measurements to assess the effects of using our virtual reality system. These are grouped into several related categories. Some of the procedures were improved, based on the experience obtained with the previously reported experiments. Consequently, we only briefly discuss here the methods for the physiological and psychological measurements used in this study. For details of the experimental procedures, and particularly for the finer aspects of the measurement techniques, refer to the previous report (Igarashi et al., 1994). A brief description of the schedule that was followed in the experiment is outlined here. As shown schematically in Figs. 19.2a and 19.2b, the total lengths of the control and VR-experience experiments were matched. The participants were asked to report to the experiment coordinator 1 hr before the experiment started. Except for water loading, the subjects fasted from this time until the end of the experiment. The participants were asked to empty their bladder completely before drinking 500 ml of tap water. This water loading was used to collect urine once the experiment started. The preexperimental procedures included answering several questionnaires, having core temperature measured, and taking the Uchida–Kraepelin test (Kashiwagi & Yamada, 1995; Ujihara, Ogawa, Higashiyama, Murase, & Yamanaka, 1992). The first urine sample was then collected, and the participant was asked to wait until the designated start time. In the control experiments, after measuring the blood pressure and obtaining an electrocardiogram (ECG), the participant watched a relaxing video program (a so-called environmental video program, showing a scene taken from the northern seashore from a fixed camera angle) for 20 min. During this interval, the participant was asked to relax in a chair and gaze at the 14-in. TV monitor. Subsequently, the ECG and blood pressure were recorded. The same tasks given before the session were repeated. That is, the participants answered a questionnaire on subjective fatigue, their core temperature was measured, and the Uchida–Kraepelin test was given. There was an interval of 40 min between the first and second sessions. The second session was exactly the same as the first. A second urine sample was taken after the second session. The VR-experience session was designed to start at the same time the day after the control experiment. The participant was asked to report at the same time as for the control experiment. Although we could not perfectly match the conditions between the control and VR-experience experiments, the participants were asked to act normally between the two experiments and get a good night's sleep. The participants were given the same amount of water and the same preexperiment
measurements (questionnaire, temperature, Uchida–Kraepelin test, and urine sampling). Afterward, the participant lay on a bed, wearing the HMD. The ECG was monitored continuously and blood pressure was monitored every 5 min with an automatic sphygmomanometer. This blood pressure measurement takes approximately 1 min to complete, so a 5-min interval was thought appropriate. The participants were given the low-stress interview and then shown the slowly moving motion picture. The same measurements as performed in the preexperiment were repeated. Then the participants were given the high-stress interview, followed by the high-speed motion picture. After the final VR-experience session, the same set of measurements was repeated. The exact timing is illustrated in Figs. 19.2a and 19.2b.

Methods of Measurement

Physiological Measurements

The first group of measurements comprised physiological circulatory parameters, which were monitored continuously throughout the VR and control sessions. The ECG, blood pressure, and core body temperature measures were obtained as described above. The ECG was measured continuously throughout the VR session. Blood pressure was measured by an automatic sphygmomanometer at 5-min intervals throughout the VR session. The core (brain) temperature was measured with a radiation thermometer (OMRON MC-500) that measured the temperature of the left tympanic membrane.

Biochemical Measurements

The second group of measures consisted of biochemical measurements. The urinary excretion of three catecholamines (dopamine, epinephrine, and norepinephrine), which are indices of stress and fatigue, was measured. It is well known that excretion of these catecholamines also represents overall autonomic nerve activity. The catecholamines were measured in the control experiments at the same time of day as in the other experiments because of the circadian rhythm of catecholamine production (Cox, 1978, 1985; Frankenhauser, 1991; Igarashi et al., 1994). Physical condition and food intake also affect the release of catecholamines, and some of the participants were excluded from the analysis because of their noncompliance with the experimental protocol in this respect (Levy, 1972). The subjects fasted for a total of 3 hr, from the beginning of the hour before until the end of the hour after the 1-hr session. One hour before the session, they were asked to completely empty their bladders and then to drink 500 ml of water. Urine samples were taken just before, just after, and 1 hr after the end of the session (Levy, 1972). The total amount of urine was also measured. We immediately froze the urine samples and later measured the concentration of the three catecholamine fractions in these samples in a batch mode by using a high-performance liquid chromatography method.
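The Results section below states that the time course of the heart rate was obtained by averaging the R–R interval of the ECG for each minute. A minimal sketch of that computation (an assumed implementation; the chapter does not give one) is:

```python
# Assumed implementation, for illustration only: convert ECG R-peak times
# into a per-minute heart rate by averaging the R-R intervals in each
# 1-minute bin (heart rate in beats/min = 60 / mean R-R interval in s).

def per_minute_heart_rate(r_peak_times_s):
    """r_peak_times_s: ascending R-peak timestamps in seconds.
    Returns {minute index: mean heart rate in beats per minute}."""
    rr = [(t1 - t0, int(t1 // 60))
          for t0, t1 in zip(r_peak_times_s, r_peak_times_s[1:])]
    by_minute = {}
    for interval, minute in rr:
        by_minute.setdefault(minute, []).append(interval)
    return {m: 60.0 / (sum(iv) / len(iv)) for m, iv in by_minute.items()}

# Example: a constant 0.8-s R-R interval corresponds to 75 beats/min.
print(per_minute_heart_rate([i * 0.8 for i in range(80)]))
```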
Psychological Measurements and Subjective Fatigue

Finally, some psychological examinations were conducted before and after each experiment. Objective psychological fatigue was measured using a modified Uchida–Kraepelin test, a validated measure of the capability to process information as a function of fatigue (Kashiwagi & Yamada, 1995; Kuraishi, Kato, & Tsujioka, 1957; Ujihara et al., 1992). Subjective fatigue was rated by using a standard questionnaire, and a general impression of the use of the VR system was obtained by another questionnaire (Igarashi et al., 1994).
RESULTS

Physiological Parameters

The time course of the heart rate was measured by averaging the R–R interval of the ECG for each minute. As shown in Fig. 19.5, the heart rate decreased continually during the control experiments and was lowest following the end of the control experiment. In contrast, the heart rates during the VR-experience experiments did not change in a consistent manner. They were statistically stable and were somewhat higher than in the control experiments. As described earlier in this chapter, the VR-experience experiments were divided into two parts, an initial low-stress part and a subsequent high-stress part. In both parts, the virtual interview was followed by continuous exposure to a scene resembling the view from a moving roller coaster, which we referred to as the motion picture. It was interesting that viewing the motion picture stabilized rather than changed the heart rate. This is shown in Figs. 19.6 and 19.7, which illustrate the results of the low-stress and high-stress experiments, respectively. This stabilizing effect is more obvious in the high-stress portion shown in Fig. 19.7. In Fig. 19.7,
FIG. 19.5. Change in heart rate during the control experiments.
FIG. 19.6. Change in heart rate during the first half (the low-intensity stress portion) of the VR-experience experiment.
FIG. 19.7. Change in heart rate during the second half (the high-intensity stress portion) of the VR-experience experiment.
the heart rate decreased even though the subject was asked continual high-stress questions on sexual behavior and school achievement. The core temperature was measured by a novel method, which used a radiation thermometer applied to the tympanic membrane of the ear. With this method, we can measure a temperature that approximates that of the brain. We measured the core temperature for two reasons. First, surface temperature varies significantly according to the environment and is difficult to measure reproducibly with normal methods. Second, the core temperature follows a circadian rhythm in humans and varies with the awakening state.
FIG. 19.8. The change in core (brain) temperature measured by a radiation thermometer applied to the tympanic membrane of the ear.
FIG. 19.9. Blood pressure measured in the control experiments.
The core temperature was lower in the VR-experience experiments than in the control experiments, particularly at the end. The difference between the VR experience and the control experiments was statistically significant (Fig. 19.8). Blood pressure was measured with an automatic sphygmomanometer in the VR-experience experiments, whereas a manual technique was used to measure blood pressure in the control experiments. In the control experiments, only the systolic pressure decreased significantly during the experiment (Fig. 19.9). The mean and diastolic pressures did not change significantly. On the other hand, in the VR-experience experiments, all three blood pressure parameters, the systolic, diastolic, and mean pressures, decreased significantly during the experiments (Fig. 19.10 and Fig. 19.11). The magnitude of the decrease in the blood pressure, however, was larger in the control experiments than in the VR experiments.
FIG. 19.10. Blood pressure changes measured with an automatic sphygmomanometer in the VR-experience experiments during the high-stress interview and the high-intensity motion picture.
FIG. 19.11. Blood pressure changes measured with an automatic sphygmomanometer in the VR-experience experiments during the low-stress interview and the low-intensity motion picture.
Biochemical Parameters

The urinary excretion of all three catecholamines (epinephrine, norepinephrine, and dopamine) was significantly elevated after the control experiments relative to the initial values. In the VR-experience experiments, catecholamine excretion did not increase significantly from before to after the low-stress experiment. However, the urinary excretion of epinephrine and dopamine increased by 70%
FIG. 19.12. Total urinary excretion of epinephrine (µg) during the VR-experience experiments.
after the high-stress experiment. This was statistically significant. The urinary excretion of epinephrine is shown in Fig. 19.12. There was no increase in norepinephrine excretion after either the low- or high-stress VR experiences. Mental or psychological excitement and the resulting fatigue are known to increase the level of epinephrine in the blood and urine (Akerstedt & Levi, 1978; Hartley, Arnold, Smythe, & Hansen, 1994; Levy, 1972; Smith, 1987). Unlike epinephrine, norepinephrine is believed to be related to physical fatigue. In our previous series of similar experiments (Igarashi et al., 1994), we found that only epinephrine increased in the VR experience. This finding agrees with the current observations. In terms of catecholamine release, the VR experience was less fatiguing than the dull environmental video.
Psychological and Subjective Measures of Fatigue

No significant changes in the results of the Kraepelin-type test were found for any combination of the conditions. Although statistically not significant, the variability index of the Kraepelin score was larger in the control experiments than in the VR-experience experiments. Particularly in the control experiments, the variability of the Kraepelin score increased during the experiments and stabilized after the experiments. There was no consistent trend in the change in the Kraepelin scores during the VR experiments. In the analysis of the answers given in the standard questionnaire for subjective fatigue, the most frequent complaints were those related to sleepiness and easy fatigability. The second most frequent complaints were about difficulty in paying attention and concentrating. Complaints of physical fatigue were the least frequent. Complaints related to mental or psychological fatigue were significantly
more frequent. This trend occurred in both the control and VR-experience experiments. The number or severity of complaints increased significantly after both experiments. In other words, the VR experience did not produce more complaints than viewing the control video.
DISCUSSION AND CONCLUSION

In this study, healthy, young male participants underwent a 30-min controlled exposure to a VR environment, which included two medical interviews of different intensities in the virtual space. Motion picture scenes with different intensities were also presented to the participants during the VR experiments. In the control experiments, the participants were shown a normal environmental video. Several indices of physiological, biochemical, neurological, and psychological fatigue were measured before, during, and after both the control and VR-experience experiments. As noted in our first series of similar experiments, there were no significant changes in the measured objective parameters of fatigue in the VR experience when compared to the control video presentation. As previously described, most earlier studies on the effects of VR technology have focused on visual side effects and motion sickness. Visual fatigue and vestibular disturbances have been studied extensively, particularly with respect to flight-simulator–type applications (Kennedy et al., this volume; Kennedy et al., 1993; Pausch et al., 1993). Although most of these reports are difficult to obtain because of their military nature, discrepancies between the motions of the head and body and visual function were thought to be important. In such situations, the resolution of the display may be important in terms of the images created on the retina. However, visual and vestibular functions were not the focus of the present study, so the resolution of the displays (the LCD displays used in the HMD and the TV monitor used in the control experiments) was not matched. We hoped to determine the safety of using our VR system in a medical environment. In contrast to the commercial or recreational use of VR, we do not have to assume that a very strong stimulus is exerted on the user in a medical environment. This is the major reason why we designed this study as we did. In other words, because we intend to use VR in a medical environment, we wanted to know what effect VR interviews might have. We have already conducted a human ethological study to test the responses of normal participants under similar conditions (Yoshida, Yamaguchi, & Yamazaki, 1994). In that study, we tested several combinations of interviewers, such as a real human interviewer versus an active TV interviewer, an active TV interviewer versus a recorded one, and so forth. There were no significant differences in terms of fatigue. This was another reason that an extensive comparative study was not conducted in the present study. It is also noted that no significant differences in the physiological measurements were found when comparing the motion picture experiences at two different speeds. We hypothesized that some detectable visual or vestibular disturbances might
occur as the speed increased. However, we did not find any differences in the physiological parameters measured in the present study. No subjective disturbances were reported in the questionnaire after the session. Therefore, we concluded that the speed was not high enough to induce significant physiological disturbances. This point will be addressed in a future study, which will incorporate a more sophisticated method to measure visual and vestibular functions. The results showed that there was subjective fatigue, particularly related to the subject's psychological condition. The urinary catecholamine excretion of epinephrine and dopamine increased in the VR-experience sessions, as well as in the control video sessions. There was no increase in the norepinephrine excretion in the VR experiments, while it increased in the control video session. It is commonly believed that increased epinephrine excretion is related to mental or psychological excitement and fatigue, whereas increased norepinephrine release is related to physical stress. Based on this, we speculate that the use of VR in the present context does not have much effect on physical condition, whereas it may slightly affect the mental or psychological state, although not to the extent of normal TV viewing. The most frequent complaints were of psychological fatigue, which agrees with the excretion of epinephrine as an indicator of mental or neural stresses. The parameters measured in this study may be criticized as being relatively insensitive to fatigue or physical influences because the results did not significantly distinguish between the effects of the video sessions and the VR sessions. However, the epinephrine excretion increased in both VR load sessions when compared to the control (before and after the sessions). This suggests that the parameters were sensitive enough to detect the very slight workload from watching TV, but that there was no distinguishable difference between the workloads in the control and VR sessions. The major motivation of the present series of studies is to apply VR technology to more humanistic purposes. As Sheridan (1993) pointed out, a virtual environment is not new or inherently bad. However, virtual environments already enable users to experience many types of violence: fighting with animals and other humans, driving a vehicle at high speed, and shooting, bombing, or doing worse violence to other people. It was in this context that Stone (1993) discussed the importance of social interaction and social development in virtual environments. She classified possible directions of VR technology into four types with combinations of two key words, social and creative. In her analysis, she concluded that nonsocial and noncreative VR might be hazardous to humanity. We totally agree with her conclusion, and we are very apprehensive about the boom in medical applications. As has been frequently pointed out, there is a serious lack of human contact in modern medical care because of the technological improvements of the last few centuries. The introduction of technology has been accused of bringing about the loss of human contact. These problems are most serious in chronic care situations and in the medical care of the elderly. In these settings, modern medical therapeutic measures, such as drugs and surgery, are not as effective as they are in treating acute diseases. The most important and necessary
measure in this situation is often how to establish human or spiritual contact with patients and how to give them more psychological or mental support. These are the reasons that we proposed the novel concept of an online medical care system constructed in a virtual environment called the Hyper Hospital. It may seem paradoxical to restore lost humanity by using another new technology. However, it is obvious that a simplistic antitechnology philosophy does not help to solve modern problems. We believe that what has been lost to technology can be reconstructed with technology. To implement the Hyper Hospital, it is mandatory to develop human–machine interfaces utilizing VR technology, such as a cybernetic interface (Mitsutake et al., 1993), that will accommodate people with physical and mental diseases and disabilities of different types and severity. Patients must also be allowed to behave as the principal participants in the VR environment in order to restore their active participation in the medical scene. We have developed a special VR software framework that allows participants to modify their VR world (Hayasaka et al., 1995). This development must be based on ethological, psychological, and physiological studies of the behavior of normal and ill people (Yoshida et al., 1994). Virtual reality technology can be applied to humans in the medical care environment only after these fundamental studies have been made. In this context, the present study can assure us that it is safe to proceed with the development of such virtual reality systems. In conclusion, when used for a relatively short time (1 hr), our VR system did not cause significant physical effects in healthy young subjects in terms of circulatory and autonomic nerve activity. In other words, the safety of our VR system was confirmed in terms of physical fatigue and stress. Some of the subjective fatigue mentioned by the participants after the examination was thought to be related to the design of the HMD used in these experiments. This was also pointed out in our previous studies, and we redesigned our HMD, improving the graphic quality and significantly decreasing its weight. However, although better than the older model, the new version of our HMD is still somewhat uncomfortable. To date, we have conducted studies using healthy, young, male participants mainly because of the availability of homogeneous samples. It is also necessary to conduct similar studies with females, sick people, and participants that are physiologically and psychologically abnormal. Because the ultimate goal of these studies is to apply virtual reality technology to real medical practice, various possible situations should be considered.
ACKNOWLEDGMENTS

The author is grateful to the following foundations for their financial support: A grant from the Japan Foundation for Aging and Health, for 1993–1995, and its renewal for 1996–1998, entitled "A functional training system for aged people using a virtual reality system." A grant from the Nissan Science Foundation, 1995–1997,
entitled “Safety features of the virtual reality technology.” Grants-in-aid of scientific research on Priority Areas from the Ministry of Education, Science and Culture, No. 07244219, “Study of the Safety Features of Virtual Reality,” and No. 08234223, “Study of the Medical Application of Networked Virtual Reality.” Special coordination funds from the Science and Technology Agency of the Japanese government, 1995–1997, “Research Program on Network Applications for Medical Research Using the Inter-Ministry Research Information Network.” The author would like to acknowledge the support of students in the Department of Biomedical Engineering, School of High Technology for Human Welfare, at Tokai University. They are Y. Sugioka, N. Furuta, K. Shindo, T. Kobayashi, Y. Takahashi, T. Nakayama, Y. Yamamoto, G. Goto, M. Sudo, and N. Yamaoka. The authors also wish to thank Dr. K. Yamazaki for his advice in the experiments, and Dr. T. W. Taylor for his help in preparing the manuscript.
REFERENCES

Ackerman, M. J. (1998). The visible human project. Proceedings of the IEEE, 86, 504–511.
Akerstedt, A., & Levi, L. (1978). Circadian rhythms in the secretion of cortisol, adrenaline and noradrenaline. European Journal of Clinical Investigation, 8, 57–58.
Cox, T. (1978). Stress. London: Macmillan.
Cox, T. (1985). The nature and measurement of stress. Ergonomics, 28, 1155–1163.
Entralgo, P. L. (1969). Doctor and patient. London: Weidenfeld & Nicolson.
Foster, G. M., & Anderson, B. G. (1978). Medical anthropology. New York: Wiley.
Frankenhauser, M. (1991). The psychophysiology of workload, stress and health: Comparison between the sexes. Annals of Behavioral Medicine, 13, 197–204.
Gao, C., Lu, D., She, Q., Cai, R., Yang, L., & Zhang, G. (1990). The effects of VDT data entry work on operators. Ergonomics, 33, 917–924.
Hartley, L. R., Arnold, P. K., Smythe, G., & Hansen, J. (1994). Indicators of fatigue in truck drivers. Applied Ergonomics, 25, 143–156.
Hayasaka, T., Nakanishi, Y., & Yamaguchi, T. (1995). An interactively configurable virtual world system. IEICE Transactions on Communications, E78-B, 963–969.
Igarashi, H., Noritake, J., Furuta, N., Shindo, K., Yamazaki, K., Okamoto, K., Yoshida, A., & Yamaguchi, T. (1994). Is virtual reality a gentle technology for humans?—An experimental study of the safety features of a virtual reality system. IEICE Transactions on Information and Systems, E77-D, 1379–1384.
Kashiwagi, S., & Yamada, K. (1995). Evaluation of the Uchida–Kraepelin psychodiagnostic test based on the 5-factor model of personality traits. Japanese Journal of Psychology, 66, 24–32.
Kennedy, R. S., Lane, N. E., Lilienthal, M. G., Berbaum, K. S., & Hettinger, L. J. (1993). Profile analysis of simulator sickness symptoms: Application to virtual environment systems. Presence, 1, 295–301.
Kuraishi, S., Kato, M., & Tsujioka, B. (1957). Development of the "Uchida–Kraepelin Psychodiagnostic Test" in Japan. Psychologia, 1, 104–109.
Levy, L. (1972). Methodological considerations in psychoendocrine research. Acta Medica Scandinavica (Suppl. 528), 28–54.
Lundberg, U., Melin, B., Evans, G. W., & Holmberg, L. (1993). Physiological deactivation after two contrasting tasks at a video display terminal: Learning vs. repetitive data entry. Ergonomics, 36, 601–611.
Mitsutake, N., Hoshiai, K., Igarashi, K., Sugioka, Y., Yamamoto, Y., Yamazaki, K., Yoshida, A., & Yamaguchi, T. (1993). Open sesame from the top of your head—An event-related potential based interface for the control of a virtual reality system. Proceedings of the 2nd IEEE International Workshop on Robot and Human Communication (IEEE Catalog No. 93TH0577-7, pp. 292–295).
Mon-Williams, M., Wann, J. P., & Rushton, S. (1993). Binocular vision in a virtual world: Visual deficits following the wearing of a head-mounted display. Ophthalmic & Physiological Optics, 13, 387–391.
Pausch, R., Crea, T., & Conway, M. (1993). A literature survey for virtual environments: Military flight simulator visual systems and simulator sickness. Presence, 1, 344–363.
Rohmert, W. (1987). Physiological and psychological work load measurement and analysis. In Salvendy (Ed.), Handbook of human factors (pp. 348–368). New York: Wiley.
Satava, R. M., & Jones, S. M. (1998). Current and future applications of virtual reality for medicine. Proceedings of the IEEE, 86, 484–489.
Satava, R. M., & Jones, S. M. (2002). Medical applications of virtual environments. In K. M. Stanney (Ed.), Handbook of virtual environments: Design, implementation, and applications. Mahwah, NJ: Lawrence Erlbaum Associates.
Sheridan, T. B. (1993). My anxieties about virtual environments. Presence, 2, 141–142.
Smith, M. J. (1987). Occupational stress. In Salvendy (Ed.), Handbook of human factors (pp. 706–719). New York: Wiley.
Stone, V. E. (1993). Social interaction and social development in virtual environments. Presence, 2, 153–161.
Ujihara, H., Ogawa, K., Higashiyama, H., Murase, T., & Yamanaka, Y. (1992). Shinri Rinsho Daijiten [Encyclopedia of Psychology Clinic] (pp. 442–446). Tokyo: Baifukan.
Yamaguchi, T. (1997). Performance tests of a satellite-based asymmetric communication network for the "Hyper Hospital." Journal of Telemedicine and Telecare, 3, 78–82.
Yamaguchi, T., Furuta, N., Shindo, K., Hayasaka, T., Igarashi, H., Noritake, J., Yamazaki, K., & Yoshida, A. (1994). The Hyper Hospital—A networked reality-based medical care system. IEICE Transactions on Information and Systems, E77-D, 1372–1378.
Yoshida, A., Yamaguchi, T., & Yamazaki, K. (1994). Quantitative study of human behavior in virtual interview sessions for the development of the Hyper Hospital—A network-oriented virtual reality–based novel medical care system. IEICE Transactions on Information and Systems, E77-D, 1365–1371.
PART III: Adaptive Environments
20 Supporting the Adaptive Human Expert: A Critical Element in the Design of Meaning Processing Systems

John M. Flach*
Psychology Department, Wright State University

Cynthia O. Dominguez
Human Engineering Division, Armstrong Laboratory, Wright-Patterson Air Force Base

*Correspondence regarding this chapter should be directed to John Flach, Department of Psychology, Wright State University, Dayton, OH 45435.
Pilots and other operators live in a world of irreducible uncertainty, of "unknown unknowns," where they must deal with what was not known not to be known. (Rochlin, 1997, p. 215)

No plan survives contact with the enemy. (Chapman, 1987, pp. 91–92)
A complex system might be characterized as a system where even God doesn't know what will happen next. This is certainly not intended as a theological statement. Rather, God is used here as a metaphor for the designer—the "creator" of the system. The point is that the behavior of a complex system will sometimes surprise even the designer or creator of that system. This poses a difficult challenge, particularly for systems that operate in risky environments where unexpected behaviors may lead to catastrophic accidents (e.g., nuclear power plants, surgery, aviation, or combat). What can be done to reduce the probability of "normal accidents" with these complex systems (Perrow, 1984)? How can these systems be designed so that they can adapt to the demands of unanticipated situations?

This chapter discusses the problem of engineering adaptive cognitive systems. Cognitive systems engineering has evolved from studies of system safety in the nuclear industry (Rasmussen, 1986) and is gradually being extended to other complex domains (Rasmussen, Pejtersen, & Goodstein, 1994; Vicente, 1999). The chapter begins with a general introduction to the adaptive control problem. Next, the human operator is introduced as an important component in complex systems, capable of implementing a wide range of adaptive control strategies. The next two sections address the problem of meaning. The first considers the objective basis for meaning in work domains: the functional demands to which a system must adapt within a complex domain, how to identify those demands, and how to represent them to a human operator. The second considers the meaning-processing capacity of the human operator. The last major section considers how the human component can be tuned to the demands of the adaptive control task through training in simulated work environments.
THE GENERAL ADAPTIVE CONTROL PROBLEM

With a predictive model, and a verified theory, an organization can exercise "control"; it can make precise corrections by comparing feedback from its actions with the predictions of the model. If the organization's knowledge is sufficiently comprehensive, there will be very few unanticipated events around which it has to improvise. (Rochlin, 1997, pp. 189–190)
Are the conditions for control, outlined by Rochlin (i.e., “predictive model,” “verified theory,” and “knowledge [that] is sufficiently comprehensive”), ever satisfied for complex systems? Certainly, the conditions are not met adequately to satisfy the requirements for the simplest type of feedback control (i.e., the classical servo-mechanism or more sophisticated pursuit control system). A simple feedback control system will not function properly unless it is carefully tuned to the requirements of the control task. This fact is often not appreciated by those with only a passing knowledge of control systems. Servo-mechanisms must be tuned to the dynamics of the system being controlled and the bandwidth of the signal to be followed if they are to regulate performance in the face of disturbances. This is illustrated in the crossover model of human tracking, in which the human controller adjusts his control strategy to satisfy the global demands for stability
FIG. 20.1. Two types of control systems. The Pursuit Control System utilizes both feed-forward and feedback control loops to regulate performance in the face of disturbances. The Gain Scheduling System adds an adaptive loop in which different preset control algorithms can be selected as a function of a measured property of the output.
(e.g., see Flach, 1990). Utilizing knowledge of the plant to anticipate outputs can further enhance stability of control. This anticipation allows feed-forward control, in which the controller can respond directly to a command without the presence of any error resulting from feedback. Figure 20.1 illustrates a pursuit controller. Both feedback and feed-forward control must be carefully tuned to the demands of the control task to function properly. If the task demands change from those for which the controller was designed, then instability is a likely consequence. For example, an automatic control system that can optimally regulate flight at low altitudes may become unstable at higher altitudes due to changes in aircraft handling qualities.
Thus, the stability of a simple control system depends on the designer's ability to anticipate the task dynamics and design the control algorithms to fit the task context. If the task context changes in ways not anticipated by the designer, instability and catastrophic breakdown are likely.

Adaptive control systems have been developed in attempts to deal with some of the uncertainties of complex systems. Three general types of adaptive control systems are gain scheduling, model reference control, and self-tuning regulators.

Gain Scheduling

One of the first applications of gain scheduling was in the design of automatic flight-control systems. As noted above, the handling qualities of an aircraft can change as a function of altitude, so that a control algorithm (e.g., a gain) that leads to stable flight at lower altitudes may lead to an unstable response at higher altitudes. With gain scheduling, multiple control algorithms are stored in the automatic control system, and the correct algorithm is selected as a function of where the system is in the functional problem space (e.g., the altitude). This selection from a set of predetermined control algorithms is illustrated in Fig. 20.1 in terms of the table look-up function; a minimal sketch of the same logic appears below. Gain scheduling allows the system to function over a greater range of situations. However, it is no better at handling unanticipated variability than the simple servo-mechanism, because the designers must preset the algorithms and the criteria for choosing among them. Gain scheduling allows the system to adapt only to circumstances that the designers were able to anticipate.

Model Reference Control

A second style of adaptive control is model reference control, shown in Fig. 20.2. This adaptive control system incorporates a model of the expected or ideal transfer function (input–output response) of the complex system. This model, or simulation, is operated in parallel with the complex system to provide a reference against which the actual behavior of the complex system can be compared. Deviations between the behavior of the system and the predictions of the simulation can then be fed back to the controller, and adjustments can be made so that the errors between actual and desired behavior are reduced. In other words, the controller compensates for the deviations so that the controlled system behaves more like the reference model. For example, the low-altitude handling qualities of the aircraft might be included as the reference model. At higher altitudes, the adaptive logic adjusts the control parameters relative to deviations from this reference model in a way that results in stable control at the higher altitudes. This style of adaptation begins to address the problem of unanticipated variability; the system has the capability to adapt to situations that were not anticipated by the designer. However, the designer does have to begin with an ideal or normative model of system performance.
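To make the table look-up idea concrete, the following minimal Python sketch implements a gain-scheduled controller. The altitude breakpoints, the gain values, and the proportional-plus-derivative control law are all illustrative assumptions rather than a description of any actual flight-control system.

# A minimal sketch of gain scheduling: a proportional-plus-derivative control law
# whose gains are looked up from a preset table keyed by altitude.  The breakpoints
# and gain values are purely illustrative.

GAIN_TABLE = [
    # (upper altitude bound in feet, proportional gain, derivative gain)
    (10_000, 2.0, 0.8),
    (25_000, 1.4, 0.6),
    (float("inf"), 0.9, 0.4),
]

def scheduled_gains(altitude_ft):
    """Table look-up: select preset gains for the current region of the problem space."""
    for upper_bound, kp, kd in GAIN_TABLE:
        if altitude_ft <= upper_bound:
            return kp, kd

def control_command(error, error_rate, altitude_ft):
    """Compute a command with gains scheduled on the measured altitude."""
    kp, kd = scheduled_gains(altitude_ft)
    return kp * error + kd * error_rate

# The controller adapts, but only among conditions the designers anticipated.
print(control_command(error=1.0, error_rate=-0.2, altitude_ft=5_000))
print(control_command(error=1.0, error_rate=-0.2, altitude_ft=30_000))

The limitation noted above is visible in the sketch: every row of the table, and the rule for choosing among rows, had to be anticipated by the designer.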
FIG. 20.2. Two styles of adaptive control are shown. The model reference controller compares performance of the actual system to performance of a normative model. Deviations from the model can be used to adapt control so that the actual system behaves more like the normative model. The self-tuning regulator solves the dual control problem by continuously observing the plant and adjusting the control algorithm to reflect the observations.
Self-tuning Regulator

A third style of adaptive control, the self-tuning regulator, is also shown in Fig. 20.2. This control system is designed to solve the "dual control problem"; that is, it is both a regulator and an observer. An observer is a system for determining the transfer function of a system from observations of the inputs and outputs of that system.
For classical control systems, the observation problem is solved a priori in the design of the regulator. However, self-tuning regulators incorporate an observer as part of the control system. Thus, the input–output function of the controlled system is continuously observed, and the control algorithms are continuously updated to reflect any changes in the transfer function of the controlled system. In a real sense, then, the incorporation of the observer allows the controller to continuously redesign itself. The self-tuning regulator is an example of a self-organizing system that is capable of adapting to situations that were not anticipated by the designer. A normative model of system behavior is not required. However, it is important to note that the self-organizing property arises from a competition between the controller and the observer. The observer requires active stimulation of the system so that information is available for inferring the transfer function. However, this "information" (from the perspective of the observer) is "error" from the perspective of the controller. Weinberg and Weinberg (1979) describe this Fundamental Regulator Paradox in the context of driving:

The lesson is easiest to see in terms of an experience common to anyone who has ever driven on an icy road. The driver is trying to keep the car from skidding. To know how much steering is required, she must have some inkling of the road's slickness. But if she succeeds in completely preventing skids, she has no idea how slippery the road really is. Good drivers, experienced on icy roads, will intentionally test the steering from time to time by "jiggling" to cause a small amount of skidding. By this technique, they intentionally sacrifice the perfect regulation they know they cannot attain in any case. In return, they receive information that will enable them to do a more reliable, though less perfect job. (p. 251)
Thus, the designers of a self-tuning regulator must specify the parameters for this competition—what is the appropriate balance between information and error and how this balance is negotiated in real time. For example, a self-tuning regulator for an aircraft may constantly “dither” the control system (that is, stimulate it with high-frequency, low-amplitude inputs). The response to this input would be information that the observer could use to monitor changes in the handling qualities of the craft. The control algorithm would then be tuned in response to any observed changes. The dithering would have little consequence in terms of flight path error but would provide the information needed to adjust parameters of the controller in ways that would ensure stability across changing handling qualities. The dynamics of both the model reference controller and the self-tuning regulator are nonlinear and potentially chaotic. In other words, these adaptive controllers are themselves complex systems with the potential for behaving in ways that were not anticipated by the designer. A small change in the task parameters could lead to large (and potentially catastrophic) changes in the outcome.
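The dual control idea can be caricatured in a few lines of Python. The sketch below assumes a deliberately trivial plant whose gain is unknown and drifts; the estimation rule, the drift, and the dither amplitude are illustrative assumptions, intended only to show the observer and the regulator competing for the same input, as described above.

import math
import random

# Illustrative plant: y[k+1] = b * u[k], where the gain b is unknown to the
# controller and drifts slowly (an unanticipated change).  The observer estimates
# b from observed input-output pairs; the regulator rescales its command using
# the current estimate.  A small dither keeps the input informative for the
# observer, trading a little regulation error for identification information.

def run_self_tuning_regulator(steps=60, reference=1.0, dither_amplitude=0.02):
    b_true, b_estimate = 2.0, 1.0     # actual and estimated plant gain
    u_previous, outputs = 0.0, []
    for k in range(steps):
        b_true += 0.01                                     # slow, unannounced drift
        y = b_true * u_previous + random.gauss(0.0, 0.005)
        outputs.append(y)
        # Observer: normalized-gradient update of the gain estimate.
        if abs(u_previous) > 1e-6:
            b_estimate += 0.5 * u_previous * (y - b_estimate * u_previous) / (1e-6 + u_previous ** 2)
        # Regulator: command that would place the next output on the reference,
        # plus low-amplitude dither so the observer keeps receiving information.
        u_previous = reference / b_estimate + dither_amplitude * math.sin(0.7 * k)
    return outputs

print(run_self_tuning_regulator()[-3:])   # outputs settle near the reference despite the drift

Note how the dither trades a small amount of regulation error for the information the observer needs, which is the Fundamental Regulator Paradox in miniature.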
Each of the different styles of adaptive control requires different kinds of knowledge or understanding on the part of the designer. The gain scheduler requires that the designers can specify satisfactory control algorithms for each region of the problem space, that they can specify the boundary conditions for each control algorithm, and that the system can monitor position relative to the boundary conditions in real time in order to select the appropriate algorithm for each region. The model reference controller requires that the designers have normative expectations about how the system ought to behave and that they can measure deviations from those expectations and translate those deviations into appropriate control adjustments. The self-tuning regulator requires the designer to balance the information requirements for an observer with the control requirements so that the resulting nonlinear, self-organizing system achieves stable control. For many complex systems, designers do not have adequate knowledge for reliably implementing any of the adaptive control strategies presented above. However, there is a component that seems to be capable of combining the best features of each of the adaptive control strategies. This component is the expert human operator. The human operator provides perhaps the best solution for dealing with unanticipated variability in complex work domains.
THE HUMAN EXPERT: AN ADAPTIVE CONTROLLER

Indeed, many human factors analysts believe that minimizing human error is the primary goal of any human factors design. If people never made errors, there would be little need for a science of human factors. (Kantowitz & Sorkin, 1983)

The main reason why humans are retained in systems that are primarily controlled by intelligent computers is to handle "non-design" emergencies. In short, operators are there because system designers cannot foresee all possible scenarios of failure and hence are not able to provide automatic safety devices for every contingency. (Reason, 1990, p. 182)

The smart machine, as it turns out, requires smart people to operate it as well as to maintain and support it. (Rochlin, 1997, p. 146)
The first statement above represents the classical view of human factors. From this perspective, the human operator was considered to be a source of error. A primary goal of human factors design was to protect the system from this source of anticipated variability. Automation was one avenue to protect the system from the unreliability of human operators. For simple problems and simple systems, automatic controls can function far more reliably than human operators. However, as discussed in the previous section and indicated in the second quote, as problems and systems become more complex, the reliability of even sophisticated control systems becomes suspect due to the limited knowledge of the designers. Ironically, the addition of automatic control systems tends to push systems toward greater
complexity. As the third quote above emphasizes, automation has not necessarily decreased the demands on human intelligence. The bad news is that humans make errors. The good news is that humans learn from their errors. This capacity to learn from mistakes is what makes human experts our best hope for designing complex systems. It is somewhat ironic that the variability that was a concern for classical human factors is the source of the creativity and adaptability that has captured the imaginations of cognitive systems engineers. It is a classic case of Weinberg and Weinberg’s (1979) regulator paradox—what is error from the perspective of control is information from the perspective of observation. This is clearly illustrated in the following comments from a surgeon we interviewed in our studies of laparoscopic surgery (Dominguez, 1997): I would be trying to shoot an intraoperative cholangiogram before I’d go ahead and clip that but then again that’s just my own bias from my own previous experience from having a ductile injury. In that girl, [she] had a fairly acute, disease, wasn’t quite as bad looking as this but everything was fine until 5 days post-op when she came back to the office just still puking her guts out. And I’d just destroyed her hepatic duct, her common hepatic duct, because I hadn’t realized where we were and that was an error on my part and I had been fooled by the size of her cystic duct. The stone, it had been a good size stone, it had worked its way down chronically to the cystic duct enlarging it so that it looked like the infundibulum of the gallbladder and then at the common duct junction I thought the common duct was the cystic duct so I went ahead and clipped it, divided and then started cauterizing. Well when you cauterize up through there you’ve got the hepatic duct line right behind it and I eventually burned that part. If you talk to any other surgeons who’ve had that kind of an injury, I mean I lost sleep for several nights over that. It’s one of those things that haunt you and you hate it, you just hate it.
Cutting the common bile duct is a serious error with potentially catastrophic consequences (e.g., a possible need for liver transplant). The human is clearly fallible, but it is equally clear that this surgeon learned an important lesson and that his future patients will benefit greatly from the experience. In our field studies of expert surgeons in laparoscopic surgery, we found that human experts combined features from each style of adaptive control in dealing with uncertainty (Dominguez, 1996). Surgeons recognized situations that they had experienced in the past and were capable of retrieving and implementing control strategies that had worked in those situations (gain scheduling). For example, this surgeon explains why it would be dangerous to proceed using a minimally invasive procedure (using a laparoscope and long thin surgical instruments inserted through narrow ports) and why he would change to an open procedure (where he can directly manipulate the gall bladder): It's going to shred. The gallbladder wall is dying. You're going to find yourself flailing. You're going to pull on the gallbladder to give yourself exposure to the
cystic duct, and it’s going to tear . . . you have torn the gallbladder, you’ve exposed their belly to everything the gallbladder has in it, you increase their risk of abdominal infection, increase their risk of a wound infection.
Thus, based on the appearance of the gallbladder, this surgeon would, in effect, change control modes, switching from a minimally invasive control mode to a more traditional open-surgery control mode. Surgeons also expressed normative expectations about how the surgery should proceed (model reference adaptive control). For example, this surgeon described normative expectations about the time course of the surgery: I would continue to dissect in the area where they’re working right now, to be able to identify with certainty both the cystic and the common bile duct. If I could do that, then I would proceed. I would probably give myself a time limit . . . so I’d give myself another 5 or 10 minutes, and if it didn’t become immediately obvious, I’d open.
Thus, the surgeon has a normative model of the time course for a normal procedure. If the surgery is not proceeding according to expectations (there is a deviation from the normative model), then consideration will be given to changing the control mode from the preferred (but riskier) minimally invasive mode to the safer open mode of surgery. Finally, surgeons continuously acted to regulate the problem (i.e., remove the gall bladder), but also to explore the situation (self-tuning regulator). For example, the following passage from Dominguez (1997) describes how surgeons act as observers in solving the dual control problem: A surgeon might see a structure that looks like the cystic artery. He or she will tease away at the surrounding tissues, and will observe the structure while rotating it back and forth. What is seen and felt during this activity either supports or disconfirms an identification of the cystic artery, or compels the surgeon to continue dissecting to get a better look. (p. 21)
Figure 20.3 shows a schematic sketch of a cognitive control system that combines components from multiple styles of adaptive control. This cognitive control system can regulate with respect to error (feedback). It can anticipate outputs (feed forward). It has prestored strategies and tactics that have worked in the past (table look-up). It has normative expectations derived from previous experiences (model). Finally, it is capable of observing the system in real time to infer the dynamical properties (observer). All of these adaptive capabilities are resources that the human operator contributes to the solution of complex problems. Traditional human factors has seen the human as a half-empty glass—a source of undesirable variability. For the cognitive systems engineering approach, however, the glass is half full. That is, the human is viewed as a sophisticated adaptive control element that may offer a workable solution to problems associated with
FIG. 20.3. A cognitive control system, such as an expert human operator, combines multiple strategies for adapting control to the changing demands of a complex work domain.
unanticipated variability. Where classical approaches to design have focused on ways to remove the human from the control loop, cognitive systems engineering explores methods to more completely integrate or immerse the human in work activities. Note, however, that the glass is only half full. Human operators will not function as intelligent, adaptive controllers without extensive support. The challenge for designers is to provide the necessary support. In particular, the cognitive design problem is to construct the representations that will allow the human operators to function as expert adaptive controllers. These representations can be constructed through training and/or through interface design. In both cases, the goal is to enhance situation awareness; to support the “direct” pickup of meaning; and/or to bridge the gulfs of execution (control problem) and evaluation (observer problem) so that the human–machine system will function as a stable adaptive control system.
REPRESENTATION DESIGN

Without adequate information (and what is adequate depends on the context), neither pilots nor controllers can make uniformly wise decisions. Without correct and timely information, displayed in a way that minimizes operator cognitive effort, even the best pilots and controllers cannot remain constructively involved in an operation, and thus cannot maintain command of the situation. The designer must ask how he or she is affecting the processes involved in extracting meaning from the data or information provided. (Billings, 1997, p. 43, emphasis added)
Classically, problems of human performance in complex systems have been framed within the context of the information processing paradigm with its associated metaphor of a communication channel. Within this framework, the focus has been on the relation between the syntax of the representation (spatial layout, color, modality, format—graphical vs. alphanumeric, etc.) and the syntax of the response (e.g., speed and consistency). Although the syntax of responses would typically be identified with speed and accuracy, the term consistency is technically more accurate. For example, an information analysis cannot distinguish between a system that consistently responds yes when yes is appropriate and a system that consistently responds no when yes is appropriate. Information statistics only measure the consistency of input and output. There is no basis within the information paradigm to judge correctness in a way that reflects the appropriateness or correspondence of a response to the demands of a situation. Within the information processing framework, there is no way to distinguish between a system that is consistently "right" and a system that is consistently "wrong." The information processing (IP) framework ignores the problems of meaning. Within the IP framework, the definitions of "right" and "wrong" are arbitrary. This allows researchers to take a cavalier attitude toward their choice of research tasks. It has led to a paradigm of nonsense syllables and nonsense tasks, a paradigm that typically fails to generalize beyond the syntactical constraints of toy worlds to make contact with the real functional demands of actual systems.

As noted in the quote from Billings, the problem of representation design is first and foremost a problem of "extracting meaning." Wertheimer (1954) expressed this in terms of the "structural truths" of a problem. An effective representation is one that reveals the structural truths of the problem. Note that "meaning" and "structural truth" refer to properties of a problem, not to constructions in the "mind." For example, in laparoscopic surgery, the "structural truths" refer to the real anatomy and the functional role of those anatomical structures. Certain structures (e.g., the cystic duct) need to be incised in order to remove the gall bladder, and certain other structures must be left intact (e.g., the common bile duct) if the patient is to recover fully. These are not subject to interpretation. Similarly, in aviation, there is a limited range of approach angles that will allow a soft touchdown in landing. This range of angles is a structural truth of the work domain that arises from the aerodynamics of flight. These boundaries to safe performance are meaningful, independent of the interpretation of a pilot. In both examples, failure to respect the structural truths of the problem leads to real, negative consequences.

The examples of structural anatomy and glide slope are fairly straightforward from the point of view of interface design. That is, they represent stable properties of the work domains that can be anticipated fairly easily in the design stage. The only question is what is the best way (i.e., the syntax) to represent these meaningful properties so that they can be easily perceived in the dynamic work context. However, there are other "structural truths" of these problem spaces that are not so straightforward. Under what circumstances should the surgeon convert from
a laparoscopic procedure to an open surgical procedure? In uncertain weather conditions, should pilots attempt the scheduled landing or should they delay or divert to an alternative airport? Solving these problems also requires that the operators can perceive the structural truths of the problem. However, the particular confluence of constraints that are relevant is not so easy to anticipate at the design stage. These problems are complex. There is no simple normative path to represent. These are the kinds of problems that must be addressed in the representation if the human–machine system is to function adaptively in the complex work environment. Thus, the problem of meaning or structural truth is becoming increasingly difficult with advanced automation and growing system complexity. Reason (1990) describes this as the Catch-22 of supervisory control:

Human supervisory control was not conceived with humans in mind. It was a byproduct of the microchip revolution. Indeed, if a group of human factors specialists sat down with the malign intent of conceiving an activity that was wholly ill-matched to the strengths and weaknesses of human cognition, they might well have come up with something not altogether different from what is currently demanded of nuclear or chemical plant operators. (p. 183)
Despite this caution from Reason, domains such as aviation and medicine continue to migrate in the direction of increased dependence on microchips, and thus the gap between operators and meaning continues to grow. Rasmussen (e.g., 1986; Rasmussen, Pejtersen, & Goodstein, 1994) offers two constructs for helping designers and researchers to address this meaning gap: the abstraction/decomposition space and the ecological interface. The abstraction/decomposition space is a framework for addressing meaning within a work domain. A principal insight of this framework is that there are multiple layers of meaning in any work domain and that the appropriate level of decomposition (e.g., chunk size) for addressing meaning varies across levels of abstraction. At high levels of abstraction, meaning is best addressed in terms of global constraints (the general purposes of the work and the global constraints on activity, such as physical laws and resource limitations). At lower levels of abstraction, consideration must be given to more local constraints that reflect the functional flow, the allocation of function, and the specific activities required to satisfy a function. The lower levels of abstraction require a finer grain of analysis. Figure 20.4 illustrates the abstraction/decomposition space (see Rasmussen, 1986; Rasmussen et al., 1994, for detailed examples that illustrate the use of this space for uncovering meaning within work domains). The figure gives a thumbnail sketch for some of the constraints that might be considered at the various levels of abstraction for the work domains of surgery and aviation. Note that the critical information for the design of adaptive control systems lies on the diagonal of this space. That is, at high levels of abstraction global distinctions are most relevant, but at lower levels of abstraction increasingly detailed distinctions become important.
FIG. 20.4. The semantics of an adaptive control problem tend to fall along the diagonal of the abstraction/decomposition space.
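One way to see what such an analysis produces is to record it as data. The following Python sketch is a hypothetical, much-abbreviated abstraction/decomposition record for an aviation domain; the level names follow Rasmussen's abstraction hierarchy, but the specific entries and the grain assigned to each level are illustrative assumptions, not a reproduction of Fig. 20.4.

from collections import OrderedDict

# Level names follow Rasmussen's abstraction hierarchy; the aviation entries and
# the grain of decomposition assigned to each level are illustrative assumptions.

abstraction_decomposition = OrderedDict([
    # level of abstraction     (grain of decomposition, example constraints)
    ("functional purpose",   ("whole system", ["move passengers safely and on schedule"])),
    ("abstract function",    ("whole system", ["conservation of energy", "fuel and time budgets"])),
    ("generalized function", ("subsystems",   ["lift generation", "navigation", "communication"])),
    ("physical function",    ("components",   ["engine thrust limits", "control-surface authority"])),
    ("physical form",        ("parts",        ["cockpit layout", "location of the gear lever"])),
])

def walk_diagonal(space):
    """Walk the diagonal: coarse, global distinctions at high levels of abstraction,
    increasingly fine-grained distinctions at lower levels."""
    for level, (grain, constraints) in space.items():
        yield level, grain, constraints[0]

for level, grain, example in walk_diagonal(abstraction_decomposition):
    print(f"{level:>22} @ {grain:<12} e.g., {example}")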
The abstraction/decomposition space is a conceptual tool to help system designers wrestle with the problem of meaning. The construct of ecological interface (e.g., Rasmussen & Vicente, 1989) considers how designers can communicate their understanding of meaning to system operators. The goal of ecological interface design is to create a representation through which the operator can directly perceive and directly manipulate meaningful properties of the work domain. The general idea is that configural properties of the interface should map as directly as possible to meaning in the work space. Vicente (1992) developed the DURESS microworld to illustrate and test the power of explicitly designing a representation to reflect meaning in terms of the abstraction/decomposition space. The ecological interface to the DURESS microworld is a configural graphic display whose geometry maps
directly to the constraints identified by an abstraction/decomposition analysis of the domain of feedwater control. The geometric constraints allow direct perception of the constraints on meaning. Figure 20.5 provides a heuristic scheme for thinking about the design of ecological interfaces. The general idea is to map the multiple levels of meaning within a work domain (as described using an abstraction hierarchy) to nested geometric constraints within a configural display geometry. Within the abstraction hierarchy, global levels of abstraction provide the context necessary to appreciate the meaning of local actions or changes. Within the configural geometry, global symmetries provide the ground against which to appreciate local distinctions. The nesting of symmetries within the display geometry can then function as an analog to the contextual nesting of meaning reflected in the abstraction hierarchy. In this way, the geometry can reflect the "structural relations" that reflect "meaning" or "functional significance" within the work domain. Thus, the operator can see both
FIG. 20.5. A heuristic scheme for the design of configural graphic displays where the nested functional constraints (as characterized within an abstraction hierarchy) are mapped to nested geometrical constraints to provide a window so that meaningful properties of the work domain can be directly perceived and directly manipulated.
the reasons why and the procedures (how to perform activities), and both can be seen in the context of the relational constraints that shape the work space.

How does the ecological interface address the problem of unanticipated variability? Certainly, the structure of the interface is constrained by the designers' understanding of the work domain. This understanding is never complete, so there is always a danger that meaningful properties of the work domain will not be represented within the interface. In this sense, the ecological interface is no less constrained than adaptive control automation. However, there is an important difference. With adaptive controls, the designers' understanding is encapsulated within the automated system. This understanding is not explicit to the operator. The point of an ecological interface is to make the designer's understanding of the work domain directly available to the operator. As Billings (1997) advises, "automation must never be permitted to perform or fail silently" (p. 249). Thus, the operator can stand on the shoulders of the designer when faced with novel problems or situations.

The ecological interface provides a platform for the operator to participate in the design of the system. The hope is that this platform will enable the operator to extend the design—to discover creative solutions to novel problems. However, it does not guarantee solutions. The process of discovery and hypothesis testing must also be supported at the interface. This reflects the need for "direct manipulation" within an ecological interface. The concept of direct manipulation was introduced by Shneiderman (1983). A good example of the implementation of direct manipulation in the context of ecological interfaces is the Bookhouse (Goodstein & Pejtersen, 1989; Pejtersen, 1980). The Bookhouse is an interface to support a search for interesting books in the context of a public library. Unlike DURESS, the lawful constraints on what determines whether a book will be interesting or satisfying for a patron are not well understood. However, librarians do have strategies that can be helpful in assisting patrons in discovering good books. This interface was designed to enable the search strategies that expert librarians used when assisting patrons. The Bookhouse interface is designed so that manipulation of icons at the interface allows the patrons to function like expert librarians. Some of the manipulations that may be important to the design process are included in Fig. 20.5. These include the ability to test general assumptions and global laws; the ability to evaluate causal networks (e.g., to test for unanticipated side effects of a potential solution); and the ability to evaluate causal chains (e.g., to see links between elemental actions and progress toward specific goals).

Thus, the ecological interface should support both "direct perception" and "direct manipulation." The operator should be able to directly perceive what the designers know (at multiple levels of abstraction and decomposition). Further, they should be able to manipulate data and test hypotheses using intelligent strategies. The hope is that this type of support will allow operators to go beyond the understanding of the designers and will allow adaptation to situations that the designers couldn't have anticipated.
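As a toy illustration of the direct-perception side of this argument, the following Python sketch maps two low-level process variables onto a single geometric form so that a higher-order constraint appears as an emergent feature of the geometry. The variable names and the mapping are hypothetical; this is the general configural-display idea, not the DURESS interface.

# Two low-level process variables (an inflow and an outflow) are mapped onto one
# geometric form so that a higher-order constraint, the mass balance, appears as
# an emergent feature: a level top edge.  Names and mapping are illustrative.

def configural_bar(inflow, outflow, width=10.0):
    """Return the corner points of a bar whose top edge tilts with any imbalance."""
    corners = [(0.0, 0.0), (width, 0.0), (width, outflow), (0.0, inflow)]
    tilt = inflow - outflow       # emergent feature: zero means the balance constraint holds
    return corners, tilt

corners, tilt = configural_bar(inflow=4.0, outflow=3.2)
print("top-edge tilt:", tilt)     # a visibly tilted edge reads directly as "filling faster than draining"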
A MEANING-PROCESSING SYSTEM

Meaningful interaction with an environment depends upon the existence of a set of invariate constraints in the relationships among events in the environment and between human actions and their effects . . . purposive human behavior must be based on an internal representation of these constraints. The constraints can be defined and represented in various different ways, which in turn can serve to characterize the different categories of human behavior. (Rasmussen, 1986, p. 100)
An ecological interface provides an external representation for supporting the human in adapting to novel events. However, it is not a silver bullet. A complementary source of support is the internal representation (dynamic world model; Rasmussen, 1986) that the operator brings to the task. This internal representation develops as a result of experience and training. The discussion of meaning in the previous section applies equally to internal and external representations. An internal representation will be effective to the extent that it reflects meaningful properties of the work domain. It is interesting to note that the verbal protocols of problem solving by domain experts often recapitulate the diagonal structure of the abstraction/decomposition space illustrated in Fig. 20.4 (e.g., Rasmussen, 1986; Rasmussen et al., 1994). Duncker (1945) described this process as "progressive deepening," in which people searched for solutions to complex problems by working from abstract functional goals toward concrete solutions and iterating back to higher levels of abstraction when a path did not lead to a concrete solution.

The abstraction hierarchy/decomposition space is a framework for evaluating the "constraints in the relationship among events in the environment" that are referred to in the quote from Rasmussen (1986) that introduces this section. Figure 20.6 (from Flach & Rasmussen, 2000) is an attempt to illustrate different ways in which these constraints can be represented (signals, signs, and symbols) and the different categories of behavior that result (skill based, rule based, or knowledge based; see Rasmussen, 1986). It is a representation that Flach and Rasmussen (2000) believe will help to focus attention on the crux of the meaning-processing problem faced by human operators in complex sociotechnical systems. This representation includes traditional information processing stages. However, the general layout of the processing stages has been reorganized to emphasize the circular coupling between perception and action. This circular arrangement emphasizes the closed-loop nature of human processing, which plays a significant role in determining the dynamics of behavior in most natural environments. Because of the closed-loop structure, stimulus and response are not separate, distinct events, but are intimately coupled. The relation between stimulus and response is not a linear cause (stimulus)–effect (response) dynamic. Stimuli and responses both contribute to the set of interlocking constraints that combine to bound the performance space.
FIG. 20.6. A representation of the multiple paths through a meaning processing system. Different paths reflect qualitatively different types of processing (skill based, rule based, and knowledge based) that reflect different types of information support (signals, signs, and symbols, from Flach & Rasmussen, 2000).
A second important feature of the representation in Fig. 20.6 is the set of arrows radiating from the central “dynamic world model.” The dynamic world model represents the largely tacit knowledge that experts utilize to discriminate, anticipate, integrate, and synchronize with events in their domains of expertise. It reflects internalization or attunement to the invariant properties of the task environment (see Rasmussen, 1986). This attunement might be characterized as an “internal model,” but it should not be thought of as an independent module in long-term memory. Rather, it reflects a tuning of the distributed network to the consistent properties or constraints of a particular task environment, as might be characterized by Runeson’s (1977) concept of “smart mechanism.” The radiating arrows indicate that every stage of processing can be linked to every other stage of processing. Thus, there is no fixed precedence relationship
among the processing stages. Rather, the meaning processing system is an adaptive system capable of reorganizing and coordinating the flow of information to reflect the constraints and resources within the task environment. The radiating arrows symbolize potential links between all nodes in the system. The capacity to shunt from one pathway to another means that processing is not ultimately limited by the capacity of any particular channel. However, the ability to utilize a particular pathway will depend on the sources of regularity and constraint within the task space.

Skill-based interactions represent the most direct coupling between stimulus and response. This type of coupling is seen in the smooth, effortless performance of experts. Skill-based interactions are made possible when the regularities within the task environment are associated with invariant space–time properties of a representational medium that Rasmussen referred to as signals. One of the goals of ecological interface design is to create an interface geometry that can function as a signal over a wide range of situations. That is, space–time properties of the geometry (e.g., patterns in space and time) should directly map to meaningful distinctions within the work domain. Two conditions seem to be necessary before skill-based interactions are possible. First, there must be constraints or regularity (lawfulness) within the task environment (what Shiffrin & Schneider, 1977, called consistent mappings or what Gibson, 1966, called invariants). Consistency appears to be a prerequisite for the development of skill-based processing because extended practice in inconsistent environments (variable mapping) does not result in increased processing efficiency (Schneider & Shiffrin, 1977). The second condition is practice, practice, practice. Because skill-based processing relies heavily on "tacit" or "mindless" knowledge, instructional practices directed at intellectual understanding do not seem to have an impact at this level. There seems to be no shortcut. Hayes (1985) found, across a number of fields, that no one reached the highest levels of achievement without a minimum of 10 years of practice. Thus, one of the goals in designing training programs is to create opportunities for practice. This is one of the attractive features of simulators—they offer opportunities for more intensive practice. For example, in a simulator, a reset option that instantly places the pilot on the approach path to a runway can allow the pilot to practice many more landings in a fixed amount of time than would be possible in an actual aircraft that required him to take off and maneuver into landing position prior to each landing attempt.

Rule-based processing, like skill-based processing, also depends on consistency. However, for rule-based processing, the consistency is generally established through convention rather than through the space–time properties of a feedback signal. Rasmussen (1986) referred to these conventions as signs. Language and text can often function as signs. The icons used in graphical displays are also examples of signs. The space–time properties of alphanumerics and icons do not correspond to dynamic constraints. However, through consistent association, signs can elicit "automatic"-type behaviors in which the sign is a direct trigger for an action. When
consistencies are present in the work environment (either as space–time properties of signals or as conventions associated with signs), meaning processing systems will utilize those consistencies to increase the efficiency of cognitive processing. Consistency allows the meaning-processing system to coordinate perception and action without direction from higher, more computationally intensive processes.

Knowledge-based processing reflects the least direct path through the meaning-processing system. For knowledge-based processing, the stimuli function as symbols that must be interpreted. In other words, meaning in terms of space–time constraints or conventions is not obvious; it must be derived or figured out. This path represents what is typically referred to as thinking, problem solving, or, more generally, information processing. For a novice (someone operating in unfamiliar territory like a sterile laboratory), this is the pathway that will be most utilized. This pathway represents the laborious process of evaluating and comparing alternatives in order to choose the appropriate action. Experts rely on this pathway less often. They can bypass the computational demands of knowledge-based processing because of their attunement to task constraints that allow the use of more direct skill-based or rule-based pathways. However, when unexpected or nonroutine events occur that go beyond the normal task constraints (invariants or rules), even experts will be forced to rely on knowledge-based processing to plan and carry out systematic experiments to discover the relevant task constraints. These experiments can be implemented as exploratory actions in which the observer interacts directly with the space–time constraints in the task environment (for example, pumping the brakes of an automobile to discover the changing dynamics due to a wet surface) or as mental simulations (e.g., Klein, 1993), in which the observer engages with the internalization of task constraints embodied in the dynamic world model to visualize potential outcomes (for example, mentally rehearsing a gymnastics routine prior to mounting the apparatus). Experts often rely on knowledge-based processing in the planning stages prior to entering dynamic high-risk environments like tactical aviation (Amalberti & Deblon, 1992) or surgery (Xiao, 1994). In these planning stages, the experts will mentally simulate the task to identify potential problems and difficulties. As a result of these mental simulations, the experts can often structure the task to avoid situations that might require knowledge-based interactions. For example, an anesthesiologist might arrange the syringes in the order of their use and then tag and return empty syringes to their original position so that the spatial layout becomes an explicit reminder of the activity sequence. The next action is specified by the spatial arrangement. As Amalberti and Deblon (1992) observe, "When a pilot takes off, a large set of potential problems has already been solved" (p. 649).

Thus, three qualitatively different levels of processing are represented in Fig. 20.6. Skill-based processing represents loops that directly couple stimuli and responses. These direct couplings are made possible by tacit knowledge of the space–time constraints of the task. Utilization of these pathways places little
demand on "higher" level computational processes. Rule-based processing represents shunts across intermediate levels within the meaning processing system that utilize consistencies established by convention. The knowledge (much of which is tacit) of the space–time and conventional constraints that allows these shortcuts across the meaning processing network is represented in the diagram as the dynamic world model. Again, this is not meant to imply a localized "memory" or "program." Rather, it represents the distributed attunement of a computational network to the constraints and regularities of a particular task. Knowledge-based processing represents the computationally intense process of explicitly comparing and evaluating alternatives to choose actions that are most appropriate for accomplishing the functional goals of the system.
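The three routes can be caricatured as three functions of very different computational cost. In the following Python sketch the cue categories, the stored rules, and the evaluation function are all illustrative assumptions; the point is only that the same system can route a decision through a cheaper or a costlier path depending on the regularities available to it.

# Skill-, rule-, and knowledge-based routes caricatured as three functions of
# very different computational cost.  Cues, rules, and evaluations are illustrative.

def skill_based(signal):
    """Signals: space-time properties map directly onto action (cheap, continuous)."""
    return -0.5 * signal                          # e.g., a compensatory tracking correction

RULES = {"low fuel": "divert to alternate", "gear unsafe": "go around"}

def rule_based(sign):
    """Signs: a consistent convention triggers a stored action (cheap, if the rule exists)."""
    return RULES.get(sign)

def knowledge_based(symbol, options, evaluate):
    """Symbols: no ready mapping, so alternatives must be generated and compared (costly)."""
    return max(options, key=lambda option: evaluate(symbol, option))

print(skill_based(0.8))
print(rule_based("low fuel"))
print(knowledge_based("unfamiliar vibration",
                      ["continue", "reduce power", "land early"],
                      lambda symbol, option: {"continue": 0.1, "reduce power": 0.6, "land early": 0.8}[option]))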
DESIGNING EXPERTS

The expertise built up over years of learning and practice is not easy to encode within the narrow confines of even a very capable computer. (Billings, 1997, p. 16)

If you can't see what you gotta know, you gotta know what you gotta know. (T. Demothenes, personal communication, November 1994, cited in Billings, 1997, p. 32)
The goal of training is to facilitate skill development. In the context of the discussions above, this means that the goal is to facilitate the discovery and attunement to the meaningful aspects of a situation. The goal is to shape a student's awareness so that she or he recognizes the significance of events relative to the functional demands of a specific work domain. This section will focus on the use of simulators to accelerate the development of expertise. When simulators are used for training, one of the first considerations is fidelity. That is, to what degree do the simulated situations reflect the meanings within the work domain to which the training is directed?

Fidelity

Fidelity refers to the degree of correspondence between events in the simulator and events in the target work domain. For example, to what extent do the displays and controls in the simulator look and feel like the controls in the actual aircraft; to what extent does the training simulator respond to control inputs like the actual aircraft; and to what extent do the task scenarios used in training reflect real situations that a pilot will face in the work domain? A mundane approach to fidelity would argue that anything less than an exact match between the simulator and the work environment will limit the value of the training experience. From this perspective, the best place to develop skill is on the job—in the actual target work domain.
Such a philosophy is probably wrong; at the very least, it is impractical. The goal, however, is to help the operator discover meaning in the work domain. In terms of Rasmussen's (1986) framework, the goal is to help the operator discover and attune to the signals, signs, and symbols of the work domain. If there is little or no correspondence between the signals, signs, and symbols in the simulator and those in the work domain, then little transfer of skill can be expected.

It might be useful to consider a concrete example. In visual flight simulators, there is typically a trade-off between the degree of pictorial realism and the time delay. More pictorial realism (many objects and textured surfaces on the objects) generally results in greater time delays. What is the best compromise? For high-performance aircraft, the dynamic response of the aircraft is generally considered to be a significant aspect of the work domain. Thus, great efforts are made to minimize time delays and to obtain the best possible match to the dynamic response of the aircraft. However, this balance depends on what aspect of the job is being trained. For training visual search strategies to locate and identify potential "bogeys," high-fidelity visual displays may take precedence over the dynamic response of the aircraft. Perfect fidelity between simulators and work domains is an impractical ideal. Thus, decisions must be made about which meanings are significant relative to the functional goals in the actual work environment and with respect to the training objectives. Also, strong arguments can be made that reductions in fidelity, if done wisely, can actually facilitate the skill development process. As a simple example, eliminating the risk associated with many environments (e.g., crashing an aircraft or losing a patient) can greatly reduce training costs and can increase the probability that the students will survive long enough to learn from their mistakes.

Simplification

Work domains like combat aviation and surgery can be incredibly complex. This complexity can be an obstacle to the discovery or attunement process. In these cases, there can be benefits to breaking down a work domain into more manageable chunks. Strategies for simplification include part-task training and aiding (unburdening). Part-task training has a mixed history (see Flach, Lintern, & Larish, 1990; Wightman & Lintern, 1985) with respect to empirical evaluations. This is not surprising; as with anything else, there are right and wrong ways to apply a strategy. The success of part-task training depends on whether the part-task preserves meaningful "chunks" of the work domain. For example, Naylor (Briggs & Naylor, 1962; Naylor, 1962; Naylor & Briggs, 1963) concluded that task organization was a critical variable for determining whether a part-task training strategy would be successful.
Naylor (1962) concludes that "for tasks of low organization, i.e., independent components, there is evidence that part practice on the least proficient components results in the greatest improvement. For tasks of high organization, i.e., interacting components, whole task practice is necessary" (p. 21).

One part-task strategy that has been used with some success is backward chaining. Flach, Lintern, and Larish (1990) discuss this:

In a review of part-task training for tracking and manual control, Wightman and Lintern (1985) noted that one of the most effective of all part-task techniques was that of backward chaining. In this part-task method terminal portions of the task are practiced first with preceding segments added successively until practice on the whole task was accomplished. Bailey, Hughes, and Jones (1980) demonstrated this to be more effective than whole task training for teaching a 30° dive bomb maneuver. Wightman and Sistrunk (1987) found similar results when the same technique was used to teach straight-in approaches for landing on aircraft carriers. The effectiveness of this method can be accounted for within our framework. The relevance of stimulus structure is defined by the task goals. In the flight tasks for which backward chaining was effective, the goal of a correct dive or safe touchdown was far removed from initial portions of the task. For this reason, it may be difficult for subjects to evaluate early segments of the task with respect to this goal. For example, it may be difficult for the pilot to evaluate his downwind leg in terms of a task whose objectives are defined in terms of the final approach. In the backward chaining method the initiation point of end segments becomes the goals for early segments. These goals provide better metrics for evaluating performance and the functionality of the information on which it is based. Also, in the whole task method errors in early segments concatenate. This can make it difficult to evaluate the functionality of information in the terminal segments. (p. 347)
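The scheduling logic of backward chaining is simple enough to state directly. The following Python sketch assumes a task that decomposes into ordered segments; the landing segments named here are illustrative.

# Terminal segments are practiced first; preceding segments are added until the
# whole task is practiced, so the start of each mastered segment becomes the goal
# for the segment added in front of it.  The landing segments are illustrative.

def backward_chaining_schedule(segments):
    """Return successive practice blocks, each extending one segment further back."""
    return [segments[-k:] for k in range(1, len(segments) + 1)]

landing_segments = ["downwind leg", "base turn", "final approach", "flare and touchdown"]
for block in backward_chaining_schedule(landing_segments):
    print(" -> ".join(block))

Each practice block begins at the point where the previously mastered segment starts, so the goal of every newly added segment is always close at hand.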
Another important consideration for part training is the capability of the simulator. Because of limitations in simulator capabilities, a high degree of fidelity with regard to parts of the work domain may be possible where high fidelity with respect to the whole domain is impossible. The bottom line is that part-task training can be an effective strategy for developing skills for complex tasks. However, the “parts” must be chosen in a way that preserves meaningful properties of the situation. Aiding (unburdening) is another strategy for dealing with complexity. With aiding, the whole task is simulated; however, the student is only given responsibility for parts of the task. Other parts of the task can be automated or carried out by an instructor. For example, a flight instructor might unburden the student during early landing approaches by taking care of communications so that the student can concentrate on controlling the aircraft. Unburdening allows the student to focus attention on part of the task, but also allows the student to experience that task in the context of the whole. The same concerns expressed for part-task training apply for aiding. The parsing of the task should reflect meaningful chunks with respect to functions in the target domain.
Augmentation

Simplification deals with task complexity by removing parts of the task. Augmentation, on the other hand, deals with task complexity by adding information in the simulator that is not typically present in the natural work domain. For example, when learning to land, it is often difficult for novices to recognize where the appropriate glide slope is by simply looking out the window. Thus, flight instructors might augment the natural cues with verbal inputs telling the student when they are too high or too low. Lintern (1980) used visual augmentations in the form of guideposts to create a visual highway in the sky showing the correct glide slope to student pilots in a simulator. This added information can greatly simplify the task of landing, and performance in the simulator increases dramatically. However, will that increased performance translate to better landings in the natural work domain where the augmentation is not available? Lintern and Roscoe's (1980) review of augmented training suggests that the answer depends on whether the augmentation draws attention to natural sources of regularity (functional aspects of the natural domain) or whether the augmentation becomes a substitute for natural sources of regularity. In the latter case, the students become dependent on the augmentation. When this crutch is removed, the gains seen in training disappear and performance collapses.

One way to reduce the problem of dependence on the augmentation is to provide supplementary cues only when the student is "off course." Lintern and Roscoe (1980) explain that off-course augmentation "assures that the student does not establish dependence on the augmented cues; as learning occurs, the augmented feedback goes away and only the intrinsic cues remain" (p. 234). On-course feedback might also be useful if the supplemental cues draw attention to intrinsic structures that reflect properties of the natural work domain. Off-course feedback is naturally adaptive—that is, as the student becomes more skilled the augmentation comes on less and less. On-course feedback can also be adapted to the skill level of the student. Early in practice, the natural information might be enhanced so that it is more salient to the student. With practice, these enhancements can be gradually removed so that the student does not become dependent on them. In general, augmentations will facilitate skill development to the extent that they draw attention to or increase exposure to the meaningful properties of the target domain.
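The off-course logic can be sketched as follows. This hypothetical Python augmenter issues a supplementary cue only when the deviation from the glide slope exceeds a tolerance, and it relaxes that tolerance as the student stays on course, so the augmentation fades rather than becoming a crutch. The parameter values and cue wording are illustrative assumptions, not Lintern and Roscoe's implementation.

# A supplementary cue is issued only when the student's glide-slope deviation
# exceeds a tolerance, and the tolerance widens as on-course performance
# accumulates, so the augmentation fades rather than becoming a crutch.
# Tolerance values, widening rate, and cue wording are illustrative.

class OffCourseAugmenter:
    def __init__(self, tolerance=0.5, widen_factor=1.05):
        self.tolerance = tolerance
        self.widen_factor = widen_factor

    def cue(self, glide_slope_error):
        """Return a corrective cue only when the intrinsic cues were not enough."""
        if abs(glide_slope_error) <= self.tolerance:
            self.tolerance *= self.widen_factor   # on course: relax, let intrinsic cues carry the load
            return None
        return "too high: steepen the approach" if glide_slope_error > 0 else "too low: add power"

augmenter = OffCourseAugmenter()
for error in [0.9, 0.3, -0.2, 0.7, 0.1]:
    print(augmenter.cue(error))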
(even when they have no functionality with regard to the situation). One great benefit of simulators is to remove the risk inherent in work domains like aviation and surgery. This has two potential benefits. First, the tension and tunnel vision associated with high stress are reduced. Second, the catastrophic consequences of mistakes are eliminated, giving students the opportunity to explore freely. For example, very meaningful properties of work domains are critical margins that define the edge of the performance envelope, like the stall margin. The process of discovering this boundary and its properties can be greatly facilitated if the student can cross it (and live to learn from the experience). Another example is learning to drive in snow and ice and to control skids. Developing the necessary skills can be greatly facilitated if drivers can practice in an open parking lot, where they can experiment at the edge of the envelope without catastrophic consequences.

Another aspect of stress is associated with the difficulty of the task (independent of the consequences of failure). Difficulty might include the frequency, density, or complexity of events. Adapting the difficulty to the skill of the student can often facilitate skill development. For example, begin teaching a driver in an open space with few constraints, and later move to a low-traffic residential area. Then, gradually build until the student can handle high-density highway traffic situations. Geis and colleagues (1994) suggest that when learning laparoscopic techniques, surgeons should begin with easier procedures before advancing to more complex situations. For example, they recommend, "before performing laparoscopic right colon resection, the surgeon should perform many laparoscopic appendectomies" (p. 211). Simulators can greatly facilitate adaptation of task difficulty to skills; they allow both manipulation of difficulty and measurement of performance skills. Thus, measures of skills can be logically linked to task manipulations so that difficulty can be progressively increased to reflect changing skills.

Training can also be used to exaggerate stress (in terms of either perceived risk or difficulty). This is sometimes referred to as the "heavy bat" approach to training. For example, surgeons have reported that they will sometimes exaggerate the danger of a situation to "test" the coping abilities of a resident whom they are training. By creating false stress, they are preparing the student to deal with real stress that might be associated with later cases. Simulated wind gusts or high levels of traffic in the terminal area might exaggerate the difficulty of simulated landings. These manipulations may help tone the "muscles" so that students are "overtrained." Because it is difficult to simulate the stresses associated with risk in a simulator, exaggerating difficulty levels may be one way to prepare the student for high risk.

Exceptional Cases

A final consideration for training strategies involves exceptional cases or unanticipated variability. The human is increasingly being relied on to deal with "messy problems" or "rare events" not anticipated by the designers of the automatic control
systems. Skill depends heavily on practice. However, because of their nature, it is difficult to practice rare events in the natural environment. Therefore, a simulator can be a very important tool for preparing humans to deal with rare events. This is particularly important for systems such as nuclear power plants, where you hope that emergencies happen rarely, but you also hope that the operators will be prepared to respond skillfully to those emergencies.

The question of exceptional cases or rare events raises a Catch-22 for training program designers. How can you anticipate what "rare" events will occur? How do you predict unanticipated variability? One approach to this problem is the idea of "free play." That is, rather than giving students a specific task to accomplish in the simulator, the students are told to play with the simulator: explore the edges of the envelope, perform experiments, be creative. This kind of exercise can help students to better tune to the signals and signs that are associated with critical regions of the control space (skill- and rule-based processing). Free play also has potential benefits for knowledge-based processing. First, free play might help the students to anticipate difficulties and plan ahead so that these difficulties are avoided. Second, free play might help students to develop higher level strategies and tactics for experimentation and hypothesis testing that will enhance their performance in later problem-solving situations. However, for these exercises to be representative of the actual work domain, high fidelity will be important.

In sum, skill depends on a correspondence between situation constraints and awareness. To the extent that situation constraints are preserved in the training context (e.g., simulator), practice in that context should enhance the correspondence between the student's awareness and the situation. Students can discover the consistencies (signals and signs) that allow skill- and rule-based interactions. They can also discover the boundary conditions where skill- and rule-based control will lead to instability. They can also exercise problem-solving skills in the context of the domain semantics. This exercise can help students to avoid conditions where knowledge-based processing would be required and may prepare them to be more effective problem solvers when they do get into situations that require knowledge-based interactions.

CONCLUSION

Science finds, industry applies, man conforms. (motto of the 1933 Chicago World's Fair, cited by Norman, 1993; Billings, 1997, p. 52)
People propose, science studies, technology conforms. (Norman, 1993, p. 253; Billings, 1997, p. 53)
People adapt, science adapts, technology adapts. (Billings, 1997, p. 52)
Unanticipated variability is a property of any complex system. If the complex system lives in a high-risk environment, then this unanticipated variability can lead
to catastrophic failures (i.e., normal accidents). The probability of these failures can be reduced (but not to zero) if the system is capable of adapting to the changing task demands created by unanticipated variability. As implied in the quote from Billings, adaptation is essential to long-term stability.

The first section of this chapter gives an overview of the problem of unanticipated variability and introduces several different types of adaptive control systems for dealing with this unanticipated variability. The next section describes the human operator as a preferred solution to the adaptive control problem. Figure 20.3 illustrates how a cognitive system can incorporate multiple adaptive control strategies. The third section presents adaptation as a meaning-processing problem. The abstraction/decomposition space is introduced as a framework for thinking about the nested constraints that determine meaningful properties of specific work domains. Also, the construct of ecological interface is introduced as a means for representing that meaning in a form that can be directly perceived and manipulated by operators. Next, a conceptual model for the internal constraints of the human meaning-processing system is presented (Fig. 20.6). Finally, recommendations are made for how simulators can be used to facilitate the development of expertise.

It should be clear that there is no simple recipe for designing systems that can adapt to the unexpected. This is a "messy" design problem, and it requires an iterative process of trial and error. No framework can guarantee a successful design solution. The abstraction hierarchy/decomposition framework for characterizing the meaningful properties of a work domain and the skills–rules–knowledge framework for characterizing the behavior of skilled operators have evolved from a long struggle with the problems of complexity associated with nuclear power, and they are gradually being extended into other work domains (Rasmussen et al., 1994). For now, these frameworks are our best guides for thinking about the problems of adaptive control and for designing systems that will be generally stable, even in situations that God might not be able to anticipate.
ACKNOWLEDGMENTS

John Flach was supported by grants from the Japan Atomic Energy Research Institute and from the Air Force Office of Scientific Research during preparation of this manuscript. Positions presented in this paper are the opinions of the authors alone. They do not reflect official positions of any of the supporting organizations.
REFERENCES

Amalberti, R., & Deblon, F. (1992). Cognitive modeling of fighter aircraft process control: A step towards an intelligent on-board assistance system. International Journal of Man–Machine Studies, 36, 639–671.
Bailey, J. S., Hughes, R. G., & Jones, W. E. (1980). Application of backward chaining to air-to-surface weapons delivery training (AFHRL-TR 79-63). Williams Air Force Base, AZ: Air Force Human Resources Laboratory.
Billings, C. E. (1997). Aviation automation: The search for a human-centered approach. Mahwah, NJ: Lawrence Erlbaum Associates.
Briggs, G. E., & Naylor, J. C. (1962). The relative efficiency of several training methods as a function of transfer task complexity. Journal of Experimental Psychology, 64, 505–512.
Chapman, G. (1987). The new generation of high-technology weapons. In D. Bellin & G. Chapman (Eds.), Computers in battle: Will they work? (pp. 61–100). New York: Harcourt Brace Jovanovich.
Dominguez, C. (1997). First, do no harm: Expertise and metacognition in laparoscopic surgery. Unpublished doctoral dissertation, Wright State University, Dayton, OH.
Duncker, K. (1945). On problem solving. Psychological Monographs, 58, 1–113.
Flach, J. M. (1990). Control with an eye for perception: Precursors to an active psychophysics. Ecological Psychology, 2, 83–111.
Flach, J. M., Lintern, G., & Larish, J. F. (1990). Perceptual motor skill: A theoretical framework. In R. Warren & A. H. Wertheim (Eds.), Perception and control of self-motion (pp. 327–355). Hillsdale, NJ: Lawrence Erlbaum Associates.
Flach, J. M., & Rasmussen, J. (2000). Cognitive engineering: Designing for situation awareness. In N. Sarter & R. Amalberti (Eds.), Cognitive engineering in the aviation domain (pp. 153–179). Mahwah, NJ: Lawrence Erlbaum Associates.
Geis, W. P., Coletta, A. V., Verdeja, J. C., P. G., Ojogho, O., & Jacobs, O. (1994). Sequential psychomotor skills development in laparoscopic colon surgery. Archives of Surgery, 129, 206–212.
Gibson, J. J. (1966). The senses considered as perceptual systems. Boston: Houghton Mifflin.
Gibson, J. J. (1979). The ecological approach to visual perception. Boston: Houghton Mifflin.
Goodstein, L. P., & Pejtersen, A. M. (1989). The BOOKHOUSE: System functionality and evaluation. Roskilde, Denmark: Risø National Laboratory.
Hayes, J. R. (1985). Three problems in teaching general skills. In J. Segal, S. Chipman, & R. Glaser (Eds.), Thinking and learning (Vol. 2, pp. 391–405). Mahwah, NJ: Lawrence Erlbaum Associates.
Hutchins, E. (1995). Cognition in the wild. Cambridge, MA: MIT Press.
Klein, G. A. (1993). A recognition-primed decision (RPD) model of rapid decision making. In G. A. Klein, J. Orasanu, & C. E. Zsambok (Eds.), Decision making in action: Models and methods (pp. 138–157). Norwood, NJ: Ablex.
Lintern, G. (1980). Transfer of landing skill after training with supplementary visual cues. Human Factors, 22, 81–88.
Lintern, G., & Roscoe, S. N. (1980). Visual cue augmentation in contact flight simulation. In S. N. Roscoe (Ed.), Aviation psychology (pp. 239–250). Ames, IA: Iowa State University Press.
Naylor, J. C. (1962). Parameters affecting the relative efficiency of part and whole training methods: A review of the literature (NAVTRADEVCEN 950-1). Columbus: Ohio State University Research Foundation, Laboratory of Aviation Psychology.
Naylor, J. C., & Briggs, G. E. (1963). Effect of task complexity and task organization on the relative efficiency of part and whole training methods. Journal of Experimental Psychology, 65, 217–224.
Norman, D. A. (1993). Things that make us smart. Reading, MA: Addison-Wesley.
Pejtersen, A. M. (1980).
Design of a classification scheme for fiction based on an analysis of actual user-librarian communication, and use of the scheme for control of librarians' search strategies. In O. Harbo & L. Kajberg (Eds.), Theory and application of information research (pp. 167–183). London: Mansell.
Perrow, C. (1984). Normal accidents. New York: Basic Books.
Rasmussen, J. (1983). Skills, rules, and knowledge: Signals, signs, and symbols and other distinctions in human performance models. IEEE Transactions on Systems, Man, & Cybernetics, SMC-13, 257–266.
Rasmussen, J. (1986). Information processing and human–machine interaction: An approach to cognitive engineering. New York: North-Holland.
Rasmussen, J., Pejtersen, A. M., & Goodstein, L. P. (1994). Cognitive systems engineering. New York: Wiley.
Rasmussen, J., & Vicente, K. J. (1989). Coping with human errors through system design: Implications for ecological interface design. International Journal of Man–Machine Studies, 31, 517–534.
Reason, J. T. (1990). Human error. Cambridge, England: Cambridge University Press.
Rochlin, G. (1997). Trapped in the net. Princeton, NJ: Princeton University Press.
Runeson, S. (1977). On the possibility of "smart" perceptual mechanisms. Scandinavian Journal of Psychology, 18, 172–179.
Schneider, W., & Shiffrin, R. M. (1977). Controlled and automatic information processing I: Detection, search, and attention. Psychological Review, 84, 1–66.
Shiffrin, R. M., & Schneider, W. (1977). Controlled and automatic information processing II: Perceptual learning, automatic attending, and a general theory. Psychological Review, 84, 127–190.
Shneiderman, B. (1982). The future of interactive systems and the emergence of direct manipulation. Behaviour & Information Technology, 1, 237–256.
Vicente, K. J. (1992). Memory recall in a process control system: A measure of expertise and display effectiveness. Memory & Cognition, 20, 356–373.
Vicente, K. J. (1999). Cognitive work analysis. Mahwah, NJ: Lawrence Erlbaum Associates.
Weinberg, G. M., & Weinberg, D. (1979). On the design of stable systems. New York: Wiley.
Wertheimer, M. (1954). Productive thinking. New York: Harper & Row.
Wightman, D. C., & Lintern, G. (1985). Part-task training for tracking and manual control. Human Factors, 27, 267–284.
Wightman, D. C., & Sistrunk, F. (1987). Part-task training strategies in simulated carrier landing final approach training. Human Factors, 29, 245–254.
Xiao, Y. (1994). Interacting with complex work environments: A field study and planning model. Unpublished doctoral dissertation, University of Toronto, Toronto, Canada.
21 A Human Factors Approach to Adaptive Aids
Sylvain Hourlier,* Jean-Yves Grau, and Claude Valot
Cognitive Sciences Department, Aerospatial Medical Institute for the Army Health Service
Technological developments place increasing constraints and demands on operator activity, but these same technologies also make it easier to provide adaptive assistance that helps operators face this complexity. Operators will eventually be assisted by systems able to grasp changes in their environment, their actions, and possibly even their intentions.1 These systems are designed to enhance operators' means of perception, dialogue capacities, or situation awareness. A prerequisite to using such systems is that they must not turn into another automation layer that further complicates operator activity. An adaptive aid should be able to adjust to operator needs, whatever the situation at hand. This concept deserves to be explored as a solution to the increasing demands arising from work situations, because early studies on adaptive aiding showed performance improvements (see Rouse, 1991, for a detailed review); however, its actual development still raises numerous questions. Is adaptive assistance the latest automation design philosophy, or is it truly a new step in the relationship between operators and machines?

* Correspondence regarding this chapter should be addressed to Sylvain Hourlier, Institut de Médecine Aérospatiale du Service de Santé des Armées, Département Sciences Cognitives, BP 73, 91223 Brétigny-sur-Orge Cedex, France. E-mail: [email protected]
1 This latter capability seems very promising, as it enhances the most efficient strategy of any operator, that is, anticipation.

Although automation was first developed and used to compensate for operator limitations, with the underlying assumption of eventually replacing these operators, it quickly became obvious that human beings had to be kept "inside the control and decision-making loop." The aids being developed take operator processing capacities into account. The point is to use the system to create a situation compatible with operator analytic capacities. The first "intelligent" aids were based on models describing the limited processing capacities of human beings and relied on the real-time integration of operators' available cognitive resources (Parasuraman, Bahri, Deaton, Morrison, & Barnes, 1992; Rouse, 1988). Many approaches were developed, but they all focused on interface design and barely addressed the operating principles of assistance. It may be important to develop the concept of assistance further and to envisage maximizing human abilities by building on what humans can do, rather than deliberately compensating for what they cannot do. Seen from this angle, adaptation requires great similarity between human and machine reasoning. If a machine is to assist human operators during a task, it must follow the underlying logic behind their actions in order to put operators back on the right track: operators may have deviated from that track, but machine and operator alike recognize it as the right one. This approach to adaptive assistance relies on a thorough knowledge of the mechanisms behind operator reasoning in work situations.

The objective of this chapter is to show how "situated" cognition was gradually taken into account in the development of automation and assistance concepts, and then to illustrate the principles of adaptive assistance with two practical applications. The first application involves the design and development of a flight-assistance system for military aircraft. The second application deals more specifically with the definition of dialogue principles driving the development of an interface that integrates multiple modalities.

EVOLUTION OF AUTOMATION AND ASSISTANCE TOWARD ADAPTIVITY

Automation

Automating means delegating a task to a machine. Automation is also defined in relation to various characteristics, such as:
• Level of allocation (from all manual to all automatic; Parson, 1985)
• Capacity to run on its own without human intervention under normal operating conditions (Davis & Wackers, 1987)
• Capacity to detect modifications in the environment
• Capacity to detect its own effect on the environment (Scerbo, 1996).
As a whole, these characteristics tend to make automation as "human-like" as possible, similar to what happens when an operator delegates to another: "Do this, you're on your own, react properly, and report back if there is a problem." This kind of automation is completely autonomous and really sounds like the perfect assistant. However, the tasks automation is designed to support, as defined by Wickens (1992), are generally not easy or even possible to delegate to other human beings: functions beyond human ability, functions humans perform poorly, and undesirable activities. The goal of automation is therefore very simple: we want automation to perform specific tasks at least as well as human beings would, if only they could. One of the key issues is that we have to rely on a model of performance that does not exist.

Automation, which has already proved its benefits by reducing workload, by allowing for the control of complex systems, and by reducing human-performance variability, has also displayed some major drawbacks, such as shifts in activity, interautomation conflicts, loss of competence, and complacency (Bainbridge, 1983; Parasuraman, Malloy, & Singh, 1993; Wickens, 1992). Unfortunately, it seems that automating does not necessarily mean helping. The automation of modern aircraft cockpits has not reduced pilot workload, which remains high, but it has improved overall system performance. Safety does not seem to have greatly benefited, because it remains stable (Amalberti, 1998).

Further, Jeoffroy (1995) points out that, to help operators, automation must compensate for operator weaknesses, optimize strengths, and make the best of available resources. This kind of aid can be applied to three domains: situation awareness, decision making, and monitoring of actions. Jeoffroy insists on the need to take into account the way the system operates, and to distinguish between the aid required during normal and critical situations. Under normal circumstances, the operator must be freed from all details that are not relevant to understanding the situation at hand; the automated system is also there to take things over whenever the operator's psychophysiological capacities no longer allow for correct operations, to assist cooperation between operators, to improve shared cognition, and to improve the intervisibility of each operator's actions. In critical situations, the focus is on helping the perception of time and on integrating unusual logic. The idea here is to reason in terms of assistance requirements, as determined by interactions with the environment. Automation can therefore only be meaningful and fully justified if it is designed as an element supporting the provision of a global assistance service to operators (humans + system + environment).

Assistance/Aid

To make things clearer, a few definitions will be helpful. Etymologically, assistance is very close to aid, except for one thing. Assistance comes from the word assist, which means "being present, being close to." This distinction should have limited the use of this word to aid provided by one operator physically close to
another. However, this strict form of the definition has somewhat lost its relevance with the development of sophisticated communications systems. We prefer the word assistance to aid in reference to the human dimension involved. Of course, nonhuman agents can provide assistance, but today, operator assistance mainly relies on human agents. The present challenge is to develop nonhuman "assistance" systems that provide real benefits to human operators.

Rouse (1991) describes two dimensions in the philosophy of assistance. The first is based on the nature of the aid: directive or nondirective. The degree of "directiveness" of assistance depends on the level of operator expertise and/or motivation. There are prescriptive aids and descriptive aids. Prescriptive aids meet the needs of poorly trained and/or poorly motivated operators ("helping do the right thing"). Descriptive aids meet the needs of well-trained and/or highly motivated operators ("helping do things the right way"). Even if these two conceptions of assistance seem to be opposed, they are, in fact, complementary. Assistance must sometimes be prescriptive, and at other times descriptive, to successfully carry out tasks while taking into account inter- and intraoperator variability.

Rouse's second dimension is based on the operator model used (Hammond, McClelland, & Mumpower, 1980). It can be a model of optimal human behavior, used to reach the best possible results in connection with objective criteria (doing as well as possible in relation to an ideal human model). It can also be a human performance model, used to reach the best possible result given the capacities of the operators involved in the task (doing as well as possible in relation to what the operator knows how to do). These two models represent two totally opposed conceptions of assistance. One is based on the assumption that an individual must be fitted into the best possible behavior, whereas the other is based on the best possible use of this individual's capacities in relation to the task and situation at hand.

The combination of these dimensions covers the entire possible range of assistance. It is necessary to determine from the beginning (Rouse, 1991) the kind of assistance adapted to the situation encountered (human beings + system + environment), but the assistance system must under no circumstances be constrained to a single modality, because this could jeopardize its adaptiveness to other situations. So, even though there are many ways to automate, most of them address performance goals. And even though there are many ways to assist, only a few apply to the specifics of a given situation. Most approaches to assisting are far too generic to remain effective when the situation gets complicated and when assistance is urgently required. This brings us to the need for assistance adaptiveness.

Adaptivity of Assistance

The objective of adaptive assistance has always been, from the outset, to provide operators with the aid required, in due time. Two different considerations are implicitly connected with this approach and have an impact on the operator.
First, adaptive systems inherently have the capacity to initiate their own adaptation. Referring to the dictionary (Webster, 1989) definitions of key terms, we observe the following:

Adaptable: capable of being adapted2
Adaptive: able to adapt, capable of adaptation; manifesting, showing, contributing to adaptation
Adaptiveness: the capacity to be adaptive
Adaptive systems adapt autonomously. Unfortunately, if humans are fundamentally adaptive and technology is still only adaptable, then adaptiveness is specific to living systems (Billings & Woods, 1994). The consequences of attributing such a "human" capacity to automation must therefore be carefully studied.

Second, adaptive systems work by recognizing the nature of the aid required by operators. Predictive and deterministic models of this requirement have driven this approach for quite some time (Rouse, 1991) without yet demonstrating their operational validity, supposedly because of a lack of computing power. Their limitation, in fact, is that these systems rarely take variability into account and cannot manage the unexpected. But the real shortcoming lies in the validity of the predictive model developed (Amalberti, Bataille, & Deblon, 1989). Another way of looking at the problem is to offer systems that analyze interactions between operators and environment, and in which the effective aid provided to operators can be adapted in real time. The definition of the aid to be provided is guided by the investment of operator cognitive resources that lies at the core of this interaction.

In working situations, human beings adapt to their environment. They commit a greater or lesser amount of resources to meet task demands. Governed by the principle of economy, operators tend never to work at full resource potential, for two main reasons:
• To remain in a comfort zone and not perform the entire task at maximum capacity, which would quickly be exhausting
• To always have excess resources at hand, which can be tapped when confronted with unplanned elements due to situation variability.

The cognitive margins set aside are significant, and managing these margins is at the heart of the cognitive compromise on which performance is based. The adaptation of assistance can be considered the modification of aid in relation to available operator cognitive resources. This principle is commonly accepted, but its application tends to give rise to many questions. Because the adaptiveness of assistance relies on the assumption that operator resources will be exceeded, a relevant model describing how operators manage resources is necessary.

2 Without consideration of who may be initiating the adaptation; it is just the possibility of being adapted.
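To make the idea of margin-based adaptation concrete, the following is a minimal, purely illustrative sketch (not drawn from the chapter's system); the demand values, capacity scale, threshold, and task names are assumptions chosen only to show how an aid might monitor spare capacity and intervene when the margin shrinks.

```python
# Illustrative sketch: trigger assistance when the operator's estimated spare
# capacity (cognitive margin) falls below a comfort threshold.
# All demand values, the capacity scale, and the threshold are hypothetical.

OPERATOR_CAPACITY = 1.0        # normalized resource pool
COMFORT_MARGIN = 0.3           # margin operators try to keep in reserve

def estimated_demand(active_tasks):
    """Sum the (hypothetical) resource demand of all tasks the operator handles."""
    return sum(demand for _, demand in active_tasks)

def aid_needed(active_tasks):
    """Return True when spare capacity drops below the comfort margin."""
    spare = OPERATOR_CAPACITY - estimated_demand(active_tasks)
    return spare < COMFORT_MARGIN

tasks = [("fly aircraft", 0.4), ("navigate", 0.2), ("communications", 0.3)]
print(aid_needed(tasks))   # True: margin is 0.1, below the 0.3 comfort margin
```

In the spirit of the chapter, the point of such a trigger would not be to seize control but to restore an adaptation margin to the operator.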
Operators allocate cognitive resources to tasks according to their existing set of priorities. Because resources are limited, operators try to remain in a situation where they do not have to mobilize more resources than they can provide. To this end, they reorganize their priorities if need be (i.e., they adapt). In aeronautics, for example, a secondary mission could very well be abandoned to allow the primary one to be fully executed. Focusing the assistance on the preservation of operator resources amounts to deliberately aiming at providing operators with adaptation margins. This kind of assistance has two strong points. First, operators are once again in charge of planning their objectives and, by the same token, their resources; they can now do what they know how to do. Second, the margins regained can help them face a complex situation if it should arise.

Methods of Assistance

The literature describes three methods of assistance design: allocation, partition, and transformation. Allocation means that the task is entirely delegated to the system, whereas partition means that only part of the task is going to be automated. If one considers it possible to break a task down into subtasks, and for the operator to carry out several tasks in parallel, the difference between the two methods lies only in the degree of allocation. From an adaptive point of view, this assumes that human and machine are equally able to carry out the task with the same level of performance (Dekker & Billings, 1997). The idea is that if an operator lacks resources, an adaptive assistance can prompt an automation to fill in for the operator, as long as this automation is as "good" as the operator. Of course, this frees operator resources, but in human performance terms the important point is that the operator remains "inside the loop" and can check whether the task was correctly executed by the assistance.

The next question is which task should be automated by the assistance. Should it be the last task executed by the operator, where resources were exceeded; a task that cannot be done; one the operator does not know how to do; or one that is being done and that the assistance can take over? This last solution means that, in the end, resource allocation will be redistributed in a way the operator has not planned and that depends on the competence of the assistance. The risk is to drift away from the original objectives of the assistance, that is, from adapting to operator needs toward adapting to situation requirements. With these methods, the operator could end up not understanding what the assistance is doing, which is why it is so important to make sure the operator is given the opportunity to construct a mental picture of what the assistance is capable of, and of its adaptation rules.

Transformation means modifying a task so that it becomes easier to execute. This method is defined as the lowest assistance level possible. However, it assumes that highly relevant models of the abstraction levels of operator representations are available (Rouse, 1983). The advantage of this method is that operators remain in the loop and in control of their objectives. Even though this kind of assistance
can be envisaged for an entire system, it is mainly a highly flexible and ecological choice that can potentially save operator resources. Tasks become less demanding, but the initial coherence ruling resource distribution remains.

A fourth method can also be envisaged, where the goal is not to take over all or part of the task, but to provide operators with a vision of the situation for which they have knowledge and know-how, allowing them to operate with fewer resources (the SRK3 model of Rasmussen, 1986). An example of this method would be to remind operators of what they are able to do, at a time when it has not occurred to them because they were engaged in a more costly problem-solving procedure.

3 The skill, rule, and knowledge model describes three levels of activity control: the lowest, based on skills; the intermediate, based on the application of rules; and the highest, based on the use of knowledge (Rasmussen, 1986).

Automation legibility and predictability require understanding the assistance at three possible levels (Amalberti, 1996): (1) understanding the proposal and being able to apply it, (2) understanding the proposal and being able to assess it, and (3) understanding how the system eventually came up with this proposal. There are two ways to achieve this goal: giving the operator what is needed to perceive the automation's actions and intentions, or building the automation around a model consistent with the operator's preexisting model. Understanding the assistance is an absolute prerequisite for operators to be able to develop their own mental picture of what the assistance is capable of. Out of this mental picture, operators will develop confidence in, or doubt about, the functionalities of the assistance. This step is essential if operators are ever to accept the assistance.

When developing adaptive assistance systems, the primary risk lies in failing to make system intentions sufficiently legible or predictable. For example, there is justified skepticism about the use of self-adapting systems based on artificial intelligence (AI) for critical applications (Billings & Dekker, 1997). Automation/assistance systems must not be allowed to go adaptive if they do not have a strong capacity for communicating intent. This capacity cannot be dissociated from adaptiveness if the operator is to stay "in the loop." Any operator dealing with an assistance must always be aware of what this assistance is attempting to accomplish, so as to be able to integrate this intention into his or her own plans and to invalidate suboptimal changes or aberrant system behavior if necessary (Billings & Woods, 1994). Further, self-adapting systems must be predictable at a low cost for the operator, so as not to drain the resources available during the learning curve and, ultimately, during expert use.

In the 18th century, automatons closely imitated human beings, with a remarkable concern for detail in terms of appearance, whereas performance remained secondary. Form was more important than substance, because the techniques available at the time were limited to that aspect of automation. Thanks to technological developments, this is no longer the case. The advantage of R2D2, the flight assistant in the movie Star Wars, is not the way it looks or its computational abilities, but the fact that it is able to adopt a "human" behavior, especially in communicating intent. This is where the true current technological challenge lies. If going adaptive really turns out to be the next conceivable step for machines dedicated to operator assistance, it should be under the express condition of aiming at giving adaptive capacities back to operators, allowing them to optimize the balance between the resources available and the demands placed by the task. Further, this new step should never be implemented without a strong ability to communicate intent.
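As a purely illustrative sketch of how the methods just described (allocation, partition, transformation, and reminding) might be articulated with a margin-based trigger, the fragment below chooses an assistance method from a crude operator state; every rule, name, and threshold is an assumption made for illustration, not a description of any fielded system.

```python
# Hypothetical dispatcher: pick an assistance method given a rough picture of
# the operator's state. The ordering (prefer the least intrusive method first)
# and all thresholds are illustrative assumptions.

def choose_assistance(spare_capacity, task_transformable, automation_competent,
                      relevant_knowledge_unused):
    if spare_capacity >= 0.3:
        return "none"          # operator keeps an adaptation margin
    if relevant_knowledge_unused:
        return "remind"        # point to know-how the operator already has
    if task_transformable:
        return "transform"     # make the task easier, operator stays in charge
    if automation_competent:
        return "partition"     # hand over part of the task only
    return "allocate"          # last resort: delegate the whole task

# Example: low margin, task cannot be simplified, automation judged competent.
print(choose_assistance(0.1, task_transformable=False,
                        automation_competent=True,
                        relevant_knowledge_unused=False))   # -> "partition"
```

Whatever the method chosen, the chapter's requirement that the assistance communicate its intent would apply before any intervention is made.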
FLIGHT ASSISTANCE FOR MILITARY AIRCRAFT

Aeronautics is an area that is highly sensitive to intelligent operator aids. Aircraft computerization and automation generate increasingly numerous and complex data streams into civilian and military cockpits, intended to increase crew performance by raising awareness of the surrounding environment and providing a better grasp of aircraft and system capacities. The difficulty for the crew is to simultaneously manage this flow of information while executing mission-specific tasks. In a cockpit, the task is a dynamic process with strong time constraints, during which the crew is expected to manage unanticipated events related to various time frames (Amalberti, 1992). Crew activity is the result of cognitive compromises, through which a satisfactory performance level is attained. However, as described by Reason (1990), because of its heuristic nature, performance can never be optimal; it can even be counterproductive, with errors being generated when the compromises reached are not adapted to the situation encountered.

In this context, the objective of the assistance under development is to allow for the best possible cognitive compromise, using the crew's specific adaptation margins in order to optimize strong points without systematically trying to make up for weak points. The point is to suggest to the crew a point of view or an approach with which it is already familiar, but which did not spring to mind because of lack of time or lack of available memory. Our attempt to develop an "intelligent" flight assistance system for military aircraft is based on these observations. The philosophy underlying our approach is to provide the pilot with assistance similar to that provided by another crew member in the cockpit, which is why this system was dubbed the electronic copilot. This metaphor helps convey the operating principles and goals underlying this approach to assistance: enhancing situation analysis, understanding, anticipation of future developments, and decision making. However, this project is unique because it aims at providing assistance focused on the cognitive mechanisms involved in the dynamic management of information and constraints, instead of only being focused on the limitations and characteristics of cognitive operation. This choice has consequences at various levels:
• The assistance offered is not an "optimal" aid in the classical sense. Assistance systems developed through expert systems always aimed at making sure the assumptions or solutions offered were as comprehensive as possible. In the electronic copilot approach, optimal use can only be assessed in situ, and the a priori validation of such a concept is extremely difficult for designers and crew members. It is therefore more appropriate to speak of assistance being "adapted" to the situation and the crew.
• Assistance transparency, a prerequisite in all human–machine couplings, is based on the ease with which the crew understands the proposals offered. What good is the best solution if the crew does not understand it? It will probably not be applied, because no crew is ready to apply a "miracle" solution without verifying or controlling it when the stakes are high. The assistance provided must be understood in terms of solutions, as well as in terms of the rationale used to develop these solutions. Amalberti (1996) highlights this fact, stating that four objectives of understanding must be met to guarantee that an assistance is properly coupled: (1) understanding the final proposal, (2) being able to assess it, (3) understanding how the system came up with it, and (4) being able to execute it.
Behind this need for understanding appear two major properties of any "humanlike" assistance: maintaining the crew inside the flight-control "loop" and making sure it trusts the assistance. These two properties are easy enough to accept, but the real challenge is putting them into practice. Existing assistance systems integrate models of crew behavior and of expert knowledge, optimizing the performance models of engineered machines. But this is not sufficient to create a real coupling. Human performance constraints must be integrated in the first stages of assistance architecture design to obtain the similarity required between real-crew operations and assistance operation, and the eventual coupling expected. A model of crew performance is required before defining and designing the assistance. The goal is to design assistance that operates as similarly as possible to the crew. Activities correspond to tasks actually performed by the crew during missions. The performance model is different from the competence model generally used in knowledge-based aids. Developing a performance model requires knowledge and time, two factors that are not always available in today's design processes.

Performance Modeling in the Electronic Copilot Program

Performance modeling comes from the analysis of crew activity. This analytical approach tries to identify the characteristics of cognitive activities used during operational missions, to define the human performance specifications of crew assistance. For the "electronic copilot" program, the activity was analyzed in three stages: development of a task model, development of a crew performance model, and inference of the cognitive characteristics of crew activity.

The first step in the analysis process is developing the task model. Missions and related context are described, as well as the regulations and procedures required for execution. This analysis calls for studying official documents and interviewing
operational crew members. This phase is extremely important to make sure the expert knowledge analyst (EKA) in charge of the analysis thoroughly identifies all elements that, on the one hand, will later contribute to understanding crew activity, and on the other hand will help develop a model for the "prescribed" task, to be used as a reference when analyzing what is "really" occurring.

Performance modeling is done by observing real flight situations. Because of the difficulty involved in monitoring real in-flight performance of military crews, full mission simulators were used in our development efforts, with eight crews of different skill levels. Two concerns guided mission scenario development: offering a scenario revealing the difficulties encountered by crew members during the execution of the mission, and developing a scenario with a genuine "ecological" value to adequately involve crew members and to collect activity-related data as similar as possible to the data collected during an operational flight. Ergonomic experts working in close cooperation with an operational crew therefore developed the scenario. The first performance analysis was carried out using the traces of activity left by crew members (performance criteria, records on systems use, and reactions to environmental changes). After the fact, interviews were conducted with crew members to analyze the knowledge and reasoning used. After formalization, the expertise collected was transformed into a model representing crew activity. This model was first validated in relation to the various crew members' professional backgrounds and then further generalized to other military aircraft missions using a methodology similar to the one employed in the original model.

Human Factors Recommendation for the Electronic Copilot

Out of the performance model, it is possible to identify the characteristics of an adaptive flight assistance system by inferring the specific cognitive aspects of crew activity. A first characteristic is the highly anticipative nature of aircraft flying. Aeronautical situations are complex, and waiting for events to occur before solving them is a strategy that can quickly exceed the crew's cognitive capacities. Therefore, in order to guide and optimize information-processing mechanisms, the crew is always anticipating situations. Anticipation has a double function: not getting caught unprepared, to avoid having to work in a reactive mode, which can quickly lead to dropping out of the dynamic situation control loop; and the avoidance of problem situations, possibly by changing the course of a task to always remain in the realm of planned and possible situations. An effective assistance must take this anticipative behavior into account to closely match the crew's cognitive processes. To take the anticipative nature of flight control into account, most designers need to reconsider approaches that often tend to be based on reactive models of human behavior, whereas machines typically perform better and faster than crews on certain types of tasks. The corollary to the definition of anticipative assistance is the availability, fully integrated into the assistance, of a model able to recognize crew intentions.
The second characteristic, also related to the anticipative nature of aircraft flying, is the organization of knowledge into patterns (Norman, 1983). These knowledge patterns are blocks of knowledge, independent from other knowledge, available in memory and used to reach a goal. These patterns spare crew members from having to operate analytically; they help anticipate events and provide the crew with prioritized blocks of knowledge, allowing them to reason at different abstraction levels. It is essential to take the formalism of crew member representations into account in the assistance to enhance the similarity between the crew's logic and the assistance system's logic.

A third characteristic comes from the fact that the crew must simultaneously manage different time frames. In essence, there are two specific time frames: short term and long term. Short term refers to anything that helps the crew understand or react to the aircraft within time spans of less than 1 minute. The issue here is the immediate safety and performance of the aircraft. Long term refers to anything happening or about to happen in the time span between one minute from now and the end of the mission. Long-term safety and performance issues are under reduced time constraints and involve overall mission coherence. The simultaneous management of these two time frames is important, because satisfactory short-term management can have detrimental effects on the long term, and vice versa. This means that if the crew does not have enough time to analyze both short- and long-term time segments, it will resort to a two-phase analysis: adopting in the short term a response level guaranteeing basic safety and performance, and allocating additional time margins in a second phase to try and find the right solution for the long term. The assistance must absolutely take these two time frames into account to raise the crew's awareness of possible solutions it may be familiar with, but which did not immediately spring to mind, given situation constraints.

Crews operate with two kinds of knowledge: declarative and procedural knowledge on the one hand, and operator knowledge related to their own field of work on the other hand. This latter knowledge is called meta-knowledge. Meta-knowledge regulates crew activity by directing it toward conduct at which the crew is proficient. The integration of meta-knowledge into the assistance system helps create an intermediate screening process, contributing to a better management of crew resources and improving the crew's understanding of situations and of its possible actions.

The fifth and last characteristic addresses the need to provide adaptive assistance with "what if?" functionalities. This is designed to help the crew analyze situations and make decisions according to the best cost–risk–effectiveness compromise. Naturalistic decision-making research (Klein, 1993) rightly emphasizes that understanding and decision making are associated with action. This strong relationship between decision and action requires the constant assessment of alternate solutions. If the crew is provided with a tool that aids assumption management, its adaptation capacities are increased because it can better adjust its cognitive margins to optimize its strong points.
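To illustrate (and only illustrate) the kind of "what if?" functionality described above, the sketch below scores hypothetical alternative plans with a crude cost–risk–effectiveness compromise and filters them against the short-term time frame; the scoring weights, plan fields, and the one-minute boundary used here are assumptions, not the electronic copilot's actual logic.

```python
# Illustrative "what if?" evaluator: rank alternative plans by a crude
# cost-risk-effectiveness compromise, rejecting any plan that cannot secure
# safety within the short-term horizon. All weights and values are hypothetical.

SHORT_TERM_HORIZON_S = 60   # the chapter's one-minute boundary between time frames

def score(plan, w_cost=0.3, w_risk=0.4, w_eff=0.3):
    """Higher is better; cost and risk count against the plan."""
    return w_eff * plan["effectiveness"] - w_cost * plan["cost"] - w_risk * plan["risk"]

def what_if(plans):
    """Keep only plans that secure safety within the short term, then rank the rest."""
    feasible = [p for p in plans if p["safe_within_s"] <= SHORT_TERM_HORIZON_S]
    return sorted(feasible, key=score, reverse=True)

plans = [
    {"name": "continue route", "cost": 0.2, "risk": 0.6, "effectiveness": 0.9, "safe_within_s": 30},
    {"name": "divert",         "cost": 0.5, "risk": 0.2, "effectiveness": 0.7, "safe_within_s": 20},
]
print([p["name"] for p in what_if(plans)])   # -> ['divert', 'continue route']
```

Such a tool would only propose rankings; in the approach described here, the crew remains the one who weighs and accepts or rejects the hypotheses.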
Type of Expert Knowledge Used for the Electronic Copilot

Beyond human performance principles, the core of adaptive assistance is its knowledge base. Knowledge is elicited from crew expertise. In the framework of an ergonomic approach to adaptive assistance, the origin, nature, and processing of this knowledge must be discussed to define analysis and formalization methods.

Choosing the right experts is not easy. However, this matter must be addressed whenever eliciting expertise to design an expert system. The expertise collected must faithfully reflect the routines and knowledge of operational squadrons. In this respect, it may be difficult to call on experts whose expertise level may be high but who do not adequately represent everyday work. Test pilots, for example, fall into this category; their function is more technological than operational. The same applies to pilots recognized by all as being experts in their field but whose knowledge is too individualistic and cannot be applied to run-of-the-mill squadron pilots. This is why the expertise selected for the "electronic copilot" project was that of "patrol commanders" with 2 to 3 years' experience.

The number of experts is also an important matter. Traditionally, expert systems adopt a "multiexpert" approach to build up a body of expertise never encountered with any single expert, because it is the aggregate of different experiences and knowledge bases. This approach goes against the grain of two leading principles:
• The crew's understanding of the logic behind the assistance; even though the expertise elicited claims to be "optimal," it cannot be understood or used as such by the crew.
• The crew will find it difficult to find a consistent reasoning in the assistance, and will therefore find it difficult to predict the kind of reasoning used and to assess the risk level associated. Consequently, it will be difficult for the crew to trust the assistance system.
This is why a single expert was selected for the electronic copilot project. The rationale justifying this choice was that the assistance was meant to operate according to a single type of reasoning, as would have been the case with a "human copilot," had one existed. Under these conditions, users can develop a stable mental picture of the way the assistance operates, just as anyone normally develops a model of a colleague's strengths and weaknesses. The important notion is not the single expert, but the idea of style. The research conducted to generalize the crew performance model showed that various flying styles exist among flight personnel. Styles are closely connected to crew qualification and experience. The expert(s) eventually chosen must epitomize the flying style looked for in the assistance. This is why great efforts were directed toward guaranteeing the internal consistency of the expertise, to make sure the coherence and comprehensiveness of the style would be "optimum." It will be possible to introduce various styles in the assistance, representing different levels among operational pilots, but this has not yet been done. Parameters could be set in the assistance system, at the crew's request, to obtain the style of assistance desired.
Expertise collection was done using written scenarios, placing the expert crew in the situation. The crew received a mission paper simulating an operational mission. It then prepared its mission in front of the expert knowledge analyst, with all the tools generally at its disposal. The EKA intervened to obtain specific details on why the hypotheses envisaged were accepted or rejected. Collecting the expertise relating to the scenarios under consideration took several sessions. The EKA focused the interviews on tactical and strategic knowledge, as well as on the knowledge necessary to accomplish the prescribed task; this latter information was available from the prescribed task model. Between sessions, interviews were written down on paper and validated with the expert crew. Once transcribed, the expertise was modeled according to a specifically developed modeling method taking four concepts into account: objects and their properties, tests, actions, and reasoning. Specially designed software made this task easier, and made it possible to process the syntactic, semantic, and lexical coherence of the expertise collected. Using these results, the EKA was then able to structure and drive the ensuing interview sessions. An iterative cycle of ten sessions led to a formalized expertise directory, which could readily be used:
• To define assistance functionalities in relation to the cognitive requirements expressed by the expert crew, and to those collected by the EKA during expertise analysis.
• By the designers, to directly program this expertise into the electronic copilot assistance system.
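As a purely illustrative sketch of what a four-concept expertise model (objects and their properties, tests, actions, and reasoning) could look like once formalized, the fragment below encodes one hypothetical rule; the object names, the fuel threshold, and the action are invented for the example and do not come from the electronic copilot's actual knowledge base.

```python
# Illustrative encoding of elicited expertise using the four concepts named in
# the text: objects (with properties), tests, actions, and reasoning (rules).
# The fuel threshold, object names, and the action are hypothetical.

objects = {
    "aircraft": {"fuel_kg": 1450, "altitude_ft": 2000},
    "mission":  {"phase": "ingress", "divert_field": "ALPHA"},
}

def test_low_fuel(objs):
    """Test: is remaining fuel below a (hypothetical) diversion threshold?"""
    return objs["aircraft"]["fuel_kg"] < 1500

def action_suggest_divert(objs):
    """Action: remind the crew of the diversion option it already knows."""
    return f"Consider diverting to {objs['mission']['divert_field']}"

# Reasoning: a rule linking the test to the action.
reasoning = [(test_low_fuel, action_suggest_divert)]

for test, action in reasoning:
    if test(objects):
        print(action(objects))
```

A representation of this kind is only meant to show how elicited expertise could be kept inspectable, in line with the chapter's insistence that the crew be able to follow the assistance's reasoning.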
Rules for the Electronic Copilot Human–Machine Interface

Human factors recommendations help define the nature of the aids required by the crew. However, the true relevance of such assistance can only be judged if the definition of the crew–assistance interface is taken into account. The purpose of this chapter is not to address interface-related questions in detail, but several remarks may be made on the specific features of the interface designed for an adaptive flight assistance system for military aircraft:
• An electronic copilot type of assistance acts as an adviser and can under no circumstances take over controls from the crew. It is just another element in the process through which the crew examines various assumptions and then makes decisions. The crew remains the sole executive in the aircraft and has the right to ignore the assistance, or even turn it off if it believes the aid provided is useless.
• Expertise analysis suggests a great number of possible functionalities for the assistance. Even though they are all of interest, they do not all have the same relevance. On the one hand, operational interest can vary according to the assistance offered. On the other hand, several assistance functionalities can prove to be very relevant, but the constraints connected with task completion, such as time pressure, may make them obsolete during various flight phases.
• The assistance works round the clock, transparently for the crew, and assesses the situation without any need for a specific dialogue between human and machine. The dialogue must be initialized either by the crew, requesting an assessment or advice, or by the assistance itself whenever an unpredicted event occurs, or when the mission plan assessment no longer provides for safety and performance objectives to be met. The more sophisticated the assistance level (replanning, "what if?"), the more highly developed the human–machine dialogue, with a multiplication of the complexity and number of interactions. Consequently, the next problem to be solved is the introduction of complex and technologically developed dialogue systems, without which the assistance no longer has any ergonomic relevance. The definition of rules and the abstraction level of the dialogue depend on the crew resource management model obtained from the performance model.
• If an assistance is to provide real help, without increasing crew workload, its ergonomic assessment must not be limited to the mere evaluation of its interface. The benefits expected from the quality of the analyses and decisions made by the crew to maintain mission safety and performance at the highest level must also be taken into consideration.
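The dialogue-initiation rule in the third point above lends itself to a small, purely hypothetical sketch; the event flags, the plan-assessment flag, and the function name below are illustrative assumptions rather than the system's actual rules.

```python
# Illustrative dialogue-initiation rule: the assistance speaks up only when the
# crew asks for it, when an unpredicted event occurs, or when the mission plan
# no longer meets safety/performance objectives. All names are hypothetical.

def should_open_dialogue(crew_request, unpredicted_event, plan_assessment_ok):
    if crew_request:
        return "crew-initiated"
    if unpredicted_event or not plan_assessment_ok:
        return "assistance-initiated"
    return None   # stay silent: the assistance keeps assessing in the background

print(should_open_dialogue(crew_request=False,
                           unpredicted_event=False,
                           plan_assessment_ok=False))   # -> "assistance-initiated"
```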
ADAPTIVE AIDS AND INTERFACES?

One of the recognized advantages of adaptive automation is that it gives improved support to the operator in case of unanticipated workload changes (Harris, Goement, Hancock, & Arthur, 1994). In this case, an adaptive aid also helps the operator along the learning curve before proficiency is achieved. Adaptive aids are not useful only in emergencies; however, it is in this kind of situation that one of the most important criteria for complementarity between operator and aid is highlighted: adaptive aids are relevant when the constraints limiting operator capacities restrain operations and make it necessary for the answers supplied by the operator to be supplemented by automated assistance.

The various possible forms adaptive aids can adopt have already been discussed. However, further work and analysis on the dialogue between assistance and automation are necessary to optimize complementarity. This complementarity between the aid and the adaptive interface is almost a prerequisite: the aid, just like the interface, must adjust to various operators, and to various situations and constraints, to truly provide the operator with an answer that helps him in constrained situations when help is truly needed. The shape of dialogues with an adaptive aid requires a specific interface. The quality of this interface is especially critical, because the constraints on operator
activities that justify the need for the adaptive aid in the first place also have consequences on the quality of the dialogue and absolutely require the aid to be adaptive. When working on the nature of an adaptive aid, it is extremely important to first think about the shape the interface should adopt. There are many different adaptation levels, from machine-run adaptiveness, which assumes that user states and characteristics are highly predictable, to adaptation scenarios left to user competence or user preference, and thus only moderately automated. What are the criteria used, and what elements of analysis can help make a motivated choice on the nature of the adaptive interface designed for a specific adaptive aid?

What Kind of Interface Adaptiveness?

The notion of adaptiveness in the framework of systems intelligence and automation assumes that several models are available and that their transposition into a technical system is possible. These models define interactions between operator and automation. General criteria providing for the development of intelligent interfaces are clearly identified (Szekely, 1991). They are series of models (of programs, users, tasks, or workstations), or of knowledge (on how to design interfaces), describing several areas of competence. Using these models supposedly helps develop an interface tailored to the user and all characteristics entailed by the adaptation. The level of detail reached specifies, at least in principle, how far the system will be able to adapt itself to various operator constraints, strategies, or objectives in the different situations encountered.

The main issue is to make sure these models are reliable in the various situations encountered, which are not always available when the models are designed. If high reliability is attained, this approach may be valid and relevant. However, if the interface adaptation cannot assure and generalize reliability, the operator runs the risk of being faced with inconsistencies or impossibilities, which will be detrimental to the rightful use of the aid as well as the interface. In this respect, many task models, taken in the sense of action prescription, are generally relevant; however, they neglect the many aspects of the activity connected to dynamic situations or to unanticipated regulation and planning actions. The activity and diversity of actions involved further complicate the modeling and adaptiveness of the real dialogue connecting machine and operator.

Dix, Finlay, Abowd, and Beale (1997) temper the attraction these models may have by showing that, even for a human expert, the knowledge required to recognize someone else's intentions in problem-solving situations is very significant. In the case of an automated system, this can only be of increased complexity. These authors, in the framework of machine use, list various techniques:
- Quantification of experience according to the levels at which the task will be executed.
- Task execution stereotypes providing structure for major guidelines, without trying to go into complex knowledge procedures.
- Idealized expert models that are meant to be used as references to assess user performance level.

These various techniques try to find the right compromise between the formalisms required to ensure the operation of automated systems and the specifics of a human–machine dialogue as discussed by Schneidermann (1993). He makes several recommendations structured around a key finding: All interface designs and adaptations must take into account the fact that human operators have various operating modes. If an interface is to be well adapted to its user, a great number of varying parameters need to be taken into account: types of interactions, levels of competence, familiarity with the system, proficiency of the user, and others. Starting from this observation, the author then sets a rule that seems quite obvious ("know your user"). However, his comment becomes more critical when he adds that many designers believe that they already know the user and the task performed, when in fact all they have done is to transpose their own representation of these tasks and knowledge.

In such a context, adaptiveness must be approached with caution. Like intelligent aids, it appears to be some type of technological ideal aiming at making all dialogue-related constraints transparent to the user, who will then be free to express himself during the dialogue in the way he desires. The perfected form of this technological ideal will eventually allow the operator to speak freely, while the machine in the background will translate, interpret, and implement the dialogue and its consequences. This concept of dialogue uses numerous technological innovations in the framework of multimodal interfaces. However, in dialogues with adaptive aids, we have to live with many environmental, technical, and cognitive constraints. The quality of the dialogue can only be improved if it can be adapted to the various changing factors that affect communication between operator and aid system.

This issue is further compounded by the fact that we are not dealing with a simple interaction between an operator and a machine, where the operator gives orders and monitors the changes in machine status or in the surrounding environment. We are dealing with a real dialogue, where each statement expressed by one of the two players may modify the other's internal state in the framework of an adaptive development. Theoretical models able to define the operational characteristics of such a dialogue should therefore be developed in the area of interpersonal situations, where communication modalities and media are more numerous and refined by experience. Multimodal interfaces can perfectly suit this purpose, because they are technically able to take into account the statement issued by the operator through various media of expression. Voice, gesture, direction of glance, and all the combinations
of the above used by the operator to communicate are effective and robust communication vectors. Multimodal interface developments materialize the association of ergonomic issues on the one hand, oriented toward obtaining the highest quality of dialogue possible, and significant technological improvements on the other hand, in the areas of voice recognition and synthesis, virtual gesture, or interactions among head position, eye position, and direction of glance. (A multimodal interface is an interface allowing the operator to use, simultaneously or sequentially, various modes of expression, such as voice, gesture, keyboard, and glance, to control the system.) The work carried out on multimodal interfaces has also evidenced two different avenues: optimal tools on the one hand, offering less constraint-related robustness and less adaptability to situations, and, on the other hand, tools that are suboptimal in terms of the technology implemented but more versatile and effective in adapting to various situations and operators. The research described here shows that the search for adaptation solutions more closely related to the way humans operate could also prove interesting in the area of interfaces.

The development context of multimodality is similar to that of intelligent aids: A series of technological innovations makes the development of certain kinds of human–machine interactions possible, with the aim of obtaining an optimal solution. However, this development is driven not only by the genuine need of operators for aid and dialogue but also by technological opportunities. Reason (1990) mentions the same technical context when he suggests that technological developments are not driven by a desire to meet operator needs, but by technological motivations calling for the implementation of innovations.

A Suboptimal Adaptive Interface?

The principle of multimodal interfaces is now well known (Dix et al., 1993; Sullivan, 1991). The operator can dialogue with the system by using various modes of expression such as natural language, gestures, and so on. The operator thus has the possibility of embarking on a dialogue using techniques directly related to those used when dealing with other people in familiar dialogue situations. The benefits expected from this opportunity are directly in line with Schneidermann's (1993) generic recommendations: transparent interfaces, because they require no extensive specific knowledge that would reduce spontaneity in interface use, and user-friendly dialogue that the user can readily adopt without being constrained by restrictive syntax rules, and which allows the user to employ the methods that feel the most comfortable. Multimodality has another key strong point: The operator can deliberately adapt the dialogue to any expression favored for a specific activity; for instance, the operator may choose, for some interactions, to speak out, whereas for others gestures may be preferred, depending only on habits or ease of use. In summary, multimodality offers very attractive perspectives, but a great number of technical and ergonomic issues remain open.

A Technical Approach to Multimodality

The techniques needed to support this kind of interaction are advanced as such, but the problem remains of how to integrate them within an entity endowed with multimodal capacities. In scientific terms, a multimodal dialogue is perceived as a series of signals captured by sensors. The problem is not the allocation of a meaning expressed in language to all these signals; recognition technologies are now well able to perform this. The real problem is building a chronology of signals so that, once combined and mingled, they can be exploited by the machine in charge of interpreting operator messages and can be acted on, with the machine producing answers or effects. The typical technical issue is how to successfully integrate a deictic (gesture-based designation) interacting with verbally expressed messages already integrating all or part of this deictic. A typical example is a statement combining words and gesture, where an operator points to an object while at the same time saying, "put this here." Approaches centered on this technical problem have, for example, produced gesture/voice integration models using their organization in time. Four cases were described by researchers (Nigay & Coutaz, 1993). The combinations of expressions listed were the following:
- Sole use of a single mode: The operator works with a multimodal dialogue platform but only uses one mode, for reasons of personal preference (e.g., confidence and experience).
- Alternate use of modes: The various modes are available, but the operator only uses one at a time. In this case, the signals received are a sequence of single-mode signals, alternately employing the various routes offered.
- Concurrent use of modes: Two distinct tasks are simultaneously carried out using different modalities. Each task is carried out using a single mode. The issue here is that signals arrive simultaneously but do not address the same task. The interpretation of simultaneousness must take into account this distinction, without the operator having suggested it in the first place.
- Combined use of modes: This is the most technically complex case, because signals arrive simultaneously for the same task. They complement each other according to operator know-how and individual preference in terms of dialogue.
This technical approach to multimodality highlights the difficulties that must be taken into account when managing the dialogue and developing dialogue adaptations. However, its major shortcoming is that it provides no model representing user competence in multimodal dialogue situations.
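To make the four cases concrete, the following minimal sketch (not part of the original study) classifies a window of time-stamped input events into one of the combination categories described above. The ModalEvent structure, the task identifiers, and the windowing scheme are illustrative assumptions rather than elements of any fielded multimodal platform.

```python
from dataclasses import dataclass

@dataclass
class ModalEvent:
    mode: str       # e.g., "voice", "gesture", "gaze"
    task_id: str    # task the event addresses
    t_start: float  # seconds
    t_end: float

def overlaps(a: ModalEvent, b: ModalEvent) -> bool:
    """True if the two events overlap in time."""
    return a.t_start < b.t_end and b.t_start < a.t_end

def classify_combination(events: list[ModalEvent]) -> str:
    """Classify a window of input events into one of the four usage cases."""
    modes = {e.mode for e in events}
    if len(modes) == 1:
        return "sole use"                      # only one mode ever used
    overlapping = [(a, b) for i, a in enumerate(events)
                   for b in events[i + 1:]
                   if a.mode != b.mode and overlaps(a, b)]
    if not overlapping:
        return "alternate use"                 # several modes, one at a time
    if any(a.task_id == b.task_id for a, b in overlapping):
        return "combined use"                  # simultaneous signals, same task
    return "concurrent use"                    # simultaneous signals, distinct tasks

# Example: "put this here" spoken while pointing at an object
window = [
    ModalEvent("voice",   "move_object", 0.0, 1.2),
    ModalEvent("gesture", "move_object", 0.4, 0.9),
]
print(classify_combination(window))  # -> combined use
```

A dialogue manager built along these lines could also log which category an operator settles into over time, which is one way of observing the stabilization effects reported later in the chapter.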
An Ergonomic Approach to Multimodality

How a multimodal interface would be used in a professional environment remains unknown, because very few effective multimodal devices exist as of yet. It is therefore impossible to study the way such an interface would be used through a traditional activity analysis, even though we are well aware of the exceptional opportunities it would offer in terms of adaptiveness. Constructing a user competence model for multimodal interfaces has two goals. On the one hand, it makes it possible to weight the different interface routes according to the technical approach. The most complex technical case, the combined use of multimodality, will only prove to be a problem if an ergonomic study actually shows that this combination belongs to the operator's competence model. If not, then the fact that it may not be available will in no way be detrimental to users.

In ergonomics, constructing a competence model requires the use of techniques where operators are studied in situ. But how can this be achieved if no real-life device actually exists, even though a model is an absolute prerequisite for the opportunities to be appreciated? The only effective solution is to simulate the existence of such a platform with an experiment in which subjects are placed in a "Wizard of Oz" type of protocol: Human experimenters, without the subject being aware of it, mime the recognition and reactive behavior expected from the machine. A cycle of experiments was carried out according to this protocol (Carbonell et al., 1997). The results obtained show that the performance model of users must be significantly reviewed, with spontaneous adaptiveness being readily used by subjects:
- Individually, subjects used a single mode.
- With the availability of a multimodal interface, users quickly developed their own preference for a specific dialogue mode.
- Subjects quickly stabilized the way they used multimodality; it may not always have been optimal for the purpose of dialogue, but it was the way they mastered best.
- Multiple modes were used in emergencies, or when the dialogue was not readily understood (changing modes after the introduction of an additional constraint).
- Multimodality was used in a way that can be likened to a compromise between the cognitive cost of using an interface and the effectiveness of its practical use. The choices made by subjects were those representing the best compromise (individual preference for a specific mode, change of mode, suboptimal use of a mode, etc.).
Of course, this performance model of multimodal interface use is incomplete, but it highlights the differences between approaches centered on the technical difficulties of using an interface and other methods
oriented toward developing a performance model based on the observation of operators in situ.
CONCLUSION

The concept of adaptive assistance aims at defining and designing aids focused on the operator. This vision may be nothing new in the area of ergonomics, but theoretical and methodological developments in "situated cognition" now make it possible to properly single out the issues involved in this approach. Adaptation means the capacity of the assistance to assess the operator–environment interaction in order to suggest an aid that will really help the operator function within the best possible cognitive compromise. Managing the operator's cognitive adaptation margins lies at the heart of the assistance. In this respect, the optimal operation sought in the human–machine coupling cannot be defined in relation to competence models, but only in relation to performance models. Developing these performance models is the foundation of all approaches aimed at designing an adaptive assistance, be it for the definition and development of assistance functionalities or for the definition of its interface.

Even though these ideas are well accepted within the human factors community, the perspective required to demonstrate their operational validity is still lacking. It is only by evaluating real systems that it will become possible to confirm the concepts developed and to derive design principles for adaptive assistance. Unfortunately, designers of real systems tend to be quite reluctant. The assistance systems envisaged call for a radical philosophical change in the relationship between man and work, and in the vision of what an aid can actually provide. Further, the methods required to develop this kind of assistance are not widely known among designers and are quite expensive. Also, few systems, if any, exist today that can be used to demonstrate operational validity. Finally, evaluating this type of assistance is difficult: Adaptive assistance modifies operator tasks, which are then difficult to compare with existing tasks. Assessment methods cannot be limited to a mere system evaluation but need to integrate all operator activities, because the improvements expected are quantitative (objective criteria) as well as qualitative (quality of assessment or of decisions made).
REFERENCES

Amalberti, R. (1996). La conduite des systèmes à risques. Paris: PUF.
Amalberti, R. (1998, April). Why operator's cognitive models are hard to incorporate into design? The case of human reliability models. Paper presented at the Second European Conference on Cognitive Modelling, University of Nottingham, England.
Amalberti, R., Bataille, M., & Deblon, F. (1989). Développement d'aides intelligentes au pilotage: Formalisation psychologique et informatique d'un modèle de comportement du pilote de combat engagé en mission de pénétration (89.09 LCBA). Paris, France: CERMA.
Amalberti, R., & Deblon, F. (1992). Cognitive modelling of fighter aircraft process control: A step towards an intelligent on-board assistance system. International Journal of Man–Machine Studies, 36, 639–671.
Bainbridge, L. (1983). Ironies of automation. Automatica, 19, 775–779.
Billings, C. E., & Dekker, S. W. A. (1997). Advanced and novel automation concepts for the future system. In C. E. Billings (Ed.), Aviation automation: The search for a human-centered approach (pp. 221–230). Mahwah, NJ: Lawrence Erlbaum Associates.
Billings, C. E., & Woods, D. D. (1994). Concerns about adaptive automation in aviation systems. In M. M. Mouloua & R. Parasuraman (Eds.), Human performance in automated systems: Current research and trends. Hillsdale, NJ: Lawrence Erlbaum Associates.
Carbonell, N., Valot, C., Mignot, C., & Dauchy, P. (1997). Etude empirique de l'usage du geste et de la parole en situation de communication homme machine. Travail Humain, 60, 155–184.
Davis, L. E., & Wackers, G. J. (1987). Job design. In G. Salvendy (Ed.), Handbook of human factors (pp. 431–452). New York: Wiley.
Dekker, S. W. A., & Billings, C. E. (1997). Humans and the evolution of industrial automation. In C. E. Billings (Ed.), Aviation automation: The search for a human-centered approach (pp. 51–64). Mahwah, NJ: Lawrence Erlbaum Associates.
Dix, A., Finlay, J., Abowd, G., & Beale, R. (1997). Human computer interaction. London: Prentice-Hall.
Hammond, K. R., McCleland, G. H., & Mumpower, J. (1980). Human judgment and decision making. New York: Hemisphere/Praeger.
Harris, W. C., Goement, P. N., Hancock, P. A., & Arthur, E. (1994). The comparative effectiveness of adaptive automation and operator initiated automation during anticipated and unanticipated taskload increases. In M. Mouloua & R. Parasuraman (Eds.), Human performance in automated systems: Current research and trends (pp. 40–56). Hillsdale, NJ: Lawrence Erlbaum Associates.
Jeoffroy, F. (1997). Automatisation en termes d'aide. In M. de Montmollin (Ed.), Vocabulaire de l'ergonomie (pp. 20–22). Toulouse, France: Octares.
Klein, G. A. (1993). A recognition-primed decision (RPD) model of rapid decision making. In G. A. Klein, J. Orasanu, R. Calderwood, & C. Zsambok (Eds.), Decision making in action: Models and methods (pp. 138–147). Norwood, NJ: Ablex.
Nigay, L., & Coutaz, J. (1993). A design space for multimodal systems: Concurrent processing and data fusion. In S. Ashlund, K. Muller, A. Henderson, E. Hollnagel, & T. White (Eds.), INTERCHI'93 Conference Proceedings (pp. 172–178). New York: Association for Computing Machinery Press/Addison-Wesley.
Norman, D. A. (1983). Some observations on mental models. In S. Gentner (Ed.), Mental models. Hillsdale, NJ: Lawrence Erlbaum Associates.
Parasuraman, R., Bahri, T., Deaton, J., Morrison, J., & Barnes, M. (1992). Theory and design of adaptive automation in aviation systems (Prog. Rep. No. 92033-60). Warminster, PA: Naval Air Warfare Center, Aircraft Division.
Parasuraman, R., Malloy, R., & Singh, I. L. (1993). Performance consequences of automation-induced "complacency." International Journal of Aviation Psychology, 3, 1–23.
Parsons, H. M. (1985). Automation and the individual: Comprehensive and comparative views. Human Factors, 27, 99–111.
Rasmussen, J. (1986). Information processing and human–machine interface. New York: North-Holland.
Reason, J. T. (1990). Human error. New York: Cambridge University Press.
Rouse, W. B. (1988). Adaptive aiding for human/computer control. Human Factors, 30, 431–443.
Rouse, W. B. (1991). Design of aiding. In Design for success: A human-centered approach to designing successful products and systems (pp. 145–211). New York: Wiley.
Rouse, W. B., & Rouse, S. H. (1983). A framework for research on adaptive decision aids (Tech. Rep. No. 83-082). Wright-Patterson Air Force Base, OH: Air Force Aerospace Medical Research Laboratory.
Scerbo, M. W. (1996). Theoretical perspectives on adaptive automation. In Automation and human performance. Mahwah, NJ: Lawrence Erlbaum Associates.
Schneidermann, B. (1993). Designing the user interface. Reading, MA: Addison-Wesley.
Szekely, P. (1991). Structuring programs to support intelligent interfaces. In J. W. Sullivan & W. Sherman (Eds.), Intelligent user interfaces. New York: Addison-Wesley.
Wickens, C. D. (1992). Engineering psychology and human performance (2nd ed.). New York: HarperCollins.
22

Adaptive Pilot–Vehicle Interfaces for the Tactical Air Environment

Sandeep S. Mulgund
The MITRE Corporation

Gerard Rinkus
Greg Zacharias
Charles River Analytics
Advances in aircraft performance and weapons capabilities have led to a dramatic increase in the tempo of tactical situations facing the combat pilot, reducing the pilot's available processing and decision time. Further, technological advances in cockpit displays and electronics have resulted in an explosion in the complexity and sheer quantity of information that is available to the pilot. The pilot has more things to deal with in the cockpit (each of which is becoming more complex to understand) and less time in which to do so (Endsley & Bolstad, 1992). To counter this increasingly complex cockpit environment, a need exists to develop advanced pilot–vehicle interface (PVI) concepts that will make optimal use of the pilot's abilities. The PVI should enhance the flow of information between pilot and cockpit in such a way as to improve the pilot's situation awareness (SA) while managing workload, thus improving the pilot–vehicle system's survivability, lethality, and mission effectiveness.

The premise of this research is that three essential functions may be used to design and develop a next-generation pilot–vehicle interface that enhances pilot situation awareness:

1. A means of assessing the current tactical situation
2. A means of inferring the current pilot workload or "mental state"
3. A means of combining these via an innovative PVI adaptation strategy

The purpose of the first function is to use the aircraft's information systems to generate a high-level interpretation of the tactical situation facing the pilot. The air-combat task is a process in which the pilot must make dynamic decisions under high uncertainty, high time pressure, and rapid change. It has been demonstrated that high pilot SA is an accurate predictor of engagement success in complex time-stressed scenarios (Endsley, 1989, 1990, 1993, 1995; Fracker, 1990; Klein, 1994; Zacharias, Miao, & Riley, 1992b). Consequently, improving pilot SA in air combat has become the number one goal of the human systems technology investment recommendations made by the U.S. Air Force's Development Planning Directorate at Wright-Patterson Air Force Base (Development Planning Directorate, 1995). Many new technologies and subsystems are thus being considered to enhance pilot situation awareness. These include advanced sensor systems, onboard datalinks to theater C3I systems, helmet-mounted virtual reality (VR) displays, and decision aids. Given the importance of situation assessment in the cockpit, any pilot–vehicle interface should be designed to maximize the pilot's situation awareness without bombarding the pilot with superfluous information.

To compute an assessment of the tactical situation, a means is needed for integrating the outputs of the aircraft's various information systems (each of which may have varying levels of reliability) and deriving a high-level abstraction of the situation in the context of the overall mission goals, rules of engagement, and other considerations. A combination of fuzzy logic (Zadeh, 1973) and belief networks (BNs; Pearl, 1988) provides an effective solution to this problem and offers a natural framework for encoding complex tactical knowledge. Unlike conventional expert system approaches used in SA technology efforts, such as the pilot's associate (Corrigan & Keller, 1989), which may suffer from "brittleness" in situations not explicitly modeled, BNs enable the designer to partition a large knowledge base into small clusters. The designer then need only specify probabilistic relationships among variables in each cluster (and between neighboring clusters). Considerable evidence suggests that this is how humans structure their knowledge bases (Pearl, 1988). The key benefit of this modeling approach is that it makes it possible to construct large, robust knowledge bases without explicitly specifying the relationships and dependencies between all possible combinations of variables. Further, BNs provide a capacity for nonmonotonic reasoning: Multiple pieces of evidence that support or refute a given hypothesis can be applied to a BN in an arbitrary order, depending on the sequence in which they are received.

The purpose of the second function is to use physiological (and other) measurements to instantaneously infer the pilot's mental state. By pilot state, we mean a set of metrics that serve to define the pilot's information-processing burden, mental
workload, and level of engagement in a set of tasks. To derive a judgment of pilot state, a need exists for unobtrusive, automatic, and continuous estimation of the pilot's mental workload. There are three broad categories of workload measurement techniques: (1) subjective procedures, based on operator judgments of task workload; (2) performance-based techniques, based on the operator's ability to perform tasks; and (3) physiological techniques that interpret the operator's physiological response to a task. Wierwille and Eggemeier (1993) suggest that it is often desirable to measure multiple workload indicators, because a single technique may not always provide an accurate indication of operator loading.

The use of pilot workload measures in conjunction with estimates of the current tactical situation offers considerable potential to adapt the display to enhance overall awareness and manage pilot workload. This brings us to the third function of an envisioned interface, the PVI adaptation strategy. For example, if a workload assessor determines that the pilot's visual channel is saturated (or if the pilot's line of sight is directed out the window), a high-urgency display element that would nominally be presented on a visual display (e.g., approach of a g limit or radar lock-on) could be presented via an audio alert or force feedback in the control stick. Further, the PVI could prioritize and filter visual display components to present only the highest priority information for the current tactical situation to alleviate the pilot's visual search and make the crucial information readily available. Another possibility is to have the PVI control the allocation of tasks between the human (the manual element) and the system (the automated element) as a function of the pilot's level of engagement in the task (Pope, Bogart, & Bartolome, 1995) or his mental workload (Haas, 1995). In all cases, the PVI adaptation strategy should be founded on a coherent model of human capabilities so that the pilot–vehicle system can operate as effectively as possible. If the adaptation is performed in an ad hoc manner that does not take into consideration human limitations and the pilot's information needs for accurate situation assessment, the effect may be to degrade pilot performance and mission effectiveness. (A schematic sketch of this kind of modality-selection rule follows the chapter overview below.)

This chapter contains eight additional sections that describe the prototype adaptive pilot–vehicle interface. We first provide background information on tactical situation assessment and decision-making behavior, as well as brief overviews of situation awareness modeling and workload estimation. We then present the functional design of the adaptive PVI. The following two sections outline the general requirements for an online tactical situation assessor, as well as the specific implementation developed in this study. We then describe the pilot state estimator, which derives an estimate of pilot workload. The subsequent section describes the operation of the PVI adaptation module, which uses the pilot state estimate and the assessed situation to drive the content, format, and modality of the adaptive PVI. We then describe the prototype implementation and preliminary evaluation efforts, and provide some summarizing comments.
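The following fragment is a hedged illustration of the adaptation strategy described above: routing a high-urgency cue away from a saturated visual channel. The function name, the workload threshold, and the set of available channels are purely illustrative assumptions, not the adaptation logic actually developed in this chapter.

```python
def select_alert_modality(urgency: str,
                          visual_load: float,
                          eyes_out_of_cockpit: bool) -> str:
    """Choose a presentation channel for a cockpit alert.

    urgency: "high" or "routine"; visual_load: 0..1 estimate of visual
    channel loading. The 0.8 threshold is an illustrative placeholder.
    """
    if urgency == "high" and (visual_load > 0.8 or eyes_out_of_cockpit):
        # Visual channel saturated or line of sight directed out the window:
        # use an audio alert, backed up by force feedback in the control stick.
        return "audio alert + stick force feedback"
    if urgency == "high":
        return "head-up display"
    # Routine information stays on head-down displays, filtered by priority.
    return "head-down display"

# Example: approach of a g limit while the pilot is looking out the window
print(select_alert_modality("high", visual_load=0.5, eyes_out_of_cockpit=True))
# -> audio alert + stick force feedback
```

In a fielded system such a rule would of course be grounded in a validated model of human capabilities rather than fixed thresholds, as the preceding paragraph emphasizes.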
BACKGROUND

Tactical Situation Assessment and Decision-Making Behavior

Human performance in decision making in general, and in tactical planning in particular, has been studied extensively by psychologists and human factors researchers, primarily through empirical studies in the field but increasingly so with computational modeling tools. These studies span the theoretical-to-applied spectrum and cover many domains. Many aspects of human performance have been studied. Endsley (1995a) and Adams, Tenney, and Pew (1995a) discuss a psychological model of decision making, focusing in particular on situation awareness and the impact of particular system characteristics on operator workload, attention, and memory requirements, and the likelihood of errors. Klein (1989a) has studied a particular type of decision making predicated on the quick extraction of salient cues from a complex environment and a mapping of these cues to a set of procedures. Research indicates that such recognition-primed decision making plays a major role in tactical planning, and it is therefore critical for decision-aiding systems to recognize this mode of human information processing and support it through appropriate display design. Studies have been conducted investigating reasoning styles and comparing analytical and intuitive cognitive styles in expert decision making (Hammond, Hamm, Grassia, & Pearson, 1987). Results indicate that particular attributes of tasks (e.g., number of redundant cues, complexity of the situation, and degree of perceptual vs. abstract and objective task elements) induce an automatic mode of processing underlying intuitive judgment. Such results are particularly relevant for tactical visualization, where complex combinations of intuitive and analytical judgments and decision making are common in assessing the situation.

In the battlefield management and tactical planning domain, a number of studies of human performance have been conducted. Several findings stand out in their relevance to tactical situation visualization and analysis. Several types of biases in tactical SA have been identified (Fallesen, 1993; Tolcott, Marvin, & Lehner, 1989) that contribute to the inadequate development of tactical alternatives or to the selection of an inappropriate final course of action (COA). Of particular importance is the primacy bias, that is, selecting an a priori option and then looking for confirming evidence and ignoring disconfirming evidence for that option. Another common bias is success orientation, that is, overconfidence in friendly plans and underestimation of possible enemy activities that could jeopardize projected friendly activities (Fallesen & Michel, 1991; Lussier, Solick, & Keene, 1992).

Empirical studies of tactical planning and decision making indicate that certain categories of failures are common, resulting in inadequate COA development and selection (Fallesen, 1993). Fallesen divides these failures into categories according to the stage of the COA development process (e.g., situation assessment, formulation of alternative COAs, comparison of these alternatives, and war gaming). For
each category, he then identifies the most critical factors that contribute to ineffective and nonoptimal performance. Examples of these factors are failure to use systematic comparison strategies for alternative COAs, failure to verify uncertain information, failure to develop adequate action–reaction trees due to inadequate war gaming, failure to consider all factors, failure to verify assumptions, failure to assess information quality, failure to interpret available information, and failure to make predictions for situation assessment. Other research indicates that knowledge of enemy activities is particularly critical but often neglected by tactical planners (Castro, Hicks, Ervin, & Halpin, 1992; Shaw & Powell, 1989).

A number of studies have been conducted focusing on the differences between expert and nonexpert performance. An experiment designed to determine differences in information usage by tactical planners indicated that 78% of the critical facts identified by experts were missed by nonexperts. The facts missed by nonexperts included timing information, actions of adjacent units, changes in boundaries, enemy activities, terrain constraints, mobility, engineering capabilities, and logistical loads (Fallesen et al., 1992). Another critical difference between experts and nonexperts is the use of uncertain information. Experts were more aware of uncertain assumptions and made explicit predictions of events that would confirm their expectations and thus confirm or disconfirm assumptions (Tolcott et al., 1989). A study of expert military tactical decision making (Deckert, Entin, Entin, MacMillan, & Serfaty, 1994) found that experts' performance differed along a number of dimensions, including awareness of enemy activities, learning from past mistakes, flexibility of planning, seeking of disconfirming evidence, deeper exploration of options, and better management of uncertain information.

Situation Awareness Models

A variety of situation awareness models have been hypothesized and developed by psychologists and human factors researchers, primarily through empirical studies in the field, but increasingly so with computational modeling tools. Because of the critical role of situation awareness in air combat, the U.S. Air Force has taken the lead in studying its measurement and trainability (Caretta, Perry, & Ree, 1994). Numerous studies have been conducted to develop SA models and metrics for air combat (Endsley, 1989, 1990, 1993; Fracker, 1990; Hartman & Secrist, 1991; Hartwood, Barnett, & Wickens, 1988; Klein, 1994; Stiffler, 1988; Zacharias, Miao, Illgen, & Yara, 1995). Situation awareness modeling can be roughly categorized into two efforts: descriptive model development and prescriptive (or computational) model development. Table 22.1 summarizes the major features of the two model classes and presents their relative advantages and disadvantages.

Descriptive models dominate the SA modeling effort, and most developed SA models belong to the descriptive group. Shown to accurately reflect actual pilot decision making, the descriptive models contribute to the recognition of SA's importance for air combat and of a need to improve subsystems for SA
TABLE 22.1
Features, Advantages, and Disadvantages of Prescriptive and Descriptive Situation Awareness Models

Descriptive models
  Features: Data driven; qualitative
  Advantages: Reflect actual SA process; capable of handling qualitative constraints
  Disadvantages: Lack of predictive capability; provide vague and nonextensible conclusions; do not support computational implementation

Prescriptive models
  Features: Assumption or theory driven; quantitative
  Advantages: Prescribe SA process; support computational implementation; support objective SA metric development
  Disadvantages: High development cost; limitations in applicability
enhancement (Endsley, 1989, 1990, 1993; Fracker, 1990; Hartman & Secrist, 1991; Hartwood et al., 1988; Spick, 1988; Stiffler, 1988). Endsley (1995a) presents a descriptive model of situation awareness in a generic dynamic decision-making environment, depicting the relevant factors and underlying mechanisms affecting situation awareness. The relationship between SA and numerous individual and environmental factors is explored. Among these factors, attention and working memory are considered the critical factors limiting effective SA. Mental model and goal-directed behavior are hypothesized as important mechanisms for overcoming these limits. Although the descriptive models are capable of identifying the dependent relationships between subsystem modifications and SA enhancements, they do not support a quantitative evaluation of such relationships when no empirical data is available. There currently exists no descriptive model that has been developed into a computational model for actual emulation of pilot decision-making behavior in real-time simulation studies.

In contrast to the situation with descriptive models, few prescriptive models of SA have been proposed or developed. Unlike descriptive models, prescriptive models do support computational implementation, and they are amenable to the application contemplated here. Early attempts used production rules (Baron, Zacharias, Muralidharan, & Lancraft, 1980; Milgram, van der Wijngaart, Veerbeek, Bleeker, & Fokker, 1984). In these efforts, the SA model was developed as a forward-chaining production rule (PR) system where a situation is assessed using the rule "if a set of events E occurs, then the situation is S." There are several serious
shortcomings associated with this approach. First, the format of the PR approach, events → situation, is opposite to that of SA as we know it. In an SA problem, we expect beforehand that if the situation is S, then E (a set of event cues associated with S) should occur. After we detect these events, we then attempt to reassess S based on our understanding of the situation–event relations. In other words, the SA process is a diagnostic process (from effects to possible reasons), instead of a deductive reasoning process. Unless the correspondence between situation and event is strictly one-to-one, we simply cannot deduce a situation from events by using the PR approach of events → situation. The most troubling aspect of the PR approach is its essential lack of long-term memory or an internal mental model. The best that can be done with this approach is to model the assessment of a situation using current event cues, whereas in the real world, effective SA makes use of event histories and event cue information.

Recognizing that SA is fundamentally a diagnostic reasoning process, Zacharias, Miao, and Riley (1992) and Zacharias, Miao, Kalkan, and Kao (1994a, 1994b) used belief networks (Pearl, 1988) to develop prescriptive SA models for two widely different domains: counter-air operations and nuclear power plant diagnostic monitoring. Both efforts modeled situation assessment as an integrated inferential diagnostic process in which situations are considered as hypothesized reasons, events as effects, and sensory (and sensor) data as symptoms (detected effects). SA starts with the detection of event occurrences. After the events are detected, their likelihood (belief) impacts on the situations are evaluated by backward tracing the situation–event relation (diagnostic reasoning) using Bayesian logic. The updated situation likelihood assessments then drive the projection of future event occurrences by forward inferencing along the situation–event relation (inferential reasoning) to guide the next step of event detection. This technique was later used to model pilot situation awareness under free-flight air traffic management (Harper, Mulgund, Zacharias, & Kuchar, 1998).

Physiological Measurement of Workload

There have been numerous studies finding correlations between heart rate (HR) and workload (Harris, Bonadies, & Comstock, 1989; Speyer, Fort, Foulliet, & Blomberg, 1987). However, there have also been studies that failed to find systematic relations between HR and mental workload. In contrast, studies of heart rate variability (HRV, also called sinus arrhythmia), measured in either the time domain or the frequency domain, have shown more consistent relations with workload (Horst, Ruchkin, & Munson, 1987; Kalsbeek, 1971). HRV is measured as the variability in the R-R interval over time. More recently, frequency analysis of the electrocardiogram (ECG) has been applied. The spectrum can be naturally decomposed into three bands associated with different physiological control mechanisms. The middle band (centered at 0.10 Hz, i.e., the vagal band) has been found to decrease in power with increased mental effort (Aasman, Wijers, Mulder, & Mulder, 1988;
Egelund, 1982; Vicente, Thornton, & Moray, 1987). Additional reports showing better workload prediction from measures of HRV than from HR are those of Byrne, Kathi, and Parasuraman (1995) and Backs, Wilson, and Hankins (1995).

There have also been numerous studies finding correlations between various EEG spectral bands and mental workload indices. In general, mental states corresponding to alertness and increased mental workload have been associated with lower power in the middle EEG frequency bands—that is, theta (4–7 Hz) and alpha (8–13 Hz)—whereas relaxed or low-workload states have been associated with higher power in those bands and lower power in the higher frequency beta band (14–25 Hz). Numerous studies have reported significant negative correlations between mental workload and power in the alpha band (Gale, 1987; Gevins & Schaffer, 1980; Natani & Gomer, 1981; Sirevaag, Kramer, DeJong, & Mecklinger, 1988; Sterman, Mann, & Kaiser, 1992). There is somewhat less support for a decrease in theta power with increased workload (Natani & Gomer, 1981; Sirevaag et al., 1988). More recently, Backs, Ryan, Wilson, and Swain (1995) have found psychophysiological change—specifically, reduced alpha power—occurring prior to performance decrement in a monitored task and therefore suggest the predictive utility of using such signals to measure mental workload.

Eye blink rate and duration have been found to correlate with workload, especially visual workload. Studies showing this are Wilson, Purvis, Skelly, Fullencamp, and Davis (1987) and Holland and Tarlow (1972). Kramer (1991) indicates that blink duration studies have been more consistent on average, with blink duration decreasing as workload increases (Sirevaag, Kramer, DeJong, & Mecklinger, 1988; Stern & Skelly, 1984).

Finally, respiration rate has also been found to correlate with workload level. Pettyjohn, McNeil, Akers, and Faber (1977) found that inspiratory minute volume (IMV) correlated with flight segment (for both fixed-wing aircraft and helicopters), and thus roughly with workload level. For the fixed-wing case, they found that IMV increased during those flight segments normally associated with higher workload and stress (takeoff and landing). Lindholm, Chatham, Koriath, and Longridge (1984) and Opmeer and Krol (1973) also found respiration rate to be positively correlated with workload.

Functional Design of Adaptive Pilot–Vehicle Interface

We now present the design for our adaptive pilot–vehicle interface for tactical aircraft cockpits. We first provide a functional block diagram of the system and describe its overall operation. The system's key components are then described in detail.

Figure 22.1 shows a block diagram of the adaptive pilot–vehicle interface architecture, shown within the context of an overall pilot–vehicle system. The dotted lines indicate the scope of the adaptive interface. Our preliminary effort focused on developing a conceptual framework for creating the adaptive interface concept, and creating a limited-scope prototype that demonstrates each principal component.
[Figure 22.1 depicts the pilot state estimator (driven by physiological measurements), the situation assessor (driven by avionics subsystem outputs and pilot commands), and the PVI adaptation module, which together drive the pilot–vehicle interface; the pilot also directly observes the external environment (terrain, weather, enemy aircraft).]

FIG. 22.1. Functional block diagram of an adaptive pilot–vehicle interface.
The overall architecture consists of three distinct but tightly coupled modules: (1) an online computational situation assessor that generates a "picture" of the tactical situation facing the pilot, used to determine the pilot's information needs for accurate SA; (2) a pilot state estimator, which uses physiological signals and other measures to generate an estimate of pilot mental state (workload, level of task engagement, etc.); and (3) the PVI adaptation module, which uses the assessed situation to determine the pilot's tactically relevant information requirements, and the pilot state estimate to determine the most appropriate modality and display format for conveying that information to the pilot.

This system architecture provides two key information streams to the PVI adaptation module to drive the adaptive interface's behavior: (1) a content path, driven by the tactical situation assessor, and (2) a format path, which uses the estimate of pilot state to determine the most appropriate modality for conveying the required information to the pilot. In essence, the content path decides what information to make readily available to the pilot, and the format path decides how to do it.

The pilot interacts with the aircraft using a number of modalities. Graphical displays may be in the form of head-down displays, head-up displays, or helmet-mounted displays. The PVI provides auditory alerts in the form of tones or synthesized speech, using localized 3-D or nonlocalized audio as appropriate. The latter has shown considerable promise in supporting the target detection process (McKinley et al., 1995). Speech recognition technology may make it possible for the pilot to command system modes and content verbally. The pilot operates the aircraft via manual control inputs, and he may receive tactile feedback from the
controls via (for example) a control loading system. The following three sections describe the system's key components in detail.

SITUATION ASSESSMENT USING BELIEF NETWORKS

A key component of the adaptive PVI architecture is the situation assessment module, which uses aircraft information system outputs to generate a high-level interpretation of the tactical situation facing the pilot, from which it determines the pilot's information needs for accurate situation awareness. The adaptive pilot–vehicle interface design presented here relies on Bayesian belief networks (Pearl, 1988) for probabilistic reasoning in the presence of uncertainty. Any robust computational model of situation assessment requires a technology that has: (1) a capability to quantitatively represent the key SA concepts such as situations, events, and the pilot's mental model; (2) a mechanism to reflect both diagnostic and inferential reasoning; and (3) an ability to deal with various levels and types of uncertainties, because most real-world systems of any complexity involve uncertainty. Russell and Norvig (1995) cite three principal reasons for this uncertainty:
- Theoretical ignorance: All models of physical systems are necessarily approximations.
- Laziness: Truly exceptionless rules require numerous antecedents and consequents (cf. the "frame problem," McCarthy & Hayes, 1969) and are therefore computationally intractable.
- Practical ignorance: Even if all rules are known, there may be insufficient time to measure all properties of the particular objects that must be reasoned over.
Through many tactical modeling efforts, we have found that belief network technology is an ideal tool for meeting these requirements and modeling (quantifying) SA behavior. The principal advantages of belief networks over other uncertain reasoning methods are:
- Its probability estimates are guaranteed to be consistent with probability theory. This stems from its Bayesian update procedure's strict derivation from the axioms of probability.
- It is computationally tractable. Its efficiency stems principally from the exploitation of conditional independence relationships over the domain. This will be explained shortly.
- The structure of a BN captures the cause-and-effect relationships that exist among the variables of the domain. The ease of causal interpretation in BN models typically makes them easier to construct, thus minimizing knowledge engineering costs.
[Figure 22.2 depicts a small diagnostic belief network whose nodes include worn piston rings (WPT), cracked gasket, loose belt, oil spill, excessive oil consumption (EOC), oil leak, greasy engine block, low oil level (LOL), clean exhaust (CE), battery power, oil gauge, and radio.]

FIG. 22.2. Simple example of a belief network (adapted from Druzdzel, 1996).
- The BN formalism supports many reasoning modes: causal reasoning from causes to effects, diagnostic reasoning from effects to causes, mixed causal and diagnostic reasoning, and intercausal reasoning. Intercausal reasoning refers to the situation in which a model contains two potential causes for a given effect. If we gain evidence that one of the possible causes is very likely, this reduces the likelihood of the other cause. Russell and Norvig (1995) assert that no other uncertain reasoning formalism supports this range of reasoning modes.

Figure 22.2 illustrates an example BN model of a simple automotive diagnostic process based on a system of binary (true/false) variables. Each variable, x, has an associated probability distribution or belief vector—denoted BEL(x)—which indicates the model's current relative belief for each of x's possible values. Thus, BEL(WPT) might be initially set to T = 0.01, F = 0.99, reflecting the low prior likelihood of worn piston rings. A central feature of the BN formalism is that BEL(x) is decomposed as a product of the total causal evidence at x, π(x), which comes from the set of nodes leading to x (i.e., x's parents), and the total diagnostic evidence at x, λ(x), which comes from the set of nodes to which x leads (i.e., its children), as in the following equation. [However, note that root nodes (i.e., nodes with no parents; there may be more than one) are special cases in that they require some initial estimate of their π(x) vectors.]

BEL(x) = π(x)λ(x)

Each variable (i.e., node) also has an associated conditional probability table (CPT) that describes the probabilistic dependencies between it and its parents. In
other words, the CPTs are local joint probability distributions involving subsets of the whole domain. For example, if a variable, x, is 4-valued and has one parent variable, y, which is 3-valued, then x's CPT can be represented as a 3 × 4 table where the (i, j)th entry is p(x_j | y_i). The CPT for variable EOC, shown in Fig. 22.2, captures the correlation between worn piston rings and the tendency to burn oil.

Belief vectors generally change as new evidence regarding any of the variables is added to the network. Thus, if we obtain new evidence of excessive oil consumption, our initial belief about the likelihood of worn piston rings, represented by BEL(WPT) = (T = 0.01, F = 0.99), should be revised accordingly, for example, to T = 0.6, F = 0.4. This is an example of diagnostic reasoning from effects back to possible causes. Formally, it is accomplished by multiplying λ(EOC) by the transpose of the CPT characterizing the link between WPT and EOC. This new evidence should also cause us to revise BEL(CE) to reflect a lower probability that the exhaust is clean, for example, T = 0.01, F = 0.99. This is an example of causal reasoning, from causes to effects. Formally, it is accomplished by multiplying π(EOC) by the CPT relating EOC and CE.

Belief networks provide the capability and flexibility of modeling situation assessment with its full richness without arbitrary restrictions. Belief networks provide several advantages over other modeling approaches. They provide a comprehensible picture of the SA problem by indicating the dependent relationships among the variables, both high level (symbolic) and low level (numeric), relevant to the SA problem. This provides a clearer view (than, for instance, a low-level neural network-based approach would) of how each individual piece of evidence affects the high-level situation characterization. They allow the incremental addition of evidence, concerning any of the domain variables, as it arrives, thus allowing for real-time SA update. The method is consistent with the axioms of probability. Finally, the graphical representation in which the global CPT is decomposed into a set of smaller local CPTs, each describing the interactions of a small subset of the domain variables, renders probabilistic inferencing tractable even for large domains such as tactical situation assessment. The belief network formalism is thus a very natural choice for modeling the SA process, and it is used in the adaptive PVI architecture to capture the essential tactical variables that drive the pilot's information needs.
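The update just described can be reproduced in a few lines. The sketch below is illustrative only; the CPT values are assumed for the example rather than taken from Druzdzel (1996). It applies the BEL(x) = π(x)λ(x) decomposition to the WPT, EOC, and CE variables: observing excessive oil consumption raises the belief in worn piston rings through the diagnostic (λ) path and lowers the expected probability of a clean exhaust through the causal (π) path.

```python
import numpy as np

# States ordered as (True, False) throughout.
pi_wpt = np.array([0.01, 0.99])               # prior belief in worn piston rings

# Illustrative CPTs (assumed values): row = parent state, column = child state.
cpt_eoc_given_wpt = np.array([[0.70, 0.30],   # P(EOC | WPT = True)
                              [0.05, 0.95]])  # P(EOC | WPT = False)
cpt_ce_given_eoc = np.array([[0.01, 0.99],    # P(CE | EOC = True)
                             [0.90, 0.10]])   # P(CE | EOC = False)

# Hard evidence: excessive oil consumption is observed (EOC = True).
lambda_eoc = np.array([1.0, 0.0])

# Diagnostic step: propagate the evidence back through the CPT to WPT,
# then combine with the prior and renormalize: BEL(x) ~ pi(x) * lambda(x).
lambda_wpt = cpt_eoc_given_wpt @ lambda_eoc
bel_wpt = pi_wpt * lambda_wpt
bel_wpt /= bel_wpt.sum()
print("BEL(WPT) =", bel_wpt.round(3))   # belief in worn rings rises from 0.01 to ~0.12

# Causal step: with EOC known to be True, project forward to clean exhaust (CE).
bel_eoc = lambda_eoc                     # observed node
bel_ce = bel_eoc @ cpt_ce_given_eoc
print("BEL(CE)  =", bel_ce.round(3))     # probability of a clean exhaust drops to 0.01
```

With these particular (assumed) numbers the revised belief in worn piston rings differs from the value quoted in the text, but the qualitative behavior, a sharp increase in the cause's likelihood and a matching drop in the predicted clean-exhaust probability, is the same.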
TACTICAL SITUATION ASSESSMENT EMBEDDED IN THE ADAPTIVE PILOT–VEHICLE INTERFACE

The adaptive PVI prototype uses BNs to model the pilot's threat assessment process during the defensive reaction segment of a battlefield air interdiction mission, whose goal is to destroy a bridge behind the forward edge of the battle area before enemy ground troops can cross it. Table 22.2 describes the principal phases of this mission, along with the principal piloting tasks during each mission phase. During the defensive reaction mission segment, the pilot must make a judgment
TABLE 22.2
Key Mission Phases in the Battlefield Air Interdiction Mission

Mission Phase         Key Piloting Activities
Ingress               Initial entry into mission
Defensive reaction    Encounter with threat aircraft
Target acquisition    Setup for attacking bridge behind enemy lines
Weapons employment    Bomb delivery within specified time window
Egress                Exit back to friendly territory
as to the potential threat to ownship posed by an aircraft detected by the onboard sensors in the overall context of the mission and its rules of engagement. This assessment will support the pilot's decision to attack, avoid, or defend against the detected contact. Because the mission's overall objective is to destroy the bridge within a specified time window, the pilot should only engage enemy aircraft if necessary.

Figure 22.3 illustrates the belief network developed to support the pilot's threat assessment, constructed through consultation with an air combat pilot who served as a domain expert for this development effort. The network quantizes threat potential into one of four categories: high, medium, low, or none. The rectangular nodes denote information that is derived directly from sensor measurements, whereas the oval nodes denote hypotheses that are computed in accordance with the axioms of probability from this sensor data, using the information stored in the network's conditional probability tables. The threat potential depends directly on the type of sensor contact (transport, bomber, fighter, or missile), its aspect angle (low, medium, or high), its location with respect to our estimate of its threat envelope (out of range, within range, inside minimum range, or within gun range), and its maneuvering actions (turning toward ownship, turning away, or nonmaneuvering). The type and aspect hypotheses are integrated into a single supernode called type and aspect because fighter aircraft and missiles are a greater threat when pointing directly at ownship, whereas bombers often have tail guns and therefore are of greater concern when pointed directly away. As such, the effect of aspect angle on the threat hypothesis depends on the type of vehicle, making it appropriate to integrate the two variables. The type hypothesis depends on the following: (1) the output of ownship's noncooperative target recognition system; (2) the radar type classification generated by the radar warning receiver (RWR), if any; (3) the contact's altitude; and (4) the contact's speed. Contact altitude, speed, and aspect angle are derived using radar system outputs and ownship air-data systems.
FIG. 22.3. Belief network for air-to-air threat assessment.
In the course of a simulation, multiple copies of this network are instantiated (a capability afforded by object-oriented software development), one for each sensor contact. The threat hypothesis generated by each network drives the radar display symbology so that display appearance correlates with the hypothesized threat assessment. The intent of this adaptation is to tie display symbology to the situation, assisting the pilot in assessing which detected contacts pose the greatest hazard to mission completion. In the future, this network will be enhanced by adding other evidence components (e.g., enemy radar lock-ons or high-level intelligence data delivered by friendly AWACS aircraft, which can be introduced directly into high-level situation nodes) and by refining the conditional probability tables that link the various nodes. Belief network models will also be developed for other tactical scenarios.

Another key attribute of the tactical situation assessor is that it employs fuzzy logic to derive the low-level data that feed the network (shown as the rectangular nodes in Fig. 22.3). In the BN formalism, continuous readings must be quantized into one of a finite set of descriptions; for example, threat speed is characterized as slow (Mach 0–0.8), fast (Mach 0.8–1.9), or very fast (Mach 1.9 and beyond). However, it is intuitively apparent that there is no meaningful difference between a speed of Mach 0.799 and Mach 0.801. Unfortunately, a BN that uses "hard" boundaries between one category and the next would classify these values as slow and fast, respectively, which might lead to different conclusions regarding vehicle type.
FIG. 22.4. Fuzzy and nonfuzzy classification of threat speed: (a) nonfuzzy classification; (b) fuzzy classification. Both panels plot degree of membership against Mach number.
Further, if a contact is accelerating or decelerating smoothly, a transition from slow to fast would produce discontinuous jumps in BN output, which may have undesirable effects on the high-level threat assessments and the subsequent information display. It is reasonable to expect that smooth inputs should lead to smooth outputs, and that BNs should be tolerant of noise in the input signals.

Fuzzy logic (FL; Zadeh, 1973), a technique that seeks to model the uncertainty and imprecision in human decision making, provides one solution to this problem. Zadeh introduced the notion of multivalued or fuzzy sets, each defined by a membership function that measures the degree to which a given element belongs to the set. FL also provides a means of transforming linguistic variables, such as slow or fast, into the precise numerical quantities needed by a computer. For example, Fig. 22.4a illustrates the speed classifications previously described using classical, nonfuzzy sets; a given speed measurement may fall into only one of the three categories shown. By contrast, Fig. 22.4b shows three fuzzy sets for the classifications slow, fast, and very fast. A given speed measurement is characterized by its degree of membership in each of these fuzzy sets. For example, a speed of Mach 0.8 has a 0.5 degree of membership in the fuzzy set slow and a 0.5 degree of membership in the fuzzy set fast. As speed increases or decreases, these degrees of membership change in a continuous manner. This fuzzy characterization has the following benefits:
• The degrees of fuzzy membership are tolerant of noise in the speed measurement.
• Similar inputs produce similar outputs, maintaining a smooth, nonjerky response from the BN to time-varying input signals. This results in situation assessments that do not change suddenly but vary in a manner that is proportional to the change in the BN inputs.
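A minimal sketch of this fuzzy classification follows. The Mach 0.8 and Mach 1.9 category boundaries come from the text; the trapezoidal shape and the 0.2-Mach overlap width are illustrative assumptions rather than the exact membership functions of Fig. 22.4b.

```python
# Trapezoidal fuzzy sets for threat speed, with hypothetical overlap widths.
def trapezoid(x, left_foot, left_shoulder, right_shoulder, right_foot):
    """Degree of membership in a trapezoidal fuzzy set."""
    if x <= left_foot or x >= right_foot:
        return 0.0
    if left_shoulder <= x <= right_shoulder:
        return 1.0
    if x < left_shoulder:
        return (x - left_foot) / (left_shoulder - left_foot)
    return (right_foot - x) / (right_foot - right_shoulder)

def classify_speed(mach):
    """Degrees of membership in the slow, fast, and very fast fuzzy sets."""
    return {
        "slow":      trapezoid(mach, -1.0, 0.0, 0.7, 0.9),
        "fast":      trapezoid(mach, 0.7, 0.9, 1.8, 2.0),
        "very_fast": trapezoid(mach, 1.8, 2.0, 10.0, 11.0),
    }

print(classify_speed(0.8))   # roughly 0.5 slow and 0.5 fast, as in the text
```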
PILOT STATE ESTIMATOR

Requirements for the Pilot State Estimator

As indicated in Fig. 22.1, the combined output of the aircraft's information systems provides one important source of information relevant to PVI configuration. Another potentially important source of information is the pilot's mental state. By pilot state, we mean a set of metrics that serve to define the pilot's information-processing burden, mental workload, and level of engagement in a set of tasks. To properly incorporate a judgment of pilot state into a PVI management strategy, a need exists for unobtrusive, automatic, and continuous estimation of the pilot's mental workload. In the course of this research, we explored the feasibility of using belief networks for modeling pilot workload.

Workload measurement techniques can be grouped into three broad categories:

1. Subjective techniques, which are based on operator judgments of task workload
2. Performance-based techniques, which assess workload based on the operator's ability to perform tasks
3. Physiological techniques, which assess workload based on physiological measures, such as heart rate, eye blink rate, EEG, and many others.

There are several established subjective techniques, for example, the modified Cooper-Harper Scale (Wierwille & Casali, 1983), the Subjective Workload Assessment Technique (Reid & Nygren, 1988), and the NASA Task Load Index (Hart & Staveland, 1988). These techniques typically take the form of questionnaires that the subject completes either during or after the task of interest. Their most significant advantage is the apparent directness of the relationship between the measured variables and the actual information-processing demands of the task or situation: pilots are directly asked to rate the level of difficulty of various tasks or flight phases, so no theory relating other measures, such as response time (a performance metric) or heart rate variability (a physiological metric), to workload needs to be constructed. Their principal disadvantage is that they are by their nature highly obtrusive and are therefore not appropriate for online mental workload estimation. Rather, these techniques are best utilized offline to aid in the PVI design process.

In contrast, performance measures and physiological measures can both be gathered unobtrusively in real time and used to control the PVI configuration. However, some theory of the relationship between these types of measures and mental workload is needed. Performance-based measures can be grouped into two categories: those measuring some aspect of performance on a primary task (e.g., reaction time or tracking accuracy while targeting an enemy plane) and those measuring some aspect of performance on a secondary task.
Although various studies (Casali & Wierwille, 1983; Eggemeier & Wilson, 1991) have demonstrated a reliable relationship between various primary task measures and workload, a caveat to their usage is that at low-to-moderate workload levels, humans can maintain a given level of performance through increased effort, thus rendering the measure insensitive. This potential problem provides a principal motivation for using secondary task performance as an index of mental workload. Secondary tasks may be realistic for the piloting environment (e.g., monitoring telemetry) or unrealistic (e.g., performing mental addition or a memory task). The standard premise of secondary task measurement is that there is some unitary pool of capacity and that the secondary task will be accomplished with the spare capacity not used by the primary task. Thus, degradation of secondary task performance indicates less available spare capacity and therefore higher primary task workload (Eggemeier & Wilson, 1991).

In an effort to model the data more closely, and in particular to explain certain insensitivities between various measures and workload, Wickens and his colleagues (Wickens, 1991; Wickens, Sandry, & Vidulich, 1983) developed a more general, multiple resource model in which the input, output, and central processing capabilities of a human consist of parallel channels. If the secondary task taps a coupling of input, central, and output channels that is disjoint from the primary task, then performance of the secondary task will be largely insensitive to manipulations of workload on the primary task. Typical secondary tasks include reaction time, mental arithmetic, and estimation of elapsed time intervals. The general finding is that secondary task measures often show sensitivity to primary task load even when direct primary task performance measures remain relatively invariant (Bortolussi, Kantowitz, & Hart, 1986; Casali & Wierwille, 1983; Kantowitz, Hart, Bortolussi, Shively, & Kantowitz, 1984). This suggests that a combined vector of both primary and secondary measures would provide independent evidence regarding mental workload. Although it is not reasonable to impose additional, irrelevant secondary tasks in the actual cockpit environment, that environment is inherently multitask, and multiple subsidiary tasks will generally be present for monitoring.

Physiological measures are the class of measures least directly related to mental workload, but they have a number of advantages that make them increasingly attractive, at least in conjunction with performance-based measures. As Kramer (1991) describes:
• They are the least obtrusive of all the classes of measures.
• They can be measured continuously and are thus potentially sensitive to phasic shifts in mental processing.
• They do not require overt performance.
• They are objective measures and are sensitive to unconscious aspects of cognitive processing.

The general strategy of experimental research in this domain is to demonstrate correlation between measures in one category and measures in another. For example, Wilson, Fullencamp, and Davis (1994) found correlations between subjective
measures of task difficulty and four physiological measures—evoked cortical potentials, heart rate, blink rate, and respiration rate. Although numerous pairwise correlations have been found repeatedly, many authors have suggested that finer discriminations of mental workload could be obtained by combining many measures (Kantowitz & Casper, 1988; Spyker, Stackhouse, Khalafalla, & McLane, 1971; Wierwille & Eggemeier, 1993; Wilson et al., 1994). Accordingly, one of the goals of this research was to assess the feasibility of combining multiple measures to arrive at an assessment of overall workload. Given the need for a consistent mechanism for combining evidence of varying reliability concerning multiple variables at various hierarchical levels, we used the belief network formalism in the preliminary modeling effort.
Belief Network Implementation of Pilot State Estimation Module

We have developed a preliminary topology for a comprehensive model of pilot workload using belief networks. Figure 22.5 depicts a candidate BN capturing the hypothesized dependence relationships among some of the variables, hidden and observable, relevant to workload assessment. The links represent direct conditional dependencies between the variables.
FIG. 22.5. Candidate Bayesian network model of human workload.
This network asserts that certain observable measures (for example, the response time on any of various tasks in which the pilot might be concurrently involved, and tracking error if the pilot is currently engaged in some tracking task) depend directly only on the state of certain hidden brain structures and mechanisms that determine the pilot's level of performance. Similarly, it asserts that certain other observable measures, for example, eye blink rate and blink closure duration, depend directly only on the overall visual workload of the current environment (e.g., as defined by the number of visual targets present and the number of instruments to which the pilot must concurrently attend in order to make some control decision). Yet evidential knowledge concerning any of these variables affects, according to the mathematical framework of Bayesian inference, the probability distributions (i.e., belief vectors) of the remaining variables at all levels of the network.

The development of this candidate BN model of human workload was broken into three tasks:
• The identification of the relevant set of variables
• The determination of the qualitative causal relationships among these variables
• The determination of the quantitative causal relationships among these variables

As shown in Fig. 22.5, we identified the variable Mental Workload (MWL) as an important high-level descriptor for driving PVI adaptation. We made the simplifying assumption that MWL is determined primarily by factors (not shown in Fig. 22.5) external to the pilot, that is, by aspects of the tactical situation such as the blue/red player ratio and the threat situation.

The directed links in a BN mean "causes" or "influences." Thus, Fig. 22.5 asserts that the hypothetical construct MWL drives lower level mental state variables, including Cardiovascular Workload (CWL), respiration rate (i.e., Inspiratory Minute Volume, IMV), Visual Workload, and Central Nervous System Workload (CNSWL). Some of these lower level variables are themselves hypothetical constructs (i.e., hidden variables) that in turn drive the lowest level, directly observable variables, for example, Heart Rate (HR) and Heart Rate Variability (HRV). This qualitative (i.e., topological) structure captures the idea that elevated mental workload (for whatever reason) will cause elevated workload in one or more of the various systems actually called on to perform the work, that is, the heart, lungs, eyes, and brain, which will in turn cause increased HR, decreased HRV, decreased blink rate, and other symptoms.

Although Fig. 22.5 constitutes the overall design for combining various physiological and performance-based workload measures, during our prototype development we explicitly modeled only the physiological measures subtree, shown in expanded form in Fig. 22.6. This BN generates plausible belief measures throughout the network as a function of evidence presented to any particular variable. Thus, if evidence that eye blink rate has suddenly decreased significantly is presented to the BN, the belief that visual workload is high increases significantly.
FIG. 22.6. Belief network model of physiological workload.
In addition, the belief that overall mental workload is high also increases, reflecting the fact that a more complex visual task generally implies a more complex cognitive task as well.

We implemented this network along with our flight simulation model and provided the capacity for the user to simulate the low-level physiological measurements (heart rate, blink rate, etc.) using the keyboard. Each of the directed links in this network contains a conditional probability table (CPT) that specifies the conditional probability that the child node takes on a particular value, given that the parent variable takes on a specified value. In general, we have chosen to represent most of the variables in the BN as three-valued, for example, low, medium, or high.

Although there is a rather large literature on the estimation of mental workload from physiological indicators, these studies usually report only correlation coefficients, typically involve only a few of the physiological variables identified in Figs. 22.5 and 22.6, and they conflict with one another. Thus, in producing the CPTs of Fig. 22.6, it was necessary to scan a wide range of the literature and estimate an average strength of correlation for each pairwise relationship across studies. These limitations suggested the use of a small number of generic CPTs reflecting only the ordinal relationships among the strengths of influence from causes to effects.
For example, the CPT between MWL and CNSWL represents a very strong correlation (Gale, 1987; Gevins & Schaffer, 1980; Natani & Gomer, 1981; Sirevaag et al., 1988; Sterman, Mann, & Kaiser, 1992), the CPT between MWL and IMV represents a relatively weak correlation (Pettyjohn et al., 1977; Lindholm et al., 1984; Opmeer & Krol, 1973), and the CPT between CWL and HRV represents a fairly strong negative correlation (Horst, Ruchkin, & Munson, 1987; Kalsbeek, 1971).

A CPT contains more information than a correlation coefficient and thus imposes constraints on the shape of the probability distributions of the parent and child variables. In effect, this constitutes an addition of information to the model. For this reason, we make the minimal assumption of symmetric or nearly symmetric CPTs, to introduce as few artifacts as possible into the resulting distributions. The CPTs in Fig. 22.6 are intended to capture the approximate correlations, based on a wide sampling of the workload assessment literature, between various observable physiological signals and several hidden variables, including overall mental workload. The CPT connecting the variables mental workload and central nervous system (CNS) workload encodes a strong positive correlation between these two variables. The CPT connecting MWL and IMV encodes a much weaker but still positive correlation. The relative strengths of these two CPTs reflect the generalization that direct measures of cortical activity (e.g., power spectra) are much more reliable indicators of mental workload than respiratory measures, which are strongly influenced by noncognitive factors as well, for example, physical exertion and emotion. Given the wide variance of results in the literature and the wide range of experimental conditions, these CPTs should be understood as imposing primarily an ordinal scale of strengths over the various pairwise interactions of the domain. This is in line with the overall objective of our preliminary design effort, which was to assess the feasibility of the BN approach rather than to fit any particular subset of the data.

In our current development work, we are continuing to pursue this approach while enhancing the modeling fidelity by incorporating diagonal CPT matrices with increasing eigenvalues along the main diagonal to model a positive correlation coefficient and decreasing eigenvalues to model a negative correlation coefficient. This requires the empirical development of a mapping from correlation coefficients (which lie in the range −1 to 1) into sequences of eigenvalues. The best solution is for the BN parameters, in particular the CPT entries, to be learned by sampling the multivariate distribution. Although learning methods have been developed for the BN formalism (Pearl, 1988; Spiegelhalter, Dawid, Lauritzen, & Cowell, 1993), they clearly require access to the raw data. In the case of our physiological workload BN (Fig. 22.6), this would require simultaneous recording of the eight measures corresponding to the leaf nodes. Very few experiments of this scale have been performed (e.g., Spyker et al., 1971). As such raw data become available (using, for example, a physiological measurement system such as the Workload Assessment Monitor; Wilson, Fullencamp, & Davis, 1994), they can be used directly within our existing framework to learn the network parameters.
We are also making the following enhancements to the prototype design:
• The prototype pilot state estimator (PSE) module uses a one-dimensional model of mental workload. This is consistent with many studies in the literature. However, the multidimensional representation of workload in Wickens's (1984, 1991) multiple resource model would naturally map into a larger, and thus more task-specific, space of PVI configurations. Accordingly, one of the current research goals is to generalize the BN model to include additional hidden variables corresponding to hypothetical resources or channels (e.g., auditory channel load, verbal processing load, and spatial processing load).
• The prototype PSE predominantly uses variables not directly related to the central nervous system, in particular, the brain. However, given that the brain is the body's information-processing system and given our interest in information-processing parameters, we are structuring our current modeling effort so that it will naturally evolve toward having mostly CNS-related measures (e.g., EEG-related signals).
• Topologically, the prototype PSE BN is a simple tree. This implies a single root variable, which in our case is overall mental workload. This is currently being generalized to the polytree and directed acyclic graph (DAG) topologies to allow for a multidimensional representation of mental workload and a richer model of the causal interactions among the domain variables, both observable and hidden.
• The prototype PSE module has fixed CPTs that are predetermined through an offline knowledge engineering effort. We are adding learning capabilities (Pearl, 1988; Spiegelhalter et al., 1993) to the BN simulation, thus making the model more general and minimizing the need for knowledge engineering. Note that learning methods exist not only for estimating the CPT parameters of a fixed network topology but also for estimating the topology itself.
• The prototype PSE BN topology uses only physiological measurements for inferring physiological workload. It may be the case that task-dependent features affect the interpretation of such signals. We are seeking to identify any such relationships and modify the network topology to allow for such interdependencies.
PILOT–VEHICLE INTERFACE ADAPTATION MODULE

The final principal component of our adaptive PVI architecture is the PVI adaptation module, which uses the tactical situation assessment to drive interface content and the pilot state estimate to drive interface modality and format. One of the key objectives of this research was to develop a systematic approach and a formal means of studying this problem.
TABLE 22.3
Scale of Degrees of Automation

Degree   Type of Automation
1        The computer offers no assistance; human must do it all.
2        The computer offers a complete set of action alternatives, and
3        Narrows the selection down to a few, or
4        Suggests one, and
5        Executes that suggestion if the human approves, or
6        Allows the human a restricted time to veto before automatic execution, or
7        Executes automatically, then necessarily informs the human, or
8        Informs him after execution only if he asks, or
9        Informs him after execution if it, the computer, decides to.
10       The computer decides everything and acts autonomously, ignoring the human.

Note. Adapted from Sheridan, 1992.
In many respects, the development of adaptive interfaces is analogous to the development of automated systems. In building any automated system, the designer is faced with the problem of deciding how far to go with automation. To formalize this problem, Sheridan (1992) developed a scale quantifying the degree of automation, shown in Table 22.3. Each level of the scale assumes some previous ones (when ANDed) or imposes more restrictive constraints (when ORed). Each successive level precludes human intervention to a greater extent and introduces additional opportunities for machine error. This motivates careful consideration of what should be automated and what should be left to the human.

Inspired by Sheridan's scale of automation, we have developed a preliminary taxonomy of the degrees of adaptation of a human–machine interface, shown in Table 22.4. Each level corresponds to an increasing degree of computer-based control of the interface, just as Sheridan's taxonomy characterizes increasing computer-based assumption of tasks. At Level 1, no interface adaptation takes place; this corresponds to a conventional human–machine interface. At Level 2, the computer adapts graphic symbology in a manner that corresponds to the features of the situation. The intent of this adaptation level is to drive graphic (i.e., visual) interface content in a manner that directly supports situation assessment.

Broadly speaking, there are three levels to situation assessment: (1) perception of the essential elements of the situation, (2) comprehension of their meaning, and (3) projection of the situation into the future. For the purposes of establishing how to support SA, we can consider the converse problem: How do we mitigate SA errors?
TABLE 22.4
Scale of Levels of Interface Adaptation

Level   Type of Adaptation
1       No interface adaptation; the human controls all aspects of interface operation.
2       The computer adapts graphical symbology, and
3       Augments the display by modality, and
4       Manages interface mode and configuration, and
5       Automates specific operator tasks.
Endsley (1995b) has developed a taxonomy of SA errors at each of these three levels, as follows:

1. Level 1 SA: failure to correctly perceive the situation
   • Data not available
   • Data difficult to detect or perceive
   • Failure to observe or scan the data
   • Omission
   • Attention narrowing/distraction due to high task load
   • Misperception of data
   • Memory failure
2. Level 2 SA: failure to comprehend the situation
   • Lack of, or a poor, mental model
   • Incorrect mental model
   • Overreliance on default values in the mental model (i.e., relying on general expectations)
   • Memory failure
3. Level 3 SA: failure to project the situation into the future
   • Lack of, or a poor, mental model

Accordingly, the intent of our Level 2 adaptation is to reduce the chances of Level 1, 2, or 3 SA errors by configuring the graphic interface appropriately. Our interface adaptation Levels 3, 4, and 5 are intended to have the computer take on a greater degree of interface/system management, as dictated by pilot workload or task engagement. At Level 3, the computer augments graphical displays with multimodal elements (e.g., auditory or haptic displays) when a specific high-salience piece of information must be brought to the user's attention and the system has determined that the operator's visual workload is very high. At Level 4, the computer takes on the tasks of configuring display modes and settings in a manner that is appropriate for the situation. Finally, at Level 5, the computer offloads specific operator tasks and carries them out itself.
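To make the taxonomy concrete, the fragment below sketches one way the BN outputs could drive these adaptation levels. It is a hypothetical illustration: the thresholds, function names, and the specific policy are our own placeholders, not the prototype's logic.

```python
# Hypothetical sketch of selecting interface adaptation behaviors (Table 22.4 levels)
# from BN outputs. Thresholds and the decision policy are illustrative assumptions.
def select_adaptations(threat_beliefs, visual_workload_high, overall_workload_high):
    """threat_beliefs: one belief vector per contact, e.g. {"high": .4, "medium": .6,
    "low": 0.0, "none": 0.0}; the workload arguments are beliefs that workload is high."""
    actions = []
    for contact_id, bel in enumerate(threat_beliefs):
        # Level 2: symbology always reflects the hypothesized threat level.
        level = max(bel, key=bel.get)
        actions.append(("adapt_symbology", contact_id, level))
        # Level 3: add a multimodal (audio) cue only for high-salience information
        # when the estimated visual workload is very high.
        if level == "high" and visual_workload_high > 0.7:
            actions.append(("audio_alert", contact_id))
    # Levels 4 and 5: the computer manages interface configuration and offloads
    # specific tasks when overall workload is judged high.
    if overall_workload_high > 0.8:
        actions.append(("manage_display_modes",))
        actions.append(("automate_selected_tasks",))
    return actions
```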
PROTOTYPE IMPLEMENTATION AND EVALUATION

Adaptive PVI Implementation

The adaptive PVI prototype was developed and implemented on a Silicon Graphics workstation and integrated with a limited-scope flight simulator. The simulation model provided a rudimentary simulation of aircraft and missile dynamics, sensor models, and basic cockpit displays. Threat aircraft were programmed as autonomous agents, with predefined rule-based tactics for attacking ownship. This provided a simulated threat environment for evaluating the performance of the adaptive interface as it carried out threat assessments and situation-based adaptation during the defensive reaction segment of the air-to-ground attack mission. To implement the prototype adaptive PVI, we developed a single aircraft-specific example of each of our candidate adaptation levels, as follows:
• Level 1: basic air-to-air radar display.
• Level 2: graphical radar symbology that incorporates the BN-driven threat assessment, so that symbol coding is proportional to the threat hypothesis.
• Level 3: augmentation of the adaptive graphical radar symbology with audio alerts.
• Level 4: adaptive graphical radar symbology, audio alerts, and computer-based control of the radar scan pattern as a function of the tactical situation.
• Level 5: adaptive graphical radar symbology, audio alerts, computer-based control of the radar scan pattern as a function of the tactical situation, and computer-based release of chaff during missile evasion maneuvers.

Figure 22.7 presents a legend for our air-to-air radar symbol coding for Level 2 adaptation. This legend is based in part on the symbology used in the FITE simulator facility at Wright-Patterson Air Force Base (Haas, Hettinger, Nelson, & Shaw, 1995). The color and symbol coding is the same as that found in the FITE candidate radar display. To this we have added the threat potential coding: As the threat potential predicted by the threat assessment BN increases, the symbol's line thickness increases in proportion. For a high threat, the symbol is drawn completely filled in to give it the highest possible salience.

When the RWR detects a radar signal, an oval is superimposed on the display to indicate the estimated origin of the signal. The lower part of the figure shows the RWR oval superimposed on a radar icon. Again, the line thickness is proportional to the estimated threat potential. The size of the oval in the direction parallel to the radar line of sight indicates the uncertainty in range, whereas the size in the direction perpendicular to the line of sight (in the horizontal plane) indicates the uncertainty in azimuth. When the RWR detects a missile lock, the icon flashes to draw attention to the expected origin of the incoming missile. In the event that the RWR detects a radar signal at a range beyond the current radar range setting, an arrow appears at the boundary of the radar display to indicate the direction of the incoming radar signal.
FIG. 22.7. Air-to-air radar symbology. (Legend: color codes affiliation: green = friendly, blue = own team, red = hostile, yellow = unknown; symbol shape codes contact type: bomber, fighter, missile [for RWR alerts], or other [transports, unknown]; symbol line thickness is proportional to threat potential, with high-threat symbols drawn filled; an oval indicates the approximate location of an RWR-detected radar source, its size depicting range and azimuth uncertainty, and a flashing symbol indicates missile lock.)
The line thickness of this arrow is also proportional to the estimated threat potential.

For Level 3 PVI adaptation, we have incorporated the following audio alerts into the adaptive PVI prototype:
• An audio warning tone that sounds when the RWR detects a hostile radar signal at a range beyond the current radar display setting
• An audio alert tone that sounds when the RWR detects an enemy missile lock on ownship

Both audio alerts are correlated with visual icons that indicate the same information. Experimental research has shown that correlated bimodal alerting provides improved response latencies and enhanced subjective SA over unimodal alerting (Selcon, Taylor, & Shadrake, 1992). These alerts could be localized using 3-D audio so that the origin of the sound corresponds to the direction of the detected threat. There are at least two conceivable modes of operation for such bimodal alerting: (1) bimodal alerts are presented in all cases, or (2) bimodal alerts are presented only when pilot workload increases beyond a certain threshold and/or the density of information on the radar display rises beyond a certain level. The adaptive PVI prototype provides the capacity to select either mode of operation, with user-defined workload thresholds (using metrics compatible with the BN-derived estimate of pilot state).
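A minimal sketch of the second mode of operation, together with the Level 2 symbol coding rule, is shown below. The threshold values, function names, and thickness weights are illustrative assumptions; only the general rules (visual icon always, audio added under high workload or display density, line thickness growing with threat) come from the text.

```python
# Hypothetical sketch of workload-gated bimodal alerting and threat-coded symbology.
def render_alert(belief_workload_high, display_density,
                 workload_threshold=0.7, density_threshold=12):
    """Return the presentation channels for an alert under mode (2)."""
    channels = ["visual_icon"]                      # visual indication in all cases
    if (belief_workload_high > workload_threshold
            or display_density > density_threshold):
        channels.append("audio_tone")               # correlated bimodal presentation
    return channels

# Level 2 symbol coding: line thickness grows with the hypothesized threat level,
# and a high threat is drawn filled for maximum salience (weights are illustrative).
THICKNESS = {"none": 1, "low": 2, "medium": 3, "high": 4}
def symbol_style(threat_level):
    return {"line_thickness": THICKNESS[threat_level],
            "filled": threat_level == "high"}
```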
FIG. 22.8. Prototype of an adaptive radar display. (Annotations identify a fighter aircraft at medium threat, an incoming missile at high threat, the predicted future flight path, the current radar settings, the instantaneous radar scan pattern, and a medium-threat radar detected beyond the current display range.)
For Level 4 adaptation, the adaptive PVI prototype can automatically select an appropriate radar display range in the event that an enemy radar at or above a given threat threshold is detected at a range beyond the current display setting. Again, this automation can occur whenever the tactical situation and system configuration warrant it, or when the pilot workload rises beyond a certain threshold. Finally, for Level 5 adaptation, our prototype system can automatically release chaff to confuse enemy missile radar in the course of a missile evasion maneuver.

Figure 22.8 presents a snapshot of our integrated display during the course of a simulation. The blue markings are not part of the display itself; they are meant to show the meaning of the various symbology elements. At the instant shown, an incoming missile is approximately 7.5 nautical miles away from ownship, while the aircraft that fired it is veering to its left. At the same time, the RWR has detected a medium-threat fighter radar (as indicated by the F within the arrow) at a range greater than 20 nm (the current radar display setting).

Prototype Evaluation

Pilot State Estimator Evaluation

Figure 22.9 illustrates the dynamic evolution of the pilot state estimator BN as evidence accumulates over time. The figure shows how presentation of successive pieces of evidence, all of which are correlated with the hypothesis that physio workload (physiologically measured mental workload) is high, gradually increases the belief that workload is high.
FIG. 22.9. Example of a pilot state estimator evaluation.
(The nodes of the network are depicted as squares in this figure to facilitate visual presentation.) In this example, we track the belief vectors of four of the network nodes. Panel (a) of Fig. 22.9 shows the BN before any evidence concerning any of the physiological variables has been posted, with the BN nodes initialized to a set of default values. Accordingly, the probability distributions (i.e., belief vectors) are all uniform: the model has equal belief in all three possible values (low, medium, high) of the variable physio workload.

Panel (b) of Fig. 22.9 shows the updated belief vectors following presentation of evidence to the three nodes concerning cardiac measures. The evidence vectors are shown in the gray fields of the nodes. An HR evidence vector of (1, 1, 10) is introduced to the network, meaning that the evidence (from the preprocessed HR sensor) favors the hypothesis HR = high over the two other hypotheses in the ratio (1, 1, 10). The two HRV evidence vectors strongly favor the hypotheses that these variables are low, that is, that heart rate variability is low, which correlates with higher HR. These individual pieces of evidence combine to increase the belief that the physiologically measured mental workload is high. Panels (c) and (d) show how presentation of further evidence that is consistent with the hypothesis that workload is high has the intuitively expected effect of increasing the model's belief in that hypothesis.
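The arithmetic behind this kind of update can be sketched with a two-level fragment of the physiological workload tree. All CPT values below are hypothetical placeholders (the actual CPTs of Fig. 22.6 are not reproduced here); only the likelihood vectors (1, 1, 10) and (10, 1, 1) and the upward propagation mechanics follow the text.

```python
# Evidence posting in a small tree fragment: Physio WL -> Cardio WL -> {HR, HRV}.
import numpy as np

states = ["low", "med", "high"]
prior_physio = np.full(3, 1 / 3)                    # uniform initial belief, as in Panel (a)

# Hypothetical CPTs: rows index the parent state, columns the child state.
cpt_physio_cardio = np.array([[0.7, 0.2, 0.1],
                              [0.2, 0.6, 0.2],
                              [0.1, 0.2, 0.7]])
cpt_cardio_hr  = np.array([[0.7, 0.2, 0.1],
                           [0.2, 0.6, 0.2],
                           [0.1, 0.2, 0.7]])
cpt_cardio_hrv = np.array([[0.1, 0.2, 0.7],         # HRV tends to be high when cardiac WL is low
                           [0.2, 0.6, 0.2],
                           [0.7, 0.2, 0.1]])

# Likelihood (evidence) vectors in (low, med, high) order, as in the text.
lam_hr  = np.array([1.0, 1.0, 10.0])                # HR evidence favors "high" 10:1
lam_hrv = np.array([10.0, 1.0, 1.0])                # HRV evidence favors "low" 10:1

# Messages from the observed leaves up to Cardio WL, then up to Physio WL.
lam_cardio = (cpt_cardio_hr @ lam_hr) * (cpt_cardio_hrv @ lam_hrv)
lam_physio = cpt_physio_cardio @ lam_cardio

bel_physio = prior_physio * lam_physio
bel_physio /= bel_physio.sum()
print(dict(zip(states, np.round(bel_physio, 2))))   # belief shifts toward "high"
```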
FIG. 22.10. Example of a threat assessment network evaluation.
Evaluation of Situation Assessment Module

Figure 22.10 presents a corresponding illustration of the threat assessment network dynamically evolving over time as information is generated by the aircraft's sensors. The basic network topology was presented earlier in Fig. 22.3. Panel (a) of Fig. 22.10 shows the network in its initial state, before any evidence concerning a sensor contact has been presented to the network. The threat potential node shows that it is equally likely that the detected contact is a high, medium, low, or no threat. In Panel (b) of Fig. 22.10, the following evidence is presented to the network (as indicated by the shaded boxes), using processed data obtained from the radar system:
• The contact is in missile range [location evidence (.01, 1, .01, .01)]
• The contact is not maneuvering [actions evidence (.01, .01, 1)]
• The contact is at a low aspect angle [aspect evidence (1, .01, .01)]
• The contact is at high altitude [altitude evidence (.01, 1)]
• The contact is traveling at a fast speed [speed evidence (.01, 1, .01)]
The propagation of this evidence through the network causes the type hypothesis to reflect a 0.5 probability that the contact is a bomber aircraft, and a 0.5 probability that it is a fighter. The threat potential node indicates that this is most likely a low
threat, with a probability of 0.66. In Panel (c) of Fig. 22.10, the noncooperative target recognition system produces evidence that this is most likely a fighter aircraft. This causes the type hypothesis to shift significantly, indicating that there is now a 0.1 probability that the contact is a bomber, and a 0.9 probability that it is a fighter. Note that although an RWR identification has not yet taken place, the belief vector on the RWR identification node in Panel (c) indicates that detection of a fighter radar may be expected, with a probability of 0.69. Finally, in Panel (d) of Fig. 22.10, the detected aircraft maneuvers in such a manner as to increase its threat potential considerably. The following evidence is presented to the network:
• The contact is turning toward ownship [actions evidence (0, 1, 0)]
• The contact is at a high aspect angle [aspect evidence (0, 0, 1)]
• The RWR identification indicates a fighter [RWR ID evidence (.01, 1, .01, .01)]

The type node now indicates with complete certainty that this is a fighter aircraft. The threat potential node now indicates a 0.4 probability that the aircraft is a high threat and a 0.6 probability that the aircraft is a medium threat. As discussed earlier, and further illustrated in the next subsection, the results of this probabilistic reasoning are used to drive the radar display symbology and draw the pilot's attention to the highest-priority threats detected by the aircraft's sensor systems.

Evaluation of Integrated Prototype Operation

To evaluate the integrated operation of the adaptive PVI prototype, we constructed two tactical scenarios in which the ownship is attacked by a number of threat aircraft. Figure 22.11 illustrates the first scenario, in which the ownship is attacked sequentially by three threat aircraft. By design, the threat aircraft were spaced far apart so that there would be a discernible interval between each successive missile attack.

Figure 22.12 shows the radar display 7 sec into the simulation, at which point neither of the aircraft within 40 miles has been classified in terms of its type. They are currently beyond their estimated missile range; therefore, they appear to pose no threat (i.e., the threat assessment network for each contact generates a threat hypothesis of none, so the aircraft icons appear as outlines only). The box in the upper left corner of the figure shows a timer indicating minutes and seconds of simulation time, whereas the status bar in the lower left corner of the screen displays status messages generated by the PVI system.

Figure 22.13 shows a screen snapshot 32 sec into the simulation, at which point the aircraft on the right is now a medium threat, as its range is within the estimated weapons envelope (approximately 25 nautical miles). Both aircraft have been classified as fighter aircraft. The ovals surrounding both aircraft indicate that the radar warning receiver has detected radar emissions from both aircraft.
FIG. 22.11. Tactical scenario no. 1 for PVI prototype evaluation.
FIG. 22.12. Tactical scenario no. 1, first frame.
The pilot workload monitor, which shows workload readings extracted from the physiological workload BN, is displayed at the bottom right corner of the screen. Figure 22.14 provides a screen snapshot 38 sec into the simulation, at which point the aircraft on the right is now a high threat. The status bar on the lower left portion of the screen indicates that the PVI is generating an audio missile lock alert for this contact, located at an azimuth angle of 39 deg.
FIG. 22.13. Tactical scenario no. 1, second frame.
FIG. 22.14. Tactical scenario no. 1, third frame.
During the simulation, the radar oval for this contact would be seen to flash in synchronization with the audio alert, denoting the missile lock detection. Finally, Fig. 22.15 shows a screen snapshot 1 min and 18 sec into the simulation, at which point both threat aircraft have released their missiles and are starting to turn away. Both aircraft now show up as medium threats, whereas the missiles they have released are high threats. Because of the high workload condition, the PVI system released chaff at 1:11 (as indicated in the status bar), when the missile from the aircraft on the right came within a range of 10 nm.

Figure 22.16 shows the second tactical scenario, which contains six threat aircraft. This scenario was selected to show PVI operation with much denser scene content.
FIG. 22.15. Tactical scenario no. 1, fourth frame.
FIG. 22.16. Tactical scenario no. 2 for PVI prototype evaluation.
It was found during preliminary evaluations that the RWR oval indicators produced excessive screen clutter when threat aircraft were located close to each other. As such, an alternate symbology for RWR indication was developed: fixed-radius circles are used to indicate RWR detections. If the ownship's radar detects an aircraft at the same position that the RWR does, then the circle's line thickness does not change with threat potential. If it is an RWR-only sensor contact (e.g., a missile), then the line thickness increases with threat potential.
FIG. 22.17. Tactical scenario no. 2, first frame.
FIG. 22.18. Tactical scenario no. 2, second frame.
Figure 22.17 illustrates the radar display 22 sec into the simulation. The RWR has detected radar emissions from the three aircraft that are within 30 nm. All aircraft are classified as posing no threat at this instant. Figure 22.18 shows a screen snapshot 36 sec into the simulation. The three aircraft pointing directly at ownship are now high threats. All three have obtained a missile lock on the ownship, and the RWR circles were flashing at the instant the screen snapshot was taken. The status bar indicates that a missile lock was just detected from the aircraft at an azimuth angle of −2 deg. At this instant, the PVI is generating audio missile lock alerts for all three aircraft. Observe that the other two aircraft that are heading away from ownship show up as no threat. These two aircraft are visually quite distinct from the three high-threat aircraft.
FIG. 22.19. Tactical scenario no. 2, third frame.
Figure 22.19 shows a screen snapshot at 1:10, at which point the three aircraft have already fired their missiles at ownship. The missiles appear as solid circles, indicating that they are high threats. Although not visible in this black-and-white image, small arrow symbols denoting missiles are visible on the SGI screen. The three threat aircraft are now classified as medium threats: they are within missile range, but they are turning away. Note the numbers that appear above the six icons: they represent a prioritization of threats, generated on the basis of decreasing BN-generated threat potential. When the first missile came within a 10-mile range, the PVI system released chaff (due to the very high workload level), as indicated in the status bar.

Finally, note the RWR warning that appears at an azimuth angle of approximately 60 deg left, at the edge of the radar display. It indicates that the RWR system has detected a threat radar at a range beyond 20 miles (the current display setting). The letter F within the arrow indicates that the system has classified it as a fighter radar. The arrow's line thickness indicates that this sensor contact is presently classified as a low threat.
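The prioritization just mentioned can be sketched as a simple ranking of contacts by the threat-potential belief vectors produced by their network copies. The scoring weights below are illustrative; the prototype's actual ranking rule is not described in the text.

```python
# Hypothetical sketch of ranking sensor contacts by decreasing BN-generated threat potential.
WEIGHTS = {"high": 3, "medium": 2, "low": 1, "none": 0}

def prioritize(contacts):
    """contacts: dict mapping a contact id to its threat-potential belief vector."""
    def expected_threat(belief):
        return sum(WEIGHTS[level] * p for level, p in belief.items())
    ranked = sorted(contacts, key=lambda cid: expected_threat(contacts[cid]), reverse=True)
    return {cid: rank + 1 for rank, cid in enumerate(ranked)}   # 1 = highest priority

labels = prioritize({
    "missile_1": {"high": 0.9, "medium": 0.1, "low": 0.0, "none": 0.0},
    "fighter_2": {"high": 0.4, "medium": 0.6, "low": 0.0, "none": 0.0},
})
```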
SUMMARY

This study has demonstrated the basic feasibility of a concept prototype for adaptive pilot–vehicle interfaces in the military cockpit. The study was specifically structured to be narrow in scope, but of sufficient depth to set the foundation and provide a road map for a full-scope development and pilot-in-the-loop validation effort. Our study has shown that belief networks and fuzzy logic offer a robust, extensible framework for situation assessment and pilot state modeling. The key benefit of using BNs for these applications is that they readily facilitate extending
existing models with new variables or dependencies without recoding the existing models. This offers a considerable benefit over conventional expert system approaches to domain knowledge modeling, in which it is necessary to redesign rule bases whenever a new variable is introduced. The use of fuzzy logic provides an SA module that generates smooth outputs for smooth inputs. The interface adaptation taxonomy we propose provides the beginnings of a conceptual framework for creating adaptive interfaces, so that they can be designed in a rational, structured manner. Our prototype radar display shows how each element of this taxonomy can be applied to provide situation-driven, workload-adaptive PVI configuration during the course of a simulated air-to-air encounter. The basic taxonomy was designed to be generic in scope and applicable to domains beyond fighter aircraft cockpits.

In summary, this study has established the feasibility of developing adaptive pilot–vehicle interfaces for tactical aircraft cockpits and provides a rich set of conceptual building blocks for follow-up development.
REFERENCES Aasman, J., Wijers, A., Mulder, G., & Mulder, L. (1988). Measuring mental fatigue in normal daily working routines. In Human Mental Workload. Amsterdam: Elsevier. Adams, M. J., Tenney, Y. J., & Pew, R. W. (1995). Situation awareness and the cognitive management of complex systems. Human Factors, 37(1), 85–104. Backs, R. W., Ryan, A. M., Wilson, G. F., & Swain, R. A. (1995). Topographic EEG changes across single-to-dual task performance of mental arithmetic and tracking. Proceedings of the Human Factors and Ergonomics Society 39th Annual Meeting. Backs, R. W., Wilson, G. F., & Hankins, T. C. (1995). Cardiovascular assessment of mental workload using autonomic components: Laboratory and in-flight examples. Proceedings of the Eighth International Symposium on Aviation Psychology. Columbus: Ohio State University. Badre, A. N. (1978). Selecting and representing information structures for battlefield decision systems (Tech. Rep. No. 79-A20). ARI, Alexandria, VA. Baron, S., Zacharias, G., Muralidharan, R., & Lancraft, R. (1980). PROCRU: A model for analyzing flight crew procedures in approach to landing. 16th Annual Conference on Manual Control. Cambridge, MA:. Berry, D. C. (1987). The Problem of Implicit Knowledge. Expert Systems, 4(3). Boose, J. H. (1985). A knowledge aquisition program for expert systems based on personal construct psychology. International Journal of Man–Machine Studies, 23, 495–525. Bortolussi, M. R., Kantowitz, B. H., & Hart, S. G. (1986). Measuring pilot workload in a motion-based trainer. Applied Ergonomics, v(17), 278–283. Breese, J. S., & Heckerman, D. (1994). Decision–theoretic case-based reasoning (Tech. Rep. No. MSR-TR-95–03). Microsoft. Redmond, WA. Brickman, B. J., Hettinger, L. W., Roe, M. M., Stautberg, D., Vidulich, M. A., Haas, M. W., & Shaw, R. L. (1995). An assessment of situation awareness in air combat simulation: The global implicit measure approach. International Conference of Experimental Analysis and Measurement of Situation Awareness. Daytona Beach, FL. Broadbent, D. E., Fitzgerald, P., & Broadbent, M. H. P. (1986). Implicit and explicit knowledge in the control of complex systems. British Journal of Psychology, 77, 33–50.
Byrne, K., & Parasuraman. (1995). Differential sensitivity of heart rate and heart rate variability as indices of mental workload in a multi-task environment. Proceedings of the Eighth International Symposium on Aviation Psychology. Columbus: Ohio State University. Caglayan, A. K., & Snorrason, M. (1993). On the relationship between the generalized equality classifier and ART2 neural networks. World Congress on Neural Networks ’93. Portland, OR. Caretta, T. R., Perry, D. C., & Ree, M. J. (1994). The ubiquitous three in the prediction of situation awareness: Round up the usual suspects. In Gilson, Garland, & Koonce (Eds.), Situational awareness in complex systems (pp. 125–137). Daytona Beach, FL: Embry-Riddle Aeronautical University Press. Casali, J. G., & Wierwille, W. W. (1983). A comparison of rating scale, secondary task, psychologial, and primary task workload estimation techniques in a simulated flight task emphasizing communications load. Human Factors, 25, 623–641. Castro, F. D., Hicks, H. E., Ervin, H. E., & Halpin, S. M. (1992). ACCES application 91–02: ACCES assessment of command and control during a division-level CPX (Research Note 92–78). US ARI for Behavioral and Social Sciences. Chi, M. T. H., Glaser, R., & Farr, M. J. (1988). The nature of expertise. Hillsdale, NJ: Lawrence Erlbaum Associates. Cooke, N. M., & McDonald, L. E. (1988). The application of psychological scaling techniques to knowledge elicitation for knowledge-based systems. In Boose & Gaines (Eds.), Knowledge acquisition tools for expert systems. New York: Academic Press. Cooper, R. G., (1993). Winning at new products: Accelerating the process from idea to launch (2nd ed.). New York: Addison-Wesley. Corrigan, J., & Keller, K. (1989). Pilot’s associate: An inflight mission planning application. AIAA Guidance, Navigation, and Control Conference. Boston, MA. Deckert, J. C., Entin, E. B., Entin, E. E., MacMillan, J., & Serfaty, D. (1994). Military command decisionmaking expertise (Rep. No. 631). Burlingten, MA: Alphatech. de Kleer, J., & Brown, J. S. (1983). Assumptions and ambiguities in mechanistic mental models. In Gentner & Stevens (Eds.), Mental models. Hillsdale, NJ: Lawrence Erlbaum Associates. Development Planning Directorate. (1995). Technology investment recommendations report. WrightPatterson Air Force Base, OH: Author. Druzdzel, M. J. (1996). Five useful properties of probablistic knowledge representations from the point of view of intelligent systems [Special issue]. Fundaments Informaticae. Vol. 30, No. 3–4. Egelund, N. (1982). Spectral analysis of heart rate variability as an indicator of driver fatigue. Ergonomics, 25, 663–672. Eggemeier, F. T., & Wilson, G. F. (1991). Performance-based and subjective assessment of workload in multi-task environments. In Ramos (Ed.), Multiple task performance (pp. 217–278). London: Taylor & Francis. Endsley, M. (1995). A taxonomy of situation awareness errors. Human Factors in Aviation Operations: Proceedings of the 21st Conference of the European Association for Aviation Psychology Research, 3, 1995c. Endsley, M. R. (1988). Situation awareness global assessment technique (SAGAT). Proceedings of the National Aerospace and Electronics Conference New York,:. Endsley, M. R. (1989). Pilot situation awareness: The challenges for the training community. Interservice/Industry Training Systems Conference. Ft. Worth, TX. Endsley, M. R. (1990). Predictive utility of an objective measure of situational awareness. Proceedings of the Human Factors Society 34th Annual Meeting. 
Santa Monica, CA. Endsley, M. R. (1993). A survey of situation awareness requirements in air-to-air combat fighters. International Journal of Aviation Psychology, 3, 157–168. Endsley, M. R. (1995a). Measurement of situations awareness in dynamic systems. Human Factors, 37(1). Endsley, M. R. (1995b). Toward a theory of situation awareness in dynamic systems. Human Factors, 35(1), 32–45.
Endsley, M. R., & Bolstad, C. A. (1992). Human capabilities and limitations in situation awareness. In AGARD: Combat automation for airborne weapons systems: Man/machine interface trends and technologies. Neuilly-sur-Siene, France: Advisory Group for Aerospace Research and Development. Ericsson, K. A., & Simon, H. A. (1984). Protocol analysis: Verbal reports as data. Cambridge, MA: MIT Press. Fallesen, J. J. (1993). Overview of army tactical planning performance research (Tech. Rep. No. 984). US ARI for Behavioral and Social Sciences. Fallesen, J. J., Carter, C. F., Perkins, M. S., Micehl, R. R., Flannagan, J. P., & McKeown, P. E. (1992). The effects of procedural structure and computer support upon selecting a tactical course of action (Tech. Rep. No. 960). US ARI for Behavioral and Social Sciences. Fallesen, J. J., & Michel, R. R. (1991). Observation on command and staff performance during CGSC Warrior ’91 (Working Paper No. LVN-91–04). US ARI for Behavioral and Social Sciences. Forbus, K. (1983). Qualitative reasoning about space and motion. In D. Gentner & A. L. Stevens (Eds.), Mental models. Hillsdale, NJ: Lawrence Erlbaum Associates. Fracker, M. L. (1990). Attention gradients in situation awareness, Situational awareness in aerospace operations (AGARD-CP-478). Neuilly-sur-Siene, France: Advisory Group for Aerospace Research and Development. Gale, A. (1987). The electroence phalogram. In Gale & Christie (Eds.), Psychophysiology and the electronic workplace. London: Wiley. Gentner, D., & Gentner, D. R. (1983). Flowing waters or teeming crowds: Mental models of electricity. In D. Gentner & A. L. Stevens (Eds.), Mental models. Hillsdale, NJ: Lawrence Erlbaum Associates. Gentner, D., & Stevens, A. L. (1983). Mental models. Hillsdale, NJ: Lawrence Erlbaum Associates. Gevins, A., & Schaffer, R. (1980). A critical review of electroencephalographic (EEG) correlates of higher cortical functions. CRT Critical reviews in Bioengineering, 4, 113–164. Guilfoyle, C., & Warner, E. (1994). Intelligent agents: The new revolution in software. London: Ovum. Haas, M. W. (1995). Virtually-augmented interfaces for tactical aircraft. Biological Psychology, 40, 229–238. Haas, M. W., Hettinger, L. J., Nelson, W. T., & Shaw, R. L. (1995). Developing virtual interfaces for use in future fighter aircraft cockpits (AL/CF-TR-1995–0154). Wright-Patterson Air Force Base, OH: Air Force Material Command. Hammond, K. R., Hamm, R. M., Grassia, J., & Pearson, T. (1987). Direct comparison of the efficacy of intuitive and analytical cognition in expert judgment. IEEE Trans. on Systems, Man & Cybernetics, 17(5), 753–770. Harper, K., Mulgund, S., Zacharias, G., & Kuchar, J. (1998, August). Agent-based performance assessment tool for general aviation operations under free flight. Paper presented at the 1998 Guidance, Navigation, and Control Conference, Boston. Harris, R., Bonadies, G., & Comstock, J. R. (1989). Usefulness of heart measures in flight simulation. Proceedings of the Third Annual Workshop on Space Operations, Automation, and Robotics, Houston, TX: NASA Johnson Space Center. Hart, S. G., & Staveland, L. E. (1988). The development of the NASA task load index (TLX): Results of empirical and theoretical research. In Meshkati (Ed.), Human mental workload (pp. 139–183). Amsterdam: North-Holland. Hartman, B., & Secrist, G. (1991). Situation awareness is more than exceptional vision. Aviation, Space and Environmental Medicine, 62, 1084–1091. Harwood, K., Barnett, B., & Wickens, C. D. (1988). 
Situational awareness: A conceptual and methodological framework. Proceedings of the 11th Symposium of Psychology in the Department of Defense. Heckerman, D. (1995). A tutorial on learning Bayesian networks (Tech. Rep. No. MSR-TR-95–06). Microsoft: Redmond, WA. Hegerty, M., Just, M. A., & Morrison, I. R. (1988). Mental models of mechanical systems (54). Pittsburgh: Carnegie-Mellon University.
22.
ADAPTIVE PVI FOR THE TACTICAL AIR ENVIRONMENT
521
Henrion, M. (1989). Some practical issues in constructing belief networks. In Kanal, Levitt, & Lemmer (Eds.), Uncertainty in Artificial Intelligence (vol. 3, pp. 161–173). North Holland, the Netherlands: Elsevier. Horst, R., Ruchkin, D., & Munson, R. (1987). Event-related potential processing negatives related to workload. In Johnson, Rohbaugh, & Parasuraman (Eds.), Current trends in event-Related potential research. Amsterdam: Elsevier. Hudlicka, E., & Huggins, A. W. F. (1994). The use of cognitive science techniques for developing FAA inspection indicators Final Technical Report (Report No. 7940). BBN Laboratories. Janca, P. C. (1995). Pragmatic application of information agents. BIS Strategic Decisions. Johnson-Laird, P. N., Byrne, R. M. J., & Schaeken, W. (1992). Propositional reasoning by model. Psychological Review, 99(3), 418–439. Kalsbeek, J. W. H. (1971). Sinus arrhythmia and the dual task method in measuring mental load. In Fox, Singleton, & Whitfield (Eds.), Measurement of man at work. London: Talylor & Francis. Kantowitz, B. H., & Casper, P. A. (1988). Human Workload in Aviation. In Weiner & Nagel (Eds.), Human factors in aviation (pp. 157–187). New York: Academic Press. Kantowitz, B. H., Hart, S. G., Bortolussi, M. R., Shivley, R. J., & Kantowitz, S. C. (1984). Measuring pilot workload in a moving-based simulator: II Building levels of load. Proceedings of the Annual Conference on Manual Control. Kieras, D. E. (1984). A simulation model for procedure inference from a mental model for a simple device (15): University of Arizona. Klein, G. A. (1989a). Recognition-Primed Decisions. Advances in man–machine Systems Research, 5, 47–92. Klein, G. A. (1994). Naturalistic decision making: Implications for design. Gateway, 5(1), 6–7. Klein, G. A., et. al. (1989b). Critical decision method for eliciting knowledge. IEEE Trans. on Systems, Man and Cybernetics, 19(3), 462–472. Klein, G. A., Calderwood, R., & Clinton-Cirocco, A. (1986). Rapid decision-making on the fire ground. Proceedings of the Human Factors Society 30th Annual Meeting. Kosslyn, S. M., Cave, C. R., Forbes, Z. E., & Brunn, J. L. (1983). Imagery ability and task performance (2). Waltham, MA: Brandeis University. Kosslyn, S. M., & Schwartz, S. P. (1977). A simulation of visual imagery. Cognitive Science, 1(3). Kramer, A. (1991). Psysiological metrics of mental workload: A review of recent progress, in multitask environments. In Ramos (Ed.), Multiple task performance (pp. 279–328). London: Taylor & Francis. Larkin, J. H. (1983). The role of problem representation in physics. In D. Gentner & M. L. Stevens (Eds.), Mental models. Hillsdale, NJ: Lawrence Erlbaum Associates. Larkin, J. H., & Simon, H. A. (1987). Why a diagram Is (sometmes) worth ten thousand words. Cognitive Science, 11, 65–99. Lauritzen, S. L., & Spiegelhalter, D. J. (1988). Local computation with probabilistics in graphical structures and their applications to expert systems. Journal of the Royal Statistical Society B, 50(2). Lewis, H. V., & Fallesen, J. L. (1988). Human factors guidelines for command and control systems: Battlefield and decision graphics guidelines (Rep. No. 89–01). US ARI for Behavioral and Social Sciences. Lindholm, E., Chatham, C., Koriath, J., & Longridge, T. (1984). Physiological assessment of aircraft pilot workload in simulated landing and simulated hostile threat environments (Tech. Rep. No. AFHRL-TR-83-49). Williams Air Force Base, AZ: Air Force Systems Command. Lohse, G. L., Biolsi, K., Walker, N., & Rueter, H. H. (1994). 
A classification of visual representations. Communications of the ACM, 37(12), 36–49. Lussier, J. W., Solick, R. E., & Keene, S. D. (1992). Experimental assessment of problem solving at the combined arms services staff school (Research Note 92-52): US ARI for Behavioral and Social Sciences.
522
MULGUND, RINKUS, AND ZACHARIAS
McCarthy, J., & Hayes, P. J. (1969). Some philosophical problems from the standpoint of artificial intelligence. In Meltzer, Mitchie, & Swann (Eds.), Machine intelligence (vol. 4, pp. 463–502). Edinburgh, Scotland: Edinburgh University Press. McKinley, R. L., D’Angelo, W. R., Haas, M. W., Perrot, D. R., Nelson, W. T., Hettinger, L. J., & Brickman, B. J. (1995). An initial study of the effects of 3-dimensional auditory cueing on visual target detection. Wright-Patterson Air Force Base, OH: Milgram, P., van der Wijngaart, R., Veerbeek, H., Bleeker, O. F., & Fokker, B. V. (1984). Multi-crew model analytic assessment of landing performance and decision making demands. Proceedings of the 20th Annual Conference on Manual Control. Moffett Field, CA:. Natani, K., & Gomer, F. (1981). Electrocortical activity and operator workload: A comparison of changes in the electroencephalogram and event-related potentials (Tech. Rep. No. MDC E2427). McDonnell-Douglas. Norman, D. (1983). Some observations on mental models. In D. Gentner & M. L. Stevens (Eds.), Mental models. Hillsdale, NJ: Lawrence Erlbaum Associates. Olson, J. R., & Biolsi, K. (1990). Techniques for representing expert knowledge (Tech. Rep. No. 34). Ann Arbor: University of Michigan, Cognitive Science and Machine Intelligence Laboratory. Opmeer, C. H. J. M., & Krol, J. P. (1973). Towards an objective assessment of cockpit workload: I psychological variables through different stages. Aerospace Medicine, 44, 527–532. Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. San Mateo, CA: Morgan Kaufmann. Pettyjohn, F. S., McNeil, R. J., Akers, L. A., & Faber, J. M. (1977). Use of inspiratory minute volumes in evaluation of rotary and fixed wing pilot workload. Methods to Assess Workload (AGARD Conference Proceedings No. CP216). Pope, A. T., Bogart, E. H., & Bartolome, D. S. (1995). Biocybernetic system evaluation indices of operator engagement in automated task. Biological Psychology, 40, 187–195. Reid, G. B., & Nygren, T. E. (1988). The subjective workload assessment technique: A scaling procedure for measuring mental workload. In Hancock & Meshkati (Eds.), Human mental workload (pp. 185– 218). Amsterdam: North-Holland. Roschelle, J., & Greeno, J. G. (1987). Mental models in expert physics reasoning (GK-2). Berkeley: University of California Press. Russell, P., & Norvig, P. (1995). Artificial intelligence: A modern approach. Englewood Cliffs, NJ: Prentice-Hall. Sarkar, M., & Brown, M. H. (1994). Graphical fisheye views. Communications of the ACM, 37(12), 73–84. Sarter, N. B., & Woods, D. D. (1995). How in the world did we ever get into that mode? Mode error and awareness in supervisory control. Human factors, 37(1). Scannell, T. (1993). Someone (or some thing) to watch over me. Computer Currents, 8, 24–25. Selcon, S. J., Taylor, R. M., & Shadrake, R. A. (1992). Multi-modal cockpit warnings: Pictures, words, or both? Proceedings of the Human Factors Society 36th Annual Meeting (pp. 57–61). Atlanta, GA. Shaw, J. J., & Powell, W. S. (1989). Lessons learned from the cascade polar exercise (Tech. Rep.). Burlington, MA: Alphatech. Shaw, R. (1988). Fighter combat: Tactics and manuvering. Annapolis, MD: Naval Institute Press. Sheridan, T. B. (1992). Telerobotics, automation, and human supervisory control. Cambridge, MA: MIT Press. Sirevaag, E., Kramer, A., DeJong, R., & Mecklinger, A. (1988). A psychophysiological analysis of multi-task processing demands. Psychophysiology, 25, 482. 
Speyer, j., Fort, A., Foulliet, J., & Blomberg, R. (1987). Assessing workload for minimum crew certification. In Roscoe (Ed.), Practical assessment of Pilot workload. Washington, DC: AGARD. Spiegelhalter, D., Dawid, P., Lauritzen, S., & Cowell, R. (1993). Bayesian analysis in expert systems. Statistical Science, 8, 219–282.
22.
ADAPTIVE PVI FOR THE TACTICAL AIR ENVIRONMENT
523
Spyker, D. A., Stackhouse, S. P., Khalafalla, A. S., & McLane, R. C. (1971). Development of techniques for measuring pilot workload (Tech. Rep. No. CR-1888). :National Aeronautics and Space Administration. Sterman, M. B., Mann, C. A., & Kaiser, D. A. (1992). Quantitive EEG patterns of differential in-flight workload, Space operations applications and research proceedings (pp. 446–473). Houston, TX: NASA Johnson Space Center. Stern, J., & Skelly, J. (1984). The eyeblink and workload considerations. Proceedings of the Human Factors Society 28th Annual Meeting. San Antonio, TX. Stewart, L., & McCarty, P. (1992). The use of Bayesian belief networks to fuse continuous and discrete information for target recognition, tracking, and situation assessment. Proceedings of the SPIE, 1699, 177–185. Stiffler, D. R. (1988, Summer). Graduate-level situation awareness. USAF Fighter Weapons Review, 115–120. Tolcott, M. A., Marvin, F. F., & Lehner, P. E. (1989). Expert Decision making in evolving situations. IEEE Transactions on Systems, Man & Cybernetics, 19(3), 606–615. Tufte, E. R. (1983). The visual display of quantitative information. Cheshire, CT: Graphics Press. Vicente, K., Thornton, D., & Moray, N. (1987). Spectral analysis of sinus arrhythmis: A measure of mental effort. Human Factors, 29, 171–182. White, B. Y., & Frederiksen, J. R. (1987). Causal model progressions as a foundation for intelligent learning environments (6686). Cambridge, MA: BBN Laboratories. Wickens, C. D. (1984). Engineering psychology and human performance. Columbus, OH: Merrill. Wickens, C. D. (1991). Processing resources and attention. In Ramos (Ed.), Multiple task performance (pp. 3–34). London: Taylor & Francis. Wickens, C. D., Sandry, D. L., & Vidulich, M. (1983). Compatibility and resource competition between modalities of input, central processing, and output. Human Factors, 25(2), 227–248. Wierwille, W. W., & Eggemeier, F. T. (1983). Evaluation of 20 workload measures using a psychomotor task in a moving base aircraft simulator. Human Factors, 25, 1–16. Wierwille, W. W., & Eggermeier, F. T. (1993). Recommendations for mental workload measurement in a test and evaluation environment. Human Factors, 35(2), 263–282. Wilson, G. F., & Eggemeier, F. T. (1991). Physiological assessment of workload in multi-task environments. In Ramos (Ed.), Multiple task performance (pp. 329–360). London: Taylor & Francis. Wilson, G. F., Fullencamp, P., & Davis, B. S. (1994, February). Evoked potential, cardiac, blink, and respiration measures of pilot workload in air-to-ground missions. Aviation, Space, and Environmental Medicine, 100–105. Wilson, G. F., Purvis, B., Skelly, J., Fullencamp, P., & Davis, I. (1987). Psysiological data used to measure pilot workload in actual flight and simulator conditions. Proceedings of the Human Factors Society, Santa Monica, CA. Witte, T., & Kelly, V. C. (1994). Visualization support for an army reconnaissance mission (95–43). DTIC. Zacharias, G. L., & Baron, S. (1982). A proposed crew-centered analysis methodology for fighter/attack missions (4866). Cambridge MA: Bolt, Beranek & Newman. Zacharias, G. L., Miao, A., Kalkan, A., & Kao, S. (1994b). Operator-based metric for nuclear operations automation assessment. Transactions of the 22nd Water Reactor Safety Information Meeting. Bethesda, MD:. Zacharias, G. L., Miao, A., Kalkan, A., & Kao, S. (1994a). Operator model for nuclear operations automation assessment (Tech. Rep. No. R93141). Charles River Analytics. Zacharias, G. L., Miao, A. 
X., Illgen, C., & Yara, J. M. (1995). SAMPLE: Situation awareness model for pilot-in-the-loop evaluation (Final Rep. No. R95192). Charles River Analytics. Zacharias, G. L., Miao, A. X., & Riley, E. W. (1992). Situation awareness model for the BVR mission (Final Rep. No. R90011). Charles River Analytics.
524
MULGUND, RINKUS, AND ZACHARIAS
Zacharias, G. L., Miao, A. X., Riley, E. W., & Osgood, R. K. (1992). Situation awareness metric for cockpit configuration evaluation (Final Rep. No. AL-TR-1993–0042). Wright-Patterson Air Force Base, OH:. Zadeh, L. A. (1973). Outline of a new approach to the analysis of complex systems and decision processes. IEEE Transactions on Systems, Man and Cybernetics, SMC-3, (1), 28–44. Ziegler, B. (1996, September 18). Hey kids, tell mom this game really makes you use your brain. The Wall Street Journal, p. B1.
23
The Implementation of Psycho-Electrophysiological Interface Technologies for the Control and Feedback of Interactive Virtual Environments
Alysia M. Sagi-Dolev
Combat Environment Human Systems, Israel
Electrophysiologically based interactive virtual environments have great potential in training and selection applications that demand complex multitasking and high performance under pressure, such as the aerospace and military fields (aviators, flight officers, and advanced weapon system operators). Current virtual environments for training and selection can be taken a step further by incorporating electrophysiologically based interactivity on two levels: operator control and adaptive training algorithms based on the user's current psychophysiological state. Implementation of the above may be accomplished using current research results on electrophysiological parameters of human performance and control. Examples of signals that can be used in control mechanisms include brain activity measured by electroencephalographic (EEG) signals and evoked potentials, eye activity measured by electro-oculography (EOG), and muscle activity measured by electromyography (EMG). Examples of signals that can be used as measures of multitask performance include visual, auditory, and somatosensory evoked potentials. Indicators of stress responses to simulated situations include heart rate variability and galvanic skin response. Research advances in these fields, combined
with those in applied sensor technologies, show promise for their implementation in current virtual reality simulation systems. The purpose of this chapter is to: (1) describe the various psycho-electrophysiological parameters that can be used to interface real-time systems for control and information feedback, (2) provide an overview of current field-oriented sensing technologies, including guidelines for practical design and implementation, and (3) provide examples of interactive psycho-electrophysiological interfaces for control or performance status that can be incorporated into interactive virtual environments.
PARAMETERS FOR PSYCHO-ELECTROPHYSIOLOGICALLY BASED INTERACTIVE INTERFACES Basic Concepts in Electrophysiological Signal Manifestation The utilization of electrophysiological parameters as command controls, or for quantifying cognitive components, begins with understanding the architecture, function, and communication pathways of the nervous system. Information received by sensory receptors (visual, auditory, tactile or other receptors) will either result in direct reactions or be stored in memory areas of the brain. The central nervous system (CNS) is responsible for function at the spinal cord level, lower brain level, and the higher brain or cortical level. Neuronal circuits in the spinal cord are responsible for events such as limb movements, withdrawal and support reflexes, blood vessel reflexes, gastrointestinal movements, and more. At the lower brain level are most of the regulatory subconscious controls, such as control of arterial pressure, respiration, equilibrium control, feeding reflexes, and many emotional reactions. The cortex, with its large memory-storage capacity, functions with the lower brain centers to achieve specific and precise operations. Although the cortex does not function alone, it is essential for most thought processes. The peripheral nervous system (PNS) consists of the neural structures (motor nerves, sensory fibers, specialized receptors, and sympathetic fibers) that are located outside of the CNS. The autonomic nervous system, activated mainly by centers in the spinal cord, brain stem and hypothalamus, is the portion of the nervous system that controls the visceral functions of the body. It controls arterial pressure, gastrointestinal secretion, sweating, temperature, and many other activities. This system is subdivided into the sympathetic and parasympathetic systems. Figure 23.1 illustrates the CNS, PNS, and paths that transmit sensory and motor information. Information is transmitted in the nervous system primarily through nerve impulses through a succession of neurons in small assemblies or clusters, forming neural networks that communicate electrochemically at the synapse. These
FIG. 23.1. Schematic representation of the central nervous system, the peripheral nervous system, the autonomic nervous system, and neural paths.
communications induce a measurable electrical field. Although direct and local monitoring of the changes in potential is optimal for detecting and understanding the sensory information process, noninvasive methods of acquiring this information are necessary when dealing with human subjects. Although these signals yield no information at the neuronal level, their integration into an overall potential can be used as a measure of response or activation in the nervous system, thereby yielding information from the brain, cardiac and skeletal muscles, and communication paths. The methods used to measure these occurrences noninvasively, at the skin surface, include electrophysiological parameters such as the electroencephalogram (EEG), evoked potentials (EPs), the electrocardiogram (ECG), the electromyogram (EMG), and the electro-oculogram (EOG), all of which are described and discussed later.
TABLE 23.1. Types of Interactive Interfaces for Control and Training Using Electrophysiological Parameters
Control: predefined command control; subtle drive command control; subliminal feedback control.
Adaptive Training: cognitive traits (multitask performance, memory, reaction time, scanning technique, etc.); psychological traits (decision making under stressful situations, endurance, perseverance, etc.).
Adaptive Selection Processing: of cognitive traits (multitask performance, memory, reaction time, scanning technique, etc.).
How Can Electrophysiological Parameters Be Used for Interactive Interfacing in Virtual Environments? Electrophysiological parameters can be used either to control activity in a virtual environment or to serve as state indicators for training and adaptive selection processing. "Control," in this case, means to give a command to the system. A command can either be an active and conscious command, analogous to pushing a button, or a drive command issued by tracking consecutive inputs such as the location of a gaze. Adaptive training and selection entails using the electrophysiological parameters to make inferences about an individual's cognitive and other psychological states and traits and, in some instances, using those measurements to modify a program and thus make it interactive. The following section discusses each of these potential applications, including definitions and implementation. Table 23.1 provides a list of the types of interactive interfaces discussed in this section. Control Where nonmanual control is beneficial, three major categories of alternative controls can be utilized: 1. Predefined command control: This method uses predefined conscious control commands in which the operator actively issues instructions to the system through methods such as limb movement, gestures, straining, voice commands, active blinking, and evoked-response choice selection.
2. Subtle drive command control: This entails the use of methods such as brain-actuated control systems based on an EEG, choice selection through evoked potentials, and tracking via eye movements. 3. Subliminal feedback control: This involves the maintenance of control levels based on a psychological or physiological state, such as remaining "calm" or unaffected by sensory-mismatch-induced motion sickness. Examples of implementing these methods for control include: the use of EMG for gesture, straining, and limb movement; EOG for tracking (Kaufman, Bandopadhay, & Shaviv, 1993; Smyth, 1998) or blinks; EEG and EMG (Cress, Hettinger, Nelson, & Haas, 1997; Nelson et al., 1997); and evoked potentials (Farwell & Donchin, 1988). Adaptive Training and Selection Processing The power of virtual reality and immersed environments to invoke sensory mismatch, stress, anxiety, and physiological imbalances in addition to the simulated situation itself can be combined with electrophysiological parameters to build an interactive adaptive process. This can be used to quantify the performance of tasks, workload, or psychological states. The resulting information can be used to iteratively modify the next scenario or task. This process allows participants to adapt the learning process to their own pace and capabilities, and also to identify weaknesses and focus on improving them. Examples include iterative training for improving decision-making skills and tactics under stress, multitask attention strategies, and improving scanning strategies. For selection applications, conventional testing and scoring can be replaced by an adaptive process, which can better identify the cognitive or psychological component responsible for a given response, thereby allowing a finer definition of skill capabilities. To implement such an approach, the first step is to define which traits, tasks, or other cognitive components we wish to monitor and quantify. Using a cognitive task-analysis model specific to the occupation for which the system is intended, a general architecture of parameters can be defined, capable of measuring the relevant cognitive or psychophysiological components. Cognitive components can then be associated with those physiological systems that are either directly responsible for or are secondary reactions to the cognitive component in the model. Figure 23.2 illustrates a cognitive model with associated physiological systems and relevant electrophysiological parameters.
FIG. 23.2. A cognitive model of a task analysis, including relevant electrophysiological parameters. Based on a modified version of a task analysis by Brikener and colleagues, 1994.
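To make the iterative adaptation concrete, the sketch below shows one possible form of the loop described above: each scenario is scored with a psychophysiological workload index, and the next scenario's difficulty is nudged so that the index stays within a target band. The band limits, step size, and the simulated workload function are illustrative assumptions, not values taken from this chapter.

```python
import random

def adapt_difficulty(measure_workload, n_trials=20, difficulty=0.5,
                     target_band=(0.4, 0.7), step=0.1):
    """Adjust scenario difficulty so the measured workload stays within a target band.

    measure_workload(difficulty) runs one scenario and returns a 0..1 workload
    index; in a real system that index would be derived from the EEG, eye-blink,
    and heart-rate measures discussed in this chapter.
    """
    low, high = target_band
    history = []
    for _ in range(n_trials):
        w = measure_workload(difficulty)
        if w > high:                                  # overloaded: ease the next scenario
            difficulty = max(0.0, difficulty - step)
        elif w < low:                                 # underloaded: raise the challenge
            difficulty = min(1.0, difficulty + step)
        history.append((round(difficulty, 2), round(w, 2)))
    return history

if __name__ == "__main__":
    # Toy stand-in for a real psychophysiological workload index.
    simulated = lambda d: min(1.0, max(0.0, 0.3 + 0.6 * d + random.uniform(-0.1, 0.1)))
    print(adapt_difficulty(simulated))
```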
Bioelectric Signals and Psycho-Electrophysiological Parameters Electroencephalogram The electroencephalogram is the scalp recording of spontaneous brain electrical activity from various cerebral sources. The signals are obtained using a single or bipolar electrode pair and a reference electrode. This enables the study of space–time pattern variables of the electric fields. The intensity and patterns of this electrical activity are the result of the overall excitation level. Signal amplitudes are typically in the range of 1–100 µV, with frequency ranges of 1–40 Hz. There are several distinct patterns that appear and are classified according to their frequency:
• Alpha waves occur between 8 and 12 Hz and are associated with a quiet, awake, and resting state of cerebration. They are most intense in the back region of the head known as the occipital region, but can also be found in the parietal and frontal areas of the scalp.
• Beta waves occur between 12 and 20 Hz for Beta Type 1, and above 20 Hz for Beta Type 2. These waves occur during activation of the CNS and during tension.
• Theta waves occur between 4 and 7 Hz and are typical of Stages 2 and 3 of sleep, transitional states of consciousness, certain disorders, and frustration.
• Delta waves occur below 4 Hz, typically 2–4 Hz, and occur in the cortex independently of activities in the lower regions of the brain. These waves occur in deep sleep, unconsciousness, and serious organic brain disease.
FIG. 23.3. (a) The 10–20 system of electrode placement used in a clinical EEG. (b) Examples of known EEG frequency classification.
Figure 23.3 illustrates the different types of brain waves and the conventional clinical electrode placement montage. The basic parameters of EEG signals are frequency, amplitude, the power at a given frequency, and the power density distribution. For these types of analysis, the transition is made from the time domain to the frequency domain. Most attempts at control via EEG are based on the use of alpha waves (or "mu" waves) because they respond to changes in mental effort, such as a conscious rise in attention level. An example of an invention based on the use of alpha waves as a communication input to a system is described by Wolpaw and McFarland (1995). Other systems use an aggregate of EEG and EMG signals from the frontal locations. In cognitive studies, the location of the brain centers responsible for the processing path of the specific task, and the relationship between those areas, are of importance. Topographic information, materialization, and absolute and relative amounts of power at the different frequency bands are all significant. There is a vast amount of literature associating EEG with cognitive tasks (e.g., Easton & John, 1995; Gevins & Schaffer, 1991; Jung, Makeig, Stensomo, & Sejnowski, 1997; Mitrushina & Stamm, 1994). A practical application is Gevins's (1995) invention of a human–computer interface, based on EEG correlates of workload, that varies task difficulty in accordance with the subject's neurocognitive workload level.
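Since these analyses move from the time domain to the frequency domain, the band definitions above map directly onto a spectral computation. The sketch below estimates absolute and relative power per band for one EEG channel using Welch's method; the sampling rate, the synthetic test signal, and the 30 Hz upper edge assumed for beta are illustrative choices rather than values from the chapter.

```python
import numpy as np
from scipy.signal import welch

# Frequency bands as classified in the text (Hz); the 30 Hz beta ceiling is an assumption.
BANDS = {"delta": (0.5, 4.0), "theta": (4.0, 7.0), "alpha": (8.0, 12.0), "beta": (12.0, 30.0)}

def band_powers(eeg, fs):
    """Estimate absolute and relative power per EEG band for one channel.

    eeg: 1-D array of samples; fs: sampling rate in Hz.
    """
    freqs, psd = welch(eeg, fs=fs, nperseg=min(len(eeg), 2 * int(fs)))
    total = np.trapz(psd, freqs)
    result = {}
    for name, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs < hi)
        absolute = np.trapz(psd[mask], freqs[mask])
        result[name] = {"absolute": float(absolute),
                        "relative": float(absolute / total) if total > 0 else 0.0}
    return result

# Example: 10 s of synthetic "EEG" dominated by a 10 Hz (alpha) rhythm plus noise.
fs = 256
t = np.arange(0, 10, 1.0 / fs)
eeg = 30e-6 * np.sin(2 * np.pi * 10 * t) + 5e-6 * np.random.randn(t.size)
print(band_powers(eeg, fs))
```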
Evoked Potentials Evoked potentials and event-related potentials (ERPs) reflect responses of the brain to specific stimuli, along the communication pathway of the stimulus. Visual evoked potentials (VEPs) will yield signals corresponding to an external visual stimulus. Auditory evoked potentials (AEPs), recorded from the vertex, will show the response along the auditory pathway. Somatosensory evoked potentials (SEPs), recorded from the sensory cortex, record the responses due to tactile stimuli. Signal amplitudes are in the range of 0.5–10 µV, and frequencies typically range up to 2–3 kHz, depending on the input stimuli. Electro-oculogram The electro-oculogram (EOG) measures changes in corneal-retinal potential due to eye movement. Horizontal and vertical movement can be acquired from electrode pairs placed on the horizontal and vertical axes. The frequency range is typically from DC to 30 Hz, and the dynamic range from 10 µV to 5 mV. For tracking, a proper arrangement of the EOG electrodes will yield voltage changes proportional to eye rotations over a range of approximately 30 deg from the center (Lusted & Knapp, 1996), with the lowest achieved resolutions of 1 deg (Smyth, 1998). Hashiba, Yasui, Watabe, Matsuoka, and Baba (1995) defined the composite eye movement (CEM), a combination of smooth pursuit eye movement (SPEM) and the saccade. Saccades are rotations, or "jumps," of the eye that ensure an image remains within the visual scene. The disadvantages of using EOG for tracking applications are problems with drift, muscle artifact, and eyelid artifact. When using an EOG interface for generating communication and control functions, predefined gestures based on blink morphology, rate, and occurrence pattern are usually used. For cognitive assessment, the eye blink is the primary behavioral and performance measure, whereas horizontal movements are of importance in the determination of scanning strategies. Sources of blinks are reflexive, voluntary, and endogenous (occurring in the absence of an eliciting stimulus). Eye blinks are usually less than 150 msec in duration. The parameters of importance are blink rate, blink duration, blink amplitude, and blink pattern. Correlation with performance and vigilance will differ, depending on the task source. For a review of endogenous eye blink parameters for behavioral, vigilance, and performance correlation, the reader is referred to Stern, Walrath, and Goldstein (1984) and Steinman (1986).
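Blink rate, duration, and amplitude as described above can be extracted with a simple rule on the vertical EOG channel. The sketch below counts peaks above a threshold and measures each blink's width at half its amplitude; the 100 µV threshold and the 0.5 s minimum separation are illustrative assumptions rather than values from the chapter.

```python
import numpy as np
from scipy.signal import find_peaks

def detect_blinks(veog, fs, threshold_uv=100.0, min_separation_s=0.5):
    """Return blink count, rate, and per-blink amplitude/duration from vertical EOG.

    veog: 1-D vertical EOG in microvolts; fs: sampling rate in Hz. A blink is
    counted as a peak above `threshold_uv`; its duration is the time the signal
    stays above half the peak amplitude (blinks are typically under 150 ms).
    """
    veog = veog - np.median(veog)                      # crude baseline removal
    peaks, props = find_peaks(veog, height=threshold_uv,
                              distance=max(1, int(min_separation_s * fs)))
    blinks = []
    for p, amp in zip(peaks, props["peak_heights"]):
        above = veog > amp / 2.0
        start = p
        while start > 0 and above[start - 1]:
            start -= 1
        end = p
        while end < len(veog) - 1 and above[end + 1]:
            end += 1
        blinks.append({"amplitude_uv": float(amp),
                       "duration_ms": 1000.0 * (end - start) / fs})
    rate_per_min = 60.0 * len(blinks) / (len(veog) / fs)
    return {"count": len(blinks), "rate_per_min": rate_per_min, "blinks": blinks}
```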
Electromyogram The electromyogram (EMG) is the recording and interpretation of muscle action potentials, that is, the electrical signals traveling back and forth between the muscles and the peripheral and central nervous system. The dynamic range is 50 µV to 5 mV, and frequencies can range from 2 Hz to approximately 150 Hz during intense straining. EMG can be used to actively drive a system by recording straining. It was first used as a control parameter for prostheses and in feedback for functional neuro-stimulation (FNS) for the disabled. EMG can contribute to control not only through limb tension or movement, but also through the recording of facial gestures and tension. In other cases, the EMG signal will be a "noise" factor in other measurements, such as EOG, frontal EEG, and EKG measurements, and will be filtered out. Electrocardiogram The electrocardiogram (ECG) is the recording, on the body surface, of the electrical activity generated by the heart. The waveform is recorded differentially between two points on the body, referred to as a lead. Components of the wave are the P wave, the QRS complex, and the T and U waves. The peak amplitude of the QRS is approximately ±2 mV. Instrumentation usually requires a system with a bandwidth between 0.05 and 150 Hz. Most of the relevant information is in the low-frequency range, although some diagnostic systems will look for high-frequency changes. The most widely applied parameters in workload and task-related analysis include measures of the R-R interval, its length, rate (heart rate), and variability. Heart rate variability analysis looks not only at variations of the absolute rate, but also at variations of the relationship between low- and high-frequency changes, which can indicate dynamic relationships between the sympathetic and parasympathetic systems of the autonomic nervous system. An important development in this area is the analysis of transient phenomena in short intervals (Keselbrener & Akselrod, 1996). Additional Measurements There are several other electrophysiological parameters that can play a role in interactive interfacing, such as the galvanic skin response, which is a measure of microsweat changes. This parameter changes with stress and is currently in use in many biofeedback and "lie detection" systems. An additional, less well-known parameter, but one that is perhaps relevant to the high-intensity sensations produced by virtual environments, is the electrogastrogram (EGG). This is a measure of changes in electrical signals emanating from the gastric area and has been used to quantify motion sickness at sea.
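The R-R interval measures described under the electrocardiogram above can be sketched as follows: detect R peaks, form the interbeat-interval series, and summarize its variability. The simple amplitude threshold and the SDNN/RMSSD summaries below are illustrative choices, not a clinical-grade detector; a frequency-domain (low- vs. high-frequency) analysis of the kind mentioned in the text would operate on the same interval series.

```python
import numpy as np
from scipy.signal import find_peaks

def rr_intervals(ecg, fs, min_rr_s=0.4):
    """Detect R peaks in a single-lead ECG and return R-R intervals in seconds.

    ecg: 1-D array; fs: sampling rate in Hz. The 60%-of-maximum threshold is an
    illustrative simplification and assumes upright (non-inverted) R waves.
    """
    centered = ecg - np.mean(ecg)
    peaks, _ = find_peaks(centered, height=0.6 * np.max(centered),
                          distance=max(1, int(min_rr_s * fs)))
    return np.diff(peaks) / fs

def hrv_summary(rr):
    """Heart rate plus two common time-domain variability measures."""
    rr = np.asarray(rr, dtype=float)
    return {
        "mean_hr_bpm": 60.0 / rr.mean(),
        "sdnn_ms": 1000.0 * rr.std(ddof=1),                      # overall variability
        "rmssd_ms": 1000.0 * np.sqrt(np.mean(np.diff(rr) ** 2)),  # beat-to-beat variability
    }
```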
IMPLEMENTATION Signal Sensing and Acquisition Techniques The communication of information in the nervous system and other excitable tissue occurs via the conduction of electrical signals. Current flow in the body involves ions; electric conductivity in the outside world occurs via electrons. The transition is carried out by electrodes with a conductive medium in contact with the aqueous ionic current from the body. At the electrode–skin interface a reaction occurs in which charge is transferred between the electrode and the ionic source. The electrode
provides the first transition between ionic current flow in biological tissue and electronic current flow that is then fed into the amplification system. It is this first stage that has the most profound influence on the composition and quality of the signal. As seen in the previous section, biosignal measurements are at very low voltages, ranging between 1 µV and 100 mV. They pass through very high skin impedance and high levels of interference and noise. To make them compatible with any type of computing component, the system requires amplification, rejection of superimposed interference and noise, and a safe interface between the participant and current in accordance with biomedical instrumentation safety standards. All these make up a biopotential amplifier. More detailed designs can be found in Nagel (1995). After amplification, the signal must be converted into digital form and fed into a computing unit that will perform data analysis and storage functions. This can be an off-the-shelf device, a computer, or a programmable chip in an integrated circuit. A schematic design of the process is provided in Fig. 23.4.
FIG. 23.4. Schematics for a biopotential acquisition system, including amplification, isolation, and digital conversion of biosignals.
The following are major components of, and considerations regarding, a biomedical acquisition system with a differential amplifier and digitization system:
• Bioelectrode
• Common-mode rejection: the rejection of interference of the same potential that appears in both inputs of a differential signal.
• Preamplifier: the most critical part of the amplifier, because it can eliminate, or at least minimize, most interference.
• High-pass filter: filters out frequencies below the desired range.
• Isolation amplifier: isolates and protects the subject from electric shock.
• Low-pass filter: filters out frequencies above the desired range.
• Analog-to-digital conversion: conversion that can be implemented with a chip or board. Typical digitization rates range from 30 Hz for EOG signals up to 10 kHz for some EPs and high-frequency EKG.
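After digitization, the analog chain just listed is typically mirrored by a software stage. The sketch below band-passes a digitized biosignal according to its type and notches out mains interference; the pass-bands approximate the ranges quoted in this chapter (with a small high-pass corner substituted for the EOG's DC limit to suppress drift), while the filter orders, notch Q, and SciPy-based implementation are assumptions for illustration.

```python
from scipy.signal import butter, iirnotch, sosfiltfilt, filtfilt

# Approximate pass-bands (Hz) based on the signal descriptions in this chapter.
PASSBANDS = {"eeg": (1.0, 40.0), "eog": (0.1, 30.0),
             "emg": (2.0, 150.0), "ecg": (0.05, 150.0)}

def condition(signal, fs, kind="eeg", notch_hz=60.0):
    """Band-pass a digitized biosignal for its type and notch out mains interference.

    signal: 1-D array; fs: sampling rate in Hz (must exceed twice the upper band
    edge for the chosen kind); kind: one of PASSBANDS; notch_hz: 50 or 60
    depending on the local mains frequency, or None to skip the notch.
    """
    lo, hi = PASSBANDS[kind]
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    out = sosfiltfilt(sos, signal)
    if notch_hz is not None and notch_hz < fs / 2.0:
        bn, an = iirnotch(notch_hz, Q=30.0, fs=fs)
        out = filtfilt(bn, an, out)
    return out
```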
All of these components can easily be incorporated within miniaturized electronic designs. This makes it convenient to incorporate them into a VR system or a helmet/hat used for immersive environments, thereby eliminating the cables and the interface module of the past. The problems of comfort and practicality regarding the sensors have not yet been sufficiently solved. Practical Sensors Resistive Electrodes Resistive electrodes are conductive materials, usually metal plates, that come in contact with the skin. One of the focal issues in using this type of electrode is ensuring that the impedance is below a certain value, usually 2 kΩ, depending on the system and electrodes. The most common electrode currently in use is the silver/silver-chloride (Ag/AgCl) electrode. There are two categories of resistive electrodes, wet and dry. Wet Resistive Electrodes This category includes all metal plate electrodes that require the use of conductive gel or paste to reduce impedance. In general, electrodes requiring prior skin preparation and the active addition of gels or pastes are not practical for use in a system that should be comfortable and of a "plug and play" nature. However, in the wet resistive electrode category, we can find more practical types that contain the necessary electrolyte and either dispense it in a regulated manner or ensure a constant electrolyte level. These types of electrodes include sponge-saturated electrodes, usually self-adhesive and disposable (Fendrock, 1992), dispensing systems (Gevins, 1993), and slow-release electrodes (A. Guteman, personal communication, 1994). Dry Resistive Electrodes Dry electrodes are conductive metals or other materials that come in contact with the skin, either in plate or other designs. Examples of materials include silver/silver chloride in direct contact, carbon-loaded silicon, conductive materials (Sem-Jacobsen, 1976), a p-n semiconductor (Schmid, 1983), and various metals. In the use of the various materials, care must be taken to match the type of signals acquired with material compatibility. For example, carbon-based materials will usually allow good ECG signals, but they are usually too noisy for the low-amplitude signals of EEG. An additional factor with dry resistive electrodes is the variation in design. One of the most practical designs is a "hairbrush" electrode, made of conductive hard wires, that penetrates the hair. Examples include a brush-tip electrode (Chen & Laszlo, 1993) and conductive metal fingers incorporated in a flexible hat (Gevins,
Durousseau, & Libove, 1990). Variations of this method provide a multicontact site with no need for prior preparation. A disadvantage is that the electrodes cannot be passed from one subject to the next, because in some instances the wires may penetrate the skin.
FIG. 23.5. Various types of resistive and capacitive electrodes.
Capacitive Electrodes Capacitive electrodes are also known as insulated electrodes and dry electrodes. Care must be taken in distinguishing between capacitive and resistive dry electrodes. In this type of electrode, the dielectric coating (which is not conductive) is in direct contact with the skin and creates a capacitor between the skin and the conductive plate. Examples include the use of dielectric materials such as aluminum oxide (Lopez & Richardson, 1969); pyre varnish (Potter & Menke, 1970); silicon oxide, a hard anodization dielectric coating (Prutchi & Sagi-Dolev, 1993); and silicon nitride (Taheri, Knight, & Smith, 1994). This type of electrode is active, meaning that it requires an energy source to drive electronic components. The advantages of such electrodes are no need for skin preparation, no need to use conductive gels or pastes, and reduced susceptibility to factors that change electrochemical equilibrium, such as skin temperature and sweat. Disadvantages of some coatings are their vulnerability to scratches and surface deterioration, which render them inefficient. The non-prep qualities of this electrode type make them a good choice in terms of practicality. Each electrode type must be evaluated
in terms of signal size and frequency for which it is intended to make a correct choice. Implementation and Artifact Considerations Application Requirements To use an interface in any environmental setting, practical requirements must be met. The application of electrodes should require no skin preparation or addition of gels and pastes. Therefore, either a dry resistive, dry capacitive, or regulated wet electrode should be used. Electrodes and electronics must not interfere with the virtual environment system itself, either mechanically or electronically. Special care must be taken in placing electrophysiological sensors in head-mounted systems and gloves with preexisting motion and pressure sensors. Electrodes can be mounted directly into the system or into a modular application system. In either event, electrodes must be easily replaceable to avoid using the same electrodes with different subjects. This point is especially important for brush electrodes, which may penetrate the skin. Including an autonomous power supply and not using central power can avoid many safety issues. This will also reduce 60 Hz interference. Artifact Sources Artifacts of various sources must be defined prior to designing the interfacing system. Based on the identification and proper quantification, precautions can be taken in the design to avoid or reduce these interferences, either by means of hardware design or filtering algorithms. Common sources of artifacts are human, system-induced, and surrounding artifacts. Human Artifacts Human artifacts include the overlapping of various signals, accompanying motion artifacts, breathing, and tension. In signal overlapping, electrophysiological signals of various sources may show up in different locations. For example, in frontal EEG recordings, eye blinks and movements will appear in the EEG signal. In some locations, EKG can be recorded as well. EMG will show up in EKG signals if the subject is in motion. This type of artifact is one of the most difficult to address. Adaptive filters are usually used to clean and separate signals such as these from one another. Motion artifacts will cause either movement of the electrode on the interface or changes in pressure of the electrode on the skin. The extent will depend on the electrode and mounting apparatus. It is recommended that a fixation apparatus such as a suspension system be used to secure the electrode in place and ensure
constant pressure. Breathing artifacts are caused by mechanical movement of the chest and can also affect recordings. Artifacts caused by muscle tension will show up as spurious EMG signals. System-Induced Artifacts These artifacts are caused by the system itself and can include mechanical vibration, system motion (which differs from human motion artifact), and EMI/RFI artifacts from other components of the system.
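The signal-overlap problem described under human artifacts, for example eye blinks appearing in frontal EEG, is commonly handled with an adaptive filter that uses a separately recorded reference channel. The least-mean-squares (LMS) sketch below is one standard form of such a filter; the filter length and step size are illustrative assumptions, and in practice the step size is often normalized by the reference signal's power.

```python
import numpy as np

def lms_remove_artifact(primary, reference, n_taps=8, mu=0.01):
    """Subtract the part of `reference` (e.g., EOG) that leaks into `primary` (e.g., frontal EEG).

    An LMS adaptive filter learns weights w so that w applied to recent reference
    samples approximates the artifact in the primary channel; the residual e is
    returned as the cleaned signal. mu is the adaptation step size.
    """
    primary = np.asarray(primary, dtype=float)
    reference = np.asarray(reference, dtype=float)
    w = np.zeros(n_taps)
    cleaned = np.zeros_like(primary)
    for n in range(len(primary)):
        # Most recent n_taps reference samples, newest first (zero-padded at the start).
        x = reference[max(0, n - n_taps + 1): n + 1][::-1]
        x = np.pad(x, (0, n_taps - len(x)))
        y = np.dot(w, x)                 # estimated artifact
        e = primary[n] - y               # cleaned sample
        w += 2.0 * mu * e * x            # LMS weight update
        cleaned[n] = e
    return cleaned
```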
Surrounding Artifacts These artifacts are due to characteristics of the surrounding environment. Each environment needs to be broken down into elements that can induce an artifact in an electrophysiological system, and these should be isolated to the greatest extent possible. EMI/RFI artifacts are one example of this class and comprise electromagnetic and radio-frequency interference from instrumentation in the surrounding area. Temperature can cause variations in measurements as well. If a participant's temperature rises and sweating occurs, this can change the balance at the electrode–skin interface. Depending on the type of electrode, this will either be beneficial or cause minor changes in recordings.
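One quick screening check for the EMI/RFI interference just mentioned is to compare spectral power near the mains frequency with the channel's total power; a high ratio suggests inadequate shielding or poor electrode contact. The 60 Hz default, the 1 Hz band width, and the 20% flagging threshold below are illustrative assumptions.

```python
import numpy as np
from scipy.signal import welch

def mains_interference_ratio(signal, fs, mains_hz=60.0, width_hz=1.0):
    """Fraction of a channel's spectral power lying within +/- width_hz of the mains frequency."""
    freqs, psd = welch(signal, fs=fs, nperseg=min(len(signal), 4 * int(fs)))
    near = np.abs(freqs - mains_hz) <= width_hz
    return float(np.trapz(psd[near], freqs[near]) / np.trapz(psd, freqs))

# Example policy: flag the channel when more than 20% of its power sits at the mains line.
# if mains_interference_ratio(channel, fs) > 0.20: warn about shielding or electrode contact.
```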
INCORPORATING ELECTROPHYSIOLOGICALLY BASED INTERACTIVE INTERFACES IN INTERACTIVE VIRTUAL ENVIRONMENTS We have shown the utility of this type of interface and the various options and considerations involved in its implementation. A vast pool of knowledge from research and development in this area is available for access and application. An existing example of a subtle drive command control application is the Cyberlink interface, based on EEG and EMG signals, already in use to control a virtual flight course (Nelson et al., 1997). Table 23.2 lists relevant patents that can be applied in developing the types of applications discussed in this chapter. It is evident from the impressive increase in recent patents that further development is on the verge of implementation. Computer programs, immersed environments, and virtual reality systems are the current applications for which these interfaces are being developed. They undoubtedly mark the way for future developments that will become increasingly common in many other applications of daily life.
TABLE 23.2. Examples of Inventions That Can Be Used to Interface Interactive Virtual Environments
Predefined Command Control
U.S. Patent 5,686,938 (1997), L. Z. Batkhan: Adaptive cursor control system.
U.S. Patent 5,360,971 (1994), A. B. Kaufman, A. Bandopadhay, and G. J. Piligian: Apparatus and method for eye-tracking interface.
Subtle Command Control
U.S. Patent 5,726,916 (1998), C. Smyth: Method and apparatus for determining ocular gaze point of regard and fixation duration.
U.S. Patent 5,692,517 (1997), A. Junker: Brain–body actuated system.
U.S. Patent 5,638,826 (1997), J. Wolpaw and D. J. McFarland: Communication method and system using brain waves for multidimensional control.
Iterative Training and Selection: Cognitive
U.S. Patent 5,649,061 (1997), C. Smyth: Device and method for estimating a mental decision.
U.S. Patent 5,447,166 (1995), A. Gevins: Neurocognitive adaptive computer interface method and system based on online measurements of the user's mental effort.
U.S. Patent 5,406,957 (1995), M. Tansey: Electroencephalic neurofeedback apparatus for training and tracking of cognitive states.
U.S. Patent 5,295,491 (1994), A. Gevins: Noninvasive human neurocognitive capability testing method and system.
Iterative Training and Selection: Psychological
U.S. Patent 5,601,090 (1997), M. Toshimitsu: Method and apparatus for automatically determining somatic status.
U.S. Patent 5,568,126 (1996), A. Stig and J. O. Sorensen: Providing an alarm in response to a determination that a person may have suddenly experienced fear.
REFERENCES
Chen, Y., Laszlo, C. A., & Hershler, C. (1995). Brush-tip electrode (U.S. Patent No. 5,443,559).
Cress, J. D., Hettinger, L. J., Nelson, W. T., & Haas, M. W. (1997). An approach to developing adaptive interface technology for advanced airborne crew stations. 14th Annual AESS/IEEE Dayton Section Symposium: Synthetic Visualization: Systems and Applications (pp. 5–10). New York.
Farwell, L. A., & Donchin, E. (1988). Talking off the top of your head: Toward a mental prosthesis utilizing event-related brain potentials. Electroencephalography and Clinical Neurophysiology, 70, 510–533.
Fendrock, C. (1992). Disposable, pregelled, self-prepping electrode (U.S. Patent No. 5,305,746).
Gevins, A. (1990). Electrode system for brain wave detection (U.S. Patent No. 5,038,782).
Gevins, A., & Schaffer, R. E. (1991). A critical review of electroencephalographic (EEG) correlates of higher cortical functions. CRC Critical Reviews in Bioengineering, Oct., 113–157.
Gevins, A. (1993). Electrode electrolyte application device (U.S. Patent No. 5,404,875).
Guyton, A. C. (1986). Textbook of medical physiology. Tokyo: Saunders.
Hashiba, M., Yasui, K., Watabe, H., Matsuoka, T., & Baba, S. (1995). Quantitative analysis of smooth pursuit eye movement. Nippon Jibiinkoka Gakkai Kaiho, 98(4), 681–696. (Abstract)
John, E. R., & Easton, P. (1995). Quantitative electrophysiological studies of mental tasks. Biological Psychology, 40, 101–113.
Jung, T. P., Makeig, S., Stensomo, M., & Sejnowski, T. J. (1997). Estimating alertness from the EEG power spectrum. IEEE Transactions on Biomedical Engineering, 44, 60–69.
Kaufman, A. E., Bandopadhay, A., & Shaviv, B. D. (1993). An eye tracking computer user interface. Proceedings of the 1993 IEEE Research Properties in Virtual Reality Symposium. Los Alamitos, CA.
Keselbrener, L., & Akselrod, S. (1996). Selective discrete Fourier transform algorithm for time-frequency analysis: Method and application on simulated and cardiovascular signals. IEEE Transactions on Biomedical Engineering, 43, 789–802.
Lopez, A., & Richardson, P. C. (1969). Capacitive electrocardiographic and bioelectric electrodes. IEEE Transactions on Biomedical Engineering, 16, 99.
Lusted, H. S., & Knapp, R. B. (1996). Controlling computers with neural signals. Scientific American, 275(4), 56–63.
Mitrushina, M., & Stamm, J. (1994). Task-induced differential cortical activation pattern. International Journal of Psychophysiology, 17, 15–23.
Nagel, J. H. (1995). Biopotential amplifiers. In J. D. Bronzino (Ed.), The biomedical engineering handbook (pp. 1185–1195). Boca Raton, FL: CRC Press.
Nelson, W. T., Hettinger, L. J., Cunningham, J. A., Roe, M. M., Haas, M. W., & Dennis, L. B. (1997). Navigating through virtual flight environments using brain–body-actuated control. Proceedings of the IEEE 1997 Annual International Symposium on Virtual Reality (pp. 30–37). Los Alamitos, CA.
Potter, A., & Menke, L. (1970). Capacitive electrocardiographic and bioelectric electrodes. IEEE Transactions on Biomedical Engineering, 17, 350–351.
Prutchi, D., & Sagi-Dolev, A. M. (1993). New technologies for in-flight pasteless bioelectrodes. Aviation, Space, and Environmental Medicine, June, 552–556.
Schmid, W. (1983). Electrode for detecting bioelectric signals (U.S. Patent No. 4,375,219).
Sem-Jacobsen, C. (1976). Flexible sensor pad for nonattached monitoring EKG signals of human subjects (U.S. Patent No. 3,954,100).
Sharbrough, F. W. (1990). Electrical fields and recording techniques. In D. D. Daly & T. A. Pedley (Eds.), Current practice of clinical electroencephalography (p. 47). New York: Raven.
Smyth, C. C. (1998). Method and apparatus for determining ocular gaze point of regard and fixation duration (U.S. Patent No. 5,726,916).
Steinman, R. M. (1986). Eye movement. Vision Research, 26(9), 1389–1400.
Stern, J. A., Walrath, L. C., & Goldstein, R. (1984). The endogenous eye blink. Psychophysiology, 21, 22–33.
Taheri, B. A., Knight, R. T., & Smith, R. L. (1994). A dry electrode for EEG recording. Electroencephalography and Clinical Neurophysiology, 90, 376–383.
Wolpaw, J. R., & McFarland, D. (1995). Communication method and system using brain waves for multidimensional control (U.S. Patent No. 5,638,826).
Author Index
A Aasman, J., 489, 518 Abowd, G., 475, 481 Achorn, B., 366 Ackerman, M.J., 413, 430, Adams, M.J., 486, 518 Adams, W., 241, 243 Addison, R., 332, 342 Adelstein, B.D., 141, 164 Aharon, S., 332, 342 Aiello, G.L., 175, 177, 184 Akers, L.A., 490, 522 Akerstedt, A., 426, 430, Akimoto, T., 349, 366 Akselrod, S., 533, 540 Alexander, E., 342 Allard, T., 171, 185 Allen, J.A., 5, 17 Allen, R.W., 375, 386, 388 Alley, T.R., 197 Allgood, G.O., 254, 274, 319, 322 Allison, R., 229, 244 Altobelli, D., 342 Amalberti, R., 451, 458, 463, 465, 467, 468, 469, 480, 481 Anand, S., 155, 167 Andersen, G., 52, 64 Anderson, B.G., 414, 430 Anderson, J.H., 78, 88 Anderson, J.R., 69, 88, 90 Anderson, P.L., 84, 88 Anderson, T.R., 4, 18, 115, 126, 305, 323 Andre, A.D., 375, 388 Angell, J.R., 68, 88
Anstis, S.M., 237, 243 Appla, D., 217 Applewhite, H., 142, 164, 306, 323 Arai, K., 349, 367 Arierly, D., 36, 44 Arigiro, V., 342 Arnold, P.K., 426, 430 Arrott, A.P., 151, 165, 251, 277 Arthur, E., 474, 481 Arthur, E.J., 123, 127 Asch, S., 51, 64 Ashmead, D.H., 28, 44 Ayache, N., 342 Azuma, R., 316, 318, 319, 321
B Baba, S., 532, 540 Bach-y-Rita, P., 170, 171, 172, 174, 178, 179, 181, 182, 183, 184, 185, 186 Bachman, T.A., 309, 324 Backs, R.W., 490, 518 Backus, B.T., 226, 243 Badler, N.I., 345, 349, 366, 367 Badre, A.N., 518 Bahri, T., 462, 481 Bailey, J.H., 72, 73, 74, 90, 276 Bailey, R.E., 317, 322, 324 Baily, J.S., 149, 160 Bainbridge, L., 463, 481 Baird, J.C., 25, 41 Bakker, J.T., 317, 321 Bale, R.M., 253, 275 Balliet, J.A., 119, 127
Ballinger, C.J., 137, 160, 161 Baltzley, D.R., 131, 134, 142, 160, 164, 251, 261, 273, 275 Bamber, D., 147, 162 Bandopadhay, A., 529, 540 Banks, M.S., 226, 243, 296, 300 Bannenberg, J.H., 394, 409 Bannenberg, J.J.G., 394, 409, 410 Bard, P., 249, 252, 277 Bardy, B., 114, 117, 119, 128, 137, Barfield, W., 27, 36, 41, 70, 88, 107, 108, 130, 131, 160 Barkhof, F., 85, 90 Barlow, T., 133, 166 Barnes, M., 462, 481 Barnett, B., 487, 520 Baron, S., 488, 518, 523 Barrett, G.V., 157, 160 Bartels, R.H., 347, 366 Bartolome, D.S., 485, 522 Bataille, M., 465, 480 Beagley, N., 78, 88 Beale, R., 475, 481 Beall, A.C., 22, 26, 30, 36, 37, 40, 41, 42, 44, 45, 295, 298, 299 Beaulieu, C.F., 342 Becheiraz, P., 364, 366 Becker, W., 52, 65 Becket, T., 366 Beckwith, F.D., 250, 275, 320, Bedford, F.L., 153, 160 Beek, P., 115, 127 Beirs, D.W., 118, 123, 124, 127 Begault, D.R., 188, 192, 196, Bennett, C.T., 283, 300 Bennett, K., 216 Bennett, K.B., 3, 18, 194, 196, Benson, A.J., 247, 251, 255, 273 Berbaum, K.S., 55, 65, 131, 133, 134, 151, 160, 163, 164, 250, 253, 261, 274, 275, 320, 322, 430 Berg, C., 308, 322, 323 Bergeron, P., 345, 367 Berry, D.C., 518 Bertelson, P., 145, 165 Berthoz, A., 49, 64 Bex, P.J., 114, 115, 118, 126 Bhalla, M., 22, 42, 44 Biggs, N.L., 295, 299
Billinghurst, M., 96, 109 Billings, C.E., 442, 447, 452, 457, 459, 465, 466, 467, 481 Bingham, G.P., 155, 160 Biocca, F., 92, 94, 108, 118, 126, 131, 133, 134, 142, 143, 153, 160, 164, 166, 306, 323 Birkett, D.H., 393, 410 Bischoff, W., 208, 215, 216 Bishop, G., 316, 318, 319, 321 Bjorneseth, O., 107, 108, 131, 160 Black, P.M., 342 Black, F.O., 155, 165, 261, 277 Blackburn, L.H., 138, 161 Blake, A., 350, 366 Bleeker, O.F., 488, 522 Blinn, J., 347, 348, 366 Bliss, J.P., 260, 273, 276 Blomberg, R., 489, 522 Bloomberg, J.J., 155, 160, 165, 249, 277 Boden, M., 204, 215 Boekmann, W., 410 Bogart, E.H., 485, 522 Bolia, R.S., 188, 191, 193, 194, 195, 196, 197 Bolstad, C.A., 483, 520 Bonadies, G., 489, 520 Bonato, P., 3, 5, 18 Boose, J.H., 518 Boorstin, J., 99, 108 Borah, J., 307, 321 Bortolussi, M.R., 499, 518, 521 Bos, J.F., 369, 387 Boussaoud, D., 153, 160 Bouzit, M., 86, 89 Bouwman, J.H., 251, 276 Bowman, D.A., 96, 108 Boxtel, A. von, 175, 185 Bradshaw, M.F., 230, 231, 234, 245 Branco, P., 3, 5, 18 Brain, W.R., 22, 42 Brand, J.J., 150, 157, 165, 248, 250, 252, 255, 277 Brandt, T., 52, 64, 116, 119, 121, 126, 250, 271, 273 Braun, C.C., 260, 277 Braune, R., 370, 388 Braunstein, M.L., 23, 26, 42, 45, 52, 64 Breen, T.J., 201, 216 Breeuwer, E.J., 370, 387
Brickman, B.J., 4, 14, 18, 194, 196, 197, 518, 522 Bridgeman, B., 22, 42, 155, 167 Briggs, G.E., 453, 459 Brindley, G.S., 261, 273 Bronkhorst, A.W., 188, 196 Brookes, A., 237, 243 Brooks, F.P. Jr., 93, 108 Brooks, T.L., 141, 160 Brouchon, M., 147, 165 Browder, G.B., 262, 273 Browman, K.E., 155, 167 Brown, K., 188, 197 Brungart, D.S., 191, 194, 195, 196 Buckley, D., 243 Buckwalter, J.G., 71, 90 Buffardi, L.C., 5, 17 Burdea, G., 86, 89 Burnham, C.A., 147, 162 Busetta, P., 200, 217 Busquets, A.M., 280, 301 Butler, L.F., 132, 163 Butrimas, S.K., 262, 273 Buttigieg, M.A., 375, 379, 387 Byrne, K., 490, 519
C Caelli, T., 204, 208, 215, 216 Cagnello, R., 226, 238, 243, 243 Cai, R., 415, 430 Caird, J.K., 131, 163 Calhoun, G.L., 309, 313, 323 Calkins, D.S., 261, 277 Calvert, S.L., 133, 160 Cameron, B.M., 332, 342 Canedo, A., 170, 186 Canon, L.K., 145, 160 Capin, T.K., 361, 366 Carbonell, N., 479, 481 Card, S.K., 70, 88 Carello, C., 22, 45 Carlson, V.R., 28, 42 Carleton, L.R., 74, 89 Caro, P.W., 123, 126 Carpenter-Smith, T., 55, 64 Carr, K., 121, 123, 126, 131, 160 Caretta, T.R., 487,
Carrithers, C., 69, 88 Carroll, J.M., 69, 88 Carter, A., 82, 90 Carter, R.J., 5, 18 Casali, J.G., 251, 273, 315, 322, 498, 519 Casey, S.M., 7, 18 Casper, P.A., 500, 521 Cassell, J., 351, 366 Castore, C., 251, 274 Castro, F.D., 487, 519 Cave, K.R., 189, 197 Chambers, D., 240, 244 Champion, H., 78, 88 Chance, S.S., 36, 42 Chapman, G., 433, 459 Chase, W.G., 68, 88 Chatham, C., 490, 521 Chen, Y., 535, 539 Cheung, B.S.K., 250, 273 Chien, Y.Y., 131, 160, 258, 273 Chinn, H.I., 248, 273 Christou, C., 94, 95, 108, 121, 126 Chrysler, S.T., 123, 127 Cicerone, K.D., 84, 88 Cisneros, J., 188, 197 Clark, B., 252, 274 Clawson, D.M., 73, 74, 88, 89 Clayworth, C.C., 172, 186 Cleary, K., 77, 78, 89 Clement, J., 104, 108 Clement, J.M., 342 Clement, W.F., 315, 317, 318, 322 Clower, D.M., 153, 161 Cobb, S.V.G., 133, 161 Cohen, B., 50, 64 Cohen, D.S., 350, 368 Cohen, M.M., 137, 138, 142, 144, 146, 151, 161, 166, 167 Coiffet, P., 134, 166 Cole, R.E., 151, 161, 393, 410 Colehour, J.K., 150, 162, 248, 273 Coletta, A.V., 459 Collewijn, H., 140, 161, 222, 223, 237, 244, 245 Collins, C.C., 172, 174, 185, 186 Compton, D.E., 248, 250, 253, 276 Comstock, J.R., 489, 520 Contrucci, R.R., 136, 167 Conway, M., 133, 165, 414, 431
Cook, A.M., 261, 276 Cooke, N.M., 201, 216 Coren, S., 148, 161 Cormack, L.K., 235, 245 Corrigan, J., 484, Costello, P.J., 139, 163 Cothren, R., 343 Coutaz, J., 481 Coward, R.E., 251, 274 Cowell, R., 503, 522 Cox, T., 415, 421, 430 Coyne, R., 100, 108 Crabb, T., 174, 185 Cramer, D.B., 252, 274 Crampton, G.H., 250, 261, 273 Crane, D.F., 317, 318, 321 Craske, B., 144, 161 Craver, K.D., 119, 127 Crawford, J.M., 343 Crawshaw, M., 144, 161 Crea, T., 133, 165, 414, 431 Creem-Regehr, S.H., 40, 45 Creme, S.H., 22, 42 Cress, J.D., 3, 18, 194, 196, 529, 539 Crosbie, R.J., 138, 161 Crosby, T.N., 251, 273 Crowell, J.A., 296, 300 Cumming, B.G., 230, 244 Cunningham, H.A., 155, 161 Cunningham, J.A., 4, 18, 197, 323, 540 Cuschieri, A., 393, 410 Cutting, J.E., 22, 25, 42, 283, 300
D Dance, S., 208, 215 Dake, M.D., 342 D’Angelo, W.R., 188, 190, 193, 196, 522 Darzei, M., 392, 410 Da Silva, J.A., 23, 26, 28, 29, 30, 34, 42, 43, 44 Darken, R.P., 71, 74, 88 Darwin, E., 249, 273 Dauchy, P., 481 Davidson, L., 144, 166 Davis, B.S., 499, 523 Davis, E.T., 36, 45 Davis, E.E., 304, 321
Davis, I., 490, 503, 523 Davis, L.E., 462, 481 Davis, W., 314, 320, 322 Dawid, P., 503, 522 Deane, F.R., 150, 162 Deaton, J., 462, 481 Deblon, F., 451, 458, 465, 480, 481 DeCarlo, D., 351, 366 Deckert, J.C., 487, 519 Dede, C., 260, 277 DeJong, R., 490, 522 Dekker, S.W.A., 466, 467, 481 Delingette, H., 342 Delp, S.L., 333, 342 Dember, W.N., 197, 323 Dennis, L.B., 4, 14, 18, 323, 540 Deouville, B., 366 DeRoshia, C.W., 144, 167 Desouza, J.F.X., 234, 245 Deutsch, J.E., 86, 89 Dewar, R., 149, 151, 161 Dichgans, J., 52, 64, 116, 119, 121, 126, 250, 271, 273 Dillon, C., 204, 216 Ditton, D., 27 Dix, A., 475, 477, 481 Dixon, M.W., 26, 27, 44, 46 DiZio, P., 118, 126, 133, 137, 148, 161, 252, Doenges, P., 363, 366 Dolezal, H., 144, 161, 251, 274, 320, Dominguez, C., 440, 441, 459 Donchin, E., 529, 539 Donchin, Y., 410 Dornheim, M.A., 317, 322 Douglas, S.D., 119, 127 Doxey, D.D., 155, 165 Draper, M., 55, 65, 135, 141, 151, 158, 161, 166 Drexler, J.M., 53, 66, 134, 142, 164, 248, 250, 253, 259, 275, 276 Duncan, J.S., 329, 342 Duncker, K., 68, 88, 448, 459 Dunlap, W.P., 133, 163, 164, 250, 259, 274, 275, 276 Durlach, N.I., 1, 3, 10, 18, 27, 43, 74, 89, 120, 125, 127, 131, 144, 149, 152, 161, 163, 166, 191, 195, 196, 258, 261, 274, 305, 306, 307, 313, 315, 320, 322, 326, 342 Durso, F.T., 201, 216
Dyer, J.A., 243, 244 Dyk, R., 157, 167
E Ebenholtz, S.M., 133, 139, 144, 150, 162, 257, 271, 274 Eby, D.W., 26, 42 Edgar, G.K., 114, 115, 118, 26 Edinger, K.M., 274 Edmonson, J.M., 392, 410 Efstathiou, A., 149, 163 Egelund, N., 490, 519 Eggemeier, F.T., 485, 498, 499, 500, 519, 523 Eggleston, R.G., 305, 323 Ekman, P., 351, 366 Elliot, D., 28, 42 Ellis, S.R., 36, 42, 117, 121, 126, 136, 137, 141, 162, 164, 165, 369, 371, 375, 378, 385, 387, 388 Elson, M., 348, 366 Elvers, G.C., 188, 197 Encarnacao, M, 3, 5, 18 Endsley, M.R., 483, 484, 486, 487, 488, 505, 519 England, R., 131, 160 Entin, E.B., 487, 519 Entin, E.E., 487, 519 Entralgo, P.L., 414, 430 Epstein, W., 23, 42 Ericson, M.A., 188, 192, 193, 195, 196, 197 Ericsson, K.A., 68, 88 Erkelens, C.J., 222, 223, 237, 238, 244, 245 Ervin, H.E., 487, 519 Escher, M., 355, 366 Eskin, A., 248, 274 Essa, I., 350, 366 Este-McDonald, J., 393, 410 Evans, G.W., 415, 430 Evrad, S., 334, 342
F Faber, J.M., 490, 522 Fadden, D.M., 370, 388
Fahlstrom, P.G., 303, 322 Fairchild, K.M., 104, 109 Fallesen, J.J., 486, 487, 520 Falmagne, J., 62, 64 Farwell, L.A., 529, 539 Faterson, H., 157, 167 Feinsod, M., 172, 185 Feld, M.S., 343 Feller, J.F., 342 Fendrock, C., 535, 539, Ferison, C., 245 Ferrell, W.R., 141, 162 Ferris, S.H., 26, 42,156, 162 Festinger, L., 147, 162 Fikes, T.J., 26, 36, 42 Finlay, J., 475, 481 Finlay, P.A., 404, 410 Finley, D., 237, 244 Fiorita, A.L., 315, 316, 323 Fischer, M., 52, 64 Fitzmaurice, M., 343 Flach, J.M., 22, 42, 50, 65, 107, 108, 123, 124, 126, 281, 283, 291, 299, 300, 395, 410, 435, 448, 449, 453, 454, 459 Flagg, T., 237, 244 Flood, W., 188, 197 Flook, J.P., 155, 162, 164 Flotzinger, D., 308, 323 Flynn, S.B., 123, 128 Fokker, B.V., 488, 522 Foley, J.M., 23, 25, 28, 42 Forbes, J.M., 251, 276 Fore, S., 151, 161, 393, Forsey, D.R., 347, 366 Fort, A., 489, 522 Foster, G.M., 414, 430 Foulliet, J., 489, 522 Fowlkes, J.E., 133, 155, 162, 163, 251, 254, 274, 275 Fox, C.R., 137, 164 Fox, T., 119, 127 Foxlin, E., 4, 18, 306, 322 Fracker, M.L., 484, 487, 488, 520 Frank, L.H., 248, 252, 261, 274, 275, 315, 322 Frankenhauser, M., 421, 430 Franzel, S.L., 189, 197 Franzen, K.M., 82, 88 Frederici, A., 51, 64
Fredericksen, J.R., 77, 88 Freedman, S.J., 146, 162 Freeman, F.F., 3, 19 Friedmann, F., 54, 66 Friesen, W.V., 351, 366 Fright, W.R., 306, 323 Frisby, J.P., 235, 243, 245 Frost, B., 59, 66 Frost, D.O., 171, 185, 186 Frost, G., 316, 322 Frye, D., 82, 90 Fujita, N., 28, 29, 44 Fukusima, S.S., 23, 28, 29, 30, 31, 43, 44 Fulghum, D.A., 304, Fullenkamp, P., 490, 499, 503, 523 Funabiki, K., 280, 300 Funaro, J.F., 275 Funda, J., 141, 149, 162 Furness, T., 55, 65, 70, 88, 119, 120, 121, 122, 125, 127, 131, 151, 160, 166, 309, 322 Furuta, N., 430, 431 Futamura, R., 55, 64
G Gailey, C., 151, 166 Gale, A., 490, 503, 520 Gao, C., 415, 430 Garcia-Lara, J., 170, 172, 185 Garding, J., 243 Garnett, R., 74, 89 Gaunet, F., 36, 42 Gaver, W.W., 95, 109, 114, 124, 127 Gawron, V.J., 317, 322 Geis, W.P., 395, 410, 456, 459 Gelade, G., 189, 190, 191, 197 Georgeff, M.P., 200, 216 Gevins, A., 490, 503, 520, 532, 535, 539, 540 Ghnassia, J.P., 342 Gibson, E.J., 118, 126 Gibson, J.J., 11, 18, 22, 43, 50, 61, 64, 92, 95, 107, 109, 112, 113, 114, 115, 126, 180, 185, 280, 283, 296, 299, 300, 392, 410, 450, 459 Gibson, W., 36, 44 Gick, M.L., 68, 88
Gilinsky, A.S., 23, 43 Gilkey, R.H., 115, 126 Gillam, B., 237, 238, 240, 243, 244 Girone, M., 86, 89 Glaser, D.A., 237, 238, 244 Gleason, P.L., 333, 334, 342 Gleason, T.J., 303, 322 Godthelp, H., 375, 386, 388 Goement, P.N., 474, 481 Gogel, W.C., 22, 23, 24, 25, 26, 28, 31, 33, 34, 43, 237, 244 Goffman, E., 121, 126 Goldberg, I.A., 151, 166 Goldman, R., 200, 216 Goldman, W.S., 136, 167 Goldsmith, F.T., 201, 216 Goldstein, E., 50, 64 Goldstein, R., 532, 540 Golledge, R.G., 28, 30, 44 Gomer, F., 490, 503, 522 Gonshor, A., 140, 162 Gooch, A.A., 40, 45 Goodale, M.A., 22, 28, 43, 44, 45 Goodenough, D.R., 157, 167 Goodson, J.E., 132, 165 Goodstein, L.P., 434, 440, 447, 459, 460 Gorday, K.M., 123, 128 Gordon, D.A., 283, 297, 298, 300 Goss, S., 204, 212, 214, 215, 216, 217 Gossweiler, R., 22, 44 Gower, D.W., 133, 134, 160, 162, 251, 254, 273, 274 Gowing, J., 23, 46 Grablowitz, V., 410 Graham, M.E., 235, 238, 243, 244, 245 Grassia, J., 486, 520 Gray, S., 28, 42 Graybiel, A., 144, 150, 155, 162, 164, 248, 249, 250, 251, 252, 273, 274, 275, 276, 320, Greco, R., 77, 89 Green, S.J., 136, 167 Greene, M., 149, 163 Greenleaf, W.J., 336, 342 Gregorire, H.G., 253, 275 Gregory, R.L., 183, 185 Griffin, M.J., 53, 64, 316, 318, 319, 324 Grigsby, J., 82, 89 Grimbergen, C.A., 410
Groen, J., 143, 144, 162 Grunwald, A.J., 280, 297, 300, 369, 375, 385, 387, 388 Guedry, F.E., 155, 162, 251, 274 Guic-Robles, E.J., 171, 185 Gugerty, L., 197
H Haas, M.W., 3, 4, 9, 14, 18, 19, 194, 196, 197, 323, 485, 507, 520, 522, 529, 539, 540 Haber, R.N., 23, 44 Hafner, C.D., 201, 216 Hagen, B.A., 283, 300 Haldane, C., 133, 167 Hallinan, P.W., 350, 368 Halpin, S.M., 487, 519 Hamm, R.M., 486, 520 Hammond, K.R., 464, 481, 486, 520 Hampton, S., 118, 123, 124, 127 Hancock, P.A., 3, 19, 123, 127, 474, 481 Hankins, T.C., 490, 518 Hanna, G.B., 393, 410 Hansen, J., 426, 430 Harm, D. L., 51, 53, 64, 66, 133, 151, 155, 162, 163, 165, 249, 250, 254, 261, 272, 274, 275, 277 Harper, K., 489, 520 Harris, C.S., 143, 144, 153, 163 Harris, R., 489, 520 Harris, W.C., 474, 481 Harris, W.T., 317, 318, 323 Hart, S.G., 498, 499, 518, 520, 521 Harte, K., 369, 389 Hartley, L.R., 426, 430 Hartman, B., 487, 488, 520 Hartwood, K., 487, 488, 520 Hashiba, M., 532, 540 Haskell, I.D., 280, 300, 369, 386, 388, 389 Havron, H.D., 132, 163 Hay, J.C., 140, 144, 145, 163 Hayasaka, T., 416, 429, 430, 431 Hayes, P.J., 492, 522 Hayes, J.R., 450, 459 Hays, R.T., 5, 17 He, Z.J., 26, 28, 44, 45 Healy, A.F., 77, 89
Hebb, D., 49, 64 Heckman, T., 52, 65 Heeter, C., 27, 43 Heffner, H.E., 188, 196 Heffner, R.S., 188, 196 Heil, J., 179, 185 Hein, A., 144, 147, 149, 163 Heinze, C., 215, 216, 217 Held, R.M., 10, 18, 27, 28, 42, 43, 120, 125, 127, 142, 144, 146, 147, 148, 152, 161, 163, 166 Hendrix, C., 107, 108, 131, 160 Henn, V., 50, 64, 66 Hepp-Reymond, M.C., 188, 197 Herder, J.L., 392, 394, 410 Herndon, K.P., 100, 110 Hess, R.A., 315, 317, 318, 322 Hettinger, L.J., 3, 4, 5, 9, 10, 14, 18, 19, 118, 133, 135, 162, 164, 194, 196, 197, 249, 250, 251, 258, 272, 274, 275, 319, 320, 322, 430, 507, 518, 520, 522, 529, 539, 540 Heubner, W.P., 155, 160, Hicks, H.E., 487, 519 Higashiyama, H., 420, 431 Higgins, G.R., 78, 88, 342 Hippisley-Cox, D., 243 Hitchcock, R.J., 369, 387 Hochberg, J., 114, 115, 118, 119, 127 Hodde, K.C., 394, 409 Hodges, L.F., 36, 45, 96, 108 Hodges, L., 84, 88, 90 Holden, F.G., 395, 410 Holden, M.K., 71, 89 Holmberg, L., 415, 430 Holyoak, K.J., 68, 88 Holzhausen, K.P., 393, 411 Homick, J.L., 251, 274, 277 Horst, R., 489, 503, 521 Horward, M., 392, 410 Hoshiai, K., 431 Howard, A., 250, 274 Howard, I.P, 25, 43, 50, 51, 52, 61, 65, 150, 163, 223, 224, 226, 227, 229, 230, 234, 235, 236, 238, 239, 240, 241, 242, 243, 243, 244, 245, 250, 273, 274 Howarth, P.A., 139, 163 Huemer, V., 22, 42 Huffman, S.B., 214, 216
Hughes, B., 179, 181, 183, 185, 186 Hughes, J.F., 100, 110 Huitric, H., 347, 367 Hummels, C.C.M., 100, 109 Humphrey, G.K., 22, 43 Hung, G.K., 4, 19 Hunt, E., 73, 74, 77, 90 Husslein, P., 411
I
Ichikawa, T., 96, 109 Igarashi, H., 415, 417, 420, 422, 426, 430, 431 Igarashi, K., 431 Ikeda, K., 145, 163 Illgen, C., 487, 523 Immamizu, H., 153, 163 Ingle, D.F., 323 Inglis, E., 144, 167 Isard, M., 350, 366 Ittlelson, W.H., 23, 43
J
Jacobs, O., 459 Jacoby, R.H., 137, 165 Jakse, G., 410 James, K.R., 131, 163 Jansen, A., 410 Jeffrey, R.B., 342 Jenison, R.L., 27, 46 Jenkins, J., 131, 160, 258, 273 Jenkins, W.M., 171, 185 Jennings, R.W., 393, 411 Jeoffroy, F., 463, 481 Jewell, W.F., 315, 317, 318, 322 Johannsen, G., 371, 388 Johnson, W., 251, 274 Johnson, W.L., 217 Johnson, W.W., 283, 295, 300 Johnston, B.E., 273 Johnston, E.B., 221, 230, 244 Jolesz, F.A., 331, 342 Jones, M.B., 4, 18, 142, 164, 248, 250, 253, 254, 259, 261, 275, 276 Jones, R., 28, 42 Jones, R.M., 214, 216, 217 Jones, S.A., 136, 163 Jones, S.B., 71, 82, 90, 413, 431 Josefsson, T., 216 Josephs, L.G., 393, 410 Jung, T.P., 532, 540 Junker, A., 308, 309, 322, 323
K
Kaczmarek, K.A., 107, 108, 131, 160, 170, 172, 175, 176, 177, 178, 185 Kahneman, D., 62, 66 Kaiser, D.A., 490, 503, 523 Kaiser, M.K., 104, 109, 369, 383, 384, 387, 388 Kalawsky, R.S., 131, 163 Kalish, M., 296, 301 Kalkan, A., 489, 523 Kalra, P., 348, 350, 354, 355, 356, 364, 366, 367 Kalsbeek, J.W.H., 489, 503, 521 Kaminka, G.A., 200, 216 Kancherla, A., 133, 166 Kaneko, H., 226, 227, 234, 235, 236, 239, 240, 241, 242, 244 Kantowitz, B.H., 499, 500, 518, 521 Kantowitz, S.C., 499, 521 Kao, S., 489, 523 Kappe, B., 298, 300 Karp, S.A., 157, 167 Karsh, E.B., 139, 166 Kashiwagi, S., 420, 422, 430 Kass, M., 350, 366 Kato, M., 422, 430 Katz, R., 63, 65 Kauffman, L., 140, 166 Kaufman, A.E., 529, 540 Kaufmann, H.R., 78, 88 Kaur, M., 4, 19 Kaye, K., 82, 89 Kedzier, S., 206, 216 Keene, S.D., 486, 521 Kehler, T., 100, 109 Keller, K., 484, Kellogg, R.S., 250, 251, 261, 274 Kelly, M., 78, 88
Kennedy, R.S., 4, 18, 53, 55, 65, 66, 131, 133, 134, 141, 142, 146, 150, 151, 154, 155, 157, 160, 162, 163, 164, 166, 248, 249, 250, 251, 252, 253, 254, 256, 258, 259, 261, 273, 274, 275, 276, 277, 278, 319, 320, 322, 414, 427, 430 Keselberner, L., 533, 540 Khalafalla, A.S., 500, 523 Kikinis, R., 331, 342 Kilpatrick, F.P., 23, 43 Kim, H.C., 410 Kim, W.S., 375, 388, 393, 395, 410 King, D., 40, 44 King, R.A., 36, 45 Kingdon, K.S., 258, 276 Kirch, M., 22, 42 Kirton, T., 118, 123, 124, 127 Kistler, D.J., 188, 197 Klatzky, R.L., 28, 30, 44 Klein, G.A., 451, 459, 471, 481, 484, 486, 487, 521 Klein, R.H., 375, 386, 388 Kline, P.B., 36, 39, 40, 43, 45, 90 Knapp, D., 73, 74, 90 Knapp, J.M., 30, 31, 36, 37, 38, 40, 43 Knepton, J., 248, 274 Knerr, B.W., 72, 73, 74, 77, 90, 276 Knight, J.R., Jr., 124, 127 Knight, R.T., 536, 540 Knott, B.A., 74, 77, 89, 90 Knotts, L.H., 317, 322 Koch, K.L., 248, 277 Kocian, D.F., 306, 322 Koehl, C., 342 Koenderink, J.J., 229, 235, 244, 245 Koenig, E., 52, 64, 116, 126 Koffka, K., 22, 43 Koh, G., 74, 89 Kohler, I., 143, 144, 164, 249, 276 Kohn, S., 297, 300 Kolasinski, E.M., 259, 261, 263, 276 Koonce, J., 202, 216 Koriath, J., 490, 521 Kornheiser, A.S., 144, 164 Kornilova, L.N., 249, 277 Kornmuller, A., 52, 64 Koss, F., 217 Kothari, M., 174, 186 Kozak, J.J., 123, 127
Kramer, A., 490, 499, 521, 522 Krantz, J.H., 147, 167 Krol, J.P., 490, 503, 522 Krueger, M.W., 93, 109 Kruk, R., 141, 148, 164 Kuchar, J., 489, 520 Kumar, T., 237, 238, 244 Kuntz, L.A., 147, 167 Kuraishi, S., 422, 430 Kurihara, T., 349, 367
L Lackner, J.R., 118, 126, 133, 137, 144, 148, 150, 151, 155, 161, 164, 167, 252, 276 Laird, J.E., 214, 216, 217 Lambert, E.Y., 142, 146, 166 Lampton, D.R., 260, 276 Lamson, R., 336, 342 Lancraft, R., 488, 518 Landolt, J.P., 52, 65, 250, Lane, N.E., 55, 65, 133, 151, 164, 253, 256, 276, 320, 322, 430 Lanham, D.S., 134, 164, 275 Lansdown, J., 100, 109 Lansky, A.L., 200, 216 Larish, J.F., 283, 299, 300, 453, 454, 459 Laser, S., 100, 109 Laszlo, C.A., 535, 539 Lathan, C.E., 77, 78, 89, 90 Laughery, K.R., 5, 18 Laurel, B., 99, 109 Lauritzen, S., 503, 521, 522 Lavagetto, F., 366 Lawergren, B., 240, 244 Layne, C.S., 155, 165 LeBlanc, A., 348, 367 Lee, D.N., 22, 44, 95, 109, 116, 127, 292, 297, 299, 300, 383, 388 Lee, J.H., 84, 90 Lee, W.S., 354, 367 Lee, Y., 348, 367 Lefkowicz, A.T., 308, 323 Lehner, P.E., 486, 523 Lehner, R., 411 Lester, P., 151, 161, 393, 410 Levelt, W., 51, 64 Levi, L., 426, 430
Levin, C.A., 23, 44 Levine, M., 52, 65 Levy, L., 421, 426, 430 Lewis, S.A., 40, 44 Lichtenberg, B.K., 151, 165, 251, 277 Likert, R., 252, 276 Lilienthal, M.G., 55, 65, 131, 133, 134, 151, 160, 162, 163, 164, 250, 251, 253, 254, 261, 273, 274, 275, 319, 320, 322, 430 Lindholm, E., 490, 503, 521 Lindsay, T.S., 141, 162 Lintern, G., 77, 90, 202, 216, 453, 454, 455, 459, 460 Lindblad, I.M., 248, 277 Lishman, J.R., 116, 127, 292, 297, 299, 300 List, U., 318, 323 Liu, A., 393, 410 Lloyd, I., 217 Lloyd, L., 216 Lobovits, D., 150, 164 Loftin, R.B., 260, 273, 277 Lomax, A., 393, 410 Lombard, M., 27, 44 Longridge, T., 490, 521 Longuet-Higgins, H.C., 230, 244, 291, 300 Loomis, J.M., 22, 23, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 40, 42, 43, 44, 45, 295, 298, 299 Lopez, A., 536, 540 Lornesen, W.E., 331, 342 Lotens, W.A., 107, 108, 130, 131, 160 Lowry, L.D., 136, 167 Lu, D., 415, 430 Luchins, A.S., 69, 89 Luchins, E.H., 69, 89 Lundberg, U., 415, 430 Lusk, S.L., 315, 323 Lussier, J.W., 486, 521 Lyde, C.L., 273
M Maalej, N., 174, 186 Maase, S., 410 Mach, E., 52, 65 Machover, C., 261, 276 MacMillan, J., 487, 519 Madey, J.M., 172, 174, 185
Magnenat-Thalmann, N. 345, 348, 349, 350, 354, 355, 356, 361, 364, 366, 367 Makeig, S., 532, 540 Malloy, R., 463, 481 Mangili, A., 355, 366 Mann, C.A., 490, 503, 523 Manoharan, R., 343 Manshadi, F., 82, 88 Marescaux, J., 329, 342 Marino, J., 170, 186 Mark, L.S., 102, 109, 119, 127 Marmie, W.R., 77, 89 Martens, T.G., 243, 244 Martin, E., 50, 65 Martin, R.L., 187, 196 Martinez, L., 170, 186 Martins, A.J., 140, 161 Marvin, F.F., 486, 523 Massey, C.J., 275 Matin, L., 137, 164 Matsuoka, T., 532, 540 Mavor, A.S., 1, 3, 18, 131, 161, 258, 261, 274, 305, 306, 307, 313, 315, 322, 326, 342 Mayer, D., 150, 162 Mayhew, J.E.W., 230, 235, 244, 245 McAfee, P.C., 410 McAnally, K.I., 187, 196 McCallum, B.C., 306, 323 McCandless, J.W., 141, 164 McCarthy, J., 492, 522 McCauley, M.E., 131, 133, 136, 142, 164, 248, 252, 261, 274, 275, 276, 278 McCleland, G.H., 464, 481 McCloskey, M., 104, 109 McCloud, S., 99, 109 McCready, D., 23, 44 McCruer, D.T., 375, 386, 388 McDonald, D.G., 112, 123, 127 McDonald, P.V., 155, 165 McDonough, R.C., 250, 275, 320, McFarland, D.J., 308, 323, 531, 540 McGonigle, B.O., 155, 162, 164 McGovern, D.E., 133, 164 McGreevy, M.W., 369, 371, 385, 387, 388 McGuinness, J., 251, 276 McKee, S.P., 238, 243, 244 McKinley, R.L., 188, 190, 192, 193, 195, 196, 197, 491, 522
McKinnon, G.M., 141, 148, 164 McLane, R.C., 500, 523 McMillan, G.R., 4, 18, 50, 65, 123, 124, 126, 305, 307, 308, 309, 313, 315, 316, 317, 320, 322, 323 McNeil, R.J., 490, 522 Mecklinger, A., 490, 522 Meehan, J., 212, 216 Meglan, D.A., 336, 342 Meier, B.J., 100, 109 Meier, K., 170, 172, 178, 185 Meijer, D.W., 394, 409 Meirs, S.L., 192, 197 Meisner, M., 336, 342 Melin, B., 415, 430 Melvill Jones, G., 140, 162 Menges, B.M., 36, 42 Menke, L., 536, 540 Mergner, T., 52, 65 Merhav, S.J., 385, 388 Merrill, G.L., 342 Merrill, J.A., 342 Merrit, J.O., 151, 161, 393, 410 Merzenich, M.M., 171, 185 Metaxas, D., 351, 366 Methilin, G., 342 Metin, C., 171, 185, 186 Meyer, K., 142, 164, 306, 323 Miao, A.X., 484, 487, 489, 523, 524 Michaels, C., 115, 127 Michel, R.R., 486, 520 Michie, D., 206, 216 Middendorf, M.S., 315, 316, 317, 323 Midgett, J., 22, 44 Mignot, C., 481 Mijer, D.W., 400, 411 Mikulka, P.J., 3, 19 Miletic, G., 181, 184, 186 Milgram, P., 488, 522 Millard, R.T., 283, 300 Miller, C., 200, 216 Miller, D., 151, 166 Miller, E.F., 252, 274 Miller, G.A., 179, 186 Miller, J.W., 132, 165 Miller, M.S., 73, 74, 76, 77, 88, 89 Milner, A.D., 22, 44 Mitchell, T.N., 393, 410 Mitchison, G.J., 238, 243, 244
Mitsutake, N., 429, 431 Mitushina, M., 532, 540 Moar, I., 74, 89 Moccozet, L., 354, 367 Moelker, D.J., 370, 387 Mon-Williams, M., 115, 128, 133, 139, 165, 166, 167, 271, 277, 414, 431 Money, K.E., 247, 250, 252, 273, 276 Moore, J.P., 251, 277 Moore, M.E., 144, 166 Moore, T.J., 193, 197 Moran, T.P., 70, 88 Moray, N., 371, 388, 490, 523 Morgan, M.J., 179, 182, 186 Morley, R.M., 194, 195, 196, 197 Moroney, B.W., 123, 127, 195, 197 Moroney, W.F., 118, 123, 124, 127, 253, 275 Morris, M.A., 345, 366 Morris, M.W., 296, 301 Morrison, J., 462, 481 Mourant, R.R., 258, 277 Mouret, P., 392, 410 Mowafy, L., 383, 388 Moyses, B., 342 Muchisky, M., 155, 160 Mulder, A., 489, 518 Mulder, G., 489, 518 Mulder, M., 281, 282, 283, 284, 287, 292, 296, 299, 300, 386, 388 Mulgund, S., 489, 520 Mulligan, B.E., 275 Mumpower, J., 464, 481 Munson, K., 304, 305, 323 Munson, R., 489, 503, 521 Muralidharan, R., 488, 518 Murase, T., 420, 431 Murray, G., 200, 216, 217 Mutter, D., 342 Myers, A.A., 318, 322
N Nagel, J.H., 534, 540 Nagy, A.G., 393, 410 Nahas, M., 347, 367 Nahon, M.A., 113, 117, 127 Nakanishi, Y., 416, 430 Napel, S., 342
Nasman, V.T., 308, 313, 323 Natani, K., 490, 503, 522 Naylor, J.C., 453, 459 Nduka, C.C., 392, 410 Nedzelski, J.M., 250, 273 Negroponte, N., 5, 18, 326, 342 Nelson, W.T., 4, 18, 188, 192, 193, 194, 195, 196, 197, 309, 312, 316, 318, 323, 507, 520, 522, 529, 538, 539, 540 Nemire, K., 137, 165 Neuper, C., 308, 323 Newell, A., 70, 88 Newman, N.J., 33, 43 Newton, R.E., 33, 43 Nichols, S.G., 133, 161, 167 Nickerson, R.S., 120, 127 Nigay, L., 478, 481 Nixon, M.A., 306, 323 Nocera, F., 3, 19 Nolan, M.D., 250, 274 Nold, D.E., 280, 301 Norbash, A.M., 342 Noritake, J., 430, 431 Norman, D.A., 7, 18, 457, 459, 471, 481 Norman, M., 96, 110 Norvig, P., 492, 493, 522 Noser, H., 361, 366 Noy, N.F., 201, 216 Nygren, T.E., 498, 522
O Ochs, M.T., 171, 185 Ogawa, K., 420, 431 Ogle, K.N., 225, 243, 244 Ohmi, M., 52, 65 Ojogho, O., 459, Okamoto, K., 430 Oldfield, J., 216 Oldfield, S., 212, 216 Oliver, J.G., 280, 300 Oman, C.M., 137, 165 Ono, H., 147, 162 Ooi, T.L., 26, 28, 44, 45 Opmeer, C., 490, 503, 522 Ordy, J.M., 133, 164, 250, 275 Ornstein, M.H., 404, 410 Ostermann, J., 366
Ouyang, L., 151, 165 Overbeeke, C.J., 93, 100, 102, 106, 109, 110, 114, 116, 124, 127, 395, 398, 400, 408, 410, 411 Owen, A.M., 82, 89 Owen, D.H., 282, 283, 295, 296, 299, 300, 301, 375, 379, 380, 381, 382, 383, 388, 389 Oyama, T., 23, 44
P Pagulayan, R.J., 119, 128, 137, Paillard, J., 51, 65, 147, 165 Paloski, W.H., 155, 165, 249, 261, 277 Pandzic, I.S., 350, 356, 361, 366, 367 Pang, X.D., 152, 161 Paquin, M.J., 82, 89 Paraskeva, P.A., 392, 410 Parasuraman, R., 3, 19, 462, 463, 481, 490, 519 Parente, R., 85, 89 Parke, F.I., 345, 349, 367 Parker, A., 94, 95, 108, 121, 126 Parker, A.J., 230, 235, 244 Parker, D.E., 50, 51, 53, 55, 64, 65, 119, 120, 121, 122, 125, 127, 151, 162, 163, 165, 251, 272, 274, 277 Parker, K., 53, 65 Parrish, R.V., 280, 301 Parry, S.R., 354, 367 Parsons, H.M., 462, 481 Parsons, K.C., 72, 73, 74, 90 Pascual-Leone, A., 171, 186 Pateisky, N, 411 Paul, R.P., 141, 162 Pausch, R., 133, 150, 165, 414, 427, 431 Pearce, A., 208, 215, 216 Pearce, C.H., 144, 167 Pearl, J., 484, 489, 503, 504, 522 Pearson, J., 214, 216 Pearson, T., 486, 520 Pejtersen, A.M., 434, 440, 447, 459, 460 Pelachaud, C., 366 Pentland, A., 350, 366 Pepper, R.L., 252, 278 Perrott, D.R., 188, 189, 190, 191, 197, 522 Perrow, C., 7, 18, 434, 459 Perry, D.C., 487,
Perry, G.P., 82, 89 Peters, B.T., 155, 160 Peterson, B., 71, 74, 88 Petheran, B., 83, 89 Pettyjohn, F.S., 490, 503, 522 Pew, R.W., 486, 518 Pfaff, D.W., 146, 162 Pfurtscheller, G., 308, 323 Phatak, A.V., 283, 300 Philbeck, J.W., 22, 23, 25, 26, 28, 30, 31, 32, 34, 36, 42, 44 Picard, R.W., 1, 3, 18 Pichler, C., 393, 410 Pick, H.L., Jr., 140, 144, 145, 163, 323 Pierce, B.J., 238, 239, 243, 244, 245 Piller, M., 74, 77, 89 Pimentel, K., 306, 323 Pirrene, M.H., 26, 44 Pittenger, J.B., 101, 109 Pittman, M.T., 188, 196 Platt, S.M., 349, 367 Playter, R., 338, 342 Polanyi, M., 26, 44 Polman, C.H., 85, 90 Pool, S.L., 251, 277 Pope, A.T., 485, 522 Popescu, V., 86, 80 Porrill, J., 243 Post, R.B., 137, 167 Poston, T., 104, 109 Potter, A., 536, 540 Poulton, E.C., 62, 65, 315, 323 Poupyrev, I., 96, 109 Powell, W.S., 487, 522 Prazdny, K., 291, 300 Precht, W., 50, 65 Prevett, T.T., 375, 378, 388 Prevost, S., 366 Price, K.R., 259, 277 Price, L., 84, 90 Price, N.B., 306, 323 Primeau, N.E., 349, 367 Prinzel, L.J., 3, 19 Proffitt, D.R., 22, 26, 27, 42, 44, 46, 104, 109 Pronko, N.H., 143, 144, 156, 166 Prothero, J., 49, 55, 56, 60, 65, 119, 120, 121, 122, 125, 127 Prutchi, D., 536, 540 Psotka, J., 40, 44
Puig, J.A., 314, 324 Purvis, B., 490, 523 Pushkar, D., 23, 46
R Raab, A., 100, 109 Rabinowitz, W.M., 191, 195, 196 Radeau, M., 145, 165 Rader, N., 118, 126 Radermacher, K., 410 Radwin, R.G., 176, 177, 185 Raibert, M., 338, 342 Raju, R., 342 Ramsey, A.D., 142, 165 Ramsey, A.R., 133, 161 Rao, A.S., 200, 216 Rasmussen, J., 434, 444, 445, 448, 449, 450, 453, 458, 459, 460, 467, 481 Rau, G., 410 Reason, J.T., 7, 18, 53, 65, 137, 150, 157, 165, 248, 250, 252, 255, 277, 439, 444, 460, 468, 477, Redding, G.M., 144, 165 Ree, M.J., 487, Reid, G.B., 498, 522 Reid, L.D., 113, 117, 127 Regal, D., 280, 301 Regan, D., 237, 245 Regan, E.C., 142, 165, 259, 277 Regenbrecht, H., 54, 66 Remez, R.E., 22, 45 Repperger, D., 194, 196 Reschke, M.F., 51, 64, 151, 155, 160, 165, 249, 251, 261, 272, 277 Reznick, J.S., 82, 90 Ribeiro-Filho, J.P., 23, 28, 42 Ribin, G.D., 331, 342 Ricard, G.L., 192, 197, 314, 315, 317, 318, 323, 324 Riccio, D.C., 248, 274 Riccio, G.E., 10, 18, 50, 54, 65, 114, 115, 116, 117, 119, 123, 124, 126, 127, 128, 250, 272, 274, 277, 319, 320, 322, 324 Rich, C.J., 260, 277 Richardson, P.C., 536, 540 Riemersma, J.B.J., 295, 298, 301 Rieser, J.J., 28, 44
Riley, E.W., 484, 489, 523, 524 Ringl, H., 342 Ritter, A.D., 142, 164, 259, 275 Rizzo, A.A., 71, 90 Robb, R.A., 332, 342 Robbins, L.J., 82, 88 Roberts, M.A., 82, 88 Robertson, J., 393, 410 Robinett, W., 138, 166 Robinson, D.A., 155, 166 Rochlin, G., 433, 434, 439, 460 Rock, I., 140, 144, 166 Roe, M.M., 4, 9, 18, 19, 323, 540 Rogers, B.J., 25, 43, 223, 224, 226, 229, 230, 231, 234, 235, 238, 240, 243, 243, 244, 245 Rohmert, W., 415, 431 Rolfe, J., 50, 65 Rolland, J.P., 36, 44, 133, 143, 153, 160, 166 Romack, J.L., 155, 160 Roscoe, S.N., 130, 139, 140, 156, 166, 455, 459 Rosen, J., 333, 343 Rosenberg, C., 36, 41, 130, 160 Rosenbloom, P.S., 217 Ross, H.E., 139, 144, 156, 166 Rothbaum, B.O., 84, 88, 90 Rouse, S.H., 481 Rouse, W.B., 461, 462, 464, 465, 466, 481 Rubin, E., 49, 65 Ruchkin, D., 489, 503, 521 Rudmann, D.S., 190, 197 Runeson, S., 114, 127, 449, 460 Rushton, S.K., 115, 128, 133, 139, 165, 166, 271, 277, 414, 431 Russell, B., 22, 44 Russell, P., 492, 493, 522 Russier, Y., 342 Russo, D., 240, 244 Ryan, A.M., 490, 518
S Saberi, K., 188, 197 Sadowski, W.J., 10, 18, 36, 39, 45 Sadralodabai, T., 188, 197 Sagi-Dolev, A., 536, 540 Salvendy, G., 54, 61, 66 Salzman, M.C., 260, 277
Sammut, C.H., 204, 206, 207, 216 Sampaio, E., 184, 185 Sanderson, P.M., 375, 379, 387 Sandoz, G.R., 277 Sandry, D.L., 499, 523 Sanintourens, M., 347, 367 Sansom, W., 23, 46 Sarrafian, S.K., 317, 324 Satava, R.M., 71, 82, 90, 338, 343, 412, 431 Saunders, F.A., 172, 185, 186 Scadden, L., 172, 185, 186 Scallen, S.F., 3, 19 Scerbo, M.W., 3, 19, 462, 482 Schaffer, R., 490, 503, 520, 532, 539 Schiffman, H., 50, 65 Schlank, M., 144, 163 Schmid, W., 535, 540 Schmits, D., 82, 88 Schneider, P., 308, 322 Schneider, W., 450, 460 Schneiderman, B., 447, 460, 476, 477, 482 Schnitzius, K.P., 274 Schnurer, J.H., 323 Schofield, S., 100, 109 Schor, C.M., 235, 245 Schubert, T., 54, 66 Schumann, J., 100, 109 Schumpelick, V., 410 Schvaneveldt, R.W., 201, 216 Schwamb, K., 217 Sebrechts, M.M., 73, 74, 77, 88, 89, 90 Secrist, G., 487, 488, 520 Sederberg, T.W., 354, 367 Sedgwick, H.A., 23, 25, 26, 34, 44, 45, 102, 109 Sejnowski, T.J., 532, 540 Selcon, S.J., 508, 522 Semmler, R., 155, 167 Senden, M von, 49, 66 Senova, M.A., 187, 196 Serfaty, D., 487, 519 Sevelda, P., 411 Shadbolt, N., 214, 216 Shadrake, R.A., 508, 522 Shankar, N.S., 342 Shapiro, M.A., 112, 123, 127 Sharkey, T.J., 33, 43, 133, 136, 142, 164, 276 Shaviv, B.D., 529, 540 Shaw, J.J., 487, 522
Shaw, R.L., 507, 520 She, Q., 415, 430 Shefner, J., 52, 65 Shelhamer, M., 155, 166 Sheng, Y-Y., 123, 128 Shepherd, A., 78, 88, 90 Sheridan, T. B., 27, 41, 94, 109, 130, 140, 162, 166, 428, 431, 505, 522 Shiffrin, R.M., 450, 460 Shimi, S.M., 393, 410 Shimojo, S., 153, 163 Shindo, K., 430, 431 Shinn-Cunningham, B., 74, 89, 144, 152, 161, 166 Shiveley, R.J., 499, 521 Shuman, D., 147, 167 Shupert, C., 261, 277 Siegel, A.W., 72, 90 Simpson, B.D., 191, 193, 194, 196 Sinai, M.J., 28, 45 Singh, I.L., 463, 481 Singly, M.K., 69, 90 Sirevaag, E., 490, 503, 522 Sjourdsma, W., 392, 410 Skelly, J., 490, 523 Skinner, N.C., 51, 64, 151, 163 Slater, M., 10, 19, 27, 41, 45, 121, 122, 127 Slenker, K.A., 309, 324 Smart, L.J., 9, 19, 119, 128, 137, Smets, G.J.F., 93, 95, 100, 102, 106, 109, 110, 116, 120, 121, 124, 127, 395, 398, 400, 408, 410, 411 Smith, D.G., 253, 275 Smith, K.U., 141, 144, 157, 166 Smith, M.J., 415, 426, 431 Smith, P.C., 27, 45 Smith, P.K., 248, 273 Smith, O.W., 27, 45 Smith, R.E., 317, 324 Smith, R.L., 536, 540 Smith, S., 84, 90 Smith, S.L., 155, 160 Smith, W.K., 141, 144, 157, 166 Smyth, C.C., 529, 533, 540 Smythe, G., 426, 430 Smythies, J., 22, 45 Snodgrass, A., 100, 108 Snyder, F.W., 143, 144, 156, 166 So, R.H.Y., 316, 318, 319, 324
Sokolov, Y., 188, 197 Solick, R.E., 486, 521 Somani, R.A.B., 234, 245 Song, D., 96, 110 Sorkin, R.D., 188, 197 Sotin, S., 342 Spain, E.H., 393, 411 Spearman, C., 253, 277 Sperling, A., 22, 42 Speyer, J., 489, 522 Spiegelhalter, D., 503, 504, 521, 522 Spyker, D.A., 500, 503, 523 Stackhouse, S.P., 500, 523 Staib, L.H., 329, 342 Stamm, J., 532, 540 Stanney, K.M., 1, 3, 10, 18, 19, 53, 54, 61, 66, 70, 90, 133, 142, 155, 164, 248, 250, 253, 258, 259, 261, 275, 276, 277, 305, 324 Staples, K., 50, 65 Stapleton, M., 85, 89 Stappers, P.J., 93, 100, 102, 103, 104, 105, 109, 110, 114, 116, 121, 124, 127 Stark, L., 375, 388, 393, 395, 410, 411 Stassen, H.G., 371, 388, 410 Stautberg, D., 3, 18,194, 196 Staveland, L.E., 498, 520 Steed, A., 10, 19, 27, 45, 121, 122, 127 Steedman, M., 366 Steenhuis, R.E., 28, 45 Steinman, R.M., 140, 161, 532, 540 Stensomo, M., 532, 540 Stenton, S.P., 235, 245 Sterman, M.B., 490, 503, 523 Stern, J.A., 490, 523, 532, 540 Stern, R.M., 248, 277 Steuer, J., 94, 110, 121, 128 Stevens, K.A., 237, 243 Stevens, S., 62, 66 Stevenson, S.B., 235, 245 Stewart, W.R., 248, 277 Stiffler, D.R., 487, 488, 523 Stoffregen, T.A., 9, 19, 54, 65, 114, 115, 116, 117, 118, 119, 123, 127, 128, 137, 197, 277, 319, 320, 323, 324 Stone, M., 366 Stone, V.E., 428, 431 Stoper, A.E., 137, 166 Stratmann, M.H., 395, 410 Stratton, G., 140, 143, 144, 166, 248, 277
Strothotte, T., 100, 109 Strybel, T.Z., 188, 190, 197 Stuart, R., 70, 90 Subroto, T.H., 397, 411 Suenaga, Y., 349, 366 Sugioka, Y., 431, Sunday, P., 366 Surdick, R.T., 36, 45 Sutherland, I.E., 94, 110 Swain, R.A., 490, 518 Szekely, P., 475, 482
T Taheri, B.A., 536, 540 Talor, C.R., 28, 44 Tambe, M., 200, 214, 216, 217 Tan, H.S., 155, 166 Tan, S.L., 133, 160 Tannen, R.S., 192, 195, 197 Task, H.L., 306, 322 Tassetti, V., 342 Taub, E., 151, 166 Taylor, J.G., 147, 166 Taylor, M., 60, 66 Taylor, R.M., 508, 522 Teghtsoonian, R., 62, 66 Templeton, W., 51, 61, 65 Tendick, F., 393, 395, 410, 411 Tenney, Y.J., 486, 518 Terzopoulos, D., 348, 349, 350, 366, 367 Texeira, K., 306, 323 Thalmann, D., 345, 348, 349, 350, 355, 361, 364, 366, 367 Tharp, G., 393, 410, 411 Theunissen, E., 280, 301, 375, 380, 386, 387, 388 Thomson, J.A., 28, 29, 45 Thompson, W.B., 40, 45 Thorisson, K.R., 351, 367 Thorndike, E.L., 68, 90 Thornton, D., 490, 523 Thornton, W.E., 251, 277 Thronton, C.L.O., 157, 160 Tidhar, G., 200, 215, 217 Tidwell, P.D., 273 Tietz, J.D., 24, 26, 31, 43 Todorov, E., 71, 89
Tolcott, M.A., 486, 487, 523 Tolhurst, G.C., 249, 252, 276 Tompkins, W.J., 174, 185, 186 Torres, F., 171, 186 Tovey, M., 100, 110 Tracey, M.R., 78, 90 Traynor, L., 77, 78, 89 Treisman, A., 189, 190, 191, 197 Tremaine, M.M., 4, 19 Truyen, L., 85, 90 Tsang, P.S., 283, 300 Tsujioka, B., 422, 430 Tufte, E.R., 99, 110 Tulving, E., 69, 90 Turk, M., 4, 19 Turner, J., 23, 45 Turvey, M.T., 22, 45 Tversky, A., 62, 66 Tweed, D., 234, 245 Tyler, C.W., 235, 245 Tyler, D.B., 249, 252, 277 Tyler, M., 375, 388 Tyler, M., 170, 172, 185 Tyrell, R.A., 197
U Ujihara, H., 420, 422, 431 Uliano, K.C., 142, 146, 166 Ungs, T.J., 251, 277 Usoh, M., 27, 45, 121, 122, 127
V Valot, C., 481 Van Breda, L., 188, 196 Van Dam, J., 343 Van der Mast, C., 93, 100, 102, 110, 116 Vanderploeg, J., 251, 277 Van der Wijngaart, R., 488, 522 Van der Zaag, C., 71, 90 Van Doorn, A.J., 229, 244 Van Ee, R., 237, 238, 245 Van Goor, S.P., 370, 387 Van Oosten, B.W., 85, 90 Van Willigen, D., 370, 387 Vedeler, D., 114, 127
Veltman, J.A.H., 188, 196 Veerbeek, H., 488, 522 Verdeja, J.C., 459 Verduyn, W., 82, 88 Vertut, J., 134, 166 Vicente, K.J., 434, 445, 460, 490, 523 Vidulich, M., 499, 523 Vilis, T., 234, 245 Virre, E.S., 151, 166 Vishton, P.M., 22, 25, 42 Vix, M., 342 Von Wiegand, T., 74, 89 Voorhorst, F.A., 398, 400, 404, 410, 411 Vry, U., 411
W Wackers, G.J., 462, 481 Wade, N.J., 395, 411 Waespe, W., 50, 66 Waite, C.T., 347, 368 Walk, L., 392, 411 Walker, M., 202, 216 Wallace, J.G., 183, 185 Wallach, H., 52, 66, 139, 144, 166 Waller, D., 73, 74, 77, 90 Waller, P.E., 105, 110 Walrath, L.C., 532, 540 Wang, C.L., 347, 368 Wann, J.P., 115, 128, 133, 139, 158, 165, 166, 167, 271, 277, 414, 431 Warm, J.S., 197, 323 Warren, D.H., 145, 167 Warren, R., 22, 42, 45, 107, 108, 123, 124, 126, 128, 281, 282, 296, 297, 298, 299, 301, 381, 388, 389 Warren, W.H., 22, 37, 45, 102, 110, 296, 297, 301 Watabe, K., 532, 540 Waters, K., 348, 349, 350, 367, 368 Waugh, S., 216 Weathington, B., 273 Webster, J.G., 174, 175, 176, 177, 185, 186 Weghorst, S., 96, 109, 131, 160 Weinberg, D., 438, 440, 460 Weinberg, G.M., 438, 440, 460 Weir, D.H., 375, 386, 388
Weiskrantz, L., 22, 45 Welch, R.B., 129, 131, 135, 136, 137, 139, 140, 142, 144, 145, 147, 148, 153, 154, 155, 156, 157, 161, 167 Wells, M., 55, 65, 119, 120, 121, 122, 125, 127 Wells, W., 342 Wendt, G.R., 252, 277 Wenzel, E.M., 188, 192, 196, 197 Wenzl, R., 393, 411 Werkhoven, P.J., 143, 144, 162 Werner, H., 237, 238, 245 Wertheim, A.H., 22, 45 Wertheimer, M., 50, 66, 443, 460 Wertsch, J.J., 174, 186 Whang, S., 37, 45, 102, 110 White, B.W., 172, 180, 185, 186 White, B.Y., 77, 88 White, K.D., 147, 167 White, S.H., 72, 90 Whitely, J.D., 315, 323 Whittington, D.A., 188, 197, 280, 301 Wickens, C.D., 134, 140, 167, 280, 300, 315, 317, 324, 369, 375, 379, 386, 388, 389, 463, 482, 487, 499, 504, 520, 523 Wiederhold, B.K., 84, 90 Wiederhold, M.D., 84, 90 Wiedermann, J., 370, 388 Wierwille, W.W., 315, 322, 485, 498, 500, 519, 523 Wightman, D.C., 77, 90, 453, 460 Wightman, F.L., 188, 197 Wiker, S.F., 252, 253, 256, 277 Wilckens, V., 280, 301, 384, 389 Wilder, J., 4, 19 Willemsen, P., 40, 45 Willey, C.F., 144, 167 Williams, J.A., 155, 167 Williams, L., 348, 368 Williams, S.P., 280, 301 Willis, M.B., 214, 216 Wilpizeski, C.R., 136, 167 Wilson, G.F., 490, 498, 499, 500, 503, 518, 519, 523 Wilson, J.R., 133, 161, 167 Wist, E., 52, 64 Witkin, A., 350, 366 Witkin, H.A., 51, 64, 157, 167, 249, 251, 278
Witmer, B.G., 36, 39, 40, 43, 45, 72, 73, 74, 90, 276 Wloka, M.M., 314, 318, 324 Wolfe, J.M., 189, 197 Wolpaw, J.R., 308, 323, 531, 540 Wolpert, L., 283, 296, 301, 381, 389 Wong, S., 59, 66 Wood, C.D., 150, 162, 252, 274 Wood, R.W., 249, 278 Woodrow, H., 68, 90 Woods, C.B., 147, 167 Woods, D.D., 465, 467, 481 Woods, D.L., 172, 186, Woodworth, R.S., 68, 90 Worch, P.R., 304, 305, 312, 313, 315, 320, 324 Wraga, M.A., 26, 46 Wu, B., 26, 44
X Xiao, Y., 451, 460
Y Yachzel, B., 151, 167 Yamada, K., 420, 422, 430 Yamaguchi, T., 414, 416, 427, 430, 431 Yamamoto, Y., 431 Yamanaka, Y., 420, 431 Yamazaki, K., 427, 430, 431 Yang, L., 415, 430
Yang, T.L., 26, 27, 46 Yang, Y., 235, 244 Yara, J.M., 487, 523 Yasui, K., 532, 540 Yoo, Y.H., 253, 278 Yoshida, A., 427, 429, 430, 431 Yost, W.A., 192, 197 Young, F.A., 250, 273 Young, P.T., 144, 145, 167 Youngquist, G.A., 28, 44 Yuile, A.L., 350, 368
Z Zaaijer, M.B., 370, 387 Zacharias, G., 194, 196, 484, 487, 488, 489, 518, 520, 523, 524 Zacher, J.E., 229, 244 Zacks, J.L., 146, 162 Zadeh, L.A., 484, 497, 524 Zahorik, P., 27, 46 Zajac, F.R., 333, 342 Zarriello, J.J., 252, 274 Zelazo, P.D., 82, 90 Zelenik, R.C., 100, 110 Zeltzer, D., 10, 19, 27, 41, 94, 110 Zern, J.T., 410 Zhang, G., 415, 430 Zhu, H., 174, 186 Zobel, J., 393, 411 Zografos, L.M., 151, 163 Zonios, G., 332, 343 Zubek, J.P., 23, 46
Subject Index
A Abstraction/decomposition space, 444, 445–446, 448 Acceleration, vehicle, 248, see also Motion sickness Accidents, 133 Accuracy, virtual endoscopy, 331 Acoustic arrays, 115 Acoustical rearrangements, 144 Acquisition tasks, 309 Action fidelity, 123–125, see also Fidelity Action plans, virtual environments architecture, 204–212 issues, 212–214 ontology acquisition, 201–204 revisited, 214–215 task validity and fidelity, 212 Active contours, see Snakes Active interaction, 147–148 Active psychophysics, 299 Adaptation, 148, 149, 150, 154–155 Adaptive aids, human factors approach evolution of automation and assistance, 462–468 flight assistance for military aircraft, 468–474 interfaces, 474–480 Adaptive algorithms, 3 Adaptive control systems, 343–439 Adaptive environments (AEs), 3–11 Adaptive Pilot–vehicle interface, see Pilot–vehicle interface Adaptive responses, 212
Adaptivity, assistance, 464–466 AE, see Adaptive environments AEPs, see Auditory evoked potentials Aerial perspective, 25 Aerodynamic angles, 296 Aftereffects, telesystems, 132–133, 135 Agents, intentionality, 199–200 AI, see Artificial intelligence Aiding, meaning-processing systems, 454 Air-to-air radar symbology, 507, 508 Aid-to-assumption management tool, 471 Air combat, 312, 487 Air traffic management (ATM), 279 Air vehicles, 309 Aircraft self-tuning regulator, 438 target detection/identification and auditory-guided visual search, 191–192 tunnel-in-the-sky display, 281, 282, 294–296 uninhabited aerial vehicles, 313, 317–318 Algorithm prediction, 318 Aliasing, 213 Alignment, vehicle-referenced, 378 Allocation, 466 Allocentric frames, 51 Alpha waves, 530, 531 Alternative control technology eye-position tracking and speech recognition, 307 gesture recognition, 308–309 human factors, 314–320 position and orientation tracking, 305–306 uninhabited aerial vehicles
Alternative control technology (Cont.) combat vehicles, 303 operator immersion, 309–314 Altitude control, 283 Altitude poles, 284–285 Ambient energy, 113, 119, 120 Ambiguity, 231 Amplification, signal sensing, 534 Amputees, 178 Analysis of variance (ANOVA), 60 Analysis, model-based, 280–284 Anastomosis simulator, 338, 339 Angle of attack, 290 Angle of climb, 290 Angle of slip, 290 Angular position, cues, 191 Animation face-to-virtual face communication, 355–356, 358, 359, 361 virtual model of human face, 349–350 ANIP, see Army–Navy Instrumentation Program Aniseikonia, 225 ANOVA, see Analysis of variance Anticipation task, 282 Anticipative behavior, 470 Arcade games, 93 Architecture adaptive pilot–vehicle interface, 491 flight simulation, 204–214 issues, 202–204 Aristotelian models, 104, 105 Army–Navy Instrumentation Program (ANIP), 369 Artifacts, 537–539 Artificial intelligence (AI), 467 Ascent, 263 Aspect hypothesis, 495 Assistance, 463–468 Astronauts, 51, 152, 155, 178, 251 ATM, see Air traffic management Attributes, spatial navigation displays, 378 Audio alerts, 508 Audio processing, 357, 359 Audiovisual synchronization, 360–361 Auditory evoked potentials (AEPs), 172, 532 Auditory feedback, 141, see also Feedback Auditory Localization Cue Synthesizer, 193
Auditory localization, 187 Auditory objects, 152 Auditory reversals, 145 Augmentation, meaning-processing systems, 455 Augmented reality, 130 Authentic environment, 132 Automatic behaviors, 451 Automation adaptive control systems, 439 evolution and adaptive aids, 462–468 human attributions of adaptivity, 465 pilot-vehicle interface adaptation module, 505 structural truth, 444 Automatons, 467 Automobile, 85 Automotive diagnosis, 493–494 Autonomic nervous system, 526, 527 Autopilot system, 206, 207, 208 Aviation, 443, 444 Avoidance strategies, 157 Axis rotation, 377 Azimuth, 189, 190
B Babies, blind, see Blindness Backward chaining, 454 BAEPs, see Brainstem auditory evoked potentials Barriers, diagnosis, 332 Battlefield air interdiction mission, 494–495 Battlefield management, 486 Bayesian logic, 489 BDI, see Belief, desire, and intentionality agent formalism Behavioral approach, 346 Behavioral effects, 133 Behind-the-tail view, 98 Belief network (BN) evaluation of pilot state estimator, 509–510 implementation of pilot state estimation module, 500–504 pilot-vehicle interfaces, 484 situation assessment, 492–494 situation awareness modeling and air combat, 489
tactical situation assessment embedded in adaptive PVI, 495, 496 Belief, desire, and intentionality (BDI) agent formalism, 212 BESS questionnaire, 266, 270 Beta waves, 530 Bias, 486 Biliary tree, 334, 335 Bimodal alerting, 508 Binocular cues, 25, 26, 35, 37 Binocular disparity, 393 types, 220–228 Binocular stress, 414 Binocular vision, 158 –monocular vision, 393 Biochemical measurements, medical simulations, 421, 425–426 Bioelectric signals, 529–533 Biopotential acquisition, 533–535 Bite bars, 119 Bit-map textures, 102 Bivariate parametric functions, 347 Blind pointing response, 145, 146 Blindness babies and qualia, 184 mouth display systems, 177 sensory substitution systems and perception, 179–180 tactile-evoked potentials, 171–172 visually induced motion sickness, 249 Blood pressure, 420, 421, 424, 425 BN, see Belief network Bodycentric frames, 51 Body–motion detectors, 4 Bony deformity model, 333, 334 Bookhouse interface, 447 Booster therapy, 85 Boundary conditions, 439 Bowel preparation, 329, see also Virtual endoscopy Brain, 48, 170, 329, 330 Brain-actuated control technology, 308, 313 Brain–body-actuated control technology, 308 Brainstem auditory evoked potentials (BAEPs), 172 Breast tumors, 334 Breathing artifacts, 538 B-splines, 350 Buffers, 360, 361
C CAD packages, 100 Calibration hypothesis, 28–29 Camera movement Delft Virtual Window System, 395–396, 397–398, 399 perceptual studies of tactile vision substitution systems, 173 Cancer screening, 329, 331 Capacitive electrode, 536–537 Cardiovascular workload (CWL), 501, 503, see also Workload Cartooning, 99–101 Catecholamines, 421 425, 426 Cathode ray tube (CRT), 26, 27, 256–257 Causal reasoning, 494 CAVE technique, 27 CEM, see Composite eye movement Central nervous system (CNS), 62, 526–533 Central nervous system workload (CNSWL), 501, 502, see also Workload Central vection, 52, see also Vection Cerebral cortex, 526 Chaos, 438 Chin rests, 119 Chi-square analyses, 266 Chunking model, 179–180 Circuit flight, 200–202 Circular guide, 405, 406, 407, 408 Circular vection, 52, 55, 56, 119, see also Vection CLARET, see Consolidated learning algorithm based on relational evidence theory Client server networks, 361–362 Cloned face–cloned face communication, 352 Clones, 346, 355, 358 CNS, see Central nervous system CNSWL, see Central nervous system workload COA, see Course of action Cockpit, 205, see also Flight simulation Cocktail party effect, 187–188, 192 Cognitive assessment, 532 Cognitive control system, 441, 442 Cognitive effort, 295, 377 Cognitive operations, 468–469 Cognitive strategies, 156–157, 159
Cognitive systems engineering, 434 Cognitive task, 531 analysis model, 529, 530 College-age population, 262 Color coding, 507, 508 Color look-up table, 332 Color phototomography, 332 Color sample identification, 356 Communication, 351–354 Compensation algorithms, 318 Competence, 201, 202, 478 Competition, 438 Component technologies, 325 Composite eye movement (CEM), 532 Compression, 289 Compression gradient, 283 Computed tomography (CT), 78, 326, 332 Computers animation, 345 games, 97–99 graphics and visual space perception of egocentric distance, 37 high-speed and virtual reality technology, 414 power, 338 skills acquisition and transfer of training, 69–70 Concurrent exposure, 148 Conditional probability table (CPT), adaptive pilot–vehicle interface pilot state estimator, 502, 503, 504 situation assessment using belief networks, 493–494, 495, 496 Conductive gels, 536, 537 Confidence, stroke patients, 85 Configural graphic, 446 Configural scoring characterization, 255–258 comparison of flight simulators and virtual environment data, 270–272 cybersickness, 258, 261–263 genesis, 248–249 measuring sickness with questionnaires, 252–253 psychometric properties, 253–254 results of sickness profiles, 258, 259–260 simulator sickness, 250–251 space adaptation syndrome, 251–252 total (normative) scores, 254–255
total scores from virtual environment symptoms, 263–270 visually induced motion sickness, 249–250 Conflict, 132, 255 Conformality requirements, 385 Consistency, 472 Consistent mappings, 451 Consolidated learning algorithm based on relational evidence theory (CLARET), 208, 211 Constraints, meaning-processing systems, 448, 449 Control flight simulator and virtual environment architecture, 204–205 method and tactile display systems, 175–179 psycho-electrophysiological based interactive interfaces, 528–529, 531–533 rehabilitation and transfer of training in stroke patients, 83 Convolvotron, 192 Cooper–Harper Scale, 498, 317 Core temperature, 421, 423, 424, see also Hyper Hospital system Coriolis forces, 119, 249 Coriolis Sickness and Preflight Adaptation Trainer, 264 Cost, 70, 71, 131, 304 Cost benefits, 304, see also Military applications Course of action (COA), 486–487 CPT, see Conditional probability table Craik–O’Brien–Cornsweet illusion, 237 Crew, cockpit, 473 Crossover model of human tracking, 434 Cross-track error, 381, 382, see also Error CRT, see Cathode ray tube CT, see Computed tomography Cue conflicts, 272 Cue inventory, 285–287 Cues, 486, see also Decision making Cursors, computer, 308 Curved trajectory, 282 Curvilinear motion, 283 CWL, see Cardiovascular workload Cybernetic analysis, 280–284
Cybersickness, 258, 261–263 Cyclopean axis, 230, 231, 233 Cyclophoria, 229 Cyclovergence, 228, 229, 234
D DAG, see Directed acyclic graph DARPA, see Defense Advanced Research Projects Agency Data flow, virtual dialogue system, 362 Data presentation, 371–373 Data recording rate, 206 D-day invasion, 247–248 Deafness, 172 Decision making, 486–487 Declarative knowledge, 471 Deep pixels, 326, 340–341 Defense Advanced Research Projects Agency (DARPA), 178 Definition points, 364, 365 Deformable templates, 350 Deformation controller, 361 Deformation disparity, 226, 227 Deformation theory, 229 Degree of freedom, 394, see also Laparoscopy Delaunay–Dirichlet diagrams, 354 Delft Virtual Window System (DVWS) implementation of perception–action coupling in laparoscopy, 396–404 technical implementation, 395–396 Delta waves, 531 Depression angle, 289 Depth contrast, 220, 237–238, 243 Depth distortions, 138–139 Depth enhancement, 237 Depth perception, 228, 394 Depth reversal, 24–25 Descriptive aids, 464 Descriptive model development, 487, 488 Desensitization, 84, 85 Design adaptive pilot–vehicle interface, 490–492 complexity benefiting from previous research, 376 dynamic cues, 382–384 frame of reference, 377–378
identification of task-specific visual cues, 379–380 integrated approach, 375 parameter specification and coping with constraints, 384–385 representation, 378–379 specifying representation and transform rules, 376–377 static cues, 381–382 visual cues as properties of optic flow pattern, 380–381 experts and meaning-processing systems, 452–457 questions and spatial navigation displays, 374–375 using virtual spine biopsy as testbed, 79–81 Deterministic models, 465 DETOUR, 332–333 Device, 136 Dexterity-enhanced surgery, 340 DFFD, see Dirichlet free-form deformation Diabetics, 174 Diagnosis medical applications of virtual reality, 328–333 use of laparoscopy, 392 Diagnostic reasoning, 489 Diagnostic scoring, 255–258 Dialogues, 346–347, 474–476 Digital signal processing, 308 Diphones, 357 Direct dynamic cues, 284, see also Dynamic cues Direct manipulation, 446, 447 Direct optical cues, 291–292, see also Optical cues Direct perception, 446, 447, see also Perception Directed acyclic graph (DAG), 504 Direction, estimation, 372 Directional error, 383, see also Error Directive/nondirective, 464 Dirichlet free-form deformation (DFFD), 354 Disabilities, 84, 85, 532 Discriminant analyses, 266 Disorientation, 256–258, 265–266, 267, 271 Disparity-based stereopsis functions of vertical, 228–234
Disparity-based stereopsis (Cont.) limitations and distortions, 234–243 types of binocular, 220–228 Displacement disparity, 226 Displacements, tunnel-in-the-sky display control/perception of rectilinear motion, 295–296, 297 dynamic optical cues, 291 lateral/vertical and static optical cues, 287, 288–290 Distance auditory-guided visual search, 190 cue effectiveness and visual space perception, 25–26 distortions and telesystems causation, 138–139 horizontal/vertical-size disparity, 229–230, 231 real- versus virtual environments and training transfer, 76 types of binocular disparity, 221 Distance–pivot distance, 24, 33 Distortions, 234–243 Distractors, 188, 189, 190, 191 Distributed ground scheme, 176 Distributed practice, 151 Dithering, 438 DOOM, 97, 98, 99 Dopamine, 425 Dreamflight, 345 Drop shadows, 103–104 Dry resistive electrodes, 535–536 Dual adaptation training, 153–156, see also Training Dual control problem, 437 DURESS microworld, 445, 447 DVWS, see Delft Virtual Window System Dynamic cues, 284, 382–384 Dynamic optical cues, 290–294 Dynamic synthetic data, 374 Dynamic world model, 449
E Earth’s surface, relative motion, 48, see also Rest frame hypothesis Eccentricity, 229–230, 231, 232, 234 ECG, see Electrocardiogram
Ecological approach, 282–283, see also Tunnel-in-the-sky display Ecological interface, 444, 445–447, 448, 450 Ecological theory, 11–12 Edge detection, 356 Education, 181, 336–340 EEG, see Electroencephalogram Efference, 147 Efficacy, speech recognition, 307 EGG, see Electrogastrogram Ego speed, 382 Egocentric distance, visual perception effectiveness of distance cues, 25–26 focal awareness, subsidiary awareness, and presence, 26–27 measuring in real environments, 27–35 significance of distance in virtual environments, 40–41 virtual environments, 36–40 very large scale, 35–36 variables and their couplings, 23–25 Egocentric frame of reference, 377–378 Egocentric frames, 51 Einstellung effect, 69 EKA, see Expert knowledge analysis Electrocardiogram (ECG) Hyper Hospital virtual reality system, 420, 421, 422 physiological measurement of workload and PVI, 489, 490 psycho-electrophysiological based interactive interfaces, 527, 533 Electrochemical communication, 526 Electrode–skin interface, 175, 533–534 Electroencephalogram (EEG) alternative controllers, 308 physiological measurement of workload and PVI, 490 psycho-electrophysiological interface technologies, 525, 527, 529–531 Electrogastrogram (EGG), 533 Electrolytic solutions, 169–170 Electromagnet digitizer, 348 Electromagnet head tracking system, 306 Electromagnetic/radio frequency interference, 539 Electromyogram (EMG) alternative controllers, 308, 309
SUBJECT INDEX
psycho-electrophysiological interface technologies, 525, 527, 532–533 uninhabited aerial vehicles and operator immersion, 311, 312 Electronic current flow, 533–534 Electronic enema, 329, 331 Electro-oculogram (EOG), 525, 527, 532 Electrophysiological signals, 526–529 Electro-tactile stimuli, 175 Elevation, target, 190 Emergencies, 474 EMG, see Electromyogram Emotion, 346, 351 Endoscopy, virtual, 329, 331, 332 Engagement success, 484 Engineering psychology, 376 Engineering solutions, telesystems, 134–135 EOG, see Electro-oculogram Epinephrine, 426, 428 Episodic memory, 69 Episodic variation effect, 206 Ergonomic assessment, 474 Ergonomic experts, 470, see also Experts ERPs, see Evoked-related potentials Error adaptive control systems, 440 decision making behavior, 486 face-to-face communication, 357 sensory rearrangement, 147, 148 simulator fidelity, 122 spatial navigation displays, 371 systematic and visual space perception of egocentric distance, 31, 32 telesystem performance and decline of user complaints, 143 Error-correcting actions, 386 Error-neglecting control strategies, 386 Ethological studies, 427 Euclidean distances, 74 Euclidean frame of reference, 49 Euler angles, 285, 294 EV, see Evoked potential Evaluation perception–action coupling, 407–408 spatial navigation displays, 385–386 Evoked potential (EV), 527, 531–532 Evoked-related potentials (ERPs), 531 Exceptional cases, 456–457 Excyclodisparity, 234
565
Exocentric frame of reference, 377–378 Experiential fidelity, 120–121, see also Fidelity Expert knowledge analysis (EKA), 470, 473 Experts knowledge, 472–473 knowledge-based and meaning- processing systems, 449, 451 –nonexpert performance, 487 ontology acquisition and action plans, 202 systems and flight assistance for military aircraft, 468–469 Exploration, 118–120 Extraocular muscles, 231 Extrapolation, 360 Eye blinks, 490, 501, 532 Eye-based control, 311–312 Eye level, 137 Eye-position tracking, 307 Eyestrain, 257–258
F Face, simulations, 346 Face-to-face communication problem domain and related work, 346–354 standardization for synthetic natural hybrid coding, 363–364 system description, 354–363 Facial animation parameter (FAP) set, 363 Facial animation parameter units (FAPU), 363 Facial communication, 351 Facial data acquisition, 348 Facial definition parameter (FDP) set, 363, 364 Facial expression, 350–351, 353, 356–357, 358 Facial features, 347–348 Facial motion tracking, 350–351 Facing direction, 30 Factor analytic scoring, 256 Failure, 487 Familiarity, 378 FAP, see Facial animation parameter set FAPU, see Facial animation parameter units Fatigue, 420, 421, 422, 426, 427 FDP, see Facial definition parameter set
Feature detection method, 356, 357 Feedback adaptive control systems, 434, 435 error-corrective, 148, 149–150 meaning-processing systems, 455 sensory delays in telesystems, 138, 140–141 simulations of naturally rearranged sensory environments, 156–157 tongue-based tactile displays concept, 170 training and education in medical applications of virtual reality, 338 visual and transfer of training in medical simulations, 80, 81 Feed-forward, 435 FFD, see Free-form deformation Fidelity flight simulations, 212 meaning-processing systems, 452–453 pilot state estimator and adaptive PVI, 503 telesystem limitations, 132 virtual environments implications for evaluating simulators, 120–125 simulation and natural law, 113–117 simulation or reality, 112–113 transfer of training, 70, 73, 77 what is perceived, 117–120 Field of view (FOV), 52, 131, 132, 150 Field-independent/dependent individuals, 157–158 Fighter aircraft, 310, see also Aircraft Figure–ground studies, 49–50 Fingertips, 169–170 Fish-eye lens, 401–404 FITE radar display, 507 Fixation point, 395, 396, 398, 402, 403 Flashbacks, 133, 153, 154 Flexibility, 74–76, 279 Flight assistance, 469–474, see also Electronic copilot program Flight conduct, 280, see also Tunnel-in-the-sky display Flight-control systems, 436 Flight crew, 279–280 Flight-deck automation, 280 Flight domain, 201–202 Flight path, 296–298
Flight-path angle error, 290, 291, 292, 293, 297 Flight-path azimuth angle, 290 Flight-path display, 280, 369, 371, 372 Flight simulations, see also Military simulations cybersickness and configural scoring, 262 fidelity of virtual environments, 116–117, 118 medical applications of virtual reality, 336, 337 postexposure aftereffects, 154 time-delayed visual feedback effect, 315 Flight simulators meaning-processing systems, 453 limitations of telesystems, 135, 136 normative scores on simulator sickness questionnaire, 254 ontology acquisition, 201–204 reality versus simulation comparison, 112 useful ways of avoiding reality, 97 virtual environment comparison of ill effects and configural scoring, 270–272 visually induced motion sickness, 250 Flight tests, 369, 371 Flybox, 205 FNS, see Functional neuro-stimulation Focal awareness, 26–27 Focus of radial expansion (FRO), 291–292, 293 Food, intake, 421 Forcing function, 374, 385 Form distortions, 140 Forward chaining production rule, 488 Forward inferencing, 489 Four-dimensional navigation environment, 279 FOV, see Field of view Foveation, 188, 189 FRO, see Focus of radial expansion Frame of reference, 377–378 Frame rate, Free-field cueing, 190, 193 Free-form deformation (FFD), 354 Free play approach, 457 Freezing, 138 Full-cue conditions, 28–31, 33
G Gain Scheduling System, 435, 436 Gallbladder, 338, 392 Gaze, redirection, 188 GCS 2000, 309 Geocentric frames, 51 Geometric field of view (GFOV), 372, 377, 384, 385 Geometrical elements, 284, 285 Geometrical representation, 347 Gestalt principles, 50 Gesture recognition technology, 308, 313 Gesture/voice integration models, 478 GFOV, see Geometric field of view Gibson’s hypothesis, 297 Gibsonian theory, 95, 96, 101, 106–107 Glass-fiber scope, 397–398 Glide approach/landing, 209, 210 Global deformation, 229 Global optic flow field, 296–297 Global optical expansion, 283 Global optical flow rate, 291, 383 Global positioning system (GPS), 370 Glove, 174 Glove-based systems, 308 Goals, 3, 4, 131 GPS, see Global positioning system Graphics controlling/facilitating variables and sensory rearrangement, 148–149 computer and tactile-evoked potentials in blind persons, 171–172 computer games and useful ways of avoiding reality, 97 limitations of telesystems, 132 Gravitoinertial force, 138, 144, 148 Gravity, 104, 105, 137 Graybiel classification system, 252 Ground textures, 283 G–seat, 50 Guidance information, 372–373
567
H Hairbrush electrode, 535 Hand–eye coordination, 143, 172–173 Hand–eye responses, 149–150 Hands-free control, 312 Haptic arrays, 115, 116 Haptic information, 392 Haunted Swing Illusions, 249 Head-based control, 311–312 Head gear, 132 Head motion procedure, 35–36 Head-mounted display (HMD) active interaction and sensory rearrangement, 147 comparative studies of simulator sickness, 258–260 distance/depth distortions and telesystems, 138–139 Hyper Hospital virtual reality system, 415–416, 417 measurement of presence experiments, 57, 59 uninhabited aerial vehicles, 313, 316 visual space perception of egocentric distance, 27 virtual environments, 36–40 Head movements, 119, 404–405 Head position data, 318 Head-related transfer functions (HRTFs), 187, 190–191, 192, 193 Head-slaved tracking task, 316–317 Head-tracking device, 407 Head-up display (HUD), 130, 143, 156 Health care integration, 340–341 Heart rate (HR) Hyper Hospital virtual reality system, 422, 423, see also Electrocardiogram physiological measurement of workload and PVI, 489, 490 pilot state estimator, 501, 510 psycho-electrophysiological based interactive interfaces, 533 Heart rate variability (HRV) physiological measurement of workload and PVI, 489, 490 pilot state estimator, 501, 503, 510 psycho-electrophysiological based interactive interfaces, 533
Heavy bat approach, 456 Helicopter simulations, 256–257, 263–264, 265 Helmet-mounted displays, 205 Hernia, repair, 392 Hidden Markov models, 357 Hierarchical B-splines, 347, see also B-splines Hierarchical task analysis (HTA), 78–79, 80 HMDs, see Head-mounted display Holistic perception, 378 Holographic images, 340 Horizon scaling theory, 102, 103, 104 Horizontal disparity, 220, 231, 238–240 Horizontal displacement, 223 Horizontal horopter, 231 Horizontal vergence, 223 Horizontal-shear disparity, 225–226, 227, 242 Horizontal-size disparity, 224, 225 HR, see Heart rate HRTFs, see Head-related transfer functions HRV, see Heart rate variability HTA, see Hierarchical task analysis HUD, see Head-up display Human artifacts, 537–538 Human behavior, 1, 2 Human experts, 439–442, see also Experts Human factors design, 439 Human factors engineering, 2, 11 Human factors recommendations, 470–471 Human interface techniques, 414 Human-in-the-loop studies, 262, 385, 386 Human–machine interface adaptive environments approach, 3, 5 design complexity in spatial navigation displays, 375 electronic copilot, 473 perceptual studies of tactile substitution systems, 173, 174 time delay in uninhabited aerial vehicles, 315 spatial audio displays as components of adaptive, 194–196 Hyper Hospital system, 413, 415–416 Hypergravity, 144 Hypogravity, 144, 151
I Icons, 451 i-glasses, 259, 263 Illusions, 173 Illusory self-motion, see Vection Illustrations, 99–101 Image labeling, 213 Image misalignment, 234 Imagery, recording, 205–206 Images, vision substitution systems, 172–173 Imaging, training transfer, 80, 81 IMAX theaters, 36 Immensity experience, 35–36 Immersive interface, 8–10, 27, 313 Immersive quality, 309–314 Immersive virtual reality, 93 Implicit surfaces, 347 IMV, see Inspiratory minute volume Incisions, 394 Inclination limitations and distortions of stereopsis, 237–238, 239, 240, 243 perception and types of binocular disparity, 226, 227 perceived, 242 In-cockpit view, 98 Incremental exposure, 150 Independent visual background (IVB), 54, 55 Indirect dynamic cues, 284, 290–291, see also Dynamic cues Indirect procedures, 32–35 Individual differences, 157–158 Indoor/outdoor environment, 34 Induced effects, 225, 226, 229, 235 Industrial robot, 404 Inertial arrays, 115–117 Inertial cues, 272 Inertial motion, 117, 138 Inertial receptors, 50 Inferences, 4, 28 Information correspondence in virtual reality component technologies, 101–107 tool development, 95, 96 equivalent, 326 fidelity of virtual environments, 119
flow and pilot-vehicle interfaces, 483 laparoscopy limitations, 392 processing, 282, 451, 470, 504 rest frame hypothesis, 50 tongue-based tactile displays, 170–171 transfer, 282, 375 tunnel-in-the-sky display, 284–290 Infrared detection device, 405 Inside-out frame of reference, 378 Insole-pressure pad receptor system, 174 Inspiratory minute volume (IMV), 490, 503 Instrument movement, 394 Insulated electrode, see Capacitive electrode Integrated approach, 375 Integrated system evaluation, 106–107 Intellectual corrections, 156–157 Intelligent flight assistance, 468 Intelligent module, 359 Intelligibility, 192–194 Intensity information, 177 Intensive qualities, 175 Intentionality action plans in virtual environments, 199–200 electronic copilot program, 470 flight simulation, 209, 211–212 recognition and interface adaptiveness, 475 Interactive methods, 348 Interface agent, 201 Interfaces adaptive aids, 474–480 aspect of virtual reality, 325 computer–user and limitations of presence, 62–63 virtual and adaptive human performance, 2 Interference, 306, 534 Internal mental model, 489 Internal model, 449 Interpolation, 349, 360 Intersecting routes, 75–76 Intersensory bias, 145, see also Bias Intersensory conflicts, 136–138, 145, 146 Interviews, 470 Intravenous needle insertion, 336, 337 Intuitive interfaces, 6–7, 61, see also Interfaces Intuitive physics, 104, 105 IVB, see Independent visual background
J Jack, 345 JANAIR, see Joint Army–Navy Aircraft Instrumentation Program Japanese Manga style, 99 Jitter, 318 Joint Army–Navy Aircraft Instrumentation Program (JANAIR), 369 Jointed-angle tracking technology, 308 Judgment, 27–28, 33, 34, 38 Jump effect, 361
K KADS system, 214 Kinetic depth effect, 138–139 Knowledge, adaptive control systems, 439 Knowledge-based interactions, 451 Knowledge map, 202–203 Knowledge patterns, 471 Kraepelin scores, 426
L Labyrinthine defects, 250 Lag matching virtual environments to reality, 70 rest frame hypothesis, 53 sensory rearrangement, 140–141, 146, 148–149 Landing maneuvers, 315–316, see also Flight simulations Language, 451 Language interfaces, 313, see also Interfaces Laparoscope tip, 398–399 Laparoscopic surgery, 440, see also Error Laparoscopy, perception–action coupling evaluation, 407–408 implementation, 396–404 operation, 392–396 prototype, 404–407 Laser-based scanners, 348 Lateral acceleration, 138 Lateral displacement, 291 Lateral displacement cues, 288–289 Lateral tunnel cues, 285–287
Lawfulness, 451 Laws of dynamics, 346 Learning, 69–70, 173, 179–180 Learning curve, 474 Learning plans, 206, 208–212 Left-turn maneuver, 209 Lenses, 251 Leprosy patients, 174 Lexical analysis, 360 Life or death behavior, 112 Light source, 399–401 Lighting, 131, 132 Likert scale, 252 Limb position, 129, 137, 147, 149 Limb trauma simulator, 338, 339 Limitations current telesystems, 130–136 stereopsis, 234–243 Linear vection, 52, see also Vection Linearization, 287 Link 6-DOF motion platform, 116 Liquid crystal graphic-image displays, 415 Liver model, 332 Location, 22 Locomoting observer, 28, 29–31 Locomotion, 296 Locomotor flow line, 292, 296 Logical structure, 69 Longitudinal tunnel cues, 285–287 Luchins’ water jug problem, 69
M Magnetic resonance imaging (MRI) scan, 326, 332, 334 Magnitude estimation, 61, 62 Manipulated distance cues, 34 Manipulation/exploratory tasks, 391, 393, 402, 404, 405, 407 Manual control tasks, 282 Manual tracking, 371 Maps, 73, 74, 76 Markers, 350 Mass-spring model, 349 Match training, 69 McCloud’s Manga, 101 Meaningful/nonmeaningful conditions, 56, 57, 60
Meaning-processing system characterization, 448–452 designing experts, 452–457 general adaptive control problem, 434–439 human expert: adaptive controller, 439–442 representation design, 442–448 Mechanical arrays, 115–117 Medical applications, virtual reality diagnosis, 328–333 education, 336–340 health care integration, 340–341 psychological/physiological issues hyper hospital system, 415–416 experimental setup and measurements, 416–422 results, 422–427 discussion and conclusion, 427–429 therapy, 333–336 Medical avatar, 326, 329, 340–341 Medical constraints, 397 Medical records, 326 Medical simulations, 77–82, see also Training Membership function, 497 Memory storage, 414 Mental imagery, 91 Mental model, 492 Mental state, 484, 498, 499 Mental workload (MWL), 415, 490, 501, 502, 503, see also Workload Metacognitive skills, 82 Meta-control, 313, 314 Meta-knowledge, 471, see also Knowledge MIAS, see Multimode integrated approach system Microgravity, 51, 138, 320 Military simulations electronic copilot program, 469–474 flight simulators comparison of simulator sickness, 258, 262, see also Flight simulators uninhabited aerial vehicles, 304 user intentionality and action plans, 200 Minimal perceptible actions (MPAs), 356, 357, 360, 361 Misaccommodation, 139 Misalignment, 129, 137 Misperception, 379 Missiles, 507, 512–517 Mixed disparities, 240–242
Model-based planning systems, 200 Model fitting, 354, 355 Model Reference Control, 436, 437 Model task, 469–470 Modified Rhyme Test, 192 Modules, communication, 354 Moment-to-moment performance variability, 141–142 Monocular vision, see Binocular/monocular vision Motion artifacts, 537–538 Motion maladaptation syndrome, 251 Motion parallax, 25, 26, 33, 35 Motion perception, 25, 33, see also Perception Motion referents, 294 Motion sequence graph, 209, 211 Motion sickness, see also Individual entries Hyper Hospital virtual reality system, 427 measurement with questionnaires, 252–253 rest frame hypothesis, 48–49 derivation, 49–53 general discussion, 61–63 implications, 53–55 measurement experiment, 57–61 simulator sickness experiments, 55–56 safety of virtual reality technology, 414–415 sensory rearrangement, 150 telesystems intersensory conflicts, 137 limitations, 132, 134, 135 performance and user complaints, 142, 143, 145 role of individual differences, 157 total scores from virtual environment devices, 264 virtual environments and comparison with simulator sickness, 258–260 Motion sickness questionnaire (MSQ), 252–253 Motivation, 83 Mounting grip, 406–407, 408, see also Laparoscopy Moving-base devices, 265–266 MPAs, see Minimal perceptible actions MPEG4, 364, 365 MRI, see Magnetic resonance imaging scan MSQ, see Motion sickness questionnaire
Multiexpert approach, 472, see also Expert; Knowledge Multimodality ergonomic approach, 479–480 interface adaptiveness, 476–477 technical approach, 478 Multimode integrated approach system (MIAS), 369–370 Multitasking, 525 Multivariate distribution, 503 Muscle-based techniques, 349 Muscle contractions, 350 MWL, see Mental workload
N NASA, see National Aeronautics and Space Administration NASA Task Load Index, 498 National Aeronautics and Space Administration (NASA), 174, 256 Natural-language processing, 359, 360 Natural law, simulations, 113–117 Nausea configural scoring ill effects, 271 simulator sickness, 256–258 symptom of motion sickness, 248 total scores from virtual environment devices, 265, 266, 267 NAVY SEALS, 178 Negative transfer, 152–153, see also Transfer Networks, 361–362, 414 Neural networks, 357 Neurological issues, 415 Newtonian models, 104, 105 Noise, 534, 535 Noncreative virtual reality, 428 Nonhuman assistance, 464 Nonidentity, 114, 116, 117, 118, see also Fidelity Nonimmersive approaches, 325 Nonintuitive actions, 135 Noninvasive visualization, 329–332 Nonmonotonic reasoning, 484 Nonphotorealistic rendering techniques, 100–101 Nonrigid body problem, 336
Nonsocial virtual reality, 428 Norepinephrine, 426, 428 Normalization, 354 Normative expectations, 439 Numerical biopsy, 331, 332 Numerical estimates, measurement, 27–28 Nurses, 417, 419, see also Hyper Hospital system
O Object database, 206 Observers, self-tuning regulator, 437–438 Oculomotor cues, 25, 26 Oculomotor symptoms, 256–258, 265–266, 267, 271 Ontology, flight simulation, 201–204 Open-loop control action, 371, 372 Open-loop motor behavior, 28, 30 Open-loop pointing responses, 139 Operational efficiency, 247–248, see also Motion sickness Operator adaptive aids and interfaces, 474 automation systems interaction and interface adaptiveness, 475 weakness and compensation, 463 capacity and adaptive aids, 461–462 cognitive resources and adaptivity of assistance, 465–466 meaning-processing systems, 446–447, 448 resources and methods of assistance, 466 uninhabited aerial vehicles control interfaces, 310 time-delayed visual feedback, 315, 317–318 variability and definition of assistance/aid, 464 Optic array, 115, 116, 396 Optic flow, 25, 26, 350, 380–381 Optic rules, 34 Optical biopsy, 332 Optical compression, 295 Optical cues, 284, 298 Optical depression angle, 283 Optical edge rate, 383 Optical eye trackers, 307
Optical flow field, 291–292 Optical splay angle, 283, 287, 288, see also Splay angles Optical tilt, 144 Optokinetic stimuli, 249 Organ, image, 326 Orientation errors, 377, 380, 381, 382, 383, 385 Orientation tracking technology, 305–306, 312 Orientation, laparoscopy, 395 Orthodontic retainers, 177–178 Orthogonal photographs, 354 Oscillatory behavior, 315, 316 Outcomes, 317 Out-of-the-window view, 205 Out-patient office, 416 Outside-in frame of reference, 378 Overall-size disparity, 225 Overlap, 5
P Paper-and-pencil tests, 255 Parallel search, 189 Parameterization, 349 Parametric surfaces, 347 Parasympathetic nervous system, 526, 527 Parietoinsular cortex, 50 Partition, 466 Part-task training, 77, see also Training PAT, see Preflight adaptation training Pathfinder method, 202 Patients, 341, see also Medical applications, virtual reality Patrol commanders, 472 Pensacola Slow Rotation Room, 252 Percept qualities, 175 Perception binocular disparity, 221 fidelity of virtual environments, 117–120 implications for tactile, 179–180 information and useful ways of avoiding reality, 98 recalibration and postexposure aftereffects, 152, 153 tactile vision substitution systems, 173 tongue-based tactile displays, 172–174
tunnel-in-the-sky display, 281, 294–298 visual space perception of egocentric distance, 22–25 conflicts, 26, 27 Perception–action coupling, 396–404 meaning-processing systems, 448 skills and immersive virtual reality applications, 93 theory, 54 Perceptual–motor adaptation, 142 Perceptual–motor immersion, 9 Perfect assistant, 463 Performance -based techniques animation techniques and virtual model of human face, 349, 350 pilot state estimator and adaptive PVI, 485, 498–499, 501 constraints, laparoscopy using DVWS systems, 397 efficiency spatial audio displays, 193–194 Super Cockpit concept, 310 electronic copilot program, 468, 469–470 human and virtual and adaptive environments, 4 meaning-processing systems, 448 monocular versus stereoscopic laparoscopes, 393 sensory substitution systems, 181–182 telesystems and decline of user complaints, 142–146 –training match, 69 virtual environment and training transfer in stroke patients, 85 Peripheral nervous system (PNS), 526, 527 Peripheral vision, 98 Perspective gradient, 283 Perspective information, 393 Perspective projection, 377 Perspective wire frame, 283 Perspiration rate, 490 PET, see Positron electron tomography Phantom force reflecting joystick, 78, 79 Phenomenal geometry, 34 theory, 22 Philosophical issues, 182–184 Phonemes, 357, 359
Photogrammetry, 348 Photorealistic techniques, 99, 101 Physical complaints, 133, see also Individual entries Physical modeling, 104 Physio workload, 509–510, see also Workload Physiological issues, virtual reality, 415 Physiological measurements Hyper Hospital virtual reality system, 420, 421, 427 results, 422–425 pilots inference of mental state, 484 state estimator and adaptive PVI, 498, 499, 501 workload, 485 Pick-and-place task, 393 Picture frame, 26 Picture surface, 26 Pilot four-dimensional navigational environment, 280 intentionality and action plans, 199 ontology acquisition in virtual environments, 201 spatially integrated data presentation in flight-path display, 372 state estimator (PSE) adaptive pilot–vehicle interface, 491, 498–504 evaluation, 509–510 uninhabited aerial vehicles, 304–305, 317 tunnel-in-the-sky display, 281 Pilot-induced oscillations (PIOs), 315 Pilot-vehicle interfaces (PVI), tactical air environment adaptation module, 504–506 background, 486–492 embedded tactical situation assessment, 494–497 pilot state estimator, 498–504 prototype implementation and evaluation, 507–517 situation assessment using belief networks, 492–494 PIOs, see Pilot-induced oscillations Pipeline, 362–363 Pitch angle, 294
Pitched environment, 137 Pixel power, 102 Plane of convergence, 221 Plane surfaces, 223–228 Plastic changes, 171 Plastic surgery, 333, see also Surgery PNS, see Peripheral nervous system Point disparity, 220–223 Point of observation, 393, 394, see also Laparoscopy Polygenic maladies, 251 Polygonal mesh, 355 Polygonal surface representation, 347 Position error spatial navigation displays, 377, 380, 381, 382, 383 tunnel-in-the-sky display, 288, 289, 290, 292, 295 Position referent, 295 Position tracking errors, 272, see also Error Position-tracking technology, 305–306, 312 Positive transfer, 97, see also Transfer Positron electron tomography (PET), 329 Postadaptation effects, 9–10 Postexposure aftereffects cybersickness and configural scoring, 261 overcoming and naturally rearranged sensory environments, 152–156, 159 telesystems performance and user complaints, 143 role of individual differences in adapting, 158 Postprocessing operations, 348 Postural stability, 56, 319–320 PR, see Production rule approach Practice, 451, 457 Predefined command control, 528 Prediction errors, 319, see also Errors Predictive models, 149, 465 Preflight adaptation training (PAT), 151, 155–156, 264 Prescriptive aids, 464 Prescriptive model development, 487, 488–489 Presence concept and human–machine systems, 3, 10–11 fidelity, 120–121, 122, 125 measurement experiment, 57–61
measuring and implications of rest frame hypothesis, 54–55 visual space perception of egocentric distance, 26–27 virtual environments, 37 Primacy bias, 486, see also Bias Priority information, 485 Prism goggles, 319–320 Prismatic displacement, 147, 149, 150, 151 Prisms, 249 Problem domain, 346–354 Problem solving, 68–69 Problems, 306, 451 Procedural knowledge, 471 Production rule (PR) approach, 488–489 Progressive disclosure model, 84 Proprioception, 98, 143 Proprioceptive shift, 153 PSE, see Pilot state estimator Pseudo-horizons, tunnel-in-the-sky display control/perception of rectilinear motion, 294–295, 296, 297 dynamic optical cues, 291 static optical cues, 288, 289 Pseudo-muscle-based techniques, 349 Psycho-electrophysiological interface technologies implementation, 533–539 incorporation into interactive virtual environments, 539 parameters for interactive, 526–533 Psychological issues, 182–184, 415 Psychological linking hypothesis, 261 Psychological load, 417, see also Hyper Hospital system Psychological measurements, 422, 426–427, see also Hyper Hospital system Psychological models, 486 Psychological science, 11–14 Psychometric properties, 253–254 Pursuit Control System, 435 PVI, see Pilot-vehicle interfaces
Q Qualia, 173–174, 183–184 Questionnaires, 252–253, 256 Questions, 417, 420
R R2D2, 467 Radar warning receiver (RWR) evaluation of adaptive PVI prototype, 507, 508, 509, 512, 514–517 evaluation of situation assessment module, 512 tactical situation assessment embedded in adaptive PVI, 495 Range effect, 62 Rational free-form deformation, 355 Reafference, 147 Real environments distance perception in virtual environments, 40 measurement and visual space perception of egocentric distance, 27–35 Real face–virtual face communication, 351 Real flight situation, 470 Reality specified and natural law in fidelity of virtual environments, 113–114 useful ways of avoiding, 96–101 Reality–realism, 121–123 Real-time communication link, 314 Real-time realism, information correspondence useful ways of avoiding reality, 96–101 virtual reality, 92–93 component technologies, 101–107 theory for developing tools, 93–96 Real-time tracking, 305–306 Real-world states, 360 Rearrangements error-corrective feedback, 149–150 sensory and motion sickness, 255 spatiotemporal and visually induced motion sickness, 320 telesystems engineering solutions to limitations, 134 sensory, 136–142 stable and training, 146–147 users, 129 Reasoning, 486 Recall, 68, see also Training Recovery, 152 Rectilinear flight condition, 290
Rectilinear motion, tunnel-in-the-sky display curvilinear motion, 283 dynamic optical cues, 290 perception/control, 294–298 temporal cues, 292, 293 Redirections, 405–407 Reduction of effect, 152 Reference frames, 51 Reflectance map, 348 Registration problem, 332 Regulation task, 282 Rehabilitation medical applications of virtual reality, 334, 336 transfer of training, 82–86 Rehearsal, 72–74, 149, see also Training Relapse, 85 Relational learning tasks, 208 Relative depth, 292 Relative motion, 48 Relaxation membrane interpolation, 348 Reliability, 253, 356, 475 Rendering, 102–104, 329 Reorganization, 171 Representation design, 442–448 Representation rules, 374, 376–377, 378–379 Representations, 26 Resistive electrodes, 535 Resolution, 307 Respiration rate, 501 Response generation, 359–360 Rest frame hypothesis (RFH) approach to presence and motion sickness, 47–49 derivation, 49–53 implications, 53–55 Restoration, 408, 409 Restrictive aperture, 27 Retinal displays, 313 Retinal images, 221 Retinas, 230 Retraining, 84–85, see also Training RFH, see Rest frame hypothesis Right/wrong, 443 Risk, 7, 70, 455–456 Risk reduction technique, 14 Risky environments, 434 Robotics, 328 Rod-and-frame effects, 51
Roll angle, 294 Rotating/tilted rooms, 155, 251 Rotation disparity, 226–228, 234, 239 Rotation point, 395, 401, 402, 403, 404 Rotational component, 384 Rotation-induced effects, 271 Route knowledge, 72–73, see also Knowledge Rule-based interactions, 450–451, 452 Rules, 473–474 RWR, see Radar warning receiver
S SA, see Situation awareness SAB, see Scientific advisory board Saccade, 532 Safety, 204, 414, 417, 427, 463 Scale, 25, 26, 35–36, 102 Scale of degrees of automation, 505 Scale of levels of interface adaptation, 505–506 Scientific advisory board (SAB), 304, 312 Scientific tools, 326 Scoring, questionnaires, 253–254 Sculpturing, 348 SEAD, see Suppression of enemy air defenses Search times, 189, 190 Seasickness, 320, see also Individual entries Second-person perspective, 98, 99 Seeing/“seeing,” 182–183 See-through/non-see-through displays, 385 Segmentation, human body, 329, 330 Selected rest frame, 49, 54 Selection logic, 374 Selection processing, 529 Selection rules, 374 Self-adapting systems, 467 Self-luminous targets, 31–32 Self-motion, 47, 50, see also Rest frame hypothesis; Vection Self-organizing properties, 438 Self-report checklists, 253 Self-tuning regulator, 437–439 Sensor contact, 495, 496 Sensorimotor integration, 170–171 Sensorimotor interfaces, 4 Sensors, 535–537 Sensory conflict theory, 53, 54
Sensory disarrangement, 141–142 Sensory environment, 130 Sensory feedback, 132, 136, 146, 148–149 Sensory integration, 50–51 Sensory overload, 180 Sensory–perceptual problems, 133 Sensory rearrangement theory of motion, 53 Sensory rearrangements, telesystems, see also Rearrangements controlling/facilitating variables and training, 146–151 improved performance and decline of user complaints, 142–145 Sensory substitution systems, 171, 172–174, 179–180 Separation of two points, 22 SEPs, see Somatosensory evoked potentials Serial search, 189 Servo-mechanisms, 434 SGD, see Symbolic description generator Shadow parallax, 400, 401 Sharpened Romberg stance, 55–56 Shear-deformation, 234 Shear disparity, 226 Sheridan’s Presence cube, 94, see also Presence Side effects, 427, see also Hyper Hospital system Signal overlapping, 537 Signal sensing/acquisition techniques, 533–535 Signs, 451 Silver–silver chloride electrodes, 535 Simplification, meaning-processing systems, 453–454 Simulation model, 507 Simulations information correspondence for virtual reality component technologies, 104–106 overcoming effects and naturally rearranged sensory environments, 151–157, 159 reality comparison, 112–113 Simulator fidelity, 120–125, see also Fidelity Simulator sickness, see also Cybersickness; Motion sickness; Space sickness characterization, 251 experiments, 55–56
factor analytic scoring, 256 profiles and results of comparative studies, 258 reduction and implications of rest frame hypothesis, 53–54 uninhabited aerial vehicles, 314, 319 virtual interface problems and human–computer communications, 61 Simulator Sickness Questionnaire (SSQ), 253, 254, 266 Simulators characterization, 114 intersensory conflicts caused by telesystems, 137, 138, 141 meaning-processing systems, 453, 456 motion- versus fixed-based and fidelity, 116, 117 opportunity for practice, 451 Single photon computed tomography (SPECT), 329 Single-point stimulus, 24 Sink-or-swim philosophy, 129 SIRE, see Synthesized Immersion Research Environment Situation assessor, 491, 496 module, 492–494, 511–512 Situation awareness (SA) pilot-vehicle interfaces, 483, 505–506 models and air combat, 487–489 tunnel display, 280 6 DF model, 205 Size, 37, 140, 385 Size deformation disparity, 226, 243 Size–distance invariance, visual space perception of egocentric distance, 23, 24, 26, 37–38 measurement, 33, 34 Skill-based interactions, 450, 451 Skills, 85–86, 456 Skin receptors, 172–173 Slant boundary, 238 Slant contrast, 240 Slants, 229, 243 Sleep, 420 Slow Rotation Room, 252 Smart mechanism concept, 449 Smooth pursuit eye movement (SPEM), 532 Snakes, 350, 354 SNHC, see Synthetic natural hybrid coding
Soar architecture, 214 Sociotechnical systems, 448, 449 Soft mask, 356 Somatosensory evoked potentials (SEPs), 532 Somatosensory feedback, 138 Somatosensory systems, 115–116 Space adaptation syndrome, 251–252 Space horopter, 222, 223 Space mission, 51 Space motion sickness (SMS), 51, 272 Space sickness, 256, 261, 264–265, 271 Space–time properties, 451 Space–time variables, 530 Spatiality, 402 Spatial arrangement, 451 Spatial audio cueing, 189–190 Spatial audio displays components of adaptive human–machine interfaces, 194–196 speech communications, 192–194 target detection and identification, 188–192 Spatial behavior, 22 Spatial distortions, 153 Spatial information, 392–394, 396, see also Laparoscopy Spatial layout, 394–395, see also Laparoscopy Spatial masking, 194 Spatial navigation displays design complexity benefiting from previous research, 376 design questions, 374–375 dynamic cues, 382–384 frame of reference, 377–378 identification of task-specific visual cues, 379–380 integrated approach, 375 representation, 378–379 specifying parameters and coping with constraints, 384–385 specifying representation and transform rules, 376–377 static cues, 381–382 visual cues as properties of optic flow pattern, 380–381 evaluation, 385–386 integrated data presentation, 371–373 transfer of training, 72–77 Spatial orientation, 137, 138
Spatial perception, 48, 53 Spatial qualities, 175 Spatial resolution, 235 Spatialization, 193–194 Spatiotemporal map, 209, 210 Spatiotemporal rearrangements, 320 Spatiotemporal situations, 282 Specificity hypothesis, 113–114 SPECT, see Single photon computed tomography Spectral analysis techniques, 315–316 Speech-based controls, 311 Speech communications, 192–194 Speech recognition technology, 307, 357, 491 SPEM, see Smooth pursuit eye movement Spinal cord, 526 Spine biopsy, 78–81 Splay angles static cues and spatial navigation displays, 381, 382 tunnel-in-the-sky display, 290, 291, 295, 297 Splay rate, 297, 382 Spontaneous transfer, 68, see also Training SRK model, 467 SSQ, see Simulator Sickness Questionnaire Stability, 435 Static cues, spatial navigation displays design complexity, 381–382 information in straight tunnel sections, 284, 285–290 Static optical cues, 290–291 Static synthetic data, 373, 374 Stereopsis, 234–243 Stereoscopic displays, 40, 132 Stereoscopic laparoscopes, 392–393 Stereoscopic stimuli, 139 Stimulation correspondence, 94–95, 96 Stimulus duration, 237 Stimulus fidelity, 114, 117, see also Fidelity Stimulus–response, 448, 450 Stimulus size, 236, see also Size Storage, 341 Straight tunnel sections, 284–290, see also Tunnel-in-the-sky display Stress designing experts and meaning-processing systems, 455–456
Hyper Hospital virtual reality system biochemical measurement, 421 loading protocol, 417–420 loading, 428 measurement, 416 results, 422 reduction and rehabilitation using virtual environments, 85 Stroke, 70, 82 Structural design issues, 305 Structural truth, 443–444 Subject Workload Assessment Technique, 498 Subjective fatigue, 420, 422, 426–427, 428, see also Fatigue; Hyper Hospital system Subjective procedures, 485, see also Pilot–vehicle interface Subjective techniques, 498, 499–500, see also Pilot–vehicle interface Subjects, 416–417, see also Hyper Hospital system Subliminal feedback control, 528, 529 Subsidiary awareness, 26–27 Subtle drive command control, 528, 529 Success orientation, 486 Super Cockpit, 310 Supplementary cues, 455 Suppression of enemy air defenses (SEAD), 304 Surface, 294 Surface primitives/structures, 347 Surface slant, 231–234 Surgeons, 199, 393–394, 440, 441 Surgery, medical applications of virtual reality laparoscopic devices, 392–396 simulations and education, 336, 337 video assisted, 326 Surrounding artifacts, 539 Surrounding contours, 378, 379 Survey knowledge, 74, see also Knowledge Suspension of disbelief, 121 Symbol coding, 507, 508 Symbolic description generator (SGD), 205–206 Symbology specification, 373, 374 Symbols, 451 Symmetry, 377, 380, 381
Sympathetic nervous system, 526, 527 Symptom profiles, virtual environment configural scoring, 256–262 total scores from devices, 263–270 Symptomatology, 320 Symptoms, motion sickness, 248 Syntax of responses, 443 Synthesized Immersion Research Environment (SIRE), 118 Synthetic natural hybrid coding (SNHC), 363–364, 365 System description, face-to-face communication, 354–363 System-induced artifacts, 538–539
T Tac-Air-Soar architecture, 214 Tactical air environment, 304, 310, 313 Tactical planning, 486 Tactical scenarios, 512–517, see also Pilot-vehicle interfaces Tactical situation assessment, 486–487, 494–497, see also Pilot-vehicle interfaces Tactile feedback, 491, see also Pilot-vehicle interfaces Tactile situation, 484, see also Pilot-vehicle interfaces Tactile vision substitution system (TVSS), 170, 172–173, 179, 181 Target-pointing responses, 148, 149 Targets, detection/identification, 188–192 Task allocation, 485 Task automation, see Automation Task difficulty, 456 Task hierarchy, 203 Task Load Index (TLX), 317–318 Task models, 475 Task-related analysis, 533 Task validity, 212 Tau variable, 95 Technical approach, multimodality, 478 Technical constraints, 391–392, see also Laparoscopy Technical development, 392, see also Laparoscopy Telemanipulation, 373
Teleoperator systems, 130, 140–141 Telepresence, 131 experiment, 106–107 surgery, 340 Telesystems evidence for/causes of improved performance and user complaints, 142–146 limitations of current systems, 130–136 role of individual differences, 157–158 sensory arrangements/disarrangements, 136–142 controlling/facilitating variables, 146–151 simulations of naturally arranged sensory experiments, 151–157 Temporal cues, 291, 292–294 Temporal range information, 383 Temporal scaling effect, 384 Terrain, 369, 370 Test pilots, 472 Text, 451 Text-to-speech synthesis, 357 Texture compression, 292, 294 Texture mapping, 331–332, 355 Texture rate cues, 379 Thalamic radiation responses, 172 Therapy, 333–336 Theta waves, 530 Threat, 495, 496 Threat potential, 511–512 coding, 507 Threat speed, 496, 497 Three-dimensional coordinates, 349 Three-dimensional digitizer/scanner, 348 Three-dimensional face reconstruction, 354–355 Three-dimensional imaging, 329 Three-dimensional input, 348 Three-dimensional interactive applications, 97 Three-dimensional object, 376, 377 Three-dimensional reversible objects, 24 Three-dimensional scenes, 219, 282–283 Three-dimensional shape reconstruction, 347–349 Three-dimensional space, 26 Three-dimensional tracking task, 393 Three-dimensionality, 382
Time constraints, 468 Time delay, 314–317 Time-delayed visual feedback, 319 Time frames, 471 Time–series relational statements, 206, 207, 209–210 Time to contact (TTC), 383–384 Time to passage (TTP), 383–384 Time-to-wall crossing (TWC), 386 TLX, see Task Load Index Tolerance limits, 384–385 Tomb Raider, 97, 98, 99 Tongue interface, 174–179 Tongue-based tactile display access by blind persons to computer graphics, 171–172 conceptual framework, 170–171 educational and vocational considerations, 181–182 implications for perception, 179–180 perceptual studies, 172–174 psychological and philosophical issues, 182–184 tongue interface, 174–179 Total (normative) scores, 254 Touch substitution, 174 Tracking devices, 305–306 errors, 315, 501 face-to-face communication, 356, 357 performance, 315–317 psycho-electrophysiological based interactive interfaces, 532 Training adaptive and electrophysiological parameters use for interactive interfacing, 529 meaning-processing systems, 453, 456 medical applications of virtual reality, 338, 339 ontology acquisition in virtual environments, 201 sensory rearrangement, 146–151, 153 simulator fidelity, 124 telesystems, 134 transfer in virtual environments and human performance overview, 67–71 spatial navigation, 72–77
medical simulation, 77–82 rehabilitation, 82–86 uninhabited aerial vehicles, 315, 320 user and limitations of telesystems, 135–136 Trajectories, spatial navigation displays design complexity, 376, 377, 378, 379 future constraints, 372, 373 Transfer, 68, 124 Transform rules, 374, 376–377 Transformation, 466 Translation, 292 Translational speed, 26 Transparency, 469 Transparent interfaces, 7–8, 477 Traveling direction, 30 Triangulation by pointing, 29–31 Triangulation by walking, 30–31, 38–39 Triangulation methods, 29–31 Trocar, 399, 403, 404, 407 Trolley, 404–405 Trust, 472 TTC, see Time to contact TTP, see Time to passage Tunnel display, 280 Tunnel geometry, 284–285 Tunnel-in-the-sky display cybernetic analysis, 280–284 dynamic optical cues, 290–294 future of the traffic management, 279–280 information in straight tunnel sections, 284–290 perception and control of rectilinear motion, 294–298 TVSS, see Tactile vision substitution system TWC, see Time-to-wall crossing Two-dimensional cues, 380 Two-dimensional digitizer, 349 Two-dimensional input, 348–349 Two-phase analysis, 471 Type hypothesis, 495, 511
U UAV, see Uninhabited aerial vehicles UCAV, see Uninhabited combat air vehicles Uchida–Kraepelin test, 420, 422, 426 Unanticipated variability, 447
Unburdening, see Aiding Uncertainty, 492 Underwater distortions, 144 Uninhabited aerial vehicles (UAV), 309–314 Uninhabited combat air vehicles (UCAV), 303 Unlearning/relearning, 153–156 Upright, 51 Urine samples, 420, 421, 425, 428 Usability constraints, 397 User-centered design, 13–14 User competence model, 478, 479 User complaints, 132–134, 142–146 User expectations, 131 User friendliness, 6, 7 User inexperience, 135–136 User knowledge, 476, see also Knowledge User needs, 8, 9 User plans, 200
V Vagueness, 99–100 Validation, 470 Validity, 253 Vanishing point, 287–288, 289, 290, 294, see also Tunnel-in-the-sky display Vection, 52–53, 118, 249–250 Velocity, 119, 379 Velocity cues, 383 Ventriloquism, 145 VEPs, see Visual evoked potentials Verbal report, 28, 31, 34–35, 37 Vergence, 228, 229 Vertical disparity functions, 228–234 global and interactions with local horizontal disparity, 238–240 stereopsis, 221, 222 Vertical displacement, 225 Vertical displacement cues, 289–290 Vertical–horizontal illusions, 27 Vertical-shear disparity characterization, 226–228 global and aniseikonia, 229 image misalignment, 234 limitations and distortions of stereopsis, 235, 236, 238, 241
Vertical-size disparity, 224, 225, 226, 235 Vertical-track error, 382 Vertical tunnel cues, 285–287 VEs, see Virtual environments Vestibular nuclei, 50 Vestibular symptoms, 427, 428 Vestibular systems, 115–116 Vestibulo-ocular reflex (VOR), 140–141, 151 Video images, 334, 335 Vieth–Müller circle, 222, 223, 231, 233, 234 Viewing volume, 377 Viewpoint parallax, 398, 400, 401, 404 Views, distorted, 271 VIMS, see Visually induced motion sickness Violence, 428 Virtual augmented interfaces, 310, 311 Virtual cues, 54 Virtual desktop systems, 27 Virtual environments (VEs) aspect of virtual reality, 325 definition, 3–11 devices and configural scoring, 263, 269 flight simulator comparison of effects and configural scoring, 270–272 operators of uninhabited aerial vehicles, 309–314 transfer of training, 70 medical simulations, 77–81 spatial navigation, 72–77 visual space perception of egocentric distance, 36–40 Virtual face–virtual face communication, 352–353 Virtual interface, 61, 63 Virtual interview, 422 Virtual Life Network (VLNET), 361 Virtual models, 149 Virtual reality, 92, 93–96, 243 Virtual reality engineering theory, 97 Virtual TV, 57 Viseme, 359, 363 Visible Human Project, 326, 327, 332 Visual adaptation, 144 Visual background, 52, 54 Visual capture, 145 Visual cues, 272, 377, 379–380, 384 Visual defect, 332–333 Visual distortions, 156–157 Visual evoked potentials (VEPs), 532
Visual fatigue, 427, see also Hyper Hospital system Visual feedback delay, 140–141 Visual-field flow, 50 Visual hallucinations, 22 Visual horizon, 102, 103 Visual icons, 508 Visual–inertial crossover, measuring presence, 57–61 advantages, 61–63 implications of rest frame hypothesis, 55 Visual resolution, 106–107, 131, 132 Visual search paradigm, 188, 189 Visual workload, 501, 502, see also Workload Visually directed action, 28, 29 Visually directed walking, 40 Visually induced motion sickness (VIMS), see also Motion sickness configural scoring of effects of virtual environments, 271–272 cybersickness, 261–262 historical background, 249–250 uninhabited aerial vehicles, 319 Visual–motor adaptation, 143 Visual–motor delay, 154 Visual–motor skill acquisition, 153 Visual–proprioceptive conflict, 145 Visual–vestibular rearrangements, 135 VLNET, see Virtual Life Network Vocational considerations, sensory substitution, 181–182 Voice Communications Effectiveness Test, 193 Volume visualization, 329 Vomiting, 251, see also Nausea VOR, see Vestibulo-ocular reflex
W Walls, transparency, 74, 77 Water loading, 420, 421 Waterfall effects, 173, 180 Water–glass–air interface, 156 Waveform method, 175–179 Wet resistive electrodes, 535 What if functionalities, 471 Wheelchair, motorized, 336 Wizard of Oz protocol, 479 Work domain, meaning-processing systems, 446, 449, 450 designing experts, 453, 455, 456 Workload automation, 463 pilot–vehicle interface prototype, 485, 489–490, 498, 499, 501, 508, 509, 513 psycho-electrophysiological based interactive interfaces, 533 speech recognition technology, 307 target detection/identification using auditory guided visual search, 192 Workload Assessment Monitor, 503 Worldview, 205
X X-ray vision, 334, 335
Z Zeltzer’s Autonomy–Interaction presence cube, 94, see also Presence Zero-disparity dots, 240, 241, 242