Advances in Computers
Volume 47
Advances in Computers

Edited by

MARVIN V. ZELKOWITZ
Department of Computer Science and Institute for Advanced Computer Studies
University of Maryland
College Park, Maryland

VOLUME 47

ACADEMIC PRESS
San Diego  London  Boston  New York  Sydney  Tokyo  Toronto
This book is printed on acid-free paper.

Copyright © 1998 by ACADEMIC PRESS

All Rights Reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher.

Academic Press
525 B Street, Suite 1900, San Diego, California 92101-4495, USA
http://www.apnet.com

Academic Press
24-28 Oval Road, London NW1 7DX, UK
http://www.hbuk.co.uk/ap/

ISBN 0-12-012147-6

A catalogue record for this book is available from the British Library

Typeset by Mathematical Composition Setters Ltd, Salisbury, UK
Printed in Great Britain by Redwood Books, Trowbridge, Wiltshire
98 99 00 01 02 03 RB 9 8 7 6 5 4 3 2 1
Contents

CONTRIBUTORS . . . ix
PREFACE . . . xiii

Natural Language Processing: a Human-Computer Interaction Perspective
Bill Manaris

1. Introduction . . . 2
2. The Field of Natural Language Processing . . . 4
3. Application Areas . . . 12
4. Linguistic Knowledge Models . . . 16
5. Knowledge and Processing Requirements . . . 26
6. Multimodal Interaction . . . 46
7. Conclusions . . . 55
   Acknowledgments . . . 57
   References and Further Reading . . . 58

Cognitive Adaptive Computer Help (COACH): A Case Study
Edwin J. (Ted) Selker

1. Introduction . . . 69
2. The COACH Scenario . . . 70
3. Review of Literature: On-line Computer Teaching . . . 76
4. Requirements for an Adaptive Help Testbed . . . 84
5. Technical Considerations for Creating COACH . . . 86
6. An Architecture for Adaptive User Help . . . 90
7. A COACH Shell . . . 108
8. Evaluation of COACH Adaptive User Help . . . 110
9. Development Status . . . 122
10. Future Research Goals . . . 131
    References and Further Reading . . . 137

Cellular Automata Models of Self-replicating Systems
James A. Reggia, Hui-Hsien Chou and Jason D. Lohn

1. Why Study Self-replicating Systems? . . . 142
2. Early Self-replicating Structures . . . 143
3. Self-replicating Loops . . . 150
4. Emergence of Self-replication . . . 160
5. Programming Self-replicating Loops . . . 173
6. Discussion . . . 178
   References and Further Reading . . . 180

Ultrasound Visualization
Thomas R. Nelson

1. Introduction to Ultrasound/Acoustic Imaging . . . 186
2. Ultrasound Image Formation . . . 189
3. Volume Visualization . . . 214
4. Summary . . . 240
   References and Further Reading . . . 244

Patterns and System Development
Brandon Goldfedder

1. What are Patterns? . . . 256
2. Analysis: What is a Pattern? . . . 256
3. An Example Pattern: HandsInView . . . 262
4. Okay-So What Does This Have to do with Software? . . . 263
5. Applying Patterns . . . 267
6. Beware: Misapplication of Patterns . . . 280
7. Reality Check . . . 280
8. Advantages of Patterns . . . 282
9. Applying Patterns in the Development Process . . . 283
10. Frameworks and Patterns . . . 284
11. Capturing Patterns . . . 285
12. Where Now? . . . 290
13. Concluding Remarks . . . 290
    Special Thanks . . . 291
    References and Further Reading . . . 291

High Performance Digital Video Servers: Storage and Retrieval of Compressed Scalable Video
Seungyup Paek and Shih-Fu Chang

1. Introduction . . . 294
2. Compressed MPEG Video . . . 295
3. Inter-disk Data Placement of Constant Bit Rate Video . . . 299
4. Buffer Replacement Algorithms . . . 309
5. Interval Caching . . . 312
6. Batching . . . 316
7. Retrieval Scheduling and Resource Reservations of Variable Bit Rate Video . . . 319
8. Conclusions . . . 337
   References and Further Reading . . . 338

Software Acquisition: The Custom/Package and Insource/Outsource Dimensions
Paul Nelson, Abraham Seidmann and William Richmond

1. Introduction . . . 342
2. The Software Acquisition Cost-Benefit Framework . . . 345
3. Hypotheses . . . 349
4. Alternative Models of the Software Acquisition Problem . . . 354
5. Data . . . 357
6. Analysis and Results . . . 358
7. Conclusions . . . 364
   References and Further Reading . . . 365

AUTHOR INDEX . . . 369
SUBJECT INDEX . . . 383
CONTENTS OF VOLUMES IN THIS SERIES . . . 391
Contributors

Shih-Fu Chang is an associate professor at the Department of Electrical Engineering and Center for Telecommunications Research at Columbia University. His current research interests include visual information systems, content-based visual query and digital video processing and communications. His group has developed several on-line prototypes of visual information systems, and he is co-leading the development of a Video-on-Demand/Multimedia Testbed at Columbia. He holds two US patents (with four pending) and serves on the editorial boards, the program committees, and in chair positions of several international journals and conferences. Professor Chang has received two best paper awards, as well as an NSF Career award (1995-98) and an IBM Faculty Development award (1995-98).

Hui-Hsien Chou got his Ph.D. in Computer Science from the University of Maryland at College Park, and his B.S. from the National Taiwan University. He is currently a Post-Doctoral Fellow at The Institute for Genomic Research (TIGR). His primary research interests are emergent behaviors modeled using cellular automata, computer graphics, artificial intelligence, and bioinformatics. He has published several research papers in these fields.

Brandon Goldfedder is the Vice President of Technical Services for Emerging Technologies Consultants, Inc (ETCI). He has trained hundreds of developers in Design Patterns as well as general Object Oriented and programming techniques. He is currently focused on distributed component based architectures. He has a degree in Electrical Engineering and an M.S. degree in Computer Science from the Johns Hopkins University. He currently resides in Reston, VA with his wife, Susan and overgrown puppy, Misty. He can be reached at:
[email protected].
Jason D. Lohn is a Research Scientist with Caelum Research Corporation at NASA Ames Research Center. Previously, he was a Visiting Scholar at Stanford University and worked at IBM Corporation. He received his M.S. and Ph.D. in Electrical Engineering from the University of Maryland, and his B.S. in Electrical Engineering from Lehigh University. In his research he has made contributions in self-replicating automata and parallel processing. His other research interests include genetic algorithms, evolvable hardware, and biological computation.
Bill Manaris is an assistant professor in the Computer Science Department at the University of Southwestern Louisiana. He holds a B.S. (1986), an M.S. (1988), and a Ph.D. (1990) in Computer Science. His research interests are in the areas of artificial intelligence, natural language processing, and human-computer interaction. His current work focuses on development methodologies for speech understanding interfaces using symbolic, statistical, and connectionist techniques. He can be reached at [email protected] and http://www.usl.edu/~manaris.
Paul Nelson is an associate professor of Marketing at the William E. Simon Graduate School of Business Administration at the University of Rochester. His current research interests include electronic shopping, brand and service design, multiattribute models of consumer behavior and sales force management.

Thomas Nelson is a professor in the Division of Physics, Department of Radiology at the University of California at San Diego. He obtained a B.A. and an M.S. degree at San Diego State University and a Ph.D. in medical physics from the University of California, Los Angeles. His principal areas of research are in medical visualization and imaging. This encompasses image analysis, mathematical modeling of function, volume visualization and systems integration. Most recently, his research group has been involved in three-dimensional ultrasound. Their primary research has focused on fetal imaging and they were the first group to demonstrate 3D ultrasound images of the beating fetal heart.

Seungyup Paek is a Ph.D. candidate and a graduate research assistant at the Department of Electrical Engineering and Center for Telecommunications Research at Columbia University. His current research interests include visual information systems, video servers, image classification, content-based visual query, and digital video processing and communications. He is currently involved in projects to develop on-line prototypes of visual information systems and is also involved in the development of a Video-on-Demand/Multimedia Testbed at Columbia University.

James A. Reggia is a Professor of Computer Science at the University of Maryland at College Park, with joint appointments in the Institute for Advanced Computer Studies and in the Department of Neurology of the School of Medicine. He received his Ph.D. in Computer Science from the University of Maryland, and also an MD degree. Dr Reggia's research interests are in the areas of neural computation, adaptive and/or self-organizing systems, self-replicating systems, evolutionary computation, and abductive reasoning based on causal networks. He has authored numerous research papers in these areas. He is a former NSF Presidential Young Investigator, and received a Distinguished Faculty Fellowship Award from the University in 1992.
William Richmond is an associate with Perot Systems Corporation in Vienna, Virginia. He is currently an engagement manager on a financial systems installation for a banking services company. Before joining Perot, William was an associate professor at George Mason University. Current research interests include outsourcing and organizational design for information technology service providers.

Abraham Seidmann is the Xerox Professor and Area Coordinator of Computers and Information Systems, Management Science and Operations Management at the William E. Simon Graduate School of Business Administration, University of Rochester. His current research interests include information economics, business process reengineering, workflow systems, operations management and strategic interorganizational information systems.

Ted Selker is an IBM Fellow with the IBM Almaden Research Center. He works on cognitive, graphical and physical interface. Ted is known for the design of the "TrackPoint III" in-keyboard pointing device with performance advantages derived from a special behavioral/motor-match algorithm, for creating the "COACH" adaptive agent that improves user performance (Warp Guides in OS/2), and for the design of the 755CV notebook computer that doubles as an LCD projector. Dr Selker obtained his B.S. from Brown University, his M.S. degree from University of Massachusetts and his Ph.D. degrees from City University of New York in Computer Science and Information Sciences and Applied Mathematics.
Preface

Advances in Computers is the longest running anthology chronicling the ever-changing landscape of the information technology industry. Published continuously since 1960, this series presents those topics that are currently making the greatest impact on computers and the users of those systems.

In this volume we present seven chapters. The first four have a common thread: the use of technology to take over tasks generally assumed to require human thinking, what is often called artificial intelligence. In these chapters we present natural language processing, visualization and self-replication as machine implementations of human activities. The other three chapters present other recent advances important to the information processing field.

In the first chapter, Bill Manaris reports on "Natural Language Processing: a Human-Computer Interaction Perspective." In this chapter Professor Manaris looks at the problems in communicating with a computer program in a natural language (e.g. English, French) rather than in arcane "computerspeak." The chapter surveys many of the developments in the field, from the famous ELIZA therapy program of Weizenbaum of the 1960s to today's sophisticated User Interface Management Systems (UIMSs).

In the second chapter, Ted Selker looks at the help computer technology can provide to aid users to understand the user interfaces discussed in Chapter 1. In "Cognitive Adaptive Computer Help (COACH): A Case Study," Dr Selker describes his COACH system as an example of an intelligent agent that gives effective responses in the interface between people and machines. The original COACH was tested as an aid in writing LISP programs, and the successor system COACH/2 includes a What You See Is What You Get (WYSIWYG) authoring tool and is being used to explore 3D animation.

Chapter 3, "Cellular Automata Models of Self-Replicating Systems" by James Reggia, Hui-Hsien Chou, and Jason Lohn, discusses the formalism behind one of the most important aspects of artificial (or natural) life. The ability to reproduce is a critical biological function. Can we develop artificial systems that reproduce? If so, what is the essential difference between those systems and biological ones? In this chapter, the authors discuss some of the formal rules that have been developed to discuss replication in the artificial, or robotic, domain.

Chapter 4, the last chapter related to artificial intelligence, "Ultrasound Visualization" by Dr Thomas Nelson, is an application of this technology in the medical domain. In the medical field, three technologies have been used to allow for non-invasive study of the human body: computer tomography
using X-ray film (CT scan), magnetic resonance imaging (MRI) and ultrasound. Ultrasound, because it has no known bioeffects and moderate costs, has an advantage over the other two technologies, but the resolution of the images is not nearly as good. This chapter discusses new approaches toward making ultrasound images that can be used to replace those other visualization techniques.

Chapter 5 presents an important trend in software design and development. In "Patterns and System Development," Brandon Goldfedder discusses the concept of design patterns. Object-oriented design has become a common, and often meaningless, term describing "good" software development. It is the application of such concepts as design patterns that makes object-oriented technology of importance.

In the next chapter, Seungyup Paek and Shih-Fu Chang in "High Performance Digital Video Servers: Storage and Retrieval of Compressed Scalable Video" look at possible solutions to a growing problem. With the increased use of the Internet and World Wide Web, billions of bits of data, often representing pictures and movies, are being transmitted world-wide. This places a significant load on the telecommunications channels needed to transport that information. In this chapter, the authors discuss methods whereby video images can be appropriately compressed and transmitted reliably and accurately at greatly decreased bandwidths.

In the final chapter, Paul Nelson, William Richmond and Abraham Seidmann in "Software Acquisition: The Custom/Package and Insource/Outsource Dimensions" look at a current trend in the computer industry. Previously, most software was "contract" software built for a specific application by the organization needing the software. Today there is a trend to "outsource" the product to an independent developer. In this chapter, the authors discuss a framework in which this acquisition decision can be evaluated: whether to build the product or outsource it to another organization.

I would like to thank all of the contributors for their time and effort in preparing their chapters for publication. Writing a chapter requires a significant investment in order to convey its message to the informed reader. If I have missed a topic that you would like to see in a future Advances, please let me know at
[email protected]. I hope you find this volume of use to you.
MARVIN V. ZELKOWITZ
Natural Language Processing: A Human-Computer Interaction Perspective

BILL MANARIS
Computer Science Department
University of Southwestern Louisiana
Lafayette, Louisiana
Abstract

Natural language processing has been in existence for more than fifty years. During this time, it has significantly contributed to the field of human-computer interaction in terms of theoretical results and practical applications. As computers continue to become more affordable and accessible, the importance of user interfaces that are effective, robust, unobtrusive, and user-friendly, regardless of user expertise or impediments, becomes more pronounced. Since natural language usually provides for effortless and effective communication in human-human interaction, its significance and potential in human-computer interaction should not be overlooked: either spoken or typewritten, it may effectively complement other available modalities [1], such as windows, icons, menus, and pointing; in some cases, such as users with disabilities, natural language may even be the only applicable modality. This chapter examines the field of natural language processing as it relates to human-computer interaction by focusing on its history, interactive application areas, theoretical approaches to linguistic modeling, and relevant computational and philosophical issues. It also presents a taxonomy for interactive natural language systems based on their linguistic knowledge and processing requirements, and reviews related applications. Finally, it discusses linguistic coverage issues, and explores the development of natural language widgets and their integration into multimodal user interfaces.
1. Introduction . . . 2
   1.1 Overview . . . 2
   1.2 Natural Language and User Interfaces . . . 3
2. The Field of Natural Language Processing . . . 4
   2.1 An Extended Definition . . . 4
   2.2 Historical Background . . . 7
3. Application Areas . . . 12
   3.1 Speech Understanding and Generation . . . 12
   3.2 Natural Language Interfaces . . . 13
   3.3 Discourse Management, Story Understanding, and Text Generation . . . 14
   3.4 Interactive Machine Translation . . . 14
   3.5 Intelligent Writing Assistants . . . 15
4. Linguistic Knowledge Models . . . 16
   4.1 Symbolic Approach . . . 17
   4.2 Stochastic Approach . . . 19
   4.3 Connectionist Approach . . . 22
   4.4 Hybrid Approach . . . 24
5. Knowledge and Processing Requirements . . . 26
   5.1 Computational Issues . . . 27
   5.2 Understanding Natural Language . . . 27
   5.3 Natural Language Knowledge Levels . . . 29
   5.4 Classification of NLP Systems . . . 30
   5.5 Problem Areas . . . 41
   5.6 Linguistic Coverage . . . 42
6. Multimodal Interaction . . . 46
   6.1 Effects of Natural Language on User Performance . . . 47
   6.2 Natural Language Widgets . . . 48
   6.3 Modality Integration . . . 49
   6.4 User Interface Management Systems . . . 51
   6.5 Development Methodologies . . . 53
7. Conclusions . . . 55
Acknowledgments . . . 57
References and Further Reading . . . 58

[1] A modality is defined as a communication channel used to convey or acquire information (Coutaz and Caelen, 1991).
1. Introduction
The field of natural language processing (NLP) originated approximately five decades ago with machine translation systems. In 1946, Warren Weaver and Andrew Donald Booth discussed the technical feasibility of machine translation "by means of the techniques developed during World War II for the breaking of enemy codes" (Booth and Locke, 1955, p. 2). During the more than fifty years of its existence, the field has evolved from the dictionary-based machine translation systems of the fifties to the more adaptable, robust, and user-friendly NLP environments of the nineties. This evolution has been marked by periods of considerable growth and funding "prosperity," followed by years of intense criticism and lack of funding. This article attempts to provide an overview of this field by focusing on its history, current trends, some important theories and applications, and the state of the art as it relates to human-computer interaction (HCI).
1.1 Overview
Currently, the field of NLP includes a wide variety of linguistic theories, cognitive models, and engineering approaches. Although unrestricted NLP is still
a very complex problem (and, according to some, an AI-complete problem [2]), numerous successful systems exist for restricted domains of discourse. In the context of HCI, NLP applications range from various speech recognition systems to natural language interfaces to database, expert, and operating systems, to a multitude of machine translation systems. Currently, interactive applications may be classified into the following categories (Manaris and Slator, 1996; Obermeier, 1989):

- speech recognition/understanding and synthesis/generation;
- natural language interfaces;
- discourse management, story understanding, and text generation;
- interactive machine translation;
- intelligent writing assistants.

[2] Similarly to the concept of NP-complete problems, the term AI-complete has been used to describe problems that can only be solved if a solution to the "general" AI problem has been discovered (Carbonell, 1996). Some argue that such problems are unsolvable (Dreyfus, 1993); others suggest that even partial solutions can be beneficial from both humanistic and economic perspectives (Mostow et al., 1994; Reddy, 1996).
This article is organized as follows. Initially, it addresses the relationship between natural language processing and human-computer interaction, as well as the potential in the merging of the two fields. It briefly examines the field of linguistics and attempts to provide a concise, yet complete definition for NLP in the context of HCI. It then takes a historical journey through major advances and setbacks in the evolution of the field; specifically, it looks at the early machine translation years, the ALPAC report and its impact, and the field's three major evolutionary phases. Next it examines application areas and gives examples of research and production systems. It presents significant approaches in linguistic modeling, namely symbolic, stochastic, connectionist, and hybrid. Then it attempts to specify a classification for NLP systems in HCI based on depth of linguistic analysis. For each of the categories, it presents examples of related research and development efforts. It discusses linguistic coverage issues, and examines methodologies for developing constrained linguistic models with respect to application domains. Next it looks at natural language as one of the many possible modalities at the user interface and presents multimodal integration issues. Finally, it discusses user interface management systems and development methodologies for effective human-computer interaction with natural language.
1.2 Natural Language and User Interfaces
As the use of computers is expanding throughout society and affecting various aspects of human life, the number and heterogeneity of computer users is
dramatically increasing. Many of these users are not computer experts; they are experts in other fields who view the computer as a necessary tool for accomplishing their tasks (Day and Boyce, 1993). Consequently, the user-friendliness and robustness of interactive computer systems is becoming increasingly essential to user acceptability and overall system performance. One of the goals of HCI is the development of systems which complement (and possibly augment) the physical, perceptual, and cognitive capabilities of users. The field of NLP offers mechanisms for incorporating natural language knowledge and modalities into user interfaces. As NLP tools are becoming more powerful in terms of functionality and communicative capabilities, their contribution to HCI is becoming more significant.

   Speech is the ultimate, ubiquitous interface. It is how we should be able to interact with computers. The question is when should it begin supplementing the keyboard and mouse? We think the time is now. [3]
As the relationship between NLP tools and HCI matures, more sophisticated systems are emerging: systems which combine intelligent components with various communicative modalities, such as speech input and output, non-speech audio, graphics, video, virtual environments, and telepresence. As the two fields become more integrated, new developments will make it possible for humans to communicate with machines which emulate many aspects of human-human interaction. These new interfaces will transform computers from machines which are visible and attention-grabbing, to tools that are transparent and embedded; these tools will be so natural to use that they may become part of the context in such a way that the user may be able to focus on the task and not the actual tool (Weiser, 1994; Winograd and Flores, 1986, p. 164).
2. The Field of Natural Language Processing

2.1 An Extended Definition
This section will attempt to answer the question “what is NLP?” As it turns out this is a difficult question to answer as “there are almost as many definitions of NLP as there are researchers studying it” (Obermeier, 1989, p. 9). This is due to the fact that from a linguistics point of view there are many aspects in the study of language; these aspects range from speech sounds (phonetics), to sound structure (phonology), to word structure (morphology), to sentence structure (syntax), to meaning and denotation (semantics), to styles and dialects (language variation), to Bruce Armstrong, Manager of the Speech Technologies Group, Wordperfect, The Novel1 Applications Group, as quoted in Markowitz (1996).
NATURAL LANGUAGE PROCESSING
5
language evolution, to language use and communication (pragmatics), to speech production and comprehension (psychology of language), to language acquisition, and to language and the brain (neurolinguistics) (Akmajian et al., 1990). Moreover, there are other disciplines that contribute to NLP; for example philosophy focuses on questions such as “what is the nature of meaning?” and “how do words and sentences acquire meaning?” (Allen, 1994a).
2.1.1
Linguistics
In order to derive a definition for NLP, one could focus momentarily on the nature of linguistics. Linguistics includes the study of language from prescriptive, comparative, structural, and generative points of view. This study was started at least two thousand years ago by the Sanskrit grammarians (Winograd, 1983); yet, after all this time, we still have not made much progress in undtxstanding, explaining, and modeling-the latter being an essential aspect in developing computational entities-this aspect of human existence. In order to realize the difficulty of the endeavor, as well as the field’s fascinating appeal, one could attempt to answer the highly related questions “what is the nature of language?” and “how does communication work?” Some might agree that these questions are equivalent (at least in complexity) to “what is the nature of intelligence?” The point being made is that such questions may be simply too broad to answer. Yet, it is such questions that fields like linguistics, natural language processing, human-computer interaction, and artificial intelligence are addressing. Although it is quite probable that such questions will never be completely a n ~ w e r e dit, ~is through this process of self-study that we have made intriguing discoveiries and achieved some significant results. The resultant techniques have found their way into commercial and industrial applications, and are quietly changing our lives (Munakata, 1994).
2.1.2
Motivations, Definition, and Scope
There are two motivations for NLP, one scientific and one technological (Allen, 1994a). The scientific motivation is to understand the nature of language. Other traditional disciplines, such as linguistics, psycholinguistics, and philosophy, do not have tools to evaluate extensive theories and models of language comprehension and production. It is only through the tools provided by computer This idea is related to the self-modeling paradox effectively captured by the aphonsm: “If’ the mind was so simple that we could understand it, we would be so simple that we couldn’t.’’ Or, as Ilofstadter (1979) puts it, “[t]o seek self-knowledge is to embark on a journey which ... will always Ix incomplete, cannot he charted on any map, will never halt. [and] cannot be described” (p. 697).
6
BILL MANARIS
science that one may construct implementations of such theories and models. These implementations are indispensable in exploring the significance and improving the accuracy (through iterative refinement) of the original theories and models. The technological motivation is to improve communication between humans and machines. Computers equipped with effective natural language models and processes could access all human knowledge recorded in linguistic form; considering the revolution in information dissemination and communication infrastructure that has been introduced by the World-Wide-Web, one could easily see the importance and potential of such systems. User interfaces with natural language modalities (either input or output, spoken or typewritten) would enhance human-computer interaction by facilitating access to computers by unsophisticated computer users, users in hands-busy/eyes-busy situations (such as car driving, space walking, and air traffic control tasks), and users with disabilities. Actually, the development of this technology for the latter group is motivated by US federal legislation and guidelines, such as (a) the US Public Laws 99-506 and 100-542 which mandate the establishment of accessible environments to citizens with disabilities, (b) the 1989 US General Services Administration’s guide, Managing End User Computing for Users with Disabilities, which describes accommodations for disabled computer users (Shneiderman, 1993), and (c) the 1996 US Telecommunication Act. In this context, it does not matter how closely the model captures the complexity of natural language communication; it only matters that the resultant tool performs satisfactorily in a given domain of discourse, or complements/outperforms any alternative solutions. This article adheres to this perspective in presenting and discussing various NLP theories, models, and applications. In this context, and given the state-of-the-art, NLP could be defined as the discipline that studies the lirzguistic aspects of human-human and human-machine communication, develops models of linguistic competence and performance,’ employs computational frameworks to implement processes incorporating such models, identifies methodologies for iterative refinement of such processes/models, and investigates techniques for evaluating the resultant systems. NLP is an interdisciplinary area based on many fields of study. These fields include computer science, which provides techniques for model representation, and algorithm design and implementation; linguistics, which identifies linguistic models and processes; mathematics, which contributes formal models and methods; psychology, which studies models and theories of human behavior; philosophy, which provides theories and questions regarding the underlying Chomsky (1965) defines competence as the linguistic knowledge of fluent speakers of a language, and perjonizance as the actual production and comprehension of language by such speakers (Akmajian et a[., 1990).
NATURAL LANGUAGE PROCESSING
7
principles of thought, linguistic knowledge, and phenomena; statistics, which provides techniques for predicting events based on sample data; electrical engineering, which contributes information theory and techniques for signal processing; and biology, which explores the underlying architecture of linguistic processes in the brain.
2.2
Historical Background
This section discusses major milestones in the history of NLP related to human-computer interaction. The information included in this section is somewhat fragmented, due to the non-historical focus of the article, and possibly biased, due to the author’s personal path through NLP; the reader is encouraged to follow the references for a more complete historical overview.
2.2.1 The Beginning-Machine
Translation
A common misconception is that NLP research started in the late 1960s or early 1970s. Although there was a large body of work that was performed during this period, especially within the symbolic paradigm,‘ research in NLP actually began in the late 1940s with early studies and computational implementations of systems attempting to perform mechanical translations of Soviet physic:j papers into English (Bar-Hillel, 1960; Slocum, 1986). Specifically, one of i:he first machine translation (MT) projects began in 1946 at the Department of Numerical Automation, Birkbeck College, London, as Warren Weaver and Andrew Donald Booth began work on computational translation based on expertise in breaking encryption schemes using computing devices (Bar-Hillel, 1960; Booth and Locke, 1955; Obermeier, 1989). Specifically, Weaver (1955, p. 18) states: When I look at an article in Russian, I say “This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode it”. The work of the Birkbeck group continued until the mid-1960s and made significant contributions to other NLP areas such as preparation of word-indices and concordances (Josselson, 1971). MT research in the US started in 1949 at the University of Washington to be closely followed by The RAND Corporation (1950), MIT (1951), Georgetown University (1952), and Harvard University (1953). As far as other countries are concerned, the USSR joined the MT research effort in 1955, Italy in 1958, and Israel in 1958. As of April 1, 1959, there were eleven research groups in the US, seven in the USSR, and four in other countries employing an estimated 220 full-time researchers and developers (Bar-Hillel, 1960). The symbolic paradigm is covered in Section 4.1.
8
BILL MANARIS
2.2.2
The Fall-ALPAC Report
During the first years of MT research, and as considerable progress was being made towards fully-automatic, high-quality translation (FAHQT), even many skeptics were convinced that this goal was indeed attainable, and an operational system was “just around the comer.” Significant problems were being solved in quick succession, thus creating the illusion that remaining problems would also be solved quickly. This was not the case, however, as the few remaining problems were the hardest ones to solve (Bar-Hillel, 1960). Consequently, during the 1960s, disillusionment began setting in, as it became apparent that the original enthusiasm and promises for the achievement of FAHQT were unrealistic. Specifically, in 1966 the US National Academy of Sciences sponsored the Automatic Language Processing Advisory Council (ALPAC) Report which, in essence, condemned the MT field, and eventually resulted in the termination of funding for all MT projects in the US. The ALPAC report was criticized by many as narrow, biased, short-sighted, and even as based on inferior analytical work, obsolete and invalid facts, distortion of estimates in favor of human translation, and concealment of data (Josselson, 1971; Slocum, 1986).Perhaps one of the better, and more far-sighted criticisms was the following: [I]t seems premature to abandon support of machine translation after only 12 brief years, especially if the abandonment is based only on the findings of the ALPAC committee. Even the critics of machine translation admit its contribution to the knowledge of linguists. Who can say what contributions lie ahead? (Titus, 1967, p. 191) Although the ALPAC report was a major setback in MT, in specific, and NLP in general, its effects were only temporary. The contributions of early MT research to NLP were significant and long lasting in that they identified new problems for linguists and computer scientists, and thus laid the foundation for subsequent NLP research. This is clearly evident from the areas of emphasis of the pre-ALPAC conferences of federally-sponsored MT groups (Josselson, 1971):
Princeton, 1960: Dictionary design, programming strategies, compatibility/ portability of materials, code, and data formats of research groups. 0 Georgetown, 1961: Grammar coding, automatic conversion of coded materials, investigations of Russian grammar. 0 Princeton, 1962: Theoretical models for syntactic analysis, consideration of syntax problems in Russian, Arabic, Chinese, and Japanese. 0 Las Vegas, 1965: Semantic analysis. 0
This climate of cooperation contributed to the founding of the Association of Computational Linguistics (ACL)7 in 1962, to the organization of subsequent Originally, Association for Machine Translation and Computational Linguistics (AMTCL)
NATURAL LANGUAGE PROCESSING
9
meetings in the US and elsewhere, and to considerable growth in NLP research and development in both academia and industry throughout the world (Josselson, 1971; Obermeier, 1989; Slocum, 1986). During these fifty years of development, NLP research has been in a state of constant flux due to shifting goals and objectives, end-user expectations, technical and theoretical advances, predictions on the potential of hypothesisftheoryftechnique, and heated debates on the ability of cornputing devices to model certain linguistic phenomena. Possibly, the history of NLP can be seen as evolving through three phases, namely the engineering phase, the theoretical phase, and the user-centered phase. The borderlines among these phases are fuzzy and sometimes overlap. These phases are examined in the next sections.
2.2.3 Engineering Phase This first phase in NLP started in the mid-1940s and lasted until the early 1960s. It is characterized by an emphasis on algorithms relying mostly or1 dictionary-lookup techniques used in conjunction with empirical and stochastic methods. Due to the success of the latter in other fields, such as engineering and psychology, such methods were used for constructing linguistic models using corpora of natural language material. These models classified words not only based on meaning, but also on their co-occurrence with other words. They had been heavily influenced by Shannon’s work on information theory (Slhannon, 1948). These techniques were applied to MT systems with considerable early results, thus causing an overestimation of the techniques’ promise for unconstrained NLP (Bar-Hillel, 1960; Church and Mercer, 1993). During this phase, other NLP application areas began to emerge. An example is speech recognition, employing speaker-dependent, template-based architectures for digit recognition and limited phoneme classification (Fatehchand, 1960). Other events, which were unrelated to NLP in this phase, but which would considerably influence the field in subsequent phases included (a) the organization of the Dartmouth Conference in 1956, which introduced the term “artificial intelligence”-a field that would produce many NLP-shaping results, and (b) the publication of seminal works in connectionism, namely McCulloch and Pitts’ paper on formal neurons (1943), Hebb’s work on cell assemblies and synapses (1949), and Rosenblatt’s paper on perceptrons (1958) (Cowan and Sharp, 1988). Finally, Noam Chomsky’s work on the structure of language set the stage for the next NLP phase through its criticism of stochastic techniques, such as n-grams in language modeling, and its introduction of a significant theoretical framework for the analysis and generation of language (Chomsky, 1956, 1957, 1959).
10
BILL MANARIS
2.2.4
Theoretic Phase
The second phase in NLP spanned approximately from the early 1960s until the late 1980s. It is characterized by (a) a strong emphasis on theoretical topics including grammatical, logical, semantic, and pragmatic theories, (b) the construction of “toy” systems that demonstrated particular principles, and (c) the development of an NLP industry which commercialized many of this phase’s theoretical results. In terms of application foci, this phase can possibly be subdivided into three eras characterized by the early question-answering type systems and database interfaces (1960s), the broadening of the application domain to include interfaces to other interactive systems (1970s), and the commercialization of NLP research (1980s) (Bates, 1994; Obermeier, 1989). According to Grosz, et al. (1986), significant early work in this phase includes: 0
0
a syntactic parser able to effectively handle multiple analyses of syntactically ambiguous sentences (Kuno and Oettinger, 1963), and the BASEBALL question answering system which incorporated coding of words by attribute-value pairs-a technique used throughout this phase, a separate syntactic analysis, and constrained linguistic domain coverage (Green et al., 1963).
Following these systems, researchers focused on the sentence and its meaning. This was done either in isolation, with respect to the human-computer dialog, or in the immediate (preceding) context. This work laid the foundation for domainspecific applications, and commercial systems which emerged towards the end of this phase. In the late 1970s, attention shifted to semantic issues, discourse phenomena, communicative goals and plans, and user models. In 1975, in their ACM Turing Award lecture, Allen Newell and Herbert Simon formalized the physical symbol hypothesis which laid the foundation for symbolic NLP in specific, and symbolic A1 in general (see Section 4.1) (Newell and Simon, 1976). Many systems were developed to demonstrate the effectiveness of various theories in addressing various linguistic issues (see Section 5.4). Landmark systems include ELIZA (Weizenbaum, 1966), SHRDLU (Winograd, 1972), REL (Thompson and Thompson, 1975), LUNAR (Woods, 1973), SOPHIE (Brown and Burton, 1975), LIFER (Hendrix et al., 1978), to name a few. Thousands of publications were produced to describe theories, incremental results, and resultant systems; Gazdar et al. (1 987), provides a partial listing (1764) of such references appearing between 1980 and 1987. During this phase, it became clear that isolated solutions to NLP problems did not scale up well, when attempts were made to widen their linguistic coverage, or apply them to a different domain (Grosz et al., 1986; Jacobs, 1994). This realization motivated the re-examination of non-symbolic approaches which had been
NATURAL LANGUAGE PROCESSING
11
abandoned earlier by the NLP mainstream. In the stochastic arena, Baum and his colleagues developed the basic theory of hidden Markov models (HMMs), Viterbi devised an algorithm which could be applied in estimating probabilities for phonetic, lexical and other linguistic classes, and the Brown and TIMIT corpora were constructed and made available to the research community (.411en, 1994a; Church and Mercer, 1993; Rabiner and Juang, 1993). In 1985, Church used a stochastic technique based on letter trigram modeling to identify the national origin of proper names for pronunciation purposes in the context of textto-speech systems (Church, 1985). The success of this application reintroduced stochastic techniques to the mainstream of traditional NLP (Marcus, 1994) Earlier, in the connectionist arena, a milestone publication by Minsky and Papert (1969) showed that perceptrons may be less powerful than a universal Turing machine. This result inhibited progress in training algorithms for neural networks, as a way to represent linguistic and other information, for many years. Eventually, significant contributions by several researchers, such as Grossberg, Kohonen, Hopfield, and Rumelhart revitalized the significance of connectionism and set the stage for its influence on NLP. An intriguing application of this period, which demonstrated the utility of neural networks in NLP, is NETtalk. This is a system that performed text-to-speech conversion in English and was capable of handling some coarticulation effects. It worked surprisingly well, but could not handle syntactic/semantic ambiguities effectively (Clark, 1993; Cowan and Sharp, 1988; Obermeier, 1989). By the end of this phase, it had been shown that symbolic, stochastic, and connectionist approaches could address many significant problems in NLP. Moreover, stochastic and connectionist approaches had recovered from earlier criticism and were shown to be complementary in many respects to the mainstream symbolic approach. The results of this era coupled with concepts from HCI research set the stage for the final, user-centered phase.
2.2.5 User-Centered Phase In the late 1980s, NLP entered the current empirical, evaluative, “user-centered” phase. Major advances and tangible results from the last fifty years of NLP research are being reinvestigated and applied to a wide spectrum of tasks where NLP can be applied. These tasks require “real-life” models of linguistic knowledge, as opposed to the models incorporated in earlier “toy” systems. Consequently, many successful NLP applications are emerging including spelling checkers, grammar checkers, and speaker-independent, continuous-speech recognizers for various computer and telephony applications. During this phase, the field of human-computer interaction enters the mainstream of computer science. This is a result of the major advances in graphical user interfaces during the 1980s and early 1990s, the proliferation of computers, and the
12
BILL MANARIS
World-Wide-Web (Manaris and Slator, 1996). Accordingly, the evolution of NLP concerns and objectives is reflected in the continued growth of research and development efforts directed towards performance support, user-centered design, and usability testing. Emphasis is placed on (a) systems that integrate speech recognition and traditional natural language processing models, and (b) hybrid systemssystems that combine results from symbolic, stochastic, and connectionist NLP research (see Section 4.4). Developers are focusing on user interface development methodologies and user interface management systems (see Section 6.4). Due to the “user-centeredness”of this phase, theories as well as applications are being judged by their ability to successfully compete in structured evaluations or in the marketplace (Chinchor and Sundheim, 1993; Hirschman and Cuomo, 1994; Moore, 1994b; Pallett et al., 1994; Spark Jones, 1994). An emerging, promising methodology for NLP system development is the Star Model (see Section 6.5). This has been a popular development methodology in human-computer interaction, and it is now becoming relevant to NLP research, due to the emphasis on rapid-prototyping and incremental development-necessitated by the current industry focus on speech understanding and other interactive NLP applications geared towards the everyday user.
3. Application Areas Important NLP application areas include speech understanding and generation systems; natural language interfaces; discourse management, story understanding and text generation; interactive machine translation; and intelligent writing assistants (Bates, 1994; Church and Rau, 1995; Manaris and Slator, 1996; Obermeier, 1989). These areas are examined in the following sections.
3.1
Speech Understanding and Generation
The goal of speech recognitioiz systems is to convert spoken words captured through a microphone to a written representation. Speech understanding systems, on the other hand, attempt to perform a more extensive (semantic, pragmatic) processing of the spoken utterance to “understand” what the user is saying, and act on what is being said-possibly by executing some command in an underlying system such as a database, or modifying their particular knowledge of the world. Major issues in this area include speaker independence vs. dependence, continuous vs. discrete speech, complexity of the linguistic model, and handling of environment noise (Markowitz, 1996). Speech generation or synthesis systems deal with the opposite problem, namely to convert written representations of words to sounds. Compared to speech understanding, speech generation is considered by some to be a solved
NATURAL LANGUAGE PROCESSING
13
problem. This is because there exist several imperfect, yet effective speech synthesizers for many application domains and for a number of languages including English (American and British), Japanese, and Swedish (Kay et al., 1994). Major techniques utilized for speech synthesis include 0
0
0
concatenation of digital recordings, as in the output produced bby US telephone directory assistance systems; synthesis by rule, where sounds are being generated electronically through the utilization of a grammar providing information on tone, intonation, and phonetic coarticulation effects; training of connectionist architectures, as in the NETtalk system mentioned in Section 2.2.4 (Priece et al., 1994).
An interesting example of a system which combines speech recognition with speech generation is Emily (Mostow et al., 1994). Emily is an experimental speech understanding system that acts as a reading coach for children. It provides passages for reading, and listens making corrections whenever necessary-for instance, it ignores minor mistakes such as false starts, or repeated words. Reddy (1996) estimates that this project could save US taxpayers over $45 million if it could reduce illiteracy in the US by as little as 20%. Examples of speech processing (recognition, understanding, and generation) systems that have been marketed include Apple’s Plain Talk, BBN’s Hark, Decipher, DECtalk, DragonDictate, IBM VoiceType, Kurzweil Voice, ]Listen, NaturallySpeaking, Phonetic Engine (Meisel, 1993; Obermeier, 1989; Rash. 1994; Scott, 1996).
3.2 Natural Language Interfaces When examining the evolution of software systems, we observe a definite transition from the languages understood by the underlying hardware, i.e. languages based on binary alphabets, to the natural languages of humans (Feigeribaum, 1996; Firebaugh, 1988). In terms of programming languages, this transition is manifested as a shift from machine languages to assembly languages, to highlevel languages, up to non-procedural languages (also known as fourth generation languages). The goal of natural language interfaces is to bridge the gap between the linguistic performance of the user and the linguistic “competence” of the underlying computer system. These systems deal with typewritten as opposed to spoken language. They usually perform deeper linguistic analysis than tradlitional speech recognizers. Applications have been built for various domains including operating systems, databases, text editors, spreadsheets, and Internet navigation and re source location (Manaris, 1994; Manaris and Slator, 1996). Section 5.4 discusses several of these applications.
14
BILL MANARIS
Examples of natural language interfaces that have been marketed include Battelle’s Natural Language Query, BBN’s Parlance, EasyTalk, English Query Language, INTELLECT, Intelligent Query, Language Craft, Natural Language, Symantec’s Q + A Intelligent Assistant, and Texas Lnstrument’s Natural Access (Church and Rau, 1995; Obermeier, 1989).
3.3
Discourse Management, Story Understanding, and Text Generation
The objective of discourse management and story understanding systems is to process natural language input to elicit significant facts or extract the essence of what is being said. These systems require access to linguistic knowledge ranging from lexical to, possibly, world knowledge relevant to the domain of discourse (see Section 5.3). Specific applications range from indexing (text segmentation and classification), to summarization (“gisting”), to retrieval (natural language search engines, data mining), to question-and-answer dialogs. In order to perform these tasks such systems may incorporate text generation components. These components utilize the collected linguistic knowledge to generate various forms of text, such as news summaries (skimming), and special documents. Given the latest advances in integration between speech and traditional natural language processing techniques, it should be straightforward to extend such systems to incorporate spoken input and output. One example of a discourse management application that incorporates a text generation component is the patent authoring system designed by Sheremetyeva and Nirenburg (1996). This system is intended to interactively elicit technical knowledge from inventors, and then use it to automatically generate a patent claim that meets legal requirements. Another example is the system developed by Moore and Mittal (1996) which allows users to ask follow-up questions on system-generated texts. Specifically, users can highlight portions of the narrative available at the interface and, in response, the system will identify a set of followup questions that it is capable of handling. (Also, see the discussion on the DISCERN system in Section 4.4.) Examples of other systems in this area (many of which have been marketed) include ATRANS, Battelle’s READ, BORIS, Clarit, Conquest, Construe, Freestyle, FRUMP, GROK, J-Space, IPP, Oracle’s ConText, Savvy/TRS, SCISOR, Target, Tome, and Westlaw’s WIN (Church and Rau, 1995; Obermeier, 1989).
3.4 Interactive Machine Translation
This is the earliest area involving NLP in human-computer interaction. The goal of machine translation systems is to map from a source language
representation to target language representation(s). Although no MT system can handle unconstrained natural language, there exist many success stories in well-defined sublanguages. In many cases, pre-editing of the input and post-editing of the output may be required (Hovy, 1993; Kay et al., 1994). However, even in such cases MT systems can be very useful. This is illustrated by the fact that in 1975, while all US government-funded MT projects had been canceled following the recommendations of the ALPAC report, "[p]aradoxically, MT systems were still being used by various government agencies in the U.S. and abroad, because there was simply no alternative means of gathering information from foreign (Russian) sources so quickly. In addition, private companies were developing and selling (mostly outside the U.S.) MT systems based on the mid-1960s technology" (Slocum, 1986, p. 969).
As of 1994, the available languages for MT include Arabic, Danish, Dutch, English, Finnish, French, German, Greek, Italian, Japanese, Korean, Portuguese, Russian, Spanish, and Swedish (Kay et al., 1994; Miller, 1993). An example of a system that uses a well-defined sublanguage is TAUM-METEO, a fully automatic machine translation system for translating weather reports from English to French. TAUM-METEO has been used extensively in Canada. It encapsulates a sublanguage model, which became stable in 1981. It requires almost no pre-editing of its input, or post-editing of its output. TAUM-METEO has been translating about 54,000 words per day for over a decade (Church and Rau, 1995; Obermeier, 1989). (Also, see the discussion of EUROTRA in Section 5.4.) Examples of systems that have been marketed include Fujitsu's ATLAS I and II, Globalink's GTS, Hitachi's HICATS, Intergraph's DP/Translator, Logos, Language Assistant Series, Siemens Nixdorf's METAL, Sanyo's SWP-7800 Translation Word Processor, Smart Translators, Socrata's XLT, Toltran's Professional Translation Series, Toshiba's AS-TRANSACT, and Tovna MTS (Church and Rau, 1995; Kay et al., 1994; Miller, 1993; Obermeier, 1989). In addition to translation of written texts, research has been directed at interactive translation of spoken utterances. One recent example of such a project is Janus-II, discussed in Section 5.4.4.2 (Waibel, 1996). Another example is Verbmobil, a system designed as a portable simultaneous interpretation machine. Its goal is to mediate a dialog between two people interacting in real time using different languages, possibly over the telephone (Alexandersson et al., 1997; Kay et al., 1994).
3.5 Intelligent Writing Assistants
Another area where NLP techniques have been successfully employed is in providing "intelligent" support for document preparation. Examples of
applications range from spell checking agents, to hyphenation routines, to intelligent spacing/formatting/text-selecting agents, to grammar checkers, to style checkers providing readability statistics, to electronic thesauri, to automated document creation/maintenance environments, to translator-support environments. Some of these applications are so well understood that many users do not consider them applications of NLP; examples include word processing, simple and approximate string matching, keyword search, and glossary look-up (Church and Rau, 1995; Hall and Dowling, 1980).* Other applications are still in an early marketability stage since their linguistic models and associated algorithms are still under development; examples include the grammar/style checkers available independently or bundled with word processing packages (Church and Rau, 1995; Obermeier, 1989). An interesting example in this category is the Drafter system (Paris and Vander Linden, 1996). Drafter is an interactive document drafting tool. It assists technical writers with the task of managing draft, final, and updated versions of manuals in several languages. The approach is different from traditional MT, in that the system maintains a knowledge base of the concepts to be included in the manual in a language-independent form. Consequently, the translation is not dictated by the original text's language and style. Similarly to other intelligent writing assistants, such as spell-checking agents, Drafter is not a fully automatic tool, in that it relies on the user (technical writer) to provide information about the domain of discourse. This tool supports knowledge reuse, propagation of changes throughout documents, simultaneous production of drafts in several languages, accurate and consistently used terminology, and production of stylistic variants of documents. Given the success of this area of NLP, the market is flooded with a wide spectrum of independent and embedded systems. Examples of marketed products include popular word processors, such as MS Word, WordPerfect, and FrameMaker, grammar checkers, such as Grammatik, and translator workbenches such as the Eurolang Optimizer, IBM's TranslationManager/2, and Trados' Translation Workbench (Church and Rau, 1995; Obermeier, 1989).
* This might be viewed as another instance of the "AI losing its best children to computer science" phenomenon.
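As a concrete illustration of the approximate string matching that underlies many spell checking agents, the following minimal sketch (in Python, not taken from any of the products mentioned above) computes the classic dynamic-programming edit distance that such tools may use to rank candidate corrections:

    def edit_distance(a: str, b: str) -> int:
        """Minimum number of insertions, deletions, and substitutions
        needed to turn string a into string b (Levenshtein distance)."""
        prev = list(range(len(b) + 1))            # distances from the empty prefix of a
        for i, ca in enumerate(a, start=1):
            curr = [i]                            # deleting the first i characters of a
            for j, cb in enumerate(b, start=1):
                cost = 0 if ca == cb else 1
                curr.append(min(prev[j] + 1,      # deletion
                                curr[j - 1] + 1,  # insertion
                                prev[j - 1] + cost))  # substitution (or match)
            prev = curr
        return prev[-1]

    # A spell checker might suggest the dictionary word closest to a typo:
    words = ["grammar", "grammatical", "pragmatic"]
    print(min(words, key=lambda w: edit_distance("gramar", w)))   # -> grammar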
4. Linguistic Knowledge Models
Linguistic knowledge models may be classified along four basic categories, namely symbolic (knowledge-based), stochastic (probabilistic), connectionist, and hybrid approaches. Each approach has advantages and disadvantages which make it more or less suitable for addressing specific segments of the NLP problem space. In general, regardless of the approach, development of "complete,"
stable linguistic models is still a very difficult problem, especially for certain application areas such as speech understanding interfaces (Manaris and Harkreader, 1997). For this reason, investigation and adaptation of effective development methodologies from the area of human-computer interaction are becoming essential (see Sections 6.4 and 6.5). The symbolic approach is based on explicit representation of facts about language through well-understood knowledge representation schemes and associated algorithms. The stochastic approach employs various probabilistic techniques to develop approximate generalized models of linguistic phenomena based on actual examples of these phenomena. The connectionist approach also uses examples of linguistic phenomena to develop generalized models, but since connectionist architectures are less constrained than stochastic ones, such linguistic models are harder to develop (Kay et al., 1994). Finally, hybrid approaches explore different variations of compound architectures and linguistic models, attempting to use the best approach (symbolic, stochastic, or connectionist) for a given modeling subproblem in an application. The following sections examine each of these approaches in terms of their foundations, major research results, and their respective strengths and weaknesses.
4.1 Symbolic Approach
The symbolic approach in NLP is best formulated by the physical symbol system hypothesis (Newell and Simon, 1976), although it originated in the late 1950s (see Section 2.2.3). This hypothesis states that intelligent behavior can be modeled using a physical symbol system consisting of physical patterns (symbols) which may be used to construct expressions (symbol structures); additionally, this system contains a set of processes that operate on expressions through creation, deletion, reproduction, and arbitrary transformation. This system exists in some larger encompassing environment and evolves through time by modifying its encapsulated symbolic expressions. A great portion of the work in computational linguistics is based on this hypothesis. Specifically, numerous representation formalisms and associated analysis/generation techniques have been developed that rely on symbolic patterns to represent various notions. These notions include phonetic, morphological and lexical constituents, as well as syntactic, semantic, and discourse structure, relationships, and constraints. Examples of such formalisms, with approximate dates in parentheses, include Chomsky's theory of formal language grammars (regular, context-free, context-sensitive, and unrestricted) (circa 1957), Chomsky's transformational model of linguistic competence (circa 1965), Fillmore's case grammar (circa 1967), Halliday's systemic grammar (circa 1967), Woods' augmented transition network (circa 1970), Schank's conceptual dependency theory (circa 1975), Wilks' preference semantics (circa 1975), Burton's semantic grammar (circa
1976), Colmerauer's definite clause grammar (circa 1978), Gazdar's phrase structure grammar (circa 1979), Kay's functional grammar (circa 1979), and Bresnan and Kaplan's lexical-functional grammar (circa 1982). Figure 1 shows a sample of a case grammar representation of the sentence "Mary will not take the apple." Harris (1985) and Winograd (1983) provide additional details on these formalisms. Such formalisms spawned a wide variety of algorithms and systems; for examples, see Cercone and McCalla (1986), Grosz et al. (1986), Obermeier (1989), and Pereira and Grosz (1994). Some of the apparent strengths of symbolic formalisms for NLP are as follows (Church and Mercer, 1993; Winograd, 1983):
- They are well understood in terms of their formal descriptive/generative power and practical applications.
- They currently provide the most effective approach for modeling long-distance dependencies, such as subject-verb agreement and wh-movement.
- They are usually "perspicuous," in that the linguistic facts being expressed are directly visible in the structure and constituents of the model.
- They are inherently non-directional, in that the same linguistic model may be used for both analysis and generation.
- They can be used in multiple dimensions of patterning, that is, they can be used for modeling phenomena at various linguistic knowledge levels (see Section 5.3).
- They allow for computationally efficient analysis and generation algorithms, as in Earley (1970) and Marcus (1978).

FIG. 1. Example of case grammar representation of "Mary will not take the apple" (adapted from Harris, 1985).

9. This list is by no means exhaustive or representative of the various schools of thought within the symbolic approach to NLP; moreover, it excludes significant implementations of symbolic formalisms, such as SNePS (Shapiro, 1979).
10. Winston (1992) provides an introductory presentation on expressing language constraints, such as long-distance dependencies, using a symbolic approach (pp. 575-598).
11. Woods (1970) uses the term perspicuity to describe the inherent readability of context-free grammars, that is, being able to directly determine the consequences of a production rule for the types of constructions permitted by the linguistic model (as opposed to other formalisms such as regular grammars and pushdown automata).
Some of the apparent weaknesses of the symbolic approach to NLP are (Church and Mercer, 1993; Kay et al., 1994):
- Symbolic linguistic models tend to be fragile, in that they cannot easily handle minor, yet non-essential deviations of the input from the modeled linguistic knowledge. Nevertheless, various flexible (robust) parsing techniques have been devised to address this weakness. Such techniques look for "islands" of structural or semantic coherence and, through heuristics or user intervention, may recover from parsing failures.
- Development of symbolic models requires the use of experts such as linguists, phonologists, and domain experts, since such models cannot be instructed to generalize (learn from example).
- Symbolic models usually do not scale up well. For instance, Slocum (1981) discusses how, after two years of intensive development, LIFER's knowledge base grew so large and complex that even its original designers found it hard and impractical to perform the slightest modification. Specifically, the mere act of performing a minor modification would cause "ripple effects" throughout the knowledge base. These side-effects eventually became almost impossible to isolate and eliminate.
- In many practical cases, symbolic-approach techniques perform worse than stochastic and connectionist pattern recognition systems tuned by real-life training data.
- They cannot model certain local constraints, such as word preferences, that can be very useful for effective part-of-speech tagging and other applications.
In summary, symbolic formalisms are the most well-studied techniques for NLP system development. Although they exhibit several weaknesses when compared against stochastic or connectionist approaches, they are still valuable and powerful mechanisms for NLP. They are especially useful, at least, in cases where the linguistic domain is small or well-defined, and where modeling of long-distance dependencies is essential.
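To give a flavor of such symbolic representations, the following sketch (in Python) builds a case-frame structure loosely patterned after the case grammar example of Fig. 1; the role names "agent" and "object" and the field layout are illustrative choices rather than part of any of the formalisms listed above:

    from dataclasses import dataclass, field

    @dataclass
    class CaseFrame:
        """A simplified case-grammar style frame: a verb, its case roles,
        and a modality component (negation, tense, mood)."""
        verb: str
        cases: dict = field(default_factory=dict)      # role name -> filler
        modality: dict = field(default_factory=dict)   # e.g., negation, tense, mood

    # "Mary will not take the apple"
    frame = CaseFrame(
        verb="take",
        cases={"agent": "Mary", "object": "the apple"},
        modality={"negation": True, "tense": "future", "mood": "declarative"},
    )

    # The same declarative structure can drive either analysis or generation,
    # which is the non-directionality property noted in the list of strengths.
    print(frame)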
4.2 Stochastic Approach
The driving force behind stochastic (statistical, probabilistic) models is their ability to perform well even in the presence of incomplete linguistic knowledge about an application domain. Such models thrive on the inherent inflexibility and
fragility of symbolic models, which stem from our lack of complete understanding of many linguistic phenomena (such understanding being an essential prerequisite to developing successful symbolic models). Stochastic models include a number of parameters that can be adjusted to enhance their performance (Allen, 1994a; Charniak, 1993; Church and Mercer, 1993; Kay et al., 1994; Knill and Young, 1997; Marcus, 1994).
A popular stochastic model is the hidden Markov model (HMM). Similarly to a finite-state machine, an HMM consists of a set of states (one of which is the initial state), a set of output symbols which are emitted as the system changes states, and a set of acceptable transitions among states. Additionally, each state in an HMM (as opposed to a finite-state machine) has two sets of probabilities associated with it: one determines which symbol to emit from this state (emission probabilities); the second determines which state to visit next (transition probabilities). Once the topology of a given HMM has been decided, Baum's (1972) forward-backward algorithm can be used to effectively derive these two sets of probabilities using a set of training data; by adjusting the model's parameters appropriately, this algorithm is guaranteed to either improve or, at least, not worsen the performance of the model. Once the HMM's parameters have been adjusted, the Viterbi (1967) algorithm may be used to recognize specific input against the trained HMM.
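To make the decoding step concrete, the following is a minimal sketch of the Viterbi algorithm in Python; the two states, the output symbols, and all probabilities are invented for illustration, and the training step (the forward-backward algorithm) is not shown:

    states = ("N", "V")                       # hypothetical part-of-speech states
    start  = {"N": 0.6, "V": 0.4}
    trans  = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
    emit   = {"N": {"flies": 0.4, "like": 0.1, "time": 0.5},
              "V": {"flies": 0.3, "like": 0.6, "time": 0.1}}

    def viterbi(observations):
        """Return (probability, state sequence) of the best path for the observations."""
        # best[s] = (probability, path) of the best path ending in state s
        best = {s: (start[s] * emit[s][observations[0]], [s]) for s in states}
        for obs in observations[1:]:
            best = {s: max(((p * trans[prev][s] * emit[s][obs], path + [s])
                            for prev, (p, path) in best.items()),
                           key=lambda x: x[0])
                    for s in states}
        return max(best.values(), key=lambda x: x[0])

    prob, tags = viterbi(["time", "flies", "like"])
    print(tags, prob)                         # ['N', 'N', 'V'] and its probability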
Another stochastic model is the probabilistic context-free grammar (PCFG). Similarly to HMMs, which extend finite-state machines with probabilities, PCFGs extend context-free grammars with probabilities. One approach is to assign probabilities based on rule use; that is, using a set of training data, the probability of each rule's "significance" can be determined based on the frequency of this rule's contribution to successful parses of training sentences (see Fig. 2). If one is willing to compromise potential accuracy over efficiency, given a trained PCFG, there exist several effective algorithms that can be employed. For example, one could employ an N-first parsing algorithm. This algorithm will not explore every possible alternative, but only a preset number of the most promising alternatives, e.g., three or four.

FIG. 2. Probabilistic context-free grammar (adapted from Charniak, 1993).
This can be accomplished by using a probability cutoff value to prune less promising parse subtrees. Of course, such techniques are not admissible, in that it is possible that they may prune a subtree which, although initially not very promising, would be a much better alternative at the end of a complete search through the parse space. However, assuming that the system explored constituents in the native language order (e.g., left-to-right for English), the types of sentences that it would probably get confused on would be the same types of sentences that would confuse a native speaker, i.e., garden-path sentences, such as "we gave the girl the cake was baked by grandma's recipe."
Some of the strengths of the stochastic approach are (Church and Mercer, 1993; Kay et al., 1994; Knill and Young, 1997; Rabiner and Juang, 1993; Schmucker, 1984):
- Stochastic systems are effective in modeling language performance through training based on most frequent language use. They are useful in modeling linguistic phenomena that are not well understood from a competence perspective, e.g., speech.
- The effectiveness of a stochastic system is highly dependent on the volume of training data available; generally, more training data results in better performance.
- Stochastic approaches may be easily combined with symbolic models of linguistic constraints, such as dialog structure, to enhance the effectiveness and efficiency of an application (hybrid systems are discussed in Section 4.4).
- Stochastic models can be used to model nuances and imprecise concepts such as few, several, and many, that have traditionally been addressed by fuzzy logic.
Some of the weaknesses of the stochastic approach are (Church and Mercer, 1993; Kay et al., 1994; Rabiner and Juang, 1993):
- Run-time performance of stochastic systems is generally linearly proportional to the number of distinct classes (symbols) modeled, and thus can degrade considerably as classes increase; this holds for both training and pattern classification.
- In general, given the state of the art in corpus development, producing training data for a specific application domain can be a time-consuming and error-prone process. Thus, since the effectiveness of stochastic systems is tightly bound to extensive, representative, error-free corpora, the difficulty of developing such systems might be, in the general case, similar to that of other approaches.

Stochastic approaches are very effective in addressing modeling problems in application domains where traditional symbolic processes have failed. Although their potential is still being explored, and thus new significant results may be discovered, they have already proven to be a valuable approach to NLP for human-computer interaction.

12. A search algorithm is admissible if it is guaranteed to find the optimum path to a solution, if such a path exists (Luger and Stubblefield, 1998).
13. The embedded clause "the cake was baked by" modifies "girl." For additional discussion of garden-path sentences and examples, see Section 5.4.2.3 and (Akmajian et al., 1990; Winograd, 1983).
4.3 Connectionist Approach
Similarly to the stochastic approach, the connectionist approach is also based on employing training data to improve the performance of linguistic models. The difference between the two approaches is in the complexity of the system architecture. Specifically, connectionist models consist of massive interconnected sets of simple, non-linear components. These components operate in parallel, as opposed to the non-parallel systems found in other approaches, such as finite-state machines and context-free frameworks. Acquired knowledge is stored in the pattern of interconnection weights among components. There exist various characteristics that affect the performance and utility of connectionist systems, such as number and type of inputs, connectivity, choice of activation threshold/function, and choice of update function (Caudill and Butler, 1990, 1992; Cowan and Sharp, 1988; Firebaugh, 1988; Kay et al., 1994; Markowitz, 1996; Obermeier, 1989; Rabiner and Juang, 1993).
Although it has been argued that, due to their lack of internal structures, connectionist architectures are not competent in handling natural language (Fodor and Pylyshyn, 1988), such architectures have been used to model various linguistic phenomena, especially in phonology, morphology, word recognition (spoken and written), noun-phrase understanding, prepositional-phrase attachment, script-based narratives, and speech production (Elman, 1991; Miikkulainen, 1993; Wermter and Weber, 1996).
For example, Fig. 3 shows a connectionist approach to phoneme classification. The input consists of a collection of feature vectors derived through a temporal context window centered on a target vector. Each of these vectors has been generated from speech input analyzed with a spectral analysis method such as Linear Predictive Coding.

14. Nevertheless, from a theoretical perspective, it can be argued that symbolic, statistical, and connectionist approaches are not necessarily distinct; for instance, connectionist and statistical architectures are usually implemented on top of non-parallel computational architectures which conform to the physical symbol system hypothesis. From a practical perspective, however, this distinction is meaningful, since different approaches appear to be best suited to different regions of the NLP problem space, similarly to how different high-level programming languages are best suited to different regions of the problem space solvable by universal Turing machines.
FIG. 3. Phoneme probability estimator (adapted from Schalkwyk and Fanty, 1996).
The output is a phoneme probability vector in which each phoneme is assigned a probability indicating the likelihood that a given input frame belongs to that phoneme.
Some of the strengths of the connectionist approach are (Caudill and Butler, 1990; Fodor and Pylyshyn, 1988; Rabiner and Juang, 1993):
- Connectionist architectures are self-organizing, in that they can be made to generalize from training data even though they have not been explicitly "instructed" on what to learn. This can be very useful when dealing with linguistic phenomena which are not well understood, that is, when it is not clear what needs to be learned by a system in order for it to effectively handle such a phenomenon.
- Connectionist architectures are fault tolerant, due to the distributed nature of knowledge storage. Specifically, as increasing numbers of their components become inoperable, their performance degrades gracefully/gradually.
- The weights of a connectionist architecture can be adapted in real time to improve performance.
- Due to the non-linearity within each computational element, connectionist architectures are effective in modeling non-linear transformations between inputs and outputs.
Some of the weaknesses of the connectionist approach are:
- Once a connectionist system has been trained to handle some linguistic (or other) phenomenon, it is difficult to examine and explain the structure or nature of the acquired knowledge. In many cases, this is not detrimental; however, it is possible for a connectionist system to learn the wrong type of knowledge, especially if the training set is not well developed or well understood.
- It is possible for a system to be over-trained and thus diminish its capability to generalize, so that only the training data can be recognized.
- Due to their massive parallelism, and their usual implementation on non-parallel architectures, connectionist systems may be ineffective from a run-time complexity perspective for many real-time tasks in human-computer interaction.

15. One such example is a neural network developed at the Stanford Research Institute which was trained to detect the presence of tanks in photographs. Although the system was successful when presented with testing data derived from the same batch of photographs as the training set, it performed badly otherwise. Eventually, it was discovered that the system had learned to recognize other characteristics of the data set, such as the differences in light intensity and density; it turned out that all photographs in the training set containing a tank had been taken in the morning, whereas the non-tank ones had been taken in the afternoon (Clark, 1993, p. 41).
Similarly to stochastic approaches, connectionist models are very effective in addressing NLP problems in which traditional symbolic models are ineffective. Although their potential is still being explored, and thus new techniques and applications are being developed, they have already proved to be a valuable tool for NLP in human-computer interaction.
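As a rough illustration of the kind of component shown in Fig. 3, the following minimal sketch (in Python) computes the forward pass of a small feed-forward network mapping one frame's feature vector to a phoneme probability vector; the phoneme inventory, layer sizes, and untrained random weights are invented for the example and bear no relation to the Schalkwyk and Fanty estimator:

    import math, random

    PHONEMES = ["d", "i", "l", "t"]               # toy inventory for the example
    N_FEATURES, N_HIDDEN = 8, 5

    random.seed(0)                                # untrained, random weights
    w_hidden = [[random.uniform(-1, 1) for _ in range(N_FEATURES)] for _ in range(N_HIDDEN)]
    w_out    = [[random.uniform(-1, 1) for _ in range(N_HIDDEN)] for _ in PHONEMES]

    def forward(features):
        """Map one frame's feature vector to a phoneme probability vector."""
        hidden = [math.tanh(sum(w * x for w, x in zip(row, features))) for row in w_hidden]
        scores = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
        exps = [math.exp(s) for s in scores]      # softmax turns scores into probabilities
        return {p: e / sum(exps) for p, e in zip(PHONEMES, exps)}

    frame = [0.2] * N_FEATURES                    # one (fake) spectral feature vector
    print(forward(frame))                         # probabilities summing to 1

The non-linear activation in each unit is what gives such a network the ability, noted above, to model non-linear transformations between inputs and outputs.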
4.4 Hybrid Approach
As seen in the previous sections, each of the above approaches has advantages and disadvantages; accordingly, each approach has advocates and critics, especially on theoretical and philosophical grounds. However, when dealing with natural language systems for human-computer interaction, theoretical debates are not of much relevance unless they contribute to one's understanding of the applicability, or utility, of a particular approach with respect to developing an effective human-computer interface. Consequently, researchers have begun developing hybrid architectures which take advantage of the relative strengths of each approach, in an attempt to use "the best tool for the job."
One example of the hybrid approach is the DISCERN system, which combines symbolic and subsymbolic (connectionist) techniques to process script-based narratives. Specifically, it reads short stories, stores them in episodic memory, generates paraphrases of the stories, and answers questions related to these stories. DISCERN employs subsymbolic modules to perform each of these subtasks. At a low level, the system is connectionist in nature; however, at a high level, it is symbolic in that its modules are connected using symbolic information structures, such as scripts, lexicon, and episodic memory (Miikkulainen, 1993, 1994). Another example is the SCREEN speech understanding system. It combines connectionist and symbolic techniques to perform robust analysis of real-world spoken language (Wermter and Weber, 1996). Similarly, SpeechActs combines off-the-shelf continuous speech recognizers (employing stochastic or connectionist models) with symbolic modules performing syntactic, semantic, and dialog analysis (Martin et al., 1996). Finally, the SUITE architecture integrates speech recognition and natural language processing components (Manaris and Harkreader, 1997). This is a generic architecture for speech-understanding interfaces to interactive computer applications. It is currently based on the CSLU-C architecture for speech recognition and the NALIGE natural language interface architecture (Manaris and Dominick, 1993; Schalkwyk and Fanty, 1996). It uses connectionist techniques for phoneme identification, stochastic techniques for creating an N-best sentence hypothesis, and symbolic techniques for additional linguistic analysis of the input. Specifically, it consists of modules which perform acoustic, phonetic, lexical, syntactic, semantic, and pragmatic processing as follows (see Fig. 4):
- a feature extractor converts the speech signal to a set of feature vectors;
- a phoneme probability estimator uses a connectionist model to produce a phoneme probability matrix, which approximates the probability of a feature vector being part of a given phoneme;
- a lexical analyzer employs a Viterbi search to determine the N-best paths through the phoneme probability matrix; this search is structurally driven to focus only on "valid" phonemic transitions, thus enhancing accuracy and efficiency;
- an augmented semantic grammar (ASG) parser identifies the sequence of words that could possibly occur at a given point in a "valid" input, enforces pragmatic constraints, and generates semantic interpretations;
- a code generator converts semantic interpretations to commands to be passed to the underlying system.
Additionally, the architecture includes an error handler, a knowledge-base manager and a system driver. In summary, hybrid techniques utilize the strengths of symbolic, stochastic, and connectionist approaches in an attempt to (a) minimize the human effort required for linguistic model construction, and (b) maximize the flexibility, effectiveness, and robustness of NLP applications for human-computer interaction.
FIG. 4. SUITE speech understanding interface architecture.
5. Knowledge and Processing Requirements

According to Feigenbaum (1996), NLP programs exist at one extreme of a continuum which he calls the "What-to-How" software spectrum. On one extreme of this continuum, the "How" or procedural side, we find general-purpose computing devices. At this level, knowledge is described in terms of binary code (data and instructions) to be executed by the underlying hardware, whose processing capabilities are theoretically equivalent to universal Turing machines. On the other extreme of the continuum, the "What" or declarative side, we have the user who wishes to express his/her goals and needs through natural communicative modalities such as gesture, speech, and body language. The history of software evolution is marked by specific milestones in the attempt to bridge the gap between the "How" and "What" ends of this continuum. Such milestones include assembly languages, high-level languages, software development environments, specification languages, intelligent agents, and domain-specific expert systems. The next milestone in this continuum is probably intelligent user interfaces which encapsulate knowledge about the domain and the user's communicative capabilities/preferences, thus bringing software closer to the "What" side of the continuum.

16. Actually, Feigenbaum focuses on general AI programs. However, assuming that NLP is an AI-complete problem, his ideas hold in the NLP realm.
5.1 Computational Issues
One of the early influential figures in the field of linguistic analysis is MIT's Noam Chomsky (1957, 1965). His theories depended on a rigorous approach to studying language developed by earlier researchers, but went further by deriving a collection of grammars which describe the structural relations which are acceptable within language (Harris, 1985). Chomsky explains that a generative grammar is a system of rules that in some explicit and well-defined way assigns structural descriptions to sentences (Chomsky, 1965, p. 8). This implies that generative grammars offer a formal framework to be used in implementing computer systems which perform syntactic analysis on statements of a specific language. Additionally, Chomsky classified languages into four categories according to the restrictions imposed on the form of the actual grammar describing them, namely recursively-enumerable, context-sensitive, context-free, and regular languages. These restrictions reflect the descriptive power of the corresponding grammars. Moreover, these restrictions reflect the computational power needed by a computer system using such a grammar to perform syntactic analysis on a specific language. Analytically, the most powerful grammars, i.e., grammars describing recursively enumerable languages, require the power of a universal Turing machine to be interpreted. Grammars describing context-sensitive languages require the computational power of a linear-bounded automaton (a variation of a Turing machine constrained by the fact that the amount of storage which is available for its processing needs is finite). Grammars corresponding to context-free languages need the processing power of a pushdown automaton. Finally, regular grammars can be handled by a finite-state machine.
Chomsky claimed that due to the recursive-embedding nature of clauses, natural language cannot be modeled by finite-state machines. This claim has been subsequently challenged by other researchers such as Blank (1989), Marcus (1980), and Reich (1969) (see Section 5.4.2). Incidentally, Chomsky developed a formalism equivalent to a universal Turing machine, namely transformational grammars, which can be used to model natural language. However, although this formalism is appropriate for generating subsets of natural language, it is extremely inefficient for practical natural language analysis (Woods, 1970).

17. Actually, the amount of storage available to a linear bounded automaton is linearly proportional to the storage occupied by its original input, as opposed to the infinite storage capacity characterizing universal Turing machines (Moll et al., 1988).
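A standard way to see the limitation Chomsky pointed to is the language a^n b^n, an abstraction of center-embedded clauses: no finite-state machine can recognize it, whereas a device with unbounded memory (here a single counter standing in for a pushdown stack, sketched in Python purely for illustration) handles it directly:

    def accepts_anbn(s: str) -> bool:
        """Recognize a^n b^n (n >= 0), the shape of nested, center-embedded structure.
        The counter plays the role of a pushdown stack; a finite-state machine,
        having only finitely many states and no unbounded memory, cannot do this."""
        depth = 0
        seen_b = False
        for ch in s:
            if ch == "a":
                if seen_b:
                    return False          # an 'a' after the 'b's have started
                depth += 1
            elif ch == "b":
                seen_b = True
                depth -= 1
                if depth < 0:
                    return False          # more b's than a's so far
            else:
                return False
        return depth == 0

    print(accepts_anbn("aaabbb"), accepts_anbn("aabbb"))   # True False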
It is important to remember that any computational model of natural language phenomena will eventually have to be communicated to and executed by a computing device. Therefore, assuming that Church's thesis holds, any theory or representation formalism that is not formally equivalent to a universal Turing machine may fall short of exploiting all the power of a computing device in its attempt to perform NLP; this might have considerable implications with respect to the potential of any NLP theory. Wegner (1997) discusses a thought-provoking alternative model of computation based on interaction, namely interaction machines, that is more powerful than Turing machines. Specifically, he argues that any system that allows for interaction is capable of exhibiting richer behavior than a Turing machine; that is, the "assertion that algorithms capture the intuitive notion of what computers compute is invalid" (p. 83). This supports claims of certain researchers that natural language could be effectively (if not completely) modeled by context-free, or even regular language frameworks (Blank, 1989; Marcus, 1980; Reich, 1969), especially if such models can be trained through interaction. Actually, such results have contributed to empirical NLP applications in the late 1980s and 1990s based on text or speech corpora, finite-state-machine modeling frameworks, such as HMMs, and neural networks (see Sections 4.2 and 4.3).
5.2 Understanding Natural Language
Let us for a moment consider a hypothetical dialogue taking place between a human and a computer system. This dialog is in the flavor of Apple’s “Knowledge Navigator” vision of the future (Lee, 1993):
HUMAN: I am getting ready to quit for today.
COMPUTER: I understand.
HUMAN: Please do not delete any of the temporary files I have created, as they contain information that may be useful later for the Advances in Computers article that you and I are working on.
COMPUTER: I understand.

Actually, assuming that the underlying computer system did not, by default, delete any temporary files, this "understanding" system could be effectively implemented by the following LISP code (Firebaugh, 1988):

    (while (not (null? (read)))
      (display "I understand")
      (newline))
18. Church's thesis proposes that the intuitive notion of what is computable corresponds to the formal notion of computability as expressed by Turing machines (Lewis and Papadimitriou, 1981; Wegner, 1997).
19. This has a strong intuitive appeal, as it is through interaction that humans acquire natural language.
Obviously, no understanding is taking place here. Nevertheless, this system contains knowledge about natural language that has been implicitly programmed into it by its developer. For example:
- In a dialog there exist two participants.
- An information exchange is taking place between these participants. Actually, this rule does not specifically appear in the above program; it is nevertheless one of the rules known to the designer of the system, and thus it could be claimed that this knowledge is encapsulated in the design choices made during the system's development.
- Once the first participant has completed a statement, the other may claim that (s)he understands what is being said, even if that is not the case.
One of the philosophical questions which naturally arise in the context of discussing knowledge and processing requirements for NLP systems is "what does it mean to understand natural language?" (Winograd, 1980). From a philosophical perspective, it can be claimed that this system is not much different from state-of-the-art NLP systems, since such systems are also not capable of doing anything more than they have been programmed to do. Hofstadter (1979) devotes much discussion to this and related issues. For instance, given the input:

    Margie was holding tightly to the string of her beautiful new balloon. Suddenly, a gust of wind caught it. The wind carried it into a tree. The balloon hit a branch and burst. Margie cried and cried. (Rumelhart, 1975, p. 211)
Hofstadter points out that an NLP system could never truly understand what is being said "until it, too, has cried and cried" (p. 675). Nevertheless, considering systems like ELIZA (see Section 5.4.1.1), which under certain circumstances could pass the Turing test of intelligence, and the simplistic keyword matching strategies they employ (without any formal representation of syntax, semantics, or pragmatics), how could we possibly expect to distinguish between a system that understands natural language (if such a system may ever exist) and one that does not? From a human-computer-interaction perspective, which is the one adopted in this article, such questions are mostly of importance to artificial intelligence and cognitive science researchers; what really matters is the end result, namely the effectiveness, learnability, user-friendliness, and functionality of the user interface which employs natural language models.
20. The Turing test, proposed by Alan Turing (1950), requires a human interrogator to communicate with a computer via a teletype. The interrogator is not told whether (s)he is communicating with a computer or another human; if the interrogator cannot tell the difference, then the computer has passed the test (Russell and Norvig, 1995, p. 5). The interested reader may also look into the annual Loebner Prize Competition, a formal effort to locate a machine that can pass the Turing test (Epstein, 1992).
21. McCorduck (1979) describes how an ELIZA-like program "participated" in a lengthy, intimate conversation with an internationally respected computer scientist, as cited in Firebaugh (1988), p. 223.
In general, human-defined models are at best approximations of the natural phenomena they represent, given the limitations imposed on our modeling capabilities by our intellectual capacity and our senses (or lack thereof). Since NLP systems incorporate human-defined models of natural language, a natural phenomenon, we should expect to find at best an approximation to understanding. From a computer science perspective, one possible definition of what constitutes "understanding" of natural language is the following: A system "understands" natural language if, in response to some input, it creates a conceptual structure corresponding to that input, updates an existing conceptual structure, or makes an appropriate modification to a knowledge base. Obviously, this definition excludes minimalistic systems such as the one presented above. Additionally, it makes only a cursory reference to "correctness," which is an essential quality of algorithms in computer science. Finally, it introduces a new question, namely: "what type of knowledge would we expect to find in an NLP system's conceptual structures, or its knowledge base?" This question is addressed in the next section.
5.3 Natural Language Knowledge Levels
The success of any NLP system is highly dependent on its knowledge of the domain of discourse. Given the current state of the art in NLP models, this knowledge may be subdivided into several levels. There exist different schools of thought, but, in general, researchers agree that linguistic knowledge can be subdivided into at least lexical, syntactic, semantic, and pragmatic levels. Each level conveys information in a different way. For example, the lexical level might deal with actual words (i.e., lexemes), their constituents (i.e., morphemes), and their inflected forms. The syntactic level might deal with the way words can be combined to form sentences in a given language. One way of expressing such rules is to assign words into different syntactic categories, such as noun, verb, and adjective, and specify legal combinations of these categories using a grammar. The semantic level might deal with the assignment of meaning to individual words and sentences. Finally, the pragmatic level might deal with monitoring of context/focus shifts within a dialog and with actual sentence interpretation in the given context. Table 1 shows one commonly used classification which attempts to be as thorough as possible (given our current understanding of the language phenomenon) by accounting for acoustic, as well as general world knowledge (Akmajian et al., 1990; Allen, 1994; Manaris and Slator, 1996; Sowa, 1984). In this classification, each level is defined in terms of the declarative and procedural characteristics of knowledge that it encompasses.
22. A quality that is not always provable, and, in many cases, not even attainable.
5.4 Classification of NLP Systems
As seen in the previous section, natural language can be viewed at different levels of abstraction. Based on the application domain, NLP systems may require only subsets of the above knowledge levels to meet their application requirements. For example, a machine translation system, such as the Eurotra prototype, which focuses on documents dealing with telecommunications and covers nine languages of the European Community, namely Danish, German, Greek, English, Spanish, French, Italian, Dutch, and Portuguese (Arnold, 1986; Maegaard and Perschke, 1991), may require only knowledge levels 3 to 7 (or possibly 8). Similarly, a speech recognition system, such as Dragon Systems' NaturallySpeaking and Kurzweil Voice, may require only knowledge levels 1 to 5, although it would benefit from having access to knowledge in levels 6 to 7 (Manaris and Slator, 1996). Finally, a speech understanding interface to UNIX may use knowledge levels 1 to 6 to produce a semantic representation of a given input, e.g., delete-file("x"), and then knowledge level 8 to convert that semantic interpretation to the corresponding realization in the underlying command language, e.g., "rm -i x". Such an interface could benefit from having access to specific knowledge about dialog in the context of communicating with an operating system (see Figure 5). An example of such an interface is UNIX Consultant (see Section 5.4.4.1). In practice, it may be hard to classify NLP systems based on the types and levels of linguistic knowledge they encapsulate. For example, even for primitive NLP systems, such as the one seen in Section 5.2, it might be argued that they contain implicit knowledge from various knowledge levels.
TABLE I
KNOWLEDGE LEVELS IN NLP MODELS
1. Acoustic/prosodic knowledge: Rhythm and intonation of language; how to form phonemes.
2. Phonologic knowledge: Spoken sounds; how to form morphemes.
3. Morphologic knowledge: Sub-word units; how to form words.
4. Lexical knowledge: Words; how to derive units of meaning.
5. Syntactic knowledge: Structural roles of words (or collection of words); how to form sentences.
6. Semantic knowledge: Context-independent meaning; how to derive sentence meanings.
7. Discourse knowledge: Structural roles of sentences (or collections of sentences); how to form dialogs.
8. Pragmatic knowledge: Context-dependent meaning; how to derive sentence meanings relative to surrounding discourse.
9. World knowledge: General knowledge about the language user and the environment, such as user beliefs and goals; how to derive belief and goal structures. Currently, this is a catch-all category for linguistic processes and phenomena that are not well understood yet. Based on past evolutionary trends, this knowledge level may be further subdivided in the future to account for new linguistic/cognitive theories and models.
FIG. 5. Knowledge levels in NLP systems: the utterance "Delete file x" is mapped successively to phonemes, morphemes, tokens, a syntactic structure, the semantic interpretation delete-file("x"), and the pragmatic interpretation "rm -i x".
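As a rough, purely illustrative sketch of the upper levels of this mapping (in Python; the tiny lexicon, the fixed sentence pattern, and the command table are invented for the example, and the acoustic, phonologic, and morphologic levels are ignored):

    LEXICON = {"delete": "VERB", "file": "NOUN"}          # lexical knowledge
    COMMANDS = {"delete-file": "rm -i {name}"}            # pragmatic mapping to UNIX

    def tokenize(utterance):                              # lexical level
        return utterance.lower().split()

    def parse(tokens):                                    # syntactic level
        # Accept only the fixed pattern VERB NOUN ID, as in "delete file x".
        if len(tokens) == 3 and LEXICON.get(tokens[0]) == "VERB" \
                and LEXICON.get(tokens[1]) == "NOUN":
            return {"verb": tokens[0], "noun": tokens[1], "id": tokens[2]}
        raise ValueError("cannot parse: " + " ".join(tokens))

    def interpret(tree):                                  # semantic level
        return ("delete-file", tree["id"])

    def realize(action, arg):                             # pragmatic level
        return COMMANDS[action].format(name=arg)

    print(realize(*interpret(parse(tokenize("Delete file x")))))   # rm -i x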
Nevertheless, it may be beneficial to examine NLP systems based on the depth of explicit linguistic analysis they perform, as this may provide clues on their strengths and weaknesses. In the remainder of this section, we will attempt to classify a few representative symbolic modeling methods according to this linguistic analysis classification scheme and provide examples of relevant NLP systems.
5.4.1 Lexical Analysis Systems
Lexical analysis systems, also known as keyword matching systems, employ a pattern matching mechanism designed to recognize or extract certain predefined
keywords from the user input. Compared to the minimalistic system of Section 5.2, which implicitly encapsulates a limited amount of natural language knowledge, it can be claimed that lexical analysis systems directly understand a small subset of natural language. These systems can be compared to a human visiting a foreign country who can recognize only a few words and phrases from the native language and can only respond using a small number of previously memorized sentences; additionally, once a certain keyword has been recognized, (s)he may perform some complex action which is appropriate in the given context, such as following the directions in the sentence: “ To find the bank go left, right, and then left.” In this example, the highlighted words are the actual keywords which convey the most important information.
5.4.1.1 ELIZA. A program using a lexical approach to natural language understanding is ELIZA. This program was developed by Weizenbaum at MIT in an attempt to study issues relevant to natural language communication between humans and computers (Weizenbaum, 1966, 1967). ELIZA assumes the role of a Rogerian psychotherapist and, under many circumstances, manages to mislead the user into believing that it actually understands all that is being said. Its knowledge base consists of a collection of predefined patterns against which the user's input is compared. Each of these patterns has a generic template associated with it, which is used in constructing ELIZA's response. However, after some interaction with this system the user starts realizing the limits of this program's intelligence. Incidentally, Hofstadter (1979, p. 621) claims that humans get bored interacting with such an "intelligent" system not when they have exhausted its repertoire of behavior, but when they have intuited the limits of the space containing this behavior.
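The following minimal sketch (in Python) illustrates this keyword-and-template style of processing; the patterns and responses are invented and are far simpler than Weizenbaum's actual script, which, among other things, also reflects pronouns (e.g., "my" becomes "your") before filling a template:

    import re, random

    RULES = [  # (keyword pattern, response templates); {0} echoes the captured text
        (r"\bI need (.*)", ["Why do you need {0}?", "Would it really help you to get {0}?"]),
        (r"\bmy (mother|father)\b", ["Tell me more about your {0}."]),
        (r"\bI am (.*)", ["How long have you been {0}?"]),
    ]
    DEFAULT = ["Please go on.", "I see.", "Very interesting."]

    def respond(sentence: str) -> str:
        for pattern, templates in RULES:
            match = re.search(pattern, sentence, re.IGNORECASE)
            if match:
                return random.choice(templates).format(match.group(1).rstrip(".!?"))
        return random.choice(DEFAULT)   # no keyword recognized

    print(respond("I need a vacation"))     # e.g., "Why do you need a vacation?"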
5.4.1.2 NLDOS. One NLP application that performs its function relying on strictly lexical analysis techniques is NLDOS (Lane, 1987). NLDOS is a natural language interface to operating systems. The main motivation for building this interface was the syntactic differences between the VAX/VMS and the MS-DOS command languages, which the designer was using interchangeably. He states (p. 261):

    After changing default directories several dozen times on a DEC VAX with the set def command, I invariably type the same command to change directories on my IBM PC. Then, I throw a mental "Read my mind!" at the machine and edit the command to cd (short for chdir).
An additional consideration was the syntactic inflexibility of the MS-DOS backup command, especially if the user wants to include subdirectories in the
actual request. NLDOS was implemented in PROLOG using simple lexical rules to identify
tokens in the natural language input, such as disk drive, filename, and date specifications. This system's task coverage is very limited and its linguistic knowledge is organized in an ad hoc fashion. This is because the interface development was mainly motivated by the designer's personal difficulties and specifically tailored with respect to his needs. For instance, Lane decided that, for implementation simplicity, certain punctuation marks within the natural language input, such as colons, periods, and hyphens, may only appear inside disk drive, filename, and date specifications, respectively. NLDOS allows for flexible matching to the point where careless or obtuse wording may "confuse" the system into executing a command different from the one intended. For instance, assuming that the system was trained to recognize any pattern of the type (* delete * <FILESPEC>) as a request to delete file <FILESPEC>, where * matches zero or more words and <FILESPEC> stands for a filename token, an input such as "Do not delete file REPORT.DOC" would have the opposite effect. Actually, during the testing phase, Lane reports that the system inadvertently erased the complete contents of one of his computer's disk drives. This is a general problem with systems that perform flexible (or robust) parsing.
5.4.2 Syntactic Analysis Systems
Syntactic analysis systems, in addition to recognizing certain input keywords, attempt to derive a unique structure, namely a parse tree, which directly corresponds to the syntactic information encapsulated in the input sentence.
5.4.2.1 Augmented Transition Networks. In "Transition Network Grammars for Natural Language Analysis", Woods (1970) presents a formalism called augmented transition networks (ATNs). This formalism is actually a computational mechanism equivalent in power to a universal Turing machine. An ATN is not a natural language understanding mechanism, but it may be used to define a process which can recognize subsets of natural language. An ATN is similar to a finite-state machine, in the sense that it consists of a set of nodes, corresponding to the states of the computational process, and a set of named directed arcs connecting these nodes, corresponding to the input symbols which may cause specific transitions between computational states. Additionally, ATNs have the following features:
- arcs that may be named with state names, thus allowing for recursive invocation of complete ATNs, including the caller ATN;
- a set of registers, each of which may be assigned to constituents of the parse tree being built;
- arbitrary tests that may be associated with any given arc; these tests are built in terms of registers and/or input symbols;
- a set of actions that may be associated with any given arc; these actions provide the mechanism to incrementally construct the appropriate parse tree.

23. The ATN formalism can be thought of as a special-purpose high-level language.
The main advantage of ATNs resides in the immense generative power they offer to the NLP application designer. Consequently, several NLP systems have been implemented using this formalism (either exclusively or in conjunction with other programming mechanisms), such as LUNAR (Woods, 1973), SOPHIE (Brown and Burton, 1975), GUS (Bobrow et al., 1977), and LIFER (Hendrix et al., 1978). Some of these systems will be examined in Section 5.4.3, since, in addition to performing strictly syntactic analysis, they attempt to construct a semantic interpretation of the natural language input. Due to their non-deterministic nature, ATNs tend to be very expensive from a computational point of view. This problem is intensified when dealing with large grammars, since, in such a case, input sentences tend to appear locally highly ambiguous, thus increasing the amount of backtracking that is necessary to successfully parse them.
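The following minimal sketch (in Python) illustrates the underlying recursive transition network idea, i.e., named networks whose arcs may invoke other networks; the toy grammar and lexicon are invented, and the registers, tests, and actions that distinguish full ATNs are omitted. Non-determinism is handled by enumerating every reachable position, which corresponds to the backtracking cost discussed above:

    # Each network is a list of arcs (from_state, label, to_state).  A label that
    # names another network is invoked recursively; an empty label is a jump arc;
    # any other label is a word category looked up in LEXICON.  "end" accepts.
    NETWORKS = {
        "S":  [("q0", "NP", "q1"), ("q1", "VERB", "q2"), ("q2", "NP", "end"), ("q2", "", "end")],
        "NP": [("q0", "DET", "q1"), ("q0", "NOUN", "end"), ("q1", "NOUN", "end")],
    }
    LEXICON = {"the": "DET", "dog": "NOUN", "cat": "NOUN", "chased": "VERB", "slept": "VERB"}

    def traverse(net, words, i, state="q0"):
        """Yield every input position reachable after recognizing net from position i."""
        if state == "end":
            yield i
            return
        for frm, label, to in NETWORKS[net]:
            if frm != state:
                continue
            if label == "":                                   # jump (empty) arc
                yield from traverse(net, words, i, to)
            elif label in NETWORKS:                           # recursive sub-network call
                for j in traverse(label, words, i):
                    yield from traverse(net, words, j, to)
            elif i < len(words) and LEXICON.get(words[i]) == label:
                yield from traverse(net, words, i + 1, to)

    def accepts(sentence):
        words = sentence.lower().split()
        return any(j == len(words) for j in traverse("S", words, 0))

    print(accepts("the dog chased the cat"), accepts("the dog slept"))   # True True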
5.4.2.2 PARSIFAL. A system employing an interesting approach to NLP which is diametrically opposed to the non-deterministic approach characterizing ATN-based systems is Marcus's PARSIFAL (Marcus, 1980; Winograd, 1983). This system was built to demonstrate that there is a theoretical significance to the determinism hypothesis. This hypothesis states that given certain well-defined mechanisms, syntactic analysis "[may be performed] deterministically" (Winograd, 1983, p. 410). Most natural language understanding systems employ strategies intended to explore existing alternatives in an attempt to handle apparent syntactic ambiguity. For example, consider the sentences "Have the boxes in the boiler room thrown away!" and "Have the boxes in the boiler room been thrown away?" These two sentences appear structurally similar until the words "thrown" and "been", respectively, come into focus. The idea on which PARSIFAL is based is that one may procrastinate assigning a syntactic structure to some given constituent until encountering an input word which resolves any existing ambiguity. This system uses a small, fixed-size buffer in which constituents are stored until their syntactic functions can be determined. Although the resulting mechanism allows for extremely efficient parsing, one major drawback is that if the amount of look-ahead needed to resolve the apparent ambiguity is greater than the buffer size, then the system may choose an incorrect syntactic structure (or simply fail to produce one) (Winograd, 1983).
Moreover, once a disambiguating word is read from the input, there is no way of retracing earlier steps in an attempt to correct a wrong decision (thus resulting in a parsing failure, although there may exist at least one possible syntactic interpretation). Nevertheless, the sentences on which PARSIFAL fails to assign an appropriate syntactic structure are exactly those on which humans have trouble, also known as garden path sentences (Akmajian et al., 1990; Blank, 1989). A classic example of such a sentence is “The horse raced past the barn fell down.” The part “raced past the barn” modifies the noun “horse.” The alternative interpretation, which happens to be the one chosen by most human readers, is to initially treat “raced” as the main verb of the sentence. Although such sentences may not be compatible with the linguistic competence of some English speakers, they are nevertheless treated as grammatical by mainstream linguists (Akmajian et al., 1990, p. 371). PARSIFAL demonstrates that we may be capable of designing systems which understand a significant part of natural language, and whose formal power is equivalent to that of a finite-state machine. In addition to Marcus, several other researchers follow a similar deterministic, finite-storage approach to natural language understanding. These results are of major importance, since they have set the stage for stochastic approaches based on HMMs-a probabilistic version of a finite-state machine (see Section 4.2).
5.4.3 Semantic Analysis Systems
Semantic analysis systems differ from lexical and syntactic analysis systems in their ability to perform processing at the semantic level, in addition to the lexical and syntactic levels. This ability is a direct result of their encompassing knowledge related to the functionality or tasks associated with the application domain (semantic domain). These systems can be further subdivided according to the form in which they represent this semantic knowledge.
5.4.3.1 Implicit Semantic Representation. Systems acting as natural language interfaces to various interactive computer systems are classifiable under this category if they represent their semantic domain in terms of the underlying system's command language. Examples of such interfaces include LUNAR (Woods, 1973) and LIFER (Hendrix et al., 1978). However, command language syntax is not as suitable for representing the semantic domain of an application as a predicate or lambda calculus based knowledge representation language. This is because the former, since it is equivalent to a procedural description, disallows representation of semantic domain meta-knowledge which could be used in error detection, reasoning, and recovery operations. Consequently, NLP systems employing implicit semantic representation are incapable
of, or extremely ineffective in, producing "intelligent" error messages regarding conceptual errors committed by the user.
5.4.3.2 Explicit Semantic Representation. Systems under this category represent their semantic domain in terms of an intermediate level, which facilitates the explicit identification of semantic domain elements known to the system. More analytically, the knowledge base of such a system encompasses declarative (or procedural) knowledge regarding the conceptual entities known to the system, the actions which manipulate these entities, and the relationships that exist among them. Examples of such NLP systems include SHRDLU (Winograd, 1983), and UNIX Consultant (Wilensky et al., 1984, 1988).
5.4.3.3 LUNAR. One of the first systems to be implemented based on the ATN formalism is Woods’ LUNAR system (Woods, 1973). LUNAR is a natural language interface to a database system built to translate a subset of English into the corresponding database queries. It interfaces with a database containing data about the mineral samples obtained from the Apollo-11 mission to the moon. LUNAR consists of an ATN used to store natural language knowledge, a lexicon containing approximately 3500 words, and a target query language based on predicate calculus. An example of the translation process performed by this system follows:

    (Do any samples have greater than 13 percent aluminum)

    (TEST (FOR SOME X1 / (SEQ SAMPLES) : T ;
          (CONTAIN' X1 (NPR* X2 / (QUOTE AL2O3)) (GREATERTHAN 13 PCT))))
5.4.3.4 Augmented Semantic Grammars. The semantic grammar formalism was invented by Burton for use in the SOPHIE NLP system (Brown and Burton, 1975; Burton, 1976). These grammars are equivalent in descriptive power to context-free grammars. The only difference is that in semantic grammars the syntactic and semantic aspects of the linguistic knowledge have been incorporated into a single framework. Although this formalism has been effectively used for knowledge representation in limited application domains, it is incapable of dealing effectively with context-sensitive linguistic phenomena, such as parallel-association constructs, e.g., “Peter, Paul, and Mary like Bach, Debussy, and Stravinsky, respectively.” Hendrix et al. (1978) developed augmented context-free grammars (ACFGs), which augment semantic grammars by allowing for actions to be associated with each production rule in the grammar. These actions are equivalent to the actions associated with arcs in the ATN formalism. The semantics of associating an action with some production rule is that the specified action is executed if and only if the corresponding production rule is successfully matched against a given
input constituent. This augmentation is shown to make ACFGs equivalent to a Turing machine in terms of generative/descriptive power. Although this formalism can effectively deal with various linguistic phenomena, it lacks conditions to be tested prior to examining a production rule. This deficiency results in reduced parsing efficiency.24

24 Conditions to be tested, as well as actions to be executed, are part of other formally equivalent, but potentially more efficient, formalisms such as augmented phrasal structure grammars (Sowa, 1984) and ATNs (Woods, 1970).

Manaris and Dominick (1993) introduced an additional augmentation to the ACFG formalism, namely the association of preconditions with production rules in the grammar. The resultant formalism, augmented semantic grammars (ASGs), has been shown to be powerful enough to recognize any language recognizable by a computing device, maintain a degree of perspicuousness, and provide an effective mechanism for controlling combinatorial run-time behavior in parsing. ASGs have been used to develop a variety of NLP applications. Examples include a natural language interface to operating systems (see next section); a natural language interface for Internet navigation and resource location; a natural language interface for text pattern matching; a natural language interface for text editing; and a natural language interface for electronic mail management (Manaris, 1994).
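The following is a minimal, hypothetical sketch of the idea behind such augmentations (it is not the ASG or NALIGE notation itself): each production carries an optional precondition, checked before the rule is even tried, and an action, executed only if the rule matches.

    # Hypothetical sketch of productions augmented with preconditions and actions.
    RULES = [
        {   # <make-directory> ::= ("create" | "make" | "mkdir") "directory" <name>
            "lhs": "make-directory",
            "rhs": [("create", "make", "mkdir"), ("directory",), None],   # None matches any name token
            "precondition": lambda ctx: ctx.get("filesystem_writable", True),
            "action": lambda tokens: ("create-directory", {"directory": tokens[2]}),
        },
    ]

    def parse(tokens, context):
        for rule in RULES:
            if not rule["precondition"](context):          # skip rules whose precondition fails
                continue
            if len(tokens) == len(rule["rhs"]) and all(
                alt is None or tok in alt for tok, alt in zip(tokens, rule["rhs"])
            ):
                return rule["action"](tokens)              # action fires only on a full match
        return None

    print(parse(["create", "directory", "projects"], {"filesystem_writable": True}))
    # -> ('create-directory', {'directory': 'projects'})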
5.4.3.5 Natural Language Interfaces to Operating Systems. Manaris (1994) presents a natural language interface to UNIX and its subsequent porting to MS-DOS, VAX/VMS, and VM/CMS. Although it has some pragmatic-level knowledge incorporated into its knowledge base, it is best classified as a semantic analysis system, as it does not maintain any knowledge on dialog structure. This system has been developed as a demonstration of the NALIGE user interface management system, which can be used to construct natural language interfaces to interactive computer systems (see Section 6.4.1). The interface to UNIX handles a variety of user tasks including:

• File Tasks: copy, delete, display, edit, print, and send via e-mail
• Directory Tasks: create, delete, list contents, and display name
• Other Tasks: change user password, display user information, display users on system, list e-mail messages, and send e-mail messages.
Figure 6 shows an excerpt from the augmented semantic grammar included in this interface.
FIG. 6. ASG excerpt from UNIX natural language interface (Manaris, 1994) (reprinted with permission of World Scientific).

5.4.4 Pragmatic Analysis Systems

Pragmatic analysis systems improve on the natural language understanding capabilities of semantic analysis systems. This is because the main objective of
these systems is to participate in extended dialogues with users over some specific area of world knowledge. These systems employ a significant subset of world knowledge associated with a given application, in order to facilitate a deeper understanding of a given natural language input. Specifically, pragmatic analysis systems attempt to derive the deeper meaning or implications of natural language utterances by performing inference on pragmatic discourse elements, such as the goals of dialogue participants, social protocols associated with a given situation, and facts derived from earlier parts of the dialogue/story. For example, consider the sentence “The gas is escaping!” in a story-understanding application. Although the semantics of this sentence is clear, its pragmatics is ambiguous. More specifically, if the sentence is uttered by a chemistry instructor to a student performing an experiment in a chemistry lab, then a pragmatic analysis system might infer the following facts:

1. The student has been careless.
2. The instructor is displeased.
3. The student may receive a low grade in this lab.
On the other hand, if the dialogue is taking place in a San Francisco building following a major earthquake, the system might infer:
1. The earthquake has ruptured a gas line.
2. There is imminent danger of an explosion.
3. The building has to be evacuated immediately.
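A toy sketch of this kind of context-dependent inference (purely illustrative; real pragmatic analysis systems rely on far richer plan, goal, and discourse reasoning) might key a set of inference rules on the discourse situation:

    # Toy context-dependent pragmatic inference for "The gas is escaping!"
    INFERENCE_RULES = {
        "chemistry-lab": [
            "The student has been careless.",
            "The instructor is displeased.",
            "The student may receive a low grade in this lab.",
        ],
        "post-earthquake-building": [
            "The earthquake has ruptured a gas line.",
            "There is imminent danger of an explosion.",
            "The building has to be evacuated immediately.",
        ],
    }

    def pragmatic_inferences(utterance, situation):
        # The same semantics yields different implications in different situations.
        if utterance == "The gas is escaping!":
            return INFERENCE_RULES.get(situation, [])
        return []

    print(pragmatic_inferences("The gas is escaping!", "post-earthquake-building"))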
Pragmatic analysis is necessary for story understanding and discourse analysis. In the last few years, pragmatic analysis has been incorporated into natural language and speech understanding interfaces for a wide variety of applications, including electronic appointment scheduling, battlefield simulation, currency exchange information, electronic calendar, electronic mail, real-time machine translation, Rolodex, stock quote access, voice mail, and weather forecast access (Alexandersson et al., 1997; Busemann et al., 1997; Martin et al., 1996; Moore et al., 1997; Waibel, 1996; Wauchope et al., 1997).
5.4.4.1 UNIX Consultant. UNIX Consultant (UC) is a natural language understanding system whose objective is to advise users of the UNIX operating system (Wilensky et al., 1984, 1988). This system allows users to obtain information about the usage of commands, such as command language syntax, online definitions of general UNIX terminology, and command-line debugging problems.25 UC employs a pattern-action formalism for processing of natural language inputs, which is formally equivalent to an ATN. It has been implemented using the LISP and PEARL programming languages. Actually, the objectives of UC are of a wider scope than those of the natural language interface applications discussed in the previous sections. This is because, instead of simply providing a natural language front-end to a particular operating system, UC attempts to improve a user’s understanding of the UNIX environment by participating in possibly extended question-and-answer interactive sessions. Consequently, in addition to natural language understanding, it combines concepts from a number of AI areas, such as natural language generation, situation planning, and problem solving (Chin, 1983). The rationale for developing UC is to provide naive users (having to phrase some specific request in an unfamiliar command language) with a more attractive alternative than locating a knowledgeable user consultant and/or searching through some esoteric manual. UC employs a natural language analyzer able to produce user-friendly error feedback when faced with ill-formed input. This system incorporates an extensible knowledge base of facts about UNIX and the English language.

25 A presentation of a similar natural language understanding system, namely UNIX-TUTOR, appears in Arienti et al. (1989).
5.4.4.2 JANUS-II. Janus-II is a speech translator that operates on spontaneous conversational dialog in limited domains (Waibel, 1996). It currently includes vocabularies of 10 000 to 40 000 words and accepts input in English, German, Japanese, Spanish, and Korean. Its output can be any of these languages.
The system consists of modules for speech recognition, syntactic and semantic parsing, discourse processing (contextual disambiguation), and speech generation. It incorporates HMMs and HMM-neural network hybrid techniques to generate the most promising word hypotheses. For parsing it employs semantic grammars within a pattern-based chart parser (Phoenix) and a stochastic, fragment-based generalized LR* parser. The result is a language-independent representation (an Interlingua) that is used by the generation part of the system to produce a spoken translation of the input in the desired output language. Although the Interlingua approach dates from the early MT days of NLP, it continues to be a very viable approach for the following reasons:

• It dissociates the syntactic structures of the input and output languages.
• It facilitates introduction of new languages to the system, as the linguistic knowledge which is specific to a given language can be contained within only a few modules of the system.26
• It allows generating output in any language, including the original input language. This facilitates feedback to the user, since, as the input gets converted to the Interlingua representation, it is possible that errors may have been introduced. This approach allows the user to verify that the system has “understood” the input correctly.

26 This point is best illustrated in the context of the Eurotra MT project, which covers 72 language pairs (Arnold, 1986; Maegaard and Perschke, 1991); by isolating the language dependencies from as many modules of the architecture as possible, only the language-dependent modules have to be rewritten as new languages are introduced.
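A minimal sketch of the Interlingua idea (hypothetical; the actual Janus-II Interlingua is far more expressive) is a language-neutral frame from which surface text in any supported language, including the original input language, can be generated:

    # Hypothetical language-neutral frame and per-language generators (a toy Interlingua).
    interlingua = {"act": "request", "event": "schedule-meeting", "day": "tuesday"}

    DAY_NAMES = {"english": {"tuesday": "Tuesday"}, "german": {"tuesday": "Dienstag"}}

    def generate(frame, language):
        day = DAY_NAMES[language][frame["day"]]
        if language == "english":
            return f"Could we schedule a meeting on {day}?"
        if language == "german":
            return f"Könnten wir am {day} ein Treffen vereinbaren?"
        raise ValueError("unsupported language")

    # Generating back into the input language lets the user verify what was "understood";
    # adding a new output language requires only a new generator, not a new analyzer.
    for lang in ("english", "german"):
        print(generate(interlingua, lang))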
Janus-II has been used to develop several speech-translation application prototypes, including a videoconferencing station with a spoken language interpretation facility, and a portable speech translator running on a portable computer incorporating headphones and a wearable display for output. As systems move up the knowledge-level classification, they acquire more capabilities in handling natural language phenomena. But what are some of these natural language phenomena and associated problem areas? The next section addresses these issues.
5.5 Problem Areas
Cercone and McCalla (1986) and Grosz et al. (1987) discuss many issues that arise in the context of processing natural language. These issues include augmenting linguistic coverage, generalizing parser capabilities and robustness, incorporating pragmatics and meta-knowledge (knowledge about the domain and the system), devising comprehensive knowledge representation schemes, porting linguistic knowledge, modeling the user, and handling various discourse
phenomena. The latter include reference resolution, ambiguity resolution, handling ellipsis, monitoring user focus, handling incomplete knowledge, representing exceptions, summarizing responses, handling time dependencies, and dealing with hypothetical queries, references to system-generated concepts, system knowledge updates, and user goals. Weizenbaum (1976, p. 204) argues that, similarly to Einstein’s ideas on the relativity of motion, intelligence is also meaningless without a frame of reference. In everyday interaction, we provide such frames of reference based on our own cultural, educational, and social background and the situation at hand. The same can be argued for language competence and performance. Although it would be wonderful to have a modeling theory and associated algorithms that accounted for and handled the complete spectrum of linguistic issues arising in human-human interaction, this is not necessarily required for achieving effective human-computer interaction. This is because human-computer interaction applications are always developed within specific frames of reference, that is, specific application domains. Therefore, one should focus on methodologies for developing models which provide effective linguistic coverage in specific application domains. The next section discusses three methodologies which provide significant insights into this issue.
5.6 Linguistic Coverage

Furnas et al. (1987, 1983) discuss issues related to spontaneous language use at the user interface. Although they focus on the selection of natural language words to describe semantic elements (objects, actions) in command language interfaces, their study provides intuitions regarding the potential for unconstrained natural language in human-computer interaction, as well as the development of constrained linguistic models. They indicate that the variability of spontaneous word choice is “surprisingly large,” in that in all cases they studied, the probability that the same natural language word is produced by two individuals at the interface to describe a single semantic entity is less than 0.20. Specifically, they discuss six general schemes for assigning natural language descriptions to semantic elements at the interface. Additionally, they present results from several experiments designed to evaluate each of these schemes with respect to the resultant systems’ performance measured against expected user behavior. They conclude that, out of the six schemes, the following three are the most important: the weighted random one-description-per-object scheme, which they also refer to as the armchair model; the optimized one-description-per-object scheme, which will be subsequently referred to as the optimal single-description model; and the optimized multi-description-per-object scheme, which will be subsequently referred to as the optimal multi-description model.
5.6.1 Armchair Model
The armchair model corresponds to the standard approach used in deriving command names and syntactical structures for input languages to interactive computer systems, such as operating environments and information systems (Furnas et al., 1987). The underlying idea is that each semantic element available in the system is associated with a single natural language description. Moreover, this description is chosen by a system expert, usually the system designer, according to his/her personal intuition (“armchair” introspection) as to what constitutes a proper naming convention. The data derived from the conducted experiments suggest that this popular method is highly unsatisfactory. More analytically, it is observed that, if the natural language description of a semantic element known to a system has been derived using the armchair method, untutored subjects will fail 80 to 90% of their attempts to successfully access this semantic element.27 It should be noted that the experiments conducted by Furnas et al. concentrate on semantic element descriptions which consist of a single word. However, they suggest that usage of multiword descriptions is certain to result in even lower performance. Clearly, this is highly undesirable for human-computer interaction. Actually, the source of the problem is that:

[T]here are many names possible for any object, many ways to say the same thing about it, and many different things to say. Any one person thinks of only one or a few of the possibilities. Thus, designers are likely to think of names that few other people think of, not because they are perverse or stupid, but because everyone thinks of names that few others think of. Moreover, since any one person tends to think of only one or a few alternatives, it is not surprising that people greatly overrate the obviousness and adequacy of their own choices (Furnas et al., 1983, p. 1796).

27 Furnas et al. (1987, p. 966) point to similar results reported in other studies.
The usual solution has been for system designers to rely on the fact that, through practice, users will eventually learn this linguistic model-a prescriptive linguistics approach. When the semantic coverage of the system is relatively small, this method works fairly well. Actually, it has been shown that, for small systems, using a semantically unrelated and/or random naming convention has little or no significant effect on initial learning by untrained users (Landauer et al., 1983). But if the system is large and its use is intermittent, this approach is unacceptable from a human factors point of view.
5.6.2 Optimal Single-Description Model
The next alternative is to discard the convenience sampling method of the armchair model, i.e., focusing on the personal taste or intuition of a single individual,
and instead attempt to employ a more objective selection method. This method takes into consideration the preferences of numerous subjects and uses them to derive an optimal single description for each element in the system’s semantic domain based on frequency distribution. This approach has been popular in the context of command-line interfaces, where there is usually only one way to describe one action (Furnas et al., 1987, 1983). It has been shown that this method approximately doubles the chances of serendipity over the armchair model. That is, an untrained user has double the chance of generating the appropriate language understood by the underlying system. Although this is a significant improvement over the armchair method, it is not good enough to be used in practice, since, even with this method, subjects fail to communicate successfully with the underlying system 65 to 85% of the time.
5.6.3 Optimal Multi-Description Model

The main problem with the previous models is that they fail to take into consideration the wide variety of language that users spontaneously produce with respect to a specific semantic element. As a result, the performance of the resultant systems with respect to untrained users is highly unsatisfactory. The optimal multi-description model differs from the armchair and optimal single-description models in that it provides the resultant system with an “unlimited” number of aliases describing individual semantic elements (Furnas et al., 1987, 1983). Specifically, the major characteristics of this model are the following:

• The system has the capability to acquire over time all descriptions which users may use to refer to a specific semantic element.
• Data is also collected to identify the alternative meanings associated with each of the above descriptions.
• In the case of semantic ambiguity, the system presents the user with the list of alternative interpretations.

The results of the conducted experiments indicate that there is significant improvement over the previous models. Specifically, when the number of aliases reaches 20, the percentage of success is within 60% to 95%. The significant improvement over the previous models is due to the fact that most descriptions produced spontaneously by users are “rare” in nature, in the sense that there exists a very small probability that other users will choose exactly the same description to refer to a given semantic element. This spontaneous generation of natural language descriptions tends to follow Zipf’s distribution (Zipf, 1949). Specifically, if we plot the logarithm of the frequencies of such descriptions against the logarithm of the corresponding ranks, we produce a straight line having a slope of -1. The interpretation of this observation is that few linguistic constructs are used very frequently, whereas the vast majority of them are used only rarely. A similar observation is made in the context
of the Brown corpus, a representative corpus of American English, where about half of the word types appear only once-approximately 32 000 out of 67 000 total word types (Marcus, 1994). Due to the complexity of natural language, it is practically impossible to expect that a general NLP system will ever be able to collect a ‘complete’ model of natural language even with respect to a single user. Nevertheless, it could always improve its performance in an asymptotic manner. The optimal multi-description model is consistent with the current success of corpus-based stochastic and connectionist approaches. These approaches, in addition to being statically trained through corpora, which contain a wide variety of linguistic data, are also capable of dynamic adaptation based on the actual linguistic performance of users. An example of this is found in many speech recognition systems which improve their performance by constantly updating their stochastic model of phonetic knowledge during interaction with the user. Lately, it has been shown that corpora may also be utilized in the automatic development of symbolic models; this might lead to results demonstrating that the relative success of stochastic and connectionist approaches, over traditional symbolic ones, may be due to their corpus-based nature and interactive learning capabilities, as opposed to some other inherent characteristic (Marcus, 1994). This possibility is supported by Wegner’s (1997) claim on how ‘all forms of interaction transform closed systems to open systems and express behavior beyond that computable by algorithms’ (p. 83).
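The Zipf-like behavior described above is easy to check empirically: the least-squares slope of log frequency against log rank should come out close to -1 for natural text. A small illustrative sketch (the synthetic corpus below is only a stand-in; any tokenized real corpus could be passed instead):

    # Estimate the Zipf slope of a frequency distribution (log frequency vs. log rank).
    import math
    import random
    from collections import Counter

    def zipf_slope(tokens):
        freqs = sorted(Counter(tokens).values(), reverse=True)
        xs = [math.log(r) for r in range(1, len(freqs) + 1)]
        ys = [math.log(f) for f in freqs]
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        # least-squares slope of log-frequency on log-rank
        return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)

    # Synthetic "corpus" drawn from a 1/rank distribution; a list of words from a
    # real corpus could be substituted here.
    random.seed(0)
    vocab = [f"w{i}" for i in range(1, 2001)]
    weights = [1.0 / i for i in range(1, 2001)]
    tokens = random.choices(vocab, weights=weights, k=200000)
    print(f"estimated slope: {zipf_slope(tokens):.2f}")   # roughly -1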
5.6.4 Controlled Languages

Another approach for providing quasi-unconstrained natural language is through the use of controlled languages (Church and Rau, 1995). This involves restricting the natural language available at the interface in a systematic and easily remembered fashion, while simultaneously allowing for some freedom of linguistic expression. These restrictions may be imposed at any level of linguistic knowledge, such as phonetic, lexical, syntactic, or semantic. Actually, in a sense, the functionality of the underlying applications already restricts the pragmatics. Examples include certain speech recognition systems (phonetic, lexical), spelling checkers (lexical), and style checkers (lexical, syntactic, semantic). This approach is very successful in developing effective linguistic models for human-computer interaction. This is because the interface can ‘subliminally educate’ users through feedback (Slator et al., 1986), while the linguistic model of the interface gradually adapts to users’ linguistic preferences through appropriate extensibility mechanisms. Of course, these mechanisms are tightly bound to the type of linguistic modeling approach employed, namely symbolic, stochastic, connectionist, or hybrid. The only weakness of this approach is that in some cases, due to the size of the linguistic model, or the deviance of the user from the
linguistic model, data collection (adaptation) may continue for a long time before asymptotic performance is approached.28

28 This is the case in many current speech recognition systems, as their linguistic models make implicit assumptions regarding the users’ educational, cultural, and native-language backgrounds. For instance, users with non-American accents or speech impediments cannot fully benefit from such applications.
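As a minimal illustration of lexical-level control (a hypothetical sketch, not any of the systems cited above): inputs outside the controlled vocabulary are rejected with feedback that ‘subliminally educates’ the user, while user-specific aliases learned over time let the model adapt to the user’s linguistic preferences.

    # Toy lexical controlled-language checker with user-taught aliases.
    CONTROLLED_VOCABULARY = {"copy", "delete", "display", "file", "directory", "the", "a"}
    ALIASES = {"remove": "delete", "show": "display", "folder": "directory"}

    def check(utterance):
        normalized, unknown = [], []
        for word in utterance.lower().split():
            word = ALIASES.get(word, word)          # adapt to learned user preferences
            (normalized if word in CONTROLLED_VOCABULARY else unknown).append(word)
        if unknown:
            return f"Unknown words {unknown}; please rephrase using: {sorted(CONTROLLED_VOCABULARY)}"
        return " ".join(normalized)

    print(check("remove the file"))     # -> "delete the file"
    print(check("shred the file"))      # -> feedback listing the controlled vocabulary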
6. Multimodal Interaction

Until the early 1980s, the prevalent interactive style was command entry. However, in the mid-1980s another user interface paradigm became popular, namely Windows, Icons, Menus, and Pointing (WIMP). This introduced new possibilities, such as direct manipulation, which have resulted in today’s graphical user interfaces. It is now clear that the user interface designer has several building components (interaction styles, input/output devices) available from which to develop an effective interface. In terms of interaction styles, these include command-line entry, menus and navigation, question and answer dialogs, forms, natural language, and direct manipulation. In terms of devices, these include keyboard, display, mouse, trackball, joystick, touch screen, microphone, speakers, video camera, dataglove, datasuit, 3D tracker, and various other types of sensors and actuators (Priece et al., 1994; Shneiderman, 1998). Naturally, this list is constantly evolving, as it depends on the state of the art in technology for interfacing with human communication channels, namely sight, touch, taste, smell, and hearing. According to Coutaz and Caelen (1991), a multimodal user interface combines various interaction styles, is equipped with hardware for acquiring and rendering multimodal expressions in real time, must select appropriate modality for outputs, and must “understand” multimodal input expressions. This is contrasted to a multimedia interface, which acquires, delivers, memorizes, and organizes written, visual, and sonic information, but ignores the semantics of the information it handles.29

29 Another term for multimodal user interfaces is intelligent multimedia interfaces (Maybury, 1993).

Additionally, Coutaz and Caelen (1991) identify a taxonomy for multimodal user interfaces as follows:

• An exclusive multimodal user interface allows one and only one modality to be used in rendering a given input/output expression.
• An alternative multimodal user interface allows alternative modalities to be used in rendering an input/output expression.
• A concurrent multimodal user interface allows several input/output expressions, possibly in different modalities, to be rendered in parallel.
• A synergistic multimodal user interface allows components of input/output expressions to be rendered in different modalities.
Since natural language is one of the prevailing modalities for human-human interaction, it is a natural candidate modality for effective, user-friendly human-computer interaction. This has led to an extension of the WIMP paradigm, namely WIMP++. The latter incorporates additional modalities, such as natural language and animation (Hirschman and Cuomo, 1994). However, some HCI researchers indicate that although natural language at the interface has several advantages, it is not a panacea. Shneiderman (1998, p. 294), for example, states that “[P]eople are different from computers, and human-human interaction is not necessarily an appropriate model for human operation of computers.” So, the question is “what is the benefit, if any, from having natural language modalities available at the interface?” The next section reports on three studies that address this issue.
6.1 Effects of Natural Language on User Performance

Ledgard et al. (1980) report on an experiment conducted at the University of Massachusetts, Amherst, in association with Digital Equipment Corporation. One of this study’s assertions is that an interactive system should facilitate use of familiar, descriptive, everyday words and legitimate English phrases at the interface. The experiment’s objective was to test the above hypothesis and, simultaneously, demonstrate the effects that human engineering can have on commercially available software in terms of human efficiency, performance, and satisfaction. The study involved users with varying degrees of computing experience. Users were divided into inexperienced, familiar, and experienced groups. Additionally, the study utilized two semantically equivalent versions of an interactive computer system (a text editor). The differences between the two versions were confined to the input syntax level. One employed a natural language front-end which accepted flexible, yet constrained, natural language; this was the only input modality provided. The other understood the standard computer system syntax, which consisted of abbreviated, cryptic keywords and rigid syntactical rules. One half of the subjects were randomly assigned to the natural language equipped system, whereas the other half were assigned to the standard system. Subsequently, the subjects were given a number of tasks to be performed with the assigned computer system. These tasks were compound in nature, in the sense that each required a sequence of actual system commands in order to be accomplished. On all measures, and regardless of user expertise, performance using the natural language front-end proved superior to performance using the regular command language. Finally, they conducted a post-experimental study to test asymptotic
performance. They found that significant performance differences between the two versions remained even after a long period of use. Similarly, Napier et al. (1989) report on a closely related study comparing the performance of novice users using two syntactically different (but semantically equivalent) versions of the Lotus 1-2-3 spreadsheet system. The first version employed a restricted natural language front-end, whereas the second one employed the standard user interface, i.e., command language and menu selection. They found that there was a clear and consistent advantage in favor of the natural language equipped system. Finally, Pausch and Leatherby (1991) compare the use of discrete speech input against menu selection. They found a considerable performance increase (21%) in users who had access to both voice selection and mouse, as opposed to those with access only to a mouse. Although all the above studies indicate that natural language is beneficial at the interface, one should not assume that this is always the case. For instance, Napier et al. (1989) point to other studies that are critical of natural language being the only available modality at the interface. Since natural language is not appropriate for every type of information exchange in human-human interaction, it is not surprising that this may also be the case in certain types of information exchange in human-computer interaction. Obviously, investigation of the application domain, as well as good design of the interface, are essential to user acceptance, regardless of whether the interface incorporates natural language or not. That is, developers should not religiously employ a single modality, such as natural language, at the interface, but instead select the best combination among the set of available modalities for building an interactive system.
6.2 Natural Language Widgets
There already exist several popular toolkits and environments, such as X Window and MS-Windows, which facilitate development of user interfaces by making available a collection of user interface building elements. How could we possibly extend such collections to incorporate natural language building blocks? Hall et al. (1996) introduce the concept of natural language edit controls (NLECs). NLECs are special user interface building elements which facilitate use of natural language for specific input expressions in multimodal user interfaces. NLECs are an extension of traditional text entry controls, in that they facilitate rendering a portion of a user input in natural language. When combined with the other traditional controls available at the interface, such as buttons, check boxes, slidebars, and menus, they can result in the most effective interface for a given application. Although NLECs are an excellent idea, they are nevertheless limited in that they deal only with typewritten natural language.
One could extend this concept to a more general natural language widget, of which an NLEC is only an instance. Specifically, a natural language widget is defined as a user interface widget that incorporates a generic NLP engine capable of handling a limited domain of discourse. Natural language widgets may come in different flavors, as they may employ any of the approaches for modeling natural language (see Section 4), may incorporate as many or as few knowledge levels as necessary (see Section 5.3), and may implement any kind of natural language application that is relevant to the task at hand, such as speech recognition, voice identification, word spotting, and spoken or written natural language generation.
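A hypothetical sketch of the natural language widget idea follows (the class and function names are illustrative, not an existing toolkit API): the widget wraps a pluggable NLP engine restricted to a limited domain of discourse and reports interpretations back to the application through a callback, much as a button reports clicks.

    # Hypothetical natural language widget wrapping a small, domain-limited NLP engine.
    from typing import Callable, Optional

    class NaturalLanguageWidget:
        def __init__(self, interpret: Callable[[str], Optional[dict]],
                     on_interpretation: Callable[[dict], None]):
            self.interpret = interpret                  # pluggable engine: text -> semantic frame
            self.on_interpretation = on_interpretation  # application callback, like a click handler

        def submit(self, text: str) -> None:
            frame = self.interpret(text)
            if frame is None:
                print(f"Sorry, I did not understand: {text!r}")
            else:
                self.on_interpretation(frame)

    # A toy engine limited to an appointment-scheduling domain of discourse.
    def toy_engine(text: str):
        words = text.lower().split()
        if "meeting" in words and "tuesday" in words:
            return {"action": "schedule", "what": "meeting", "day": "tuesday"}
        return None

    widget = NaturalLanguageWidget(toy_engine, lambda frame: print("application received:", frame))
    widget.submit("please schedule a meeting on Tuesday")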
6.3 Modality Integration

Oviatt and Cohen (1991, p. 70) indicate that modalities

physically constrain the flow and shape of human language just as irresistibly as a river bed directs the river’s current. ... Although communication modalities may be less visually compelling than the terrain surrounding a river, it is a mistake to assume that they are less influential in shaping the [information] transmitted within them.
Spoken or written natural language is clearly not appropriate for all tasks at the user interface. This is supported by the fact that human-human interaction includes additional modalities such as pointing, drawing, and various other gestures (Oviatt et al., 1997). Specifically, some things can be best pointed at, or drawn, or selected from a list of alternatives. For this reason, natural language needs to be combined with other traditional or non-traditional modalities in order to construct effective user interfaces. This introduces the question “what tasks (or portions of tasks) are best specified through natural language?”
6.3.1 Effective Use of Natural Language

Hall et al. (1996) provide a decision procedure for when to employ natural language over deictic controls, i.e., controls utilizing a pointing device such as a mouse, pen, or finger. Extending their ideas in order to accommodate both actions and objects, it appears that natural language is best for input tasks where the set of semantic elements (objects, actions) from which to choose is large, unfamiliar to the user, or not well-ordered; or small and unfamiliar, with no bijective mapping30 between that set and a set of familiar elements.

30 A bijective mapping is a one-to-one mapping between two sets which includes all elements from both sets.
Figure 7 shows the extended decision procedure for selecting between natural language widgets and traditional deictic controls.
6.3.2 Synergistic Multimodal Interfaces with Natural Language

Oviatt et al. (1997) discuss integration patterns of input modalities in the context of synergistic multimodal interfaces that include natural language. They report that multimodal interaction is most frequent for tasks involving spatial location, and somewhat frequent for tasks involving selection. Integration patterns consist of sequential, simultaneous, point-and-speak, and compound rendering of input expressions. In temporal terms, written input may overlap with spoken input subexpressions, with written (pen) input providing location information at the beginning of an expression. For example, in the context of a user interface to an operating system, a user may circle two file icons while saying “delete these”. Additionally, spoken and written modalities supply complementary (as opposed to redundant) semantic information. For instance, a user might type in a file name and say “edit this”. In a different study, Cohen et al. (1997) have found that multimodal interaction makes users generate simpler linguistic constructs than unimodal speech interaction. For example, in order to create a red line between two (x,y) grid coordinates, a user might say “create a red line from one three five point eight to four six seven point nine”; whereas (s)he could draw a line on the map while saying “red”.
IF (familiar set) THEN
    IF (large set) AND (not well-ordered set) THEN
        use a text entry control        // user types familiar object
    ELSE
        use a deictic control           // user selects familiar object
ELSE                                    // unfamiliar
    IF (exists bijective mapping) AND ((small set) OR (well-ordered set)) THEN
        use a deictic control           // user selects familiar object
                                        // system maps to unfamiliar object
    ELSE
        use a natural language widget

FIG. 7. Extended decision procedure for use of natural language widgets (adapted from Hall et al., 1996).
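For reference, the decision procedure of Figure 7 can be restated directly as code; the boolean parameters below simply name the properties discussed above (familiarity, size, ordering, existence of a bijective mapping).

    # Decision procedure of Figure 7, restated as a function (adapted from Hall et al., 1996).
    def choose_control(familiar, large, well_ordered, bijective_mapping_exists, small):
        if familiar:
            if large and not well_ordered:
                return "text entry control"        # user types a familiar object
            return "deictic control"               # user selects a familiar object
        # unfamiliar set
        if bijective_mapping_exists and (small or well_ordered):
            return "deictic control"               # user selects a familiar object;
                                                   # system maps it to the unfamiliar one
        return "natural language widget"

    print(choose_control(familiar=False, large=True, well_ordered=False,
                         bijective_mapping_exists=False, small=False))
    # -> natural language widget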
One example of a multimodal interface which includes speech input is CommandTalk (Moore et al., 1997). This is a user interface to the ModSAF battlefield simulator which allows the use of natural language to create forces, assign missions, modify missions during execution, and control various simulation functions, such as map display control. Another example of a multimodal interface that incorporates natural language is QuickSet (Cohen et al., 1997). QuickSet combines pen and speech to manage distributed interactive simulations. It combines widgets (agents) that handle speech recognition, written natural language, and gesture recognition (pen input). Finally, MedSpeak is a multimodal interface for creating radiology reports (Lai and Vergo, 1997). It accepts input via speech (dictation and command modes), mouse, and keyboard.
6.4 User Interface Management Systems
User interface management systems (UIMSs) are environments which facilitate the specification, design, implementation, evaluation, and run-time support of user interfaces (Bass and Coutaz, 1991). Although they have been used extensively in developing graphical user interfaces, only recently have they been used in the context of natural language interfaces. Such systems are extremely significant since they facilitate the development of natural language widgets (see Section 6.2).
6.4.1 NALIGE

An example of such an environment is NALIGE (Manaris and Dominick, 1993; Manaris, 1994). NALIGE facilitates the development of natural language interfaces to interactive computer systems through the use of high-level specifications. These specifications describe the linguistic model to be incorporated in the interface in terms of lexical, syntactic, semantic, and pragmatic knowledge. As shown in Figure 8, the NALIGE environment consists of several subsystems, namely:

• Specification editor modules: These editors provide context-sensitive assistance in the development of the NALIGE input specifications.
• Specification compiler modules: These compilers accept the input specifications and convert them to an efficient internal representation (declarative and procedural knowledge).
• Specification integrity-checker module: As the input specifications sometimes have to reference the same entity, e.g., a token, this module performs checks to enforce inter-specification integrity.
• Subsystem template module: This module contains generic procedural components, such as code templates for a lexical analyzer, a parser, a target-code generator, a low-level interface to the underlying system, a knowledge-base manager, and an error handler.
• Application generator module: This module combines the necessary code templates with the declarative and procedural knowledge produced by the compilers to generate an operational natural language interface.

FIG. 8. NALIGE architecture (Manaris, 1994) (reprinted with permission of World Scientific).
6.4.2 SpeechActs

Another example of such a system is SpeechActs (Martin et al., 1996). This is a user interface management system for developing loosely coupled speech understanding applications. The architecture facilitates the incorporation of commercial speech recognizers as front ends to interface applications. Currently, the system is compatible with the following continuous-speech recognizers: BBN’s Hark (Smith and Bates, 1993), Texas Instruments’ Dagger (Hemphill, 1993), and Nuance Communications (Digalakis and Murveit, 1994).
6.5 Development Methodologies
There exist many different system development methodologies employed in human-computer interaction (Priece et al., 1994). One of the most promising is the Star model (Hix and Hartson, 1993) (see Figure 9). This model encompasses a user-centered approach,31 since it provides for constant evaluation of all aspects of system development by users and experts. It stresses the idea that system development activities should follow a flexible order. It facilitates top-down (analytic), bottom-up (synthetic), and iterative-refinement types of development, the latter in tighter, smaller loops than spiral development methods. Finally, it emphasizes rapid prototyping and incremental development (Priece et al., 1994).
FIG. 9. The Star model for NLP system development (adapted from Priece et al., 1994).
31 The basic idea behind user-centered design is to incorporate end-users throughout the development of a product or application. This is somewhat intrinsic in the process of collecting linguistic data for stochastic or connectionist applications, since users are involved in the development by contributing to the linguistic model to be incorporated in the NLP application; however, user involvement needs to extend to every step in the system development process.
In the context of developing natural language widgets, the Star model may be adapted as follows:

• Task analysis: Identify the tasks to be performed through the natural language widget. This could be performed in various ways, including study of corpora collected through Wizard-of-Oz techniques.32 This phase is essential in that it provides information about the functionality that the widget is to provide through its “understanding” of natural language.
• Linguistic analysis: Specify (or automatically capture) the sublanguage to be modeled. Depending on the expected size of the sublanguage and the knowledge modeling approach (symbolic, stochastic, connectionist, hybrid), this phase might include specification of vocabulary, syntax, semantic elements (entities, actions), or dialog structure.
• Conceptual design: Identify the internal architecture of the widget in terms of knowledge components, procedural modules, and their interaction. A UIMS can greatly assist during this phase in that it may provide design support based on existing linguistic knowledge and processing modules.
• Prototyping: Develop an operational prototype of the system. A UIMS can greatly assist during this phase in that it may combine the sublanguage model captured during the linguistic analysis phase with existing knowledge and processing modules to construct a functional prototype.
• Evaluation: Evaluate the deliverable of any of the above phases. A UIMS can greatly assist during this phase by providing specific benchmarks and automated evaluation tools to be used in formative and summative evaluation. These benchmarks and tools should facilitate utilization of user expertise (possibly through collected corpora) to test and measure the effectiveness or relative completeness of any of the above deliverables.

32 Wizard-of-Oz techniques employ simulated NLP systems, which collect linguistic data from users. The systems’ natural language understanding capabilities are simulated by remotely located humans.
Additional information on user interface development, including task analysis, design, and prototyping, may be found in Day and Boyce (1993), Priece et al. (1994), and Shneiderman (1998). Significant work has been carried out in NLP system evaluation (Hirschman and Cuomo, 1994; King, 1996; Moore, 1994b; Pallett et al., 1994; Spark Jones, 1994). However, existing techniques and methodologies require additional development, so that they (a) take into account user tasks and environments, in terms of the influence that these have on system development and performance, and (b) address natural language as one of many available modalities, in the context of multimodal human-computer interaction (Spark Jones and Galliers, 1996).
7. Conclusions

The field of natural language processing has entered its sixth decade. During its relatively short lifetime, it has made significant contributions to the fields of human-computer interaction and linguistics. It has also influenced other scientific fields such as computer science, philosophy, mathematics, statistics, psychology, biology, and engineering by providing the motivation for new ideas, as well as a computational framework for testing and refining existing theoretical assumptions, models, and techniques. Finally, it has impacted society through applications that have shaped and continue to shape the way we work and live our lives. But other events have occurred in these fifty years. For instance, it was initially believed that no more than a handful, so to speak, of computers would ever be needed around the world. It was also predicted that enabling these machines with (artificial) intelligence would be only a matter of a few years’ worth of research and development. Both predictions were wrong. They did, however, set the stage for a ping-pong effect felt throughout the history of the field. On one side, enter overzealous critics who interfere with creative thinking and vision, e.g., the ALPAC report and the dampening effect it had on the field’s evolution; on the other side, enter overzealous enthusiasts who make unrealistic, full-of-hype claims and promises, e.g., the translation bureaus which had opened up in big cities promising fully automatic, high-quality MT services, only to close down in a few months (Josselson, 1971). Numerous examples of this effect can be seen throughout the evolution of the field. Assuming that historical patterns can provide insights about the future, it is clear that we need to support visionary, far-reaching, yet well-founded research ideas, while keeping our feet on the ground and maintaining a critical view with respect to our intellectual and modeling limitations and their effect on the capabilities and potential of technology. In these fifty years, computers have become ubiquitous. Consequently, the field of human-computer interaction has become extremely important, as it focuses on bridging the communicative gap between humans and machines. This is accomplished by studying the nature of human communication and by deriving models which attempt to augment the flexibility, learnability, robustness, and overall habitability of computing tools. It is hoped that results from this field will increase our quality of life by facilitating the seamless integration of computing devices into the fabric of society; computer users will be able to focus more on the things they want to accomplish, as opposed to the actual user interfaces. In terms of natural language in human-computer interaction, we now have several linguistic modeling approaches available, namely symbolic, stochastic, connectionist, and hybrid. Currently, we have a relatively good understanding of the symbolic approach, in that it seems that we have mapped out the limits of the
region it best addresses within the NLP problem space. Based on our relatively incomplete understanding of stochastic and connectionist approaches, it appears that these address a different region of the NLP problem space, although some overlap might exist with respect to the symbolic approach. Moreover, symbolic techniques traditionally adhere to the armchair and optimal single-description models, and thus have been generally ineffective from the perspective of users, that is, users other than the system developer(s). On the other hand, corpus-based stochastic and connectionist techniques inherently adhere to the optimal multi-description model, and thus have already produced a wide variety of useful speech- and text-based systems. Although corpus-based stochastic and connectionist models appear to be more effective than traditional symbolic ones, they are nevertheless bound to some notion of “statistical average” with respect to users’ linguistic performance, and thus their effectiveness depends on a given user’s proximity to that “average.” Such models are usually capable of adapting dynamically, and thus are considerably more flexible and robust than symbolic ones. Nevertheless, their adaptation effectiveness is also bound to the user’s proximity to the modeled “average.” For instance, users with speech impediments cannot benefit from the latest advances in speech recognition technology, because such technology is geared towards the “average” user, for obvious marketability reasons. Therefore, as we focus on natural language-enabled interfaces dealing with a wide spectrum of user needs and instances of linguistic competence/performance, our attention needs to shift from general corpora, whose development is expensive and error-prone, to effective methodologies for developing specialized corpora. Such methodologies may be incorporated into user interface management systems that facilitate, among other things, effective development of linguistic models for well-defined user groups, a truly user-centered approach. In terms of multimodal interaction, we need to continue research on integrating natural language with other modalities. This has already been recognized by funding agencies around the world, since in the last few years, they began providing support for basic and applied research in human-computer multimodal communication utilizing, among other modalities, speech, text, and images (Strong, 1996). Moreover, we need to focus on techniques for intermodal translation of communicative information. This will be of great benefit in situations where some modalities are temporarily or permanently unavailable, either because of the task at hand, or because of the user’s capabilities (or lack thereof). Specifically, major results have been achieved in the context of converting text to speech; such results have been incorporated into text-reading software utilizing speech synthesis technology. Nevertheless, certain problems still remain, such as dealing with the two-dimensional nature of text (and the ease of scanning it affords), compared to the one-dimensional nature of speech. Such research will
be extremely valuable to users with certain motor and/or visual disabilities in that it will facilitate interaction with various computing or computer-controlled devices. The benefits of related applications are immense, considering the new possibilities they open to users with disabilities in terms of access to information and control/manipulation of immediate and remote physical environments (Muller et al., 1997). In terms of the future, in the short term, we should experience benefits in various aspects of everyday life from the momentum of the latest research and development efforts, such as dialog-based speech-understanding telephony applications. Actually, several development environments are already beginning to emerge for dialog-based speech processing telephony applications. Additionally, we could see various speech-enabled and voice-related applications appear dealing with security, voice dialing, voice identification, language identification, and language translation (Flanagan, 1994; Wilpon, 1994). As we continue to improve our appreciation of the strengths and weaknesses of linguistic modeling approaches, and our understanding of the nature and use of language, we will become more effective in addressing NLP problems. Considering the boom of the NLP industry within the last decade, it is safe to assume that we have reached a critical point in our understanding of modeling approaches, linguistic phenomena, application requirements, and compromises that we can afford. Clearly, some approaches appear more suitable to specific applications than other approaches, e.g., stochastic approaches to speech recognition. Nevertheless, there seems to be a complexity associated with linguistic model development that transcends the modeling approach. Some believe that this is due to the language phenomenon itself (Ristad, 1993). It is highly probable that achieving truly effective, natural human-computer interaction will be the next bottleneck, in that our inability to understand “how it is that we do what we do” gets in the way of accomplishing this field’s ultimate goal. This has been the case with other scientific endeavors that study aspects of human existence, such as biology, psychology, and cognitive science. Nevertheless, although our models of natural phenomena, such as language, will most probably always remain approximations, it is through this quest for knowledge about self that we are finding out more about who we are, how to be more effective in doing what we do, and thus contribute to the evolution of society and ourselves.

ACKNOWLEDGMENTS

The author would like to thank István Berkeley, Adrienne Broadwater, Subrata Dasgupta, Anthony Maida, Renee McCauley, and Brian Slator for their invaluable comments and suggestions; Brian Slator is acknowledged for his contribution to an early outline, especially on the phases of NLP evolution; István Berkeley for discussions on philosophical and connectionist issues; and Eleni Efthimiou for providing several important references. This work has been supported in part by the Louisiana Board of Regents through grant LEQSF (1997-00)-RD-A-31.
REFERENCES AND FURTHER READING
In addition to the references cited above, the following list contains supplementary references, which are closely related to the topics raised, and thus may be of interest to the reader.

Abney, S. (1997). Part-of-speech tagging and partial parsing. In Corpus-based Methods in Language and Speech Processing (S. Young and G. Bloothooft, eds.), pp. 118-136. Kluwer Academic Publishers, Dordrecht, The Netherlands.
Akmajian, A., Demers, R. A., Farmer, A. K., and Harnish, R. M. (1990). Linguistics: An Introduction to Language and Communication, 3rd edn. The MIT Press, Cambridge, Massachusetts.
Alexandersson, J., Reithinger, N., and Maier, E. (1997). Insights into the dialogue processing of VERBMOBIL. In Proceedings of Fifth Conference on Applied Natural Language Processing, pp. 33-40. Association for Computational Linguistics, Morgan Kaufmann, San Francisco, California.
Allen, J. (1994a). Natural Language Understanding, 2nd edn. Benjamin/Cummings, Redwood City, California.
Allen, J. (1994b). Linguistic aspects of speech synthesis. In Voice Communication Between Humans and Machines (D. B. Roe and J. G. Wilpon, eds.), pp. 135-155. National Academy of Sciences, Washington, District of Columbia.
Arienti, G., Cassaniga, T., Gardin, F., and Mauri, M. (1989). UNIX-TUTOR: an experiment for the use of deep knowledge for tutoring. In INFORMATION PROCESSING 89 (G. X. Ritter, ed.), pp. 569-574. Elsevier, North-Holland, Amsterdam.
Arnold, D. (1986). Eurotra: a European perspective on MT. Proceedings of the IEEE, 74(7), 979-992.
Bar-Hillel, Y. (1960). The present status of automatic translation of languages. In Advances in Computers (F. L. Alt, ed.), pp. 91-163. Academic Press, New York.
Bass, L., and Coutaz, J. (1991). Developing Software for the User Interface. Addison-Wesley, Reading, Massachusetts.
Bates, M. (1994). Models of natural language understanding. In Voice Communication Between Humans and Machines (D. B. Roe and J. G. Wilpon, eds.), pp. 238-254. National Academy of Sciences, Washington, District of Columbia.
Baum, L. E. (1972). An inequality and associated maximization technique in statistical estimation of probabilistic functions of Markov processes. Inequalities, 3, 1-8.
Baum, L. E., and Petrie, T. (1966). Statistical inference for probabilistic functions of finite state Markov chains. Ann. Math. Stat., 37, 1554-1563. Cited in Rabiner and Juang (1993).
Baum, L. E., Petrie, T., Soules, G., and Weiss, N. (1970). A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann. Math. Stat., 41(1), 164-171. Cited in Rabiner and Juang (1993).
Biermann, A. W. (1976). Approaches to automatic programming. In Advances in Computers (M. Rubinoff and M. C. Yovits, eds.), pp. 1-63. Academic Press, New York.
Blank, G. D. (1989). A finite and real-time processor for natural language. Commun. ACM, 32(10), 1174-1189.
Bobrow, D. G., Kaplan, R. M., Kay, M., Norman, D., Thompson, H., and Winograd, T. (1977). GUS, a frame-driven dialog system. Artificial Intelligence, 8, 155-173. Reprinted in Grosz et al. (1986).
Bod, R., and Scha, R. (1997). Data-oriented language processing. In Corpus-based Methods in Language and Speech Processing (S. Young and G. Bloothooft, eds.), pp. 137-173. Kluwer Academic Publishers, Dordrecht, The Netherlands.
Bonarini, A. (1993). Modeling issues in multimedia car-driver interaction. In Intelligent Multimedia Interfaces (M. T. Maybury, ed.), pp. 353-371. The MIT Press, Cambridge, Massachusetts.
Booth, A. D., and Locke, W. N. (1955). Historical introduction. In Machine Translation of Languages (W. N. Locke and A. D. Booth, eds.), pp. 1-14. The Technology Press of MIT and John Wiley, New York.
NATURAL LANGUAGE PROCESSING
59
Brown. J. S., and Burton, R . R. (1975). Multiple representations of knowledge foi-tutorial 1-easoning. In Re/JreSe/itUtioJl and Understunding (D. G. Bobrow and A. Collins, eds.), pp. 31 1-349. Academic Press, New York. Burger, J. D., and Marshall, R. J. (1993). The application of natural language models to intelligent multimedia. In I?itelligent Multiniedio Ifrte[fuce.s (M. T. Maybury, ed.), pp. 174- 196. The MIT Press, Cambridge, Massachusetts. Burton, R. R. (1976). Semantic Grammar: A Technique for Efficient Language Understanding in Limited Domains. Ph.D. Thesis, Ilniversity of California, Irvine. Busemann, S., Declerck, T.. Diagne, A. K., Dini, L., Klein, J., and Schmeier, S. (1997). Natural language dialog service for appointment scheduling agents. In Proceedings of Fifrh Conference 0 7 7 Applied Natural Larzguuge P rocessirig, pp. 25-32. Association for Computational Linguistics, Morgan Kaufmann, San Francisco, California. Carbonell, J. (1996). Foreword. In Franz, A. Auroniurir Ambiguity Resolution in Natural Lufiguage Processing. Springer-Verlag, New York. Caudill, M . , and Butler, C. (1990). Naturally Iiztelligent Systems, The MIT Press, Cambridge, Massachusetts. Caudill, M., and Butler, C. (1992). U~idersrundirigNeural Networks. The MIT Press, Cambridge, Massachusetts. Cercone, N., and McCalla, G. (1986). Accessing knowledge through natural language. In Adi.urrces in Conipurers (M. C. Yovits, ed.), pp. 1-99. Academic Press, New York. Charniak, E. (1993). Srutistical Luriguuge Leu/-ning.The MIT Press, Cambridge, Massachusetts. Chin, D. N. (1983). A case study of knowledge representation in UC. In Proreedings of fhe Eighth Irrrernutiotiu/ Corlfererice on Arfjficiul ittiel/igence3pp. 388-390. IJCAI, Karlsruhe, Germany. Chinchor, N., and Sundheim, B. (199.3). MUC-5 Evaluation Metncs. In Proceedikgs qfFifih Messuge Understanding Cn$erenc~,pp. 69-78. Morgan Kaufmann, San Francisco, California. Chomsky, N. (1956). Three models for the description of languages. IRE Transactions on Inforrnution Theory, 2(3), 1 13- 124. Chomsky, N. (1957). Syntactic Structures. Mouton, The Hague, The Netherlands. Chomsky, N. (1959). On certain formal properties of grammars. hforrnation arid Conrr-ol, 2(2), 137-167. Chomsky, N. (1965). Aspects qf the T/ieory oj'Syrirrm. The MIT Press, Cambridge, Massachusetts. Church, K. (1985). Stress assignment in letter to sound rules for speech synthesis. In Proceedings oj' the 23rd Anfiual Meeting of ACL, pp. 136-143. Morgan Kaufmann, San Francisco, California. Cited in Marcus (1994). Church, K. W., and Mercer, R. L. (1993). Introduction to the special issue on computational linguistics using large corpora. In Usifig Lurgr Corpwu, ( S . Armstrong, ed.), pp. 1-24. The MIT Press, Cambridge, Massachusetrs. Church, K. W., and Rau, L. F. (1995). Commercial applications of natural language processing. Comniun. ACM, 38(11), 71-79. Clark, A. (1993). Associative Efzfiirie.~-Corir?ectioriisin, Cotzcel~ts.arid Represerzturional Churige. The MIT Press, Cambridge, Massachusetts. Cohen, P. R., and Oviatt, S. L. (1994). The role of voice in human-machine communication. In Voice Comrnurzicatiorz Bemeen Huniuns u77d Machines (D. B. Roe and J. G. Wilpon, eds.), pp. 34-75. National Academy of Sciences, Washington, District of Columbia. Cohen, P., Johnston, M., McGee, D., Oviart, S., Pittman, J., Smith, I., Chen, L.. and Claw, J. (1997). Quickset: multimodal integration for simulation set-up and control. In Proceedings of Fifth Conferenre on Applied Nutural Language Proressing, pp. 
20-24. Association for Computational Linguistics, Morgan Kaufmann, San Francisco, California. Coutaz, J., Caelen, J. (1971). A taxonomy for multimedia and multimodal user interfaces. In Proceedings of the ERCIM Workshop on Uiser Ifiterfares and Multimedia, pp. 142-1 47. European Research Consortium for Informatics and Mathematics, Lisbon, Portugal.
60
BILL MANARIS
Cowan, J. D., and Sharp, D. H. (1988). Neural nets and artificial intelligence. In The Artificial Intelligence Debate-False Starts, Real Foundations, ( S . R. Graubard, ed.), pp. 85-12], The MIT Press, Cambridge, Massachusetts. Crangle, C., and Suppes, P. (1994). Language arzd Learning for Robots. Center for the Study of Language and Information, Stanford, California. Day, M. C., and Boyce, S. J. ( I 993). Human factors in human-computer system design. In Advances in Coniputers (M. C. Yovits, ed.), pp. 333-430. Academic Press, New York. Digalakis, V., and Murveit, H. (1994). Genomes: optimizing the degree of mixture tying in large vocabulary hidden Markov model based speech recognizer. In Proceedings of International Corference on Acoustics, Speech, arzd Signal Processing, pp. 1537-1540. IEEE Press, Piscataway, New Jersey. Dreyfus, H. L. (1993). What Computers Still Can’t D-A Critique of Artificial Reason. The MIT Press, Cambridge Massachusetts. Earley, J. (1 970). An efficient context-free parsing algorithm. Conzniun. ACM, 13(2), pp, 94-102. Reprinted in Grosz et al., (1986). Elman, .I. L. (1991). Distributed representations, simple recurrent networks, and grammatical structure. Machine Learning, 7, 195-225. Epstein, R. (1992). The quest for the thinking computer. AI Magazine, 13(2), 80-95. Fatehchand, R. (1960). Machine recognition of spoken words. In Advances in Conrputers (F. L. Alt, ed.), pp. 193-229. Academic Press, New York. Feigenbaum, E. A. (1996). How the “what” becomes the “how”-ACM Turing award lecture. Conznzun. ACM, 39(5), 97-104. Firebaugh, M. W. (1 988). Art@cial Intelligence-A Knowledge-based Approach. PWS-Kent, Boston, Massachusetts. Flanagan, J. L. ( I 994). Speech communication-an overview. In Voice Conzmunication Between Humans arid Machines (D. B. Roe and J. G. Wilpon, eds.), pp. 76-104. National Academy of Sciences, Washington, District of Columbia. Fodor, J. A,, and Pylyshyn, Z. W. (1988). Connectionism and cognitive architecture: a critical analysis. Cognition, 28, 3-71. Furnas, G. W., Landauer, T. K . , Gomez, L. M., and Dumais, S. T. (1983). Statistical semantics: analysis of the potential performance of key-word information systems. The Bell System Techrzicol Journal, 62(6), 1753-1806. Furnas, G. W., Landauer, T. K., Gomez, L. M., and Dumais, S. T. (1987). The vocabulary problem in human-system communication. Coninruri.ACM, 30( 1 I), 964-971. Furui, S. (1904). Toward the ultimate synthesis/recognition system. In Voice CornrnunicationBetween Hunzans and Machines (D. B. Roe and J. G. Wilpon, eds.), pp. 450-466. National Academy of Sciences, Washington, District of Columbia. Gamin, P. L., and Spolsky, B. (eds) (1966). Coniputatiori i n Linguistics. Indiana University Press, Bloomington, Indiana. Gazdar, G., and Mellish, C. (1989). Natural Language Proce.\sing i n LISP-An Introduction to Computational Linguistics. Addison-Wesley , Reading, Massachusetts. Gazdar, G., Franz, A,, Osbome, K., and Evans, R. (1987). Natural Language Processirig i n the 1980s. Center for the Study of Language and Information, Stanford, California. Giachin, E. P. (1992). Automatic training of stochastic finite-state language models for speech understanding. In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing. Cited in Harper et at., (1994). Giachin, E., and McGlashan, S. (1997). Spoken language dialogue systems. In Cori)us-ha.~edMethod.~ in Language and Speech Processing (S. Young and G. Bloothooft, eds.), pp. 69-1 17. 
Kluwer Academic Publishers, Dordrecht, The Netherlands. Green, B. F., Wolf, A. K., Chomsky, C., and Laughery, K. (1963). BASEBALL: an automatic question answerer. In Computers and Thoughf (E. Feigenbaum and J. Fedman, eds.), pp. 207-216. McCraw-Hill, New York. Reprinted in Grosz et al., (1986).
NATURAL LANGUAGE PROCESSING
61
Grosz, B. J., Appelt, D. F., Martin, P. A,, and Pereira, C. N. (1987). TEAM: an experiment in the design of transportable natural-language interfaces. Artificial Intelligence, 32, 173-243. Grosz, B. J., Spark Jones, K., and Webber, B. L. (eds) (1986). Readings in Natural Lailguage Processing. Morgan Kaufmann, Los Altos, California. Hall, G., Popowich, F., and Fass, D. (1996). Natural language edit controls: constrained natural language devices in user interfaces. In Proceediiigs of IEEE 8th Iiiterriational Confererice 011 Tools with Artificial Intelligence, pp. 475-477. IEEE Computer Society Press, Los Alamitos, California. Hall, P. A. V., and Dowling, G. R. (1980). Approximate string matching. Computing Surveys 12(4), 38 1-402. Harris, M. D. (1985). Irztroductiori to Naturol Lariguage Processing. Reston Publishing Company, Reston, Virginia. Harper, M.P., Jamieson, L. H., Mitchell, C. D., Ying, G., Potisuk, S., Shrinivasan, P. N., Chen, R., Zoltowski, C. B., McPheters, L., Pellom, B., and Helzerman, R. A. (1994). Integrating language models with speech recognition. In Prowedings of AAAI-94 Workshop on Integration of Natural Lmzguage and Speech Processing, (P. McKevitt, ed.), pp. 139- 146, Seattle, Washington. Hayes-Roth, F., and Jacobstein, N. The state of knowledge-based systems. Cornmuti. ACM. 37(3), 27-39. Hebb, D. 0. (1949). The Orgarzizatiori of Behavior. John Wiley, New York. Hemphill, C. (1993). DAGGER, directed acyclic graphs of grammars for enhanced recognition. User’s Guide mid Reference Manual, Texas Instruments, Dallas, Texas. Hendrix, G. G., Sacerdoti, E. D., Sagalowicz, D., and Slocum, J. (1978). Developing a natural language interface to complex data. ACM Transactions on Database S y s t e m , 3(2), 105-147. Herdan, G. (1964). Quarzritclrive Linguistic.s. Butterworths, Washington, District of Columbia. Hill, D. R. (1971). Man-machine interaction using speech. In Advances in Conzputers (F. L. Alt and M. Rubinoff, eds.), pp. 165-230. Academic Press, New York. Hirschrnan, L. (1994). The roles of language processing in a spoken language interface. In Voice Coinrnunication Between Hunzarzs and Machines (D. B. Roe and J. G. Wilpon, eds.), pp. 21 7-237. National Academy of Sciences, Washington, District of Columbia. Hirschman, L., and Cuomo, D. (1994). Report from the ARPA Workshop on Evaluation of Human Computer Interfaces. Tech. Rep. MP 94B0000259, MITRE, Bedford, Massachusetts. Hix, D. and Hartson, H. R. (1993). Deivlopirig User Iiiterfac.es-Eiisuri,ig Usability Through Product & Process. John Wiley & Sons, New York. Hofstadter, D. P. (1979). Giidel, Escher, B w h : An Eternal Golden Braid. Random House, New York. Hollan, J., Rich, E., Hill, W., Wroblewski, D., Wilner, W., Wittenburg, K., and Grudin, J. (1991). An introduction to HITS: human interface tool suite. In Intelligent User Interfaces, (J. W. Sullivan and S. W. Tyler, eds.), pp. 293-337. ACM Press, New York. Hovy, E. (1993). How MT works. Byte, 18(1), 167-176. Jackendoff, R. (1990). Semuntic Struuures. The MIT Press, Cambridge, Massachusetts. Jacobs, P. S. (1994). Text-based systems and information management: artificial intelligerice confronts matters of scale. In Proceediiigs of Sixth International Conference on Tools with Art[ficial Intelligence, pp. 235-236, IEEE Computer Society Press, Los Alamitos, California. Josselson, H. H. (1971). Automatic translation of languages since 1960: a linguist’s view. In Advances in Computers (F. L. Alt and M. Rubinoff, eds.), pp. 1-58. Academic Press, New York. Kamm, C. (1994). 
User interfaces for voice applications. In Voice Cornrnunicurion Between Humans and Machines (D. B. Roe and J. G. Wilpon, eds.), pp. 422-442. National Academy of Sciences, Washington, District of Columbia. Kay, M., Gawron, J. M., and Norvig, P. (1994). Verbmobil: A Translation System f o r Face-&-Face Dialog. Center for the Study of Language and Information, Stanford, California. King, M. (1996). Evaluating natural language processing systems. Cormnun. ACM, 39( l), 73-79. Knill, K., and Young, S. (1997). Hidden Markov models in speech and language processing. In
62
BILL MANARIS
Corpus-basedMethods in Language and Speech Processing ( S . Young and G. Bloothooft, eds.), pp. 27-68. Kluwer Academic Publishers, Dordrecht, The Netherlands. Koons, D. B., Sparrell, C. J., andThorisson, K. R. (1993). Integrating simultaneous input from speech, gaze, and hand gestures. In Intelligent Multimedia Interfaces (M. T. Maybury, ed.), pp. 257-276. The MIT Press, Cambridge, Massachusetts. Krause, J. (1993). A multilayered empirical approach to multimodality: towards mixed solutions of natural language and graphical interfaces. In Intelligent Multimedia Interfaces (M. T. Maybury, ed.), pp. 328-352. The MIT Press, Cambridge, Massachusetts. Kuno, S., and Oettinger, A. G. (1963). Multiple-path syntactic analyser. In fnfortnatiori Processing 62, (C. M. Popplewell, ed.), pp. 306-312. North-Holland, Amsterdam. Reprinted in Grosz et al., (1986). Kupiec, J. (1992). Hidden Markov estimation for unrestricted stochastic context-free grammars. In IEEE Internatiotial Corference on Acoustics, Speech, and Signal Processing, pp, 177- 180. Cited in Harper el al. (1994). Lai, J., and Vergo, J. (1997). MedSpeak: report creation with continuous speech recognition. In Proceedings of CHI '97 Conference on Human Factors in Computing Systems, ACM, New York, pp. 431-438. Landauer, T. K., Galotti, K. M., and Hartwell, S. (1983). Natural command names and initial learning: a study of text-editing terms. Comniun. ACM, 26(7), 495-503. Lane, A. (1987), DOS in English. Byte 12(12), 261-264. Lari, K., and Young, S . J. (1991). Applications of stochastic context-free grammars using the inside outside algorithm. Computer Speech & Language, 5(3), 237-257. Cited in Harper et al., (1994). Ledgard, H., Whiteside, J. A,, Singer, A,, and Seymour, W. (1980). The natural language of interactive systems. Conimun. ACM, 23( lo), 556-563. Lee, K-F. (1993). Automatic speech recognition. In The Distinguished Lecture Series VI, Video. University Video Communications, Stanford, California. Levinson, S. E. (1994). Speech recognition technology: a critique. In Voice Coniniuriicution Between Humans and Machines (D. B. Roe and J. G. Wilpon, eds.), pp. 159-164. National Academy of Sciences, Washington, District of Columbia. Levitt, H. (1994). Speech processing for physical and sensory disabilities. In Voice Cortlmunicatiori Between Humans and Machines (D. B. Roe and J. G. Wilpon, eds.), pp. 311-343. National Academy of Sciences, Washington, District of Columbia. Lewis H. R., and Papadimitriou, C. H. (1981). Elements ofthe Theory of Conipuration. Prentice-Hall, Englewood Cliffs, New Jersey. Locke, W. N., and Booth, A. D. (eds) (1955). Machine Translation qflanguages. Technology Press of MIT and Wiley, Cambridge, Massachusetts. Luger, G. F., and Stubblefield, W. A. (1998). Arti>cial Intelligence: Structures and Strategies for COl71pkX Problem Solving. 3rd edn. Addison-Wesley, Reading, Massachusetts. McCorduck, P. (1979). Machines Who Think. W. H. Freeman, San Francisco, California. McCulloch, W. S., and Pitts, W. H. (1943). A logical calculus of the ideas immanent in nervous activity. Butletin of Mathematical Biophysics, 5 , 115-137. Maegaard, B., and Perschke, S. (1991). An introduction to the Eurotra programme. In The Eurotra Linguistic Spec,$cations (C. Copeland, J. Durand, S. Krauwer and B. Maegaard, eds.), pp. 7-14. Office of Official Publications of the European Communities, Brussels, Luxembourg. Makhoul, J., and Schwartz, R. (1994). State of the art in continuous speech recognition. 
In Voice CommunicationBetween Humans arid Machines (D. B. Roe and J. G. Wilpon, eds.), pp. 165-198. National Academy of Sciences, Washington, District of Columbia. Malmkjaer, K., (ed.) (1991). The Linguistics Encyclopedia. Routledge, New York. Manaris, B. 2. (1994). An engineering environment for natural language interfaces to interactive computer systems. International Journal of Artificial Irztelligence Tools, 3(4), 557-579. Manaris, B., and Dominick, W. (1993). NALIGE: a user interface management system for the
NATURAL LANGUAGE PROCESSING
63
development of natural language interfaces. I~~ternutionut Journal qf Maiz-Machine Studiex, 38(6), 891-921. Manaris, B., and Harkreader, A. (1997). SUITE: speech understanding interface tools and erivironments. In Proceedings of the Tenth Iiiterriutiorzal Florida Artificial fiztelligeizceSynzposium, Florida A1 Research Society, Daytona Beach, Florida, pp. 247-252. Manaris, B. Z., and Slator, B. M. (1996). lnteractive natural language processing: building on success. ZEEE Coniputer, 29(7), 28-32. Marcus, M. (1978). A computational account of some constraints on language. In Theoretical Zssues in Natural Language Processing-2, (D. Waltz, ed.), pp. 236-246. Association of Computational Linguistics, Urbana-Champaign, Illinois. Marcus, M. P. (1980). A Theory of Syritudc Recognition for Natural Language. The MI7 Press, Cambridge, Massachusetts. Marcus, M. (1994). New trends in natural language processing: statistical natural language processing. In Voice Communication Between Humans mid Machines (D. B. Roe and J. G. Wilpon, eds.), pp. 482-504. National Academy of Sciences, Washington, District of Columbia. Markowitz, J. A. (1996). Using Speech Rerognition. Prentice Hall PTR, Upper Saddle River, New Jersey. Martin, P., Crabbe, F., A d a m , S., Baatz, E., and Yankelovich, N. (1996). SpeechActs: a spoken language framework. ZEEE Computer, 29(7), 33-40. Maybury, M. T., ed. (1993). Intelligent Multimedia Interfaces. The MIT Press, Cambridge, Massachusetts. Meisel, W. S. (1993). Talk to your computer. Byte, 18(10), 113-120. Miikkulainen, R. (1 993). Subsymbolic Nnturul Latiguuge Processing: An Integrated Model of Scripts, Lexicon, and Memory. The MIT Press, Cambridge, Massachusetts. Miikkulainen, R. (1994). Integrated connectionist models: building A1 systems on subsymbolic foundations. In Proceedings of ZEEE 6th Ititernatioiiul Conference on Tools with Arttficial Intelligeizre, IEEE Computer Society Press, Los Alainitos, California, pp. 23 1-232. Miller, L. C. (1993). Resource guide: machine-translation software. Byte 18(1), 185-186. Minsky, M., and Papert, S. (1969). Perceptrons: An Zntroductiorz to Conipututiorzal Geonierry. The MIT Press, Cambridge, Massachusetts. Cited in Cowan and Sharp (1988). Moll, R. N., Arbib, M. A., and Kfoury, A. J. (1988). An 117rroductioiito Formal Language Theory. Springer-Verlag, New York. Moore, J. D., and Mittal, V. 0. (1996). Dynamically generated follow-up questions. ZEEE Conzputer. 29(7), 75-86. Moore, R. C. (1994a). Integration of speech with natural language understanding. I n Voice Coininunicution Between Hurnuiu urzd Machbzes (D. B. Roe and J. G. Wilpon, eds.), pp. 254-27 I. National Academy of Sciences, Washington. District of Columbia. Moore, R. C. (1994b). Semantic evaluation for spoken-language systems. Ln Proceedings qf the Hurnuri Luiiguage Technolog)) Workshop, Morgan Kaufmann, San Francisco, California, pp. 126-131. Moore, R., Dowding, J., Bratt, H., Gawron, J. M., Gorfu, Y., and Cheyer, A. (1997). CommandTalk a spoken-language interface for battlefield simulations. In Proceedings of Ffth Cotlference on Applied Natural Language Processing, pp. 1-9, Association for Computational Linguistics, Morgan Kaufmann, San Francisco, California. Mostow, J., Roth, S. F., Hauptmann. A. G . , and Kane, M. (1994). A prototype reading coach that listens. In Proceedings of Twe!fth National Conference on Artificial frztelligence, pp. 785-792, AAAI Press, Palo Alto, California. Muller, M. J., Wharton, C., McIver, Jr., W. J., and Laux, L. (1997). 
Toward an HCI research and practice agenda based on human needs and social responsibility. In Proceedings CHI'97---Human Factors in Computing Systeins, ACM Press, Atlanta, Georgia, pp. 155-161. Munakata, T. (1994). Commercial and industrial AI. Comrnun. ACM, 37(3), 23-25.
64
BILL MANARIS
Napier, H. A., Lane, D. M., Batsell, R. R., and Guada, N. S. (1989) Impact of a restricted natural language interface on ease of learning and productivity. Commun. ACM, 32(10), 1190-1 198. Newell, A., and Simon, H. (1976). Computer science as empirical inquiry: symbols and search. Cor?zmun.ACM, 19(3), 113-126. Newmeyer, F. J. (1983). Grarmnatical Theory-Its Limits and Its Possibilities. The University of Chicago Press, Chicago, Illinois. Ney, H. (1991). Dynamic programming parsing for context-free grammars in continuous speech recognition. IEEE Transactions on Sigrid Processing, 39(2). Cited in Harper et al. (1994). Ney, H. (1997). Corpus-based statistical methods in speech and language processing. In Cor[Jw-bused Methods inlanguage andspeech Processing ( S . Young and G . Bloothooft, eds.), pp. 1-26.Kluwer Academic Publishers, Dordrecht, The Netherlands. Obenneier, K. K., ( I 989). Natural Language Processing Techrzologies iii Artificial Intelligence-The Science and Industry Perspective. John Wiley & Sons, New York. Oettinger, A. G. (1955). The design of an automatic Russian-English technical dictionary. In Machine Translation oflanguages, (W. N. Locke and A. D. Booth, eds.), pp. 47-65. The Technology Press of MIT and John Wiley, New York. Otten, K. W. (1971). Approaches to the machine recognition of conversational speech. In Advances iii Conzputers (F. L. Alt and M. Rubinoff, eds.), pp. 127-163. Academic Press, New York. Oviatt, S., and Cohen, P. R. (1991). The contributing influence of speech and interaction on human discourse patterns. In Intelligent User Interfaces, (J. W. Sullivan and S. W. Tyler, eds.), pp. 69-83. ACM Press, New York. Oviatt, S., DeAngeli, A., and Kuhn, K. (1997). Integration and synchronization of input modes during multimodal human-computer interaction. In Proceedings CHI'97-Hunzun Factors in Computing Systernx, ACM Press, Atlanta, Georgia, pp. 415-422. Pallett, D. S., Fiscus, J. G., Fisher, W. M., Garofolo, J. S., Lund, B. A., Martin, A,, and Przybocki, M. A. (1994). 1994 benchmark tests for the ARF'A spoken language program. In Proceedings of the Spoken Language Systems Techfrology Workshop, Morgan Kaufmann, San Francisco, California, pp. 5-36. Paris, C., and Vander Linden, K. (1996). An interactive support tool for writing multilingual manuals. IEEE Computer, 29(7), 49-56. Pausch. R. and Leatherby, J. H. (1991). An empirical study: adding voice input to a graphical editor. Journul of the Aniericari Voice hzput/Orrtput Society 9(2), 55-66. Cited in Shneiderman (1998). Pereira, C. N., and Grosz, B. J. (eds) (1994). Natural Language Processing. The MIT Press, Cambridge, Massachusetts. Reprinted from Artificial Intelligence, 63( 1-2), 1993. Priece, J., Rogers, Y., Sharp, H., Benyon, D., Holland, S., and Carey, T. (1994). Human-Computer Interaction. Addison Wesley, Reading, Massachusetts. Rabiner, L., and Juang, B-H. (1993). Fundainenrafs of Speech Recognition. Prentice Hall PTR, Englewood Cliffs, New Jersey. Rash, W. (1994). Talk show. PC Magazine, December 20,203-219. Reddy, R. (1996) To dream the possible dream- ACM Turing award lecture. Commuri. ACM, 39(5), 105-112. Reeker, L. H. (1976). The computational study of language acquisition. In Advances i n Cornputers (M. Rubinoff and M. C. Yovits, eds.), pp. 181-237. Academic Press, New York. Reich, P. A. (1969). The finiteness of natural language. Latiguage, 45(4), 831-843. Revzin, I. 1. (1966). Models oflanguage. Methuen, London, England. Ristad, E. S. (1993). The Language Complexity Game. 
The MIT Press, Cambridge, Massachusetts. Roe, D. B. ( 1994). Deployment of human-machine dialogue systems. In Voice Comm~tnicutiori Between Humans and Machirzes (D. B. Roe and J. G. Wilpon, eds.), pp. 373-389. National Academy of Sciences, Washington, District of Columbia. Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386-408.
NATURAL LANGUAGE PROCESSING
65
Rudnicky, A. I., Hauptmann, A. G., and Lee, K.-F. (1994). Survey of current speech technology. Corimiuri. ACM, 37(3), 52-57. Rumelhart, D. E. (1975). Notes on a schema for stories. In Representation and Uriderstaridirig (D. Bohrow and A. Collins, eds.) Academic Press, New York. Cited in Hofstadter (1979), p. 67.5. Russell, S., and Norvig, P. (1995). Artificial Iritelligence-A Modern Approach. Prentice Hall, Englewood Cliffs. New Jersey. Sager, N. (1967). Syntactic analysis of natural language. In Advarices iii Cnnipurers (F. L. Alt and M. Ruhinoff, eds.), pp. 153-188. Academic Press, New York. Sager, N. (1978). Natural language information formatting: the automatic conversion of texts to a structured data base. In Advances in Co~rzpurers(M. C. Yovits, ed.), pp. 89-162. Academic Press, New York. Schafer, R. W. (1994). Scientific bases of human-machine communication by voice. In Voice Coniniurzication Eenveeii Hurriarz.r uiid Machines (D. B. Roe and J. G. Wilpon, eds.), pp. 15-33. National Academy of Sciences, Washington, District of Columbia. Schalkwyk, J., and Fanty, M. (1996). The CSLU-C Toolkit for Automatic Speech Recognition. Center for Spoken Language Understanding, Oregon Graduate Institute of Science and Technology. http://www.cse.ogi.edu/CSLU/toolkit/d~cumentation/csluc/Cdoc.html, Schalkwyk, J . , Vermeulen, P., Fanty, M., and Cole, R. (1995). Embedded implementation of a hybrid neural-network telephone speech recognition system. In Proceedings of IEEE International Coizfererice on Neural Networks and Sigtial Proressing, pp. 800-803, IEEE, Nanjing, China. Schmucker, K. J. (1984). Fuzzy Ser5, Nufural Language Conzpulotions, and Risk Analysis. Computer Science Press, Rockville, Maryland. Scott, J. C. (1996). The voices of automation. Computer Shopper, 16(9), 550-555. Shannon, C. (1948). The mathematical theory of communication. Bell Systems Technical Journal, 27, 398-403. Cited in Church and Mercer, (1993). Shapiro, S. C. (1979). The SNePS semantic network processing system. In Associative Nerworks: Rep-esentation arid Use ofKiiowkdge by Coniputers (N. V. Findler, ed.), pp. 179-203. Academic Press, New York. Sheremetyeva, S., and Nirenhurg, S., ( 1996). Knowledge elicitation for authoring patent claims. IEEE Computer, 29(7), 57-63. Shneiderman, B. ( I 993). Designing r/ie Uspr Interfuce: Strutegies f o r Effective Hurtiu~~-Co~iz~iu~er Inreractiori, 2nd edn, Addison-Wesley. Reading, Massachusert5. Slator, B. M, Anderson, M. P., and Conley. W. (1986). Pygmalion at the interface. Commuri. ACM, 29(7), 599-604. Slocum, J. (1981). Machine translation: an American perspective. Proceedings of IEEE, 74(7), 969-978. Smith, G., and Bates, M. (1993). Voice activated automated telephone call routing. In Proceedings of Ninth IEEE Conference on Artifrial Intelligeizce ,for Applications, pp. 143-148, IEEE CS Press, Los Alamitos, California. Sowa, J. F. (1984). Conceptual Strrrctures-li!ffirr?iatioii Processing in the Mind arid hluchirie. Addison Wesley, Reading, Massachusetts. Spark Jones, K. (1994). Towards better NLP system evaluation. In Proceedbigs of the Hunian Language Technology Workshup, pp. 102-107, Morgan Kaufmann, San Francisco, California Spark Jones, K., and Galliers, J. R. ( 1 996). Evaluating Natural Language Processing Sysfenis-An Ana/>wis arid Review. Springer-Verlag, New York. Stock, O., and the ALFRESCO Project Team f 1993). ALFRESCO: Enjoying the Combination of Natural Language Processing and Hypermedia for Information Exploration. In Inrelligenf M~iltiniedia Interfaces (M. T. 
Maybury, ed.), pp. 197-225. The MIT Press, Cambridge, Massachusetts. Strong, G. (1996). Human language research and the National Science Foundation. lEEE Compufer, 29(7), 3 1. Thompson, F. B., and Thompson, 8 . H. (1975). Practical natural language processing: The REL
66
BILL MANARIS
system as prototype. In Advarzces in Coiizputers (M. Rubinoff and M. C. Yovits, eds.), pp. 109-168. Academic Press, New York. Titus, J. P. (1967). The nebulous future of machine translation. Conzmuiz. ACM 10(3), 190. Cited in Josselson (1971), p. 48. Turing, A. M. (1950). Computing machinery and intelligence. Mirid, 59,433-460. Viterbi, A. J. (1967). Error bounds for convolutional codes and an asymptotically optimal decoding algorithm. IEEE Trans. on Information Theory, IT-13, 260-269. Wahlster, W. (199 I). User and discourse models for multimodal communication. In Inreftigerit User Iizterfaces, (J. W. Sullivan and S. W. Tyler, eds.), pp. 45-68. ACM Press, New York. Waibel, A. (1996). Interactive translation of conversational speech. /EEE Coniputer, 29(7), 41 -48. Wauchope, K., Everett, S., Perzanowski, D., and Marsh, E. (1997). Natural language in four spatial interfaces. In Proceediiigs of Fifth Coufereiice of1Applied Natural Language Processing, pp. 8-1 1, Association for Computational Linguistics, Morgan Kaufmann, San Francisco, California. Weaver, W. (1955). Translation. In Madzirie Trarislatioiz of Languages, (W. N. Locke and A. D. Booth, eds.), pp. 15-23. The Technology Press of MIT and John Wiley, New York. Wegner, P. (1997). Why interaction is more powerful than algorithms. Coinmuri. ACM 40(5), 80-91. Weiser, M. (1994). The world is not a desktop. Ziiteractiorzs, 1( l), 7-8. Weizenbaum, J. (1966). ELIZA-a computer program for the study of natural language communication between man and machine. Corizniur~ACM, 9(7), 36-43. Weizenbaum, J. (1967). Contextual understanding by computers. Conunun.ACM, lO(8). 474-480. Weizenbaum, J. (1976). Cornputer Power and Human Reason-From Judgnierit to Calculation. W. H. Freeman, San Francisco. Wermter. S., and Weber, V. (1996). Interactive spoken-language processing in a hybrid connectionist system. IEEE Coinputer, 29(7), 65-74. Wilensky. R., Arens, Y., and Chin, D. (1984). Talking to UNIX in English: an overview of UC. C01711nUll. ACM, 27(6),574-593. Wilensky, R., Chin, D. N., Luria, M., Martin, J., Mayfield, J., and Wu, D. (1988). The Berkeley UNIX consultant project. ConiputatioriafLinguistics, 14(4), 35-84. Wilks, Y. (1975). An intelligent analyzer and understander of English. Commurz. ACM, 18(5), 264-274 (reprinted in Grosz el a/., 1986). Wilks, Y. (1996). Natural language processing-introduction. Cornrnun. ACM, 39( I), 60-62. Wilks, Y., Slator, B. M., and Guthrie, L. M. (1996). Electric Words--Dictioiiaries, Conzyuters, and Meuriing. The MIT Press, Cambridge, Massachusetts. Wilpon, J. G. (1994). Applications of voice-processing technology in telecommunications. In Voice Corninuciication Bemeeii Hurnaiis arid Machines (D. B. Roe and J. G. Wilpon, eds.) pp. 280-310. National Academy of Sciences, Washington, District of Columbia. Winograd, T. (1972). Understanding Natural Larzguuge. Academic Press, New York. Winograd, T. (1980). What does it mean to understand language? Cognitive Science, 4, 209-241. Winograd, T. (1983). Laizguage as a Cogiiitive Pr-ocess.Addison-Wesley, Reading, Massachusetts. Winograd, T., and Flores, F. ( 1986). Uridersfaridirig Coniputers and Coguition. Ablgx Publishing, Norwood, New Jersey. Winston, P. H. (1992). Arttficial lrztelligence, 3rd edn. Addison-Wesley, Reading, Massachusetts. Woods, W. A. (1973). Progress in natural language understanding: an application to lunar geology. In AFIPS Conference Proceedings, 42,441-450. Woods, W. A. (1970). Transition network grammars. Cornnzun.ACM 13(10), 591-606. Woods. W. A. 
(1978). Semantics and quantification in natural language question answering. In Advances iri Coniputers (M. C. Yovits, ed.), pp, 1-87. Academic Press, New York. Yankelovich, N. (1996). How do users know what to say? ACM Irzteractions 3(6), 33-43 Zipf, G. K. (1949). Human Behaviour urzd The Priwiple ofLeast Effort, Addison-Wesley, Cambridge. Massachusetts. Zue, V. W. (1994). Toward systems that understand spoken language. ZEEE Expert, 9( I ) , 51-59,
Cognitive Adaptive Computer Help (COACH): A Case Study

EDWIN J. (TED) SELKER
IBM Almaden Research Center
San Jose, California
Abstract

User interfaces can be difficult to master. Typically, when a user has a problem understanding the interface, computers respond with generic, difficult-to-interpret feedback. This is a case study of Cognitive Adaptive Computer Help (COACH), an example of a style of intelligent agent which gives more effective responses when problems occur in the interface between people and computers. It implements a cognitive interface which attempts to recognize the needs of a user and responds proactively as the user is typing. It records and analyzes user actions to adapt computer responses to the individual, offering useful help information even before the user requests it. The approach uses dynamic models of both the user and the domain of knowledge the user is learning. These models teach and guide a user. COACH was first used in a study which validated that real-time learning and reasoning in a computer interface can improve users' productivity and comfort with an interface.

COACH was designed to facilitate development and study of adaptive help systems. The help given the user, the domain in which the user is being coached, even the way the system adapts to the user, are represented in frames and controlled by rules which can be changed. COACH was first tested in teaching the problem domain of writing Lisp programs. To demonstrate the generality of COACH for teaching arbitrary problem domains, help systems were then created for the UNIX command language and the GML text formatting language. In the original text-based COACH implementation, a new problem domain can be supported simply by defining the syntax of the domain and writing the help text.

The ideas have been applied to a modern graphical user interface (GUI) in a product version created for the OS/2 and Windows 95 GUIs. This COACH/2 system uses new teaching techniques to graphically demonstrate GUI syntax and procedures. A see-through technique called masking draws users' attention to GUI objects. An animation technique called slug trails walks the user through graphical procedures. COACH/2 also includes a WYSIWYG authoring approach and tool to extend the notion of the system being a shell for creating adaptive help systems. This tool automatically picks up syntax from the graphical interaction with the author, so only the help text itself needs to be written. The architecture is available as WarpGuides in the OS/2 Warp 4 release. WarpGuides show a user how to perform graphical actions in a graphical user interface.
Current development work with COACH/2 is exploring 3-D animation, mixing adaptive computer help with adaptive tutoring.
1. Introduction
2. The COACH Scenario
   2.1 A Novice Lisp Programmer
   2.2 A Student Programmer
   2.3 An Expert Programmer
3. Review of Literature: On-line Computer Teaching
   3.1 Tutoring Research
   3.2 Help Systems
   3.3 Coaching Systems
   3.4 Critic Systems
4. Requirements for an Adaptive Help Testbed
5. Technical Considerations for Creating COACH
   5.1 Suitable Domains for Demonstrating Adaptive User Help
   5.2 Classifying Knowledge Deficits
   5.3 A Classification for the Help to Provide to Users
   5.4 Tracking User Proficiency as the User is Working
6. An Architecture for Adaptive User Help
   6.1 Window Interface
   6.2 Reasoning System
   6.3 System Knowledge and the Adaptive User Model (AUM)
   6.4 Instrumented Multilevel Parser
   6.5 Conclusion
7. A COACH Shell
   7.1 Using COACH in Open Systems
   7.2 Using COACH for Different Domains
   7.3 Experimentation with Help Presentation Strategies
8. Evaluation of COACH Adaptive User Help
   8.1 Pilot COACH Study
   8.2 Quantitative Study: Demonstrating COACH Usability Improvements
   8.3 Future Work
   8.4 Conclusions
9. Development Status
   9.1 Animated Help
   9.2 Slug Trails
   9.3 Icon Dressing
   9.4 Cue Cards
   9.5 Guides and Masks
   9.6 Hailing Indicator
   9.7 Sound
   9.8 Summary
10. Future Research Goals
   10.1 Future Research
   10.2 Future System Development
References and Further Reading
1. Introduction

Cognitive Adaptive Help (COACH) is a user help system that monitors the interaction between the user and the computer to create personalized user help. Imagine learning a new operating system or programming language. COACH watches the user's actions to build an Adaptive User Model (AUM) that selects appropriate help advice. Just as a football coach will stand on the sidelines and encourage, cajole, or reprimand, COACH is an advisory system that does not interfere with the user's actions but comments opportunistically to help the user along. COACH chooses descriptions, examples, syntactic definitions, etc. as appropriate for user-demonstrated experience and proficiency. A description that advertises a command or function is helpful for getting started, but might become ignored if it is presented too often. An example showing how to perform a procedure is often valuable until the procedure is mastered, after which it is no longer useful and may even become annoying. A syntactic definition describing the generalization of the procedure becomes valuable when the procedure is close to being mastered.

Computer users find themselves needing classes, tutoring, help, and reference materials in order to be able to accomplish even the simplest of tasks with a computer. Terry Winograd and Fernando Flores's book (Winograd and Flores, 1986) discusses "breakdown" of readiness-to-hand in terms with which we are all familiar: a computer becoming the focus of attention because the user does not know how to proceed. The need to seek aid frustrates the user and prolongs the process of becoming proficient. Attempts to computerize teaching aids have created an active research field. The impact and acceptance of computers in teaching roles, however, continues to be elusive.

Creating computer interactions so natural that they require no outside learning (the "walk up and use" ideal) would allow all user effort to be focused on the primary task. The idea of a tool as an object that allows workers to concentrate on their task, rather than on the tool, was at the heart of Martin Heidegger's concept of "readiness-to-hand" (Heidegger, 1977; Winograd and Flores, 1986). COACH contributes toward that goal by giving more effective assistance to users while they try to focus on their work. Winograd (Winograd and Flores, 1986) describes the process of managing the "breakdown conversations" as the way to progress. When the productive conversation a user is having with the computer to accomplish a task breaks down, it sends the user into a new conversation to break through or go around the roadblock. COACH's goal is to manage these conversations when the computer's "unreadiness-to-hand" is slowing progress.

Unlike teacher-oriented learning paradigms, in which the teacher is driving the student, the design of COACH uses a style of teaching which is driven by the student. COACH facilitates a user's own objectives in a work session.
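To make this selection behaviour concrete, the minimal sketch below, written in the Lisp that COACH itself coaches, shows one plausible way an adaptive user model entry could drive the choice among a description, an example, and a syntactic definition. The proficiency scale, the thresholds, and the structure names are illustrative assumptions, not COACH's actual implementation.

;; Minimal sketch, not the actual COACH code.  The 0.0-1.0 proficiency
;; scale and the thresholds below are illustrative assumptions.
(defstruct aum-entry
  (proficiency 0.0)    ; 0.0 = construct never used, 1.0 = mastered
  (uses 0)             ; how often the user has applied the construct
  (errors 0))          ; how often its use has gone wrong

(defun choose-help-style (entry)
  "Pick the kind of help to present for one language construct."
  (let ((p (aum-entry-proficiency entry)))
    (cond ((< p 0.3) :description)         ; advertise what the construct does
          ((< p 0.7) :example)             ; show a concrete use
          ((< p 0.9) :syntax-definition)   ; generalize, near mastery
          (t         :none))))             ; mastered: stay out of the way

For instance, (choose-help-style (make-aum-entry :proficiency 0.5)) would return :example, matching the stage at which worked examples are most useful.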
COACH is an example of an advisory agent, as contrasted with the assistance agent used in other help system and teaching implementations. An advisory agent shows users what they can learn. In contrast, an assistance agent does the task for the user. This advisory agent uses an implicit dialog with the user, working in the background, tracking user actions and looking for opportunities to present relevant, but unsolicited, advice. The user is free to ignore the advisory agent, as opposed to many implementations of agents that engage in an explicit dialog with the user.

The following are the research objectives addressed by COACH:

- To study the differences between an automated adaptive help system and a standard passive help system.
- To explore automated teaching technology that shifts the teaching paradigm away from a pre-structured format to concentrate instead on users' individual needs. COACH is an example of a teaching paradigm that moves toward an apprenticeship, or learn-while-doing, approach.
- To demonstrate the feasibility of an AUM-based advisory agent to guide selection of help advice without introducing unacceptable delays in system response. COACH is a demonstration of a help system that adapts to a user while running concurrently with the software for which help is being provided. Earlier research proposed that an interactive adaptive teaching system was not feasible (Zissos and Witten, 1985). The COACH system demonstrated that an interactive adaptive teaching system is indeed feasible.
- To create a tool for enabling research in adaptive learning paradigms. The COACH architecture allows researchers to describe and test ideas about adaptation in learning and to develop adaptive teaching approaches. COACH allows researchers to build working adaptive help systems.
2. The COACH Scenario

Users need help while working on solutions to problems in a curriculum or attempting to do productive work using a computer. COACH aids the user in the mechanics of using the computer. The computer creates an adaptive user model (AUM) of a user's experience and level of expertise. Machine-learning and reasoning techniques adapt the help provided to the needs of the particular user. Such help is said to be proactive when the computer anticipates the needs of the user and presents help before it is requested. Both the user and the computer can initiate help, in a mixed initiative interaction. This learning paradigm is introduced by these three hypothetical users, working at different levels of experience and proficiency, learning the Lisp interpreted language.
2.1 A Novice Lisp Programmer

Freshman Bill is taking his first programming class. He has attended two classroom sessions and is sitting down for the first time in front of the computer. His assignment requires him to use Lisp s-expressions to make arithmetic calculations. The computer screen with which he works is segmented into four panes: a user input pane to type and edit work, two help presentation panes (one general and one more specific) and an output pane (Fig. 1).

To help Bill get started using the computer to do his work, the system encourages him to type an open parenthesis to begin an s-expression or to type a defined word (see Fig. 2). Examples show him this. He types (. The help changes, telling him that he must type a function name, and gives an example. "function", "s-expression" and "defined word" are concepts in the system's domain knowledge network. Bill remembers these words but does not quite remember what they mean. An example of the use of a function is displayed to get him started. Bill types ADD. The help window tells him that no known function starts with AD, and suggests that he press the "rubout" key. He could press the mouse button to browse available functions, but PLUS, not ADD, is the function he now remembers from his class, and he types it (see Fig. 3).
FIG. 1. A user interface demonstrating the COACH adaptive help system, with four panes: General Help, Token Help, User Interaction Pane, and Output Pane.
FIG. 2. The COACH interface after one character is typed.
As Bill types a space after PLUS, an example of using PLUS, together with a simple description and a simplified syntax, is presented on the help pane. The context-dependent help allows Bill to avoid the usual startup stalemate in which a user does not quite know what to type to get started. Novice programming problems such as mixing syntax with ideas have also been averted.
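A minimal sketch of the token tracking behind this episode is shown below; the function names and the message wording are hypothetical, and COACH's real instrumented parser, described later, does considerably more.

;; Minimal sketch, not COACH's parser: match a partially typed token
;; against the function names the system knows about.
(defun matching-functions (prefix known-functions)
  "Return the names in KNOWN-FUNCTIONS that begin with PREFIX."
  (let ((p (string-upcase prefix)))
    (remove-if-not
     (lambda (name)
       (and (<= (length p) (length name))
            (string= p name :end2 (length p))))
     known-functions)))

(defun token-help (prefix known-functions)
  "Produce the kind of message Bill sees while typing a token."
  (let ((matches (matching-functions prefix known-functions)))
    (if matches
        (format nil "Functions starting with ~A: ~{~A~^, ~}" prefix matches)
        (format nil "No known function starts with ~A; press rubout." prefix))))

;; Example (hypothetical function list):
;; (token-help "AD" '("PLUS" "DEFUN" "TIMES" "SETF"))
;; => "No known function starts with AD; press rubout."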
2.2 A Student Programmer

Sophomore Harry is trying to write a program. He types DEFUN.
FIG. 3. The COACH interface during a simple error situation.
The help pane reminds him that he must name the function being defined and then give it an argument list. The system gives him an abbreviated syntax, omitting the difficult argument types (optional arguments, keyword arguments, etc.). He types TIMES-2 (I) (PLUS. The system shifts its focus to helping him with the PLUS function. An example of a use of PLUS he has already made is displayed. He realizes that he did not really mean to add numbers. He back-spaces and types TIMES I 2). The system changes its focus of help to TIMES as Harry is typing it, and back to DEFUN when he is done with TIMES.
FIG. 4. The COACH interface supporting learning about a specific Form and the idea of form.
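The focus shifting in Harry's session, where help follows the innermost function being typed and returns to DEFUN once an inner call is closed, can be approximated with a simple stack of open calls. This is only a sketch under simplifying assumptions; for example, it treats DEFUN's parameter list like any other parenthesized form, which COACH's multilevel parser would not.

;; Minimal sketch: track which call currently has help focus by pushing
;; an operator on "(" and popping on ")".  Real COACH parsing is richer.
(defun current-focus (tokens)
  "TOKENS is the user's input so far, split into parentheses and symbols.
Returns the operator of the innermost open call, or NIL."
  (let ((stack '()))
    (loop for (tok . rest) on tokens
          do (cond ((string= tok "(") (push (first rest) stack))
                   ((string= tok ")") (pop stack))))
    (first stack)))

;; (current-focus '("(" "DEFUN" "TIMES-2" "(" "I" ")" "(" "TIMES"))
;;   => "TIMES"   ; help focuses on TIMES while it is being typed
;; (current-focus '("(" "DEFUN" "TIMES-2" "(" "I" ")" "(" "TIMES" "I" "2" ")"))
;;   => "DEFUN"   ; focus returns to DEFUN when the inner call closes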
Intermediate programmers like Harry often have problems keeping track of the context and appropriateness of program pieces. COACH works to keep this type of programmer oriented by providing context-sensitive help and user-examples. An instance of user-example help is shown in Fig. 5; the last correct use of TIMES-2 (TIMES-2 4) was presented when the user forgot to include an argument in the function call.
FIG. 5. The COACH interface using automatically accumulated knowledge help for a user-defined form.
2.3 An Expert Programmer

Expert programmer Connie is working on an internal part of the Update-Route computer program. A model of her expertise allows the system to know that it need present very little help. When she types (SETF, the system shows the very complex argument list syntax for the SETF function. If she uses a function for which no help has been written, the system reaches into that function's definition to present an argument list for it. If she makes an error (e.g., wrong argument
type), the system changes its view of her expertise slowly at first. If she keeps making errors, it will change its opinion of her more quickly and begin to provide more help; it will show her examples and remind her of things which are related to the constructs she is using and the language concepts involved. Expert programmers must be aware of anomalous, as well as simple, relationships between parts of a computer language. Even if they do not memorize them, experts are likely to use sophisticated syntactic features. If Connie were using function or variable names which she had not yet defined, the system would put these names on a list of undefined functions. A menu would allow Connie to select from those names to aid her in remembering to define them later. In other words, COACH would select information for experts on the obscure, anomalous parts of Lisp without bothering them with introductory information. The above vignettes illustrate an adaptive user interface model tracking users’ needs to teach what they need to know about their computer environment while they are engaged in their own work. A videotape (Selker, 1991) demonstrates COACH. The following overview of relevant work describes the current state of computer aided instruction (CAI) and inspirations for the COACH approach.
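Before turning to that overview, the adaptation just described for Connie, an expertise estimate that changes slowly on a first error and faster as errors accumulate, can be made concrete with a minimal sketch. The constants and field names here are assumptions for illustration, not COACH's actual rules.

;; Minimal sketch of accelerating adjustment; all constants are illustrative.
(defstruct expertise
  (level 0.9)          ; current estimate of proficiency, 0.0 .. 1.0
  (error-streak 0))    ; consecutive recent errors on this construct

(defun note-error (e)
  "Each further error in a streak lowers the estimate by a larger step."
  (incf (expertise-error-streak e))
  (setf (expertise-level e)
        (max 0.0 (- (expertise-level e)
                    (* 0.05 (expertise-error-streak e)))))
  e)

(defun note-success (e)
  "A correct use ends the streak and nudges the estimate back up."
  (setf (expertise-error-streak e) 0)
  (setf (expertise-level e) (min 1.0 (+ (expertise-level e) 0.02)))
  e)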
3. Review of Literature: On-line Computer Teaching

COACH is a vehicle for research in human-computer interaction and AI as applied to teaching and learning. Teaching styles impact students' roles in their learning tasks. Tom Mastaglio (Mastaglio, 1989) described a continuum of interaction styles for computer teaching: from a tutor, who prescribes what a student should do, to a coach, who kibitzes with a student while the student is trying to do something, to a critic, who reviews work after it is completed. In these three teaching styles the point at which the system intervenes varies relative to a student's phases of work: design, construction or evaluation.

The use of computer systems for teaching has been called computer aided instruction (CAI), intelligent computer aided instruction (ICAI), artificial intelligent computer aided instruction (AICAI), or intelligent tutoring systems (ITS). These names have been created by their proponents to reflect the technological and research progress through the years. A common component of all such work is a prescribed educational goal to which students are guided with subgoals and tasks. The curriculum can take the form of text with comprehension tests, problem sets, or educational games. The term CAI will be used in this paper to refer to all such systems.
Research in CAI has included experiments using artificial intelligence (AI) representation, reasoning, and machine-learning techniques to direct a tutoring session (Reiser et al., 1985; Sleeman and Brown, 1982). This section outlines progress in CAI, highlighting the roles AI has played. CAI teaching styles can be broken into the following classes:
- Tutoring systems, which include a syllabus or courseware schedule of what a user should know when, and how to teach it. Such systems use generic models of what all students need to learn.
- Help systems, which answer questions a user asks. These have no syllabus or model of what a student needs to learn and only interact with a user when the user requests assistance.
- Coaching systems, which remark on user problems and successes as they occur. Like traditional help systems, coaching systems help users while they are working on problems. Unlike help systems, coaching systems offer unsolicited advice. A coaching system might also have mini-tutorials useful for specific situations in which users find themselves.
- Critic systems, which suggest or perform changes to completed user solutions. The original Lisp Critic (Fischer et al., 1985) worked in this way, giving criticism and making improvements to work students brought to it. A critic allows users to think and solve problems on their own, only giving them advice for a completed attempt. This can be compared to the grading and feedback phase in a traditional classroom course format, or the project review phase common to design projects.
A major distinction among these styles concerns motivation for, and timing of, teaching feedback.
3.1 Tutoring Research

Tutoring systems instantiate the classic theory of classroom teaching, that students should learn things in a stepwise fashion. These systems present a session based on an analysis previously made by a courseware designer of what a student needs to learn and in what sequence. Such an approach is not generally responsive to an individual user, except to indicate performance scores. Coaching and help systems respond immediately when the user has a problem. A critic does not provide suggestions until the user has completed a solution.

Current theories of instructional design (Reigeluth, 1983) focus on important issues of syllabus design (Aronson and Briggs, 1983), student motivation and teaching students how to learn (Collins and Stevens, 1983). Dennis Aronson and Leslie Briggs, for example, developed a widely referenced list of
“instructional events” centered on steps of teaching a topic (Aronson and Briggs, 1983):
(1) gaining (the learner's) attention;
(2) informing the learner of the objective;
(3) stimulating recall of prerequisite learning;
(4) presenting the stimulus material;
(5) providing learning guidance;
(6) eliciting the performance;
(7) providing feedback about performance correctness;
(8) assessing the performance;
(9) enhancing retention and transfer.
COGNITIVE ADAPTIVE COMPUTER HELP (COACH)
79
et al., 1985). Exercises require students to write programs designed to teach about a particular concept or tool. The student’s solution can vary from the teacher’s prototype in the naming of variables, but cannot vary the functions used; for example, a student cannot use the “IF” function where the system expects the “COND’ function. The student answers questions and writes programs; the system guides the student through the syllabus. The system certifies a student as a learned programmer relative to the problems in the syllabus that have been completed correctly. Anderson’s system improves the learning abilities of Lisp students. His system expands on the CAI textbook-style syllabus. A student’s progress is guided by knowledge formalized in “if, then” rules, and the word or number answers of early CAI systems are replaced with programs the user must write. In one experiment (Corbett and Anderson, 1989), the Lisp Tutor was modified to allow users to use the Lisp interpreter to experiment with Lisp statements. Students could explore the Lisp environment if they desired to “try things out” without leaving the tutoring environment. Curiously, in this experiment, student progress was retarded by allowing them to work on things other than the solution to the current tutoring problem. In this otherwise controlled learning environment, the flexibility of allowing exploration seemed to distract the student (Corbett and Anderson, 1989). Educational settings in which students thrive on exploration do, however, exist. Seminal work in applying A1 in the field of education is typified by John Seely Brown and Richard Burton’s productive collaboration. Brown and Burton’s “Debuggy” (Burton, 1978) introduced knowledge representation and reasoning into CAI. In grade school studies, they showed Debuggy could teach a student about long subtraction with carrying, understanding the student’s mistakes better than a teacher. Their approach to teaching subtraction included cataloguing the one hundred twenty or so possible types of mistakes a student can make while doing a subtraction problem. The system used a static sub-skill network to characterize what skills the student might be lacking which generated particular errors in an answer to a problem. For each possible mistake, the system had knowledge describing underlying missing concepts which could be responsible for the mistake. Debuggy ’s sophisticated representation of the problem domain enabled it to use a reasoning approach to evaluate bugs in student subtraction strategies. Pre-analyzing the entire solution and error domains gave the system the ability to explain all incorrect subtraction algorithms. The reasoning approach which Brown and Burton used in Debuggy required them to completely describe and analyze all possible subtraction errors. Many domains of interest are much larger than subtraction; identifying all possible mistakes in them is usually impractical. In fact, Brown and Burton found teachers seldom understood subtraction in the detail that the Debuggy approach used to reason about potential gaps in student understanding.
The syllabus approach to teaching has been validated as useful in some situations. For example, the order in which students learn two kinds of loop constructs can determine how easily they can master them both. Jeff Bonar (Bonar and Soloway, 1985) ran a large scale study to test this. Specifically, he taught some students the REPEAT UNTIL construct in Pascal before the DO WHILE construct and some after. The students did better when they learned REPEAT UNTIL first. The order in which new skills are introduced can be important, even when the learning of one skill is not a prerequisite to the learning of another. Evaluations of the sort done by Jeff Bonar are especially instructive. Unfortunately, because various educators (and learners) may have different goals, not all issues of educational approach can be definitively resolved.
3.2 Help Systems

While tutoring systems present a curriculum through which a student travels, help systems at the other extreme allow motivated users to ask questions of the system. Many students are not motivated to follow the rigid lesson plan of a tutoring system. Many computer users come to an unfamiliar computer system with relevant experience and do not need to learn everything about it as though they were novice computer users. Their reason for using the new system may be that they have a particular task they want to perform that requires that system. Students absorb information when they have a use for it. Rather than provide a generic syllabus for learning the entire system, it is preferable to center the aid users will receive from the system on their specific needs related to that task.

Unlike tutoring systems, help, coach, and critic systems work with the students in productive situations. Systems that support student goals can allow the student flexibility. They can also provide user support more easily than systems that give students simplified so-called "training-wheels" tutoring outside of their work environment. Training-wheel tutoring systems protect students from a realistic work situation, but must be left behind when a student is ready to experiment or begin a real project.

Systems which respond to user inquiries in work situations are termed help systems. Unlike other CAI research, most research on help systems has focused on the quality of information available to aid the user and has not extensively explored the use of AI or other strategies for delivering the information appropriate to the situation (Borenstein, 1985). Nathaniel Borenstein (Borenstein, 1985) performed behavioral experiments showing that help systems are more effective when they are available from within (integrated in) the computer program than when a separate help system
must be consulted. His studies also showed that help systems are improved when they give users context-dependent responses, basing the information users receive on the part of the computer program with which they are interacting. Borenstein also observed that the quality of help text and its relevance to the particular situation are more important than other usability issues, such as graphic design or the ease of asking for help. That is, content matters more than form.
3.3 Coaching Systems

A computer aid for using or learning a body of knowledge may be called a coach when, like a human coach, the computer trains, reprimands, gives aid for a personal weakness, and tries to provide a needed idea or fact when appropriate. As well as being extravagant, human coaches can be wrongly perceived. There is evidence that continuous human guidance provided during students' writing activity is often seen as ill-intended, leading students to reject it (Brehm and Brehm, 1981). Computer-based guidance does not arouse such a reaction (Zellermayer et al., 1991). However, until now such coaching systems have relied on what users are doing (context) or whether they have made an error, without using any representation of the users' actual performance or users' understanding of the system.

To be able to respond to the user's actual level of proficiency, the system needs to learn from the user's actions. If a coach system had this ability, it could be said to have an adaptive user model (AUM). The system could build the adaptive model by asking a user questions. Elaine Rich's GRUNDY (Rich, 1983) system is well known for using a simple nonadaptive user model to choose books for users. A user filled out a "form" which her system used to create a user model. GRUNDY consulted this user model "stereotype" to select library books which might be interesting to that user. An adaptive version might include feedback and follow-up questions. This work, instead, explores user models which are built by watching the user's actions (Selker, 1989).

An important issue is whether unsolicited feedback is intrusive, derailing and frustrating users, or whether it can offer welcome advice (Bereiter and Scardamalia, 1984). Michael Zellermayer (Zellermayer et al., 1991) performed a study which gave mixed results concerning unsolicited computer-presented advice. A system called "The Writing Partner" cued students with so-called "metacognitive" questions, concerned with higher level information than the writing itself, such as planning and organizing. The system attempted to help the students plan and organize their papers by asking questions such as: "Do you want your composition to persuade or describe?" In a comparison of three groups of students writing essays, one group received no guidance, one group received
metacognitive guidance when they solicited it from "The Writing Partner", and one group received unsolicited metacognitive guidance from "The Writing Partner". Many people have the intuition that unsolicited computer advice would intrude and slow a user down. And, indeed, while using "The Writing Partner" during the training period, the students in the group that received unsolicited advice took longer to accomplish their work and did not show an improved essay writing ability. While this might seem to corroborate the impeding, intrusive advisor hypothesis, those same students who had been continuously advised on the metacognitive aspects of their writing tasks were able in essays written two weeks later (on paper, not using "The Writing Partner") to write better essays than students in the other groups. This provocative study shows how a coach offering unsolicited help can teach a person a new skill. Unfortunately, it also indicates that learning this new skill (essay organization and planning) with this type of unsolicited help has a cost. Happily, this study provides new evidence that unsolicited help can shorten rather than prolong a task.

The idea of an explicit model or expectation of a user's performance is not new. Burton and Brown's electrical circuit trouble-shooting learning environments (SOPHIE 1, 2 and 3) (Burton and Brown, 1982) give important results concerning user modelling. SOPHIE 3 included an evaluation strategy which compared students' performances in designing circuits with those of expert circuit designers. The system reasoned about differences between novices and experts; in so doing, it attributed problems encountered by the novices to bugs in the novice user's otherwise expert approach. SOPHIE research promoted user exploration of the domain as a way of improving the task relevance of a syllabus. Since either the system or the user could control the session, these systems can be said to have provided mixed initiative interaction. One important conclusion which this work (and others, e.g., Feldman, 1980; Genesereth, 1982) put forward was the fallacy of modelling a novice as an expert with some knowledge missing. Burton and Brown, instead, concluded that novices have a qualitatively different model of a domain than experts. Because of this, a novice user cannot easily be evaluated relative to an expert model. An expert has understanding which does not necessarily follow the procedural and simplistic analysis of a beginner.

Educational systems like SOPHIE or Burton and Brown's WEST (Sleeman and Brown, 1982) involve users in a little world in which they can explore, learn, and try things out. Games with simulation are widely used in CAI (Sleeman and Brown, 1982). They are particularly appropriate in coaching systems. The feedback and integration of such environments is natural for the coaching paradigm. Game user interfaces often include a consistent simulated environment referred to
as a microworld. Microworlds and other game teaching approaches have the advantage of addressing student motivation as an explicit goal.
3.4 Critic Systems
A critic system criticizes or evaluates work at a specific point, either when requested by a user or at the end of a session. Only when a user has come to this point does the critic system offer its aid. This approach has often been employed in order to allow the computer to analyze student work off-line. The advantage of a critic system is that it gives the user time to reflect upon the system's suggestions and create a solution without interference. However, the disadvantage is that the advice is not available at the time the problems arise.

Adrian Zissos and Ian Witten (Zissos and Witten, 1985) built a prototype adaptive critic system which could analyze transcripts of EMACS text editor usage after a user session. ANACHIES, as it was called, could decide how to improve a person's use of EMACS editor commands. Zissos and Witten's paper offers a pessimistic view of the feasibility of having a system react as the user needs assistance. Their research convinced them that the computational requirements for using adaptive AI techniques in interactive applications could not feasibly be met with the computers available in the foreseeable future, a cynicism which this research demonstrates to have been unwarranted.

This section has reviewed CAI teaching styles in terms of their responsiveness to users and how their educational goals are chosen (Fig. 6). Critic systems, for example, offer batch responses, while coaching environments respond interactively to a user. While much work has been done on tutoring environments, demonstrations of real-time adaptive teaching environments have not been convincing. Researchers still question the reasonableness of real-time adaptation. Until now, neither the utility of adaptive interfaces of any type nor the possibility of unintrusive unsolicited help has been shown. COACH offers results addressing these concerns. The following sections describe the COACH interface, concentrating on student-motivated teaching and learning interaction.

FIG. 6. An illustration of the responsiveness and educational goal differences which characterize different computer learning environments. The horizontal dimension, adaptability, refers to the system's ability to be changed for a situation or user. The shaded ellipse indicates where systems which automatically change or adapt to a user's goals would lie in the illustration.
4. Requirements for an Adaptive Help Testbed
Tools for building computer aided instruction systems (CAIs) are usually referred to as CAI authoring systems. Tools for managing a rule base and providing a ready-made production system that can run these rules are often referred to as expert system "shells", connoting their ability to be hard containers that can store knowledge or representations (Barr and Feigenbaum, 1984). The earliest described rule system, PLANNER (Hewitt, 1972), with its implementation, MICRO-PLANNER (Sussman et al., 1970), could be described as a shell for developing AI applications. It was quite a general system in which "theorems", which can be generally thought of as rules, could be specified as being useful for forward or backward chaining. Filters specified classes, which can be loosely thought of as rule sets. More widely available systems like EMYCIN (Clancy, 1986) and Intellicorp's KEE (Fikes and Keeler, 1985) contain many tools to represent and reason about a domain. As well as providing an authoring system in which teaching information could be changed, an adaptive user model would be used as a shell in which reasoning about the teaching process itself could be modified.

An AI practitioner using an expert system shell can utilize the shell's representation and reasoning machinery without having to build it from scratch. Shells are designed to lever AI practitioners' efforts by allowing them to create expert systems by merely describing the rules relevant to the task or skill domain; the practitioner may then use the reasoning tools provided in the shell. The challenge to building an AI application is understanding the domain knowledge
that is to be embodied in the application, understanding the reasoning relationships in the knowledge, formalizing these, and converting them into the AI shell's representation and reasoning formalisms.

COACH, which is described in detail in the following sections, is a shell for adaptive coaching. It provides machinery for formalizing both teaching and domain knowledge for coaching user interfaces. COACH is designed to allow a curriculum designer to encode a domain to be taught. It is also designed to allow an educational researcher to encode theories of when and how to present information to a student. Additionally, COACH allows the cognitive scientist to encode approaches to gathering user information and methods for altering treatment of students based on their responding behavior.

The courseware developer's process of converting the system to teach in a new skill domain requires the following steps:

(1) Identify a task or skill domain.
(2) Identify any delimiters and other token types that the system does not already have.
(3) Write token handling function methods for the domain's token types not already supported.
(4) Change the token table for parsing delimiters.
(5) Identify commands in the skill domain for which help initially will be made available.
(6) Describe the syntax of the language being taught in the COACH formalism.

(COACH/2 automates much of steps 2, 3, 4 and 6 above, allowing the author to identify and annotate a GUI using a WYSIWYG authoring tool.)
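To make steps (2)-(6) concrete, the following Common Lisp sketch shows the kind of material a courseware developer might supply. The table names, the bracketed template notation and the handler function are illustrative assumptions modeled on the description in Section 6, not COACH's actual definitions.

;; Hypothetical sketch only; the names and notation below are assumptions.
;; Step (4): a token table mapping delimiter characters to token types.
(defparameter *token-table*
  '((#\( . open-paren)
    (#\) . close-paren)
    (#\" . string-delimiter)
    (#\; . comment-start)))

;; Step (6): syntax of the commands being taught, written in a bracket-style
;; template where a leading * means "zero or more" of the argument type that
;; follows (compare the [PLUS *N] example in Section 6.3.1).
(defparameter *syntax-facts*
  '((plus  "[PLUS *N]")
    (minus "[MINUS N N]")))

;; Step (3): a handling method for a token type the system does not already
;; support, called whenever a token of that type is completed.
(defun handle-comment-token (text)
  "Record that the user typed a comment; comments never trigger syntax help."
  (declare (ignore text))
  :no-help-needed)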
A primitive help system would then exist for the skill domain. The system would be able to check user syntax, look for undefined functions and variables, learn about user examples and add help for new and system functions. It could also increase and decrease help and change levels of assistance. The system would not yet know about relationships between syntactic units, concepts, basis sets or required knowledge. As described in Section 6, these create a representation for generating adaptive help which can remind a user of related material and concepts in the skill domain when appropriate.

Identifying the relationships between parts of the domain then allows the developer to add deeper knowledge about the skill domain. This can be accomplished with the following steps:

(1) Identify skill domain concepts.
(2) Identify basis sets in the skill domain.
(3) Identify required knowledge for skill domain parts.
(4) Write description, syntax and example text for different expertise levels of each skill domain part.

Just as AI shells have simplified AI application development, a shell for testing and extending adaptive help simplifies experimentation and development of adaptive help applications. As an expert system shell requires a practitioner to formally understand the reasoning that is included in an AI system, so an adaptive help shell requires the courseware designer to understand the formal syntax of the language for which the system is to produce help and the relationships between its parts (Selker, 1989).

For as long as computers have existed, people have talked of the possible value of intelligent computer assistants. Unfortunately, the architectures suggested have not been entirely successful. Either they cannot run interactively with a user, or they have not been shown to improve the user's performance (Zissos and Witten, 1985; Gentner, 1986; Waters, 1982). The COACH implementation has neither of those problems; it has been demonstrated to run interactively and improve user performance.
5. Theoretical Considerations for Creating COACH

After years of work, researchers came to the conclusion that proactive interactive adaptive computer interfaces were not feasible (Zissos and Witten, 1985; Gentner, 1986; Waters, 1982). This work challenges that conclusion and demonstrates an example of such a system. The demonstration relies crucially on an understanding of the constraints and requirements of real time response. To satisfy these constraints and requirements, the following questions must be addressed. What kinds of domains are interesting and possible to work with? What kinds of errors do users make? In what ways can these errors be addressed? What teaching techniques can be used without undue computational overhead? This section addresses these issues. Section 6, which describes the COACH architecture, addresses language theory issues that pertain to evaluating user work (see Section 6.4).
5.1 Suitable Domains for Demonstrating Adaptive User Help

Demonstrating that adaptive user help is a viable approach requires showing both that it will work for an important class of interfaces and that it will provide valuable improvements over current help systems. The class of interface chosen was text; the first domain chosen was Lisp.

Text entry is probably the most common interactive technique used to communicate with computers. Even in the age of graphical languages, many operating
system command languages, computer application interfaces, text markup languages and programming languages are interpreted text interfaces. Text was chosen for the initial COACH implementation because text is computer parsable and used in so many interfaces. Also, learning a text-based interface forces a user to confront difficult educational issues, and to work with incomplete and inconsistent knowledge.

Computer interfaces often include commands which are too complicated or too seldom used to be easily remembered. To show that COACH will alleviate such problems, the representative demonstration domain should contain many commands with complex syntax. Computer interfaces contain many interdependent commands and concepts. Thus, a representative demonstration domain should also include complex interdependencies. Computer interfaces often allow more than one way of doing things. Therefore, a demonstration domain should permit redundancies and alternative solutions. The information the users must master often changes as time goes on. Therefore, a demonstration domain should be extensible. Educational domains are often too large to enumerate or analyze fully. The domain should be larger than the implementation can fully represent.

To demonstrate adaptive user help, a domain was sought which would show several strengths of the approach: the ability to cope with a large and changing domain, the ability to extend help to include additions made to the domain by the user, the ability to permit solutions to have a complex structure, and the ability to coach in domains relying upon many difficult concepts. Lisp programming is a domain which provides all of these challenges. Many pedagogical tools are designed for limited or toy domains. Lisp is extensible, requiring a help system that will work even when the skill domain changes and enlarges. Working with a domain that changes and enlarges (i.e., an "open system" (Hewitt, 1985)) is a test of the robustness of COACH. Lisp is a complex open system that allows demonstration coaches to work in a realistic domain. Lisp has redundant ways of doing things that require the system to act in ambiguous situations. Lisp is also a domain for which other people have built intelligent tutoring tools. This allows COACH to be compared to and benchmarked against their work. COACH/2 was explicitly designed to allow experimentation with teaching GUI procedures and coaching the use of dialog boxes.
5.2 Classifying Knowledge Deficits
A user is trying to learn a skill domain, e.g., Lisp. The initial implementation of COACH was designed for skill domains that require interpreted text input. Users must remember keywords, delimiters, syntax, and their previous input to effectively use text-based programming languages and computer command line
interfaces. People forget; even experts are always working with gaps in their knowledge. These gaps may be classified into three types:
• Issue: the focus of what a person is trying to learn. This is the material users are aware of and know they do not know. On-line help system queries are useful for learning issues. COACH shows such information automatically.
• Incomplete: the knowledge a person does not know exists. These are actual holes in the particular user's knowledge. COACH points out general knowledge when a user is having difficulty in a particular area.
• Inconsistent: the knowledge users think they know but do not. COACH points out errors to highlight such inconsistencies.
5.3 A Classification for the Help to Provide to Users

Interesting models and theories of instruction and instructional design exist (Reigeluth, 1983). David Merrill's (Merrill, 1983) "Component Display Theory" describes structures which educators use to organize their efforts. He separates domains of content into facts, concepts, procedures, and principles which can be known well enough to remember and apply correctly in appropriate situations. Such a taxonomy centers the process of learning on a domain and its interrelationships. This is extremely useful for creating a network of knowledge relationships which characterize the domain.

Edwina Rissland (Rissland, 1978) created a simple taxonomy of help examples. It highlights the importance of providing different kinds of examples appropriate to the expertise level of the user. Rissland's help taxonomy consists of four levels of help:
• Starter help is used at a novice level. Only simplified basic information is provided. Novices depend on the literal cues in a problem situation (Glaser, 1985). The information given them, then, must be carefully designed so as not to mislead them.
• Reference help is more complete to familiarize users with standard usage.
• Model help is a complete description of what something is and how to use it.
• Expert help is machine-level description such as one might find in reference manuals.
This taxonomy of help examples shown to a user can be refined to segment each level of help into types of help a student may need. For COACH, this taxonomy is extended to distinguish and include examples of correct usage, legal syntax and descriptive text telling how and when to use something:
• Example help is an actual demonstration of an exemplary solution or solutions. Despite efforts to teach design through concepts and theory, the only effective teaching tool for design is commonly agreed to be the providing of examples (Vertelney et al., 1991). Moreover, procedural and syntactic knowledge are often most easily conveyed through examples.
• Description help is an explanation of the utility and use of a solution type. This information can range from philosophical background to an explanation of the use of a specific language part suggested for the solution to a user's problem.
• Syntax help is a template showing the structure of a legal solution. For users to apply a specific statement in varying situations, they must internalize a model of its utility; syntax is the essence of a concise definition.
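As an illustration only, the sketch below shows one way the level-by-type grid implied by this taxonomy could be laid out for a single learnable unit. The slot names and the Lisp representation are assumptions, not COACH's internal format, and only two of the four levels are shown.

;; Hypothetical layout.  Help for one learnable unit, indexed first by level
;; (starter, reference, model, expert) and then by type of help
;; (description, example, syntax).
(defparameter *cons-help*
  '((:starter   (:description "CONS builds a list cell from two things."
                 :example     "(cons 1 nil)"
                 :syntax      "[CONS s-expr s-expr]"))
    (:reference (:description "CONS returns a cell whose car is its first argument and whose cdr is its second."
                 :example     "(cons 'a '(b c))   ; => (A B C)"
                 :syntax      "[CONS s-expr s-expr]"))))

(defun help-text (level type)
  "Return the TYPE of help (:description, :example or :syntax) at LEVEL."
  (getf (second (assoc level *cons-help*)) type))

;; e.g. (help-text :starter :example)  ; => "(cons 1 nil)"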
5.4 Tracking User Proficiency as the User is Working
Various approaches to analyzing knowledge about users have been tried and have been found to be problematic. Systems such as Don Gentner's (Gentner, 1986) used mathematical proofs of programming correctness to decide what a user was doing. The problem with this approach was that the computation required for these proofs grew exponentially with the amount of user work under analysis (Hopcroft and Ullman, 1979), which limited its utility for large bodies of work. Zissos and Witten's EMACS critic system, ANACHIES, used cluster analysis to make determinations of user capabilities (Zissos and Witten, 1985). This, too, was a computationally intensive procedure which grew exponentially with the size of the language.

Care has to be taken to create an architecture capable of recording user activity in a representation which it can use to reason about how to provide help and react as the user is typing. Careful use of representational, reasoning and learning techniques makes this real time response possible.

Several strategies can be used in the representation to limit knowledge search and access problems. Relationships in the knowledge representation may be recorded and stored as links as soon as they are known. Search would thereby be diminished by the pre-defined network created by these links. By limiting the depth of relationship links, search difficulty caused by complex links is decreased. As many user model characteristics as possible would be recorded as scalars, so as to limit representation growth and reasoning difficulty.

Several strategies would be used to make the reasoning system efficient. The rule system would use an "if/then forward-chaining approach" and avoid the less determinate, computationally extravagant "backward-chaining means/ends analysis" (Barr and Feigenbaum, 1984). The reasoning can be broken into rule sets which would be used on specific knowledge and on specific parts of the reasoning process. These segmentations of reasoning problems decrease complexity when searching through the rules and searching in the knowledge.
Finally, it is important to choose learning techniques which are feasible for real time computation. In order to learn in real time, the system should limit itself to opportunistic and simple hill-climbing learning. Below is a classification of learning taken from Machine Learning (Michalski et al., 1983). This classification is annotated to show low computational cost learning techniques which could be used in an adaptive learning system to organize the ways the system can change its behavior:

• Learning from examples is the practice of using specific solutions already achieved in more complex situations. This technique can be used in the COACH interaction style to collect user-provided syntax and examples, and to offer help for user-defined variables and statements. In the interaction style, syntax may be collected by recording user definitions. Examples can be collected from user work.
• Learning by analogy is gathering knowledge in one situation for use in another similar situation. This kind of learning can be used by COACH to decide when to explain things in terms of skill domain parts the user already knows. The help system architecture would also include a network of skill domain parts in which it could access information for this purpose.
• Learning from instruction is utilizing a user interface or special language to introduce knowledge into the system. A courseware designer uses this technique to modify a CAI system without programming. Expert systems and most state-of-the-art AI education systems rely exclusively on developer modification to change response behavior. Modifications and improvements for the Lisp Tutor (Corbett and Anderson, 1989) and the Lisp Critic (Fischer et al., 1985) are made in this way. COACH is designed to allow a researcher to add facts and rules that improve the adaptive user model system without writing Lisp code. Observations of students using the system give the researcher ideas of how to change the way the system treats a user in different situations. These ideas are put into additions and changes in presentation text or presentation rules.
• Learning by programming is simply the practice of having a developer add knowledge to a system by actually writing code. Before rule systems existed, this was the only way of improving an AI application. Any system can be extended by programming to add function or change domain.
The technology described above is used in the following section to present the COACH architecture, which enables real-time adaptive help in interactive computer environments.
6. An Architecture for Adaptive User Help
This section introduces the structure that enables adaptive coaching. The adaptive help system can be modeled with four interacting parts or objects:
a window interface, a reasoning system, an adaptive user model (AUM) which relies on coaching knowledge and domain knowledge, and a parser (Fig. 7).

To use a computer language effectively, a student needs to understand its syntax and semantics. So, it is reasonable to use a language definition as part of the structure of the AUM and use the language definition as a way to classify user progress and to guide instruction. This would include the domain knowledge used to compose statements and the token types used. This definition by itself, however, does not include all knowledge needed to understand and use a computer language, because a user must also understand fundamental programming concepts and relationships. The AUM should represent the concepts underlying the language, and the basis sets necessary to accomplish defined tasks (see Section 6.3.1). Each statement, token, concept and basis set may be referred to as a learnable unit, that is, the smallest quantity of information represented as a discrete entity to a user. Each of these learnable units is represented in frames with named slots for useful attributes, one frame for each learnable unit.
FIG.7. Dashed lines in the figure represent logical relationships, solid lines represent physical relationships. COACH is composed of interacting parts or objects. The window interface manages text editing, output formatting and menus. The reasoning system creates and uses the AUM to display domain knowledge help and to modify domain knowledge. Coaching knowledge controls these reasoning activities. A multilevel parser notes a user's work context and dispatches information to the reasoning system.
The adaptive automated help architecture must represent examples of each of these learnable units and a model of the status of the student in terms of the particular student's understanding and ability to use each one. A user model frame is recorded for new user-defined learnable units as they are created, allowing the system to give help for these as well. A skill domain like Lisp, for which a user is being helped, is represented in the system by these syntactic and conceptual parts. Rules draw on knowledge in frames as they update user help and frame knowledge. A simplified blackboard mechanism allows the knowledge module to propose and veto help text before it is presented. The presentation rules build a list of help items to present. Veto rules and help presentation space constraints eliminate all but the most appropriate items of help. The AUM relies on the production system to make decisions based on the user model it has built and to decide how to advise the user.

The architecture relies on AI technology both for building the AUM and for guiding instruction. The guiding knowledge is embodied in domain knowledge facts and coaching knowledge rules. Domain knowledge is represented in the help system parser grammar and in subject and adaptive frames (see Section 6.3.1 below). Subject frames contain knowledge about the skill domain a user is trying to learn (e.g. Lisp). These frames are associated with each learnable unit. Adaptive frames hold usage data and user examples for each function. They are collected as the user works and comprise the AUM knowledge structure. Coaching knowledge is contained in rules that create and control the adaptive frames and the help presentation blackboard. Update rules control the recording of user experience for the AUM. Consistency rules contain knowledge about how to build the AUM. These two rule sets work to update the AUM. Presentation rules embody knowledge for using the AUM.

The parts or "objects" guided by expert systems knowledge comprise the COACH adaptive automated help architecture. These parts and the way they work together are described in the following sections.
6.1 Window Interface
The window interface manages the screen real estate. It provides separate panes for help text, user input, computer output, and for a menu by which the user can request help (see Fig. 8). It dispatches input key and mouse events to the other modules and presents computer response and advisory help text. Text-based interactive environments generally type computer output, help, and error messages to a single user console window. The user also types into this window. Confusion often arises concerning which text the computer typed and which the user typed. The combination of such different streams of information
into one communication channel requires that the user remember which text on the screen was written by the user and which was written by the computer.

FIG. 8. The window interface separates user input from help and system responses. A menu at the bottom allows a user to request help directly.

The window interface design used in the initial COACH study physically separates user input from computer output and advisory help. This segmentation insures that computer help and advice do not physically interfere with user input. One way to do this is to vertically separate the token help pane and the general help pane from the user interaction pane and the computer output pane. The user interaction pane does not lock the keyboard when an error intervenes; instead, the character that caused the error (reported on other panes) is highlighted. Using character highlighting to replace keyboard locking and separate panes to preclude typing on the user interaction pane provides visual aids permitting the user to focus more attention on the problem to be solved and less on the computer mechanics. More specifically:
• The Token Help Pane is positioned as closely as possible to the user interaction pane to allow a user to see it easily while typing. This pane is designed to give focused help concerning the specific characters a user is typing. For example, when a novice is typing a token (e.g., a number, symbol, defined variable, etc.), the token help pane displays help concerning that token. When the computer can reasonably predict the next token required, such a pane provides advice concerning it. The pane shows the token name and, as described in Section 6.3.2, presents various levels of description, example and syntax help adapted to the particular user. This local information would not be as useful to users at intermediate and expert levels of proficiency. The adaptive user model chooses when to eliminate this kind of help.
• The General Help Pane is positioned farther from the user's immediate view than the token help pane. This pane presents all teaching knowledge not presented for token help. Various kinds of teaching text concerning concepts, functions, and user work compete to be presented here. This help window could be made to shrink as a user improves.
• The Output Pane is placed next to the user interaction pane, so as to be noticed and available but not intruding on the user's workspace. This placement allows a user to easily compare computer output with input text, without their interfering with each other on the screen. By contrast, on a standard "Lisp listener" console, user text competes on the screen with system error and output text. This output pane provides the same output and error feedback that a standard system gives.
• The User Interaction Pane is a text editor window. This pane allows users to enter and edit work just as they do without a help system. Errors are highlighted in sequence to allow users to correct them in an organized way. As users improve and the need for other help windows diminishes, this window could be made to take up more screen real estate.
• The Menu Pane would be a fixed menu area with spatially separated words that call attention to additional system support. This menu pane allows the user to request explicit interaction by using a mouse, rather than by the usual method of typing requests in a console pane and possibly obscuring current work already there. While the help panes ordinarily would automatically provide computer generated assistance to the user, the menu would permit users who recognize the need for help to initiate a request for it. The help generated by the request would appear on the help panes. By selecting menu items with a mouse, the user could ask the computer to give help, show undefined variables or functions and so on. This pane would give the user an explicit medium for interaction with the coaching system. The user would also use the menu for such routine functions as saving and reading files, logging in or logging out. Pressing a mouse button in the other window panes could be arranged to provide appropriate so-called "pop up" menus useful for the context of the window.
6.2 Reasoning System
The reasoning system controls all aspects of user monitoring and assistance in COACH. It is made up of a production system which interprets rules and a simplified blackboard which resolves presentation goal conflicts. Together they operate on the adaptive user model (AUM) by referring to domain knowledge and using coaching knowledge to make decisions, as described in the following sections.
In the two decades since MICRO-PLANNER demonstrated the utility of using a rule interpreter, or production system, to achieve reasoning tasks represented in rules (Hewitt, 1972; Sussman et al., 1970), many AI architectures have included them (Davis and Shortliffe, 1977; Barr and Feigenbaum, 1984). For a time it seemed that AI and rule systems were synonymous. The components of a rule system are, in fact, basic to AI representation and search (Winston, 1977; Barr and Feigenbaum, 1984).

In order to demonstrate reasoning and learning in real time, COACH limits itself to a forward-chaining rule system. This means that rules are searched through in order and fired when they apply. By breaking the system into small rule sets and not using backward chaining (a goal-oriented rule search), COACH avoids both indeterminate and long searches. This is necessary to allow real-time advising. Knowledge for building a user model and for coaching can be separated into sets of rules, each of which operates in specific situations. The search time in a rule system can be significantly decreased by breaking reasoning rules into groups or rule sets that are scanned for specific reasoning needs. For example, with this kind of segmentation, if a token is used incorrectly, only rules that provide help for incorrectly used tokens need be consulted. Section 6.3.2 will describe rule sets upon which the model depends for reasoning.

Rules are made up of an antecedent and a consequent. The antecedent part must be true for a consequent part to fire. Both parts are Lisp s-expressions. Rules may be defined with the following simple Lisp syntax:

(DEFINE-RULE (rule-name rule-set-name) (user-model-parameters)
  IF s-expressions
  THEN s-expressions)
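A small usage sketch follows. The DEFINE-RULE stand-in below is a toy macro written only so the example can be executed; COACH's real macro and production system are more elaborate, and the rule shown (its parameters and consequent) is an illustrative assumption rather than an actual COACH rule.

;; Toy stand-in for COACH's DEFINE-RULE, provided only so the example runs.
(defvar *rule-sets* (make-hash-table))

(defmacro define-rule ((rule-name rule-set-name) parameters
                       if-word antecedent then-word consequent)
  (declare (ignore if-word then-word))
  `(push (cons ',rule-name
               (lambda ,parameters (when ,antecedent ,consequent)))
         (gethash ',rule-set-name *rule-sets*)))

;; A hypothetical update rule in the surface syntax described above: when a
;; learnable unit has just been used correctly, propose raising its rating.
(define-rule (note-success update-rules) (learnable-unit used-correctly-p)
  IF   used-correctly-p
  THEN (list :raise-rating learnable-unit))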
The order of rules in a rule set determines the order in which they will be run. The rule's antecedent, consequent, and position in the rule set determine the reasoning behavior.

A blackboard allows statement-proposals and statement-vetoes to interact in reasoning decisions. Blackboard architectures were first introduced in Hearsay, a speech-activated chess playing program (Erman and Lesser, 1975). The Hearsay blackboard was an innovative distributed decision making paradigm for running the system. Different levels of speech recognition each had different parts of a blackboard. Knowledge sources could post proposals and look at proposals on the blackboard. Through this blackboard communication process, multiple knowledge sources collaborated to interpret speech related to chess moves. This architecture has had a continued and marked influence on the artificial intelligence community.

A blackboard is used in COACH to arbitrate adaptive help presentation. Rules in the presentation rule set are knowledge sources that propose and veto various kinds of help to choose the appropriate text to be presented to a user. The order in which
proposals are posted on the blackboard determines their priority. Vetoes might take proposals off the blackboard. A knowledge source decides to present between one and three help text items on the help window. The three highest priority proposals on the blackboard represent text which would then be presented to the user.
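A minimal, self-contained sketch of this propose/veto/select cycle is given below; the data layout and function names are assumptions for illustration, not COACH internals.

;; Simplified blackboard: proposals are kept in posting order (earlier means
;; higher priority), vetoes remove the proposals they object to, and at most
;; the three highest-priority survivors are presented.
(defvar *blackboard* '())

(defun propose-help (item)
  "Post a help item on the blackboard; earlier postings keep higher priority."
  (setf *blackboard* (append *blackboard* (list item))))

(defun veto-help (objection)
  "Remove every proposal for which the OBJECTION predicate returns true."
  (setf *blackboard* (remove-if objection *blackboard*)))

(defun help-to-present ()
  "Return the (at most) three highest-priority surviving proposals."
  (subseq *blackboard* 0 (min 3 (length *blackboard*))))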
6.3 System Knowledge and the Adaptive User Model (AUM)

An Adaptive User Model (AUM) is a formal description of a user relative to a domain that tracks changes in the user's knowledge in that domain. COACH uses an explicit user model. Frames, facts, and rules represent the user and the skill domain the user is learning. The AUM is a set of user model frames (Minsky, 1976) for syntactic and conceptual parts of the domain being coached. While the user is working on a task, these frames record aspects of the user's successes and failures. The AUM for COACH is composed of this representation of the user and an associated reasoning system for creating and accessing knowledge frames.

The defined network of relationships between skill domain parts, what the user is doing, and the state of the user model is the basis for selecting user help. The reasoning system uses this network of domain knowledge and coaching knowledge in the form of rules together with the AUM. Reasoning and planning about how information interacts, the way the system updates the AUM, and even the system's adaptation algorithms reside in rules in COACH. By changing these rules, a researcher could tailor help for different skills and pedagogical theories. Each skill domain part has a help knowledge frame. These frames can include descriptions, syntax, and example help at the four levels of help proposed by the taxonomy described in Section 5.3.
6.3.1 Domain Knowledge

(1) Adaptive frames. The AUM frames for each learnable unit have the following user model characteristics or slots:
(a) User examples. Examples are recorded of user errors and user corrections of those errors. When a user makes a mistake, the system records it. When the user is able to correct the mistake, the system stores that "fix" with the example. If the user later makes a syntactically isomorphic mistake, the system displays the familiar earlier example.
(b) Usage data. The following information is also recorded: (i) experience (how often a particular learnable unit has been used by this user); (ii) latency (how long since the user has used this learnable unit); (iii) slope (how fast the user is learning or forgetting something); (iv) goodness (a measure of the user's overall performance with respect to this learnable unit).
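A sketch of how such an adaptive frame might be declared follows; the slot names are taken from the list above, while the concrete struct representation is an assumption rather than COACH's actual code.

;; One adaptive (user model) frame per learnable unit.
(defstruct adaptive-frame
  (user-examples '())   ; recorded (mistake . fix) pairs for this unit
  (experience 0)        ; how often this user has used the learnable unit
  (latency 0)           ; how long since the unit was last used
  (slope 0)             ; how fast the user is learning or forgetting it
  (goodness 0))         ; overall performance rating for this unit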
A demonstration of rules that use these slots is contained in Section 6.3.2.

(2) Subject frames. Subject frames (Fig. 9) are made up of a subject definition along with various kinds of related material and description text, syntax text and system-defined examples (by contrast with the user-created examples) for the four levels of help.

FIG. 9. The knowledge and reasoning structure in an adaptive coaching environment. For each learnable unit, AUM and subject frames are built and controlled by model building and help presentation rules.

The reasoning system described above (see Section 6.2) and the multilevel parser described below (see Section 6.4) rely on the subject frames and adaptive frames to run the user interface. The knowledge exists in slots within the frames. Subject frame styles are defined for each type of learnable unit (statements, tokens, concepts and basis sets). Each learnable unit has its own frame. Formally, the learnable units are represented as follows:
(a) Language statements: S. Statements are learnable units which are defined in a syntax facts table described more fully in Section 6.4. This table is extended by user-defined functions. A simplified definition for PLUS, for example, is [PLUS *N]. In this notation, described in more detail later (see Tables IV and V), the star, "*", indicates that
what follows can occur zero or any other number of times. Open and closed brackets, "[" and "]", represent parentheses and the N represents a number type argument. Statement knowledge is a tuple S (L, SH, D, SE, R, O) where:
(i) Language syntax: L is the statement's formal definition which COACH uses to evaluate user work (see Section 6.4.3).
(ii) Syntax help: SH frame slots contain formal abstract help definitions of a learnable unit. For a specific level in the help taxonomy (starter, model, reference or expert), the syntax describes the learnable unit in more detail. Concepts are introduced by examples; the syntax assists a user in internalizing the concepts.
(iii) Description help: D frame slots contain help text for each level of the help taxonomy. These slots contain text explaining what a specific learnable unit is useful for, information describing when it can be used, and an explanation of what it does.
(iv) System example: SE frame slots contain helpful examples typifying a learnable unit for a specific level in the help taxonomy. The user examples contained in the adaptive frames described above supplement these system examples.
(v) Required knowledge: R frame slots hold the set of skill domain parts with which a user must be familiar to use a particular learnable unit (e.g., using the CONS function requires a user to understand the evaluation, s-expression, and atom concepts). This set defines a network of related material. Required knowledge makes it possible for a reasoning system to form the strategies needed to create teaching goals in a coaching environment. The required knowledge set details the necessary prerequisites for understanding a particular learnable unit. If a user is having trouble with a particular learnable unit, COACH displays related things with which the user is already familiar or which could be considered as alternatives. If a user is doing well, this knowledge allows COACH to see how to encourage the user to learn new learnable units which relate to ones already known. Required knowledge is a tuple R (*F, *C, *T) where:
• Function: F is a statement name.
• Concept: C is a concept name (described more fully below).
• Language token: T is a token type of the domain language (described more fully below).
(vi) Other related knowledge: O is a set of learnable units which are pertinent to another learnable unit. This set defines a network of
relationships which helps the architecture utilize the concepts that tie the domain together. This network is crucial to the coaching that will expand a user's breadth and, when necessary, search for alternative teaching approaches. Other related knowledge is a tuple O (*F, *C, *T). F, C, and T are defined above in "required knowledge".
(b) Language tokens: T. Tokens are learnable units which are keywords and acceptable variable types for a skill domain (e.g., "(", "CONS"). They are defined in a table with associated token methods described in Section 6.4.1. Token knowledge is a tuple T (SH, D, SE, O). SH, D, SE, and O are defined above in "language statements".
(c) Concepts: C. Concepts are learnable units which are semantic ideas not codified by syntactic parts (e.g., evaluation, iteration, stored variable, etc.). A concept is a tuple C (*F, *C, *T). F, C, and T are defined above in "required knowledge".
(d) Basis-sets: B. Basis sets are learnable units composed of groups of other learnable units comprising minimal sets of knowledge necessary to understand a topic. This term is borrowed from mathematics where it defines a similar concept. An arithmetic basis set, for example, would require a user to know about PLUS, DIFFERENCE, and the List and Number concepts. The elements of a basis set are skill domain parts, all of which must be known to do a task in a particular topic area (e.g., the basis set for "simple-lists" includes CONS, CAR, and CDR and the atom and s-expression concepts). Generally, basis sets will be a subset of a required knowledge set; for example, required knowledge for List includes the Eval concept as well as CONS, CAR, and CDR. Basis sets may be elements in O or R sets of learnable units. These allow the system to reason about basic knowledge a user may be missing when trying to use a learnable unit. A basis-set is a tuple B (*F, *C, *T). F, C and T are defined above in "required knowledge".

The subject frames described in this section create a domain representation in COACH. This representation consists of syntax, descriptions, system-examples, and related materials for each learnable unit in the domain. The language syntax definitions, L, define knowledge with which COACH can record correctness of user work. Required knowledge, related material, and basis sets, included as R, O, and B, define relationships between parts of the domain, much like the links in a hypertext system (Conklin, 1986). This network, R, O, and B, can in fact be browsed like hypertext. More importantly, these are the basis for COACH
reasoning about relationships in the subject domain. Rules like "Encourage-Exploration" and "Out-of-Practice", described in the next section, use these to orient and teach users. These subject frames are augmented by the AUM to give a rich representation from which coaching knowledge makes decisions.
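Pulling these pieces together, a subject frame for CONS might be rendered as in the sketch below. The plist layout and slot keywords are assumptions; the syntax, required knowledge and basis-set contents follow the CONS and "simple-lists" examples given above.

;; Hypothetical rendering of a subject frame and a basis set; slot keywords
;; are assumptions, contents follow the examples in the text.
(defparameter *cons-subject-frame*
  '(:language-syntax  "[CONS s-expr s-expr]"                                 ; L
    :syntax-help      (:starter "[CONS x y]")                                ; SH
    :description-help (:starter "CONS builds a list cell from two values.")  ; D
    :system-examples  (:starter "(cons 'a '(b c))")                          ; SE
    :required         (:concepts (evaluation s-expression atom))             ; R
    :related          (:functions (car cdr list))))                          ; O

(defparameter *simple-lists-basis-set*
  ;; B: the minimal knowledge needed to work with simple lists.
  '(:functions (cons car cdr)
    :concepts  (atom s-expression)))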
6.3.2 Coaching Knowledge

Coaching knowledge is embodied in rule sets which suggest information to place on help windows. Rule sets for creating and maintaining the AUM consist of update rules and consistency rules. These rule sets change the AUM frames each time the parser signals a change in parse state. In this way, user model frames are changed each time a function is closed, a token is typed, a token is found to be undefined, etc. Presentation rules consult the parser, the AUM and the blackboard to make decisions about what to present. Detailed analysis of two of the more interesting and complex of the presentation rules is provided below.
(1) Update rules. A simple update rule set consists of rules with the following mnemonic names:
(a) Note-Success (activated by a correct usage of a learnable unit; improves user rating on AUM frame slots for a learnable unit and related material).
(b) Note-Failure (activated for an incorrect usage of a learnable unit; decreases user rating on AUM frame slots for a learnable unit and related material).
(c) Was-Bad-but-Getting-Better (activated when a user has a success with a learnable unit which has been problematic; increases user ratings).
(d) Was-Good-but-Getting-Worse (activated when a user begins making mistakes for a learnable unit which has previously been rated well; decreases ratings slowly).
(e) Best-and-Getting-Better (activated when a user continues to use a learnable unit correctly at more sophisticated ratings; bumps up Best).
(f) Worst-and-Getting-Worse (activated when a user continues making mistakes in use of a learnable unit that has been used poorly; bumps down Worst).

(2) Consistency rules. A consistency rule set would work with update rules to create and maintain a user model. A simple consistency rule set has the following mnemonic names:
(a) Note-Used (activated each time a learnable unit gets used).
(b) Maintain-Best (works with Best-and-Getting-Better to bump up Best).
(c) Maintain-Worst (works with Worst-and-Getting-Worse to bump down Worst).
(d) Bound-Goodness-and-Best (activated to record the user's "personal Best").
(e) Bound-Goodness-and-Worst (activated to record the user's "personal Worst").

(3) Presentation rules. Presentation rules determine the help that will be provided to the user, posting and removing the various possibilities on the blackboard. Specific presentation rules "argue" for their position and the blackboard records the result which is presented to the user. Separate presentation rules exist for statements, tokens and concepts. Their particular order determines which text will have priority for use as help. For the purpose of creating COACH (and the evaluation of COACH in Section 8), a model rule set for presentation of statements contains the following rules mnemonically named:
(a) Losing-Ground (provides the user with the most basic help).
(b) Out-of-Practice (reminds the user of information that was previously understood).
(c) Encourage-Exploration (suggests useful information not presently being used).
(d) Veto-Overly-Sophisticated-Help (protects the user from help beyond an appropriate level).
(e) Veto-Extra-Help (protects the user from too much help).

To give a feeling for how these rules work, an examination of a few of them follows. A simple "Losing-Ground" rule provides a user with examples:

IF learnable unit used has a low Goodness score and a low learning Slope,
THEN PUSH a User-Example onto the blackboard, and
     PUSH a System-Example onto the blackboard.
This rule implements the following concept: if the Slope and Goodness measure are both low, then the person is doing poorly. In such a situation, the rule proposes that both a prior user-example, if available, and a system-example of correct use of the statement be placed on the blackboard for the confused user. Besides pushing things onto the blackboard, the system might use other rules, such as "Veto-Extra-Help", to take inappropriate information off the blackboard.

IF user expertise for this learnable unit is NOT better than the Best it has been,
THEN PUSH Veto-Extra-Help onto the blackboard.
This rule implements the following concept: if a user's expertise is not at its highest point so far, tell the blackboard not to provide overly verbose help. The defined network of relationships is used in rules such as "Encourage-Exploration" to expose a user to new information.

IF learnable unit has a high Goodness score, a non-negative Slope, and has been used many Times,
THEN PUSH previously unused Related and Required knowledge for the learnable unit onto the blackboard.
This rule implements a tutoring concept: if a user is doing well with a learnable unit, expose them to more material in that area of knowledge.
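Restated in executable form, the Losing-Ground and Veto-Extra-Help ideas reduce to simple tests over the scalar slots of an adaptive frame. The sketch below is a plain-Lisp paraphrase of the two English rules; the plist layout and the threshold values are arbitrary illustrative assumptions, not COACH's internals.

;; Plain-Lisp paraphrase of two of the rules above; the FRAME plist layout
;; and the thresholds are illustrative assumptions.
(defun losing-ground-p (frame)
  "True when both the Goodness score and the learning Slope are low."
  (and (< (getf frame :goodness) 2)
       (< (getf frame :slope) 0)))

(defun veto-extra-help-p (frame)
  "True when expertise is not better than the personal Best recorded so far,
in which case verbose extra help should be vetoed."
  (not (> (getf frame :goodness) (getf frame :best))))

;; (losing-ground-p   '(:goodness 1 :slope -1 :best 3))  ; => T
;; (veto-extra-help-p '(:goodness 4 :slope  1 :best 3))  ; => NIL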
6.4 Instrumented Multilevel Parser
The domain a user is trying to learn (e.g., Lisp) has a syntax: the set of things a user can type that are correct and interpretable. Like the standard UNIX facilities Yet Another Compiler Compiler (YACC) and LEX (Kernighan and Pike, 1984a), COACH includes a general-purpose parser which uses state machines to classify character and token types to drive lexical analysis. A formal language definition drives actual parsing strategy. Unlike other parsers, the COACH parser is instrumented to run rules and add knowledge to a user model after each keystroke.

The multilevel parser (see Fig. 10) structurally separates different kinds of data about the user's interaction with the system. It is made up of a character classifying table, a token parse table, a token attribute grammar parser, and a statement attribute grammar parser. The parser structure, function, and syntax are described below.
6.4.1 Parser Structure
Character Classifying Table. The character classifying table lists all the characters and their functions. It is an efficient mechanism for classifying each character's impact on a user's work. Character types can readily signal delimiters and user mode changes with one computer array reference instruction. Use of this technique for the two "bottom-most" analysis levels is part of the overall strategy of real-time response required to provide an automated adaptive coaching interaction style.
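The single-array-reference idea can be illustrated with the following minimal sketch; the layout is an assumption rather than COACH's table, and it assumes an ASCII-contiguous character set.

;; Minimal illustration: one array indexed by character code, so classifying
;; a character is a single array reference.
(defparameter *char-class*
  (let ((table (make-array 128 :initial-element :other)))
    (loop for c from (char-code #\0) to (char-code #\9)
          do (setf (aref table c) :digit))
    (loop for c from (char-code #\a) to (char-code #\z)
          do (setf (aref table c) :letter))
    (setf (aref table (char-code #\()) :open-paren)
    (setf (aref table (char-code #\))) :close-paren)
    (setf (aref table (char-code #\Space)) :blank)
    table))

(defun classify-char (ch)
  "Classify CH with one array reference."
  (aref *char-class* (char-code ch)))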
FIG. 10. The structure of a multilevel parser: character classifying feeds the delimiter parse level (e.g., A) and the token parse level (e.g., AB), whose results pass through the token attribute grammar and its actions to the syntax parse level (e.g., ( ABS 7 )), handled by the statement attribute grammar parser and its actions.
Token Parse Table. A token or word in a user input language has meaning by nature of its kind or "type"; it might be a number, keyword, variable name, etc. The token parse table (see Tables I-III) notices token type changes and dispatches tokens to the token reader. It is the next level of user input analysis. As with the character classifying table, the token parse table uses a lookup technique, which permits most user context changes to be recognized without resource-intensive reasoning.

Token Attribute Grammar Parser. Recognition of a token could have side effects. The token attribute grammar parser is tied to the token parse table to handle token-level interpretation of user typing. Each token has an associated "Object method" which analyzes the impact of a token's completion. The methods call for coaching help, update the user model, and change the way COACH views the domain and the user.

Statement Attribute Grammar Parser. Syntax is the defined order in which tokens can legally follow each other. The statement attribute grammar parser handles the lexicon and builds a structure describing the syntactic unit the user is typing. User input is filtered through this parser for lexical analysis. Static semantics refers to the meaning attainable from a program without running it.
TABLE I
MODEL TOKEN PARSE TABLE SUFFICIENT TO BREAK LISP INPUT INTO DELIMITERS, ERROR STATES AND TOKENS

The columns give the current parser state (pls, min, sym, qte, st, cmt, num, shp, hlp; see Table III); the rows give the class of the character read (er, sp, num, chr, opn, cl, qte, st, cmt, ecm, blnk, shp; see Table II); each cell names the state the token reader enters when a character of the row's class arrives in the column's state.
TABLE II
TOKEN PARSE TABLE BREAKUP OF INPUT INTO POSSIBLE DELIMITERS AND SPECIFIC TOKEN TYPES

er      Characters not used in the domain (control characters, etc.).
sp      Special characters.
num     The decimal numbers (0-9).
chr     The standard roman characters (a-z).
opn     Open parenthesis symbol: (.
cl      Close parenthesis symbol: ).
qte     Single quote is the Lisp QUOTE macro symbol.
st      Double quote is the string delimiter symbol.
cmt     Semicolon is the Lisp comment delimiter symbol.
ecm     Characters that end Lisp comments (lf, cr).
blnk    Blank space character.
shp     "Pound" or sharp character (#).
Syntactic and static semantic advisory responses are triggered by this parser. The statement parser sends information about the user to the AUM. As explained below, a simple description language can be used to create statement templates. The parser steps through these templates accepting user input. The parse state pushes onto and pops off a stack as expressions are evaluated.
6.4.2 Parser Function

The mechanics of, and the relationships required between, these elements of
TABLE III
STATES IN TOKEN PARSE TABLE. THESE STATES MODEL AN EXECUTABLE PARSE OF TOKENS. LANGUAGE PARSING IS DRIVEN BY THESE STATES

pls     Is the start of form parse state.
min     Is end of parse context state.
sym     Indicates a symbol is being parsed.
qte     Indicates a quoted object is being parsed.
st      Indicates a string is being parsed.
cmt     Indicates a comment is being parsed.
num     Indicates a number is being parsed.
shp     Indicates a macro is being parsed.
hlp     Indicates an illegal object is being parsed.
the parser will now be described. The token parser can be modeled by a finite state automaton. The COACH interpreter is parsing, not to interpret the domain language, but rather, to build a user model and to note teaching opportunities. Table I shows details of transitions that can determine Lisp token delimiters for COACH. As characters are accepted by this token parse table, they are added to the partially constructed token. The token parse table takes the current parser "state" (the column) and the current character (the row) as input to determine the reader state. For example, if the reader were in string state, st, and received an illegal character, er, the reader would change to the help, hlp, state.

The character classifying table, the token parse table, and the token attribute grammar parser collectively comprise the token reader. The accepted characters and the state of the parse table drive the token reader. A function for each token type checks token side effects of the parse state. When a new Lisp variable is read, for example, such a function would add it to the user's environment as necessary. When a token is accepted, the AUM is updated.

A statement is composed of legal tokens in a syntactically legal sequence defined by language parse templates. The statement attribute grammar parser is controlled by language parse templates. Each time a token is accepted by the token reader, the statement parser takes a step through the currently active template and predicts what the user might need to do. The acceptance of a token and progress through a template cause the rule system to select presentation help. The template state and past input give the AUM knowledge of user goals that are often adequate to predict expected needs (as described in Section 6.3). The statement parser uses a formal language to describe syntactic parsable expressions in templates. If a statement call is made, a new context is started; for example,

(SETQ a (CONS
makes the CONS statement parse template active, pushing the SETQ parse template onto the pending parse stack. The statement parser steps through these templates, pushing them on and popping them off the pending parse stack, as new contexts are started or completed. Expression side effects can be further detailed in an action function to be run after a parse is accepted so that COACH can keep a record of the user's environment as well as the user's state. To implement side effects, a token or function can have an action function associated with it. When the adaptive automated help framework has completed recognizing the token or function, the action function will run. The SETQ function, for example, has an action function which adds new variables to the known variables list.

A more complete example can demonstrate the statement parser in action: writing an s-expression to sum three with the product of five and four. Starting in the beginning state, when the user types

(

the reader puts the statement parser in the pls state. The token rule set now consults the adaptive frames to decide whether help concerning a function name should be displayed for this user, and if so, what kind of help is needed. As the user types

PLUS

the statement parser makes the PLUS parse template the current template. The function rule set and blackboard now use the AUM to decide how much and what kind of help to present. The token rule set fires to decide what immediate help to present, possibly indicating that a number is called for in the PLUS parse template. If the token rule set demonstrates user need, number help is given to the user as three is typed in. While the product is being entered, the parse stack has to remember the PLUS parse state. The TIMES parse template is now put on the parser stack. The function help rule set is fired again to reason as to what help to present for the TIMES function. When the product is ended with a

)
the sum parse template comes into force. When the sum parse template is closed, the beginning state would be in force. A rule set now gives top level help if appropriate. Each transition change above causes an action function, if it exists, to run when a token or function is recognized. The multi-level parser rule system and the AUM work together as an architecture for adaptive help.
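The table-driven step at the token level can be pictured as a simple lookup of the current state against the incoming character class. The sketch below, in Common Lisp, shows only a small illustrative subset of transitions (including the string-plus-illegal-character case mentioned above); the full table of Table I is not reproduced, and any transition not listed falls back to the help state in this toy version.

;; Sketch of one step of the token reader.  The keys are
;; (STATE . CHARACTER-CLASS) pairs; the values are the next state.
(defparameter *transitions*
  '(((:pls . :chr)  . :sym)    ; a letter after an open paren starts a symbol
    ((:sym . :chr)  . :sym)    ; further letters extend the symbol
    ((:sym . :blnk) . :min)    ; a blank ends the token (end-of-context state)
    ((:pls . :st)   . :st)     ; a double quote starts a string
    ((:st  . :chr)  . :st)     ; ordinary characters extend the string
    ((:st  . :er)   . :hlp)))  ; an illegal character inside a string -> help state

(defun next-state (state char-class)
  "One step of the token reader: look up (STATE . CHAR-CLASS)."
  (or (cdr (assoc (cons state char-class) *transitions* :test #'equal))
      :hlp))

;; Example from the text: an illegal character while reading a string.
;; (next-state :st :er)  =>  :HLP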
6.4.3 COACH Syntax

Languages can be formally defined. Specifically, they are defined by a grammar and alphabet (Hopcroft and Ullman, 1979). COACH uses a formal
language representation of a subject domain to assess user work and progress. The domain language is defined for the statement attribute grammar in a formal, context-sensitive syntax notation. This notation is shown with the Lisp system key symbols in Table IV. Logical conjunction in a template is indicated by juxtaposition. A legal statement or sentence, s, consists of a string of symbols from the alphabet (described in Table V) that satisfies an expression in a legal syntax table. In the formal language definition, key symbols, control symbols and delimiters are surrounded by a set of parentheses:
cs := ( * )
The language can be described as being made up of the alphabet:

A := A, S, N, L, F, X, Q, FS, (, ), [, ], V, ?, *, @XXXX

TABLE IV
LANGUAGE PARSE MODEL: TOKEN TYPES

A     atom
S     defined symbol
N     number
L     list
F     function
X     any of the above types
Q     check only parenthesis level
FS    function specification

Notes: These allow modeling of Lisp's major token types. Such a parser table is designed to analyze user proficiency; a language parser designed to implement a language might have more types.

TABLE V
LANGUAGE PARSE MODEL SYNTAX DELIMITERS

[       open a syntactic parse unit
]       close a syntactic parse unit
(       open a clause
)       close a clause
*       next syntax part can occur 0 or more times
?       next syntax part can occur 0 or 1 time
V       at least one component of the next clause must occur at least once
@XXXX   consider the following characters a symbol

Notes: The parse modeling language itself has immutable token type control symbols to allow designers to describe a language to be coached.
where XXXX is any string of characters. Delimiters should always come in pairs: ( ), [ ].
In the Lisp language, a new S is signaled by an open parenthesis, represented by a [, so all s-expressions end with a closed parenthesis, represented by a ]. (The UNIX command language ends commands with a carriage return, and so does not require this ].) The COACH architecture uses an attribute grammar parser. Each token type and each completed parse sends an :action message when recognized. Each key symbol in the notation has a method (function) associated with it which can cause an action during the parse. Functions that the system parses are described in this notation. The simple example

[ ABS N ]

defines the absolute value function as requiring a parameter of type number. A slightly more complicated syntax such as

[ SETQ * ( A X ) ]

requires 0 or more atom-anything pairs for a legal "sentence".
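Such templates can be pictured as ordinary Lisp data that a checker steps through token by token. The following Common Lisp sketch mirrors the two examples above; the keyword token types, the checker itself, and its restriction to a single trailing repeated clause are illustrative assumptions, not COACH's parser.

;; Sketch: syntax templates as data.  (setq (:* (:a :x))) stands for
;; [ SETQ * ( A X ) ]: zero or more atom/anything pairs after the name.
(defparameter *templates*
  '((abs :n)                  ; [ ABS N ]  -- one argument of type number
    (setq (:* (:a :x)))))     ; [ SETQ * ( A X ) ]

(defun type-ok-p (actual wanted)
  "The X type accepts any token type."
  (or (eq wanted :x) (eq actual wanted)))

(defun matches-p (types pattern)
  "Check a list of token TYPES against a template PATTERN."
  (cond ((null pattern) (null types))
        ((and (consp (first pattern)) (eq (first (first pattern)) :*))
         ;; Zero or more repetitions of the clause; in this simplified
         ;; sketch the repeated clause must consume the rest of the input.
         (let ((clause (second (first pattern))))
           (loop while types
                 do (dolist (want clause)
                      (unless (and types (type-ok-p (pop types) want))
                        (return-from matches-p nil))))
           t))
        (t (and types
                (type-ok-p (first types) (first pattern))
                (matches-p (rest types) (rest pattern))))))

;; (matches-p '(:n) (cdr (assoc 'abs *templates*)))           => T
;; (matches-p '(:a :n :a :s) (cdr (assoc 'setq *templates*))) => T
;; (matches-p '(:a) (cdr (assoc 'setq *templates*)))          => NIL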
6.5 Conclusion

Section 6 has described the COACH architecture. Several representations work together to create help for the user: the subject frames (definitions of the domain), the adaptive frames (recording of a user relative to a domain), the presentation rule sets (which embody a model of teaching), and the multi-level parser (syntax domain definition). The section has further shown how formal representations are used for domain knowledge, teaching knowledge, adaptive strategy, and the way the adaptive frames are used to create a coaching advisor. The next section discusses additional reasons for using these multiple interacting representations.

The COACH/2 system is somewhat different. It does not require the author to write syntax text. COACH/2 keywords are the spatial and graphical elements on the screen. It defines arguments to these icons, dialog boxes, etc. as the mouse actions or typing that is transmitted to the GUI element.
7. A COACH Shell

Cognitive Adaptive Computer Help (COACH) is a proof by demonstration of a real-time, adaptive, advisory agent. Demonstrations of possibility are important,
but further progress in a field requires tools to make experiments feasible. As well as being a demonstration, COACH was designed to be a testbed for understanding adaptive user interaction. The structure is organized to allow a courseware designer to change the skill domain information, the presentation approach or the adaptive strategy with minimal effort. It has been used to show that COACH can work for open systems (see Section 5) and with different domains, and can support experimentation with user modeling and help presentation strategies.
7.1 Using COACH in Open Systems
Open systems are systems which are too big to be analyzed or which grow with use (see Section 5). The Lisp COACH demonstration shows that adaptive user help can be used for open systems. COACH creates user models and provides help for any number of system functions and new user functions. The system was tested on a Lisp programming environment containing twenty-five thousand functions. Since help text could not be provided for all of the constantly changing Genera Lisp functions, a mechanism for adding help for functions as they get used was provided instead. Multi-level help was provided for a basic set of functions and was automatically augmented to include any other functions actually used or added by a user. COACH queries the Genera Lisp environment for a syntax description which it uses to start a user model for a previously unknown function. When a user defines a new function, COACH records its syntax to start a user model for this function. As newly added functions get used, examples of correct and incorrect usage are collected for use as example help text.
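The bootstrapping step, starting a user model the first time an unknown function is seen, can be sketched as follows. QUERY-SYNTAX is a hypothetical placeholder for the Genera environment query described above, and the record layout is likewise an illustration rather than COACH's internal format.

;; Sketch of lazily creating a user-model entry the first time a
;; function is encountered.
(defparameter *user-models* (make-hash-table :test #'eq))

(defun query-syntax (fn-name)
  "Placeholder for asking the Lisp environment for FN-NAME's syntax."
  (declare (ignore fn-name))
  '(&rest args))

(defun user-model (fn-name)
  "Return the model for FN-NAME, creating a fresh one on first use."
  (or (gethash fn-name *user-models*)
      (setf (gethash fn-name *user-models*)
            (list :name fn-name
                  :syntax (query-syntax fn-name)
                  :goodness 0.0
                  :examples '()))))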
7.2 Using COACH for Different Domains
COACH works from a courseware designer's definition of a domain language. A UNIX version was created to demonstrate that COACH can be ported to different domains. Matt Schoenblum, a talented seventeen-year-old high school student without programming experience, was able to demonstrate this capability by adapting the COACH system to teach the UNIX operating system's shell command language (Kernighan and Pike, 1984b) in a ten-week internship. Schoenblum learned enough UNIX to be comfortable using it to edit documents, send mail, transfer data, print things, etc., to accomplish his work. He interviewed UNIX users to identify twenty key UNIX commands and wrote multi-level help text for these commands. He then defined delimiter and token types needed for the system to parse UNIX commands. Finally, he wrote syntax definitions for all identified commands. COACH enabled such an accomplishment by requiring only data, rather than reprogramming, to create a help system for a new domain.
Two functions were provided to handle the new tokens which were used in UNIX but not in Lisp: the carriage return delimiter and the "anything" (*) token. Changes to the parse table, altering the end-of-statement delimiter from the closing parenthesis to the carriage return and eliminating the ";" used for comments, were provided as well. At the time, the COACH system only ran on Symbolics computers. A command caller which interfaced to a UNIX workstation over a Telnet (Kernighan and Pike, 1984b) connection was proposed but has not yet been tested. This UNIX help system was experimented with by several people, and improved through iterative experimentation. No formal study has yet been performed with it.
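In terms of the kind of character classifying table sketched in Section 6.4, such a port is largely a matter of data edits rather than new code. The snippet below illustrates the idea; it assumes the same hypothetical table layout and class keywords used in that earlier sketch.

;; Illustrative data changes for a UNIX domain: carriage return ends a
;; statement, ";" loses its comment role, and "*" is read as the
;; "anything" token.  Keywords and layout are assumptions.
(defun retarget-table-for-unix (table)
  (setf (aref table (char-code #\Return)) :eos   ; end-of-statement delimiter
        (aref table (char-code #\;))      :chr   ; no longer a comment character
        (aref table (char-code #\*))      :any)  ; the "anything" token
  table)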
7.3 Experimentation With Help Presentation Strategies
COACH also supports experimentation with help presentation strategies. All help presentation is managed by rules. These rule sets have allowed continued testing and changing of the coaching strategies in the COACH implementation (see Section 6 above). Several students have experimented with the rule system to learn about adaptive strategies (Matt Kamerman, Kevin Goroway, Frank Linton, and Chris Frye). These experiences demonstrated that COACH can be used as a shell for developing proactive, interactive adaptive computer help systems. The COACH implementation gives proof by demonstration that adaptive computer help works in open systems, for different domains, and supports experimentation with adaptation and help strategy. The next section describes experiments which show COACH can improve student performance. The COACH/2 system has allowed instruction development professionals with no programming expertise to create help content for OS/2.
8. Evaluation of COACH Adaptive User Help

Two user studies have been performed to evaluate COACH. A preliminary study investigated COACH user perception differences. The second study quantitatively demonstrated these differences and performance improvements as well. In this five-session Lisp course, the system was found to improve both performance and perceived usability when compared to a version which offered only non-adaptive user-requested help. This is the first demonstration of an adaptive interface showing performance differences for users. Enhanced interface features available to both groups may have improved productivity as well (e.g., the pointing device, on-line selectable help, separate input and output windows, real-time error detection, etc.).
8.1 Pilot COACH Study
A pilot study was conducted to evaluate automated adaptive help in the COACH system and to flesh out issues for the full-scale study. Six programmers who had no knowledge of Lisp were recruited from research staff, programmers and co-op students at IBM T. J. Watson Research Center. The three-day course consisted of a classroom lecture each evening followed by a work period. The students worked through exercise sets, and responded to interview questions and quizzes. Each evening involved a new exercise set. One group performed work using COACH, the other group used a standard interpreted Lisp reader on an IBM PC-RT computer. Many differences between the actions and reactions of the two groups were noticeable. Students appeared much more energetic and productive when they were using COACH. They used the on-screen help and they put their fingers on the screen. While the COACH students tended to experiment within the system, the other group tended to write ideas on paper. When, on the last night, the groups switched places, the behavior also switched. However, technical difficulties and the small number of participants make formal analysis of the pilot study uninteresting.
8.2 Quantitative Study: Demonstrating COACH Usability Improvements

Major improvements to the pilot study were included in the full, quantitative study:

• The lecture format of the pilot study was changed to a self-paced format in the full study because it had appeared that the lectures made the students feel pressured. They seemed to believe that the difficulties they were having were caused by the lecture, when in fact, the course was designed to be difficult.
• The students in the pilot study appeared anxious and self-recriminating when they could not finish the entire exercise set provided for that evening. So, for the full study, the three exercises from the pilot study were combined into one unbroken problem set. This relieved some of the unnecessary performance pressure.
• In the pilot study, the group using the adaptive automated help seemed to be enjoying themselves, while the other group did not, so a daily comment sheet was added to the course to record the way students felt.
• Recorded audio interviews were added for the same reason.
• Improvements in the COACH implementation reliability made the mechanics less daunting.
• A version of COACH without an AUM was arranged for the control group. This allowed the study to concentrate on the value of an AUM, the central COACH technology, rather than other ergonomic advantages of COACH.
The full study tested the hypothesis that an adaptive coaching paradigm can improve user productivity. Normally COACH adapts to its user and automatically offers help at an appropriate level of understanding for that user. A method was devised to focus the user study on the comparison of automatic adaptive help with user-requested help. A control version of COACH was created which included all interface aids, but excluded the AUM, which had the effect of eliminating the automated adaptive help. It still separated user actions on the window panes from system actions when reporting errors and displaying user-requested help (Fig. 8). The study compared user experiences with this control version and experiences with the automated adaptive version.
8.2.1 Method

Nineteen employees of IBM T. J. Watson Research Center were recruited. They varied from summer interns to professional programmers. While all of these "students" had prior programming experience, none had previous experience with Lisp. The students were recruited with an electronic poster. The poster solicited people who knew how to program but had no exposure to the Lisp programming language, and who wanted to participate in a short class/study teaching Lisp. Incentives to participate were sandwich dinners, exposure to the experimental system, and the promise of learning a new language.

The students were separated into an early session meeting from 5:00 p.m. to 6:00 p.m., and a late session meeting from 6:00 p.m. to 7:00 p.m., each day for five days. Attempts were made to assign students to whichever session best fitted their schedules. Students were assigned at random to use the manual help or to use the adaptive help. By the time the course was underway, eight students were using the manual help system and eleven students were using the automatic adaptive help system.

Courseware created to support the user study included a course introduction, a Lisp tutorial, and an EMACS editor reference card. Materials used to evaluate the students consisted of a test given before the course began (pre-test), daily comment sheets and a test given at the conclusion of the course (post-test). In addition, audio interviews and student exercise solutions were used as sources of data. These are described in more detail below.
Pre-Test. Before the course began, the students were tested for their knowledge of Lisp and programming concepts in general. The written pre-test was administered to them to collect background information and to ensure that they
had no working knowledge of Lisp. The test included questions to evaluate previous programming experience, to establish which programming languages the students knew, and to measure awareness of common programming concepts. Lisp-specific questions were asked to eliminate any students who had prior Lisp experience.
Precourse Materials. In this study, students worked with COACH on Symbolics workstations with a screen layout as shown in Fig. 8 in Section 6 above. The students were isolated from one another and not permitted to converse with each other. Some had separate offices; one group of three sat facing three different walls in a large office. All users in this "communal room" were members of the non-adaptive group. Students were instructed in basic operations they would need to use, such as where the rubout key was on the keyboard and how to use the mouse. They were all given the same set of course materials. The materials consisted of a brief Lisp tutorial, a quick reference sheet for the EMACS editor, and an exercise set. Students were encouraged to use the computer help whenever possible. The tutorial was stapled to the exercises facing backwards to force them to turn over the tutorial to see the exercises. The students were told they would only receive help from the experimenters in the case of machine problems, not in learning Lisp. They were instructed to read as little of the tutorial as they felt necessary to familiarize themselves with Lisp, to work on the problem sets in a self-paced manner, and to refer to the help windows often.

Lisp Tutorial. A tutorial presentation of the major Lisp constructs was provided on paper. It was written so as not to solve the exercise problems, yet complete enough to aid the students with learning Lisp. A motivational introduction listed advantages of Lisp as an application development language. The format of the tutorial was similar to a textbook; concepts were explained in an order so as to build on each other. Once a concept was explained it was followed by simple examples. The brief nine-page tutorial covered the range of topics a full Lisp course would cover, which might be daunting to beginning students. They could have read all of this material, but were not required to. If they read it all, they would certainly have been introduced to many more challenging topics than could normally be mastered in five hours. The topics of the tutorial were:

• What is Lisp, and why use it?
• Read-Eval-Print loop.
• Functions.
• Lists.
• Conditionals.
• MAP & LAMBDA.
• Defining functions.
• Repeated computation: Iteration.
• Repeated computation: Recursion.
• Data structures: property lists.
• Data structures: DEFSTRUCT.
Exercise Sets. The exercise sets contained problems covering basic arithmetic operations, list operations, conditional execution, and a small database project. To avoid a ceiling effect (all or many users completing all exercises), the exercise sets were intentionally longer than what a student could complete in the time available. Solutions to the first nine exercises consisted of expressions composed of built-in Lisp functions. Correct solutions indicated completion of exercises. When the students finished these single-answer questions at their own pace, they began the database project. The students were not told which Lisp functions they should use, nor how they should construct the database.

Arithmetic problems introduced the concept of evaluation order, forcing the students to understand that function names come first in Lisp s-expressions. For example, one student, while trying to accomplish

(times (plus 5 5) (plus 2 2))

was observed trying

((plus 5 5) times (plus 2 2))

but was able to understand the problem with the aid of COACH. The system instantly recognized that an error was being made and popped up an attention-grabbing error message in the immediate help window. When this error message was noticed, the student was able to use the COACH on-line help to figure out the problem in the model of evaluation used to construct the statement. The student was able to understand the order of evaluation problem and fix the operator/argument order in the solution.

The list operations required for solutions to the exercises included CAR, CDR and CONS, the concept of nested lists, and QUOTE. The students were asked to create lists of varying degrees of difficulty. The simplest was a single-level list (1 2 3), and the most difficult involved a multiple-level list that required an understanding of quoting.

A conditional statement was required to solve one of the exercises. The task required the student to cause the computer to print yes if a certain element was contained in a list. Students could have used COND and iteration or recursion to
solve this problem. An easier approach using the MEMBER function could simplify the solution.

Unlike earlier problems, students had to write their own functions in solutions to the database project exercises. One might expect students to use the simplest data structure, a list, or more rarely, property lists, the first time they need accessor functions. The tutorial covered these data structures, and the sophisticated DEFSTRUCT macro as well. Surprisingly, most students chose to use DEFSTRUCT instead of lists or property lists, all of which were covered in the Lisp tutorial. The reason most students gave for this choice of using DEFSTRUCT was, "It does everything for you." This indicates that they understood the value of a macro that writes functions the student would normally have to write.

The database problem was worded carefully to segment the solution into six small user-written functions. For example, the students were asked to write a function to add a person to the database, and to write a function to retrieve a person's phone number from the database. This allowed a direct measure of productivity by the number of functions written. Analysis of student solutions was used to compare productivity of the two groups. In addition, quality of user solutions was evaluated. Students' code was examined for appropriateness and sophistication of Lisp functions used, use of variables and condition checking as well as overall style.
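As an illustration of why DEFSTRUCT appealed to the students, the following sketch shows one way the database exercise could be approached with it; the slot names and helper functions are hypothetical examples, not student solutions from the study.

;; DEFSTRUCT writes the constructor (MAKE-PERSON) and the accessors
;; (PERSON-NAME, PERSON-PHONE, PERSON-LANGUAGE) that would otherwise
;; have to be written by hand.  Slot and function names are assumptions.
(defstruct person
  name
  phone
  language)

(defparameter *people* '())

(defun add-person (name phone language)
  "Add a person record to the database."
  (push (make-person :name name :phone phone :language language) *people*)
  name)

(defun phone-of (name)
  "Retrieve a person's phone number from the database."
  (let ((p (find name *people* :key #'person-name :test #'equal)))
    (and p (person-phone p))))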
Comment Sheets. During each of the five one-hour sessions, the students were given a comment sheet to record impressions. They were instructed to write as much or as little as they chose. The following questions were asked on the comment sheet:

(1) How often do you look at the help screen while solving a problem?
(2) How helpful is the help screen?
(3) How helpful is the COACH window system, as compared to a line-based interpreted environment?
(4) Observations about COACH? (Answers from this question were evaluated for perceived value or rating of the COACH environment.)
(5) Observations about Lisp? (Answers from this question were evaluated for perceived utility of the Lisp programming language.)
(6) What problems are you having?
(7) What problem are you working on?
(8) What is your motivation to learn Lisp? (Answers to this problem were given on a scale of one to ten.)

To compare the answers of the two groups, all the written answers were analyzed and coded as varying from zero to five. Two readers evaluated each answer and assigned it a value without knowing which group it came from. These
numerical values were used to evaluate the likelihood that the two groups had the same experience with the system (see Table VI and Fig. 11).
Interviews. Near the end of the course, six randomly chosen students from each group were interviewed for a few minutes on audio tape while they were working. The important question asked was: What do you find most helpful while solving a problem: the help screen, the selectable menus, or the tutorial?
Post-Test. At the end of the user study, the students were given a post-test. To measure the amount of Lisp learned by each student, questions covered the same material as the pre-test. In addition, questions about specific feelings toward COACH and Lisp were posed. Although similar to the comment sheets, these questions were worded differently to reveal more about the students' personal views.

8.2.2 Experimental Data and Analysis

The following sections give detailed descriptions and analyses of data taken in the study. The data recorded came from the pre-test, comment sheets from each day, saved exercise solutions from each student, verbal interviews of students during their sessions, and the post-test taken after the course was completed.
TABLE VI
DATA COLLECTED FROM THE COMMENT SHEETS

Question (Answers on a scale of 0-5)                Manual mean   Adaptive mean   p-value
Rate Lisp as a programming language.                    1.90          3.36          0.01
How often is the help screen consulted?                 3.16          3.91          0.04
Rate COACH as a learning environment.                   1.97          2.97          0.05
How helpful is the help screen?                         2.44          3.20          0.11
Is COACH better than a line-based environment?          3.31          3.49          0.70

Notes: This shows that the students using the adaptive version of COACH liked Lisp more than the other group, consulted the help screen more often, and rated COACH higher as a learning environment. Although the students using the adaptive COACH tended to find the help screen more helpful than the other group, the difference was not significant. Notice that both groups rated COACH as better than a standard line-based environment. In the above table, p-value is the probability that the means in two samples are the same. This data is shown graphically in Fig. 11.

FIG. 11. Comment-sheet ratings for the manual and adaptive groups on the five questions of Table VI.

Pre-Test. The pre-test showed that all students selected were experienced programmers, but had no prior Lisp experience. Answers to the Lisp-specific
questions showed that, except for the simplest cases, the students were unable to answer Lisp questions by guessing.
Saved Exercise Solutions. The COACH system internally stores information about each user. This internal representation is the student's adaptive user model (AUM). At the completion of users' work sessions, COACH creates two files. The first contains the user's work (Lisp code), and the second contains the user model (usage data for determining the level of the user's expertise for specific learnable units). Unfortunately, user model files were not kept for manual help students.

The students' saved work and their saved user models showed that all students had finished the eight introductory exercises, i.e. those exercises which did not require defining functions. This was followed by a database project requiring students to write functions. The comment sheets indicated that all the students had begun writing functions for the database project by the last session. Examining their work showed surprising differences in the percentage of the ten database project functions the students in the two groups had actually completed. The users of the adaptive system wrote an average of 2.5 functions, as compared to 0.5 for the users of the
118
EDWlN J. (TED) SELKER
nonadaptive system. No user of the nonadaptive system wrote more than two functions. This is the most significant result of the study: on the average, the users of the adaptive system defined five times as many of the functions required in the database project. In addition, the style and quality of functions written by the adaptive system users were much better than that of the control group. One user of the adaptive system wrote a function which demonstrated astonishing progress for five hours of experience. This function, included below, demonstrates an understanding of parameters, scoping and formatting, as well as boundary checking, DEFUN, and lists.

(DEFUN add-person (name phone Lang)
  (COND ((MEMBER name USERS))
        ((EQUAL USERS nil)
         (SETQ USERS (LIST (LIST name phone Lang))))
        (T (SETQ USERS (CONS (LIST name phone Lang) USERS)))))
Comment Sheets. At the end of each day, the students were asked to fill out a comment sheet. The data from the comment sheets are the students' ratings of various aspects of the course. The values vary from zero to five, zero being the worst and five being the best. The statistical analysis was done using a two-tailed t-test, to determine the probability (p) that the mean rating of the users of the adaptive system and the mean rating of the control group were the same. This was done using Welch's method (Brownlee, 1984) (see Table VI).

The strongest result concerned users' ratings of Lisp as a programming language. The users of the adaptive system indicated that they had a higher regard for the Lisp language; the means of the answers for the two groups have a 0.01 probability of being the same as each other, giving a 99% probability that the adaptive help group had a higher regard for Lisp than the control group (p = 0.01). The question, "How often do you look at the help screen while solving a problem?" showed that users of the adaptive system used the help screen more often (p = 0.04). The results from the question, "How helpful is the COACH window system, as compared to a line-based interpreted environment?" showed that both groups thought the COACH environment was an improvement over a standard interpreted environment. The mean rating for the question "How helpful is the help screen?" seemed to be higher for the users of the adaptive system; however, with the study sample size it did not prove to be significant (p = 0.11). This is probably because it is a hard question to answer. The question might have yielded better data if it had asked the students to compare the help screen to the tutorial or some other form of help.
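For reference, the statistic behind these comparisons can be sketched as follows. This is a generic Welch (unequal-variance) t statistic with the Welch-Satterthwaite degrees of freedom, shown only as an illustration of the cited method; converting t to a two-tailed p-value additionally requires the t distribution, which is omitted here.

;; Welch's t statistic and approximate degrees of freedom for two
;; samples of ratings (lists of numbers, each with at least two values).
(defun mean (xs) (/ (reduce #'+ xs) (length xs)))

(defun variance (xs)
  (let ((m (mean xs)) (n (length xs)))
    (/ (reduce #'+ (mapcar (lambda (x) (expt (- x m) 2)) xs))
       (1- n))))

(defun welch-t (xs ys)
  "Return Welch's t statistic and its degrees of freedom."
  (let* ((vx (/ (variance xs) (length xs)))
         (vy (/ (variance ys) (length ys)))
         (tt (/ (- (mean xs) (mean ys)) (sqrt (+ vx vy))))
         (df (/ (expt (+ vx vy) 2)
                (+ (/ (expt vx 2) (1- (length xs)))
                   (/ (expt vy 2) (1- (length ys)))))))
    (values tt df)))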
Interview Tapes. Students were interviewed on audio tape during their
work sessions with standardized questions. Six randomly chosen individuals from each group participated in these interviews. The data collected from the interview tapes show differences in the ways the two groups utilized help while solving a problem (see Table VII and Fig. 12). All users reported the usefulness of menus to ask for help. The manual user-requested help group received help messages for syntax errors, the kind of help for which an interpreted Lisp environment is known. They reported that this computer-presented help was not particularly useful. All users of the automated adaptive help reported finding it useful. While only one member of the manual group reported making use of the tutorial packet, all interviewed members of the adaptive group reported the tutorial useful.

TABLE VII
STUDENTS USING DIFFERENT METHODS TO SOLVE PROBLEMS

Learning materials used        Manual   Adaptive
Asks COACH for Help               6         6
Uses COACH presented Help         0         6
Refers to tutorial                1         6
Uses trial and error              1         0

Notes: Of six students interviewed in the manual group and six students interviewed in the adaptive group, the adaptive help group found more of the support materials useful. This data is shown graphically in Fig. 12.
FIG. 12. Number of users reporting use of each method to solve problems. Data recorded from the six people interviewed from each of the two groups from Table VII.

While
one member of the manual group reported relying on trial and error to solve problems, none of the adaptive group reported using this technique. In Fig. 12, it is notable that students in the adaptive system group used all the different types of help available to them, while students in the nonadaptive group did not.

Post-Tests. A post-test was given to the students at the end of the course. The post-test asked the students how comfortable they felt with the Lisp language. The data collected from the post-tests show the difference in comfort levels between the two groups. Out of the six post-tests completed by users of the nonadaptive group, two students felt somewhat comfortable and four were uncomfortable. Of the nine post-tests completed by users of the adaptive system, three students were comfortable, five were somewhat comfortable, and one was uncomfortable (see Fig. 13).
8.2.3 Discussion

The data demonstrate differences in self-assessment and performance between users of an adaptive automated help system, as compared to users with manual help.
FIG. 13. Comfort levels reported by students at completion of course. Percentages are calculated for nine students from the adaptive automated help group and six students from the manual help group who returned the post-course questionnaire.
The terse nature of the tutorial purposely masked the quantity of information with which the users were familiarizing themselves. The amount of knowledge to which the students were exposed was close to what students might be expected to master in a full semester Lisp course. Although the students were sometimes frustrated, they learned a lot of Lisp. The goal of requiring them to use the help system to solve their problems was achieved. Even though large performance differences were found between the groups, the nonadaptive group still performed extremely well. When compared to the amount of work a novice might accomplish in a typical learning environment, it is clear that the COACH environment was a significant aid. Even the nonadaptive users described their interface as an improvement over the usual tools that are available (Fig. 11). The pressure of learning so much Lisp in such a short time without any human teacher help was a challenge. Although all students completed the study, one of the manual help group students required extensive persuasion to continue after the third day. A comparison of the full study to the pilot study showed that the pressure of the amount of material to be learned was decreased by the self-paced presentation.
8.3 Future Work

This study has shown that an adaptive automated help system can increase user performance. Many important questions remain unanswered (see Section 10). To simplify the user study, the most experimental rule was eliminated from the COACH adaptive user model. A small number of adaptive rules were used to change the quantity and quality of help given to a user for each function, token, and concept. It is important to examine the various kinds of adaptation and knowledge bases in such an adaptive user interface. The value of each individual rule should be studied.

COACH also records user information concerning required and related knowledge. As described in Section 6.3.2, rules can use this information to describe alternative solutions or point to related learnable units when a user is struggling. For simplicity, the "encourage exploration" presentation rule was deactivated for the study. Additional rules could interject syllabi on which to tutor a user in such a situation (see Section 10). It would be interesting to study the value of such facilities, which may distract students from their task.

This study tested the effectiveness and importance of an adaptive system with Lisp-illiterate users. The ways in which COACH will be helpful to experts is probably quite different from the ways in which it will be helpful to novice programmers. Experts will benefit from COACH's "Level 3" examples and "Level 4" complete syntactic descriptions of functions. COACH leaves experts alone when they are working on something with which they are experienced. These
experienced Lisp programmers will benefit from the fact that COACH keeps track of context dependent situations, scope, and undefined variables, while exposing the user to the relationship between functions and concepts. Novices, on the other hand, find the changing adaptive help for functions and tokens quite useful for learning the syntax of simple functions and token types. While studies showing the different benefits for different users would be straightforward, they were beyond the scope of this experiment.
8.4 Conclusions

In this study, significant differences were found between students who used automated adaptive COACH help and students who had only manual COACH help. While the responses to the comment sheet question concerning motivation during work sessions did not show a difference between the two groups, other indicators did. One might expect the group with less computer support to make greater use of the paper tutorial; however, the converse was true. Both groups had the same access to the paper tutorial and on-line help. While the group with manual COACH help only valued the user-requested help, the automated adaptive help group utilized all available materials, the Lisp tutorial and user-requested COACH help as well as automated COACH help (see Fig. 12). Students from the adaptive group reported feeling more comfortable with Lisp and also completed many more of the exercises than the control group (see Fig. 13). The automated adaptive help system succeeds in improving productivity and raising motivation to use available materials.

This section has described efforts to evaluate adaptive user help systems. The study raises many interesting questions for future research. The following section discusses these in more detail.
9. Development Status
The OS/2 WarpGuides product is based on a version of COACH rewritten in C++ and extended to support a graphical user interface. This graphical adaptive help system was written by Ron Barber with help from Bob Kelley, Steve Ihde and Julie Wright. The product uses innovative dialog masks and highlighting to draw attention to the graphical features being coached. Text balloons alongside but not obscuring the dialog box describe these features. In prototype versions, animation and sound have also been explored for augmenting the masks and text. These content types have been implemented with wav files for sound and a home grown animation language called GAS (Graphical Animation System).
9.1 Animated Help
We began analyzing the GUI to create graphical help that would best match the GUI interface using the elements of visual language formalism (Selker and Koved, 1988). Objects on the display define an alphabet of the graphical language. The operations that happen on them are the graphical syntax. For example, a folder is a terminal symbol, a double click is a parameter that is sent to it when the user double clicks. This syntax is interpreted by the system to cause the “open” semantic operation to occur. Describing operations in this way helped us to map these operations to animated presentations.
9.2 Slug Trails

Our first attempt to create example help to teach direct manipulation in a graphical user interface used a pictorial representation of a mouse, with buttons that changed color according to their state, as in (Goldberg and Robson, 1983). A procedure was shown in a static form by arcs between states of the mouse. This resembles the form of animation seen in comic strips. Temporal animation of the motion of the mouse and other objects made this seem more realistic. However, this did not leave a persistent record for the user to review of the important actions that had been shown in the animation.

To make the animation easier to review and more concrete in the mind of the user, we developed "slug trails". The idea behind slug trails was to augment an animation showing a task with a persistent afterimage portraying the important actions. For dragging, the right button on our mouse would go down (shown depressed), the icon would be moved across the display leaving a trail of dots or other graphical debris behind it, then the button would go back up (shown normally). Remaining on the display were the important syntactic graphical actions. The icon at its first location, the mouse in its first important mode with the cursor on the icon, the dots showing where the icon moved and the final location with the final state of the mouse all remained on the display when the action was over. In this way, a movie also has a static presentation as a reminder of the important events that occurred in the movie.

This technique was relatively effective when prototyped by creating a graphical program for OS/2 called ANIMATE and a language called GAS for writing such animations, co-authored by John Haggis and Ron Barber. The animation was supplemented with text explaining "what" was being shown and "how" to do the graphical action. These elements work together to teach concepts. Breaking down the graphical procedure in this way made interpreting the actions concrete. These animations were played on a background image of a display screen. Our first problem with this, like other help systems, was that a tremendous amount of display area was devoted to this presentation. We sought ways to convey the same information with less dedicated space.
FIG. 14. The first frame of a "slug trails" animation, with the explanatory text which accompanies all frames: "Drag and Drop - Level 1. 'Drag and drop' moves a graphical object on the screen. To move an icon, place the pointer over the icon. Hold down mouse button 2 while moving the mouse. Release the button to 'DROP' the icon."

FIG. 15. The mouse moves to the top of the folder.

FIG. 16. The right mouse button is depressed, starting a drag.
9.3 Icon Dressing

After working with the slug trails for some time, we began exploring approaches that did not consume large amounts of screen real estate or draw focus away from their task. We wanted to develop lightweight presentation mechanisms that did not disturb the user's task. To address these problems, we defined a new idea called "icon dressing". Icon dressing is a form of graphical annotation which distinguishes an icon by outlining or otherwise embellishing it. This approach dresses it up to describe a likely operation that can be performed with it. We invented a vocabulary of small animations to prompt the user about appropriate actions involving the icon. These small animations focus the user's attention on the actual icon and what can be done with it. The icon dressing accomplished its goals; however, our prototype tended to require more interpretation by the user and was not flexible enough for general GUI assistance.
9.4 Cue Cards
We began designing a graphical presentation window with goals of associating information relative to an object temporally without covering up the interface with user controls. We realized this goal with the introduction of "Cue Cards". Distinctive coloration (typically yellow) and appearance (rounded corners, small, light, proportional fonts) like bubbles, together with minimal controls and quick
FIG. 17. In this frame, “slug trails” begin to form, showing a drag in progress.
FIG. 18. In the last frame, the "slug trails" show all the important steps of the action: click, move, and release.
response gave Cue Cards the feeling of supplementing or annotating the user interface without being so much a part of it. Cue Cards associate with an interface element spatially and through color. We found that this feeling greatly contributed to the usability of our help system.
FIG.19. An animated “icon dressing” shows the user how to move by dragging.
9.5 Guides and Masks

Dialog boxes are used to present some of the most complex and difficult system controls. One of the more useful applications of a GUI help system is to aid users with dialog boxes. Dialog boxes enhance the productivity of experienced users by presenting many options in a compact way, but their complexity can be frightening to the novice. Assistance agents such as Wizards (Mic, 1995) sidestep this problem, making it easier for new users to do more complex tasks such as installing printers and configuring the system, but they don't help the user learn to
FIG.20. At lower levels of expertise, attention is guided by highlighting important areas of the dialog and de-emphasizing the features less relevant to the learnable unit. No functionality is lost; the mask does not disable any features.
FIG. 21. As the user gains experience, the masks are discontinued, and more expert help text is provided.
use the native interface. Our idea was to make annotations that would vary the complexity of the dialog box interface according to the needs and understanding of a diverse user population. We wanted to encourage the user to interact with the subset of the interface that is important and to default the rest. Specifically, we defined masks. Masks are translucent overlays that shroud some parts of the user interface. Some parts become highlighted, and other parts are unaffected, according to their relevance. A “guide” comprises a sequence of masks which steps the user through a sequence of operations necessary to perform a task. We created a technology for defining and presenting masks and began testing various masking strategies. Masks and guides are intended to focus the user on the actual interface they will eventually have to master. The masks provide guidance without restricting interaction with the system. The masking can be progressively reduced (peeled away) revealing more of the user interface as the user becomes more familiar with it. Even if a user’s activity diverges from that suggested by the guide, the COACH system continues to show a cue card and mask for the parts of the user interface that a person is using. These techniques give a flexible expressive presentation medium for our proactive adaptive help system.
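Conceptually, a guide is just an ordered list of mask descriptions. The following data-only sketch uses the Lisp notation of earlier sections, although COACH/2 itself is written in C++; the field names, control names, and cue-card text are hypothetical illustrations of the idea rather than the product's format.

;; A guide as an ordered list of masks.  Each mask names the dialog
;; controls to highlight and the cue-card text to show; everything not
;; listed stays shrouded but remains functional.
(defparameter *print-guide*
  '((:highlight (printer-list)
     :cue-card  "Pick the printer you want to use.")
    (:highlight (copies-field collate-checkbox)
     :cue-card  "Choose how many copies you need.")
    (:highlight (ok-button)
     :cue-card  "Press OK to print."))
  "Hypothetical guide for a print dialog.")

(defun guide-step (guide n)
  "Return the Nth mask of GUIDE; the rest of the dialog stays masked."
  (nth n guide))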
9.6 Hailing Indicator

The proactive nature of COACH requires a way to communicate the availability of help information to the user. To accomplish this we added the "hailing
indicator”, a small icon which may appear in the title bar of a dialog box. If a hailing indicator is shown, it has two distinct appearances: one notifies the user that a single guide is available for the current (unambiguous) task, the other notifies the user that guides are available for more than one possible task. When the COACH user model shows that the user has little recent experience with a task, COACH presents a suitable guide. Otherwise, just the hailing indicator will come up, reminding the user that help is available. Clicking on the hailing indicator for a single guide will bring up the guide. The multiple hailing indicator will bring up a guide menu, so the user can indicate which guide is relevant.
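The decision just described can be summarized in a few lines; the sketch below is an illustration of that logic, with the experience threshold and the returned keywords as assumptions rather than COACH/2's internal values.

(defparameter *experience-threshold* 0.3
  "Illustrative cutoff for 'little recent experience' with a task.")

(defun hailing-decision (recent-experience guides)
  "Sketch of the proactive decision: RECENT-EXPERIENCE is the user
model's estimate for the current task; GUIDES is the list of guides
that apply."
  (cond ((null guides) :show-nothing)
        ((< recent-experience *experience-threshold*)
         :present-guide)                        ; open a suitable guide proactively
        ((rest guides) :multi-guide-indicator)  ; clicking it brings up a guide menu
        (t :single-guide-indicator)))           ; clicking it opens the one guide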
9.7 Sound

Finally, we are experimenting with the use of sound to supplement the other techniques. The use of sound can be particularly valuable to introduce the interface and the help mechanisms, creating the initial associations between kinds of help and their uses, and also between icon dressing animations and their meanings. The adaptive technology of COACH enables the sound annotation to become less verbose or disappear when it is no longer useful.
9.8 Summary

Widgets, graphical presentation techniques, and audio presentation techniques are a large and exciting field. Our work has emphasized presentation techniques
FIG. 22. Users can click COACH/2's hailing indicator (in the upper left corner) to activate the advisory agent.
for communicating temporal and associated information while not distracting a user from the tasks in a GUI. In our search for such techniques we have developed slug trail animations, icon dressing, cue cards, masks, guides, and sound for presenting adaptive proactive help. Slug trails are a modality for graphically demonstrating an example of a procedure. Icon dressing accomplishes the same thing with much less visual real estate at the cost of design flexibility and versatility. Masks are a technique for simplifying the presentation of any user interface, focusing a user's attention on specific things. Cue cards are lightweight, versatile help presenters designed to be associated with but not cover up user interface function. Recognizing the strengths of each of these techniques has guided the choice of the COACH/2 help presentation mechanisms.

A version of COACH/2 ships with the OS/2 operating system starting with release Warp 4, under the name WarpGuide. A check-in procedure identifies the user and selects the appropriate AUM from a database of users. An authoring tool has been used for experiments with using COACH-based help technology with application programs. Figure 23 shows the authoring tool in use. Three windows are open: a tool bar at the top, a text-input window on the left, and a dialog box being annotated on the right. The dialog box appears as it would look to the user,
FIG.23. COACH/2 WYSIWYG authoring tool.
with masking and highlighting to emphasize a region for entering a file name. Below the dialog box is the help text as it would be presented to the user. COACH/2 monitors user events in the operating system's event queue, so COACH-based help can be added to any application that uses the operating system's API without modifying that application. However, supporting the AUM requires the application to provide a success/failure signal and an identifier for the step within a task where a problem was encountered. Without this support, COACH relies only on how a user "touched" a task step to update the user model.
10. Future Research Goals

COACH was originally developed as a research tool to explore teaching approaches, adaptive interfaces, and learning paradigms. The User System Ergonomics Research (USER) group at IBM Almaden Research Center is currently using COACH to explore several new ways of using and expanding the technology. The following is a short list of directions that are or could be followed using COACH as a research vehicle:

• Further experiments could establish the validity of particular instructional techniques in specific situations.
• Specific adaptive mechanisms should be more fully studied to establish their impact on the user.
• Currently COACH gives help advice addressing user problems it identifies. The use of COACH to actually perform the solutions to these problems as an assistance agent could save users from rote work, although the impact on learning should be studied.
• Multi-media help such as video graphics and audio are being tested with COACH/2. Experiments should be run to show if and where users derive more value from multimedia prompts and visual presentation than from text balloons.
• Tutorial curricula are being added to COACH to make it useful for teaching a syllabus.
As well as being a research platform, COACH is already intrinsically useful as a coaching interface. Development work with COACH/2 is being performed on popular operating systems (OS/2, Windows95). Authoring tools currently under development facilitate support for COACH on applications, which could be used to create a uniform help system for the entire environment experienced by the user.
10.1 Future Research
This section discusses future directions for COACH research in more detail.
(1) Evaluating instructional techniques. Various basic instructional techniques can be embodied in a computer coaching system. In a useful exploration of the goals and techniques of teaching, Alan Collins and Albert L. Stevens (Collins and Stevens, 1983) put forward a list of ten such techniques demonstrated in computer teaching systems: (a) selecting positive and negative exemplars; (b) varying cases systematically; (c) selecting counter-examples; (d) generating hypothetical cases; (e) forming hypotheses; ( f ) testing hypotheses; (g) considering alternative predictions; (h) entrapping students; (i) tracing consequences to a contradiction; ( j ) questioning authority. A fruitful area of research would be the formal evaluation of these instructional techniques. It remains to be proven which are most effective, which can be used together, and, most vital, which are appropriate for a particular situation. COACH would allow teaching techniques to be tested objectively. They would be embodied in coaching rules.
(2) How do adaptive mechanisms impact the user? The adaptive strategies present in the COACH system were arrived at by informal experimentation. Guinea pig users worked with the system to test various adaptive and presentation strategies. A researcher can change rules to alter the system’s coaching actions and strategies (see Section 7). Formal studies to determine which strategies are best could be set up to address issues in education, cognitive psychology, and cognitive science. Many questions could be easily tested. For example, is it better, when users are first exposed to a performance help level, to show them syntax and description, or would it be better to simply focus on an example? Presently the system waits to show related knowledge until the user has shown experience with the learnable unit. The current hypothesis is that too much information might overwhelm a beginner. A different hypothesis might state that the novice should instead be provided as much information as possible when just getting started. Do tasks that are done frequently get a greater benefit from an advisory style of help, while tasks which are rarely done get less value from the teaching aspect of the help system? Exploring such questions in more detail would give insight to
better understand student cognitive models. COACH is designed to explore such issues. (3) The use of agents in an adaptive teaching interface. Should the computer tell a user how to do something, or should the computer do it for them? An advisory agent that offers advice which the user is free to ignore is less obtrusive than an assistant style agent which implements the advice. In an assistance agent paradigm, when the computer knows something needs to be fixed by a user (e.g., inserting an open parenthesis), the computer takes control and types the solution. For example, in such a paradigm, when the computer identifies some way of simplifying what a user needs to do, it might create a macro so that the complex utterance can be described in a simplified way. The computer, then, has built a new instruction to come to the aid of the user. The computer is building a private, helpful set of tools for the user, a language the user and the computer both know. While the things the computer tries to do to the user’s interaction with a program are what the user really needs and wants, and to the extent that the user can easily ask for advice or the computer can recognize the need for it, an assistance agent can be helpful. The original COACH interface did not implement assistance agents. The hypothesis was that if users do not have to perform things themselves, they will not learn them. Another reason for not including assistance agents in the coaching paradigm was based on the hypothesis that the private interface that assistance agents provide could be difficult for a teacher or colleague to understand when the computer failed to be of help.
(4) Integrating multi-media help. Hardware and demonstrations integrating video and computers are becoming popular. An early work by Steve Gano, "Movie Manual", integrated text, menus, and video to demonstrate automotive repairs, such as changing the oil (Gano, 1982). Audio can capture attention, animation can demonstrate graphical actions, and 3-D graphics can make help distinct from the user interface. Visceral media can attract attention and aid memory of learning experiences without being confused with other graphical devices. Experiments should be done to show whether these other media improve learning and comprehension, and how they are best used to complement other teaching modalities.
(5) Integrating tutorial curricula with COACH. COACH demonstrates an AUM-based teaching aid based on the assumption of a goal-directed user. While most of people's lives are spent trying to achieve their own goals, we all go through a period of schooling, a
time when others define our goals. The coaching interaction style supports a user’s goals, leaving any “syllabus” or goal definition up to the user or a human teacher. COACH/2 has guides as well as learnable units. A guide is composed of a single thread or path through a series of learnable units. Features can be defined to trigger advancement to the next learnable unit in the series, or the user can explicitly call for the next unit using the navigation buttons on the cue card. Multiple guides can share the same learnable units, e.g. the learnable units for naming a file would be part of guides for both opening a file and saving it under a new name. The author clicks on buttons to indicate the level of user knowledge appropriate for the learnable unit (four levels are supported) and the type of help information being provided (what, how, or general information). The architecture could easily support more directed teaching materials, as demonstrated in systems like the Lisp Tutor (Reiser et al., 1985). The network of relationships between learnable units in COACH could be made to work with a check list (an overall basis-set, see Section 6.3), to go through teaching materials in a sequence. Specific syllabus teaching materials with assignments and problems could be represented in domain knowledge frames (see Section 6.3.1). Adding a rule “what-to-teach-next” to the COACH rule set (see Section 6.4), could interface between the syllabus and the AUM to select appropriate teaching materials. Such a system would have curricular goals, a syllabus, and the ability to evaluate a user relative to these goals. In addition to giving programmed lessons as other tutors do, such a system would be able to follow and help users in their programming, even when they were not doing exactly what is set forth in the curriculum. Unlike a conventional syllabus, COACH allows the user to follow their own path through the material rather than enforcing a particular sequence. Adding tutoring techniques to COACH’S presentation technology would allow for the kind of directed learning that syllabi create without requiring users to work through more of it than they need.
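The guide structure described above lends itself to a simple data model. The sketch below is only an illustration of the relationships in the text (guides as threads through shared learnable units, author-selected knowledge levels, and help types); the class and field names are hypothetical, not the COACH/2 data structures.

```python
# Hypothetical sketch of guides as ordered threads through shared learnable
# units; names and fields are illustrative, not the COACH/2 data model.
from dataclasses import dataclass

@dataclass
class LearnableUnit:
    name: str
    level: int          # author-selected user-knowledge level (four levels supported)
    kind: str           # "what", "how", or "general" information
    help_text: str = ""

@dataclass
class Guide:
    name: str
    units: list         # a single path through a series of learnable units
    position: int = 0

    def current(self):
        return self.units[self.position]

    def advance(self):
        """Triggered by a feature event or the cue card's navigation buttons."""
        if self.position < len(self.units) - 1:
            self.position += 1

# Multiple guides can share the same learnable unit, e.g. naming a file:
name_file = LearnableUnit("name a file", level=1, kind="how")
open_file = Guide("open a file", [LearnableUnit("choose Open", 1, "how"), name_file])
save_as = Guide("save under a new name", [LearnableUnit("choose Save As", 1, "how"), name_file])
```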
10.2 Future System Development
As well as being a research system, COACH has been used in real work. Below, two efforts are outlined which are making COACH more useful and available to users.
(1) Integrating COACH into standard work environments. COACH/2 is integrated into the OS/2 offering applications automatically; however, certain more ambitious integration projects could make the COACH architecture
more widely available:
(a) User interface environment aids. Rules could be added to COACH which provide already well-known agents. As demonstrated in Do What I Mean (DWIM) (Teitelman and Massinter, 1981), the system could correct spelling errors and correct variable naming or function naming. Assistance agents could also search for conflicts in other functions, similar function definitions, fix data type uses, etc. Assistance agents could also create solutions for equations and allow users to work with alternative representations, as demonstrated in the Mathematica (Wolfram, 1988) mathematical problem solving and visualization system.
(b) Efficiency improvements. Improved data structures help the COACH/2 system. The help text, rules, and user interface grammars of the Lisp implementation have been replaced in the COACH/2 implementation with a hashed object database. This greatly improves access speed, increasing performance for giving help in standard GUI environments. Improved algorithms also make the COACH/2 system more robust. The system caches text that is relevant to the particular user's AUM, leaving unneeded help information in files on disk. The COACH/2 system is approximately 0.5 megabytes of object code. Because of careful use of memory, it keeps its working set to this size as well. Currently, COACH/2 ships with about 1 megabyte of content. In extremely complex programming exercises, identifying how to parse a code segment can be difficult. A more adaptive scoping parser could allow the COACH system to try multiple ways of parsing small pieces of users' work when users make changes.
(2) Porting COACH for use in different environments. Development work with COACH/2 is concentrating on implementations for the OS/2 and Windows95 operating systems, with the OS/2 version being shipped as WarpGuide with the OS/2 Warp 4 release. Help support across different application environments raises interesting possibilities for transferring AUM data from one domain into another. Some of the AUM data that we have found easy to track and use in different parts of a single domain could be generally useful across domains:
(a) User error rate. The help system could have a lower threshold for questioning the accuracy of input by users known to be error prone. For example, a low score for typing accuracy earned in the text editor could cause the spreadsheet to be more sensitive to the entry of anomalous data, such as a four-digit number in a column of three-digit numbers.
(b) Slow learners. When a user is identified as a slow learner, the help system could concentrate on teaching the basic functions, while avoiding unsolicited advice on more obscure functions that may be confusing to the user. By developing this metric across different applications, appropriate help could be provided even on a user’s first exposure to an application. The history of the user with other applications would substitute for a history with the new application. (c) Experimenters. When the help system recognizes that a user quickly learns to use new features, it can be more forthcoming with hints and shortcuts. On the other hand, if a user is a non-experimenter, these helpful tips may be considered useless or annoying. (d) Preferred modality of interface. Some users prefer keyboard input rather than using a pointing device. Some users make use of function keys, while others ignore them. Some users may have physical handicaps that make one modality difficult or impossible to use. A help system that can recognize a user’s preferred style of interaction can suggest features that fit within those preferences and avoid mentioning features the user is unlikely to use.
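The four traits just listed could be carried in a small cross-application record. The sketch below is a hypothetical illustration of that idea; the field names and thresholds are invented for the example and are not part of COACH/2.

```python
# Hypothetical sketch of AUM traits shared across applications; fields and
# thresholds are illustrative assumptions, not values from COACH/2.
from dataclasses import dataclass

@dataclass
class CrossDomainTraits:
    error_rate: float = 0.0          # e.g., typing accuracy earned in the text editor
    learning_speed: float = 0.5      # low values suggest concentrating on basics
    experimenter: bool = False       # does the user quickly try new features?
    preferred_modality: str = "mouse"   # "mouse", "keyboard", "function-keys", ...

def anomaly_threshold(traits):
    """A spreadsheet could question unusual input sooner for error-prone users."""
    return 0.2 if traits.error_rate > 0.1 else 0.5

def should_offer_shortcuts(traits):
    """Be forthcoming with hints and shortcuts only for users who experiment."""
    return traits.experimenter

traits = CrossDomainTraits(error_rate=0.15, experimenter=False)
print(anomaly_threshold(traits), should_offer_shortcuts(traits))  # 0.2 False
```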
(3) Authoring tools. In the original system an author wrote the structure of learnable units in a formal language. To facilitate support of COACH-based help for applications, an authoring tool has been developed for the OS/2 environment that allows a curriculum developer to describe graphical syntax with a WYSIWYG point-and-click interface. A user selects authoring tasks from the authoring tool's GUI. It allows them to create new guides, to identify new learnable units, to give these guides and learnable units graphical looks, to edit and add content for them, and to identify the relationships between learnable units. An author selects the syntactic part of the GUI for which they want to create help and uses the guide author to create content while demonstrating the actual use of the thing to be learned. The authoring tool automatically creates the COACH syntactic model, allowing it to notice when to give help on the learnable unit. In this way the author defines a mask, features that show through the mask, highlighted features, and a cue card for each learnable unit. A completely new implementation of the authoring tool is being developed in Java, to take advantage of the cross-platform portability of that language.
(4) Web implementation. COACH could be implemented for Web-based applications. The interaction between the user and the advisory agent is implemented as a client-server relationship in COACH. In our UNIX-teaching COACH, the help agent ran on a different computer from the one
running UNIX. A Web-based COACH could have the same structure, in which a COACH server could download applets to the user's machine to provide text, graphics, sound, and other forms of help information. Alternatively, COACH itself could run only on the client in applet space or as part of a proxy server. A Web COACH could explain interfaces with complex syntax, such as the wild cards and logical operators used to request information from search engines. It could warn about unintended side effects, such as buttons which download software and change the operation of a user's machine. If there are plug-ins or upgrades required to implement certain features of a Web site, COACH could guide the user through installation of the software. COACH could simplify Web page design because information that isn't relevant to all users would be hidden from view unless and until an appropriate moment to present it occurred. COACH would streamline access to the information content on the Web, by selectively controlling the content seen by the user, as appropriate for the user's browsing experience with each site.

ACKNOWLEDGMENTS

This chapter would have never been completed without the tireless editing and dedication of Mark Thorson. This chapter is based on my PhD dissertation, a document that many people, especially my wife Ellen Shay and sister Diane Selker, worked hard to help me accomplish. Many people have contributed to COACH since the original dissertation; Bob Kelley, Steve Ihde, Julie Wright, and John Haggis worked with Ron Barber to create the COACH/2 implementation. As with any product, a large number of people worked hard to support the productization of COACH. Without Les Wilson, Ashok Chandra, and Maria Villar, it would not have happened. Ron Barber's work stands alone. His architectural work, long nights of programming, and leadership made the product happen.
REFERENCES AND FURTHER READING
Alexander, S. M., and Jagannathan, V. (1986). Advisory system for control chart selection. Computers and Industrial Engineering, 10(3).
Aronson, D., and Briggs, L. (1983). Contributions of Gagné and Briggs to a prescriptive model of instruction. In Instructional-Design Theories and Models: An Overview of Their Current Status, pp. 75-163. Lawrence Erlbaum Associates, New York.
Barr, A., and Feigenbaum, E. A. (eds.) (1984). The Handbook of Artificial Intelligence. William Kaufmann, Los Altos, CA.
Bereiter, C., and Scardamalia, M. (1984). Learning and comprehension. In Information Processing Demand of Text Composition, pp. 407-421. Erlbaum, Hillsdale, NJ.
Bonar, J., and Soloway, E. (1985). Preprogramming knowledge: A major source of misconceptions in novice programmers. Human-Computer Interaction, 1, 133-161.
Borenstein, N. S. (1985). The Design and Evaluation of On-line Help Systems. PhD thesis, Computer Science Department, Carnegie-Mellon University, Pittsburgh, PA.
Brehm, S., and Brehm, J. W. (1981). Psychological Reactance: A Theory of Freedom and Control. Academic Press, New York.
Brownlee, K. A. (1984). Statistical Theory and Method in Science and Engineering. Krieger, Malabar, FL.
Burton, R. (1978). Diagnostic models for procedural bugs in basic mathematical skills. Cognitive Science, 2.
Burton, R. (1982). Intelligent Tutoring Systems, chapter 4. Academic Press, New York.
Burton, R., and Brown, J. S. (1982). Intelligent Tutoring Systems, chapter 2. Addison-Wesley, New York.
Campbell, R. L. (1989). Developmental levels and scenarios for Smalltalk programming. Technical Report RC15305, IBM T. J. Watson Research Center, Yorktown Heights, NY, December.
Campbell, R. L. (1990). Online assistance: conceptual issues. Technical Report RC15407, IBM T. J. Watson Research Center, Yorktown Heights, NY, December.
Carbonell, J. G. (1970). An artificial intelligence approach to computer assisted instruction. IEEE Transactions on Man-Machine Systems, MMS-11(4).
Carbonell, J. G. (1979). Computer models of human personality traits. Technical report, Computer Science Department, Carnegie-Mellon University, Pittsburgh, PA.
Carroll, J. M., and Aaronson, A. (1988). Learning by doing with simulated intelligent help. CACM, 31(9).
Clancy, W. (1986). From Guidon to Neomycin and Heracles in twenty short lessons: ONR final report 1979-1985. The AI Magazine, pp. 40-60, August.
Collins, A., and Stevens, A. L. (1983). Instructional-design theories and models: An overview of their current status. In Cognitive Theory of Inquiry Teaching, pp. 247-229. Lawrence Erlbaum Associates, New York.
Conklin, J. (1986). A survey of hypertext. Technical Report STD-356-86, MCC, Austin, TX, October.
Corbett, A. T., and Anderson, J. R. (1989). Feedback timing and student control in the Lisp intelligent tutoring system. In Proceedings of The International Conference on Artificial Intelligence, pp. 64-72. IOS, Amsterdam.
Davis, R., and Shortliffe, E. (1977). Production rules as a representation for a knowledge-based consultation system. In CHI '87 Proceedings, Vol. 8, pp. 15-45.
Ellis, T. O., and Sibley, W. L. (1966). The GRAIL project. Spring Joint Computer Conference, Boston, MA. Verbal and film presentation.
Erman, L. D., and Lesser, V. (1975). A multi-level organization for problem solving using many diverse, cooperating sources of knowledge. In IJCAI, Vol. 4, pp. 483-490.
Feldman, D. H. (ed.) (1980). Beyond Universals in Child and Adult Development. Ablex, Norwood, NJ.
Fikes, R., and Kehler, T. (1985). The role of frame based representations in reasoning. CACM, 28(9), 904-920, September.
Fischer, G., Lemke, A., and Schwab, T. (1985). Knowledge-based help systems. In CHI Proceedings.
Gano, S. (1982). Movie manual. Technical report, MIT Media Lab, Cambridge, MA.
Genesereth, M. (1982). Intelligent Tutoring Systems. In The Genetic Graph. Addison-Wesley, New York.
Gentner, D. R. (1986). A tutor based on active schemas. Computational Intelligence, 2.
Glaser, R. (1985). Thoughts on expertise. Technical Report AD-A157 394, Learning Research and Development Center, Pittsburgh, PA, May.
Goldberg, A., and Robson, D. (1983). Smalltalk-80: The Language and Its Implementation. Addison-Wesley, New York.
Grise, R. F., Jr. (1986). ANGEL: A pleasant user-interface for an interactive computing environment. Master's thesis, Cybernetic Systems Department, San Jose University, San Jose, CA.
Heidegger, M. (1977). The Question Concerning Technology. Harper & Row, New York.
Hewitt, C. (1972). Description and theoretical analysis (using schemata) of PLANNER, a language for proving theorems and manipulating models in a robot. Technical Report TR-258, MIT A.I. Laboratory, Cambridge, MA.
Hewitt, C. (1985). The challenge of open systems. Byte Magazine, pp. 223-342, April.
Hopcroft, J. E., and Ullman, J. D. (1979). Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, New York.
Houghton, R. C. (1984). On-line help systems: A conspectus. CACM, 27(2).
Kernighan, B. W., and Pike, B. (eds.) (1984a). The UNIX Programming Environment, pp. 240-255. Prentice-Hall, New York.
Kernighan, B. W., and Pike, B. (eds.) (1984b). The UNIX Programming Environment. Prentice-Hall, New York.
Lawrence, K. (1984). Artificial intelligence in the man/machine interface. Data Processing, 1, 231-236.
Lieberman, H. (1985). There's more to menu systems than meets the screen. In Proceedings of the ACM/SIGGRAPH Conference, Vol. 24.
Lieberman, H., and Hewitt, C. (1980). A session with Tinker. Technical Report 577, MIT AI Laboratory, Cambridge, MA, September.
Mackinlay, J. (1986). Automatic Design of Graphical Presentations. PhD thesis, Computer Science Department, Stanford University, Stanford, CA.
Malone, T. W., and Lepper, M. R. (1987). Making learning fun: a taxonomy of intrinsic motivation for learning. Lawrence Erlbaum Associates, New York.
Manna, Z. (1972). Mathematical Theory of Computation, chapter 5-3. McGraw-Hill, New York.
Mastaglio, T. (1989). Tutors, coaches and critics. Technical report, Computer Science Department, University of Colorado, Boulder, CO.
Mays, E., Apte, C., Griesmer, J., and Kastner, J. (1988). Experience with K-Rep: An object-centered knowledge representation. In The Fourth Conference on Artificial Intelligence Applications, Proceedings. IEEE Computer Society Press.
Merrill, M. D. (1983). Component display theory. In Instructional-Design Theories and Models: An Overview of Their Current Status, pp. 279-334. Lawrence Erlbaum Associates, New York.
Microsoft Corp. (1995). Windows 95 User's Guide.
Michalski, R. S., Carbonell, J. G., and Mitchell, T. M. (1983). Machine Learning: An Artificial Intelligence Approach. Tioga Publishing Company, Palo Alto, CA.
Minsky, M. (1976). Frames. Technical report, AI Laboratory, MIT, Cambridge, MA.
Moon, D. (1987). User's Guide to Symbolics Computers.
Morris, N. M., and Rouse, W. B. (1986). Adaptive aiding for human-computer control: Experimental studies of. Technical Report AAMRL-TR-86-005, Armstrong Medical Research Laboratory, Wright-Patterson Air Force Base, OH.
Myers, B. A. (1986). Visual programming, programming by example, and program visualization: A taxonomy. In CHI '86 Proceedings, pp. 59-66.
Pirolli, P. (1986). A cognitive model and computer tutor for programming recursion. Human-Computer Interaction, 2.
Rath, G. J., Anderson, N. S., and Brainerd, R. C. (1959). The IBM Research Center Teaching Machine Project. In Automatic Teaching: The State of the Art. John Wiley and Sons.
Reigeluth, C. M. (ed.) (1983). Instructional-Design Theories and Models: An Overview of Their Current Status. Lawrence Erlbaum Associates, New York.
Reiser, B. J., Anderson, J. R., and Farrell, R. G. (1985). Dynamic student modeling in an intelligent tutor for Lisp programming. IJCAI, 1.
Reisner, P. (1986). Human computer interaction: What is it and what research is needed? Technical Report RJS308, IBM Almaden Research Center, Almaden, CA.
Revesman, M. E. (1983). Validation and Application of a Model of Human Decision Making For. PhD thesis, Industrial Engineering and Operations Research, Virginia Polytechnic, VA.
Rich, E. (1983). Users are individuals: Individualizing user models. Int. J. Man-Machine Studies, 18, 199-214.
Rissland, E. (1978). Understanding mathematics. Cognitive Science, 2, 361-383.
Schofield, J., Evans-Rhodes, D., and Huber, B. (1990). Artificial intelligence in the classroom: The impact of a computer-based tutor on teachers and students. Social Science Computer Review.
Selfridge, O. (1985). Personal communication.
Selker, T. (1989). Cognitive adaptive computer help (COACH). In Proceedings of The International Conference on Artificial Intelligence, pp. 25-34. IOS, Amsterdam.
Selker, T. (1991). Cognitive adaptive computer help. Technical Report, videotape.
Selker, T., and Koved, L. (1988). Elements of visual language. IEEE Workshop On Visual Languages, October.
Sleeman, D., and Brown, J. S. (eds.) (1982). Intelligent Tutoring Systems. Academic Press, New York.
Snelbecker, G. E. (1983). Is instructional theory alive and well? In Instructional-Design Theories and Models: An Overview of Their Current Status, pp. 437-472. Lawrence Erlbaum Associates, New York.
Suppes, P. (1967). Some theoretical models for mathematics learning. Journal of Research and Development in Education, pp. 4-22.
Sussman, G., Winograd, T., and Charniak, E. (1970). Micro-Planner reference manual. Technical Report AI Memo 203, AI Laboratory, MIT, Cambridge, MA.
Teitelman, W., and Massinter, L. (1981). The Interlisp programming environment. Computer, 14(4), 25-34.
Vertelney, L., Arent, M., and Lieberman, H. (1991). Two disciplines in search of an interface: reflections on a design problem. In The Art of Human-Computer Interface Design. Addison-Wesley, New York.
Waters, R. C. (1982). The programmer's apprentice: Knowledge based program editing. IEEE Transactions on Software Engineering, SE-8(1), 1-12.
Weiss, L. (1987). Conceptual model of an intelligent help system. Technical Report DDC/LW-15, ESPREE, May.
Whiteside, J., and Wixon, D. (1986). Improving human-computer interaction: A quest for cognitive science. Technical report, Digital Equipment Corporation, Maynard, MA.
Winograd, T., and Flores, F. (1986). Understanding Computers and Cognition: A New Foundation for Design. Ablex, Norwood, NJ.
Winston, P. H. (1977). Artificial Intelligence. Addison-Wesley, New York.
Wolfram, S. (1988). Mathematica: A System for Doing Mathematics by Computer. Addison-Wesley, New York.
Zellermayer, M., Salomon, G., Globerson, T., and Givon, H. (1991). Enhancing writing-related metacognitions through a computerized writing partner. American Educational Research Journal, pp. 373-391.
Zissos, A. Y., and Witten, I. H. (1985). User modeling for a computer coach: A case study. Int. J. Man-Machine Studies, 23.
Cellular Automata Models of Self-replicating Systems
JAMES A. REGGIA*
Department of Computer Science and Institute for Advanced Computer Studies, University of Maryland, A. V. Williams Bldg., College Park, MD 20742, USA
HUI-HSIEN CHOU
The Institute for Genomic Research, Rockville, MD
JASON D. LOHN
Caelum Research Corporation, NASA Ames Research Center, Moffett Field, CA
Abstract
Since von Neumann's seminal work around 1950, computer scientists and others have studied the algorithms needed to support self-replicating systems. Much of this work has focused on abstract logical machines (automata) embedded in two-dimensional cellular spaces. This research has been motivated by the desire to understand the basic information-processing principles underlying self-replication, the potential long-term applications of programmable self-replicating machines, and the possibility of gaining insight into biological replication and the origins of life. Here we briefly summarize the historical development of work on artificial self-replicating structures in cellular spaces, and then describe some recent advances in this area. Past research is viewed as taking three main directions: early complex universal computer-constructors modeled after Turing machines, qualitatively simpler self-replicating loops, and efforts to view self-replication as an emergent phenomenon. We discuss our own recent studies showing that self-replicating structures can emerge from non-replicating components and that genetic algorithms can be applied to automatically program simple but arbitrary structures to replicate. We also describe recent work in which self-replicating structures are successfully programmed to do useful problem solving as they replicate. We conclude by identifying some implications and important research directions for the future.
* To whom correspondence should be sent.
1. Why Study Self-replicating Systems? 142
2. Early Self-replicating Structures 143
   2.1 Cellular Automata Framework 144
   2.2 Von Neumann's Universal Computer-Constructor 146
   2.3 The Drive to Simplification 149
3. Self-replicating Loops 150
   3.1 Sheathed Loops 151
   3.2 Unsheathed Loops 153
   3.3 Varying Rotational Symmetry 156
   3.4 Reduced Rule Sets 159
4. Emergence of Self-replication 160
   4.1 Emergence of Replicators 160
   4.2 Automatic Programming of Replicator Rules 167
5. Programming Self-replicating Loops 173
   5.1 Duplicated Program Sequences 174
   5.2 Expanding Problem Solutions 175
6. Discussion 178
References and Further Reading 180
1. Why Study Self-replicating Systems?
Self-replicating systems are systems that have the ability to produce copies of themselves. Biological organisms are, of course, the most familiar examples of such systems. However, around 1950 mathematicians and computer scientists began studying artificial self-replicating systems in order to gain a deeper understanding of complex systems and the fundamental information-processing principles involved in self-replication [5, 70]. The initial models that were developed consisted of abstract logical machines, or automata, embedded in cellular spaces [2, 11, 33, 57]. In addition to work on cellular automata, other computational models, such as those based on more traditional programming concepts [56], continue to be the subject of research. Mechanical and biochemical models have also been constructed and studied [26, 46, 49]. Much of this work on artificial self-replicating systems has been motivated by the desire to understand the fundamental information-processing principles and algorithms involved in self-replication, independently of how they might be physically realized. A better theoretical understanding of these principles could be useful in a number of ways from a computational/engineering perspective. For example, it has been proposed that:
- Self-replicating programs undergoing artificial selection could facilitate the difficult task of programming massively parallel computers. Experiments performed on sequential computers have shown that such programs can optimize their algorithms more than fivefold in a few hours of time [56].
- Understanding self-replication processes could shed light on computer viruses and may contribute to their detection and the creation of biologically inspired "immune systems" [30].
- Self-replicating devices could play a key role in atomic-scale manufacturing or "nanotechnology" [13]. Researchers in this area have already gained insight from early work on self-replicating systems [41].
- Self-replicating systems may have an important future role in planetary exploration [16] and in creating robust electronic hardware [39, 40].
Developing an understanding of the principles of self-replication is also of interest in a broader scientific context. For example, understanding these principles may advance our knowledge of the biomolecular mechanisms of reproduction, clarifying conditions that any self-replicating system must satisfy and providing alternative explanations for empirically observed phenomena. Self-replicating systems have thus become a major area of research activity in the field of artificial life [34, 35]. Work in the area of self-replicating systems could shed light on those contemporary theories of the origins of life that postulate a prebiotic period of molecular replication before the emergence of living cells [46, 47, 52].
2. Early Self-replicating Structures
Table I lists several examples of past cellular automata studies of self-replicating structures. While the earliest work on artificial self-replicating structures/machines sometimes used mechanical devices [29, 49], subsequent work has been based largely upon computational modeling, especially with cellular automata. Thus, in the rest of this article, we will focus on self-replicating structures implemented within the framework of cellular automata. Work on such models can be viewed as primarily taking three approaches. First, early self-replicating structures (1960s and 1970s) were large, complex universal systems modeled after Turing machines. Although these early models were so large and complex that they have never actually been fully implemented, they provided the first demonstration that artificial self-replicating structures could in principle be devised and stimulated substantial theoretical work. These early models are discussed in Sections 2.2 and 2.3. A second generation of self-replicating structures, studied since the mid 1980s, were designed to be qualitatively simpler than their predecessors. This was done by relaxing the criteria that self-replicants must also be capable of universal computation and construction. These models, characterized as self-replicating loops, are discussed in Section 3. More recently, we have taken a third approach in our own work that focuses on self-replication as an
TABLE I. Examples of research involving self-replicating structures in cellular space models. For each study (1951-1997) the table lists the year, model type, dimensionality, rotational symmetry, states per cell, neighborhood size(s), structure size(s), capabilities (s = self-replication, o = other capabilities in addition to self-replication), and references. Some systems were never implemented, so certain values are approximations; some entries have no fixed neighborhood size, and some values are theorized.
emergent property rather than, as in the past, being based solely on manually designed replicants. This work has shown that self-replicating structures can emerge from initially random states, and that rules to control self-replication can be discovered using artificial evolution methods (genetic algorithms). It has also been established that self-replicating structures can be used to solve problems while they replicate. These recent developments are summarized in Sections 4 and 5. Finally, Section 6 speculates about some of the implications of this work and offers suggestions for future work.
2.1 Cellular Automata Framework Since the models of self-replication described below are implemented in a cellular automata framework, we briefly describe this framework here. Cellular automata can be characterized as an array of identical processing units called cells that are arranged and interconnected throughout space in a regular manner. Figure 1 shows a typical example of a small two-dimensional space tesselated into squares where each square represents one cell. Each cell represents the same abstract finite state automaton (computer), which typically can be in any of two or more possible states. These internal states are usually represented by letters, digits, or other non-numeric characters. A special state, called the quiescent or
FIG. 1. The von Neumann neighborhood (5-neighborhood) and Moore neighborhood (9-neighborhood) of a central cell labeled C; here N = north, E = east, S = south, and W = west. The labels here, unlike in all subsequent illustrations, do not represent cell states.
inactive state, is generally represented by an empty cell in pictures, or as a period in text. All other cells are said to be active. At each tick of simulated time, each cell simultaneously changes state as a function of its own current state and the state of its immediately neighboring cells. Which cells are considered to be immediate neighbors varies from model to model. With the 5-neighborhood or von Neumann neighborhood, each cell (such as the one marked C for "central" in the left half of Fig. 1) is considered to have five immediate neighbors: north, east, south, west and itself. In other words, a cell makes a decision about its new state based on its four adjacent cells plus its own state. During processing, various structures (configurations) arise. A structure is a fixed/moving persistent pattern of multiple contiguous activated cells. For example, Fig. 2 illustrates a structure called a "glider" from the well-known cellular automata model called the Game of Life [4, 17]. The Game of Life model uses the 9-neighborhood or "Moore neighborhood" (Fig. 1, right). Cells in the Game of Life have only two possible states, dead (quiescent, indicated by empty cells) or alive (indicated by 1s). Thus, in Fig. 2 at the start (upper left), exactly five cells are alive, and these form a structure called a glider. At each instant of time t, each cell follows a simple set of rules forming the transition function, based on the number N of its eight neighbors that are alive (in state 1):
If quiescent at time t and N = 3, then at time t+1 change state to 1 ("birth").
If alive at time t and either N < 2 or N > 3, then at time t+1 become quiescent ("death").
Otherwise, stay in the same state at time t+1 as at time t.
Following these rules, the glider structure, the pattern of 1s in Fig. 2 , goes through a sequence of steps that shift it one unit diagonally every four units of time. Again, we follow the convention here and in the following figures of showing
FIG. 2. Successive applications of the Game of Life rules given in the text to a small initial cellular automata structure called a glider. The glider gradually moves to the lower right as each cell follows the same set of instructions based solely on local information. For example, at iteration 4 (lower right) it can be seen that the initial configuration has reappeared but shifted one space down and to the right.
quiescent cells as being empty or blank. We refer to each iteration of the model in which all cells simultaneously change state as one step or one unit of time. As with the Game of Life, with any cellular automata model each cell's state transitions are governed by a set of rules forming the transition function. Each single rule is simple and based solely on locally available information. The "locality" of computation, that is, the fact that each cell can change its state based only on the state of its neighbors (including its own current state), is a fundamental aspect of cellular automata computation. In spite of such localized information processing, experience has shown that the complete set of rules forming a transition function, through their application by all of the cells in the model simultaneously and repetitively over time, can produce very rich and at times striking behavior. For this reason, cellular automata are being increasingly used as models in physics, chemistry, biology, and other scientific fields. The critical point here is that, since all computations are strictly local operations, any fixed/moving/replicating structures that occur represent emergent behavior of the model. The reader interested in further details of cellular automata models in general is referred to the many collections and reviews on this topic [12, 14, 21, 53, 72, 73].
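To make the locality of the transition function concrete, the short sketch below steps a glider under the Game of Life rules quoted above. The grid size, the wraparound boundary, and the printing format are arbitrary choices for illustration, not part of the models discussed in this article.

```python
# Minimal Game of Life step function illustrating purely local updates;
# the 8x8 grid, wraparound boundary, and starting glider are arbitrary.
def step(grid):
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count live cells in the 8-cell Moore neighborhood (toroidal wrap).
            n = sum(grid[(r + dr) % rows][(c + dc) % cols]
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                    if (dr, dc) != (0, 0))
            if grid[r][c] == 0 and n == 3:
                new[r][c] = 1                      # "birth"
            elif grid[r][c] == 1 and (n < 2 or n > 3):
                new[r][c] = 0                      # "death"
            else:
                new[r][c] = grid[r][c]             # otherwise unchanged
    return new

# A glider: after 4 steps it reappears shifted one cell down and to the right.
grid = [[0] * 8 for _ in range(8)]
for r, c in [(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)]:
    grid[r][c] = 1
for _ in range(4):
    grid = step(grid)
print(*("".join(".1"[v] for v in row) for row in grid), sep="\n")
```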
2.2 Von Neumann‘s Universal Computer-Constructor The mathematician John von Neumann first used cellular automata to study the logical organization of self-replicating structures [70]. In his and most subsequent
work, two-dimensional cellular automata spaces are used, and cells can be in one of several possible states. At any moment most cells are quiescent or inactive; those cells that are active are said to be components. A self-replicating structure is represented as a configuration of contiguous active cells, each of which represents a component of a replicating machine. Put otherwise, there are actually two levels at which one can talk about "machines" in the models we consider below:
(1) Cells. Each cell forming the cellular automaton space is a finite state machine. For simplicity, in the rest of this review this fact will largely be kept implicit. We will instead emphasize the view that a cell represents a local region of space, a quiescent cell represents empty space, an active cell represents a region containing a component of a structure, and the transition function (rules or program) followed by a cell represents the "underlying physics" of the space.
(2) Structures. A set of contiguous active cells or components (such as the "glider" in Fig. 2) can also be viewed as an abstract machine. Such a structure, spanning several cells of the cellular space and considered as a whole, is a machine at a higher level of abstraction. When we refer to self-replicating structures or "machines" in the following, we will be referring to this higher level of abstraction.
Since at each instant of simulated time, each cell determines its next state as a function of only its current state and the state of immediate neighbor cells, any self-replicating structures observed in the models we consider must be an emergent behavior arising from strictly local interactions. Based solely on these concurrent local interactions an initially specified self-replicating structure goes through a sequence of steps to construct a duplicate copy of itself (the replica being displaced and perhaps rotated relative to the original). Von Neumann's original self-replicating structure is a complex universal computer-constructor embedded in a large, two-dimensional cellular automata space that consists of 29-state cells. It is based on the 5-neighborhood, and is literally a simulated digital computer (Turing machine) that uses a "construction arm" in a step-by-step fashion to construct a copy of itself from instructions on a "tape". In Fig. 3, the initial state of this structure is shown on the left with its attached tape and its construction arm extended out to the upper right where a replicant is in the process of being constructed. The initial machine is said to be a universal constructor in that it can construct a copy of any structure properly specified on its tape [5]. It can also copy its input tape and attach it to the new structure. Self-replication can thus occur if the original machine is given a tape with a description of its own structure. One of the important concepts introduced in von Neumann's universal computer-constructor is that of a data path over which signals can flow. The
FIG. 3. Schematic diagram of von Neumann's self-replicating structure (not drawn to scale). The actual structure occupies at least tens of thousands of cells. The initial structure consists of construction and tape controls as shown on the left. The tape contains a description of the initial structure to be replicated, the actual work being done by a constructing arm. A partially completed copy of the original structure is shown in the nearby cellular space (upper right). During the replication process, instructions on the tape cause the construction control to send signals up the constructing arm. These signals cause the arm to move as it sequentially deposits components in the replicant. (Figure taken from Essays on Cellular Automata, A. Burks (ed.), copyright 1970 by the Board of Trustees of the University of Illinois, used with permission of the University of Illinois Press.)
construction arm in Fig. 3 provides an example. Without describing the details of von Neumann's specific design, the basic idea of a data path and signal flow is illustrated in Fig. 4. The data path fragment shown in the figure consists of a row of cells in a state labeled by the letter 0. Three signals (>>L) are embedded in the data path at t = 0, and at each iteration (tick of the clock) move one cell to the right. The transition rules obeyed by each and every individual cell that produce such behavior can be summarized as:
If in state 0 at time t and pointed at by >, then change to state > at time t+1.
If in state > at time t and a neighbor cell is in state L, then change to state L at time t+1.
If in state L at time t, then change to state 0 at time t+1.
Otherwise do not change state.

FIG. 4. Signal sequence flow over a data path. Each box is a snapshot of the same region in the cellular automata space during successive times (iterations). Numbers below each box denote the times. The rules forming the transition function are such that signal > is followed by either a signal > or by the signal L. Signal L always changes to 0, the latter changing to > if pointed at by >. The net effect is that the signal sequence L>> progressively moves to the right at a rate of one cell per unit time.
Data paths similar to that illustrated here serve as the “wires” over which information is transmitted, both internally in the universal computer-constructor and via the construction arm (Fig. 3). Von Neumann’s work provided an early demonstration that an artificial information-carrying system capable of self-replication was theoretically possible. It established, within the cellular automata framework, a logical organization that is sufficient for self-replication. The detailed design of von Neumann’s original universal computer-constructor can be found in [70] and is clearly summarized in [5].
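A minimal sketch of these data-path rules is given below, using one character per cell and the spatial order L>> described in the Fig. 4 caption. The one-dimensional string representation, path length, and number of steps are illustrative assumptions, not von Neumann's actual design.

```python
# Minimal sketch of the data-path rules quoted above: '0' is the path state,
# '>' and 'L' are signals. The 1-D string, path length, and step count are
# arbitrary choices for illustration.
def step(path):
    new = []
    for i, s in enumerate(path):
        left = path[i - 1] if i > 0 else '.'          # cell "pointing at" cell i
        right = path[i + 1] if i + 1 < len(path) else '.'
        if s == '0' and left == '>':
            new.append('>')        # rule 1: a 0 pointed at by > becomes >
        elif s == '>' and (left == 'L' or right == 'L'):
            new.append('L')        # rule 2: a > with an L neighbor becomes L
        elif s == 'L':
            new.append('0')        # rule 3: L reverts to the path state 0
        else:
            new.append(s)          # otherwise do not change state
    return ''.join(new)

path = 'L>>0000000'   # the signal sequence L>> embedded at the left end
for t in range(6):
    print(t, path)    # the sequence moves right one cell per unit time
    path = step(path)
```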
2.3 The Drive to Simplification
While the work by von Neumann established that artificial self-replication is possible, it left open the question of the minimal logical organization necessary for self-replication [5, 70]. Subsequent analysis led to several other results: it showed that some simplification of von Neumann's configuration was possible by redesigning specific components [66] or by increasing cell state complexity [2], demonstrated that sexual reproduction could be simulated [68], generalized von Neumann's basic result to other configurations and higher-dimensional cellular spaces [45], established theoretical upper bounds on how rapidly a population of self-replicating configurations could grow [43], and examined several fundamental issues [3, 7, 2, 54, 60] that continue to generate theoretical interest today [28, 63].
Most influential among this early work has been Codd's demonstration that if the components or cell states meet certain symmetry requirements, then von Neumann's configuration could be done in a simpler fashion using cells having only eight states rather than the 29 used originally [11]. Codd argued that using components that were symmetrical led to a simpler model, and made modifications to von Neumann's design based on this and considerations about how brain cells transmit information. He also implemented and tested several parts of the replicating structure that he designed. His universal computer-constructor was simpler but otherwise similar in spirit to that of von Neumann. Another approach taken to reduce the complexity of von Neumann's design focused on using more complex components [2]. The resulting 2-D cellular space model was referred to as constructing Turing machines, or CT-machines [2]. Each cell in this space contains a finite state automaton that executes short 22-instruction programs. The instructions consist of actions such as weld and move, and internal control constructs such as if and goto. Self-replication occurs when individual CT-machines copy their instructions into empty cells. While these early studies describe structures that self-replicate, these structures generally consist of tens of thousands of components or active cells, and their self-replication has thus never actually been simulated computationally because of their tremendous size and complexity. Only recently has a simplified version of von Neumann's universal computer-constructor actually been implemented [51]. This implementation involved redesigning many of the components and extending the original transition function. Self-replication with this new universal computer-constructor remains to be demonstrated; this will require design of a tape that encodes a description of the universal computer-constructor (Pesavento, 1997, personal communication).
3. Self-replicating Loops The complexity of even the simplified early cellular automata models described above seems consistent with the remarkable complexity of biological self-replicating systems: they appear to suggest that self-replication is, from an information processing perspective, an inherently complex phenomenon. Recent work with self-replicating loops provides evidence that this is not necessarily so, and represents a major step forward in efforts to produce simpler self-replicants. In this section, we consider sheathed and unsheathed replicating loops, and discuss some issues concerning component symmetry and how simple self-replicating structures can be. For clarity and preciseness, in the remainder of this article, self-replicating structures are labeled by their type (SL = sheathed loop, UL = unsheathed loop, PS = polyominoe structure) followed by the number of components, the rotational symmetry of the individual cell states (S = strong,
W = weak; explained below), the number of possible states a cell may have, and the type of neighborhood (V = von Neumann, M = Moore). For example, the sheathed loop discussed next is labeled SL86S8V because it spans 86 active cells, has strongly symmetric cell states with each cell assuming one of 8 possible states, and its transition function is based on the 5-neighborhood (von Neumann neighborhood). This labeling convention provides a compact description of the loops we consider.
3.1 Sheathed Loops
A much simpler self-replicating structure based on 8-state cells, the sheathed loop, was developed by Langton in the mid-1980s (see Fig. 5(b)) [33]. The term "sheathed" here indicates that this structure is surrounded by a covering or sheath (Xs in Fig. 5(a)-(c)). Before examining self-replicating loops, first consider Fig. 5(a), where a non-replicating loop plus arm (the latter coming off the lower right of the loop) is shown. The loop consists of a core of cells in state 0 and a
d.
XXXXXXXX xoooooooox xoxxxxxxox xox xox xox xox xox xox xox xox xoxxxxxxoxxxxx xoo +ooooooooox xxxxxxxxxxxxx -O+-0+-OL-OL
6t;-
6+
-
0 0
XXXXXXXX xo+ OL OLX x xxxxxx x x+x xox xox xox xx+x x xox xox xoxxxxxxoxxxxx xxxxxxxxxxxxxx +o +o +ooooox
b.
e.
o+-OL-OL +
0
0 0 0 0 0 0
f .
0 0 0
0 -
+
c.
000 0 0 L++OO
xx
XLOX XLZX X
g . 00 L+OO
0
0 -+o-+o-+oooo
0
0-+o-+o-+o-+oooo h.
OOcOO~LLOOOo
V
0
0
0
0
0
0 0 0 0 0 0 0 0 V 0 00>00>00>00~0000
t; t;
i.
00
V
0 0
V
0 0
0 0 0 0 0 0
3.
000 0 0 L>>OO
k.
00
L>OO
>oo>oo>ooooo
FIG. 5. Self-replicating loops in two-dimensional cellular automata. Cells in the quiescent state are indicated by blank spaces. (a) Sheathed but non-replicating loop. A core of 0s is surrounded by a sheath of Xs. A single signal (+ followed by blank space) repeatedly circulates counterclockwise around the loop. (b) A self-replicating sheathed loop designated SL86S8V; (c) A small self-replicating sheathed loop SL12S6V; (d)-(g) Unsheathed self-replicating loops designated UL48S8V, UL32S8V, UL10S8V, and UL06S8V, respectively; (h)-(k) Unsheathed self-replicating loops designated UL48W8V, UL32W8V, UL10W8V, and UL06W8V, respectively.
sheath of cells in state X. In this case, a signal + followed by a blank space (quiescent cell) circulates around the “circular” data path forming the loop. Rules similar to those governing signal propagation in the data path of Fig. 4 act here to support the counterclockwise circulation of signals. Each time the signal reaches the lower-right branch point where the arm extends from the loop, a copy of it passes out the arm. This non-replicating loop can be viewed as a storage element (any signal sequence circulating in it represents stored information), and similar non-replicating structures were used as parts in the universal computer-constructors designed by von Neumann and Codd. Figure 5(b) shows the initial state of a self-replicating sheathed loop that, as noted above, we will designate as SL86S8V [33]. The signal or instruction sequence + + + + + + L L that directs replication is embedded in the core of 0 s forming a loop similar to that shown in Fig. 5(a) (reading clockwise around the loop starting at the lower right corner). As copies of this circulating signal sequence periodically reach the end of the arm, they trigger the growth and turning of that arm to form a duplicate loop in the nearby cellular space, as explained in more detail below. In creating a sheathed loop that replicates, the biologically implausible requirements of universal computability and of ability to function as a universal constructor that were used in earlier models were abandoned. To avoid certain trivial cases, replicating loops are required to have a readily identifiable stored instruction sequence or program that is used by the underlying transition function in two ways: as instructions that are interpreted to direct the construction of a replica, and as uninterpreted data that is copied onto the replica [33]. Thus, self-replicating loops are truely “information replicating systems” in the sense that this term is used by organic chemists [46]. The original sheathed loop was a modified version of a periodic emitter, a storage element and timing device in Codd’s model [l 11. Whereas von Neumann had used unsheathed data paths similar to that in Fig. 4, Codd introduced the analogous concept of a sheathed data path in his model. This consists of a series of adjacent cells in state 0 called the core covered on both sides by a layer of cells in state X called the sheath similar to what is illustrated in Fig. 5(a). The sheathed data path served as a means for signal propagation, where signals or instruction sequences, represented by cells in other states embedded in the core of a data path, propagate along it. Codd’s periodic emitter was a non-replicating loop similar to that in Fig. 5(a) except that it contained a more complex sequence of signals that continuously circulated around the loop. Each time the signal sequence passed the origin of the arm (lower right of loop in Fig. 5(a) a copy of the signal would propagate out along the arm and, among other things, could cause the arm to lengthen or turn. Langton showed that Codd’s sheathed loop, a part of a much larger selfreplicating structure, could be made self-replicating all by itself by storing in it a set of instructions that direct the replication process [33]. The “program” of the
replicating sheathed loop, pictured in Fig. 5(b), consists of individual instructions +, meaning “extend the current data path one cell”, and LL, meaning “extend and turn left”. Thus, the sheathed loop’s instruction sequence + + + + + + L L can be interpreted as “extend the data path forward seven cells, then turn left”. As this instruction sequence passes out the loop’s arm it is “executed” as it reaches the end of the arm or growing structure. Each time the instructions are executed they generate one side of a new loop. Thus, executing these instructions four times causes the arm to repeatedly extend and turn until a second loop is formed, detaches, and also begins to replicate, so that eventually a growing “colony” of self-replicating loops appears. This replicating sheathed loop consists of 86 active cells as pictured in Fig. 5(b), and its transition function has 207 rules based on the 5-neighborhood. Subsequently, two smaller self-replicating sheathed loops containing as few as 12 active cells in one case were described (Fig. 5(c)) [6].
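To make the semantics of this circulating program concrete, the following rough sketch (in Python) traces the cells laid down when the sequence is executed once per side, as described above. It is only an abstraction of the instruction interpretation, not the automaton's transition function; in particular, the doubled L signal is treated here as a single left turn of the growth direction, which in the actual automaton is a timing detail.

    def execute_program(sides=4, extend=7):
        """Trace the cells laid down by executing "+ + + + + + L L" once per side:
        extend the data path forward seven cells, then turn left."""
        x, y = 0, 0
        dx, dy = 1, 0                    # the arm initially grows to the right
        cells = [(x, y)]
        for _ in range(sides):           # each execution builds one side of the new loop
            for _ in range(extend):      # the '+' instructions: extend one cell each
                x, y = x + dx, y + dy
                cells.append((x, y))
            dx, dy = -dy, dx             # the 'L' instructions: turn the growth direction left
        return cells

    # After four executions the traced path closes on itself, outlining a new square loop.
    new_loop = execute_program()
    assert new_loop[-1] == new_loop[0]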
3.2 Unsheathed Loops

Following Codd’s and Langton’s work, we hypothesized that a number of alterations could be made that would result in even simpler and smaller self-replicating structures [57]. Such simplification is important for understanding the minimal information processing requirements of self-replication, for relating these formal models to theories of the origins of life, and for identifying configurations so simple that they might actually be synthesized or fabricated. One potential simplifying alteration is removal of the sheath surrounding data paths. It was not obvious in advance that complete removal of the sheath would be possible. The sheath was introduced by Codd and retained in developing sheathed loops because it was believed to be essential for indicating growth direction and for discriminating right from left in a strongly rotation-symmetric space ([11], p. 40; [6], p. 296). In fact, we discovered that having a sheath is not essential for these tasks, and its removal leads to smaller self-replicating structures that also have simpler transition functions. To understand how the sheath (surrounding covering of Xs) can be discarded, consider the unsheathed version UL32S8V (shown in Fig. 5(e)) of the original 86-component sheathed loop (shown in Fig. 5(b)). The cell states and transition rules of this unsheathed loop obey the same symmetry requirements as those of the sheathed loop, and the signal sequence + - + - + - + - + - + - L - L - directing self-replication is the exact same program written using different “instruction codes” (+ - for “extend”, L - for “extend and turn left”). As illustrated in Fig. 6, just as with sheathed loops the instruction sequence circulates counterclockwise around the loop, with a copy passing onto the construction arm. As the elements of the instruction sequence reach the tip of the construction arm, they cause it to extend and turn left periodically until a new loop is formed. A “growth cap” of Xs at the tip of the construction arm enables directional growth and right-left
FIG.6. Successive states of unsheathed loop UL32S8V starting at time t = 0. The instruction sequence repeatedly circulates counterclockwise around the loop with a copy periodically passing onto the construction arm. At t = 3 (a) the sequence of instructions has circulated 3 positions counterclockwise with a copy also entering the construction arm. At t = 6 (b) the arrival of the first + state at the end of the construction arm produces a growth cap of Xs. This growth cap, which is carried forward as the arm subsequently extends to produce the replica, is what makes a sheath unnecessary by enabling directional growth and right-left discrimination even though strong rotational symmetry is assumed (see text). Successive arrival at the growth tip of +s extends the emerging structure and arrival of Ls causes left turns, resulting in eventual formation of a new loop. Intermediate states are shown at t = 80 (c) and t = 115 (d). By t = 150 (e) a duplicate of the initial loop has formed and separated (on the right); the original loop (on the left, construction arm having moved to the top) is beginning another cycle of self-directed replication.
discrimination at the growth site (seen in Fig. 6(b)-(d)). It is this growth cap that makes elimination of the sheath possible. As shown in Fig. 6(e), after 150 iterations or units of time the original structure (on the left, its construction arm having moved to the top) has created a duplicate of itself (on the right). This unsheathed loop UL32S8V not only self-replicates but it also exhibits all of the other behaviors of the sheathed loop: it and its descendants continue to replicate, and when they run out of room for new replicas, they retract their construction arm and erase their instruction code. After several generations a single unsheathed loop has formed an expanding “colony” where actively replicating structures are found only around the periphery. Unsheathed loop UL32S8V has the same number of cell states, neighborhood relationship, instruction sequence length, rotational symmetry requirements, and so on, as the original sheathed loop and it replicates in the same amount of time. However, it has only 177 rules compared to 207 for the sheathed loop, and is less than 40% of the size of the original sheathed loop (32 active cells versus 86 active cells, respectively). The rules forming the transition function for UL32S8V are given in [58]. Successful removal of the sheath makes it possible to create a whole family of self-replicating unsheathed loops using 8-state cells and strongly rotation-symmetric
cell states. Examples of these self-replicating structures are shown ordered in terms of progressively decreasing size in Fig. 5(d-g) (labeled UL48S8V, UL32S8V, UL10S8V, and UL06S8V, respectively) and are summarized in the first four rows of Table II. Each of these structures is implemented under exactly the same assumptions about the number of cell states available (eight), rotational symmetry of cell states, neighborhood, isotropic and homogeneous cellular space, and so forth, as sheathed loops within Codd’s framework [11]. Given the initial states shown here, it is a straightforward but tedious and time-consuming task to create the transition rules needed for replication of each of these structures [58]. The smallest unsheathed loop in this specific group using 8-state cells, UL06S8V in Fig. 5(g), is listed in line 4 of Table II; it is more than an order of magnitude smaller than the original sheathed loop (SL86S8V; line 9 of Table II). Consisting of only six components and using the two-instruction sequence +L, it replicates in 14 units of time (column “Replication time” in Table II). Replication time is defined as the number of iterations it takes both for the replica to appear and for the original structure to revert to its initial state. This very small structure uses a total of 174 rules (“Total rules” in Table II) of which only 83 are needed to produce replication (“Replication rules”); the remaining
TABLE II
REPLICATION TIME AND NUMBER OF RULES
[Columns: replication time, total rules, replication rules, state change rules, state change replication rules, reduced total rules, and reduced replication rules. Rows: the unsheathed loops UL48S8V, UL32S8V, UL10S8V and UL06S8V (strong symmetry), UL48W8V, UL32W8V, UL10W8V and UL06W8V (weak symmetry), the sheathed loops SL86S8V and SL12S6V, the Moore-neighborhood variants UL32S6M and UL10W8M, and the 6-state loops UL06S6V and UL05S6V. Representative entries are quoted in the accompanying text.]
rules are used to detect and handle collisions between different growing loops in a colony, and to erase the construction arm and instruction sequence of loops during the formation of a colony. If one counts only those rules which cause a change in state of the cell to which they are applied, this structure uses a total of 91 rules (“State change rules”) of which only 49 are used to produce replication (“State change replication rules”). This latter measure is taken here to be the preferred measure of the information processing complexity of a transition function because it includes only rules needed for replication and only rules that cause a state change. Prior to [57], the smallest previously described structure that persistently self-replicates under the same assumptions [6], designated SL12S6V here, uses 6-state cells, has 12 components (Fig. 5(c)), and as indicated in Table II, requires 60 state change replication rules. We have been able to create unsheathed loops, designated UL06S6V and UL05S6V, using 6-state cells with half as many components and requiring only 46 or 35 state change replication rules, respectively (last two rows of Table II). The initial state of UL06S6V is shown in Fig. 5(g), and that of UL05S6V is identical except it has one less component in its arm; the complete transition functions are given in [58]. To our knowledge, UL05S6V is the smallest and simplest self-replicating structure created under exactly the same assumptions about cell neighborhood, symmetry, and so forth, as sheathed loops.
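To make this rule-counting measure concrete, the following minimal sketch (in Python) counts, in a transition function given as a table from 5-neighborhood patterns (C, N, E, S, W) to the next state of the center cell C, only those rules whose next state differs from the current state of C. The three-rule table shown is hypothetical and is not taken from any of the published rule sets.

    def count_state_change_rules(rule_table):
        """rule_table maps (C, N, E, S, W) tuples to the next state of the center cell C."""
        return sum(1 for (c, _n, _e, _s, _w), nxt in rule_table.items() if nxt != c)

    example_rules = {
        ('O', '+', '.', '.', '.'): '+',   # a signal moves into this cell: state changes
        ('O', '.', '.', '.', '.'): 'O',   # nothing happens: no state change
        ('+', '.', '.', 'O', '.'): 'O',   # the signal moves on: state changes
    }
    assert count_state_change_rules(example_rules) == 2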
3.3 Varying Rotational Symmetry

Cellular automata models of self-replicating structures have usually assumed that the underlying two-dimensional space is homogeneous (every cell is identical except for its state) and isotropic (the four directions NESW are indistinguishable). However, there has been disagreement about the desirable rotational symmetry requirements for individual cell states as represented in the transition function. The earliest cellular automata models, such as von Neumann’s, had transition functions satisfying weak rotational symmetry: some cell states were directionally oriented [5, 66, 70]. These oriented cell states were such that they permuted among one another consistently under successive 90° rotations of the underlying two-dimensional coordinate system. For example, the cell state designated by an upward-pointing arrow in von Neumann’s early work is oriented and thus permutes to the states pointing right, down, and left under successive 90° rotations; it represents one oriented component that can exist in four different states or orientations. However, Codd’s simplified version of von Neumann’s self-replicating universal
A formal definition of rotational symmetry in cellular automata can be found in [11]. Care should be taken not to confuse the rotational symmetry of a cell state as interpreted by the transition function with the rotational symmetry of the printed character used to represent that state. Here the printed character L is not rotationally symmetric, for example, but the cell state it represents is treated as such.
constructor-computer [11] and the simpler replicating sheathed loops [33] are based upon more stringent criteria called strong rotational symmetry. With strong rotational symmetry all cell states are viewed as being unoriented or rotationally symmetric. The transition functions for the unsheathed loops shown in Fig. 5(d-g) also all use this strong rotational symmetry requirement (indicated by S in their labels). Their eight cell states are designated . O # L - * X +, where the period indicates the quiescent state. All of these states are treated as being unoriented or rotationally symmetric by the transition function. The fact that the simplest self-replicating structures developed in the past [11, 33] were all based on strong rotational symmetry raises the question of whether the use of unoriented cell states intrinsically leads to simpler algorithms for self-replication. Such a result would be surprising as the components of self-replicating molecules generally have distinct orientations. To examine this issue we developed a second family of self-replicating unsheathed loops, with examples shown in Fig. 5(h-k) (labeled UL48W8V, UL32W8V, UL10W8V, and UL06W8V, respectively), whose initial state and instruction sequence are similar to those already described in Fig. 5(d-g). However, for the structures in Fig. 5(h-k) weak symmetry is assumed, and the last four of the eight possible cell states . O # L ^ > v < are treated as oriented. In other words, although there are still 8 states, the cell state ^ is considered to represent a single component that has an orientation and thus can exist pointing up or in the three other directions >, v and <. The remaining four cell states (. O # L) are unoriented. For example, in Fig. 5(i) the states >, v, and < appear on the lower, left and upper loop segments, respectively, to represent the instruction sequence <<<<<
FIG.7. Structure UL06W8V uses only five unique components. Shown here are eleven immediately successive structures ordered left to right, top to bottom. Starting at t = 0, the initial state shown at the upper left passes through a sequence of steps until at t = 10 (last structure shown) an identical but rotated replica has been created.
structure which makes use of only 5 possible components. After several generations the older, inactive structures are surrounded by persistently active, replicating progeny, as shown in Fig. 8, and this colony formation continues indefinitely. The small but complete set of transition function rules needed for one replication of UL06W8V can be found in [57]. The results summarized in Table II lead to additional observations about unsheathed loops [57]. For systems with either weak or strong symmetry requirements, the number of rules in the transition function required for replication increases as structure size increases but then levels off to a value characteristic of
FIG.8. After several generations, a colony has formed from the original single copy of structure ULO6W8V pictured in the preceding figure. Structures around the periphery are still actively replicating; those in the center have retracted their arms and erased the instruction sequence that directs their self-replication. Growth of this colony continues indefinitely (this was verified by computer simulations out to at least 11 generations for all of the unsheathed loops described in this article).
the symmetry requirement in effect. Replication time is essentially independent of the type of rotational symmetry used (strong versus weak) but grows with the size of the self-replicating loop, and this growth is effectively linear. To assess the effects of type of neighborhood, we implemented versions of the two arbitrarily selected unsheathed loops shown in Fig. 5(e) and 5(j) using the 9-neighborhood (Moore neighborhood). The resultant systems, designated UL32S6M and UL10W8M in Table II, had the same replication time as identically structured loops UL32S8V and UL10W8V, but required dramatically more rules in their transition functions for replication.
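One way to picture what the strong rotational symmetry requirement means for a rule table is sketched below (a plausible implementation device, not the authors' code): a rule written for the neighborhood pattern (C, N, E, S, W) must apply equally to the three patterns obtained by rotating it through 90°, 180° and 270°, so only one canonical representative of each rotation class need be stored and every pattern is canonicalized before lookup.

    def rotations(neigh):
        """The four 90-degree rotations of a 5-neighborhood (C, N, E, S, W)."""
        c, n, e, s, w = neigh
        return [(c, n, e, s, w), (c, e, s, w, n), (c, s, w, n, e), (c, w, n, e, s)]

    def canonical(neigh):
        """A fixed representative of the rotation class (the lexicographically least one)."""
        return min(rotations(neigh))

    def lookup(rule_table, neigh):
        """Apply a strongly rotation-symmetric rule table to any rotation of a pattern;
        under weak symmetry, oriented states would also have to be permuted here."""
        return rule_table.get(canonical(neigh), neigh[0])   # no matching rule: center unchanged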
3.4 Reduced Rule Sets

As noted earlier, the complete transition function for self-replicating loops includes a number of rules that are extraneous to the actual self-replication process (such as instruction sequence erasure) and many rules which simply specify that a cell state should not change. The state change rules, the subset of rules that specify that a cell’s state should change, alone are completely adequate to encode the replication process. As noted above, we believe that the number of state change rules used for one replication is thus the most meaningful measure of complexity of transition functions supporting self-replication. As shown in the sixth column of Table II, this measure indicates that, from an information processing perspective, algorithms for self-directed replication can be relatively simple compared to what has been recognized in the past, especially when oriented components are present. The simplicity of unsheathed loop transition functions when oriented components are used is even more striking if one permits the use of unrestricted placeholder positions in encoding their rules. We implemented a search program that takes as input a set of rules representing a transition function, and produces as output a smaller set of reduced rules containing “don’t care” or “wildcard” positions [58]. The size of the reduced rule sets that result from applying this program to the complete original set of rules and to only the replication rules of each of the cellular automata models described above is shown in the rightmost two columns of Table II. With UL06W8V this procedure reduces the complete rule set from 101 to 33 rules, and the set of rules needed for one replication from 58 to 20. Thus, by capturing regularities in rules through wildcard or “don’t care” positions, it is possible to encode the replication process for unsheathed loop UL06W8V in only 20 rules. Computer simulations verified that these 20 rules can guide the replication of UL06W8V in exactly the same way as do the original rules. As shown in Table II, similar reductions occur with other self-replicating structures. Such simple systems indicate that self-replication can in principle be far simpler than previously recognized.
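The sketch below conveys the flavor of such a reduction; it is only an illustration of the idea, not the search program of [58], and any reduced rule set must of course still be verified by simulation as described above. It repeatedly merges pairs of rules that prescribe the same next state and differ in exactly one neighborhood position, replacing that position with a wildcard.

    WILD = '*'   # "don't care" marker

    def merge_once(rules):
        """rules maps (C, N, E, S, W) patterns, possibly containing WILD, to next states.
        Merge one pair of rules that agree on the next state and differ in one position;
        return the reduced table, or None if no such pair exists."""
        items = list(rules.items())
        for i, (p1, s1) in enumerate(items):
            for p2, s2 in items[i + 1:]:
                if s1 != s2:
                    continue
                diff = [k for k in range(5) if p1[k] != p2[k]]
                if len(diff) == 1:
                    merged = tuple(WILD if k == diff[0] else p1[k] for k in range(5))
                    reduced = {p: s for p, s in rules.items() if p not in (p1, p2)}
                    reduced[merged] = s1
                    return reduced
        return None

    def reduce_rules(rules):
        """Greedily apply merges until no further reduction is possible."""
        while True:
            reduced = merge_once(rules)
            if reduced is None:
                return rules
            rules = reduced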
4. Emergence of Self-replication
The self-replicating structures described so far have all been initialized with an original copy of the structure that will replicate (the “seed”) and have been based on manually created transition rules designed for that single, specific structure. Recently, we have taken a new direction in creating self-replicating structures, focusing on self-replication as an emergent property. In this section we give two examples of our work in this area. The first example shows an approach where no initial replicants are present. Instead, self-replicating structures emerge from initial states consisting of random isolated components. The second example shows how, given a small but arbitrary initial structure, a genetic algorithm can be used to automatically discover a set of transition rules that will cause that structure to replicate.
4.1 Emergence of Replicators
Recent work by our group has shown that it is possible to create cellular automata models in which a simple self-replicating loop emerges from an initial state having a random density and distribution of components (the “primordial soup”) [8]. These emergent self-replicating loops employ a general purpose rule set that supports the replication of loops of different sizes and their growth from smaller to larger ones. This rule set also allows random changes of loop sizes and interactions of self-replicating loops within a cellular automata space containing free-floating components. An example running in a randomly initialized, small (40 x 40) cellular automata space using an initial component density of 25% is shown in Fig. 9. Periodic boundary conditions are used (opposite edges are taken as connected), so the space is effectively a torus. Initially, at time t = 0 (upper left of Fig. 9), the space is 25% filled by randomly placed, non-replicating components designated as 0, >, or L, while cells in the quiescent state are indicated by blank spaces. All components have strong rotational symmetry except > which is viewed as being oriented, as described above. This simulation is characterized by the initial emergence of very small, self-replicating loops and their progressive evolution to increasingly large and varied replicants. During this process a replicating loop may collide with other loops or with free-floating components, and either recover or self-destruct. Thus, by time 500 (upper right of Fig. 9), very small self-replicating loops of size 2 x 2 and 3 x 3 are present. By time 1500 a 4 x 4 loop is about to generate a 5 x 5 loop in the middle left region. At time 3000 the biggest loop is 8 x 8 and it is about to generate a 9 x 9 loop. By time 5000 many very large loops have annihilated each other and only one intact 10 x 10 loop is left. By time 7500 all large loops have “died”, but there are new 3 x 3 loops in the space. These loops will replicate and it is not
FIG.9. A running example of emergent self-replication. Times are shown.
clear when (if ever) self-replication will cease. In this example, the size of the replicating structures became too big to fit comfortably in such a small world (40 x 40 only), and the large loops tended to annihilate each other. As can be seen from this example, the transition function supporting these self-replicating loops differs from those used in previous cellular automata models of self-replication in several ways. A self-replicating structure emerges from an initial random configuration of components rather than being given, replication occurs in a milieu of free-floating components, and replicants grow and change their size over time, undergoing annihilation when replication is no longer possible. All of this occurs in the presence of a single transition function based on the 9-neighborhood (Fig. 1). As is increasingly being done in cellular automata modeling, the transition function is based on a functional division of data fields [67]. As seen in Fig. 10, the bit depth of a cellular automata cell (in our case 8 bits) is functionally divided into four different fields (4, 2, 1 and 1 bits each) such that each field has a different meaning and function for the rule writer. The utilization of field divisions greatly simplifies the cellular automata rule programming effort, and makes the resulting rules much more readable. In the illustrations in this paper, only the component field is shown. As noted earlier, each non-quiescent or active cell is taken to represent a potential “component” of a cellular automata structure. A cellular automata structure can be just a single cell, i.e., one with no conceptual connection with any adjacent non-quiescent cells, and in that case we call it an unbound component. On the other hand, a cellular automata structure can consist of several contiguous non-quiescent cells that are functionally interrelated, behaving as a whole, such as a self-replicating loop. In the latter case we call the structure a multi-component
FIG.10. The 8-bit state variable in each cell is conceptually sliced into four different bit groups called fields. Each field represents a specific piece of information.
structure or simply a structure, and we call its components bound components (their bound bit is set; see Fig. 10). The four data fields (Fig. 10) and their states in the transition function are as follows. The four-bit component field accounts for most normal operations of cellular automata structures. It encodes twelve state values (out of 16 possible) corresponding to components just as in the previous examples we have seen. These include 0 (building block of data paths), > (signals growth of data path; this actually represents four states), B (birth of new component), L (left turn signal), C (corner), and D, E, F (branching/detachment). There is also the quiescent state which is as usual shown as white space in all figures. The other fields are new. A two-bit special field denotes special situations that arise occasionally in the cellular automata space, such as branching, blocking passage of signals on a data path, or dissolution of a loop. A one-bit growth field, if set, marks a stimulus that may cause the existing signal sequence to increase in length. A one-bit bound field, if set, marks a cell as part of a multi-cell structure; otherwise the cell is an unbound component. The complete set of rules forming the transition function supports replication of loops in a fashion similar to those used in the past [33, 57]. In addition, a loop’s replicant can be of a different (larger) size, a process referred to as extended replication. A loop’s signal sequence can become modified to generate loops larger than itself if by chance an active growth field appears in one of its cells during the arm branching process. Cellular automata rules that support extended replication are new. In the past, a different rule set has been required for each size of replicating loop; here the emergence of different size loops and their simultaneous replication is supported by a single rule set. This permits an initially small emergent self-replicating structure to grow in size. Another new aspect of this model is collision detection and resolution. In all past work on self-replicating loops, replication occurs in an otherwise empty space and the transition function does not need to handle unanticipated events. In other words, while writing the rules one has complete control over the behaviors occurring in the cellular automata space, including the initial state. In contrast, here the very first assumption is that there is no a priori knowledge about the interactions between self-replicating loops, or what the cellular automata space is like at time zero. Although the rules in the previous models of replication that we have considered so far can reliably direct a structure to do replication in isolation, they cannot guarantee that a structure will not run into another structure, that two structures will not try to replicate into the same region of the cellular automata space, or that a replicating loop will not run into free-floating unbound components. These factors are all “randomly” determined. The transition function used here thus assumes that not all designated regular procedures will always be followed without interruption or disturbance from other structures. It includes rules that will detect failed procedures and clean up the cellular automata space after
such failures. When a loop has any of its cells enter a failure mode, this mode quickly spreads throughout the whole structure, causing the loop to dissolve completely. The loop’s components become unbound and revert to being controlled by the rules governing unbound components. There is no a priori information about when and where growth bits should be placed in this model of emergent replication, and none are set initially. In the example shown here, whenever a signal L dissolves or “dies”, it leaves behind a growth bit at its location. A loop usually has only one L signal, so one dissolving loop usually produces one new growth bit in the cellular automata space. This way, the generation of the growth bit becomes part of the behavior of the cellular automata space, since when and where a loop will dissolve is determined purely by the interactions within the cellular automata space. The growth bit is utilized during the arm branching phase of a self-replicating loop to extend the signal sequence in a loop. As shown in Fig. 11, this is a two-step strategy. First, if a
FIG. 11. The growth of a larger loop (extended replication). At time 0 the branch special flag in the lower left cell and the growth bit in the middle right cell are both set. At time 2 the normal arm branching EF signal sequence is generated. At time 3 the signal sequence becomes >>> and subsequently the growth bit is unset. By time 8 the parent loop is about to start the replication cycle with one more > signal than it normally has. By time 47 a whole new loop bigger than the original one is generated. By time 58 the two loops have separated and the original one is just about to start another replication cycle. At time 69 the new, larger loop is finished and is starting its own replication cycle.
signal > sees a growth bit in its place and it is the last > before the signal L, it does not copy the signal L behind itself as it normally does. Instead, it stays at its current value > for one more time step, thus effectively increasing the size of the signal sequence by one. The signal L disappears temporarily since it is not copied, but reappears when the signal > sees a trailing signal F and the growth bit in its position. The growth bit is unset after the signal L is regained, so the same growth bit does not cause another growth stimulus. Thus, when a loop dies, it leaves a set growth bit behind, and when a loop expands, it consumes a growth bit. This provides an interesting ecological balancing factor in the cellular automata universe. The emergence of self-replication is achieved by allowing the unbound components to translate and change or appear at “random”, i.e., by “stirring the primordial soup”, until the configuration corresponding to a small (2 x 2) loop occurs by chance. The rules that do this can be summarized as follows (a brief sketch in conventional code follows the list):
• If a quiescent cell has exactly three active neighbors, it becomes active at the next time step. Its active value is determined based on the state of its neighbors.
• If an active cell has exactly two or three active neighbors, it will stay active; otherwise, an active cell will return to the quiescent state at the next time step.
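A minimal sketch of these two rules in conventional code is given below, together with one possible packing of the field-sliced 8-bit cell state of Fig. 10. The particular bit layout, the treatment of bound cells, and the simplified choice of birth value are assumptions made for illustration; they are not the published implementation.

    COMP_MASK  = 0x0F   # bits 0-3: component field (0 = quiescent)
    SPEC_MASK  = 0x30   # bits 4-5: special field
    GROW_MASK  = 0x40   # bit 6: growth field
    BOUND_MASK = 0x80   # bit 7: bound field

    def is_active(state): return (state & COMP_MASK) != 0
    def is_bound(state):  return (state & BOUND_MASK) != 0

    def step_unbound(grid):
        """One synchronous update of the unbound components on a toroidal grid using the
        two rules above; bound cells (parts of loops) are left to the loop rules (not shown)."""
        n, m = len(grid), len(grid[0])
        new = [row[:] for row in grid]
        for i in range(n):
            for j in range(m):
                if is_bound(grid[i][j]):
                    continue
                active_neighbors = sum(
                    is_active(grid[(i + di) % n][(j + dj) % m])
                    for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0))
                if not is_active(grid[i][j]):
                    if active_neighbors == 3:
                        # birth; the published rules derive the component value from the
                        # neighbors' states, simplified here to an arbitrary component 1
                        new[i][j] = (grid[i][j] & ~COMP_MASK) | 1
                else:
                    if active_neighbors not in (2, 3):
                        new[i][j] = grid[i][j] & ~COMP_MASK   # reverts to quiescent
        return new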
These rules are generalizations (from binary to non-binary states) of those used in the Game of Life described earlier. These rules generally produce a continually varying distribution of unbound components. All that is then required for the emergence of self-replication is a small set of rules that watch for the formation of the smallest loop configuration (a 2 x 2 loop). Once such a configuration occurs, all four members of it simultaneously set their own bound bit and produce an active smallest loop at the next time step. This is how the first self-replicant is formed. This is possible using only local operations because the minimum loop configuration is so small that it fits within a single 9-neighborhood, allowing each component to simultaneously “see” the same configuration. An example of how the unbound component rule set works and how it leads to the first self-replicating structure is demonstrated in Fig. 12. The behavior of this model of emerging self-replication has been examined experimentally [8]. Eighty-one simulations were conducted while varying the cellular automata space size (50 x 50, 100 x 100, 150 x 150 and 200 x 200), initial unbound component density (10%, 20%, 30%, 40% and 50%) and random initial configuration used in each simulation. In 80 of these 81 simulations, self-replicating loops emerged, and usually these persisted indefinitely. The emergence,
* The one simulation where self-replication did not occur was with the small, 50 x 50 space, where significant variations in unbound components ceased before the configuration of the smallest self-replicating loop appeared.
FIG.12. The emergence of a self-replicating structure. Components of structures are marked by a non-zero bound bit, or an "!" mark. At time 0 a randomly generated initial space is given. This space has only unbound components until time 8, when the pattern of the smallest replicating loop (circled) appears. At time 9 this configuration turns into a functioning self-replicating loop when its four cells set their bound bit simultaneously (set bound bits are indicated by faint exclamation points). Its peripheral cells clear and the arm branching process begins (times 10 to 13). By time 28 the first sibling is about to separate. By time 51 four loops are obtained and all are actively engaged in the replication process.
proliferation and persistence of self-replicating loops were found to be robust phenomena relatively insensitive to the initial conditions of a simulation. There is a very stable and characteristic dynamics under the emergent self-replicating rule set. In fact, the number of active cells, and the fraction of bound/unbound components, always tended to approximate a long-term stable value. This value depends on an interaction between the rules governing replication and those governing movement of unbound components, and not on either of these subsets of rules alone. The number and size of replicating loops generally stabilizes too. After a few thousand time steps, there is typically no significant change in the average number and size of loops in the cellular automata space. These values tend to oscillate in a non-periodic, varying-amplitude fashion about a mean, suggesting an underlying chaotic dynamics. Details of these experiments can be found in [8]. These results show for the first time that non-trivial self-replicating structures can emerge in a cellular automata space initialized with a randomly distributed set
of components. Some other computational studies of emergent self-replication have been done (see Chapter 28 of [31], and [48]), but these have not used cellular automata methods. For example, the investigation in [48] used a very different (non-cellular automata) model having an initial state composed of randomly generated sequences of computer operations. It evolved self-replication via a mutation operation. The primary conclusion, backed up by simulation results, was that the probability of a randomly generated sequence of operations becoming selfreplicating increased with the number of computer operations it contained. Further, self-replicating sequences decreased in size once they appeared. The cellular automata model described here shows that such behaviors are not necessarily an inherent aspect of emergent self-replication, in that very small self-replicants can arise first and then increase in size, as is often argued to have occurred with the origins of biological replication. We attribute the differences in results to the fact that our cellular automata model starts with random individual components rather than random initial sequences of computer operations, that its rules were hand crafted, and that cellular automata are based solely on highly local operations (e.g., there is no global copy operation that copies a loop to a nearby region of the space).
4.2 Automatic Programming of Replicator Rules
In the past, the rules or transition function governing self-replicating structures have always been programmed manually. This is a very difficult and time-consuming task, and it is also influenced by subjective biases of the implementer. As an alternative, we have recently shown that it is possible to automatically generate a set of rules for self-replication, i.e., to automatically program a cellular automata space to generate a sequence of steps that the components of a structure can follow to produce replicants of the original structure [36, 37]. While work in this area is just beginning and the structures used so far are quite small, initial results have already created a new class of non-trivially replicating structures unlike those developed previously. The approach we describe here is based on using genetic algorithms, a powerful stochastic search technique, to discover rules producing self-replication. A genetic algorithm produces a solution to a problem by manipulation of a population of candidate solutions [19, 20, 24, 31, 42]. Each individual in the population, called a chromosome, encodes a potential solution to the problem under consideration. Typically the population is initialized with randomly generated chromosomes (see Fig. 13). Each existing chromosome (problem solution) has its effectiveness as a solution to the problem measured by a fitness function. Then, simulating natural selection as it is understood to occur in biological evolution, the most highly fit chromosomes are selected to serve as parents for offspring chromosomes that form the next generation. This process occurs repeatedly
initialize population of chromosomes
evaluate fitness of each chromosome
while (termination criterion not reached) do
    select parent chromosomes for mating
    apply crossover and mutation to produce children
    evaluate fitness of each chromosome
end

FIG. 13. Brief summary of traditional genetic algorithm.
with progressively better solutions being represented in the population. The genetic algorithm typically terminates after identifying a sufficiently good problem solution or after a prespecified number of generations. A key aspect of this genetic search process is the use of genetic operators, such as crossover and mutation, in producing offspring chromosomes. Crossover takes two parent chromosomes and swaps randomly selected parts of their contents to form two offspring chromosomes (Figure 14(a)). Mutation takes one parent/offspring chromosome and complements randomly selected bits (Figure 14(b)). These alterations to the population of chromosomes, coupled with fitness-guided selection of parents, allow the genetic algorithm to heuristically and stochastically search the space of problem solutions.
FIG. 14. The two most commonly used genetic operators are crossover, illustrated on the left, and mutation, illustrated on the right. Each chromosome here is a binary string. These operations introduce variability into the chromosomes manipulated by the genetic algorithm.
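As a small illustration, the following sketch implements the two operators of Fig. 14 on chromosomes represented, as in that figure, as binary strings. The function names and the per-bit mutation rate are illustrative assumptions, not the parameter settings used in the experiments described below.

    import random

    def crossover(parent1, parent2):
        """Single-point crossover: swap the tails of two equal-length bit strings."""
        point = random.randrange(1, len(parent1))     # randomly chosen crossover point
        return (parent1[:point] + parent2[point:],
                parent2[:point] + parent1[point:])

    def mutate(chromosome, rate=0.05):
        """Point mutation: complement each bit independently with probability `rate`."""
        flipped = []
        for bit in chromosome:
            if random.random() < rate:
                flipped.append('1' if bit == '0' else '0')   # complement this bit
            else:
                flipped.append(bit)
        return ''.join(flipped)

    child1, child2 = crossover('110100101', '001011010')
    child1 = mutate(child1)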
Relatively few previous studies have reported using a genetic algorithm to automatically produce rule tables for cellular automata (see, for example, [42, 59]). For self-replication, there are some clear barriers to using genetic algorithms in this way. One barrier is the enormous computational load involved. At each iteration of the genetic algorithm, a whole population of cellular automata models must be run, each individually involving a substantial amount of computation. This process must be done repeatedly, generation after generation, and the fitness of each individual rule table evaluated at each generation. Further, the space of possible rules that must be searched is enormous. A second barrier is that it is not obvious what form a good fitness function should have. The straightforward approach of making fitness proportional to the number of replicants produced is generally useless. This is because there are typically no replicants produced by any randomly generated initial rule set in a population, so counting the number of replicants produced gives no guidance early on, reducing the genetic algorithm to blind search. Fortunately, it has proved possible to solve these problems, at least to a limited extent [36, 37]. Figure 15 summarizes a genetic algorithm that has been applied successfully to this task. The genetic algorithm begins by generating a population of randomly initialized rule tables, and uses these to execute cellular automata simulations, each starting with the same initial structure. Following these simulations, each rule table in the population receives a fitness measure F reflecting the degree to which its rules appear promising as a means of supporting self-replication. A new population is then created, randomly choosing rule tables to carry forward to the new population in proportion to their fitness. As the new population is formed, rule tables from the old population are “mixed together” and combined through the genetic operation of crossover, and randomly altered by mutation, as explained above. (An exception is that a copy of the very best rule table in a population is always carried forward unchanged.) At this point, the whole process iterates, this time starting with the new population of rule tables and discarding the old. Typical parameter values in a simulation like this include a population of 100 rule sets examined over 2000 generations, with probabilities of crossover and mutation of 0.8 and 0.1, respectively. At the end of this process, the most highly fit rule table is returned as a potential transition function supporting self-replication with the given initial structure. Figure 16 shows the encoding of a rule table used by the genetic algorithm in this process, i.e., a chromosome representing one individual in the population. The rule table is indexed on the left by the 5-neighborhood pattern CNESW (center, north, east, south, west), and rules for each specific component are grouped together. Each rule has a “next state” entry indicating what the center cell component C should become at the next time step for the given neighborhood pattern. By adopting the convention that a rule for every possible neighborhood pattern must be represented in a chromosome, and that these are always in the
[Fig. 15 flowchart: Population of Randomized Rule Tables → Evaluate Population (run 100 simulations; compute fitnesses F1, ..., F100; extract statistics; determine best-of-generation fitness) → Create New Population (generation g+1): linear normalization of fitnesses; selection by roulette wheel sampling; generational replacement with elitism; crossover by repeated single-point crossover within gene segments; mutation by point mutation of actions/states.]
FIG. 15. Schematic overview of the use of a genetic algorithm to search for a rule set that produces self-replication when given a simple but arbitrary initial structure in a cellular automata space. Given an arbitrary structure, an initial population of possible transition functions is randomly generated (top). In general, none of these initial rule sets will cause the given structure to replicate. Each rule table or transition function is then tested and its potential (“fitness”) to be changed into rules that do produce self-replication is measured. If a rule set that does produce self-replication has been found, the program quits and returns that rule table. If not, a new population of rule tables is created by selecting the most promising or most fit existing rule tables. These promising rule tables are modified via crossover and mutation, and the entire process repeats.
same order, it is not necessary to explicitly store the CNESW neighborhood patterns. Thus a chromosome is represented as just a list of next-state entries (i.e., just the next state list indicated on the right in Fig. 16). For the simulations described below, chromosomes were roughly 850 next-state elements long. Creating a fitness function F that accurately measures the promise of a rule table for eventually generating self-replication of an arbitrary initial structure was the most challenging aspect of this work. None of the initial random rule tables produce replicants, so in this sense each has a zero fitness. This issue was
FIG. 16. Encoding of a rule table used to represent a chromosome in the genetic algorithm of Fig. 15.
addressed by creating a fitness function F that is a linearly weighted sum of three measures,

F = w_g f_g + w_p f_p + w_r f_r

where the w's are fixed weights (0 < w < 1) and the f's are fitness measures (0 ≤ f ≤ 1). The basic idea here is that an intermediate state on the path to evolving rules for self-replication is the evolution of a rule set that produces growth and/or configurations similar to that of the seed structure. Thus, the overall fitness F includes a growth measure f_g assessing the extent to which each component type in a given initial structure generates an increasing supply of that component from one time step to the next, and a relative position measure f_p assessing the extent to which each component has the same neighbor components over time as it did in the initial structure. High values of f_g and f_p do not necessarily imply that replication is present (although replication, if present, would be expected to make them relatively large), but they do represent behaviors that might be useful precursors to replication. The third term in F, the replicant measure f_r, is a function of the number of actual replicants present. While this is zero for many early generations with a rule table, it can cause a substantial rise in F if actual replication occurs. How should the three weights in F be chosen to maximize the chances of success with this approach? There is no precise answer that can be given to this question at present. Systematic experiments have suggested that w_g = 0.05, w_p = 0.75, and w_r = 0.20 is a good set of values when the weights are constrained to sum to 1.0 [37]. In other words, the relative positioning measure proved to be the most critical factor in discovering rules for self-replication.
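A minimal sketch of this composite fitness, using the weight setting just quoted, is given below. How the three component measures f_g, f_p and f_r are computed from a simulation run is not shown; they are simply assumed to be available and normalized to the interval [0, 1].

    def fitness(f_g, f_p, f_r, w_g=0.05, w_p=0.75, w_r=0.20):
        """Linearly weighted sum of the growth, relative-position and replicant measures."""
        assert all(0.0 <= f <= 1.0 for f in (f_g, f_p, f_r))
        return w_g * f_g + w_p * f_p + w_r * f_r

    # A rule table that preserves relative positions well still receives useful guidance
    # from f_p even before any replicant has appeared (f_r = 0).
    F = fitness(f_g=0.4, f_p=0.9, f_r=0.0)   # 0.02 + 0.675 + 0.0 = 0.695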
To assess the success of the above approach, 100 experiments were done with each of several small arbitrary initial seed configurations. The rate of success in discovering rules producing self-replication declined sharply as the number of components in the initial structure increased. Under the best conditions, the percentage of runs in which the genetic algorithm discovered a rule-table that resulted in self-replication was 93% for structures with two components, 22% for structures with three components, and 2% for structures with four components. A representative example of a self-replicating structure discovered in this fashion is shown in Fig. 17. The naming convention used here to catalog these
FIG.17. A 4-component self-replicating polyominoe. Its initial state is shown at the upper left (t = 0). Several replicants can be seen by t = 7 and t = 8.
structures is similar to those for loops except the prefix PS is used (for “polyominoe structure”) to designate the arbitrary block-like shape of the initial structure. Structure PS4W17V in Fig. 17 provides a typical example. It is a four-component replicator for which multiple replicants can be observed by t = 5. Like self-replicating loops, these structures gradually form expanding colonies. The replicators discovered in this fashion by the genetic algorithm can be viewed as forming a third class of self-replicating structures (the first two classes being complex universal computer-constructors and self-replicating loops). In addition to being formed from arbitrary non-loop seed structures, the replicators discovered in this fashion generally move through the cellular space, depositing copies as they go, a design that has apparently never been adopted in previous manually created cellular automata models of replication. For example, the 4-component replicator in Fig. 17 can be viewed as going through transformations as it translates to the right (relative to its initial position, which is marked by the origin of arbitrary coordinate axes in the figure), periodically reappearing in its original form (t = 3, 6, ...) as it gives off replicants in the upper right quadrant (t = 4, 7, ...) that themselves are rotated and moving upwards. Further details can be found in [37].
5. Programming Self-replicating Loops

We observed earlier that, in discussing self-replicating systems in cellular automata, there are two levels of abstract machines: the individual cells of the cellular space, and the configuration of “components” that, as an aggregate, jointly represent a self-replicating structure. We can thus speak of programming either type of machine. In the former case, the rule table representing the transition function is the “program” directing a cell’s behavior. In the latter case, the sequence of signals on a tape or loop that direct the self-replication forms a program. In this section we are solely concerned with the latter case when we refer to programming a self-replicating structure. The concept of programming self-replicators can be traced back to von Neumann’s original universal computer-constructor [70]. The set of instructions (signals) or description on the replicating structure’s tape that describe its own structure can be viewed as the machine’s program. Similarly, the sequence of instructions that circulate around a self-replicating loop form a program that directs the loop’s replication. Such programs have only been concerned with replication of the loop in the past. During recent years, however, the idea of programming self-replicators to do more than just replicate has been receiving increasing attention. The underlying idea is that the signal sequences directing a structure’s replication can be extended in some fashion to solve a specific class of problems while replication occurs. The motivation for such programmed
replicators is that they provide a novel, massively parallel computational environment that may lead over the long term to powerful, very fast computing methods. Two different approaches have been taken so far.
5.1 Duplicated Program Sequences
Perhaps the most straightforward approach to programming self-replicating loops to solve problems is simply to extend the sequence of signals circulating around the loop, adding additional signals representing a program that carries out some task. This application program is copied along with the replication program unchanged from generation to generation as the loop replicates, and is executed once by each loop in between replications. The viability of this approach was recently demonstrated by programming partially sheathed loops to construct a pattern (the letters LSL) in the interior of each replicated loop [65]. Using a loop with four arms based on the 9-neighborhood, it was possible to create extra space on the loop for an application program by factoring more of the replication process into the loop’s transition function. In other words, rather than the growth process of the child loop being directed to occur by a sequence of > signals as in the examples above, such growth/extension was the default that occurred automatically. The instruction sequence thus needed to consist only of appropriately delayed L signals indicating when the growth process should change direction to start a new side of the loop. The price paid for automatic loop growth and the execution of an application program is in terms of the complexity of the rule set: typically, on the order of a few hundred rules are required in this situation [65]. A practical problem with the above approach to programming self-replicating loops is the restricted amount of space available along a loop for application programs and data. This problem can be solved by adding “tapes” to the loops [50]. This is analogous to the tapes used in the earliest universal computer-constructor replicators [11, 70]. The basic idea is illustrated in Fig. 18. Starting with a sheathed loop (Fig. 18(a)), two vertically descending tapes of arbitrary size are added to one side of the loop (Fig. 18(b)). The left tape is used to store a signal sequence representing an application program (signal locations marked by Ps), while the right tape is used to store problem data (item locations marked by Ds). Reading heads, e.g., the H in the right part of the instruction tape sheath in Fig. 18, move up and down the tapes. As the program reading head reaches an instruction P, that instruction may cause a signal to move along the sheath to act on part of the data tape. Using this approach with the 5-neighborhood, it has been shown that one can program a self-replicating loop to perform parentheses checking [50]. An expression with parentheses is represented on the data tape, and the program checks that the parentheses are well-formed or balanced, a computation that corresponds to recognition of a non-regular language. A parent loop first replicates itself and
FIG. 18. (a) Self-replicating sheathed loop discussed earlier [33], reproduced here for comparison purposes. (b) Schematic illustration of how the sheathed loop can be extended to support programming [50]. Two potentially infinite, vertical “tapes” have been added, one to include a program (lower left) and one for data (lower right). Each P designates a simple program instruction and each D a unit of data. The Ps and Ds are replaced by specific signals in solving a problem.
copies its program and data tapes unchanged onto its child loop. It then executes its program to check the parentheses for balance. This process uses cells having 63 states and roughly 8500 state change rules in the transition function. Because of the presence of tapes, replication is restricted to the two horizontal directions only, at least in a two-dimensional cellular automata space. It can be shown that tape-extended self-replicating loops are capable of executing any desired program [50]. Thus, in principle such extended self-replicating loops exhibit computational universality just as did the earliest self-replicating structures, yet they are qualitatively simpler.
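For comparison, the computation itself is easy to state outside the cellular framework; the sketch below checks an expression for balanced parentheses in the conventional way. It illustrates only the task, not how the tape-extended loop carries it out.

    def parentheses_balanced(expression):
        """Return True if every ')' matches an earlier '(' and all '(' are closed."""
        depth = 0
        for ch in expression:
            if ch == '(':
                depth += 1
            elif ch == ')':
                depth -= 1
                if depth < 0:        # a ')' with no matching '(' to its left
                    return False
        return depth == 0

    assert parentheses_balanced("(()(()))")
    assert not parentheses_balanced("(()")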
5.2 Expanding Problem Solutions
The programmable self-replicating loops described above literally encode a set of instructions on the loop or an attached tape that directs solution of a problem. This application program is copied unchanged from parent to child, so each generation of loops is executing exactly the same program on exactly the same data. This demonstrates the feasibility of programming self-replicating loops and might find use in some applications, but a more general approach would allow the program and data to change over time in some systematic fashion. While such a generalized approach has not been examined yet, it has proved possible to append potential problem solutions to the replication instruction sequence circulating on a self-replicating loop [9]. Unlike the above approach, the initial problem solution is not copied exactly from parent to child but is modified from generation to generation. Each child loop gets a different partial problem solution. If a loop determines it has found a valid complete problem solution, it stops replicating and
retains that solution as a circulating pattern in its loop. On the other hand, if a loop determines its partial solution is not useful, the loop “dies”, erasing itself without descendants. Thus, the process of forming a colony of loops can be viewed as a parallel state space search through the space of problem solutions. At the end of this process when replication has stopped, the cellular space contains one or more non-replicating loops, each with a circulating sequence of signals that encodes a valid problem-solution (assuming such a solution exists). We recently applied this approach of generating possible solutions and selectively discarding non-viable ones to solve satisfiability problems (SAT problems), a classic example of an NP-complete problem [27]. Given a boolean predicate like P = (¬x1 ∨ x3) ∧ (x1 ∨ ¬x2) ∧ (x2 ∨ ¬x3), the SAT problem is: “What assignment of boolean values to the binary variables x1, x2 and x3 can satisfy this predicate?”, i.e., what assignment can make this predicate evaluate to True? In this case, P will be true if x1 = 1, x2 = 1 and x3 = 1, for example. The predicate P here is in conjunctive normal form, where each part of the predicate surrounded by parentheses is called a clause. A SAT problem is usually designated as an m-SAT problem if there are m boolean variables in a clause of its predicate. Therefore, the above example P is a 2-SAT problem. Figure 19 illustrates the generate-and-select process for a self-replicating loop carrying 3 binary bits representing the three variables x1, x2 and x3 used in predicate P. In the initial loop at t = 0, unexplored bits are represented by the symbol A. These As replace some of the o’s forming the data path in the self-replicating loop. The original growth signal > is also replaced by the symbol + reflecting some minor differences in the replication process (the data path symbol 0 used in earlier figures has also been changed here to lower case o typographically to avoid confusion with the digit zero). Explored bits are represented by either digit 0 (“false”) or 1 (“true”) in the loops. The bit sequence that a loop carries is read off counterclockwise starting right after the L symbol. Thus, for example, the lower left loop in Fig. 19 at t = 124 carries the sequence 001. Without the selection process, in three generations all eight possible boolean assignments for the variables used in P would appear, carried by eight loops in the cellular automata space, assuming that no collisions occurred. Loops stop replicating once they have explored all of their A bits. Since the exploration of bits is done one bit at a time at each generation, and since at each exploration step a different bit appears in the parent and child loops, we can be sure that all possible boolean assignments will be found with the generation process, if there are no collisions of loops in the space. If collisions do occur, a loop unable to replicate initially will continue trying until space appears for it to do so. To remove those loops which do not satisfy a SAT predicate, each cell in the space serves as a monitor. Each monitor tests a particular clause of the SAT predicate. If the condition a cell is looking for in its role as a monitor is found, it will “destroy” the loop passing through it. For the specific predicate P, three classes of
FIG. 19. The generation and selection of satisfying boolean assignments by self-replicating loops for the predicate P given in the text. The monitoring of circulating loop signals by each cell provides the selection process. At time 0, the initial loop is placed in the cellular automata space and carries unexplored binary bits represented as AAA. By time 44 the first replication cycle has completed and there are two loops in the cellular automata space. The first binary bit has been explored, resulting in the first A being converted into 0 and 1 in the two resulting loops. By time 82 the second replication cycle has completed and there are four loops in the cellular automata space. Starting at time 84 the top loop is being destroyed (note the missing corner cell of the loop). Its bit sequence '01A' does not satisfy the second clause in predicate P, so it is being erased by the monitor underneath its top-right corner. At time 86 the erasing process continues while the other loops start their next replication cycle. At time 124 the third (and last) replication stage is completed and there are six loops in the cellular automata space. Four of these loops do not survive the selection process for long and are erased (times 129 and 131). Finally, two satisfying assignments 000 and 111 remain in the cellular space at time 134.
For the specific predicate P, three classes of monitors, each testing for one of the following conditions, are planted in the cellular automata space: x1 ∧ ¬x3, ¬x1 ∧ x2, and ¬x2 ∧ x3. These conditions are just the negated clauses of P. If any one clause of predicate P is not satisfied, the whole predicate will not be satisfied. A monitor will destroy a loop passing through it if its corresponding clause is found to be unsatisfied by the bit sequence carried by the loop. This detection process is done in linear time since essentially each monitor is just a finite automaton, and the bit sequence passing through it can be seen as a string being tested for regular expression recognition. With enough properly distributed monitors in the cellular automata space, all unsatisfying solutions can effectively be removed.
Some steps of the generation and selection process for the same 3 x 3 loop are shown in Fig. 19. Starting with one initial loop carrying a totally unexplored bit sequence AAA at t = 0, 0AA appears in the parent loop and 1AA in the child loop in the first generation (t = 44). In the second generation two new loops carrying 01A and 11A are obtained; the two parents now carry 00A and 10A. If all goes well, in the third and final generation, we should get four more loops 011, 111, 001 and 101; the four parents would carry 010, 110, 000 and 100. If no selection and no collisions occurred, then there would be all eight possible values for a 3-bit binary sequence. However, it can be seen in this figure that some of the loops are destroyed or never even generated after the second generation. For example, the topmost loop at t = 82 is erased (t = 84, t = 86). Since it has been found (by the monitors) that this loop's partially explored bits 01A do not satisfy one of the clauses, there is no need to explore further since all of its descendants would carry the same binary bits. In three generations only two loops are left in the cellular automata space instead of eight (t = 134). These two loops carry exactly the only two satisfying boolean assignments for the original SAT predicate P, which are 000 and 111. The details of this process are given in [9].
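As an illustration of the logic just described, the following minimal Python sketch abstracts away the cellular automaton entirely: partial bit assignments stand in for loops, and the negated clauses of P stand in for the monitors. It is only a sketch of the generate-and-select idea under these simplifying assumptions, not the implementation reported in [9].

```python
# A minimal sketch (not the authors' implementation) of the generate-and-select
# idea, with the cellular automaton abstracted away: partial assignments play the
# role of loops, and the negated clauses of P play the role of monitors.

# Clauses of P = (NOT x1 OR x3) AND (x1 OR NOT x2) AND (x2 OR NOT x3),
# each clause given as (bit index, satisfying value) pairs over positions 0..2.
CLAUSES = [((0, 0), (2, 1)),   # NOT x1 OR x3
           ((0, 1), (1, 0)),   # x1  OR NOT x2
           ((1, 1), (2, 0))]   # x2  OR NOT x3

def clause_violated(partial, clause):
    """A monitor 'destroys' a loop when every literal of its clause is already
    falsified by the explored bits; unexplored bits (None) leave it alive."""
    return all(partial[i] is not None and partial[i] != v for i, v in clause)

def generate_and_select(n_bits=3):
    colony = [[None] * n_bits]            # one initial loop, all bits unexplored
    for bit in range(n_bits):             # one generation per unexplored bit
        next_gen = []
        for loop in colony:
            for value in (0, 1):          # parent explores one value, child the other
                child = list(loop)
                child[bit] = value
                if not any(clause_violated(child, c) for c in CLAUSES):
                    next_gen.append(child)  # survives the monitors
        colony = next_gen
    return colony

print(generate_and_select())   # -> [[0, 0, 0], [1, 1, 1]]
```

Run as-is, the sketch prints only the two satisfying assignments, 000 and 111, mirroring the loops that survive in Fig. 19.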
6. Discussion

Cellular automata models of self-replication have been studied for almost fifty years now. In this article we have presented the view that work on this topic has involved at least three different approaches. The earliest work examined large, complex universal computer-constructors that are marginally realizable. This work established the feasibility of artificial self-replication, examined many important theoretical issues involved, and gradually examined progressively simpler self-replicating universal systems. A second and more recent approach has focused on the design of self-replicating loops. Self-replicating loops are so small and simple that they have been readily realizable. Finally, we believe that a third approach merits investigation: the emergence of self-replicators from initially non-replicating systems. As examples of this, we discussed our recent studies of the emergence of self-replicating structures from randomly distributed, non-replicating components, and the evolution of transition rules that support replication of small but arbitrary initial structures.

The recent work on self-replicating cellular automata models that is of most direct significance for computer science is that on programming self-replicating loops. As we have seen, problem-solving can be accomplished by self-replicating structures as they replicate. This can be achieved either by attaching a set of instructions (signals) to those directing replication, or by encoding a tentative problem-solution that systematically evolves into a final solution. Implementations have shown that programmed replicators are clearly capable of solving
non-trivial problems. These programmed self-replicating structures are intriguing in part because they provide a novel approach to computation. This approach is characterized by massive parallelism (each cell in the underlying cellular automata space is simultaneously carrying out computation), and by the fact that both self-replication and problem-solving by replicators appear as emergent properties of solely local interactions. While progress in creating and studying cellular automata models has accelerated during the last few years, a great deal remains to be done. A high level language that specifically supports development of cellular automata transition functions would be of great value to future investigations, as this is currently largely unavailable. Similarly, while hardware that directly supports the massively parallel but local computations of cellular automata modeling has appeared [23, 67], it is also largely unavailable today. If such software and hardware environments could be made available in the future, it would greatly reduce the large programming and processing times associated with research in this area. Among the many issues that might be examined in the future, several appear to be of particular importance. These include the further development of programmable self-replicators for real applications, and a better theoretical understanding of the principles of self-replication in cellular automata spaces. More general and flexible cellular automata environments, such as those having non-uniform transition functions [61] or novel interpretations of transition functions [36], merit exploration. It has already proved possible, for example, to create simple self-replicating structures in which a cell can change the state of neighboring cells directly [36], or can copy its transition function into a neighbor cell while allowing cells to have different transition functions [61]. Also, from the perspective of realizing physically self-replicating devices, closer ties and exchange of information between the modeling work described here and ongoing work to develop self-replicating molecules and nanotechnology are important. Closely related to this issue is ongoing investigation of the feasibility of electronic hardware directly supporting self-replication [39, 40]. If these developments occur and progress, we foresee a bright and productive future for the development of a technology of self-replicating systems. Finally, we expect that as the modeling of self-replication progresses, it will assume increasing importance in theoretical biology. Artificial self-replicators have already shown that self-replication of information-carrying structures can be far simpler than many people have realized [57]. Analogous conclusions about unexpectedly simple information processing requirements have been reached regarding other complex physical/chemical processes after cellular automata models of them were developed, such as the appearance of stably rotating spiral forms in the Belousov-Zhabotinskii autocatalytic reaction [18, 38]. Further, it seems probable that the simple self-replicating structures described here are not
the only ones possible. The self-replicating structures discovered using a genetic algorithm (Fig. 17) suggest that novel approaches still remain to be identified. At present it has not been possible to actually realize any "informational replicating systems" in the biochemistry laboratory [46], although recent results in experimental chemistry suggest this may someday be possible [1, 15, 26, 69]. The replicating loops and polyominoes described here provide intriguing ideas for self-replicating molecular systems, but are not intended as realistic models of known biochemical processes and have only a vague correspondence to real molecular structures. The information-carrying loop, for example, might be loosely correlated with a circular oligonucleotide, and the construction arm with a protein that reads the encoded replication algorithm to create the replica. Still, the existence of these systems raises the question of whether contemporary techniques being developed by organic chemists studying autocatalytic systems or the innovative manufacturing techniques currently being developed in the field of nanotechnology could be used to realize self-replicating molecular structures patterned after the information processing occurring in simple self-replicating cellular automata structures.

REFERENCES AND FURTHER READING

[1] Amato, I. (1992). Capturing chemical evolution in a jar. Science, 255, 800.
[2] Arbib, M. (1966). Simple self-reproducing universal automata. Inform. and Control, 9, 177-180.
[3] Banks, E. (1970). Universality in cellular automata. Eleventh Ann. Symp. on Switching and Automata Theory, IEEE, 194-215.
[4] Berlekamp, E., Conway, J., and Guy, R. (1982). Winning Ways for Your Mathematical Plays, Academic Press, New York, vol. 2, chap. 25.
[5] Burks, A. (1970). In Essays on Cellular Automata, A. Burks, Ed., University of Illinois Press, Urbana, chap. 1.
[6] Byl, J. (1989). Self-reproduction in small cellular automata. Physica D, 34, 295-299.
[7] Case, J. (1974). Periodicity in Generations of Automata. Mathematical Systems Theory, 8, 15-32.
[8] Chou, H., and Reggia, J. (1997). Emergence of Self-Replicating Structures in a Cellular Automata Space. Physica D, in press.
[9] Chou, H., and Reggia, J. (1997). Solving SAT Problems with Self-Replicating Loops. In preparation.
[10] Chou, H., Reggia, J., Navarro-González, R., and Wu, J. (1994). An Extended Cellular Space Method for Simulating Autocatalytic Oligonucleotides. Computers and Chemistry, 18, 33-43.
[11] Codd, E. (1968). Cellular Automata, Academic Press, New York.
[12] Demongeot, J., Goles, E., and Tchuente, M. (1995). Dynamical Systems and Cellular Automata, Academic Press, New York.
[13] Drexler, K. (1989). Biological and Nanomechanical Systems. In Artificial Life, C. Langton, Ed., Addison-Wesley, New York, pp. 501-509.
[14] Farmer, D., Toffoli, T., and Wolfram, S., Eds. (1984). Cellular Automata, North-Holland, Amsterdam.
[15] Feng, Q., Park, T., and Rebek, J. (1992). Crossover reactions between synthetic replicators yield active and inactive recombinants. Science, 256, 1179-1180.
[16] Freitas, R., and Gilbreath, W., Eds. (1982). Advanced Automation for Space Missions, NASA Conference Publication 2255, NTIS.
[17] Gardner, M. (1970). The fantastic combinations of John Conway's new solitaire game "Life". Scientific American, 223(4), 120-123.
[18] Gerhardt, M., Schuster, H., and Tyson, J. (1990). A cellular automaton model of excitable media including curvature and dispersion. Science, 247, 1563-1566.
[19] Goldberg, D. (1989). Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, Reading, MA.
[20] Grefenstette, J. (1990). Genetic algorithms and their applications. In Encyclopedia of Computer Science and Technology, Vol. 21, Suppl. 6, Marcel Dekker, New York, pp. 139-152.
[21] Gutowitz, H., Ed. (1991). Cellular Automata: Theory and Practice, MIT Press, Cambridge, MA.
[22] Herman, G. (1973). On universal computer-constructors. Information Processing Letters, 2, 61-64.
[23] Hillis, W. (1985). The Connection Machine, MIT Press, Cambridge, MA.
[24] Holland, J. (1975). Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor.
[25] Holland, J. H. (1976). Studies of the spontaneous emergence of self-replicating systems using cellular automata and formal grammars. In Automata, Languages, Development, A. Lindenmayer and G. Rozenberg, Eds., North-Holland, Amsterdam, pp. 385-404.
[26] Hong, J., Feng, Q., Rotello, V., and Rebek, J. (1992). Competition, cooperation and mutation: improving a synthetic replicator by light irradiation. Science, 255, 848-850.
[27] Hopcroft, J., and Ullman, J. (1979). Introduction to Automata Theory, Languages and Computation, Addison-Wesley, Reading, MA, chap. 7.
[28] Ibáñez, J., Anabitarte, D., Azpeitia, I., Barrera, O., Barrutieta, A., Blanco, H., and Echarte, F. (1995). Self-inspection based reproduction in cellular automata. In Proc. 3rd Euro. Conf. Artif. Life, F. Moran et al., Eds., Springer, Berlin, pp. 564-576.
[29] Jacobson, H. (1958). On models of self-replication. Amer. Sci., 46, 255-284.
[30] Kephart, J. (1994). A biologically inspired immune system for computers. In R. Brooks and P. Maes, Eds., Artificial Life IV, MIT Press, Cambridge, MA, pp. 130-139.
[31] Koza, J. (1992). Genetic Programming, MIT Press.
[32] Laing, R. (1975). Some alternative reproductive strategies in artificial molecular machines. J. Theor. Biol., 54, 63-84.
[33] Langton, C. (1984). Self-reproduction in cellular automata. Physica D, 10, 135-144.
[34] Langton, C., Ed. (1989). Artificial Life, Addison-Wesley, New York.
[35] Langton, C., Taylor, C., Farmer, J., and Rasmussen, S., Eds. (1992). Artificial Life II, Addison-Wesley, New York.
[36] Lohn, J., and Reggia, J. (1995). Discovery of self-replicating structures using a genetic algorithm. IEEE Internat. Conf. on Evolutionary Computing, Perth, pp. 678-683.
[37] Lohn, J., and Reggia, J. (1997). Automatic discovery of self-replicating structures in cellular automata. IEEE Transactions on Evolutionary Computation, 1(3).
[38] Madore, B., and Freedman, W. (1983). Computer simulations of the Belousov-Zhabotinsky reaction. Science, 222, 615-616.
[39] Mange, D., Goeke, M., Madon, D., Stauffer, A., Tempesti, G., and Durand, S. (1996). Embryonics: A New Family of Coarse-Grained Field-Programmable Gate Array with Self-Repair and Self-Reproducing Properties. In Towards Evolvable Hardware, Springer-Verlag, pp. 197-220.
[40] Mange, D., Stauffer, A., and Tempesti, G. (1997). Self-replicating and self-repairing field-programmable processor arrays with universal construction. In T. Higuchi, Ed., Evolvable Systems, Proc. 15th Internat. Joint Conf. on Artif. Intell. (IJCAI-97), Nagoya, pp. 13-18.
[41] Merkle, R. (1994). Self-replicating systems and low cost manufacturing. In M. Welland and J. Gimzewski, Eds., The Ultimate Limits of Fabrication and Measurement, Kluwer, Dordrecht, pp. 25-32.
[42] Mitchell, M. (1996). An Introduction to Genetic Algorithms, MIT Press, Cambridge, MA.
[43] Moore, E. (1962). Machine models of self-reproduction. Proc. Fourteenth Symp. Appl. Math., American Mathematical Society, pp. 17-33.
[44] Morowitz, H. (1959). A model of reproduction. Amer. Sci., 47, 261-263.
[45] Myhill, J. (1964). The abstract theory of self-reproduction. In Views on General Systems Theory, M. Mesarović, Ed., Wiley, New York, pp. 106-118.
[46] Orgel, L. (1992). Molecular replication. Nature, 358, 203-209.
[47] Oró, J., Miller, S., and Lazcano, A. (1990). The origin and early evolution of life on earth. Ann. Rev. Earth Planet. Sci., 18, 317-356.
[48] Pargellis, A. (1996). The evolution of self-replicating computer organisms. Physica D, 98, 111-127.
[49] Penrose, L. (1958). Mechanics of self-reproduction. Ann. Human Genetics, 23, 59-72.
[50] Perrier, J., Sipper, M., and Zahnd, J. (1996). Toward a viable, self-reproducing universal computer. Physica D, 97, 335-352.
[51] Pesavento, U. (1995). An implementation of von Neumann's self-reproducing machine. Artificial Life, 2(4), 337-354.
[52] Ponnamperuma, C., Honda, Y., and Navarro-González, R. (1992). Chemical studies on the existence of extraterrestrial life. J. Brit. Interplanet. Soc., 45, 241-249.
[53] Preston, K., and Duff, M. (1984). Modern Cellular Automata, Plenum, New York.
[54] Priese, L. (1976). On a simple combinatorial structure for sublying nontrivial self-reproduction. J. Cybernet., 6, 101-137.
[55] Rasmussen, S., Knudsen, C., Feldberg, R., and Hindsholm, M. (1990). The coreworld: emergence and evolution of cooperative structures in a computational chemistry. Physica D, 42, 111-134.
[56] Ray, T. (1992). Evolution, ecology and the optimization of digital organisms. Santa Fe Working Paper 92-08-042.
[57] Reggia, J., Armentrout, S., Chou, H., and Peng, Y. (1993). Simple systems that exhibit self-directed replication. Science, 259, 1282-1288.
[58] Reggia, J., Chou, H., Armentrout, S., and Peng, Y. (1992). Transition functions and software documentation. Technical Report CS-TR-2965, Dept. of Computer Science, UMCP.
[59] Richards, F., Meyer, T., and Packwood, N. (1990). Extracting cellular automata rules directly from experimental data. Physica D, 45, 189-202.
[60] Rosen, R. (1959). On a logical paradox implicit in the notion of a self-reproducing automaton. Bull. Math. Biophys., 21, 387-394.
[61] Sipper, M. (1995). Studying artificial life using a simple, general cellular model. Artificial Life, 2(1), 1-35.
[62] Sipper, M., and Ruppin, E. (1997). Co-evolving architectures for cellular machines. Physica D, 99, 428-441.
[63] Smith, A. (1992). In Artificial Life II, C. Langton, C. Taylor, J. Farmer and S. Rasmussen, Eds., Addison-Wesley, Reading, MA, p. 709.
[64] Tamayo, P., and Hartman, H. (1989). Cellular automata, reaction-diffusion systems, and the origin of life. In Artificial Life, C. Langton, Ed., Addison-Wesley, Reading, MA.
[65] Tempesti, G. (1995). A new self-reproducing cellular automaton capable of construction and computation. In Proc. Third European Conference on Artificial Life, F. Moran, A. Moreno, J. Morelo, and P. Chacon, Eds., Springer, Berlin, pp. 555-563.
[66] Thatcher, J. (1970). Universality in the von Neumann cellular model. In Essays on Cellular Automata, A. Burks, Ed., University of Illinois Press, Urbana, IL, pp. 132-186.
[67] Toffoli, T., and Margolus, N. (1987). Cellular Automata Machines, MIT Press, Cambridge, MA.
[68] Vitányi, P. (1973). Sexually reproducing cellular automata. Math. Biosci., 18, 23.
[69] von Kiedrowski, G. (1986). A self-replicating hexadeoxynucleotide. Angew. Chem. Int. Ed. Engl., 25, 932.
[70] von Neumann, J. (1966). The Theory of Self-Reproducing Automata, University of Illinois Press, Urbana, IL.
[71] Williams, K. (1993). Simplifications of a self-replication model. Science, 261, 925.
[72] Wolfram, S. (1986). Theory and Applications of Cellular Automata, World Scientific, New York.
[73] Wolfram, S. (1994). Cellular Automata and Complexity, Addison-Wesley, Reading, MA.
Ultrasound Visualization

THOMAS R. NELSON
Division of Physics
Department of Radiology
University of California, San Diego
La Jolla, CA
Abstract

This paper provides an overview of ultrasound visualization with an emphasis on volume visualization. A brief review of ultrasound/acoustic imaging is given from the point of view of ultrasound image formation, transducer design and factors influencing image formation as they relate to visualization of patient anatomy and function. Standard ultrasound imaging features including grey-scale and velocity imaging are reviewed, including new developments in the area of contrast and harmonic imaging. Examples of clinical imaging and visualization applications are given throughout to illustrate the concepts under discussion. Approaches for routine clinical imaging are discussed for visualization of anatomy, motion, flow and volume data. Particular attention is paid to methods of visualizing volumetric data, a rapidly developing area of ultrasound imaging. The basic algorithms and approaches to visualization of 3D and 4D ultrasound data are reviewed, including issues related to interactivity and user interfaces. The implications of recent developments for future ultrasound imaging/visualization systems are considered.
1. Introduction to Ultrasound/Acoustic Imaging . . . . . . . . . . . . . . . . 186
   1.1 Overview of Ultrasound Tissue, Vascular and Volume Imaging . . . . . . 186
2. Ultrasound Image Formation . . . . . . . . . . . . . . . . . . . . . . . . 189
   2.1 Basic Acoustics of Ultrasound Image Formation . . . . . . . . . . . . . 189
   2.2 Pulse-echo Imaging . . . . . . . . . . . . . . . . . . . . . . . . . . 191
   2.3 Structure Motion Visualization . . . . . . . . . . . . . . . . . . . . 199
   2.4 Blood Flow Visualization . . . . . . . . . . . . . . . . . . . . . . . 202
   2.5 Enhanced Blood Flow Visualization . . . . . . . . . . . . . . . . . . . 206
   2.6 Artifacts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
   2.7 Measurement and Quantification . . . . . . . . . . . . . . . . . . . . 208
   2.8 Time Changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
3. Volume Visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
   3.1 Volume Ultrasound Data Acquisition . . . . . . . . . . . . . . . . . . 216
   3.2 Volume Visualization Methods . . . . . . . . . . . . . . . . . . . . . 222
   3.3 Optimization of Volume Ultrasound Data Visualization . . . . . . . . . 239
4. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
   4.1 Future Developments . . . . . . . . . . . . . . . . . . . . . . . . . . 242
   Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
   References and Further Reading . . . . . . . . . . . . . . . . . . . . . . 244
1. Introduction to Ultrasound/Acoustic Imaging

The primary role of visualization in medicine is to provide the physician with information needed to arrive at an accurate diagnosis of the patient's condition and assess the patient's response to therapy. Medical visualization methods increasingly rely on computer graphics techniques to help physicians understand patient anatomy. Visualization helps physicians extract meaningful information from numerical descriptions of complex phenomena using interactive imaging systems. Physicians need these systems for their own insight and to share their observations with their clinical colleagues and their patients. Few medical articles are published without some type of data visualization. Medical visualization resources encompass a broad array of modalities including X-ray film, computed tomography (CT), magnetic resonance imaging (MRI) and ultrasound (US). Sensor technologies utilize a variety of physical probe mechanisms (e.g. electron density, acoustic impedance, magnetic coupling) to produce image data. Increasingly, images are developed from more sophisticated sensors that require the physician to comprehend information derived from technologies neither developed nor discussed during their education.
1.1 Overview of Ultrasound Tissue, Vascular and Volume Imaging

The inherent flexibility of ultrasound imaging, its moderate cost and advantages that include real-time imaging, physiologic measurement, use of non-ionizing radiation and no known bioeffects give ultrasound a vital role in the diagnostic process and important advantages compared to MRI and CT. Ultrasound visualization is in routine use in nearly all American hospitals and many physician offices and clinics and is currently used to diagnose a wide range of pathology. Over the past few years ultrasound imaging has made tremendous progress in obtaining important diagnostic information from the patient in a rapid, non-invasive manner and has benefitted from significant improvements in image quality and visualization clarity (Fig. 1). Much of this progress has been derived from utilizing information present in the ultrasound signal, such as back scatter and Doppler shift, to extract useful physiological and tissue structure information. As ultrasound equipment has benefitted from increasingly sophisticated computer technology, systems integration has ensured better image quality, data acquisition,
FIG. 1. Traditional 2D ultrasound visualization produces planar tomographic slices through the object of interest, in this case a fetal head and face seen in profile on the left. Volume visualization methods provide a means of displaying the entire structure in a single image, significantly improving comprehension of spatial relationships as is seen in the fetal face on the right where the eyes, nose and lips are readily visible.
analysis, and display. Advances in technology, particularly high speed computing and storage hardware, have further expanded the possibilities for maximizing patient diagnostic information. Recently, volume sonography has sparked interest in both the academic community and commercial industry, offering new opportunities in patient visualization (Kasoff, 1995; Pretorius and Nelson, 1991). Advanced technology permits volume imaging methods to be applied to diagnostic ultrasound visualization with interactive manipulation of volume data using rendering, rotation and zooming in on localized features. Although the eventual role for ultrasound volume visualization has yet to be determined, there is little doubt that its impact will be broad and substantial. In the near future ultrasound volume visualization will be a routine part of patient diagnosis and management. Two-dimensional ultrasound grey-scale imaging forms the basis of all ultrasound visualization, with new techniques and instrumentation being continually
developed to solve diagnostic problems that 2DUS methods cannot adequately address. The overall approach to examination of the patient with ultrasound is shown in Fig. 2. Once the anatomy of interest is identified and the system is optimized (FOV, frequency, TGC, etc.) a quick 2D exploration of the organ is made. Often this is sufficient to assess the anatomic structure. Selected images are saved for the radiologist and referring physician to make a diagnosis. The success of this phase of the study depends on the patient habitus, cooperation and the specific location of the target organ. If there are obscuring structures (bones, gas, etc.) then the study will require more time to obtain adequate diagnostic images. Superficial organs may be imaged with higher frequency transducers to obtain
FIG. 2. A flow diagram showing the general approach to ultrasound visualization. On any individual patient, the scope of the diagnostic examination can be expanded to include these more advanced techniques as needed. For example, duplex/color/power Doppler methods are employed to image and measure blood flow in vessels. Endovaginal probes were developed to image the uterus more clearly than possible with 2D surface probes. Volume imaging extends visualization of anatomy to complex curved structures such as fetal faces, tortuous vessels, etc. that do not lie in a single plane and offers rendering of the volume in addition to reslicing. In each of these scenarios, more advanced techniques not only provide something new but incorporate prior developments (e.g. probes that provide color/power capability also perform grey-scale imaging, and volume imaging methods utilize grey-scale and color/power imaging).
better resolution. If the organ is deep, a lower frequency transducer may be required. If the anatomy is relatively straightforward then the study can be completed rapidly. More complex anatomy (i.e. tortuous, curved, poorly visualized) often requires additional time to optimize the images. In some cases the optimal diagnostic view is not available because of the patient orientation. Blood flow data may be obtained after survey imaging is completed using pulsed duplex Doppler and color/power Doppler imaging. Duplex is used to measure the velocity of the blood while color/power are used to image the vascular anatomy and determine the optimal location for duplex measurements. In some cases grey-scale imaging can localize the area for duplex imaging although color/power provide a more readily understood picture of the vascular layout. Improved flow visualization and sensitivity can be achieved using contrast agents and imaging at the harmonic of the excitation frequency without sacrificing spatial and temporal resolution as is the case with color/power imaging. Similar approaches can be used for cardiac imaging of structural motion and blood flow, particularly regurgitation. Often the tissue and vascular anatomy have a complex geometry that is difficult to visualize. The typical approach to overcome this problem is to scan repeatedly through the region of interest until it is clear what the exact relationships are. With complex or abnormal structures this process can be time-consuming and tedious. Volume imaging and visualization methods can assist this process by presenting the entire volume in a single image. Currently most volume visualization is performed using graphics workstations to provide interactive performance. Clinical scanners such as the Kretz Voluson 530 are increasingly integrating workstation performance in the scanner to provide volume visualization at the patient bedside. Volume imaging gives the physician the capability to review the patient's images after the patient has left the scanning suite and to re-evaluate the diagnosis at remote locations with experts. Increasingly, ultrasound visualization methods are expanding beyond their traditional areas to include intravascular imaging, intra-cavitary imaging and interventional imaging. These areas all include the methods mentioned above but are adapted to smaller transducers and specialized applications. This paper will provide an overview of some of the basic concepts behind ultrasound visualization, including volume visualization, and describe areas of application.
2. Ultrasound Image Formation

2.1 Basic Acoustics of Ultrasound Image Formation
The idealized ultrasound imaging system uses a transducer in the probe to transmit a finite length ultrasound pulse into the patient. Typical medical imaging ultrasound employs frequencies between 1 and 10 MHz with pulse lengths of a
few cycles duration (Krestel, 1990; Macovski, 1983). Specialized applications in dermatology (Fornage et al., 1993), ophthalmology and intra-vascular imaging (Kluytmans et al., 1995) typically can use higher frequencies (15-100 MHz). The pulse passes uniformly through the tissue(s) of the patient. Along the path of the pulse, acoustic inhomogeneities in the tissue are encountered that result in scattering of some fraction of the incident energy. Each scattering interaction results in reflection of some of the pulse energy back towards the transducer and transmission of pulse energy forward along the path direction. The relative fraction of pulse energy reflected back toward the transducer depends on the relative acoustic impedance (z) between the tissue and the inhomogeneity, which in turn depends on their respective density (ρ) and the local velocity of sound (c): z = ρc. The scattering of the pulse along its path is a continuous process since the tissue is a complex heterogeneous material composed of cells, fibers, connective tissue, fluid, etc. The size of these structures also varies widely, ranging from centimeters to submicrons, giving rise to a very complex interaction environment that includes diffraction, scattering and specular interactions. Pulse energy that is reflected back toward the transducer is detected and the amplitude, phase and time delay are measured relative to the transmitted pulse. The time delay is directly related to the depth in tissue at which the interaction occurred. Also, in general, the deeper in the tissue the reflection occurs, the smaller the amplitude of the pulse. As a result, reflections occurring at two different depths for the same type of tissue will have quite different amplitudes based on their depth. In order to equalize the amplitudes of the returning echoes from different depths it is necessary to correct for the attenuation of the pulse as it travels through tissue. This can be done exactly if the attenuation coefficient of the material is known. Attenuation of ultrasound energy may be approximated by an exponential (Krestel, 1990)

I(x) = I₀ exp(−a(x) x)        (1)
where a(x) is the attenuation coefficient at a particular depth (x), which varies with tissue type and frequency. In practice, owing to the tissue heterogeneity and the complexity of acoustic interactions, an exact attenuation coefficient is seldom known. A typical attenuation value is on the order of 1 dB/cm/MHz for most biological tissues. Under some circumstances the pulse is reflected off a moving interface, which could be a moving blood cell or a structure such as a heart valve. In these cases the reflected pulse will exhibit similar features to reflections from a static structure plus a phase shift corresponding to the magnitude of the component of motion along the path of the pulse. The phase shift can be measured separately from the pulse amplitude and a measure of the velocity of motion along the path made.
2.2 Pulse-echo Imaging
The image formation process in ultrasound imaging is based on measuring the amplitude of the reflected pulse (echo) and displaying it as a brightness on some type of display (Kremkau, 1989; Wells, 1993). The location of the echo is determined by the time delay between the transmitted pulse and the returning echo. The exact velocity in the tissue is typically unknown but assumed to be approximately 1540 m/s. The actual velocity of ultrasound in tissue varies with the type of tissue and the frequency of the ultrasound. As a result, there may be small errors in assignment of the echo position in the display. In general practice these errors are not significant, although they are measurable and can contribute to artifacts in volume ultrasound imaging. For each transmitted pulse, a chain of echoes will be detected corresponding to the locations of inhomogeneities along the path of the pulse (Fig. 3). Thus for each pulse there will be a line of echoes whose amplitude (brightness) corresponds to the impedance differences encountered along the path. By changing the origin and orientation of the transmitted pulse it is possible to obtain a series of echo lines and produce an image.
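The time-to-depth mapping just described can be stated in a few lines of code. The sketch below is purely illustrative (the echo times and amplitudes are made-up values); it assumes the nominal average sound speed of 1540 m/s quoted above.

```python
# Sketch: convert pulse-echo arrival times into one image line (A-line).
# Assumes the nominal average sound speed of 1540 m/s quoted in the text;
# echo times and amplitudes here are made-up illustrative values.

C_TISSUE = 1540.0  # m/s, assumed average speed of sound in soft tissue

def echo_depth_m(round_trip_time_s, c=C_TISSUE):
    """Depth of the reflecting interface: the pulse travels there and back,
    so depth = c * t / 2."""
    return c * round_trip_time_s / 2.0

# One transmitted pulse returns a chain of echoes (time in seconds, amplitude).
echoes = [(13e-6, 0.80), (39e-6, 0.35), (65e-6, 0.12)]

for t, amplitude in echoes:
    depth_cm = echo_depth_m(t) * 100.0
    print(f"echo at t = {t*1e6:5.1f} us -> depth = {depth_cm:4.1f} cm, "
          f"display brightness ~ {amplitude:.2f}")
```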
2.2.1 Grey-scale Imaging
Early ultrasound scanners used simple systems with a fixed focus transducer and a storage display to sweep out a series of echo lines and produce an image. A more successful approach has been to use a transducer comprising a series of individual elements (64-256) in a single probe holder. The multiple individual elements are excited by a shaped excitation pulse and the resultant wavefront is focused into a line with a prime focus at a location defined by the excitation pulse shape (Fig. 4). Generally, each pulse uses a slightly different excitation waveform which steers the beam in a different direction. Electronic phased-array systems can dynamically change their focus depth and field-of-view (Krestel, 1990). The field-of-view depth selected places an upper limit on the pulse repetition frequency (PRF) (number of pulses per unit time) since it is necessary to wait for all returning echoes before beginning the next excitation in order to avoid ambiguous echo locations. By repeating the excitation in the same location with different focal depths it is possible to obtain several echo lines, with different focal depths, and combine them into an optimized scan line utilizing the prime focus range for each excitation (Fig. 5). The number of updated images produced per second depends on the number of scan lines, the field-of-view depth and the number of focal zones used per line. Grey-scale images can produce high quality tomographic displays of patient anatomy updated rapidly as the sonographer scans across the patient surface (Fig. 6). Most organs are larger than the field of view provided by the transducer. As a result
FIG.3. A schematic diagram of the propagation of an ultrasound pulse through tissue. At each interface a fraction of the incident energy based on the relative impedance differences is reflected back towards the transducer with the remainder continuing in the forward direction. As the pulse propagates through the tissue, the intensity is attenuated due to absorption and reflection and scattering. Attenuation is approximately exponential. The time delay from the pulse initiation to the returning echo depends on the depth of the reflecting interface and the velocity of sound (approximately 1540 m/s).
scanning across the patient surface is necessary to completely visualize patient anatomy. The limited field-of-view afforded by current transducers makes it difficult to visualize the entire organ in a single image, something that complicates the diagnostic process and requires the physician to mentally combine information from multiple images into a single impression. One approach used to overcome this problem has been to develop a family of transducers each possessing features optimized for scanning a particular type of anatomy. In addition to the standard skin surface designs there also are intra-luminal transducers the size of a catheter that are used to image the inside of vessels and immediately adjacent tissues. These types of catheters are particularly useful for imaging obstructions to blood flow. There are a range of intra-cavitary transducers (e.g. endo-vaginal,
FIG. 4. Modern transducers are multi-element phased arrays in which a shaped excitation pulse is used to simultaneously excite several piezo-electric elements as shown in the left image. The excitation pulse to different elements in the transducer is delayed to produce the desired acoustic pulse profile. The shape of the excitation determines the acoustic wave properties including intensity, uniformity, focus, etc. Generally, groups of elements are excited, giving rise to tightly focused waves whose position is shifted across the transducer face, producing multiple excitation lines that can be used to create an image as shown in the right image. Typical scanning configurations produce approximately 128-256 lines per image, depending on the desired frequency, depth, frame rate, etc.
endorectal, etc.) that are used to get the transducer closer to the object or organ of interest. There also are trans-esophageal transducers that are used for cardiac imaging. The objective of these devices is to get the transducer as close as possible to the organ to improve image quality or overcome limitations imposed on
FIG. 5. The focal zone resulting from a shaped excitation can be altered depending on the location of the target of interest. Optimal resolution occurs in the focal zone, degrading as one moves progressively farther away as can be appreciated from the resolution points in the left image of a resolution test pattern. A more uniform resolution through the entire field of view can be produced by using multiple excitations at the same location with different focal zone depths. Combination of the different zones through time-gating of the echo line results in improved resolution as seen in the right image of a resolution test pattern. The more uniform resolution comes at the cost of reducing frame rates since multiple excitations must be used to form a single line in the image. The (>) sign in each of the resolution test pattern images specifies the location of the focus zones.
surface imaging such as depth, overlying obstructions (e.g. lung air, bone, etc.). As a result, it often is possible to use imaging frequencies that are higher than would be possible if imaging from the skin surface, which improves spatial and contrast resolution. More sophisticated techniques also can improve anatomic visualization. Recently, scanning techniques have been introduced utilizing the highly correlated information from one scan to the next to electronically combine images from different viewing regions. By combining data from a series of scans acquired along a line in a common plane, images exhibiting an extended field-of-view are possible (Fig. 7). Using this technology it is possible to image the entire length of a structure such as the artery or vein in a leg.
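The correlation idea behind extended field-of-view imaging can be sketched with one-dimensional signals standing in for image frames. Real systems register full 2D frames, so the following Python fragment is only an illustration of the principle, with synthetic data and an assumed probe displacement.

```python
import numpy as np

# Toy sketch of extended field-of-view stitching: estimate the probe displacement
# between two overlapping acquisitions by finding the lag that maximizes their
# cross-correlation, then paste the new data onto an extended canvas.
# 1D "scan lines" stand in for 2D frames; all values are synthetic.

rng = np.random.default_rng(0)
scene = rng.random(400)            # underlying anatomy along the scan direction
frame_a = scene[0:200]             # first field of view
frame_b = scene[60:260]            # probe moved by 60 samples

def estimate_shift(a, b):
    """Lag of b relative to a that maximizes the cross-correlation."""
    a0, b0 = a - a.mean(), b - b.mean()
    corr = np.correlate(a0, b0, mode="full")        # lags -(len(b)-1) .. len(a)-1
    lags = np.arange(-(len(b) - 1), len(a))
    return lags[np.argmax(corr)]

shift = estimate_shift(frame_a, frame_b)            # expected: 60
extended = np.concatenate([frame_a, frame_b[len(frame_a) - shift:]])
print(f"estimated shift: {shift} samples, extended length: {len(extended)}")
```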
FIG. 6. A typical grey-scale image of an adult liver imaged at 3 MHz using a 140 mm field of view. The dots along each side of the image represent 1 cm distances and the focal zone is indicated by the (>) sign. Note the brighter signal levels down-field from the anechoic (dark) area corresponding to the gall bladder, which is the result of reduced attenuation in the gall bladder. Variations in image uniformity are strongly dependent on the local architecture and composition of the tissues in the field of view.
2.2.2 Depth Gain Compensation
As mentioned above, the amplitude of the echo depends on the depth at which the echo occurs and the tissue acoustic properties. It is possible to automatically correct for echo attenuation if the attenuation coefficient is known. In practice, we rarely know the exact attenuation coefficient for the tissues being imaged and rely instead upon a time-gain compensation (TGC) system that provides an amount of amplification based on the time delay from the transmitted pulse. The gains of a series of time delay (depth) windows are adjusted independently to produce a relatively flat output amplitude response for the returning echoes along each line (Fig. 8). Typically, a level of individual patient optimization is required to obtain optimal images since the patient anatomy, organs, tissues and fluids imaged can have a significant effect on the resultant image, which makes automated algorithms difficult to design.
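A highly idealized TGC curve can be derived from the rule-of-thumb attenuation of roughly 1 dB/cm/MHz mentioned earlier. The sketch below simply inverts the expected round-trip loss at each depth; it is an illustration of the concept, not a description of how any particular scanner computes its gains.

```python
# Sketch of an idealized time-gain compensation (TGC) curve. It assumes the
# rule-of-thumb attenuation of ~1 dB/cm/MHz quoted in the text and applies the
# inverse of the expected round-trip loss at each depth. Real systems expose
# per-depth sliders instead of computing this automatically.

C_TISSUE = 1540.0          # m/s, assumed average speed of sound
ATTEN_DB_PER_CM_MHZ = 1.0  # approximate one-way attenuation in soft tissue

def tgc_gain_db(echo_time_s, freq_mhz, atten=ATTEN_DB_PER_CM_MHZ):
    """Gain (dB) that flattens the expected round-trip attenuation for an echo
    arriving echo_time_s after the transmit pulse."""
    depth_cm = 100.0 * C_TISSUE * echo_time_s / 2.0   # time delay -> depth
    return 2.0 * atten * depth_cm * freq_mhz           # round-trip loss in dB

for depth_cm in (2, 5, 10, 15):
    t = 2.0 * (depth_cm / 100.0) / C_TISSUE            # round-trip time for this depth
    print(f"{depth_cm:2d} cm at 3.5 MHz -> apply ~{tgc_gain_db(t, 3.5):5.1f} dB gain")
```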
FIG. 7. An extended field-of-view image showing a transplanted kidney adjacent to the liver. The top image is a standard transducer field of view. The lower image is produced by moving the transducer across the anatomy of interest. Each image during the moving scan is combined with previous images using image correlation data to "stitch" together the extended field-of-view image. The overall anatomic relationships are more clearly seen with this method, with the tradeoff being the loss of "real-time" imaging. (Images courtesy of Siemens Ultrasound.)
2.2.3 Spatial and Contrast Resolution

Transducers and acquisition and post-processing algorithms are critical elements in visualization. Without proper selection and adjustment of scanner parameters, optimal images will not be easy to obtain. Optimal organ visualization critically depends on spatial, contrast and temporal resolution.
2.2.3.1 Spatial Resolution. Spatial resolution can be further specified in terms of its axial, lateral and elevational resolution (Figs 9 and 10). Axial resolution is the minimum distance between two objects that are positioned along the beam axis that can be resolved. Axial resolution depends on the pulse length and the frequency of the ultrasound. The axial resolution can be approximated by the
FIG. 8. Time-gain compensation is used to correct for the exponential attenuation of the ultrasound signal as it propagates through the patient. Since the exact attenuation cannot be easily determined (or optimized automatically), the sonographer uses a series of sliders corresponding to different depths in the patient that individually adjust the receive gain as a function of depth (time). The left part of the image schematically shows the type of gain relationships versus depth. The first set of images is for a test object imaged with different frequency transducers with and without time-gain compensation (TGC). The right images are for a liver with and without TGC. The far field part of the images is more uniformly visualized with TGC.
pulse length divided by two. Higher frequency transducers will have a shorter pulse length for the same number of cycles in the excitation pulse, resulting in a higher axial resolution. Lateral resolution is the minimum distance between two objects positioned perpendicular to the beam axis in the image plane that can be resolved. Lateral resolution depends on the geometry of the excitation wavefront, the number of transducer elements and effective aperture, the depth in the patient and, most importantly, the proximity to the focus point of the acoustic field. The lateral resolution is the beam width (6 dB width) in the scan plane. Elevational resolution is the minimum distance between two objects positioned perpendicular to the imaging plane. Elevational resolution, like lateral resolution, depends on the
FIG. 9. The essential spatial resolution definitions for ultrasound imaging. Axial resolution is primarily determined by the pulse length. Lateral resolution is primarily determined by the transducer excitation shape. Elevational resolution is primarily determined by the physical acoustic lens that is placed in front of the transducer elements. Elevational focus is fixed and not altered as the field-of-view or focal zone are adjusted. More recent transducer designs use 1.5D arrays in which the elevational elements are activated to provide some degree of elevational focus control.
proximity to the focus point of the image plane, which is dependent on a physical lens rather than electronic beam forming. Future transducers will employ electronic elevational focusing in a similar manner to the in-plane focusing methods used currently.
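The definitions above translate into rough numbers as follows. The axial estimate (half the pulse length) follows directly from the text; the lateral estimate uses a common diffraction approximation (wavelength × focal depth / aperture) that is an added assumption rather than a formula from this chapter.

```python
# Rough spatial-resolution estimates from the definitions above.
# Axial resolution ~ pulse length / 2 follows the text; the lateral estimate uses
# a common diffraction approximation (wavelength * focal depth / aperture) that
# is an added assumption, not a formula from this chapter.

C_TISSUE = 1540.0  # m/s

def axial_resolution_mm(freq_mhz, cycles_per_pulse=2):
    wavelength_mm = (C_TISSUE / (freq_mhz * 1e6)) * 1000.0
    pulse_length_mm = cycles_per_pulse * wavelength_mm
    return pulse_length_mm / 2.0

def lateral_resolution_mm(freq_mhz, focal_depth_mm, aperture_mm):
    wavelength_mm = (C_TISSUE / (freq_mhz * 1e6)) * 1000.0
    return wavelength_mm * focal_depth_mm / aperture_mm   # approximate beam width

for f in (2.5, 5.0, 10.0):
    print(f"{f:4.1f} MHz: axial ~{axial_resolution_mm(f):.2f} mm, "
          f"lateral ~{lateral_resolution_mm(f, focal_depth_mm=60, aperture_mm=20):.2f} mm")
```

For a 5 MHz, two-cycle pulse this gives an axial resolution of roughly 0.3 mm, consistent with the frequency dependence visible in Fig. 10.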
2.2.3.2 Contrast Resolution. Contrast resolution is the ability of the imaging system to resolve small differences in echo amplitude. Contrast resolution depends on the adjustment of the various gain parameters (transmit gain, receive gain, TGC, etc.) in addition to the size of the object, the frequency of the ultrasound and the relative speckle pattern distribution. The large dynamic range of the returning echoes (100 dB) is typically log-compressed before display on the
FIG. 10. Liver images obtained at five different frequencies (2.5, 3.5, 4.0, 5.0, and 6.0 MHz) for the same object in essentially the same location showing the relative performance as a function of frequency. In general, as the frequency increases the resolution improves but the penetration decreases. Note the finer speckle pattern with increasing frequency. The system dynamic range ultimately limits the usable penetration depth as noise begins to dominate.
more limited dynamic range of the display system (24 dB). Human observer dynamic range is typically approximately 20 dB (Kremkau, 1989). As a result, contrast resolution depends to a large extent on the type of post-processing algorithm selected in the scanning system (Fig. 11).
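A minimal sketch of logarithmic compression is shown below, mapping echo amplitudes spanning a large dynamic range onto an 8-bit display range of about 24 dB. The specific mapping and numbers are illustrative, not a vendor post-processing algorithm.

```python
import math

# Sketch of logarithmic compression: echoes spanning a very large dynamic range
# (the text quotes ~100 dB) are compressed so that the top ~24 dB fills the
# 0-255 grey levels of the display. The mapping below is illustrative only.

def log_compress(amplitude, displayed_range_db=24.0):
    """Return an 8-bit grey level for a linear echo amplitude in (0, 1]."""
    level_db = 20.0 * math.log10(max(amplitude, 1e-12))   # 1.0 -> 0 dB
    clipped = max(level_db, -displayed_range_db)          # discard weaker echoes
    return int(round(255.0 * (clipped + displayed_range_db) / displayed_range_db))

for amp in (1.0, 0.3, 0.1, 0.03, 0.001):
    print(f"echo amplitude {amp:6.3f} -> grey level {log_compress(amp):3d}")
```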
2.3 Structure Motion Visualization

Part of the versatility of ultrasound imaging includes the capability of visualizing motion occurring within the patient. The two principal types of motion are due to moving structures and blood flow. Structure motion is most commonly measured in evaluation of cardiac motion, although evaluation of joint motion or other structures also is possible.
2.3.1 Temporal Resolution
An important part of visualizing dynamic function is updating the image rapidly enough to accurately capture the motion occurring. The two primary areas where this becomes important are the heart and blood flow. Cardiac motion has relatively high frequency motion components such that imaging rates approaching 100 frames/second are necessary to adequately measure subtle motion changes throughout the cardiac cycle; a problem exacerbated when imaging the
FIG. 11. One approach to improve visualization of large and small magnitude echoes over a range of depths or tissue types is to use signal compression. The four images show the effect of logarithmic compression on object visualization with different compression (30 or 70 dB) and gain (0 and -20 dB) settings.
fetal heart which may have rates as high as 180 beats/minute. While modern ultrasound equipment may be able to image at high frame rates (>50 frames/second), most display systems are linked to video standards of 25-30 frames/second. There also are physical limits on how fast an image field may be updated based on the number of scan lines, the depth of field, the number of focal zones used to make the image and the type of post-processing or signal averaging occurring. For example, Doppler imaging requires additional processing and signal averaging to improve signal-to-noise ratios, which can reduce frame rates. The propagation of sound in tissue places limits on how many scan lines can be produced per unit time for a given field of view. For example, a 128 scan line image for a 10 cm field of view can produce a maximum of 60 images per second based solely on pulse propagation considerations. If the number of scan lines is increased to improve spatial resolution then the frame rate drops accordingly.
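The acoustic limit on frame rate can be checked directly; the short calculation below reproduces the 128-line, 10 cm, 60 images per second example just given.

```python
# The acoustic limit on frame rate: each scan line must wait for echoes from the
# deepest point before the next transmit. Reproduces the text's example of a
# 128-line image over a 10 cm field of view giving ~60 images per second.

C_TISSUE = 1540.0  # m/s

def max_frame_rate(depth_cm, n_lines, focal_zones=1, c=C_TISSUE):
    round_trip_s = 2.0 * (depth_cm / 100.0) / c      # time to collect one echo line
    time_per_frame = round_trip_s * n_lines * focal_zones
    return 1.0 / time_per_frame

print(f"{max_frame_rate(10, 128):.0f} frames/s for 128 lines, 10 cm, 1 focal zone")
print(f"{max_frame_rate(10, 128, focal_zones=3):.0f} frames/s with 3 focal zones per line")
print(f"{max_frame_rate(20, 256):.0f} frames/s for 256 lines, 20 cm")
```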
2.3.2 Cine-loop Visualization

Structure motion can be evaluated by observing motion on the display. Storage of a series of images into a video memory for subsequent redisplay as a
"cine-loop" facilitates evaluation of repeated motion events such as cardiac contraction. Cine-loop capability is essential for systems that can produce images at greater than 30 frames/second since all the images are stored in the cine memory and can be played back at slower rates that display all the images.
2.3.3 M-mode Visualization

Alternatively, it is possible to place an echo line at a specific location in an image and obtain repeated line samples over time. Originally called motion-mode (M-mode), a trace is displayed showing the change in echo intensity as a function of time (Fig. 12). Quantitative measurements are possible with this approach.
FIG.12. Cardiac imaging showing an M-mode display. The upper right image shows the anatomy being imaged. The dotted line shows the location in the image for which a time motion display is produced which is shown in the bottom trace. The vertical lines represent one second intervals. The periodic motion of the heart can be clearly seen. From the M-mode display measurements of wall thickness, valve motion, changes in chamber dimension, etc. may be made.
2.4 Blood Flow Visualization
A more sophisticated means of measuring motion, particularly of blood, is to extract the phase shift information from the returning echo (Wells, 1994; Kremkau, 1990; Taylor et al., 1987), the Doppler shift (Δf):

Δf = 2 f v cos(θ) / c        (2)
where θ is the angle between the ultrasound beam and the motion. The measured Doppler shift for the returning echo depends on the transmitted frequency (f) and the velocity component of the motion (v) occurring along the beam axis. Since blood cells generally do not have sufficiently strong echoes to be visible in the displayed image they are seldom visualized directly. However, the intensity of the Doppler shift is often strong enough to be detected. There are several methods used to obtain and display velocity information. The maximum velocity (V_max) is defined by the pulse repetition frequency:
V_max = (c / f) PRF        (3)
There is a tradeoff between the sensitivity to flow and aliasing due to the PRF and Nyquist sampling limits.
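Equation (2) and the Nyquist limit can be evaluated for representative values. In the sketch below the 3 MHz transmit frequency, 60° beam-to-flow angle, 0.5 m/s velocity and 4 kHz PRF are illustrative choices, not values taken from the text.

```python
import math

# Equation (2) and the Nyquist/PRF limit evaluated for representative values.
# Transmit frequency, beam-to-flow angle, blood velocity and PRF below are
# illustrative choices, not values from the text.

C_TISSUE = 1540.0  # m/s

def doppler_shift_hz(f_tx_hz, v_mps, angle_deg):
    """Equation (2): delta_f = 2 f v cos(theta) / c."""
    return 2.0 * f_tx_hz * v_mps * math.cos(math.radians(angle_deg)) / C_TISSUE

def alias_free_velocity_mps(f_tx_hz, prf_hz, angle_deg):
    """Largest velocity measurable without aliasing, from eq. (2) with the
    Nyquist limit delta_f = PRF / 2."""
    return (prf_hz / 2.0) * C_TISSUE / (2.0 * f_tx_hz * math.cos(math.radians(angle_deg)))

f_tx, angle, prf = 3.0e6, 60.0, 4000.0
print(f"0.5 m/s flow -> Doppler shift {doppler_shift_hz(f_tx, 0.5, angle):.0f} Hz (audible)")
print(f"PRF {prf:.0f} Hz -> aliasing above {alias_free_velocity_mps(f_tx, prf, angle):.2f} m/s")
```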
2.4.1 Duplex Doppler
Historically, Doppler information regarding moving blood was obtained using a continuous wave ultrasound probe and the Doppler shift presented as an audible signal. The typical Doppler shift is in the range between 10 and 1000 Hz, which is well within the audible range. Uncertainty regarding vessel location often complicated comprehension of the significance of audible patterns and required skilled practitioners. However, ultrasound imaging provides a clear tomographic picture of the patient anatomy that can be used to identify vessels of interest. Once a vessel has been identified, a single echo line with a range, or depth, gate is positioned in the image at the desired location in the vessel. The Doppler shift from echoes arising at the desired depth in the range-gate is measured and the amplitude displayed as a moving trace similar to that for M-mode but showing the velocity distribution versus time in the range gate (Fig. 13). The combination of ultrasound imaging with range-gate Doppler is called duplex imaging. The shape of the velocity distribution depends on the type of blood flow occurring in the vessel (Fig. 14). Near the heart blood flow changes rapidly and is most like a plug of fluid moving at a constant velocity with a narrow velocity distribution, whereas farther away from the heart viscous effects dominate and a parabolic wavefront develops with a range of velocities. Downstream from constrictions and the
FIG. 13. Duplex Doppler image of the carotid artery. The upper image is used to identify the vessel of interest and locate a range-gate as shown by the angled line and small box in the vessel. The time-varying velocity is shown on the trace below. It is necessary to correct for the relative orientation between the ultrasound pulse, as depicted by the large angled box in the image, and the flow channel in order to make the cos(θ) correction necessary to obtain a correct velocity measurement. The relative brightness in the Doppler trace represents the relative distribution of velocities within the range-gate. (Image courtesy of Siemens Ultrasound.)
resultant high velocity jets we find a broad distribution of frequencies reflecting the turbulent nature of the blood flow.
2.4.2 Velocity Doppler Imaging

A further advance in visualization of blood flow was achieved with the introduction of velocity Doppler imaging (Maslak and Freund, 1991; Wells, 1994). In velocity Doppler imaging the frequency shift within a region-of-interest of the ultrasound image is computed on a pixel by pixel basis and displayed as a color on the image whose hue depends on the direction and whose saturation depends on the velocity component measured (Fig. 15). Several parameters may be calculated depending on the desired goal including: the mean velocity, the
FIG. 14. A diagram depicting the types of flow observed in vessels and their Doppler velocity distributions. Plug flow is observed near the aortic outflow where there are large velocity gradients over time with inertial terms dominating the flow evolution. Further downstream the viscous effects begin to dominate, resulting in the classic laminar flow parabolic wavefront. High velocity jets often form downstream from a stenosis (or narrowing) in the vessel, and the resultant turbulent flow pattern produces a wide distribution of velocities and a complex flow pattern.
peak velocity and the variance. The sensitivity of velocity Doppler imaging techniques depends on the number of scatterers and their velocity. Velocity Doppler imaging provides direct visual information regarding the location of velocity components within the field of view, often with sensitivities that make detection of small vessels possible even if not visible in the gray-scale image. Typically, velocity Doppler imaging shows the average velocity within the image plane. Also, due to the low amplitude of the returning echoes, some amount of spatial and temporal averaging is employed to improve the signal-to-noise ratio at the expense of reduced spatial and temporal resolution compared to the grayscale image. In addition, the sampled nature of the process limits the maximum velocity that may be measured for a given sensitivity, making it difficult to obtain good images of small vessels or slow flow without introducing aliasing in larger vessels or significant color noise into the images. Image optimization requires a skilled operator.
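The hue/saturation convention described above can be sketched as a simple mapping from a signed axial velocity to a display color. The red/blue assignment and linear scaling below are assumptions for illustration, not any particular vendor's color map.

```python
# Sketch of the color mapping described above: hue encodes flow direction
# relative to the beam, saturation encodes the magnitude of the velocity
# component. The red/blue convention and linear scaling are assumptions,
# not a particular vendor's color map.

def velocity_to_rgb(v_mps, v_scale_mps=0.5):
    """Map a signed axial velocity to an (R, G, B) triple in 0..255."""
    saturation = min(abs(v_mps) / v_scale_mps, 1.0)   # 0 = no flow, 1 = full scale
    base = int(round(255 * saturation))
    if v_mps >= 0:                                    # toward the transducer: red
        return (base, 0, 0)
    return (0, 0, base)                               # away from transducer: blue

for v in (0.45, 0.10, 0.0, -0.25):
    print(f"{v:+.2f} m/s -> RGB {velocity_to_rgb(v)}")
```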
Velocity Doppler Imaging
FIG. 15. Velocity Doppler image of the liver with color signals from velocity components of blood flow in the liver vessels. Colors represent direction with respect to the ultrasound beam and the direction of flow. Different color maps are used to display variance information regarding the distribution of frequencies in the vessel. Color is determined by the direction of flow, not whether the vessel is a vein or artery and color saturation is determined by the magnitude of the velocity component. (Image courtesy of Siemens Ultrasound.)
2.4.3 Power Doppler Imaging
The sensitivity of velocity Doppler imaging techniques depends on the number of scatterers and their velocity. Since it often is difficult to obtain good images of small vessels or slow flow due to the limited amount of signal in velocity Doppler imaging, a slightly different analysis approach based on the total energy (or power) in the returning echo can be used to extract information regarding blood flow. Instead of showing the mean or variance of the blood flow, power Doppler imaging computes the integral of the entire velocity distribution and displays the magnitude as a color and brightness value (Rubin et al., 1994) (Fig. 16). The angular dependence of velocity Doppler imaging is significantly reduced in power Doppler imaging since the integral is being evaluated instead of a specific
Power Doppler Imaging
FIG. 16. Power Doppler image of the placenta and umbilical cord. Power Doppler differs from velocity Doppler in that velocity Doppler generally presents an image of the mean velocity or variance of the blood flow in the vessel, while power Doppler integrates the entire velocity distribution and displays the magnitude of the blood flow. Integration of the entire velocity distribution gives power Doppler improved sensitivity compared to velocity Doppler imaging but generally does not provide an indication of flow direction. The angular dependence of velocity Doppler is significantly reduced in power Doppler because of the integration of all velocities in the vessel, limited only by the signal-to-noise ratio and non-perpendicular measurement of velocity.
velocity. The enhanced sensitivity of power Doppler imaging provides more detailed images of vascular anatomy but generally does not show directional information. The increased sensitivity also results in an increased susceptibility to artifact production due to patient motion and tissue movement.
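To make the distinction between the velocity and power estimates concrete, the following sketch (not taken from the chapter; the transmit frequency, sound speed and beam angle are illustrative defaults) computes the mean velocity, variance and integrated power from a single pixel's Doppler power spectrum.

```python
import numpy as np

def doppler_estimates(freqs_hz, spectrum, f0_hz=3.5e6, c_m_s=1540.0, theta_deg=0.0):
    """Illustrative estimates from one pixel's Doppler power spectrum.

    freqs_hz : Doppler shift frequencies (Hz), signed for flow direction.
    spectrum : spectral power at each frequency (arbitrary units).
    f0_hz, c_m_s, theta_deg : transmit frequency, sound speed and beam-to-flow
    angle (illustrative assumptions, not values from the text).
    """
    total_power = np.trapz(spectrum, freqs_hz)                 # power Doppler value
    mean_shift = np.trapz(freqs_hz * spectrum, freqs_hz) / total_power
    variance = np.trapz((freqs_hz - mean_shift) ** 2 * spectrum, freqs_hz) / total_power
    # Doppler equation: v = f_d * c / (2 * f0 * cos(theta))
    mean_velocity = mean_shift * c_m_s / (2.0 * f0_hz * np.cos(np.radians(theta_deg)))
    return mean_velocity, variance, total_power
```

Velocity Doppler displays derive from the mean (and variance) terms, whereas power Doppler displays only the integrated power, which is why the latter loses directional information but gains sensitivity.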
2.5 Enhanced Blood Flow Visualization
Visualization of blood flow in small vessels is often difficult, if not impossible, even for power Doppler imaging. Since these are often the vessels of greatest interest from a visualization standpoint, additional means are needed to enhance their signal-to-noise ratio and detectability. Two complementary technologies are
emerging that offer great promise to improve blood flow visualization: ultrasound contrast materials and harmonic imaging.
2.5.1 Contrast Materials
Recent work in the area of ultrasound contrast demonstrates considerable progress in visualizing small vessels and improving the visualization of larger vessels (Forsberg et al., 1994; Goldberg et al., 1994). Most ultrasound contrast agents are based on using a gas-filled micro-bubble to produce a large impedance discontinuity in the blood. The large discontinuity produces a strong scattering signal that can be readily detected by the echo-processing electronics (Fig. 17). The enhanced echo signal makes it possible to measure Doppler shifts and produce Doppler images with much greater sensitivity, which greatly facilitates visualization. The strong signal from the contrast material makes it possible to directly observe blood flow patterns in vessels. Additionally, enhanced tissue signals also are available from the contrast material as it passes through the capillary bed perfusing the tissue. Functional imaging of tissue perfusion and transit
(Figure 17 panels: Enhanced Blood Flow Visualization with Contrast - Grey-scale, Velocity Doppler, Power Doppler, Harmonic Imaging.)
FIG. 17. Enhanced blood flow visualization of a dog kidney using conventional, velocity and power Doppler and harmonic imaging with micro-bubble contrast. The strong scattering properties of micro-bubble contrast agents greatly improve detection of small vessels by all imaging modalities. Harmonic imaging offers superior spatial resolution to velocity and power Doppler imaging and provides significant suppression of the background grey-scale echo signal since imaging occurs at a harmonic of the fundamental excitation frequency. (Images courtesy of Siemens Ultrasound and Dr Robert Mattrey.)
time studies become possible to assist in visualization of underlying function and physiology.
2.5.2 Harmonic Imaging
Ultrasound imaging using micro-bubble contrast materials can provide improved sensitivity and visualization of flow (Burns et al., 1994). However, while the enhanced echo signal is readily detected by Doppler techniques, when imaged using grey-scale methods, under some circumstances, vessels filled with contrast material exhibit speckle patterns and intensities similar to the surrounding tissues. Observations that micro-bubbles absorb ultrasound as a resonant process have led to the development of imaging systems that can detect the re-radiated ultrasound at harmonics of the excitation frequency. When the imaging system is tuned to receive only the harmonic frequencies, the non-resonant tissue signals are significantly reduced, resulting in very high signal-to-noise ratio images having excellent contrast (Fig. 17).
2.6 Artifacts
Visualization of patient anatomy relies on obtaining an accurate representation of the tissues and organs. The basic physics of ultrasound propagation and reflection also give rise to artifacts that can distort or alter the images used for diagnostic interpretation. The coherent imaging properties of ultrasound inherently produce constructive and destructive interference patterns, which are called speckle (Trahey et al., 1986; Wells and Halliwell, 1981; Zagzebski, 1983). The precise speckle pattern produced depends on the frequency, transducer geometry, sidelobe amplitudes, aperture, tissue micro-structure properties, overlying and underlying tissues, orientation of interfaces that give rise to refractive changes in the pulse path, and motion, which can give rise to aliasing with Doppler measurements or images (Fig. 18).
2.7 Measurement and Quantification
An important part of ultrasound visualization is providing images of sufficient quality to measure the length, area or volume of organs and temporal changes. While visual assessment is valuable, quantitative data provides a more accurate basis for decision making and comparison against previous studies or reference data bases.
2.7.1 Length
Ultrasound images are tomographic in nature and permit ready measurement of length within the image. The frequency of the transducer, field-of-view and the
(Figure 18 panels: Imaging Artifacts - Speckle Texture in a uniform phantom at 2.5, 3.5 and 4.0 MHz, Shadowing; Doppler Artifacts - Aliasing, Flash.)
FIG.18. Some examples of imaging artifacts commonly encountered in ultrasound visualization. Speckle is a fundamental part of all ultrasound images and the speckle pattern changes with frequency, transducer geometry, sidelobe amplitudes, aperture, tissue micro-structure properties. Shadowing also is a common ultrasound artifact occurring when a highly reflective or attenuating structure reduces the intensity of the beam for deeper structures such as is shown by the kidney stone in the upper right image. Pulsed Doppler has a specific range over which it can provide valid velocity information dependent on the Nyquist frequency which is determined by the Fourier transform sampling rate and pulse repetition frequencies. Velocity scale adjustments are used to avoid aliasing as is shown in the lower left image. Generally, there is a tradeoff between the maximum un-aliased velocity that can be measured and the sensitivity to low velocity values. Notice the green part of the two images which represents aliasing as the peak velocity wraps around from the peak of the reddish scale to the peak of the greenish scale. The smaller range of velocities provides better visualization of lower velocity components at the expense of aliased higher velocities. While power Doppler does not suffer from aliasing artifacts its greater sensitivity to low flow and motion make it susceptible to flash or motion artifacts as is shown in the lower right image which is due to tissue motion arising from aortic pulsations.
limiting resolution of the scanner ultimately limit the accuracy of the measurement to a few percent and the precision of the measurement to ±0.1 mm. An image cursor is generally placed over the first location and marked. A second cursor is moved to the second location with the distance being directly read from the image display (Fig. 19).
2.7.2 Area
Measurement of area is also easily accomplished using a cursor in the image. An outline of the desired anatomic region is drawn using either a free-form closed
(Figure 19 panels: Measurement of Length and Area - HC/BPD, AC, FL; Measurement Results.)
FIG. 19. Measurements are an important part of many ultrasound examinations. These images show measurement of the fetal head circumference (HC) (dots) and bi-parietal diameter (BPD) (arrows), femur length (FL) (arrows) and abdominal circumference (AC) (dots). Each measurement is applied to the appropriate lookup table to determine the gestational age and a composite assessment utilizing all measurements is used to estimate the fetal age and make an intercomparison between the growths of each of the objects measured. Linear and planar measurements are standard on all clinical scanners.
curve or polygon (e.g. an ellipse whose shape, size and orientation are easily modified) method that permits the operator to directly measure the area of an organ or vessel without difficulty (Fig. 19).
2.7.3 Volume
Direct measurement of volumes is not possible with 2D methods since the scanner typically does not provide a means of connecting adjacent tomographic slices. There are algorithms based on assumptions of ellipsoidal geometry that are used to estimate organ volume (Brinkley et al., 1982a; Chan, 1993). On the other hand, volume sonographic data provide the opportunity to obtain quantitative data regarding organ size and volume. In distinction to 2D ultrasound imaging, which permits accurate distance measurement and limited volume (or circumference) measurement, volume sonography also permits volume measurement (Ariet et al., 1984; Elliott et al., 1996a; Favre et al., 1993; Gilja et al., 1994; Hodges et al., 1994; Hughes et al., 1996; King et al., 1991; Moritz et al., 1980, 1993; Nelson and Pretorius, 1993; Riccabona et al., 1995, 1996; Siu et al., 1993; Terris and Stamey, 1991). Most volume measurements made using conventional 2D ultrasound methods generally are accurate to within ±5% if the organs are regularly shaped (i.e. spherical) but are only accurate to within ±20% when irregularly shaped. Volume measurement is accomplished by masking, using either individual plane masks or an interactive volume tool, to limit the volume region of interest to only the object of interest. After the object is masked the voxels are summed and the matrix voxel scaling factors applied to determine the volume (Fig. 20). This approach permits measurement of distance and volume for regular, irregular and disconnected objects with an accuracy of better than 5% for regular and irregular objects and in vivo organs, relatively independently of the object size over several orders of magnitude. For small objects in which machine resolution is a significant factor, errors will increase accordingly. The absolute error will vary with the volume of the object and the size of the voxel. A larger field of view will result in larger voxels for the same matrix size with a corresponding increase in the absolute error. In general, the improved measurement accuracy afforded by volume sonographic methods makes possible accurate quantitative measurement of heart chambers, vessel dimensions and organ volumes.
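A minimal sketch of the masked-voxel summation described above, assuming a boolean mask and voxel dimensions given in millimetres (the function name and units are illustrative, not taken from the chapter):

```python
import numpy as np

def masked_volume_ml(volume, mask, voxel_size_mm):
    """Volume of a masked object: count voxels and apply the voxel scaling.

    volume        : 3D gray-scale array (used only to check shapes here).
    mask          : boolean 3D array, True inside the segmented object.
    voxel_size_mm : (dz, dy, dx) voxel dimensions in millimetres.
    Returns the object volume in millilitres (1 ml = 1000 mm^3).
    """
    assert volume.shape == mask.shape
    voxel_mm3 = float(np.prod(voxel_size_mm))
    return mask.sum() * voxel_mm3 / 1000.0
```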
2.7.4 Growth Tables - Fetal Age
Serial measurements acquired at different times also provide valuable insight regarding normal growth and development. Many scanners incorporate standardized growth tables for fetal development. Measurement data of specific anatomic structures (e.g. head circumference, femur length, etc.) are stored in a data base
Measurement of Volume (Bladder)
FIG.20. Measurements of organ volume have historically utilized approximations based on ellipsoidal or spherical geometries. As such volume measurements for irregular organs often have errors of 20% or larger. Utilizing slice masking under operator control or automatic border finding algorithms each plane of a volume data set is masked and the regions integrated to yield the volume measurement. 3DUS volume measurements can have errors less than 5%.
and at the conclusion of the scan a detailed display is provided showing the estimated gestational age for the fetus (Fig. 19).
2.8 Time Changes
Temporal changes visible in ultrasound images typically include structure motion (e.g. cardiac valves and muscle), blood flow (e.g. carotid artery blood flow) and serial changes (e.g. fetal growth and tumor response to therapy) as was described in Section 2.3.
2.8.1 Cardiac Function
Cardiac function is a particularly important area of ultrasound visualization and analysis. As mentioned previously, M-mode displays can be used to assess valve
and contraction motion changes over the cardiac cycle (Fig. 10). It is also possible to monitor the change in geometry of the cardiac chambers throughout the cardiac cycle and measure the amount of blood ejected and other physical parameters associated with contraction dynamics. While much of this quantitative data can be, and is, presented as numerical values, visualization often provides a much better representation of normal or abnormal function (Fig. 21).
2.8.2 Waveform Indices
Quantitative evaluation of blood flow typically relies on Doppler techniques to obtain direct measures of velocity in the vessel. Depending on the size of the range-gate the Doppler spectrum may be narrow or wide. Measurement of the peak and average velocity is possible directly from the Doppler waveform and in some cases the mean velocity over the cardiac cycle (MEAN) (Fig. 22).
Visual Display of Cardiac Motion (HP Kinetic Display)
FIG.21. Moving structures such as found in the heart can be measured with display of the relative motion as is seen in this image showing motion of the endo-cardial surface of the left ventricle of the heart. The color represents the relative time delay for the motion and as such provides a visual indication of uniformity of contraction. (Images courtesy of Hewlett-Packard.)
FIG. 22. A diagram of common Doppler ultrasound measurements for a pulsatile waveform (velocity versus time). The indices are angle independent subject only to signal-to-noise limitations. As such, they provide a meaningful method of comparing blood flow data when the precise angle is not known.
Owing to the difficulty of exactly determining the relative angle between the ultrasound beam and the vessel axis, particularly in tortuous vessels, several indices have been defined that compare values at two different times in the cardiac cycle: typically end-systole (ES) and end-diastole (ED). These indices (i.e. the Pourcelot Index [(ES - ED)/ES] and the Pulsatility Index [(ES - ED)/MEAN]) are useful for two reasons: first, they tend to cancel out the angular dependence of the measurement since they are the ratio of two Doppler measurements obtained at the same angle; second, they can provide additional insight regarding vascular impedance, something that may not be directly visible from a single Doppler measurement.
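As a rough illustration, the indices can be computed directly from a sampled velocity envelope. The sketch below assumes one cardiac cycle of data and, for simplicity, takes the trace maximum and minimum as the end-systolic and end-diastolic values; this is an approximation for illustration only.

```python
import numpy as np

def waveform_indices(velocity_trace):
    """Pourcelot (resistive) and pulsatility indices for one cardiac cycle.

    velocity_trace : peak-velocity envelope sampled over a single cardiac
    cycle (any consistent units; the indices are dimensionless ratios).
    """
    es = np.max(velocity_trace)       # end-systolic value, approximated by the trace peak
    ed = np.min(velocity_trace)       # end-diastolic value, approximated by the trace minimum
    mean = np.mean(velocity_trace)    # time-averaged velocity (MEAN)
    pourcelot = (es - ed) / es        # (ES - ED) / ES
    pulsatility = (es - ed) / mean    # (ES - ED) / MEAN
    return pourcelot, pulsatility
```

Because both indices are ratios of velocities measured at the same beam angle, the cos(θ) factor cancels, which is exactly the angle-independence property noted above.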
3. Volume Visualization
Volume visualization methods project a multidimensional data set onto a 2D image plane with the goal of gaining an understanding of the structure contained within the volumetric data. Medical volume visualization techniques must offer
understandable data representations, quick data manipulation, and fast rendering to be useful to physicians. Physicians should be able to change parameters and see the resultant image in real time. Improved display hardware capability at affordable prices has made interactive visualization possible on workstations used in the medical imaging environment. Optimization of volume visualization algorithms is an important area of study, with an understanding of the fundamental algorithms essential to optimize analysis methods (Brodlie, 1991; Wolff, 1992a,b, 1993; Wood, 1992). Volume visualization methods for medical data, while available for some time in computed tomography (CT), single photon emission computed tomography (SPECT), positron emission tomography (PET), and magnetic resonance imaging (MRI) (Fishman et al., 1991), have not achieved widespread clinical use because of the time required to obtain and process high-resolution image data. Real-time 2D ultrasound imaging intrinsically provides interactive visualization of underlying anatomy while providing flexibility in viewing images from different orientations in real time. While real-time 2D sonography has made it possible for physicians to make important contributions to patient management, there are occasions when it is difficult to develop a three-dimensional impression of the patient anatomy, particularly with curved structures, when there is a subtle lesion in an organ, when a mass distorts the normal anatomy, and when there are tortuous vessels or structures not commonly seen. Complex cases often make it difficult for even specialists to understand three-dimensional anatomy based on two-dimensional images. Abnormalities may be difficult to demonstrate with two-dimensional sonography because of the particular planes that must be imaged to develop the entire 3D impression. Integration of views obtained over a region of a patient with volume sonography may permit better visualization in these situations and allow for a more accurate diagnosis to be made (Pretorius and Nelson, 1995b; Baba et al., 1997; Baba and Jurkovic, 1997). As a result, ultrasound volume visualization methods must offer interactivity in order to be competitive with current patient imaging equipment. Volume ultrasound visualization also has an important role in demonstrating normalcy and reassuring patients. Among the first areas of volume ultrasound imaging to be explored has been cardiology (Belohlavek et al., 1992, 1993a,b; Dekker et al., 1974; Fulton et al., 1994; Geiser et al., 1982b; Greenleaf et al., 1993; Levine et al., 1992; McCann et al., 1987, 1988; Ofili and Nanda, 1994; Pandian et al., 1992; Salustri and Roelandt, 1995; Seward et al., 1995; Stickels and Wann, 1984; Vogel et al., 1995) and more recently fetal cardiology (Deng et al., 1996; Nelson et al., 1996; Zozmer et al., 1996). The complex anatomy and dynamics of the heart make it a challenging organ to image. In addition, functional measurements regarding blood flow and ejection fractions make obtaining quantitative information essential to completely understand the heart (Ariet et al., 1984; Nikravesh et al., 1984). Other
areas that have also benefitted from volume ultrasound imaging have been the fetus (Devonald et al., 1995; Hamper et al., 1994; Kelly et al., 1994; Kuo et al., 1992; Merz et al., 1995; Nelson and Pretorius, 1992; Pretorius and Nelson, 1995b; Steiner et al., 1994; Warren et al., 1995), which also is a challenging and difficult to evaluate object (Crane et al., 1994), gynecology (Athanasiou et al., 1997; Balen et al., 1993), urology (Ng et al., 1994a; Tong et al., 1996) and the breast (Rotten et al., 1991). Specialized catheters for intra-cavitary (Feichtinger, 1993; Hünerbein et al., 1996; Wang et al., 1994) and intra-vascular (Delcker and Diener, 1994; Franceschi et al., 1992; Ng et al., 1994b; Pasterkamp et al., 1995; Rosenfield et al., 1991; von Birgelen et al., 1996) imaging and the capability for interventional procedures with biopsy guidance (State et al., 1996) are expanding the applications of volume imaging.
3.1 Volume Ultrasound Data Acquisition
Numerous efforts have focused on the development of 3D imaging techniques using ultrasound's positioning flexibility and data acquisition speed (Baba et al., 1989; Brinkley et al., 1982a; Fenster and Downey, 1996; Ganapathy et al., 1992; Geiser and Kaufman, 1982; Ghosh et al., 1982; Greenleaf et al., 1993; Kirbach and Whittingham, 1994; King et al., 1993; Nelson and Pretorius, 1997; Rankin et al., 1993). Much of this effort has targeted integrating transducer position information with the gray-scale sonogram. Multidimensional transducer technology will also benefit volumetric data acquisition and display (Davidsen et al., 1994; Jian and Greenleaf, 1994; Pearson and Pasierski, 1991; Shattuck and von Ramm, 1982; Smith et al., 1992; Snyder et al., 1986; Turnbull and Foster, 1992; von Ramm and Smith, 1990; von Ramm et al., 1991, 1994). Thus far ultrasound's image signal-to-noise properties and speckle characteristics have challenged researchers' abilities to define organ interfaces and vascular anatomy with sufficient accuracy to permit successful use of standard volumetric data analysis and display methods.
3.1.1 Image Plane Acquisition
The general features of a volume sonographic data acquisition system are shown in Fig. 23. In general, current volume sonographic imaging systems are based on commercially available one-dimensional or annular transducer arrays whose position is accurately monitored by a position-sensing device. Position data may be obtained from stepping motors in the scan head, a translation or rotation device or a position sensor that may be electromagnetic, acoustic or optical (Detmer et al., 1994; Hernandez et al., 1996a; King et al., 1990; Kossoff et al., 1995; Moskalik et al., 1995; Raab et al., 1979). During acquisition, images and position data are stored in a computer for subsequent reprojection into a volume
FIG. 23. A block diagram of a three-dimensional ultrasound imaging system. Image data are collected in synchronization with position data and stored in a memory system. Subsequently, position and image data are combined to build a volume which is processed, rendered and displayed on the graphics display.
data set. Depending on the type of acquisition used, the acquisition slices may be in the pattern of a wedge, a series of parallel slices, a rotation around a central axis such as from an endocavitary probe or in arbitrary orientations (Fig. 24). The amount of data stored depends on the duration of the acquisition and the number of images per second that are acquired. For example a static scan of the liver may only require 5-15 seconds of data at 10 frames/second while a cardiac examination may require 30-45 seconds of data at 30 frames/second. If color Doppler data is also acquired then additional storage may be required. Systems that image color Doppler data must also project both color and gray-scale data into the volume or separate the color from gray-scale data depending on the visualization task.
3.1.2 Volume Creation
After a series of 2DUS images are acquired, the volume is created by placing each image at the proper location in the volume (Fig. 25). Systems that acquire
FIG.24. Different types of volumetric ultrasound data acquisition. In each case the probe is moved, either by a motor drive or the sonographer, with position data generated. Freehand scanning requires a separate position sensor to encode the location of the transducer. Wedge, linear and rotational scanning determines position from the motor drive system used to change the array position.
each image using a mechanical stepping of the transducer essentially replicate the process to produce the volume while systems that use a freehand imaging technique must re-register each slice to the volume by more complex trigonometric calculations. Volume sonography imaging systems that integrate the volume transducer/positioning system with the sonographic imager have direct access to the scaling data present in creating the 2D image. Systems that add position sensing to commercially available systems require a calibration process at the initial setup in order to obtain quantitative data from the images. Incorporation of scaling data permits measurement of distance, area and volume in the data volume.
FIG.25. A block diagram of the process used to build a volume from individual scan planes. Individual pixels are scaled and combined with any offset values from the positioner. Rotation of the scaled pixel to the proper orientation positions the pixel into the proper part of the volume where the voxel at the position is updated by the pixel value.
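The reprojection step in Fig. 25 can be sketched as follows, assuming a rotation matrix and translation vector supplied by the position sensor or motor drive and an isotropic output grid. The nearest-voxel update shown here is the simplest possible compounding rule, and all names and conventions are illustrative rather than taken from a specific system.

```python
import numpy as np

def insert_plane(volume, image, pixel_mm, R, t, voxel_mm):
    """Place one 2D scan plane into a 3D voxel array (nearest-voxel update).

    volume   : 3D array being built (modified in place), indexed [z, y, x].
    image    : 2D gray-scale scan plane.
    pixel_mm : (dy, dx) pixel spacing of the scan plane in mm.
    R, t     : 3x3 rotation and length-3 translation (mm) mapping plane
               coordinates into the volume coordinate frame.
    voxel_mm : isotropic voxel size of the output volume in mm.
    """
    ny, nx = image.shape
    ys, xs = np.mgrid[0:ny, 0:nx]
    # Plane coordinates in mm; the plane lies at z = 0 in its own frame.
    pts = np.stack([xs * pixel_mm[1], ys * pixel_mm[0], np.zeros_like(xs, float)], axis=-1)
    world = pts @ R.T + t                                # rotate and translate into volume space
    idx = np.round(world / voxel_mm).astype(int)         # nearest voxel (x, y, z) indices
    valid = np.all((idx >= 0) & (idx < np.array(volume.shape)[::-1]), axis=-1)
    ix, iy, iz = idx[valid, 0], idx[valid, 1], idx[valid, 2]
    volume[iz, iy, ix] = image[valid]                    # last-written value wins at each voxel
    return volume
```

Averaging repeated contributions to the same voxel instead of overwriting them is one way to obtain the compounding benefit discussed in Section 3.1.4.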
3.1.3 Physiological Synchronization
Imaging cardiac dynamics or blood flow as a function of time in the cardiac cycle requires some method to synchronize the data to the appropriate time of the cardiac cycle. Generally additional data is obtained by using the electrocardiogram (ECG) signal from patient electrodes to provide a trigger signal to synchronize the data acquisition. Either an image at a specified point in the cardiac cycle
FIG. 26. Two images of an adult heart aortic valve in the open and closed positions. ECG gating is used to synchronize motion with the cardiac cycle. (Images courtesy of TOMTEC.)
can be obtained or the trigger signal can be used to separate all the acquired images into the appropriate part of the cardiac cycle (Fig. 26). It may also be necessary to eliminate ectopic or irregular heart beats from the study analysis. An alternate approach is to perform some type of analysis on the acquired data to extract information about the motion in the image field, such as that due to the periodic contraction of the heart. One approach uses a temporal Fourier analysis of the cardiac motion to identify the fundamental frequency of the heart motion and then use the phase information from the Fourier transform to identify the location of each beat within the acquired images (Nelson et al., 1995) (Fig. 27). This method does not require electrical connection to the patient and utilizes the acquired data to determine the periodic behavior. As such, it is a retrospective method, in contrast to real-time gating performed during the acquisition. In general, synchronization of images to physiologic triggers is demanding from a data storage, analysis and display point of view and is not widely available at this time.
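A simplified sketch of such a retrospective, image-based gating calculation is shown below. The region of interest, the 1-4 Hz search band and the use of the mean region-of-interest intensity as the periodic signal are illustrative assumptions, not the published method.

```python
import numpy as np

def estimate_heart_rate(frames, frame_rate_hz, roi):
    """Estimate the cardiac frequency from image data alone (no ECG).

    frames        : array of shape (n_frames, rows, cols) of gray-scale images.
    frame_rate_hz : acquisition frame rate.
    roi           : (row_slice, col_slice) region placed over the heart.
    Returns the dominant beat frequency (Hz) and its phase (radians), which can
    be used to assign each frame to a point in the cardiac cycle.
    """
    signal = frames[:, roi[0], roi[1]].mean(axis=(1, 2))   # mean ROI intensity vs time
    signal = signal - signal.mean()                        # remove the DC component
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / frame_rate_hz)
    # Search a physiologically plausible band (here 1-4 Hz, i.e. 60-240 bpm).
    band = (freqs > 1.0) & (freqs < 4.0)
    k = np.argmax(np.abs(spectrum) * band)
    return freqs[k], np.angle(spectrum[k])
```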
3.1.4 Compounding, Speckle Reduction and Filtering
The signal quality of ultrasound volume data can be improved by image compounding, which occurs when pixels in multiple planes are reprojected through the same voxel. However, precise image registration is necessary for accurate volume reprojection, which presumes that (1) the patient does not move, (2) accurate system calibration has occurred and (3) the velocity of sound is constant. Since the speckle intensity of ultrasound images often exceeds specular echo intensity, traditional boundary extraction and segmentation algorithms are
(Figure 27 panels A-D; axes: heart rate versus time.)
FIG.27. A method of determining the cardiac cycle timing using temporal Fourier analysis of the periodic cardiac motion. The upper left image shows a region of interest located over the fetal heart. The upper right image is a magnitude display of the temporal Fourier transform. The lower left image is a plot of the amplitude of the temporally summed values of the temporal Fourier transform showing the location of the fundamental cardiac beat frequency. The upper curve is a profile through the transform data. The lower right image is a time slice through the 2DUS acquisition in the region of the heart showing the heart motion (similar to the previously discussed M-mode) with the Fourier-based cardiac cycle synchronization shown as vertical lines corresponding to end-diastole.
difficult to implement. Image compounding reduces speckle intensity compared to 2D ultrasound and improves segmentation performance (Bashford and von Ramm, 1995; Trahey et al., 1986). After image data are acquired and assembled into a volume some form of interpolation may be necessary to fill in for any gaps due to separation between acquired images as a result of the sweep of the volume. This step is important since gaps in the volume are distracting and make interpretation of the volume data more difficult. The signal-to-noise ratio can further be improved by filtering the volume data with either 3D median or Gaussian filters prior to application of visualization algorithms (Pratt, 1991; Russ, 1992) (Fig. 28).
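For example, a 3 x 3 x 3 median filter or a Gaussian filter can be applied to the assembled volume with standard tools. The sketch below uses SciPy's ndimage filters with illustrative parameter choices.

```python
import numpy as np
from scipy import ndimage

def reduce_speckle(volume, method="median"):
    """Simple 3D speckle reduction prior to visualization.

    volume : 3D gray-scale array assembled from the acquired planes.
    method : "median" for a 3x3x3 median filter, "gaussian" for a Gaussian
             with a one-voxel standard deviation (illustrative settings).
    """
    if method == "median":
        return ndimage.median_filter(volume, size=3)
    return ndimage.gaussian_filter(volume.astype(float), sigma=1.0)
```

The median filter is usually preferred here because, as the figure caption below notes, it reduces speckle without significantly reducing resolution.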
(Figure 28 panels: Original Plane, Augmented Plane, Median Filtered.)
FIG.28. Images from a liver volume acquisition showing three orthogonal planes through the liver. The first vertical set of images is the original acquisition with gaps where no data were acquired. The center vertical set of images is the same data after application of a nearest neighbor augmentation algorithm to estimate the value of the missing data. The right vertical set of images is following application of a 3-pt cubic (3 x 3 x 3) median filter to the data. The median filter reduces speckle without significantly reducing resolution of the data.
3.2 Volume Visualization Methods
Although ultrasound volume data are well suited to visualization (Nelson and Elvins, 1993), the optimal method for physicians to review and interpret patient data has yet to be determined, in part because acoustic data do not represent density and cannot be classified like CT and MRI since tissues exhibit similar acoustic properties and have relatively lower signal-to-noise characteristics than other types of medical image data. Instead ultrasound signal intensity provides a differential measure of how the acoustic impedance changes as sound passes through tissues. Image intensity increases at the interface between tissues of two different impedance values. It is possible to transform ultrasound volume data into another parameter domain that more clearly differentiates structures in the data such as using Doppler-shifted data from regions of moving blood displayed
as color-coded images merged with gray-scale image data to differentiate vascular from soft tissue structures. Volume data are usually treated as an array of voxels with visualization algorithms sharing common steps. The initial step is data acquisition and volume creation as has already been described. Next volume data are normalized so that they cover a good distribution of values, are high in contrast, and are free of noise and out-of-range values. Generally, the same processing is applied to the entire volume uniformly. Although having a regular grid with identical elements has advantages, most medical imaging techniques use a rectilinear grid, where the voxels are axis-aligned rectangular prisms rather than cubes due to nonisotropic image resolution. Since the rendering process often needs to resample the volume between grid points, the voxel approach assumes that the area around a grid point has the same value as the grid point. This approach has the advantage of making no assumptions about the behavior of data between grid points. That is, only known data values are used for generating an image. Ideally, the data set is scaled so that the ratio of the dimensions is proportional to the ratio of the dimensions of the original object. It might be necessary to interpolate between values in adjacent slices to construct new slices, replicate existing slices, estimate missing values, or convert an irregular grid or nonorthogonal grid onto a Cartesian grid. Volume filtering, typically a cubic median filter, or some type of data enhancement is applied to ultrasound data because of the relatively poor signal-to-noise ratios. Depending on the visualization objective, data classification or thresholding might then be performed. After data classification, a mapping operation maps the elements into geometric or display primitives. This stage varies the most among volume visualization algorithms (Elvins, 1992b; Foley and Van Dam, 1982). Data classification results in primitives that can be stored, manipulated, intermixed with externally defined primitives, shaded, transformed to screen space, and then displayed. Shading and transforming steps can be reordered and done in several ways.
3.2.1 Volume Visualization Algorithms
There are three basic approaches to volume visualization (Kaufman, 1996; Kaufman et al., 1994): slice projection, surface fitting and volume rendering. Volume rendering methods include ray-casting, integration methods and splatting.
3.2.1.1 Slice Projection. Extraction of a planar image of arbitrary orientation at a particular location in a three-dimensional data set utilizes standard coordinate transformations and rotations. This is the most computationally straightforward approach to review data throughout the volume, requiring minimal
processing with isotropic data (Fig. 29). Slicing methods are one of several interactive techniques available for physician review of patient data (Jones and Min Chen, 1995). Interactive display of planar slices offers the physician retrospective evaluation of anatomy, particularly viewing of arbitrary planes perpendicular to the primary exam axis and other orientations not possible during data acquisition. Multiple slices displayed simultaneously can be particularly valuable to assist in understanding patient anatomy (Fig. 30). Typically, slicing methods are fully interactive, replicating the scanner operational “feel”. Multi-plane displays are often combined with rendered images to assist in localization of slice plane position.
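A minimal sketch of arbitrary-plane extraction by coordinate transformation and trilinear resampling is given below, assuming an isotropic volume and using SciPy's map_coordinates; the axis conventions, plane size and spacing are illustrative choices.

```python
import numpy as np
from scipy import ndimage

def extract_slice(volume, center, normal, up, size=256, spacing=1.0):
    """Extract an arbitrarily oriented plane from an isotropic volume.

    center : (z, y, x) voxel coordinates of the plane centre.
    normal : plane normal (z, y, x); 'up' seeds the in-plane row direction.
    The two in-plane axes are built by orthogonalizing 'up' against 'normal'.
    """
    n = np.asarray(normal, float); n /= np.linalg.norm(n)
    u = np.asarray(up, float)
    u = u - n * np.dot(u, n); u /= np.linalg.norm(u)      # first in-plane axis
    v = np.cross(n, u)                                    # second in-plane axis
    r = (np.arange(size) - size / 2.0) * spacing
    rr, cc = np.meshgrid(r, r, indexing="ij")
    coords = (np.asarray(center, float)[:, None, None]
              + u[:, None, None] * rr + v[:, None, None] * cc)
    # Trilinear interpolation (order=1); points outside the volume become 0.
    return ndimage.map_coordinates(volume, coords, order=1, mode="constant")
```

Because only one plane of samples is interpolated per view, this kind of slicing can remain fully interactive, which is what gives it the scanner-like "feel" described above.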
3.2.1.2 Surface Fitting. Surface fit algorithms typically fit planar surface primitives, such as polygons or patches, to values defined during the segmenting process (Ekoule et al., 1991; Kim and Jeong, 1996; Levoy et al., 1988, 1990a; Lorensen and Cline, 1987; Sakas and Walter, 1995; Sander and Zucker, 1986; Schroeder and Lorensen, 1996; Udupa et al., 1991). The surface fit approaches include contour-connecting, marching cubes, marching tetrahedra, dividing cubes, and others. Once the surface is defined interactive display is typically faster than volume rendering methods since fewer data points are used because surface fit methods only traverse the volume once to extract surfaces compared to re-evaluating the entire volume in volume rendering. After extracting the surfaces, rendering hardware and standard rendering methods can be used to quickly render the surface primitives each time the user changes a viewing or lighting parameter (Fig. 31).
(Figure 29 panels: Volume Slicing - Single Plane, Cube Surface, Clip Plane.)
FIG.29. Diagram of different methods of obtaining slice data from volume data. The first image is a single plane extracted from the volume at an arbitrary orientation. The middle image shows three clipping planes producing a sub-cube in the volume. The right image shows the combination of a single slice extracted with volume rendering behind the clip plane to reveal internal structure of the object.
(Figure 30 panels: Coronal, Axial, Sagittal, Rendered.)
FIG.30. Volume data for a 35 week fetal scan using the Kretz Combison 530 scanner. Three orthogonal slices are extracted simultaneously. The lower right image is a volume rendered image of the fetal face with a line showing the level of the axial plane in the rendered image. Combination of slices with volume rendered data offers a good method of identifying the specific location of a slice within a volume. The slices in this volume were reoriented prior to display to put them into a standard anatomic orientation which also assists comprehension of the data.
Changing the surface fit threshold value is time consuming because it requires that all of the voxels be revisited to extract a new set of surface primitives. Surface fit methods can suffer from occasional false positive and negative surface pieces, and incorrect handling of small features and branches in the data. Artifacts can be a serious concern in medicine since they could be incorrectly interpreted by physicians as features in the data. Although surface fitting provides a good means for visualizing spatial relationships for the entire volume in a readily comprehended manner, small features may be poorly visualized unless a significant number (>500K) of polygons are used. Large numbers of polygons slow down display performance to the point where volume rendering methods become superior.
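As an illustration of the surface-fitting step, the sketch below extracts an iso-surface mesh with the marching cubes routine from scikit-image; this is one possible implementation, not the one used by the systems described here, and the threshold simply plays the role of the segmentation value discussed above.

```python
import numpy as np
from skimage import measure   # scikit-image provides a marching cubes routine

def fit_surface(volume, threshold, voxel_size_mm=(1.0, 1.0, 1.0)):
    """Surface fitting by iso-surface extraction (a marching cubes variant).

    volume        : 3D gray-scale array (e.g. the filtered ultrasound volume).
    threshold     : iso-value chosen interactively by the user.
    voxel_size_mm : voxel dimensions, used to scale the mesh to physical units.
    Returns vertex, triangle and normal arrays that can be handed to standard
    rendering hardware and re-rendered quickly as viewing parameters change.
    """
    verts, faces, normals, values = measure.marching_cubes(
        volume, level=threshold, spacing=voxel_size_mm)
    return verts, faces, normals
```

Note that changing the threshold requires re-running this extraction over the whole volume, which is exactly the cost discussed above.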
(Figure 31 panels: Rendering of the Fetus - Tiled Iso-surface, Iso-surface Mesh, Ray-traced, Original Specimen.)
FIG. 31. A schematic of the process of surface rendering of a volume of data for a fetal specimen. The 2DUS scan is projected into a volume data set as has already been described. After determination of the appropriate threshold a polygonal iso-surface mesh is generated using standard surface rendering methods. A low resolution mesh is shown in the upper right image. Determination of the optimal threshold is an interactive process of adjustment and review of results. The final high resolution polygonal mesh may have greater than 250 000 polygons to preserve the fine detail of the data. A volume-rendered ray-traced image from the same data is shown in the bottom center image. The lower right image is a photograph of the original specimen showing the high fidelity possible with both types of rendering.
3.2.1.3 Data Classification and Segmentation. Data classification is the most difficult function that a volume visualization user has to perform. Data classification means choosing a threshold if you want to use a surface-fitting algorithm or choosing color (brightness) and opacity (light attenuation) values to go with each possible range of data values if you want to use a volume rendering algorithm. Accurate and automatic segmentation of ultrasound data, essential for high quality surface fitting, is a particularly difficult problem because most tissues in the data volume have similar signal characteristics, making it difficult to separate one tissue or organ from another based on signal intensity alone (Chen et al., 1995; Bovik, 1988; Bashford and von Ramm, 1995, 1996; Bamber et al., 1992; Cootes et al., 1994; Fine et al., 1991; Richard and Keen, 1996; Sakas
et al., 1994, 1995). Further, acoustic signal intensities vary with depth and tissue overburden so some form of equalization is necessary to stabilize image field uniformity. This task is further complicated by areas subject to acoustic shadowing. Segmentation based on signal void greatly simplifies extraction of structural features. Recent work by Baba (1997) has demonstrated fetal surfaces in near real time using simple thresholding to identify the fetal surface in the amniotic fluid. Blood flow data, whether from Doppler, contrast or signal void may be reprojected and processed in concert with or separately from three-dimensional tissue data (Ashton et al., 1996; Blankenhorn et al., 1983; Bruining et al., 1995; Carson et al., 1992; Cavaye et al., 1991; Downey and Fenster, 1995; Ehricke et al., 1994; Ferrara et al., 1996; Guo and Fenster, 1996; Hashimoto et al., 1995; Kitney et al., 1989; Klein et al., 1992; Picot et al., 1993; Pretorius et al., 1992; Ritchie et al., 1996; Selzer et al., 1989; Zhenyu et al., 1995). Segmentation of blood flow data is relatively straightforward compared to tissue and organ segmentation where interfaces are less visible. A user usually creates color and opacity tables after exploring a slice of data with a probe that prints out data values. The user enters ranges of data values along with preliminary color and opacity values, then renders a test image. Based on the test image the user adjusts the value ranges and the corresponding color and opacity values, then renders a second test image. This process is repeated many times until an acceptable image is generated. Interactive performance greatly assists this process by providing real-time feedback regarding parameter adjustment. Using colors that approximate the color of the tissue being studied can heighten both the “realism” and cognition process when there are stereotypical colors associated with a certain object, such as red for arteries and blue for veins (Fig. 32). Figure 33 shows a ray-cast image of liver vessels using a simple classification based on the power Doppler signal and negative contrast to differentiate between liver tissue and vessels. Slight changes in opacity values often have a large impact on the rendered image.
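The table-driven classification described above can be sketched as a simple lookup applied to 8-bit voxel values. The value ranges, colors and opacities in the example are purely illustrative; in practice they are tuned interactively by rendering test images, as described in the text.

```python
import numpy as np

def apply_transfer_tables(volume, ranges):
    """Map voxel values to color and opacity through user-defined tables.

    volume : 3D array of 8-bit gray-scale voxel values.
    ranges : list of (low, high, rgb, alpha) tuples covering the 0-255 range,
             e.g. [(0, 40, (0.0, 0.0, 0.0), 0.0),        # background: transparent
                   (41, 255, (1.0, 0.8, 0.7), 0.4)]      # tissue: semi-opaque
             (values here are illustrative only).
    Returns per-voxel RGB (floats 0-1) and opacity arrays for the renderer.
    """
    color_lut = np.zeros((256, 3))
    alpha_lut = np.zeros(256)
    for low, high, rgb, alpha in ranges:
        color_lut[low:high + 1] = rgb
        alpha_lut[low:high + 1] = alpha
    v = np.clip(volume, 0, 255).astype(np.uint8)
    return color_lut[v], alpha_lut[v]
```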
3.2.1.4 Volume Rendering. Volume rendering methods map voxels directly onto the screen without using geometric primitives (Cabral et al., 1995; Cohen et al., 1992; Kajiya and Von Herzen, 1984; Levoy, 1990; Sabella, 1988; Sarti et al., 1993; Steen and Olstad, 1994; Watt and Watt, 1992). One disadvantage of using volume rendering methods is that the entire data set must be sampled each time an image is rendered. Sometimes a low resolution pass is used to quickly create low-quality images for parameter checking. The most often used volume visualization algorithm for the production of high quality images is ray-casting. Ray-casting conducts an image-order traversal of the image plane pixels, finding a color and opacity for each (Drebin et al., 1988). Consider a ray passing
(Figure 32 panels: Rendered Carotid Vessels - Surface, Volume.)
FIG.32. Volume data from a velocity Doppler study of the carotid artery. The left image is a surface rendered image based on a defined threshold for the color velocity data. The right image is a volume rendered, ray-cast image of the same data.
through a voxel in a volume. The trajectory of the ray is defined by the relative viewing orientation of the observer and the volume. Along a particular ray, the intensity of the emerging ray is determined by all of the voxels lying along the ray path. At a specific voxel lying on the ray path we must consider several parallel processes. The value of the emerging ray (C_out) is determined by the shade (c(k)) and opacity (α(k)) of the voxel (V(k)) and the value of the incoming ray (C_in) (Fig. 34) and is given by
C_out = C_in (1 − α(k)) + c(k) α(k),    (4)
where α(k) is the opacity value based on a mapping function relating opacity to the voxel brightness. When α(k) = 0 the voxel is transparent and when α(k) = 1 the voxel is opaque. c(k) is a shade value based on the local gradient or some other modifying parameter for the voxel. V(k) describes the voxel's brightness and/or color. The specific rendering result depends on the choice of the mapping functions for α(k) and c(k). The opacities, shades and colors encountered along the ray are blended to find the opacity and color of the pixel (P(r)). Each element along a
Segmentation of Liver Vessels Using Signal Void and Power Doppler
FIG.33. Two approaches to segmentation of vessels. The left image uses the minimum intensity projection of a liver dataset to clearly show the 3D structure of the vessels. Minimum intensity projection is an important method for presenting echo-poor structures embedded in tissue, such as vessels, cysts, etc. requiring minimal parameter adjustment and capable of being performed very rapidly. The right image segments the vessels from the grey-scale data based on the power Doppler signal. Volume rendering of the flow signal produces a clear picture of the continuity of the vessel flows within the liver. (Minimum intensity projection image courtesy of Dr G. Sakas.)
ray contributes intensity and color to the final image. The overall process for a particular pixel (P(r)) along a ray (r) in the rendered image is given as

P(r) = Σ_{k=0}^{K} c(r,k) α(r,k) ∏_{j=k+1}^{K} [1 − α(r,j)],    (5)

where (r,k) is the kth voxel along the rth ray and

c(r,0) = C_background,   α(r,0) = 1.
In ray-casting, rays continue in a straight line until α(k) = 1 or the ray exits the rear of the volume, when the process ends since the projection is fully opaque and nothing further can be added to the image. No shadows or reflections are generated in ray-casting and these useful visual cues must be added to optimize visual presentation. Ray-casting methods can also be used to project the maximum intensity along a ray onto the plane. While ray-casting is CPU-intensive, the images show the entire data set depending on opacity and intensity values, not just a collection of thin surfaces as in surface fitting.
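A minimal back-to-front compositor for an axis-aligned viewing direction is sketched below; it applies equation (4) to every ray in parallel, where the rays are simply the columns of voxels along the depth axis. This is an illustrative sketch, not a production ray-caster, and arbitrary view directions would additionally require resampling the volume along each ray.

```python
import numpy as np

def raycast_axis_aligned(shade, alpha):
    """Back-to-front compositing along the first (depth) axis of the volume.

    shade : c(k) values, shape (depth, rows, cols), e.g. gray level scaled 0-1.
    alpha : alpha(k) values, same shape, from the opacity mapping function.
    Implements C_out = C_in(1 - alpha(k)) + c(k)alpha(k) for every ray at once.
    """
    image = np.zeros(shade.shape[1:])              # start from a black background
    for k in range(shade.shape[0] - 1, -1, -1):    # rear of the volume first
        image = image * (1.0 - alpha[k]) + shade[k] * alpha[k]
    return image
```

A front-to-back variant can additionally terminate each ray early once its accumulated opacity reaches 1, which is the optimization mentioned above.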
Ray Casting of Voxels
FIG. 34. A diagram showing the basic concepts of ray-casting. In ray-casting, the voxel intensity is propagated forward toward the viewing plane along each ray from back to front. Each voxel (V(k)) contributes to the final image intensity based on shading (c(k)) and transparency (α(k)) values. Prototype examples of the c(k) and α(k) mapping functions are shown in the small graphs. Once the accumulated opacity Σα(k) = 1, full opacity has been reached and further passage of the ray into the volume does not contribute to the final image.
Ray-casting can be parallelized at the pixel level since rays from all of the pixels in the image plane can be cast independently. Fourier transform methods offer performance and computational advantages for volume rendering (Levoy, 1992; Malzbender, 1993). Another volume-rendering algorithm called splatting performs a front-to-back object-order traversal of the voxels in the volumetric data set. Each voxel's contribution to the image is calculated and composited using a series of table lookups. Some splatting optimizations are given in Laur and Hanrahan (1991). Splatting also has the advantage that the viewer sees all of the data values. Maximum and minimum intensity projection (MIP) methods are one form of ray-casting where only the maximum (minimum) voxel value is retained as the ray traverses the data volume. These techniques are extremely simple to implement
and provide good quality results for many applications (Nelson and Pretorius, 1992, 1995; Pretorius and Nelson, 1994).
3.2.1.5 Viewing and Depth Cue Enhancement. Using orthographic views for medical visualization assures that physicians will not see patient data warped by the perspective transformation. When perspective is not used, other depth cues such as animation, depth-fog attenuation, depth-brightness attenuation, and shading become necessary. Most volume rendering and surface fit algorithms use some type of gradient shading. One method for finding the gradient is to use the Marr-Hildreth operator, which is the convolution of a Gaussian with a Laplacian:
∇(x,y,z) = ∇²(V(x,y,z) ∗ G((x,y,z), σ)),

where G((x,y,z), σ) is a three-dimensional Gaussian of width σ and ∗ denotes convolution.
The gradient can be used to approximate the normal to an imaginary surface passing through the voxel location. Most standard graphics shading models can be applied in shading elements once an approximate normal is known. Ambient, diffuse, and sometimes specular lighting components are used in the volume visualization shading process. As in other computer graphics applications, shading is an important factor in creating understandable volume images of anatomy. Exponential depth shading is also a useful technique to enhance the perspective of depth in a volume rendered image. In this approach, often used with MIP techniques, the intensity of the voxel is reduced by an amount based on an exponential reduction in intensity as a function of the voxel depth in the volume (Fig. 35):

without depth shading: P(r) = max_{k=0..K} V(r,k)

with depth shading: P(r) = max_{k=0..K} [V(r,k) · B(k)]    (7)

where B(k) is the attenuation factor at the kth location along the rth ray.
Depth shading reduces visual aliasing during rotation when near and far points of the volume cross. Figures 36, 37, 38, 39, and 40 demonstrate some examples of volume-rendered data sets using depth shading and different types of rendering algorithms.
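A sketch of depth-shaded maximum intensity projection in the spirit of eq. (7) is shown below, assuming the depth axis is the first array axis (index 0 nearest the viewer) and using an illustrative exponential attenuation coefficient for B(k).

```python
import numpy as np

def depth_shaded_mip(volume, mu=0.01):
    """Maximum intensity projection along the depth axis with exponential
    depth shading: P(r) = max_k V(r,k) * B(k), with B(k) = exp(-mu * k).

    volume : array of shape (depth, rows, cols); depth index 0 is nearest the viewer.
    mu     : illustrative per-voxel attenuation coefficient (an assumption here).
    """
    k = np.arange(volume.shape[0], dtype=float)
    weights = np.exp(-mu * k)[:, None, None]       # B(k), emphasizing near voxels
    weighted = volume * weights
    return weighted.max(axis=0)                    # P(r) for every ray at once
```

Setting mu to zero reproduces the unshaded projection, which, as Fig. 35 illustrates, looks identical from the 0° and 180° orientations and therefore aliases visually during rotation.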
3.2.1.6 Rendering of High-resolution Ultrasound Data. Most commercially available ultrasound scanners produce images of approximately 512 by 512 pixels with an 8-bit integer representing each pixel. A 512³ volume is
(Figure 35 panels: Maximum Intensity Projection with Depth Shading (0°, 180°) and without Depth Shading (0°, 180°).)
FIG. 35. Images showing maximum intensity projection comparing the effect of depth shading. All images are produced from the same data set. The two images on the left show front (0°) and back (180°) views of the 22 week fetal spine. Structures nearest the viewer are emphasized through use of an exponential weighting function (eq. 7). In comparison, the two righthand images are produced without a weighting function. The image appears the same from both orientations (0° and 180°), which results in visual aliasing during rotation when the rotating object appears to change direction (or orientation) spontaneously. Objects close to the back appear to be moving in the opposite direction to objects close to the front. With both near and far objects having the same intensity, the viewer is unable to determine the true position and location. Depth shading eliminates this difficulty.
134 megabytes in size. Rendering a data volume of this size in real time is untenable on affordable workstations and just a few of these data volumes would fill many current hard-disk storage devices. Since physicians need to be able to work with large data volumes to preserve resolution for diagnostic evaluation, progressive and adaptive rendering methods are used to provide intermediate rendering to adjust parameters followed by high resolution rendering for the final image. High-end computers integrated into a distributed medical imaging system can also be used so that large data volumes can be interactively manipulated and viewed over a network via a high-performance rendering engine or via a VRML web site (Elvins, 1992a, 1996; Elvins and Nadeau, 1991; Bapty et al., 1994; Garrett et al., 1996; Gorfu and Schattner, 1992). Interactive, user-directed erosion techniques where the user removes layers of three-dimensional data with the resulting volume immediately rendered so that the user can analyze and adjust the amount of data removed can be helpful in improving interactivity. If the user has only affected a small area of the volume, then the image is adaptively rendered to save CPU time. Allowing users to
(Figure 36 panels: Rendering of Fetal Face - Maximum Intensity, Surface, Cloud, X-ray.)
FIG. 36. Four images of a fetal face showing the effect of different types of rendering. The left image shows a maximum intensity projection image demonstrating internal structures such as bones. The second image from the left shows a surface rendered image using a combination threshold-gradient method that clearly shows surfaces, but sometimes produces a “stone-hard” surface. The third image from the left shows a semi-transparent surface “painted” with the original gray values of the volume, which preserves high detail and includes original gray-scale values on the 3D surface. The fourth image is an X-ray image, which is essentially an average of all the voxels along a given ray and is of limited value for most ultrasound data. (Images courtesy of Dr G. Sakas.)
separate structures of interest from surrounding diagnostically less important data enhances previously difficult to see anatomical features.
3.2.1.7 Animation. Animation sequences such as rotation and gated “cine-loop” review greatly assist volume visualization. Without the animation display of rendered volumetric images offered by real-time processing or pre-calculation, the physician often has a difficult time extracting three-dimensional information from two-dimensional displays. Motion can be viewed at normal, accelerated or reduced speed to enhance comprehension. Furthermore, analysis of dynamic function can increase the diagnostic value of many studies (Schwartz et al., 1994) (Fig. 41). Volume rendering methods make it possible to follow structural curvature that cannot be viewed in any planar orientation. Rotation of anatomy to a standard presentation facilitates identification of subtle anatomic landmarks by clarifying the
(Figure 37 panels: Rendering of Fetal Faces - 35 wks, 26 wks, micrognathia, cleft lip/palate.)
FIG. 37. A series of four fetal faces showing the superb quality of volume rendering methods in depicting fetal anatomy. These images were produced on a Kretz Combison 530 scanner. The right image clearly shows a cleft lip (arrow) which can be diagnostically challenging to identify using conventional 2DUS.
(Figure 38 panels: Rendering of Umbilical Cord and Placental Vessels - 2D Ultrasound, Volume Rendered.)
FIG.38. Volume rendering of power Doppler data of the placenta and umbilical cord. The left image is a single slice from the 2DUS acquisition. The right image is a volume-rendered image of the vascular anatomy. The gray-scale signal has been removed. The rendering is a modified maximum intensity method with depth-coding.
(Figure 39 panels: Rendering of Liver Vessels - 2D Ultrasound, Volume Rendered.)
FIG. 39. Volume rendering of power Doppler data of the liver. The left image is a single slice from the 2DUS acquisition. The right image is a volume rendered image of the vascular anatomy. The gray-scale signal has been removed. The rendering is a modified maximum intensity method with depth-coding.
(Figure 40 panels: Rendering of Polycystic Kidney - 2D Ultrasound, Volume Rendered.)
FIG.40. Volume rendering of power Doppler data of a polycystic kidney. The left image is a single slice from the 2DUS acquisition; the signal voids represent the cysts in the kidney. The right image is a volume rendered image of the vascular anatomy and the cysts. The rendering is a modified maximum intensity method with depth coding. Note that the relative position of the cysts and vessels is clearly shown in the volume rendered image.
(Figure 41 panels: Rotation of Fetal Heart Chambers About Vertical Axis; Rendered Fetal Heart Chambers Through the Cardiac Cycle - ED, ES, ED.)
FIG. 41. Imaging of the fetal heart. Volume rendered images of the cardiac chambers and vessels. The signal void in the original acquisition has been extracted to show only the blood signal. Chamber and vessel rendering uses a modified maximum intensity method with depth coding. The upper panel shows volume rendered images from several orientations. The lower panel contains a series of volume rendered images at different points in the cardiac cycle. Note the tricuspid valve (TV) which is clearly seen in each image. The TV is closed in the first lefthand image and proceeds to open during ventricular diastole in the next few frames as blood flows from the atria into the ventricle. Toward the end the TV is closed once again as blood is ejected into the pulmonary artery (PA). Interactive display of cardiac dynamics greatly assists comprehension of cardiac anatomy and function.
relative location of the overlapping structures. Interactive stereo viewing incorporating motion cues makes it possible to separate structures and identify the continuity of structures and enhances anatomic visualization (Adelson and Hansen, 1995).
3.2.1.8 Volume Editing Tools. Physicians often need to measure three-dimensional structures in a data volume. This is straightforward computationally once the user has identified the structure to be measured. Identifying the boundary of an organ or structure in 3D ultrasound data for morphometric analysis, however, benefits from the use of stereo displays and 3D input devices. Although segmentation of ultrasound data is difficult, visualization of high contrast objects such as by using Doppler imaging assists in extraction of vessel or organ features. For other situations, clear identification and differentiation of
organ interfaces can be difficult. Some of the systems being developed provide an editing box that can be positioned around the object of interest to exclude obscuring structures. Often these boxes are combined with a threshold control to further eliminate unwanted echoes. Another means of removing unwanted echoes is to use interactive editing tools that function in a volume rendered or virtual reality environment. In this situation, the physician uses an “electronic scalpel” to dissect away tissues that are not part of the organ of interest (Nelson et al., 1995) (Fig. 42). The dissection process is
Volume Editing with an Electronic Scalpel
FIG. 42. Volume editing of an 18 week fetal study with an electronic scalpel. The original volume data contain the uterus, the fetus and the amniotic fluid. Volume rendering methods cannot readily differentiate what is important from what is superfluous. Removal of unimportant objects is readily accomplished by editing either the individual slices or the volume directly. The electronic scalpel is designed to remove a few voxels between the viewer and the object of interest by direct application to the volume-rendered data. Alternatively, a single slice can be edited, as shown in the upper left-hand image slice with the contour drawn around the fetus. Once the contour has been defined, it is applied to the entire volume in that orientation, and similarly for the two other orthogonal slices. Direct editing of the volume is also used to refine the extraction of the fetus (or object of interest). Volume editing methods make it possible to completely and accurately extract the fetus from the surrounding tissue in less than 30 seconds, compared to 5-10 minutes for conventional single-slice editing. Furthermore, volume editing tools make it possible to correct any errors directly on the volume. Editing is conducted from any orientation, permitting rapid optimization of feature extraction.
greatly assisted by using volume-rendered data, compared to single-plane data, particularly if some type of stereoscopic display can be used to create the impression of working in a 3D environment. Under these conditions it is possible to rapidly remove unimportant tissue signals and isolate the structure of interest. Recent work using the National Library of Medicine Visible Human Project database has produced significant growth in the development of editing and visualization tools for volumetric anatomy data that will also assist visualization of volume ultrasound data.
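As a concrete illustration of the editing-box and threshold controls described above, the sketch below zeroes every voxel that lies outside a user-positioned box or falls below an echo threshold; the array layout, box coordinates and threshold value are assumptions made for the example rather than parameters of any particular system.

```python
# Illustrative editing box with threshold control: keep only voxels inside
# the box that exceed an echo threshold, so weaker or obscuring echoes no
# longer clutter the rendered object. Box limits and threshold are hypothetical.
import numpy as np

def apply_editing_box(volume, box, threshold):
    """box = ((z0, z1), (y0, y1), (x0, x1)) in voxel indices."""
    edited = np.zeros_like(volume)
    (z0, z1), (y0, y1), (x0, x1) = box
    region = volume[z0:z1, y0:y1, x0:x1]
    # Threshold control: suppress weak echoes inside the box as well.
    edited[z0:z1, y0:y1, x0:x1] = np.where(region >= threshold, region, 0)
    return edited

if __name__ == "__main__":
    vol = np.random.rand(64, 64, 64)
    cleaned = apply_editing_box(vol, ((10, 50), (8, 56), (8, 56)), threshold=0.6)
    print(cleaned.max(), cleaned[0, 0, 0])   # edited volume; corners are zeroed
```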
3.2.1.9 Volume Data User Interfaces. A key challenge in developing a clinically useful ultrasound visualization system is to provide the clinician with an interactive means of reviewing patient data, extracting the diagnostically important information from the study, and assisting physicians in observing and evaluating patient anatomy (Brady et al., 1995; Kaufman et al., 1993; Robb, 1988, 1989). One challenge in viewing volume data is to present it in a readily comprehensible, unambiguous manner. Volume rendering methods enhance comprehension by presenting information from the entire volume to the physician in a single view. To facilitate use in a clinical environment it is important to have an intuitive, easy-to-use interface with a rapid learning curve. Many current techniques for user-data interaction are awkward, non-intuitive and unsuited for use in a clinical setting. User interfaces should incorporate interactive review of data to permit optimization of viewing orientation and data presentation (Fuchs et al., 1989). Because of the computationally intensive nature of many visualization algorithms, either high performance computing or pre-calculation is required. Review of patient data must include the flexibility to rotate, scale and view objects from perspectives that optimize visualization of the anatomy of interest. Stereoscopic viewing systems let physicians utilize their binocular interpretative system to clarify structural relationships (Adelson and Hansen, 1995; Herman et al., 1985; Hernandez et al., 1995, 1996b; Howry et al., 1956; Lateiner and Rubio, 1994; Martin et al., 1995). This approach has been shown to enhance identification of small structures with greater confidence in less time. Our experience with interactive stereo displays has been encouraging. By integrating the display and interaction devices, physicians do not have to use a computer keyboard; rather, they interact with data volumes in 3D space via the stereo display. Touch-screen-based visualization systems can also assist interactivity. A key feature of touch-screen systems is the absence of keyboard and mouse input devices and the use of intuitive touch screens to provide physicians with an easy-to-learn, easy-to-use interface to the computer system. With this approach, the time needed to learn the system is significantly reduced and physicians experience an operational environment matching existing film-based viewing systems.
3.3 Optimization of Volume Ultrasound Data Visualization
Determining the optimal method for display of volume ultrasound data is challenging and continues to develop, since each choice of visualization method involves trade-offs. A difficulty with volume visualization of medical data has been that anatomic imaging systems generally produce a dense object, that is, one where the majority of voxels contain nonzero, although not necessarily clinically relevant, information. While CT or MRI image data often permit the practitioner to define a clearly identifiable threshold, ultrasound volume data are quite different. Often, different tissues have similar intensities, with only the interface between them being visible. An important feature of visualization is the interactive capability of the viewing station. The computationally intensive nature of many visualization algorithms requires either high performance computing or precalculation. The ability to review patient data interactively is critical, and in particular the flexibility to rotate, scale, and view objects from perspectives that optimize visualization of the anatomy of interest requires high performance viewing stations. Viewing planar slices from arbitrary orientations is a straightforward method of display for interactive review. It most closely resembles clinical scanning procedures. Additionally, planar slices offer projections that may not be available during patient scanning. Slice displays do not necessarily require intermediate processing or filtering of ultrasound image data, which further minimizes the delay before viewing. Surface fitting provides rapid evaluation of the overall surface features of the object but is sensitive to noise in the data; a bumpy or erratic surface significantly distorts features used in arriving at the correct diagnosis, which potentially limits its use in ultrasound imaging except in specialized applications such as vascular imaging. Volume rendering methods produce high quality images that are relatively tolerant of noise in the ultrasound data. Filtering can improve results as long as it does not obscure fine detail. Careful selection of opacity values helps provide an accurate rendition of the structure being studied. Transparency permits viewing surface and subsurface features, which can help in establishing spatial relationships. For specific applications, maximum intensity methods give a clear view of structures such as bones in the hands, spine, or ribs, although they are often best applied to selected regions of the volume. Spatially aligning two data sets sampled from the same patient (such as soft tissue and blood flow) can create hybrid images that improve the diagnostic process. This can be done by using an algorithm that weights the display based on values from the two data sets at every grid point, either during rendering or as a postprocess compositing step. Physician involvement in optimizing and enhancing visualization tools is essential as part of the ongoing evaluation of visualization techniques.
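The hybrid display described above, in which two spatially aligned data sets such as gray-scale tissue and power Doppler flow are weighted at every grid point, might be sketched as follows; the particular blending rule and gain factor are assumptions chosen for illustration, not a prescription from the text.

```python
# Sketch of per-voxel compositing of two registered volumes (e.g., gray-scale
# tissue and Doppler flow). Where flow is strong it dominates the display;
# elsewhere the tissue signal shows through. Both inputs are assumed to be
# co-registered and normalized to [0, 1].
import numpy as np

def composite(tissue, flow, flow_gain=1.5):
    weight = np.clip(flow * flow_gain, 0.0, 1.0)   # per-voxel weighting
    return weight * flow + (1.0 - weight) * tissue

if __name__ == "__main__":
    tissue = np.random.rand(32, 64, 64)
    flow = np.zeros_like(tissue)
    flow[10:20, 20:40, 20:40] = 0.9                 # synthetic vessel region
    hybrid = composite(tissue, flow)
    print(hybrid.shape)
```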
Stereographic viewing appears to be of clinical value in further enhancing visualization. Soon ultrasound imaging systems will acquire volume data directly into the ultrasound image workstation, where physicians and sonographers can immediately examine, visualize, and interpret the patient's anatomy. Clinical applications of volume ultrasound visualization are increasing. Real-time visualization of patient volume data by the physician can enhance the diagnostic process. However, researchers must address specialized problems, including improved visualization algorithms and real-time hardware, before widespread clinical use can occur. Regardless of which viewing technique is used, a key benefit of volume sonography is that once the patient has been scanned, the original 2D image acquisition data may be analyzed for the entire region or for magnified sub-regions without the need to rescan the patient.
4. Summary
Applications of ultrasound visualization in medicine are ubiquitous (Wells, 1993). With the advent of specialized intra-vascular, endo-tracheal, and endo-cavitary imaging probes, nearly every organ system is accessible to ultrasound scanning. Contrast materials further extend the range of diagnostic application. Although diagnostic ultrasound power levels have not been shown to produce bio-effects, recent work using high intensity focused ultrasound has shown potentially useful therapeutic value in producing localized tumor hyperthermia (ter Haar, 1995). An advantage of ultrasound visualization is that it offers tomographic imaging capability at fast update rates (10-100 images/second). Ultrasound visualization traditionally has relied on acquisition of images from a variety of orientations in which the operator has good eye-hand linkage to assist in feature recognition. As a result, ultrasound imaging has been one of the few areas of medical imaging that has not routinely used standardized viewing orientations, relying instead on the interactivity of the imaging process to optimize visualization of patient anatomy. Visualization of important landmarks is essential for interpretation and identification of anatomy, particularly for less skilled practitioners. A benefit of ultrasound volume visualization is that data review may be carried out at the console after the patient has left the clinic and the data reoriented to standard views; experience in our institution has shown that viewer comprehension and recognition of anatomy are enhanced by reorienting volume data to a standard anatomic position. An important part of ultrasound visualization is the ability to review patient data interactively. The flexibility to rotate, scale and view objects from perspectives that optimize visualization of the anatomy of interest is critical. Physician involvement in optimizing and enhancing visualization tools is essential in the ongoing evaluation of all these techniques.
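Reorienting an acquired volume to a standard anatomic position, as mentioned above, amounts to resampling the data under a rotation. A minimal sketch using SciPy is shown below; the rotation angles are arbitrary placeholders, since the standard orientation for a given study depends on the anatomy being examined.

```python
# Sketch of reorienting a volume to a standard viewing position by rotating
# it about two axes; the angles here are placeholders, not a clinical standard.
import numpy as np
from scipy.ndimage import rotate

def reorient(volume, tilt_deg=15.0, roll_deg=-30.0):
    # Rotate within the (row, col) plane, then within the (depth, col) plane.
    v = rotate(volume, tilt_deg, axes=(1, 2), reshape=False, order=1)
    return rotate(v, roll_deg, axes=(0, 2), reshape=False, order=1)

if __name__ == "__main__":
    vol = np.random.rand(32, 64, 64)
    print(reorient(vol).shape)   # shape preserved because reshape=False
```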
There is a continuing need for development of intuitive user interfaces and semiautomatic data-classification tools, so physicians can quickly learn and use volume visualization systems. Future volume visualization systems will need to render volumes in the context of the source of the data, incorporating images from different modalities for the same patient. Minimizing artifacts from volume visualization algorithms is a matter of diagnostic, ethical, and legal importance, since lingering flaws could produce images that lead to an incorrect medical diagnosis. Errors in images could have serious ramifications in surgical procedures that rely on multidimensional data. A standard means for validating algorithms and performance is essential to future visualization efforts. Optimization of ultrasound visualization depends on several factors. First, overall image quality depends on scanner setup and patient habitus; the distance between the structure and the transducer may lead to one side of the anatomy being imaged more clearly than the other. Second, acquisition of 2DUS images may be better in one plane than in others owing to the thickness of the US scan plane. Third, rendered image quality is affected by the anatomic orientation and beam pathway. Signal dropout from shadowing due to overlying structures can significantly obscure structural detail. Compounding data obtained from different orientations can minimize the adverse effects of signal dropout, although elastic deformation of tissues due to pressure from the transducer during scanning can distort and complicate realignment of scans. Volume visualization quality is highly dependent on 2D image quality; poor images result in poor volume data. If the patient moves during the scan, then the scan must be repeated or some form of motion correction must be applied. Importantly, faster affordable hardware is needed to shorten the time between acquisition and display. Some equipment is now becoming available in which the volume data are acquired directly into the rendering engine, permitting volume visualization immediately upon completion of the image acquisition, which will greatly enhance clinical acceptance. As experience with 3DUS increases, continued refinement of analysis and visualization software will further assist in making a more accurate diagnosis. Among the many potential advantages of volume ultrasound visualization compared to current real-time ultrasound methods are improved visualization of normal and abnormal anatomic structures, and evaluation of complex anatomic structures for which it is difficult to develop a 3D understanding. Reduced patient scanning times compared to current 2D techniques could increase the number of patients scanned, thereby increasing operational efficiency and permitting more cost-effective use of sonographers and equipment. Standardization of ultrasound examination protocols can lead to uniformly high quality examinations and decreased health care costs. Ultimately an improved understanding of ultrasound data offered by volume
ultrasound visualization may make it easier for primary care physicians to understand complex patient anatomy. Tertiary care physicians specializing in ultrasound can further enhance the quality of patient care by using high-speed computer networks to review volume ultrasound data at specialization centers. Access to volume data and expertise at specialization centers affords more sophisticated analysis and review, further augmenting patient diagnosis and treatment. Volume ultrasound visualization ultimately will increase patient throughput and reduce health care costs through decreased data acquisition time, improved understanding of complex anatomy and physiological spatial relationships, and, when necessary, rapid computer network transfer of data to more experienced physicians. Shared high-speed networks will connect examination rooms with specialists around the world, with volume data shared transparently between medical centers. Physicians in different cities will be able to collaboratively investigate 3D patient data in a shared environment, leading to instantaneous data analysis and diagnosis, which should further improve health care delivery.
4.1 Future Developments
Volume sonography is an area undergoing rapid development that represents a natural extension of conventional sonography, in which a series of 2D image slices must be integrated mentally to develop a 3D impression of the underlying anatomy or pathology. Interactive volume visualization enhances the diagnostic process by providing better delineation of complex anatomy and pathology. However, the interactivity essential to helping physicians comprehend patient anatomy and injury and quickly extract vital information requires affordable high performance computer graphics systems, which are now beginning to become available. Interactive manipulation of images by rotation, zooming in on localized features, or isolating cross-sectional slices greatly assists interpretation by physicians and allows quantitative measurement of volume or area. Sonographic volume imaging permits the sonographer to obtain images that may not be obtainable by conventional sonography due to limitations in patient position or anatomy. Volume sonography that displays images of anatomy and organs in an intuitively straightforward manner will allow physicians to feel as if they are holding a model of the organ in their hands, allowing them to "see" the organ or fetus as it actually is (Bajura et al., 1992; Fuchs et al., 1996; State et al., 1994) (Fig. 43). Dynamic motion can be reviewed by "slowing down" the images to assess fine movement; the valves or the walls of the heart could be artificially "stopped" to make measurements or to compare specific images, either volume or conventional planar images. Blood flow visualization through the cardiac cycle would be 3D rather than 2D. The volume sonography system of the future will not treat sonographic data as a series of 2D images viewed either in real time or statically, but as a volume with which the clinician rapidly interacts as
Virtual Reality Obstetrics Scan
FIG. 43. Future volume imaging in ultrasound, and other modalities, will offer real-time interactive review and manipulation of patient data. Instead of viewing patient data on a computer or scanner console, the sonographer or physician will directly view the internal patient anatomy using head-mounted displays that project the image onto the patient, as seen in this simulation of what the display would look like. Direct viewing of internal anatomy would facilitate more rapid comprehension of the patient's condition and provide improved feedback for interventional procedures.
though exploring internal patient anatomy directly. The technology to accomplish this is available today and will benefit from continued performance increases and cost reductions, ultimately providing volume sonographic imaging systems as standard equipment at costs comparable to those of conventional systems. Acquisition of volume ultrasound data affords the possibility of review after the patient has left the medical facility and allows communication of the entire volume via an interactive communications link to a specialist at a tertiary care center. The availability of volume sonographic data potentially could reduce the need to refer a patient to a specialized center by permitting the primary physician and the specialist to consult and interactively review the study, thus improving patient care and reducing costs. In addition, network review of volume data could reduce operator dependence in patient scanning, thereby standardizing examination protocols. Access to volume data at specialization centers may afford more sophisticated analysis and review, further augmenting patient diagnosis and treatment. Ultimately an improved understanding of patient anatomy
offered by volume sonography may make it easier for primary care physicians to understand complex patient anatomy.

ACKNOWLEDGMENTS
The author would like to thank Dr Dolores H. Pretorius for her assistance in collecting the patient data. The author also appreciates the assistance of Kretztechnik, Siemens Ultrasound, General Electric Medical Systems and Acuson Corporation.
REFERENCES AND FURTHER READING
Adelson, S. J., and Hansen, C. D. (1995). Fast stereoscopic images with ray-traced volume rendering. Proceedings 1994 Symposium on Volume Visualization, pp. 3-9, 125.
Ariet, M., Geiser, E. A., Lupkiewicz, S. M., Conetta, D. A., and Conti, C. R. (1984). Evaluation of a three-dimensional reconstruction to compute left ventricular volume and mass. Am. J. Cardiol., 54, 415-420.
Ashton, E. A., Phillips, D., and Parker, K. J. (1996). Automated extraction of the LV from 3D cardiac ultrasound scans. Proceedings of the SPIE, 2727, 423-429.
Athanasiou, S., Khullar, V., and Cardozo, L. (1997). Three-dimensional ultrasound in urogynecology. In Three-dimensional Ultrasound in Obstetrics and Gynecology (K. Baba and D. Jurkovic, Eds.), pp. 95-105. Parthenon, New York.
Baba, K., and Jurkovic, D. (1997). Three-dimensional Ultrasound in Obstetrics and Gynecology. Parthenon, New York.
Baba, K., Satoh, K., Sakamoto, S., Okai, T., and Ishii, S. (1989). Development of an ultrasonic system for three-dimensional reconstruction of the fetus. J. Perinat. Med., 17, 19-24.
Baba, K., Okai, T., Kozuma, S., Taketani, Y., Mochizuki, T., and Akahane, M. (1997). Real-time processable three-dimensional US in obstetrics. Radiology, 203(2), 571-574.
Bajura, M., Fuchs, H., and Ohbuchi, R. (1992). Merging virtual objects with the real world: seeing ultrasound imagery within the patient. Computer Graphics, 26, 203-210.
Balen, F. G., Allen, C. M., Gardener, J. E., Siddle, N. C., and Lees, W. R. (1993). 3-dimensional reconstruction of ultrasound images of the uterine cavity. British Journal of Radiology, 66, 588-591.
Bamber, J. C., Eckersley, R. J., Hubregtse, P., Bush, N. L., Bell, D. S., and Crawford, D. C. (1992). Data processing for 3-D ultrasound visualization of tumour anatomy and blood flow. SPIE, 1808, 651-663.
Bapty, T., Ball, B., and Abbott, B. (1994). Interactive parallel volume rendering. Proceedings of the 5th International Conference on Signal Processing Applications and Technology, 2, 1290-1294.
Bashford, G. R., and von Ramm, O. T. (1995). Speckle structure in three dimensions. Journal of the Acoustical Society of America, 98, 35-42.
Bashford, G. R., and von Ramm, O. T. (1996). Ultrasound three-dimensional velocity measurements by feature tracking. IEEE Transactions on Ultrasonics, Ferroelectrics and Frequency Control, 43, 376-384.
Belohlavek, M., Dutt, V., Greenleaf, J. F., Foley, D. A., Gerber, T. C., and Seward, J. B. (1992). Multidimensional ultrasonic visualization in cardiology. IEEE 1992 Ultrasonics Symposium (Cat. No. 92CH3118-7), pp. 1137-1145.
Belohlavek, M., Foley, D. A., Gerber, T. C., Greenleaf, J. F., and Seward, J. B. (1993a). Three-dimensional ultrasound imaging of the atrial septum: Normal and pathologic anatomy. J. Am. Coll. Cardiol., 22, 1673-1678.
Belohlavek, M., Foley, D. A., Gerber, T. C., Kinter, T. M., Greenleaf, J. F., and Seward, J. B. (1993b).
Three- and four-dimensional cardiovascular ultrasound imaging: A new era for echocardiography. Mayo Clin. Proc., 68, 221-240.
Blankenhorn, D. H., Chin, H. P., Strikwerda, S., Bamberger, J., and Hestenes, J. D. (1983). Common carotid artery contours reconstructed in three dimensions from parallel ultrasonic images. Work in progress. Radiology, 148, 533-537.
Bovik, A. C. (1988). On detecting edges in speckle imagery. IEEE Trans. Acoust., Speech, Signal Processing, 36(10), 1618-1627.
Brady, M. L., Higgins, W. E., Ramaswamy, K., and Srinivasan, R. (1995). Interactive navigation inside 3D radiological images. Proceedings 1995 Biomedical Visualization (Cat. No. 95TB100001), pp. 33-40.
Brinkley, J. F., McCallum, W. D., Muramatsu, S. K., and Liu, D. Y. (1982a). Fetal weight estimation from ultrasonic three-dimensional head and trunk reconstructions: Evaluation in vitro. Am. J. Obstet. Gynecol., 144, 715-721.
Brinkley, J. F., Muramatsu, S. K., McCallum, W. D., and Popp, R. L. (1982b). In vitro evaluation of an ultrasonic three-dimensional imaging and volume system. Ultrasonic Imaging, 4, 126-139.
Brodlie, K. W. (1991). Scientific Visualization - Techniques and Applications (K. W. Carpenter, L. A. Earnshaw, R. A. Gallop, J. R. Hubbold, R. J. Mumford, C. D. Osland, and P. Quarendon, Eds.), Springer-Verlag, Berlin.
Bruining, N., van Birgelen, C., Di Mario, C., Prati, F., Li, W., den Heed, W., Patijn, M., de Feyter, P. J., Serruys, P. W., and Roelandt, J. R. T. C. (1995). Dynamic three-dimensional reconstruction of ICUS images based on an ECG-gated pull-back device. Computers in Cardiology 1995 (Cat. No. 95CH35874), pp. 633-636.
Burns, P. N., Powers, J. E., Simpson, D. H., Brezina, A., Kolin, A., Chin, C. T., Uhlendorf, V., and Fritzsch, T. (1994). Harmonic power mode Doppler using microbubble contrast agents: an improved method for small vessel flow imaging. 1994 IEEE Ultrasonics Symposium Proceedings (Cat. No. 94CH3468-6), pp. 1547-1550.
Cabral, B., Cam, N., and Foran, J. (1995). Accelerated volume rendering and tomographic reconstruction using texture mapping hardware. Proceedings 1994 Symposium on Volume Visualization, pp. 91-98, 131.
Carson, P. L., Adler, D. D., Fowlkes, J. B., Hamist, K., and Rubin, J. (1992). Enhanced color flow imaging of breast cancer vasculature: Continuous wave Doppler, three-dimensional display. J. Ultrasound Med., 11, 377-385.
Cavaye, D. M., Tabbara, M. R., Kopchok, G. E., Laas, T. E., and White, R. A. (1991). Three-dimensional vascular ultrasound imaging. The Am. Surg., 57, 751-755.
Chan, H. (1993). Noninvasive bladder volume measurement. J. Neuroscience Nurs., 25, 309-312.
Chen, C. H., Lee, J. Y., Yang, W. H., Chang, C. M., and Sun, Y. N. (1995). Segmentation and reconstruction of prostate from transrectal ultrasound images. Biomedical Engineering, Applications Basis Communications, 8, 287-292.
Coatrieux, J. L., Toumoulin, C., Hamon, C., and Luo, L. (1990). Future trends in 3D medical imaging. IEEE Engineering in Medicine and Biology, December, pp. 33-39.
Cohen, M. F., Painter, J., Mehta, M., and Kwan-Liu, M. (1992). Volume seedlings. Proceedings of the 1992 Symposium on Interactive 3D Graphics, ACM Press, pp. 139-145.
Cootes, T. F., Hill, A., Taylor, C. J., and Haslem, J. (1994). The use of active shape models for locating structures in medical images. Image and Vision Computing, 12, 276-285.
Crane, J. P., LeFevre, M. L., Winbom, R. C., Evans, J. K., Ewigman, B. G., Bain, R. P., Frigoletto, F. D., and McNellis, D. (1994).
Randomized trial of prenatal ultrasound screening: Impact on detection, management and outcome of anomalous fetuses. Am. J. Obstet. Gynecol., 171, 392-399.
Davidsen, R. E., Jensen, J. A., and Smith, S. W. (1994). Two-dimensional random arrays for real time volumetric imaging. Ultrasonic Imaging, 16, 143-163.
Dekker, D. L., Piziali, R. L., and Dong, E., Jr. (1974). A system for ultrasonically imaging the human heart in three dimensions. Comput. Biomed. Res., 7, 544-553.
Delcker, A., and Diener, H. C. (1994). Quantification of atherosclerotic plaques in carotid arteries by three-dimensional ultrasound. British Journal of Radiology, 67, 672-678.
Deng, J., Gardener, J. E., Rodeck, C. H., and Lees, W. R. (1996). Fetal echocardiography in three and four dimensions. Ultrasound Med. Biol., 22(8), 979-986.
Detmer, P. R., Bashein, G., Hodges, T., Beach, K. W., Filer, E. P., Burns, D. H., and Strandness, D. E., Jr. (1994). 3D ultrasonic image feature localization based on magnetic scanhead tracking: in vitro calibration and validation. Ultrasound Med. Biol., 20, 923-936.
Devonald, K. J., Ellwood, D. A., Griffiths, K. A., Kossoff, G., Gill, R. W., Kadi, A. P., Nash, D. M., Warren, P. S., Davis, W., and Picker, R. (1995). Volume imaging: three-dimensional appreciation of the fetal head and face. J. Ultrasound Med., 14, 919-925.
Downey, D. B., and Fenster, A. (1995). Vascular imaging with a three-dimensional power Doppler system. Am. J. Roentgen., 165, 665-668.
Drebin, R. A., Carpenter, L., and Hanrahan, P. (1988). Volume rendering. Comput. Graphics, 22(4), 65-71.
Ehricke, H.-H., Donner, K., Koller, W., and Strasser, W. (1994). Visualization of vasculature from volume data. Computers & Graphics, 18, 395-406.
Ekoule, A. B., Peyrin, F. C., and Odet, C. L. (1991). A triangulation algorithm from arbitrary shaped multiple planar contours. ACM Trans. Graphics, 10(2), 182-199.
Elliott, T. L., Downey, D. B., Tong, S., McLean, C. A., and Fenster, A. (1996). Accuracy of prostate volume measurements in vitro using three-dimensional ultrasound. Academic Radiology, 3, 401-406.
Elvins, T. T. (1992a). Volume rendering on a distributed memory parallel computer. Proceedings Visualization '92 (Cat. No. 92CH3201-1), pp. 93-98.
Elvins, T. T. (1992b). A survey of algorithms for volume visualization. Computer Graphics, 26, 194-201.
Elvins, T. T. (1996). Volume visualization in a collaborative computing environment. Computers & Graphics, 20, 219-222.
Elvins, T. T., and Nadeau, D. R. (1991). NETV: An experimental network-based volume visualization system. Proceedings of Visualization '91, IEEE Computer Society, pp. 239-245.
Favre, R., Nisand, G., Bettahar, K., Grange, G., and Nisand, I. (1993). Measurement of limb circumferences with three-dimensional ultrasound for fetal weight estimation. Ultrasound Obstet. Gynecol., 3, 176-179.
Feichtinger, W. (1993). Transvaginal three-dimensional imaging. Ultrasound Obstet. Gynecol., 3, 375-378.
Fenster, A., and Downey, D. B. (1996). 3-D ultrasound imaging: a review. IEEE Engineering in Medicine and Biology Magazine, 15, 41-51.
Ferrara, K. W., Zagar, B., Sokil-Melgar, J., and Algazi, V. R. (1996). High resolution 3D color flow mapping: Applied to the assessment of breast vasculature. Ultrasound Med. Biol., 22, 293-304.
Fine, D., Perring, M. A., Herbetko, J., Hacking, C. N., Fleming, J. S., and Dewbury, K. C. (1991). Three-dimensional ultrasound imaging of the gallbladder and dilated biliary tree: reconstruction from real-time B-scans. Brit. J. Radiol., 64, 1056-1057.
Fishman, E. K., Magid, D., Ney, D. R., Chaney, E. L., Pizer, S. M., Rosenman, J. G., Levin, D. N., Vannier, M. W., Kuhlman, J. E., and Robertson, D. D. (1991). Three-dimensional imaging. Radiology, 181, 321-337.
Foley, J. D., and Van Dam, A. (1982). Fundamentals of Interactive Computer Graphics, ACM Press, New York.
Fornage, B. D., McGavran, M. H., Duvic, M., and Waldron, C. A. (1993). Imaging of the skin with 20-MHz US. Radiology, 189, 69-76.
Forsberg, F., Liu, J.-B., Merton, D. A., Rawool, N. M., and Goldberg, B. B. (1994). In vivo evaluation of a new ultrasound contrast agent. 1994 IEEE Ultrasonics Symposium Proceedings (Cat. No. 94CH3468-6), pp. 1555-1558.
Franceschi, D., Bondi, J., and Rubin, J. R. (1992). A new approach for three-dimensional reconstruction of arterial ultrasonography. J. Vasc. Surg., 15, 800-805.
Fuchs, H., Levoy, M., and Pizer, S. M. (1989). Interactive visualization of 3D medical data. IEEE Computer, August, pp. 46-51.
Fuchs, H., State, A., Pisano, E. D., Garrett, W. F., Hirota, G., Livingston, M., Whitton, M. C., and Pizer, S. M. (1996). Towards performing ultrasound-guided needle biopsies from within a head-mounted display. Visualization in Biomedical Computing, 4th International Conference, VBC '96 Proceedings, pp. 591-600.
Fulton, D. R., Marx, G. R., Pandian, N. G., Romero, B. A., Mumm, B., Gauss, M., Wollschlager, M., Ludomirsky, A., and Cao, Q.-L. (1994). Dynamic three-dimensional echocardiographic imaging of congenital heart defects in infants and children by computer-controlled tomographic parallel slicing using a single integrated ultrasound instrument. Echocardiography, 11(2), 155-164.
Ganapathy, U., and Kaufman, A. (1992). 3D acquisition and visualization of ultrasound data. Proceedings of the SPIE Conference on Visualization in Biomedical Computing, SPIE 1808, pp. 535-545.
Garrett, W. F., Fuchs, H., Whitton, M. C., and State, A. (1996). Real-time incremental visualization of dynamic ultrasound volumes using parallel BSP trees. Proceedings Visualization '96 (IEEE Cat. No. 96CB36006), pp. 235-240.
Geiser, E. A., Christie, L. G., Conetta, D. A., Conti, C. R., and Gossman, G. S. (1982a). A mechanical arm for spatial registration of two-dimensional echocardiographic sections. Catheterization and Cardiovascular Diagnosis, 8, 89-101.
Geiser, E. A., Ariet, M., Conetta, D. A., Lupkiewicz, S. M., and Christie, L. G. (1982b). Dynamic three-dimensional echocardiographic reconstruction of the intact human left ventricle: Technique and initial observations in patients. Am. Heart J., 103, 1056-1065.
Ghosh, A., Nanda, N. C., and Maurer, G. (1982). Three-dimensional reconstruction of echocardiographic images using the rotation method. Ultrasound Med. Biol., 8(6), 655-661.
Gilja, O. H., Thune, N., Matre, K., Hausken, T., Odegaard, S., and Berstad, A. (1994). In vitro evaluation of three dimensional ultrasonography in volume estimation of abdominal organs. Ultrasound Med. Biol., 20, 157-165.
Goldberg, B. B., Liu, J., and Forsberg, F. (1994). Ultrasound contrast agents: A review. Ultrasound Med. Biol., 20, 319-333.
Gorfu, Y., and Schattner, P. (1992). Parallel computation for rapid reconstruction of volumetric ultrasonic images. Proc. IEEE Workshop on Acoustics and Signal Processing. SRI International, Menlo Park, California.
Greenleaf, J. F. (1982). Three-dimensional imaging in ultrasound. Journal of Medical Systems, 6, 579-589.
Greenleaf, J. F., Belohlavek, M., Gerber, T. C., Foley, D. A., and Seward, J. B. (1993). Multidimensional visualization in echocardiography: An introduction. Mayo Clin. Proc., 68, 213-219.
Guo, Z., and Fenster, A. (1996). Three-dimensional power Doppler imaging: A phantom study to quantify vessel stenosis. Ultrasound Med. Biol., 22, 1059-1069.
Hamper, U. M., Trapanotto, V., Sheth, S., DeJong, M. R., and Caskey, C. I. (1994). Three-dimensional US: preliminary clinical experience. Radiology, 191, 397-401.
Hashimoto, H., Shen, Y., Takeuchi, Y., and Yoshitome, E.
(1995). Ultrasound 3-dimensional image processing using power Doppler image. 1995 IEEE Ultrasonics Symposium Proceedings, An International Symposium (Cat. No. 95CH35844), pp. 1423-1426.
Herman, G. T., Vose, W. F., Gomori, J. M., and Gefter, W. B. (1985). Stereoscopic computed three-dimensional surface displays. RadioGraphics, 5(6), 825-852.
Hernandez, A., Basset, O., Dautraix, I., Magnin, I., Favre, C., and Gimenez, G. (1995). Stereoscopic visualization of 3D ultrasonic data for the diagnosis improvement of breast tumors. 1995 IEEE Ultrasonics Symposium Proceedings, An International Symposium (Cat. No. 95CH35844), pp. 1435-1438.
Hernandez, A., Basset, O., Chirossel, P., and Gimenez, G. (1996a). Spatial compounding in ultrasound imaging using an articulated scan arm. Ultrasound Med. Biol., 22, 229-238.
Hernandez, A., Basset, O., Dautraix, I., and Magnin, I. E. (1996b). Acquisition and stereoscopic visualization of three-dimensional ultrasonic breast data. IEEE Transactions on Ultrasonics, Ferroelectrics and Frequency Control, 43, 576-580.
Hodges, T. C., Detmer, P. R., Burns, D. H., Beach, K. W., and Strandness, D. E., Jr. (1994). Ultrasonic three-dimensional reconstruction: in vitro and in vivo volume and area measurement. Ultrasound Med. Biol., 20, 719-729.
Howry, D. H., Posakony, G., Cushman, R., and Holmes, J. H. (1956). Three dimensional and stereoscopic observation of body structures by ultrasound. J. Appl. Physiol., 9, 304-306.
Hughes, S. W., D'Arcy, T. J., Maxwell, D. J., Chiu, W., Milner, A., Saunders, J. E., and Sheppard, R. J. (1996). Volume estimation from multiplanar 2D ultrasound images using a remote electromagnetic position and orientation sensor. Ultrasound Med. Biol., 22, 561-572.
Hünerbein, M., Below, C., and Schlag, P. M. (1996). Three-dimensional endorectal ultrasonography for staging of obstructing rectal cancer. Dis. Colon Rectum, 39(6), 636-642.
Jian-Yu, L., and Greenleaf, J. F. (1994). A study of two-dimensional array transducers for limited diffraction beams. IEEE Transactions on Ultrasonics, Ferroelectrics and Frequency Control, 41, 724-739.
Jones, M. W., and Min Chen (1995). Fast cutting operations on three dimensional volume datasets. Visualization in Scientific Computing, pp. 1-8.
Kajiya, J. T., and Von Herzen, B. P. (1984). Ray tracing volume densities. Comput. Graphics, 18(3), 165-174.
Kaufman, A. E. (1996). Volume visualization. ACM Computing Surveys, 28, 165-167.
Kaufman, A. E., Sobierajski, L. M., Avila, R. S., and He, T. (1993). Navigation and animation in a volume visualization system. Models and Techniques in Computer Animation, pp. 64-74.
Kaufman, A., Hohne, K. H., Kruger, W., Rosenblum, L., and Schroder, P. (1994). Research issues in volume visualization. IEEE Computer Graphics and Applications, 14, 63-67.
Kelly, I. M., Gardner, J. E., Brett, A. D., Richards, R., and Lees, W. R. (1994). Three-dimensional US of the fetus: work in progress. Radiology, 192, 253-259.
Kim, J. J., and Jeong, Y. C. (1996). An efficient volume visualization algorithm for rendering isosurface and ray casted images. 1996 Pacific Graphics Conference Proceedings, pp. 91-105.
King, D. L., King, D. L., Jr., and Shao, M. Y. C. (1990). Three-dimensional spatial registration and interactive display of position and orientation of real-time ultrasound images. J. Ultrasound Med., 9, 525-532.
King, D. L., King, D. L., Jr., and Shao, M. Y. (1991). Evaluation of in vitro measurement accuracy of a three-dimensional ultrasound scanner. J. Ultrasound Med., 10, 77-82.
King, D. L., Harrison, M. R., King, D. L., Jr., Gopal, A. S., Kwan, O. L., and Demaria, A. N. (1992). Ultrasound beam orientation during standard two-dimensional imaging: Assessment by three-dimensional echocardiography. J. Am. Soc. Echocardiogr., 5, 569-576.
King, D. L., Gopal, A. S., Sapin, P. M., Schroder, K. M., and Demaria, A. N. (1993).
Three-dimensional echocardiography. Am. J. Card. Imaging, 3, 209-220.
Kirbach, D., and Whittingham, T. A. (1994). 3D Ultrasound - the Kretztechnik Voluson Approach. European Journal of Ultrasound, 1, 85-89.
Kitney, R. I., Moura, L., and Straughan, K. (1989). 3-D visualization of arterial structures using ultrasound and voxel modelling. Intern. J. Cardiac Imag., 4, 135-143.
Klein, H.-M., Günther, R. W., Verlande, M., Schneider, W., Vorwerk, D., Kelch, J., and Hamm, M.
(1992). 3D-surface reconstruction of intravascular ultrasound images using personal computer hardware and a motorized catheter control. Cardiovasc. Intervent. Radiol., 15, 97-101.
Kluytmans, M., Bouma, C. J., ter Haar Romeny, B. M., Pasterkamp, G., and Viergever, M. A. (1995). Analysis and 3D display of 30 MHz intravascular ultrasound images. Computer Vision, Virtual Reality and Robotics in Medicine, First International Conference, CVRMed '95 Proceedings, pp. 406-412.
Kossoff, G. (1995). Three-dimensional ultrasound - technology push or market pull? Ultrasound Obstet. Gynecol., 5, 217-218.
Kossoff, G., Griffiths, K. A., and Kadi, A. P. (1995). Transducer rotation: a useful scanning manoeuvre in three-dimensional ultrasonic volume imaging. Radiology, 195, 870-872.
Kremkau, F. W. (1989). Diagnostic Ultrasound: Principles, Instruments, and Exercises. Third edition. W. B. Saunders Company, Philadelphia, Pennsylvania.
Kremkau, F. W. (1990). Doppler Ultrasound: Principles and Instruments (W. B. Saunders Staff, Eds.). W. B. Saunders Company, Philadelphia, Pennsylvania.
Krestel, E. (1990). Imaging Systems for Medical Diagnostics. Siemens Aktiengesellschaft, Berlin.
Kuo, H. C., Chang, F. M., Wu, C. H., Yao, B. L., and Liu, C. H. (1992). The primary application of three-dimensional ultrasonography in obstetrics. Am. J. Obstet. Gynecol., 166, 880-886.
Lateiner, J. S., and Rubio, S., II (1994). An efficient environment for stereoscopic volume visualization. Intelligent Engineering Systems Through Artificial Neural Networks, 4, 761-766.
Laur, D., and Hanrahan, P. (1991). Hierarchical splatting: A progressive refinement algorithm for volume rendering. Comput. Graphics, 25(4), 285-288.
Levine, R. A., Weyman, A. E., and Handschumaker, M. D. (1992). Three-dimensional echocardiography: techniques and applications. Am. J. Cardiol., 69, 121H-130H.
Levoy, M. (1988). Display of surfaces from volume data. IEEE Computer Graphics and Applications, 8(3), 29-37.
Levoy, M. (1990a). Volume rendering, a hybrid ray tracer for rendering polygon and volume data. IEEE Computer Graphics and Applications, 10, 33-40.
Levoy, M. (1990b). Efficient ray tracing of volume data. ACM Transactions on Graphics, 9, 245-261.
Levoy, M. (1992). Volume rendering using the Fourier projection-slice theorem. Proceedings Graphics Interface '92, pp. 61-69.
Lorensen, W. E., and Cline, H. E. (1987). Marching cubes: A high resolution three-dimensional surface construction algorithm. Comput. Graphics, 21(4), 163-169.
McCann, H. A., Chandrasekaran, K., Hoffman, E. A., Sinak, L. J., Kinter, T. M., and Greenleaf, J. F. (1987). A method for three-dimensional ultrasonic imaging of the heart in vivo. Dynamic Cardiovasc. Imaging, 1, 97-109.
McCann, H. A., Sharp, J. C., Kinter, T. M., McEwan, C. N., Barillot, C., and Greenleaf, J. F. (1988). Multidimensional ultrasonic imaging for cardiology. Proceedings of the IEEE, 76, 1063-1073.
Macovski, A. (1983). Basic ultrasonic imaging. In Medical Imaging Systems, pp. 173-224. Prentice-Hall, Englewood Cliffs, New Jersey.
Malzbender, T. (1993). Fourier volume rendering. ACM Transactions on Graphics, 12, 233-250.
Martin, R. W., Legget, M., McDonald, J., Li, X.-N., Leotta, D., Bolson, E., Bashein, G., Otto, C. M., and Sheehan, F. H. (1995). Stereographic viewing of 3D ultrasound images: a novelty or a tool? 1995 IEEE Ultrasonics Symposium Proceedings, An International Symposium (Cat. No. 95CH35844), pp. 1431-1434.
Maslak, S. H., and Freund, J. G. (1991). Color Doppler instrumentation.
In Vascular Imaging by Color Doppler and Magnetic Resonance (P. Lanza and A. P. Yoganathan, Eds.), Springer-Verlag, Berlin, pp. 87-123.
Merz, E., Bahlmann, F., and Weber, G. (1995). Volume scanning in the evaluation of fetal malformations: a new dimension in prenatal diagnosis. Ultrasound Obstet. Gynecol., 5, 228-232.
Moritz, W. E., Medema, D. K., Ainsworth, M. E., McCabe, D. H., and Pearlman, A. S. (1980). Three-
dimensional reconstruction and volume calculation from a series of nonparallel, real-time, ultrasonic images. Circulation, 62 (Suppl.), III-143.
Moritz, W. E., Pearlman, A. S., McCabe, D. H., Medema, D. K., Ainsworth, M. E., and Boles, M. S. (1993). An ultrasonic technique for imaging the ventricle in three dimensions and calculating its volume. IEEE Trans. Biomed. Engin., 30(8), 482-491.
Moskalik, A., Carson, P. L., Meyer, C. R., et al. (1995). Registration of three-dimensional compound ultrasound scans of the breast for refraction and motion corrections. Ultrasound Med. Biol., 21, 769-778.
Nelson, T. R. (1995). Synchronization of time-varying physiological data in 3DUS studies. Med. Physics, 22(6), 973.
Nelson, T. R., and Elvins, T. T. (1993). Visualization of 3D ultrasound data. IEEE Computer Graphics and Applications, 13, 50-57.
Nelson, T. R., and Pretorius, D. H. (1992). Three-dimensional ultrasound of fetal surface features. Ultrasound Obstet. Gynecol., 2, 166-174.
Nelson, T. R., and Pretorius, D. H. (1993). 3-dimensional ultrasound volume measurement. Med. Physics, 20(3), 927.
Nelson, T. R., and Pretorius, D. H. (1995). Visualization of the fetal thoracic skeleton with three-dimensional sonography: A preliminary report. AJR, 164, 1485-1488.
Nelson, T. R., and Pretorius, D. H. (1997). Interactive acquisition, analysis, and visualization of sonographic volume data. International Journal of Imaging Systems and Technology, 8, 26-37.
Nelson, T. R., Davidson, T. E., and Pretorius, D. H. (1995). Interactive electronic scalpel for extraction of organs from 3DUS data. Radiology, 197(P), 191.
Nelson, T. R., Pretorius, D. H., Sklansky, M., and Hagen-Ansert, S. (1996). Three-dimensional echocardiographic evaluation of fetal heart anatomy and function: Acquisition, analysis, and display. J. Ultrasound Med., 15, 1-9.
Ng, K. J., Gardener, J. E., Rickards, D., Lees, W. R., and Milroy, E. J. (1994a). Three-dimensional imaging of the prostatic urethra - an exciting new tool. Br. J. Urology, 74(5), 604-608.
Ng, K. H., Evans, J. L., Vonesh, M. J., Meyers, S. N., Mills, T. A., Kane, B. J., Aldrich, W. N., Jang, Y. T., Yock, P., Rold, M. D., Roth, S. I., and McPherson, D. D. (1994b). Arterial imaging with a new forward-viewing intravascular ultrasound catheter, II. Three-dimensional reconstruction and display of data. Circulation, 89, 718-723.
Nikravesh, P. E., Skorton, D. J., Chandran, K. B., Attarwala, Y. M., Pandian, N., and Kerber, R. E. (1984). Computerized three-dimensional finite element reconstruction of the left ventricle from cross-sectional echocardiograms. Ultrasonic Imag., 6, 48-59.
Ofili, O. E., and Nanda, N. C. (1994). Three-dimensional and four-dimensional echocardiography. Ultrasound Med. Biol., 20(8), 669-675.
Pandian, N. G., Nanda, N. C., Schwartz, S. L., Fan, P., and Cao, Q. (1992). Three-dimensional and four-dimensional transesophageal echocardiographic imaging of the heart and aorta in humans using a computed tomographic imaging probe. Echocardiography, 9, 677-687.
Pasterkamp, G., Borst, C., Moulaert, S. R., Bouma, C. J., van Dijk, D., Kluytmans, M., and ter Haar Romeny, B. M. (1995). Intravascular ultrasound image subtraction: A contrast enhancing technique to facilitate automatic three-dimensional visualization of the arterial lumen. Ultrasound Med. Biol., 21(7), 913-918.
Pearson, A. C., and Pasierski, T. (1991). Initial clinical experience with a 48 by 48 element biplane transesophageal probe. Am. Heart J., 122, 559-568.
Picot, P. A., Rickey, D.
W., Mitchell, R., Rankin, R., and Fenster, A. (1993). Three-dimensional colour Doppler imaging. Ultrasound Med. Biol., 19, 95-104.
Pratt, W. K. (1991). Digital Image Processing, J. Wiley, New York.
Pretorius, D. H., and Nelson, T. R. (1991). 3-Dimensional ultrasound imaging in patient diagnosis and management: The future. Ultrasound Obstet. Gynecol., 1(6), 381-382.
Pretorius, D. H., and Nelson, T. R. (1994). Prenatal visualization of cranial sutures and fontanelles with three-dimensional ultrasonography. J. Ultrasound Med., 13, 871-876.
Pretorius, D. H., and Nelson, T. R. (1995a). Three-dimensional ultrasound. Ultrasound Obstet. Gynecol., 5, 219-221.
Pretorius, D. H., and Nelson, T. R. (1995b). Fetal face visualization using three-dimensional ultrasonography. J. Ultrasound Med., 14, 349-356.
Pretorius, D. H., Nelson, T. R., and Jaffe, J. S. (1992). 3-Dimensional sonographic analysis based on color flow Doppler and gray scale image data: A preliminary report. J. Ultrasound Med., 11, 225-232.
Raab, F. H., Blood, E. B., Steiner, T. O., and Jones, H. R. (1979). Magnetic positioning and orientation tracking system. IEEE Trans. Aerospace and Electronic Systems, 15, 709-717.
Rankin, R. N., Fenster, A., Downey, D. B., Munk, P. L., Levin, M. F., and Vellet, A. D. (1993). Three-dimensional sonographic reconstruction: techniques and diagnostic applications. AJR, 161, 695-702.
Riccabona, M., Nelson, T. R., Pretorius, D. H., and Davidson, T. E. (1995). Distance and volume measurement using three-dimensional ultrasonography. J. Ultrasound Med., 14, 881-886.
Riccabona, M., Nelson, T. R., Pretorius, D. H., and Davidson, T. E. (1996). Three-dimensional sonographic measurement of bladder volume. J. Ultrasound Med., 15(9), 627-632.
Richard, W. D., and Keen, C. G. (1996). Automated texture-based segmentation of ultrasound images of the prostate. Computerized Medical Imaging and Graphics, 20, 131-140.
Ritchie, C. J., Edwards, W. S., Mack, L. A., Cyr, D. R., and Kim, Y. (1996). Three-dimensional ultrasonic angiography using power-mode Doppler. Ultrasound Med. Biol., 22, 277-286.
Robb, R. A., and Barillot, C. (1988). Interactive 3-D image display and analysis. SPIE, 939, 173-202.
Robb, R. A., and Barillot, C. (1989). Interactive display and analysis of 3-D medical images. IEEE Trans. Med. Imaging, 8, 217-226.
Rosenfield, K., Losordo, D. W., Ramaswamy, K., Pastore, J. O., Langevin, R. E., Razvi, S., Kosowsky, B. D., and Isner, J. M. (1991). Three-dimensional reconstruction of human coronary and peripheral arteries from images recorded during two-dimensional intravascular ultrasound examination. Circulation, 84, 1938-1956.
Rotten, D., Levaillant, J. M., Constancis, E., Collet-Billon, A., LeGuerine, Y., and Rua, P. (1991). Three-dimensional imaging of solid breast tumors with ultrasound: preliminary data and analysis of its possible contribution to the understanding of the standard two-dimensional sonographic images. Ultrasound Obstet. Gynecol., 1, 384-390.
Rubin, J. M., Bude, R. O., Carson, P. L., Bree, R. L., and Adler, R. S. (1994). Power Doppler US: A potentially useful alternative to mean frequency-based color Doppler US. Radiology, 190, 853-856.
Russ, J. C. (1992). The Image Processing Handbook, CRC Press, Boca Raton, Florida.
Sabella, P. (1988). A rendering algorithm for visualizing 3D scalar fields. Comput. Graphics, 22(4), 51-57.
Sakas, G., and Walter, S. (1995). Extracting surfaces from fuzzy 3D-ultrasound data. Computer Graphics Proceedings SIGGRAPH 95, 465-474.
Sakas, G., Schreyer, L.-A., and Grimm, M. (1994). Visualization of 3D ultrasonic data. Proceedings Visualization '94 (Cat. No. 94CH35707), pp. 369-373.
Sakas, G., Schreyer, L.-A., and Grimm, M. (1995). Preprocessing and volume rendering of 3D ultrasonic data. IEEE Computer Graphics and Applications, 15, 47-54.
Salustri, A., and Roelandt, J. R. T. C. (1995).
Ultrasonic three-dimensional reconstruction of the heart. Ultrasound Med. Biol., 21, 281-293.
Sander, T. S., and Zucker, S. W. (1986). Stable surface estimation. IEEE, 1165-1167.
Sarti, A., Lamberti, C., Erbacci, G., and Pini, R. (1993). Volume rendering for 3-D echocardiography visualization. Proceedings Computers in Cardiology 1993 (Cat. No. 93CH3384-5), pp. 209-212.
Schroeder, W., and Lorensen, B. (1996). 3-D surface contours. Dr Dobb's Journal, 7, 26-46.
Schwartz, S. L., Cao, Q.-L., Azevedo, J., and Pandian, N. G. (1994). Simulation of intraoperative visualization of cardiac structures and study of dynamic surgical anatomy with real-time three-dimensional echocardiography. Am. J. Cardiol., 73, 501-507.
Selzer, H., Lee, P. L., Lai, J. Y., and Frieden, H. J. (1989). Computer-generated three-dimensional ultrasound images of the carotid artery. Computers in Cardiology, pp. 21-26.
Seward, J. B., Belohlavek, M., O'Leary, P. W., Foley, D. A., and Greenleaf, J. F. (1995). Congenital heart disease: wide-field, three-dimensional, and four-dimensional ultrasound imaging. Am. J. Cardiac Imag., 9(1), 38-43.
Shattuck, D. P., and von Ramm, O. T. (1982). Compound scanning with a phased array. Ultrasonic Imaging, 4, 93-107.
Siu, S. C., Rivera, J. M., Guerrero, J. L., Handschumacher, M. D., Lethor, J. P., Weyman, A. E., Levine, R. A., and Picard, M. H. (1993). Three-dimensional echocardiography. In vivo validation for left ventricular volume and function. Circulation, 88(4), 1715-1723.
Smith, S. W., Trahey, G. E., and von Ramm, O. T. (1992). Two-dimensional arrays for medical ultrasound. Ultrasonic Imaging, 14, 213-233.
Snyder, J. E., Kisslo, J., and von Ramm, O. T. (1986). Real-time orthogonal mode scanning of the heart. I. System design. J. Am. Coll. Cardiol., 7, 1279-1285.
State, A., Chen, D. T., Tector, C., Brandt, A., Hong, C., Ohbuchi, R., Bajura, M., and Fuchs, H. (1994). Observing a volume rendered fetus within a pregnant patient. Proceedings Visualization '94 (Cat. No. 94CH35707), pp. 364-368.
State, A., Livingston, M. A., Garrett, W. F., Hirota, G., Whitton, M. C., Pisano, E. D., and Fuchs, H. (1996). Technologies for augmented reality systems: realizing ultrasound-guided needle biopsies. Computer Graphics Proceedings SIGGRAPH '96, pp. 439-446.
Steen, E., and Olstad, B. (1994). Volume rendering of 3D medical ultrasound data using direct feature mapping. IEEE Transactions on Medical Imaging, 13, 517-525.
Steiner, H., Staudach, A., Spinzer, D., and Schaffer, H. (1994). Three-dimensional ultrasound in obstetrics and gynaecology: technique, possibilities and limitations. Human Reprod., 9(9), 1773-1778.
Stickels, K. R., and Wann, L. S. (1984). An analysis of three-dimensional reconstructive echocardiography. Ultrasound Med. Biol., 10, 575-580.
Taylor, K. J. W., Burns, P. N., and Wells, P. N. T. (1987). Clinical Applications of Doppler Ultrasound, Raven Press, New York.
ter Haar, G. (1995). Ultrasound focal beam surgery. Ultrasound Med. Biol., 21, 1089-1100.
Terris, M. K., and Stamey, T. A. (1991). Determination of prostate volume by transrectal ultrasound. J. Urology, 145, 984-987.
Tong, S., Downey, D. B., Cardinal, H. N., and Fenster, A. (1996). A three-dimensional ultrasound prostate imaging system. Ultrasound Med. Biol., 22, 735-746.
Trahey, G. E., Allison, J. W., Smith, S. W., and von Ramm, O. T. (1986). A quantitative approach to speckle reduction via frequency compounding. Ultrasonic Imaging, 8, 151-164.
Turnbull, D. H., and Foster, F. S. (1992). Two-dimensional transducer arrays for medical ultrasound: beamforming and imaging. Proceedings of the SPIE - The International Society for Optical Engineering, 1733, 202-215.
Udupa, J., Hung, H. M., and Chuang, K. S. (1991). Surface and volume rendering in 3D imaging: A comparison. J. Digital Imaging, 4, 159-168.
Vogel, M., Ho, S. Y., Buhlmeyer, K., and Anderson, R. H. (1995).
Assessment of congenital heart defects by dynamic three-dimensional echocardiography; methods of data acquisition and clinical potential. Acta Paediatr., 410 (Suppl.), 34-39.
von Birgelen, C., Di Mario, C., Reimers, B., Prati, F., Bruining, N., Gil, R., Serruys, P. W., and Roelandt, J. R. T. C. (1996). Three-dimensional intracoronary ultrasound imaging. Methodology
and clinical relevance for the assessment of coronary arteries and bypass grafts. J. Cardiovasc. Surg., 37, 129-139.
von Ramm, O. T., and Smith, S. W. (1990). Real time volumetric ultrasound imaging system. SPIE, 1231, 15-22.
von Ramm, O. T., Pavy, H. G., Jr., Smith, S. W., and Kisslo, J. (1991). Real-time, three-dimensional echocardiography: The first human images. Circulation, 84 (Suppl. 2), II-685 (Abstract).
von Ramm, O. T., Durham, N. C., Smith, S. W., and Carroll, B. A. (1994). Real-time volumetric US imaging. Radiology, 193(P), 308.
Wang, X. F., Li, Z. A., Cheng, T. O., Deng, Y. B., Zheng, L. H., Hu, G., and Lu, P. (1994). Clinical application of three-dimensional transesophageal echocardiography. American Heart Journal, 128, 381-389.
Warren, P. S., Davis, W., and Picker, R. (1995). Volume imaging: three-dimensional appreciation of the fetal head and face. J. Ultrasound Med., 14, 919-925.
Watt, A., and Watt, M. (1992). Volume rendering techniques. In Advanced Animation and Rendering Techniques: Theory and Practice (P. Wegner, Ed.), pp. 297-321. ACM Press, New York.
Wells, P. N. T. (1993). The present status of ultrasonic imaging in medicine. Ultrasonics, 31, 345-353.
Wells, P. N. T. (1994). Ultrasonic colour flow imaging. Physics in Medicine and Biology, 39, 2113-2145.
Wells, P. N. T., and Halliwell, M. (1981). Speckle in ultrasonic imaging. Ultrasonics, 19, 225-229.
Wolff, R. S. (1992a). Volume visualization I: Basic concepts and applications. Comput. Physics, 6(4), 421-426.
Wolff, R. S. (1992b). Volume visualization II: Ray-tracing of volume data. Comput. Physics, 6(6), 692-695.
Wolff, R. S. (1993). Volume visualization III: Polygonally based volume rendering. Comput. Physics, 7(2), 158-161.
Wood, S. L. (1992). Visualization and modeling of 3-D structures. IEEE Engineer. Med. Biol., June, pp. 72-79.
Zagzebski, J. B. (1983). Images and artifacts. In Textbook of Diagnostic Ultrasonics (S. L. Ansert, Ed.), 2nd edn, pp. 58-60. Mosby Company, St Louis, Missouri.
Zhenyu Guo, Moreau, M., Rickey, D. W., Picot, P. A., and Fenster, A. (1995). Quantitative investigation of in vitro flow using three-dimensional colour Doppler ultrasound. Ultrasound Med. Biol., 21, 807-816.
Zosmer, N., Jurkovic, D., Jauniaux, E., Gruboeck, K., Lees, C., and Campbell, S. (1996). Selection and identification of standard cardiac views from three-dimensional volume scans of the fetal thorax. J. Ultrasound Med., 15, 25-32.
Patterns and System Development
BRANDON GOLDFEDDER
Emerging Technologies Consultants Inc., Mt Laurel, New Jersey
Abstract
The practice of developing software has changed significantly since early projects that had a set of fixed requirements, built a system, and then expected that system to be maintained for many years. Now we have market forces that drive competitiveness and a more mature user community with ever increasing expectations. This user community continues to expect systems to be built faster and to be more flexible than ever before. In developing systems, experienced developers find themselves continuing to face problems that are identical, or similar, to problems they have faced in the past. One of the major challenges we have is how to describe these problems and potential solutions so that they can be accessed and applied in a systematic manner. Patterns provide a possible means of capturing (or harvesting) this experience in a form that is readily transferable. This chapter provides an overview of what patterns are and how they can be applied as an invaluable tool in developing robust and extensible systems.
1. What are Patterns? . . . 256
2. Analysis: What is a Pattern? . . . 256
   2.1 Name . . . 257
   2.2 A Diversion on Form and Context . . . 257
   2.3 Problem . . . 259
   2.4 Solution . . . 259
   2.5 Forces . . . 259
   2.6 Context . . . 261
   2.7 Resulting Context . . . 261
   2.8 Sketch/Diagram . . . 261
   2.9 Rationale . . . 261
   2.10 Known Uses . . . 262
3. An Example Pattern: HandsInView . . . 262
4. Okay-So What Does This Have to do with Software? . . . 263
5. Applying Patterns . . . 267
   5.1 Solution 1 . . . 267
   5.2 Solution 2 . . . 273
   5.3 Analysis of Choices . . . 280
6. Beware: Misapplication of Patterns . . . 280
7. Reality Check . . . 280
8. Advantages of Patterns . . . 282
9. Applying Patterns in the Development Process . . . 283
10. Frameworks and Patterns . . . 284
11. Capturing Patterns . . . 285
    11.1 Observation . . . 285
    11.2 Capturing the Pattern Form . . . 286
    11.3 Refinement . . . 288
    11.4 Iteration . . . 290
12. Where Now? . . . 290
    12.1 Internet Resources . . . 290
13. Concluding Remarks . . . 290
Special Thanks . . . 291
References and Further Reading . . . 291
1. What are Patterns?

A pattern is a recurring architectural theme that provides a solution to a problem within a particular context. A good pattern captures a successful expert practice in an accessible, systematic form so that it can later be applied when developing systems. Patterns can describe software abstractions at a much higher level than traditional techniques, allowing developers to focus on the trade-offs involved in design by drawing on a large body of proven solutions, rather than spending their time reinventing known solutions.

Today, patterns are being used to capture, understand and document system architectures and frameworks. This applies to new systems as well as to a variety of legacy systems, so that these systems can be better understood. Patterns are also being used to better understand the organizations and processes that affect software development. Patterns are becoming more important as we recognize that there is a growing base of literature describing known solutions and that a significant commonality exists in the solutions we regularly apply. Projects are now beginning to see the power of expression that comes from embracing patterns.

Additionally, we are faced with many of our more knowledgeable developers leaving their present companies, either to move on to other opportunities or to retire. Their knowledge, if lost, presents a significant risk to their companies. Capturing that knowledge through patterns is being examined as one way to reduce this risk, allowing it to be shared with others.
2. Analysis: What is a Pattern?

The pattern is at the same time a thing, which happens in the world, and the rule, which tells us how to create that thing and when we must create it. It is both a process and a thing; both a description of a thing which is alive, and a description of the process that will generate that thing. [Ale79]
There are two major pieces to a pattern: the idea that it conveys and how the writer conveys it. An idea that has proven to be successful over time but that cannot be conveyed to the reader is useless. By the same token, a well-written pattern that fails to work in the real world is worse than useless, since it may send teams in the wrong direction (although, since by definition a "pattern" must have stood the test of time, having proved itself in real systems, this should not occur). First and foremost, a pattern must connect itself to the reader's mind so that it can be easily recalled and applied. As Alexander states:

We must make each pattern a thing so that the human mind can use it easily and so that it may take its part among the other patterns of our pattern language. [Ale79]
There are several formats in use to describe patterns. The two most common are the GOF form ("Gang of Four", meaning the format used by Gamma, Helm, Johnson and Vlissides, the authors of Design Patterns [Gam95]) and variations of the Portland form (so named because the three original submitters in this form were from Portland, Oregon). Regardless of the actual form, a good pattern format should contain these elements (although not all forms make them explicit):

• name;
• context;
• problem;
• solution;
• forces;
• resulting context;
• sketch/diagram;
• rationale;
• known uses.
We will describe each of these and their importance below.
2.1 Name
A pattern name should be short yet descriptive. It should strive to be a simple, tangible item so that it creates a meaningful image in the reader's mind.

So long as a pattern has a weak name, it means that it is not a clear concept, and you cannot clearly tell me to make "one." [Ale79]
2.2 A Diversion on Form and Context
A pattern consists of two parts: a form and a context (Fig. 1). The form is the solution to a problem; the context defines the problem. We can better view the form as the mapping of Problem to Solution, and the context as describing the fit in which the form is appropriate. It is important to realize that the division between form and context is arbitrary, or at the very least highly mutable, in software development.

FIG. 1. Form and context.

The form is a part of the world over which we have control, and which we decide to shape while leaving the rest of the world as it is. The context is that part of the world which puts demands on this form. [Ale64]
The form is the part of the world over which we have control. The context puts demands on the form, and we assume it is something over which we do not have control. In reality we have control over it as well, but we cannot hope to handle both the form and the context of the universe in one swoop. Think of the form as a puzzle piece. We want there to be a smooth fit, to make the puzzle itself more complete and stable. If we try to force the wrong piece, we risk breaking the whole puzzle. We can all think of modules of code that made the entire system stronger and more cohesive. Unfortunately, we can think of many more areas where modules appear to have been fitted by pure brute force, the result compromising the system. Our goal is to create an environment where each pattern becomes part of the whole, strengthening it but also living off it, much as the relationship between people and plants strengthens both.

A pattern does not exist in a vacuum. A pattern depends not only on itself but also on those patterns that it is composed of and those that compose it. These dependencies form connections that create a "pattern language." In the same way that a small number of words leads to the ability to express many complex ideas, a small number of patterns can yield many complex systems. The proper application of patterns should create a system which is in order and complete, but still extensible. A system of harmony should exist (Fig. 2).
FIG.2. An example system in harmony.
2.3 Problem

This is the specific problem which this pattern solves. While a problem statement should be somewhat abstract, it should avoid being too abstract up-front. Often a motivation section showing a concrete version of this problem aids the reader.
2.4 Solution
A pattern is a solution to a given problem. There may be many solutions to a problem, but the context should be distinct enough to yield a common solution. The solution may lead to a set of other problems that must be addressed by the application of other patterns (this is how a pattern language forms).

There are millions of particular solutions to any given problem; but it may be possible to find some property which will be common to all these solutions. [Ale79]
2.5 Forces

Form (a Problem-to-Solution mapping) has been described as a diagram of forces in which the forces are kept in check (Fig. 3); D'Arcy Thompson is credited with coining the phrase. When we introduce context (Fig. 4), it allows us to resolve the strong forces, yielding a more stable system. We can think of these forces as considerations that the pattern must reconcile. A good pattern should resolve the forces and build on other patterns, making them stronger. The right piece strengthens; the wrong one weakens. Gerard Meszaros defines forces as:

The often contradictory considerations that must be taken into account when choosing a solution to a problem. The relative importance of the Forces (those that need to be optimized at the expense of others) is implied by the context. [Mes96]
FIG. 3. Form as a set of (Problem -> Solution) mappings.
Understanding the forces at work in a situation is the key to developing a pattern. As Alexander states:

A pattern only works, fully, when it deals with all the forces that are actually present in the situation. [Ale79]

He also notes that:

If you can't draw a diagram [of forces], it isn't a pattern. [Ale79]
Again view the form as a puzzle piece. We can use the forces to help determine the appropriateness of each piece.
FIG. 4.

FIG. 5.
2.6 Context

The constraints on the solution. Context serves to prioritize the forces in a pattern, leading to different forms. This variation in form may lead to different solutions for the same problem. Context may include:

• environmental issues;
• language issues;
• organizational issues;
• platform issues.
2.7 Resulting Context
When the pattern is applied, the system as a whole changes. Often, other patterns need to be applied afterwards. The resulting context should be more meaningful than simply a statement that the problem was solved.
2.8 Sketch/Diagram
Most patterns need a sketch or other visual method of expression. This can be a picture, object notation or rough sketch. The important consideration is that people tend to connect with graphical information at a much deeper level.
2.9 Rationale
A good pattern works without requiring the applier to consciously think about the details of why it is working. It is essential, though, that the rationale be present so that we can understand why it works, better recognize other places where it can be applied and, more importantly, give the reader faith that it works.
2.10 Known Uses
Patterns are not invented. Instead they are discovered by looking at known solutions to significant problems. Because of this, it is essential that known uses be present to understand where this pattern was “harvested” from.
3. An Example Pattern: HandsInView

Let's look at an example pattern (Pattern 1) from a domain far removed from software development. This pattern, written by Don Olson at AG Communication Systems, is one of the best examples of a "good" pattern I have seen. It provides all the essential elements and demonstrates the "quality without a name" referred to by Alexander [Ale79].
Pattern Name: Hands in View

Problem
The skier fails to commit downhill on steeps and bumps, resulting in slides, backward falls, and "yard sales."

Context
In order to explore the entire mountain environment, a skier must be comfortable and adaptable on any terrain and through rapid terrain change. To take advantage of this pattern, the skier should be skiing at a level at which parallel turns can be linked consistently.

Forces
• Fear of falling is the most basic of all responses.
• Reliance on equipment is essential.
• Continuous movement is essential.
• Fatigue can be a factor in long descents.
• Commitment downhill over the skis is essential for skis to function as designed.

Solution
Concentrate on keeping the hands in view. Bring them into sight immediately after each pole plant and turn.

Resulting Context
Keeping the hands in view changes the alignment of the body from sitting timidly back and allowing the edges to skid out from under the skier. Thus, keeping the hands in view pulls the body forward and thus downhill, bringing the skier's weight over the downhill ski, forcing the edge to bite and turn.
Rationale
As steepness increases, the natural tendency of any sane person is to sit back against the hill and retain the perpendicularity the inner ear prefers. Unfortunately, skis must be weighted to perform as designed, the weight causing flex, which in turn pushes the edges into the snow in an arc, making a turn. Therefore, it is essential to "throw" oneself down the mountain and over the skis, depending on them to "catch" the fall as they bite into the snow to turn underneath the perpetually falling skier. Intellectually this can be clearly understood, but fear prevents execution. Concentrating on something as simple and indirect as "look at your hands" causes the desired behavior without directly confronting the fear. This is directly analogous to what occurs when an individual walks: the weight is thrown forward in a fall, with the consequent forward thrust of the leg to catch this fall, repeated for left and right sides in a continuous tension and release of yielding to gravity in order to defy it.

Known Uses
Successful reuse efforts

Related Patterns
Single Point of Documentation

Sketch
N/A

Author(s)
Don Olson, 95/07/07. Anonymous ski instructor somewhere in Utah: wherever you are, thanks for providing the breakthrough to better skiing for the author.
PATTERN 1.

4. Okay-So What Does This Have to do with Software?
First, let's look at what goes on in a typical design review with a new team. Often there is a fair amount of discussion of implementation details and language-specific issues (Fig. 6). This is because the pattern language that programmers are used to dealing with is devoted to implementation problems. Our developers have a good working vocabulary of implementation terms and concepts but are pitifully deficient in communicating design concepts.

FIG. 6. A design? review.

Probably the number one reason for this is the preoccupation with the text editor/compiler model that, for many developers, is the only understood tool. This results in the thinking process limiting itself to language syntax issues. Often design problems are solved in the code, resulting in awkward, hard-to-maintain cases, casting, and obtuse code. As Alexander states:

When a person is faced with an act of design, what he does is governed entirely by the pattern language which he has in his mind at that moment. Of course, the pattern languages in each mind are evolving all the time, as each person's experience grows. But at the particular moment he has to make a design, he relies entirely on the pattern language he happens to have accumulated up until that moment. His act of design, whether humble or gigantically complex, is governed entirely by the patterns he has in his mind at that moment, and his ability to combine these patterns to form a new design. [Ale79]
One reason that I believe this occurs is that I see several mistakes made in choosing a system software architect:

(1) Forgetting that a good software architect must also be a good programmer. I have seen software architects who really did not understand the trade-offs involved in the underlying tools, languages, etc. I realize more and more the powerful impact these have.
(2) Assuming that just because someone is a good programmer they are a good architect. This directly corresponds to Alexander's belief that an architect must also be a builder.
I recently heard a comment that "Bill must be a good architect, he used to write compilers." At first this sentence made sense to me, until I thought about it. Are the skill sets the same? There is a fundamental disconnection between programming and design in their communication activities. Programming is an independent exercise where the communication is between programmer and machine. Design, however, is a different exercise where the communication is between individuals. Many developers make the mistake of assuming that the syntax and semantics we use in speaking of implementation are rich enough to describe design. One common misunderstanding among developers is that design patterns are about "code reuse" rather than design reuse, simply because in software development the end product happens to be code.

It's surprising to me when I observe a new project that the team seems never to have developed a software system before, and spends a fair amount of time before productivity seems to set in. I believe that much of this time is spent developing a language that the team can use (although, as I recently discovered quite painfully, setting up the development environment properly accounts for far more of that time than is generally understood). Consider how much more effective a team is that is kept together on their second and subsequent projects. They often refer to pieces they have used in other projects when solving new problems. They start to relate solutions to each other in terms of other pieces they have built, and refer to these cases. This greatly assists those developers who understand the older project, AFNS (a fictitiously named system), but does nothing to assist those who do not (Fig. 7). However, there are several problems with this approach:
(1) The way in which people understand AFNS may differ.
(2) The solution applied in AFNS may not be appropriate here. A natural tendency is to reuse known solutions; however, without knowing the context in which a solution worked, we run a significant risk. I have observed several systems fail because the developers were unaware that the context of the previous approach did not apply.
(3) Anyone unfamiliar with AFNS is out of luck.

There are two aspects we need to address. The first is that, among the patterns in the developer's mind, a proper one is present. The second is that it is chosen over other, possibly less appropriate, solutions. This is where the importance of context comes in. It is important that the developer have the proper experience to evaluate the set of patterns available and, based on the context, choose the correct one.
FIG. 7. Building on previous experiences.
We need to increase the set of design tools available to developers. Design patterns attempt to compensate for this by giving developers a set of good working tools that can be systematically applied. More than merely a problem-to-solution mapping, a pattern includes a context to allow the developer to determine the appropriateness of the pattern. When a group knows a set of patterns, the vocabulary available to them increases. The design review becomes something like Fig. 8.
FIG.8. A design review using patterns.
Note that we have resolved the potential misunderstanding (1) by having a concise form that expresses the pattern. We address the ability to determine appropriateness (2) by explicitly recognizing the context. By using or building upon a common set of patterns, such as those available in Design Patterns or other sources, we can readily ensure that this knowledge can be easily communicated (3). To ensure this communication can occur, it is essential that developers be familiar with the 23 basic patterns expressed in Design Patterns, just as they are expected to understand common sorting and searching algorithms and data structures. I believe it is as legitimate (if not more so) to ask a software engineer "How would using an Observer pattern here affect us?" as to ask the space/time trade-offs of Quicksort versus Bubblesort. It is important to realize that this retooling of developers will take time. I have observed the paradigm shift ("paradigm", like "meta", is one of the most overused words in the current literature; in this case, however, its use is justified) that occurs when a whole team is trained in using Design Patterns. They can quickly translate their previous experience into a common vocabulary and easily document new patterns. This has had some amazing results in creating new cross-background teams [Gol96].
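For readers who have not yet met the pattern named above, the following is a minimal sketch of Observer in C++. It is illustrative only: it is not taken from this chapter's example, and the class and method names here are generic rather than drawn from any particular system.

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Observer: anything that wants to be told about a change implements update().
    class Observer {
    public:
        virtual ~Observer() {}
        virtual void update() = 0;
    };

    // Subject: keeps a list of observers and notifies them when its state changes.
    class Subject {
    public:
        virtual ~Subject() {}
        void attach(Observer* o) { observers.push_back(o); }
        void detach(Observer* o) {
            observers.erase(std::remove(observers.begin(), observers.end(), o),
                            observers.end());
        }
    protected:
        // Called by a concrete subject whenever its state changes.
        void notify() {
            for (std::size_t i = 0; i < observers.size(); ++i)
                observers[i]->update();
        }
    private:
        std::vector<Observer*> observers;
    };

Being able to ask what attaching another observer costs, or where notify() is triggered, is exactly the kind of design-level conversation the common vocabulary is meant to enable.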
5. Applying Patterns

I'm going to focus on a simple example that I started to use when introducing patterns in training. This example readily shows a simple pattern approach but keeps things at the code level. Imagine that you are developing an order processing system that handles orders which can be in any of the following modes: entered, submitted, rejected, or completed. For version 1 of this system, we only need to support submitted and rejected. Version 2 may need to add the other modes. An order provides many functions, including getStatus, editOrder, and processOrder, whose behavior may vary depending on the mode.
5.1 Solution 1
Solving this problem using a "traditional coding" approach would look like Fig. 9. (All code examples use a subset of C++ so that coding details do not obfuscate the examples.)

FIG. 9. The Order class.

A subsection of the code might include:

    class Order {
    public:
        enum Status { ENTERED, SUBMITTED, REJECTED, COMPLETED };
        Order();
        void setStatus(Status newStatus);
        Status getStatus();
        void editOrder();
        void processOrder();
        // ...
    private:
        Status status;
    };

    Order::Order()
        : status(ENTERED)
    {}

    Order::Status Order::getStatus()
    {
        return status;
    }

    void Order::setStatus(Status newStatus)
    {
        status = newStatus;
    }

    void Order::editOrder()
    {
        switch (status) {
        case ENTERED:
            // ... edit order
            break;
        case SUBMITTED:
            // ... edit submitted order
            break;
        case REJECTED:
            // ... edit a rejected order
            break;
        case COMPLETED:
            // TBD - NOT SUPPORTED IN V1.0
            // Maybe some stub code
            break;
        default:
            // ...
            break;
        }
    }

    void Order::processOrder()
    {
        switch (status) {
        case ENTERED:
            // ... entered order processing
            break;
        case SUBMITTED:
            // ... standard processing
            break;
        case REJECTED:
            // ... rejected processing
            break;
        case COMPLETED:
            // TBD - NOT SUPPORTED IN V1.0
            // Maybe some stub code
            break;
        default:
            // ...
            break;
        }
    }
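For later comparison, here is a brief sketch, not from the original text, of how client code might drive this version of Order. The point to note when we reach Solution 2 is that calling code of this shape is essentially unaffected by the redesign:

    #include <iostream>

    // Hypothetical client code exercising the switch-based Order class above.
    int main() {
        Order order;                      // starts in the ENTERED state
        order.editOrder();                // takes the ENTERED branch of the switch
        order.setStatus(Order::SUBMITTED);
        order.processOrder();             // takes the SUBMITTED (standard) branch
        std::cout << "status = " << order.getStatus() << std::endl;
        return 0;
    }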
The above example assumes the developer took advantage of the foreseen future requirements and created code stubs to anticipate that behavior, hoping to reduce future maintenance costs. This code is then released into production. There are many possible types of change for version 2 (V2) that can be anticipated as the system develops. We will consider several of these:

• adding a new mode for V2 that was foreseen (COMPLETED);
• adding a new mode for V2 that was not foreseen (SUSPENDED);
• not adding a foreseen mode in V2 (COMPLETED);
• the system not working properly in some mode (REJECTED);
• a maintainer recognizing common behavior and grouping sections of code.

We'll consider each of these with respect to:

• the effort required to implement the change;
• the testing cost involved in testing the new change and regression testing existing functionality;
• the level of knowledge required by the developer handling this event, for this module and for the system as a whole;
• the potential impact to performance that may result from the change.
We will ignore the coding effort and performance issues raised by the functionality itself. We are limiting our concerns to simply the impact of adding it to the existing system.
5.1.1 Adding a New Mode for V2 that was Foreseen (COMPLETED)

5.1.1.1. Effort Required. The developer must complete the stub they made. If the stub was correct (usually not so, in my experience), this work may be minimal and they can leverage the effort that was applied to support this mode. All potentially affected functions must be examined to ensure they accommodate the new mode.
5.1.1.2. Testing Cost. The entire order must be retested, since this mode may have affected other modes and the rest of the order. As a general rule, you must assume that if you touch a section of code, you may have broken it.

5.1.1.3. Knowledge Required by Developer Handling This Event. The developer must understand the behavior of the other modes as well as this mode, since there is no guarantee that there are no dependencies between modes. These dependencies might create unforeseen side-effects.

5.1.1.4. Potential Impact to Performance. As we had added stubs in the code, we were already paying the overhead for that mode even though we didn't support it. This affected all existing modes.

5.1.2 Adding a New Mode for V2 that was Not Foreseen (SUSPENDED)
5.1.2.1. Effort Required. A new mode must be added to the system. This may impact the other modes in other ways. Additionally, we must ensure that all functions that address modes are handled (this is complicated by default behavior that presumes certain modes).

5.1.2.2. Testing Cost. As before, the entire class Order must be retested.

5.1.2.3. Knowledge Required by Developer Handling This Event. As before, the developer must understand the behavior of all other modes as well as this mode.

5.1.2.4. Potential Impact to Performance. Performance of the order system will potentially decrease in a linear manner since we have added
another evaluation to the system, whether we use this mode or not. Yes, there is a way in certain situations to force jump table optimization and certain other issues, trading complexity to avoid the linear decrease in performance, but these techniques are much more difficult and unreliable than the technique to be discussed below.
5.1.3 Not Adding a Foreseen Mode in V2 (COMPLETED)

5.1.3.1. Effort Required. This requires no new effort, but we wasted effort initially in building the stub.

5.1.3.2. Testing Cost. None, since there are no changes.

5.1.3.3. Knowledge Required by Developer Handling This Event. None, since there are no changes.

5.1.3.4. Potential Impact to Performance. We have potentially decreased performance for a mode we will not use.
5.1.4 The System Is Not Working Properly in Some Mode (REJECTED)

5.1.4.1. Effort Required. We must examine the entire class Order, since we do not know where this state-specific behavior is located.

5.1.4.2. Testing Cost. As before, the entire class must be retested.

5.1.4.3. Knowledge Required by Developer Handling This Event. As before, the developer must understand the behavior of all other modes as well as this mode.

5.1.4.4. Potential Impact to Performance. We must avoid making changes that affect other modes, since these would then have to be retested.

5.1.5 Recognizing Common Behavior

A developer often recognizes common behavior among modes and groups it into common segments of code. For example, suppose a developer recognizes that REJECTED and COMPLETED are similar by observing that both
involve invocations of f1() and f3():

    void Order::processOrder()
    {
        switch (status) {
        case ENTERED:
            // ... entered order processing
            break;
        case SUBMITTED:
            // ... standard processing
            break;
        case REJECTED:
            f1();
            // ...
            fr();
            f3();
            break;
        case COMPLETED:
            f1();
            // fc()
            f3();
            // Maybe some stub code
            break;
        default:
            // ...
            break;
        }
    }
The developer then might change the code as follows:

    void Order::processOrder()
    {
        if ((status == REJECTED) || (status == COMPLETED)) {
            f1();
        }
        switch (status) {
        case ENTERED:
            // ... entered order processing
            break;
        case SUBMITTED:
            // ... standard processing
            break;
        case REJECTED:
            // ...
            fr();
            break;
        case COMPLETED:
            // fc()
            // Maybe some stub code
            break;
        default:
            // ...
            break;
        }
        if ((status == REJECTED) || (status == COMPLETED)) {
            f3();
        }
    }
5.1.5.1. Effort Required. Grouping these sections can sometimes be difficult and may reduce the readability of the code. However, by grouping the code, we reduce the code space to a single point. These two “forces” must be weighed against each other.
5.1.5.2. Testing Cost. As before, the entire class must be retested.

5.1.5.3. Knowledge Required by Developer Handling This Event. As before, the developer must understand the behavior of all other modes as well as this mode.
5.1.5.4. Potential Impact to Performance. We may have introduced additional evaluations that reduce performance: it is very common for the comparisons added at the beginning and end of grouped segments to cost far more than the savings in code space would seem to justify.
5.2 Solution 2

Let's take a step back and define our problem as follows. The class Order contains several cases of state-specific behavior. We know that some of these states have been identified and others have not. We want to minimize the impact of change if new states (and their associated behavior) are added to the order. We do not want to pay for states which we are not using. We want constant performance regardless of the number of states in the order.

Weighing these considerations, we decide to consider a State [Gam95] pattern. The State pattern "allows an object to alter its behavior when its internal state changes. The object will appear to change its class." We accomplish this by turning the states themselves into separate objects and then delegating the state-specific behavior to these new objects. From this application we would identify a base (or parent) class, OrderStatus, and make it an abstract class. An abstract class is one that we would never create an instance of; a concrete class is any class that we can create instances from. Abstract classes typically define the interface, whereas concrete classes define the implementation and are used to create instances. We would define one concrete class for every state value: Rejected, Entered, Completed, and Submitted.
The immediate problem that results from this approach is that we now have to address the coupling (or dependency) of these many concrete classes with the order. We would like to avoid, as much as possible, an impact on the order when we modify the states. To reduce this coupling, we could apply a variation of the Factory Method [Gam95] pattern. A factory method "defines an interface for creating an object, but lets subclasses decide which class to instantiate." By introducing the class OrderStatusFactory, we allow the order to utilize the full set of statuses without direct knowledge of the actual concrete classes involved. The OrderStatusFactory is then the only point of impact when new classes are added or modified.

The careful reader might notice at this point that if we had 1000 orders which could each provide up to 4 states, then we would have created 4000 instances if we create these up front. Alternatively, we could create only the current state for each order, concern ourselves with creation/destruction issues, and reduce ourselves to 1000 instances. In this case, we could also utilize either a Singleton, which "ensures a class only has one instance", or preferably a Flyweight, which "uses sharing to support large numbers of fine-grained objects efficiently" [Gam95]. We can encapsulate the flyweight mechanism and sharing pool within the OrderStatusFactory we created earlier. This would result in only one instance of each state class no matter how many orders exist.

Graphically, we can show this using a variation of the object modeling notation developed by Rumbaugh [Rum91]. A short key to this notation is given in Fig. 10. In Fig. 11 we give our solution using this notation.

FIG. 10. Object modeling notation (abstract and concrete classes, attributes, inheritance, aggregation, acquaintance, "creates", and "many"/"optional" adornments).

FIG. 11. Solution 2.

A subsection of the code might include:

    class OrderStatus;         // abstract state class, defined below
    class OrderStatusFactory;  // creates (and may share) OrderStatus objects

    class Order {
    public:
        enum Status { ENTERED, SUBMITTED, REJECTED, COMPLETED };
        Order(OrderStatusFactory& orderStatusFactory);
        void setStatus(Status newStatus);
        Status getStatus();
        void editOrder();
        void processOrder();
        // ...
    private:
        OrderStatusFactory& orderStatusFactory;
        OrderStatus* status;
    };

    Order::Order(OrderStatusFactory& orderStatusFactory)
        : orderStatusFactory(orderStatusFactory),
          status(orderStatusFactory.createOrderStatus(SUBMITTED))  // default for now
    {}

    void Order::setStatus(Status newStatus)
    {
        delete status;
        status = orderStatusFactory.createOrderStatus(newStatus);
    }

    Order::Status Order::getStatus()
    {
        return status->getStatus();
    }

    void Order::editOrder()
    {
        status->editOrder(this);
    }

    void Order::processOrder()
    {
        status->processOrder(this);
    }

We also create the OrderStatus class as an abstract class (virtual function-name(...) = 0; is the C++ syntax for a pure virtual function, which is what makes the class abstract):

    class OrderStatus {
    public:
        virtual ~OrderStatus() {}
        virtual Order::Status getStatus() = 0;
        virtual void editOrder(Order* context) = 0;
        virtual void processOrder(Order* context) = 0;
    };

For each kind of OrderStatus used we create a concrete class. Note there is no reason to create a class for a mode not used in this release.

    class EnteredOrderStatus : public OrderStatus {
    public:
        virtual ~EnteredOrderStatus() {}
        virtual Order::Status getStatus();
        virtual void editOrder(Order* context);
        virtual void processOrder(Order* context);
    };

    Order::Status EnteredOrderStatus::getStatus()
    {
        return Order::ENTERED;
    }

    void EnteredOrderStatus::editOrder(Order* context)
    {
        // ... edit entered order using the context
    }

    void EnteredOrderStatus::processOrder(Order* context)
    {
        // ... entered order processing using the context
    }
5.2.1 Note The code to implement the functionality is the same in both solutions. We did have to write additional header files in the second solution. One source of confusion among new programmers to C++ is the use of virtual functions. Virtual functions defer the cost of deciding which actual method to invoke until run-time. From an implementation stand-point; this results in some minimal overhead for the compiler (virtual table per concrete class) and a performance cost of a single level of indirection in order to invoke a virtual function. We can look at the selection of the proper method when using a simple if or case condition as being O(n) where n is the number of states to evaluate versus a fixed cost using virtual methods regardless of how many states are present. It is easy to see that as the number of states increase the approach using virtual method should have far greater performance. More importantly, the performance measurements remain constant so that your development and testing measurements with a few cases should be identical to those with many cases. The importance of this, particularly in real-time systems, can not be understated.
5.2.2 Adding a New Mode for V2 that was Foreseen [COMPLETED) 5.2.2.7. Effort Required. New functionality is written and we create a new concrete class to encapsulate it. A hook is made into the one place where the decision to use the new mode is made. We do not create the class until we actually decide we wish to use it. No stubbing of code fragments is needed! 5.2.2.2. Testing Cost. ONLY THE NEW MODE MEEDS TO BE TESTED. As we haven’t touched the other modes there should not be an impact. Integration testing into the order of course should still occur. 5.2.2.3. Knowledge Required b y Developer Handling This Event. Only the new modes and its functionality need to be addressed.
278
BRANDON GOLDFEDDER
5.2.2.4. Potential hpaCt to Performance. There is no change to the overhead. The only impact comes directly from the new performance. 5.2.3 Adding a New Mode for V2 that was Not Foreseen (SUSPENDED) This is no different from a foreseen mode since we do not implement this mode until used.
5.2.4. Not Adding a Foreseen Mode in V2 (COMPLETED) This creates no impact during development or in system performance since we never paid any overhead for this mode nor did any other coding specifically for it.
5.2.5.
The System is Not Working Properly in Some Mode (RNECTED)
5.2.5.1. Effort Required. We must examine only the concrete class corresponding to this mode, which we can then modify or replace. 5.2.5.2. Testing cost. Only the mode needs to be retested. Since we didn’t touch other pieces of code we probably didn’t affect it. 5.2.5.3. Knowledge Required by Developer Handling This Event. The developer needs to know only about this mode. 5.2.5.4.
Potential Impact to Performance. There is little impact as
cross mode dependency is unaffected.
5.2.6 Recognizing Comm on Beha vior As before, the developer may recognize that REJECTED and COMPLETED are similar (superstates) by observing the following: void RejectedOrderStatus::processOrder(
...I
c f l 0
...
f r o
f 3 0
1 void CornpletedOrderStatus::processOrder(
c f l 0
...
f 3 0
1
f c 0
...I
PATTERNS AND SYSTEM DEVELOPMENT
279
However, we now can consider the concept of superstates and introduce a superstate called FinishedOrderStutus, of which both REJECTED and COMPLETED status are substates (Fig. 12): void FinishOrderStatus::processOrder(
... )
c f l 0 processStep0; f 3 0
void RejectedOrderStatus::processStep(
E
...
... )
f r o
1 void CompletedOrderStatus::processStep(
c
...
...I
f c 0
1
--
5.2.6.7. EffortRequired. Grouping these sections is fairly simple if we identify hierarchical states. While enumerations in C++ cannot easily express this hierarchy, classes can.
i OrderStatus
state specific ops
1 processorder() protected: processingStep()
RelecfedOrderStatus protected processrngStep()
EnterrngOrderStatus
CompletedOrderStatus protected processrngStep()
We will probably handle processorder as a TemplateMethod [Gam95].
SubrnittedOrderStalus
280
BRANDON GOLDFEDDER
5.2.6.2. Testing cost. Only the affected modes have to be tested. 5.2.6.3. Knowledge Required by Developer Handling This Event. Only the affected modes need to be understood. 5.2.6.4. Potential Impact to Performance. We may have introduced a single level of indirection in return for reuse and simplification of the system. This also affects our ability to inline certain functions.
5.3 Analysis of Choices It should be clear that in this case, based on performance, flexibility, ease of expression, and maintainability, the approach that utilized patterns holds clear advantages. The number of classes has increased (although the effort in each class has decreased). This enables me to develop the functionality in parallel without impact and to risk-manage my development process. What we have effectively accomplished is a simplification of the long term coding effort.
6. Beware: Misapplication of Patterns While the use of the state pattern allows for easy system variance when new states or modes can be identified is of great value, we have to be careful of misapplying a pattern. A pattern is misapplied when it attempts to solve a problem that does not exist or when the context is not suitable for the application of the pattern. For example, I would be suspicious of a developer that used a state pattern to handle a Boolean case or other non-extensible case. In cases where I have seen this misapplication occur, the system quickly becomes too complex to handle and the immature organization may blame patterns for this problem. In a similar vein, I question the wisdom of many organizations in using programming languages that the developers do not fully understand. Since a pattern is a solution to a problem in a context, one should avoid solving a problem one doesn’t have or applying it in an improper context. This misapplication tends to increase the complexity of the system and may result in creating additional problems.
7.
Reality Check
Let’s assume at this point that I have shown that the approach based on patterns creates a much better system that may “stand the test of time” and that you understand how to avoid misapplying it. Take another look at the design diagrams. Notice that the complexity of the design has increased. Yes, there probably is less
P A l T E R N S AND SYSTEM DEVELOPMENT
281
code and it will last longer, but don’t underestimate the difficulty that people have in seeing abstractness (speaking from experience on both the pro and cons). In addition, recognize that you must now ensure that all developers understand the patterns you are using. I’ve seen several managers who can easily understand why if they are using C++ (for example) their programmers should understand C++. The same managers often fail to recognize why if they use pattern X it is essential that their developers all understand pattern X. In general, I have been applying Pattern 2 in developing or extending systems.
Pattern Name: Build for Today, Design for Tomorrow Problem How do you handle present requirements efficiently in the face of future requirements? Context Developers building or extending a system Forces Future requirements are often not fully understood and are highly subject to change. 0 Efforts in reuse often lead to elaborate components that are not only not reusable but often fail to be completed. 0 It is often unclear and easy to lose sight of what is a “future” vs. current requirement. 0 Developers tend to look at problems only as coding problems and fail to consider the power of design to solve these types of problems. 0 C++ and other Object Oriented languages provide ways in which to build flexibility through mechanisms such as virtual methods and templates. Unfortunately, many developers do not properly utilize these tools.
0
Solution Never write any unneeded code in the implementation. Instead, transfer the effort to design. Ensure that the architecture can handle all potential scenarios. Resulting Context A system that meets existing requirements and is capable of addressing future requirements when necessary. The cost to add these new capabilities is often less than if an up-front effort had been made.
282
BRANDON GOLDFEDDER
Rationale By shifting the focus from implementation to design, we can solve higher level problems more effectively and often dismiss the problem entirely. Often the features which the system is supposed to handle are never implemented and other features come into existence due to market forces. Normally these new features can be easily handled since the architecture has been made extensible.
Known Uses Successful reuse efforts 0 Related Patterns Single Point of Documentation
Author($ Brandon Goldfedder Date(s) 5/29/96, 10/1/96 PATTERN2.
8. Advantages of Patterns Patterns provide several benefits to software development. Patterns aid developers in understanding the real problems at work. They provide a means to record not only the existing practices, but additionally provide a context where the solution is appropriate and rationale as to why it works. The underlying principles become apparent and easier to work with. As shown earlier, one of the key benefits in utilizing patterns is the ability to focus on change management. The key is not to avoid writing new code for new requirements, but to do so only by modifying the point in the software where the new feature is used. The system should accommodate change without impacting existing code. By doing this, we can focus on the existing system requirements without compromising future expansion. We can also design our systems to enable parallel development, thus making the best use of the resources we have available. Design is a communication activity. In order to effectively talk about design we need a vocabulary that can express the issues we face. Patterns provide that vocabulary. Patterns gives developers a powerful common vocabulary to drastically improve the productivity of design reviews and make documentation
PATTERNS AND SYSTEM DEVELOPMENT
283
easier to produce and more meaningful. It provides the means to describe the abstract relationships between objects at both high and low levels. In documenting frameworks, the use of patterns to describe the system can be essential (see Section 10). By using patterns it is possible to create a guidebook or tradebook of best practices. Through these mechanisms, experience can be more readily and consistently transferred to other developers. Several organizations are using patterns to directly transfer knowledge to new projects and to ensure that the experience gained in earlier systems is applied properly to new systems.
9.
Applying Patterns in the Development Process
At the time of writing I am the technical leader and system architect of a telephony system. This team consists entirely of developers who have not only never worked together in the past, but have not shared a common domain background. At the onset of the project, I felt it was important to establish a common vocabulary up front. I gave a shortened version of the patterns class that I normally deliver. This provided enough of an initial foundation to overcome any misconceptions that the development team had, and to provide an important starting point. Throughout the development process we have been applying patterns to solve and document our design problems. This allowed us to easily address key risk areas in our initial object modeling and to allow our conversations to remain focused. There are several things that I allowed in this project that I normally would not. First of all, the requirements for this project were far more fluid that those in other projects that I have been involved with, so we built in far more flexibility in the design than normally would be appropriate. While this does not result in any more code (in fact, we have found it to significantly reduce it), it does raise the conceptual design complexity. Abstraction, as pointed out by many authors such as Richard Gabriel [Gab96], is not a skill present in many developers so it has to be carefully managed. In applying a more abstract design, it is important to have a system architect who is creating a “vision” of where the design is going and monitoring the process. It is also essential that the system architect make sure a consistent model is developed. Using patterns allows a sufficiently high level view to make sure this is the case. In creating the detailed design documentation, I kept to an extremely high level. While this allowed the use of patterns to capture large amounts of information, it also kept the documentation far more abstract than appropriate. I realized, quite painfully, that understanding abstraction is not merely a training/experience curve but may in fact be a trait that some possess and others do not. If the time permitted more concrete documentation, this would have been invaluable. I am working on ways to introduce this into the second iteration.
284
BRANDON GOLDFEDDER
I believe that every member of the team can take a portion of the design document at this point and understand the design through the use of patterns. However, I have also discovered that looking at the problem and context to properly choose the proper pattern to apply is a far more difficult problem. On the positive side, only a few members (or even only one) in a team needs this design skill, while every member needs to know how to apply it. I’m not sure at this point if this is a natural skill or one based on experience.
10. Frameworks and Patterns In approaching the application discussed above, we took an approach that would easily lend itself to a framework. As one might guess, a very close relationship exists between frameworks and patterns. According to Taligent: A framework is a set of prefabricated software building blocks that programmers can use, extend, or customize for specific computing solutions. [Ta194, Glossary]
Frameworks can address the application, domain or support level. An application framework provides horizontal functionality. Some examples of an application framework include Borland’s OWL, and Visix’s Galaxy. This is the type of framework which most developers are familiar with. A domain framework addresses a specific problem domain. This is where there is probably the most payback for many companies. These types of frameworks exist (and are being developed) for control systems, securities trading, and telephony areas. A support framework addresses system-level services. These include file access, distributed computing, device drivers, etc. Examples of support frameworks would include the backbones of CORBA or DCOM. No matter the reason, using a framework is advantageous because it means there will be less code to design and implement for the user of the framework. In addition, it allows the developer to focus on the specific expertise areas in the framework allowing this expertise to be shared. As the framework will be reused in multiple applications, it tends to be more reliable and robust. It allows the company to improve its consistency, integration and interoperability of applications-which reduces the overall maintenance costs for the. organization and allows orderly program evolution [Ugg95]. Developing a framework has historically suffered from several difficulties. Probably the most pronounced has been pointed out by Grady Booch: The most profoundly elegant framework will never be reused unless the cost of understanding it and then using its abstractions is lower than the programmer’s perceived cost of writing them from scratch. [Boo941 Utilizing frameworks and extending them through component-based software appears to be one of the key areas of development for the near future. Frameworks,
PATTERNS AND SYSTEM DEVELOPMENT
285
although powerful, are extremely difficult to understand. I have found that using patterns to document the framework allows us to focus on documenting the implementation, as well as explaining to developers how to extend the framework. The proper use of patterns to communicate our approaches is essential for these approaches to succeed. As Erich Gamma points out: People who know the patterns gain insight into the framework faster. Even people who don’t know the patterns can benefit from the structure they lend to the software, but it’s particularly important for frameworks. Frameworks often pose a steep leaming curve that must be overcome before they’re useful. While design patterns might not flatten the learning curve entirely, they can make it less steep by making key elements of the framework’s design more explicit. [Gam95]
11. Capturing Patterns Two of the major problems we must overcome in software is determining how to get developers knowledgeable about an existing system and how to reuse proven approaches from previous systems in the newer systems we are developing. Patterns are not invented, they are discovered or “harvested” from proven systems that have stood the test of time. There is some difference of opinion here in what constitutes standing the test of time. Many authors include the test of time to indicate multiple systems with multiple domains as well (as did the authors of Design Patterns [Gam95].) Regardless, the process is the same consisting of these four steps: (1) observation; (2) capturing the pattern form; ( 3 ) refinement; (4) iteration.
11.1 Observation The process of harvesting patterns begins with observation. As Alexander states: In order to discover patterns ... we must always start with observation ._.Now try to discover some property which is common to all the ones which feel good, and missing from all the ones which don’t feel good. The property will be a highly complex relationship. [Ale791
Observation may be accomplished in many ways: by examining a system, interviewing domain experts, research of trade practice, etc. What we are looking for are those patterns that improve the quality of life of the system and those that
286
BRANDON GOLDFEDDER
work in it. We are discovering at this point:
(1) What problems we have solved.
(2) Why this method of solving the problem is good.
(3) Why solving the problem in this way works: what are the underlying forces that we are resolving?
(4) Where else we have applied this solution to this problem in the past, and where we could apply it in the future.
(5) Why this solution "feels" right, or how it improves our life. (This goes beyond aesthetics: a good pattern brings life into a system and makes us feel better about using it. The way in which we have to live within our software has been understated to date.)

No matter what method is used,

the pattern is an attempt to discover some invariant feature, which distinguishes good places from bad places with respect to some particular system of forces. [Ale79]
11.2 Capturing the Pattern Form

11.2.1 Form

As mentioned earlier, there are several formats in use today to describe patterns. While the two most common are the GOF form and the Portland form, several other authors have experimented with styles to better address their goals. I used to believe that a single style should be the goal, to promote easy cataloging and sharing of patterns. I recently changed that naive view after seeing many recent styles that conveyed themselves much better in an alternative form. Authors should choose a form that lets the pattern they wish to express come across smoothly, not one within which they feel constrained. The essential elements of whatever form the author chooses were discussed earlier in Section 2, Analysis: What is a Pattern?
11.2.2 Level

One of the more difficult questions to answer in capturing a pattern is the level of the pattern we wish to describe. For example, suppose we tried to capture the patterns describing the parts of certain types of houses and also of certain types of dams:

• We could choose to describe the concrete structures that each type of house or dam uses.
• We could describe common types of houses or dams.
• We could describe the commonality of brick arrangements that all share.
The question is best answered by the intent of the pattern. A wiring code used by an electrician and a handbook used by an electrical engineer might express the same core principles, but captured at a different level for a different intent. Similarly, if we are writing a pattern to describe the patterns used to add new protocols to an existing switching system, we could attempt to generalize it to address telephony as a whole, but in doing so we run the risk of failing to assist the reader. Remember that a brick is not a house.

Conversely, once we have captured the pattern in a concrete manner, it is important to consider abstracting it to a 2nd-order pattern. I use the term nth-order to indicate that several patterns at one level were looked at and a common core pattern was discovered (Fig. 13). I believe that these core patterns provide important benefits in understanding the principles at work. However, they should not be used in place of the more concrete versions. I have also noticed that as we increase in order, the domain specificity of the pattern decreases. This is an area that cannot be explored in detail until many more patterns are harvested.

FIG. 13. Core principles abstracted from more specific, domain-level principles.

We can also look at the core horizontal patterns and see connections between them (Fig. 14). Anyone working with the basic patterns can see that there are some relationships. These are covered at some level in the Design Patterns book itself [Gam95]. Walter Zimmer [Zim95] has also created an excellent model that addresses this connection and adds a common base pattern, Objectifier, which considers the base concept of implementation variance.

FIG. 14. Relationships between patterns.
11.3 Refinement
Once the pattern has been captured in a format, begin the process of refinement: it must now be shared. An important consideration is that a pattern is a statement that could be true or false. However, as Alexander states:

In short, whether this formulation, as it stands, is correct or not, the pattern can be shared, precisely because it is open to debate, and tentative. Indeed, it is the very fact that it is open to debate, that makes it ready to be shared. [Ale79]
The true effects of a pattern only come about when the pattern is shared with others.

If we want a language which is deep and powerful, we can only have it under conditions where thousands of people are using the same language, exploring it, making it deeper all the time. And this can only happen when the languages are shared. [Ale79]
One technique that has proved extremely successful in refining patterns is the use of a writer's workshop. The purpose of the writer's workshop is to present the author's work to other authors and to have that work improved by the process. The author should be an observer to this process, allowing the pattern to speak for itself. Often, new dimensions to a pattern are discovered this way. A basic format for the writer's workshop is that used by Linda Rising at AG Communication Systems. In this approach, a skilled moderator leads the
workshop. Several persons recommend that all members should contribute some work to the workshop so that it is truly a group of peers; in some such workshops, non-authors are considered second-class citizens. I have found that non-authors can provide valuable insight and a unique perspective, so I am not in favor of the separatist approach. Regardless of their authorship, all attendees must have read the pattern prior to attending the workshop. It is structured more as a literary review than as a software inspection, with the best workshops achieving a coffee-shop atmosphere. It is interesting to note the effect that the room setting has in establishing this tone; this is one reason why pattern conferences are held in rustic areas.

The author then reads any selection(s) from the pattern that he or she feels is important to explain the pattern to others. The author then becomes a "fly on the wall" and does not participate in the discussion. This can be extremely trying for even the most patient person when people misunderstand the writing, but this often provides the best insights for the author. A member of the group (excluding the author or moderator) can then present a summary of the pattern. Other members can amend this if necessary. Comments on the pattern then begin. It is important to remember that these are comments on the pattern as a separate entity, independent of the author. There is no eye contact with the author. The author's name is never mentioned; all references are to "the author." Others in the group who also "know" the pattern should not try to clarify or speak for the author during the discussion. The pattern should "stand on its own."

I recommend structuring the comments to discuss positives (things you liked), followed by suggestions for change. The group might consider whether the pattern has the "quality without a name" or how the pattern makes the programmer's life easier. I have found it useful to focus separately on form and content. Keeping a structure where possible lends order to the approach.

Having a pattern undergo a writer's workshop can be an extremely painful event or an extremely rewarding one. The more accustomed the group is to each other, the better. In addition, a good group understands that the goal is to improve the pattern, making it as valuable as possible. I have observed groups take a problem pattern and discover hidden gems of knowledge. I have also observed dysfunctional groups criticize a pattern without suggestions for improvement, denigrating it to the point that the author doubts the value of the process. The moderator plays the pivotal role in new groups: they should constrain the discussion to the pattern at hand, deflecting digressions with a comment such as "point noted", and then proceed with the workshop.

After the members have given their feedback, the author may ask questions of the group about their suggestions. The author should never offer apologies. The expectation is that the author is an expert in the area and will respond appropriately to suggestions; there is no need to "check" on whether the suggestions are taken to heart. However, the author can request a second workshop as a helpful next step. At this point, the group must praise the author.
11.4 Iteration
As patterns are used, the context and details become more pronounced. It is expected that patterns will evolve and that this approach to capturing a pattern will be applied recursively. This results in pattern refinement and often in the discovery of other patterns.
12. Where Now?

Over the past two years quite a few excellent publications on patterns have become available; I will mention a few of them here. First and foremost is the now classic Design Patterns by Gamma et al. [Gam95]. I believe this should be required reading for all software developers. Several other authors have been producing work in this area as well. Frank Buschmann's book Pattern-Oriented Software Architecture: A System of Patterns [Bus96] is an excellent companion to it. I particularly find Martin Fowler's Analysis Patterns [Fow96] to provide great insight into object modeling. Lastly, of course, is the classical work by the master architect, Christopher Alexander himself [Ale79].

The yearly PLoP (Pattern Languages of Programs) conference provides a forum for pattern authors to share their patterns and to gain feedback from other authors. Unlike many conferences, the attendees are expected to have read the papers prior to attending; the focus is on the discussion of the paper itself. The author is kept as a fly on the wall and must merely gather the feedback of the group, using a writers' workshop approach. This conference has now grown large enough that several related conferences are being held. At the time of writing, the first two collections of conference papers have been bound and the third is on its way [Cop95].
12.1 Internet Resources
Resources available through the internet include the following:

• Patterns Mailing List: [email protected] with "subscribe" or "unsubscribe" in the SUBJECT part of the mail message;
• Patterns Home Page: http://hillside.net/patterns/patterns.html;
• Portland Pattern Repository: http://c2.com/ppr/index.html.
13. Concluding Remarks

Patterns may be the most important concept to enter mainstream software development since object-oriented programming. And as with object-oriented
programming, it will be some time before it is understood, accepted and properly used. I would like to leave this chapter with a quote from Christopher Alexander that sums up my views:

The most important thing is that you take the pattern seriously. There is no point at all in using the pattern if you only give lip service to it. Christopher Alexander [Ale79]

SPECIAL THANKS
The author wishes to thank his wife, Susan, for reviewing this and other works and for putting up with him in general. The author also wishes to thank the editor, Marvin Zelkowitz, for his assistance with this chapter and over the years.

REFERENCES AND FURTHER READING
[Ale64] Alexander, C. (1964). Notes on the Synthesis of Form, Harvard University Press, Cambridge, MA.
[Ale79] Alexander, C. (1979). The Timeless Way of Building, Oxford University Press, New York, NY.
[Alf95] Alfred, C., and Mellor, S. J. (1995). Observations on the role of patterns in object-oriented software development. OBJECT Magazine, May.
[Boo94] Booch, G. (1994). Designing an application framework. Dr Dobb's Journal, 10(2).
[Boo96] Booch, G. (1996). Object Solutions: Managing the Object-Oriented Project, Addison-Wesley, Menlo Park, CA.
[Bus96] Buschmann, F., Meunier, R., Rohnert, H., Sommerlad, P., and Stal, M. (1996). Pattern-Oriented Software Architecture: A System of Patterns, Wiley, Chichester, England.
[Cop92] Coplien, J. (1992). Advanced C++ Programming Styles and Idioms, Addison-Wesley, Reading, MA.
[Cop94] Coplien, J. (1994). Software Design Patterns: Common Questions and Answers, posted paper.
[Cop95] Coplien, J., and Schmidt, D. (1995). Pattern Languages of Program Design, Addison-Wesley, Reading, MA.
[Fow96] Fowler, M. (1996). Analysis Patterns: Reusable Object Models, Addison-Wesley, Reading, MA.
[Gab96] Gabriel, R. (1996). Patterns of Software, Oxford University Press, New York, NY.
[Gam95] Gamma, E., Helm, R., Johnson, R., and Vlissides, J. (1995). Design Patterns, Addison-Wesley, Reading, MA.
[Gol96] Goldfedder, B., and Rising, L. (1996). A training experience with patterns. Communications of the ACM, 39(10).
[Joh88] Johnson, R. E., and Foote, B. (1988). Designing reusable classes. Journal of Object-Oriented Programming, June/July.
[Joh93] Johnson, R. E. (1993). How to Design Frameworks, OOPSLA '93 tutorial notes.
[Lis88] Liskov, B. (1988). Data abstraction and hierarchy. SIGPLAN Notices, 23(5).
[McC95] McCarthy, J. (1995). Dynamics of Software Development, Microsoft Press, Redmond, WA.
[Mes96] Meszaros, G., and Doble, J. (1996). MetaPatterns: A Pattern Language for Pattern Writing, submission to the PLoP-96 conference proceedings.
[Ols95] Olson, D. (1995). Hands in View, Portland Pattern Repository, http://www.c2.com/ppr/index.html.
[Rum91] Rumbaugh, J., Blaha, M., Premerlani, W., Eddy, F., and Lorensen, W. (1991). Object-Oriented Modeling and Design, Prentice Hall, Englewood Cliffs, NJ.
[Sou95] Soukup, J. (1995). Implementing patterns. In Pattern Languages of Program Design, Addison-Wesley, Reading, MA.
Taligent, Inc. (1994). Taligent White Paper on Building Object-Oriented Frameworks.
Uggla, J. (1995). CommonPoint frameworks take the spotlight. AIXpert, May.
[Vil95] Viljamaa, P. (1995). The patterns business: impressions from PLoP-94. ACM Software Engineering Notes, 20(1).
[Vli95] Vlissides, J. (1995). Reverse architecture. Position paper, Dagstuhl Seminar 9508.
[Vli96] Vlissides, J. (1996). To kill a singleton. C++ Report, June.
[Zim95] Zimmer, W. (1995). Relationships between design patterns. In Pattern Languages of Program Design, Addison-Wesley, Reading, MA.
High Performance Digital Video Servers: Storage and Retrieval of Compressed Scalable Video

SEUNGYUP PAEK AND SHIH-FU CHANG
Department of Electrical Engineering, Columbia University, New York
Abstract

Developments in the fields of computing, computer networks and digital video compression technology have led to the emergence of unprecedented forms of video communications. In the future, it is envisioned that users will be able to connect to a massive number of distributed video servers, from which users will be able to select and receive high quality video and audio. High performance video servers will store a large number of compressed digital videos and allow multiple concurrent clients to connect and retrieve a video from the collection of videos. High performance video servers must provide guaranteed quality of service for each video stream that is being supported. The video server has to retrieve multiple video streams from the storage system and transmit the video data into the computer network. There has been a great deal of research and development in the past few years on the various issues related to high performance video servers. In this chapter, we provide an overview and snapshot of some of the main research areas that have been worked on recently. We critically evaluate and summarize the main research results. In particular, we focus on the interrelated issues of digital video compression, storage and retrieval for video servers.
1. Introduction 294
2. Compressed MPEG Video 295
2.1 Variable Bit Rate and Constant Bit Rate MPEG Video 298
2.2 Scalable MPEG Video 298
3. Inter-disk Data Placement of Constant Bit Rate Video 299
3.1 System Model 301
3.2 Balanced Placement 302
3.3 Periodic Placement 302
3.4 Multiple Segmentation Placement of Scalable Video 304
3.5 Fault Tolerant Video Storage 307
4. Buffer Replacement Algorithms 309
4.1 System Model 310
4.2 BASIC Buffer Replacement Algorithm 311
4.3 Comparison of Buffer Replacement Algorithms 312
5. Interval Caching 312
5.1 System Model 313
5.2 Interval Caching Policy 314
5.3 Comparison of Interval Caching Policy with Static Policy 315
6. Batching 316
6.1 System Model 316
6.2 Proposed Policies 317
6.3 Comparison of Batch Scheduling Policies 317
6.4 Summary 318
7. Retrieval Scheduling and Resource Reservation of Variable Bit Rate Video 319
7.1 System Model 320
7.2 Retrieval Constraints 322
7.3 Constant Time Retrieval 322
7.4 Constant Data Retrieval 324
7.5 Minimal Resource Retrieval 326
7.6 Buffer-bandwidth Resource Relation 329
7.7 Comparison of Retrieval Schedules 330
7.8 Resource Reservation 330
7.9 Progressive Display of Scalable Video 333
7.10 Performance Evaluations of Retrieval Schedules 333
8. Conclusions 337
References and Further Reading 338
1. Introduction

Developments in the fields of computing, computer networks and digital video compression technology have led to the emergence of unprecedented forms of video communications. Advances in computer networks have led to broadband networks that can handle much higher data rates with greater reliability. Advances in state of the art digital video compression technologies have greatly reduced the data rate and storage requirements of digital video, together with the added flexibility of scalable resolutions. Advances in computing have led to multimedia workstations for the home that can provide high quality video and audio. One of the forms of video communication that is enabled by and dependent on the confluence of these key technologies is video on demand (VoD). In VoD, multiple users will be able to connect to remote digital video libraries and view videos "on demand". It is envisioned that broadband computer networks will allow users to connect to a massive number of distributed "video servers", from which users will be able to select and receive high quality video and audio. One of the critical computing systems of VoD is the video server. High performance video servers will store a large number of compressed digital videos and allow multiple concurrent clients to connect over the computer network to retrieve a video from the collection of videos.
High performance video servers not only have to store and archive a large number of compressed digital videos, but must also provide guaranteed quality of service for each video stream that is being supported. The video server has to retrieve multiple video streams from the storage system and transmit the video data into the computer network. There has been a great deal of research and development in the past few years on the various issues related to high performance video servers. There have been many proposals for different algorithms to maximize the utilization of computing resources, and also many proposals for the architecture of high performance video servers. In this work, we provide an overview and snapshot of the main research areas that have been worked on recently and describe why each area has been considered to be important. We critically evaluate and summarize the main research results that have been presented. In particular, we focus on the interrelated issues of digital video compression, storage, retrieval and network transmission in video servers.
2. Compressed MPEG Video

High performance video servers store a large number of compressed digital videos and allow multiple concurrent clients to connect over the computer network to retrieve a video from the collection of videos. Due to the extremely large storage and bandwidth requirements of digital videos, it is very important to compress the video data for cost effective storage and transmission in video servers. Video compression technology addresses the problem of how to reduce the data rate and storage requirements of digital videos, without significantly reducing the visual quality of videos. In the last few decades, image and video compression has been a dominant topic in the image and video processing community, and it will undoubtedly continue to be a critical technology for the future of emerging multimedia applications [lS]. This is because video sequences can require very large bandwidths and storage when represented in digital form. This can be demonstrated by a simple example. Consider an image with 720 x 480 pixels. If the picture is in color, 3 bytes can be used for each pixel, i.e. one byte for each color component (R, G, B) of a pixel. This means that each picture is 1037 KBytes. If a video sequence comprises a sequence of such images at 24 frames per second, the video sequence will have a data rate of 200 Mbps. For a one hour video, the storage requirement for uncompressed video is approximately 90 GB. At current magnetic disk technologies, this corresponds to about 44 hard disks to store a single video. Clearly, it is not cost effective to store uncompressed digital videos with current technologies.
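To make the arithmetic concrete, here is a minimal Python sketch of the back-of-the-envelope calculation above; the 4 Mbps compressed rate quoted just below is included for comparison. The figures are illustrative estimates, not measurements.

```python
# Back-of-the-envelope storage/bandwidth estimate for uncompressed video
# (figures match the worked example in the text; 1 kB = 1000 bytes here).

WIDTH, HEIGHT = 720, 480          # pixels per frame
BYTES_PER_PIXEL = 3               # one byte each for R, G, B
FRAME_RATE = 24                   # frames per second

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL          # about 1.04 MB per frame
raw_rate_mbps = frame_bytes * 8 * FRAME_RATE / 1e6      # about 200 Mbit/s
one_hour_gb = raw_rate_mbps * 3600 / 8 / 1000           # about 90 GB per hour

compressed_rate_mbps = 4                                # compressed MPEG-2 rate from the text
compressed_hour_gb = compressed_rate_mbps * 3600 / 8 / 1000   # about 1.8 GB per hour

print(f"uncompressed: {raw_rate_mbps:.0f} Mbit/s, {one_hour_gb:.0f} GB/hour")
print(f"compressed  : {compressed_rate_mbps} Mbit/s, {compressed_hour_gb:.1f} GB/hour")
```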
With state of the art video compression, the above uncompressed video sequence with a data rate of 200 Mbps can be compressed to data rates as low as 4 Mbps. Therefore, the storage requirement for compressed video will be approximately 2 GB.

In order to understand video server technologies, it is important to understand video compression technologies. In this section we briefly overview the Motion Picture Experts Group (MPEG) video compression standard, which is being adopted as a worldwide standard for compressed digital video. The MPEG-1 standard is a video compression standard for digital storage media applications. The MPEG-2 standard is a follow-on to MPEG-1 and is intended primarily for higher bit rates, larger picture sizes, and interlaced video frames. The MPEG-2 standard builds directly upon MPEG-1, and is a relatively straightforward extension of MPEG-1. However, MPEG-2 provides a different set of capabilities, including advanced techniques for high definition television (HDTV). In particular, MPEG-2 provides a set of scalable extensions to provide video with multiple resolutions. This will be discussed further below.

The MPEG-1 and MPEG-2 standards actually comprise several parts. The video part of the MPEG standards is aimed at the compression of video sequences. The audio part is aimed at the compression of audio data. The systems part deals with issues such as the multiplexing of multiple audio and video streams, and the synchronization between different streams. In this section, we briefly overview the video part of the MPEG standards. An in-depth coverage of the MPEG standards can be found in [22]. The overview presented here is based on [22].

MPEG video compression is based on both inter-frame and intra-frame techniques. Inter-frame techniques refer to compression techniques in which information in adjacent frames of a video sequence is used to compress a given frame. Intra-frame techniques refer to compression techniques in which a frame is compressed independently of information in any other frames. Inter-frame techniques are very effective in video compression because there is a great deal of redundancy between adjacent frames, i.e. adjacent frames of a video generally have very little variation. Consider a scene in a film in which two people are talking. If we look at the frames of the video, we will see that adjacent frames are almost identical, with minor variations in the expressions on the faces of the people talking. Therefore, it is not necessary to compress all the information in each frame of the video independently of the others. Intuitively, we can see that significant compression can be achieved by only encoding the variations between successive frames of a video.

We will first describe the syntax of MPEG compressed video bitstreams. The outermost layer of an MPEG video bitstream is the video sequence layer. The video sequence layer is divided into consecutive groups of pictures (gop), as shown in Fig. 1.

FIG. 1. Typical MPEG group of pictures (I B B P B B P B B P B B I).

Each gop is composed of pictures that are either I, P, or B pictures. I pictures are coded independently with intra-frame techniques. P, B
pictures are compressed by coding the difference between the picture and reference pictures that are either I or P pictures. P (predictive-coded) pictures are coded by using information from temporally preceding I, P pictures. B (bidirectionally predictive-coded) pictures obtain information from the nearest preceding and/or following I, P pictures. The basic building block of each MPEG picture is the macroblock. Each macroblock is composed of four 8 x 8 blocks of luminance samples in addition to two 8 x 8 blocks of chrominance samples (one for Cb and one for Cr). An MPEG picture is not simply a sequence of macroblocks but is composed of consecutive slices, where each slice is a contiguous sequence of macroblocks. We will now briefly describe the main techniques used for MPEG video compression.

Discrete Cosine Transform. The Discrete Cosine Transform (DCT) algorithm in MPEG video is the basis of both intra-frame and inter-frame coding. The DCT has properties that simplify the coding model. Basically, the DCT decomposes a block of image data into a weighted sum of spatial frequencies. For example, in MPEG, an 8 x 8 block of pixels is represented as a weighted sum of 64 two-dimensional spatial frequencies. If only low frequency DCT coefficients are non-zero, the data in the block varies slowly. If high frequency coefficients are present (non-zero), the block intensity changes rapidly from pixel to pixel within the 8 x 8 block.

Quantization. In MPEG, the DCT is computed for a block of 8 x 8 pixels. It is desirable to represent coefficients for high spatial frequencies with less precision. This is referred to as quantization. A DCT coefficient is quantized by dividing it by a non-zero positive integer called the quantization value, and rounding the quotient. The bigger the quantization value, the lower the precision of the quantized DCT coefficient. Lower precision coefficients can be transmitted to a decoder with fewer bits. MPEG uses larger quantization values for higher spatial frequencies. This allows the encoder to selectively discard higher spatial frequencies.
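As an illustration of the quantization step, the sketch below computes an 8 x 8 DCT and quantizes the coefficients by dividing each by a frequency-dependent quantization value and rounding. The quantization matrix used here is an invented ramp that merely grows with spatial frequency; it is not the MPEG default matrix.

```python
import math

N = 8  # MPEG operates on 8 x 8 blocks

def dct_2d(block):
    """Plain 8x8 two-dimensional DCT-II with the usual 1/4 * c(u)c(v) normalization."""
    def c(k):
        return 1.0 / math.sqrt(2.0) if k == 0 else 1.0
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out

# Illustrative quantization matrix: larger values (coarser steps) at higher frequencies.
quant = [[8 + 4 * (u + v) for v in range(N)] for u in range(N)]

# A smoothly varying block: most of the energy ends up in low-frequency coefficients.
block = [[128 + 10 * math.sin((x + y) / 4.0) for y in range(N)] for x in range(N)]

coeffs = dct_2d(block)
quantized = [[round(coeffs[u][v] / quant[u][v]) for v in range(N)] for u in range(N)]

nonzero = sum(1 for row in quantized for q in row if q != 0)
print(f"non-zero quantized coefficients: {nonzero} of {N * N}")
```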
A macroblock is composed of four 8 x 8 blocks of luminance samples and two 8 x 8 blocks of chrominance samples, one for each of the two chrominance components. Chrominance samples represent color in terms of the presence or absence of red and blue for a given luminance intensity. 8 x 8 blocks of data are the basic units of data processed by the DCT. A lower resolution is used for the chrominance blocks since the human eye resolves high spatial frequencies in luminance better than in chrominance. This sub-sampling also contributes significantly to the compression. The DCT has several advantages from the point of view of data compression. For intra-frame coding, coefficients have nearly complete decorrelation. Therefore, the coefficients can be coded independently. In inter-frame coding, the difference between the current picture and a picture already transmitted is coded. The DCT does not really improve decorrelation in this case; the main compression gain comes mostly from visually weighted quantization.

Motion compensation. If there is motion in a video sequence, better compression is obtained by coding differences relative to areas that are shifted with respect to the area being coded. The process of determining motion vectors in an encoder is called motion estimation. Motion vectors describing the direction and amount of motion of macroblocks are transmitted to decoders. In MPEG, the quantized DCT coefficients are coded losslessly. The DCT coefficients are organized in a zig-zag scan order, which approximately orders coefficients in ascending spatial frequency. Visually weighted quantization strongly deemphasizes higher spatial frequencies; therefore only a few lower frequency coefficients are non-zero in a typical transformation.
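The zig-zag scan order mentioned above can be generated directly from the coefficient indices, visiting anti-diagonals in alternating directions; a minimal sketch:

```python
def zigzag_order(n=8):
    """Return (row, col) index pairs of an n x n block in zig-zag scan order."""
    order = []
    for d in range(2 * n - 1):                 # d = row + col indexes the anti-diagonal
        diag = [(r, d - r) for r in range(n) if 0 <= d - r < n]
        # Alternate the direction of traversal on every other diagonal.
        order.extend(diag if d % 2 else reversed(diag))
    return order

# The first few entries: (0,0), (0,1), (1,0), (2,0), (1,1), (0,2), ...
print(zigzag_order()[:6])
```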
2.1 Variable Bit Rate and Constant Bit Rate MPEG Video
The variable bit rate of MPEG-2 video is a consequence of the encoding structure of the MPEG-2 coding algorithm. In MPEG-2 digital video technology, compression is achieved by the combination of techniques such as the discrete cosine transformation (DCT), variable length codes, quantization of DCT coefficients, motion estimation and motion compensated inter-frame prediction. MPEG-2 has a buffer control mechanism in which the quantization parameter can be varied adaptively in order to achieve a constant average bit rate of the compressed video. The disadvantage of this mechanism is that the subjective visual quality will be variable, since the quantization parameter is continually varied. An alternative is to maintain a constant quantization parameter during the encoding of video. This results in variable bit rate video, in which the amount of data used to represent different time scales of video (macroblock, slice, frame, group of pictures etc.) is variable. Figure 2 shows the data trace of a variable bit rate encoded MPEG-2 video sequence.

FIG. 2. MPEG-2 VBR video trace data (data per 0.5 second cycle versus time).
2.2 Scalable MPEG Video
Compared to simulcast coding, scalable coding schemes can provide multiple
levels of video with a minimal cost of extra bandwidth or storage capacity. In scalable video coding, subsets of the full resolution bitstream are used to obtain subsets of the full resolution video [12]. Scalable video will be used in advanced computer networks to support heterogeneous clients. Mobile wireless clients may only have the computing resources to receive the lowest layer of video, while high performance workstations will request all the scalable layers of video. The MPEG-2 standard allows a combination of spatial, SNR (signal-to-noise ratio) and temporal scalability for up to three-layer coding of video sequences. In one possible hybrid, three-layer scalable coding scheme, the base layer provides the initial resolution of video. The spatial enhancement layer enables the upsampling and hence increase in frame size of the base layer. Finally, the SNR enhancement layer increases the visual quality of the (base + spatial enhancement) layers of video. In another scheme for MPEG-2 video, three-layer temporally scalable video is achieved as follows. The lowest scalable layer comprises the I frames of a video (I layer). The P frames (P layer) enable an increase in the temporal resolution of the I frame layer. Finally, the B frames (B layer) increase the temporal resolution of the I + P layers. We refer to this as IPB scalable video. In this scheme, scalability is inherently provided by the MPEG-2 encoding structure.
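The IPB temporal scalability described above can be illustrated by partitioning the frames of a group of pictures by type; a client then subscribes to a prefix of the I, P, B layers. This sketch shows only the layering idea, not an actual MPEG-2 bitstream.

```python
# IPB temporal scalability: partition a group of pictures by frame type.
GOP = list("IBBPBBPBBPBB")   # typical MPEG pattern (compare Fig. 1)

layers = {
    "I": [i for i, t in enumerate(GOP) if t == "I"],
    "P": [i for i, t in enumerate(GOP) if t == "P"],
    "B": [i for i, t in enumerate(GOP) if t == "B"],
}

def frames_for(client_layers):
    """Frame indices a client receives when subscribing to a prefix of the layers."""
    return sorted(i for layer in client_layers for i in layers[layer])

print("I only    :", frames_for(["I"]))             # lowest temporal resolution
print("I + P     :", frames_for(["I", "P"]))
print("I + P + B :", frames_for(["I", "P", "B"]))   # full temporal resolution
```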
3. Inter-disk Data Placement of Constant Bit Rate Video

In this section we focus on the storage system of a video server. Specifically, we will consider storage systems that are based on a parallel array of independent magnetic disks. The way in which video data is stored on the magnetic disk
systems can have a significant impact on the performance of a video server. We will refer to the way in which data is stored on an array of disks as the data placement scheme. In this section, data placement refers to inter-disk data placement, and not intra-disk data placement. Inter-disk data placement refers to how video data is distributed across multiple disks, whereas intra-disk data placement refers to how data is stored within a single disk.

Given that a set of videos have to be stored on a set of disks, data placement schemes are important in achieving load balancing in video servers. Consider a simple data placement strategy in which an entire video is stored on a single disk. The advantage of this scheme is that it is simple. The disadvantage is the lack of load balancing. If all users connecting to a video server request a video that is stored on one disk, the disk storing the popular requested video will be fully utilized. However, all the other disks will remain idle. If the data for the popular video was distributed over all the disks, the combined throughput of all the disks could have been used to allow more clients to connect to the server to view the videos.

In this section we overview the research in the placement of video data on a parallel array of disks. We will show how the performance of a video server depends on the data placement scheme. In the data placement of videos on an array of disks, we will show that there is a basic trade-off between the worst case interactivity delay and the utilization of disks. For the guaranteed retrieval of video data, if the data placement scheme maximizes the utilization of the disk systems, the worst case interactivity delay is shown to be at a maximum. Conversely, if a data placement scheme minimizes the worst case interactivity delay, the utilization of the disk system is at a minimum.

In relation to the data placement scheme, we also consider how scalable video can improve the performance of video servers. Research on scalable video data placement in which the utilization of the disk system is maximized is presented in [6]. However, the proposed scheme has a large worst case start up and interactivity delay. In [19], a multiresolution video data placement scheme is presented in which the interactivity delay is minimized, but in which the utilization of the disks is low. In contrast to the data placement schemes presented in [6, 19], a data placement scheme is presented in [25] in which different videos can have a range of interactivity and hence disk utilization performance. On one end of the spectrum, the utilization of the disks is maximized (hence increasing the number of concurrent video streams). On the other end, the maximum interactivity delay is minimized. The flexibility of this strategy is that different videos can operate at different points of this performance spectrum to provide a range of interactivity QoS. This is in contrast to the schemes presented in [6, 19], in which the performance is at the extreme points of the performance spectrum. It is also shown how the performance of a video server supporting scalable video can be improved for the proposed data placement strategy.
This section is organized as follows. We first present the system model of the video server that we will consider. Then, two extreme strategies for the placement of video data on a parallel array of disks are compared. These schemes show the trade-off between the worst case interactivity delay and the utilization of disks. For each scheme, advantages and disadvantages are compared. Finally, a flexible strategy for the placement of video data on a parallel array of disks is presented. It is also shown how scalable video can improve the overall utilization and interactivity performance of a video server based on the proposed data placement strategy.
3.1 System Model

We consider a parallel array of disks, each connected in parallel to a memory system. The details of magnetic disk systems can be found in [10]. Consider the operation of a single disk. When a request for an I/O operation is received at a disk, two types of overhead are incurred before data can be transferred from the disk to the memory: the time it takes for the head to move to the appropriate cylinder (referred to as the seek time), and the time it takes for the first sector to appear under it (referred to as the rotational latency). Following this overhead, the transfer of the data begins. The transfer time for a request is a function of the total data requested.

Each disk is assumed to use the SCAN disk head scheduling algorithm; disk head scheduling algorithms are covered in detail in [9]. In the SCAN scheduling algorithm, the scanning cycle consists of two phases. During the first phase, the head scans the disk from the innermost track to the outermost track. While scanning the disk, data blocks belonging to different streams are read from the disk. Upon reaching the outermost track, the head is returned to the initial position. The disks of a video server are assumed to operate on a cycle. For every cycle of the video server, each disk completes one complete SCAN cycle.

In this section, we assume that videos have a constant bit rate. Therefore, in each cycle, one retrieval block of video data has to be retrieved for every video stream. The data placement scheme determines how each retrieval block of a video is stored on the parallel array of disks. If each of the retrieval blocks is stored on a single disk, a higher utilization efficiency can be achieved for the disks. This is because for each disk seek overhead, more data is retrieved. However, a larger buffer size will be required. Each video stream is serviced in a round robin fashion during each cycle. The retrieval block is a fixed number of frames that is referred to as a group of frames (gof).

For an approximate analysis of the utilization of a single disk using the SCAN disk head scheduling algorithm, several assumptions are made. Firstly, any stream accessed during the first phase will add to the total retrieval cycle the maximum rotational latency, the data reading time and the minimum seek time. Secondly, since the retrieval cycle consists of two phases of head movement, we add two maximum seek delays to the total cycle time.
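A minimal sketch of the SCAN service order within one cycle: the block requested by each active stream is read in order of increasing track position during the inner-to-outer sweep. Track numbers and stream names are invented; rotational position is ignored.

```python
def scan_order(requests):
    """One SCAN sweep: read the block for every active stream in order of
    increasing track position (innermost to outermost), then return the head.
    `requests` maps a stream id to the track holding its block for this cycle."""
    return [sid for track, sid in sorted((t, s) for s, t in requests.items())]

# Example: four streams with their current blocks on different tracks.
pending = {"stream A": 310, "stream B": 42, "stream C": 978, "stream D": 145}
print(scan_order(pending))   # ['stream B', 'stream D', 'stream A', 'stream C']
```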
It is shown that the utilization of a single disk system is as follows:

U = (S_max x R_p) / R_d

(S_max is the maximum number of video streams that the parallel array of disks can support, R_p is the video playback rate and R_d is the maximum disk transfer rate.) As shown in [25], S_max can be derived from the following equation, where T_cycle is the round robin cycle time, T_seek,max is the maximum seek time, T_seek,min is the minimum seek time and T_rot is the maximum rotation latency:

S_max = (T_cycle - 2 T_seek,max) / (T_rot + T_seek,min + (R_p / R_d) T_cycle)
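The admission bound can be evaluated numerically. The sketch below implements the cycle-time budget exactly as reconstructed above (two maximum seeks per sweep, plus one minimum seek, one worst-case rotational latency and one block transfer per stream); the disk and video parameters are invented for illustration, and the formula should be checked against [25] before being relied upon.

```python
def max_streams(t_cycle, t_seek_max, t_seek_min, t_rot_max, r_play, r_disk):
    """Largest S with 2*T_seek_max + S*(T_rot_max + T_seek_min + T_read) <= T_cycle,
    where T_read = (r_play * t_cycle) / r_disk is the per-stream transfer time."""
    per_stream = t_rot_max + t_seek_min + (r_play * t_cycle) / r_disk
    return int((t_cycle - 2 * t_seek_max) / per_stream)

# Illustrative (not measured) disk and video parameters.
t_cycle = 0.5            # seconds per round robin cycle
t_seek_max = 0.017       # worst-case seek (s)
t_seek_min = 0.002       # track-to-track seek (s)
t_rot_max = 0.011        # full-rotation latency (s)
r_play = 4e6             # video playback rate: 4 Mbit/s
r_disk = 40e6            # sustained disk transfer rate: 40 Mbit/s

s_max = max_streams(t_cycle, t_seek_max, t_seek_min, t_rot_max, r_play, r_disk)
print("streams per disk :", s_max)
print("disk utilization :", s_max * r_play / r_disk)
```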
3.1.1 Interactivity QoS

In advanced digital video systems of the future, we reconsider the commonly accepted notions of interactivity. The goal for interactivity in the video server is not to "simulate" VCR functions exactly but to achieve effective search mechanisms while efficiently utilizing the limited resources of a video server. We propose that the critical functions of interactivity that are required for video servers are the location of specific scenes and multiple rate "scanning" of video segments (fast/slow forward/reverse).
3.2 Balanced Placement
This scheme has the lowest interactivity delay; however, the utilization of each disk is low. The interactivity delay (defined in Fig. 3) of this scheme is one cycle. For example, if a user pauses the playback of a video stream and after some time requests that the video stream be resumed, the video stream would be able to resume in the following cycle. In this scheme, each group of frames (gof) of each resolution of video is divided into N_d equal segments and placed over all N_d disks. In [19] a similar data placement strategy is presented, in which the full resolution gof is segmented into N_d segments. For the balanced placement strategy, since the data in each gof is distributed over all disks, the data retrieved from each disk for each disk seek is small. Therefore, the disk utilization is low.
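A sketch of the disk load generated by balanced placement in a single cycle: every active stream needs 1/N_d of its gof from every disk, so the load is perfectly balanced but each seek retrieves very little data. The stream names are placeholders.

```python
N_DISKS = 8   # N_d

def balanced_reads(active_streams, n_disks=N_DISKS):
    """Disk reads issued in one cycle under balanced placement: every active
    stream needs 1/n_disks of its current gof from *each* disk."""
    reads = {disk: [] for disk in range(n_disks)}
    for stream in active_streams:
        for disk in range(n_disks):
            reads[disk].append((stream, 1.0 / n_disks))   # (stream, fraction of a gof)
    return reads

reads = balanced_reads(["A", "B", "C"])
print(reads[0])   # disk 0 serves a small fragment of every stream: many seeks, little data
```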
3.3 Periodic Placement

This scheme [6] represents the opposite end of the interactivity QoS spectrum. This scheme maximizes disk utilization; however, the worst case access and interactivity delay will be shown to be N_d cycles. This is in contrast to the balanced data placement scheme, in which the delay is always one cycle.
FIG. 3. Interactivity functions of a video server: start-up delay, access delay and resume delay for start request, pause/resume, request gof, and forward scan (x1, x2) operations (g: scan granularity).
For each video, consecutive gof are placed on consecutive disks in a round robin fashion. For every cycle, one gof is retrieved for every video stream connected to the video server. Each gof is retrieved from a single disk during a cycle (compared to multiple disks in the balanced placement scheme). If a single disk can support n gof retrievals in one cycle, then N_d disks support (N_d x n) video streams concurrently. The observation is made that for a video stream starting retrieval of video data at cycle r, the video stream accesses a different single disk during each cycle. However, the video stream accesses the same single disk during each cycle as all video streams with start cycles in the following set:

{ r_i, i = 1, 2, ..., N_s | (r_i modulo N_d) = (r modulo N_d) }    (3)

Here r_i denotes the start cycle of video stream i, and we assume the first gof of all videos is stored on the same disk. This observation shows that for all video streams connected to the video server, we can group the video streams into N_d video stream sets. All video streams in a video stream set retrieve data from the same disk during any given cycle (Fig. 4).

FIG. 4. Video stream sets for the periodic placement strategy.

It can be shown that the worst-case interactivity delay for a video stream is N_d cycles for any of the equivalent interactivity functions. To prove this, we first note that each of the interactive functions is equivalent in that a specific required gof must be retrieved from the array of disks. The number of video streams being serviced on the disk that contains the required gof is the number of video streams in the video stream set that is accessing the disk during a particular cycle. The required gof cannot be accessed until a video stream set
that can accommodate a new video stream is accessing the appropriate disk. Since the total number of sets is N_d, the maximum access delay before retrieval is N_d cycles. We can also show that the scan granularity for this scheme is N_d. It has been shown that for regular playback, a video stream j accesses consecutive disks to retrieve consecutive gof. If video stream j requires a forward scan while only utilizing the resources reserved in its video stream set, we can show that the scan granularity is g = N_d + 1.
3.4 Multiple Segmentation Placement of Scalable Video
This scheme [25] is flexible in that it allows videos to take on a range of maximum interactivity delay and scan granularity values. It is shown that decreasing interactivity delay can be achieved at the cost of decreasing utilization efficiency. Therefore, there is a design range for the placement of video data. The scheme uses different degrees of segmentation of gof blocks for their placement across a parallel array of disks. We first present the multiple segmentation (MS) scheme:
(1) For a parallel array of N_d disks, define (log2 N_d) + 1 segmentation levels: S = S_i = 2^i, i = 0, 1, ..., log2 N_d.
(2) For a given segmentation level S, divide each gof into S equal segments.
(3) For a given segmentation level S, specify (N_d/S) sets of disks.
(4) For each video sequence, the consecutive retrieval blocks (gof), which were each divided into S equal segments, are stored on consecutive sets of disks as in Figure 5.

Balanced placement is a special case of multiple segmentation with S = 8, while periodic placement is a special case with S = 1.
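The placement rule can be written down compactly. The sketch below assumes, following the (0, 4) example given later in this section, that a disk set at segmentation level S consists of S disks spaced N_d/S apart and that consecutive gofs advance through the N_d/S sets in round robin order; the exact set numbering is not spelled out in the text, so treat this as one plausible reading.

```python
import math

N_DISKS = 8   # N_d: a power of two, so segmentation levels are 2^i

def segmentation_levels(n_disks=N_DISKS):
    """S = 2^i for i = 0 .. log2(N_d), i.e. (log2 N_d) + 1 levels."""
    return [2 ** i for i in range(int(math.log2(n_disks)) + 1)]

def disks_for_gof(gof_index, s, n_disks=N_DISKS):
    """Disks holding the S segments of a gof under multiple segmentation.

    There are n_disks/S disk sets; set k = {k, k + n_disks/S, k + 2*n_disks/S, ...}.
    Consecutive gofs are placed on consecutive sets in round robin order."""
    n_sets = n_disks // s
    k = gof_index % n_sets
    return [k + j * n_sets for j in range(s)]

print(segmentation_levels())       # [1, 2, 4, 8]
print(disks_for_gof(0, s=2))       # [0, 4]   matches the (0, 4) example in the text
print(disks_for_gof(1, s=2))       # [1, 5]
print(disks_for_gof(0, s=8))       # [0..7]   balanced placement as a special case
print(disks_for_gof(5, s=1))       # [5]      periodic placement as a special case
```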
It is shown that increasing the segmentation level S reduces the maximum interactivity delay at the price of utilization efficiency. For a segmentation level S, a video stream accesses S disks during each cycle. Extending the structure of video stream sets, we develop the structure of component video stream sets. For a parallel array of N_d disks, we define N_d component video stream sets. For a video j that is stored with segmentation level S, we say that S component video streams are required for a single video stream of video j. Therefore, resources are reserved on S component video stream sets for the retrieval of a single video stream for video j. Figure 5 shows which component video stream sets are used for the retrieval of a given video stream at cycle r stored with segmentation level S.

FIG. 5. Component video stream sets for the retrieval of a video stream at cycle r (segmentation levels S = 1, 2, 4, 8; sets (r mod 8) + 0 through (r mod 8) + 7).

All component video streams in a video stream set retrieve data from the same disk during any given cycle. Based on this, it can easily be shown that the worst-case interactivity delay for a video stored with segmentation level S is N_d/S cycles for any of the equivalent interactivity functions. Suppose that a video j is stored with segmentation level S = 2 on an N_d = 8 disk array. Consider a video stream that has reserved resources on component video stream sets (0, 4), with all other video stream sets exhausted by other video streams. Assume a request for a gof stored on disks (0, 4) is made during cycle r, and the desired start gof is stored on the disks which have just been accessed during cycle r. The delay before the desired gof can be accessed from the appropriate disks is N_d/S = 4 cycles. In summary, a video stored with segmentation level S requires S component video streams. If resources are reserved on S component video stream sets, a maximum of N_d/S cycles are required before a given set of S component video
stream sets has accessed all disks. Using a similar analysis, we can also show that the scan granularity for this scheme is N_d/S. This scheme has the advantage of flexibility in that videos with a high interactivity delay tolerance can be stored with a smaller segmentation level, while videos requiring a low interactivity delay are stored with higher segmentation levels. Multiple levels of segmentation can be supported on the same array of disks in a video server to provide a range of interactivity QoS.

The proposed multiple segmentation scheme can be used to segment each gof of each layer of scalable video into S segments. The specific value of S to use is a design parameter that can be chosen by the system designer for each video in the video server, depending on its access requirements. We now consider how to divide each gof of each resolution (layer) of video into S segments. Each gof of each layer consists of a sequence of I, B, P frames of MPEG-2 video. There are two basic ways to segment the gof of a given layer, as shown in Fig. 6.

FIG. 6. Multiple segmentation based on scalable MPEG-2 video (method 1, method 2, modified method 2, and method 3).

Using method 1, we see that if one segment is not retrieved, all frames of a gof will be affected. Method 2 is clearly a better option. Furthermore, based on method 2, we may group together frames of the same type (I, P, B) before segmentation. In this way, we assign the highest priority to segments containing I frames, intermediate priority to segments containing P frames, and the lowest priority to segments containing B frames. Segmentation does not occur exactly at frame boundaries. Each segment has an associated priority, and the priorities can be used in the video server scheduler to selectively drop segments to achieve graceful degradation in the case of congestion. In addition, further granularity in interactive scan
functions can easily be achieved by skipping B and/or P frames. In many real time or near real time applications for which fast responses are critical, the lower layers may be segmented at a higher segmentation level (method 3, Fig. 6) so that they can be retrieved with shorter delays for a high degree of interactivity. This can be used for progressive retrieval, in which lower layers are displayed before the full resolution layers are fully retrieved.
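The grouping-by-frame-type idea (method 2 with priorities) can be sketched as follows: frames are grouped by type in I, P, B order, cut into S roughly equal segments, and each segment is tagged with the priority of the most important frame type it contains. Unlike the scheme in the text, this sketch cuts exactly at frame boundaries for simplicity.

```python
PRIORITY = {"I": 0, "P": 1, "B": 2}   # 0 = most important

def prioritized_segments(gof, s):
    """Group the frames of a gof by type (I, then P, then B), cut into `s`
    segments of nearly equal length, and tag each with its highest priority."""
    ordered = sorted(range(len(gof)), key=lambda i: (PRIORITY[gof[i]], i))
    size = -(-len(ordered) // s)                     # ceiling division
    segments = []
    for n in range(s):
        frames = ordered[n * size:(n + 1) * size]
        if frames:
            prio = min(PRIORITY[gof[i]] for i in frames)
            segments.append({"frames": frames, "priority": prio})
    return segments

gof = list("IBBPBBPBBPBB")
for seg in prioritized_segments(gof, s=4):
    print(seg)
# Under congestion the scheduler can drop the priority-2 (B-only) segments first.
```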
3.4.1 Admission Control Framework Based on Multiple Segmentation
The multiple segmentation placement strategy has a simple admission control framework. It was shown that each incoming video stream can be decomposed into a number of component video streams. Higher segmentation levels require more component video streams for a single video stream. Admission control at the video server is an operation at the call establishment level for a video stream request at a video server. Given an incoming request with a specific QoS requirement, the admission control must decide to accept or reject the call. The policy has to determine if the request can be serviced by the video server while maintaining the heterogeneous QoS requirements of all video streams already connected to the video server. The challenge is to maximize the utilization of the video server resources while ensuring the heterogeneous QoS requirements of connected video streams.

For a parallel array of N_d disks, we define N_d component video stream sets (CVSS). All component streams in a given CVSS retrieve video data from the same single disk during a given cycle. The component video streams in a CVSS are said to be connected to the same logical disk. The CVSS simplification provides a strategy for admission control in the video server. In the video server, we maintain a single CVSS admission control table. For each incoming video stream, we update the corresponding CVSS entries accordingly. Note that depending on the resolution of the video stream, we calculate whether the incoming video stream can be supported on each logical disk associated with each CVSS. All logical disks are assumed to be identical with the same disk characteristics.
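A minimal sketch of the CVSS admission control table: each component video stream set tracks the component streams it has committed, and a request at segmentation level S is admitted only if S suitably spaced sets each have room for one more. The simple per-set stream limit used here stands in for the per-logical-disk admission calculation and is illustrative only.

```python
N_DISKS = 8
STREAMS_PER_LOGICAL_DISK = 7   # illustrative per-disk limit (e.g. from the S_max bound)

class AdmissionTable:
    """One entry per component video stream set (CVSS); each CVSS maps to a logical disk."""

    def __init__(self, n_disks=N_DISKS, capacity=STREAMS_PER_LOGICAL_DISK):
        self.load = [0] * n_disks        # component streams reserved per CVSS
        self.capacity = capacity
        self.n_disks = n_disks

    def admit(self, s):
        """Try to reserve S component streams on S sets spaced n_disks/S apart."""
        stride = self.n_disks // s
        for k in range(stride):          # try each candidate group of sets
            sets = [k + j * stride for j in range(s)]
            if all(self.load[c] < self.capacity for c in sets):
                for c in sets:
                    self.load[c] += 1
                return sets              # request accepted
        return None                      # request rejected

table = AdmissionTable()
print(table.admit(s=2))   # e.g. [0, 4]
print(table.admit(s=8))   # [0, 1, ..., 7] if every set still has room
print(table.load)
```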
3.5 Fault Tolerant Video Storage
In very large scale video servers, a very important issue is that of fault tolerance [2]. A large scale video server can potentially have 1000 (1 GB) disks. This would provide enough storage for approximately 300 MPEG-2 movies at 4.5 Mbps or 900 MPEG-1 movies at 1.5 Mbps. Assuming a bandwidth of 4 Mbytes per second per disk, 1000 disk drives provide enough bandwidth to support approximately 6500 concurrent MPEG-2 users or 20 000 MPEG-1 users. Although a single disk can be fairly reliable, given such a large number of disks, the aggregate rate of disk failures can be too high. The Mean Time To
Failure (MTTF) of a single disk is on the order of 300 000 hours. Therefore, the MTTF of some disk in a 1000 disk system is on the order of 300 hours (approximately 12 days). In large scale video servers, all the video data may be stored on tape drives, with a subset being stored on the disks. Therefore, data lost on disks can always be recovered. However, a disk failure can result in the interruption of requests in progress. If a video being retrieved from the disks and transmitted into the network is stored on the failed disk, this can lead to degraded video quality at the client.

In large scale video servers, as shown in the previous section, it is likely that each video will be striped over multiple disks in order to improve the load balancing in video servers. In that case, multiple videos can have a portion of their data stored on a failed disk. This means that a single disk failure can lead to degraded video quality for multiple videos. Once a disk has failed, one option is to restore the data for the multiple videos that were stored on the failed disk from the tape archive onto a new disk. This can be a very slow process, since data for multiple videos has to be retrieved from robotic tape archives onto the disk. Therefore, without some form of fault tolerance, such a system is not likely to be acceptable.

Reliability and availability can be improved by using a fraction of the disk space to store redundant information. Typically, parity schemes and mirroring schemes are used for this purpose. For example, in a disk system with five disks, four disks may be used to store actual data, while the fifth disk is used to store parity information. In this example, four fifths of the total storage space is used to store data, and four fifths of the disk bandwidth is used to retrieve data from disks.

In [2], two observations are stated for the design of fault tolerant video servers. Observation 1: one should not mix data blocks of different objects in the same parity group. If this observation is violated, there may not be enough disk bandwidth to reconstruct data on the fly in the event of a disk failure. Observation 2: to avoid degradation in video quality when a failure occurs, the first fragment in a parity group cannot be scheduled for transmission over the network until the entire parity group has been read from the disk. We note that in the unlikely event of two disks failing in the same parity group, a rebuild from tape drives would have to be performed. This is referred to as a catastrophic failure. The goal of fault tolerant video servers is to achieve very low probabilities of catastrophic failures with a minimal increase in disk storage, disk bandwidth and buffer requirements.

There is another serious type of system failure, which is called degradation of service [2]. This occurs when there is insufficient available disk bandwidth, due to failures, to continue delivering all active requests. One way this can happen is if observation 1 is violated. For example, suppose that a parity group contains
fragments of video x and video y. If video x was already being retrieved, then disk bandwidth has already been reserved for its retrieval. However, if video y was not being retrieved, then there may not be enough disk bandwidth available for its retrieval when a disk failure occurs.

The Streaming RAID (Redundant Array of Independent Disks) scheme was first proposed in [30]. For fault tolerance, disks are grouped into fixed sized clusters of C disks, each with C - 1 data disks and one parity disk. The set of data blocks, one per data disk, and a parity block on the parity disk form a parity group. The parity block is the bitwise exclusive OR of the data blocks. Each video is striped over all the data disks. The parity groups associated with an object are allocated in a round-robin fashion over all of the clusters. For example, parity groups one, two and three are stored on clusters one, two and three respectively. For each active stream, a single parity group is read from a single cluster in a single cycle and delivered to the network in the subsequent cycle.

With this scheme, in the event of a disk failure, the missing data can be reconstructed by a parity computation. For each active stream, an entire parity group is read from a cluster in each cycle. If a disk failure has occurred on that cluster, the parity block is also read from the parity disk and the lost data is reconstructed on the fly. The Streaming RAID scheme achieves fault tolerance of up to one disk failure per cluster. If more than one disk fails in a single cluster, then there is a catastrophic failure. In this case, the data has to be rebuilt on a new disk from the tape archives. Streaming RAID achieves fault tolerance at the cost of disk bandwidth and storage, since a portion of the disk resources is used for redundant parity information.
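The parity mechanism is easy to demonstrate: the parity block is the bitwise exclusive OR of the data blocks in the group, and XOR-ing the surviving blocks with the parity block regenerates any single missing block. A minimal sketch for a five-disk cluster (C = 5, four data blocks plus parity):

```python
def parity_block(data_blocks):
    """Parity = bitwise XOR of all data blocks in the parity group."""
    parity = bytearray(len(data_blocks[0]))
    for block in data_blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

def reconstruct(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors and the parity block."""
    return parity_block(list(surviving_blocks) + [parity])

# A cluster with C = 5 disks: 4 data blocks plus 1 parity block per parity group.
data = [bytes([d] * 8) for d in (0x11, 0x22, 0x33, 0x44)]
parity = parity_block(data)

lost = data[2]                                  # disk 2 fails
rebuilt = reconstruct(data[:2] + data[3:], parity)
assert rebuilt == lost                          # the missing fragment is recovered on the fly
print(rebuilt.hex())
```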
4. Buffer Replacement Algorithms
Disks are a primary form of storage for computer systems. In disk based computer systems, a buffer cache is used to reduce the number of disk I/O operations. When a request is made for data, the operating system first checks to see if the data is available in the memory cache. In this way, disk I/O can be reduced. A buffer manager is responsible for buffer replacement, freeing memory to accommodate new data blocks from disk. Each time data is retrieved from disk, the buffer manager has to decide which data in the buffer cache to remove in order to store the newly retrieved data. The buffer replacement algorithm has the goal of reducing the total number of cache misses. An optimal buffer replacement algorithm achieves the lowest number of cache misses, and hence the lowest number of disk I/Os. In general the optimal algorithm is unachievable, as it requires exact future knowledge of data requests. Most disk systems use approximation algorithms such as Least Recently Used (LRU) and Most Recently Used (MRU). However, it can be
shown that both algorithms yield poor performance for continuous media data such as video. Access to continuous media data requires rate guarantees so that clients can meet their timing constraints. To ensure rate guarantees, buffer space and disk bandwidth are reserved for each client on a video server. In this section, we will assume that a global buffer cache is used in which data can be shared among all the clients. The rationale for using a cache is to reduce disk I/O.

Reducing disk I/O is important for the following reason. Video servers need to handle continuous media data as well as conventional data. Continuous media data requires a high I/O rate. Furthermore, continuous media data requires the I/O bandwidth to be reserved throughout the duration of playback to meet the real time transfer rate requirement. The reservation tends to last a long time. Reserving large portions of disk bandwidth for long durations can result in a drastic degradation in the performance of accesses to other data types. The lower the buffer cache miss ratio for continuous media data, the higher the performance for conventional data accesses will be in a multimedia storage system.

In this section, we will present a buffer replacement algorithm for video servers and then compare the proposed algorithm with the LRU, MRU and optimal buffer replacement algorithms. The proposed algorithm is shown to have significantly better performance than LRU and MRU.
4.1 System Model
A video server or multimedia storage system can store various types of data. Clients access data either in real time mode or non-real time mode. In real time mode, clients can retrieve data at a guaranteed rate. In non-real time mode, no rate guarantees are provided to clients. A multimedia storage system must ensure that rate guarantees for clients accessing data in real time mode are met without yielding poor performance for clients accessing data in non-real time mode. We will assume that real time clients access data in read only mode. We also assume that each real time client must specify the rate at which data transfer must occur when it first requests data. For constant bit rate video, real time clients access data at the rate at which the data was encoded. To provide rate guarantees, the video server performs admission control and resource reservation. We assume the system retrieves data blocks from disks into buffer space and flushes data from the buffer space to disks periodically, in service cycles. The maximum length of a service cycle will be denoted T. Finally, we assume clients can interactively control data transfer by pause, resume and jump operations. The buffer cache space is managed as a pool of n_b buffers,
TABLE I
BUFFER MANAGER TASKS
(1) Add the buffers containing data blocks which were consumed in the last service cycle to the free buffer pool.
(2) Determine which data blocks need to be retrieved from disk in the current service cycle.
(3) Determine if a block that has to be fetched is already in the buffers.
(4) Allocate buffers from the set of free buffers for those data blocks that need to be retrieved from disks.
(5) Issue disk I/O to retrieve the needed data blocks from disks into allocated buffers.
The buffer cache consists of a pool of buffers, each of size d. Each buffer can either be free or used. We will refer to all the free buffers as the free buffer pool. At the beginning of each service cycle, the storage system scans each client to determine which data blocks were consumed by the client and which data blocks need to be fetched from disk. Table I shows the steps that accomplish this task. The algorithm that decides which free buffer should be allocated to a block that needs to be pre-fetched is called the buffer replacement algorithm. The content of a free buffer is either valid or invalid. We assume the replacement algorithm first uses all invalid free buffers.
4.2 BASIC Buffer Replacement Algorithm
The LRU algorithm selects the buffer which contains the block that was used least recently. The MRU algorithm selects the buffer which contains the block that was used most recently. The optimal algorithm selects the buffer that contains the block that will not be referenced for the longest time. This cannot be achieved in practice, but can be implemented for comparison purposes in simulation. In order to describe the BASIC buffer replacement algorithm presented in [23], we first define a progressing client as a client that is in a state other than pause. When a buffer is to be allocated, BASIC selects the buffer which contains a block that would not be accessed for the longest period of time by the existing progressing clients if each client consumed data at its specified rate from that moment on.
If there are buffers which contain blocks that would not be accessed at all by the existing clients, the block with the highest offset-rate ratio is selected as the victim. For example, the tenth block of a video with a rate of r = 1.5 Mbps in a system with buffer size d = 32 kB has an offset-rate ratio of 9 × 32 kB / 1.5 Mbps = 1.536 s. In [23], the BASIC algorithm is compared with LRU, MRU and the optimal algorithm in the case where there are only two clients accessing the same continuous media file continuously at a rate of r. The length of a file is denoted as l blocks. The distance between the two clients is denoted dist blocks. We assume that any two consecutive service cycles are T units apart and that d = T·r holds. If there is one client and initially the buffer cache does not contain any of the data blocks that the client will access, then the number of disk I/Os is l. In [23], for the restricted scenario mentioned, it is shown that LRU will always yield miss ratios higher than BASIC. It is also shown that even if clients are not relatively close to each other, BASIC will reduce cache misses periodically. For this scenario, BASIC and MRU are equivalent to the optimal algorithm. However, it can be shown that as the number of clients increases, MRU results in significantly more cache misses than BASIC.
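To make the victim selection concrete, the following sketch (in Python) chooses a victim among candidate cached blocks, given the positions and rates of the progressing clients. The data representation (blocks as (video, offset) pairs, clients as small dictionaries) and all names are assumptions introduced for this illustration; the actual BASIC algorithm of [23] operates on the server's own buffer structures.

    def basic_victim(cached_blocks, clients, video_rates, block_size):
        # Choose a victim buffer in the spirit of the BASIC algorithm.
        # cached_blocks : (video, offset) pairs held in valid free buffers
        # clients       : progressing clients as {"video", "offset", "rate"} dicts
        # video_rates   : encoding rate of each video (bits/s)
        # block_size    : block size in bits
        def next_access(block):
            video, offset = block
            waits = [(offset - c["offset"]) * block_size / c["rate"]
                     for c in clients
                     if c["video"] == video and c["offset"] <= offset]
            return min(waits) if waits else None   # None: no progressing client needs it

        unused = [blk for blk in cached_blocks if next_access(blk) is None]
        if unused:
            # Among blocks no client will access, evict the one with the
            # highest offset-rate ratio.
            return max(unused, key=lambda blk: blk[1] * block_size / video_rates[blk[0]])
        # Otherwise evict the block whose next access lies furthest in the future.
        return max(cached_blocks, key=next_access)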
4.3 Comparison of Buffer Replacement Algorithms
In [23], it was shown that conventional buffer replacement algorithms such as LRU and MRU perform poorly for servers supporting continuous media data. The commonly used LRU and MRU buffer cache replacement algorithms do not reduce disk I/O significantly when used in this domain. If all clients access videos continuously, LRU yields miss ratios in the range of 88% to 99%. The new buffer replacement algorithm, called the BASIC algorithm, was shown to reduce cache misses by up to 30% compared to LRU and MRU when all clients access the same continuous media data. It is also shown that the new algorithm has at most about a 3% increase in cache misses compared to the optimal algorithm if videos are sufficiently long. As such, the proposed algorithm is a very suitable candidate for the buffer replacement scheme in storage systems with continuous media data.
5. Interval Caching
Traditional buffer management policies employed by various software systems (e.g. operating systems) are based upon the concept of a hot set of data. Various buffer management policies such as LRU use different mechanisms to identify the hot set and to retain it in the buffer. In the previous section, we presented the BASIC buffer replacement algorithm as an improvement to the LRU
and MRU policies. From extensive simulations, the BASIC algorithm is shown to improve the performance of a video server by reducing the cache miss probability. In that work, it was assumed that buffer and disk bandwidth resources would be reserved for each stream requiring real time retrieval. The goal of the BASIC buffer replacement algorithm was to reduce the disk I/O as much as possible so that the performance of non-real time disk I/O would be improved. Note that even with the BASIC algorithm to reduce disk I/O for continuous media, a fixed disk bandwidth must still be reserved for each real time request. The BASIC algorithm only tries to reduce the disk I/O as much as possible; it does not provide any guarantees (deterministic or statistical) about how much the performance will be improved. In general, buffer cache policies that operate at the block level, such as BASIC and LRU, do improve performance. However, the improvement in performance is unpredictable and the server does not guarantee any specific performance improvement. In this section, we propose a buffer management policy called the interval caching policy. The interval caching policy is based on the following observation. For any two consecutive requests for the same video, the later stream can read the data brought into the buffer by the earlier stream if the data is retained in the buffer until it is read by the later stream. With a large number of concurrent streams, it is possible to choose the streams to be retained so as to maximize the number of streams that are read from the buffer rather than from the disk, thereby reducing disk I/O. The buffer cache is used to store the interval between subsequent requests for the same video while providing guaranteed continuous retrieval. The policy identifies certain streams and temporarily buffers the pages brought in by those streams. We examine the efficacy of this technique in reducing disk overload and hence increasing the capacity of the video server. It is shown that the interval caching policy makes it cost effective to use memory to buffer video streams even with a uniform access pattern to all movies.
5.1 System Model
The system model relevant to this section is shown in Fig. 7, which shows the various software components of a video server. The buffer space is divided into a number of blocks m; the blocks containing data are in the buffer pool, while the rest of the blocks are in the free pool. During normal operation, for each data stream the disk manager initiates a disk I/O in anticipation and reads data for that stream into a free buffer block. It does this by acquiring a free block, inserting it into the buffer pool and starting the disk I/O to read data into the block.
FIG. 7. Interval caching system model (buffer pool, free pool, disk manager, communication manager).
The communication manager initiates the communication I/O process and sends the previous data block of a stream stored in the buffer to the client process. Data brought in by a stream can be reused by other closely following streams if sufficient buffer space is available to retain the data blocks in the buffer. The completion of the communication I/O process then invokes the buffer manager to decide whether the block should be returned to the free pool or retained in the buffer pool for reuse by other streams. The blocks in the buffer pool are thus divided into in-blocks that are in the process of having video data read into them, out-blocks from which data is being transmitted to one or more clients, and data-blocks that are being retained for reuse by other clients.
5.2 Interval Caching Policy
The main idea of the interval caching policy is to choose the consecutive pairs to be buffered so as to maximize the number of streams served from the buffer. The policy orders all consecutive pairs in terms of increasing buffer requirements and then allocates buffers to as many of the consecutive pairs as possible. The buffer requirement of a consecutive pair depends on the time interval between the two streams and the compression method used. The main idea is illustrated in Fig. 8. The small arrows marked S11 through S31 represent the pointers corresponding to the various playback streams on movies 1, 2 and 3. Two streams Si and Sj are defined as consecutive if Sj is the stream that next reads the data blocks that have just been read by Si. In Fig. 8, (S11, S12), (S12, S13) and (S13, S14) form three consecutive pairs for movie 1.
FIG. 8. Interval caching policy. Ordered buffer requirement list: b12, b22, b13, b14. Buffered streams: S11, S21, S12.
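The allocation step can be sketched as follows. The sketch assumes that the buffer requirement of each consecutive pair has already been computed (for constant bit rate streams it would be roughly the time interval between the two streams multiplied by the data rate); the pair list and the numbers in the example are made up for illustration.

    def choose_cached_pairs(pairs, total_buffer):
        # Interval caching: allocate buffer to consecutive pairs in order of
        # increasing buffer requirement until the cache is exhausted.
        # pairs: (leading_stream, following_stream, buffer_need) tuples
        cached, used = [], 0.0
        for pair in sorted(pairs, key=lambda p: p[2]):
            if used + pair[2] <= total_buffer:
                cached.append(pair)
                used += pair[2]
        return cached

    # Hypothetical pairs loosely following Fig. 8 (buffer needs in MB, made up):
    pairs = [("S11", "S12", 30.0), ("S21", "S22", 40.0),
             ("S12", "S13", 55.0), ("S13", "S14", 80.0)]
    print(choose_cached_pairs(pairs, total_buffer=130.0))
    # The three smallest requirements fit; their following streams are served
    # from the buffer rather than from disk.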
5.3 Comparison of Interval Caching Policy with Static Policy
The effectiveness of the interval caching policy is studied by modeling a server with various buffer sizes delivering MPEG-1 videos. Since the effectiveness of the buffering policy depends on the distribution of access to the movies, two distributions are used in the simulations. The first distribution is based on empirical data on video rentals in various video stores during one particular week. The second distribution is a uniform distribution over all the movies. In the simulations, the interval caching policy is compared to the policy of statically buffering the most popular movies. In [15, 14] it is shown that the performance of the interval caching policy is superior to that of the static policy of buffering the most popular movies, using the number of streams buffered as the performance metric. For the same amount of buffer, the interval caching policy buffers more streams than the static policy. The relative difference depends on the actual distribution of accesses over the movies. For a uniform distribution, the interval caching policy buffered ten times as many streams as the static policy, while under a skewed distribution, the interval caching policy buffered twice as many streams. In addition, the interval caching policy automatically adapts to changes in the access distribution, always providing superior performance, whereas the performance of the static policy may be affected severely by changes in the access rates.
6. Batching
In the future, as the costs of magnetic disk storage and memory continue to drop and broadband (gigabit) network infrastructures become widespread, large scale Video-on-Demand (VoD) systems will become cost effective. In such VoD systems, large databases of movies will be stored in a set of centralized servers. Geographically distributed clients will request movies from the centralized video servers. For a video server to support a video stream, it is necessary for CPU, memory buffer, disk bandwidth and network interface bandwidth to be reserved at the video server. Therefore, there is a hard limit on the number of streams that can be supported concurrently by a video server. In such a VoD system, a new movie stream can be started to satisfy each request. Alternatively, requests for movies can be batched together so that a single stream satisfies all requests for the same video. In this way, the video server can increase the number of supported clients. In a video server supporting batching, the same video stream can be multicast to all the clients in a given batch group. In this section, we will present research on batch scheduling policies. The batch scheduling policies determine which set of playback requests should be batched at any given time at a video server [13].
6.1 System Model
In this section we describe the system model relevant to batch scheduling policies.

- Playback requests. Playback requests for movies from different clients are assumed to be independent of each other and arrive at random time intervals at the video server.
- Video access frequencies. Given that a video server stores a set of videos, we assume that access to movies is non-uniform. Some movies are more popular than others. Based on the rental statistics from video rental stores, the access frequencies to various movies are characterized by a Zipf distribution with parameter 0.27. In a Zipf distribution, if the movies are sorted by access frequencies, then the access frequency for the ith movie is given by f_i = c / i^(1−θ), where θ is the parameter of the distribution and c is a normalization constant (a small numerical sketch of this distribution follows below).
- Customer reneging behavior. The batch scheduling policy can depend on user behavior. Once a client requests a certain video from a video server, the amount of time the user will wait before deciding to leave may not be known in advance. In the following, we will assume that the reneging time R of each client is a random variable with an exponential distribution.
For the exponential reneging time distribution, the probability that a customer leaves at any moment is independent of the amount of time the client has been in the system. We further assume that all clients for all movies have the same mean reneging time. Therefore, all customers in the queue are equally likely to remain in the queue.
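A small numerical sketch of the Zipf access frequencies referred to above; the number of movies chosen here is arbitrary.

    def zipf_frequencies(num_movies, theta=0.27):
        # Access frequency f_i = c / i**(1 - theta) for movies sorted by popularity.
        weights = [1.0 / (i ** (1.0 - theta)) for i in range(1, num_movies + 1)]
        c = 1.0 / sum(weights)          # normalization constant
        return [c * w for w in weights]

    freqs = zipf_frequencies(100)
    print(freqs[0], freqs[-1], sum(freqs))   # most/least popular shares; frequencies sum to 1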
6.2 Proposed Policies
We first present two orthogonal classes of policies that select a movie for batching, given that there is a set of requests at a video server [13]. Afterwards, we will summarize the results of the simulations run to compare the performance of these two policies.
6.2.1 First-come-first-served (FCFS) Policy
In this policy, the requests for all movies join a single queue, which we call the requests queue. Each customer can leave the queue independently of the others if it has to wait too long; this depends on the reneging time of each client. Once server capacity for delivering a stream becomes available, the client at the front of the requests queue is served. All customer requests for the same movie are also satisfied by the same stream. This is a fair policy since it selects a movie independently of the movie's identity.
6.2.2 Maximum Queue Length (MQL) Policy
In this policy, requests for each movie join a separate queue. The movie with the maximum queue length is selected when resources are available. One drawback of this policy is that it may only choose hot movies, since there are very few requests for cold movies within a short time. This may considerably increase the reneging probability of the requests for cold movies, causing an increase in unfairness.
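The two selection rules can be contrasted with a minimal sketch; the request representation (movie identifier plus arrival time) is an assumption made for the example.

    def fcfs_select(requests):
        # FCFS: serve the movie of the longest-waiting request; all queued
        # requests for that movie are satisfied by the same stream.
        return min(requests, key=lambda r: r[1])[0]

    def mql_select(requests):
        # MQL: serve the movie with the largest number of waiting requests.
        counts = {}
        for movie, _arrival in requests:
            counts[movie] = counts.get(movie, 0) + 1
        return max(counts, key=counts.get)

    waiting = [("A", 1.0), ("B", 2.0), ("B", 3.0), ("B", 4.0), ("C", 5.0)]
    print(fcfs_select(waiting), mql_select(waiting))   # A (oldest request), B (longest queue)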
6.3 Comparison of Batch Scheduling Policies
We now summarize the results of the simulations presented in [13] which were run to compare the performance of these two policies.
6.3.1 Reneging Probability
The reneging probability was found to be lower under MQL than FCFS. This is due to the memoryless property of the exponential distribution. MQL selects the movie with the largest number of waiting clients. Since the reneging probability
of all clients is the same, MQL minimizes the overall reneging probability. This may not hold if the reneging probability of a waiting client depends on the amount of time it has been waiting in the system.
6.3.2 Average Waiting Time of Accepted Clients
The average waiting time of accepted clients was found to be lower for MQL than FCFS. This is because in FCFS, a playback stream can be used to serve a single client request while many requests for a popular video are waiting. However, the difference in average waiting times was found to be greatly reduced as the server capacity increases.
6.3.3 Fairness
Based on the Zipf distribution, the set of all movies can be classified into bins, such that movies in each bin receive roughly the same number of requests. For example, the movies can be divided into 10 bins such that roughly 10% of the requests fall into each bin. The measure used for fairness is the spread of the per-bin reneging probabilities about their mean:

  F = ( Σ_{i=1}^{M} (r_i − r_av)² / M )^{1/2}

In the above equation, M is the number of bins, r_i is the reneging probability of the ith bin of movies and r_av is the average reneging probability over all bins. In the simulation, it is found that FCFS is more fair. This can be understood intuitively because FCFS treats all the movies the same.
6.3.4 Complexity
FCFS is easier to implement than MQL since little state information is required.
6.4 Summary
In general, batching is more effective in larger servers since more requests are available for batching. We presented two batch scheduling policies proposed in [13] that differ in their choice of the movie to be played when server capacity becomes available. Since analytical modeling is complex, simulations were performed to compare the performance of the scheduling policies. The policies trade off different performance objectives. The performance objectives we looked at were reneging probability, average waiting time, and fairness.
The FCFS policy always serves the longest waiting client and thereby ensures fairness. It is also easier to implement. MQL serves the requests for the movie that has the largest number of waiting clients. The policy attempts to maximize the number of clients that are served, at the expense of unfairness. In the simple case of exponentially distributed reneging times, MQL was found to always have lower reneging probabilities. However, if the reneging probability depends on the amount of time spent waiting, i.e. if it is not memoryless, MQL may not even perform as well as the FCFS policy. For large scale video servers, the average waiting times of the two policies were found to be very close (within 5%). Therefore, overall, FCFS is preferred over MQL.
7. Retrieval Scheduling and Resource Reservation of Variable Bit Rate Video
In Section 2 it was shown that state of the art digital video compression, as introduced by MPEG-2, can produce bursty, variable bit rate (VBR) video. Figure 2 shows trace data for MPEG-2 VBR video; the video sequence is from the movie Forrest Gump. VBR video can provide several advantages over constant bit rate (CBR) video, including consistent video quality and lower encoder complexity. However, the bursty nature of VBR compressed video complicates the design of real time systems such as video servers, in contrast to the simpler case of CBR video. Video servers operate in cycles, and during each cycle, video data is retrieved by a disk retrieval scheduler from the disk system to memory for each stream that is supported. The disk retrieval scheduler determines how much data should be retrieved from the disk system to memory during each cycle, for each stream that the video server supports. Video data has to be retrieved to memory before it can be transmitted into the network. For CBR data, the disk retrieval schedule is simple. Suppose the data rate of all videos is 4.0 Mbps and that a cycle is 0.5 s. Then, during each cycle, 2.0 Mbits of video data is retrieved from the disk system to memory for each stream. In the case of VBR video, the data rate is constantly changing, and it is not clear what the best way to retrieve data for each stream is. In constant time (CT) retrieval [7, 31], data corresponding to a constant time is retrieved for each VBR stream during each cycle. Let u(s) be the video that a stream s is retrieving. For continuous, lossless retrieval, it is necessary to reserve a disk bandwidth equal to the peak data rate of video u(s) for stream s. In constant data (CD) retrieval [7, 1], a constant amount of data is retrieved for each VBR stream during each cycle of operation of a video server. A disk bandwidth equal to the average data rate of video u(s) is reserved for a stream s. This
retrieval schedule will be shown to require a certain amount of pre-fetch data to be retrieved from disk before transmission into the network can begin. These retrieval schedules will be discussed in more depth in this section. It will be shown that CD and CT retrieval are inflexible and do not fully utilize the bandwidth and memory resources of a video server in maximizing the number of supported VBR streams. The minimal resource (MR) retrieval schedule and its associated resource reservation algorithm can be shown to effectively overcome the limitations of CD and CT retrieval. The MR retrieval approach fully utilizes the video server resources to maximize the number of supported video streams. In MR retrieval, a range of disk bandwidths can be reserved for the retrieval of VBR data. This is in contrast to CD retrieval, in which a disk bandwidth equal to the average data rate of a VBR video is reserved, and to CT retrieval, in which a disk bandwidth equal to the peak data rate of a VBR video is reserved. However, it is shown that each disk bandwidth reservation requires a certain amount of pre-fetch buffer reservation. The MR schedule minimizes the amount of buffer that is required for each disk bandwidth reservation. We present performance evaluations based on simulations using MPEG-2 trace data. It is found that MR retrieval dramatically improves the performance of video servers compared to CT or CD retrieval. For a video server configuration with four disks and a memory resource of 120 MBytes, the MR retrieval approach supports 50% more video streams than approaches based on CT retrieval. For the same configuration, MR retrieval supports 275% more video streams than approaches based on CD retrieval. A key point is that MR retrieval always has better or equal performance compared to CD or CT retrieval, irrespective of the particular video server resource configuration. The increase in complexity for MR retrieval is also shown to be small. Furthermore, the MR retrieval schedule and associated resource reservation algorithms are flexible enough to be implemented on general purpose computers. The MR retrieval approach does not depend on any special video data layout strategies on disks, and is directly applicable to video servers that are based on general fault tolerant storage architectures (e.g. RAID-3, Redundant Array of Independent Disks [10]). Compared to simulcast coding, scalable coding schemes can provide multiple levels of video with a minimal cost of extra bandwidth or storage capacity. In scalable video coding, subsets of the full resolution bitstream are used to obtain subsets of the full resolution video. We show how scalable video can improve the performance of a video server when used with the MR retrieval schedule.
7.1 System Model
In this section we describe the video server system model relevant to this
section. The system model is shown in Fig. 9. The video server has a fault tolerant disk array for the storage of video data, a memory resource for pre-fetch buffers, and a network interface for transmission into the computer network. The disk retrieval scheduler has a cyclic operation [2, 24]. In each cycle, the disk retrieval scheduler retrieves data for multiple video streams from the disk system to memory. The network transmission scheduler also has a cyclic operation, although the cycle time of the network transmission scheduler will typically be much smaller than the cycle time of the disk retrieval scheduler. Much like the disk retrieval scheduler, the network transmission scheduler transmits data for multiple video streams into the network during each cycle. The video data is interleaved over all the disks of the array. During each cycle, the disk heads of each disk in the array complete one cycle of a SCAN disk head schedule, as described in Section 3. The network is assumed to accommodate the peak bandwidth of all video streams and to introduce zero delay and zero jitter. Recently, research has been done on network transmission schedules for VBR video in which limited network bandwidth, network delay and network jitter are considered [21, 28]. The video server supports completely interactive viewing and continuous, lossless retrieval. Video servers supporting completely interactive viewing allow viewers to pause and resume playback at any time during a viewing session. Playback can also resume at any point of a video. Video servers supporting continuous, lossless retrieval provision resources so that once transmission of a portion of a video has started, no delay is introduced at the server, i.e. the transmission never stops until all the video has been transmitted.
FIG. 9. Video server/VoD architecture (fault tolerant disk array, disk retrieval scheduler, memory buffers, network stream scheduler).
Note that this does not mean that there can be no delay before transmission begins. This is an important distinction, and will be discussed further below. Each stream supported by a video server has a fixed bandwidth and a fixed buffer reserved for the entire duration of interactive retrieval. There are no renegotiations of resources for each stream during a viewing session. This is a key simplifying assumption.
7.2 Retrieval Constraints
In the following sections, although the operation of a video server is based on discrete time cycles, we will use continuous time notation to clearly convey the central ideas. Table II defines the notation we will use in the following sections. We make an important assumption about the network transmission schedule, as follows:

  o(t) = 0                          for t < t_n
  o(t) = ∫₀^{t−t_n} r(τ) dτ         for t ≥ t_n                              (5)
The retrieval constraints for the retrieval of a video are as follows:
(1) i(t) ≥ o(t) (continuous retrieval constraint)
(2) i(t) − o(t) ≤ m (buffer constraint)
(3) di(t)/dt ≤ b (disk bandwidth constraint)
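In discrete form, with one sample of the cumulative curves per service cycle, the three constraints can be checked directly; the per-cycle sampling is an assumption made only for this sketch.

    def satisfies_constraints(i_cum, o_cum, m, b):
        # i_cum, o_cum : cumulative input/output sampled once per cycle
        # m            : reserved buffer; b : data retrievable per cycle
        continuous = all(ic >= oc for ic, oc in zip(i_cum, o_cum))        # constraint (1)
        buffered = all(ic - oc <= m for ic, oc in zip(i_cum, o_cum))      # constraint (2)
        bandwidth = all(i_cum[k] - i_cum[k - 1] <= b
                        for k in range(1, len(i_cum)))                    # constraint (3)
        return continuous and buffered and bandwidth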
7.3 Constant Time Retrieval
Constant time (CT) retrieval retrieves data from disk to memory according to
TABLE II
RETRIEVAL SCHEDULING NOTATION

Notation   Description
r(t)       Data rate of stored video
t_n        Start of network transmission
t_s        Start of disk retrieval
T          Duration of video
o(t)       Cumulative data transmitted out of memory into network (output)
i(t)       Cumulative data retrieved from disk into memory (input)
m          Buffer memory reserved for retrieval
b          Disk bandwidth reserved for retrieval
the video data rate. This scheme is described by the equation

  i(t) = 0                          for t < t_s
  i(t) = ∫₀^{t−t_s} r(τ) dτ         for t ≥ t_s                              (6)
The cumulative input data and cumulative output data are equivalent at any given time. In an actual system, the network transmission scheduler waits one cycle time after disk retrieval begins before it can start transmission into the network, so o(t) is simply i(t) delayed by one cycle. The delay of one cycle exists because we assume that the video server uses a double buffer scheme for retrieval and transmission (Fig. 10). During each cycle, data is read from the disk into the disk read buffers for each stream. Each disk executes one SCAN cycle. Concurrently, data is written from the network write buffers into the network interface card (NIC) for each stream. At the end of each cycle, the contents of the disk read buffers are written onto the network write buffers, and the process repeats. No pre-fetch buffer is required for CT retrieval. In the first cycle, the disk read buffer is being filled while the network write buffer is empty. At the end of the first cycle, the contents of the disk read buffer are copied onto the network write buffer. Therefore, network transmission can only begin after a delay of one cycle. This delay is different from the pre-fetch delay mentioned in the following sections and is common to all the retrieval schedules; we shall ignore it in the following sections. The pre-fetch delay for CT retrieval is zero. For each stream, a disk bandwidth b equal to the peak data rate of the video being retrieved must be reserved for the entire duration of an interactive viewing session:

  b = max{ r(t), 0 ≤ t ≤ T }                                                 (7)
FIG. 10. CT retrieval (disk read buffers, network write buffers; D: disk retrieval scheduler, N: network stream scheduler).
In the following sections, we ignore the disk read buffers and network write buffers and consider only the pre-fetch buffers. For CT retrieval, the memory requirement for pre-fetch buffers for each stream is zero. For the MPEG-2 VBR video shown in Fig. 2, CT retrieval has to reserve a bandwidth of 14 Mbps for the entire duration of retrieval. The buffer requirement for a video server operating at a 0.5 s cycle time is 1.75 MB. In [31], CT retrieval is the basis for the admission control scheme in multimedia servers. The primary contribution of that work is that statistical service guarantees are provided to all streams. In other words, for each stream, continuous retrieval is guaranteed for a fixed percentage of the video data. It is proposed that a certain percentage of video data can have the continuity requirement violated without significantly affecting the quality of the video. This leads to an improvement in the utilization of the server. New clients are admitted for service as long as the statistical estimate of the aggregate data rate requirement (rather than the peak data rate requirement) can be met.
7.4 Constant Data Retrieval
In constant data (CD) retrieval, a bandwidth b equal to the average data rate of a video is reserved. The reserved bandwidth is typically much smaller than in CT retrieval, in which the peak bandwidth is reserved. This scheme is described by the equation

  i(t) = 0                for t < t_s
  i(t) = b·(t − t_s)      for t ≥ t_s                                        (8)
CD retrieval retrieves a fixed amount of data during each cycle (Figs 11, 12). This differs from CT retrieval, in which a variable amount of data is retrieved in each cycle.
FIG. 11. CD retrieval (disk read buffers, pre-fetch buffers, network write buffers).
FIG. 12. Cumulative data analysis for CD retrieval (showing the pre-fetch delay).
In this scheme, data has to be pre-fetched to ensure that continuous, lossless retrieval of the video is guaranteed. This is necessary because the amount of data transmitted during each cycle is variable, while the amount of data retrieved from the disk system is constant during each cycle. Since pre-fetch data has to be retrieved, there is a pre-fetch delay associated with CD retrieval. The worst case pre-fetch delay can be determined for stored video because the entire trace is known a priori:

  t_n − t_s = p / b                                                          (9)
  p = max{ o(t_n + Δ) − b·Δ },  0 ≤ Δ ≤ T                                    (10)

where p is the worst case amount of pre-fetch data.
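In discrete time, the worst case pre-fetch amount and delay of equations (9) and (10) can be computed directly from a trace; the sketch below assumes one sample of the cumulative output per cycle and expresses the delay in cycles.

    def cd_prefetch(o_cum, b):
        # o_cum : cumulative output o(t_n + k) sampled once per cycle
        # b     : data retrieved per cycle (average data rate x cycle length)
        p = max(o_cum[k] - b * k for k in range(len(o_cum)))   # equation (10), Delta = k cycles
        p = max(p, 0.0)
        return p, p / b                                        # equation (9): delay = p / b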
For the MPEG-2 VBR video shown in Fig. 2, CD retrieval reserves a bandwidth of 3.8 Mbps for the entire duration of retrieval. The buffer requirement for a video server operating at a 0.5 s cycle time is found to be 50 MB. In [7], CD retrieval is the basis for both a deterministic and a statistical admission control scheme. The retrieval scheme is referred to as constant data length (CDL) retrieval. In [1], two retrieval schemes called traditional CDL and generalized CDL (GCDL) are presented. The traditional CDL scheme described there is actually very different from the CDL scheme of [7]. In the CDL scheme of [7], a constant amount of data is retrieved from the disk for a video stream in the first disk cycle. In the second cycle, the same constant amount of data is retrieved only if it is required to prevent buffer underflow; otherwise, no data is retrieved. This
process repeats throughout the retrieval. Therefore, each retrieval cycle is either an idle or an active round. Although a constant amount of data is retrieved during an active round, the overall retrieval can be considered to be variable bit rate. This is different from [1], in which the overall retrieval is constant bit rate, i.e. there are no idle rounds. In [1], the GCDL scheme is an extension of the traditional CDL scheme in which the retrieval round can differ between video streams and is a multiple of the disk cycle. This is shown to reduce the buffer requirements compared to traditional CDL.
7.5 Minimal Resource Retrieval
In this section, we present a minimal resource (MR) retrieval schedule [27] for the continuous, lossless retrieval of VBR video. MR retrieval is similar to CD and CT retrieval in that the retrieval alternates between intervals of constant time retrieval and constant data retrieval. However, it differs in that a range of bandwidths can be reserved for the retrieval. CD retrieval requires that a bandwidth equal to the average data rate is reserved, while CT retrieval requires a bandwidth reservation equal to the peak data rate. If the bandwidth reserved for retrieval of VBR video is less than the peak data rate, then data has to be pre-fetched to ensure continuous, lossless retrieval. Therefore, a buffer for pre-fetch data is required. In order to minimize the buffer requirement, data should be pre-fetched just in time. For the retrieval of VBR video, MR retrieval minimizes the worst case buffer requirement for a given disk bandwidth reservation. We first describe the MR retrieval schedule, then discuss its properties, and then compare it with CD and CT retrieval. As before, let t_n be the start of network transmission of a stream. The accumulated data output from the memory to the network is o(t). The bandwidth reserved for the retrieval is b. We define the following function, which we will use in describing MR retrieval:
  a(α, β) = b·(t − α) + β

that is, the line of slope b passing through the point (α, β). We also use two new variables, t_b and t_e, to mark the beginning and end of each maximum retrieval interval. The determination of the MR retrieval schedule of a stored video is described in Table III and is shown graphically in Figs 13 and 14. MR retrieval is defined by the maximum retrieval intervals. Figure 13 shows i(t) for MR retrieval. If the time t falls inside any of the maximum retrieval intervals, the retrieval rate is the maximum bandwidth b. Otherwise, the retrieval rate is equal to the video data rate. The buffer status at time t is m(t) = i(t) − o(t). The worst case pre-fetch delay is

  d = max{ m(t) } / b,   t_n ≤ t ≤ t_n + T
TABLE III
DETERMINATION OF MR RETRIEVAL SCHEDULE
Algorithm to determine the MR retrieval schedule:
(1) Set t_e = t_n + T
(2) Decrease t_e until do(t_e)/dt ≥ b
(3) Find the intersection of a(t_e, o(t_e)) with o(t)
(4) Let t_b < t_e equal the intersection point
(5) Mark the interval [t_b, t_e] as a maximum retrieval interval
(6) If t_b < t_n then stop
(7) Set t_e = t_b and return to step 2
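The following is a discrete-time sketch of the just-in-time idea behind MR retrieval. Rather than constructing the maximum retrieval intervals geometrically as in Table III, it works backwards over a per-cycle sampling of the cumulative output; under that sampling assumption it yields a minimal-buffer, just-in-time schedule of the same kind.

    def mr_schedule(o_cum, b):
        # o_cum : cumulative data that must have been transmitted by the end of
        #         each cycle (non-decreasing)
        # b     : maximum data retrievable from disk per cycle
        # Returns (i_cum, buffer_req): the retrieval schedule and the worst case
        # pre-fetch buffer max_k (i_cum[k] - o_cum[k]).
        n = len(o_cum)
        i_cum = [0.0] * n
        i_cum[n - 1] = o_cum[n - 1]
        # Backward recursion: retrieve as late as possible while never exceeding
        # rate b and never letting the input curve fall below the output curve.
        for k in range(n - 2, -1, -1):
            i_cum[k] = max(o_cum[k], i_cum[k + 1] - b)
        buffer_req = max(ic - oc for ic, oc in zip(i_cum, o_cum))
        return i_cum, buffer_req

The corresponding worst case pre-fetch delay is buffer_req/b cycles, which corresponds to the expression d = max{m(t)}/b given above.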
Proof of MR retrieval optimality. MR retrieval can be shown to minimize the worst case pre-fetch buffer requirement for a given bandwidth reservation for the continuous, lossless retrieval of a video. We can prove this by showing that MR retrieval is based on just-in-time retrieval. Consider the first maximum retrieval interval [t_b, t_e]. In MR retrieval, the retrieval rate during a maximum retrieval interval is b. We define a small time interval Δ. Consider the start of retrieval to be delayed to t_b + Δ. In this case, it can be seen that even if the retrieval is at the maximum retrieval rate b, the continuous retrieval constraint will be violated (Fig. 15). Therefore, t_b is the latest time at which the pre-fetch of data can start if continuous retrieval is to be guaranteed up to t_e.
FIG. 13. Determination of MR retrieval.
FIG. 14. Cumulative data analysis for MR retrieval (showing the maximum retrieval intervals).
Consider instead the start of retrieval being moved earlier, to t_b − Δ. It can be seen that the buffer requirement will then increase for any possible retrieval schedule, as data is retrieved earlier than required. This analysis can be repeated for all the maximum retrieval intervals.
FIG. 15. MR retrieval optimality.
Therefore, since MR retrieval is based on just-in-time retrieval, it minimizes the buffer requirement while satisfying the constraints for continuous, lossless retrieval.
7.6 Buffer-Bandwidth Resource Relation
For a given bandwidth reservation, MR retrieval minimizes the worst case pre-fetch buffer that is necessary for continuous, lossless retrieval. It can also be shown that increasing the reserved disk bandwidth reduces the worst case buffer requirement. Therefore, MR retrieval leads to a buffer-bandwidth resource relation for the retrieval of a video. Figure 16 shows the buffer-bandwidth resource relation for the MPEG-2 encoded video trace data shown in Fig. 2. This relation shows the worst case pre-fetch buffer reservation that is necessary for a given disk bandwidth reservation. From this relation, the corresponding worst case pre-fetch delay can be found: if the worst case buffer requirement for the retrieval of a given video is m, then the worst case pre-fetch delay is m/b, where b is the corresponding reserved bandwidth. For the interactive viewing of videos, we introduce the concept of a pre-fetch delay tolerance quality of service (PDT QoS). A PDT QoS is specified for each stream that a video server supports, and it specifies the worst case pre-fetch delay that can be tolerated during interactive viewing. It is shown that a PDT QoS specified for a stream s that retrieves video u(s) is equivalent to placing a lower bound on the bandwidth that can be reserved for stream s.
FIG. 16. Resource relation of MPEG-2 VBR video (bandwidth in Mbps).
The lower bound for the bandwidth can be determined from the buffer-bandwidth resource relation of video u(s).
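Sweeping the reserved bandwidth with the mr_schedule sketch given earlier produces a buffer-bandwidth resource relation of the kind plotted in Fig. 16; the trace values below are made up purely for illustration.

    from itertools import accumulate

    per_cycle = [2.1, 0.9, 4.8, 1.6, 0.5, 3.9, 6.2, 1.1]   # Mbit per cycle (made up)
    o_cum = list(accumulate(per_cycle))

    for b in (2.0, 3.0, 4.0, 5.0, 6.0):                    # reserved bandwidth, Mbit/cycle
        _, m = mr_schedule(o_cum, b)                       # mr_schedule from the earlier sketch
        print("bandwidth", b, "-> worst case pre-fetch buffer", round(m, 1), "Mbit")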
7.7 Comparison of Retrieval Schedules
The primary strength of MR retrieval is the flexibility to optimally trade bandwidth against buffer. This is captured in the buffer-bandwidth resource relation. While CD and CT retrieval are each represented by a single point on the resource relation of the stored video, MR retrieval can operate at multiple operating points on the resource relation. MR retrieval can set any bandwidth reservation; as the bandwidth reservation is reduced, it is necessary to increase the reserved buffer. Consider a video server that uses MR retrieval. We present two cases to demonstrate the advantage of using MR retrieval over CD or CT retrieval.
Case 1. Assume that initially each stream has a bandwidth reservation equal to the peak data rate of the video being retrieved, that the total bandwidth reserved for all streams is equal to the total bandwidth of the video server, and that a large buffer memory exists in the video server. Using CT retrieval, no more streams can be supported by the video server because of the bandwidth limitation. In MR retrieval, if all viewers can tolerate a pre-fetch delay, the bandwidth reserved for all the streams can be substantially reduced from the peak bandwidth. Reducing the reserved bandwidth for each stream requires an increase in the pre-fetch buffer requirement for each stream if continuous, lossless retrieval is to be guaranteed. In this way, memory resources can be utilized to alleviate the I/O bandwidth bottleneck. By reducing the total bandwidth reserved for all the streams, the video server can potentially increase the number of streams that are supported.
Case 2. Assume that initially each stream has a bandwidth reservation equal to the average data rate of the video being retrieved, and that each stream has a pre-fetch buffer requirement. Assume that the total buffer memory reserved for all streams is equal to the total memory resource of the video server, but that the total bandwidth reserved for all streams is less than the total disk bandwidth of the video server. Using CD retrieval, no more streams can be supported by the video server because of the memory limitation. In MR retrieval, by increasing the bandwidth reserved for each stream, the memory requirement for each stream can be substantially reduced. In this way, the bandwidth resources can be utilized to alleviate the memory bottleneck of a video server. By reducing the total memory reserved for all the streams, the video server can potentially support more streams.
7.8 Resource Reservation
In the previous section we developed and presented the MR retrieval schedule.
FIG. 17. Resource reservation problem (buffer-bandwidth relations for streams 1, ..., M).
It was found that the worst case memory buffer requirement for the retrieval of a VBR video decreases as the bandwidth reservation increases. It was seen that MR retrieval leads to a buffer-bandwidth resource relation. In this section, we develop the resource reservation algorithm based on MR retrieval for multiple streams in a video server. In a video server, data for multiple, concurrent streams is retrieved from the disk system to memory and then transmitted into the network. The video server has resources of disk bandwidth and memory that have to be shared amongst all streams. If MR retrieval is used for each stream, the important question remains as to what buffer and bandwidth reservations should be made for each stream, i.e. the operating point on the resource relation of the video retrieved by each stream must be determined. Figure 17 shows the resource reservation problem. For each incoming stream, the reservations should be made to maximize the number of streams that can be supported by a video server while guaranteeing the continuous, lossless retrieval and PDT QoS of each stream. Before describing the reservation problem, we present some definitions in Table IV. The lower bound on the stream bandwidth reservation is determined as follows:

  γ_k = min b_k                                                              (15)
  subject to  R_k(b_k) / b_k ≤ p_k                                           (16)

where R_k(b_k)/b_k is the pre-fetch delay. Our objective in a video server is to maximize the number of streams that can be supported. Therefore, we formulate the resource reservation problem as follows.
TABLE IV
NOTATION FOR RESOURCE RESERVATION

Notation             Description
M                    Total number of streams
k = 1, ..., M        Stream index
Δb                   Bandwidth increment
b_k                  Stream bandwidth reservation
γ_k                  Lower bound for stream bandwidth reservation
P_k                  Peak bandwidth of video accessed by stream k
R_k(b_k)             Stream buffer-bandwidth resource relation
p_k                  Stream PDT QoS
C_B                  System memory resource constraint
C_D                  System disk bandwidth resource constraint
B = (b_1, ..., b_M)  Reservation vector for all streams
A                    Set of reservation vectors with γ_k ≤ b_k ≤ P_k, k = 1, ..., M
For a given set of streams, determine if there exists a reservation vector for all streams that satisfies the following constraints:

  Σ_{k=1}^{M} R_k(b_k) ≤ C_B                                                 (17)
  Σ_{k=1}^{M} b_k ≤ C_D                                                      (18)
  B ∈ A                                                                      (19)
The constraints are the memory, disk bandwidth and PDT QoS constraints respectively. If such a reservation vector exists, the set of streams can be supported by the video server; otherwise the set of streams cannot be supported. Note that in an actual system, the computation of the total bandwidth reservation is not as simple as above, since the SCAN disk head schedule is assumed. The actual computation, based on simplifying assumptions, is given in [25]. The resource reservation problem above can be shown to be equivalent to the following two step algorithm:
(1) Find the reservation vector in A that solves the constrained minimization problem: minimize the total memory requirement Σ_{k=1}^{M} R_k(b_k), subject to the disk bandwidth constraint Σ_{k=1}^{M} b_k ≤ C_D.
(2) If the minimized memory requirement does not exceed C_B, the set of streams can be supported; otherwise it cannot.
If the solution of step (1) violates the memory constraint, then there is no reservation vector in A that will also meet the memory resource constraint. This is because the optimal reservation vector is the vector in A that minimizes the memory requirement. This means that there can be no reservation vector that meets all system resource constraints and the PDT QoS constraint. Therefore the two step algorithm is equivalent to the optimal resource reservation algorithm. In [25], both a fast optimal solution and a fast heuristic solution to the resource reservation problem are developed and presented. The algorithms are not presented here.
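Neither the fast optimal nor the fast heuristic algorithm of [25] is reproduced here. Purely as an illustration of how the buffer-bandwidth trade-off enters admission control, the following greedy sketch starts every stream at its lower bound and repeatedly gives the next bandwidth increment to the stream whose buffer requirement drops the most; all argument names are hypothetical.

    def greedy_reservation(relations, gammas, peaks, disk_bw, memory, db=0.1):
        # relations : per-stream functions R_k(b) giving the worst case pre-fetch
        #             buffer when bandwidth b is reserved (decreasing in b)
        # gammas    : per-stream lower bounds on bandwidth (from the PDT QoS)
        # peaks     : per-stream peak data rates (upper bounds on bandwidth)
        # disk_bw   : system disk bandwidth constraint; memory : memory constraint
        # db        : bandwidth increment
        b = list(gammas)                       # start every stream at its lower bound
        if sum(b) > disk_bw:
            return None                        # lower bounds alone exceed disk bandwidth
        spare = disk_bw - sum(b)
        while spare >= db:
            best, gain = None, 0.0
            for k, (R, bk, pk) in enumerate(zip(relations, b, peaks)):
                if bk + db <= pk:
                    g = R(bk) - R(bk + db)     # memory saved by one more increment
                    if g > gain:
                        best, gain = k, g
            if best is None:
                break                          # no stream benefits from more bandwidth
            b[best] += db
            spare -= db
        total_memory = sum(R(bk) for R, bk in zip(relations, b))
        return b if total_memory <= memory else None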
7.9 Progressive Display of Scalable Video
In Section 2 we described scalable MPEG video. Scalable video can improve the performance of a video server that uses the MR retrieval schedule and its associated resource reservation algorithm. In the progressive display of scalable video for interactive viewing [27], a progressively increasing PDT QoS is specified for the progressively higher scalable layers of a video. Each scalable layer is considered as an independent video. This is in contrast to non-progressive display of scalable video, in which a single PDT QoS is specified for the full resolution video. In progressive display, the pre-fetch data for all the scalable layers are retrieved simultaneously, since each layer is considered to be an independent video. At any given time, a video is transmitted only with those scalable layers for which the pre-fetch data have been fully retrieved. For example, suppose that transmission is to be resumed after an interactive function. If enough time has elapsed only for the pre-fetch data of the lowest scalable layer to have been retrieved, only the lowest scalable layer is transmitted. If enough time has elapsed for the pre-fetch data of the first two layers to have been retrieved, the first two scalable layers are transmitted. Progressive display of scalable video improves the performance of a video server. For scalable video, let the lowest layer of video have a PDT QoS of 1 s. The higher layers will have PDT QoS values larger than 1 s. In non-progressive display of scalable video, to achieve the same degree of interactivity, all the scalable layers have the same PDT QoS value of 1 s. There are various ways for scalable video data to be placed on disks. In this research, we assumed that each scalable layer is stored separately as an independent "video" and that each layer is interleaved over all disks.
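The layer selection rule for progressive display can be stated compactly. In the sketch below, the per-layer pre-fetch delays would come from each layer's own buffer-bandwidth resource relation; the function and the example values are assumptions made for this illustration.

    def layers_to_transmit(elapsed, layer_prefetch_delays):
        # Number of scalable layers whose pre-fetch data is fully retrieved,
        # counting from the lowest layer; delays are given in non-decreasing
        # order (progressively increasing PDT QoS).
        ready = 0
        for delay in layer_prefetch_delays:
            if elapsed >= delay:
                ready += 1
            else:
                break
        return ready

    print(layers_to_transmit(0.5, [1.0, 4.0, 9.0]))   # 0: no layer ready yet
    print(layers_to_transmit(5.0, [1.0, 4.0, 9.0]))   # 2: lowest two layers ready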
7.10 Performance Evaluations of Retrieval Schedules
This section presents a subset of the performance evaluations given in [27]. For the performance evaluation presented here, the disk performance characteristics of the two disk systems are as shown in Table V. Disk system 1 has the disk performance characteristics of a current magnetic disk.
TABLE V
DISK PERFORMANCE PARAMETERS

Parameter                        Disk system 1    Disk system 2
Disk cycle time/s                0.5              0.5
Max. rotation latency/ms         14.2             7.1
Max. seek latency/ms             18.0             9.0
Min. seek latency/ms             1.5              0.75
Max. disk transfer rate/Mbps     60.0             120.0
No. of disks in array            4                4
In disk system 2, the performance parameters were improved by a factor of two to project the performance characteristics of the next generation of magnetic disk systems. For performance evaluation, trace data for MPEG-2 scalable and non-scalable video was obtained using Columbia University's full-profile, standard-conforming MPEG-2 software encoder/decoder [32]. In the following simulations, the video server receives requests for videos from clients. Each new request specifies a certain video which is stored on the video server, for which there exists a resource relation. Each request also has an associated PDT QoS. For each new request, the video server determines whether it can accept the request. If the video server can accept the new request, a stream is established for it. For each simulation, the total number of video streams that can be supported by the video server is found for a given set of available video server resources. The simulations find the total number of admissible streams for the video server system as the on-board memory resource is increased, while maintaining a fixed disk bandwidth resource.
7.10.1 Comparison of MR and CT Retrieval
Figures 18 and 19 compare the performance of MR and CT retrieval scheduling. Each line corresponds to a single simulation. In each simulation, all streams access the same MPEG-2 VBR video, which has the trace data shown in Fig. 2. Also, in each simulation, each stream specifies the same PDT QoS value. For disk system 1, if a video server has 120 MB, MR retrieval supports 50% more streams than CT retrieval if users can tolerate a pre-fetch delay of 10.0 s. In the case of MR retrieval scheduling, we used the heuristic resource reservation algorithm to maximize the number of streams that can be supported concurrently by a video server. In the case of CT retrieval, the resource reservation algorithm is based on the fact that each stream requires a bandwidth reservation equal to the peak data rate of the requested video. Figures 18 and 19 show the total number of admissible video streams at the video server system as the total memory resource is increased, while keeping the disk system the same.
FIG. 18. Performance comparison of MR and CT retrieval (disk system 1): (a) MR retrieval (PDT QoS 10.0); (b) MR retrieval (PDT QoS 2.0); (c) CT retrieval (PDT QoS 0.0). Memory/MByte on the horizontal axis.
FIG. 19. Performance comparison of MR and CT retrieval (disk system 2): (a) MR retrieval (PDT QoS 10.0); (b) MR retrieval (PDT QoS 2.0); (c) CT retrieval (PDT QoS 0.0). Memory/MByte on the horizontal axis.
For MR retrieval, we can see that the number of streams that can be supported increases as the video server memory resources are increased. For continuous, lossless retrieval in interactive viewing, the resource reservation algorithm based on MR retrieval guarantees that no other retrieval schedule can support more video streams for a given set of video server resources.
It can also be seen that CT retrieval cannot take advantage of any increase in the memory resource of a video server. The advantage of the CT retrieval schedule is that the PDT QoS is always zero. This does not mean that the total delay that the client experiences before receiving its requested video is zero; however, the pre-fetch delay is zero. It can be seen that the performance of this scheme is the same as that of MR retrieval in which clients specify a PDT QoS of zero.
7.10.2 Comparison of MR and CD Retrieval
Figure 20 compares the performance of MR and CD retrieval scheduling. Each line corresponds to a single simulation run. In each simulation, all streams access the same MPEG-2 VBR video, which has the trace data shown in Fig. 2. Also, in each simulation, each stream specifies the same PDT QoS value. For disk system 1, if a video server has 150 MB, MR retrieval supports 275% more video streams than CD retrieval; CD retrieval is memory bound. In the case of MR retrieval scheduling, we used the fast heuristic resource reservation algorithm to maximize the number of streams that can be supported concurrently by a video server. In the case of CD retrieval, the resource reservation algorithm is based on two facts. Firstly, each stream requires a bandwidth reservation equal to the average data rate of the requested video. Secondly, there is a fixed memory requirement for the retrieval of the video.
FIG. 20. Performance comparison of MR and CD retrieval (disk system 1): (a) MR retrieval (PDT QoS 135.0); (b) CD retrieval (PDT QoS 135.0). Memory/MBytes on the horizontal axis.
Figure 20 shows the total number of admissible streams at the video server system as the on-board memory resource is increased, while keeping the disk bandwidth the same. We can see that the number of streams supported by CD retrieval is generally much lower than for MR retrieval. This scheme is essentially memory bound: the bandwidth is not fully utilized, since the memory requirements are the limiting factor in the resource reservation.
8. Conclusions
There is no doubt that video servers will be a critical component of the information technologies of the future. Currently, many companies are investing in research and development teams to develop the next generation of high performance video servers for the Internet, intranet and broadcast markets. In the past few years, there has been a great deal of excitement about video servers for video-on-demand (VoD) and interactive video. Many people (both technical and non-technical) were led to believe that the technology for such systems was just around the corner. Various companies proposed a model of VoD in which a large scale, centralized server would store massive amounts of video information. Consumers around the country, and even around the world, would connect to this video server to select from the massive number of video titles and receive them on demand. As a ballpark figure for "very large scale", newspaper articles, technology white papers and research papers mentioned VoD systems in which video servers store on the order of 1000 videos and support on the order of 1000 users. Naturally, it was in the economic interests of companies striving to develop such systems to try to convince investors that this particular model of VoD was indeed about to explode on the market. When companies failed to deliver on their promises, there was a resultant skepticism about VoD. The companies found that the technology was not at a point where large scale, centralized video servers could be developed cost effectively. Furthermore, the network infrastructure to support relatively high bandwidth applications to the home (such as VoD) is currently not available on a wide scale. There was no market for VoD at the prices that the companies could offer. Consumers were unwilling to pay large sums of money for a limited selection of videos, all of which (and much more) could be borrowed at a fraction of the price from their local video rental store. At this point it would be extremely short sighted to say that VoD is finished. Despite the current skepticism about interactive video, there is no doubt that VoD will indeed become a reality. How can such a strong claim be made in the face of disappointing efforts to develop VoD systems? It can be made for several reasons. Firstly, magnetic disk, robotic tape archive, RAM memory and computer network
bandwidth costs are continuing to drop rapidly. Furthermore, there is rapid progress in optical networks that will eventually lead to gigabit and terabit networks. It is just a matter of time before VoD will indeed explode on the market. Then, consumers will have access to an array of high quality visual content, and the era in which consumers passively accepted whatever content broadcast companies chose to distribute will come to an end. Secondly, although it is currently economically infeasible to develop very large scale centralized servers, the technology is definitely ready for small scale video servers, especially for intranet environments, in which it is much easier to guarantee QoS end to end. It is easy to imagine that there will be a proliferation of small scale video servers on university and corporate intranets. As the global network infrastructure improves, it is conceivable that video servers on intranets will eventually be interconnected. At that point, the network-connectivity gain will result in users on the global network having access to massive amounts of visual information. The network-connectivity gain is as follows. Consider the global network to have N_s video servers, each storing N_v videos. This means that each user on the network can potentially have access to N_s·N_v videos. We will refer to this as the network-connectivity gain. As an example, consider each video server to store 10 videos. If there are 10 000 video servers on the global network, each user has access to 100 000 videos. This is the advantage that results from the connectivity of computer networks. It is not necessary to stick to the model of a video server that has to store massive amounts of information. The aggregate of massive numbers of video servers, each of which may only store a small number of videos, will lead to unprecedented amounts of visual information being available on the global network. It is clear that the development of video servers will be a key component of the information technology revolution. Video server development will be intimately dependent on the costs of components such as RAM memory and disk systems. Video server development will also depend on the extent to which QoS and seamless connectivity can be provided in a cost effective way by computer networks. Video servers will likely first be developed for the intranet and broadcast environments. In the midst of all of the above dynamically varying factors and industry trends, research in video servers will be critical in guiding the development and evolution of cost effective, high performance video servers.

REFERENCES AND FURTHER READING
[1] Biersack, E., Thiesse, F., and Bernhardt, C. (1996). Constant data length retrieval for video servers with VBR streams. Proceedings of the International Conference on Multimedia
Software Acquisition: The Custom/Package and Insource/Outsource Dimensions

PAUL NELSON
ABRAHAM SEIDMANN
William E. Simon Graduate School of Business Administration
University of Rochester
Rochester, NY
WILLIAM RICHMOND
Perot Systems Corporation
Vienna, VA
Abstract

Companies have been outsourcing some or all of their information systems projects over the past four decades. These projects include applications development and the maintenance of commercial packages or in-house systems. This paper presents a decision framework that captures the major tradeoffs a firm faces when a software acquisition decision is made. This framework and the method of empirical analysis differ significantly from previous work on software acquisition. The software acquisition problem is depicted as two-dimensional, with the firm deciding whether to custom develop the software or base it on a package, and whether to insource acquisition tasks or outsource them. Multinomial logit analysis of extensive field data on actual business decisions (not stated rationales) identifies and measures the key factors affecting these two decisions. We present evidence that a significant interaction exists between decisions on the two dimensions of the software acquisition problem. This "confounding effect" leads firms to make the custom/package and insource/outsource decisions simultaneously. Software acquisition decisions are strongly affected by application properties, technological characteristics and organizational considerations. Surprisingly, we find the software acquisition decision to be largely unaffected by whether or not the system is strategic. Finally, while there is support for the popular belief that software development outsourcing has increased over time, we do not find evidence of such a trend for packages. The framework and empirical results in this paper offer managers a basis for structuring and benchmarking their software acquisition decisions. The paper also characterizes the types of software development and integration projects that vendors will find most beneficial to target.
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342
2. The Software Acquisition Cost-Benefit Framework . . . . . . . . . . . . 345
   2.1 Cost-Benefit Drivers and the Software Acquisition Decision . . . . 346
   2.2 System Value (Benefits) . . . . . . . . . . . . . . . . . . . . . . 347
   2.3 Needs Analysis Costs . . . . . . . . . . . . . . . . . . . . . . . 347
   2.4 Coding and Installation Costs . . . . . . . . . . . . . . . . . . . 348
   2.5 Monitoring Costs . . . . . . . . . . . . . . . . . . . . . . . . . 348
   2.6 Contracting Costs . . . . . . . . . . . . . . . . . . . . . . . . . 349
3. Hypotheses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
   3.1 Technological Features . . . . . . . . . . . . . . . . . . . . . . 350
   3.2 Application Properties . . . . . . . . . . . . . . . . . . . . . . 350
   3.3 Organizational Considerations . . . . . . . . . . . . . . . . . . . 352
   3.4 Date of Installation . . . . . . . . . . . . . . . . . . . . . . . 353
   3.5 Interaction between the Custom/Package and Insource/Outsource Decisions . . . 353
4. Alternative Models of the Software Acquisition Problem . . . . . . . . . 354
5. Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
6. Analysis and Results . . . . . . . . . . . . . . . . . . . . . . . . . . 358
   6.1 Custom/Package and Insource/Outsource Decision Interaction . . . . 359
   6.2 Technological Features . . . . . . . . . . . . . . . . . . . . . . 360
   6.3 Application Properties . . . . . . . . . . . . . . . . . . . . . . 362
   6.4 Organizational Considerations . . . . . . . . . . . . . . . . . . . 363
   6.5 Temporal Effects . . . . . . . . . . . . . . . . . . . . . . . . . 364
7. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364
References and Further Reading . . . . . . . . . . . . . . . . . . . . . . 365
1. Introduction

Software acquisition is a critical concern to firms worldwide, and the magnitude of its impact on the inner workings of firms is expected to grow. Similarly, the number of firms providing either custom or packaged software services is expected to continue its climb. US firms spend more than $250 billion annually acquiring software as they automate more functions and try to keep pace with changing technological and business environments (Standish Group, 1994). When these investments in software are successfully implemented, they result in substantial productivity gains and strategic advantages (Neumann, Ahituv and Zviran, 1992; Brynjolfsson, 1993). To realize these benefits, one must identify and understand the considerations that have the most impact on the firm's software acquisition decisions. The cost-benefit framework developed in this paper allows for this investigation, and our empirical analysis provides significant support for the framework.
This paper develops and empirically validates the tradeoffs firms face when they decide how to acquire a software system. The previous software acquisition literature focuses on the project-by-project insource/outsource decisions made by firms and typically looks at a single particular aspect of these decisions, such as contracting costs. It is clear that outsourcing, or the purchase of a
packaged solution, does not mean a transfer of project responsibility. Certain tasks can be delegated, but the ultimate responsibility for, as well as the costs and benefits of, the choice of a particular development path reside with the management of that organization. This paper extends the literature through the development of a general, economics-based, cost-benefit framework that encompasses both the custom/package and insource/outsource dimensions of the software acquisition problem. Six hypotheses are generated concerning the expected effects of key system characteristics on the software acquisition decision. Logit analysis of actual business decisions is used to test these hypotheses.
The literature has many studies dealing with the outsourcing of data center operations, communications, and user support activities (Hirscheim and Lacity, 1993). The most popular reasons cited for outsourcing these activities include scale economies, improved service quality, price predictability, flexibility, transforming fixed asset costs into variable costs, and freeing human resources. On the other hand, these empirical studies have shown that some benefits are not shared by all users outsourcing their services (Loh and Venkatraman, 1995). Technological dependence on a single vendor could eventually result in higher costs and limited functionality. Users critically depend upon the economic viability of the service provider and run the risk of losing proprietary information when outsiders deal with their databases. These issues, while important, are not directly relevant to the decision process associated with software acquisition.
Software acquisition projects involve a variety of activities with informational attributes that make them particularly hard to manage. They are not as repetitive as the daily activities involved in running a corporate data center, and they carry major risks and uncertainties that make it difficult for the system to achieve its goals (Banker and Slaughter, 1997; Cusumano and Selby, 1997; Wang, Barron and Seidmann, 1997). The four main issues are:
(1) Information asymmetries. Users have clear ideas about the potential value of the new system, while developers have a better estimate of the anticipated development costs. As the sophistication of systems grows, they produce fewer tangible and more intangible benefits. By definition, there are no objective economic tools for pricing those intangible benefits.
(2) Relationship-specific investments. Both parties have to spend a significant amount of time communicating the users' priorities and the technological tradeoffs faced by the developers. A common problem is the difficulty in establishing information requirements, both for individual users and for the organization as a whole. These requirements may be too complex, fuzzy, or subject to numerous changes during the course of the project.
(3) Lack of market prices. Applications vary a great deal from site to site, making it hard to use market prices as a metric for estimating the true development or integration costs or benefits associated with each particular functionality.
Especially in larger projects, it is hard to estimate the integration needs and the overall cost and time to deliver.
(4) Limited observability. The abstract nature of the final product makes it extremely difficult for users to observe the actual development rate and the overall long-term quality of the system. For instance, it may not be apparent right away to the users that the developer has staffed the project with inexperienced people, and when an external developer runs behind schedule the problem may be hidden for a long time (Mehler, 1991).
Understanding the managerial impact of these issues requires a cost-benefit framework that unifies the important components of the software acquisition problem that have been investigated separately in the literature. It builds upon the contracting and monitoring cost issues addressed in the transaction cost (Coase, 1937; Williamson, 1985, 1989) and incomplete contracting (Grossman and Hart, 1986; Holmstrom and Tirole, 1989; Whang, 1992; Richmond, Seidmann and Whinston, 1992; Richmond and Seidmann, 1993; Chaudhury, Nam and Rao, 1995) literatures, and it includes the tradeoff between monitoring and production costs developed in the agency theory literature (Ross, 1973; Jensen and Meckling, 1976; Gurbaxani and Kemerer, 1989). The framework also encompasses the various rationales provided in the empirical literature regarding firms' insource/outsource decisions. These papers report reasons given by managers when asked to explain their decision to either insource or outsource certain information technology (IT) services (i.e., facilities management services); the custom/package decision has not received attention. The primary reasons given for outsourcing include acquiring technical expertise, lowering staff and production costs, and speeding system installation (Altinkemer, Chaturvedi and Gulati, 1994; Meyer, 1994; McFarlan and Nolan, 1995). The primary reasons given for insourcing are that the system requires extensive knowledge of the business or pertains to a core competence, and that contracting costs for outsourcing are high (Lacity, 1992; Jones, 1994; King, 1994; Lacity, Hirscheim and Willcocks, 1994). These same justifications for insourcing or outsourcing are given in the trade press (Caldwell, 1989; Gilliam, 1990; Harrar, 1993; Evans, 1994). The trade literature cites failure stories involving in-house systems (Groenfeldt, 1997), packages (King, 1997), and outsourced projects (Bulkeley, 1996). Patane and Jurison (1994) investigate software system development in particular, and their findings mirror those for IT services. Most of these studies use survey instruments that directly capture the decision rationales from respondents. However, a substantial literature in marketing shows that direct subjective measurements of the importance of decision factors provide inferior explanatory power relative to importances inferred from analyzing stated preference or actual decision data (Hauser and Urban, 1977). Our empirical analysis, therefore, extends
the previous literature in that the key factors behind the software acquisition decision are determined from firm actions and not from post-decision rationalizations.
The paper is organized as follows.¹ First, a general, cost-benefit decision framework for the software acquisition problem is developed. We examine the drivers behind this framework and tie these drivers to the underlying technological, application and organizational characteristics of a software acquisition project. Next, we generate six hypotheses concerning the effects of these characteristics on the software acquisition decision. These hypotheses are then investigated using logit analysis of actual choice data. We close with a summary that highlights our empirical findings.
2. The Software Acquisition Cost-Benefit Framework

The software acquisition problem involves two decisions: what acquisition approach to use (whether to custom develop the software or base it on a package) and who should carry out this task (internal resources or external providers). The insource/outsource dimension is analogous to the make/buy decision faced by a manufacturer when it decides who is to make an input. The second decision, concerning how the input is to be made (the custom/package dimension), however, makes the software acquisition problem fundamentally different. This two-dimensional, four-option decision is depicted in Fig. 1.

                                       Acquisition team
                           Insource                        Outsource
 Acquisition   Custom      Internal resources only for     Vendor performs needs
 approach                  needs analysis, coding, etc.    analysis, coding, etc.
               Package     Internal resources only for     Vendor performs package
                           package selection,              selection, modification, etc.
                           modification, etc.

FIG. 1. The software acquisition problem.

¹ An abridged version of this research was previously published as "Two dimensions of software acquisition", Communications of the ACM, 39, 29-35, 1996.

The process for using
packaged software involves identifying the company's information processing requirements, obtaining and evaluating information on alternative software packages, selecting a package, possibly modifying the package, and installing and testing the selected system. If an external provider is used to carry out these functions, the acquisition is termed an outsourced package (or {outsource, package}). If in-house staff are used, the acquisition is an insourced package ({insource, package}). When the unique requirements of a user cannot be cost-beneficially provided through modifications to packaged software, customized software is developed. Custom development of software generally follows the systems development life cycle or is based on prototyping. These involve a series of possibly repeated steps starting with needs analysis, system design, coding, and testing, and ending with training and installation. An outsourced custom project, {outsource, custom}, uses a vendor to carry out these activities, while an insourced custom development, {insource, custom}, uses only internal resources.
In making the custom/package and the insource/outsource decisions, the company needs to understand the costs and benefits associated with each choice, as well as any interactions between these choices. Typically, the company's goal is to maximize the net present value of the software acquisition, subject to any organizational biases and constraints. Consequently, to make a software acquisition decision properly, the firm must understand the expected benefits of the software system (system value) as well as a variety of costs incurred during the acquisition of the system. These include production costs ((a) needs analysis and (b) coding and installation), and costs related to monitoring the acquisition process and to contracting.
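To make the cost-benefit comparison concrete, the sketch below scores each of the four acquisition options as system value minus the four cost categories and picks the option with the largest net value. The option labels follow the text; all dollar figures are invented placeholders rather than data from the study.

```python
# Hedged sketch of the cost-benefit framework: the net value of an acquisition
# option is its expected system value minus needs-analysis, coding/installation,
# monitoring and contracting costs.  All figures are hypothetical placeholders.
OPTIONS = {
    # (team, approach):       (value, needs, code, monitoring, contracting)
    ("insource", "custom"):   (1000, 100, 400, 120, 10),
    ("outsource", "custom"):  (1000, 150, 300,  80, 90),
    ("insource", "package"):  ( 850, 100, 200,  60, 20),
    ("outsource", "package"): ( 850, 150, 150,  50, 60),
}

def net_value(value, needs, code, monitoring, contracting):
    """Net present value proxy for one acquisition option."""
    return value - (needs + code + monitoring + contracting)

for option, drivers in OPTIONS.items():
    print(option, net_value(*drivers))

best = max(OPTIONS, key=lambda option: net_value(*OPTIONS[option]))
print("preferred option:", best)
```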
2.1 Cost-Benefit Drivers and the Software Acquisition Decision
Typically, the financial magnitudes of these five cost-benefit drivers (system value and costs related to needs analysis, coding and installation, monitoring and contracting) differ depending upon whether an internal or external software acquisition team is used (i.e., whether the acquisition is insourced or outsourced). Similarly, these magnitudes differ depending on whether the software is based on a package or custom developed. Table I presents the five cost-benefit drivers and identifies which drivers typically favor a particular acquisition team or approach. Upward-pointing arrows (↑) denote which cost-benefit drivers typically favor insourcing over outsourcing and vice versa, and which favor custom development over package-based development and vice versa. For example, Table I shows that contracting costs tend to favor insourcing and packages, since this cost is typically lower when internal rather than external resources are used to acquire a software system and when the system is based on a package rather than custom developed.
TABLE I
IMPACT OF COST-BENEFIT DRIVERS ON THE SOFTWARE ACQUISITION DECISION

                         Acquisition team          Acquisition approach
Cost-benefit driver      Insource    Outsource     Custom    Package
System value                                       ↑
Needs analysis           ↑
Code and install                     ↑                       ↑
Monitoring                           ↑                       ↑
Contracting              ↑                                   ↑

Upward-pointing arrows (↑) denote which cost-benefit drivers typically favor different acquisition teams and/or approaches.
The relationships in Table I are important to the system acquisition problem because they identify which cost-benefit drivers typically favor which acquisition options. For example, the decision to insource or outsource depends on (a) the relative sizes of the advantages that insourcing has with respect to the costs of contracting and needs analysis and (b) the relative sizes of the advantages that outsourcing has with respect to monitoring and coding and installation costs. If insourcing’s advantage on, say, contracting costs increases, insourcing becomes relatively more appealing.
2.2 System Value (Benefits)
The first of the five cost-benefit drivers presented in Table I is system value. A particular software system is designed to meet possibly unique requirements that support a set of business functions. The value of the system to the firm depends upon the financial importance of these business functions, how effectively and efficiently the system performs the associated requirements and how quickly the software is delivered. In comparing internal to external development of a particular project, system value does not differ because both development teams are expected to deliver the same system. System value is expected to be greater for a custom system since a custom system can be specified so that it better fits the application requirements and thus captures more of the application’s potential value. Alternatively, a package is designed to be more general and address requirements common to many firms. Hence, it is less effective in covering a company’s unique requirements.
2.3 Needs Analysis Costs
Production costs include costs associated with (a) needs analysis and (b) coding
and installation. Needs analysis involves determining the requirements necessary to automate the desired business application (e.g., customer reservations, manufacturing resource planning and accounts receivable). This requires firm-specific human and informational capital. Needs analysis costs are lower internally since the internal information systems (IS) group better understands the firm’s specific business operations and procedures. On the custom/package dimension there are no differences in needs analysis costs because the same needs analysis must be conducted whether the software is acquired as a package or custom developed.
2.4 Coding and Installation Costs
Coding and installation (code and install) includes the detailed software design that translates the users' requirements into a programmable solution, the coding, testing and installation of the programs, and finally the conversion of business operations over to the new system. For software packages, the design, coding and some of the testing costs are included in the purchase price or licensing fee. The actual implementation of the automated business process depends on the firm's existing infrastructure, which is a corporate constraint. Coding and installation costs are higher when the software is developed internally, since the external market has more competitive sources of labor and technical expertise, and it is in a better position to take advantage of scale and scope economies (Jones, 1994). As an example, the advent of overseas programming houses in low-wage countries has dramatically driven down the prices charged by external developers (Patane and Jurison, 1994). Similarly, due to the scale and scope economies available to vendors, coding and installation costs are lower for packaged software than for custom developed software.
2.5 Monitoring Costs
Software differs from most manufacturing inputs or “physical goods” in that its quality is difficult to assess prior to extended experience with the “product.” Economists would refer to software as a credence good. For example, the attribute levels (e.g., size, shape and weight) of most physical goods generally can be ascertained prior to or on the date of delivery through past experience or search. On the other hand, measurement of software system attributes (e.g., functionality, data integrity, response time and the ability to be modified) requires repeated experience with the completed system. Similarly, only imprecise evidence of production progress (the acquisition team’s effort and quality) is available since physical input measures such as lines of code or the number of function points are indirect measures, at best, of development progress and system value. These characteristics of software acquisition heighten concerns over opportunistic behavior by employees or vendors that may result in higher coding and installation costs or
lower system value. Monitoring and contracting act as controls on these concerns. Monitoring entails supervising the development team to ensure timely and efficient completion of the system with the desired quality. This includes the costs of managing the project as well as any costs incurred for quality assurance, progress reviews and training. Monitoring costs are lower if external development is undertaken. Software vendors, largely due to scale and scope economies, are typically more efficient at managing software production activities due to better defect prevention and process change management (Joch, 1995). Monitoring costs are lower for packaged software than they are for custom developed systems because of the “physical good” aspects of a package.
2.6 Contracting Costs

Contracting costs include the costs of searching for and evaluating potential custom or packaged software vendors, benchmarking and screening their capabilities, specifying the legal terms of a contract, negotiating the contract's financial details and resolving disputes. In addition, they include the expected cost of any opportunistic behavior and any risks associated with the possible loss of proprietary information. A contract acts as a legal means to ensure the desired levels of development progress and completed quality. Contracting costs are lower if development is done internally, since the company does not have to search for a vendor, negotiate a project-specific contract or use external entities to enforce this contract. Contract disputes over the failure of computer hardware or software to perform as expected are common (Pollack, 1990). Typically, the firm relies on monitoring rather than project-specific contracts for its internal personnel. As with monitoring, contracting costs are lower for packaged software because of the "physical good" aspects of a package.
3. Hypotheses

For a particular software acquisition problem, if the financial values of the five cost-benefit drivers (needs analysis, coding and installation, monitoring, contracting and system value) are known for each of the development options considered, the optimal choice is the option that provides the maximum net present value. Unfortunately, such data are generally not available to managers when making a software acquisition decision, because many of the data are typically very difficult to ascertain prior to (or even after) project launch or completion. Prior to an acquisition decision, managers know the characteristics of the desired software and application to be automated, the firm's abilities relative to these characteristics and the organizational attitudes and constraints concerning software development. These determinants of the decision process center on three
factors: technological features, application properties and organizational considerations. The first two factors directly affect the five cost-benefit drivers and, hence, steer the manager's decisions on both dimensions of the software acquisition problem in predictable ways. Organizational considerations may or may not affect the cost-benefit drivers, but in either case they influence the software acquisition decision. Table II outlines the intuitive relationships among these three factors, the five cost-benefit drivers and the two dimensions of the software acquisition decision. Four hypotheses discussed below address the expected marginal impacts of technological, application and organizational factors on a particular software acquisition decision. A temporal effect on this decision is also hypothesized, because the trade literature frequently indicates that outsourcing has become more popular over time. Finally, we ask whether decisions on these two dimensions of the software acquisition problem interact.
3.1 Technological Features

According to both managers and researchers, one of the primary reasons for outsourcing software development is to obtain access to expertise in specialized technologies and advanced development environments (Meyer, 1994; Patane and Jurison, 1994). External vendors can leverage expertise and design concepts across numerous projects and firms, thereby realizing scale and scope economies that are unlikely to be available to the typical firm. Consequently, as the technology becomes more specialized or advanced, coding and installation costs more strongly favor outsourcing. Since more complex technology is more difficult and technically demanding, it also increases monitoring and contracting costs. However, while the relative increase in the importance of monitoring costs is likely to be smaller than that for contracting costs, outsourcing's increased coding and installation cost advantage is likely to dominate. As shown in Table II, the increased importances of the costs associated with coding and installation, monitoring and contracting all favor packaged software.
Hypothesis 1: Systems using specialized technology or advanced development environments are, all else equal, more likely to be outsourced and/or associated with packaged software.
3.2 Application Properties
Managers and researchers frequently cite two application properties, the application's uniqueness and its strategic role, as strongly favoring a decision to insource (Jones, 1994; Lacity, Hirscheim and Willcocks, 1994). Strategic applications are intended to provide a company with a competitive advantage over its rivals. The intimate relationship between a strategic application and a firm's core competencies results in higher system value. Consequently, needs analysis costs and the costs associated with potential opportunistic behavior (monitoring and contracting costs) become relatively more important. For such an application, the higher contracting costs associated with outsourcing are expected to exceed insourcing's higher monitoring costs. This, coupled with the increased importance of needs analysis, favors the use of internal development. Similarly, the effect on system value is expected to be stronger than the effect on potential opportunistic behavior. Thus, the model predicts that strategic systems are more likely to be custom developed.

TABLE II
RELATIONSHIPS BETWEEN PROJECT CHARACTERISTICS, COST-BENEFIT DRIVERS AND SOFTWARE ACQUISITION

Hypothesis  Project characteristic          Cost-benefit drivers whose          Acquisition team  Acquisition approach
                                            relative importance increases       favored           favored
H1          Specialized/advanced            Code and install; monitoring;       Outsource         Package
            technology                      contracting
H2          Strategic application           System value; needs analysis;       Insource          Custom
                                            monitoring; contracting
H3          Common application              Code and install                    Outsource         Package
H4          Organizational considerations   -                                   Insource          Custom
H5          Installation date after 1990    -                                   Outsource         -

The middle column lists those cost-benefit drivers whose relative importances are increased by a particular project characteristic. Based on the discussion pertaining to Table I on how the cost-benefit drivers impact the acquisition team and approach decisions, the two right-hand columns show which acquisition team and approach are more likely as a result of each particular project characteristic. Thus, the right side of the table synopsizes Hypotheses 1 through 5 (H1-H5).
Hypothesis 2: Strategic applications are more likely to be insourced and/or custom developed, all else equal.

For applications that are common to many organizations and that use standard data structures, vendors can leverage their investment and labor pool across multiple clients. This effectively increases the relative advantage of outsourcing and packaged software with respect to the coding and installation costs. In a similar vein, a unique system specifically designed and tailored to a particular company's requirements entails higher needs analysis costs, resulting in a tendency toward insourcing.
Hypothesis 3: Common systems are more likely to be outsourced and/or acquired as packaged software, all else equal.
3.3 Organizational Considerations

A company's culture and the system's sponsor have been identified as significant factors in software acquisition decisions (Arnet and Jones, 1994; Nault and Richmond, 1995). Examples of possible organizational biases are the "not-invented-here syndrome" and "kingdom-building" (Jensen and Meckling, 1976). These two particular biases result in a tendency toward insourced, custom development. Alternatively, a firm or system sponsor may have biases that favor outsourcing or packaged software. Organizational biases of this type can be seen as covariates to the cost-benefit model. Other organizational factors, such as worker empowerment or the ability to retain employees with desirable state-of-the-art technical skills, may affect the relative costs and benefits of packaged software versus custom development and insourcing versus outsourcing. For example, a firm may excel in managing vendor relationships, which reduces the relative importance of contracting costs.
Hypothesis 4: Organizational biases favor insourced and/or custom software acquisition, all else equal.
3.4 Date of Installation
Recent publications propose four reasons for a perceived increase in the level of outsourcing over time: the increased visibility of outsourcing, an increased focus by firms on their core competencies, the increased availability of outside vendors and management blindly following the latest fad (Loh and Venkatraman, 1992). Two alternative explanations argue against inclusion of temporal effects as an additional covariate to our model. One possibility is that, while the absolute amount of outsourcing has increased due to an increase in the total number of functions being automated, the percentage of projects outsourced has not changed over time. This could also be said for the use of packaged software. A second explanation for increased outsourcing or use of packaged software is that technological features and application properties have changed over time in a manner that favors outsourcing and packages. Both of these alternative explanations imply that the date of system installation has no incremental effect on the software acquisition decision beyond those of the technological, application and organizational factors.
Hypothesis 5: Outsourced software acquisition has become more likely over time, all else equal.
3.5 Interaction between the Custom/Package and Insource/Outsource Decisions
The custom/package and insource/outsource decisions are related, since they share the same five cost-benefit drivers. If, in addition, these two decisions interact, software acquisition is fundamentally different from a manufacturer's unidimensional make/buy decision. That is, the "confounding effects" of each decision on the other lead the firm to make these decisions in combination, rather than separately. The use of packaged software lowers the relative importance (monetary value) of all the drivers except needs analysis costs, with coding and installation costs the most affected. Since both of the drivers that favor outsourcing (coding and installation costs and monitoring costs) are negatively affected, the use of a package favors insourcing. Analogously, custom development favors outsourcing. Alternatively, outsourcing reduces the importance of coding and installation costs and increases the importance of contracting and monitoring costs. Insourcing leads to the opposite effects. Neither leads to a clear-cut custom or packaged development tendency. These interactions increase the complexity of the software acquisition decision. Correspondingly, this results in either a sequential decision structure, with the custom/package decision preceding the insource/outsource decision, or a simultaneous choice among four acquisition options: {insource, package}, {insource, custom}, {outsource, package} and {outsource, custom}.
Hypothesis 6: An interaction exists between the insource/outsource and custom/package decisions.
4. Alternative Models of the Software Acquisition Problem

We have constructed three models to identify the existence and type of interaction between the custom/package and insource/outsource decisions. Underlying each of the three hypothesized models is the idea that the utility of a software acquisition option depends on its perceived costs and benefits, which in turn depend on the system's technological, application, organizational and temporal factors. This is similar to the traditional multiattribute utility framework used in marketing and economics to model brand choice or preference with respect to a set of products or services (Lancaster, 1966; Rosen, 1974; Green and Srinivasan, 1990). For any particular project m, the utility of acquisition option j is

    U_jm = α_j + Σ_i β_ij X_im.                                   (1)
The α_j intercept term is an option-specific constant depicting a firm's predisposition to use option j (i.e., organizational biases or corporate policy). X_im represents the level of explanatory variable i intrinsic to project m. These explanatory variables measure the technological, application, organizational and temporal factors. If the explanatory variable i is nominal-scaled, X_im is a binary (dummy) variable equal to one if variable i is present in project m and zero otherwise. For example, a possible explanatory variable, let us call it variable 1, may pertain to whether or not a third-generation programming language is used, resulting in X_1m equal to one if the system incorporates a third-generation language and zero otherwise. Similarly, explanatory variable 2, X_2m, may equal one if the application is strategic and zero otherwise. The β terms reflect the impact of a particular explanatory variable i on the utility of option j. Note that, following the discussion in the hypotheses section, the explanatory variables may have differential effects on the utility of each acquisition option (i.e., β_ij rather than β_i is modeled). The firm wishes to choose the acquisition option with the largest utility. Hence, for a particular project the probability that the firm chooses option j over option k is

    P[j preferred to k] = P[U_j > U_k].                           (2)

It follows that the probability that option j is preferred to all other acquisition options is

    P[j] = P[U_j > U_k : for every k, k = 1, 2, ..., K, k ≠ j].   (3)

If the utilities can be calculated with certainty, these probabilities equal either zero or one. However, if uncertainty exists due to unknown or unmodeled factors, the estimate of each acquisition option's utility equals its true utility plus an error term, U_j^est = U_j + ε_j. This results in

    P[j] = P[U_j^est - U_k^est > ε_k - ε_j : for every k, k = 1, 2, ..., K, k ≠ j].   (4)

The error terms are commonly assumed to be independently distributed random variables with a Gumbel distribution. This assumption results in the multinomial logit model, which has an intuitively appealing representation for the probability of choice of each possible option (McFadden, 1974; Guadagni and Little, 1983). The probability that option j is chosen depends on the ratio of an exponential transformation of option j's utility to the sum of exponential utility transformations over all options including j:

    P[j] = exp(U_j^est) / Σ_{k=1,...,K} exp(U_k^est).             (5)

For expository ease and identification reasons, we define option 1 as a base case with utility normalized to zero. For a particular project, the "utilities" of the other options then depict the differences in utility between each particular option and the base case (option 1). Mathematically, for project m, the probability that option j is chosen is

    P[j] = exp((α_j - α_1) + Σ_i (β_ij - β_i1) X_im) / (1 + Σ_{k=2,...,K} exp((α_k - α_1) + Σ_i (β_ik - β_i1) X_im)),  j = 2, ..., K,   (6)

    P[1] = 1 / (1 + Σ_{k=2,...,K} exp((α_k - α_1) + Σ_i (β_ik - β_i1) X_im)).   (7)
Setting γ_j = α_j - α_1 and δ_ij = β_ij - β_i1, these equations simplify to

    P[j] = exp(γ_j + Σ_i δ_ij X_im) / (1 + Σ_{k=2,...,K} exp(γ_k + Σ_i δ_ik X_im)),  j = 2, ..., K,   (8)

    P[1] = 1 / (1 + Σ_{k=2,...,K} exp(γ_k + Σ_i δ_ik X_im)).   (9)
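Equations (8) and (9) can be read directly as a computation. The sketch below evaluates the choice probabilities of the four acquisition options from option-specific intercepts γ_j and coefficients δ_ij on binary project characteristics; all parameter values here are made up for illustration and are not the estimates reported later in Table III.

```python
import math

# Equations (8)-(9): multinomial logit with option 1 ({insource, custom}) as
# the base case, whose utility is normalized to zero.
def choice_probabilities(gamma, delta, x):
    """gamma[j]: intercept of option j; delta[j][i]: coefficient of binary
    project characteristic i for option j; x[i]: 0/1 characteristics."""
    utility = {j: gamma[j] + sum(delta[j][i] * x[i] for i in x) for j in gamma}
    denom = 1.0 + sum(math.exp(u) for u in utility.values())
    probs = {j: math.exp(u) / denom for j, u in utility.items()}
    probs[("insource", "custom")] = 1.0 / denom  # base case, equation (9)
    return probs

# Hypothetical parameters for the three non-base options.
gamma = {("outsource", "custom"): -2.0,
         ("insource", "package"): -1.0,
         ("outsource", "package"): -1.5}
delta = {("outsource", "custom"):  {"strategic": -0.5, "common": 0.8},
         ("insource", "package"):  {"strategic":  0.3, "common": 1.0},
         ("outsource", "package"): {"strategic": -0.8, "common": 1.5}}

print(choice_probabilities(gamma, delta, {"strategic": 1, "common": 0}))
```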
The higher the utility of option j relative to that of the base case, the greater is its probability of being chosen. It follows that technological, application, organizational and temporal factors that favor option j relative to the base case have positive parameter values for the γ_j and δ_ij terms, and those that favor the base case have negative values. So, if for option j the variable measuring whether or not the application is strategic (called X_2m earlier) is found to have a positive parameter value δ_2j, the utility of option j is δ_2j greater than the utility of option 1, the base case. Given equations (8) and (9), this means that the probability of choice for option j is higher if the application is strategic, while for option 1 this probability is lower.
The first possible model of software acquisition is a simple four-option version of the multinomial logit model proposed in equations (8) and (9). This model assesses the probability of each of the four {insource/outsource, custom/package} combinations, P[insource/outsource, custom/package], directly. The base case option 1 is defined as {insource, custom}. Option 2 is {outsource, custom}; option 3 is {insource, package}; option 4 is {outsource, package}.
Our second model of software acquisition behavior assumes that decisions on the two dimensions of the software acquisition problem do not interact. This assumption simplifies the representation of the firm's decision to the product of two binomial (two-option multinomial) logit models:

    P[insource/outsource, custom/package] = P[insource/outsource] P[custom/package],   (10)

    P[insource] = 1 / (1 + exp(γ^IO + Σ_i δ_i^IO X_im)),   P[outsource] = exp(γ^IO + Σ_i δ_i^IO X_im) / (1 + exp(γ^IO + Σ_i δ_i^IO X_im)),   (11)

    P[custom] = 1 / (1 + exp(γ^CP + Σ_i δ_i^CP X_im)),   P[package] = exp(γ^CP + Σ_i δ_i^CP X_im) / (1 + exp(γ^CP + Σ_i δ_i^CP X_im)),   (12)

where the superscripts IO and CP distinguish the parameters of the insource/outsource and custom/package logits.
The third hypothesized model assumes a particular type of interaction exists between the custom/package and insource/outsource decisions. It is motivated by a few managers who suggested that their firm first decides whether or not to use a package and then decides whether to outsource or insource. This model assumes a nested decision process in which the insource/outsource decision is conditional on the custom/package decision. A nested logit framework results in which P[insource/outsource, custom/package] can again be represented as the product of two binomial logit models (McFadden, 1981; Ben-Akiva and Lerman, 1985):

    P[insource/outsource, custom/package] = P[custom/package] P[insource/outsource | custom/package],   (13)

    P[insource | custom] = 1 / (1 + exp(γ^C + Σ_i δ_i^C X_im)),   P[outsource | custom] = exp(γ^C + Σ_i δ_i^C X_im) / (1 + exp(γ^C + Σ_i δ_i^C X_im)),   (14)

    P[insource | package] = 1 / (1 + exp(γ^P + Σ_i δ_i^P X_im)),   P[outsource | package] = exp(γ^P + Σ_i δ_i^P X_im) / (1 + exp(γ^P + Σ_i δ_i^P X_im)),   (15)

where the superscripts C and P distinguish the parameters of the insource/outsource logit conditional on a custom or a package approach.
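For contrast, the second model's no-interaction assumption means a joint probability is just the product of two independent binomial logits, as in equations (10)-(12). The sketch below assembles the four joint probabilities under that assumption; the coefficients are invented for illustration only.

```python
import math

def binomial_logit(gamma, delta, x):
    """Return P[second category] of a binomial logit; the first category has
    probability 1 - p.  gamma and delta follow equations (11)-(12)."""
    score = gamma + sum(delta[i] * x[i] for i in x)
    return math.exp(score) / (1.0 + math.exp(score))

# Hypothetical coefficients for the two independent logits.
x = {"strategic": 1, "common": 0}
p_outsource = binomial_logit(-1.0, {"strategic": -0.6, "common": 0.9}, x)
p_package   = binomial_logit(-1.2, {"strategic": -0.2, "common": 1.4}, x)

# Equation (10): under no interaction the joint probabilities factor into
# the product of the two marginal probabilities.
joint = {
    ("insource",  "custom"):  (1 - p_outsource) * (1 - p_package),
    ("insource",  "package"): (1 - p_outsource) * p_package,
    ("outsource", "custom"):  p_outsource * (1 - p_package),
    ("outsource", "package"): p_outsource * p_package,
}
print(joint)
```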
5. Data

Survey data were collected from five companies concerning every documented software development project that each firm had ever undertaken. Each firm's actions (revealed preferences) rather than post-decision rationalizations for each custom/package, insource/outsource decision pair were ascertained. This allows our analysis to differ from and improve upon previous studies by investigating the factors affecting what firms do rather than what firms say affects their actions. In a similar situation, marketers find estimations of attribute importances based on stated brand preferences or choice to provide superior explanatory power relative to the direct statements of attribute importances from respondents (Hauser and Urban, 1977). Multiple measures of the technological, application and organizational factors were collected, as was the date of each system's installation. The data cover 186 system projects ranging from small personal computer applications to large distributed systems and from common accounting systems to strategic operational systems. System installation dates ranged from 1967 to 1993. A single individual
in each company filled out all of the firm's surveys, which were then reviewed with the authors. The participating companies range in size from $20 million to $900 million in annual revenue. They include a regulated utility, a long-distance telephone company, a payroll processing company, a rocket engine manufacturer and a building products manufacturer. Although the sample may not be representative of industry in general, it does include a broad cross-section of applications, industries and company sizes. The two dimensions of the software acquisition decision for each project were recorded. Sixty-four percent of projects were {insource, custom}. Seventeen percent were {outsource, custom}. Eleven percent were {insource, package}, and the remainder (eight percent) were {outsource, package}. Five technological features for each system were specified by the survey respondent:
(1) database management system (DBMS);
(2) programming language;
(3) hardware platform;
(4) system architecture;
(5) processing mode.
Information regarding the operating system(s) and on-line monitor(s) was also collected. Since these responses were highly correlated with the hardware platform responses, they were not analyzed further. Four variables were ascertained to measure each system’s application properties. The first three measures pertain to the application’s uniqueness, while the fourth pertains to its strategic role:
(1) application type (organizational level supported);
(2) functional area;
(3) application uniqueness (relative to other firms in the industry);
(4) strategic mission.
Two variables measure organizational considerations:
(1) system sponsor (organizational level);
(2) firm (binary variables used to encompass idiosyncratic organizational biases and constraints).
In order to measure a possible time trend, the installation date of the system was recorded.
6. Analysis and Results
We begin by determining the most appropriate logit model and corresponding view of the software acquisition problem. For the chosen model, we then discuss the
relative importances of the technological, application and organizational factors to the software acquisition problem. Finally, we address a possible temporal effect.
6.1 Custom/Package and Insource/Outsource Decision Interaction

For each of our 186 observations, the variables measuring the technological, application, organizational and temporal factors are nominal scaled. Hence, each variable i is represented by a binary (dummy) variable, X_im. For each of the three models developed, these binary variables act as the right-hand-side independent variables in the non-linear probability equations outlined in Section 4. The dependent variable in the three models is the insource/outsource, custom/package decision. The nominal scale of these data rules out the use of regression analysis as a tool to estimate the relationship between the independent variables and the insource/outsource, custom/package decision. Consequently, a maximum likelihood procedure is used to estimate each logit model's parameters (the γ's and δ's). This procedure computes the set of parameter values that, at the aggregate level, maximizes for each project (observation) the estimated probability of choice for the option actually chosen, and minimizes the estimated choice probabilities for all the other options. For mathematical details see McFadden (1981).
Statistical evaluation is used to determine which of the three decision models presented in Section 4 best depicts firm behavior. Since the model assuming that the custom/package and insource/outsource decisions do not interact is not a restricted version of the four-option multinomial logit model, we use Akaike's information criterion (Akaike, 1974) to compare the models. This criterion is analogous to the comparison of adjusted r-squares in regression analysis, and it strongly supports the four-option multinomial logit model. A nonparametric χ² test was also carried out concerning the hypothesis that P[insource/outsource, custom/package] = P[insource/outsource]P[custom/package]. This test also rejected the hypothesis of no interaction at the 5% level. Consequently, we find support for the hypothesis (Hypothesis 6) that an interaction exists between the insource/outsource and custom/package decisions.
The data do not support the notion that the insource/outsource decision is conditional on the custom/package decision. Since the four-option multinomial logit model is a restricted version of the nested logit model depicted in equations (13)-(15), a comparison of the two models' log likelihood values is applicable. For the nested logit model to be appropriate, the category value parameters, φ^C and φ^P, must be statistically significantly different from zero; otherwise, the four-option multinomial logit model results. These conditions are not borne out in the estimation results. The hypothesis that φ^C and φ^P equal zero cannot be rejected at even the 10% level.
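As a rough, back-of-the-envelope illustration of the interaction test (not the authors' own statistic), the sketch below runs a 2 × 2 Pearson chi-square test of independence on approximate project counts backed out of the shares reported in Section 5 (64%, 17%, 11% and 8% of 186 projects). The rounded counts are assumptions derived from those percentages.

```python
# Approximate counts derived from the reported shares of 186 projects:
# 64% {insource, custom}, 17% {outsource, custom},
# 11% {insource, package}, 8% {outsource, package}.
observed = {
    ("insource", "custom"): 119, ("outsource", "custom"): 32,
    ("insource", "package"): 20, ("outsource", "package"): 15,
}
n = sum(observed.values())
teams = ("insource", "outsource")
approaches = ("custom", "package")
row = {t: sum(observed[(t, a)] for a in approaches) for t in teams}
col = {a: sum(observed[(t, a)] for t in teams) for a in approaches}

# Pearson chi-square statistic for independence of the two decisions (1 df);
# a value above 3.84 is consistent with rejecting independence at the 5% level.
chi2 = sum((observed[(t, a)] - row[t] * col[a] / n) ** 2 / (row[t] * col[a] / n)
           for t in teams for a in approaches)
print(f"chi-square = {chi2:.2f}; 5% critical value with 1 df = 3.84")
```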
Based on these findings, the four-option multinomial logit model expressed in equations (8) and (9) best depicts firm behavior and is used to investigate Hypotheses 1 through 5. The parameter estimates for this four-option multinomial logit model, the γ_j's and δ_ij's in equations (8) and (9), are provided in Table III. In general, they show significant support for our hypotheses. Note that some explanatory variables for particular options have been dropped in order to simplify the analysis, reduce collinearity problems and focus attention on the key decision factors. If a variable was dropped, it was not statistically significant (one-sided) at the 10% level. Also note that a positive parameter value means that the probability of the denoted acquisition option increases relative to the base case {insource, custom}, holding all else constant. A negative parameter value means that the probability of the denoted acquisition option decreases relative to {insource, custom}, holding all else constant. For example, Table III presents a positive {outsource, custom} parameter value (22.80) for the dummy variable equal to one for Firm 2 and zero otherwise. This implies that, holding the effects of all other explanatory variables constant, Firm 2 is more disposed toward outsourced custom development than insourced custom development (the base case). Moreover, since this Firm 2 parameter is larger than both the Firm 2 parameter for {insource, package} and the Firm 2 parameter for {outsource, package} (4.28 and 5.63, respectively), Firm 2 is more likely to use outsourced custom development than any of the other three acquisition options, holding the effects of all other explanatory variables constant. The discussion of each parameter estimate that follows implicitly holds all else constant.
6.2 Technological Features

We find general support for the idea that systems using specialized technology or advanced development environments are more likely to be outsourced and/or associated with packaged software (Hypothesis 1). Estimated parameter values pertaining to four of the five technological factors provide support for this hypothesis.
(1) Database management system (DBMS). We find that the {outsource, package} option is less likely when no database system (i.e., a file-based system) is used. This follows our hypothesis, since file-based systems use the simplest, most standard technology and are not associated with advanced development environments.
(2) Programming language. We find that systems written in third-generation languages are less likely to be {outsource, custom}, and those written in fourth-generation languages are more likely to be {outsource, package}. These findings support Hypothesis 1, since third-generation languages are a standard technology, whereas fourth-generation languages are more specialized and are associated with advanced development environments.
TABLE III
FOUR-OPTION MULTINOMIAL LOGIT RESULTS

[Table III reports the estimated parameters of the four-option multinomial logit model. Rows list the system characteristics: technological features (H1): DBMS (file based), programming language (third-generation, fourth-generation), hardware platform (minicomputer, multiple), architecture (distributed), processing mode (batch); application properties (H2, H3): strategic mission (strategic), application uniqueness (common), application type (transaction processing, combination), functional area (finance); organizational considerations (H4): firm (Firm 2, Firm 3, Firm 4, Firm 5), sponsor level (executive); time of installation (H5): before 1981, after 1990; and the intercept term. Columns report the {outsource, custom}, {insource, package} and {outsource, package} parameter estimates. Goodness of fit: log likelihood value -123.0. Significance levels (one-sided): *** 1% level, ** 5% level, * 10% level.]
Results pertain to a four-option version of the model expressed by equations (8) and (9). The parameter estimates provided relate to the likelihood of the three noted acquisition options relative to a base case, which is {insource, custom}.
(3) Hardware platform. The results show that applications running on multiple platforms or minicomputers are more likely to be associated with outsourcing and packaged software. Both findings support Hypothesis 1. Multiple platform systems require specialized abilities in systems integration. Minicomputer environments are frequently departmental computers with minimal IS support; thus, technological expertise and manpower constraints are more binding.
(4) System architecture. Whether the system is distributed or centralized has no statistically significant effect on the software acquisition decision. The impact of distributed versus centralized systems may be captured by the multiple hardware platform variable.
(5) Processing mode. We find that companies are more likely to outsource custom development of batch systems despite the low technological complexity. The custom aspect of this finding fits with Hypothesis 1, but the outsourcing component is unexpected.
6.3 Application Properties

The evidence is clear that common applications are more likely to be outsourced and/or acquired as packaged software (Hypothesis 3). On the other hand, little support is evident for the hypothesis that strategic applications are more likely to be insourced and/or custom developed (Hypothesis 2). There is weak statistical evidence that strategic applications are more likely to be {insource, package}. With respect to insourcing, this is consistent with Hypothesis 2, but the use of packaged software is counter to our expectations. We propose two explanations for these results. (1) Measurement errors occur because some applications classified as strategic are critical to firm operations (mission-critical) but do not give the firm a strategic advantage. Indications of this include one company classifying its human resource management system as strategic. In addition, some applications may be misrepresented as strategic. (2) The information system itself does not generate a competitive advantage. Rather, it is how the system is used that generates the value. In such systems, coding and installation costs or even contracting costs may become relatively more important decision drivers. These explanations are consistent with those made by Lacity, Willcocks and Feeny (1995) and by DiRomualdo and Gurbaxani (1996) for why IT services labeled as strategic are not necessarily insourced. These observations imply that even for strategic applications a good economic rationale may exist for not internally developing the system from the ground up. Companies may do so in order to realize their strategic business ambitions during an era of increased technological specialization.

Our three variables measuring application uniqueness generally confirm the notion that more common applications are more likely to be based on a package and/or outsourced (Hypothesis 3).
(1) Application type. We find that TPSs are more likely to be {outsource, custom}. TPSs support functions common to many businesses, but each system requires a substantial amount of firm-specific coding in order to meet the firm's idiosyncratic data and processing requirements. This combination of unique and common features increases the likelihood of outsourcing and custom development, as stated in Hypothesis 3. We also find combination systems to be outsourced. This is consistent with Hypothesis 1, since such systems involve more specialized technologies or advanced development environments.

(2) Functional area. Results indicate that applications in the finance/accounting area are strongly associated with {insource, package}. This supports Hypothesis 3, as these applications are very common.

(3) Application uniqueness. The data show that common applications are more likely to be outsourced and/or packaged. This strongly supports Hypothesis 3.

In summary, the first three hypotheses are generally supported by our data. The results for Hypotheses 1 and 3 can be simply depicted graphically, as shown in Fig. 2. Figure 2 shows that neither application nor technological factors dominate the software acquisition decision. A system addressing a common application and based on common technology is more likely to be acquired as a package, but there is no clear evidence concerning the insource/outsource dimension. Similarly, a system addressing a common application and based on specialized technology and advanced development environments is more likely to be outsourced, but there is no clear evidence concerning the custom/package dimension. Analogous findings pertain to the other two quadrants in Fig. 2.
6.4 Organizational Considerations

There is strong support for the idea that companies have idiosyncratic tendencies concerning their software development decisions (Hypothesis 4). The intercept terms and firm-specific constants show that all firms have a strong predisposition to insource and custom develop software systems. Firms 2, 4 and 5 are a bit less predisposed toward custom development, while Firm 3 is less averse to packaged software if its installation is done by an outside vendor. We find the organizational level of the project's sponsor to have no effect on either dimension of the software acquisition decision. These results indicate that, for whatever reasons (possibly kingdom building or the not-invented-here syndrome), organization-wide biases exist toward {insource, custom} in all of the firms analyzed.
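Under the same multinomial logit form, a brief worked equation shows why large negative intercepts imply this predisposition. With all explanatory dummies set to zero, and with $\alpha_k$ denoting the intercept of non-base option $k$ (our notation), the base-case probability is

$$P(\{\text{insource, custom}\}) \;=\; \frac{1}{1 + \sum_{k=1}^{3} e^{\alpha_k}},$$

which approaches one as the intercepts become strongly negative; a firm or system characteristic must therefore carry a large positive parameter before one of the other three options becomes likely.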
[Figure 2 is a two-by-two diagram. The horizontal axis plots technological factors, from standard/simple to specialized/advanced; the vertical axis plots application factors, from common to unique. The common/standard quadrant is labeled PACKAGE (insource or outsource?), the common/specialized quadrant OUTSOURCE (custom or package?), the unique/standard quadrant INSOURCE (custom or package?), and the unique/specialized quadrant CUSTOM (insource or outsource?).]
FIG. 2. Relationship between technological and application factors and the software acquisition decision.
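The quadrant structure of Fig. 2 can also be written as a small lookup, sketched below in Python; the dictionary, its keys and the variable name are ours, and the parenthesized question marks mark the dimension on which the data give no clear guidance.

```python
# Tendencies implied by Fig. 2. Keys: (application factors, technological factors).
acquisition_tendency = {
    ("common", "standard/simple"):      "PACKAGE (insource or outsource?)",
    ("common", "specialized/advanced"): "OUTSOURCE (custom or package?)",
    ("unique", "standard/simple"):      "INSOURCE (custom or package?)",
    ("unique", "specialized/advanced"): "CUSTOM (insource or outsource?)",
}

print(acquisition_tendency[("common", "specialized/advanced")])
```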
6.5 Temporal Effects

Consistent with statements in the trade press and Hypothesis 5, we find some evidence that firms have become more predisposed toward outsourcing over time, in that outsourced custom systems were less likely prior to the 1980s. On the other hand, the results concerning a possible temporal effect show no change with respect to packaged software. This implies that any increase in the use of packaged software is captured by changes in the technological, application and organizational factors specified in our model.
7. Conclusions

We presented a decision framework addressing the custom/package and insource/outsource dimensions of the software acquisition problem. Data from 186 actual software acquisition decisions provide significant statistical support for this framework. This paper extends the information systems literature in two key areas. The previous software acquisition literature has focused on the insource versus outsource decision. Theoretical work has focused on particular aspects of this decision, such as contracting. This paper develops a more general economic framework that addresses the acquisition problem in a broader context and encompasses the
custom/package decision as well. In addition, the key technological, application and organizational factors behind the actual acquisition decisions of firms are identified using multinomial logit analysis.

Our key findings are as follows. Companies do not make the custom/package and insource/outsource decisions independently. Both the system's technological features and the application's properties play a significant role in the software acquisition decision. Neither always dominates the other. For example, all else equal, systems using specialized and advanced technologies to address common applications tend to be outsourced, and systems using simpler, more basic technologies to address common applications tend to be acquired as packages. Contrary to popular belief, strategic applications are not more likely, all else equal, to be insourced and/or custom developed. Companies have a strong predisposition to acquire software through internal, custom development. This predisposition with respect to insourcing has decreased over time, as stated in previous studies. On the other hand, no temporal change in the adoption of packaged software is observed.

Finally, the framework and empirical findings in this paper facilitate a broader and more detailed understanding of the managerial considerations behind the firm's software acquisition decisions. For managers, this framework provides a basis for structuring and benchmarking software acquisition decisions. Vendors may find new insights into the types of software projects they should target. For academics, this paper provides a stepping stone to more focused research on particular aspects of the problem, such as pricing, incentives, warranties, maintenance, managerial oversight and benchmarking.

REFERENCES AND FURTHER READING
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19, 716-723.
Altinkemer, K., Chaturvedi, A., and Gulati, R. (1994). Information system outsourcing: issues and evidence. International Journal of Information Management, 14, 252-268.
Arnett, K. P., and Jones, M. C. (1994). Firms that choose outsourcing: a profile. Information and Management, 26, 179-188.
Balakrishnan, S. (1994). The dynamics of make or buy decisions. European Journal of Operations Research, 14, 552-571.
Banker, R. D., and Slaughter, S. (1997). A field study of scale economies in software maintenance. Management Science, 43(12), 1709-1725.
Ben-Akiva, M., and Lerman, S. R. (1985). Discrete Choice Analysis: Theory and Application to Travel Demand, MIT Press, Cambridge, MA.
Brynjolfsson, E. (1993). The productivity paradox of information technology. Communications of the ACM, 36, 67-77.
Bulkeley, W. M. (1996). When things go wrong. The Wall Street Journal, November 18, 1996.
Caldwell, B. (1989). Is outsourcing a good move? Information Week, September 25, pp. 40-48.
Chaudhury, A., Nam, K., and Rao, H. R. (1995). Management of information systems outsourcing: a bidding perspective. Journal of Management Information Systems, 12, 131-160.
Coase, R. H. (1937). The nature of the firm. Economica, 4, 386-405.
Cusumano, M. A., and Selby, W. (1997). How Microsoft builds software. Communications of the ACM, 40, 53-61.
Dale, B. G., and Cunningham, M. T. (1984). The importance of factors other than cost consideration in make or buy decisions. International Journal of Operations and Production Management, 4, 43-54.
DiRomualdo, A., and Gurbaxani, V. (1996). A New Strategic Framework for IT Outsourcing, Working Paper, Graduate School of Management, University of California, Irvine, CA.
Evans, R. (1994). Should IT stay or should IT go? Management Today, November, 66-71.
Gardiner, S. C., and Blackstone, J. H. (1991). The theory of constraints and the make-or-buy decision. International Journal of Purchasing and Material Management, 27, 38-43.
Guadagni, P., and Little, J. D. C. (1983). A logit model of brand choice calibrated on scanner data. Marketing Science, 2, 203-238.
Gilliam, L. (1990). Outsourcing issues. Computerworld, 24, 67-72.
Green, P., and Srinivasan, V. (1990). Conjoint analysis in marketing: new developments with implications for research and practice. Journal of Marketing, 54, 3-19.
Groenfeldt, T. (1997). Why in-house systems fail. Derivatives Strategy Technology, May 1997, 12-17.
Grossman, S., and Hart, O. (1986). The costs and benefits of ownership: a theory of vertical and lateral integration. Journal of Political Economy, 94, 691-719.
Gurbaxani, V., and Kemerer, C. (1989). An agent-theoretic perspective on the management of information systems. Proceedings of the 22nd Annual International Conference on System Sciences, 3, 141-150.
Harrar, G. (1993). Outsourcing tales. Forbes, June 7, ASAP Supplement, 37-42.
Hauser, J. R., and Urban, G. L. (1977). A normative methodology for modeling consumer response to innovation. Operations Research, 25, 579-619.
Hirscheim, R., and Lacity, M. (1993). The IT outsourcing bandwagon. Sloan Management Review, Spring, pp. 9-23.
Holmstrom, B. R., and Tirole, J. (1989). The theory of the firm. In Handbook of Industrial Organization, Vol. I (R. Schmalensee and R. Willig, Eds.), North-Holland, Amsterdam.
Jensen, M., and Meckling, W. H. (1976). Theory of the firm: managerial behavior, agency costs and ownership structure. Journal of Financial Economics, 3, 305-360.
Joch, A. (1995). How software doesn't work. Byte, December, pp. 49-60.
Jones, C. (1994). Evaluating software outsourcing options. Information Systems Management, 11, 28-33.
King, J. (1997). Dell zaps SAP. Computerworld, May 26, 1997.
King, W. R. (1994). Strategic outsourcing decisions. Information Systems Management, 11, 58-61.
Lacity, C. (1992). Outsourcing: the untold tales. Information Week, August 10, p. 64.
Lacity, M., Hirscheim, R., and Willcocks, L. (1994). Realizing outsourcing expectations: incredible expectations, credible outcomes. Information Systems Management, 11, 7-18.
Lacity, M., Willcocks, L., and Feeny, D. (1995). IT outsourcing: maximizing flexibility and control. Harvard Business Review, 73, 84-93.
Lancaster, K. J. (1966). A new approach to consumer theory. Journal of Political Economy, 74, 132-157.
Loh, L., and Venkatraman, N. (1992). Determinants of information technology outsourcing: a cross-sectional analysis. Journal of Management Information Systems, 8, 7-24.
Loh, L., and Venkatraman, N. (1995). An empirical study of information technology outsourcing: benefits, risks, and performance implications. Proceedings of the 1995 ICIS, Amsterdam, pp. 277-288.
McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In Frontiers in Econometrics (P. Zarembka, Ed.), Academic Press, New York.
McFadden, D. (1981). Econometric models of probabilistic choice. In Handbook of Econometrics, Vol. II (Z. Griliches and M. Intriligator, Eds.), Elsevier Science Publishers BV, Amsterdam.
McFarlan, F. W., and Nolan, R. L. (1995). How to manage an IT outsourcing alliance. Sloan Management Review, 35, 9-23.
Mehler, M. (1991). Reining in runaway systems. Information Week, December 16, pp. 20-24.
Meyer, N. D. (1994). A sensible approach to outsourcing: the economic fundamentals. Information Systems Management, 11, 22-27.
Nault, B., and Richmond, W. B. (1995). Outsourcing as a signal of IS manager quality, Working Paper, George Mason University, Fairfax, VA 22030.
Nelson, P., Richmond, W., and Seidmann, A. (1996). Two dimensions of software acquisition. Communications of the ACM, 39, 29-35.
Neumann, S., Ahituv, N., and Zviran, M. (1992). A measure for determining the strategic relevance of IS to the organization. Information and Management, 22, 281-299.
Patane, J. R., and Jurison, J. (1994). Is global outsourcing diminishing the prospects for American programmers? Journal of Systems Management, 45, 6-10.
Pollack, A. (1990). Revlon sues supplier over software disabling. New York Times, October 8, 1990.
Richmond, W. B., and Seidmann, A. (1993). Software development outsourcing contract: structure and business value. Journal of Management Information Systems, 10, 57-72.
Richmond, W. B., Seidmann, A., and Whinston, A. B. (1992). Incomplete contracting issues in information system development outsourcing. Decision Support Systems, 8, 459-477.
Rosen, S. (1974). Hedonic prices and implicit markets: product differentiation in pure competition. Journal of Political Economy, 82, 34-55.
Ross, S. (1973). The economic theory of agency: the principal's problem. American Economic Review, 63, 134-139.
The Standish Group (1994). Charting the seas of information technology. Technical Report.
Wang, E. T. G., Barron, T., and Seidmann, A. (1997). Contracting structures for custom software development: the impacts of informational rents and uncertainties. Management Science, 43(12), 1726-1744.
Whang, S. (1992). Contracting for software development. Management Science, 38, 307-324.
Williamson, O. E. (1985). The Economic Institutions of Capitalism, Free Press, New York.
Williamson, O. E. (1989). Transaction cost economics. In The Handbook of Industrial Organization, Vol. I (R. Schmalensee and R. Willig, Eds.), North-Holland, Amsterdam.
Subject Index Adaptive frames 92, 9 6 Adaptive help testbed 84-6 Adaptive learning systems 90 Adaptive mechanisms 132 Adaptive user help 86-7, 110-22 architecture 90- 108 Adaptive User Model (AUM) 69,70, 8 1,91, 94,96, 105, 106, 117 Admission control framework based on multiple segmentation 307 AFNS 265 Agents in adaptive teaching interface 133 Alternative multimodal user interface 46 ANACHIES 83,89 Analysis Patterns 290 Anatomic visualization 194 ANIMATE 123 Animated help 123 Animation 233-6 Aortic valve 220 Application program 174 Armchair model 43 Artificial intelligence (AI) 9 ,7 7 adaptive techniques 84 Artificial intelligent computer aided instruction (AICAI) 76 Association of Computational Linguistics (ACL) 8 Attenuation factor 23 1 Augmented context-free grammars (ACFGs) 37-8 Augmented semantic grammars (ASGs) 25, 37-8 Augmented transition networks (ATNs) 34-5 Authoring tools 136 Automatic Language Processing Advisory Council (ALPAC) Report 8 Average waiting time of accepted clients 318 Balanced placement 302 BASEBALL question answering system 10 BASIC buffer replacement algorithm 3 11-12 Basis-sets 99
ADVANCES IN COMPUTERS, VOL. 47
Batching 316 compaiison of policies 317-18 proposed policies 317 system model 316-17 Belousov-Zhabotinskii autocatalytic reaction 179 Bijective mapping 49 Blackboard 95-6 Blood flow 219 data 189 evaluation 213 visualization 202-8, 242 Bound bit 163 Bound field 163 Buffer-bandwidth resource relation 329-30 Buffer cache 309, 310 Buffer manager tasks 3 11 Buffer replacement algorithms 309-12 comparison 3 12 Cardiac cycle 214, 219, 221, 236,242 Cardiac dynanucs 219 Cardiac function 2 12- 13 Cardiac motion 21 3 Carotid artery 203,228 Case grammar representation 18 Catastrophic failure 308 Cellular automata framework 144-6 Cellular automata models 179 self-replicating shxctures 141-83 Cellular space models 144 Character classifying table 102 Chrominance samples 298 Chromosomes 168, 169, 171 Cine-loop visualization 200-1 Clause 176 CNESW neighborhood patterns 169-70 COACH 84-7 adaptive help system 71 advisory agent 70 case study 67-140 development status 122-3 1 evaluation 110-22 expert programmer 75-6
383
Copyright 0 1998 by Academic Press Ltd. All rights of reproduction in any form reserved.
SUBJECT INDEX COACH(continued) explicit dialog 70 for different domains 109-10 future research goals 131-7 future system development 134-7 future work 121-2 implicit dialog 70 in open systems 109 novice programmer 7 1-2 overview 69-70 pilot study 11 1 porting for use in different environments 135-6 quantitative study 111-21 research objectives 70 scenario 70-6 shell 108-10 student programmer 72-4 syntax 106-8 theoretical considerations 86-90 usability improvements 111-21 Web-based 137 window interface design 93 see also Adaptive user help; Adaptive user model (AUM) COACH/2 129, 130,131, 134, 135 Coaching environments 84 Coaching knowledge 92,100-2 Coaching systems 77,81-3 Cognitive Adaptive Computer Help. See COACH Comment sheets 118 Component Display Theory 88 Component field 163 Component video stream sets (CVSS) 307 Components 147, 158, 173 Compounding 220- 1 Compressed MPEG video 295-9 Compressed scalable video, storage and retrieval 293-340 Computational linguistics 17 Computed tomography (CT) 215 Computer aided instruction (CAI) 76-84 Computer interfaces 87 Concatenation of digital recordings 13 Concepts 99 Conceptual design 54 Concurrent multimodal user interface 46 Connectionist approach 22-4 strengths 23
weaknesses 24 Connectionist architectures 13 Consistency rules 92, 100-1 Constant bit rate (CBR) MPEG video 298 Constant bit rate (CBR) video 319 inter-disk data placement of 299-301 Constant data (CD) retrieval 3 19-20,324-6 Constant data length (CDL) retrieval 325 Constant time (CT) retrieval 319-20,322-4 Constructing Turing machines 150 Contrast materials 207-8 Contrast resolution 198-9 Controlled languages 45 Core 1.52 Critic systems 77, 80, 83-4 Crossover 168, 169 CSLU-C architecture 25 Cue cards 125-6 Customer reneging behavior 316-17 Data paths 147, 149, 152 Debuggy 79 Decision custom/package 346 insource/outsource 346 Degradation of service 308 Deictic controls 49 Depth cue enhancement 23 1 Depth gain compensation 195 Depth shading 23 1 Dermatology 190 Description help 89,98 Design patterns 263-7 Design Patterns 257,267,290 Design review 263, 266 Dialog boxes 127 Digital video servers. See High performance digital video servers DISCERN 24-5 Discourse management 14 Discrete Cosine Transform (DCT) algorithm 297-8 Disk failures 307-9 Disk 1/0 313 Disk performance parameters 334 Disk retrieval scheduler 319 Document preparation 15-16 Domain knowledge 92,94,96-100 Doppler shift 202 Drafter system 16
SUBJECT INDEX
Duplex Doppler 202-3 Duplicated program sequences 174-5 Educational goals 78 Electrocardiogram (ECG) 219 Electronic phased-array systems 191 Elevational resolution 197 ELIZA 10,29,33 EMACS 83,89, 112 Emergent behavior 146 EMYCIN 84 Enhanced blood flow visualization 206-8 Eurotra M I project 41 Evaluation 54 Example help 88 Exclusive multimodal user interface 46 Expanding problem solutions 175-8 Experthelp 88 Expert systems 26 shells 84 Explicit semantic representation 37 Exponential depth shading 231 Extended field-of-view imaging 196 Fairness 318 Fault tolerant video storage 307-9 Feedback 81 Fetal development 21 1-12 Fetal face rendering 233, 234 Fetal heart chambers 236 rotation about vertical axis 236 Fetal scan 225 Fields 162 Filtering 220-1 First-come-first-served (FCFS) policy 3 17 Fitness function 170 Fitness measures 169, 171 Frameworks and patterns 284-5 Fully-automatic high-quality translation (FAHQT) 8 Game of Life rules 146, 165 Garden-path sentences 21.36 GAS (Graphical Animation System) 122, 123 General help pane 94 Generalized CDL (GCDL) 325-6 Genetic algorithms 167, 168-71, 173 Genetic operators 168 Genetic search process 168 Glider 146
385
Grammars 27 Grey-scale imaging 191-4 Growth field 163 Growth measure 171 Growth tables 211-12 GRUNDY 81 GUI 123,125, 127, 135, 136 Guides 127-8 GUS 35 Hailing indicator 128-9 Hands in view pattern 262-3 Harmonic imaging 208 Help classification 88-9 Help presentation strategies 110 Help systems 77,80-1 Help taxonomy 88 Hidden Markov models (HMMs) 11,20,28, 41 High definition television (HDTV) 296 High performance digital video servers 293-340 system model 301-2, 320-2 Human-computer dialogue 28 Human-computer interaction (HCI) 1-66, 11, 14,22,43, 76 Hybrid approach to natural language processing (NLP) 24-5 Icon dressing 125 Image plane acquisition 216-17 Implicit semantic representation 36 Inactive state 145 Information technology (IT) services 344 Informational replicating systems 152, 180 Instructional design 77 Instructional techniques 132 Instrumented multilevel parser 102-8 Intelligent computer aided instruction (ICAI) 76 Intelligent multimedia interfaces 46 Intelligent tutoring systems (ITS) 76 Intelligent writing assistants 15- 16 Interaction machine 28 Interactive machine, translation 14-15 Interactive video 337 Interactivity functions of video server 303 Interactivity QoS 302, 306 Inter-disk data placement of constant bit rate video 299-301
386
SUBJECT INDEX
Inter-frame techniques 296 Internet, pattern resources 290 Interval caching 312-15 policy comparison with static policy 315 system model 313-14 Interview tapes 118-20 Intra-cavity transducers 192 Intra-frame techniques 296 Intranet 338 Intra-vascular imaging 190 JANUS-I1 40-1 KEE 84 Knowledge deficits classification 87-8 Knowledge levels, natural language processing (NLP) 30 Kretz Combison 530 scanner 234 Language parse model 107 syntax delimiters 107 Language statements 97 Language syntax 98 Language tokens 99 Lateral resolution 197 Learnable units 92,96-9, 121 Learning by analogy 90 Learning by programming 90 Learning from examples 90 Learning from instruction 90 Least Recently Used (LRU) algorithm 309- 10 LEX 102 Lexical analysis systems 32-4 Lexical level 30 LIFER 10,35,36 Linguistic analysis 54 Linguistic knowledge models 16- 17 Linguistics 5, 42 Lisp 28,40,70,71,76, 86, 87,95, 108, 113-14, 135 Lisp Critic 77 Lisp Tutor 78, 79 Liver images 199 Liver vessels 229 rendering 235 Liver volume acquisition 222 LRU algorithm 3 11-12 Luminance samples 298 LUNAR 10,35-7
Machine Learning 90 Machine translation (MT) 7 federally-sponsored groups 8 interactive 14-15 Magnetic resonance imaging (MRI) 215 Managing End User Conzputingfor Users with Disabilities 6 Mark-Hildreth operator 23 1 Masks 127-8 Maximum and minimum intensity projection (MIP) methods 230,232 Maximum queue length (MQL) policy 317 Mean time to failure (MTTF) 307-8 Menupane 94 Metacognitive questions 8 1 MICRO-PLANNER 84,95 Minimal resource (MR) retrieval 320, 326-9 Mixed initiative interaction 82 M-mode displays 212 M-mode visualization 20 1 Modality integration 49-5 1 Model help 88 Monitors 176, 177 Moore neighborhood 145 Most Recently Used (MRU) algorithm 309-10 Motion compensation 298 Motion Picture Experts Group (MPEG) video compression standard. See MPEG MPEG standard 296-9 MPEG-I standard 296,307,315 MPEG-2 standard 296,298,299,307,319-37 MRU algorithm 311-13 MS-Windows 48 Multi-element phased arrays 193 Multimedia help 133 Multimedia storage system model 3 10-1 1 Multimodal human-computer interaction 54 Multimodal interaction 46-7 Multimodal user interfaces 46 Multiple segmentation (MS) based on scalable MPEG2 video 306 placement of scalable video 304-7 placement strategy, admission control framework 307 Mutation 168, 169 NALIGE 25,38,51-2 Natural language, understanding 28-30 Natural language edit controls (NLECs) 48 Natural language interfaces 13-14, 38
SUBJECT INDEX Natural language processing (NLP) 1-66 application areas 12-16 classification of systems 31-41 computational issues 27-8 connectionist approach 22-4 definition 6 development methodologies 53-4 effective use 49-50 effects on user performance 47-8 engineering phase 9 evolution 9-12 extended definition 4-7 historical background 7-12 hybrid approach to 24-5 knowledge and processing requirements 26-46 knowledge levels 30 motivations 5-7 origins 2 overview 2-3 problem areas 41-2 scientific motivation 5 scope 6-7 stochastic models 19-22 symbolic approach 17- 19 technological motivation 6 theoretical phase 10 user-centered phase 11-12 user interfaces 3-4 Natural language widgets 48-9 NETtalk system 11, 13 Network-connectivity gain 338 Network interface card (NIC) 323 Network transmission scheduler 32 1 Neural networks 11,28 NLDOS 33-4 Non-real time mode 310 Number of rules 155 Oligonucleotides 180 On-line computer teaching 76-84 Open systems 109 Opthamology 190 Optimal multi-description model 44-5 Optimal single-description model 43-4 Output pane 94 Parentheses checking 174 Parity groups 309 Parser function 104-5
387
Parser structure 102 PARSIFAL 35-6 Pattern language 258,263 Patter-ti-Oriented Software Architecture: A System of Parferns 290 Patterns 2.5-92 advantages 282-3 analysis 256-62 analysis of choices 280 and frameworks 284-5 application 267-80 capturing process 285-90 context 257,261 definition 256 developing or extending systems 28 1-2 example 262 forces 259-60 form 257,286 formats 257 in development process 283-4 Internet resources 290 iteration 290 knownuses 262 level 286-8 misapplication 280 name 257 observation 285-6 order processing system 267-80 problem 259 rationale 261 reality check 280-1 refinement 288-9 relationships 288 resulting context 261 sketch/diagram 261 solution 259 PEARL 40 Periodic placement 302-4 Phase shift information 202 Phoneme probability estimator 23 Physical symbol system hypothesis 17 Physiological synchronization 2 19 Placental vessels rendering 234 Planar slices 239 PLANNER 84 Playback requests 3 16 PLOP(Pattern Language of Programming) conference 290 Polycystic kidney, rendering 235 Position measure 171
388
SUBJECT INDEX
Positron emission tomography (PET) 215 Power Doppler imaging 205-6 Pragmatic analysis systems 38-41 Pragmatic level 30 Pre-fetch delay tolerance quality of service (PDT QoS) 329,331-4,336 Presentation rules 92, 101 Probabilistic context-free grammar (PCFG) 20 Problem solutions 175-8 Programmed replicators 173-4 Progressing client 3 11 Prototyping 54 Pulsatile waveform 214 Pulse-echo imaging 191-9 Pulse repetition frequency (PRF) 191 Quality of service (QoS) 302,306, 338 see also Pre-fetch delay tolerance quality of service (PDT QoS) Quantization 297 Quiescent state 144 Range-gate Doppler 202 Ray-casting 227-30 Real-time mode 310 Real-time 2D sonography 215 Reasoning system 94-6 Reduced rule sets 159 Reference help 88 REL 10 Related knowledge 98 Reneging probability 317-18 Replicant measure 171 Replication rules 155, 159 Replication time 155 Replicator rules, automatic programming 167-73 Replicators, emergence of 160-7 Required knowledge 98 Resource reservation 330-3 notation 332 Retrieval constraints 322 Retrieval scheduling 319-37 comparison 330 of MR and CD 336-7 of MR and CT 334-6 notation 322 performance evaluations 333-7 Rotational symmetry 156-60 Rule tables 169, 171
Satisfiability problems (SAT problems) 176, 178 Saved exercise solutions 117 Scalable MF'EG video 298-9 Scalable resolutions 294 Scalable video, progressive display 333 SCAN disk head scheduling algorithm 301 Scanning system 199 Scanning techniques 194
SCREEN 25 Search algorithm 2 I Self-replicating loops 150-60, 163, 177 programming 173-8 Self-replicating molecular systems 180 Self-replicating polyominoe 172-3 Self-replicating systems 167 cellular automata models of 141-83 drive to simplification 149-50 early structures 143-50 emergence 160-73, 178 emergent 165, 167 modeling 179 non-trivial 166 overview 142-3 programmed 179 varying rotational symmetry 156-9 Semantic analysis systems 36-8 Semantic leve1 30 Sequences 167 Shadowing 209 Sheath 152 Sheathed loops 151-3, 157,175 SHRDLU 10,37 Signal compression 200 Signal sequence flow 149 Signal-to-noise ratio 206, 221 Single photon emission computed tomography (SPECT) 215 Slice projection 223-4 Slug trails 123 Software acquisition 341-67 alternative models 354-7 application factors 364 application properties 350-2,362-3 application type 363 application uniqueness 363 approach 345 coding and installation costs 348 contracting costs 349 cost-benefit drivers 346-7
  cost-benefit framework 342-3, 345-9
  database management system (DBMS) 360
  date of installation 353
  decision making 346-7
  functional area 363
  hardware platform 362
  hypotheses 349-54
  information asymmetries 343
  insource/outsource dimension 345
  interaction between custom/package and insource/outsource decisions 353-4, 359-60
  lack of market prices 343-4
  limited observability 344
  monitoring costs 348-9
  needs analysis costs 347-8
  organizational considerations 352, 363
  outsourcing 343
  overview 342-5
  processing mode 362
  programming language 360
  relationship-specific investments 343
  relationships with project characteristics and cost-benefit drivers 351
  survey data analysis and results 358-64
  survey data collection 357-8
  system architecture 362
  system value (benefits) 347
  technological factors 350, 360-2, 364
  temporal effects 364
Software development 263-7
Sonographic volume imaging 242
SOPHIE 10, 35, 37, 82
Sound 129
Spatial aligning 239
Spatial resolution 196-8
Special field 163
Speckle 208
Speckle pattern distribution 198
Speckle reduction 220-1
Speech generation systems 12-13
Speech processing systems 13
Speech recognition systems 12-13
Speech understanding systems 12-13
SpeechActs 52
Splatting 230
Star model 54
Starter help 88
State change replication rules 156
State change rules 156, 159
Statement attribute grammar parser 103
Statement knowledge 98
States 144-5
Static policy, comparison with interval caching policy 315
Stereographic viewing 240
Stereoscopic viewing 238
Stochastic models 19-22
  strengths and weaknesses 21
Story understanding 14
Streaming RAID scheme 309
Structure motion visualization 199-201
Subject frames 92, 97
SUITE 25, 26
Surface fitting 224-5, 239
Surface rendering 226
Syllabus design 77
Symbolic linguistic models 19
Synergistic multimodal interfaces with natural language 50-1
Synergistic multimodal user interface 47
Syntactic analysis systems 34-6
Syntactic level 30
Syntactic parser 10
Syntax help 89, 98
Synthesis by rule 13
System creation 258-9
System example 98
System knowledge 96
Task analysis 54
TAUM-METEO 15
Teaching feedback 77
Temporal resolution 199-200
Text generation 14
Three-dimensional ultrasound imaging system 217
Time-gain compensation (TGC) 195, 197
Token attribute grammar parser 103
Token help pane 93
Token parse table 103
Touch-screen-based visualization systems 238
Training-wheel tutoring systems 80
Transformational grammars 27
Transition functions 145, 157
Turing machines 26-8, 143, 147, 150
Turing test 29
Tutorial curricula 133-4
Tutoring environments 84
Tutoring research 77-80
Tutoring systems 77
Ultrasound/acoustic imaging, overview 186-9
Ultrasound grey-scale imaging 187
Ultrasound image formation 189-214
  basic acoustics 189-90
Ultrasound tissue 186-9
Ultrasound visualization 185-253
  2D plane 187
  3D volume rendered 187
  applications 240
  area measurement 210-11
  artifacts 208, 209, 241
  flow diagram 188
  future developments 242-4
  length measurement 208-10
  measurement and quantification 208-12
  optimization 241
  temporal changes 212-14
  volume measurement 211
Umbilical cord rendering 234
Unbound components 162, 164
Understanding natural language 28-30
Universal constructor 147
UNIX 109, 110
UNIX Consultant (UC) 40
Unsheathed loops 153-7
Update rules 92, 100
Usage data 96
User examples 96
User interaction pane 94
User interface environment aids 135
User interface management systems (UIMSs) 51-4
User proficiency tracking 89-90
Variable bit rate (VBR) 298, 319-37
Vascular imaging 186-9
Velocity Doppler imaging 203-4
Video access frequencies 316
Video communications 294
Video-on-demand (VoD) 294, 316, 337
Video server 294
Video stream 305
Video stream sets for periodical placement strategy 304
Volume creation 217-18
Volume data 223
Volume data user interfaces 238
Volume editing
  electronic scalpel 237
  tools 236-8
Volume filtering 223
Volume imaging 186-9, 189
Volume-rendering algorithm 230
Volume rendering 227-31
  high-resolution ultrasound data 231-3
Volume sonography 187
  future developments 242-4
  imaging systems 218
Volume transducer/positioning system 218
Volume ultrasound, data acquisition 216-21, 243
Volume ultrasound visualization
  clinical applications 240
  potential advantages 241
Volume visualization 214-40
  algorithms 223-38
  areas explored 215-16
  artifacts 225
  data classification and segmentation 226-7
  future developments 242-4
  methods 222-40
  optimization data display 239-40
  physician involvement 239-40
  quality 241
Volumetric ultrasound data acquisition 218
von Neumann neighborhood 145
von Neumann’s self-replicating structure 148
von Neumann’s universal computer-constructor 146-9, 156-7, 173
Waveform indices 213-14
Web implementation 136-7
WEST 82
“What-to-How” software spectrum 26
WIMP 46
WIMP++ 47
Window interface 92-4
Wizard-of-Oz techniques 54
Word processors 16
Writing Partner 81-2
X-Window 48
Yet Another Compiler Compiler (YACC) 102
Zipf distribution 318
Contents of Volumes in This Series
Volume 21
The Web of Computing: Computer Technology as Social Organization
  ROB KLING AND WALT SCACCHI
Computer Design and Description Languages
  SUBRATA DASGUPTA
Microcomputers: Applications, Problems, and Promise
  ROBERT C. GAMMILL
Query Optimization in Distributed Data Base Systems
  GIOVANNI MARIA SACCO AND S. BING YAO
Computers in the World of Chemistry
  PETER LYKOS
Library Automation Systems and Networks
  JAMES E. RUSH
Volume 22
Legal Protection of Software: A Survey
  MICHAEL C. GEMIGNANI
Algorithms for Public Key Cryptosystems: Theory and Applications
  S. LAKSHMIVARAHAN
Software Engineering Environments
  ANTHONY I. WASSERMAN
Principles of Rule-Based Expert Systems
  BRUCE G. BUCHANAN AND RICHARD O. DUDA
Conceptual Representation of Medical Knowledge for Diagnosis by Computer: MDX and Related Systems
  B. CHANDRASEKARAN AND SANJAY MITTAL
Specification and Implementation of Abstract Data Types
  ALFS T. BERZTISS AND SATISH THATTE
Volume 23
Supercomputers and VLSI: The Effect of Large-Scale Integration on Computer Architecture
  LAWRENCE SNYDER
Information and Computation
  J. F. TRAUB AND H. WOZNIAKOWSKI
The Mass Impact of Videogame Technology
  THOMAS A. DEFANTI
Developments in Decision Support Systems
  ROBERT H. BONCZEK, CLYDE W. HOLSAPPLE, AND ANDREW B. WHINSTON
Digital Control Systems
  PETER DORATO AND DANIEL PETERSEN
International Developments in Information Privacy
  G. K. GUPTA
Parallel Sorting Algorithms
  S. LAKSHMIVARAHAN, SUDARSHAN K. DHALL, AND LESLIE L. MILLER
Volume 24
Software Effort Estimation and Productivity
  S. D. CONTE, H. E. DUNSMORE, AND V. Y. SHEN
Theoretical Issues Concerning Protection in Operating Systems
  MICHAEL A. HARRISON
Developments in Firmware Engineering
  SUBRATA DASGUPTA AND BRUCE D. SHRIVER
The Logic of Learning: A Basis for Pattern Recognition and for Improvement of Performance
  RANAN B. BANERJI
The Current State of Language Data Processing
  PAUL L. GARVIN
Advances in Information Retrieval: Where Is That /#I*&@$ Record?
  DONALD H. KRAFT
The Development of Computer Science Education
  WILLIAM F. ATCHISON
Volume 25
Accessing Knowledge through Natural Language
  NICK CERCONE AND GORDON MCCALLA
Design Analysis and Performance Evaluation Methodologies for Database Computers
  STEVEN A. DEMURJIAN, DAVID K. HSIAO, AND PAULA R. STRAWSER
Partitioning of Massive/Real-Time Programs for Parallel Processing
  I. LEE, N. PRYWES, AND B. SZYMANSKI
Computers in High-Energy Physics
  MICHAEL METCALF
Social Dimensions of Office Automation
  ABBE MOWSHOWITZ
Volume 26
The Explicit Support of Human Reasoning in Decision Support Systems
  AMITAVA DUTTA
Unary Processing
  W. J. POPPELBAUM, A. DOLLAS, J. B. GLICKMAN, AND C. O’TOOLE
Parallel Algorithms for Some Computational Problems
  ABHA MOITRA AND S. SITHARAMA IYENGAR
Multistage Interconnection Networks for Multiprocessor Systems
  S. C. KOTHARI
Fault-Tolerant Computing
  WING N. TOY
Techniques and Issues in Testing and Validation of VLSI Systems
  H. K. REGHBATI
Software Testing and Verification
  LEE J. WHITE
Issues in the Development of Large, Distributed, and Reliable Software
  C. V. RAMAMOORTHY, ATUL PRAKASH, VIJAY GARG, TSUNEO YAMAURA, AND ANUPAM BHIDE
Volume 27
Military Information Processing
  JAMES STARK DRAPER
Multidimensional Data Structures: Review and Outlook
  S. SITHARAMA IYENGAR, R. L. KASHYAP, V. K. VAISHNAVI, AND N. S. V. RAO
Distributed Data Allocation Strategies
  ALAN R. HEVNER AND ARUNA RAO
A Reference Model for Mass Storage Systems
  STEPHEN W. MILLER
Computers in the Health Sciences
  KEVIN C. O’KANE
Computer Vision
  AZRIEL ROSENFELD
Supercomputer Performance: The Theory, Practice, and Results
  OLAF M. LUBECK
Computer Science and Information Technology in the People’s Republic of China: The Emergence of Connectivity
  JOHN H. MAIER
Volume 28
The Structure of Design Processes
  SUBRATA DASGUPTA
Fuzzy Sets and Their Applications to Artificial Intelligence
  ABRAHAM KANDEL AND MORDECHAY SCHNEIDER
Parallel Architecture for Database Systems
  A. R. HURSON, L. L. MILLER, S. H. PAKZAD, M. H. EICH, AND B. SHIRAZI
Optical and Optoelectronic Computing
  MIR MOJTABA MIRSALEHI, MUSTAFA A. G. ABUSHAGUR, AND H. JOHN CAULFIELD
Management Intelligence Systems
  MANFRED KOCHEN
Volume 29
Models of Multilevel Computer Security
  JONATHAN K. MILLEN
Evaluation, Description, and Invention: Paradigms for Human-Computer Interaction
  JOHN M. CARROLL
Protocol Engineering
  MING T. LIU
Computer Chess: Ten Years of Significant Progress
  MONROE NEWBORN
Soviet Computing in the 1980s
  RICHARD W. JUDY AND ROBERT W. CLOUGH
Volume 30
Specialized Parallel Architectures for Textual Databases
  A. R. HURSON, L. L. MILLER, S. H. PAKZAD, AND JIA-BING CHENG
Database Design and Performance
  MARK L. GILLENSON
Software Reliability
  ANTHONY IANNINO AND JOHN D. MUSA
Cryptography Based Data Security
  GEORGE J. DAVIDA AND YVO DESMEDT
Soviet Computing in the 1980s: A Survey of the Software and its Applications
  RICHARD W. JUDY AND ROBERT W. CLOUGH
Volume 31
Command and Control Information Systems Engineering: Progress and Prospects
  STEPHEN J. ANDRIOLE
Perceptual Models for Automatic Speech Recognition Systems
  RENATO DE MORI, MATHEW J. PALAKAL, AND PIERO COSI
Availability and Reliability Modeling for Computer Systems
  DAVID I. HEIMANN, NITIN MITTAL, AND KISHOR S. TRIVEDI
Molecular Computing
  MICHAEL CONRAD
Foundations of Information Science
  ANTHONY DEBONS
Volume 32
Computer-Aided Logic Synthesis for VLSI Chips
  SABURO MUROGA
Sensor-Driven Intelligent Robotics
  MOHAN M. TRIVEDI AND CHUXIN CHEN
Multidatabase Systems: An Advanced Concept in Handling Distributed Data
  A. R. HURSON AND M. W. BRIGHT
Models of the Mind and Machine: Information Flow and Control between Humans and Computers
  KENT L. NORMAN
Computerized Voting
  ROY G. SALTMAN
Volume 33
Reusable Software Components
  BRUCE W. WEIDE, WILLIAM F. OGDEN, AND STUART H. ZWEBEN
Object-Oriented Modeling and Discrete-Event Simulation
  BERNARD P. ZEIGLER
Human-Factors Issues in Dialog Design
  THIAGARAJAN PALANIVEL AND MARTIN HELANDER
Neurocomputing Formalisms for Computational Learning and Machine Intelligence
  S. GULATI, J. BARHEN, AND S. S. IYENGAR
Visualization in Scientific Computing
  THOMAS A. DEFANTI AND MAXINE D. BROWN
Volume 34
An Assessment and Analysis of Software Reuse
  TED J. BIGGERSTAFF
Multisensory Computer Vision
  N. NANDHAKUMAR AND J. K. AGGARWAL
Parallel Computer Architectures
  RALPH DUNCAN
Content-Addressable and Associative Memory
  LAWRENCE CHISVIN AND R. JAMES DUCKWORTH
Image Database Management
  WILLIAM I. GROSKY AND RAJIV MEHROTRA
Paradigmatic Influences on Information Systems Development Methodologies: Evolution and Conceptual Advances
  RUDY HIRSCHHEIM AND HEINZ K. KLEIN
Volume 35
Conceptual and Logical Design of Relational Databases
  S. B. NAVATHE AND G. PERNUL
Computational Approaches for Tactile Information Processing and Analysis
  HRISHIKESH P. GADAGKAR AND MOHAN M. TRIVEDI
Object-Oriented System Development Methods
  ALAN R. HEVNER
Reverse Engineering
  JAMES H. CROSS II, ELLIOT J. CHIKOFSKY, AND CHARLES H. MAY, JR.
Multiprocessing
  CHARLES J. FLECKENSTEIN, D. H. GILL, DAVID HEMMENDINGER, C. L. MCCREARY, JOHN D. MCGREGOR, ROY P. PARGAS, ARTHUR M. RIEHL, AND VIRGIL WALLENTINE
The Landscape of International Computing
  EDWARD M. ROCHE, SEYMOUR E. GOODMAN, AND HSINCHUN CHEN
Volume 36
Zero Defect Software: Cleanroom Engineering
  HARLAN D. MILLS
Role of Verification in the Software Specification Process
  MARVIN V. ZELKOWITZ
Computer Applications in Music Composition and Research
  GARY E. WITTLICH, ERIC J. ISAACSON, AND JEFFREY E. HASS
Artificial Neural Networks in Control Applications
  V. VEMURI
Developments in Uncertainty-Based Information
  GEORGE J. KLIR
Human Factors in Human-Computer System Design
  MARY CAROL DAY AND SUSAN J. BOYCE
Volume 37
Approaches to Automatic Programming
  CHARLES RICH AND RICHARD C. WATERS
Digital Signal Processing
  STEPHEN A. DYER AND BRIAN K. HARMS
Neural Networks for Pattern Recognition
  S. C. KOTHARI AND HEEKUCK OH
Experiments in Computational Heuristics and Their Lessons for Software and Knowledge Engineering
  JURG NIEVERGELT
High-Level Synthesis of Digital Circuits
  GIOVANNI DE MICHELI
Issues in Dataflow Computing
  BEN LEE AND A. R. HURSON
A Sociological History of the Neural Network Controversy
  MIKEL OLAZARAN
Volume 38
Database Security
  GÜNTHER PERNUL
Functional Representation and Causal Processes
  B. CHANDRASEKARAN
Computer-Based Medical Systems
  JOHN M. LONG
Algorithm-Specific Parallel Processing with Linear Processor Arrays
  JOSE A. B. FORTES, BENJAMIN W. WAH, WEIJA SHANG, AND KUMAR N. GANAPATHY
Information as a Commodity: Assessment of Market Value
  ABBE MOWSHOWITZ
Volume 39
Maintenance and Evolution of Software Products
  ANNELIESE VON MAYRHAUSER
Software Measurement: A Decision-Process Approach
  WARREN HARRISON
Active Databases: Concepts and Design Support
  THOMAS A. MUECK
Operating Systems Enhancements for Distributed Shared Memory
  VIRGINIA LO
The Social Design of Worklife with Computers and Networks: A Natural Systems Perspective
  ROB KLING AND TOM JEWETT
Volume 40
Program Understanding: Models and Experiments
  A. VON MAYRHAUSER AND A. M. VANS
Software Prototyping
  ALAN M. DAVIS
Rapid Prototyping of Microelectronic Systems
  APOSTOLOS DOLLAS AND J. D. STERLING BABCOCK
Cache Coherence in Multiprocessors: A Survey
  MAZIN S. YOUSIF, M. J. THAZHUTHAVEETIL, AND C. R. DAS
The Adequacy of Office Models
  CHANDRA S. AMARAVADI, JOEY F. GEORGE, OLIVIA R. LIU SHENG, AND JAY F. NUNAMAKER
Volume 41
Directions in Software Process Research
  H. DIETER ROMBACH AND MARTIN VERLAGE
The Experience Factory and Its Relationship to Other Quality Approaches
  VICTOR R. BASILI
CASE Adoption: A Process, Not an Event
  JOCK A. RADER
On the Necessary Conditions for the Composition of Integrated Software Engineering Environments
  DAVID J. CARNEY AND ALAN W. BROWN
Software Quality, Software Process, and Software Testing
  DICK HAMLET
Advances in Benchmarking Techniques: New Standards and Quantitative Metrics
  THOMAS CONTE AND WEN-MEI W. HWU
An Evolutionary Path for Transaction Processing Systems
  CALTON PU, AVRAHAM LEFF, AND SHU-WEI F. CHEN
Volume 42
Nonfunctional Requirements of Real-Time Systems
  TEREZA G. KIRNER AND ALAN M. DAVIS
A Review of Software Inspections
  ADAM PORTER, HARVEY SIY, AND LAWRENCE VOTTA
Advances in Software Reliability Engineering
  JOHN D. MUSA AND WILLA EHRLICH
Network Interconnection and Protocol Conversion
  MING T. LIU
A Universal Model of Legged Locomotion Gaits
  S. T. VENKATARAMAN
Volume 43
Program Slicing
  DAVID W. BINKLEY AND KEITH BRIAN GALLAGHER
Language Features for the Interconnection of Software Components
  RENATE MOTSCHNIG-PITRIK AND ROLAND T. MITTERMEIR
Using Model Checking to Analyze Requirements and Designs
  JOANNE ATLEE, MARSHA CHECHIK, AND JOHN GANNON
Information Technology and Productivity: A Review of the Literature
  ERIK BRYNJOLFSSON AND SHINKYU YANG
The Complexity of Problems
  WILLIAM GASARCH
3-D Computer Vision Using Structured Light: Design, Calibration, and Implementation Issues
  FRED W. DEPIERO AND MOHAN M. TRIVEDI
Volume 44
Managing the Risks in Information Systems and Technology (IT)
  ROBERT N. CHARETTE
Software Cost Estimation: A Review of Models, Process and Practice
  FIONA WALKERDEN AND ROSS JEFFERY
Experimentation in Software Engineering
  SHARI LAWRENCE PFLEEGER
Parallel Computer Construction Outside the United States
  RALPH DUNCAN
Control of Information Distribution and Access
  RALF HAUSER
Asynchronous Transfer Mode: An Engineering Network Standard for High Speed Communications
  RONALD J. VETTER
Communication Complexity
  EYAL KUSHILEVITZ
Volume 45
Control in Multi-threaded Information Systems
  PABLO A. STRAUB AND CARLOS A. HURTADO
Parallelization of DOALL and DOACROSS Loops-a Survey
  A. R. HURSON, JOFORD T. LIM, KRISHNA M. KAVI, AND BEN LEE
Programming Irregular Applications: Runtime Support, Compilation and Tools
  JOEL SALTZ, GAGAN AGRAWAL, CHIALIN CHANG, RAJA DAS, GUY EDJLALI, PAUL HAVLAK, YUAN-SHIN HWANG, BONGKI MOON, RAVI PONNUSAMY, SHAMIK SHARMA, ALAN SUSSMAN, AND MUSTAFA UYSAL
Optimization Via Evolutionary Processes
  SRILATA RAMAN AND L. M. PATNAIK
Software Reliability and Readiness Assessment Based on the Non-homogeneous Poisson Process
  AMRIT L. GOEL AND KUNE-ZANG YANG
Computer-supported Cooperative Work and Groupware
  JONATHAN GRUDIN AND STEVEN E. POLTROCK
Technology and Schools
  GLEN L. BULL
Volume 46
Software Process Appraisal and Improvement: Models and Standards
  MARK C. PAULK
A Software Process Engineering Framework
  JYRKI KONTIO
Gaining Business Value from IT Investments
  PAMELA SIMMONS
Reliability Measurement, Analysis, and Improvement for Large Software Systems
  JEFF TIAN
Role-based Access Control
  RAVI SANDHU
Multithreaded Systems
  KRISHNA M. KAVI, BEN LEE, AND ALI R. HURSON
Coordination Models and Languages
  GEORGE A. PAPADOPOULOS AND FARHAD ARBAB
Multidisciplinary Problem Solving Environments for Computational Science
  ELIAS N. HOUSTIS, JOHN R. RICE, AND NAREN RAMAKRISHNAN
Volume 47
Natural Language Processing: A Human-Computer Interaction Perspective
  BILL MANARIS
Cognitive Adaptive Computer Help (COACH): A Case Study
  EDWIN J. SELKER
Cellular Automata Models of Self-replicating Systems
  JAMES A. REGGIA, HUI-HSIEN CHOU, AND JASON D. LOHN
Ultrasound Visualization
  THOMAS R. NELSON
Patterns and System Development
  BRANDON GOLDFEDDER
High Performance Digital Video Servers: Storage and Retrieval of Compressed Scalable Video
  SEUNGYUP PAEK AND SHIH-FU CHANG
Software Acquisition: The Custom/Package and Insource/Outsource Dimensions
  PAUL NELSON, ABRAHAM SEIDMANN, AND WILLIAM RICHMOND
ISBN 0-12-012147-6